Our first question: what does the typical coder look like? And who is the least typical coder? I am going to try to answer these questions using the modal (i.e. most frequent) attributes...

In [1]:
import pandas as pd

df = pd.read_csv("data/survey_results_public.csv")

df.head()

Unnamed: 0,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
0,1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
1,2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
3,4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
4,5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


There are a lot of ways we could slice this pie, but let's focus on the Basic Information section to start with, as that will give us a manageable number of nicely grouped fields.

In [27]:
selection = ["MainBranch", "Hobbyist", "OpenSourcer", "OpenSource", "Employment", "Country"]

pd.options.display.max_colwidth = 100
pd.options.display.max_columns = 100

df[selection].describe(include="all")

Unnamed: 0,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country
count,88331,88883,88883,86842,87181,88751
unique,5,2,4,3,6,179
top,I am a developer by profession,Yes,Never,The quality of OSS and closed source software is about the same,Employed full-time,United States
freq,65679,71257,32295,41527,64440,20949
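
The `top` row above is simply the per-column mode. As a minimal sketch (on a tiny made-up frame rather than the survey file; `toy`, `cols`, `modes` and `mask` are illustrative names, not part of the notebook), the same values can be pulled out with `DataFrame.mode` and turned into a row filter without typing each answer by hand:

```python
import pandas as pd

# Toy stand-in for the survey data (values are invented for illustration).
toy = pd.DataFrame({
    "Hobbyist":   ["Yes", "Yes", "No", "Yes", "No"],
    "Employment": ["Full-time", "Part-time", "Full-time", "Full-time", "Full-time"],
})
cols = ["Hobbyist", "Employment"]

# mode() returns one row per tied mode; row 0 holds the first mode of each column.
modes = toy[cols].mode().iloc[0]

# Keep only the rows where every selected column equals its mode.
# Comparing a DataFrame with a Series aligns the Series index with the columns.
mask = (toy[cols] == modes).all(axis=1)
matches = toy[mask]

print(modes.to_dict())
print(len(matches) / len(toy))  # share of rows matching every per-column mode
```

Building the filter from `modes` means the column list can be changed (for example, dropping one field) without rewriting the comparison by hand.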


In [17]:
topOnes = df[(df.MainBranch == "I am a developer by profession") & (df.Hobbyist == "Yes")
             & (df.OpenSourcer == "Never")
             & (df.OpenSource == "The quality of OSS and closed source software is about the same")
             & (df.Employment == "Employed full-time") & (df.Country == "United States")]

topOnes.describe(include="all")

Unnamed: 0,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase,BasicInfoCombo
count,6457.0,6457,6457,6457,6457,6457,6457,6399,6354,6007,...,5202,5987.0,6293,6215,5792,5818,6158,6354,6373,6457
unique,,1,1,1,1,1,124,3,9,12,...,15,,7,2,7,67,2,3,3,124
top,,I am a developer by profession,Yes,Never,The quality of OSS and closed source software is about the same,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or software engineering",...,Tech articles written by other developers;Industry news about technologies you're interested in;Tech meetups or events in your area;Courses on technologies you're interested in,,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy,I am a developer by profession~~Yes~~Never~~The quality of OSS and closed source software is about the same~~Employed full-time~~United States
freq,,6457,6457,6457,6457,6457,1801,5580,3512,3905,...,1214,,5726,6182,5417,4180,3913,4354,4623,1801
mean,45047.114914,,,,,,,,,,...,,30.882412,,,,,,,,
std,25697.102576,,,,,,,,,,...,,7.990574,,,,,,,,
min,16.0,,,,,,,,,,...,,1.0,,,,,,,,
25%,22720.0,,,,,,,,,,...,,25.0,,,,,,,,
50%,45365.0,,,,,,,,,,...,,29.0,,,,,,,,
75%,67663.0,,,,,,,,,,...,,34.0,,,,,,,,


In [10]:
len(topOnes) / len(df)

0.020262592396746285

So even just based on those 6 fields we have narrowed the "typical" coder down to just 2% of the entire cohort.

Just because each field contains its most frequent value doesn't mean the combination is the most frequent. So now let's see if we can find the combination of answers which produces the most matches...
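
To see why the two can differ, here is a small invented example (the column names `first` and `second` and all values are made up): each column has a clear mode, yet the pair made of those two modes is the rarest combination in the data.

```python
import pandas as pd

# Invented data: "A" is the mode of `first`, "x" is the mode of `second`.
toy = pd.DataFrame({
    "first":  ["A", "A", "A", "A", "B", "B", "C", "C"],
    "second": ["y", "y", "y", "x", "x", "x", "x", "x"],
})

# Combination built from the per-column modes.
per_column_modes = tuple(toy.mode().iloc[0])

# Count every (first, second) pair and pick the most frequent one.
combo_counts = toy.groupby(["first", "second"]).size()
most_common_combo = combo_counts.idxmax()

print(per_column_modes, combo_counts.loc[per_column_modes])
print(most_common_combo, combo_counts.loc[most_common_combo])
```

Here the pair of modes occurs once, while a different pair occurs three times, so the modal combination has to be counted directly.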

In [14]:
df["BasicInfoCombo"] = df["MainBranch"] + "~~" + df["Hobbyist"] + "~~" + df["OpenSourcer"] + "~~" + df["OpenSource"] + "~~" + df["Employment"] + "~~" + df["Country"]
    
df["BasicInfoCombo"].value_counts()[:5]

I am a developer by profession~~Yes~~Less than once per year~~The quality of OSS and closed source software is about the same~~Employed full-time~~United States                                               1873
I am a developer by profession~~Yes~~Never~~The quality of OSS and closed source software is about the same~~Employed full-time~~United States                                                                 1801
I am a developer by profession~~Yes~~Less than once a month but more than once per year~~The quality of OSS and closed source software is about the same~~Employed full-time~~United States                    1524
I am a developer by profession~~Yes~~Less than once per year~~OSS is, on average, of HIGHER quality than proprietary / closed source software~~Employed full-time~~United States                               1359
I am a developer by profession~~Yes~~Less than once a month but more than once per year~~OSS is, on average, of HIGHER quality than proprietary / closed
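
Gluing the answers into one string works, but the same counts can also be obtained by grouping on the columns directly. The following is a minimal sketch on made-up rows (`toy`, `by_string` and `by_group` are illustrative names). Both approaches skip rows with a missing answer in any of the columns: string concatenation with NaN yields NaN, which `value_counts` drops, and `groupby` drops NaN keys by default.

```python
import numpy as np
import pandas as pd

# Invented rows; the last one has a missing Hobbyist answer.
toy = pd.DataFrame({
    "Hobbyist":    ["Yes", "Yes", "No", "Yes", np.nan],
    "OpenSourcer": ["Never", "Never", "Never", "Less than once per year", "Never"],
})
cols = ["Hobbyist", "OpenSourcer"]

# Approach used above: join the answers with a separator and count the strings.
joined = toy["Hobbyist"] + "~~" + toy["OpenSourcer"]
by_string = joined.value_counts()

# Alternative: group on the columns themselves; each combination becomes a tuple in the index.
by_group = toy.groupby(cols).size().sort_values(ascending=False)

print(by_string.head())
print(by_group.head())
```

The grouped version keeps the answers as separate index levels, which avoids having to choose a separator that never appears in the data.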

In [15]:
topCombo = df[(df.MainBranch == "I am a developer by profession") & (df.Hobbyist == "Yes")
             & (df.OpenSourcer == "Less than once per year")
             & (df.OpenSource == "The quality of OSS and closed source software is about the same")
             & (df.Employment == "Employed full-time") & (df.Country == "United States")]

topCombo.describe(include="all")

Unnamed: 0,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase,BasicInfoCombo
count,1873.0,1873,1873,1873,1873,1873,1873,1871,1868,1792,...,1354,1732.0,1810,1807,1721,1745,1820,1850,1860,1873
unique,,1,1,1,1,1,1,3,9,12,...,15,,5,2,5,45,2,3,3,1
top,,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software is about the same,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or software engineering",...,Tech articles written by other developers;Industry news about technologies you're interested in;Tech meetups or events in your area;Courses on technologies you're interested in,,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy,I am a developer by profession~~Yes~~Less than once per year~~The quality of OSS and closed source software is about the same~~Employed full-time~~United States
freq,,1873,1873,1873,1873,1873,1873,1754,1141,1130,...,267,,1652,1790,1587,1420,1037,1437,1381,1873
mean,44151.398825,,,,,,,,,,...,,34.034065,,,,,,,,
std,25995.61158,,,,,,,,,,...,,8.969587,,,,,,,,
min,23.0,,,,,,,,,,...,,18.0,,,,,,,,
25%,21753.0,,,,,,,,,,...,,28.0,,,,,,,,
50%,44105.0,,,,,,,,,,...,,32.0,,,,,,,,
75%,65869.0,,,,,,,,,,...,,38.0,,,,,,,,


In [16]:
len(topCombo) / len(df)

0.02107264606280166

And as suspected, the combination of modal values did NOT give the most frequent combination. He (and unsurprisingly it is a he) does a little more open-sourcing than we first determined.

Now, arguably, including Country is skewing the results somewhat, so let's take that out and see where we are.

In [18]:
topOnes = df[(df.MainBranch == "I am a developer by profession") & (df.Hobbyist == "Yes")
             & (df.OpenSourcer == "Never")
             & (df.OpenSource == "The quality of OSS and closed source software is about the same")
             & (df.Employment == "Employed full-time")]

topOnes.describe(include="all")

Unnamed: 0,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase,BasicInfoCombo
count,6457.0,6457,6457,6457,6457,6457,6457,6399,6354,6007,...,5202,5987.0,6293,6215,5792,5818,6158,6354,6373,6457
unique,,1,1,1,1,1,124,3,9,12,...,15,,7,2,7,67,2,3,3,124
top,,I am a developer by profession,Yes,Never,The quality of OSS and closed source software is about the same,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or software engineering",...,Tech articles written by other developers;Industry news about technologies you're interested in;Tech meetups or events in your area;Courses on technologies you're interested in,,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy,I am a developer by profession~~Yes~~Never~~The quality of OSS and closed source software is about the same~~Employed full-time~~United States
freq,,6457,6457,6457,6457,6457,1801,5580,3512,3905,...,1214,,5726,6182,5417,4180,3913,4354,4623,1801
mean,45047.114914,,,,,,,,,,...,,30.882412,,,,,,,,
std,25697.102576,,,,,,,,,,...,,7.990574,,,,,,,,
min,16.0,,,,,,,,,,...,,1.0,,,,,,,,
25%,22720.0,,,,,,,,,,...,,25.0,,,,,,,,
50%,45365.0,,,,,,,,,,...,,29.0,,,,,,,,
75%,67663.0,,,,,,,,,,...,,34.0,,,,,,,,


In [19]:
len(topOnes) / len(df)

0.07264606280166061

Up to 7% for the single-field modes. And the combined mode...

In [20]:
df["BasicInfoCombo"] = df["MainBranch"] + "~~" + df["Hobbyist"] + "~~" + df["OpenSourcer"] + "~~" + df["OpenSource"] + "~~" + df["Employment"]
    
df["BasicInfoCombo"].value_counts()[:5]

I am a developer by profession~~Yes~~Never~~The quality of OSS and closed source software is about the same~~Employed full-time                                                                 6457
I am a developer by profession~~Yes~~Less than once per year~~The quality of OSS and closed source software is about the same~~Employed full-time                                               6415
I am a developer by profession~~Yes~~Less than once per year~~OSS is, on average, of HIGHER quality than proprietary / closed source software~~Employed full-time                               5545
I am a developer by profession~~Yes~~Less than once a month but more than once per year~~OSS is, on average, of HIGHER quality than proprietary / closed source software~~Employed full-time    5366
I am a developer by profession~~Yes~~Less than once a month but more than once per year~~The quality of OSS and closed source software is about the same~~Employed full-time                    5329
Name: BasicInfo

In [28]:
topCombo = df[(df.MainBranch == "I am a developer by profession") & (df.Hobbyist == "Yes")
             & (df.OpenSourcer == "Never")
             & (df.OpenSource == "The quality of OSS and closed source software is about the same")
             & (df.Employment == "Employed full-time")]

topCombo.describe(include="all")

Unnamed: 0,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase,BasicInfoCombo
count,6457.0,6457,6457,6457,6457,6457,6457,6399,6354,6007,6202,6294,6407,6423.0,6408.0,6417.0,6457,6439,6191,6192,6197,6457,6431,6052,6381,6401,6326,6457,6457,5261.0,5849,5255.0,5999.0,6302,6196,6399,6379,6398,6393,4288.0,6336,6175,6286,6419,6163,5744,5187,5892,5653,5039,4858,4456,4902,6408,6413,6284,4555,4204,6318,6382,6359,6266,6393,5966,6197.0,6438,6428,6420,6346,4967,6436,5185,6418,6411,6429,6302,5202,5987.0,6293,6215,5792,5818,6158,6354,6373,6457
unique,,1,1,1,1,1,124,3,9,12,258,9,1458,51.0,40.0,45.0,5,5,4,3,4,3,6,63,2,149,6,93,93,,3,,,3,129,7,3,5,3,,4,4,3,2362,3164,850,1004,1464,1933,572,664,449,992,1290,4,16,5,5,2,4,3,13,3,5,13.0,6,55,5,5,4,3,6,3,3,6,6,15,,7,2,7,67,2,3,3,1
top,,I am a developer by profession,Yes,Never,The quality of OSS and closed source software is about the same,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or software engineering","Taught yourself a new language, framework, or tool without taking a formal course",20 to 99 employees,"Developer, full-stack",10.0,16.0,2.0,Very satisfied,Slightly satisfied,Very confident,No,Not sure,"I’m not actively looking, but I am open to new opportunities",Less than a year ago,Interview with people in peer roles;Interview with people in senior / management roles,No,"Languages, frameworks, and other technologies I'd be working with;Office environment or company ...",I was preparing for a job search,USD,United States dollar,,Yearly,,,"There is a schedule and/or spec (made by me or by a colleague), and my work somewhat aligns",Being tasked with non-development work;Distracting work environment;Meetings,Less than once per month / Never,Office,A little above average,"Yes, because I see value in code review",,"Yes, it's part of our process",Developers and management have nearly equal input into purchasing new technology,I have little or no influence,C#;HTML/CSS;JavaScript;SQL,C#;HTML/CSS;JavaScript;SQL,Microsoft SQL Server,Microsoft SQL Server,Windows,Windows,ASP.NET;jQuery,React.js,Node.js,Node.js,Visual Studio;Visual Studio Code,Windows,I do not use containers,Not at all,Useful for immutable record keeping outside of currency,Yes,Yes,Yes,Reddit,In real life (in person),Username,2010.0,Multiple times per day,Find answers to specific questions,3-5 times per week,Stack Overflow was much faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Industry news about technologies you're interested in;...,,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy,I am a developer by profession~~Yes~~Never~~The quality of OSS and closed source software is abo...
freq,,6457,6457,6457,6457,6457,1801,5580,3512,3905,673,1500,995,561.0,697.0,814.0,2625,2240,2565,3209,2368,3986,2193,1110,5459,518,2958,1900,1900,,3069,,,3250,224,3437,4003,2813,3997,,2426,1974,3232,227,100,695,456,777,314,384,277,671,479,448,3815,3407,3799,1221,4003,3477,5268,1219,4131,4480,831.0,2200,2719,1809,3333,2033,5211,2175,3118,3681,1809,5030,1214,,5726,6182,5417,4180,3913,4354,4623,6457
mean,45047.114914,,,,,,,,,,,,,,,,,,,,,,,,,,,,,901946.6,,129517.3,44.227163,,,,,,,4.270714,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30.882412,,,,,,,,
std,25697.102576,,,,,,,,,,,,,,,,,,,,,,,,,,,,,18997230.0,,294571.7,85.581879,,,,,,,4.543397,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7.990574,,,,,,,,
min,16.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,0.0,2.0,,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,
25%,22720.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,25000.0,,27492.0,40.0,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,25.0,,,,,,,,
50%,45365.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60000.0,,54397.0,40.0,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,29.0,,,,,,,,
75%,67663.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,106000.0,,95000.0,42.0,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,34.0,,,,,,,,


In [23]:
len(topCombo) / len(df)

0.07264606280166061

Our non-open-sourcer retains his lead this time! We can also see that a number of other fields have clear most frequent values. With this (very) simple model we've built up quite a picture of the typical coder!

* Professional developer and a hobbyist
* Doesn't contribute to open source (but thinks the quality is about the same as closed source software)
* Not a student 
* Has a Bachelor's degree majoring in Computer Science or similar
* Not actively looking for a job (but open to new opportunities)
* Never been asked to solve FizzBuzz
* Works to a schedule (somewhat)
* Works from home less than once per month (or never) and prefers working in the office
* Takes part in code reviews and sees the value in them
* Has little or no influence on technology purchases at work
* Works on Windows and doesn't use blockchain or containers
* Thinks people born today will have a better life than their parents
* Acts as the family's IT support
* Prefers real life conversations
* Calls it their Username
* Straight white male with no dependents
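
One way to make "clear most frequent value" less of a judgement call is to compute, for each column, the share of non-missing answers taken by its top value. The sketch below does this on a small invented frame; `modal_share` is a hypothetical helper written for illustration, not something from the notebook above, and the answers in `toy` are made up.

```python
import pandas as pd

def modal_share(frame: pd.DataFrame) -> pd.DataFrame:
    """Return each column's most frequent value and its share of non-missing answers."""
    rows = {}
    for col in frame.columns:
        counts = frame[col].value_counts()  # missing values are excluded by default
        if counts.empty:
            continue  # column has no answers at all
        rows[col] = {"top": counts.index[0], "share": counts.iloc[0] / counts.sum()}
    # One row per column, strongest mode first.
    return pd.DataFrame.from_dict(rows, orient="index").sort_values("share", ascending=False)

# Invented answers, reusing a few of the survey's column names.
toy = pd.DataFrame({
    "FizzBuzz": ["No", "No", "No", "Yes"],
    "OpSys":    ["Windows", "Linux", "Windows", "MacOS"],
    "Gender":   ["Man", "Man", "Man", None],
})

summary = modal_share(toy)
print(summary)
```

Applied to a filtered frame such as `topCombo`, a table like this would separate fields where nearly everyone gives the same answer from fields where the top value only narrowly leads.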