## Stack Overflow Developer Survey 2020

In December 2011, Stack Overflow launched its Annual User Survey to measure changes in user demographics and trends from the previous year. Since then, Stack Overflow has continued to ask the developer community a variety of questions, covering everything from their favorite technologies to their job preferences. The survey results are published and available here: https://insights.stackoverflow.com/survey

The aim of this project is to explore the data provided by Stack Overflow's developer community.


In [1]:
# Dependencies
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
plt.style.use('ggplot')
import datetime

In [2]:
# Read csv file and specify the parameters: set the index column as respondent
sOverflow = pd.read_csv("data/survey_2020.csv", index_col='Respondent')
sOverflow.head()

Respondent,MainBranch,Hobbyist,Age,Age1stCode,CompFreq,CompTotal,ConvertedComp,Country,CurrencyDesc,CurrencySymbol,...,SurveyEase,SurveyLength,Trans,UndergradMajor,WebframeDesireNextYear,WebframeWorkedWith,WelcomeChange,WorkWeekHrs,YearsCode,YearsCodePro
1,I am a developer by profession,Yes,,13,Monthly,,,Germany,European Euro,EUR,...,Neither easy nor difficult,Appropriate in length,No,"Computer science, computer engineering, or sof...",ASP.NET Core,ASP.NET;ASP.NET Core,Just as welcome now as I felt last year,50.0,36,27.0
2,I am a developer by profession,No,,19,,,,United Kingdom,Pound sterling,GBP,...,,,,"Computer science, computer engineering, or sof...",,,Somewhat more welcome now than last year,,7,4.0
3,I code primarily as a hobby,Yes,,15,,,,Russian Federation,,,...,Neither easy nor difficult,Appropriate in length,,,,,Somewhat more welcome now than last year,,4,
4,I am a developer by profession,Yes,25.0,18,,,,Albania,Albanian lek,ALL,...,,,No,"Computer science, computer engineering, or sof...",,,Somewhat less welcome now than last year,40.0,7,4.0
5,"I used to be a developer by profession, but no...",Yes,31.0,16,,,,United States,,,...,Easy,Too short,No,"Computer science, computer engineering, or sof...",Django;Ruby on Rails,Ruby on Rails,Just as welcome now as I felt last year,,15,8.0


In [3]:
#Check number of rows & columns
sOverflow.shape

(64461, 60)

In [4]:
#Filter data
stackO = sOverflow[['DevType','OpSys','YearsCode','YearsCodePro','Country',
                    'Hobbyist','Age','Gender','Ethnicity','Employment',
                    'EdLevel','UndergradMajor','NEWEdImpt','NEWLearn',
                    'LanguageDesireNextYear','LanguageWorkedWith']]
stackO.head()
#64461 rows × 16 columns

Respondent,DevType,OpSys,YearsCode,YearsCodePro,Country,Hobbyist,Age,Gender,Ethnicity,Employment,EdLevel,UndergradMajor,NEWEdImpt,NEWLearn,LanguageDesireNextYear,LanguageWorkedWith
1,"Developer, desktop or enterprise applications;...",Windows,36,27.0,Germany,Yes,,Man,White or of European descent,"Independent contractor, freelancer, or self-em...","Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",Fairly important,Once a year,C#;HTML/CSS;JavaScript,C#;HTML/CSS;JavaScript
2,"Developer, full-stack;Developer, mobile",MacOS,7,4.0,United Kingdom,No,,,,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Fairly important,Once a year,Python;Swift,JavaScript;Swift
3,,Linux-based,4,,Russian Federation,Yes,,,,,,,,Once a decade,Objective-C;Python;Swift,Objective-C;Python;Swift
4,,Linux-based,7,4.0,Albania,Yes,25.0,Man,White or of European descent,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",Not at all important/not necessary,Once a year,,
5,,Windows,15,8.0,United States,Yes,31.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Very important,Once a year,Java;Ruby;Scala,HTML/CSS;Ruby;SQL


In [5]:
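#Keep US respondents only; .copy() avoids SettingWithCopyWarning when columns are reassigned later
stackO_US = stackO.loc[stackO['Country'] == 'United States'].copy()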
stackO_US.head()
#12469 rows × 16 columns

Respondent,DevType,OpSys,YearsCode,YearsCodePro,Country,Hobbyist,Age,Gender,Ethnicity,Employment,EdLevel,UndergradMajor,NEWEdImpt,NEWLearn,LanguageDesireNextYear,LanguageWorkedWith
5,,Windows,15,8,United States,Yes,31.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Very important,Once a year,Java;Ruby;Scala,HTML/CSS;Ruby;SQL
8,"Developer, back-end;Developer, desktop or ente...",Linux-based,17,13,United States,Yes,36.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Not at all important/not necessary,Once a year,JavaScript,Python;SQL
14,"Developer, desktop or enterprise applications;...",Windows,5,1,United States,Yes,27.0,Man,White or of European descent,Employed full-time,"Associate degree (A.A., A.S., etc.)","Computer science, computer engineering, or sof...",Somewhat important,Every few months,HTML/CSS;JavaScript;SQL;TypeScript,HTML/CSS;JavaScript;SQL;TypeScript
17,"Developer, full-stack",Windows,7,3,United States,Yes,25.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Fairly important,Once a year,C#;Go;Haskell;HTML/CSS;JavaScript;Python;Ruby;...,C#;HTML/CSS;JavaScript;Python;SQL;VBA
18,"Developer, back-end",Linux-based,19,12,United States,Yes,32.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Very important,Once every few years,HTML/CSS;Perl,Bash/Shell/PowerShell;HTML/CSS;Perl


In [6]:
len(stackO_US[stackO_US['YearsCode']=='Less than 1 year'])

94

In [7]:
#convert 'Less than 1 year' to the placeholder value '0.6' so the column can be converted to numeric
stackO_US = stackO_US.replace('Less than 1 year','0.6')
stackO_US.head()

Respondent,DevType,OpSys,YearsCode,YearsCodePro,Country,Hobbyist,Age,Gender,Ethnicity,Employment,EdLevel,UndergradMajor,NEWEdImpt,NEWLearn,LanguageDesireNextYear,LanguageWorkedWith
5,,Windows,15,8,United States,Yes,31.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Very important,Once a year,Java;Ruby;Scala,HTML/CSS;Ruby;SQL
8,"Developer, back-end;Developer, desktop or ente...",Linux-based,17,13,United States,Yes,36.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Not at all important/not necessary,Once a year,JavaScript,Python;SQL
14,"Developer, desktop or enterprise applications;...",Windows,5,1,United States,Yes,27.0,Man,White or of European descent,Employed full-time,"Associate degree (A.A., A.S., etc.)","Computer science, computer engineering, or sof...",Somewhat important,Every few months,HTML/CSS;JavaScript;SQL;TypeScript,HTML/CSS;JavaScript;SQL;TypeScript
17,"Developer, full-stack",Windows,7,3,United States,Yes,25.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Fairly important,Once a year,C#;Go;Haskell;HTML/CSS;JavaScript;Python;Ruby;...,C#;HTML/CSS;JavaScript;Python;SQL;VBA
18,"Developer, back-end",Linux-based,19,12,United States,Yes,32.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Very important,Once every few years,HTML/CSS;Perl,Bash/Shell/PowerShell;HTML/CSS;Perl


In [8]:
#check
len(stackO_US[stackO_US['YearsCode']=='Less than 1 year'])


0

In [9]:
len(stackO_US[stackO_US['YearsCode']=='0.6'])

94

In [10]:
len(stackO_US[stackO_US['YearsCode']=='More than 50 years'])

72

In [11]:
#convert 'More than 50 years' to the placeholder value '50.6' so the column can be converted to numeric
stackO_US = stackO_US.replace('More than 50 years','50.6')
len(stackO_US[stackO_US['YearsCode']=='50.6'])

72

In [12]:
#convert columns to numeric
stackO_US['YearsCode'] = pd.to_numeric(stackO_US['YearsCode'])
stackO_US['YearsCodePro'] = pd.to_numeric(stackO_US['YearsCodePro'])
stackO_US.head()

Respondent,DevType,OpSys,YearsCode,YearsCodePro,Country,Hobbyist,Age,Gender,Ethnicity,Employment,EdLevel,UndergradMajor,NEWEdImpt,NEWLearn,LanguageDesireNextYear,LanguageWorkedWith
5,,Windows,15.0,8.0,United States,Yes,31.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Very important,Once a year,Java;Ruby;Scala,HTML/CSS;Ruby;SQL
8,"Developer, back-end;Developer, desktop or ente...",Linux-based,17.0,13.0,United States,Yes,36.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Not at all important/not necessary,Once a year,JavaScript,Python;SQL
14,"Developer, desktop or enterprise applications;...",Windows,5.0,1.0,United States,Yes,27.0,Man,White or of European descent,Employed full-time,"Associate degree (A.A., A.S., etc.)","Computer science, computer engineering, or sof...",Somewhat important,Every few months,HTML/CSS;JavaScript;SQL;TypeScript,HTML/CSS;JavaScript;SQL;TypeScript
17,"Developer, full-stack",Windows,7.0,3.0,United States,Yes,25.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Fairly important,Once a year,C#;Go;Haskell;HTML/CSS;JavaScript;Python;Ruby;...,C#;HTML/CSS;JavaScript;Python;SQL;VBA
18,"Developer, back-end",Linux-based,19.0,12.0,United States,Yes,32.0,Man,White or of European descent,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Computer science, computer engineering, or sof...",Very important,Once every few years,HTML/CSS;Perl,Bash/Shell/PowerShell;HTML/CSS;Perl
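
The conversion in the cells above could also be scoped to the two year columns, so that the text answers are not replaced across the whole frame. A minimal sketch on a toy frame, assuming the same placeholder values 0.6 and 50.6 used above; the names `toy`, `YEAR_SENTINELS` and `years_to_numeric` are hypothetical and not part of the notebook:

```python
import pandas as pd

# Placeholder values taken from the cells above; they are stand-ins, not measured quantities
YEAR_SENTINELS = {"Less than 1 year": "0.6", "More than 50 years": "50.6"}

def years_to_numeric(column: pd.Series) -> pd.Series:
    """Replace the two text answers with placeholders, then cast to float."""
    return pd.to_numeric(column.replace(YEAR_SENTINELS))

# Toy data standing in for the survey columns (hypothetical values)
toy = pd.DataFrame({
    "YearsCode": ["15", "Less than 1 year", "More than 50 years", None],
    "YearsCodePro": ["8", None, "Less than 1 year", "3"],
})

for col in ["YearsCode", "YearsCodePro"]:
    toy[col] = years_to_numeric(toy[col])

print(toy.dtypes)
```

Keeping the mapping in one dictionary makes the placeholder choice visible in a single place if it needs to change later.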


In [13]:
#Check number of rows & columns
stackO_US.shape

(12469, 16)

In [14]:
# Save and Export Cleaned File as CSV
stackO_US.to_csv("data/stackO_US.csv")

In [15]:
#Check data types
stackO_US.dtypes

DevType                    object
OpSys                      object
YearsCode                 float64
YearsCodePro              float64
Country                    object
Hobbyist                   object
Age                       float64
Gender                     object
Ethnicity                  object
Employment                 object
EdLevel                    object
UndergradMajor             object
NEWEdImpt                  object
NEWLearn                   object
LanguageDesireNextYear     object
LanguageWorkedWith         object
dtype: object

### Stack Overflow Survey Question: Gender
Q: Which of the following describe you, if any? Please check all that apply. If you prefer not to answer, you may leave this question blank.

In [16]:
#Check response options
stackO_US['Gender'].unique()

array(['Man', 'Woman', nan,
       'Man;Non-binary, genderqueer, or gender non-conforming',
       'Non-binary, genderqueer, or gender non-conforming',
       'Woman;Non-binary, genderqueer, or gender non-conforming',
       'Woman;Man',
       'Woman;Man;Non-binary, genderqueer, or gender non-conforming'],
      dtype=object)
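
Because respondents could check several options, each combination appears as its own semicolon-separated string. One way to count each option on its own is to split on `;`. A minimal sketch on a hypothetical toy Series (not the survey data), using `Series.str.get_dummies`:

```python
import pandas as pd

# Toy responses in the same semicolon-separated format (hypothetical values)
gender = pd.Series([
    "Man",
    "Woman",
    None,
    "Man;Non-binary, genderqueer, or gender non-conforming",
    "Woman;Man",
])

# One indicator column per individual option; missing answers become all-zero rows
options = gender.str.get_dummies(sep=";")

# Number of respondents who selected each option
option_counts = options.sum()
print(option_counts)
```

The same approach would apply to other multi-select columns such as `LanguageWorkedWith`, which use the same separator.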

In [17]:
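#Check the number of responses for Gender

# Note: comparing against the string 'nan' never matches real missing values (NaN),
# so this count is always 0; missing responses are counted with isna() below
nan = stackO_US[stackO_US['Gender']=='nan']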
male = stackO_US[stackO_US['Gender']=='Man']
female = stackO_US[stackO_US['Gender']=='Woman']
man = stackO_US[stackO_US['Gender']=='Man;Non-binary, genderqueer, or gender non-conforming']
lgbtq = stackO_US[stackO_US['Gender']=='Non-binary, genderqueer, or gender non-conforming']
woman = stackO_US[stackO_US['Gender']=='Woman;Non-binary, genderqueer, or gender non-conforming']
wm = stackO_US[stackO_US['Gender']=='Woman;Man']
wm_non = stackO_US[stackO_US['Gender']=='Woman;Man;Non-binary, genderqueer, or gender non-conforming']
unspecified = stackO_US['Gender'].isna().sum()

print(f"Nan:{len(nan)}")
print(f"Male:{len(male)}")
print(f"Female:{len(female)}")
print(f"Man:Non-Binary:{len(man)}")
print(f"Woman:Non-Binary:{len(woman)}")
print(f"Non-Binary:{len(lgbtq)}")
print(f"Woman:Man:{len(wm)}")
print(f"Woman:Man:Non-Binary:{len(wm_non)}")
print(f"Not Specified:{(unspecified)}")

Nan:0
Male:9176
Female:1086
Man:Non-Binary:42
Woman:Non-Binary:46
Non-Binary:131
Woman:Man:6
Woman:Man:Non-Binary:3
Not Specified:1979
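
The same tally can be produced in one call with `value_counts`, which avoids writing one filter per category. A minimal sketch on a hypothetical toy Series (not the survey data); `dropna=False` keeps missing responses as their own row:

```python
import pandas as pd

# Toy responses (hypothetical values); None marks an unanswered question
gender = pd.Series(["Man", "Man", "Woman", None, "Woman;Man", None, "Man"])

# Count every distinct response, including missing ones
counts = gender.value_counts(dropna=False)
print(counts)

# Missing responses counted directly
unspecified = gender.isna().sum()
print(f"Not Specified:{unspecified}")
```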


In [18]:
years = stackO_US.sort_values(by ='YearsCode',ascending = True)
years

Respondent,DevType,OpSys,YearsCode,YearsCodePro,Country,Hobbyist,Age,Gender,Ethnicity,Employment,EdLevel,UndergradMajor,NEWEdImpt,NEWLearn,LanguageDesireNextYear,LanguageWorkedWith
57699,,,0.6,,United States,Yes,12.0,Woman,,Student,Primary/elementary school,,,,Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;G...,
63358,,,0.6,,United States,Yes,,,,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Another engineering discipline (such as civil,...",Very important,,,
48495,Scientist,Windows,0.6,,United States,Yes,20.0,Man,White or of European descent,"Independent contractor, freelancer, or self-em...",Primary/elementary school,,Somewhat important,Once a decade,,
26687,,Windows,0.6,,United States,Yes,,Man,Middle Eastern,"Not employed, and not looking for work",I never completed any formal education,,,,JavaScript,
54294,,Windows,0.6,,United States,Yes,17.0,Man,Hispanic or Latino/a/x,Student,"Secondary school (e.g. American high school, G...",,,,C++;Go;HTML/CSS;JavaScript;PHP;TypeScript,HTML/CSS
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42314,,,,,United States,Yes,,,,"Independent contractor, freelancer, or self-em...",,,,,,
43211,Data or business analyst;Data scientist or mac...,,,4.0,United States,Yes,,,,"Independent contractor, freelancer, or self-em...",,,,,,
44176,,,,,United States,Yes,,,,,,,,,,
45374,,Windows,,,United States,Yes,,,,Employed full-time,,,,Once a year,Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;G...,Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;G...


In [19]:
# Save and Export Cleaned File as CSV
years.to_csv("data/years.csv")

In [20]:
years.shape

(12469, 16)

In [21]:
# identify number of rows that do not specify years code
not_specified = years['YearsCode'].isna().sum()
not_specified

710

In [22]:
# drop rows that do not specify years code
#dropYears = years.dropna(subset=['YearsCode'])
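
If the rows without a `YearsCode` value are to be removed, `DataFrame.dropna` with `subset` drops whole rows, whereas calling `dropna` on the column alone returns only that column's values. A minimal sketch on hypothetical toy data; the names `toy` and `years_known` are assumptions, not part of the notebook:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `years` (hypothetical values)
toy = pd.DataFrame(
    {
        "YearsCode": [0.6, np.nan, 15.0, np.nan],
        "OpSys": ["Windows", None, "MacOS", "Linux-based"],
    },
    index=pd.Index([5, 8, 14, 17], name="Respondent"),
)

# Rows with no YearsCode answer
not_specified = toy["YearsCode"].isna().sum()

# Keep only rows where YearsCode is present; other columns may still hold missing values
years_known = toy.dropna(subset=["YearsCode"])

print(not_specified)
print(years_known.shape)
```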
