# Social Justice in Programming

This project investigates how equal the opportunities are among professional developers.<br>
The data basis is the Stack Overflow Developer Survey, whose results are freely accessible at https://insights.stackoverflow.com/survey.<br><br>
*Note: This notebook does not claim to capture the full complexity of this question. It only elaborates on what the Stack Overflow survey results indicate.*

The questions to be answered in this notebook are: 
- To what extent is there equal opportunity in the programming profession? <br> 
Interesting columns: HighestEducationParents, Gender, Race, Country, Salary, JobSatisfaction, CareerSatisfaction
<br><br>
- How much does your social environment influence your chances of being successful as a developer? <br>
Interesting columns: FriendsDevelopers, HighestEducationParents, Salary, JobSatisfaction, CareerSatisfaction (correlation or causation?)
<br><br>
- Has social justice in the field developed over time? <br>
Interesting columns: HighestEducationParents, Race, Country, Salary, JobSatisfaction, CareerSatisfaction (compared across survey years)
<br><br>
- How important are an open mind and tolerance for succeeding as a programmer? <br>
Interesting columns: DiversityImportant, RightWrongWay, ChallengeMyself, OtherPeoplesCode
<br><br>
- How well can the salary be predicted from the starting conditions?
<br><br><br>
To answer these questions, the first step is an exploratory analysis of the data set.


```python
# necessary libraries
import pandas as pd
import seaborn as sns
```


### Let's get a feel for the data set

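```python
# read in the data
# (decoding as ISO-8859-1 shows UTF-8 characters such as "£" as "Â£", as in the Currency column below)
df = pd.read_csv("2017survey_results_public.csv", encoding="ISO-8859-1")
schema = pd.read_csv("2017survey_results_schema.csv", encoding="ISO-8859-1")

# index the schema by column name so that each column's question can be looked up
schema.set_index("Column", inplace=True)
```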

```python
# size of the dataset
print("shape of the data:", df.shape)

# data types; df.info() prints its summary itself and returns None,
# so it must not be wrapped in print()
df.info()

# show all columns in the preview
pd.set_option("display.max_columns", 200)
df.head()
```

```
shape of the data: (51392, 154)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51392 entries, 0 to 51391
Columns: 154 entries, Respondent to ExpectedSalary
dtypes: float64(6), int64(1), object(147)
memory usage: 60.4+ MB
```


Unnamed: 0,Respondent,Professional,ProgramHobby,Country,University,EmploymentStatus,FormalEducation,MajorUndergrad,HomeRemote,CompanySize,CompanyType,YearsProgram,YearsCodedJob,YearsCodedJobPast,DeveloperType,WebDeveloperType,MobileDeveloperType,NonDeveloperType,CareerSatisfaction,JobSatisfaction,ExCoderReturn,ExCoderNotForMe,ExCoderBalance,ExCoder10Years,ExCoderBelonged,ExCoderSkills,ExCoderWillNotCode,ExCoderActive,PronounceGIF,ProblemSolving,BuildingThings,LearningNewTech,BoringDetails,JobSecurity,DiversityImportant,AnnoyingUI,FriendsDevelopers,RightWrongWay,UnderstandComputers,SeriousWork,InvestTimeTools,WorkPayCare,KinshipDevelopers,ChallengeMyself,CompetePeers,ChangeWorld,JobSeekingStatus,HoursPerWeek,LastNewJob,AssessJobIndustry,AssessJobRole,AssessJobExp,AssessJobDept,AssessJobTech,AssessJobProjects,AssessJobCompensation,AssessJobOffice,AssessJobCommute,AssessJobRemote,AssessJobLeaders,AssessJobProfDevel,AssessJobDiversity,AssessJobProduct,AssessJobFinances,ImportantBenefits,ClickyKeys,JobProfile,ResumePrompted,LearnedHiring,ImportantHiringAlgorithms,ImportantHiringTechExp,ImportantHiringCommunication,ImportantHiringOpenSource,ImportantHiringPMExp,ImportantHiringCompanies,ImportantHiringTitles,ImportantHiringEducation,ImportantHiringRep,ImportantHiringGettingThingsDone,Currency,Overpaid,TabsSpaces,EducationImportant,EducationTypes,SelfTaughtTypes,TimeAfterBootcamp,CousinEducation,WorkStart,HaveWorkedLanguage,WantWorkLanguage,HaveWorkedFramework,WantWorkFramework,HaveWorkedDatabase,WantWorkDatabase,HaveWorkedPlatform,WantWorkPlatform,IDE,AuditoryEnvironment,Methodology,VersionControl,CheckInCode,ShipIt,OtherPeoplesCode,ProjectManagement,EnjoyDebugging,InTheZone,DifficultCommunication,CollaborateRemote,MetricAssess,EquipmentSatisfiedMonitors,EquipmentSatisfiedCPU,EquipmentSatisfiedRAM,EquipmentSatisfiedStorage,EquipmentSatisfiedRW,InfluenceInternet,InfluenceWorkstation,InfluenceHardware,InfluenceServers,InfluenceTechStack,InfluenceDeptTech,InfluenceVizTools,InfluenceDatabase,InfluenceCloud,InfluenceConsultants,InfluenceRecruitment,InfluenceCommunication,StackOverflowDescribes,StackOverflowSatisfaction,StackOverflowDevices,StackOverflowFoundAnswer,StackOverflowCopiedCode,StackOverflowJobListing,StackOverflowCompanyPage,StackOverflowJobSearch,StackOverflowNewQuestion,StackOverflowAnswer,StackOverflowMetaChat,StackOverflowAdsRelevant,StackOverflowAdsDistracting,StackOverflowModeration,StackOverflowCommunity,StackOverflowHelpful,StackOverflowBetter,StackOverflowWhatDo,StackOverflowMakeMoney,Gender,HighestEducationParents,Race,SurveyLong,QuestionsInteresting,QuestionsConfusing,InterestedAnswers,Salary,ExpectedSalary
0,1,Student,"Yes, both",United States,No,"Not employed, and not looking for work",Secondary school,,,,,2 to 3 years,,,,,,,,,,,,,,,,,"With a soft ""g,"" like ""jiff""",Strongly agree,Strongly agree,Agree,Disagree,Strongly agree,Agree,Agree,Disagree,Somewhat agree,Disagree,Strongly agree,Strongly agree,Strongly disagree,Agree,Agree,Disagree,Agree,"I'm not actively looking, but I am open to new...",0.0,Not applicable/ never,Very important,Very important,Important,Very important,Very important,Very important,Important,Very important,Very important,Very important,Very important,Very important,Somewhat important,Not very important,Somewhat important,Stock options; Vacation/days off; Remote options,Yes,Other,,,Important,Important,Important,Somewhat important,Important,Not very important,Not very important,Not at all important,Somewhat important,Very important,,,Tabs,,Online course; Open source contributions,,,,6:00 AM,Swift,Swift,,,,,iOS,iOS,Atom; Xcode,Turn on some music,,,,,,,,,,,,Somewhat satisfied,Not very satisfied,Not at all satisfied,Very satisfied,Satisfied,Not very satisfied,,,,,,,,,,,,I have created a CV or Developer Story on Stac...,9.0,Desktop; iOS app,At least once each week,Haven't done at all,Once or twice,Haven't done at all,Haven't done at all,Several times,Several times,Once or twice,Somewhat agree,Strongly disagree,Strongly disagree,Strongly agree,Agree,Strongly agree,Strongly agree,Strongly disagree,Male,High school,White or of European descent,Strongly disagree,Strongly agree,Disagree,Strongly agree,,
1,2,Student,"Yes, both",United Kingdom,"Yes, full-time",Employed part-time,Some college/university study without earning ...,Computer science or software engineering,"More than half, but not all, the time",20 to 99 employees,"Privately-held limited company, not in startup...",9 to 10 years,,,,,,,,,,,,,,,,,"With a hard ""g,"" like ""gift""",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Other,,Some other way,Important,Important,Important,Important,Somewhat important,Somewhat important,Not very important,Somewhat important,Not very important,Very important,British pounds sterling (Â£),,Spaces,,Online course; Self-taught; Hackathon; Open so...,Official documentation; Stack Overflow Q&A; Other,,,10:00 AM,JavaScript; Python; Ruby; SQL,Java; Python; Ruby; SQL,.NET Core,.NET Core,MySQL; SQLite,MySQL; SQLite,Amazon Web Services (AWS),Linux Desktop; Raspberry Pi; Amazon Web Servic...,Atom; Notepad++; Vim; PyCharm; RubyMine; Visua...,"Put on some ambient sounds (e.g. whale songs, ...",,Git,Multiple times a day,Agree,Disagree,Strongly disagree,Agree,Somewhat agree,Disagree,Strongly disagree,Customer satisfaction; On time/in budget; Peer...,Not very satisfied,Satisfied,Satisfied,Satisfied,Somewhat satisfied,Satisfied,No influence at all,No influence at all,No influence at all,No influence at all,No influence at all,No influence at all,No influence at all,No influence at all,No influence at all,No influence at all,No influence at all,I have created a CV or Developer Story on Stac...,8.0,Desktop; iOS browser; iOS app; Android browser...,Several times,Several times,Once or twice,Once or twice,Once or twice,Haven't done at all,Several times,At least once each week,Disagree,Strongly disagree,Strongly disagree,Strongly agree,Agree,Strongly agree,Strongly agree,Strongly disagree,Male,A master's degree,White or of European descent,Somewhat agree,Somewhat agree,Disagree,Strongly agree,,37500.0
2,3,Professional developer,"Yes, both",United Kingdom,No,Employed full-time,Bachelor's degree,Computer science or software engineering,"Less than half the time, but at least one day ...","10,000 or more employees",Publicly-traded corporation,20 or more years,20 or more years,,Other,,,,8.0,9.0,,,,,,,,,"With a hard ""g,"" like ""gift""",Strongly agree,Strongly agree,Strongly agree,Somewhat agree,Agree,Strongly agree,Agree,Somewhat agree,Disagree,Disagree,Agree,Somewhat agree,Disagree,Somewhat agree,Agree,Disagree,Agree,,,,,,,,,,,,,,,,,,,,Yes,,,,,,,,,,,,,,British pounds sterling (Â£),Neither underpaid nor overpaid,Spaces,Not very important,Self-taught; Coding competition; Hackathon; Op...,Official documentation; Trade book; Textbook; ...,,,9:00 AM,Java; PHP; Python,C; Python; Rust,,,MySQL,,,,Sublime Text; Vim; IntelliJ,Turn on some music,Agile; Lean; Scrum; Extreme; Pair; Kanban,Mercurial,Multiple times a day,Agree,Disagree,Disagree,Agree,Agree,Disagree,Somewhat agree,Customer satisfaction; Benchmarked product per...,Very satisfied,Somewhat satisfied,Satisfied,Satisfied,Somewhat satisfied,Very satisfied,A lot of influence,Some influence,Some influence,Some influence,A lot of influence,Some influence,Some influence,Some influence,Some influence,Some influence,Some influence,I have created a CV or Developer Story on Stac...,8.0,Desktop; iOS browser; iOS app,Once or twice,Haven't done at all,Haven't done at all,Haven't done at all,Haven't done at all,Haven't done at all,At least once each day,At least once each day,Disagree,Disagree,Strongly disagree,Strongly agree,Agree,Agree,Agree,Disagree,Male,A professional degree,White or of European descent,Somewhat agree,Agree,Disagree,Agree,113750.0,
3,4,Professional non-developer who sometimes write...,"Yes, both",United States,No,Employed full-time,Doctoral degree,A non-computer-focused engineering discipline,"Less than half the time, but at least one day ...","10,000 or more employees",Non-profit/non-governmental organization or pr...,14 to 15 years,9 to 10 years,,,,,Data scientist,6.0,3.0,,,,,,,,,"With a soft ""g,"" like ""jiff""",Strongly agree,Strongly agree,Strongly agree,Disagree,Somewhat agree,Agree,Agree,Agree,Somewhat agree,Strongly disagree,Strongly agree,Agree,Disagree,Strongly agree,Strongly agree,Somewhat agree,Agree,I am actively looking for a job,5.0,Between 2 and 4 years ago,Somewhat important,Somewhat important,Somewhat important,Important,Important,Very important,Important,Very important,Important,Somewhat important,Not very important,Very important,Important,Very important,Very important,Stock options; Annual bonus; Health benefits; ...,Yes,LinkedIn; Other,,"A friend, family member, or former colleague t...",Somewhat important,Somewhat important,Very important,Very important,Somewhat important,Somewhat important,Not very important,Not very important,Important,Very important,,,Spaces,,,,,,9:00 AM,Matlab; Python; R; SQL,Matlab; Python; R; SQL,React,Hadoop; Node.js; React,MongoDB; Redis; SQL Server; MySQL; SQLite,MongoDB; Redis; SQL Server; MySQL; SQLite,Windows Desktop; Linux Desktop; Mac OS; Amazon...,Windows Desktop; Linux Desktop; Mac OS; Amazon...,Notepad++; Sublime Text; TextMate; Vim; IPytho...,Turn on some music,Agile,Git,Multiple times a day,Somewhat agree,Agree,Somewhat agree,Somewhat agree,Strongly agree,Disagree,Somewhat agree,,,,,,,,,,,,,,,,,,,I have created a CV or Developer Story on Stac...,10.0,Desktop; iOS browser; iOS app,At least once each week,Several times,At least once each week,Several times,At least once each week,Several times,At least once each day,At least once each day,Agree,Strongly disagree,Strongly disagree,Strongly agree,Strongly agree,Agree,Strongly agree,Disagree,Male,A doctoral degree,White or of European descent,Agree,Agree,Somewhat agree,Strongly agree,,
4,5,Professional developer,"Yes, I program as a hobby",Switzerland,No,Employed full-time,Master's degree,Computer science or software engineering,Never,10 to 19 employees,"Privately-held limited company, not in startup...",20 or more years,10 to 11 years,,Mobile developer; Graphics programming; Deskto...,,,,6.0,8.0,,,,,,,,,"With a soft ""g,"" like ""jiff""",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Satisfied,Satisfied,Satisfied,Satisfied,Satisfied,Satisfied,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


With regard to the questions, the most interesting columns are:
<br><br><br>
>Q1:
- **Initial conditions:** Race, HighestEducationParents, Gender
- **Influenceable conditions:** FormalEducation, Country
<br><br>
>Q2:
- **Social environment:** HighestEducationParents, FriendsDevelopers
<br><br>
>Q4:
- **Mindset:** RightWrongWay, LearningNewTech, DiversityImportant
<br><br><br>
The following parameters are used to evaluate whether a developer is successful:
- **Evaluation parameters:** JobSatisfaction, CareerSatisfaction, Salary
<br><br>
Of course, these evaluation criteria are only meaningful if we take into account whether the survey participant is a professional programmer. <br>
We should also take into account how long the person has been programming (YearsProgram counts the years since they first learned to program), since salary usually increases with experience.
- **Additional parameters:** YearsProgram, Professional


```python
# column groups that belong to the questions above
INITIAL_CONDITIONS = ["Race", "HighestEducationParents", "Gender"]
INFLUENCEABLE_CONDITIONS = ["FormalEducation", "Country"]
SOCIAL_ENVIRONMENT = ["HighestEducationParents", "FriendsDevelopers"]
MINDSET = ["RightWrongWay", "LearningNewTech", "DiversityImportant"]
EVALUATION_PARAMETERS = ["JobSatisfaction", "CareerSatisfaction", "Salary"]
ADDITIONAL_PARAMETERS = ["YearsProgram", "Professional"]

# flatten the groups and drop duplicates (HighestEducationParents appears twice);
# set() does not preserve order, so the column order can vary between runs
column_groups = [INITIAL_CONDITIONS, INFLUENCEABLE_CONDITIONS, SOCIAL_ENVIRONMENT,
                 MINDSET, ADDITIONAL_PARAMETERS, EVALUATION_PARAMETERS]
columns_of_interest = list(set(col for group in column_groups for col in group))
```


```python
# print the survey question behind every column of interest
for column, question in schema.loc[columns_of_interest, "Question"].items():
    print(f"{column}: {question}")
```

```
Race: Which of the following do you identify as?
Country: In which country do you currently live?
CareerSatisfaction: Career satisfaction rating
Professional: Which of the following best describes you?
LearningNewTech: Learning new technologies is fun
FormalEducation: Which of the following best describes the highest level of formal education that you've completed?
FriendsDevelopers: Most of my friends are developers, engineers, or scientists
Gender: Which of the following do you currently identify as?
YearsProgram: How long has it been since you first learned how to program?
RightWrongWay: There's a right and a wrong way to do everything
Salary: What is your current annual base salary, before taxes, and excluding bonuses, grants, or other compensation?
JobSatisfaction: Job satisfaction rating
DiversityImportant: Diversity in the workplace is important
HighestEducationParents: What is the highest level of education received by either of your parents?
```


```python
df_filtered = df[columns_of_interest]
df_filtered.head()
```

| | Race | Country | CareerSatisfaction | Professional | LearningNewTech | FormalEducation | FriendsDevelopers | Gender | YearsProgram | RightWrongWay | Salary | JobSatisfaction | DiversityImportant | HighestEducationParents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | White or of European descent | United States | | Student | Agree | Secondary school | Disagree | Male | 2 to 3 years | Somewhat agree | | | Agree | High school |
| 1 | White or of European descent | United Kingdom | | Student | | Some college/university study without earning ... | | Male | 9 to 10 years | | | | | A master's degree |
| 2 | White or of European descent | United Kingdom | 8.0 | Professional developer | Strongly agree | Bachelor's degree | Somewhat agree | Male | 20 or more years | Disagree | 113750.0 | 9.0 | Strongly agree | A professional degree |
| 3 | White or of European descent | United States | 6.0 | Professional non-developer who sometimes write... | Strongly agree | Doctoral degree | Agree | Male | 14 to 15 years | Somewhat agree | | 3.0 | Agree | A doctoral degree |
| 4 | | Switzerland | 6.0 | Professional developer | | Master's degree | | | 20 or more years | | | 8.0 | | |


```python
# datatypes in the dataset
df_filtered.dtypes
```

```
Race                        object
Country                     object
CareerSatisfaction         float64
Professional                object
LearningNewTech             object
FormalEducation             object
FriendsDevelopers           object
Gender                      object
YearsProgram                object
RightWrongWay               object
Salary                     float64
JobSatisfaction            float64
DiversityImportant          object
HighestEducationParents     object
dtype: object
```
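All agreement-scale columns (FriendsDevelopers, RightWrongWay, LearningNewTech, DiversityImportant) are stored as text, so question 2 ("correlation or causation?") needs an encoding before any correlation can be computed. A minimal sketch, assuming the five answers seen in the raw rows above form an ordered scale from "Strongly disagree" to "Strongly agree": map them to ordinal scores and use Spearman's rank correlation. The toy rows are invented.

```python
import pandas as pd

# Assumed order of the five agreement answers seen in the raw rows
AGREEMENT_SCALE = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Somewhat agree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Hypothetical toy rows; in the notebook, df_filtered or df_eval would be used
toy = pd.DataFrame({
    "FriendsDevelopers": ["Disagree", "Somewhat agree", "Agree", "Strongly agree", None],
    "Salary": [30000.0, 45000.0, 60000.0, 80000.0, 50000.0],
})

# map text answers to ordinal scores; missing or unexpected answers become NaN
toy["FriendsDevelopersScore"] = toy["FriendsDevelopers"].map(AGREEMENT_SCALE)

# Spearman's rank correlation only assumes a monotonic relationship,
# which suits ordinal scores better than Pearson's linear correlation
pair = toy[["FriendsDevelopersScore", "Salary"]].dropna()
rho = pair.corr(method="spearman").loc["FriendsDevelopersScore", "Salary"]
print(rho)
```

Even a strong rank correlation only describes an association; it does not answer the causation part of question 2.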

```python
# distribution of the quantitative parameters
df_filtered.describe()
```

| | CareerSatisfaction | Salary | JobSatisfaction |
|---|---|---|---|
| count | 42695.0 | 12891.0 | 40376.0 |
| mean | 7.300574 | 56298.480641 | 6.957078 |
| std | 1.955444 | 39880.905277 | 2.167652 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 6.0 | 26440.371839 | 6.0 |
| 50% | 8.0 | 50000.0 | 7.0 |
| 75% | 9.0 | 80000.0 | 8.0 |
| max | 10.0 | 197000.0 | 10.0 |


```python
# proportions of missing evaluation parameters
print("Missing evaluation parameters")
for col in ["Salary", "JobSatisfaction", "CareerSatisfaction"]:
    prop_missing = df_filtered[col].isnull().mean()
    print(f"{col}: {round(prop_missing * 100, 1)} %")

# missing evaluation parameters combined: count the rows that have all three
n_complete = df_filtered.dropna(subset=EVALUATION_PARAMETERS).shape[0]
print("\nData left after removing every row with missing evaluation parameters:")
print(f"{round(n_complete / df_filtered.shape[0] * 100, 1)} % -> {n_complete} rows")
```

```
Missing evaluation parameters
Salary: 74.9 %
JobSatisfaction: 21.4 %
CareerSatisfaction: 16.9 %

Data left after removing every row with missing evaluation parameters:
25.0 % -> 12847 rows
```
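Removing every row without complete evaluation parameters keeps only a quarter of the data, and the dropped rows are not necessarily a random subset. A minimal sketch for checking this on invented toy data: compare the share of missing salaries within each group (here Professional; the same pattern works for Gender or Race). If the shares differ strongly between groups, the remaining rows over- or under-represent some groups relative to the full survey.

```python
import pandas as pd

# Hypothetical toy data; in the notebook, df_filtered would be used instead
toy = pd.DataFrame({
    "Professional": ["Professional developer", "Professional developer", "Student", "Student"],
    "Salary": [50000.0, None, None, None],
})

# share of rows without a salary within each group of respondents
missing_by_group = (
    toy["Salary"].isnull()
    .astype(float)
    .groupby(toy["Professional"])
    .mean()
)
print(missing_by_group)
```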


```python
# keep only rows in which all evaluation parameters are present
df_eval = df_filtered.dropna(subset=EVALUATION_PARAMETERS, axis=0)
```


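```python
# value counts of every column of interest, based on the rows with complete evaluation parameters
def get_value_counts(column):
    return df_eval[column].value_counts()

dict_a = {}
for col in columns_of_interest:
    dict_a[col] = get_value_counts(col)

# look up "Race" by name: its position in columns_of_interest is not fixed
dict_a["Race"]
# open question: how should answers with multiple selections be handled?
```

```
White or of European descent    8685
South Asian                      682
Hispanic or Latino/Latina        419
East Asian                       308
Middle Eastern
```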
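One way to approach the open question about multiple answers, assuming that Race uses the same "; " separator that is visible in other multi-select columns of the raw rows (e.g. "Online course; Open source contributions"): split each answer into its options and count every option separately. The toy answers below are invented.

```python
import pandas as pd

# Hypothetical toy answers; multi-select answers are assumed to be joined with "; "
race = pd.Series([
    "White or of European descent",
    "South Asian",
    "East Asian; White or of European descent",
    None,
])

# one row per selected option; missing answers are dropped first
single_options = race.dropna().str.split("; ").explode()
counts = single_options.value_counts()
print(counts)
```

With this approach a respondent who selected several options is counted once per option, so the counts sum to more than the number of respondents. An alternative that keeps one row per respondent is a boolean indicator column per option.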

```python
print(df_filtered["Race"].value_counts(), df_filtered["Professional"].value_counts())
```

White or of European descent    23415
South Asian                      2657
Hispanic or Latino/Latina        1289
East Asian                       1285
Middle Eastern