# stackoverflow_survey_analysis

This Jupyter notebook presents an analysis of the Stack Overflow surveys from 2020 and 2024.  
For the analysis, the [CRISP-DM](https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining) process is used.

## 1. Business Understanding
The main objective of this analysis is to practice the data science process according to [CRISP-DM](https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining), as taught in the [Udacity - Data Scientist](https://www.udacity.com/enrollment/nd025) course, and to apply the acquired knowledge to a real-world problem.

To achieve this, the analysis will use the [Stack Overflow surveys from 2020 and 2024](https://survey.stackoverflow.co/) to address the following three questions:

**When comparing the results from 2020 to 2024 ...**  

- **Question 1: ... did the difficulty of the survey increase over the past four years?**
  - *Relevance*: Understanding changes in survey difficulty over time can help to ensure that any observed trends or patterns in the results are not due to changes in survey complexity.

- **Question 2: ... did the past four years change general job satisfaction?**  
  - *Relevance*: Analyzing shifts in job satisfaction over time can provide insights into the impact of external factors such as economic conditions or general changes in working conditions.

- **Question 3: ... did the past four years change job compensation?**   
  - *Relevance*: Examining changes in job compensation over the past four years helps to see trends in salary growth, inflation adjustments, and how economic factors or industry shifts may have influenced compensation practices.




## 2. Data Understanding

This chapter describes the steps taken to understand the data provided by the Stack Overflow surveys from 2020 and 2024.

### 2.1 Import packages and load dataframes

First, we import the packages needed for the analysis.

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Then we load the survey results and their schemas into our workspace.

In [6]:
df_20 = pd.read_csv('./data/stack-overflow-developer-survey-2020/survey_results_public.csv')
df_20_schema = pd.read_csv('./data/stack-overflow-developer-survey-2020/survey_results_schema.csv')
df_24 = pd.read_csv('./data/stack-overflow-developer-survey-2024/survey_results_public.csv')
df_24_schema = pd.read_csv('./data/stack-overflow-developer-survey-2024/survey_results_schema.csv')
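
Since both survey folders contain the same two file names, the loading step could also be wrapped in a small helper. The following is a minimal sketch, not part of the original analysis: the function name `load_survey` is hypothetical, and the demonstration writes tiny stand-in CSV files (with made-up content) to a temporary folder so that it runs without the real survey data.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_survey(folder):
    """Read the results and schema CSV files from one survey folder."""
    folder = Path(folder)
    results = pd.read_csv(folder / "survey_results_public.csv")
    schema = pd.read_csv(folder / "survey_results_schema.csv")
    return results, schema


# Demonstration on a throwaway folder with two tiny stand-in files
# (made-up content), so the sketch runs without the real survey data.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "survey_results_public.csv").write_text(
        "Respondent,SurveyEase\n1,Easy\n2,Difficult\n"
    )
    (demo_dir / "survey_results_schema.csv").write_text(
        "Column,QuestionText\nSurveyEase,Placeholder question text\n"
    )
    demo_results, demo_schema = load_survey(demo_dir)

# Number of rows and columns of each stand-in file
print(demo_results.shape)
print(demo_schema.shape)
```

With the real data, the call would presumably look like `load_survey('./data/stack-overflow-developer-survey-2020')`, returning the results and schema dataframes as a pair.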

Now, let's take a look at the first rows of the 2020 and 2024 surveys.

In [7]:
df_20.head()

Unnamed: 0,Respondent,MainBranch,Hobbyist,Age,Age1stCode,CompFreq,CompTotal,ConvertedComp,Country,CurrencyDesc,...,SurveyEase,SurveyLength,Trans,UndergradMajor,WebframeDesireNextYear,WebframeWorkedWith,WelcomeChange,WorkWeekHrs,YearsCode,YearsCodePro
0,1,I am a developer by profession,Yes,,13,Monthly,,,Germany,European Euro,...,Neither easy nor difficult,Appropriate in length,No,"Computer science, computer engineering, or sof...",ASP.NET Core,ASP.NET;ASP.NET Core,Just as welcome now as I felt last year,50.0,36,27.0
1,2,I am a developer by profession,No,,19,,,,United Kingdom,Pound sterling,...,,,,"Computer science, computer engineering, or sof...",,,Somewhat more welcome now than last year,,7,4.0
2,3,I code primarily as a hobby,Yes,,15,,,,Russian Federation,,...,Neither easy nor difficult,Appropriate in length,,,,,Somewhat more welcome now than last year,,4,
3,4,I am a developer by profession,Yes,25.0,18,,,,Albania,Albanian lek,...,,,No,"Computer science, computer engineering, or sof...",,,Somewhat less welcome now than last year,40.0,7,4.0
4,5,"I used to be a developer by profession, but no...",Yes,31.0,16,,,,United States,,...,Easy,Too short,No,"Computer science, computer engineering, or sof...",Django;Ruby on Rails,Ruby on Rails,Just as welcome now as I felt last year,,15,8.0


In [8]:
df_24.head()

Unnamed: 0,ResponseId,MainBranch,Age,Employment,RemoteWork,Check,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,...,JobSatPoints_6,JobSatPoints_7,JobSatPoints_8,JobSatPoints_9,JobSatPoints_10,JobSatPoints_11,SurveyLength,SurveyEase,ConvertedCompYearly,JobSat
0,1,I am a developer by profession,Under 18 years old,"Employed, full-time",Remote,Apples,Hobby,Primary/elementary school,Books / Physical media,,...,,,,,,,,,,
1,2,I am a developer by profession,35-44 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,
2,3,I am a developer by profession,45-54 years old,"Employed, full-time",Remote,Apples,Hobby;Contribute to open-source projects;Other...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Colleague;On the job tr...,Technical documentation;Blogs;Books;Written Tu...,...,,,,,,,Appropriate in length,Easy,,
3,4,I am learning to code,18-24 years old,"Student, full-time",,Apples,,Some college/university study without earning ...,"Other online resources (e.g., videos, blogs, f...",Stack Overflow;How-to videos;Interactive tutorial,...,,,,,,,Too long,Easy,,
4,5,I am a developer by profession,18-24 years old,"Student, full-time",,Apples,,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Written Tutorial...,...,,,,,,,Too short,Easy,,


### 2.2 Find common columns

Let's find the columns that are available in both dataframes.

In [9]:
common_columns = df_24.columns.intersection(df_20.columns)
common_columns_list = common_columns.tolist()
common_columns_list

['MainBranch',
 'Age',
 'Employment',
 'EdLevel',
 'YearsCode',
 'YearsCodePro',
 'DevType',
 'OrgSize',
 'Country',
 'CompTotal',
 'NEWSOSites',
 'SOVisitFreq',
 'SOAccount',
 'SOPartFreq',
 'SOComm',
 'SurveyLength',
 'SurveyEase',
 'JobSat']
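
The intersection only catches columns whose names match exactly. Columns that were renamed between the surveys, such as `ConvertedComp` (2020) and `ConvertedCompYearly` (2024) seen in the outputs above, drop out. As a sketch under the assumption that such renamed columns are worth reviewing by hand, `Index.difference` lists what is unique to each survey. The toy frames below are illustrative stand-ins that carry only a handful of the real column names.

```python
import pandas as pd

# Toy frames carrying only a few of the column names shown above.
toy_20 = pd.DataFrame(columns=["Respondent", "Age", "ConvertedComp", "JobSat"])
toy_24 = pd.DataFrame(columns=["ResponseId", "Age", "ConvertedCompYearly", "JobSat"])

# Columns present in both surveys (same approach as in the cell above)
shared = toy_24.columns.intersection(toy_20.columns)

# Columns that exist in only one of the two surveys
only_20 = toy_20.columns.difference(toy_24.columns)
only_24 = toy_24.columns.difference(toy_20.columns)

print("shared: ", shared.tolist())
print("only 20:", only_20.tolist())
print("only 24:", only_24.tolist())
```

Scanning the two "only" lists side by side makes renamed pairs such as `Respondent` / `ResponseId` easier to spot.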

Next, we show the question text of each common column in both schemas.

In [10]:
pd.set_option('display.max_colwidth', None) # do not limit output
for loc_column in common_columns_list:
    # Print the question text from df_20_schema
    print(f"df_20_schema: Question text for '{loc_column}'")
    print(df_20_schema[df_20_schema['Column'] == loc_column]['QuestionText'].to_string(index=False))

    # Print the question text from df_24_schema
    print(f"df_24_schema: Question text for '{loc_column}'")
    print(df_24_schema[df_24_schema['qname'] == loc_column]['question'].to_string(index=False))
    
    print("\n" + "-"*50 + "\n")  # Separator between different columns for better readability
pd.reset_option('display.max_colwidth')

df_20_schema: Question text for 'MainBranch'
Which of the following options best describes you today? Here, by "developer" we mean "someone who writes code."
df_24_schema: Question text for 'MainBranch'
Which of the following options best describes you today? For the purpose of this survey, a developer is "someone who writes code".*

--------------------------------------------------

df_20_schema: Question text for 'Age'
What is your age (in years)? If you prefer not to answer, you may leave this question blank.
df_24_schema: Question text for 'Age'
What is your age?*

--------------------------------------------------

df_20_schema: Question text for 'Employment'
Which of the following best describes your current employment status?
df_24_schema: Question text for 'Employment'
Which of the following best describes your current employment status? Select all that apply.*

--------------------------------------------------

df_20_schema: Question text for 'EdLevel'
Which of the following
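
The two schemas use different column names (`Column` / `QuestionText` in 2020, `qname` / `question` in 2024), so the lookup could be factored into one helper that takes these names as parameters. This is a minimal sketch: the function name `question_text` is hypothetical, and the toy schemas contain only the `Age` entries copied from the output above.

```python
import pandas as pd


def question_text(schema, name_col, text_col, column):
    """Return the question text for `column`, or None if the schema has no entry."""
    matches = schema.loc[schema[name_col] == column, text_col]
    return matches.iloc[0] if not matches.empty else None


# Stand-in schemas using the column layouts of the two surveys.
toy_20_schema = pd.DataFrame({
    "Column": ["Age"],
    "QuestionText": [
        "What is your age (in years)? If you prefer not to answer, "
        "you may leave this question blank."
    ],
})
toy_24_schema = pd.DataFrame({
    "qname": ["Age"],
    "question": ["What is your age?*"],
})

print(question_text(toy_20_schema, "Column", "QuestionText", "Age"))
print(question_text(toy_24_schema, "qname", "question", "Age"))
# A column without a schema entry yields None instead of an empty string
print(question_text(toy_24_schema, "qname", "question", "Hobbyist"))
```

Returning `None` for a missing entry makes it explicit when a column name has no counterpart in one of the schemas.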

Let's take a brief look at the content of the common columns.

In [11]:
df_20[common_columns_list].head()

Unnamed: 0,MainBranch,Age,Employment,EdLevel,YearsCode,YearsCodePro,DevType,OrgSize,Country,CompTotal,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,SurveyLength,SurveyEase,JobSat
0,I am a developer by profession,,"Independent contractor, freelancer, or self-em...","Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",36,27.0,"Developer, desktop or enterprise applications;...",2 to 9 employees,Germany,,Stack Overflow (public Q&A for anyone who codes),Multiple times per day,No,,"No, not at all",Appropriate in length,Neither easy nor difficult,Slightly satisfied
1,I am a developer by profession,,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",7,4.0,"Developer, full-stack;Developer, mobile","1,000 to 4,999 employees",United Kingdom,,Stack Overflow (public Q&A for anyone who code...,Multiple times per day,Yes,Less than once per month or monthly,"Yes, definitely",,,Very dissatisfied
2,I code primarily as a hobby,,,,4,,,,Russian Federation,,Stack Overflow (public Q&A for anyone who codes),Daily or almost daily,Yes,A few times per month or weekly,"Yes, somewhat",Appropriate in length,Neither easy nor difficult,
3,I am a developer by profession,25.0,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",7,4.0,,20 to 99 employees,Albania,,Stack Overflow (public Q&A for anyone who code...,Multiple times per day,Yes,A few times per month or weekly,"Yes, definitely",,,Slightly dissatisfied
4,"I used to be a developer by profession, but no...",31.0,Employed full-time,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",15,8.0,,,United States,,Stack Overflow (public Q&A for anyone who code...,A few times per month or weekly,Yes,Less than once per month or monthly,"Yes, somewhat",Too short,Easy,


In [12]:
df_24[common_columns_list].head()

Unnamed: 0,MainBranch,Age,Employment,EdLevel,YearsCode,YearsCodePro,DevType,OrgSize,Country,CompTotal,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,SurveyLength,SurveyEase,JobSat
0,I am a developer by profession,Under 18 years old,"Employed, full-time",Primary/elementary school,,,,,United States of America,,I have never visited Stack Overflow or the Sta...,,,,,,,
1,I am a developer by profession,35-44 years old,"Employed, full-time","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",20.0,17.0,"Developer, full-stack",,United Kingdom of Great Britain and Northern I...,,Stack Overflow for Teams (private knowledge sh...,Multiple times per day,Yes,Multiple times per day,"Yes, definitely",,,
2,I am a developer by profession,45-54 years old,"Employed, full-time","Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",37.0,27.0,Developer Experience,,United Kingdom of Great Britain and Northern I...,,Stack Overflow;Stack Exchange;Stack Overflow B...,Multiple times per day,Yes,Multiple times per day,"Yes, definitely",Appropriate in length,Easy,
3,I am learning to code,18-24 years old,"Student, full-time",Some college/university study without earning ...,4.0,,"Developer, full-stack",,Canada,,Stack Overflow,Daily or almost daily,No,,"No, not really",Too long,Easy,
4,I am a developer by profession,18-24 years old,"Student, full-time","Secondary school (e.g. American high school, G...",9.0,,"Developer, full-stack",,Norway,,Stack Overflow for Teams (private knowledge sh...,Multiple times per day,Yes,Multiple times per day,"Yes, definitely",Too short,Easy,
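
The two previews show that a shared column name does not guarantee comparable content: `Age` is numeric in 2020 (e.g. `25.0`) but an age bracket in 2024 (e.g. `35-44 years old`). One way to surface such mismatches is to compare the dtypes of the common columns side by side. The sketch below uses small toy frames with values taken from the previews; the names `toy_20`, `toy_24` and `dtype_overview` are illustrative.

```python
import pandas as pd

# Toy frames mimicking two of the common columns from the previews above.
toy_20 = pd.DataFrame({
    "Age": [25.0, 31.0],
    "SurveyEase": ["Easy", "Neither easy nor difficult"],
})
toy_24 = pd.DataFrame({
    "Age": ["35-44 years old", "18-24 years old"],
    "SurveyEase": ["Easy", "Easy"],
})

# One row per column, with the dtype in each survey and a match flag
dtype_overview = pd.DataFrame({
    "dtype_20": toy_20.dtypes.astype(str),
    "dtype_24": toy_24.dtypes.astype(str),
})
dtype_overview["same_dtype"] = dtype_overview["dtype_20"] == dtype_overview["dtype_24"]

print(dtype_overview)
```

A matching dtype is only a first check. Text columns can still differ in their labels, as with `Employed full-time` (2020) versus `Employed, full-time` (2024) in the previews.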
