# Developer tendencies in 2021 - A data-based approach

The analysis uses data from the Stack Overflow survey, which is carried out every year. To check the information in detail, please refer to: https://insights.stackoverflow.com/survey

## Question of interest

This project analyzes developer tendencies in several areas. The main questions it aims to answer are:

1. Which programming languages are the most used, and which are the highest paid?
2. Of years of coding, years of professional coding, and education level, which is associated with the highest salary?
3. Which way of learning to code is associated with the highest salary, and in which types of companies do people who learned each way work?

### Data entry
First, the dataset was reviewed to understand in detail what information it contains.

In [5]:
# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
%matplotlib inline

# Pandas settings
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
#pd.options.display.max_rows = 100

# Data import
survey_2021 = pd.read_csv('survey_results_public_2021.csv')

print("Survey 2021 columns: ", survey_2021.dtypes, "\n")
survey_2021.head()

Survey 2021 columns:  ResponseId                        int64
MainBranch                       object
Employment                       object
Country                          object
US_State                         object
UK_Country                       object
EdLevel                          object
Age1stCode                       object
LearnCode                        object
YearsCode                        object
YearsCodePro                     object
DevType                          object
OrgSize                          object
Currency                         object
CompTotal                       float64
CompFreq                         object
LanguageHaveWorkedWith           object
LanguageWantToWorkWith           object
DatabaseHaveWorkedWith           object
DatabaseWantToWorkWith           object
PlatformHaveWorkedWith           object
PlatformWantToWorkWith           object
WebframeHaveWorkedWith           object
WebframeWantToWorkWith           object
MiscTechHaveWorked

Unnamed: 0,ResponseId,MainBranch,Employment,Country,US_State,UK_Country,EdLevel,Age1stCode,LearnCode,YearsCode,YearsCodePro,DevType,OrgSize,Currency,CompTotal,CompFreq,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSys,NEWStuck,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,NEWOtherComms,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth,SurveyLength,SurveyEase,ConvertedCompYearly
0,1,I am a developer by profession,"Independent contractor, freelancer, or self-em...",Slovakia,,,"Secondary school (e.g. American high school, G...",18 - 24 years,Coding Bootcamp;Other online resources (ex: vi...,,,"Developer, mobile",20 to 99 employees,EUR European Euro,4800.0,Monthly,C++;HTML/CSS;JavaScript;Objective-C;PHP;Swift,Swift,PostgreSQL;SQLite,SQLite,,,Laravel;Symfony,,,,,,PHPStorm;Xcode,Atom;Xcode,MacOS,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Multiple times per day,Yes,A few times per month or weekly,"Yes, definitely",No,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,62268.0
1,2,I am a student who is learning to code,"Student, full-time",Netherlands,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",7.0,,,,,,,JavaScript;Python,,PostgreSQL,,,,Angular;Flask;Vue.js,,Cordova,,Docker;Git;Yarn,Git,Android Studio;IntelliJ;Notepad++;PyCharm,,Windows,Visit Stack Overflow;Google it,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,"Yes, definitely",No,18-24 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,
2,3,"I am not primarily a developer, but I write co...","Student, full-time",Russian Federation,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",,,,,,,,Assembly;C;Python;R;Rust,Julia;Python;Rust,SQLite,SQLite,Heroku,,Flask,Flask,NumPy;Pandas;TensorFlow;Torch/PyTorch,Keras;NumPy;Pandas;TensorFlow;Torch/PyTorch,,,IPython/Jupyter;PyCharm;RStudio;Sublime Text;V...,IPython/Jupyter;RStudio;Sublime Text;Visual St...,MacOS,Visit Stack Overflow;Google it;Watch help / tu...,Stack Overflow;Stack Exchange,Multiple times per day,Yes,Multiple times per day,"Yes, definitely",Yes,18-24 years old,Man,No,Prefer not to say,Prefer not to say,None of the above,None of the above,Appropriate in length,Easy,
3,4,I am a developer by profession,Employed full-time,Austria,,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",11 - 17 years,,,,"Developer, front-end",100 to 499 employees,EUR European Euro,,Monthly,JavaScript;TypeScript,JavaScript;TypeScript,,,,,Angular;jQuery,Angular;jQuery,,,,,,,Windows,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,Neutral,No,35-44 years old,Man,No,Straight / Heterosexual,White or of European descent,I am deaf / hard of hearing,,Appropriate in length,Neither easy nor difficult,
4,5,I am a developer by profession,"Independent contractor, freelancer, or self-em...",United Kingdom of Great Britain and Northern I...,,England,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",5 - 10 years,Friend or family member,17.0,10.0,"Developer, desktop or enterprise applications;...","Just me - I am a freelancer, sole proprietor, ...",GBP\tPound sterling,,,Bash/Shell;HTML/CSS;Python;SQL,Bash/Shell;HTML/CSS;Python;SQL,Elasticsearch;PostgreSQL;Redis,Cassandra;Elasticsearch;PostgreSQL;Redis,,,Flask,Flask,Apache Spark;Hadoop;NumPy;Pandas,Hadoop;NumPy;Pandas,Docker;Git;Kubernetes;Yarn,Docker;Git;Kubernetes;Yarn,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim;Vis...,Linux-based,Visit Stack Overflow;Go for a walk or other ph...,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per week,"Yes, somewhat",No,25-34 years old,Man,No,,White or of European descent,None of the above,,Appropriate in length,Easy,
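
The dtype listing shows `YearsCode` and `YearsCodePro` stored as `object` rather than as numbers, which gets in the way of the second question. Below is a minimal sketch of one way to convert them. It assumes the non-numeric answers are the strings `Less than 1 year` and `More than 50 years` (an assumption about the survey's answer options, worth confirming with `unique()`); the helper `years_to_numeric`, the numeric stand-ins and the toy values are hypothetical.

```python
import pandas as pd

# Assumed text answers and the numeric stand-ins chosen for them (hypothetical choices)
TEXT_ANSWERS = {"Less than 1 year": 0.5, "More than 50 years": 51}

def years_to_numeric(column):
    """Convert a years column stored as text into floats; unknown text becomes NaN."""
    mapped = column.map(lambda value: TEXT_ANSWERS.get(value, value))
    return pd.to_numeric(mapped, errors="coerce")

# Toy data standing in for survey_2021['YearsCode']
toy_years = pd.Series(["7", "Less than 1 year", None, "More than 50 years", "17"])
numeric_years = years_to_numeric(toy_years)
print(numeric_years.tolist())
```

Using `errors="coerce"` turns any unexpected answer into `NaN` instead of raising, so it is worth counting the missing values before and after the conversion.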


In [6]:
# Data cleaning: keep only the respondents who reported a compensation

survey_2021.dropna(subset=['CompTotal'], inplace=True)
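
`CompTotal` is reported in each respondent's own currency and pay frequency (the first row above is 4800.0 EUR, monthly), so its raw values are not directly comparable between respondents. The dataset also has a `ConvertedCompYearly` column (62268.0 for that same row), which appears to be a yearly figure in a single currency. Below is a minimal sketch, assuming that column is the better basis for salary comparisons. The toy frame is invented for illustration, apart from its first row.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the survey layout (values after the first row are illustrative only)
toy = pd.DataFrame({
    "Currency": ["EUR European Euro", "EUR European Euro", "EUR European Euro"],
    "CompTotal": [4800.0, 50000.0, np.nan],
    "CompFreq": ["Monthly", "Yearly", None],
    "ConvertedCompYearly": [62268.0, 54000.0, np.nan],
})

# Keep only respondents with a comparable yearly figure; build a new frame
# instead of modifying the original in place
paid = toy.dropna(subset=["ConvertedCompYearly"]).copy()

print(len(toy), "rows before,", len(paid), "rows after")
print(paid["ConvertedCompYearly"].median())
```

Returning a new frame rather than using `inplace=True` keeps the unfiltered survey available for questions that do not depend on salary, such as language usage.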

In [41]:
# Functions needed to implement

test_df = survey_2021.head(2)

def get_average_df(categorycolumn, averagecolumn, categoryname, averagename):
    """Pair each ';'-separated category of one survey row with that row's value."""

    print(categorycolumn)

    # Multi-select answers arrive as one ';'-separated string; a skipped answer is NaN
    categories = categorycolumn.split(';') if isinstance(categorycolumn, str) else []

    for category in categories:
        print(category, "\n")

    # One row per category, each carrying the respondent's value
    return pd.DataFrame({categoryname: categories,
                         averagename: [averagecolumn] * len(categories)})

In [42]:
# Collect the per-row tables, then average the values for each language
holis = pd.concat([get_average_df(languages, compensation, 'Language', 'Average compensation')
                   for languages, compensation in zip(test_df['LanguageHaveWorkedWith'],
                                                      test_df['CompTotal'])],
                  ignore_index=True)
holis = holis.groupby('Language', as_index=False)['Average compensation'].mean()




C++;HTML/CSS;JavaScript;Objective-C;PHP;Swift
C++ 

HTML/CSS 

JavaScript 

Objective-C 

PHP 

Swift 

C++;Python
C++ 

Python 




In [8]:
print(survey_2021['LanguageHaveWorkedWith'].value_counts(dropna=False))

HTML/CSS;JavaScript;Node.js;TypeScript    573
Python                                    548
C#;HTML/CSS;JavaScript;SQL                417
HTML/CSS;JavaScript;PHP;SQL               416
C#  
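
Because each respondent's languages are stored as a single `;`-separated string, `value_counts` on the raw column counts combinations of languages rather than individual languages. Below is a minimal sketch of counting each language separately with `str.split` and `explode`, using toy answers in place of the survey column.

```python
import pandas as pd

# Toy answers standing in for survey_2021['LanguageHaveWorkedWith']
toy_languages = pd.Series([
    "HTML/CSS;JavaScript;Node.js;TypeScript",
    "Python",
    "C#;HTML/CSS;JavaScript;SQL",
    None,  # respondent who skipped the question
])

language_counts = (
    toy_languages
    .dropna()          # skipped answers carry no languages
    .str.split(";")    # one list of languages per respondent
    .explode()         # one row per (respondent, language) pair
    .value_counts()    # how many respondents listed each language
)
print(language_counts)
```

Dividing these counts by the number of respondents who answered the question gives the share of developers using each language, which is usually easier to compare than raw counts.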

In [15]:
print(survey_2021['LanguageHaveWorkedWith'][0])
print(survey_2021['LanguageHaveWorkedWith'][0].count(';'))

language_df = pd.DataFrame(columns=['Language','Average Salary'])
display(language_df)

    

C++;HTML/CSS;JavaScript;Objective-C;PHP;Swift
5


Unnamed: 0,Language,Average Salary
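
The empty `language_df` above is the target shape: one row per language with its average salary. One way to fill such a table without row-by-row bookkeeping is to explode the language lists and group by language. Below is a minimal sketch on toy data. The salary figures are invented, and `Salary` and `Respondents` are hypothetical column names used here for illustration.

```python
import pandas as pd

# Toy data: one row per respondent (salary values are invented)
toy = pd.DataFrame({
    "LanguageHaveWorkedWith": ["C++;Python", "Python;SQL", "C++"],
    "Salary": [60000.0, 80000.0, 110000.0],
})

# Long format: one row per (respondent, language) pair
long_form = toy.assign(
    Language=toy["LanguageHaveWorkedWith"].str.split(";")
).explode("Language")

# Mean salary and number of respondents per language, highest paid first
language_df = (
    long_form.groupby("Language")["Salary"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "Average Salary", "count": "Respondents"})
    .sort_values("Average Salary", ascending=False)
    .reset_index()
)
print(language_df)
```

A respondent who lists several languages contributes the same salary to each of them, so the result reads as "average salary of people who use this language" rather than pay attributable to the language itself. Keeping the respondent count next to the mean helps spot averages that rest on very few answers.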
