## Overview

In this project, we will conduct a data analysis using the Stack Overflow Annual Developer Survey dataset. Our goal is to answer the following questions:

1. In which country do developers earn the most?
2. How important is remote working for workers?
3. How much influence does coding experience have on salary?
4. Do individuals with a master's degree have a better chance of securing a job as a developer?

In [59]:
import pandas as pd

df = pd.read_csv('dataset/survey_results_public.csv')

### In which country do developers earn the most?
First, I look for a column that contains salary information.

In [60]:
df['CompTotal'].head()

0          NaN
1     285000.0
2     250000.0
3     156000.0
4    1320000.0
Name: CompTotal, dtype: float64

This column shows total compensation. However, the amounts are given in different currencies, so I need to know which currency each row uses.

In [61]:
df['Currency'].head()

0                          NaN
1    USD\tUnited States dollar
2    USD\tUnited States dollar
3    USD\tUnited States dollar
4         PHP\tPhilippine peso
Name: Currency, dtype: object

This column tells us the currency of each row. We want comparable figures, so we will convert total compensation into euros.
Rows with a missing value in either of the two columns are dropped.

In [62]:
# Drop rows where 'CompTotal' or 'Currency' is NaN
df = df.dropna(subset=['CompTotal', 'Currency'])

# Drop rows where 'CompTotal' or 'Currency' is an empty string
df = df[(df['CompTotal'] != '') & (df['Currency'] != '')]

Next, the values have to be converted into euros where they are not already in euros.
First, I reduce the 'Currency' column to just the currency code, which is the only part needed for the conversion.

In [63]:
df['Currency'] = df['Currency'].apply(lambda x: x.strip().split("\t")[0])
df['Currency']

1        USD
2        USD
3        USD
4        PHP
5        GBP
        ... 
89175    USD
89177    MXN
89178    USD
89179    BRL
89183    IRR
Name: Currency, Length: 48225, dtype: object

I see that this didn't work for EUR. After investigating, I found that the separator there is not a tab but a single space.

In [64]:
df['Currency'] = df['Currency'].apply(lambda x: x.split(" ")[0])
df['Currency']

1        USD
2        USD
3        USD
4        PHP
5        GBP
        ... 
89175    USD
89177    MXN
89178    USD
89179    BRL
89183    IRR
Name: Currency, Length: 48225, dtype: object
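
Because some labels use a tab and others a space, both cases could be handled in one pass by splitting on any whitespace. The following is a minimal sketch of that idea; the helper name `extract_currency_code` and the sample strings are illustrative assumptions, not part of the notebook.

```python
def extract_currency_code(label):
    """Return the leading currency code from a survey currency label."""
    # str.split() with no separator splits on any run of whitespace,
    # so tab- and space-separated labels are treated the same way.
    return label.split(maxsplit=1)[0]

# Illustrative labels covering both separator styles (tab and space).
samples = ["USD\tUnited States dollar", "EUR European Euro", " PHP\tPhilippine peso "]
codes = [extract_currency_code(s) for s in samples]
print(codes)
```

On the DataFrame, this could be applied as `df['Currency'].apply(extract_currency_code)`, which would take the place of the two separate split steps.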

Now we convert the values and save them in a new column.

In [65]:
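import requests

# Fetch current exchange rates with EUR as the base currency
url_api = 'https://api.exchangerate-api.com/v4/latest/EUR'

response = requests.get(url_api, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()

# Maps currency code -> units of that currency per 1 EUR
rates = data['rates']

def convert_to_euro(currency, value):
    """Convert value from the given currency to EUR; return None for unknown codes."""
    if currency in rates:
        return value / rates[currency]
    return None

df['CompTotal in EUR'] = df.apply(lambda row: convert_to_euro(row['Currency'], row['CompTotal']), axis=1)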
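
To make the arithmetic concrete: with EUR as the base, each rate is the number of units of a currency per 1 EUR, so an amount is divided by its rate. The sketch below runs offline with made-up rates (they are assumptions for illustration, not real exchange rates); `example_rates` and `to_euro` are hypothetical names.

```python
# Hypothetical rates in the same shape as the API response:
# units of each currency per 1 EUR. The numbers are made up.
example_rates = {"EUR": 1.0, "USD": 1.25, "PHP": 64.0}

def to_euro(currency, value, rates):
    """Divide by the per-EUR rate; return None for unknown currency codes."""
    rate = rates.get(currency)
    if rate is None:
        return None
    return value / rate

usd_in_eur = to_euro("USD", 250000.0, example_rates)
php_in_eur = to_euro("PHP", 1320000.0, example_rates)
unknown = to_euro("XXX", 1000.0, example_rates)
print(usd_in_eur, php_in_eur, unknown)
```

Under these assumed rates, 250,000 USD becomes 200,000 EUR and 1,320,000 PHP becomes 20,625 EUR, while an unknown code yields `None`, which pandas stores as a missing value in the new column.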

Next, I adjust the column order so that the new column sits directly after 'CompTotal'.

In [66]:
columns = df.columns.tolist()
columns

['ResponseId',
 'Q120',
 'MainBranch',
 'Age',
 'Employment',
 'RemoteWork',
 'CodingActivities',
 'EdLevel',
 'LearnCode',
 'LearnCodeOnline',
 'LearnCodeCoursesCert',
 'YearsCode',
 'YearsCodePro',
 'DevType',
 'OrgSize',
 'PurchaseInfluence',
 'TechList',
 'BuyNewTool',
 'Country',
 'Currency',
 'CompTotal',
 'LanguageHaveWorkedWith',
 'LanguageWantToWorkWith',
 'DatabaseHaveWorkedWith',
 'DatabaseWantToWorkWith',
 'PlatformHaveWorkedWith',
 'PlatformWantToWorkWith',
 'WebframeHaveWorkedWith',
 'WebframeWantToWorkWith',
 'MiscTechHaveWorkedWith',
 'MiscTechWantToWorkWith',
 'ToolsTechHaveWorkedWith',
 'ToolsTechWantToWorkWith',
 'NEWCollabToolsHaveWorkedWith',
 'NEWCollabToolsWantToWorkWith',
 'OpSysPersonal use',
 'OpSysProfessional use',
 'OfficeStackAsyncHaveWorkedWith',
 'OfficeStackAsyncWantToWorkWith',
 'OfficeStackSyncHaveWorkedWith',
 'OfficeStackSyncWantToWorkWith',
 'AISearchHaveWorkedWith',
 'AISearchWantToWorkWith',
 'AIDevHaveWorkedWith',
 'AIDevWantToWorkWith',
 'NEWSO

I copied the list and am now changing the order.

In [67]:
new_order = ['ResponseId',
 'Q120',
 'MainBranch',
 'Age',
 'Employment',
 'RemoteWork',
 'CodingActivities',
 'EdLevel',
 'LearnCode',
 'LearnCodeOnline',
 'LearnCodeCoursesCert',
 'YearsCode',
 'YearsCodePro',
 'DevType',
 'OrgSize',
 'PurchaseInfluence',
 'TechList',
 'BuyNewTool',
 'Country',
 'Currency',
 'CompTotal',
 'CompTotal in EUR',
 'LanguageHaveWorkedWith',
 'LanguageWantToWorkWith',
 'DatabaseHaveWorkedWith',
 'DatabaseWantToWorkWith',
 'PlatformHaveWorkedWith',
 'PlatformWantToWorkWith',
 'WebframeHaveWorkedWith',
 'WebframeWantToWorkWith',
 'MiscTechHaveWorkedWith',
 'MiscTechWantToWorkWith',
 'ToolsTechHaveWorkedWith',
 'ToolsTechWantToWorkWith',
 'NEWCollabToolsHaveWorkedWith',
 'NEWCollabToolsWantToWorkWith',
 'OpSysPersonal use',
 'OpSysProfessional use',
 'OfficeStackAsyncHaveWorkedWith',
 'OfficeStackAsyncWantToWorkWith',
 'OfficeStackSyncHaveWorkedWith',
 'OfficeStackSyncWantToWorkWith',
 'AISearchHaveWorkedWith',
 'AISearchWantToWorkWith',
 'AIDevHaveWorkedWith',
 'AIDevWantToWorkWith',
 'NEWSOSites',
 'SOVisitFreq',
 'SOAccount',
 'SOPartFreq',
 'SOComm',
 'SOAI',
 'AISelect',
 'AISent',
 'AIAcc',
 'AIBen',
 'AIToolInterested in Using',
 'AIToolCurrently Using',
 'AIToolNot interested in Using',
 'AINextVery different',
 'AINextNeither different nor similar',
 'AINextSomewhat similar',
 'AINextVery similar',
 'AINextSomewhat different',
 'TBranch',
 'ICorPM',
 'WorkExp',
 'Knowledge_1',
 'Knowledge_2',
 'Knowledge_3',
 'Knowledge_4',
 'Knowledge_5',
 'Knowledge_6',
 'Knowledge_7',
 'Knowledge_8',
 'Frequency_1',
 'Frequency_2',
 'Frequency_3',
 'TimeSearching',
 'TimeAnswering',
 'ProfessionalTech',
 'Industry',
 'SurveyLength',
 'SurveyEase',
 'ConvertedCompYearly']

df = df[new_order]
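
Copying the full column list by hand works, but the same reordering could also be computed. The following is a minimal sketch, assuming the only goal is to place one column directly after another; `move_after` is a hypothetical helper and the column list is shortened for illustration.

```python
def move_after(columns, name, anchor):
    """Return a copy of columns with name placed directly after anchor."""
    reordered = [c for c in columns if c != name]
    reordered.insert(reordered.index(anchor) + 1, name)
    return reordered

# Shortened, illustrative column list; the real DataFrame has many more columns.
cols = ["ResponseId", "Country", "Currency", "CompTotal",
        "ConvertedCompYearly", "CompTotal in EUR"]
new_cols = move_after(cols, "CompTotal in EUR", "CompTotal")
print(new_cols)
```

With the DataFrame from this notebook, the equivalent call would be `df = df[move_after(df.columns.tolist(), 'CompTotal in EUR', 'CompTotal')]`, which avoids maintaining a long hard-coded list.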

Now salaries can be compared on a consistent basis.