# Which countries have the most people working from home?

I'm going to check whether the share of survey respondents who work from home differs from country to country.


```python
# Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./survey_results_public.csv')
df.head()
```

|  | Respondent | Professional | ProgramHobby | Country | University | EmploymentStatus | FormalEducation | MajorUndergrad | HomeRemote | CompanySize | ... | StackOverflowMakeMoney | Gender | HighestEducationParents | Race | SurveyLong | QuestionsInteresting | QuestionsConfusing | InterestedAnswers | Salary | ExpectedSalary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Student | Yes, both | United States | No | Not employed, and not looking for work | Secondary school |  |  |  | ... | Strongly disagree | Male | High school | White or of European descent | Strongly disagree | Strongly agree | Disagree | Strongly agree |  |  |
| 1 | 2 | Student | Yes, both | United Kingdom | Yes, full-time | Employed part-time | Some college/university study without earning ... | Computer science or software engineering | More than half, but not all, the time | 20 to 99 employees | ... | Strongly disagree | Male | A master's degree | White or of European descent | Somewhat agree | Somewhat agree | Disagree | Strongly agree |  | 37500.0 |
| 2 | 3 | Professional developer | Yes, both | United Kingdom | No | Employed full-time | Bachelor's degree | Computer science or software engineering | Less than half the time, but at least one day ... | 10,000 or more employees | ... | Disagree | Male | A professional degree | White or of European descent | Somewhat agree | Agree | Disagree | Agree | 113750.0 |  |
| 3 | 4 | Professional non-developer who sometimes write... | Yes, both | United States | No | Employed full-time | Doctoral degree | A non-computer-focused engineering discipline | Less than half the time, but at least one day ... | 10,000 or more employees | ... | Disagree | Male | A doctoral degree | White or of European descent | Agree | Agree | Somewhat agree | Strongly agree |  |  |
| 4 | 5 | Professional developer | Yes, I program as a hobby | Switzerland | No | Employed full-time | Master's degree | Computer science or software engineering | Never | 10 to 19 employees | ... | | | | | | | | | | |


```python
# Get the 5 countries with the most respondents
grouped = df.groupby('Country').count().reset_index()
sortedvalues = grouped.sort_values('Respondent', ascending=False)
countries = sortedvalues['Country'].head()
```
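For reference, the same top-N ranking can be obtained with `value_counts()`, which counts rows per value and sorts them in descending order. The sketch below uses a small hypothetical DataFrame (not the survey data) and takes the top 2 instead of the top 5 so the result is easy to check. When two countries have the same count, the two approaches may order them differently.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey's Respondent/Country columns
df = pd.DataFrame({
    'Respondent': range(1, 10),
    'Country': ['India', 'Germany', 'India', 'Canada', 'Germany',
                'India', 'Canada', 'Germany', 'India'],
})

# Approach used in the notebook: count non-null values per column, sort by Respondent
grouped = df.groupby('Country').count().reset_index()
top_groupby = grouped.sort_values('Respondent', ascending=False)['Country'].head(2).tolist()

# value_counts() counts rows per country and sorts descending by default
top_value_counts = df['Country'].value_counts().head(2).index.tolist()

print(top_groupby)       # ['India', 'Germany']
print(top_value_counts)  # ['India', 'Germany']
```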

```python
# Keep only each respondent's country and remote-work frequency
df2 = df[['Country', 'HomeRemote']]

# Drop rows with missing values (for example, an empty HomeRemote answer),
# since we can't infer how often those respondents work from home
df2 = df2.dropna()
```
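One detail worth knowing: `dropna()` with no arguments drops a row if *any* column is missing, so it also removes respondents with no `Country`. A minimal sketch on three hypothetical rows, comparing it with `dropna(subset=['HomeRemote'])`, which checks only that column:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one complete, one missing HomeRemote, one missing Country
df = pd.DataFrame({
    'Country': ['Canada', 'Germany', np.nan],
    'HomeRemote': ['Never', np.nan, 'About half the time'],
})

# No arguments: drop a row if ANY column is missing
any_missing = df.dropna()

# subset= restricts the check to HomeRemote only
remote_missing = df.dropna(subset=['HomeRemote'])

print(len(any_missing))     # 1
print(len(remote_missing))  # 2
```

In this notebook the difference shouldn't change the result, because rows without a country are excluded anyway by the `isin(countries)` filter in the next step.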

```python
# Keep only respondents from the selected countries
df2 = df2[df2['Country'].isin(countries)]

# Count respondents for each (Country, HomeRemote) pair
df3 = df2.groupby(['Country', 'HomeRemote']).size().reset_index(name='Count')
df3
```

|  | Country | HomeRemote | Count |
|---|---|---|---|
| 0 | Canada | A few days each month | 769 |
| 1 | Canada | About half the time | 60 |
| 2 | Canada | All or almost all the time (I'm full-time remote) | 196 |
| 3 | Canada | It's complicated | 100 |
| 4 | Canada | Less than half the time, but at least one day ... | 149 |
| 5 | Canada | More than half, but not all, the time | 66 |
| 6 | Canada | Never | 575 |
| 7 | Germany | A few days each month | 1291 |
| 8 | Germany | About half the time | 124 |
| 9 | Germany | All or almost all the time (I'm full-time remote) | 225 |


```python
# Count all respondents per country, to use as the denominator for the percentages
df4 = df2.groupby('Country').size().reset_index(name='Total')
df4
```

|  | Country | Total |
|---|---|---|
| 0 | Canada | 1915 |
| 1 | Germany | 3650 |
| 2 | India | 4199 |
| 3 | United Kingdom | 3881 |
| 4 | United States | 10016 |


```python
# Answers that count as working from home (at least occasionally)
remotetypes = [
    'A few days each month',
    'About half the time',
    "All or almost all the time (I'm full-time remote)",
    'Less than half the time, but at least one day each week',
    'More than half, but not all, the time',
]

# Keep the rows for those answers and add up their counts per country
df5 = df3[df3['HomeRemote'].isin(remotetypes)]
df6 = df5.groupby('Country')['Count'].sum().reset_index()
df6
```

|  | Country | Count |
|---|---|---|
| 0 | Canada | 1240 |
| 1 | Germany | 2076 |
| 2 | India | 2691 |
| 3 | United Kingdom | 2266 |
| 4 | United States | 7007 |
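A note on the summing step: in recent pandas versions (2.0 and later), `GroupBy.sum()` no longer drops non-numeric columns by default, so summing the whole filtered frame can also try to combine the `HomeRemote` strings. Selecting `Count` before summing keeps the result to a single numeric column. A sketch on hypothetical counts in the same long format as `df3`, with a shortened answer list:

```python
import pandas as pd

# Hypothetical counts in the same long format as df3 (Country, HomeRemote, Count)
df3 = pd.DataFrame({
    'Country': ['Canada', 'Canada', 'Canada', 'Germany', 'Germany'],
    'HomeRemote': ['A few days each month', 'Never', "It's complicated",
                   'About half the time', 'Never'],
    'Count': [10, 5, 2, 7, 3],
})

# Shortened list of answers that count as working from home
remotetypes = ['A few days each month', 'About half the time']

# Keep only the remote answers
df5 = df3[df3['HomeRemote'].isin(remotetypes)]

# Select Count explicitly so the string column HomeRemote is not summed
df6 = df5.groupby('Country')['Count'].sum().reset_index()
print(df6)
```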


```python
# Divide each country's remote count by its total number of respondents,
# looking up the total by country name
totals = df4.set_index('Country')['Total']
percentages = (df6['Count'] / df6['Country'].map(totals)).tolist()
percentages
```

[0.6475195822454308,
 0.5687671232876712,
 0.6408668730650154,
 0.5838701365627416,
 0.6995806709265175]
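The division can also be done with a join instead of a per-row lookup. The sketch below rebuilds the two tables from the counts printed above, as hypothetical standalone frames named `remote` and `totals`, and merges them on `Country`, so each count is divided by its own country's total regardless of row order:

```python
import pandas as pd

# Counts copied from the remote-count and total outputs above
remote = pd.DataFrame({
    'Country': ['Canada', 'Germany', 'India', 'United Kingdom', 'United States'],
    'Count': [1240, 2076, 2691, 2266, 7007],
})
totals = pd.DataFrame({
    'Country': ['Canada', 'Germany', 'India', 'United Kingdom', 'United States'],
    'Total': [1915, 3650, 4199, 3881, 10016],
})

# Inner join on Country, then divide row by row
merged = remote.merge(totals, on='Country')
merged['Percentage'] = merged['Count'] / merged['Total']
print(merged[['Country', 'Percentage']])
```

An inner `merge` keeps the order of the left frame's keys, so the percentages come out in the same alphabetical order as the list above.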

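```python
# Store those percentages in the dataframe. df4 and df6 both come from
# groupby('Country'), so their rows share the same alphabetical order.
df4['Percentage'] = percentages
df4
```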

|  | Country | Total | Percentage |
|---|---|---|---|
| 0 | Canada | 1915 | 0.64752 |
| 1 | Germany | 3650 | 0.568767 |
| 2 | India | 4199 | 0.640867 |
| 3 | United Kingdom | 3881 | 0.58387 |
| 4 | United States | 10016 | 0.699581 |
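As an alternative (not the approach used in this notebook), `pd.crosstab` with `normalize='index'` turns the per-answer counts directly into each country's shares, and summing the remote answers gives the same kind of percentage. A minimal sketch on hypothetical respondent-level rows shaped like `df2`; answers such as `It's complicated` count toward each country's total but not toward working from home, matching the notebook's definition:

```python
import pandas as pd

# Hypothetical respondent-level rows shaped like df2 (Country, HomeRemote)
df2 = pd.DataFrame({
    'Country': ['Canada', 'Canada', 'Canada', 'Canada',
                'Germany', 'Germany', 'Germany'],
    'HomeRemote': ['Never', 'A few days each month', 'About half the time',
                   "It's complicated",
                   'Never', 'A few days each month', 'About half the time'],
})

# Shortened list of answers treated as working from home
remotetypes = ['A few days each month', 'About half the time']

# normalize='index' divides each row by that country's total respondents
shares = pd.crosstab(df2['Country'], df2['HomeRemote'], normalize='index')

# Sum the remote answers per country; reindex fills any answer absent
# from the data with 0 so the column selection never raises a KeyError
percentage = shares.reindex(columns=remotetypes, fill_value=0).sum(axis=1)
print(percentage.sort_values(ascending=False))
```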


```python
# Drop the Total column and sort by percentage so the final table is easier to read
df7 = df4.drop(columns='Total')
df7.sort_values('Percentage', ascending=False)
```

|  | Country | Percentage |
|---|---|---|
| 4 | United States | 0.699581 |
| 0 | Canada | 0.64752 |
| 2 | India | 0.640867 |
| 3 | United Kingdom | 0.58387 |
| 1 | Germany | 0.568767 |
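To report these fractions as percentages with one decimal place, Python's `%` format specifier multiplies by 100, rounds (rather than truncates), and appends the percent sign. A sketch using the unrounded values from the `percentages` output above:

```python
# Unrounded fractions copied from the percentages output above
shares = {
    'United States': 0.6995806709265175,
    'Canada': 0.6475195822454308,
    'India': 0.6408668730650154,
    'United Kingdom': 0.5838701365627416,
    'Germany': 0.5687671232876712,
}

# '.1%' multiplies by 100, rounds to one decimal place, and appends '%'
lines = [f"- {country}: {share:.1%}" for country, share in shares.items()]
print("\n".join(lines))
```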


# Summary


Share of respondents working from home, at least occasionally, among the five countries with the most respondents. Respondents who answered "Never" or "It's complicated" are counted in each country's total but not as working from home.

- United States: 70.0%
- Canada: 64.8%
- India: 64.1%
- United Kingdom: 58.4%
- Germany: 56.9%

Overall, the United States has the highest share of respondents who work from home at least some of the time (even if only a few days each month), at 70.0%.

Germany has the lowest share, at 56.9%.