# The future of employment

How susceptible are jobs to computerisation? 

> We examine how susceptible jobs are to computerisation. To assess this, we begin by implementing a novel methodology to estimate the probability of computerisation for 702 detailed occupations, using a **Gaussian process classifier**. Based on these estimates, we examine expected impacts of future computerisation on US labour market outcomes, with the primary objective of analysing the number of jobs at risk and the relationship between an occupation's probability of computerisation, wages and educational attainment.

C. Frey, M. Osborne, *The future of employment: How susceptible are jobs to computerisation?*, Technological Forecasting & Social Change 114 (2017) 254–280


# GPy


The Gaussian processes framework in Python. https://github.com/SheffieldML/GPy

In [14]:
!pip install --upgrade GPy

Collecting GPy
  Downloading https://files.pythonhosted.org/packages/4a/84/91dc7d63fa32d83a799d32071d56fe481bc1ce6b090509999e3463bfeeea/GPy-1.9.9-cp37-cp37m-macosx_10_9_x86_64.whl (1.5MB)
Collecting paramz>=0.9.0 (from GPy)
  Downloading https://files.pythonhosted.org/packages/d8/37/4abbeb78d30f20d3402887f46e6e9f3ef32034a9dea65d243654c82c8553/paramz-0.9.5.tar.gz (71kB)
Building wheels for collected packages: paramz
  Building wheel for paramz (setup.py) ... done
  Stored in directory: /Users/datalab/Library/Caches/pip/wheels/c8/4a/0e/6e0dc85541825f991c431619e25b870d4b812c911214690cf8
Successfully built paramz
Installing collected packages: paramz, GPy
Successfully installed GPy-1.9.9 paramz-0.9.5


In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import GPy

In [24]:
df = pd.read_csv('../data/jobdata.csv')
df.head()

| Unnamed: 0.1 | Unnamed: 0 | soc | Element Name | id | label | Data Value | computerization |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 11-1011 | Assisting and Caring for Others | 70 | 0 | 2.205 | 0.015 |
| 1 | 1 | 11-1011 | Cramped Work Space, Awkward Positions | 70 | 0 | 1.415 | 0.015 |
| 2 | 2 | 11-1011 | Fine Arts | 70 | 0 | 0.915 | 0.015 |
| 3 | 3 | 11-1011 | Finger Dexterity | 70 | 0 | 2.0 | 0.015 |
| 4 | 4 | 11-1011 | Manual Dexterity | 70 | 0 | 0.0 | 0.015 |


In [28]:
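# The table is in long format: each occupation occupies 9 consecutive rows,
# one per 'Element Name'. Slice 'Data Value' into feature vectors of length 9.
data_list = list(df['Data Value'])
X = []
for i in range(0, 585, 9):
    X.append(data_list[i:i + 9])
X = np.array(X)  # shape (65, 9): 65 occupations, 9 features each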

len(X)

65
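
The fixed stride of 9 assumes that every occupation contributes exactly nine consecutive rows, always in the same feature order. A minimal sketch of a more defensive alternative, grouping rows by their `soc` code, might look like the following. The helper name `rows_to_matrix` is hypothetical, and the sample rows are only the five shown in `df.head()`, not the full data set.

```python
from collections import OrderedDict


def rows_to_matrix(rows):
    """Group (soc, element_name, value) rows into one feature vector per occupation.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    grouped = OrderedDict()
    for soc, name, value in rows:
        grouped.setdefault(soc, {})[name] = value

    # Use one fixed, sorted feature order for every occupation.
    feature_names = sorted({name for feats in grouped.values() for name in feats})

    matrix = []
    for soc, feats in grouped.items():
        missing = [n for n in feature_names if n not in feats]
        if missing:
            raise ValueError(f"{soc} is missing features: {missing}")
        matrix.append([feats[n] for n in feature_names])
    return list(grouped), feature_names, matrix


# The five rows visible in df.head(); the real table has more rows per occupation.
rows = [
    ("11-1011", "Assisting and Caring for Others", 2.205),
    ("11-1011", "Cramped Work Space, Awkward Positions", 1.415),
    ("11-1011", "Fine Arts", 0.915),
    ("11-1011", "Finger Dexterity", 2.0),
    ("11-1011", "Manual Dexterity", 0.0),
]
socs, feature_names, X_demo = rows_to_matrix(rows)
```

Unlike positional slicing, this raises an error when an occupation lacks a feature instead of silently shifting values into the wrong column.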

In [29]:
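# The label is repeated on all 9 rows of an occupation, so take the first
# row of each block of 9.
data_list1 = list(df['label'])
Y = []
for i in range(0, 585, 9):
    Y.append(data_list1[i])
Y = np.array(Y)
Y = Y[:, np.newaxis]  # GPy expects a 2-D target array of shape (n, 1)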
Y[:3]

array([[0],
       [0],
       [0]])

In [37]:
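# RBF (squared-exponential) kernel over the 9 input features.
kernel = GPy.kern.RBF(input_dim=9, variance=1., lengthscale=1.)
# Note: this fits a GP *regression* model to the 0/1 labels,
# not a Gaussian process classifier as used in the paper.
m = GPy.models.GPRegression(X, Y, kernel)
m.optimize(messages=False)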

<paramz.optimization.optimization.opt_lbfgsb at 0x105b9c898>
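
For reference, the RBF kernel measures similarity between two feature vectors as k(x, x') = variance * exp(-||x - x'||^2 / (2 * lengthscale^2)). The pure-Python sketch below evaluates that formula with the initial hyperparameters passed to the constructor (variance 1, lengthscale 1). The function name `rbf` and the two example vectors are illustrative assumptions, and the code does not call GPy.

```python
import math


def rbf(x, x_prime, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between two equal-length feature vectors."""
    if len(x) != len(x_prime):
        raise ValueError("inputs must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return variance * math.exp(-sq_dist / (2.0 * lengthscale ** 2))


# Two made-up 9-dimensional inputs that differ in a single coordinate.
x_a = [0.0] * 9
x_b = [1.0] + [0.0] * 8

k_same = rbf(x_a, x_a)  # identical inputs give the kernel variance
k_near = rbf(x_a, x_b)  # distance 1 with lengthscale 1
k_wide = rbf(x_a, x_b, lengthscale=4.0)  # a longer lengthscale decays more slowly
```

A larger lengthscale makes distant occupations count as more similar, which is why the optimiser adjusts it together with the variance.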

In [38]:
print(m)


Name : GP regression
Objective : 28.130643010540453
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
  GP_regression.           |               value  |  constraints  |  priors
  rbf.variance             |  0.3113242734729479  |      +ve      |        
  rbf.lengthscale          |   3.933616340596464  |      +ve      |        
  Gaussian_noise.variance  |  0.0964434513555219  |      +ve      |        


In [36]:
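from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 40% of the occupations for testing.
X1, X2, y1, y2 = train_test_split(X, Y, random_state=0,
                                  train_size=0.6, test_size=0.4)
# Fit on the training split only, reusing the RBF kernel object defined earlier.
m = GPy.models.GPRegression(X1, y1, kernel)
m.optimize(messages=False)
# predict() returns (mean, variance); keep the predictive mean.
y2_model = m.predict(X2)[0]

# Threshold the predictive mean at 0.5 to obtain hard 0/1 labels.
for i in range(len(y2_model)):
    if y2_model[i] > 0.5:
        y2_model[i] = 1
    else:
        y2_model[i] = 0

accuracy_score(y2, y2_model)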

0.9230769230769231
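
With 65 occupations and `test_size=0.4`, the test split holds 26 rows, so the score above corresponds to 24 correct predictions out of 26. The thresholding and scoring step can be sketched without scikit-learn as follows. The function names and the four predictive means are made up for illustration and are not the model's actual outputs.

```python
def threshold(means, cutoff=0.5):
    """Turn real-valued predictive means into hard 0/1 labels."""
    return [1 if mean > cutoff else 0 for mean in means]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up predictive means and true labels, for illustration only.
means = [0.1, 0.7, 0.4, 0.9]
truth = [0, 1, 1, 1]

labels = threshold(means)
score = accuracy(truth, labels)
```

Because a test set of 26 rows moves accuracy in steps of about 0.038, a single misclassification changes the score noticeably.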

End.