## ML Analysis

First, we import the `TensorFlow`, `pandas`, and `numpy` libraries.

In [583]:
import tensorflow as tf
import pandas as pd
import numpy as np

We want our neural network to "learn" the relationship between a list of inputs and a list of outputs.

**Example**
Here `x` represents the input array, a 1D array of 12 input values. `y` represents the output variable, a 1D array with one output value per entry in `x`.

In [584]:
x = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 1.0, 4.5, 2.3, 6.7, 1.0], dtype=float)
y = np.array([-3.0, -1.0, 1.0, 3.0, 4.0, 2.0, 1.0, 3.0, 4.0, 2.0, 1.0, 2.0], dtype=float)
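
A quick way to check how these example arrays are laid out is to inspect their shapes. The sketch below does that and also reshapes `x` into a single-column 2D array, which is the `(n, 1)` layout of the arrays built from DataFrame columns later in this notebook. The name `x_column` is introduced here for illustration only.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 1.0, 4.5, 2.3, 6.7, 1.0], dtype=float)
y = np.array([-3.0, -1.0, 1.0, 3.0, 4.0, 2.0, 1.0, 3.0, 4.0, 2.0, 1.0, 2.0], dtype=float)

print(x.shape, y.shape)  # both are 1D arrays of length 12

# One row per sample, one column per input variable (hypothetical name).
x_column = x.reshape(-1, 1)
print(x_column.shape)
```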

### Importing data

In [585]:
data = pd.read_csv("data.csv")
data = data.loc[:,['resp_id', 'age', 'gender', 'student', 'wfh_now', 'prod_change']]
data.columns = data.columns.to_series().apply(lambda x: x.strip())
data

Unnamed: 0,resp_id,age,gender,student,wfh_now,prod_change
0,11,44,Male,No,Yes,Increased somewhat
1,29,39,Male,No,Yes,Decreased somewhat
2,30,49,Female,No,Yes,Decreased somewhat
3,31,27,Male,No,Yes,Decreased somewhat
4,34,32,Female,"Yes, Full-time",Yes,In some ways it has increased and in other way...
...,...,...,...,...,...,...
2900,9319,67,Male,No,Question not displayed to respondent,Question not displayed to respondent
2901,9322,47,Female,No,No,Increased somewhat
2902,9323,59,Male,No,No,About the same
2903,9325,60,Female,No,No,Decreased somewhat


### Cleaning up data

We now have to convert the text answers into numeric codes so that every column has a consistent numeric type that the model can be fitted to. Any answer that is not listed in a mapping is treated as missing, and its row is dropped.

Gender: 0 Male, 1 Female

Student: 0 Not a student, 1 Student. The mapping below only matches the exact answers `No` and `Yes`, so answers such as `Yes, Full-time` are treated as missing.

WFH_Now: 0 No, 1 Yes

Prod_Change: 0 Decreased Significantly, 1 Decreased Somewhat, 2 Both, 3 Same, 4 Increased somewhat, 5 Increased significantly

In [586]:
Genders = {
  'Male' : 0,
  'Female' : 1,
}

Student = {
  'No' : 0,
  'Yes' : 1,
}

WFH_Now = {
  'No' : 0,
  'Yes' : 1,
}

Prod_Change = {
  'Decreased significantly': 0,
  'Decreased somewhat': 1,
  'Both': 2,
  'Same': 3,
  'Increased somewhat': 4,
  'Increased significantly': 5,
}

# Map each text column to its numeric code; answers missing from a mapping become NaN.
mappings = {
  'gender': Genders,
  'student': Student,
  'wfh_now': WFH_Now,
  'prod_change': Prod_Change,
}
for column, mapping in mappings.items():
  data[column] = data[column].map(mapping)

# drop all the rows that have missing values (NaN in any mapped column)
data = data.dropna(subset=list(mappings)).astype({column: int for column in mappings})

data

Unnamed: 0,resp_id,age,gender,student,wfh_now,prod_change
0,11,44,0,0,1,4
1,29,39,0,0,1,1
2,30,49,1,0,1,1
3,31,27,0,0,1,1
10,46,63,1,0,1,4
...,...,...,...,...,...,...
2887,9280,61,1,0,1,4
2888,9281,65,1,0,1,4
2890,9286,67,0,0,1,4
2901,9322,47,1,0,0,4
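
Because any answer missing from a mapping causes its row to be dropped, it can help to list the unmatched answers before cleaning. The sketch below is one way to do that. `unmatched_answers` is a hypothetical helper, and the sample answers are taken from the `student` column of the raw table shown earlier.

```python
from collections import Counter

def unmatched_answers(values, mapping):
    """Count the answers in `values` that have no key in `mapping`."""
    return Counter(v for v in values if v not in mapping)

# Mapping as defined in the cleaning cell.
Student = {'No': 0, 'Yes': 1}

# A few answers from the `student` column displayed earlier.
sample = ['No', 'No', 'Yes, Full-time', 'No']
print(unmatched_answers(sample, Student))
```

On the real data, this could be called as `unmatched_answers(data['student'], Student)` before the cleaning cell is run.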


We create the simplest possible neural network. It has 1 layer, that layer has 1 neuron, and the input is just 1 value.

In [587]:
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.summary()

Model: "sequential_26"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_162 (Dense)           (None, 1)                 2         
                                                                 
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________
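
The parameter count of 2 in the summary corresponds to one weight and one bias: a single `Dense` unit with one input computes `y = w * x + b`. The sketch below illustrates this in plain Python. `dense_param_count` and `dense_forward` are hypothetical helpers, not part of Keras, and the weight values are arbitrary.

```python
def dense_param_count(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units

def dense_forward(x, w, b):
    """Output of a single linear unit with one input."""
    return w * x + b

print(dense_param_count(n_inputs=1, units=1))  # 2, matching the summary
print(dense_forward(x=1.0, w=0.5, b=2.0))      # 2.5
```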


Next, we compile the neural network by specifying a **loss function** and an **optimization algorithm**.

In [588]:
model.compile(optimizer='sgd', loss='mean_squared_error')

Let's now **train** our neural network to fit the data. The neural network will try to guess the relationship between the values in `x` and the values in `y`. The loss function measures how good or bad this guess is and, based on this, the optimization algorithm makes another guess. We then repeat this process for a certain number of iterations (`epochs`).

In [589]:
x = data.loc[:,['gender']]
y = data.loc[:,['prod_change']]

x = np.asarray(x).astype('float32')
y = np.asarray(y).astype('float32')

model.fit(x, y, epochs=20)

Epoch 1/20
...
Epoch 20/20


<keras.callbacks.History at 0x2e9e360d0>
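
To make the guess, measure, and update cycle concrete, here is a minimal sketch of gradient descent on mean squared error for a one-weight, one-bias model. It is a plain-Python illustration using full-batch updates, not what Keras runs internally. The toy data (`y = 2x - 1`), the learning rate, and the function name `fit_line` are assumptions chosen for the example.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=500):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Current guesses and their errors.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x - 1.0 for x in xs]  # toy data with a known relationship
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the loop should recover values close to `w = 2` and `b = -1`. The survey data is noisier, so the loss there will not approach zero.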

Finally, let's test our neural network by using it to predict the value of `prod_change` for each value of `gender` (`gender=1` and `gender=0`). **What do you think the value of `prod_change` will be?**

In [590]:
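# Pass the inputs as a (2, 1) float array: one row per sample, one column for gender
print(model.predict(np.array([[1.0], [0.0]], dtype='float32')))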

[[2.346479 ]
 [2.3014915]]


Both predictions are close to 2.3, near the middle of the 0-5 productivity scale, which is reasonable given that the only input is binary gender. We currently assume no correlation between the two variables, but we will improve the model by adding more features and refining the neural network.
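
With a single binary input, a linear model has exactly two possible outputs, one per gender value, and the outputs that minimize mean squared error are the average `prod_change` within each group. The sketch below computes those group averages for a small made-up sample (the numbers are illustrative, not survey results). `group_means` is a hypothetical helper.

```python
from collections import defaultdict

def group_means(groups, targets):
    """Average target value for each distinct group label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, t in zip(groups, targets):
        totals[g] += t
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}

# Made-up sample: gender codes and prod_change codes.
gender = [0, 0, 1, 1, 1, 0]
prod_change = [4, 1, 1, 4, 4, 1]
print(group_means(gender, prod_change))
```

Comparing such averages on the cleaned data with the model's two predictions would indicate how close training came to the best linear fit.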

### Reflection
* The most challenging part of this project is estimating individual productivity. Many factors affect human behavior, so it is hard to draw conclusions, and it is also hard to use all the features in our survey data to predict productivity.
* Our initial insight is that workplace productivity is affected when people change settings and switch between working in person and working from home. The choice to work from home is in turn driven by other factors, including education, income, and job type.
* We don't have concrete results yet: after cleaning the data, we found that we might not have enough data for certain features to use them in the final model. We will have to tweak the model significantly.
* Biggest problems we currently face: tweaking the neural network, training data, and minimizing loss.
* We are on track to complete the project on time.
* Yes, it's worth proceeding with the project because we have a diverse enough set of features to draw important conclusions about individuals' (and families') work-from-home productivity in general.

### Wave 1 Train

In [591]:
wave1 = pd.read_csv("Wave1_train.csv")      # All Data

train = wave1.iloc[:821,:]            # Training Data
validation = wave1.iloc[821:,:]       # Validation Data

test = pd.read_csv("Wave1_test.csv")  # Test Data
train.columns = train.columns.to_series().apply(lambda x: x.strip())
train.shape, validation.shape
train

Unnamed: 0,resp_id,WFH_PRE,Job_Clerical or administrative support,"Job_Manufacturing, construction, maintenance, or farming","Job_Professional, managerial, or technical",Job_Sales or service,Workload_increased,Workload_decreased,Increased_productivity,Decreased_productivity,hhveh_harm,age,gender,Number_bedrooms,Race_white,Gradutae_degree,High_income(LessThan_100K),More_income(LMoreThan_35K),ProEnvironment
0,11,0,0,0,0,0,1,0,1,0,2,44,1,5,1,0,0,0,1
1,29,1,0,0,1,0,0,0,0,1,1,39,1,2,1,1,0,1,1
2,30,0,0,0,1,0,0,0,0,1,4,49,0,4,1,0,0,1,1
3,31,1,0,0,1,0,0,0,0,1,1,27,1,1,1,1,0,1,1
4,34,1,0,0,1,0,0,1,0,0,0,32,0,5,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
816,3710,1,0,0,0,1,0,0,0,0,2,63,1,4,1,1,0,1,1
817,3714,0,1,0,0,0,0,0,0,0,1,62,1,3,1,0,0,0,1
818,3730,0,0,1,0,0,0,1,0,1,2,57,1,2,1,0,0,0,1
819,3733,1,0,0,1,0,0,0,0,0,2,34,0,3,1,1,0,1,0


We now build a slightly larger network. Each model below takes five scalar inputs, concatenates them, and passes them through a hidden layer of 5 sigmoid units followed by a single ReLU output unit.

#### Age Job Type Model

In [592]:
w1_input1 = tf.keras.layers.Input(shape=(1,))
w1_input2 = tf.keras.layers.Input(shape=(1,))
w1_input3 = tf.keras.layers.Input(shape=(1,))
w1_input4 = tf.keras.layers.Input(shape=(1,))
w1_input5 = tf.keras.layers.Input(shape=(1,))

merged = tf.keras.layers.Concatenate(axis=1)([w1_input1, w1_input2, w1_input3, w1_input4, w1_input5])
dense1 = tf.keras.layers.Dense(5, activation=tf.keras.activations.sigmoid, use_bias=True)(merged)
output = tf.keras.layers.Dense(1, activation=tf.keras.activations.relu, use_bias=True)(dense1)
age_job_category_model = tf.keras.models.Model([w1_input1, w1_input2, w1_input3, w1_input4, w1_input5], output)
age_job_category_model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])

age_job_category_model.summary()

Model: "model_68"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_332 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_333 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_334 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_335 (InputLayer)         [(None, 1)]          0           []                               
                                                                                           

In [593]:
# Train data
w1_input1 = train.loc[:,['age']]
w1_input2 = train.loc[:,['Job_Clerical or administrative support']]
w1_input3 = train.loc[:,['Job_Manufacturing, construction, maintenance, or farming']]
w1_input4 = train.loc[:,['Job_Professional, managerial, or technical']]
w1_input5 = train.loc[:,['Job_Sales or service']]

w1_input1 = np.asarray(w1_input1).astype('float32')
w1_input2 = np.asarray(w1_input2).astype('float32')
w1_input3 = np.asarray(w1_input3).astype('float32')
w1_input4 = np.asarray(w1_input4).astype('float32')
w1_input5 = np.asarray(w1_input5).astype('float32')

w1_y = train.loc[:,['WFH_PRE']]
w1_y = np.asarray(w1_y).astype('float32')

# Validation data
v_w1_input1 = validation.loc[:,['age']]
v_w1_input2 = validation.loc[:,['Job_Clerical or administrative support']]
v_w1_input3 = validation.loc[:,['Job_Manufacturing, construction, maintenance, or farming']]
v_w1_input4 = validation.loc[:,['Job_Professional, managerial, or technical']]
v_w1_input5 = validation.loc[:,['Job_Sales or service']]

v_w1_input1 = np.asarray(v_w1_input1).astype('float32')
v_w1_input2 = np.asarray(v_w1_input2).astype('float32')
v_w1_input3 = np.asarray(v_w1_input3).astype('float32')
v_w1_input4 = np.asarray(v_w1_input4).astype('float32')
v_w1_input5 = np.asarray(v_w1_input5).astype('float32')

v_w1_y = validation.loc[:,['WFH_PRE']]
v_w1_y = np.asarray(v_w1_y).astype('float32')

age_job_category_model.fit([w1_input1, w1_input2, w1_input3, w1_input4, w1_input5],w1_y, 
                        batch_size=36, epochs=100, validation_data=([v_w1_input1, v_w1_input2, v_w1_input3, v_w1_input4, v_w1_input5], v_w1_y))


Epoch 1/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x2e662bf10>
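
For reference, the architecture defined above (five scalar inputs concatenated, a 5-unit sigmoid layer, then one ReLU unit) can be written out by hand. The NumPy sketch below is illustrative only: the weights are random placeholders rather than the trained values, the respondents are hypothetical, and `forward` is a made-up function name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(inputs, w1, b1, w2, b2):
    """Concatenate -> Dense(5, sigmoid) -> Dense(1, relu)."""
    merged = np.concatenate(inputs, axis=1)   # shape (n, 5)
    hidden = sigmoid(merged @ w1 + b1)        # shape (n, 5)
    return np.maximum(hidden @ w2 + b2, 0.0)  # shape (n, 1), never negative

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(5, 5)), np.zeros(5)  # placeholder weights
w2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

# Three hypothetical respondents: age plus four one-hot job columns.
age = np.array([[44.0], [39.0], [27.0]])
jobs = [np.array([[0.0], [0.0], [1.0]]), np.array([[0.0], [0.0], [0.0]]),
        np.array([[0.0], [1.0], [0.0]]), np.array([[1.0], [0.0], [0.0]])]
out = forward([age] + jobs, w1, b1, w2, b2)
print(out.shape)

# Parameter count: (5*5 + 5) for the hidden layer plus (5*1 + 1) for the output.
print(5 * 5 + 5 + 5 * 1 + 1)
```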

In [594]:
# Test data
t_w1_input1 = test.loc[:,['age']]
t_w1_input2 = test.loc[:,['Job_Clerical or administrative support']]
t_w1_input3 = test.loc[:,['Job_Manufacturing, construction, maintenance, or farming']]
t_w1_input4 = test.loc[:,['Job_Professional, managerial, or technical']]
t_w1_input5 = test.loc[:,['Job_Sales or service']]

t_w1_input1 = np.asarray(t_w1_input1).astype('float32')
t_w1_input2 = np.asarray(t_w1_input2).astype('float32')
t_w1_input3 = np.asarray(t_w1_input3).astype('float32')
t_w1_input4 = np.asarray(t_w1_input4).astype('float32')
t_w1_input5 = np.asarray(t_w1_input5).astype('float32')

In [595]:
# prediction
age_job_categroy_wave1 = age_job_category_model.predict([t_w1_input1, t_w1_input2, t_w1_input3, t_w1_input4, t_w1_input5])
# age_job_category_model.predict([t_w1_input1, t_w1_input2, t_w1_input3, t_w1_input4, t_w1_input5])
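
The training, validation, and test cells repeat the same steps: select a column, convert it to `float32`, and keep it as an `(n, 1)` array. One way to reduce the repetition is a small helper. `to_inputs` is a hypothetical name, and the tiny DataFrame below stands in for `train`, `validation`, or `test`.

```python
import numpy as np
import pandas as pd

def to_inputs(frame, columns):
    """Return one float32 array of shape (n, 1) per requested column."""
    return [frame[[column]].to_numpy(dtype='float32') for column in columns]

feature_columns = [
    'age',
    'Job_Clerical or administrative support',
    'Job_Manufacturing, construction, maintenance, or farming',
    'Job_Professional, managerial, or technical',
    'Job_Sales or service',
]

# Stand-in data with the same column names as the Wave 1 files.
demo = pd.DataFrame([[44, 0, 0, 0, 0], [39, 0, 0, 1, 0]], columns=feature_columns)
inputs = to_inputs(demo, feature_columns)
print(len(inputs), inputs[0].shape, inputs[0].dtype)
```

The `fit` call could then take `to_inputs(train, feature_columns)` and `to_inputs(validation, feature_columns)` in place of the individually named arrays.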



#### Degree or Pro Environment Model

In [596]:
input1 = tf.keras.layers.Input(shape=(1,))
input2 = tf.keras.layers.Input(shape=(1,))
input3 = tf.keras.layers.Input(shape=(1,))
input4 = tf.keras.layers.Input(shape=(1,))
input5 = tf.keras.layers.Input(shape=(1,))

merged = tf.keras.layers.Concatenate(axis=1)([input1, input2, input3, input4, input5])
dense1 = tf.keras.layers.Dense(5, activation=tf.keras.activations.sigmoid, use_bias=True)(merged)
output = tf.keras.layers.Dense(1, activation=tf.keras.activations.relu, use_bias=True)(dense1)
degree_environment_model = tf.keras.models.Model([input1, input2, input3, input4, input5], output)
degree_environment_model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])
degree_environment_model.summary()

Model: "model_69"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_337 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_338 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_339 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_340 (InputLayer)         [(None, 1)]          0           []                               
                                                                                           

In [597]:
# Training Data
input1 = train.loc[:,['age']]
input2 = train.loc[:,['Workload_increased']]
input3 = train.loc[:,['Workload_decreased']]
input4 = train.loc[:,['Gradutae_degree']]
input5 = train.loc[:,['ProEnvironment']]

input1 = np.asarray(input1).astype('float32')
input2 = np.asarray(input2).astype('float32')
input3 = np.asarray(input3).astype('float32')
input4 = np.asarray(input4).astype('float32')
input5 = np.asarray(input5).astype('float32')

y = train.loc[:,['WFH_PRE']]
y = np.asarray(y).astype('float32')

# Validation Data
v_input1 = validation.loc[:,['age']]
v_input2 = validation.loc[:,['Workload_increased']]
v_input3 = validation.loc[:,['Workload_decreased']]
v_input4 = validation.loc[:,['Gradutae_degree']]
v_input5 = validation.loc[:,['ProEnvironment']]

v_input1 = np.asarray(v_input1).astype('float32')
v_input2 = np.asarray(v_input2).astype('float32')
v_input3 = np.asarray(v_input3).astype('float32')
v_input4 = np.asarray(v_input4).astype('float32')
v_input5 = np.asarray(v_input5).astype('float32')

v_y = validation.loc[:,['WFH_PRE']]
v_y = np.asarray(v_y).astype('float32')

degree_environment_model.fit([input1, input2, input3, input4, input5],y, batch_size=36, epochs=100, 
                        validation_data=([v_input1, v_input2, v_input3, v_input4, v_input5], v_y))

Epoch 1/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x2d13ace80>

In [598]:
# Test data
t2_input1 = test.loc[:,['age']]
t2_input2 = test.loc[:,['Workload_increased']]
t2_input3 = test.loc[:,['Workload_decreased']]
t2_input4 = test.loc[:,['Gradutae_degree']]
t2_input5 = test.loc[:,['ProEnvironment']]

t2_input1 = np.asarray(t2_input1).astype('float32')
t2_input2 = np.asarray(t2_input2).astype('float32')
t2_input3 = np.asarray(t2_input3).astype('float32')
t2_input4 = np.asarray(t2_input4).astype('float32')
t2_input5 = np.asarray(t2_input5).astype('float32')

In [599]:
# prediction
degree_environment_wave1 = degree_environment_model.predict([t2_input1, t2_input2, t2_input3, t2_input4, t2_input5])



### Wave 2 Train

In [613]:
wave2 = pd.read_csv("Wave2_train.csv")      # All Data

WFH_EXPECT = {
  'No' : 0,
  'Yes' : 1,
}

# Map the text answers to 0/1; any other answer is coded as -1.
wave2['wfh_expect'] = wave2['wfh_expect'].map(WFH_EXPECT).fillna(-1).astype(int)


train = wave2.iloc[:684,:]                  # Training Data
validation = wave2.iloc[684:,:]             # Validation Data

test = pd.read_csv("Wave2_test.csv")        # Test Data
train.columns = train.columns.to_series().apply(lambda x: x.strip())
train.shape, validation.shape
train

Unnamed: 0,resp_id,wfh_expect,Job_Clerical or administrative support,"Job_Manufacturing, construction, maintenance, or farming","Job_Professional, managerial, or technical",Job_Sales or service,Workload_increased,Workload_decreased,Increased_productivity,Decreased_productivity,hhveh_harm,age,gender,Number_bedrooms,Race_white,Gradutae_degree,High_income(LessThan_100K),More_income(LMoreThan_35K)
0,1177,1,0,0,1,0,0,0,1,0,2,58,1,3,0,1,0,1
1,1188,1,0,0,1,0,0,0,1,0,1,66,0,2,1,0,0,0
2,1189,1,0,0,0,0,0,0,0,0,0,24,1,3,1,0,1,0
3,1194,1,0,0,1,0,0,0,0,0,1,31,1,3,1,1,0,1
4,1202,1,0,0,0,0,0,0,0,0,0,23,0,2,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
679,4593,0,0,0,0,1,0,1,0,1,4,59,0,3,1,0,0,1
680,4601,1,0,0,1,0,0,0,0,0,2,56,1,3,1,0,0,0
681,4613,1,0,0,0,1,1,0,1,0,2,66,0,4,1,1,0,1
682,4618,1,0,0,1,0,0,1,0,1,1,73,0,3,1,1,0,0


For Wave 2 we build the same kind of network: the scalar inputs are concatenated and passed through a sigmoid hidden layer followed by a single ReLU output unit.

#### Age Job Type Model

In [614]:
input1 = tf.keras.layers.Input(shape=(1,))
input2 = tf.keras.layers.Input(shape=(1,))
input3 = tf.keras.layers.Input(shape=(1,))
input4 = tf.keras.layers.Input(shape=(1,))
input5 = tf.keras.layers.Input(shape=(1,))


merged = tf.keras.layers.Concatenate(axis=1)([input1, input2, input3, input4, input5])
dense1 = tf.keras.layers.Dense(5, activation=tf.keras.activations.sigmoid, use_bias=True)(merged)
output = tf.keras.layers.Dense(1, activation=tf.keras.activations.relu, use_bias=True)(dense1)
age_job_category_model_wave2 = tf.keras.models.Model([input1, input2, input3, input4, input5], output)
age_job_category_model_wave2.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])
age_job_category_model_wave2.summary()

Model: "model_72"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_351 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_352 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_353 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_354 (InputLayer)         [(None, 1)]          0           []                               
                                                                                           

In [615]:
# Train data
input1 = train.loc[:,['age']]
input2 = train.loc[:,['Job_Clerical or administrative support']]
input3 = train.loc[:,['Job_Manufacturing, construction, maintenance, or farming']]
input4 = train.loc[:,['Job_Professional, managerial, or technical']]
input5 = train.loc[:,['Job_Sales or service']]

input1 = np.asarray(input1).astype('float32')
input2 = np.asarray(input2).astype('float32')
input3 = np.asarray(input3).astype('float32')
input4 = np.asarray(input4).astype('float32')
input5 = np.asarray(input5).astype('float32')

y = train.loc[:,['wfh_expect']]
y = np.asarray(y).astype('float32')

# Validation data
v_input1 = validation.loc[:,['age']]
v_input2 = validation.loc[:,['Job_Clerical or administrative support']]
v_input3 = validation.loc[:,['Job_Manufacturing, construction, maintenance, or farming']]
v_input4 = validation.loc[:,['Job_Professional, managerial, or technical']]
v_input5 = validation.loc[:,['Job_Sales or service']]

v_input1 = np.asarray(v_input1).astype('float32')
v_input2 = np.asarray(v_input2).astype('float32')
v_input3 = np.asarray(v_input3).astype('float32')
v_input4 = np.asarray(v_input4).astype('float32')
v_input5 = np.asarray(v_input5).astype('float32')

v_y = validation.loc[:,['wfh_expect']]
v_y = np.asarray(v_y).astype('float32')

age_job_category_model_wave2.fit([input1, input2, input3, input4, input5],y, 
                        batch_size=36, epochs=100, validation_data=([v_input1, v_input2, v_input3, v_input4, v_input5], v_y))


Epoch 1/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x2dd1b0dc0>

In [616]:
# Test data
t_input1 = test.loc[:,['age']]
t_input2 = test.loc[:,['Job_Clerical or administrative support']]
t_input3 = test.loc[:,['Job_Manufacturing, construction, maintenance, or farming']]
t_input4 = test.loc[:,['Job_Professional, managerial, or technical']]
t_input5 = test.loc[:,['Job_Sales or service']]

t_input1 = np.asarray(t_input1).astype('float32')
t_input2 = np.asarray(t_input2).astype('float32')
t_input3 = np.asarray(t_input3).astype('float32')
t_input4 = np.asarray(t_input4).astype('float32')
t_input5 = np.asarray(t_input5).astype('float32')

In [617]:
# prediction
age_job_categroy_wave2 = age_job_category_model_wave2.predict([t_input1, t_input2, t_input3, t_input4, t_input5])



#### Degree or Pro Environment Model

In [618]:
input1 = tf.keras.layers.Input(shape=(1,))
input2 = tf.keras.layers.Input(shape=(1,))
input3 = tf.keras.layers.Input(shape=(1,))
input4 = tf.keras.layers.Input(shape=(1,))

merged = tf.keras.layers.Concatenate(axis=1)([input1, input2, input3, input4])
dense1 = tf.keras.layers.Dense(4, activation=tf.keras.activations.sigmoid, use_bias=True)(merged)
output = tf.keras.layers.Dense(1, activation=tf.keras.activations.relu, use_bias=True)(dense1)
degree_environment_model_wave2 = tf.keras.models.Model([input1, input2, input3, input4], output)
degree_environment_model_wave2.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])
degree_environment_model_wave2.summary()

Model: "model_73"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_356 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_357 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_358 (InputLayer)         [(None, 1)]          0           []                               
                                                                                                  
 input_359 (InputLayer)         [(None, 1)]          0           []                               
                                                                                           

In [619]:
# Training Data
input1 = train.loc[:,['age']]
input2 = train.loc[:,['Workload_increased']]
input3 = train.loc[:,['Workload_decreased']]
input4 = train.loc[:,['Gradutae_degree']]

input1 = np.asarray(input1).astype('float32')
input2 = np.asarray(input2).astype('float32')
input3 = np.asarray(input3).astype('float32')
input4 = np.asarray(input4).astype('float32')

y = train.loc[:,['wfh_expect']]
y = np.asarray(y).astype('float32')

# Validation Data
v_input1 = validation.loc[:,['age']]
v_input2 = validation.loc[:,['Workload_increased']]
v_input3 = validation.loc[:,['Workload_decreased']]
v_input4 = validation.loc[:,['Gradutae_degree']]

v_input1 = np.asarray(v_input1).astype('float32')
v_input2 = np.asarray(v_input2).astype('float32')
v_input3 = np.asarray(v_input3).astype('float32')
v_input4 = np.asarray(v_input4).astype('float32')

v_y = validation.loc[:,['wfh_expect']]
v_y = np.asarray(v_y).astype('float32')

degree_environment_model_wave2.fit([input1, input2, input3, input4],y, batch_size=36, epochs=100, 
                        validation_data=([v_input1, v_input2, v_input3, v_input4], v_y))

Epoch 1/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x2e826e250>

In [620]:
# Test data
t_input1 = test.loc[:,['age']]
t_input2 = test.loc[:,['Workload_increased']]
t_input3 = test.loc[:,['Workload_decreased']]
t_input4 = test.loc[:,['Gradutae_degree']]

t_input1 = np.asarray(t_input1).astype('float32')
t_input2 = np.asarray(t_input2).astype('float32')
t_input3 = np.asarray(t_input3).astype('float32')
t_input4 = np.asarray(t_input4).astype('float32')

In [621]:
# prediction
degree_environment_wave2 = degree_environment_model_wave2.predict([t_input1, t_input2, t_input3, t_input4])



### Plots

Here are the NumPy arrays holding the predictions for both waves:

In [622]:
age_job_categroy_wave1

array([[0.49045208],
       [0.4901969 ],
       [0.4909247 ],
       [0.486573  ],
       [0.4887146 ],
       [0.4901969 ],
       [0.4846168 ],
       [0.4910452 ],
       [0.49113742],
       [0.49107912],
       [0.49103054],
       [0.49107912],
       [0.4909601 ],
       [0.4911473 ],
       [0.49009648],
       [0.4909247 ],
       [0.4857212 ],
       [0.48935583],
       [0.4911591 ],
       [0.49113742],
       [0.49103054],
       [0.48950985],
       [0.4909247 ],
       [0.49089178],
       [0.49109134],
       [0.49110195],
       [0.4887146 ],
       [0.48837855],
       [0.49004564],
       [0.49100932],
       [0.48776665],
       [0.4901969 ],
       [0.4910651 ],
       [0.4911112 ],
       [0.4900566 ],
       [0.4909247 ],
       [0.49115488],
       [0.49088785],
       [0.48862258],
       [0.49060574],
       [0.49074176],
       [0.4909849 ],
       [0.48989704],
       [0.49100932],
       [0.49111918],
       [0.48800066],
       [0.49042812],
       [0.491

In [623]:
degree_environment_wave1

array([[0.4874366 ],
       [0.4870615 ],
       [0.4877552 ],
       [0.4798993 ],
       [0.48575792],
       [0.4870615 ],
       [0.47700918],
       [0.48777547],
       [0.48781624],
       [0.48776588],
       [0.48778644],
       [0.48776588],
       [0.48773542],
       [0.48781833],
       [0.487127  ],
       [0.4877077 ],
       [0.4846481 ],
       [0.48660582],
       [0.48782   ],
       [0.48780903],
       [0.4877505 ],
       [0.48615637],
       [0.48772153],
       [0.48549968],
       [0.48780796],
       [0.48780385],
       [0.48571512],
       [0.48605603],
       [0.48676008],
       [0.48776403],
       [0.4848277 ],
       [0.4870615 ],
       [0.4878026 ],
       [0.4878122 ],
       [0.48644176],
       [0.4877552 ],
       [0.48782107],
       [0.4877404 ],
       [0.4823479 ],
       [0.4874755 ],
       [0.48766166],
       [0.4877912 ],
       [0.48669627],
       [0.48776403],
       [0.48781124],
       [0.48339912],
       [0.48730853],
       [0.487

In [624]:
age_job_categroy_wave2

array([[0.5052856 ],
       [0.5052857 ],
       [0.50528556],
       [0.5052927 ],
       [0.50528693],
       [0.5052857 ],
       [0.5053029 ],
       [0.50528556],
       [0.50530106],
       [0.50528556],
       [0.5052892 ],
       [0.50528556],
       [0.50528556],
       [0.50528556],
       [0.50528556],
       [0.50528556],
       [0.5052857 ],
       [0.50528556],
       [0.50530106],
       [0.50528616],
       [0.50528556],
       [0.50528556],
       [0.50528556],
       [0.50528556],
       [0.50528604],
       [0.5052957 ],
       [0.50528556],
       [0.50528556],
       [0.50528693],
       [0.5052875 ],
       [0.50528663],
       [0.50528556],
       [0.5052892 ],
       [0.5052857 ],
       [0.50528556],
       [0.50528556],
       [0.50528574],
       [0.50528556],
       [0.50528556],
       [0.50528556],
       [0.50530434],
       [0.5052856 ],
       [0.50528556],
       [0.5052858 ],
       [0.50528556],
       [0.50528556],
       [0.5052883 ],
       [0.505

In [625]:
degree_environment_wave2

array([[0.5009407 ],
       [0.49900123],
       [0.50068325],
       [0.49242285],
       [0.50065035],
       [0.49900123],
       [0.5020136 ],
       [0.50068325],
       [0.49305603],
       [0.50152445],
       [0.49566665],
       [0.5014489 ],
       [0.5013257 ],
       [0.5014489 ],
       [0.50088334],
       [0.5014166 ],
       [0.5007222 ],
       [0.5011673 ],
       [0.502093  ],
       [0.5002492 ],
       [0.50158566],
       [0.5015589 ],
       [0.50154346],
       [0.50137955],
       [0.4975479 ],
       [0.4950705 ],
       [0.5012981 ],
       [0.5014513 ],
       [0.49601772],
       [0.50084764],
       [0.49720916],
       [0.5010379 ],
       [0.49601772],
       [0.49900123],
       [0.50115496],
       [0.5014694 ],
       [0.5004518 ],
       [0.50068325],
       [0.501533  ],
       [0.5005686 ],
       [0.50282896],
       [0.49995723],
       [0.50092405],
       [0.49834815],
       [0.5010379 ],
       [0.5013947 ],
       [0.49474463],
       [0.499
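
The arrays above hold continuous scores, while the targets (`WFH_PRE`, `wfh_expect`) are coded 0 or 1. One common way to compare them is to threshold the scores. The sketch below assumes a cutoff of 0.5 and uses made-up labels, and `binary_accuracy` is a hypothetical helper rather than the Keras metric.

```python
import numpy as np

def binary_accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score equals the 0/1 label."""
    predicted = (np.asarray(scores) >= threshold).astype(int)
    return float(np.mean(predicted == np.asarray(labels)))

# First scores from two of the arrays above; labels are invented for the demo.
scores = np.array([[0.49045208], [0.4901969], [0.5052856], [0.5052857]])
labels = np.array([[0], [1], [1], [1]])
print(binary_accuracy(scores, labels))
```

Scores that all fall on one side of the cutoff, as in the visible Wave 1 values, would give every respondent the same predicted class.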