# Setup of Notebook

The following code clones the GitHub repository with the course files.
It then imports all libraries and custom modules needed for this notebook.

In [2]:
!git clone https://github.com/DataHow/analytics-course-scripts.git

Cloning into 'analytics-course-scripts'...
remote: Enumerating objects: 454, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 454 (delta 5), reused 11 (delta 3), pack-reused 440
Receiving objects: 100% (454/454), 14.54 MiB | 21.64 MiB/s, done.
Resolving deltas: 100% (276/276), done.


In [3]:
import pandas as pd
import numpy as np
import scipy
import copy
import importlib  
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings("ignore")

# import custom modules
simulator = importlib.import_module("analytics-course-scripts.scripts.modules.simulator")
modelhelpers = importlib.import_module("analytics-course-scripts.scripts.modules.modelhelpers")
plothelpers = importlib.import_module("analytics-course-scripts.scripts.modules.plothelpers")

## What is a learnable task?

Informally, a task $\mathcal{T}$ is: dataset $\mathcal{D}$, loss function $\mathcal{L}$ $\;\Longrightarrow\;$ model $f_\theta$

with the learning goal $\min_{\theta} \mathcal{L}(\theta, \mathcal{D})$.
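As a minimal illustration of this definition (a toy, not part of the course material), the sketch below takes $\mathcal{D}$ to be 50 synthetic points scattered around $y = 2x + 1$, $\mathcal{L}$ to be the mean squared error, and $f_\theta(x) = \theta_0 + \theta_1 x$. Plain gradient descent is an assumed choice of optimizer for approximating $\min_\theta \mathcal{L}(\theta, \mathcal{D})$.

```python
import numpy as np

# Dataset D: synthetic (x, y) pairs generated from y = 2x + 1 plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + 1.0 + 0.05 * rng.normal(size=50)

def loss(theta, x, y):
    """Mean squared error L(theta, D) of the model f_theta(x) = theta[0] + theta[1] * x."""
    residual = theta[0] + theta[1] * x - y
    return np.mean(residual ** 2)

def grad(theta, x, y):
    """Gradient of the mean squared error with respect to theta."""
    residual = theta[0] + theta[1] * x - y
    return np.array([2.0 * np.mean(residual), 2.0 * np.mean(residual * x)])

# Learning goal: min_theta L(theta, D), approximated by gradient descent
theta = np.zeros(2)
for _ in range(2000):
    theta -= 0.1 * grad(theta, x, y)

print(theta)  # close to [1, 2]
```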

Different tasks can vary based on:
* different objects
* different people
* different objectives
* different conditions
* different ... molecules, processes, products, projects

*The bad news*: Different tasks need to share some structure. If this doesn't hold, you're better off using single-task learning.

*The good news*: Many tasks do share structure, for example the laws of physics that underlie real data, or the behaviour and intentions of people and organisms.


## Problem Definitions

**The multi-task learning problem:** Learn *a set of tasks* more quickly or more proficiently than learning them independently.

**The transfer learning problem:** Given data on previous task(s), learn *a new task* more quickly and/or more proficiently.


But doesn't multi-task learning reduce to single-task learning?

$$\mathcal{D} = \bigcup_i \mathcal{D}_i \quad \text{and} \quad \mathcal{L} = \sum_i \mathcal{L}_i$$


... Yes, it can!

Aggregating the data across tasks & learning a single model is one approach to multi-task learning.
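A quick numerical check of this reduction, using two hypothetical synthetic tasks and a sum-of-squared-errors loss (an assumption; with a mean-reduced loss the pooled objective would instead weight each task by its share of the samples): the loss on the pooled dataset equals the sum of the per-task losses for any parameters, so fitting one model to the aggregated data optimizes $\sum_i \mathcal{L}_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two hypothetical tasks that share the same input-output relationship
tasks = []
for n in (30, 20):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
    tasks.append((X, y))

def task_loss(theta, X, y):
    """Sum of squared errors L_i of a linear model on one task."""
    return np.sum((X @ theta - y) ** 2)

# Aggregate: D = union of the D_i (stack the rows of all tasks)
X_all = np.vstack([X for X, _ in tasks])
y_all = np.concatenate([y for _, y in tasks])

# A single least-squares model fitted to the pooled data
theta_pooled, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

# Pooled loss and sum of per-task losses coincide
total = sum(task_loss(theta_pooled, X, y) for X, y in tasks)
print(np.isclose(total, task_loss(theta_pooled, X_all, y_all)))  # True
```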

But, ...

What if you want to learn new tasks?

How do we tell the model what tasks to do?

What if aggregating doesn't work?

## How to distinguish between tasks?

Define a **task descriptor $z_i$**

So instead of $f_\theta(y|x) $, we learn $f_\theta(y|x,z_i) $

The task descriptor **can be a one-hot encoding** of the task index, or whatever meta-data you have about the task: features or attributes (for personalization), a description, or a formal specification.
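A minimal sketch of the one-hot option, assuming the task index is the only meta-data available: $z_i$ is appended to every input row, so a model receives $(x, z_i)$ instead of $x$. The helper `add_task_descriptor` is a hypothetical name used only for this illustration.

```python
import numpy as np

def add_task_descriptor(X, task_index, n_tasks):
    """Append a one-hot encoding of task_index to every row of X, giving inputs (x, z_i)."""
    z = np.zeros((X.shape[0], n_tasks))
    z[:, task_index] = 1.0
    return np.hstack([X, z])

X_a = np.ones((2, 3))        # two samples from task 0
X_b = np.full((3, 3), 2.0)   # three samples from task 1
X_both = np.vstack([add_task_descriptor(X_a, 0, 2), add_task_descriptor(X_b, 1, 2)])
print(X_both[:, -2:])
# [[1. 0.]
#  [1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [0. 1.]]
```

With only two tasks, a single 0/1 indicator column carries the same information as the two one-hot columns; the `Task:B` column used later in this notebook is that reduced form.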

Remaining decisions on:
* **the model** - how should the model be conditioned on $z_i$? Which parameters of the model should be shared?

Choosing how to condition on $z_i$ is equivalent to choosing how and where to share parameters. For example, multiplicative gating with a one-hot $z_i$ amounts to training independent networks, with no shared parameters. Concatenation shares all parameters except those directly following $z_i$.
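To make the contrast concrete, here is a NumPy sketch of a single linear layer under both schemes; the layer sizes, random weights and function names are illustrative assumptions. With concatenation, the weights that multiply $x$ are shared by every task and the tasks differ only through the weights that multiply $z_i$; with multiplicative gating by a one-hot $z_i$, each task selects its own weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_tasks = 4, 3, 2
x = rng.normal(size=d_in)

def one_hot(i, n):
    """One-hot task descriptor z_i of length n."""
    z = np.zeros(n)
    z[i] = 1.0
    return z

# Concatenation: one weight matrix acts on [x, z]; its x-part W_concat[:, :d_in] is shared
W_concat = rng.normal(size=(d_out, d_in + n_tasks))

def concat_layer(x, z):
    return W_concat @ np.concatenate([x, z])

# Multiplicative gating: one weight matrix per task, selected by the one-hot z
W_tasks = rng.normal(size=(n_tasks, d_out, d_in))

def gated_layer(x, z):
    # sum_j z_j * (W_j @ x); with a one-hot z only the active task's W_j contributes
    return np.einsum("t,toi,i->o", z, W_tasks, x)

z0, z1 = one_hot(0, n_tasks), one_hot(1, n_tasks)

# Concatenation: the two tasks differ only through the columns of W_concat that multiply z
only_z_columns_differ = np.allclose(concat_layer(x, z1) - concat_layer(x, z0),
                                    W_concat[:, d_in + 1] - W_concat[:, d_in])
# Gating: task 0 uses W_tasks[0] alone, so it shares no weights with task 1
task0_only = np.allclose(gated_layer(x, z0), W_tasks[0] @ x)
print(only_z_columns_differ, task0_only)  # True True
```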

* **the objective** - how should the objective be formed?

Weight all tasks equally, or weight them differently, chosen manually based on importance or priority. The weights can also be adjusted dynamically during training with various heuristics, for example encouraging the per-task gradients of a neural network to have similar magnitudes. Alternatively, optimize the worst-case task loss.
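A sketch of these objectives, assuming the per-task losses $\mathcal{L}_i$ have already been computed (the values and weights below are made up): the weighted sum $\sum_i w_i \mathcal{L}_i$, with $w_i = 1$ for equal weighting, and the worst-case objective $\max_i \mathcal{L}_i$.

```python
import numpy as np

task_losses = np.array([0.5, 0.25, 1.5])   # hypothetical per-task losses L_i

def weighted_objective(losses, weights=None):
    """Sum_i w_i * L_i; equal weighting (w_i = 1) when no weights are given."""
    if weights is None:
        weights = np.ones_like(losses)
    return float(np.dot(weights, losses))

def worst_case_objective(losses):
    """max_i L_i: minimizing this always targets the currently worst task."""
    return float(np.max(losses))

print(weighted_objective(task_losses))                    # 2.25
print(weighted_objective(task_losses, [2.0, 1.0, 0.5]))   # 2.0
print(worst_case_objective(task_losses))                  # 1.5
```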

* **the optimization** - how should the objective be optimized?

Ensure that the tasks are sampled roughly uniformly, regardless of data quantities. For regression make sure that *task labels are on the same scale!*
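A minimal sketch of both points, assuming each task's data is held as a pair of NumPy arrays; the task sizes, label scales and function names are hypothetical. Labels are standardized per task so their losses are comparable, and each mini-batch draws the same number of samples from every task (with replacement), so the 10-sample task is not swamped by the 500-sample one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical tasks with very different sizes and label scales
tasks = {
    "A": (rng.normal(size=(500, 3)), 1000.0 + 50.0 * rng.normal(size=500)),
    "B": (rng.normal(size=(10, 3)), 2.0 + 0.1 * rng.normal(size=10)),
}

def standardize_labels(tasks):
    """Rescale each task's labels to zero mean and unit variance; keep the stats to undo it."""
    scaled, stats = {}, {}
    for name, (X, y) in tasks.items():
        mu, sigma = y.mean(), y.std()
        scaled[name] = (X, (y - mu) / sigma)
        stats[name] = (mu, sigma)
    return scaled, stats

def sample_batch(tasks, per_task, rng):
    """Draw per_task samples (with replacement) from every task, regardless of its size."""
    batch = []
    for name, (X, y) in tasks.items():
        idx = rng.integers(0, len(y), size=per_task)
        batch.append((name, X[idx], y[idx]))
    return batch

scaled, stats = standardize_labels(tasks)
batch = sample_batch(scaled, per_task=8, rng=rng)
for name, Xb, yb in batch:
    print(name, Xb.shape, yb.shape)
```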


## What is transfer learning?

Solve a target task $\mathcal{T}_b$ after solving source task(s) $\mathcal{T}_a$, by transferring knowledge learned from $\mathcal{T}_a$.

Transfer learning is a valid solution to multi-task learning, but not vice versa.

What are some problems/applications where transfer learning might make sense?

* When $\mathcal{D}_a$ is very large
* When you don't care about solving $\mathcal{T}_a$ and $\mathcal{T}_b$ simultaneously
* When a pretrained model is available, so the new task can be learned by fine-tuning it
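Fine-tuning can be illustrated with a toy linear-regression version: fit parameters on a large source dataset $\mathcal{D}_a$, then use them as the starting point for a few gradient steps on a small target dataset $\mathcal{D}_b$, and compare with the same number of steps started from zero. The synthetic tasks (related by a small shift in the true coefficients), the step size and the step counts are assumptions chosen for illustration; the benefit relies on the tasks being related.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(coef, n):
    """Synthetic linear-regression task with true coefficients coef."""
    X = rng.normal(size=(n, len(coef)))
    return X, X @ coef + 0.1 * rng.normal(size=n)

def fit_gd(X, y, theta0, steps, lr=0.05):
    """Gradient descent on the mean squared error, starting from theta0."""
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * 2.0 * X.T @ (X @ theta - y) / len(y)
    return theta

coef_a = np.array([1.0, -2.0, 0.5, 3.0])
coef_b = coef_a + np.array([0.2, 0.0, -0.1, 0.0])   # related target task
X_a, y_a = make_task(coef_a, 1000)                   # large source dataset
X_b, y_b = make_task(coef_b, 5)                      # small target dataset
X_test, y_test = make_task(coef_b, 500)

theta_a = fit_gd(X_a, y_a, np.zeros(4), steps=500)   # pre-train on the source task
scratch = fit_gd(X_b, y_b, np.zeros(4), steps=20)    # target task from scratch
finetuned = fit_gd(X_b, y_b, theta_a, steps=20)      # target task, warm-started

def mse(theta):
    """Test mean squared error on the target task."""
    return np.mean((X_test @ theta - y_test) ** 2)

print(f"from scratch: {mse(scratch):.3f}, fine-tuned: {mse(finetuned):.3f}")
```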



# Import or Generate Dataset

Instead of importing all data, this notebook reuses a previously generated dataset as Task A and generates new training and test datasets for Task B.
 

In [33]:
""" Use previously generated data as a "Task A" from which we want to transfer knowledge"""
# Define filename to import
FILENAME = "owu.csv"
FILEPATH = "/content/analytics-course-scripts/scripts/datasets/"
# Import OWU data
A_doe = pd.read_csv(FILEPATH+FILENAME.replace(".csv","_doe.csv"),index_col=None, usecols =["feed_start","feed_end","Glc_feed_rate","Glc_0","VCD_0"])
A_owu = pd.read_csv(FILEPATH+FILENAME,index_col=None, usecols = ["X:VCD", "X:Glc", "X:Lac", "X:Titer","W:Feed"])
A_owu.index = pd.MultiIndex.from_product([list(range(int(len(A_owu)/15))),list(range(15))], names=["run","time"])
A_bwu = simulator.generate_bwu(A_owu)
A_tar = simulator.generate_y(A_bwu, return_aggr=True)
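`simulator.generate_bwu` comes from the course repository and its implementation is not shown here. As a hedged sketch of what batch-wise unfolding (BWU) typically does, the hypothetical `unfold_owu` below pivots an observation-wise table indexed by `(run, time)` into one row per run, with one column per variable and time point, named `variable:time` as in the column lists printed later in this notebook. The toy values are made up.

```python
import numpy as np
import pandas as pd

def unfold_owu(owu):
    """Batch-wise unfolding: one row per run, one column per (variable, time) pair named 'var:time'."""
    wide = owu.unstack(level="time")   # columns become a (variable, time) MultiIndex
    wide.columns = [f"{var}:{t}" for var, t in wide.columns]
    return wide

# Toy observation-wise data: 2 runs x 3 time points, 2 variables
index = pd.MultiIndex.from_product([range(2), range(3)], names=["run", "time"])
owu = pd.DataFrame({"X:VCD": np.arange(6.0), "X:Glc": np.arange(6.0) * 10}, index=index)
bwu = unfold_owu(owu)
print(bwu)
```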

In [85]:
""" Generate new data as a "Task B" which we want to solve """
# DOE Dataset definition (variable = [lower bound, upper bound]) 
# Model parameters: Dictate the behaviour of the cell process
MU_G_MAX = 0.05   # 0.05
MU_D_MAX = 0.027  # 0.025
K_G_GLC  = 1      # 1
K_I_LAC  = 20     # 30
K_D_LAC  = 50     # 50
K_GLC    = 0.05   # 0.04
K_LAC    = 0.06   # 0.06
K_PROD   = 1      # 1
""" Train Set """
# Process parameters: Conditions at which process is run
FEED_START = [1, 4]       # [1, 4]
FEED_END = [8, 12]        # [8, 12]
GLC_FEED_RATE = [10,15]   # [5, 20]
GLC_0 = [30, 60.0]        # [10, 80.0]
VCD_0 = [0.1, 0.5]        # [0.1, 1.0]
# Collect parameters to dictionary
VAR_LIMS = {"mu_g_max":MU_G_MAX, "mu_d_max": MU_D_MAX, "K_g_Glc" : K_G_GLC, "K_I_Lac" : K_I_LAC, "K_d_Lac" : K_D_LAC, "k_Glc" : K_GLC, "k_Lac" : K_LAC, "k_Prod" : K_PROD,
    "feed_start" : FEED_START,"feed_end" : FEED_END, "Glc_feed_rate" : GLC_FEED_RATE, "Glc_0" : GLC_0, "VCD_0" : VCD_0}
# Number of experiments to generate
NUM_RUNS = 10
# Filename and filepath for the dataset 
FILENAME = "owu.csv"
FILEPATH = "/content/"
# Generate Dataset
data = simulator.generate_data(VAR_LIMS, NUM_RUNS, FILENAME)
# Import DOE and OWU
B_doe = pd.read_csv(FILEPATH+FILENAME.replace(".csv","_doe.csv"),index_col=None, usecols =["feed_start","feed_end","Glc_feed_rate","Glc_0","VCD_0"])
B_owu = pd.read_csv(FILEPATH+FILENAME,index_col=None, usecols = ["X:VCD", "X:Glc", "X:Lac", "X:Titer","W:Feed"])
B_owu.index = pd.MultiIndex.from_product([list(range(NUM_RUNS)),list(range(15))], names=["run","time"])
B_bwu = simulator.generate_bwu(B_owu)
B_tar = simulator.generate_y(B_bwu, return_aggr=True)
""" Test Set """
# Process parameters: Conditions at which process is run
FEED_START = [1, 4]       # [1, 4]
FEED_END = [8, 12]        # [8, 12]
GLC_FEED_RATE = [5,20]   # [5, 20]
GLC_0 = [10, 80.0]        # [10, 80.0]
VCD_0 = [0.1, 1.0]        # [0.1, 1.0]
# Collect parameters to dictionary
VAR_LIMS = {"mu_g_max":MU_G_MAX, "mu_d_max": MU_D_MAX, "K_g_Glc" : K_G_GLC, "K_I_Lac" : K_I_LAC, "K_d_Lac" : K_D_LAC, "k_Glc" : K_GLC, "k_Lac" : K_LAC, "k_Prod" : K_PROD,
    "feed_start" : FEED_START,"feed_end" : FEED_END, "Glc_feed_rate" : GLC_FEED_RATE, "Glc_0" : GLC_0, "VCD_0" : VCD_0}
# Number of experiments to generate
NUM_RUNS_TEST = 100
# Filename and filepath for the dataset 
FILENAME = "owu_test.csv"
FILEPATH = "/content/"
# Generate Dataset
data_test = simulator.generate_data(VAR_LIMS, NUM_RUNS_TEST, FILENAME)
# Import DOE and OWU
B_doe_test = pd.read_csv(FILEPATH+FILENAME.replace(".csv","_doe.csv"),index_col=None, usecols =["feed_start","feed_end","Glc_feed_rate","Glc_0","VCD_0"])
B_owu_test = pd.read_csv(FILEPATH+FILENAME,index_col=None, usecols = ["X:VCD", "X:Glc", "X:Lac", "X:Titer","W:Feed"])
B_owu_test.index = pd.MultiIndex.from_product([list(range(NUM_RUNS_TEST)),list(range(15))], names=["run","time"])
B_bwu_test = simulator.generate_bwu(B_owu_test)
B_tar_test = simulator.generate_y(B_bwu_test, return_aggr=True)


## Historical PLS model on BWU matrix

Here the BWU (batch-wise unfolded) matrix is created. The values of the manipulated process parameters are added as columns at the beginning of the matrix. The following pre-processing is performed:

* Remove the variables listed in `REMOVE_VARS` (here feed, titer and lactate)
* Remove the days beyond the `USE_DAYS` horizon
* Eliminate invariant columns
* Remove linearly dependent columns (the day-0 columns, which duplicate the initial conditions already given in the design)
* Add the process parameters at the beginning
* Create a PLS model from the initial design to the final titer
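The reasoning behind the invariant-column and linear-dependence steps can be seen on a small made-up table: a column with the same value in every run carries no information about differences between runs, and a column that repeats another (here a hypothetical day-0 VCD equal to `VCD_0`) makes the mean-centered input matrix rank-deficient. Column names follow the notebook's convention; the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy input: "X:VCD:0" repeats the initial condition VCD_0,
# and "W:Feed:1" is zero in every run
X = pd.DataFrame({
    "VCD_0":    [0.2, 0.4, 0.3, 0.5],
    "X:VCD:0":  [0.2, 0.4, 0.3, 0.5],
    "W:Feed:1": [0.0, 0.0, 0.0, 0.0],
    "X:VCD:3":  [1.1, 2.0, 1.4, 2.6],
})

# Invariant columns: identical value in every run
invariant = [c for c in X.columns if X[c].nunique() == 1]

# Duplicated or constant columns show up as a rank deficit of the mean-centered matrix
rank = np.linalg.matrix_rank((X - X.mean()).to_numpy())
print(invariant, rank, X.shape[1])
```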

In [86]:
""" Number of days of process history """
USE_DAYS = 10
""" Remove variables """
REMOVE_VARS = ['W:Feed','X:Titer','X:Lac'] # options: X:Titer, X:Lac, X:Glc, X:VCD, W:Feed
""" Normalize data """
USE_NORM = True
""" Number of latent variables """
N_LV = 6

In [87]:
def preprocess_bwu(bwu, doe, use_days=USE_DAYS, remove_vars=REMOVE_VARS):
    """Return the model input: DOE process parameters followed by the retained BWU columns."""
    remove_columns = []
    # Remove the variables listed in remove_vars (BWU columns are named "<variable>:<day>")
    for ivar in ['X:Titer', 'X:Lac', 'X:Glc', 'X:VCD', 'W:Feed']:
        if ivar in remove_vars:
            remove_columns.extend([c for c in bwu.columns if c.startswith(ivar)])
    # Remove day 0 and every day from use_days onward
    remove_columns.extend([c for c in bwu.columns if c.endswith(":0")])
    for d in range(use_days, 15):
        remove_columns.extend([c for c in bwu.columns if c.endswith(":" + str(d))])
    # Remove invariant columns (same value in every run)
    remove_columns.extend(list(bwu.columns[~(bwu != bwu.iloc[0]).any().values]))
    # Drop the removed columns and prepend the DOE columns
    X_preproc = pd.concat([doe, bwu.drop(columns=sorted(set(remove_columns)))], axis=1)
    return X_preproc

## Single Task B model

In the single-task scenario we look at the predictive performance of a model trained on Task B data only. It has just 10 experiments available, with the ranges of some process parameters restricted, which makes it hard to generalize to the larger test set of 100 experiments that spans the full ranges.

In [88]:
# Define pipeline: standardize the inputs, then fit a PLS regression
pscaler = StandardScaler(with_mean=USE_NORM, with_std=USE_NORM)
pls_bwu = PLSRegression(n_components=N_LV)
pipe = Pipeline([('scaler', pscaler), ('pls', pls_bwu)])
# Train PLS model on Task B data only
y = B_tar
X = preprocess_bwu(B_bwu, B_doe, USE_DAYS, REMOVE_VARS)
X_columns = X.columns
print(X_columns)
pipe.fit(X, y)

# Make predictions (flattened, so 1-D and 2-D targets are handled the same way)
yhat = np.ravel(pipe.predict(X))
X_test = preprocess_bwu(B_bwu_test, B_doe_test, USE_DAYS, REMOVE_VARS)
y_test = B_tar_test
yhat_test = np.ravel(pipe.predict(X_test))

# Calculate error metrics (RMSE via np.sqrt: `squared=False` was deprecated in scikit-learn 1.4 and later removed)
train_rmse = np.sqrt(mean_squared_error(np.ravel(y), yhat))
test_rmse = np.sqrt(mean_squared_error(np.ravel(y_test), yhat_test))
train_r2 = round(pipe.score(X, y), 3)
train_abs_rmse = round(train_rmse, 3)
train_rel_rmse = round(train_rmse / np.std(np.array(y)), 3)
test_r2 = round(pipe.score(X_test, y_test), 3)
test_abs_rmse = round(test_rmse, 3)
test_rel_rmse = round(test_rmse / np.std(np.array(y_test)), 3)
scen_1 = test_rel_rmse

# Plot observed vs predicted, with a dashed diagonal as reference
fig = make_subplots(rows=1, cols=2, subplot_titles=(
    f"Train Set <br> R^2 = {train_r2} <br> Abs RMSE = {train_abs_rmse} <br> Rel RMSE = {train_rel_rmse}",
    f"Test Set <br> R^2 = {test_r2} <br> Abs RMSE = {test_abs_rmse} <br> Rel RMSE = {test_rel_rmse}"))
# Train set plot
fig.add_trace(go.Scatter(x=np.ravel(y), y=yhat, mode="markers"), row=1, col=1)
fig.add_shape(type="line", x0=yhat.min(), y0=yhat.min(), x1=yhat.max(), y1=yhat.max(),
              layer='below', line=dict(dash='dash'), row=1, col=1)
# Test set plot
fig.add_trace(go.Scatter(x=np.ravel(y_test), y=yhat_test, mode="markers"), row=1, col=2)
fig.add_shape(type="line", x0=yhat_test.min(), y0=yhat_test.min(), x1=yhat_test.max(), y1=yhat_test.max(),
              layer='below', line=dict(dash='dash'), row=1, col=2)
fig.update_layout(title_text="Observed vs Predicted", showlegend=False)
fig.show()


Index(['feed_start', 'feed_end', 'Glc_feed_rate', 'Glc_0', 'VCD_0', 'X:VCD:1',
       'X:VCD:2', 'X:VCD:3', 'X:VCD:4', 'X:VCD:5', 'X:VCD:6', 'X:VCD:7',
       'X:VCD:8', 'X:VCD:9', 'X:Glc:1', 'X:Glc:2', 'X:Glc:3', 'X:Glc:4',
       'X:Glc:5', 'X:Glc:6', 'X:Glc:7', 'X:Glc:8', 'X:Glc:9'],
      dtype='object')
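The metrics in the plot titles are linked. With $\sigma_y$ the population standard deviation returned by `np.std` (default `ddof=0`), the relative RMSE is $\mathrm{RMSE}/\sigma_y$, and for a single target $R^2 = 1 - \mathrm{MSE}/\sigma_y^2$ on the same data, so the relative RMSE equals $\sqrt{1 - R^2}$. The check below uses made-up values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up observed and predicted values for a single target
y = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
yhat = np.array([2.8, 5.4, 2.9, 6.5, 4.4])

rmse = np.sqrt(mean_squared_error(y, yhat))
rel_rmse = rmse / np.std(y)   # np.std uses ddof=0 (population standard deviation)
r2 = r2_score(y, yhat)

print(np.isclose(rel_rmse, np.sqrt(1.0 - r2)))  # True
```

Because all three scenarios in this notebook are scored on the same Task B test set, ranking them by relative test RMSE gives the same order as ranking them by test $R^2$.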


## Transfer learning over Task A & B without one-hot encoding

In [89]:
# Define pipeline: standardize the inputs, then fit a PLS regression
pscaler = StandardScaler(with_mean=USE_NORM, with_std=USE_NORM)
pls_bwu = PLSRegression(n_components=N_LV)
pipe = Pipeline([('scaler', pscaler), ('pls', pls_bwu)])
# Train PLS model on the pooled Task B and Task A data, without a task descriptor
y = pd.concat([B_tar, A_tar])
X = pd.concat([preprocess_bwu(B_bwu, B_doe, USE_DAYS, REMOVE_VARS),
               preprocess_bwu(A_bwu, A_doe, USE_DAYS, REMOVE_VARS)], axis=0)
X_columns = X.columns
print(X_columns)
pipe.fit(X, y)

# Make predictions on the pooled training data and on the Task B test set
yhat = np.ravel(pipe.predict(X))
X_test = preprocess_bwu(B_bwu_test, B_doe_test, USE_DAYS, REMOVE_VARS)
y_test = B_tar_test
yhat_test = np.ravel(pipe.predict(X_test))

# Calculate error metrics (RMSE via np.sqrt: `squared=False` was deprecated in scikit-learn 1.4 and later removed)
train_rmse = np.sqrt(mean_squared_error(np.ravel(y), yhat))
test_rmse = np.sqrt(mean_squared_error(np.ravel(y_test), yhat_test))
train_r2 = round(pipe.score(X, y), 3)
train_abs_rmse = round(train_rmse, 3)
train_rel_rmse = round(train_rmse / np.std(np.array(y)), 3)
test_r2 = round(pipe.score(X_test, y_test), 3)
test_abs_rmse = round(test_rmse, 3)
test_rel_rmse = round(test_rmse / np.std(np.array(y_test)), 3)
scen_2 = test_rel_rmse

# Plot observed vs predicted, with a dashed diagonal as reference
fig = make_subplots(rows=1, cols=2, subplot_titles=(
    f"Train Set <br> R^2 = {train_r2} <br> Abs RMSE = {train_abs_rmse} <br> Rel RMSE = {train_rel_rmse}",
    f"Test Set <br> R^2 = {test_r2} <br> Abs RMSE = {test_abs_rmse} <br> Rel RMSE = {test_rel_rmse}"))
# Train set plot
fig.add_trace(go.Scatter(x=np.ravel(y), y=yhat, mode="markers"), row=1, col=1)
fig.add_shape(type="line", x0=yhat.min(), y0=yhat.min(), x1=yhat.max(), y1=yhat.max(),
              layer='below', line=dict(dash='dash'), row=1, col=1)
# Test set plot
fig.add_trace(go.Scatter(x=np.ravel(y_test), y=yhat_test, mode="markers"), row=1, col=2)
fig.add_shape(type="line", x0=yhat_test.min(), y0=yhat_test.min(), x1=yhat_test.max(), y1=yhat_test.max(),
              layer='below', line=dict(dash='dash'), row=1, col=2)
fig.update_layout(title_text="Observed vs Predicted", showlegend=False)
fig.show()


Index(['feed_start', 'feed_end', 'Glc_feed_rate', 'Glc_0', 'VCD_0', 'X:VCD:1',
       'X:VCD:2', 'X:VCD:3', 'X:VCD:4', 'X:VCD:5', 'X:VCD:6', 'X:VCD:7',
       'X:VCD:8', 'X:VCD:9', 'X:Glc:1', 'X:Glc:2', 'X:Glc:3', 'X:Glc:4',
       'X:Glc:5', 'X:Glc:6', 'X:Glc:7', 'X:Glc:8', 'X:Glc:9'],
      dtype='object')


## Transfer learning over task A & B with one-hot encoding

In [93]:
# Define pipeline: standardize the inputs, then fit a PLS regression
pscaler = StandardScaler(with_mean=USE_NORM, with_std=USE_NORM)
pls_bwu = PLSRegression(n_components=N_LV)
pipe = Pipeline([('scaler', pscaler), ('pls', pls_bwu)])
# Train PLS model on the pooled data with a one-hot task indicator (1 = Task B, 0 = Task A)
y = pd.concat([B_tar, A_tar])
X = pd.concat([preprocess_bwu(B_bwu, B_doe, USE_DAYS, REMOVE_VARS),
               preprocess_bwu(A_bwu, A_doe, USE_DAYS, REMOVE_VARS)], axis=0)
X['Task:B'] = np.concatenate((np.repeat(1, len(B_tar)), np.repeat(0, len(A_tar))))
X_columns = X.columns
print(X_columns)
pipe.fit(X, y)

# Make predictions; every test run belongs to Task B, so its indicator is 1
yhat = np.ravel(pipe.predict(X))
X_test = preprocess_bwu(B_bwu_test, B_doe_test, USE_DAYS, REMOVE_VARS)
X_test['Task:B'] = np.repeat(1, len(B_tar_test))
y_test = B_tar_test
yhat_test = np.ravel(pipe.predict(X_test))

# Calculate error metrics (RMSE via np.sqrt: `squared=False` was deprecated in scikit-learn 1.4 and later removed)
train_rmse = np.sqrt(mean_squared_error(np.ravel(y), yhat))
test_rmse = np.sqrt(mean_squared_error(np.ravel(y_test), yhat_test))
train_r2 = round(pipe.score(X, y), 3)
train_abs_rmse = round(train_rmse, 3)
train_rel_rmse = round(train_rmse / np.std(np.array(y)), 3)
test_r2 = round(pipe.score(X_test, y_test), 3)
test_abs_rmse = round(test_rmse, 3)
test_rel_rmse = round(test_rmse / np.std(np.array(y_test)), 3)
scen_3 = test_rel_rmse

# Plot observed vs predicted, with a dashed diagonal as reference
fig = make_subplots(rows=1, cols=2, subplot_titles=(
    f"Train Set <br> R^2 = {train_r2} <br> Abs RMSE = {train_abs_rmse} <br> Rel RMSE = {train_rel_rmse}",
    f"Test Set <br> R^2 = {test_r2} <br> Abs RMSE = {test_abs_rmse} <br> Rel RMSE = {test_rel_rmse}"))
# Train set plot
fig.add_trace(go.Scatter(x=np.ravel(y), y=yhat, mode="markers"), row=1, col=1)
fig.add_shape(type="line", x0=yhat.min(), y0=yhat.min(), x1=yhat.max(), y1=yhat.max(),
              layer='below', line=dict(dash='dash'), row=1, col=1)
# Test set plot
fig.add_trace(go.Scatter(x=np.ravel(y_test), y=yhat_test, mode="markers"), row=1, col=2)
fig.add_shape(type="line", x0=yhat_test.min(), y0=yhat_test.min(), x1=yhat_test.max(), y1=yhat_test.max(),
              layer='below', line=dict(dash='dash'), row=1, col=2)
fig.update_layout(title_text="Observed vs Predicted", showlegend=False)
fig.show()


Index(['feed_start', 'feed_end', 'Glc_feed_rate', 'Glc_0', 'VCD_0', 'X:VCD:1',
       'X:VCD:2', 'X:VCD:3', 'X:VCD:4', 'X:VCD:5', 'X:VCD:6', 'X:VCD:7',
       'X:VCD:8', 'X:VCD:9', 'X:Glc:1', 'X:Glc:2', 'X:Glc:3', 'X:Glc:4',
       'X:Glc:5', 'X:Glc:6', 'X:Glc:7', 'X:Glc:8', 'X:Glc:9', 'Task:B'],
      dtype='object')


In [91]:
#Comparison between learning scenarios
fig = go.Figure([go.Bar(x=["Single Task B", "TL Tasks A & B without OHE", "TL Tasks A & B with OHE"], y=[scen_1, scen_2, scen_3], text=[scen_1, scen_2, scen_3],textposition='auto',)])
fig.update_layout(yaxis=dict(title='Relative RMSE'))
fig.show()