# Exercise: Error-Correcting Neural Networks

This is a rather long exercise in which you will put most of what you have learned during the exercises to good use. Most of it can be done without the LEGO robot, so you can finish on your own time if you do not finish today.

During this exercise we will train an error-correcting neural network. The method is similar to the one employed by [Anatole v. Lilienfeld's group for electronic structure calculations of molecules](https://pubs.acs.org/doi/10.1021/acs.jctc.5b00099) ([arxiv](https://arxiv.org/abs/1503.04987)). Generating data with the LEGO robots is rather expensive, and we cannot mix data from all 4 robots without introducing significant noise. We thus have to make as much as possible of the precious data we have.

You will:
 * Change the SilicoColorMixer model to be as similar to the LEGO robot as possible
 * Train a neural network to predict the error between the updated SilicoColorMixer and the AiLEGO_master (robot) color mixer
 * Test its performance

Import all the modules and functions you think you will need today.

In [1]:
# import statements here

In [2]:
# Teacher
import numpy as np
from IPython import display
import matplotlib.pyplot as plt
from plot_pie_charts import make_piechart_plot
from silico_color_mixer import SilicoColorMixer
from sklearn.neural_network import MLPRegressor
from scipy.optimize import minimize

We start by making sure you can do the last step. Right now, most of the data you have on the robots are correlated, as they originate from optimization runs. We want to use some of these data as a test set (the dependent test set), but we also want a test set that is not correlated with the optimization runs. Run the robot to generate 9 data points where you decide the input list, as you did on the first robot day. This will be your independent test set. Use the cells below. You have done this before and can do it again, this time unassisted.

In [3]:
# for you to generate independent test data set on robot

In [4]:
# for you to generate independent test data set on robot

In [5]:
# for you to generate independent test data set on robot. Add more cells as needed

## Change SilicoColorMixer

Change the color mixer so that it is as similar to the LEGO robot as possible without changing the source code. By making sure the `SilicoColorMixer` and the robot are as similar as possible, you can use a smaller neural network to predict the error and need less training data.

First, let's change the RGB codes of the colors used in the `SilicoColorMixer`. If you cannot remember how to do this, go back and reread the documentation of the `SilicoColorMixer`, or open its DocString by initializing it and pressing `Shift`+`Tab`.

In [6]:
from silico_color_mixer import SilicoColorMixer

Initialize a mixer. There is already plenty of noise in our LEGO data, so you can turn off the noise in the silico mixer. Change the colors so that the color codes are those of the pure colors you obtained with your robot; you should have them in your robot data. Pay attention to the input formats. You can find examples in the DocString.

In [7]:
# Cell for you to change colors used in SilicoColorMixer and initialize the mixer.

In [8]:
# Cell for you to change colors used in SilicoColorMixer and initialize the mixer. Add more as needed.

In [9]:
# Teacher
rgb_codes = {'red': [88, 26, 70],   # pure-color RGB codes measured on the robot
             'green': [22, 72, 59],
             'blue': [23, 117, 169],
             'yellow': [61, 68, 45]}

In [10]:
# Teacher
mixer = SilicoColorMixer(colors=['red', 'green', 'blue', 'yellow'], color_codes=rgb_codes, noise=False)

Try out your newly updated silico mixer to make sure it works.

In [11]:
# Cell for you to try out the mixer. Add more cells as needed.

In [12]:
# Teacher
mixer.run_cuvette([1., 1., 1., 1.,])

(48.5, 70.75, 85.75)
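The printed value is exactly the plain average of the four pure-color codes (for the red channel, (88 + 22 + 23 + 61)/4 = 48.5). This suggests that the silico mixer computes a volume-weighted average of the pure colors after normalizing the volumes. The sketch below re-implements that assumed model with NumPy for intuition; `linear_mix` is a hypothetical helper, not part of `SilicoColorMixer`.

```python
import numpy as np

# Pure-color RGB codes from the Teacher cell above.
PURE_RGB = {
    'red': [88, 26, 70],
    'green': [22, 72, 59],
    'blue': [23, 117, 169],
    'yellow': [61, 68, 45],
}
ORDER = ['red', 'green', 'blue', 'yellow']


def linear_mix(volumes, pure_rgb=PURE_RGB, order=ORDER):
    """Hypothetical model: volume-weighted average of the pure RGB codes.

    Assumes the mixer normalizes the volumes to sum to one. This is an
    illustration, not the actual SilicoColorMixer source.
    """
    v = np.asarray(volumes, dtype=float)
    total = v.sum()
    if total <= 0:
        raise ValueError("at least one volume must be positive")
    codes = np.array([pure_rgb[c] for c in order], dtype=float)  # shape (4, 3)
    mixed = (v / total) @ codes                                  # shape (3,)
    return tuple(float(c) for c in mixed)


print(linear_mix([1., 1., 1., 1.]))  # (48.5, 70.75, 85.75)
```

Comparing `linear_mix` with `mixer.run_cuvette` on a few inputs is a quick way to check whether the assumption holds for your version of the mixer.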

## Optimization of weights

During the robot runs you might have noticed that the different colors have different strengths. A little blue is very potent, while you need a lot of yellow for the outcome to get a yellow tone. This difference in strength is also reflected in the measured RGB codes. Update your silico model to reflect this by adding a weight to each color input.

This can be done using a wrapper function such as the one below.

In [13]:
def run_cuvette_w_weights(colors_input):
    w_r, w_g, w_b, w_y = 1., 1., 1., 1.
    weighted_colors = [colors_input[0] * w_r,
                       colors_input[1] * w_g,
                       colors_input[2] * w_b,
                       colors_input[3] * w_y]
    return mixer.run_cuvette(weighted_colors)

In [14]:
run_cuvette_w_weights([1.,1.,1.,1.])

(48.5, 70.75, 85.75)

In the above example, all the weights (`w_r`, etc.) are `1.`. You need to optimize them. Go find your robot data and find a data point with input `[0.25, 0.25, 0.25, 0.25]`. You should all have at least one of those. Reinitialize your silico mixer with the RGB code you obtained from that data point as the target. Based on this single data point and what you have learned so far, optimize the weights. You will need to write a new wrapper function that takes the weights as input.

If you look closer at the `[0.25, 0.25, 0.25, 0.25]` data point, you will likely find that the values in the RGB code are so low that they cannot possibly be achieved by a linear combination of the pure colors. Life is not linear, and when you physically mix all four colors they counteract each other. The resulting color is a bluish, semitransparent grey with a low RGB signal. Your fit will thus not be all that great.
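If the mixer normalizes the volumes, every channel of the silico output is a convex combination of the pure colors that are used, so it can never leave the per-channel range those colors span. Comparing a measured robot color with that range is therefore a quick, necessary (but not sufficient) check of whether the linear model can reach it. A minimal sketch, assuming that normalization and reusing the Teacher pure-color codes; the helper names are hypothetical.

```python
import numpy as np

# Teacher pure-color codes, rows in the order red, green, blue, yellow.
PURE_RGB = np.array([[88, 26, 70],
                     [22, 72, 59],
                     [23, 117, 169],
                     [61, 68, 45]], dtype=float)


def channel_bounds(volumes, pure_rgb=PURE_RGB):
    """Per-channel min and max over the pure colors that are actually used."""
    used = np.asarray(volumes, dtype=float) > 0
    if not used.any():
        raise ValueError("at least one volume must be positive")
    codes = pure_rgb[used]
    return codes.min(axis=0), codes.max(axis=0)


def reachable_by_linear_mix(rgb, volumes, pure_rgb=PURE_RGB):
    """Necessary condition: each channel of rgb must lie inside the range
    spanned by the used pure colors, since a normalized mix is a convex
    combination of them."""
    lo, hi = channel_bounds(volumes, pure_rgb)
    rgb = np.asarray(rgb, dtype=float)
    return bool(np.all((rgb >= lo) & (rgb <= hi)))


print(reachable_by_linear_mix((48.5, 70.75, 85.75), [0.25] * 4))    # True
print(reachable_by_linear_mix((19, 12, 41), [0.5, 0.0, 0.5, 0.0]))  # False
```

The second call uses the robot's measurement for a red/blue mix from the data printout later in this notebook: it is darker in every channel than both pure red and pure blue, so no weighting of a linear model can reproduce it.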

In [15]:
# Cell for you to work in.

In [16]:
# Cell for you to work in.

In [17]:
# Cell for you to work in.

In [18]:
# Cell for you to work in. Add more as needed

In [19]:
# Teacher
mixer = SilicoColorMixer(colors=['red', 'green', 'blue', 'yellow'], color_codes=rgb_codes, noise=False, target = (58, 66, 89))

In [20]:
# Teacher
def root_sum_sqr_err(input1, input2):
    """Root of the sum of squared differences (Euclidean distance) between two colors"""
    dif = np.asarray(input1, dtype=float) - np.asarray(input2, dtype=float)
    return float(np.sqrt(np.sum(dif**2)))

In [21]:
# Teacher
# wrapper function
def run_cuvette_w_weights(weights):
    weighted_colors = [weights[0] * 0.25,
                       weights[1] * 0.25,
                       weights[2] * 0.25,
                       weights[3] * 0.25]
    return mixer.run_cuvette(weighted_colors, read_target=True)

In [22]:
run_cuvette_w_weights([1.0, 1.0, 1.0, 1.0])

((48.5, 70.75, 85.75), (58, 66, 89))

In [23]:
# Teacher
def find_weights(weights):
    color, target = run_cuvette_w_weights(weights)
    score = root_sum_sqr_err(color, target)
    print('color:', color, 'target:', target, 'input:', weights, 'score:', score)
    return score

In [24]:
# Teacher
res = minimize(find_weights, [1.0, 1.0, 1.0, 1.0], method='L-BFGS-B',
               bounds = 4*[[0.0, 10.0]], 
               options={'disp': True, 'eps': 0.1, 'maxiter': 100, 'gtol': 0.1, 'maxfun':10000})
res

color: (48.5, 70.75, 85.75) target: (58, 66, 89) input: [1. 1. 1. 1.] score: 11.107429945761531
color: (49.463414634146346, 69.65853658536585, 85.3658536585366) target: (58, 66, 89) input: [1.1 1.  1.  1. ] score: 9.973224116911194
color: (47.85365853658537, 70.78048780487805, 85.09756097560977) target: (58, 66, 89) input: [1.  1.1 1.  1. ] score: 11.875619524213398
color: (47.87804878048781, 71.87804878048782, 87.78048780487806) target: (58, 66, 89) input: [1.  1.  1.1 1. ] score: 11.768286364209095
color: (48.804878048780495, 70.6829268292683, 84.75609756097563) target: (58, 66, 89) input: [1.  1.  1.  1.1] score: 11.157543605025104
color: (86.71706950779195, 27.995669654545853, 68.81210139610366) target: (58, 66, 89) input: [10.          0.          0.          0.49886341] score: 51.73538881812875
color: (86.72917392354766, 27.976840563370295, 68.82330918847006) target: (58, 66, 89) input: [10.1         0.          0.          0.49886341] score: 51.75156944457714
color: (86.10646564 ...
  return [float(i) * norm / sum(input_list) for i in input_list]
 ... score: 47.24404724407086
color: (23.0, 117.0, 169.0) target: (58, 66, 89) input: [0.  0.  0.1 0. ] score: 101.1236866416568
color: (61.0, 68.0, 45.0) target: (58, 66, 89) input: [0.  0.  0.  0.1] score: 44.14748010928823
color: (nan, nan, nan) target: (58, 66, 89) input: [0. 0. 0. 0.] score: nan
color: (88.0, 26.0, 70.0) target: (58, 66, 89) input: [0.1 0.  0.  0. ] score: 53.48831648126533
color: (22.0, 72.0, 59.0) target: (58, 66, 89) input: [0.  0.1 0.  0. ] score: 47.24404724407086
...

      fun: nan
 hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 8.00010419,  8.39201689, 16.81211619,  3.58770603])
  message: b'ABNORMAL_TERMINATION_IN_LNSRCH'
     nfev: 405
      nit: 7
   status: 2
  success: False
        x: array([1.39232306, 0.38718716, 0.95095003, 0.85839771])
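The log above shows L-BFGS-B stepping to an all-zero weight vector, where the mixer divides by zero and returns `nan`. This ends in `ABNORMAL_TERMINATION_IN_LNSRCH`. The log is also consistent with a normalized mixer (an input of `[0.1, 0, 0, 0]` returns exactly the pure red code). For such a mixer, multiplying all weights by the same factor gives the same color, so the weights are only determined up to a common scale. A minimal sketch of a guarded objective, assuming the same linear model and the Teacher target `(58, 66, 89)`; `weighted_mix` and `safe_objective` are hypothetical names.

```python
import numpy as np

# Assumed inputs: Teacher pure-color codes and target from the cells above.
PURE_RGB = np.array([[88, 26, 70],     # red
                     [22, 72, 59],     # green
                     [23, 117, 169],   # blue
                     [61, 68, 45]],    # yellow
                    dtype=float)
TARGET = np.array([58, 66, 89], dtype=float)
BASE_VOLUMES = np.full(4, 0.25)  # the [0.25, 0.25, 0.25, 0.25] data point


def weighted_mix(weights):
    """Normalized weighted average of the pure colors (assumed mixer model)."""
    v = np.asarray(weights, dtype=float) * BASE_VOLUMES
    return (v / v.sum()) @ PURE_RGB


def safe_objective(weights, penalty=1e6):
    """Euclidean color error; returns a large finite penalty instead of NaN
    when the weights sum to zero and the mix is undefined."""
    w = np.asarray(weights, dtype=float)
    if w.sum() <= 1e-12:
        return penalty
    return float(np.linalg.norm(weighted_mix(w) - TARGET))


print(safe_objective([1.0, 1.0, 1.0, 1.0]))  # ~11.1074, as in the first log line
print(safe_objective([2.0, 2.0, 2.0, 2.0]))  # same value: only weight ratios matter
print(safe_objective([0.0, 0.0, 0.0, 0.0]))  # 1000000.0 instead of nan
```

In the notebook, a similar guard could go into `find_weights`. Alternatively, keep a small positive lower bound such as `bounds=4*[[0.01, 10.0]]` and fix one weight (for example `w_r = 1`) so that the scale ambiguity disappears.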

Do the weights you found during the optimization make sense? How sensitive are these weights to changes in the input colors and the target color?

The noise in our robot data is too large for us to improve the silico model by incorporating weights into it. We would need more data points and probably also a more advanced model than simply adding weights. Let's see if we can close the gap with an error-correcting neural network rather than attempting to improve the silico model further.

## Error correcting neural network

The other day you trained a neural network to replicate the silico mixer. Today is slightly different, as you will need it to predict the error between the robot and the silico mixer. Start by getting your data from the robot and splitting it into a training set, a dependent test set, and an independent test set.

First, copy the robot data file to your own data folder. This cannot easily be done from within the notebook, so open a new login to gbar. From gbar, run the line below, replacing X with your robot number.

Use pandas to load the data. It is perhaps a bit of overkill, but pandas is a great tool to get an introduction to if you don't already know it.

In [25]:
try:
    import pandas as pd
except ImportError:
    !pip3 install pandas --user --upgrade
    import pandas as pd

You can then load the data with the cell below.

In [26]:
df = pd.read_csv('~/47332/data/datafile.csv', sep=';')
print(df)

                    Time                                            Vcolors  \
0    2020-06-10 11:58:57                               [1.0, 0.0, 0.0, 0.0]   
1    2020-06-10 12:07:34                               [0.5, 0.0, 0.0, 0.5]   
2    2020-06-10 12:11:54                               [0.5, 0.0, 0.5, 0.0]   
3    2020-06-10 12:14:56                               [0.5, 0.0, 0.5, 0.0]   
4    2020-06-10 12:20:42                               [0.5, 0.0, 0.5, 0.0]   
..                   ...                                                ...   
106  2020-06-18 08:35:24                               [0.0, 1.0, 0.0, 0.0]   
107  2020-06-18 08:38:45                               [0.0, 0.5, 0.5, 0.0]   
108  2020-06-18 08:42:22  [0.0, 0.3333333333333333, 0.6666666666666666, ...   
109  2020-06-18 08:45:31                               [0.0, 0.0, 1.0, 0.0]   
110  2020-06-18 08:49:04                               [0.5, 0.0, 0.5, 0.0]   

               RGB  Cuvette                        

In [27]:
data = df[['Vcolors','RGB']]
print(data)

                                               Vcolors            RGB
0                                 [1.0, 0.0, 0.0, 0.0]   (79, 18, 63)
1                                 [0.5, 0.0, 0.0, 0.5]   (78, 25, 68)
2                                 [0.5, 0.0, 0.5, 0.0]   (21, 19, 82)
3                                 [0.5, 0.0, 0.5, 0.0]   (22, 24, 85)
4                                 [0.5, 0.0, 0.5, 0.0]   (35, 38, 96)
..                                                 ...            ...
106                               [0.0, 1.0, 0.0, 0.0]   (24, 41, 58)
107                               [0.0, 0.5, 0.5, 0.0]   (23, 47, 74)
108  [0.0, 0.3333333333333333, 0.6666666666666666, ...   (22, 53, 83)
109                               [0.0, 0.0, 1.0, 0.0]  (20, 60, 107)
110                               [0.5, 0.0, 0.5, 0.0]   (19, 12, 41)

[111 rows x 2 columns]
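The `Vcolors` and `RGB` columns are read as strings such as `'[1.0, 0.0, 0.0, 0.0]'` and `'(79, 18, 63)'`. The sketch below turns them into numeric arrays directly with pandas, using `ast.literal_eval`, which only accepts Python literals and is therefore a safer choice than `eval`. The two-row CSV is copied from the printout above and, as a simplifying assumption, omits the `Time` and `Cuvette` columns of the real file.

```python
import ast
import io

import numpy as np
import pandas as pd

# Two rows copied from the printout above, in the same ';'-separated layout.
csv_text = """Vcolors;RGB
[1.0, 0.0, 0.0, 0.0];(79, 18, 63)
[0.5, 0.0, 0.0, 0.5];(78, 25, 68)
"""

df_demo = pd.read_csv(io.StringIO(csv_text), sep=';')

# Parse each string cell into a Python list/tuple, then stack into 2D arrays.
X = np.array(df_demo['Vcolors'].apply(ast.literal_eval).tolist(), dtype=float)
Y = np.array(df_demo['RGB'].apply(ast.literal_eval).tolist(), dtype=float)

print(X.shape, Y.shape)  # (2, 4) (2, 3)
```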


Split off the independent data you created from the rest of the dataframe. If it is the last 9 points, use the cell below; otherwise, you have to change the indices.

In [28]:
df1 = data.iloc[:-9, :]
df_indep_test = data.iloc[-9:, :]

# In case there are n points after the independent set
# n = 3
# df11 = data.iloc[:-9-n, :]
# df_indep_test = data.iloc[-9-n:-n, :]
# df12 = data.iloc[-n:, :]
# df1 = pd.concat([df11, df12])

Remove (on average) 20 percent of the points in the remaining data. These points will be the dependent test data.

In [29]:
msk = np.random.rand(len(df1)) < 0.8
df_train = df1[msk]
df_dep_test = df1[~msk]
print(len(df_train))
print(len(df_dep_test))

83
19
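`np.random.rand` is unseeded, so the split (and therefore every score below) changes each time the cell is rerun. A sketch of a reproducible split using a seeded NumPy `Generator`; `split_indices` is a hypothetical helper and the seed value is arbitrary.

```python
import numpy as np


def split_indices(n, train_fraction=0.8, seed=0):
    """Randomly assign each of n rows to train (True) or dependent test (False).

    A seeded Generator makes the split reproducible between runs.
    """
    rng = np.random.default_rng(seed)
    msk = rng.random(n) < train_fraction
    return np.flatnonzero(msk), np.flatnonzero(~msk)


train_idx, test_idx = split_indices(102, train_fraction=0.8, seed=42)
print(len(train_idx), len(test_idx))
```

The indices can then select rows, e.g. `df_train = df1.iloc[train_idx]` and `df_dep_test = df1.iloc[test_idx]`.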


Now, turn the dataframes into lists. You can use `df_train.values.tolist()` and similar.

In [30]:
import ast

train_list = df_train.values.tolist()
for row in train_list:  # 'row' avoids overwriting the 'data' dataframe from above
    row[0] = ast.literal_eval(row[0])        # '[r, g, b, y]' string -> list of volumes
    row[1] = list(ast.literal_eval(row[1]))  # '(r, g, b)' string -> list of RGB values
train_x = [row[0] for row in train_list]
train_y = [row[1] for row in train_list]

In [31]:
# For you to work in

In [32]:
# For you to work in

You will not be training on the robot data but on the difference between the robot and the silico mixer. Go ahead and make the difference list. Remember that in the end it should have the format `[[r0,g0,b0],[r1,g1,b1],...]`.

You might find numpy useful, as it makes the elementwise difference between numpy arrays easy to calculate:
`list(np.asarray(point)-np.asarray(point_sil))`
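Instead of looping point by point, NumPy can subtract two whole `(n, 3)` arrays at once. The sketch uses the first two robot/silico pairs from the printout of the Teacher cell below; with your own data, `robot_rgb` would be `train_y` and `silico_rgb` the list of silico outputs.

```python
import numpy as np

# First two training rows from the printout below: robot RGB and silico RGB.
robot_rgb = [[79, 18, 63], [78, 25, 68]]
silico_rgb = [[88.0, 26.0, 70.0], [74.5, 47.0, 57.5]]

# Elementwise difference over the whole (n, 3) array in one operation.
diff = np.asarray(robot_rgb, dtype=float) - np.asarray(silico_rgb, dtype=float)
train_y_diff_demo = diff.tolist()  # back to the [[r0, g0, b0], ...] list format
print(train_y_diff_demo)  # [[-9.0, -8.0, -7.0], [3.5, -22.0, 10.5]]
```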

In [33]:
# For you to work in

In [34]:
# Teacher
train_y_sil = []

for point in train_x:
    train_y_sil.append(list(mixer.run_cuvette(point)))
    
#print(train_y_sil)

train_y_diff = []
for point, point_sil in zip(train_y,train_y_sil):
    print(point, point_sil)
    train_y_diff.append(list(np.asarray(point)-np.asarray(point_sil)))

print(train_y_diff)

[79, 18, 63] [88.0, 26.0, 70.0]
[78, 25, 68] [74.5, 47.0, 57.5]
[21, 19, 82] [55.5, 71.5, 119.5]
[22, 24, 85] [55.5, 71.5, 119.5]
[35, 38, 96] [55.5, 71.5, 119.5]
[22, 26, 73] [48.5, 70.75, 85.75]
[29, 24, 67] [68.25, 48.375, 77.875]
[26, 42, 81] [35.25, 71.375, 72.375]
[28, 56, 110] [35.75, 93.875, 127.375]
[34, 45, 81] [54.75, 69.375, 65.375]
[30, 55, 80] [38.625, 81.9375, 89.6875]
[21, 31, 71] [48.5, 70.75, 85.75]
[23, 21, 66] [68.25, 48.375, 77.875]
[21, 45, 73] [35.25, 71.375, 72.375]
[18, 53, 99] [35.75, 93.875, 127.375]
[24, 40, 62] [54.75, 69.375, 65.375]
[17, 83, 119] [22.5, 94.5, 114.0]
[22, 84, 115] [22.454545454545457, 92.45454545454547, 109.0]
[17, 74, 107] [22.5, 94.5, 114.0]
[17, 55, 93] [28.454545454545453, 88.27272727272728, 110.0]
[17, 76, 108] [22.454545454545457, 92.45454545454547, 109.0]
[20, 84, 123] [22.545454545454547, 96.54545454545456, 119.0]
[17, 75, 90] [26.0, 92.0909090909091, 107.72727272727272]
[27, 67, 71] [34.5, 84.8125, 89.0]
[15, 54, 57] [36.9375, 84.

Train a neural network like you did the other day.

In [35]:
# For you to work in

In [36]:
# Teacher
mpl = MLPRegressor(solver='lbfgs', alpha=1e-5,
                   hidden_layer_sizes=(10,10,10, 3), random_state=1, max_iter=4000)

mpl.fit(train_x,train_y_diff)

MLPRegressor(alpha=1e-05, hidden_layer_sizes=(10, 10, 10, 3), max_iter=4000,
             random_state=1, solver='lbfgs')

You can now predict what the LEGO robot will yield as the sum of the silico mixer prediction and the error-correcting neural network. Check how well it does on both the dependent and independent test sets.
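The prediction step, and the mean/standard-deviation scoring used in the Teacher cells below, can be wrapped in two small functions. A sketch assuming `silico_fn` returns three RGB values for one input (like `mixer.run_cuvette`) and `correction_fn` takes a list of inputs and returns an `(n, 3)` array (like `MLPRegressor.predict`); both helper names are hypothetical.

```python
import numpy as np


def predict_robot(x, silico_fn, correction_fn):
    """Robot prediction = silico mixer output + learned error correction.

    silico_fn(x) returns three RGB values for one input (like mixer.run_cuvette);
    correction_fn(list_of_inputs) returns an (n, 3) array (like MLPRegressor.predict).
    """
    base = np.asarray(silico_fn(x), dtype=float)
    correction = np.asarray(correction_fn([x]), dtype=float)[0]
    return base + correction


def evaluate(xs, ys, silico_fn, correction_fn):
    """Mean and standard deviation of the Euclidean RGB error over a test set."""
    errors = [np.linalg.norm(np.asarray(y, dtype=float)
                             - predict_robot(x, silico_fn, correction_fn))
              for x, y in zip(xs, ys)]
    return float(np.mean(errors)), float(np.std(errors))
```

With the objects in this notebook, this would be called as `evaluate(test_dep_list_x, test_dep_list_y, mixer.run_cuvette, mpl.predict)`.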

In [37]:
# For you to work in

In [38]:
# For you to work in

In [39]:
# For you to work in

In [40]:
# For you to work in. Add cells as needed

In [41]:
# Teacher
import ast

test_dep_list = df_dep_test.values.tolist()
for row in test_dep_list:  # 'row' avoids overwriting the 'data' dataframe
    row[0] = ast.literal_eval(row[0])        # volumes string -> list
    row[1] = list(ast.literal_eval(row[1]))  # RGB string -> list
test_dep_list_x = [row[0] for row in test_dep_list]
test_dep_list_y = [row[1] for row in test_dep_list]

In [42]:
# Teacher
def root_sum_sqr_err(input_color1, input_color2):
    r_color, g_color, b_color = input_color1[0], input_color1[1], input_color1[2]
    r_color2, g_color2, b_color2 = input_color2[0], input_color2[1], input_color2[2]
    return ((r_color-r_color2)**2 +(g_color-g_color2)**2 + (b_color-b_color2)**2)**(0.5)

In [43]:
# Teacher
scores = []
for test_point, output in zip(test_dep_list_x,test_dep_list_y) :
    lego = output
    print(mpl.predict([test_point])[0])
    prediction = mixer.run_cuvette(test_point) + mpl.predict([test_point])[0]
    print(lego, prediction)
    scores.append(root_sum_sqr_err(lego, prediction))

[-21.92538823 -26.42699838 -19.76121006]
[32, 62, 79] [12.63711177 61.19800162 76.11378994]
[-14.76778956 -31.35103989 -12.18456002]
[19, 62, 107] [13.6867559  56.92168738 97.81543998]
[ -4.73991375 -20.02837381  -4.95400334]
[18, 60, 100] [ 17.80554079  76.51708074 114.04599666]
[-20.54343389 -25.90653551 -18.57160215]
[23, 62, 74] [14.78989944 59.76013116 72.42839785]
[-12.49122402 -29.62796063 -10.38565025]
[22, 52, 68] [26.13377598 52.30953937 79.30184975]
[-21.92538823 -26.42699838 -19.76121006]
[21, 49, 63] [12.63711177 61.19800162 76.11378994]
[-20.57821128 -26.90160076 -18.41914896]
[20, 64, 83] [ 8.22957472 63.2566764  84.27598782]
[-21.84404054 -32.25528912 -18.60295077]
[17, 51, 78] [11.53125261 52.95227967 81.56929894]
[-10.10391299 -21.13067595  -9.74189932]
[17, 62, 82] [ 12.82294937  73.07550969 103.49307463]
[-15.15144494 -31.5709668  -12.50080493]
[20, 44, 73] [13.63146997 56.49721749 96.84337339]
[ 11.26805577 -20.97862628  10.12219622]
[23, 53, 67] [33.26805577 51.02

In [44]:
#Teacher
print(scores)
print("Mean: ",np.mean(scores)," Standard deviation: ", np.std(scores))

[19.593234811150033, 11.763331359954698, 21.68275334819135, 8.654056671399173, 12.038094790533457, 19.766148580977962, 11.862697256151971, 6.816046422913283, 24.537052068660813, 27.66306335992089, 10.670128774144908, 31.980848443218687, 22.848440339630823, 16.19415122187797, 25.668305336719218, 6.129476360306565, 4.632926401863858, 9.084842229587904, 10.98659418927343]
Mean:  15.924852208761946  Standard deviation:  7.88423599167606


In [45]:
# Teacher
import ast

test_indep_list = df_indep_test.values.tolist()
for row in test_indep_list:  # 'row' avoids overwriting the 'data' dataframe
    row[0] = ast.literal_eval(row[0])        # volumes string -> list
    row[1] = list(ast.literal_eval(row[1]))  # RGB string -> list
test_indep_list_x = [row[0] for row in test_indep_list]
test_indep_list_y = [row[1] for row in test_indep_list]

In [46]:
# Teacher
scores = []
for test_point, output in zip(test_indep_list_x,test_indep_list_y) :
    lego = output
    prediction = mixer.run_cuvette(test_point) + mpl.predict([test_point])[0]
    print(lego, prediction)
    scores.append(root_sum_sqr_err(lego, prediction))

[54, 18, 45] [72.49738595 19.31881611 52.54932373]
[52, 19, 45] [69.59332245 21.18815777 53.46499654]
[54, 48, 50] [66.68401185 55.88549533 48.27833174]
[28, 39, 49] [32.01220081 48.25716747 42.94526111]
[24, 41, 58] [33.26805577 51.02137372 69.12219622]
[23, 47, 74] [ 17.72249665  74.46108138 109.01296807]
[22, 53, 83] [ 18.02699184  81.9997457  127.46740625]
[20, 60, 107] [ 18.63598223  97.07707433 164.37628261]
[19, 12, 41] [21.62020881 26.94818951 91.97851094]


In [47]:
#Teacher
print(scores)
print("Mean: ",np.mean(scores)," Standard deviation: ", np.std(scores))

[20.022109075569443, 19.64609873731534, 15.034272007187187, 11.766595425944802, 17.607613064673018, 44.80927320463964, 53.236456144439984, 68.32721120852129, 53.18949559639815]
Mean:  33.737680496076536  Standard deviation:  19.88021317220693


Are you satisfied with the results? Do you have an idea of how you could improve the predictive power? How many data points do you need in the training set to get good results? You can change the number of training points indirectly by changing the threshold in the `msk` used to split the data into the training set and the dependent test set. Do you think you can get by with fewer training points if you use a silico mixer with optimized color weights?
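One way to answer the question about training-set size is a learning curve: repeat the split for several values of the `msk` threshold and record the test error each time. A sketch, where `train_and_score` is a hypothetical function you would write to fit the error-correcting network on the training indices and return, for example, the mean dependent-test error. Reusing the same seed for every fraction makes each larger training set a superset of the smaller ones, so the curves are easier to compare.

```python
import numpy as np


def learning_curve(n, fractions, train_and_score, seed=0):
    """For each training fraction, split n rows with a seeded mask and record
    the value returned by train_and_score(train_idx, test_idx)."""
    results = {}
    for frac in fractions:
        rng = np.random.default_rng(seed)  # same draws for every fraction
        msk = rng.random(n) < frac
        results[frac] = train_and_score(np.flatnonzero(msk), np.flatnonzero(~msk))
    return results
```

Running the curve once with the plain silico mixer and once with a weighted one is a direct way to test whether optimized weights reduce the amount of training data needed.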
