# Exercise: Error correcting Neural Networks

This is a rather long exercise where you will put most of what you have learned during the exercises to good use. The majority of it can be done without use of the LEGO robot so you can finish on your own time if you do not finish today. 

During this exercise we will train an error-correcting neural network. The method is similar to that employed by [Anatole v. Lilienfeld's group for electronic structure calculations of molecules](https://pubs.acs.org/doi/10.1021/acs.jctc.5b00099) ([arxiv](https://arxiv.org/abs/1503.04987)). Generating data is rather expensive with the LEGO robots, and we cannot mix data from all 4 robots without introducing significant noise. We thus have to make the most of the precious data we have.

You will:
 * Change the SilicoColorMixer model to be as similar to the LEGO robot as possible
 * Train a neural network to predict the error between the updated SilicoColorMixer and the AiLEGO_master (the robot) color mixer
 * Test performance 

Import all the modules and functions you think you will be needing today.

In [1]:
# import statements here

In [2]:
# Teacher
import numpy as np
from IPython import display
import matplotlib.pyplot as plt
from plot_pie_charts import make_piechart_plot
from silico_color_mixer import SilicoColorMixer
from sklearn.neural_network import MLPRegressor
from scipy.optimize import minimize

We start by making sure you can actually do the last step. Right now, most of your robot data is correlated because it originates from optimization runs. We want to use some of it as a test dataset (the dependent test set), but we also want a test dataset that is not correlated with the optimization runs. Run the robot to generate 9 data points where you decide the input list, like you did on the first robot day. This will be your independent test set. Use the cells below. You have done this before and can do it again, this time unassisted.

In [3]:
# for you to generate independent test data set on robot

In [4]:
# for you to generate independent test data set on robot

In [5]:
# for you to generate independent test data set on robot. Add more cells as needed

## Change SilicoColorMixer

Change the color mixer such that it is as similar to the LEGO robot as possible without changing the source code. By making sure the `SilicoColorMixer` and the robot are as similar as possible, you can use a smaller neural network to predict the error and need less training data.

First, let's change the rgb codes of the colors used in the `SilicoColorMixer`. If you cannot remember how to do this, go back and reread the documentation of the `SilicoColorMixer` or open the DocString by initializing and pressing `Shift`+`Tab`.

In [6]:
from silico_color_mixer import SilicoColorMixer

Initialize a mixer. We already have plenty of noise in our LEGO data, so you can turn the mixer's noise off. Change the colors such that the color codes are those of the pure colors you have obtained with your robot. You should have those in your robot data. Pay attention to the input formats. You can find examples in the DocString.

In [7]:
# Cell for you to change colors used in SilicoColorMixer and initialize the mixer.

In [8]:
# Cell for you to change colors used in SilicoColorMixer and initialize the mixer. Add more as needed.

In [9]:
# Teacher
rgb_codes = {'red': [88, 26, 70],   # 
             'green': [22, 72, 59],
             'blue': [23, 117, 169],
             'yellow': [61, 68, 45]}

In [10]:
# Teacher
mixer = SilicoColorMixer(colors=['red', 'green', 'blue', 'yellow'], color_codes=rgb_codes, noise=False)

Try out your newly updated silico mixer to make sure it works.

In [11]:
# Cell for you to try out the mixer. Add more cells as needed.

In [12]:
# Teacher
mixer.run_cuvette([1., 1., 1., 1.,])

(48.5, 70.75, 85.75)
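
The output above is consistent with the mixer returning a normalized weighted average of the pure-color rgb codes. The sketch below is a minimal NumPy stand-in written under that assumption; `weighted_average_mix` is a hypothetical helper, not part of `silico_color_mixer`, and the real class may do more than this.

```python
import numpy as np

# Pure-color rgb codes, copied from the teacher cell above.
rgb_codes = {'red': [88, 26, 70],
             'green': [22, 72, 59],
             'blue': [23, 117, 169],
             'yellow': [61, 68, 45]}

def weighted_average_mix(amounts, codes=rgb_codes):
    """Hypothetical stand-in: normalized weighted average of the pure colors."""
    amounts = np.asarray(amounts, dtype=float)
    colors = np.asarray(list(codes.values()), dtype=float)  # shape (4, 3)
    mixed = amounts @ colors / amounts.sum()
    return tuple(float(v) for v in mixed)

print(weighted_average_mix([1., 1., 1., 1.]))  # (48.5, 70.75, 85.75)
```

If your own mixer gives a different result for the same input and color codes, the assumption does not hold for your setup and the DocString is the place to look.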

## Optimization of weights

During the robot runs you might have noticed that the different colors have different strengths. A little blue is very potent, while you need a lot of yellow for the outcome to get a yellow tone. This difference in strength is also reflected in the measured rgb color codes. Update your silico model to reflect this by adding a weight to each color input.

It can be done using a wrapper function as the one below.

In [13]:
def run_cuvette_w_weights(colors_input):
    w_r, w_g, w_b, w_y = 1., 1., 1., 1.
    weighted_colors = [colors_input[0] * w_r,
                       colors_input[1] * w_g,
                       colors_input[2] * w_b,
                       colors_input[3] * w_y]
    return mixer.run_cuvette(weighted_colors)

In [14]:
run_cuvette_w_weights([1.,1.,1.,1.])

(48.5, 70.75, 85.75)

In the above example, all the weights (`w_r` etc.) are `1.`. You need to optimize those. Go through your robot data and find a data point with input `[0.25, 0.25, 0.25, 0.25]`. You should all have at least one of those. Reinitialize your silico mixer, giving the rgb code you obtained from that data point as the target. Based on this single data point and what you have learned so far, optimize the weights. You will need to write a new wrapper function that takes the weights as input.

If you look closer at the `[0.25, 0.25, 0.25, 0.25]` data point, you will likely find that the values in the rgb code are so low that they cannot possibly be achieved by a linear combination of the pure colors. Life is not linear, and when you physically mix all four colors they counteract each other. The resulting color is a bluish, semitransparent grey with a low rgb signal. Your fit will thus not be all that great.
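
One way to check the claim that a target cannot be reached by mixing is to solve for the mixing fractions directly. The sketch below assumes the silico mixer is a normalized weighted average of the pure colors (consistent with the outputs shown earlier, but not confirmed from its source code) and uses the teacher's example color codes and target `(38, 46, 69)`; substitute your own numbers. A negative fraction means the target lies outside what any physical mixture of these four colors can produce in the linear model.

```python
import numpy as np

pure = np.array([[88, 26, 70],                  # red
                 [22, 72, 59],                  # green
                 [23, 117, 169],                # blue
                 [61, 68, 45]], dtype=float)    # yellow
target = np.array([38, 46, 69], dtype=float)    # example robot measurement

# Three equations: pure.T @ w = target, plus one: the fractions sum to 1.
A = np.vstack([pure.T, np.ones(4)])
b = np.append(target, 1.0)
w = np.linalg.solve(A, b)

print(dict(zip(['red', 'green', 'blue', 'yellow'], np.round(w, 3))))
reachable = bool(np.all(w >= 0))
print('reachable by linear mixing:', reachable)
```

For these example numbers the blue and yellow fractions come out negative, which matches the observation that the optimizer cannot drive the score to zero.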

In [15]:
# Cell for you to work in.

In [16]:
# Cell for you to work in.

In [17]:
# Cell for you to work in.

In [18]:
# Cell for you to work in. Add more as needed

In [19]:
# Teacher
mixer = SilicoColorMixer(colors=['red', 'green', 'blue', 'yellow'], color_codes=rgb_codes, noise=False, target = (38, 46, 69))

In [20]:
# Teacher
def root_sum_sqr_err(input1, input2):
    """Square root of the summed squared differences (Euclidean distance) between two colors"""
    dif = np.asarray(input1) - np.asarray(input2)
    return sum(dif**2)**0.5

In [21]:
# Teacher
# wrapper function
def run_cuvette_w_weights(weights):
    weighted_colors = [weights[0] * 0.25,
                       weights[1] * 0.25,
                       weights[2] * 0.25,
                       weights[3] * 0.25]
    return mixer.run_cuvette(weighted_colors, read_target=True)

In [22]:
run_cuvette_w_weights([1.0, 1.0, 1.0, 1.0])

((48.5, 70.75, 85.75), (38, 46, 69))

In [23]:
# Teacher
def find_weights(weights):
    color, target = run_cuvette_w_weights(weights)
    score = root_sum_sqr_err(color, target)
    print('color:', color, 'target:', target, 'input:', weights, 'score:', score)
    return score

In [24]:
# Teacher
res = minimize(find_weights, [1.0, 1.0, 1.0, 1.0], method='L-BFGS-B',
               bounds = 4*[[0.0, 10.0]], 
               options={'disp': True, 'eps': 0.1, 'maxiter': 100, 'gtol': 0.1, 'maxfun':10000})
res

color: (48.5, 70.75, 85.75) target: (38, 46, 69) input: [1. 1. 1. 1.] score: 31.676095087620887
color: (49.46341463414635, 69.65853658536587, 85.36585365853661) target: (38, 46, 69) input: [1.1 1.  1.  1. ] score: 30.967360145935785
color: (47.85365853658537, 70.78048780487806, 85.09756097560977) target: (38, 46, 69) input: [1.  1.1 1.  1. ] score: 31.149616879936207
color: (47.878048780487816, 71.87804878048782, 87.78048780487806) target: (38, 46, 69) input: [1.  1.  1.1 1. ] score: 33.46574335920506
color: (48.804878048780495, 70.6829268292683, 84.75609756097562) target: (38, 46, 69) input: [1.  1.  1.  1.1] score: 31.212928040461264
color: (59.700544231415705, 52.256772172445494, 59.50628288854068) target: (38, 46, 69) input: [8.08734942 6.26478208 0.         5.63167047] score: 24.498805737217747
color: (59.84145109664713, 52.126036108537775, 59.558532542911856) target: (38, 46, 69) input: [8.18734942 6.26478208 0.         5.63167047] score: 24.57068603343405
color: (59.512828059464

      fun: 13.515241749405648
 hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
      jac: array([-0.05548064,  0.08144641, -0.06463485,  1.74765638])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 55
      nit: 9
   status: 0
  success: True
        x: array([4.92653137, 8.82270336, 0.14603506, 0.        ])

Do the weights you find during optimization make sense? How sensitive are these weights to changes in the input colors and the target color? 

The noise in our robot data is too large for us to improve the silico model by incorporating weights into it. We would need more data points and probably also a more advanced model than simply adding weights. Let's see if we can close the gap with an error-correcting neural network rather than attempting to improve the silico model further.

## Error correcting neural network

The other day you trained a neural network to replicate the silico mixer. Today is slightly different, as you will need it to predict the error between the robot and the silico mixer. Start by getting your data from the robot and splitting it into a train set, a dependent test set, and an independent test set.

Start by copying the robot data file to your own data folder. It cannot easily be done from within the notebook so make a new login to gbar. From gbar run the below line. Replace X with your robot number.

Use pandas to load in the data. It might perhaps be a bit overkill, but pandas is a great tool for you to get an introduction to if you don't already know it.

In [25]:
try:
    import pandas as pd
except ImportError:
    # pandas is not installed yet: install it for this user and retry
    !pip3 install pandas --user --upgrade
    import pandas as pd

You can then load in the data with the cell below

In [26]:
df = pd.read_csv('~/47332/data/datafile.csv', sep=';')
print(df)

FileNotFoundError: [Errno 2] File b'C:\\Users\\runch/47332/data/datafile.csv' does not exist: b'C:\\Users\\runch/47332/data/datafile.csv'

In [None]:
data = df[['Vcolors','RGB']]
print(data)

Split the independent data you created from the dataframe. If it is the last 9 points, use the cell below. Otherwise you have to change the indices.

In [None]:
df1 = data.iloc[:-9, :]
df_indep_test = data.iloc[-9:, :]

# In case there are n points after the independent set
# n = 3
# df11 = data.iloc[:-9-n, :]
# df_indep_test = data.iloc[-9-n:-n, :]
# df12 = data.iloc[-n:, :]
# df1 = pd.concat([df11, df12])

Remove (on average) 20 percent of the points in the remaining data. These points will be the dependent test data.

In [None]:
msk = np.random.rand(len(df1)) < 0.8
df_train = df1[msk]
df_dep_test = df1[~msk]
print(len(df_train))
print(len(df_dep_test))
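
Because the mask is drawn at random, the sizes of the two sets change every time the cell runs, which makes results hard to compare between runs. A minimal sketch of a reproducible mask, assuming a seeded NumPy generator is acceptable for your purposes; the seed value and `n_points` are placeholders (use `len(df1)` in the notebook):

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, fixed for reproducibility
n_points = 50                         # placeholder for len(df1)
msk = rng.random(n_points) < 0.8      # True -> training point

n_train = int(msk.sum())
n_dep_test = int((~msk).sum())
print(n_train, n_dep_test)
```

The resulting `msk` can be used exactly like the one above, i.e. `df1[msk]` and `df1[~msk]`.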

Now, turn the dataframes into lists. You can use `df_train.values.tolist()` and similar.

In [None]:
train_list = df_train.values.tolist()
# Each cell is stored as a string, so parse it back into numbers
for row in train_list:
    row[0] = eval(row[0])
    row[1] = list(eval(row[1]))
# Inputs have 4 entries and outputs 3, so split per row instead of
# building one (ragged) numpy array from both
train_x = [row[0] for row in train_list]
train_y = [row[1] for row in train_list]
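
If the `Vcolors` and `RGB` cells hold plain Python literals (lists or tuples of numbers), `ast.literal_eval` from the standard library parses them without executing arbitrary code. This is a sketch under that assumption; the rows below are made-up examples in the layout of `df_train.values.tolist()`, since the actual file contents are not shown here.

```python
import ast

# Hypothetical rows: [Vcolors string, RGB string]
rows = [['[0.25, 0.25, 0.25, 0.25]', '(38, 46, 69)'],
        ['[1.0, 0.0, 0.0, 0.0]', '(88, 26, 70)']]

# Parse each string into a list of numbers
xs = [list(ast.literal_eval(vcolors)) for vcolors, _ in rows]
ys = [list(ast.literal_eval(rgb)) for _, rgb in rows]
print(xs)
print(ys)
```

`literal_eval` raises a `ValueError` on anything that is not a literal, so if it fails on your file the cells contain something else and `eval` may be what the data format requires.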

In [None]:
# For you to work in

In [None]:
# For you to work in

You will not be training to the robot data but to the difference between the robot and the silico mixer. Go ahead and make the difference list. Remember that in the end it should have the format `[[r0,g0,b0],[r1,g1,b1],...]`.

You might find numpy useful, as it makes it easy to calculate the elementwise difference between arrays.
`list(np.asarray(point)-np.asarray(point_sil))`
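
The same subtraction can also be done for all points at once. A minimal sketch with toy numbers: the first silico row is the mixer output shown earlier and the first robot row is the example target, while the second robot row is made up for illustration.

```python
import numpy as np

# Rows are data points, columns are r, g, b
robot = np.array([[38, 46, 69],
                  [80, 30, 66]], dtype=float)        # second row is made up
silico = np.array([[48.5, 70.75, 85.75],
                   [88, 26, 70]], dtype=float)

# Elementwise difference, converted back to a nested list
diff = (robot - silico).tolist()
print(diff)  # [[-10.5, -24.75, -16.75], [-8.0, 4.0, -4.0]]
```

The sign convention matters: with robot minus silico, adding the network's prediction to the silico output later gives an estimate of the robot value.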

In [None]:
# For you to work in

In [None]:
# Teacher
train_y_sil = []

for point in train_x:
    train_y_sil.append(list(mixer.run_cuvette(point)))
    
#print(train_y_sil)

train_y_diff = []
for point, point_sil in zip(train_y,train_y_sil):
    print(point, point_sil)
    train_y_diff.append(list(np.asarray(point)-np.asarray(point_sil)))

print(train_y_diff)

Train a neural network like you did the other day.

In [None]:
# For you to work in

In [None]:
# Teacher
mpl = MLPRegressor(solver='lbfgs', alpha=1e-5,
                   hidden_layer_sizes=(10,10,10, 3), random_state=1, max_iter=4000)

mpl.fit(train_x,train_y_diff)

You can now predict what the LEGO robot will yield as the sum of the silico mixer prediction and the error-correcting neural network's output. See how well it does on both the dependent and independent test sets.
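
One way to organize the evaluation is a small helper that takes the two models as functions, so the same code serves both test sets. This is only a sketch: `corrected_prediction` and `evaluate` are hypothetical names, and the two lambdas are toy stand-ins so the block runs on its own.

```python
import numpy as np

def corrected_prediction(x, silico_fn, correction_fn):
    """Silico prediction plus the predicted robot-minus-silico error."""
    return (np.asarray(silico_fn(x), dtype=float)
            + np.asarray(correction_fn(x), dtype=float))

def evaluate(xs, ys, silico_fn, correction_fn):
    """Mean and standard deviation of the Euclidean rgb error."""
    errors = [np.linalg.norm(np.asarray(y, dtype=float)
                             - corrected_prediction(x, silico_fn, correction_fn))
              for x, y in zip(xs, ys)]
    return float(np.mean(errors)), float(np.std(errors))

# Toy stand-ins for the real mixer and trained network
silico_fn = lambda x: (48.5, 70.75, 85.75)
correction_fn = lambda x: [-10.5, -24.75, -16.75]

mean_err, std_err = evaluate([[0.25, 0.25, 0.25, 0.25]], [[38, 46, 69]],
                             silico_fn, correction_fn)
print(mean_err, std_err)
```

In the notebook you would presumably pass `mixer.run_cuvette` as `silico_fn` and something like `lambda x: mpl.predict([x])[0]` as `correction_fn`, where `mpl` is your trained regressor.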

In [None]:
# For you to work in

In [None]:
# For you to work in

In [None]:
# For you to work in

In [None]:
# For you to work in. Add cells as needed

In [None]:
# Teacher
test_dep_list = df_dep_test.values.tolist()
# Parse the string cells back into numbers
for row in test_dep_list:
    row[0] = eval(row[0])
    row[1] = list(eval(row[1]))
# Split per row to avoid building a ragged numpy array
test_dep_list_x = [row[0] for row in test_dep_list]
test_dep_list_y = [row[1] for row in test_dep_list]

In [None]:
# Teacher
def root_sum_sqr_err(input_color1, input_color2):
    r_color, g_color, b_color = input_color1[0], input_color1[1], input_color1[2]
    r_color2, g_color2, b_color2 = input_color2[0], input_color2[1], input_color2[2]
    return ((r_color-r_color2)**2 +(g_color-g_color2)**2 + (b_color-b_color2)**2)**(0.5)

In [None]:
# Teacher
scores = []
for test_point, output in zip(test_dep_list_x,test_dep_list_y) :
    lego = output
    prediction = mixer.run_cuvette(test_point) + mpl.predict([test_point])[0]
    print(mpl.predict([test_point])[0])
    print(lego, prediction)
    scores.append(root_sum_sqr_err(lego, prediction))

In [None]:
#Teacher
print(scores)
print("Mean: ",np.mean(scores)," Standard deviation: ", np.std(scores))

In [None]:
# Teacher
test_indep_list = df_indep_test.values.tolist()
# Parse the string cells back into numbers
for row in test_indep_list:
    row[0] = eval(row[0])
    row[1] = list(eval(row[1]))
# Split per row to avoid building a ragged numpy array
test_indep_list_x = [row[0] for row in test_indep_list]
test_indep_list_y = [row[1] for row in test_indep_list]

In [None]:
# Teacher
scores = []
for test_point, output in zip(test_indep_list_x,test_indep_list_y) :
    lego = output
    prediction = mixer.run_cuvette(test_point) + mpl.predict([test_point])[0]
    print(lego, prediction)
    scores.append(root_sum_sqr_err(lego, prediction))

In [None]:
#Teacher
print(scores)
print("Mean: ",np.mean(scores)," Standard deviation: ", np.std(scores))

Are you satisfied with the results? Do you have an idea as to how you could improve the predictive power? How many data points do you need in the training set to get good results? You can change the number of training points indirectly by changing the threshold (`0.8`) in the `msk` expression used to split the data into the train set and the dependent test set. Do you think you can get by with fewer training points if you use a silico mixer with optimized color weights?
