# Dano's CORVO & TPOT notebook

In this notebook, I will try to use TPOT to assess which traditional ML algorithms would be useful for predicting cognitive performance from EEG data in Neurodoro.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn as sk
from os import walk
from os import listdir
from os.path import isfile, join
from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor

In [2]:
EPOCH_LENGTH = 440 # 2 seconds

In [3]:
# Data has been collected, let's import it

# I pruned this dataset so it only contains epochs for the first (dots) task
hubert_data = pd.read_csv("../muse-data/Hubert1Pruned.csv", header=0, index_col=False)


In [4]:
# Let's get our labels data set first because it's easier. We'll grab every 4th row from the Performance column

labels = hubert_data['Performance'].iloc[::4]

# Then we'll reindex the dataframe

labels = labels.reset_index().drop('index', axis=1)

# Convert to 1D array for TPOT

labels = np.array(labels).ravel()
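
A quick sanity check of the every-4th-row trick on toy data. This is only a sketch: it assumes the CSV stores each epoch as 4 consecutive rows (one per channel) with the same Performance value repeated on each, which is what the slicing above relies on. The `toy` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy long-format table: 3 epochs x 4 channels, Performance repeated per channel row.
toy = pd.DataFrame({
    "Channel": [1, 2, 3, 4] * 3,
    "Performance": np.repeat([10.0, 20.0, 30.0], 4),
})

# Every 4th row gives one label per epoch; to_numpy() yields the 1D array directly.
toy_labels = toy["Performance"].iloc[::4].to_numpy()

# Sanity check: each block of 4 rows should carry a single Performance value.
epoch_id = np.arange(len(toy)) // 4
assert (toy.groupby(epoch_id)["Performance"].nunique() == 1).all()

print(toy_labels)  # [10. 20. 30.]
```

If the same check fails on the real data, the rows are probably not grouped 4 per epoch and the labels would be misaligned with the features.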


In [5]:
# Separate data into 4 dataframes, 1 for each electrode

chan1 = hubert_data.loc[:,'Channel':'110 hz'].loc[hubert_data['Channel'] == 1,].reset_index(drop=True)
chan1.columns = np.arange(1000,1111)
chan2 = hubert_data.loc[:,'Channel':'110 hz'].loc[hubert_data['Channel'] == 2,].reset_index(drop=True)
chan2.columns = np.arange(2000,2111)
chan3 = hubert_data.loc[:,'Channel':'110 hz'].loc[hubert_data['Channel'] == 3,].reset_index(drop=True)
chan3.columns = np.arange(3000,3111)
chan4 = hubert_data.loc[:,'Channel':'110 hz'].loc[hubert_data['Channel'] == 4,].reset_index(drop=True)
chan4.columns = np.arange(4000,4111)


# Concat all channel-specific dataframes together so that row = 2s epoch
# columns = [electrode 1 FFT bins] + [electrode 2 FFT bins] + ...
# join_axes was removed in pandas 1.0; reindexing on chan1's index does the same job
training_data = pd.concat([chan1.iloc[:,1:], chan2.iloc[:,1:], chan3.iloc[:,1:], chan4.iloc[:,1:]], axis=1).reindex(chan1.index)
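
Here is the same long-to-wide reshaping on a tiny made-up table, in case the shape of the result is hard to picture. It is a sketch under assumptions: the toy frame has 3 FFT bins instead of 110, the bin column names are hypothetical, and I use string prefixes like `ch1_` instead of the numeric offsets above just to keep the columns readable.

```python
import numpy as np
import pandas as pd

# Toy long-format data: 2 epochs x 4 channels, 3 FFT bins (hypothetical column names).
bins = ["1 hz", "2 hz", "3 hz"]
toy = pd.DataFrame(np.arange(24).reshape(8, 3), columns=bins)
toy.insert(0, "Channel", [1, 2, 3, 4] * 2)

per_channel = []
for ch in (1, 2, 3, 4):
    # Keep only this channel's rows and renumber them 0..n_epochs-1.
    block = toy.loc[toy["Channel"] == ch, bins].reset_index(drop=True)
    # Prefix the bin names so columns stay unique after concatenation.
    block.columns = [f"ch{ch}_{name}" for name in bins]
    per_channel.append(block)

# One row per epoch, columns = channel 1 bins, then channel 2 bins, ...
toy_wide = pd.concat(per_channel, axis=1)
print(toy_wide.shape)  # (2, 12)
```

With 4 channels and 110 bins each, the same layout gives 440 columns per epoch.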

In [6]:
print(training_data.shape)
labels.shape

(152, 440)


(152,)

# Nice!

In [7]:
# Create a TPOTRegressor that will run for 10 generations

pipeline_optimizer = TPOTRegressor(generations=10, population_size=20, cv=5,
                                    random_state=42, verbosity=3, config_dict='TPOT light') 

# Fit this baby! Takes a long time to run

pipeline_optimizer.fit(training_data, labels)  
  
# See what kind of score we get (note: this scores the same data we trained on)
print(pipeline_optimizer.score(training_data, labels))

Optimization Progress:   0%|          | 0/220 [00:00<?, ?pipeline/s]

20 operators have been imported by TPOT.
_pre_test decorator: _generate: num_test=0 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False
_pre_test decorator: _generate: num_test=0 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False


Optimization Progress:   9%|▉         | 20/220 [01:55<15:52,  4.76s/pipeline]

_pre_test decorator: _random_mutation_operator: num_test=0 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  18%|█▊        | 39/220 [03:14<11:06,  3.68s/pipeline]

Generation 1 - Current Pareto front scores:
1	755.3908005139929	LassoLarsCV(input_matrix, LassoLarsCV__normalize=True)

_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required.
_pre_test decorator: _random_mutation_operator: num_test=0 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  26%|██▋       | 58/220 [03:21<02:21,  1.14pipeline/s]

Generation 2 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)

_pre_test decorator: _random_mutation_operator: num_test=0 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False
_pre_test decorator: _random_mutation_operator: num_test=1 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False
_pre_test decorator: _random_mutation_operator: num_test=2 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False
_pre_test decorator: _random_mutation_

Optimization Progress:  29%|██▉       | 64/220 [03:21<01:16,  2.04pipeline/s]

Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  35%|███▌      | 77/220 [03:25<00:41,  3.41pipeline/s]

Generation 3 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)



Optimization Progress:  36%|███▋      | 80/220 [03:26<00:45,  3.10pipeline/s]

Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  44%|████▍     | 97/220 [04:06<05:51,  2.85s/pipeline]

Generation 4 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)

_pre_test decorator: _random_mutation_operator: num_test=0 Automatic alpha grid generation is not supported for l1_ratio=0. Please supply a grid by providing your estimator with the appropriate `alphas=` argument.
_pre_test decorator: _random_mutation_operator: num_test=1 Automatic alpha grid generation is not supported for l1_ratio=0. Please supply a grid by providing your estimator with the appropriate `alphas=` argument.


Optimization Progress:  45%|████▌     | 99/220 [04:06<04:05,  2.03s/pipeline]

Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  52%|█████▏    | 115/220 [05:19<07:53,  4.51s/pipeline]

Generation 5 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)

_pre_test decorator: _random_mutation_operator: num_test=0 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False


Optimization Progress:  54%|█████▎    | 118/220 [05:20<05:29,  3.23s/pipeline]

_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  60%|██████    | 133/220 [06:57<07:01,  4.84s/pipeline]

Generation 6 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)
2	746.0309062484648	KNeighborsRegressor(LassoLarsCV(input_matrix, LassoLarsCV__normalize=True), KNeighborsRegressor__n_neighbors=61, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)



Optimization Progress:  62%|██████▏   | 137/220 [06:57<04:43,  3.42s/pipeline]

Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  68%|██████▊   | 150/220 [08:22<05:15,  4.51s/pipeline]

Generation 7 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)
2	746.0309062484648	KNeighborsRegressor(LassoLarsCV(input_matrix, LassoLarsCV__normalize=True), KNeighborsRegressor__n_neighbors=61, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)



Optimization Progress:  71%|███████▏  | 157/220 [08:24<03:26,  3.28s/pipeline]

Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  76%|███████▌  | 167/220 [08:28<01:09,  1.32s/pipeline]

Generation 8 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)
2	746.0309062484648	KNeighborsRegressor(LassoLarsCV(input_matrix, LassoLarsCV__normalize=True), KNeighborsRegressor__n_neighbors=61, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)



Optimization Progress:  80%|████████  | 176/220 [08:31<00:53,  1.21s/pipeline]

Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


Optimization Progress:  84%|████████▍ | 185/220 [08:37<00:32,  1.09pipeline/s]

Generation 9 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)
2	746.0309062484648	KNeighborsRegressor(LassoLarsCV(input_matrix, LassoLarsCV__normalize=True), KNeighborsRegressor__n_neighbors=61, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)



Optimization Progress:  87%|████████▋ | 192/220 [08:38<00:19,  1.42pipeline/s]

Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.
Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation.


                                                                              

Generation 10 - Current Pareto front scores:
1	752.4671636109712	KNeighborsRegressor(input_matrix, KNeighborsRegressor__n_neighbors=64, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)
2	746.0309062484648	KNeighborsRegressor(LassoLarsCV(input_matrix, LassoLarsCV__normalize=True), KNeighborsRegressor__n_neighbors=61, KNeighborsRegressor__p=DEFAULT, KNeighborsRegressor__weights=uniform)

638.398000679
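
To put those numbers in context, it may help to compare against a model that always predicts the mean. The sketch below assumes the Pareto front scores are mean squared errors (I believe that is TPOTRegressor's default metric, but check your version), and it runs on synthetic labels rather than the real `labels` array. `mean_baseline_mse` is a hypothetical helper, not part of TPOT, and the 75/25 split is an arbitrary choice.

```python
import numpy as np

def mean_baseline_mse(y_train, y_test):
    """MSE of always predicting the training-set mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    return float(np.mean((y_test - y_train.mean()) ** 2))

# Synthetic stand-in for `labels` (152 values); the real values are not reproduced here.
rng = np.random.default_rng(42)
fake_labels = rng.normal(loc=100.0, scale=25.0, size=152)

# Hold out the last 25% of rows (hypothetical split).
cut = int(len(fake_labels) * 0.75)
baseline = mean_baseline_mse(fake_labels[:cut], fake_labels[cut:])
print(f"mean-predictor MSE on held-out rows: {baseline:.1f}")
```

Running this with the real `labels` in place of `fake_labels` would show how far the best pipeline is from a do-nothing baseline. Keep in mind that the final 638.398 was computed on the training rows, so it is probably optimistic compared to the cross-validated scores.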


In [8]:
pipeline_optimizer.export('tpot_exported_pipeline1.py')