# TPOT

Group 18 Members:

- Clara Pichler, 11917694
- Hannah Knapp, 11901857 
- Sibel Toprakkiran, 09426341

### Overview

1. Data Sets

2. Evaluation of TPOT
   - Iris Dataset
   - Congressional Voting Dataset
   - Airfoil Dataset
   - Abalone Dataset


The evaluation of our own implementation and of AutoSklearn is carried out separately, in the notebooks `ML_A3_Group18.ipynb` and `AutoSklearn.ipynb`.

In [1]:
from tpot import TPOTRegressor, TPOTClassifier

from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report, mean_absolute_error, r2_score
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVC
import time
from sklearn.preprocessing import LabelEncoder
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before importing IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression, Lasso

## Data Sets

In [2]:
iris = datasets.load_iris()
iris_data = pd.DataFrame(data= np.c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])
label_encoder = LabelEncoder()
iris_data['target'] = label_encoder.fit_transform(iris_data['target'])

df_voting = pd.read_csv('data/CongressionalVotingID.shuf.lrn.csv')

df_airfoil = pd.read_csv("data/airfoil_noise_data.csv")

abalone_path = './data/abalone.csv'
column_names = ["Sex", "Length", "Diameter", "Height", "Whole_weight", "Shucked_weight", "Viscera_weight", "Shell_weight", "Rings"]
# header=0 replaces the file's own header row with the column names above
df_abalone = pd.read_csv(abalone_path, header=0, names=column_names)
df_abalone = df_abalone[df_abalone.Height != 0]

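# Opt in to pandas' future behaviour: replace() no longer silently downcasts dtypes
pd.set_option('future.no_silent_downcasting', True)
# Encode party labels and votes numerically; 'unknown' votes become NaN
df_voting = df_voting.replace({"democrat": 0, "republican": 1, "n": 0, "y": 1, "unknown": np.nan})
df_voting = df_voting.drop(columns=['ID'])

# Fill the missing votes by modelling each column from the other columns
imp = IterativeImputer(max_iter=10, random_state=0)
df_voting = pd.DataFrame(imp.fit_transform(df_voting), columns=df_voting.columns)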

# Zero-height rows were already dropped above; one-hot encode the categorical Sex column
df_abalone = pd.get_dummies(df_abalone, columns=['Sex'], drop_first=False)
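The voting preprocessing maps the party labels and votes to numbers and then fills the `unknown` votes with `IterativeImputer`. A minimal sketch of the same two steps on a small hypothetical frame (the column names `vote_a` and `vote_b` and all values are made up for illustration) shows that observed entries are kept as they are, while the imputed entries are generally continuous values rather than strict 0/1 votes:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the next import)
from sklearn.impute import IterativeImputer

# Hypothetical toy data in the same layout as the voting file (not the real data)
toy = pd.DataFrame({
    "class": ["democrat", "republican", "democrat", "republican", "democrat"],
    "vote_a": ["y", "n", "unknown", "n", "y"],
    "vote_b": ["n", "y", "y", "unknown", "n"],
})

# Same label/vote mapping as in the notebook; "unknown" becomes NaN
mapping = {"democrat": 0, "republican": 1, "n": 0, "y": 1, "unknown": np.nan}
encoded = toy.replace(mapping).astype(float)

# Each column with missing values is regressed on the other columns, round by round
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(encoded), columns=encoded.columns)
print(imputed)
```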



In [None]:
X_iris = iris_data.drop(['target'], axis=1)
y_iris = iris_data['target']

X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(X_iris, y_iris, test_size=0.7, random_state=42)

X_voting = df_voting.drop(['class'], axis=1)
y_voting = df_voting['class']

X_train_voting, X_test_voting, y_train_voting, y_test_voting = train_test_split(X_voting, y_voting, test_size=0.7, random_state=42)

X_airfoil = df_airfoil.drop(['y'], axis=1)
y_airfoil = df_airfoil['y']

X_train_airfoil, X_test_airfoil, y_train_airfoil, y_test_airfoil = train_test_split(X_airfoil, y_airfoil, test_size=0.7, random_state=42)

X_abalone_reg = df_abalone.drop(['Rings'], axis=1)
y_abalone_reg = df_abalone['Rings']

X_train_abalone_reg, X_test_abalone_reg, y_train_abalone_reg, y_test_abalone_reg = train_test_split(X_abalone_reg, y_abalone_reg, test_size=0.6, random_state=42)
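With `test_size=0.7`, only 30% of each dataset (40% for Abalone, which uses `test_size=0.6`) is available to TPOT's search. A quick check of the resulting split sizes on Iris, reusing the same `random_state=42`:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris(as_frame=True)
X, y = iris.data, iris.target

# Same split parameters as in the notebook
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7, random_state=42)

# scikit-learn rounds the test size up: ceil(0.7 * 150) = 105 test rows, 45 training rows
print(len(X_train), len(X_test))  # 45 105
```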

## Evaluation

In [7]:
classifier = TPOTClassifier(verbosity=2, population_size=50, generations=50, random_state=42)

reg = TPOTRegressor(verbosity=2, population_size=50, generations=50, random_state=42)
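According to the TPOT documentation, a run evaluates `population_size + generations * offspring_size` pipelines, where `offspring_size` defaults to `population_size`. The small helper below (hypothetical, not part of TPOT) makes the arithmetic explicit. The settings in this cell give 2550, which matches the Abalone progress bar. The 5050 shown for Iris, Voting and Airfoil would correspond to, for example, 100 generations at a population of 50, which suggests those runs used different settings than the cell shows; the notebook itself does not confirm this.

```python
def tpot_pipeline_budget(population_size, generations, offspring_size=None):
    """Total number of pipelines a TPOT run evaluates (hypothetical helper)."""
    # offspring_size=None means TPOT uses population_size for the offspring
    if offspring_size is None:
        offspring_size = population_size
    # Initial population plus one batch of offspring per generation
    return population_size + generations * offspring_size


print(tpot_pipeline_budget(50, 50))   # 2550, the settings in this cell
print(tpot_pipeline_budget(50, 100))  # 5050
```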

### Iris

In [8]:
classifier.fit(X_train_iris, y_train_iris)
print(classifier.score(X_test_iris, y_test_iris))
classifier.export('tpot_iris_pipeline.py')



Optimization Progress:   0%|          | 0/5050 [00:00<?, ?pipeline/s]


Generation 1 - Current best internal CV score: 1.0

Generation 2 - Current best internal CV score: 1.0

Generation 3 - Current best internal CV score: 1.0

Generation 4 - Current best internal CV score: 1.0

Generation 5 - Current best internal CV score: 1.0

Generation 6 - Current best internal CV score: 1.0

Generation 7 - Current best internal CV score: 1.0

Generation 8 - Current best internal CV score: 1.0

Generation 9 - Current best internal CV score: 1.0

Generation 10 - Current best internal CV score: 1.0

Generation 11 - Current best internal CV score: 1.0

Generation 12 - Current best internal CV score: 1.0

Generation 13 - Current best internal CV score: 1.0

Generation 14 - Current best internal CV score: 1.0

Generation 15 - Current best internal CV score: 1.0

Generation 16 - Current best internal CV score: 1.0

Generation 17 - Current best internal CV score: 1.0

Generation 18 - Current best internal CV score: 1.0

Generation 19 - Current best internal CV score: 1.0

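The "internal CV score" in these logs is the cross-validated score of the best pipeline found so far, computed on the training split only; for `TPOTClassifier` the defaults are 5-fold cross-validation scored by accuracy. A rough equivalent for a single, fixed model can be computed with scikit-learn's `cross_val_score`. The k-nearest-neighbours model here is only a stand-in, not a pipeline TPOT selected:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.7, random_state=42
)

# 5-fold CV accuracy on the training split only, similar to how TPOT scores each candidate
model = KNeighborsClassifier()
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(cv_scores.mean())

# Held-out accuracy, analogous to classifier.score(X_test, y_test)
test_accuracy = model.fit(X_train, y_train).score(X_test, y_test)
print(test_accuracy)
```

Note that a perfect CV score on only 45 training samples says little about generalisation; the held-out score is the more informative number.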

### Congressional Voting

In [9]:
classifier.fit(X_train_voting, y_train_voting)
print(classifier.score(X_test_voting, y_test_voting))
classifier.export('tpot_voting_pipeline.py')



Optimization Progress:   0%|          | 0/5050 [00:00<?, ?pipeline/s]


Generation 1 - Current best internal CV score: 0.9692307692307693

Generation 2 - Current best internal CV score: 0.9692307692307693

Generation 3 - Current best internal CV score: 0.9692307692307693

Generation 4 - Current best internal CV score: 0.9692307692307693

Generation 5 - Current best internal CV score: 0.9692307692307693

Generation 6 - Current best internal CV score: 0.9692307692307693

Generation 7 - Current best internal CV score: 0.9692307692307693

Generation 8 - Current best internal CV score: 0.9692307692307693

Generation 9 - Current best internal CV score: 0.9692307692307693

Generation 10 - Current best internal CV score: 0.9692307692307693

Generation 11 - Current best internal CV score: 0.9692307692307693

Generation 12 - Current best internal CV score: 0.9846153846153847

Generation 13 - Current best internal CV score: 0.9846153846153847

Generation 14 - Current best internal CV score: 0.9846153846153847

Generation 15 - Current best internal CV score: 0.984615
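The notebook imports `DummyClassifier` and `DummyRegressor` but does not use them in this section. A majority-class baseline is one way to put the Voting score into perspective. The sketch below wraps this in a hypothetical helper, `majority_baseline`, and exercises it on made-up labels; on the real data it could be called with the `X_train_voting`/`y_train_voting` and `X_test_voting`/`y_test_voting` splits:

```python
import numpy as np
from sklearn.dummy import DummyClassifier


def majority_baseline(X_train, y_train, X_test, y_test):
    """Accuracy of always predicting the most frequent training label (hypothetical helper)."""
    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_train, y_train)
    return baseline.score(X_test, y_test)


# Made-up labels: 6 of 10 training labels are 0, so the baseline always predicts 0
y_train = np.array([0] * 6 + [1] * 4)
y_test = np.array([0, 0, 0, 1, 1])
# DummyClassifier ignores the features, so placeholder zeros are enough here
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

print(majority_baseline(X_train, y_train, X_test, y_test))  # 0.6
```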

### Airfoil

In [None]:
reg.fit(X_train_airfoil, y_train_airfoil)
print(reg.score(X_test_airfoil, y_test_airfoil))
reg.export('tpot_airfoil_pipeline.py')



Optimization Progress:   0%|          | 0/5050 [00:00<?, ?pipeline/s]


Generation 1 - Current best internal CV score: -9.652584331750697

Generation 2 - Current best internal CV score: -9.652584331750697

Generation 3 - Current best internal CV score: -9.652584331750697

Generation 4 - Current best internal CV score: -8.216878030809218

Generation 5 - Current best internal CV score: -8.216878030809218

Generation 6 - Current best internal CV score: -7.56809293908942

Generation 7 - Current best internal CV score: -7.187945166659179

Generation 8 - Current best internal CV score: -7.187945166659179

Generation 9 - Current best internal CV score: -7.187945166659179

Generation 10 - Current best internal CV score: -7.167539276690086

Generation 11 - Current best internal CV score: -7.167539276690086

Generation 12 - Current best internal CV score: -6.780305887260747

Generation 13 - Current best internal CV score: -6.780305887260747

Generation 14 - Current best internal CV score: -6.728588159256153

Generation 15 - Current best internal CV score: -6.728588
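`TPOTRegressor` optimises `neg_mean_squared_error` by default, so the internal CV scores above are negated mean squared errors (higher is better), and `reg.score` reports the same metric on the test split. Converting a score to a root mean squared error puts the error back in the units of the target `y`:

```python
import math


def neg_mse_to_rmse(neg_mse):
    """Convert a scikit-learn style negated MSE into an RMSE."""
    # neg_mean_squared_error is -MSE, so flip the sign before taking the square root
    return math.sqrt(-neg_mse)


# Last complete internal CV score from the Airfoil log above
print(round(neg_mse_to_rmse(-6.728588159256153), 3))  # 2.594
```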

### Abalone

In [46]:
reg.fit(X_train_abalone_reg, y_train_abalone_reg)
print(reg.score(X_test_abalone_reg, y_test_abalone_reg))
reg.export('tpot_abalone_pipeline.py')



Optimization Progress:   0%|          | 0/2550 [00:00<?, ?pipeline/s]


Generation 1 - Current best internal CV score: -4.092582398780001

Generation 2 - Current best internal CV score: -4.092582398780001

Generation 3 - Current best internal CV score: -4.092582398780001

Generation 4 - Current best internal CV score: -4.092582398780001

Generation 5 - Current best internal CV score: -4.092582398780001

Generation 6 - Current best internal CV score: -4.092582398780001

Generation 7 - Current best internal CV score: -4.044693474183838

Generation 8 - Current best internal CV score: -4.044693474183838

Generation 9 - Current best internal CV score: -4.044693474183838

Generation 10 - Current best internal CV score: -4.044693474183838

Generation 11 - Current best internal CV score: -4.041146313278145

Generation 12 - Current best internal CV score: -4.038663822264688

Generation 13 - Current best internal CV score: -4.038663822264688

Generation 14 - Current best internal CV score: -4.038663822264688

Generation 15 - Current best internal CV score: -4.03833

Best pipeline: RandomForestRegressor(PCA(input_matrix, iterated_power=4, svd_solver=randomized), bootstrap=False, max_features=0.30000000000000004, min_samples_leaf=16, min_samples_split=15, n_estimators=100)
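TPOT prints pipelines as nested calls, with the innermost step applied first: here `PCA` transforms the features and `RandomForestRegressor` is fitted on its output. A sketch of the same pipeline in plain scikit-learn follows. It writes `max_features=0.30000000000000004` as `0.3`, on the assumption that the long decimal is floating-point noise from TPOT's stepped hyperparameter grid, and adds `random_state=42` for reproducibility. The synthetic data from `make_regression` only demonstrates that the pipeline runs; it is not the Abalone data.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

# Steps in execution order: PCA first, then the random forest on the PCA output
pipeline = make_pipeline(
    PCA(iterated_power=4, svd_solver="randomized", random_state=42),
    RandomForestRegressor(
        bootstrap=False,
        max_features=0.3,
        min_samples_leaf=16,
        min_samples_split=15,
        n_estimators=100,
        random_state=42,
    ),
)

# Synthetic stand-in data with 10 features (the one-hot encoded Abalone data also has 10)
X, y = make_regression(n_samples=300, n_features=10, noise=1.0, random_state=0)
pipeline.fit(X, y)
predictions = pipeline.predict(X[:5])
print(predictions.shape)  # (5,)
```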
