## Automated ML using TPOT in Python
+ Genetic Algorithms: TPOT searches for ML pipelines with an optimization technique modeled on natural selection (survival of the fittest)

#### Steps of Genetic Algorithms
+ Selection: score every candidate and keep the fittest
+ Crossover: combine pairs of the fittest candidates to breed a new generation
+ Mutation: randomly alter some of the offspring, then repeat the cycle until fitness stops improving or the generation budget runs out
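
The three steps above can be sketched on a toy problem. This is only an illustration under stated assumptions, not how TPOT itself is implemented: the "individuals" are bit lists, the fitness is simply the number of 1s, and every name and parameter below (`fitness`, `evolve`, `mutation_rate`, and so on) is made up for the example.

```python
import random

def fitness(bits):
    # Toy fitness (assumption): the more 1s, the fitter the individual
    return sum(bits)

def evolve(pop_size=20, n_bits=16, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            # Crossover: splice two parents at a random cut point
            cut = rng.randint(1, n_bits - 1)
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        # Parents survive unchanged, so the best fitness never decreases
        population = parents + children
    best = max(population, key=fitness)
    return initial_best, best

initial_best, best = evolve()
print("best fitness at start:", initial_best)
print("best fitness at end:  ", fitness(best))
```

TPOT applies the same loop to whole scikit-learn pipelines, using the cross-validation score as the fitness.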

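### Packages
+ pip install tpot

#### Dependencies
+ scikit-learn and NumPy; pip also pulls in further requirements such as DEAP and update-checker (see the install log below)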

In [1]:
!pip install tpot

Collecting tpot
  Downloading https://files.pythonhosted.org/packages/37/d8/719024ea20497eb6566ed5cc070e66e8c1e125e0e5d9966837cd00a3a83d/TPOT-0.11.2-py3-none-any.whl (76kB)
Collecting update-checker>=0.16
  Downloading https://files.pythonhosted.org/packages/d6/c3/aaf8a162df8e8f9d321237c7c0e63aff95b42d19f1758f96606e3cabb245/update_checker-0.17-py2.py3-none-any.whl
Collecting deap>=1.2
  Downloading https://files.pythonhosted.org/packages/0a/eb

In [0]:
# Load Pkgs
import pandas as pd 
from sklearn.model_selection import train_test_split

In [0]:
data_url = "https://raw.githubusercontent.com/Jcharis/Machine-Learning-Web-Apps/master/Iris-Species-Predictor-ML-Flask-App-With-Materialize.css/data/iris.csv"

In [0]:
df = pd.read_csv(data_url)

In [5]:
df.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [0]:
# Load Tpot
import tpot

In [7]:
# Methods and Attributes
dir(tpot)

['TPOTClassifier',
 'TPOTRegressor',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_version',
 'base',
 'builtins',
 'config',
 'decorators',
 'driver',
 'export_utils',
 'gp_deap',
 'gp_types',
 'main',
 'metrics',
 'operator_utils',
 'tpot']

In [11]:
df.shape


(150, 5)

In [12]:
df.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

In [0]:
# Map each species name to an integer label
d = {value:index for index, value in enumerate(df['species'].unique())}

In [52]:
d

{'setosa': 0, 'versicolor': 1, 'virginica': 2}

In [0]:

df['new_label'] = df['species'].map(d)

In [54]:
df['new_label']

0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: new_label, Length: 150, dtype: int64

In [0]:
xfeatures = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
ylabels = df['new_label']

In [0]:
# Split Our Dataset
x_train, x_test, y_train, y_test = train_test_split(xfeatures, ylabels, test_size=0.3, random_state=42)

In [58]:
x_train.shape

(105, 4)
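
The split above does not pass `stratify`, so the class proportions in the train and test sets depend on the shuffle. A minimal sketch of a stratified variant follows. As an assumption for the example, scikit-learn's bundled `load_iris` stands in for the CSV used in this notebook, and the names `X_tr`, `X_te`, `y_tr`, `y_te` are illustrative.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris data stands in for the notebook's CSV
X, y = load_iris(return_X_y=True)

# stratify=y keeps the three species in equal proportion in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

train_counts = Counter(y_tr.tolist())
print(X_tr.shape, X_te.shape)
print(train_counts)
```

With only 150 rows, stratifying is a cheap way to make sure no species is under-represented in the held-out set.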

#### AutoML with TPOT: search for the best algorithm and hyperparameters

In [0]:
from tpot import TPOTClassifier

In [0]:
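# Note: the variable name `tpot` shadows the `tpot` module imported earlier.
# random_state is left unset, so each run may find a different pipeline.
tpot = TPOTClassifier(generations=5, verbosity=2)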


In [61]:
# Fit our Data
tpot.fit(x_train, y_train)


[Progress bar: Optimization Progress, max=600]

Generation 1 - Current best internal CV score: 0.9619047619047618
Generation 2 - Current best internal CV score: 0.9714285714285715
Generation 3 - Current best internal CV score: 0.9714285714285715
Generation 4 - Current best internal CV score: 0.9714285714285715
Generation 5 - Current best internal CV score: 0.980952380952381

Best pipeline: RandomForestClassifier(MultinomialNB(input_matrix, alpha=1.0, fit_prior=False), bootstrap=True, criterion=gini, max_features=1.0, min_samples_leaf=7, min_samples_split=16, n_estimators=100)


TPOTClassifier(config_dict=None, crossover_rate=0.1, cv=5,
               disable_update_check=False, early_stop=None, generations=5,
               log_file=<ipykernel.iostream.OutStream object at 0x7f9422d68320>,
               max_eval_time_mins=5, max_time_mins=None, memory=None,
               mutation_rate=0.9, n_jobs=1, offspring_size=None,
               periodic_checkpoint_folder=None, population_size=100,
               random_state=None, scoring=None, subsample=1.0, template=None,
               use_dask=False, verbosity=2, warm_start=False)
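
The parameters printed above determine how much work the search does. TPOT's documentation describes the total number of evaluated pipelines as `population_size + generations × offspring_size`, with `offspring_size` defaulting to `population_size`. The helper below is a hypothetical function written for this note, not part of TPOT; it simply works through that arithmetic for the settings used here.

```python
def total_pipeline_evaluations(population_size, generations, offspring_size=None):
    # Hypothetical helper: offspring_size=None falls back to population_size
    if offspring_size is None:
        offspring_size = population_size
    # Initial population plus one batch of offspring per generation
    return population_size + generations * offspring_size

# Settings from the TPOTClassifier repr above
total = total_pipeline_evaluations(population_size=100, generations=5)
print(total)  # 600
```

Each of those pipelines is scored with 5-fold cross-validation (`cv=5`), which is why even a small dataset takes a while to fit.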

In [0]:
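import numpy as np
# One sample, in training column order: sepal_length, sepal_width, petal_length, petal_width
# reshape(1,-1) turns the 1-D array into a single-row 2-D array, as predict() expects
ex2 = np.array([6.2,3.4,5.4,2.3]).reshape(1,-1)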

In [63]:
# Prediction
tpot.predict(ex2)

array([2])

In [64]:
d

{'setosa': 0, 'versicolor': 1, 'virginica': 2}
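
To turn a predicted integer back into a species name without reading the dictionary by eye, the mapping can be inverted. A small sketch, where `inverse_d`, `predicted` and `names` are names introduced just for this example:

```python
# Same mapping as `d` in the notebook
d = {'setosa': 0, 'versicolor': 1, 'virginica': 2}

# Invert it: integer label -> species name
inverse_d = {index: name for name, index in d.items()}

# The prediction above was array([2]); a plain list is used here
predicted = [2]
names = [inverse_d[i] for i in predicted]
print(names)  # ['virginica']
```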

In [65]:
# Check the accuracy
print(tpot.score(x_test, y_test))


1.0
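
Since `scoring=None` was used, the score reported here should be plain accuracy: the fraction of test rows classified correctly. A score of 1.0 therefore means all 45 held-out rows (150 minus the 105 training rows) were predicted correctly. The sketch below shows the calculation on made-up labels; the function `accuracy` and the demo lists are illustrative only.

```python
def accuracy(y_true, y_pred):
    # Fraction of positions where the prediction matches the true label
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels for illustration: 4 of 5 predictions are right
y_true_demo = [0, 1, 2, 2, 1]
y_pred_demo = [0, 1, 2, 1, 1]
demo_score = accuracy(y_true_demo, y_pred_demo)
print(demo_score)  # 0.8
```

With a test set this small, a perfect score is encouraging but not conclusive; the internal CV score of about 0.98 is probably the more cautious estimate.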


In [0]:
# Export the best pipeline as a standalone Python script
tpot.export('tpot_iris_pipeline.py')

Contents of the exported `tpot_iris_pipeline.py`:
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline, make_union
from tpot.builtins import StackingEstimator

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=None)

# Average CV score on the training set was: 0.980952380952381
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=MultinomialNB(alpha=1.0, fit_prior=False)),
    RandomForestClassifier(bootstrap=True, criterion="gini", max_features=1.0, min_samples_leaf=7, min_samples_split=16, n_estimators=100)
)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

```
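
The exported script expects an all-numeric data file whose outcome column is named `target`. One way to prepare such a file from the notebook's data is sketched below. As assumptions for the example, three illustrative rows stand in for `df`, and an in-memory buffer stands in for a real file path; `demo`, `export_ready`, `buffer` and `reloaded` are names made up here.

```python
import io

import pandas as pd

# Three illustrative rows standing in for the notebook's `df`
demo = pd.DataFrame({
    'sepal_length': [5.1, 7.0, 6.3],
    'sepal_width': [3.5, 3.2, 3.3],
    'petal_length': [1.4, 4.7, 6.0],
    'petal_width': [0.2, 1.4, 2.5],
    'species': ['setosa', 'versicolor', 'virginica'],
    'new_label': [0, 1, 2],
})

# Drop the text column and rename the integer label to 'target'
export_ready = demo.drop(columns=['species']).rename(columns={'new_label': 'target'})

# Write and re-read as the exported script would (buffer replaces a file path)
buffer = io.StringIO()
export_ready.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, sep=',', dtype='float64')
print(reloaded.columns.tolist())
```

Dropping `species` matters because the exported script reads every column as `float64`, which would fail on the text labels.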

In [0]:
# Thanks For Watching
# Jesus Saves @JCharisTech
# By Jesse E. Agbe (JCharis)