**TPOT AutoML Implementation**

**Install Dependencies**

In [1]:
%pip install tpot

Collecting tpot
  Downloading TPOT-0.12.2-py3-none-any.whl.metadata (2.0 kB)
Collecting deap>=1.2 (from tpot)
  Downloading deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting update-checker>=0.16 (from tpot)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Collecting stopit>=1.1.1 (from tpot)
  Downloading stopit-1.1.2.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Downloading TPOT-0.12.2-py3-none-any.whl (87 kB)
Downloading deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (135 kB)
Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Building wheel

**Loading & Splitting Data**

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**Initializing TPOT**

In [3]:
from tpot import TPOTClassifier

# Initialize TPOT
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42, cv=5, n_jobs=-1, config_dict='TPOT light')

**Training and Evaluation**

In [4]:
# Fit TPOT on training data
tpot.fit(X_train, y_train)

# Evaluate the best model on the test set
print(f"Test Accuracy: {tpot.score(X_test, y_test)}")

Optimization Progress:   0%|          | 0/120 [00:00<?, ?pipeline/s]


Generation 1 - Current best internal CV score: 0.9583333333333334

Generation 2 - Current best internal CV score: 0.9583333333333334

Generation 3 - Current best internal CV score: 0.9583333333333334

Generation 4 - Current best internal CV score: 0.9583333333333334

Generation 5 - Current best internal CV score: 0.9666666666666668

Best pipeline: MultinomialNB(input_matrix, alpha=10.0, fit_prior=False)
Test Accuracy: 0.9666666666666667
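
The `0/120` total shown in the progress bar follows from the constructor arguments: TPOT evaluates `population_size + generations * offspring_size` pipelines, and `offspring_size` defaults to `population_size` when it is not set. The helper below is a hypothetical illustration of that arithmetic (the function name is not part of TPOT), not a call into the library:

```python
def tpot_total_evaluations(population_size, generations, offspring_size=None):
    """Return how many pipelines TPOT evaluates for the given settings.

    Hypothetical helper for illustration; it only reproduces the arithmetic.
    """
    # When offspring_size is not given, TPOT uses population_size for it.
    if offspring_size is None:
        offspring_size = population_size
    # Initial population plus one batch of offspring per generation.
    return population_size + generations * offspring_size


# Settings used in cell In [3]: generations=5, population_size=20
total = tpot_total_evaluations(population_size=20, generations=5)
print(total)  # 120
```

Raising `generations` or `population_size` therefore increases the search cost linearly, which is worth keeping in mind before moving from Iris to a larger dataset.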


**Exporting Best Pipeline**

In [5]:
# Export the best pipeline
tpot.export('best_pipeline.py')
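
The CV score that TPOT reports can be sanity-checked without TPOT itself. The sketch below assumes the same 80/20 split as cell In [2] and the hyperparameters printed in the training log, and runs scikit-learn's own 5-fold cross-validation on the training portion. The fold assignment may not be identical to the one TPOT used internally, so the mean is expected to be close to the logged score, not necessarily equal to it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB

# Recreate the split from cell In [2]
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Best pipeline as printed in the training log
model = MultinomialNB(alpha=10.0, fit_prior=False)

# 5-fold cross-validated accuracy on the training set only
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {cv_scores.mean()}")
```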

**Using the Generated Pipeline for Prediction**

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# The exported script expects the outcome column to be labeled 'target',
# so build a DataFrame with that layout from the Iris Bunch object
iris = load_iris()
tpot_data = pd.DataFrame(iris.data, columns=iris.feature_names)
tpot_data["target"] = iris.target
features = tpot_data.drop('target', axis=1)
# No test_size is passed here, so scikit-learn's default of 0.25 applies
# (cell In [2] used test_size=0.2)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: 0.9666666666666668
exported_pipeline = MultinomialNB(alpha=10.0, fit_prior=False)
# Fix random state in exported estimator
if hasattr(exported_pipeline, 'random_state'):
    setattr(exported_pipeline, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

# Calculate and print the accuracy
accuracy = accuracy_score(testing_target, results)
print(f"Accuracy: {accuracy}")

Accuracy: 0.9736842105263158
