**TPOT AutoML Implementation**

**Installing Dependencies**

In [None]:
%pip install tpot

Collecting tpot
  Downloading TPOT-0.12.2-py3-none-any.whl.metadata (2.0 kB)
Collecting deap>=1.2 (from tpot)
  Downloading deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting update-checker>=0.16 (from tpot)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Collecting stopit>=1.1.1 (from tpot)
  Downloading stopit-1.1.2.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Downloading TPOT-0.12.2-py3-none-any.whl (87 kB)
Downloading deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (135 kB)
Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Building wheel

**Load and Prepare Dataset**



In [None]:
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset (Iris dataset for classification)
data = load_iris()
X = data.data  # Features
y = data.target  # Target

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
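
Before starting the search, it can help to confirm what the split produced. The cell below is a small sanity check, not part of the original notebook: it repeats the same `train_test_split` call and prints the array shapes and per-class counts. The names `train_counts` and `test_counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Same dataset and split parameters as the cell above
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# 150 samples with 4 features each: 120 go to training, 30 to testing
print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)

# Per-class sample counts (illustrative variable names)
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

Because the split is not stratified, the class counts are not guaranteed to be equal across the two sets; passing `stratify=y` to `train_test_split` is one way to enforce that if it matters.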


**Configure and Run TPOT**

In [None]:
from tpot import TPOTClassifier

# Initialize TPOTClassifier with the search configuration
tpot = TPOTClassifier(
    generations=5,             # Number of generations the genetic search runs
    population_size=20,        # Number of pipelines kept in each generation
    verbosity=2,               # Print progress and the best score per generation
    random_state=42,           # For reproducibility
    cv=5,                      # Cross-validation folds used to score each pipeline
    n_jobs=-1,                 # Use all available CPU cores
    config_dict='TPOT sparse'  # Built-in config limited to operators that support sparse matrices
)

# Train the TPOT AutoML model on the training data
tpot.fit(X_train, y_train)

# Evaluate the performance on the test set
print(f"Test Accuracy: {tpot.score(X_test, y_test)}")


Optimization Progress:   0%|          | 0/120 [00:00<?, ?pipeline/s]


Generation 1 - Current best internal CV score: 0.9583333333333334

Generation 2 - Current best internal CV score: 0.9666666666666668

Generation 3 - Current best internal CV score: 0.9666666666666668

Generation 4 - Current best internal CV score: 0.9666666666666668

Generation 5 - Current best internal CV score: 0.9666666666666668

Best pipeline: MultinomialNB(input_matrix, alpha=10.0, fit_prior=False)
Test Accuracy: 0.9666666666666667
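
The `0/120` in the progress bar is the total number of pipelines TPOT plans to evaluate. According to the TPOT documentation, that total is `population_size + generations * offspring_size`, where `offspring_size` defaults to `population_size`. The helper below is a hypothetical function written for this notebook (it is not part of TPOT) that reproduces the arithmetic:

```python
def tpot_pipeline_budget(population_size, generations, offspring_size=None):
    """Total pipelines evaluated: the initial population plus the
    offspring produced in every generation (hypothetical helper)."""
    if offspring_size is None:
        # TPOT uses population_size when offspring_size is not given
        offspring_size = population_size
    return population_size + generations * offspring_size


total = tpot_pipeline_budget(population_size=20, generations=5)
print(total)  # 20 + 5 * 20 = 120
```

Each of those pipelines is scored with 5-fold cross-validation, so raising `generations` or `population_size` increases run time roughly in proportion to this total.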


**Export the Best Model Pipeline**

In [12]:
# Export the best model pipeline to a Python file
tpot.export('best_model_pipeline.py')

print("The best pipeline has been exported to 'best_model_pipeline.py'")


The best pipeline has been exported to 'best_model_pipeline.py'


**Analyze the Exported Pipeline**

In [18]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score # Import accuracy_score

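# 'data' is the sklearn.utils.Bunch returned by load_iris() in the earlier cell.
# Convert it to a pandas DataFrame so the exported script's column logic applies.
tpot_data = pd.DataFrame(data.data, columns=data.feature_names)
tpot_data['target'] = data.target  # Add the target variable to the DataFrame

# Separate the features from the target column
features = tpot_data.drop('target', axis=1)
# Note: no test_size is given, so train_test_split uses its default 25% test split,
# which differs from the 80/20 split used when fitting TPOT above.
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)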

# Average CV score on the training set was: 0.9666666666666668
exported_pipeline = MultinomialNB(alpha=10.0, fit_prior=False)
# Fix random state in exported estimator
if hasattr(exported_pipeline, 'random_state'):
    setattr(exported_pipeline, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

# Calculate and print the accuracy
accuracy = accuracy_score(testing_target, results)
print(f"Accuracy: {accuracy}")

Accuracy: 0.9736842105263158
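
The accuracy here (0.9737, which is 37/38) is not directly comparable to the earlier test accuracy (0.9667, which is 29/30), because the exported script splits the data with the default 25% test size rather than the 20% used for the TPOT run. The sketch below, an addition for illustration, evaluates the same `MultinomialNB` settings on both splits side by side; `holdout_accuracy` is a hypothetical helper name.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X, y = load_iris(return_X_y=True)


def holdout_accuracy(test_size):
    """Fit the exported estimator on one split and return
    (number of test samples, accuracy). Hypothetical helper."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    model = MultinomialNB(alpha=10.0, fit_prior=False)
    model.fit(X_tr, y_tr)
    return len(y_te), accuracy_score(y_te, model.predict(X_te))


# test_size=None falls back to scikit-learn's default of 0.25
for label, test_size in [("80/20 split", 0.2), ("default 75/25 split", None)]:
    n_test, acc = holdout_accuracy(test_size)
    print(f"{label}: {n_test} test samples, accuracy = {acc:.4f}")
```

With test sets this small, a single misclassified sample moves the accuracy by about three percentage points, so the gap between the two figures should not be read as a real difference in model quality.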
