how to print the second best pipeline? #1229

Open
m-alshehri opened this issue Sep 27, 2021 · 1 comment

Comments


m-alshehri commented Sep 27, 2021

Hello,
Is there any way to print the confusion matrix, classification report, and pipeline definition for the second-best pipeline?

The model currently prints the best pipeline, as shown below, but I would also like to print the second-best one.

model = TPOTClassifier(generations=10, scoring='balanced_accuracy', verbosity=2)
model.fit(X_train, y_train)
Optimization Progress: 48% 530/1100 [2:07:27<3:41:41, 23.34s/pipeline]

Generation 1 - Current best internal CV score: 0.8820838802533277
Generation 2 - Current best internal CV score: 0.8828284663262757
Generation 3 - Current best internal CV score: 0.8828284663262757
Generation 4 - Current best internal CV score: 0.8842320902149032
Generation 5 - Current best internal CV score: 0.8842320902149032
Generation 6 - Current best internal CV score: 0.8842320902149032
Generation 7 - Current best internal CV score: 0.8842320902149032
Generation 8 - Current best internal CV score: 0.8842320902149032
Generation 9 - Current best internal CV score: 0.8842320902149032
Generation 10 - Current best internal CV score: 0.8842320902149032
Best pipeline: BernoulliNB(KNeighborsClassifier(input_matrix, n_neighbors=41, p=1, weights=uniform), alpha=0.01, fit_prior=True)
TPOTClassifier(config_dict=None, crossover_rate=0.1, cv=5,
               disable_update_check=False, early_stop=None, generations=10,
               log_file=None, max_eval_time_mins=5, max_time_mins=None,
               memory=None, mutation_rate=0.9, n_jobs=1, offspring_size=None,
               periodic_checkpoint_folder=None, population_size=100,
               random_state=None, scoring='balanced_accuracy', subsample=1.0,
               template=None, use_dask=False, verbosity=2, warm_start=False)
Acc.: 0.8521771865980675
              precision    recall  f1-score   support

           0       1.00      0.85      0.92      7902
           1       0.05      0.97      0.10        67

    accuracy                           0.85      7969
   macro avg       0.53      0.91      0.51      7969
weighted avg       0.99      0.85      0.91      7969
Confusion Matrix:
[[6726 1176]
 [   2   65]]

Apologies if this has been asked before, but searching for "second best" returned nothing.

Thanks,
m-alshehri

wayneking517 commented:

Give this a try:

import pandas as pd

# `tpot` is the fitted TPOTClassifier (called `model` in the example above).
# evaluated_individuals_ maps each evaluated pipeline string to its details.
rows = []
for model_name, model_info in tpot.evaluated_individuals_.items():
    rows.append({'model': model_name,
                 'cv_score': model_info.get('internal_cv_score'),  # pull out cv_score as a sortable column
                 'model_info': model_info})

model_scores = pd.DataFrame(rows).sort_values('cv_score', ascending=False)
top_models = model_scores.iloc[0:5, :]  # row 0 is the best pipeline, row 1 the second-best
top_models.to_csv('top_models.csv', index=False)

See https://github.com/EpistasisLab/tpot/issues/703
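
Ranking the pipeline strings is only half of the request; to print a confusion matrix and classification report for the second-best entry you still need a fitted estimator. The sketch below is one way to get there, not a supported TPOT API: creator.Individual.from_string, tpot._pset, and tpot._toolbox.compile are DEAP/TPOT internals that are private and version-dependent (assumed here to behave as in TPOT 0.11.x), and X_test/y_test stand in for whatever held-out split produced the metrics above. Alternatively, fitting with verbosity=3 populates tpot.pareto_front_fitted_pipelines_, which already holds fitted pipelines for the Pareto front and may be enough depending on what you need.

from deap import creator
from sklearn.metrics import classification_report, confusion_matrix

# Second-best pipeline string from the ranked DataFrame built above (row 0 is the best).
second_best_str = model_scores.iloc[1]['model']

# Rebuild the DEAP individual from its string form and compile it into an
# unfitted sklearn pipeline. _pset and _toolbox are private TPOT internals
# (assumption: TPOT 0.11.x), so this may break across versions.
individual = creator.Individual.from_string(second_best_str, tpot._pset)
second_best_pipeline = tpot._toolbox.compile(expr=individual)

# Fit on the training split and report held-out metrics, mirroring what was
# printed for the best pipeline. X_test/y_test are assumed to be the same
# held-out split used for the best-pipeline report above.
second_best_pipeline.fit(X_train, y_train)
y_pred = second_best_pipeline.predict(X_test)
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))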
