Improve user method of seeing pipelines generated #1298
If anyone would like to take a look at this, it would require looking through how …
From #1224, how to access individual pipeline steps:
from autosklearn.pipeline.util import get_dataset
from autosklearn.classification import AutoSklearnClassifier
X_train, y_train, X_test, y_test = get_dataset('iris')
automodel = AutoSklearnClassifier(time_left_for_this_task=60)
automodel.fit(X_train, y_train)
# A list of pipelines with their weights [ (ensemble_weight, Pipeline) ]
models_with_weights = automodel.get_models_with_weights()
# Get the first model with its weight
weight, model = models_with_weights[0]
# Note that these models and the following are sklearn compatible
# The steps in the models pipeline
# [
# ('data_preprocessing', DataPreprocessor),
# ('balancing', Balancing),
# ('feature_preprocessor', FeaturePreprocessorChoice),
# ('classifier', ClassifierChoice)
# ]
models_steps = model.steps
# Get the ClassifierChoice wrapper
classifier_str, classifier = model.steps[-1]
# The autosklearn wrapped model
classifier = classifier.choice
print(type(classifier)) # autosklearn.pipeline.components.classification.random_forest.RandomForest
# The sklearn model
sklearn_classifier = classifier.estimator
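The same `.choice` / `.estimator` unwrapping can be applied to every step rather than only the last one. Below is a minimal sketch, assuming the attribute layout shown in the example above: a `*Choice` wrapper exposes the selected component as `.choice`, and that component exposes the sklearn object as `.estimator`. The helpers `unwrap_step` and `describe_steps` and the stand-in classes are hypothetical and exist only so the sketch runs without auto-sklearn installed.

```python
from typing import Any, Dict, List, Optional, Tuple


def unwrap_step(step: Any) -> Tuple[Any, Optional[Any]]:
    """Return (component, underlying_estimator) for one pipeline step.

    Assumption: a *Choice wrapper holds the selected component in `.choice`,
    and the component holds the sklearn object in `.estimator`. A step without
    `.choice` is treated as its own component; a missing `.estimator` gives None.
    """
    component = getattr(step, "choice", step)
    estimator = getattr(component, "estimator", None)
    return component, estimator


def describe_steps(steps: List[Tuple[str, Any]]) -> Dict[str, str]:
    """Map each step name to the class name of its selected component."""
    return {name: type(unwrap_step(step)[0]).__name__ for name, step in steps}


# Hypothetical stand-ins that only mimic the attribute layout.
class RandomForest:
    def __init__(self) -> None:
        self.estimator = "fitted sklearn estimator goes here"


class ClassifierChoice:
    def __init__(self, choice: Any) -> None:
        self.choice = choice


class Balancing:
    pass


steps = [("balancing", Balancing()), ("classifier", ClassifierChoice(RandomForest()))]
print(describe_steps(steps))  # {'balancing': 'Balancing', 'classifier': 'RandomForest'}
```

With a real fitted pipeline, one would pass `model.steps` instead of the stand-in list; this has not been checked against every auto-sklearn version, so the attribute names may need adjusting.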
From #1206, how to access a specific model in the ensemble:
import sklearn.model_selection
from sklearn import datasets
from autosklearn.classification import AutoSklearnClassifier
X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)
clf = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
clf.fit(X_train, y_train)
# Model IDs can be found with the `leaderboard()` function
wanted_model_id = ...
wanted_model = None
for (seed, model_id, budget), model in clf.automl_.models_.items():
    if model_id == wanted_model_id:
        wanted_model = model
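The loop above can also be written as a single lookup that stops at the first match. This is a sketch that assumes, as the loop does, that the dict is keyed by `(seed, model_id, budget)` tuples. The `find_model` helper and the example dict are hypothetical stand-ins for `clf.automl_.models_`, whose real values are fitted pipelines.

```python
from typing import Any, Dict, Optional, Tuple

# Keys follow the (seed, model_id, budget) layout unpacked in the loop above.
ModelKey = Tuple[int, int, float]


def find_model(models: Dict[ModelKey, Any], wanted_model_id: int) -> Optional[Any]:
    """Return the first model whose key carries `wanted_model_id`, else None."""
    return next(
        (
            model
            for (_seed, model_id, _budget), model in models.items()
            if model_id == wanted_model_id
        ),
        None,
    )


# Hypothetical stand-in for `clf.automl_.models_`.
models = {
    (1, 2, 0.0): "pipeline for run 2",
    (1, 7, 0.0): "pipeline for run 7",
}
print(find_model(models, 7))   # pipeline for run 7
print(find_model(models, 99))  # None
```

The original loop keeps the last match, while `next` returns the first. The two agree as long as each model ID appears only once, which the loop already assumes.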
Hi @UserFindingSelf, thanks for the kind words and for wanting to contribute :) This looks like it would be quite useful! I would be happy to help with a PR for this if you would like to share one. The only thing I would change is perhaps using the … In the future, where we would like easier access to models, the …
Thank you so much for your help! Your suggestion definitely makes more sense. Can you confirm if … Thanks again! :)
Yup, that's the one! For context, the optimizer SMAC gives each "run" a number, but for us a "run" corresponds to a trained model configuration, so it makes sense to present it as …
Great! Just one more question. Is it okay to use …
So …
Hello @eddiebergman! I wasn't able to use … Now, in line with the process described in … Thank you!
Hi @UserFindingSelf, glad to hear it's working, and even more glad you're running the tests! In general, the automated tests take around 45 minutes on the GitHub Actions servers (something we need to optimize). You can create a PR and I can schedule the tests to run there, so your machine doesn't have to run them.
Awesome! Working on creating the PR then.
Hey @eddiebergman! I have created the PR; please review it whenever you can. I will then make the necessary changes based on your suggestions.
Improved with #1321. It might need further improvements in the future, but for now I'm closing this.
After the #1321 improvement, would the updated code be like this?
# Model IDs can be found with the `leaderboard()` function
wanted_model_id = clf.leaderboard()[clf.leaderboard()['rank'] == ...].index[0]
wanted_model = clf.show_models()[wanted_model_id]
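One possible variant of the snippet above, offered only as an unconfirmed sketch, filters the `show_models()` result directly instead of calling `leaderboard()` twice. It assumes that after #1321 `show_models()` returns a dict keyed by model ID, and that each entry includes a `'rank'` value. The `model_by_rank` helper and the example dict are hypothetical stand-ins.

```python
from typing import Any, Dict


def model_by_rank(shown: Dict[int, Dict[str, Any]], rank: int) -> Dict[str, Any]:
    """Return the entry of a show_models()-style dict whose 'rank' equals `rank`.

    Assumption: keys are model IDs and each value is a dict that contains
    at least a 'rank' entry. Raises KeyError if no model has that rank.
    """
    for entry in shown.values():
        if entry["rank"] == rank:
            return entry
    raise KeyError(f"no model with rank {rank}")


# Hypothetical stand-in for clf.show_models(); real entries also hold pipeline parts.
shown = {
    7: {"model_id": 7, "rank": 1, "ensemble_weight": 0.6},
    2: {"model_id": 2, "rank": 2, "ensemble_weight": 0.4},
}
best = model_by_rank(shown, 1)
print(best["model_id"], best["ensemble_weight"])  # 7 0.6
```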
@TomPham97, please open a new issue for any new questions.
Currently, the easiest way for a user to see the pipelines included in the ensemble is through `estimator.show_models()`, which just returns a `str` that has to be parsed and inspected manually. There could definitely be a nicer format for viewing any such pipeline and providing easy access to it.
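As one illustration of what a "nicer format" could mean, the sketch below turns `(weight, pipeline)` pairs, like those returned by `get_models_with_weights()`, into a nested dict rather than a string. It assumes each pipeline has a scikit-learn style `.steps` list. The key names (`ensemble_weight`, `steps`), the position-based indexing, and the `FakePipeline`, `Scaler` and `Forest` classes are hypothetical choices for this example and are not auto-sklearn's API.

```python
from typing import Any, Dict, List, Sequence, Tuple


def pipelines_as_dict(
    models_with_weights: Sequence[Tuple[float, Any]],
) -> Dict[int, Dict[str, Any]]:
    """Turn (weight, pipeline) pairs into a dict that can be indexed directly.

    Assumption: each pipeline exposes a sklearn-style `.steps` list of
    (name, component) pairs. Entries are keyed by position in the input.
    """
    return {
        i: {
            "ensemble_weight": weight,
            "steps": {name: type(component).__name__ for name, component in pipeline.steps},
        }
        for i, (weight, pipeline) in enumerate(models_with_weights)
    }


class FakePipeline:
    """Hypothetical stand-in with a sklearn-style `.steps` attribute."""

    def __init__(self, steps: List[Tuple[str, Any]]) -> None:
        self.steps = steps


class Scaler:
    pass


class Forest:
    pass


models_with_weights = [(0.7, FakePipeline([("scaler", Scaler()), ("classifier", Forest())]))]
summary = pipelines_as_dict(models_with_weights)
print(summary[0]["steps"]["classifier"])  # Forest
```

A structure like this can be indexed, filtered, or converted to a table without any string parsing. That is the main advantage over returning a `str`.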