sample_sklearn_workflow

Sample sklearn workflow for a dataset with numerical and categorical data

I use the HistGradientBoostingClassifier model to classify the sklearn covtype dataset. This repo focuses mainly on using GridSearchCV to choose the best feature selection strategy and preprocessing steps. I did little exploration of the relationships between variables before building the model; my Titanic repo covers more of that kind of work.

Each Jupyter notebook explores one facet of the dataset. In the order I created them:

initial_exploration.ipynb
scaling_experiment.ipynb
categorical_feature_selection.ipynb
num_dim_reduction.ipynb
predict.ipynb

ChrisDarley/sample_sklearn_workflow
