Add video showing how to select a range of results in a parallel coordinate plot #447

ArturoAmorQ · 2021-08-30T15:59:08Z

Following Guillaume's suggestion in this forum comment, a video after this notebook showing how to select a range of results in a parallel coordinate plot may suffice to complement the explanation on how to use this tool.

This may also improve the scores for Quiz M3.2 Q4 and Q5, which are both below 70%.

## August 31st, 2021

### Gael

* TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig

### Olivier

- input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links into sub PRs
  - remaining (need review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`)
- reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444)
- next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release)
- next: assist Adrin for the release process
- next: investigate regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432)
- next: come back to Intel to write a technical roadmap for a possible collaboration

### Julien

- Was on holidays
- Planned week @ Nexedi, Lille, from September 13th to 17th
- Reviewed PRs
  - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module
  - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE)
- Other PRs prior to holidays
  - [`#20254`](scikit-learn/scikit-learn#20254)
  - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex
  - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs
- Next: comparing against scipy's
- Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help
- Next: I need to block time to study Cython code.

### Mathis

- `sklearn_benchmarks`
  - Adapting benchmark script to run on Margaret
  - Fix issue with profiling files too big to be deployed on GitHub Pages
  - Ensure deterministic benchmark results
  - Working on declarative pipeline specification
- Next: run long HPO benchmarks on Margaret

### Arturo

- Finished MOOC!
- Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432))
  - started addressing easy-to-fix questions, resulting in GitLab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22)
  - currently working on expanding the notes up to 70%
- Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448)

### Jérémie

- back from holidays, catching up
- Mathis' benchmarks
  - trying to find what's going on with ASV benchmarks (asv should display the versions of all build and runtime dependencies for each run)

### Guillaume

- back from holidays
- Next:
  - release with Adrin
  - check the PR and issue trackers

### TODO / Next

- Expand Loïc’s notes up to 70% (Arturo)
- Create presentation to discuss my experience doing the MOOC (Arturo)
- Help with the scikit-learn release (Olivier, Guillaume)
- HR: Jeremy's renewal, Chiara's replacement (Gael)
- Mathis's consulting gig (Olivier, Gael, Mathis)

ogrisel · 2021-10-05T07:33:32Z

+1

If you write down the script I can give you a review before the recording. Otherwise we can review a draft video.

ogrisel · 2021-10-05T07:35:31Z

Sorry, I see you already posted the video on Mattermost. I thought it was a new version of the previous one. Let me review it now.

ogrisel · 2021-10-05T07:51:18Z

As for the actual content of the video, I find it nice and informative, and it should address the feedback on the forum about this module's quiz questions.

ArturoAmorQ · 2021-10-05T08:14:07Z

> If you write down the script I can give you a review before the recording.

import pandas as pd

cv_results = pd.read_csv("../figures/randomized_search_results.csv",
                         index_col=0)

cv_results.head()

def shorten_param(param_name):
    if "__" in param_name:
        return param_name.rsplit("__", 1)[1]
    return param_name

cv_results.rename(shorten_param, axis=1).sort_values(
    "mean_test_score", ascending=False
).head()

import numpy as np
import plotly.express as px

fig = px.parallel_coordinates(
    cv_results.rename(shorten_param, axis=1).apply({
        "learning_rate": np.log10,
        "max_leaf_nodes": np.log2,
        "max_bins": np.log2,
        "min_samples_leaf": np.log10,
        "l2_regularization": np.log10,
        "mean_test_score": lambda x: x}),
    color="mean_test_score",
    color_continuous_scale=px.colors.sequential.Viridis,
)
fig.show()
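
Since the script log-transforms most axes before plotting, a range selected on an axis is expressed in log units rather than raw hyperparameter values. The snippet below is a minimal sketch of how such a selection could be read back; the helper names `axis_range_to_raw` and `in_selection` and the example ranges are hypothetical and not part of the notebook.

```python
import math

# Log transform applied to each axis, mirroring the script above.
LOG_FUNC = {
    "learning_rate": (math.log10, 10),
    "max_leaf_nodes": (math.log2, 2),
    "max_bins": (math.log2, 2),
    "min_samples_leaf": (math.log10, 10),
    "l2_regularization": (math.log10, 10),
}


def axis_range_to_raw(param, low, high):
    """Convert a range read on a log-scaled axis back to raw parameter values."""
    _, base = LOG_FUNC[param]
    return base ** low, base ** high


def in_selection(param, raw_value, low, high):
    """Check whether a raw parameter value falls inside a range selected on the axis."""
    log_func, _ = LOG_FUNC[param]
    return low <= log_func(raw_value) <= high


# Hypothetical selection: the user drags over [3, 5] on the max_leaf_nodes axis.
raw_low, raw_high = axis_range_to_raw("max_leaf_nodes", 3, 5)
print(raw_low, raw_high)

# Hypothetical selection: [-1.5, -0.5] on the learning_rate axis.
print(in_selection("learning_rate", 0.1, -1.5, -0.5))
print(in_selection("learning_rate", 1.0, -1.5, -0.5))
```

The `mean_test_score` axis is left untransformed in the script, so a range selected there can be read directly.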

ogrisel · 2021-10-05T14:50:31Z

cv_results.rename(shorten_param, axis=1).sort_values(
    "mean_test_score", ascending=False
).head()

can be simplified to:

cv_results.rename(shorten_param, axis=1)

as the goal of this cell is just to demo the renaming function, not to analyze the resulting scores if I am not mistaken

ArturoAmorQ · 2021-10-05T15:02:56Z

> as the goal of this cell is just to demo the renaming function, not to analyze the resulting scores if I am not mistaken

Indeed, you are right!
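
For reference, the renaming helper can also be tried on its own, without loading the CSV file. This is only a sketch: the column names below are hypothetical examples in the `step__parameter` style, not necessarily the ones stored in `randomized_search_results.csv`.

```python
def shorten_param(param_name):
    """Keep only the part after the last double underscore, if there is one."""
    if "__" in param_name:
        return param_name.rsplit("__", 1)[1]
    return param_name


# Hypothetical column names, for illustration only.
columns = [
    "param_classifier__learning_rate",
    "param_classifier__max_leaf_nodes",
    "mean_test_score",
]

short_columns = [shorten_param(name) for name in columns]
print(short_columns)
```

Names without a double underscore, such as `mean_test_score`, are returned unchanged, which is why the same function can be passed to `rename` for every column.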

ArturoAmorQ changed the title ~~Add gif showing how to select a range of results in a parallel coordinate plot~~ Add video showing how to select a range of results in a parallel coordinate plot Oct 4, 2021

ArturoAmorQ added the video label Oct 4, 2021

ArturoAmorQ mentioned this issue Nov 30, 2021

ENH Add python code for notebook used in the parallel coordinate plot video #493

Merged

ogrisel closed this as completed in #493 Dec 14, 2021
