Add video showing how to select a range of results in a parallel coordinate plot #447

Closed
ArturoAmorQ opened this issue Aug 30, 2021 · 6 comments · Fixed by #493

@ArturoAmorQ
Collaborator

ArturoAmorQ commented Aug 30, 2021

Following Guillaume's suggestion in this forum comment, a video after this notebook showing how to select a range of results in a parallel coordinate plot may suffice to complement the explanation on how to use this tool.

This may also improve the scores for Quiz M3.2 Q4 and Q5, which are both below 70%.

ogrisel added a commit to scikit-learn-inria-fondation/follow-up that referenced this issue Sep 7, 2021
## August 31st, 2021

### Gael

* TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig

### Olivier

- input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links into sub PRs
  - remaining (need review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`)
- reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444)
- next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release)
- next: assist Adrin for the release process
- next: investigate regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432)
- next: come back to intel to write a technical roadmap for a possible collaboration

### Julien

 - Was on holidays
 - Planned week @ Nexedi, Lille, from September 13th to 17th
 - Reviewed PRs
     - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module
     - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE)
     - Other PRs prior to holidays
 - [`#20254`](scikit-learn/scikit-learn#20254)
     - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex
     - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs
     - Next: comparing against scipy's 
     - Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help
- Next: I need to block time to study Cython code.

### Mathis
- `sklearn_benchmarks`
  - Adapting benchmark script to run on Margaret
  - Fix issue with profiling files too big to be deployed on Github Pages
  - Ensure deterministic benchmark results
  - Working on declarative pipeline specification
  - Next: run long HPO benchmarks on Margaret

### Arturo

- Finished MOOC!
- Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432))
    - started addressing easy-to-fix questions, resulting in gitlab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22)
    - currently working on expanding the notes up to 70%
- Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448)

### Jérémie
- back from holidays, catching up
- Mathis' benchmarks
- trying to find what's going on with ASV benchmarks
  (asv should display the versions of all build and runtime dependencies for each run)

### Guillaume

- back from holidays
- Next:
    - release with Adrin
    - check the PR and issue trackers

### TODO / Next

- Expand Loïc’s notes up to 70% (Arturo)
- Create presentation to discuss my experience doing the MOOC (Arturo)
- Help with the scikit-learn release (Olivier, Guillaume)
- HR: Jeremy's renewal, Chiara's replacement (Gael)
- Mathis's consulting gig (Olivier, Gael, Mathis)
@ArturoAmorQ ArturoAmorQ changed the title from "Add gif showing how to select a range of results in a parallel coordinate plot" to "Add video showing how to select a range of results in a parallel coordinate plot" Oct 4, 2021
@ogrisel
Collaborator

ogrisel commented Oct 5, 2021

+1

If you write down the script I can give you a review before the recording. Otherwise we can review a draft video.

@ogrisel
Collaborator

ogrisel commented Oct 5, 2021

I'm sorry, you already posted the video on Mattermost. I thought it was a new version of the previous one. Let me review it now.

@ogrisel
Collaborator

ogrisel commented Oct 5, 2021

As for the actual content of the video, I find it nice and informative, and it should be good enough to address the feedback on the forum about this module's quiz questions.

@ArturoAmorQ
Collaborator Author

> If you write down the script I can give you a review before the recording.

import pandas as pd

cv_results = pd.read_csv("../figures/randomized_search_results.csv",
                         index_col=0)
cv_results.head()

def shorten_param(param_name):
    # Keep only what follows the last "__" in the parameter name.
    if "__" in param_name:
        return param_name.rsplit("__", 1)[1]
    return param_name

(cv_results.rename(
    shorten_param, axis=1).sort_values("mean_test_score", ascending=False)).head()

import numpy as np
import plotly.express as px

fig = px.parallel_coordinates(
    cv_results.rename(shorten_param, axis=1).apply({
        "learning_rate": np.log10,
        "max_leaf_nodes": np.log2,
        "max_bins": np.log2,
        "min_samples_leaf": np.log10,
        "l2_regularization": np.log10,
        "mean_test_score": lambda x: x}),
    color="mean_test_score",
    color_continuous_scale=px.colors.sequential.Viridis,
)
fig.show()
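
For reference, dragging along an axis of the plot to select a range of results can be approximated in code with a boolean filter. This is only a minimal sketch: the data frame below is a made-up stand-in (its values are not from `randomized_search_results.csv`), and the bounds -1.5 and 0.5 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the randomized search results; these values are
# invented for illustration and do not come from the real CSV file.
cv_results = pd.DataFrame({
    "learning_rate": [0.001, 0.01, 0.1, 1.0, 10.0],
    "max_leaf_nodes": [4, 8, 16, 32, 64],
    "mean_test_score": [0.76, 0.81, 0.87, 0.85, 0.28],
})

# Apply the same log10 transform as in the plot, so that the bounds below are
# expressed in the units shown on the learning_rate axis.
log_learning_rate = np.log10(cv_results["learning_rate"])

# Keep the rows whose log10(learning_rate) lies in [-1.5, 0.5] (assumed
# bounds), which mimics selecting that range on the axis.
selected = cv_results[log_learning_rate.between(-1.5, 0.5)]
print(selected)
```

On the interactive figure, the same idea applies to any axis, and several selections combine like a logical "and" between the filters.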

@ogrisel
Collaborator

ogrisel commented Oct 5, 2021

(cv_results.rename(
    shorten_param, axis=1).sort_values("mean_test_score", ascending= False)).head()

can be simplified to:

cv_results.rename(shorten_param, axis=1)

as the goal of this cell is just to demo the renaming function, not to analyze the resulting scores if I am not mistaken
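
To show what the renaming function does without loading the results file, here is a small sketch that applies it to a few column names. The column names are assumptions chosen for illustration; the real ones depend on how the search was set up.

```python
def shorten_param(param_name):
    # Keep only what follows the last "__" in the parameter name.
    if "__" in param_name:
        return param_name.rsplit("__", 1)[1]
    return param_name


# Hypothetical column names, similar in shape to those of a search over a
# pipeline step named "classifier".
columns = [
    "param_classifier__learning_rate",
    "param_classifier__max_leaf_nodes",
    "mean_test_score",
]

short_columns = [shorten_param(name) for name in columns]
print(short_columns)
# ['learning_rate', 'max_leaf_nodes', 'mean_test_score']
```

Names without `"__"` are returned unchanged, which is why `mean_test_score` keeps its name.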

@ArturoAmorQ
Collaborator Author

> as the goal of this cell is just to demo the renaming function, not to analyze the resulting scores if I am not mistaken

Indeed, you are right!
