Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Rework narrative of GBDT notebook #763

Merged
merged 8 commits into from
May 17, 2024
Merged

Conversation

ArturoAmorQ
Copy link
Collaborator

Originally I wanted to rework only the wording, as well explaining the GBDT algo is the cornerstone for understanding HGBT. But then I decided to factorize the code by introducing a helper function to keep the focus on the narrative other than the code.

@glemaitre glemaitre self-requested a review April 26, 2024 14:14
Copy link
Collaborator

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments.

python_scripts/ensemble_gradient_boosting.py Outdated Show resolved Hide resolved
python_scripts/ensemble_gradient_boosting.py Outdated Show resolved Hide resolved
python_scripts/ensemble_gradient_boosting.py Outdated Show resolved Hide resolved

# %%
import pandas as pd
import numpy as np

# Create a random number generator that will be used to set the randomness
rng = np.random.RandomState(0)
rng = np.random.RandomState(0) # Create a random number generator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the generator next to the data generation. We should avoid showing a pattern where people would use the generator across different function and estimators.
So the best practice is just to show it next to the data generation. we could even slightly change the code and have:

def generate_data(n_samples=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_samples,)) * ...
    noise = rng.normal(size=(s_samples,)) * 0.3

using default_rng should be encourage nowadays.

python_scripts/ensemble_gradient_boosting.py Outdated Show resolved Hide resolved
python_scripts/ensemble_gradient_boosting.py Outdated Show resolved Hide resolved
python_scripts/ensemble_gradient_boosting.py Show resolved Hide resolved
python_scripts/ensemble_gradient_boosting.py Show resolved Hide resolved
python_scripts/ensemble_gradient_boosting.py Show resolved Hide resolved
python_scripts/ensemble_gradient_boosting.py Outdated Show resolved Hide resolved
@glemaitre glemaitre merged commit 31bfaaf into INRIA:main May 17, 2024
3 checks passed
github-actions bot pushed a commit that referenced this pull request May 17, 2024
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> 31bfaaf
@ArturoAmorQ ArturoAmorQ deleted the gbdt_wording branch May 17, 2024 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants