Add golden master with artificial data #101
Conversation
Could the golden master tests be based on our existing benchmark suite?
Looks good to me!
@ElizabethSantorellaQC, sure! Do you want the public data that the benchmarks are using, the wrapper around the results, or both?
@MarcAntoineSchmidtQC I'm not sure. I'm not sure whether including the benchmarks is a good idea at all, since we might want to change those and since they take a while to run. Just a thought.
From a development efficiency standpoint, I really like having the golden master testing integrated into the benchmarking suite. That way, while I make optimizations or refactorings, I can run one command to profile/benchmark, and that command will also make sure I didn't introduce a bug.
random_state=random_seed,
start_params="zero",
Why not the default of "guess"? That gives better performance in my experience.
I got some of the problems diverging when I used "guess" but easily converging when starting at zero. I feel like it's not a robust feature. Maybe we can add **kwargs to the benchmark function so that you can specify this type of thing without changing the default like I did. Do you like that idea?
I've dropped this (now using "guess"), but the main issue is with gamma. See issue #113.
regularization_strength=0.1,
)
with open(git_root(f"golden_master/benchmarks_gm/{Pn}.pkl"), "wb") as fh:
Maybe we could just store the intercept and coefficients, since that's all we're testing anyway? Then we could store them in a more human-readable format, which I have a mild preference for.
That's a really good idea! JSON here we come.
(I started by also testing n_iter but this is platform-dependent for some problems, so I dropped it)
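The switch to JSON discussed above could be sketched roughly as follows. This is a hypothetical helper, not the PR's actual code; the function names, file layout, and JSON keys are all assumptions. The idea is simply that only the intercept and coefficients are stored, in a human-readable format, instead of pickling a whole results object:

```python
import json


def save_golden_master(path, intercept, coef):
    # Hypothetical helper: store only the fitted intercept and
    # coefficients as human-readable JSON, rather than pickling
    # the entire model/results object.
    with open(path, "w") as fh:
        json.dump(
            {"intercept": float(intercept), "coef": [float(c) for c in coef]},
            fh,
            indent=2,
        )


def load_golden_master(path):
    # Read the stored golden-master values back as a plain dict.
    with open(path) as fh:
        return json.load(fh)
```

A round trip preserves the values exactly, since JSON stores them as full-precision decimal literals.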
Woah, any guess why n_iter would be platform-dependent?
src/glm_benchmarks/main.py
Outdated
@@ -94,6 +94,7 @@ def execute_problem_library(
    single_precision: bool = False,
    print_diagnostics: bool = True,
    regularization_strength: float = None,
    model_kwargs: dict = {},
What is model_kwargs doing here? And should line 116 not be **model_kwargs?
It lets you run a benchmark with arbitrary model arguments without changing the underlying bench_sklearn_fork() function. For instance, something I did was to set model_kwargs = {'start_params': 'zero', 'solver': 'cd'}.
The original reason I kept it a dictionary and did not use **kwargs was to allow us to have different kinds of kwargs in the future (e.g. a fit_kwargs). This is similar to https://seaborn.pydata.org/generated/seaborn.lmplot.html#seaborn.lmplot, where you can set kwargs for the line plot and the scatter plot separately. But now I think this is not a good idea. Let me switch it to the **kwargs approach.
Looks great!
* adding a bunch of parametrization
* Moved Link and ExponentialDispersionModel classes to their own files
* refactoring
* removed util file
* added sparse test and OHE features
* removed old file
* overwrite golden master
* added gaussian
* getting ready to merge master
* golden master for benchmarks
* regularization strenght overwriting parameter
* changed test tolerance
* changed abs tolerance
* removed changes to distribution.py
* added offset
* switch to json and cleanup
* automating skipped problems
* switch from dict to **
* forgot to uncomment something
* removed regularization_strenght param because we can use **kwargs
* back to regularization_strenght. It was a bad idea.

Co-authored-by: Marc-Antoine Schmidt <mschmidt@wayfair.com>
Co-authored-by: Elizabeth Santorella <elizabeth.santorella@gmail.com>
To add new "simple" golden master tests, add an entry to the gm_model_parameters dictionary, then run the file with Python (python tests/sklearn_fork/test_golden_master.py). This creates the golden master data for the tests (stored in the golden_master folder). Whenever pytest is called, it compares the new result to the stored one.
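The comparison step described above can be sketched roughly like this. The helper name, the JSON layout, and the tolerance are assumptions for illustration; only the idea (compare a fresh fit's intercept and coefficients to stored values, within a tolerance) comes from the conversation:

```python
import json
import math


def check_against_golden_master(result, gm_path, atol=1e-10):
    # Hypothetical sketch of the comparison pytest performs: load the
    # stored JSON golden master and compare only the intercept and the
    # coefficients, within an absolute tolerance.
    with open(gm_path) as fh:
        expected = json.load(fh)
    assert math.isclose(result["intercept"], expected["intercept"], abs_tol=atol)
    assert len(result["coef"]) == len(expected["coef"])
    assert all(
        math.isclose(a, b, abs_tol=atol)
        for a, b in zip(result["coef"], expected["coef"])
    )
```

Using an absolute tolerance rather than exact equality matters here, since refactorings and platform differences can perturb the fitted values slightly without indicating a bug.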