
Add golden master with artificial data #101

Merged
merged 26 commits into master from mschmidt_more_testing on May 12, 2020

Conversation

MarcAntoineSchmidtQC
Member

To add new "simple" golden master tests, you need to add an entry to the gm_model_parameters dictionary. Then run the file directly with Python (python tests/sklearn_fork/test_golden_master.py). This creates the golden master data for the tests, stored in the golden_master folder.

Whenever pytest is called, it will compare the new result to the one stored.
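
As a rough illustration, an added entry might look like the sketch below; the key name and parameter values are made up, and the exact set of accepted keys depends on how test_golden_master.py builds its models.

```python
# tests/sklearn_fork/test_golden_master.py (illustrative sketch only)
# Hypothetical new entry; the real dictionary's schema may differ.
gm_model_parameters["custom-poisson-lasso"] = {
    "family": "poisson",
    "link": "log",
    "alpha": 0.1,      # regularization strength
    "l1_ratio": 1.0,   # pure L1 penalty
}
```

After adding the entry, rerunning the script regenerates the stored golden master files, and subsequent pytest runs are compared against them.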

@ElizabethSantorellaQC
Contributor

Could the golden master tests be based off our existing benchmark suite?

@ElizabethSantorellaQC
Contributor

Looks good to me!

@MarcAntoineSchmidtQC
Member Author

@ElizabethSantorellaQC, sure! Do you want the public data that the benchmarks are using, the wrapper around the results, or both?

@ElizabethSantorellaQC
Contributor

@MarcAntoineSchmidtQC I'm not sure. Including the benchmarks might not be a good idea at all, since we might want to change them and since they take a while to run. Just a thought.

@tbenthompson
Collaborator

From a development-efficiency standpoint, I really, really like having the golden master testing integrated into the benchmarking suite. That way, while I make optimizations or refactorings, I can run one command to profile/benchmark, and that same command will also make sure I didn't introduce a bug.

@MarcAntoineSchmidtQC changed the title from "[WIP] Add golden master with artificial data" to "Add golden master with artificial data" on May 6, 2020
random_state=random_seed,
start_params="zero",
Contributor

Why not the default of "guess"? That gives better performance in my experience.

Member Author

I got some of the problems diverging when I used "guess" but easily converging when starting at zero. I feel like it's not a robust feature. Maybe we can add **kwargs to the benchmark function so that you can specify this type of thing without changing the default like I did. Do you like that idea?
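
For context, a minimal sketch of the two initializations being compared here, assuming the fork's GeneralizedLinearRegressor and its start_params argument; the import path and the family used below are illustrative, not taken from this PR.

```python
# Illustrative only; the actual import path in sklearn_fork may differ.
from sklearn_fork import GeneralizedLinearRegressor

# Default: derive the starting coefficients from a cheap preliminary guess.
model_guess = GeneralizedLinearRegressor(family="gamma", start_params="guess")

# Alternative used in this PR: start the solver from an all-zero coefficient
# vector, which converged more reliably on some of these problems.
model_zero = GeneralizedLinearRegressor(family="gamma", start_params="zero")
```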

Contributor

How recently did you try this? I improved the _guess_start_params function in one of the more recent commits to the offset branch. I'd be curious to hear for which problems guess_start_params is not producing good estimates (for #111 and maybe #112).

Member Author

I've dropped this (now using "guess"), but the main issue is with gamma. See issue #113.

regularization_strength=0.1,
)

with open(git_root(f"golden_master/benchmarks_gm/{Pn}.pkl"), "wb") as fh:
Contributor

Maybe we could just store the intercept and coefficients, since that's all we're testing anyway? Then we could store them in a more human-readable format, which I have a mild preference for.

Member Author

That's a really good idea! JSON here we come.

(I started by also testing n_iter, but it is platform-dependent for some problems, so I dropped it.)
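
A sketch of what the JSON version could look like, assuming the fitted model exposes sklearn-style intercept_ and coef_ attributes, with git_root, Pn, and model coming from the surrounding benchmark code; the field names and layout are illustrative, not necessarily what the PR ended up with.

```python
import json

# Store only what the golden master test actually compares: the fitted
# intercept and coefficients. n_iter is deliberately excluded because it
# turned out to be platform-dependent for some problems.
golden = {
    "intercept": float(model.intercept_),
    "coef": [float(c) for c in model.coef_],
}

with open(git_root(f"golden_master/benchmarks_gm/{Pn}.json"), "w") as fh:
    json.dump(golden, fh, indent=2)
```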

Contributor

Woah, any guess why n_iter would be platform-dependent?

@MarcAntoineSchmidtQC marked this pull request as ready for review on May 12, 2020 02:37
@@ -94,6 +94,7 @@ def execute_problem_library(
    single_precision: bool = False,
    print_diagnostics: bool = True,
    regularization_strength: float = None,
    model_kwargs: dict = {},
Contributor

what is model_kwargs doing here? And should line 116 not be **model_kwargs?

Member Author

It lets you run a benchmark with arbitrary model arguments without changing the underlying bench_sklearn_fork() function. For instance, something I did was to set model_kwargs = {'start_params': 'zero', 'solver': 'cd'}.

The original reason I kept it a dictionary and did not use **kwargs is to allow us to have different kwargs in the future (e.g. a fit_kwargs). This is similar to https://seaborn.pydata.org/generated/seaborn.lmplot.html#seaborn.lmplot, where you can set kwargs for the lineplot and the scatterplot separately. But now I think this is not a good idea. Let me switch it to **kwargs.
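
A minimal, self-contained sketch of that switch; the function names and bodies are hypothetical, and only the kwargs plumbing is the point.

```python
# Before: an explicit dict argument. Besides the extra indirection, a mutable
# default like `model_kwargs: dict = {}` is shared across calls, which is
# another reason to avoid it.
def bench_with_dict(problem_name, model_kwargs=None):
    model_kwargs = model_kwargs or {}
    return {"problem": problem_name, **model_kwargs}

# After: arbitrary model arguments are captured with ** and forwarded as-is.
def bench_with_kwargs(problem_name, **model_kwargs):
    return {"problem": problem_name, **model_kwargs}

# Equivalent calls:
bench_with_dict("example", model_kwargs={"start_params": "zero", "solver": "cd"})
bench_with_kwargs("example", start_params="zero", solver="cd")
```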

Contributor

@ElizabethSantorellaQC left a comment

looks great

@MarcAntoineSchmidtQC merged commit f64e196 into master on May 12, 2020
@lbittarello deleted the mschmidt_more_testing branch on May 1, 2021 11:38
tbenthompson pushed a commit that referenced this pull request Oct 8, 2021
* adding a bunch of parametrization

* Moved Link and ExponentialDispersionModel classes to their own files

* refactoring

* removed util file

* added sparse test and OHE features

* removed old file

* overwrite golden master

* added gaussian

* getting ready to merge master

* golden master for benchmarks

* regularization strenght overwriting parameter

* changed test tolerance

* changed abs tolerance

* removed changes to distribution.py

* added offset

* switch to json and cleanup

* automating skipped problems

* switch from dict to **

* forgot to uncomment something

* removed regularization_strenght param because we can use **kwargs

* back to regularization_strenght. It was a bad idea.

Co-authored-by: Marc-Antoine Schmidt <mschmidt@wayfair.com>
Co-authored-by: Elizabeth Santorella <elizabeth.santorella@gmail.com>