
benchmark datasets (for examples) #50

Closed
mikkokotila opened this issue Aug 6, 2018 · 7 comments
Assignees
Labels
priority: ICE For now, this is going to be iced.

Comments

@mikkokotila
Contributor

It would be good to have a few benchmark datasets where we know the "gold standard" result, and then use Talos with a very broad starting boundary and the reducer approach to show how quickly it reaches the "gold standard" result, assuming poor knowledge of hyperparameters as a starting point.

@x94carbone do you have suggestions for such datasets?

@matthewcarbone matthewcarbone self-assigned this Aug 6, 2018
@matthewcarbone
Collaborator

I unfortunately do not. However, I know someone who might. Will get back to you 👍

@matthewcarbone
Collaborator

Unfortunately, no luck. I think this is a very complicated problem; finding the local minima of the hyperparameter space, so to speak, is not trivial.

On the other hand, there is a way to get reproducible results during development using Keras. What we could do is agree on a "good enough" result as a benchmark and initialize the same way every time to ensure the answer is the same.
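The Keras FAQ approach to reproducibility boils down to seeding every source of randomness before building the model. A minimal sketch of the idea, using a NumPy stand-in for an actual training run (for a real Keras run you would additionally seed Python's `random` module and TensorFlow, e.g. via `tf.random.set_seed`, before constructing the model):

```python
import numpy as np

def train_once(seed):
    """Stand-in for a training run: seed the RNG first, then do
    'training' steps that involve randomness (random weight init
    plus noisy data). With Keras you would also call
    random.seed(seed) and tf.random.set_seed(seed) up front."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(4, 1))   # "weight initialization"
    data = rng.normal(size=(10, 4))     # "shuffled / noisy data"
    return float((data @ weights).sum())

# Two runs with the same seed give bit-identical results, which is
# what lets an agreed "good enough" benchmark stay stable in tests.
assert train_once(42) == train_once(42)
```

Note that full determinism on GPU is harder (some cuDNN ops are non-deterministic), which is another reason to treat the benchmark as a "good enough" threshold rather than an exact number.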

@mikkokotila
Contributor Author

I think so. Regarding "good enough", I think that's quite important here: the goal of an experiment is to find a "good enough" model, and once a given number of those models is found, the experiment can proceed to the steps we discussed in #40. For example, for somebody training a model for sentiment classification, "good enough" really can't be more than 90%, as even human coders will disagree at about that rate (10% disagreement). So the first step would be to find "good enough" depending on the use-case (which the user will input), and the next step is to find the best of the many options at the good-enough level, where "best" now means generalization.

@matthewcarbone
Collaborator

For sure. And I think for the purposes of unit-checking Scan (I'm using the term "unit" quite flexibly here... Scan is many units!) this will be suitable.

I will consider this my next project here, since I can do it more or less independently of Reporting. Are all the branches (dev, master, daily-dev) compatible right now, and if not, is there a branch you'd recommend starting from?

Also, I apologize for my lack of contributions of late. I've been completely slammed.

@mikkokotila
Contributor Author

@x94carbone Great :) I think we should merge dev into master at this point to make sure all is in sync, and then from master into daily-dev. I will create the PR.

@mikkokotila
Contributor Author

@x94carbone there were some issues in dev that had passed tests, but they're now fixed; dev and daily-dev are in sync, and master has nothing new compared to dev.

@matthewcarbone
Collaborator

Ok sounds good. Thanks for clarifying.

Going to store this here for safekeeping so I know where to look when I have time to work on this:

Reproducible results with Keras.

@mikkokotila mikkokotila added the priority: ICE For now, this is going to be iced. label May 3, 2019