
benchmark datasets (for examples) #50

Closed
mikkokotila opened this issue Aug 6, 2018 · 7 comments
Assignees
Labels
priority: ICE For now, this is going to be iced.

Comments

@mikkokotila
Contributor

It would be good to have a few benchmark datasets where we know the "gold standard" result, and then use Talos with a very broad starting boundary and the reducer approach to show how quickly it reaches the "gold standard" result, assuming poor knowledge of hyperparameters as a starting point.

@x94carbone do you have suggestions for such datasets?

@matthewcarbone matthewcarbone self-assigned this Aug 6, 2018
@matthewcarbone
Collaborator

I unfortunately do not. However, I know someone who might. Will get back to you 👍

@matthewcarbone
Collaborator

Unfortunately, no luck. I think this is a very complicated problem; finding the local minima of the hyperparameter space, so to speak, is not trivial.

On the other hand, there is a way to get reproducible results during development using Keras. What we could do is agree on a "good enough" result as a benchmark and initialize the same way every time to ensure the answer is the same.
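The Keras FAQ approach to reproducibility boils down to seeding every source of randomness before building the model. A minimal sketch of the idea, using a NumPy stand-in for an actual training run (for a real Keras run you would additionally seed Python's `random` module and TensorFlow, e.g. via `tf.random.set_seed`, before constructing the model):

```python
import numpy as np

def train_once(seed):
    """Stand-in for a training run: seed the RNG first, then do
    'training' steps that involve randomness (random weight init
    plus noisy data). With Keras you would also call
    random.seed(seed) and tf.random.set_seed(seed) up front."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(4, 1))   # "weight initialization"
    data = rng.normal(size=(10, 4))     # "shuffled / noisy data"
    return float((data @ weights).sum())

# Two runs with the same seed give bit-identical results, which is
# what lets an agreed "good enough" benchmark stay stable in tests.
assert train_once(42) == train_once(42)
```

Note that full determinism on GPU is harder (some cuDNN ops are non-deterministic), which is another reason to treat the benchmark as a "good enough" threshold rather than an exact number.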

@mikkokotila
Contributor Author

I think so. Regarding "good enough", I think that's quite important here: the goal of an experiment is to find a "good enough" model, and once a given number of those models is found, the experiment can proceed to the steps we discussed in #40. For example, for somebody training a model for sentiment classification, "good enough" really can't be more than 90%, as even human coders will disagree at about that rate (10% disagreement). So the first step would be to find "good enough" depending on the use-case (which the user will input), and the next step is to find the best of the many options at the good-enough level, where "best" now means generalization.

@matthewcarbone
Collaborator

For sure. And I think for the purposes of unit-checking Scan (I'm using the term "unit" quite flexibly here... Scan is many units!) this will be suitable.

I will consider this my next project here, since I can do it more or less independently of Reporting. Are all the branches (dev, master, daily-dev) compatible right now, and if not, is there a branch you'd recommend starting from?

Also, I apologize for my lack of contributions of late. I've been completely slammed.

@mikkokotila
Contributor Author

@x94carbone Great :) I think we should merge dev into master at this point to make sure all is in sync, and then from master into daily-dev. I will create the PR.

@mikkokotila
Contributor Author

@x94carbone there were some issues in dev that had passed tests, but they're now fixed; dev and daily-dev are in sync, and master has nothing new compared to dev.

@matthewcarbone
Collaborator

Ok sounds good. Thanks for clarifying.

Going to store this here for safekeeping so I know where to look when I have time to work on this:

Reproducible results with Keras.

@mikkokotila mikkokotila added the priority: ICE For now, this is going to be iced. label May 3, 2019