Validator() #40

Closed
mikkokotila opened this issue Jul 29, 2018 · 9 comments

@mikkokotila
Contributor

Now that a lot of the issues are handled, I think the next big push is on putting the pieces together for Validator(), i.e., what happens after Scan() that leads to the information needed to finally train the production model.

I think that in #17 you had more or less nailed the outline of the approach, and I will follow that for now. We already have an objective measure for classification tasks in the form of score_model, so I will focus on that use case (class predictions) first.

@mikkokotila self-assigned this Jul 29, 2018
@matthewcarbone
Collaborator

Sounds good! I'll try and think about this some more when I can and respond with anything I come up with 👍

@matthewcarbone
Collaborator

matthewcarbone commented Aug 3, 2018

OK, so there are a couple of things I've been thinking about, and a few layers of computational complexity to worry about. I'll elaborate.

We cannot really confidently choose the best HP point based on Talos' current state. It certainly works, but perhaps not to the degree some users would want. The reason is twofold. Consider a data set and a train/validation/test split.

  • First (assume testing data is set aside), oftentimes the validation loss/accuracy is highly dependent on the chunk of data chosen as the validation set. For this reason, Talos will currently only give a ballpark answer to the question of "which HP is best". A statistical average over different splits is often necessary. This is of course k-fold cross-validation and is definitely an option we should give users the ability to implement.
  • Second, I have also seen in my own work that even for a fixed HP point and a fixed k-fold split, the random initialization of the weights sometimes has an impact on the answer. In my experience this occurs more often in CNNs, but regardless, it is still another layer of statistical averaging that should in principle be done before any conclusions can be drawn.

You see where I'm going with this! If Talos currently runs in O(N) time, where N is the number of HP permutations, then in a perfect world, if we wanted to be sure about our choice of "best HP", we would need brute-force O(N·K·L) time, where K is the number of folds and L is the number of times you want to run statistical averages over the random initializations.
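
For concreteness, here is a minimal sketch (plain Keras and scikit-learn, not Talos code) of that brute-force loop for a single HP permutation; build_model and the 'epochs' key are hypothetical placeholders:

import numpy as np
from sklearn.model_selection import KFold

def mean_score(build_model, params, x, y, k=5, n_inits=3):
    # build_model(params) is assumed to return a freshly compiled Keras model
    # with an accuracy metric; each call gives new random initial weights
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True).split(x):
        for _ in range(n_inits):
            model = build_model(params)
            model.fit(x[train_idx], y[train_idx], epochs=params['epochs'], verbose=0)
            _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
            scores.append(acc)
    return np.mean(scores)

Ranking all N permutations with something like this is exactly where the O(N·K·L) cost comes from.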

So I guess my comment on this discussion would be: should we focus on a Validator() yet, or should we try to be smarter about directing the flow of Scan() so that it finds the optimal HP sooner? Perhaps one of the search methods mentioned in previous issues? I dunno, @mikkokotila, but let's discuss before we start doing more work. 😄

By the way, this is where I implement hardware accelerators, since it is much too slow on CPUs.

Google Colab is an amazing option for anyone who doesn't have access to an HPC cluster!

@mikkokotila
Contributor Author

I agree with everything you say above. Let's figure out the optimization layer first, and then move to validating. With this in mind, I'm working on a major overhaul/refactoring of the codebase so that it is less anxiety-inducing to make major changes. Scan() is already completely cleaned, the param handling is completely rebuilt, as are the reductions. These are the three things that play some role in the optimization aspect.

As you may have noted, in the initial architecture I've assumed an approach where:

  1. a scan is started after downsampling the parameter space (first reduction)
  2. there is the possibility to apply reducers as the scan progresses
  3. each time a reducer is applied, the number of available permutations is lowered

The idea is that we could have many different strategies, which all take as input the results from the previous rounds (from the experiment log) and based on that input reduce the complexity of the rest of the experiment. This in effect happens by removing select items from self.param_log.
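
Just to make the idea concrete, a rough, hypothetical sketch of such a reducer; param_log, results_so_far, and get_params are illustrative names here, not the actual Talos internals:

def naive_reducer(param_log, results_so_far, get_params, metric='val_acc'):
    # results_so_far: dicts from the experiment log, e.g. {'val_acc': 0.91, 'params': {...}}
    # get_params(pid): returns the parameter dict behind a permutation id
    ranked = sorted(results_so_far, key=lambda r: r[metric])
    worst = ranked[:max(1, len(ranked) // 4)]
    # collect (parameter, value) pairs seen in the worst quartile of rounds so far
    bad_pairs = {(k, v) for r in worst for k, v in r['params'].items()}
    # keep only the permutations that avoid all of those pairs
    return [pid for pid in param_log
            if not any(pair in bad_pairs for pair in get_params(pid).items())]

Each strategy would differ only in how it decides what to drop; the contract is simply experiment log in, smaller param_log out.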

I've written an article, which should be ready to publish in the next few days, that goes a little bit deeper into the reasoning for this approach (as opposed to the approach where random/grid search is considered as taxonomically parallel to something like Bayesian optimization).

Google Colab seems amazing, will definitely try it! :)

@mikkokotila
Contributor Author

This is related to #17 where some additional comments can be found.

@matthewcarbone
Collaborator

> I agree with everything you say above. Let's figure out the optimization layer first, and then move to validating. With this in mind, I'm working on a major overhaul/refactoring of the codebase so that it is less anxiety-inducing to make major changes. Scan() is already completely cleaned, the param handling is completely rebuilt, as are the reductions.

That is awesome news. I stuck a TODO in Scan() for precisely this reason. Can't wait to see it!

> I've written an article, which should be ready to publish in the next few days, that goes a little bit deeper into the reasoning for this approach (as opposed to the approach where random/grid search is considered as taxonomically parallel to something like Bayesian optimization).

Fantastic! We may want to begin linking things in the wiki or something 👍

> Google Colab seems amazing, will definitely try it! :)

Please do. It really lowers the barrier of entry into this kind of work, which I feel is incredibly important to the scientific community. Lots of smart people out there want to do ML but don't have the firepower to train deep networks. If you need any help figuring it out, feel free to email me or something. Took me a bit to figure it all out 😄, no need for both of us to waste time!

In any case, after your refactoring I'll reread anything and help you clean up. Then we can move forward!

@mikkokotila
Contributor Author

This is a nice article (with a comprehensive collection) on the metrics topic:

What’s WRONG with Metrics?

@mikkokotila
Contributor Author

@x94carbone just a heads up that this is moving :) It seems that saving the model does not need any messing around with tf session/graph objects; we can just save the model as JSON inside a list in the Scan() object, and the model weights in a separate list in the Scan() object. Then we load the model from the JSON and set its weights from the corresponding weights. This seems to be the same way one would do it from a file. Very clean. I will start testing this now.
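
For reference, a minimal sketch of that pattern using the standard Keras calls (to_json, get_weights, model_from_json, set_weights); the list names are just illustrative:

from keras.models import model_from_json

def store_model(model, saved_models, saved_weights):
    # inside the scan loop: keep the architecture as JSON and the weights as arrays
    saved_models.append(model.to_json())
    saved_weights.append(model.get_weights())

def restore_model(i, saved_models, saved_weights):
    # later: rebuild round i without touching tf session/graph objects
    model = model_from_json(saved_models[i])
    model.set_weights(saved_weights[i])
    return model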

I will then move on to implementing k-fold cross-validation for the "best model", which we can then use to build towards the discussion we've had, i.e., several best models being cross-validated and/or competed against each other in some meaningful way, using various sampling methods.

@mikkokotila
Contributor Author

Well well. We have now implemented f1-score based k-fold cross-validation. If you look at /utils/predict, the whole thing becomes quite apparent. The workflow is very simple. After you have concluded the experiment with Scan():

p = ta.Predict(s)
p.evaluate(x, y, average='macro')

where s is the Scan() object, and x and y are the cross-validation data. In this case it's multi-class (i.e., y dims > 1), so average is set to 'macro'.
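
For anyone following along, this is roughly what the evaluation amounts to; a sketch under assumptions (one-hot labels, scikit-learn's f1_score), not the actual /utils/predict code:

import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

def kfold_f1(model, x, y, folds=10, average='macro'):
    # assumes y is one-hot encoded, so predictions and truth are both argmax-ed
    scores = []
    for _, val_idx in KFold(n_splits=folds, shuffle=True).split(x):
        preds = model.predict(x[val_idx]).argmax(axis=-1)
        truth = y[val_idx].argmax(axis=-1)
        scores.append(f1_score(truth, preds, average=average))
    return np.mean(scores)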

TODO:

  • add a data splitting facility that supports the workflow, covering both Scan() and evaluation
  • add a proper Validation layer that takes several best models and does what Predict.evaluate is doing now

@mikkokotila
Contributor Author

This is now available through Evaluate() and Autom8(). Closing here.
