Validator() #40

Closed
mikkokotila opened this issue Jul 29, 2018 · 9 comments

@mikkokotila
Contributor

Now that a lot of the issues are handled, I think the next big push is on putting the pieces together for Validator(), i.e., what happens after Scan() that leads to the information needed to finally train the production model.

I think that in #17 you had more or less nailed the outline of the approach, and I will follow that for now. We already have an objective measure for classification tasks in the form of score_model, so I will focus on that use case (class predictions) first.

@mikkokotila self-assigned this Jul 29, 2018
@matthewcarbone
Collaborator

Sounds good! I'll try and think about this some more when I can and respond with anything I come up with 👍

@matthewcarbone
Collaborator

matthewcarbone commented Aug 3, 2018

OK, so there are a couple of things I've been thinking about, and a few layers of computational complexity to worry about. I'll elaborate.

We cannot really confidently choose the best HP point based on Talos' current state. It certainly works, but perhaps not to the degree some users would want. The reason is twofold. Consider a data set and a train/validation/test split.

  • First (assume testing data is set aside), oftentimes the validation loss/accuracy is highly dependent on the chunk of data chosen as the validation set. For this reason, Talos will currently only give a ballpark answer to the question of "which HP is best". A statistical average over different splits is often necessary. This is of course k-fold cross-validation and is definitely an option we should give users the ability to implement.
  • Second, I have also seen in my own work that even for a fixed HP point and a fixed k-fold split, the random initialization of the weights sometimes has an impact on the answer. In my experience this occurs more often in CNNs, but regardless, it is still another layer of statistical averaging that should in principle be done before any conclusions can be drawn.

You see where I'm going with this! If Talos currently runs in O(N) time, where N is the number of HP permutations, then in a perfect world, if we wanted to be sure about our choice of "best HP", we would need brute-force O(N·K·L) time, where K is the number of folds and L is the number of times you want to run statistical averages over the random initializations.
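
For concreteness, here is a minimal sketch (plain Keras and scikit-learn, not Talos code) of that brute-force loop for a single HP permutation; build_model and the 'epochs' key are hypothetical placeholders:

import numpy as np
from sklearn.model_selection import KFold

def mean_score(build_model, params, x, y, k=5, n_inits=3):
    # build_model(params) is assumed to return a freshly compiled Keras model
    # with an accuracy metric; each call gives new random initial weights
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True).split(x):
        for _ in range(n_inits):
            model = build_model(params)
            model.fit(x[train_idx], y[train_idx], epochs=params['epochs'], verbose=0)
            _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
            scores.append(acc)
    return np.mean(scores)

Ranking all N permutations with something like this is exactly where the O(N·K·L) cost comes from.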

So I guess my comment on this discussion would be: should we focus on a Validator() yet, or should we try to be smarter about directing the flow of Scan() so that it finds the optimal HP sooner? Perhaps one of the search methods mentioned in previous issues? I dunno, @mikkokotila, but let's discuss before we start doing more work. 😄

By the way, this is where I implement hardware accelerators, since it is much too slow on CPUs.

Google Colab is an amazing option for anyone who doesn't have access to an HPC cluster!

@mikkokotila
Contributor Author

I agree with everything you say above. Let's figure out the optimization layer first, and then move to validating. With this in mind, I'm working on a major overhaul/refactoring of the codebase so that it is less anxiety-inducing to make major changes. Scan() is already completely cleaned, the param handling is completely rebuilt, as are the reductions. These are the three things that play some role in the optimization aspect.

As you may have noted, in the initial architecture I've assumed an approach where:

  1. a scan is started after downsampling the parameter space (first reduction)
  2. there is the possibility to apply reducers as the scan progresses
  3. each time a reducer is applied, the number of available permutations is lowered

The idea is that we could have many different strategies, which all take as input the results from the previous rounds (from the experiment log) and based on that input reduce the complexity of the rest of the experiment. This in effect happens by removing select items from self.param_log.
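
Just to make the idea concrete, a rough, hypothetical sketch of such a reducer; param_log, results_so_far, and get_params are illustrative names here, not the actual Talos internals:

def naive_reducer(param_log, results_so_far, get_params, metric='val_acc'):
    # results_so_far: dicts from the experiment log, e.g. {'val_acc': 0.91, 'params': {...}}
    # get_params(pid): returns the parameter dict behind a permutation id
    ranked = sorted(results_so_far, key=lambda r: r[metric])
    worst = ranked[:max(1, len(ranked) // 4)]
    # collect (parameter, value) pairs seen in the worst quartile of rounds so far
    bad_pairs = {(k, v) for r in worst for k, v in r['params'].items()}
    # keep only the permutations that avoid all of those pairs
    return [pid for pid in param_log
            if not any(pair in bad_pairs for pair in get_params(pid).items())]

Each strategy would differ only in how it decides what to drop; the contract is simply experiment log in, smaller param_log out.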

I've written an article, which should be ready to publish in the next few days, that goes a little bit deeper into the reasoning for this approach (as opposed to the approach where random/grid search is considered as taxonomically parallel to something like Bayesian optimization).

Google Colab seems amazing, will definitely try it! :)

@mikkokotila
Contributor Author

This is related to #17 where some additional comments can be found.

@matthewcarbone
Collaborator

> I agree with everything you say above. Let's figure out the optimization layer first, and then move to validating. With this in mind, I'm working on a major overhaul/refactoring of the codebase so that it is less anxiety-inducing to make major changes. Scan() is already completely cleaned, the param handling is completely rebuilt, as are the reductions.

That is awesome news. I stuck a TODO in Scan() for precisely this reason. Can't wait to see it!

> I've written an article, which should be ready to publish in the next few days, that goes a little bit deeper into the reasoning for this approach (as opposed to the approach where random/grid search is considered as taxonomically parallel to something like Bayesian optimization).

Fantastic! We may want to begin linking things in the wiki or something 👍

> Google Colab seems amazing, will definitely try it! :)

Please do. It really lowers the barrier of entry into this kind of work, which I feel is incredibly important to the scientific community. Lots of smart people out there want to do ML but don't have the firepower to train deep networks. If you need any help figuring it out, feel free to email me or something. Took me a bit to figure it all out 😄, no need for both of us to waste time!

In any case, after your refactoring I'll reread anything and help you clean up. Then we can move forward!

@mikkokotila
Contributor Author

This is a nice article (with a comprehensive collection) on the metrics topic:

What’s WRONG with Metrics?

@mikkokotila
Contributor Author

@x94carbone just a heads up that this is moving :) It seems that saving the model does not need any messing around with tf session/graph objects; we can just save the model as JSON inside a list in the Scan() object, and the model weights in a separate list in the Scan() object. Then we load the model from the JSON and set its weights from the corresponding weights. This seems to be the same way one would do it from a file. Very clean. I will start testing this now.
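
For reference, a minimal sketch of that pattern using the standard Keras calls (to_json, get_weights, model_from_json, set_weights); the list names are just illustrative:

from keras.models import model_from_json

def store_model(model, saved_models, saved_weights):
    # inside the scan loop: keep the architecture as JSON and the weights as arrays
    saved_models.append(model.to_json())
    saved_weights.append(model.get_weights())

def restore_model(i, saved_models, saved_weights):
    # later: rebuild round i without touching tf session/graph objects
    model = model_from_json(saved_models[i])
    model.set_weights(saved_weights[i])
    return model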

I will then move on to implementing k-fold cross-validation for the "best model", which we can then use to build towards the discussion we've had, i.e., several best models being cross-validated and/or competed against each other in some meaningful way, using various sampling methods.

@mikkokotila
Contributor Author

Well well. We have now implemented f1-score based k-fold cross-validation. If you look at /utils/predict, the whole thing becomes quite apparent. The workflow is very simple. After you have concluded the experiment with Scan():

p = ta.Predict(s)
p.evaluate(x, y, average='macro')

where s is the Scan() object, and x and y are the cross-validation data. In this case it's multi-class (i.e., y dims > 1), so average is set to 'macro'.
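
For anyone following along, this is roughly what the evaluation amounts to; a sketch under assumptions (one-hot labels, scikit-learn's f1_score), not the actual /utils/predict code:

import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

def kfold_f1(model, x, y, folds=10, average='macro'):
    # assumes y is one-hot encoded, so predictions and truth are both argmax-ed
    scores = []
    for _, val_idx in KFold(n_splits=folds, shuffle=True).split(x):
        preds = model.predict(x[val_idx]).argmax(axis=-1)
        truth = y[val_idx].argmax(axis=-1)
        scores.append(f1_score(truth, preds, average=average))
    return np.mean(scores)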

TODO:

  • add a data splitting facility that supports the workflow, covering both Scan() and evaluation
  • add a proper Validation layer that takes several best models and does what Predict.evaluate is doing now

@mikkokotila
Contributor Author

This is now available through Evaluate() and Autom8(). Closing here.
