
How about this group-train subcommand #2579

Closed
Whu-wxy opened this issue Mar 8, 2019 · 1 comment

Whu-wxy commented Mar 8, 2019

We often need to experiment with different configurations. If AllenNLP could automatically run several training tasks in turn, it would save us some time. Yesterday I added a 'group-train' subcommand, which is a wrapper around the train command, and after a simple test I found it very useful.

Command execution process
I used the allennlp-as-a-library-example project for the test.
experiments dir: venue_classifier.json, venue_classifier_boe.json, venue_classifier_boe_adam.json
Input: allennlp group-train /home/wxy/PycharmProjects/allennlp-as-a-library/experiments -s ./group_save --include-package my_library

Process (a simplified sketch of the core loop follows the steps below):
1. Before training begins, create a training_progress.json file that records each configuration file and whether its training has been completed.
{"venue_classifier_boe_adam": false, "venue_classifier.json": false, "venue_classifier_boe.json": false}

2. Use a 'for' loop to train each json config in turn.

3. Update the training_progress.json file after each training task completes.
{"venue_classifier_boe_adam": true, "venue_classifier.json": false, "venue_classifier_boe.json": false}

4. If training is interrupted, it resumes at the first configuration still marked 'false' in training_progress.json, and the remaining training runs proceed normally.

5. After all training is completed, there are three dirs in the serialization_dir: venue_classifier, venue_classifier_boe, venue_classifier_boe_adam. Their names correspond to the configuration files. The final training_progress.json file:
{"venue_classifier_boe_adam": true, "venue_classifier.json": true, "venue_classifier_boe.json": true}

Additional context
1. Because I am not familiar with the test module, I tested my code with the allennlp-as-a-library-example project.

2. If one training run is interrupted by an exception, the subsequent runs do not continue, so skipping the failed configuration would be a better choice, but I don't know how to do this (one untested idea is sketched after this list).

3. I have pushed this code to my fork of the allennlp repository.
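
Regarding point 2, one untested idea would be to wrap each run in a try/except and record the failure in the progress file instead of stopping. The loop body in the sketch above would become something like:

```python
# Sketch only: record a failed config and move on to the next one
# instead of aborting the whole group.
try:
    train_model_from_file(os.path.join(experiments_dir, config_name), run_dir)
    progress[config_name] = True
except Exception as exc:
    print(f"Training for {config_name} failed ({exc}); skipping it.")
    progress[config_name] = "failed"
with open(progress_path, "w") as f:
    json.dump(progress, f)
```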

Do you have any suggestions for my code?
@matt-gardner

matt-gardner (Contributor) commented

I'm glad you found a good way to do this that works well for you! I think there are a lot of different ways to solve this basic problem, and we so far have taken the position that these kinds of scripts or commands should be outside the main library. Maybe some day we'll put something in here, but because there are so many different setups a person could have (a single GPU, multiple CPUs, multiple GPUs, some cloud infrastructure, a cluster with a job queue...), it's hard to make a general solution. I think the right thing to do is leave a link to your solution, like you did, so that others with a similar situation can use it if they find it useful.

Thanks!
