
Script run squad fine-tunned #57

Closed
wants to merge 2 commits

Conversation

andrelmfarias
Collaborator

No description provided.

@fmikaelian
Collaborator

@andrelmfarias It seems you created a branch from an old version of the develop branch. To avoid this, you will need to git rebase your current branch. I usually git pull the develop branch before creating a new branch, or rebase daily, to make sure my feature branch is up to date.
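The rebase workflow described above can be sketched as follows. This demo builds a throwaway repository so the commands are runnable anywhere; in an actual clone only the last three commands are needed (the branch name my-feature is hypothetical).

```shell
# Demo of keeping a feature branch up to date with develop via rebase,
# inside a temporary repo created just for illustration.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name you
git checkout -qb develop
echo base > file.txt
git add file.txt && git commit -qm "base commit"

# Create a feature branch and commit some work on it
git checkout -qb my-feature
echo feature > feature.txt
git add feature.txt && git commit -qm "feature work"

# Meanwhile, develop moves ahead
git checkout -q develop
echo update >> file.txt
git commit -qam "develop update"

# Replay the feature commits on top of the latest develop
git checkout -q my-feature
git rebase develop
```

After the rebase, the feature commit sits on top of the newest develop commit, so the eventual pull request contains only the feature changes.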

@fmikaelian
Collaborator

Just a clarification, because I am not sure: what is the purpose of run_squad_fine-tunned.py? Will it do both the first and the second fine-tune, or just the second one?

In any case, it would have been better to see the git diff between run_squad.py and run_squad_fine-tunned.py. Adding a new file won't show the changes relative to the original run_squad.py, which makes it more difficult to review what has changed.

To solve this, we can just print a diff between run_squad.py and run_squad_fine-tunned.py in this discussion and merge it with #58
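The requested diff boils down to one command. Inside the repo it would simply be `diff -u run_squad.py run_squad_fine-tunned.py`; the sketch below uses stand-in files (with made-up contents) so the command is runnable as-is.

```shell
# Produce a reviewable unified diff between the original script and the
# new one. Stand-in files are created here for illustration only.
tmp=$(mktemp -d)
printf 'train()\n' > "$tmp/run_squad.py"
printf 'train()\nsecond_fine_tune()\n' > "$tmp/run_squad_fine-tunned.py"
# diff exits non-zero when the files differ, hence the `|| true`
diff -u "$tmp/run_squad.py" "$tmp/run_squad_fine-tunned.py" || true
```

Lines added by the new script appear with a leading `+`, which makes the actual change easy to discuss in a review thread.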

@andrelmfarias
Collaborator Author

@fmikaelian Yes, I made a mistake by not doing a git pull and by not making the changes in run_squad.py directly.

The idea of run_squad_fine-tunned.py was to do everything run_squad.py does, plus the option of a second fine-tune using a model we already trained.

Do you think we should abandon this idea and instead just copy the vocabulary file into the same folder as the trained model, as you proposed previously?

@fmikaelian
Collaborator

fmikaelian commented Mar 8, 2019

Just to let you know, I built a sklearn wrapper over run_squad.py.

For processing, we can do:

train_processor = BertProcessor(bert_model='bert-base-uncased', is_training=True)
train_examples, train_features = train_processor.fit_transform(X='data/train-v1.1.json')

We don't need the tokenizer file anymore, since we access the default one directly: https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/reader/bertqa_sklearn.py#L798

For training and prediction, we should now be able to do:

model = BertQA()
model.fit()
model.predict()

The advantage of this wrapper is that you can use the model in a notebook without having to run python run_squad.py --param. It is also flexible enough for second fine-tunes. See my method: https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/reader/bertqa_sklearn.py#L1002

The predict() method has a bug to be fixed, but should otherwise work as expected (see #68).
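The fit()/predict() surface described above can be sketched with a toy stand-in. BertQA is the real cdQA class; the minimal class below (a made-up example, not cdQA code) only mirrors the sklearn-style interface to show why the wrapper is convenient in a notebook.

```python
# Toy stand-in mirroring the sklearn-style fit()/predict() surface of
# BertQA; the real class fine-tunes BERT, this one only shows the shape.
class ToyBertQA:
    def fit(self, X=None, y=None):
        # The real fit() would run the fine-tuning loop from run_squad.py.
        self.fitted_ = True
        return self  # returning self allows chaining, as in sklearn

    def predict(self, X):
        if not getattr(self, "fitted_", False):
            raise RuntimeError("call fit() before predict()")
        # The real predict() returns answer spans; placeholders here.
        return ["<answer>"] * len(X)

model = ToyBertQA()
model.fit()
print(model.predict([("question", "paragraph")]))  # → ['<answer>']
```

Because the estimator holds its own state, a second fine-tune is just another fit() call on an already-fitted object, with no command-line round trip.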

@andrelmfarias
Collaborator Author

Ok, I understand

Great!

@fmikaelian
Collaborator

fmikaelian commented Mar 8, 2019

A good way to see if your work can be added is to perform a simple diff between your file run_squad_fine-tunned.py and run_squad.py. Can you paste it in this thread so we can discuss?

@fmikaelian
Collaborator

fmikaelian commented Mar 8, 2019

Yesterday I ran this script (first fine-tune on the SQuAD train set):

https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/pipeline/train.py#L30

It outputs a sklearn-like model object. I will add it to the releases.

Then, we should be able to use it for predict() like this:

https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/pipeline/predict.py#L30

@andrelmfarias
Collaborator Author

> A good way to see if your work can be added is to perform a simple diff between your file run_squad_fine-tunned.py and run_squad.py. Can you paste it in this thread so we can discuss?

Now that you have implemented a wrapper that can access the tokenizer directly, we won't hit the error I had when using run_squad.py for a second fine-tune. It can do the same thing run_squad_fine-tunned.py was able to do. Do you think we still need to add my work?

@fmikaelian
Collaborator

@andrelmfarias then I guess it is fine. We can move forward to validate the entire sklearn wrapper workflow 😄 . The board is up to date!

@fmikaelian
Collaborator

I'll close this PR for now.

@fmikaelian fmikaelian closed this Mar 8, 2019
@fmikaelian fmikaelian deleted the script_run_squad branch March 8, 2019 11:07