
Script run squad fine-tunned #57

Closed
wants to merge 2 commits

Conversation

andrelmfarias
Collaborator

No description provided.

@fmikaelian
Collaborator

@andrelmfarias It seems you created a branch from an old version of the develop branch. To avoid this, you will need to git rebase your current branch. I usually git pull the develop branch before creating a new branch, or rebase daily, to make sure my feature branch is up to date.
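The rebase workflow described above can be sketched as follows. This demo builds a throwaway repository so the commands are runnable anywhere; in an actual clone only the last three commands are needed (the branch name my-feature is hypothetical).

```shell
# Demo of keeping a feature branch up to date with develop via rebase,
# inside a temporary repo created just for illustration.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name you
git checkout -qb develop
echo base > file.txt
git add file.txt && git commit -qm "base commit"

# Create a feature branch and commit some work on it
git checkout -qb my-feature
echo feature > feature.txt
git add feature.txt && git commit -qm "feature work"

# Meanwhile, develop moves ahead
git checkout -q develop
echo update >> file.txt
git commit -qam "develop update"

# Replay the feature commits on top of the latest develop
git checkout -q my-feature
git rebase develop
```

After the rebase, the feature commit sits on top of the newest develop commit, so the eventual pull request contains only the feature changes.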

@fmikaelian
Collaborator

Just a clarification, because I am not sure: what is the purpose of run_squad_fine-tunned.py? Will it do both the first and the second fine-tune, or just the second one?

In any case, it would have been better to see the git diff between run_squad.py and run_squad_fine-tunned.py. Adding a new file won't show the changes relative to the original run_squad.py, which makes it more difficult to review what has changed.

To solve this, we can just print a diff between run_squad.py and run_squad_fine-tunned.py in this discussion and merge it with #58
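The requested diff boils down to one command. Inside the repo it would simply be `diff -u run_squad.py run_squad_fine-tunned.py`; the sketch below uses stand-in files (with made-up contents) so the command is runnable as-is.

```shell
# Produce a reviewable unified diff between the original script and the
# new one. Stand-in files are created here for illustration only.
tmp=$(mktemp -d)
printf 'train()\n' > "$tmp/run_squad.py"
printf 'train()\nsecond_fine_tune()\n' > "$tmp/run_squad_fine-tunned.py"
# diff exits non-zero when the files differ, hence the `|| true`
diff -u "$tmp/run_squad.py" "$tmp/run_squad_fine-tunned.py" || true
```

Lines added by the new script appear with a leading `+`, which makes the actual change easy to discuss in a review thread.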

@andrelmfarias
Collaborator Author

@fmikaelian Yes, I made a mistake by not doing a git pull and by not making the changes in run_squad.py directly.

The idea of run_squad_fine-tunned.py was to do everything run_squad.py does, plus the option of a second fine-tune using a model we already trained.

Do you think we should abandon this idea and instead just copy the vocabulary file into the same folder as the trained model, as you proposed previously?

@fmikaelian
Collaborator

fmikaelian commented Mar 8, 2019

Just to let you know, I built a sklearn wrapper over run_squad.py.

For processing, we can do:

train_processor = BertProcessor(bert_model='bert-base-uncased', is_training=True)
train_examples, train_features = train_processor.fit_transform(X='data/train-v1.1.json')

We don't need the tokenizer file anymore, since we access the default one directly: https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/reader/bertqa_sklearn.py#L798

For training and prediction, we should now be able to do:

model = BertQA()
model.fit()
model.predict()

The advantage of this wrapper is that you can use the model in a notebook without having to run python run_squad.py --param. It is also flexible enough for second fine-tunes. See my method: https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/reader/bertqa_sklearn.py#L1002

The predict() method has a bug to be fixed, but should otherwise work as expected (see #68).
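The fit()/predict() surface described above can be sketched with a toy stand-in. BertQA is the real cdQA class; the minimal class below (a made-up example, not cdQA code) only mirrors the sklearn-style interface to show why the wrapper is convenient in a notebook.

```python
# Toy stand-in mirroring the sklearn-style fit()/predict() surface of
# BertQA; the real class fine-tunes BERT, this one only shows the shape.
class ToyBertQA:
    def fit(self, X=None, y=None):
        # The real fit() would run the fine-tuning loop from run_squad.py.
        self.fitted_ = True
        return self  # returning self allows chaining, as in sklearn

    def predict(self, X):
        if not getattr(self, "fitted_", False):
            raise RuntimeError("call fit() before predict()")
        # The real predict() returns answer spans; placeholders here.
        return ["<answer>"] * len(X)

model = ToyBertQA()
model.fit()
print(model.predict([("question", "paragraph")]))  # → ['<answer>']
```

Because the estimator holds its own state, a second fine-tune is just another fit() call on an already-fitted object, with no command-line round trip.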

@andrelmfarias
Collaborator Author

Ok, I understand

Great!

@fmikaelian
Collaborator

fmikaelian commented Mar 8, 2019

A good way to see if your work can be added is to perform a simple diff between your file run_squad_fine-tunned.py and run_squad.py. Can you paste it in this thread so we can discuss?

@fmikaelian
Collaborator

fmikaelian commented Mar 8, 2019

Yesterday I ran this script (first fine-tune on the SQuAD train set):

https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/pipeline/train.py#L30

It outputs a sklearn-like model object. I will add it to the releases.

Then, we should be able to use it for predict() like this:

https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/pipeline/predict.py#L30

@andrelmfarias
Collaborator Author

> A good way to see if your work can be added is to perform a simple diff between your file run_squad_fine-tunned.py and run_squad.py. Can you paste it in this thread so we can discuss?

Now that you have implemented a wrapper that can access the tokenizer directly, we won't hit the error I had when using run_squad.py for a second fine-tune. It can do the same thing run_squad_fine-tunned.py was able to do. Do you think we still need to add my work?

@fmikaelian
Collaborator

@andrelmfarias then I guess it is fine. We can move forward to validate the entire sklearn wrapper workflow 😄 . The board is up to date!

@fmikaelian
Collaborator

I'll close this PR for now.

@fmikaelian fmikaelian closed this Mar 8, 2019
@fmikaelian fmikaelian deleted the script_run_squad branch March 8, 2019 11:07