Script run squad fine-tunned #57
Conversation
…nned model using its saved weights
@andrelmfarias It seems you created a branch from an old version of the `develop` branch. To avoid this, you will need to `git rebase` your current branch. I always `git pull` the `develop` branch before creating a new branch, or rebase daily, to make sure my new feature branch is up to date.
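The workflow described above can be played out end to end in a throwaway repository; the sketch below is purely illustrative (branch names, file names, and commit messages are made up) and touches nothing outside a temporary directory:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "dev"
git checkout -q -b develop
echo "v1" > app.py
git add app.py && git commit -qm "initial commit on develop"

# Create a feature branch, then let develop move on without it
git checkout -q -b feature
echo "feature work" > feature.py
git add feature.py && git commit -qm "add feature"
git checkout -q develop
echo "v2" > app.py
git add app.py && git commit -qm "develop moved on"

# Replay the feature commits on top of the latest develop
git checkout -q feature
git rebase -q develop
git log --oneline
```

After the rebase, the feature commit sits on top of the newest `develop` commit, which is what keeps the eventual pull request diff small and reviewable.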
Just a clarification, because I am not sure: what is the purpose of `run_squad_fine-tunned.py`? Will it do both the first fine-tune and the second fine-tune, or just the second one?

In any case, it would have been better to see the git difference between `run_squad.py` and `run_squad_fine-tunned.py`. Adding a new file doesn't show the changes from the original `run_squad.py` and makes it more difficult to review what has changed.

To solve this, we can just print a diff between `run_squad.py` and `run_squad_fine-tunned.py` in this discussion and merge it with #58
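Printing such a diff can be done with Python's standard library alone; a minimal sketch (the file contents and `.orig`/`.new` paths below are placeholders, not the real scripts):

```python
import difflib

def unified_diff(old_path, new_path):
    """Return a git-style unified diff between two text files."""
    with open(old_path) as f:
        old_lines = f.readlines()
    with open(new_path) as f:
        new_lines = f.readlines()
    return "".join(difflib.unified_diff(
        old_lines, new_lines, fromfile=old_path, tofile=new_path))

# Throwaway files standing in for the two script versions:
with open("run_squad.orig", "w") as f:
    f.write("def train():\n    pass\n")
with open("run_squad.new", "w") as f:
    f.write("def train():\n    load_weights()\n    pass\n")
print(unified_diff("run_squad.orig", "run_squad.new"))
```

The output can be pasted directly into the discussion, and added lines show up prefixed with `+`, just as in a pull-request diff.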
@fmikaelian Yes, I made a mistake by not doing a `git pull` and not making the changes in the right branch. Do you think we should abort this idea and just copy+paste the vocabulary file into the same folder as the trained model instead, as you proposed previously?
Just to let you know, I built a wrapper. For processing, we can do:

```python
train_processor = BertProcessor(bert_model='bert-base-uncased', is_training=True)
train_examples, train_features = train_processor.fit_transform(X='data/train-v1.1.json')
```

We don't need the tokeniser file anymore, since we access the basic one directly: https://github.com/fmikaelian/cdQA/blob/0ce83d2aacbb9a11b11eefeb32966350369566ce/cdqa/reader/bertqa_sklearn.py#L798

For training and prediction, we should now be able to do:

```python
model = BertQA()
model.fit()
model.predict()
```

The advantage of this wrapper is that you can use the model in a notebook.
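The convenience of this sklearn-like `fit`/`predict` convention can be sketched with a toy stand-in (the class below is illustrative only, not cdQA's actual implementation):

```python
class ToyQA:
    """Toy estimator following the sklearn fit/predict convention."""

    def fit(self, X=None, y=None):
        # A real reader would fine-tune BERT here; we just record the state.
        self.fitted_ = True
        return self  # returning self allows chaining, as sklearn estimators do

    def predict(self, X):
        if not getattr(self, "fitted_", False):
            raise RuntimeError("Call fit() before predict().")
        return [f"answer to: {q}" for q in X]

model = ToyQA()
model.fit()
print(model.predict(["What is cdQA?"]))
```

Because the whole lifecycle is two method calls on one object, the model can be driven interactively from a notebook cell instead of through a command-line script.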
Ok, I understand.

Great!
A good way to see if your work can be added is to perform a simple diff between your file and the original `run_squad.py`.
Yesterday I ran this script (1st fine-tune on train SQuAD). It outputs a sklearn-like model object. I will add it to the releases. Then, we should be able to use it for the second fine-tune.
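Attaching the fitted object to a release amounts to serialising it to a file; a hedged sketch using the standard library's `pickle` (the `DummyReader` class and the `.pkl` filename are stand-ins, and in practice `joblib` is often preferred for sklearn-style objects):

```python
import pickle

class DummyReader:
    """Stand-in for the fitted sklearn-like model object."""
    def predict(self, X):
        return ["answer"] * len(X)

model = DummyReader()
with open("reader_model.pkl", "wb") as f:
    pickle.dump(model, f)   # this file is what would be attached to a release

with open("reader_model.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored.predict(["q1", "q2"]))
```

Anyone downloading the artifact can then call `predict` immediately, without re-running the expensive fine-tuning step.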
Now that you implemented a wrapper that can access the Tokenizer directly, we won't have the error I had when using the previous script.
@andrelmfarias then I guess it is fine. We can move forward to validate the entire pipeline.
I'm closing this PR for now.