Add files via upload #33
Conversation
Excellent job @olaidejoseph. The only thing is that you forgot to lemmatize spacy_stop_words as well. Since we are lemmatizing our vocabulary via the tokenizer function, all stop words need to be lemmatized too: 'become' and 'became' are two different words in spacy_stop_words. I would advise lemmatizing the stop words after compiling them. Good job overall... I will lemmatize all the stop words in our blended model. Reviewers can find the notebook here.
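The point above can be sketched as follows. This is a minimal, illustrative example: a small dictionary stands in for spaCy's lemmatizer (the real notebook would call spaCy's pipeline after loading a model), and the stop-word set and tokenizer are hypothetical stand-ins, not the notebook's actual code.

```python
# A tiny dict stands in for spaCy's lemmatizer so this runs without
# downloading a model; the real code would look up token.lemma_.
LEMMAS = {"became": "become", "becomes": "become", "becoming": "become"}

def lemmatize(word):
    return LEMMAS.get(word, word)

# Hypothetical subset of spacy_stop_words: note 'become' and 'became'
# appear as two different entries.
spacy_stop_words = {"become", "became", "becomes", "becoming", "the", "a"}

# The reviewer's suggestion: lemmatize the stop words once, after
# compiling them, so they match the lemmatized vocabulary.
lemmatized_stop_words = {lemmatize(w) for w in spacy_stop_words}

def tokenizer(text):
    # The same lemmatization is applied to the vocabulary...
    tokens = [lemmatize(t) for t in text.lower().split()]
    # ...so filtering against the lemmatized stop list stays consistent.
    return [t for t in tokens if t not in lemmatized_stop_words]

print(tokenizer("the wine became bold"))  # -> ['wine', 'bold']
```

Without the lemmatization step, "became" would only be removed if its inflected form happened to be in the stop list; lemmatizing both sides removes every inflection of a stop word.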
Hi Olaide, this was a huge one here... Thanks a lot.
I'm assuming this LSTM3_wine_review is your most recent notebook. There are a few clarifications I'd like to get.
- The three user_inputs you used: had the models already seen them earlier, either during training or testing?
- It would also have been great if the second model, the one using KerasClassifier(nlp_model), had been included in your final top_5_variety testing on the user_inputs. That model looked very uniform across all CVs compared to the very first model, which seemed to be overfitting.
Due to time constraints, I'm not sure point two above is feasible to try. If the answer to point one is NO, then I guess the third model (which you also named model2), the one using train_test_split, is the best. I think we should proceed with it.
1. The description is not among the data provided; I got it online.
2. You are right, I should have used the Keras classifier model. I thought it was the model I passed, but it wasn't.
The train_test_split result was good. There isn't much difference; Jolomi will change it when blending.
On Thu, 12 Nov 2020, 08:40, Sharon Ibejih wrote:
Okay Olaide. This is very good. Thanks!
Check the f1_score in your three notebooks... ypred contains probabilities instead of the one-hot encodings. Perhaps that is why it is unusually higher than the accuracy.
Also, move the for_fun notebook to the model folder.
Okay, I will do that.
The first f1_score I calculated using the Keras classifier produces a single result. I used sparse categorical, so I only label-encoded my variety column (labels). For the Keras classifier, model.predict produces a single output, not a one-hot one. For the Keras model.fit, since I used softmax, I get the individual probabilities of the 20 varieties. With the help of np.argmax(), I was able to pick the position of the highest probability, which is ordered in the same way as the label encoding.
This explanation is based on the fun notebook.
On Thu, 12 Nov 2020, 18:50, Jolomi Tosanwumi wrote:
Yea @olaidejoseph. But if you look at this line of code in your notebooks... the model was first fitted on the whole dataset with a validation split of 0.25, so splitting into xtrain and xtest after that means some of xtest had already been seen by the model during fitting. Specifically, the model wasn't refitted on xtrain before calling the predict method, which is why the testing f1_score is unusually higher than the accuracy. Check it out for correction.
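The leakage described above comes down to the order of operations. A minimal sketch, using a list of row indices as a stand-in for the dataset (the names and the 75/25 split are hypothetical, chosen to mirror the 0.25 validation split):

```python
import random

# Stand-in for the full wine-review dataset: 100 row indices.
data = list(range(100))
random.seed(0)
random.shuffle(data)

# WRONG order (what the notebook did):
#   model.fit(X, y, validation_split=0.25)  # the model sees every row
#   xtrain, xtest = split(X)                # xtest overlaps fitted data
#   model.predict(xtest)                    # inflated test f1_score
#
# RIGHT order: hold out the test rows first, then fit only on the rest.
split = int(len(data) * 0.75)
train_rows, test_rows = data[:split], data[split:]

# No held-out row appears in the training set, so the test metric is
# computed on data the model has never seen.
assert not set(train_rows) & set(test_rows)
print(len(train_rows), len(test_rows))  # -> 75 25
```

Splitting first and refitting on xtrain alone is what makes the reported f1_score an honest estimate; evaluating on rows the model trained on will overstate it.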
Hi guys, kindly review my work on LSTM.