
Add files via upload #33

Closed · wants to merge 5 commits

Conversation

olaidejoseph (Collaborator)

Hi guys, kindly review my work on LSTM.

@Jolomi-Tosanwumi (Collaborator)

Excellent job @olaidejoseph. The only thing is that you forgot to lemmatize spacy_stop_words as well. Since we are lemmatizing our vocabulary via the tokenizer function, all stop words need to be lemmatized too; 'become' and 'became' are two different words in spacy_stop_words. I would advise lemmatizing the stop words after compiling them.

Good job overall...I will lemmatize all the stop words in our blended model.
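
Something along these lines should do it (a rough sketch only, assuming spaCy's default English stop-word list and the lemmatizing tokenizer already used in the notebook):

```python
# Sketch of the suggestion: lemmatize the stop words after compiling them,
# so e.g. 'became' collapses to 'become' just like the rest of the vocabulary.
import spacy

nlp = spacy.load("en_core_web_sm")
spacy_stop_words = nlp.Defaults.stop_words

# Run each stop word through the pipeline and keep its lemma.
lemmatized_stop_words = {
    token.lemma_.lower() for word in spacy_stop_words for token in nlp(word)
}
```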

Reviewers can find the notebook here

@Jolomi-Tosanwumi linked an issue Nov 12, 2020 that may be closed by this pull request
@sharonibejih (Collaborator) left a comment


Hi Olaide, this was a huge one here... Thanks a lot.

I'm assuming this LSTM3_wine_review is your most recent notebook. There are a few clarifications I'd like to get.

  1. Had the models already seen the three user_inputs you used, either during training or testing?
  2. It would also have been great if the second model, the KerasClassifier(nlp_model) one, had been included in your final top_5_variety testing on the user_inputs. Its scores looked very uniform across all CVs, compared to the very first model, which seemed to be overfitting.

Given the time left, I'm not sure point two above is feasible to try. If the answer to point one is no, then I guess the third model (which you also named model2), the one built with train_test_split, is the best, and I think we should proceed with it.

@olaidejoseph (Collaborator, Author) commented Nov 12, 2020 via email

@sharonibejih (Collaborator) left a comment


Okay Olaide. This is very good. Thanks!

@Jolomi-Tosanwumi (Collaborator) commented Nov 12, 2020

Check the f1_score in your three notebooks... ypred is probabilities instead of one-hot encodings. Perhaps that is why it is unusually higher than the accuracy.
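
For example, the probabilities can be collapsed to class labels before scoring (a sketch only; model, X_test and y_test are the notebook's variables, and y_test is assumed to be one-hot encoded):

```python
# Sketch of the fix: model.predict returns class probabilities, so take the
# argmax to get predicted class indices before computing the f1_score.
import numpy as np
from sklearn.metrics import f1_score

y_pred_proba = model.predict(X_test)       # shape (n_samples, n_classes)
y_pred = np.argmax(y_pred_proba, axis=1)   # predicted class indices
y_true = np.argmax(y_test, axis=1)         # assumes y_test is one-hot encoded

print(f1_score(y_true, y_pred, average='weighted'))
```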

@Jolomi-Tosanwumi (Collaborator)

Also, move the for_fun notebook to the model folder.

@olaidejoseph (Collaborator, Author) commented Nov 12, 2020 via email

@Jolomi-Tosanwumi (Collaborator) commented Nov 13, 2020

Yeah @olaidejoseph. But look at this line of code in your notebooks:
y_pred_test = model.predict(X_test)

model was first fitted on the whole dataset with a validation split of 0.25, so splitting into xtrain and xtest after that means some of xtest had already been seen by the model during fitting. Specifically, model wasn't refitted on xtrain before calling the predict method; that is why the testing f1_score is unusually higher than the accuracy.

Check it out and correct it.
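
Something like this would avoid the leakage (a sketch only; epochs, batch size and random_state are placeholders, and X, y and model are the notebook's objects):

```python
# Sketch: split first, fit only on the training portion, then evaluate on the
# untouched test set so no test row is ever seen during fitting.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# validation_split carves the validation set out of X_train only.
model.fit(X_train, y_train, validation_split=0.25, epochs=10, batch_size=64)

y_pred_test = model.predict(X_test)  # X_test was never seen during fitting
```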

Successfully merging this pull request may close these issues: Building LSTM model