Submission: Group 4: Poisonous_Mushroom_Predictor #25
Comments
Data analysis review checklist
Reviewer: @Kingslin0810
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1 hour
Review Comments:
Overall, the project is well executed, and the final report clearly states the objective, the data used, the prediction methodology, and the results and limitations. There are also many references to prior research, which make the case solid. Good job team!
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Data analysis review checklist
Reviewer: @GloriaWYY
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1.5 hours
Review Comments:
This group project shows impressive work. The project repository is well organized, with resources properly named and easy to access. I would like to bring some issues to your attention to help you improve your project:
Data analysis review checklist
Reviewer: @suuuuperNOVA
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 30 minutes
Review Comments:
Data analysis review checklist
Reviewer: vtaskaev1
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1.5 hours
Review Comments:
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
From "In src/preprocessor.py, you fit and transform the entire training set, and then pass this model to cross_validation.py to assess the model's performance. This would potentially break the Golden Rule because your preprocessor learns information from the whole training set, which means during cross-validation, information leaks from cross-validation split. This can be solved by removing preprocessor.fit_transform(df1, df2); you do not need this because with your pipeline defined in cross_validation.py, cross-validation will be performed properly and automatically for you, and you do not need to manually transform the data beforehand." We commented out the line of code
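The reviewer's point can be illustrated with a minimal sketch. This is not the project's actual `cross_validation.py`; the toy data, column names, and the choice of `OneHotEncoder` are assumptions made for illustration only. The idea is that the preprocessor lives inside the pipeline, so scikit-learn refits it on the training portion of each fold and no separate `fit_transform` call on the whole training set is needed.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the categorical mushroom features.
X = pd.DataFrame({
    "odor": ["none", "foul", "almond"] * 20,
    "cap_color": ["brown", "red", "white", "red"] * 15,
})
# Hypothetical target: 1 = toxic, 0 = edible.
y = (X["odor"] == "foul").astype(int)

# The encoder is a pipeline step, so it is fit only on the training
# part of each cross-validation split, never on the validation part.
pipe = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(max_iter=1000),
)

# The raw (untransformed) features are passed in; the pipeline handles
# the transformation inside every fold.
scores = cross_validate(pipe, X, y, cv=5, scoring="recall")
mean_recall = scores["test_score"].mean()
print(f"Mean validation recall over 5 folds: {mean_recall:.3f}")
```

Passing the raw data frame to `cross_validate`, rather than a pre-transformed array, is what keeps the validation folds unseen by the preprocessor.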
All four reviewers mentioned that our project lacks automation. This is because we haven't finished the We have now added the
Thank you for the feedback. I will update this post as issues are addressed.
Regarding points in comment 1
Regarding points in comment 2
Regarding checklist items from multiple comments
I gratefully acknowledge the reviewers for taking the time to comment.
Submitting authors: @dol23asuka @Kylemaj @mahm00d27
Repository: https://github.com/UBC-MDS/Poisonous_Mushroom_Predictor
Report link: https://github.com/UBC-MDS/Poisonous_Mushroom_Predictor/blob/main/doc/Poisonous_Mushroom_Predictor_Report.md
Abstract/executive summary:
Mushrooms have distinctive characteristics which help in identifying whether they are poisonous or edible. In this project we have built a logistic regression classification model which uses several morphological characteristics of mushrooms to predict whether an observed mushroom is toxic or edible (non-toxic). Exploratory data analysis revealed definite distinctions between our target classes and highlighted several key patterns which could serve as strong predictors. On the test data set of 1,625 observations our model performed extremely well, with a 99% recall score and a 100% precision score. The model correctly classified 863 edible and 761 toxic mushrooms. One false negative result was produced (a toxic mushroom identified as non-toxic). In the context of this problem, a false negative could result in someone being seriously or even fatally poisoned. We must therefore be far more concerned with minimizing false negatives than false positives. Given this priority, we may consider tuning the threshold of our model in order to minimize false negatives at the potential cost of increasing false positives. Moving forward, we would like to further optimize our model, investigating whether we could get similar performance with fewer features. Finally, we would like to evaluate how our model performs on real observations from the field rather than hypothetical data.
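As a rough check of the figures in the abstract, the sketch below recomputes recall and precision from the reported counts, treating toxic as the positive class. It then shows one simple way the threshold tuning mentioned above could work: choosing the highest probability cutoff that still reaches a target recall. The helper `highest_threshold_for_recall` and the validation probabilities are hypothetical and are not taken from the project.

```python
# Counts from the abstract; toxic is treated as the positive class.
tp = 761  # toxic mushrooms correctly classified as toxic
tn = 863  # edible mushrooms correctly classified as edible
fn = 1    # toxic mushroom classified as edible
fp = 1625 - tp - tn - fn  # remaining test observations, which is 0

recall = tp / (tp + fn)     # 761 / 762, roughly 0.9987
precision = tp / (tp + fp)  # 761 / 761 = 1.0


def highest_threshold_for_recall(probs, labels, target_recall):
    """Return the largest cutoff whose recall on (probs, labels) meets the target.

    Hypothetical helper; assumes labels contains at least one positive (1).
    """
    positives = sum(labels)
    for threshold in sorted(set(probs), reverse=True):
        caught = sum(
            1 for p, label in zip(probs, labels) if label == 1 and p >= threshold
        )
        if caught / positives >= target_recall:
            return threshold
    return 0.0


# Hypothetical validation-set probabilities of the toxic class.
val_probs = [0.95, 0.80, 0.35, 0.60, 0.10, 0.05]
val_labels = [1, 1, 1, 0, 0, 0]

threshold = highest_threshold_for_recall(val_probs, val_labels, target_recall=1.0)
print(f"recall={recall:.4f}, precision={precision:.4f}, threshold={threshold}")
```

In the toy data, lowering the cutoff to 0.35 catches every toxic example but also flags the edible example scored 0.60, which is the trade-off between false negatives and false positives that the abstract describes.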
Editor: @flor14
Reviewers: Cui_Vera, Ye_Wanying, Taskaev_Vadim, Lv_Kingslin