optimizing naive_bayes_classifier #1223

Merged
nfahlgren merged 5 commits into release-4.0 from release-4.0-speedup-naive-bayes on Jul 13, 2023

Conversation

afinit
Collaborator

@afinit afinit commented Jul 6, 2023

Describe your changes

This PR speeds up naive_bayes_classifier a bit. I think the code is a little more readable as well.

The part of this function that takes the most time is the calculation of joint probabilities, so I focused on changes there. After several iterations, I settled on the approach here, where we grab the indices for the PDFs outside the loops. We now do this once per function call instead of once for every class, which allows us to use NumPy index slicing and array multiplication.
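For readers who want to see the general idea, here is a minimal sketch of index-based PDF lookup with NumPy. It is not the code from this PR: the function names, the `(n_classes, 3, 256)` layout of `pdfs`, and the three-channel uint8 image are all assumptions made for illustration. The per-pixel loop is only a naive reference for checking the result and does not represent the original implementation.

```python
import numpy as np


def joint_probabilities(hsv_img, pdfs):
    """Per-class joint probability for every pixel (hypothetical helper).

    hsv_img: (rows, cols, 3) uint8 image (assumed layout).
    pdfs: (n_classes, 3, 256) array, where pdfs[k, c, x] is the
          probability of value x in channel c under class k (assumed layout).
    """
    # Split the channels once; the same integer arrays index every class.
    h, s, v = hsv_img[..., 0], hsv_img[..., 1], hsv_img[..., 2]
    # Integer-array indexing: each lookup has shape (n_classes, rows, cols).
    # The naive Bayes joint probability is the product over channels.
    return pdfs[:, 0][:, h] * pdfs[:, 1][:, s] * pdfs[:, 2][:, v]


def joint_probabilities_loop(hsv_img, pdfs):
    """Naive per-pixel reference, used only to check the vectorized version."""
    rows, cols, _ = hsv_img.shape
    out = np.empty((pdfs.shape[0], rows, cols))
    for k in range(pdfs.shape[0]):
        for i in range(rows):
            for j in range(cols):
                h, s, v = hsv_img[i, j]
                out[k, i, j] = pdfs[k, 0, h] * pdfs[k, 1, s] * pdfs[k, 2, v]
    return out


def classify(hsv_img, pdfs):
    """Label each pixel with the index of its most probable class."""
    return np.argmax(joint_probabilities(hsv_img, pdfs), axis=0)
```

In this sketch the channel arrays are extracted once and reused for all classes, so the per-class work reduces to array lookups and elementwise multiplication.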

I did all the testing using the Naive Bayes Tutorial notebook in the plantcv-binder repo.
After getting all of the changes in, I ran the original code 30 times and the new code 30 times to get runtimes in milliseconds. The screenshot is pulled from that notebook and shows a speedup of about 4x.
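A comparison like this could be set up roughly as follows. This is a sketch under stated assumptions, not the notebook's actual timing code: `time_ms` and `speedup` are hypothetical helpers, and comparing medians is a choice made here, since the PR does not say how the runs were summarized.

```python
import statistics
import time


def time_ms(func, *args, runs=30):
    """Call func(*args) `runs` times and return each runtime in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def speedup(old_ms, new_ms):
    """Ratio of median runtimes (assumed summary statistic)."""
    return statistics.median(old_ms) / statistics.median(new_ms)
```

Timing the old and new versions of a function with `time_ms` and passing both lists to `speedup` gives a single ratio to compare against the figure quoted above.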

This example processes 4 classes, so I imagine the size of the speedup depends on the number of classes.

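[Screenshot: runtimes of the original and new code, 30 runs each, from the Naive Bayes Tutorial notebook]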

Type of update
feature enhancement

Associated issues
#1158

For the reviewer
See this page for instructions on how to review the pull request.

  • PR functionality reviewed in a Jupyter Notebook
  • All tests pass
  • Test coverage remains 100%
  • Documentation tested
  • New documentation pages added to plantcv/mkdocs.yml
  • Changes to function input/output signatures added to updating.md
  • Code reviewed
  • PR approved

@codecov

codecov bot commented Jul 7, 2023

Codecov Report

Merging #1223 (5cad126) into release-4.0 (17d8c37) will not change coverage.
The diff coverage is 100.00%.

@@              Coverage Diff              @@
##           release-4.0     #1223   +/-   ##
=============================================
  Coverage       100.00%   100.00%           
=============================================
  Files              161       161           
  Lines             6957      6952    -5     
=============================================
- Hits              6957      6952    -5     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

Impacted Files Coverage Δ
plantcv/plantcv/naive_bayes_classifier.py 100.00% <100.00%> (ø)

@nfahlgren nfahlgren self-assigned this Jul 11, 2023
@nfahlgren nfahlgren removed their assignment Jul 11, 2023
@nfahlgren nfahlgren added this to Pull Requests in PlantCV4 via automation Jul 13, 2023
@nfahlgren nfahlgren added this to the PlantCV v4.0 milestone Jul 13, 2023
@nfahlgren nfahlgren merged commit 08bfb96 into release-4.0 Jul 13, 2023
5 of 6 checks passed
PlantCV4 automation moved this from Pull Requests to Done Jul 13, 2023
@nfahlgren nfahlgren deleted the release-4.0-speedup-naive-bayes branch July 13, 2023 01:35