Remove AutoMLSearch self-reference #2304

Merged
merged 6 commits into from Jun 2, 2021

@freddyaboulton freddyaboulton commented May 25, 2021

Pull Request Description

Fixes #2226

We see slightly lower memory usage at the peak (~500 MB). Memory after the peak is also lower and starts to decrease: the line after the peak has a negative slope instead of staying constant.
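
For context, here is a minimal sketch of why a self-reference can keep memory alive longer than expected. The classes below are hypothetical stand-ins, not evalml code, and the behavior shown assumes CPython, where objects outside a reference cycle are freed by reference counting as soon as the last reference goes away, while objects inside a cycle wait for the cycle collector.

```python
import gc
import weakref


class BackReferencingPlot:
    """Hypothetical plot object that keeps a reference to its owner."""

    def __init__(self, owner):
        self.owner = owner


class SearchWithCycle:
    """Hypothetical owner whose plot points back at it (self -> plot -> self)."""

    def __init__(self):
        self.results = list(range(1000))
        self.plot = BackReferencingPlot(self)


class SearchWithoutCycle:
    """Hypothetical owner with no back-reference."""

    def __init__(self):
        self.results = list(range(1000))


def freed_without_gc(factory):
    """Return True if the object is reclaimed by reference counting alone."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running during the check
    try:
        obj = factory()
        ref = weakref.ref(obj)
        del obj  # drop the only external reference
        freed = ref() is None
    finally:
        if gc_was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycle created above
    return freed
```

Under these assumptions, `freed_without_gc(SearchWithCycle)` reports that the object is still alive after `del`, while `freed_without_gc(SearchWithoutCycle)` reports that it was freed immediately.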

Memory for all unit tests

[Plot: unit_test_memory_main]

[Plot: unit_test_memory_no_circ_ref_in_plot]

Memory for automl tests

[Plot: automl-tests-main]

[Plot: 2226-automl-tests-no-circ-ref]

The plot still works in a Jupyter notebook

from evalml import AutoMLSearch
from evalml.automl.callbacks import raise_error_callback
import pandas as pd
X = pd.read_csv("/Users/freddy.boulton/Downloads/titanic_text.csv")
y = X.pop('Survived')
automl = AutoMLSearch(X, y, problem_type="binary", error_callback=raise_error_callback, max_batches=10, ensembling=True)
automl.search()

image


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

codecov bot commented May 25, 2021

Codecov Report

Merging #2304 (a92ba22) into main (961584a) will decrease coverage by 0.1%.
The diff coverage is 96.3%.

❗ Current head a92ba22 differs from pull request most recent head ade7b07. Consider uploading reports for the commit ade7b07 to get more accurate results

@@           Coverage Diff           @@
##            main   #2304     +/-   ##
=======================================
- Coverage   99.9%   99.9%   -0.0%     
=======================================
  Files        281     281             
  Lines      24606   24608      +2     
=======================================
+ Hits       24578   24579      +1     
- Misses        28      29      +1     
| Impacted Files | Coverage Δ |
|---|---|
| .../automl_tests/test_automl_search_classification.py | 100.0% <ø> (ø) |
| evalml/automl/automl_search.py | 99.9% <90.0%> (-0.1%) ⬇️ |
| evalml/automl/pipeline_search_plots.py | 100.0% <100.0%> (ø) |
| ...l/tests/automl_tests/test_pipeline_search_plots.py | 100.0% <100.0%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

if self.search_iteration_plot:
    self.search_iteration_plot.update()
# True when running in a jupyter notebook, else the plot is an instance of plotly.Figure
if isinstance(self.search_iteration_plot, SearchIterationPlot):
Contributor Author

This check is needed because the APIs of SearchIterationPlot.update and plotly.Figure's update are now different.
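
A rough sketch of the kind of type check described in this comment. Both classes are hypothetical stand-ins (neither is the real evalml or plotly class), and `refresh_plot` is an illustrative helper name, not a function from the PR. The point is only that the caller passes results when the plot object is the wrapper type and otherwise leaves it alone.

```python
class SearchIterationPlotSketch:
    """Hypothetical stand-in for the notebook plot wrapper; update takes the results."""

    def __init__(self, results):
        self.latest = results

    def update(self, results):
        self.latest = results


class StaticFigureSketch:
    """Hypothetical stand-in for a plain figure whose update has a different signature."""

    def update(self, **layout):
        self.layout = layout


def refresh_plot(plot, results):
    """Pass results only to the wrapper type; return whether an update happened."""
    if isinstance(plot, SearchIterationPlotSketch):
        plot.update(results)
        return True
    # A plain figure does not accept results, so it is left untouched.
    return False
```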

Contributor

Got it, so this line was not covered to begin with, right?

@freddyaboulton freddyaboulton marked this pull request as ready for review May 25, 2021 17:51
dsherry commented May 25, 2021

@freddyaboulton thanks for sharing these plots. If I am reading them correctly, is it right to say that fixing this leak results in a slight reduction in memory consumption? I see a drop in peak memory usage on the order of ~100MB.

@freddyaboulton
Contributor Author

@dsherry Yes!

@@ -1,19 +1,18 @@
 from evalml.utils import import_or_raise, jupyter_check


-class SearchIterationPlot():
-    def __init__(self, data, show_plot=True):
+class SearchIterationPlot:
Contributor Author

Before, the plot kept a reference to the latest results by holding a reference to AutoMLSearch. To avoid the self-reference while still being able to access the latest results, I'm adding results as an argument to __init__ and update. The plot property of AutoMLSearch will always pass the latest results to the plot.
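
The design described in this comment could look roughly like the sketch below. The class names, the `add_result` helper, and the shape of `results` (a list of dicts with a `score` key) are all assumptions made for illustration and do not reflect evalml's actual data structures. The plot receives results in its constructor and in `update`, and the owner's `plot` property hands over the latest results on each access, so the plot never needs a handle to the owner.

```python
class PlotSketch:
    """Hypothetical plot that is given results instead of a reference to its owner."""

    def __init__(self, results):
        self.scores = []
        self.update(results)

    def update(self, results):
        # Copy only what the plot needs; no reference to the search object is kept.
        self.scores = [entry["score"] for entry in results]


class SearchSketch:
    """Hypothetical owner that always passes its latest results to the plot."""

    def __init__(self):
        self.results = []
        self._plot = None

    def add_result(self, score):
        self.results.append({"score": score})

    @property
    def plot(self):
        if self._plot is None:
            self._plot = PlotSketch(self.results)
        else:
            self._plot.update(self.results)
        return self._plot
```

One trade-off of this approach is that the plot is only as fresh as the last call that passed it results, which is why the property refreshes it on every access.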

@angela97lin angela97lin left a comment

LGTM, the memory plots look great 😁

@chukarsten chukarsten left a comment

This looks good. Good work, nice to see some solid memory usage results.

@freddyaboulton freddyaboulton force-pushed the 2226-automl-self-reference branch 2 times, most recently from ea91450 to eed5292 Compare June 1, 2021 22:09
@bchen1116 bchen1116 left a comment

Nice!

@dsherry dsherry merged commit f01a4d0 into main Jun 2, 2021
@freddyaboulton freddyaboulton deleted the 2226-automl-self-reference branch June 3, 2021 14:02
@chukarsten chukarsten mentioned this pull request Jun 9, 2021
@chukarsten chukarsten mentioned this pull request Jun 22, 2021
Successfully merging this pull request may close these issues.

AutoMLSearch makes a reference to itself
5 participants