Show interactive iteration vs. score plot when using fit() #134
Conversation
@christopherbunn this should show best score by iteration. so it should be either monotonically increasing or decreasing
Updated figure in original comment to show current graph
Codecov Report
@@            Coverage Diff             @@
##           master     #134      +/-   ##
==========================================
+ Coverage   97.08%   97.14%   +0.06%
==========================================
  Files          95       95
  Lines        2742     2872     +130
==========================================
+ Hits         2662     2790     +128
- Misses         80       82       +2

Continue to review full report at Codecov.
evalml/models/auto_base.py
Outdated
@@ -127,7 +148,10 @@ def fit(self, X, y, feature_types=None, raise_errors=False):
                self.logger.log("\n\nMax time elapsed. Stopping search early.")
                break
            self._do_iteration(X, y, pbar, raise_errors)

            if plot_iterations:
                new_score = self.rankings['score'].max()
does this work for metrics where lower is better?
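For reference, a minimal sketch of tracking the best score in the direction the objective defines, assuming the objective exposes a `greater_is_better` flag (evalml objectives carry one):

```python
# Sketch: pick the running best according to the objective's direction.
# `self.objective.greater_is_better` is assumed to be available here.
if self.objective.greater_is_better:
    new_score = self.rankings['score'].max()
else:
    new_score = self.rankings['score'].min()
```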
evalml/models/auto_base.py
Outdated
        Returns:

            self
        """
        def update_plot(fig, ax, iter_scores):
let's make sure that there is a function that can be called to get this information if another program wanted to be able to generate this plot.
we should then write a test case for that function in this PR
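A sketch of what that pair could look like; the accessor name `get_iteration_scores`, the `X_y` fixture, and the import path are assumptions for illustration, not code from this PR:

```python
from evalml import AutoClassifier  # import path assumed for illustration

def get_iteration_scores(self):
    """Return the best score seen at each iteration, in search order.

    A plain accessor, so another program can rebuild this plot without
    going through the plotting code itself.
    """
    return list(self.iteration_scores)

def test_get_iteration_scores(X_y):
    X, y = X_y
    clf = AutoClassifier(max_pipelines=3)
    clf.fit(X, y)
    scores = clf.get_iteration_scores()
    assert len(scores) == 3                    # one entry per iteration
    assert all(s is not None for s in scores)  # every iteration was scored
```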
evalml/models/auto_base.py
Outdated
@@ -328,6 +342,31 @@ def describe_pipeline(self, pipeline_id, return_dict=False):
        if return_dict:
            return pipeline_results

    def plot_best_score_by_iteration(self, interactive_plot=False):
is it possible to also plot a grey dot for what the current iteration scored. essentially have a scatter plot of scores by iteration and a line on top showing what the best score was by iteration
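A sketch of how that overlay could be built with Plotly; `iteration_scores` (each iteration's raw score) is assumed to be available, and `max` would become `min` for lower-is-better objectives:

```python
import plotly.graph_objs as go

iter_numbers = list(range(len(iteration_scores)))

# best score seen up to and including each iteration
best_so_far = []
for score in iteration_scores:
    best_so_far.append(score if not best_so_far else max(best_so_far[-1], score))

raw = go.Scatter(x=iter_numbers, y=iteration_scores, mode='markers',
                 marker=dict(color='lightgrey'), name='Iteration score')
best = go.Scatter(x=iter_numbers, y=best_so_far, mode='lines+markers',
                  name='Best score')
fig = go.FigureWidget([raw, best],
                      dict(title='Pipeline Search: Iteration vs. Score',
                           xaxis_title='Iteration', yaxis_title='Score'))
```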
Maybe we can add tests asserting the graph logic (always decreasing or increasing) or other functional tests but otherwise looks good to me.
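For example, a monotonicity test could look like this (a sketch reusing the hypothetical `get_iteration_scores` accessor from above):

```python
def test_best_score_trace_is_monotonic(X_y):
    X, y = X_y
    clf = AutoClassifier(max_pipelines=5)
    clf.fit(X, y)
    scores = clf.get_iteration_scores()
    pairs = list(zip(scores, scores[1:]))
    if clf.objective.greater_is_better:
        assert all(later >= earlier for earlier, later in pairs)
    else:
        assert all(later <= earlier for earlier, later in pairs)
```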
        title = 'Pipeline Search: Iteration vs. {}'.format(self.data.objective.name)
        data = go.Scatter(x=iter_numbers, y=self.iteration_scores, mode='lines+markers')
        layout = dict(title=title, xaxis_title='Iteration', yaxis_title='Score')
        self.best_score_by_iter_fig = go.FigureWidget(data, layout)
what if we had this method return a class SearchIterationPlot? then the caller who creates it could call SearchIterationPlot.update(). i think this is better because the add_iteration_score method is a bit confusing out of context and can really only be called if someone calls best_score_by_iteration first
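A sketch of that shape (the class body here is hypothetical, just to show the caller-driven update):

```python
class SearchIterationPlot:
    """Owns the figure; whoever runs the search calls update()."""

    def __init__(self, data):
        self.data = data
        title = 'Pipeline Search: Iteration vs. {}'.format(data.objective.name)
        trace = go.Scatter(x=[], y=[], mode='lines+markers')
        self.best_score_by_iter_fig = go.FigureWidget(
            trace, dict(title=title, xaxis_title='Iteration', yaxis_title='Score'))

    def update(self):
        # re-read the search results and push them into the live widget
        scores = self.data.iteration_scores
        with self.best_score_by_iter_fig.batch_update():
            self.best_score_by_iter_fig.data[0].x = list(range(len(scores)))
            self.best_score_by_iter_fig.data[0].y = list(scores)
```

The caller who creates it would then hold the returned object and call `plot.update()` after each iteration.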
    def best_score_by_iteration(self):
        iter_numbers = list(range(len(self.iteration_scores)))
        title = 'Pipeline Search: Iteration vs. {}'.format(self.data.objective.name)
        data = go.Scatter(x=iter_numbers, y=self.iteration_scores, mode='lines+markers')
is it possible to also add a light grey dot for the score of every iteration overlaid on this plot? we don't need to do it in this PR unless it's quick. probably makes sense to just create a new issue for it. that can be helpful for users to see what the performance of each successive attempt was yielding
@kmax12 It appears that the random plot resizing you were seeing is a Chrome-specific issue, since I was unable to replicate it in Safari. I can fix it by turning off the integer-only formatting and manually setting the tick values for every iteration, but then for large runs the axis quickly becomes pretty cluttered.
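The two options being weighed, roughly, as standard Plotly layout settings:

```python
# Option 1: integer-only tick formatting (the variant with the Chrome quirk)
fig.update_layout(xaxis=dict(tickformat=',d'))

# Option 2: pin a tick at every iteration -- exact, but cluttered on long runs
fig.update_layout(xaxis=dict(tickmode='array', tickvals=iter_numbers))
```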
evalml/models/auto_base.py
Outdated
  while time.time() - start <= self.max_time:
-     self._do_iteration(X, y, pbar, raise_errors)
+     self._do_iteration(X, y, pbar, raise_errors, plot)
any reason to not just call plot.update after the self._do_iteration call? might be cleaner not to pass it through
I think I wanted to avoid duplicating this call in fit(), but it's probably better to call it twice than to pass it through _do_iteration.
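That is, roughly:

```python
# sketch of the fit() loop without threading the plot through _do_iteration
while time.time() - start <= self.max_time:
    self._do_iteration(X, y, pbar, raise_errors)
    if plot:
        plot.update()
```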
        self.best_score_by_iter_fig.update_layout(showlegend=False)

    def update(self):
        iter_idx = self.data.rankings['id'].idxmax()
is this assuming the id can be used to determine pipeline order? that isn't a safe assumption. we recently updated the structure of the results to make this info accessible in a reliable way. see #260
Updated to use self.data.results['pipeline_results'] instead.
            plot
        """

        if hasattr(self, 'iter_plot'):
i think the logic here might be a bit broken. if you call search_iteration_plot(interactive_plot=True) twice, the second time you will get the go.Figure instead of the plot returned.
I chose to do it that way so that users can call search.plot.search_iteration_plot() and get back a Plotly figure after the search process is complete. The first time it is called, it returns the SearchIterationPlot() object so that fit() can update it at every iteration.

The other way I thought of doing this is to separate it out so that there was one function called interactive_search_iteration_plot() that sets up the SearchIterationPlot() object, shows the go.FigureWidget version, and returns the SearchIterationPlot(). search_iteration_plot() then only returns a Plotly figure for after the search process. Would this be a better implementation?
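The proposed split would look roughly like this (function names as described above; the bodies are hypothetical):

```python
from IPython.display import display

def interactive_search_iteration_plot(self):
    # sets up the live widget so fit() can update it each iteration
    plot = SearchIterationPlot(self.data)
    display(plot.best_score_by_iter_fig)
    return plot

def search_iteration_plot(self):
    # static Plotly figure, for use after the search has finished
    scores = self.data.iteration_scores
    return go.Figure(go.Scatter(x=list(range(len(scores))), y=scores,
                                mode='lines+markers'))
```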
hmm, i think i see what you're saying. let's talk live about this briefly tomorrow, so we can wrap this up.
based on the issue with formatting, i think it's fine to go back with what you had before. it's better to have negative numbers than to have formatting issues. we can look into fixing it more later. sorry for the wild goose chase.
To clarify, the issue is that the odd behavior only comes up when we have the axis set to show integer numbers only. We can still set it to not show negative numbers on the x axis, but the point for iteration 0 would still be cut off slightly.
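For reference, keeping the axis nonnegative is a one-line layout setting; as noted, the iteration-0 marker can still end up on the axis edge:

```python
fig.update_layout(xaxis=dict(rangemode='nonnegative'))
```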
-        self.curr_iteration_scores.append(self.data.rankings['score'].iloc[iter_idx])
+        iter_idx = self.data.results['search_order']
+        pipeline_res = self.data.results['pipeline_results']
+        iter_scores = [pipeline_res[i]['score'] for i in range(len(pipeline_res))]
to make sure these scores are in the same order as iter_idx, should you do iter_scores = [pipeline_res[i]['score'] for i in iter_idx]?
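With that fix, `update()` would read roughly:

```python
def update(self):
    # walk the results in search order so scores line up with iterations
    search_order = self.data.results['search_order']
    pipeline_res = self.data.results['pipeline_results']
    iter_scores = [pipeline_res[i]['score'] for i in search_order]
    with self.best_score_by_iter_fig.batch_update():
        self.best_score_by_iter_fig.data[0].x = list(range(len(iter_scores)))
        self.best_score_by_iter_fig.data[0].y = iter_scores
```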
LGTM
Updated with tests to check values
Resolves #78
