Add Plot for Prediction vs Actual for Regression Problems #1252

Merged: 21 commits, Oct 5, 2020

@bchen1116 (Contributor) commented Oct 1, 2020

Fixes #772

Added outlier_threshold for simple outlier detection

Updated Model_Understanding doc here

Regular documentation here

@bchen1116 bchen1116 self-assigned this Oct 1, 2020
codecov bot commented Oct 1, 2020

Codecov Report

Merging #1252 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1252   +/-   ##
=======================================
  Coverage   99.93%   99.93%           
=======================================
  Files         207      207           
  Lines       13055    13142   +87     
=======================================
+ Hits        13046    13133   +87     
  Misses          9        9           

| Impacted Files | Coverage Δ |
| --- | --- |
| evalml/model_understanding/__init__.py | 100.00% <ø> (ø) |
| evalml/model_understanding/graphs.py | 100.00% <100.00%> (ø) |
| ...lml/tests/model_understanding_tests/test_graphs.py | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4ae3207...68e882d.
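
The numbers in the coverage diff can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch: `coverage_percent` is a hypothetical helper, not part of Codecov or evalml, and the inputs are the Hits and Lines values copied from the report above.

```python
def coverage_percent(hits, lines):
    """Line coverage as a percentage (hypothetical helper for illustration)."""
    return 100.0 * hits / lines


# Values copied from the coverage diff: (hits, lines) before and after the PR.
before_hits, before_lines = 13046, 13055
after_hits, after_lines = 13133, 13142

before = coverage_percent(before_hits, before_lines)
after = coverage_percent(after_hits, after_lines)

# Both sides leave the same number of uncovered lines.
misses_before = before_lines - before_hits
misses_after = after_lines - after_hits

print(f"{before:.2f}% -> {after:.2f}% (change {after - before:.2f})")
print(f"misses: {misses_before} -> {misses_after}")
```

All 87 added lines are hit and the 9 existing misses are unchanged, so the percentage moves by less than 0.01 and is reported as a 0.00% increase.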

@bchen1116 bchen1116 marked this pull request as ready for review October 1, 2020 19:58
data = pd.concat([pd.Series(predictions),
                  pd.Series(actual)], axis=1)
data.columns = ['prediction', 'actual']
data['outlier'] = np.where((abs(data['prediction'] - data['actual']) >= outlier_threshold), "#ffff00", "#0000ff")
@bchen1116 (Contributor Author) commented:

Needed to encode the colors as hex values.
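
The snippet under review can be sketched as a standalone helper to show what the thresholding does. This is a minimal illustration, not the code merged in the PR: `label_outliers` and its color parameters are hypothetical names, the two hex defaults are copied from the snippet, and converting the inputs with `np.asarray` (so rows pair up by position rather than by index) is an assumption of this sketch.

```python
import numpy as np
import pandas as pd


def label_outliers(predictions, actual, outlier_threshold,
                   outlier_color="#ffff00", inlier_color="#0000ff"):
    """Pair predictions with actuals and attach a hex color per row.

    Hypothetical helper for illustration only; not evalml's API.
    """
    data = pd.DataFrame({
        # np.asarray drops any pandas index so rows are matched by position.
        "prediction": np.asarray(predictions, dtype=float),
        "actual": np.asarray(actual, dtype=float),
    })
    # A row is flagged when its absolute error meets or exceeds the threshold.
    abs_error = (data["prediction"] - data["actual"]).abs()
    data["outlier"] = np.where(abs_error >= outlier_threshold,
                               outlier_color, inlier_color)
    return data


frame = label_outliers([1.0, 2.0, 10.0], [1.1, 2.5, 3.0], outlier_threshold=0.5)
print(frame)
```

With a threshold of 0.5, the first row (error of about 0.1) gets the inlier color, while the second (error exactly 0.5) and third (error 7.0) get the outlier color, because the comparison is `>=`.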

@angela97lin (Contributor) left a comment:

Looking good but I have two immediate comments:

  • It would be helpful to post an image of what this looks like or add it to the model_understanding docs and link to that instead!
  • I see in the original issue (Add predicted vs actual plot (regression) #772) that we want to support regression and timeseries data. Not sure what the updated requirements were, but if this PR adds the plot just for regression (which is fine), could you please file a separate issue to make sure timeseries data doesn't get dropped?

evalml/model_understanding/graphs.py (outdated, resolved)
@freddyaboulton (Contributor) left a comment:

@bchen1116 Looks good! I have two minor comments!

evalml/model_understanding/graphs.py (resolved)
evalml/model_understanding/graphs.py (outdated, resolved)
@freddyaboulton (Contributor) left a comment:

@bchen1116 This looks great!

@bchen1116 (Contributor Author) commented:

Filed issue #1258 to track adding the plot for timeseries data.

@bchen1116 bchen1116 merged commit bd04b00 into main Oct 5, 2020
@dsherry dsherry mentioned this pull request Oct 29, 2020
@freddyaboulton freddyaboulton deleted the bc_772_plot branch May 13, 2022 14:51
Linked issue: Add predicted vs actual plot (regression) #772
3 participants