
t-SNE Model Understanding #1731

Merged · 23 commits merged into main from 1709-t-SNE-Model-Understanding on Jan 27, 2021
Conversation

ParthivNaresh (Contributor)

Fixes #1709

codecov bot commented on Jan 25, 2021

Codecov Report

Merging #1731 (8618c94) into main (01bf21f) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1731     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         243      243             
  Lines       19353    19430     +77     
=========================================
+ Hits        19344    19421     +77     
  Misses          9        9             
Impacted Files                                           Coverage   Δ
evalml/model_understanding/__init__.py                   100.0%     <ø> (ø)
evalml/model_understanding/graphs.py                     99.8%      <100.0%> (+0.1%) ⬆️
...lml/tests/model_understanding_tests/test_graphs.py    100.0%     <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chukarsten (Collaborator) left a comment:
All of this looks good to me. I'm not 100% clear on t-SNE, but it looks like dimensionality reduction, and when I .show()'ed the figs they seemed consistent.

One thing the other graph tests do is check that the underlying algorithm's data is present in the fig_dict, à la fig_dict_data['data'][0]['values'] == tsne(params). Just to be consistent with the others, you should probably throw that in here. Everything else looks excellent.
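A rough sketch of the kind of check being suggested; the test name, the X_y_binary fixture, and the assumption that a random_state kwarg is forwarded through to sklearn's TSNE (so both calls produce the same embedding) are illustrative, not taken from this PR:

```python
import numpy as np
from evalml.model_understanding.graphs import graph_t_sne, t_sne


def test_graph_t_sne_matches_t_sne(X_y_binary):  # hypothetical fixture
    X, _ = X_y_binary
    # Fix the seed (assumed to be passed through **kwargs to sklearn's TSNE)
    # so that graph_t_sne and t_sne compute identical embeddings.
    fig = graph_t_sne(X, n_components=2, perplexity=30.0, random_state=0)
    expected = t_sne(X, n_components=2, perplexity=30.0, random_state=0)
    fig_dict = fig.to_dict()
    # The scatter trace should carry the two embedded dimensions.
    np.testing.assert_allclose(fig_dict["data"][0]["x"], expected[:, 0])
    np.testing.assert_allclose(fig_dict["data"][0]["y"], expected[:, 1])
```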

@bchen1116 (Contributor) left a comment:
Left a few comments. The main issue is that I can't graph it and see what the plot looks like. Also, it would be nice to add this to the api_references.rst doc and model_understanding.ipynb so that when we release this, users can see the docs and the usage.

(Two resolved review threads on evalml/model_understanding/graphs.py)
```python
    return X_new


def graph_t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', marker_line_width=2, marker_size=7):
```
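For context, the t_sne helper that graph_t_sne builds on presumably wraps sklearn.manifold.TSNE roughly as follows; this is a sketch inferred from the parameter names above, not the code merged in this PR:

```python
from sklearn.manifold import TSNE


def t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric="euclidean", **kwargs):
    """Embed X into an (n_samples, n_components) array with t-SNE."""
    tsne = TSNE(n_components=n_components, perplexity=perplexity,
                learning_rate=learning_rate, metric=metric, **kwargs)
    return tsne.fit_transform(X)
```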
@bchen1116 (Contributor) commented:
Not sure if I'm missing something, but I don't see anything graphed in either my Jupyter notebook or JupyterLab, even after I run make installdeps.
[Screenshot: empty plot output in the notebook]

Are you able to get it to show in both notebook and lab? Can you post a screenshot of the resulting graph?
Also, it takes over a minute to finish this method, whereas t_sne was fairly quick. Is this the same for you?

@ParthivNaresh (Contributor, Author) replied:
@bchen1116 It should look something like the screenshot below. I'm able to get the results almost instantly, but I'm also using a small dataset; I'd expect the runtime to grow steeply with dataset size.

[Screenshot of the resulting t-SNE scatter plot, taken 2021-01-26]
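For anyone trying to reproduce a plot like the one above, a minimal usage sketch (the dataset is illustrative, and this assumes graph_t_sne returns a plotly Figure like the other evalml graph_* helpers):

```python
from sklearn.datasets import load_breast_cancer

from evalml.model_understanding.graphs import graph_t_sne

# A small, illustrative dataset; runtime grows with the number of samples.
X, _ = load_breast_cancer(return_X_y=True)
fig = graph_t_sne(X, perplexity=30.0, learning_rate=200.0)
fig.show()  # requires a plotly renderer in the notebook environment
```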

(Resolved review threads on evalml/model_understanding/graphs.py and evalml/tests/model_understanding_tests/test_graphs.py)
@chukarsten (Collaborator) left a comment:
LGTM

@freddyaboulton (Contributor) left a comment:
@ParthivNaresh Looks great! The one thing I want to resolve before merge is the marker_color=color line in graph_t_sne. I think it's slowing us down. The rest are non-blocking nitpicks on docstrings and a minor test change.

(Five resolved review threads on evalml/model_understanding/graphs.py and evalml/tests/model_understanding_tests/test_graphs.py)
@freddyaboulton (Contributor) left a comment:
Looks great @ParthivNaresh !

```python
    if not marker_size >= 0:
        raise ValueError("The parameter marker_size must be non-negative")

    X_embedded = t_sne(X, n_components=n_components, perplexity=perplexity, learning_rate=learning_rate, metric=metric, **kwargs)
```
A contributor commented on the t_sne call:
Maybe we should hard-code n_components=2 and remove it as a function parameter? That would avoid errors with n_components=1, and I don't know whether we should let users pass >= 3 components if we're only going to plot the first two anyway.
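A sketch of what this suggestion could look like, dropping n_components from the signature and always embedding into two dimensions; this is illustrative only and not the code that was merged:

```python
import plotly.graph_objects as go


def graph_t_sne(X, perplexity=30.0, learning_rate=200.0, metric="euclidean",
                marker_line_width=2, marker_size=7, **kwargs):
    if not marker_size >= 0:
        raise ValueError("The parameter marker_size must be non-negative")
    # Hard-code n_components=2 so the 2-D scatter plot below is always well-defined.
    # t_sne is the evalml helper defined earlier in the same module (sketched above).
    X_embedded = t_sne(X, n_components=2, perplexity=perplexity,
                       learning_rate=learning_rate, metric=metric, **kwargs)
    fig = go.Figure(
        go.Scatter(
            x=X_embedded[:, 0],
            y=X_embedded[:, 1],
            mode="markers",
            marker=dict(size=marker_size, line=dict(width=marker_line_width)),
        )
    )
    return fig
```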

@ParthivNaresh merged commit 4b3bfe4 into main on Jan 27, 2021
2 checks passed
@chukarsten mentioned this pull request on Feb 1, 2021
@freddyaboulton deleted the 1709-t-SNE-Model-Understanding branch on May 13, 2022
Linked issue: Integrate t-SNE into Model Understanding (#1709)
4 participants