New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
t-SNE Model Understanding #1731
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1731 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 243 243
Lines 19353 19430 +77
=========================================
+ Hits 19344 19421 +77
Misses 9 9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all this looks good to me. I'm not 100% clear on tsne, but it looks like dimensionality reduction and when I .show()'ed the figs it seemed consistent.
One thing that other graph tests have done is test that the underlying algorithm's data is present in the fig_dict. ala fig_dict_data['data'][0]['values'] == tsne(params)
I think just to be consistent with the others you should probably throw that in here. Everything else looks excellent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Main issue is that I can't graph and see what the graph looks like. Also, would be nice to add this to the api_references.rst
doc and model_understanding.ipynb
so when we release this, users can see the docs and see the usage.
evalml/model_understanding/graphs.py
Outdated
return X_new | ||
|
||
|
||
def graph_t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', marker_line_width=2, marker_size=7): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I'm missing something, but I don't see anything graphed in both my Jupyter notebook and jupyter lab, even after I run make installdeps
.
Are you able to get it to show in both notebook and lab? Can you post a screenshot of the resulting graph?
Also, it takes over a minute to finish this method, whereas t_sne
was fairly quick. Is this the same for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bchen1116 Should look something like this, and I'm able to get the results instantly but I'm also using a small dataset. I'm sure the time increases exponentially with the dataset size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh Looks great! The one thing I want to resolve before merge is the marker_color=color
line in graph_t_sne
. I think it's slowing us down. The rest are non-blocking nitpicks on docstrings and a minor test change.
…eryx/evalml into 1709-t-SNE-Model-Understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @ParthivNaresh !
if not marker_size >= 0: | ||
raise ValueError("The parameter marker_size must be non-negative") | ||
|
||
X_embedded = t_sne(X, n_components=n_components, perplexity=perplexity, learning_rate=learning_rate, metric=metric, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should hard-code n_components=2
and remove it as a function parameter? I think it would avoid errors in n_components=1
and I don't know if we should let users pass >=3 components if we're only going to plot the first two anyways?
Fixes #1709