Add elbow detection using the "kneedle" method to Elbow Visualizer #813
Merged
17 commits
b5fd21c Update pcoords.rst (pswaldia)
ed7f923 Merge branch 'develop' into develop (pswaldia)
a35f1d4 Merge pull request #1 from DistrictDataLabs/develop (pswaldia)
7b341cf Merge pull request #2 from DistrictDataLabs/develop (pswaldia)
d6d1cb1 added elbow detection using 'kneedle' method (pswaldia)
c90c4fb Merge branch 'develop' into feature (pswaldia)
ce30b20 Added kneed.py (pswaldia)
707112c Update elbow.py to add knee locator functionality (pswaldia)
33cc324 Update __init__.py (pswaldia)
8e680f2 Update elbow.py (pswaldia)
2be56ef Update kneed.py (pswaldia)
df7261d added minor changes to elbow.py (pswaldia)
3d95348 Corrected minor mistake in draw() (pswaldia)
182b34f Update elbow.py to add info for locate_knee (pswaldia)
0ac5556 update elbow.py to doc related info (pswaldia)
cc726a6 Update test_elbow.py (pswaldia)
a8b64fc Merge branch 'develop' into feature (bbengfort)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,9 +22,12 @@ | |
import time | ||
import numpy as np | ||
import scipy.sparse as sp | ||
import warnings | ||
|
||
from .base import ClusteringScoreVisualizer | ||
from ..exceptions import YellowbrickValueError | ||
from ..style.palettes import LINE_COLOR | ||
from ..exceptions import YellowbrickValueError, YellowbrickWarning | ||
from ..utils import KneeLocator | ||
|
||
from sklearn.metrics import silhouette_score | ||
from sklearn.metrics import calinski_harabaz_score | ||
|
@@ -170,10 +173,31 @@ class KElbowVisualizer(ClusteringScoreVisualizer): | |
Display the fitting time per k to evaluate the amount of time required | ||
to train the clustering model. | ||
|
||
locate_elbow : bool, default: True | ||
Automatically find the "elbow" or "knee" which likely corresponds to the optimal | ||
value of k using the "knee point detection algorithm". The knee point detection | ||
algorithm finds the point of maximum curvature, which in a well-behaved clustering | ||
problem also represents the pivot of the elbow curve. The point is labeled with a | ||
dashed line and annotated with the score and k values. | ||
|
||
kwargs : dict | ||
Keyword arguments that are passed to the base class and may influence | ||
the visualization as defined in other Visualizers. | ||
|
||
Attributes | ||
---------- | ||
k_scores_ : array of shape (n,) where n is no. of k values | ||
The silhouette score corresponding to each k value. | ||
|
||
k_timers_ : array of shape (n,) where n is no. of k values | ||
The time taken to fit the KMeans model for each k value. | ||
|
||
elbow_value_ : integer | ||
The optimal value of k. | ||
|
||
elbow_score_ : float | ||
The silhouette score corresponding to the optimal value of k. | ||
|
||
Examples | ||
-------- | ||
|
||
|
@@ -194,6 +218,8 @@ class KElbowVisualizer(ClusteringScoreVisualizer): | |
|
||
For a discussion on the Elbow method, read more at | ||
`Robert Gove's Block <https://bl.ocks.org/rpgove/0060ff3b656618e9136b>`_. | ||
To learn more about the knee point detection algorithm, see `Finding a "kneedle" in a Haystack | ||
<https://raghavan.usc.edu//papers/kneedle-simplex11.pdf>`_. | ||
|
||
.. seealso:: The scikit-learn documentation for the `silhouette_score | ||
<https://bit.ly/2LYWjYb>`_ and `calinski_harabaz_score | ||
|
@@ -206,7 +232,7 @@ class KElbowVisualizer(ClusteringScoreVisualizer): | |
""" | ||
|
||
def __init__(self, model, ax=None, k=10, | ||
metric="distortion", timings=True, **kwargs): | ||
metric="distortion", timings=True, locate_elbow=True, **kwargs): | ||
super(KElbowVisualizer, self).__init__(model, ax=ax, **kwargs) | ||
|
||
# Get the scoring method | ||
|
@@ -218,7 +244,9 @@ def __init__(self, model, ax=None, k=10, | |
|
||
# Store the arguments | ||
self.scoring_metric = KELBOW_SCOREMAP[metric] | ||
self.metric = metric | ||
self.timings = timings | ||
self.locate_elbow = locate_elbow | ||
|
||
# Convert K into a tuple argument if an integer | ||
if isinstance(k, int): | ||
|
@@ -241,13 +269,19 @@ def __init__(self, model, ax=None, k=10, | |
def fit(self, X, y=None, **kwargs): | ||
""" | ||
Fits n KMeans models where n is the length of ``self.k_values_``, | ||
storing the silhoutte scores in the ``self.k_scores_`` attribute. | ||
storing the silhouette scores in the ``self.k_scores_`` attribute. | ||
The "elbow" and the score corresponding to it are stored in | ||
``self.elbow_value_`` and ``self.elbow_score_`` respectively. | ||
This method finishes up by calling draw to create the plot. | ||
""" | ||
|
||
self.k_scores_ = [] | ||
self.k_timers_ = [] | ||
|
||
if self.locate_elbow: | ||
self.elbow_value_ = None | ||
self.elbow_score_ = None | ||
|
||
for k in self.k_values_: | ||
# Compute the start time for each model | ||
start = time.time() | ||
|
@@ -260,7 +294,24 @@ def fit(self, X, y=None, **kwargs): | |
self.k_timers_.append(time.time() - start) | ||
self.k_scores_.append( | ||
self.scoring_metric(X, self.estimator.labels_) | ||
) | ||
) | ||
|
||
if self.locate_elbow: | ||
locator_kwargs = { | ||
'distortion': {'curve_nature': 'convex', 'curve_direction': 'decreasing'}, | ||
'silhouette': {'curve_nature': 'concave', 'curve_direction': 'increasing'}, | ||
'calinski_harabaz': {'curve_nature': 'concave', 'curve_direction': 'increasing'}, | ||
}.get(self.metric, {}) | ||
elbow_locator = KneeLocator(self.k_values_, self.k_scores_, **locator_kwargs) | ||
self.elbow_value_ = elbow_locator.knee | ||
if self.elbow_value_ is None: | ||
warning_message = \ | ||
"No 'knee' or 'elbow' point detected, " \ | ||
"pass `locate_elbow=False` to remove the warning" | ||
warnings.warn(warning_message, YellowbrickWarning) | ||
Very nice, thank you for changing this warning into a much more understandable signal to the user! |
||
else: | ||
self.elbow_score_ = self.k_scores_[self.k_values_.index(self.elbow_value_)] | ||
|
||
|
||
self.draw() | ||
|
||
|
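The post-fit logic in the hunk above can be sketched on its own as follows. This is a minimal illustration, not the code merged in this PR: `ElbowWarning`, `locator_kwargs_for`, and `resolve_elbow` are hypothetical names, and `ElbowWarning` only stands in for `YellowbrickWarning`. The sketch assumes the knee has already been computed by some locator and is passed in as `knee`.

```python
import warnings


class ElbowWarning(UserWarning):
    """Hypothetical stand-in for YellowbrickWarning."""


# Curve shape expected for each metric, copied from the diff above.
LOCATOR_KWARGS = {
    "distortion": {"curve_nature": "convex", "curve_direction": "decreasing"},
    "silhouette": {"curve_nature": "concave", "curve_direction": "increasing"},
    "calinski_harabaz": {"curve_nature": "concave", "curve_direction": "increasing"},
}


def locator_kwargs_for(metric):
    """Return the locator keyword arguments for a metric, or {} if unknown."""
    return LOCATOR_KWARGS.get(metric, {})


def resolve_elbow(k_values, k_scores, knee):
    """Return (elbow_value, elbow_score), warning when no knee was found."""
    if knee is None:
        warnings.warn(
            "No 'knee' or 'elbow' point detected, "
            "pass `locate_elbow=False` to remove the warning",
            ElbowWarning,
        )
        return None, None
    # Scores are stored in the same order as the k values, so the position
    # of the knee in k_values is also the position of its score.
    return knee, k_scores[list(k_values).index(knee)]
```

Comparing against `None` with `is` rather than `==` avoids surprises if the locator ever returns a NumPy scalar or another type that overloads equality.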
@@ -271,8 +322,11 @@ def draw(self): | |
Draw the elbow curve for the specified scores and values of K. | ||
""" | ||
# Plot the silhouette score against k | ||
self.ax.plot(self.k_values_, self.k_scores_, marker="D", label="score") | ||
|
||
self.ax.plot(self.k_values_, self.k_scores_, marker="D") | ||
if self.locate_elbow and self.elbow_value_ is not None: | ||
elbow_label = r"$elbow\ at\ k={}, score={:0.3f}$".format(self.elbow_value_, self.elbow_score_) | ||
self.ax.axvline(self.elbow_value_, c=LINE_COLOR, linestyle="--", label=elbow_label) | ||
|
||
# If we're going to plot the timings, create a twinx axis | ||
if self.timings: | ||
self.axes = [self.ax, self.ax.twinx()] | ||
|
@@ -281,12 +335,14 @@ def draw(self): | |
c='g', marker="o", linestyle="--", alpha=0.75, | ||
) | ||
|
||
|
||
return self.ax | ||
|
||
def finalize(self): | ||
bbengfort marked this conversation as resolved.
|
||
""" | ||
Prepare the figure for rendering by setting the title as well as the | ||
X and Y axis labels and adding the legend. | ||
|
||
""" | ||
# Get the metric name | ||
metric = self.scoring_metric.__name__.replace("_", " ").title() | ||
|
@@ -299,6 +355,10 @@ def finalize(self): | |
# Set the x and y labels | ||
self.ax.set_xlabel('k') | ||
self.ax.set_ylabel(metric.lower()) | ||
|
||
# Set the legend if locate_elbow=True | ||
if self.locate_elbow and self.elbow_value_ is not None: | ||
self.ax.legend(loc='best', fontsize='medium') | ||
Nice! |
||
|
||
# Set the second y axis labels | ||
if self.timings: | ||
|
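For readers unfamiliar with the idea behind `locate_elbow`, the core of the "kneedle" approach can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the `KneeLocator` added in `kneed.py`: it skips the smoothing step and the sensitivity threshold described in the paper, assumes `x` is sorted in ascending order, and simply picks the point where the normalized curve rises furthest above the diagonal. The function name `find_knee_sketch` and its parameters are hypothetical.

```python
def find_knee_sketch(x, y, curve="convex", direction="decreasing"):
    """Return the x value at the knee of the curve, or None if there is none."""
    if len(x) != len(y) or len(x) < 3:
        return None

    x_span = max(x) - min(x)
    y_span = max(y) - min(y)
    if x_span == 0 or y_span == 0:
        return None  # A flat curve has no knee.

    # Normalize both axes to the unit square.
    x_n = [(v - min(x)) / x_span for v in x]
    y_n = [(v - min(y)) / y_span for v in y]

    if curve == "convex" and direction == "decreasing":
        # Flip vertically so the curve becomes concave and increasing.
        y_n = [1.0 - v for v in y_n]
    elif not (curve == "concave" and direction == "increasing"):
        raise ValueError("this sketch handles only the two shapes used above")

    # Difference curve: height of each point above the diagonal y = x.
    diff = [yv - xv for xv, yv in zip(x_n, y_n)]
    best = max(range(len(diff)), key=diff.__getitem__)

    # A curve that never rises above the diagonal has no knee.
    if diff[best] <= 1e-12:
        return None
    return x[best]
```

With distortion-like scores such as `[100, 40, 20, 15, 12, 10]` for `k = 1..6`, the sketch returns `3`; for a perfectly linear curve it returns `None`, which corresponds to the case where the visualizer issues its warning.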
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,3 +22,4 @@ | |
|
||
from .helpers import * | ||
from .types import * | ||
from .kneed import * |
Very nice, thank you!