Add elbow detection using the "kneedle" method to Elbow Visualizer #813

pswaldia · 2019-04-12T16:46:54Z

I. Are you merging from a feature branch into develop?

Yes

II. Summarize your PR

This PR fixes #764, which asked for a feature that annotates the optimal value of k using the kneedle method described in https://github.com/arvkevi/kneed.

I have solved the issue in the following way:

By using the kneed library, which can be added as an optional dependency.
By plotting a vertical line (black dashed) showing optimal K value.

III. Include a sample plot

In the above plot k=7 is the optimal K.

IV. List any TODOs or questions

Still to do:

Once the maintainers have given feedback, it can be decided whether this should be an optional feature or a regular feature.

Questions for the @DistrictDataLabs/team-oz-maintainers:

Is the legend required? It can be removed if not.


lwgray · 2019-04-12T16:50:04Z

@pswaldia Welcome and Thanks for opening a PR. We are wading through our backlog of issues and will get to your PR asap.

pswaldia · 2019-04-12T16:52:14Z

@lwgray Sure. Thanks for the response.

rebeccabilbro · 2019-04-14T14:50:31Z

Hey there @pswaldia and thanks for jumping in to address #764!

After taking a look through kneed, I think this will be a very useful functionality to integrate into our KElbowVisualizer! So useful, in fact, that I am going to propose that instead of using kneed as an optional dependency, we instead reimplement the code in knee_locator.py and integrate it directly into our KElbowVisualizer (we should also make sure to give @arvkevi a big shout-out in our docs!). Since kneed's dependencies are all already dependencies of YB, and since the code required to generate the elbows is only ~100 lines (not including tests), I'm hoping this won't be too heavy of a lift.

Full disclosure: My rationale for reimplementation over optional dependency is also partly in the interest of the long-term maintainability of our codebase. It will be much easier on our end to maintain the code if it's part of YB. Basically, from the maintainer perspective, fewer dependencies == fewer headaches down the line :D

Let us know if you're game to try the reimplementation @pswaldia!

pswaldia · 2019-04-14T15:14:40Z

@rebeccabilbro Thanks for the response. I am ready to reimplement the knee_locator functionality in yb. I'll be updating the pull request soon.
Thanks.

arvkevi · 2019-04-14T16:22:35Z

This is so cool to see kneed even mentioned by yb 😄
The reimplementation rationale makes sense, case in point, I'm planning a release soon that will implement some significant changes (although they wouldn't affect this PR).

I'm not an expert on OSS licenses, but if you go the reimplementation route it might be best to include the license and copyright attribution? Only mentioning this based on what I've picked up on in my OS adventures.

Thanks for the shoutout 👍

bbengfort · 2019-04-15T13:52:55Z

@arvkevi thanks for chiming in - I was going to reach out to you about this, but I'm super excited that you're on board to have us include knee_locator.py in our source. Let me know if you'd like to chat more directly over email or elsewhere, but here is what I propose:

You've licensed kneed under a very permissive BSD-3 license. Yellowbrick is also licensed in a permissive fashion, using the Apache 2 license. Because our license is slightly more restrictive than yours, I believe that our license will cover yours in an acceptable fashion. Further, to meet the obligations of your license, we propose to add the following preamble to:

yellowbrick/utils/kneed.py
tests/test_utils/test_kneed.py

as follows:

# yellowbrick.utils.kneed
# A port of the knee-point detection package, kneed.
#
# Author:   Kevin Arvai
# Author:   Pradeep Singh 
# Created:  Mon Apr 15 09:43:18 2019 -0400
#
# Copyright (C) 2017 Kevin Arvai
# All rights reserved.
# Redistribution and use in source and binary forms, with or without modification, 
# are permitted provided that the following conditions are met:
# 
# 1. Redistributions of source code must retain the above copyright notice, this list
# of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice, this 
# list of conditions and the following disclaimer in the documentation and/or other 
# materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its contributors may 
# be used to endorse or promote products derived from this software without specific
# prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS 
# OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
# IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# ID: kneed.py [] pswaldia@no-reply.github.com $

"""
This package contains a port of the knee-point detection package, kneed, by 
Kevin Arvai and hosted at https://github.com/arvkevi/kneed. This port is maintained 
with permission by the Yellowbrick contributors.
"""

We will also link to you and say thank you in our documentation when we get to that point. Thoughts about the above proposal?

bbengfort · 2019-04-15T14:01:01Z

@pswaldia thank you for taking on this challenge! I will be reviewing your PR when you push the kneedle code and we can work together to make sure this is implemented correctly. As noted above, I think the best way to port the code would be to include the knee_locator.py file with the preamble above in yellowbrick/utils/kneed.py -- the file rename gives a stronger nod to @arvkevi. We can then place test_sample.py into tests/test_utils/test_kneed.py with the preamble modified for that file.

Note that the two tests that rely on the DataGenerator can probably be simplified as we don't have the same requirement to model the original paper. We can either simply move the noisy_gaussian function over, or simply omit those tests.

Finally, to answer your question about the legend, I think that it is helpful, but with more information, e.g. remove the score label and instead simply have a label that says optimal k=7, score=18232 or something like that.

Adding the knee annotation should be an optional kwarg, with default=True, so our users can turn it off if they don't want to use it.

Let me know if you have any other questions! And thank you again so much for contributing to Yellowbrick!


pswaldia · 2019-04-15T21:12:06Z

@bbengfort I have made some commits implementing the changes you suggested. Before moving on to tests, I would like to receive feedback on them.
Changes done:

Added kneed.py to yellowbrick/utils/.
Added the knee annotation as an optional kwarg, with default=True, so our users can turn it off if they don't want to use it. (Please let me know if the chosen variable name is all right.)
Changed the legend. (I am not sure how to include independent text in the legend without any symbols; I would like feedback on that.)

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

model = KMeans()

visualizer = KElbowVisualizer(model, k=(4,12)) #default knee=True

visualizer.fit(X)    # Fit the data to the visualizer
visualizer.poof()    # Draw/show/poof the data

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

model = KMeans()

visualizer = KElbowVisualizer(model, k=(4,12), knee=False) #setting knee=False 

visualizer.fit(X)    # Fit the data to the visualizer
visualizer.poof()    # Draw/show/poof the data

Thanks.

arvkevi · 2019-04-15T22:54:29Z

@bbengfort this is great, I'm excited to see this feature hit master 😄

bbengfort

@pswaldia thank you so much for your patience and hard work on this PR. Things are looking really good and I'm excited to include the port of the kneed package!

I think I've addressed all of your questions in the PR in my comments below (very nice PR description, by the way, thank you for your detail) - but please let me know if I skipped anything.

I also took a look at the tests, and right now it seems that it's just image comparison failures. Depending on how you want to approach this, we can either:

Set the knee/locate_elbow parameter to False by default, as a prototype feature, then merge this PR and work on the tests in another PR
Work on the tests/docs as we keep going in this PR.

It's up to you! Thank you again!

bbengfort · 2019-04-16T19:56:41Z

yellowbrick/cluster/elbow.py

@@ -170,6 +171,9 @@ class KElbowVisualizer(ClusteringScoreVisualizer):
        Display the fitting time per k to evaluate the amount of time required
        to train the clustering model.

+    knee : bool, default=True
+        Display the vertical line corresponding to the optimal value of k.      


Hehe, I do think it's funny to have a knee parameter in an "elbow" visualizer. I really do want to keep calling this "knee", but I'm worried it might be a bit confusing particularly to students. May I propose the following:

locate_elbow : bool, default: True
    Automatically find the "elbow" or "knee" which likely corresponds to the
    optimal value of k using the "knee point detection algorithm". The knee
    point detection algorithm finds the point of maximum curvature, which in
    a well-behaved clustering problem also represents the pivot of the elbow
    curve. The point is labeled with a dashed line and annotated with the
    score and k values.

Does that seem a bit more understandable?

Yes, this is definitely a good name and a nice description too. I'll replace mine with the suggested one.
Thanks.
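
For readers unfamiliar with the idea, the "point of maximum curvature" mentioned in the proposed description can be illustrated with a much simplified sketch. This is not the code from kneed or from yellowbrick/utils/kneed.py: the function name locate_elbow_sketch and its curve/direction arguments are hypothetical, the smoothing and sensitivity handling of the full algorithm are omitted, and a monotonic score curve is assumed. The sketch normalizes both axes to [0, 1] and picks the k that lies farthest from the straight line joining the curve's endpoints.

```python
def locate_elbow_sketch(k_values, scores, curve="convex", direction="decreasing"):
    """Return the k farthest from the chord of the normalized curve, or None.

    Hypothetical helper for illustration only; assumes a monotonic curve.
    """
    if len(k_values) != len(scores) or len(k_values) < 3:
        raise ValueError("need at least three (k, score) pairs")

    x_min, x_max = min(k_values), max(k_values)
    y_min, y_max = min(scores), max(scores)
    if x_max == x_min or y_max == y_min:
        return None  # a flat or degenerate curve has no elbow

    # Normalize both axes to the unit square.
    xn = [(x - x_min) / (x_max - x_min) for x in k_values]
    yn = [(y - y_min) / (y_max - y_min) for y in scores]

    # Chord joining the endpoints of a monotonic normalized curve.
    chord = [1.0 - x for x in xn] if direction == "decreasing" else xn

    # A convex curve sags below the chord, a concave one bulges above it.
    if curve == "convex":
        gaps = [c - y for c, y in zip(chord, yn)]
    else:
        gaps = [y - c for c, y in zip(chord, yn)]

    best = max(range(len(gaps)), key=gaps.__getitem__)
    if gaps[best] <= 1e-12:
        return None  # the curve never leaves the chord (e.g. a straight line)
    return k_values[best]
```

With k from 2 to 9 and made-up distortion scores 1000, 500, 250, 200, 180, 170, 165, 162, the sketch returns 4, the point where the drop in score flattens out.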

bbengfort · 2019-04-16T19:58:21Z

yellowbrick/cluster/elbow.py

@@ -219,6 +223,7 @@ def __init__(self, model, ax=None, k=10,
        # Store the arguments
        self.scoring_metric = KELBOW_SCOREMAP[metric]
        self.timings = timings
+        self.knee=knee


If we do change the name of the argument, this should also be updated to match.

Yeah! Sure.

bbengfort · 2019-04-16T20:01:14Z

yellowbrick/cluster/elbow.py

@@ -247,6 +252,9 @@ def fit(self, X, y=None, **kwargs):

        self.k_scores_ = []
        self.k_timers_ = []
+        self.kneedle=None


Does the KneeLocator store any state that we may need to keep around? If so, we should store it in a variable self._knee_locator (just to make things obvious to other YB contributors). If not, there is probably no reason to keep this object on the visualizer, simply make it a variable local to fit() which will be discarded when we're done fitting.

@bbengfort For the moment I don't think there's any need to keep the knee locator object on the visualizer. We can make it local to fit(). I also think the name kneedle is not that intuitive, so I will change it to knee_locator or elbow_locator, whichever you prefer.
Thanks

I agree, also after more consideration; the KneeLocator object holds all the data, which is something we don't want to do generally, so I think it's good that we're not storing it on the Visualizer. I'm fine with either knee_locator or elbow_locator -- your choice!

bbengfort · 2019-04-16T20:04:59Z

yellowbrick/cluster/elbow.py

@@ -247,6 +252,9 @@ def fit(self, X, y=None, **kwargs):

        self.k_scores_ = []
        self.k_timers_ = []
+        self.kneedle=None
+        self.knee_value=None
+        self.score=None


This is great, thank you so much for storing these on the visualizer! I think users will be interested in directly accessing them. I've got two requests:

We should make these "learned" attributes and suffix them with an underscore, e.g. self.knee_value_ and self.score_; also it is probably more informative if we name them self.elbow_value_ and self.elbow_score_ if that's ok -- see discussion above.

We should document these in the docstring of the class. Also, it seems that k_scores_ and k_timers_ are not documented, so would you mind also documenting them? See icdm.py Line 123 for an example of how to document the learned attributes.

Finally (more on this later), these properties should only exist iff self.locate_elbow==True.

@bbengfort Thanks so much for your kind words.

I was not aware of the naming convention for 'learned' attributes in yb, but I found it in every visualizer class. I will change them as you suggested. Thanks.

I will document those attributes. They are important for other contributors too. Thanks.

And as suggested it makes sense for these properties to exist iff self.locate_elbow==True. I will make sure that happens.

One more question: I am not sure what you mean by "learned" attributes, so will the KneeLocator object be considered a "learned" attribute or not?

I appreciate it! The learned attributes are more a scikit-learn thing than a Yellowbrick thing. If you're interested in learning more, check out the sklearn developer guide. I guess they call them "estimated attributes" there.

A learned/estimated attribute is any data that is created when fit() is called - e.g. the coef_ of a linear model. They only exist because data has been passed into the estimator. Because of this, the convention is to omit them from the class and only set them in fit() -- this also is what we use to check if an estimator is fitted or not.

Because you're making the KneeLocator a local variable, not storing it on the visualizer, it is not a learned attribute. However, elbow_value_, elbow_score_, k_scores_, and k_timers_ are all learned attributes because they do not/cannot exist until fit().
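
The convention can be sketched with a toy class. Everything here is hypothetical (ToyElbow, its placeholder "lowest score" rule, and the is_fitted helper are not Yellowbrick code); the sketch only shows hyperparameters being stored in __init__, learned attributes with a trailing underscore being created in fit(), and the elbow attributes existing only when locate_elbow is True.

```python
class ToyElbow:
    """Toy estimator for illustration; not the real KElbowVisualizer."""

    def __init__(self, k_values=(2, 3, 4), locate_elbow=True):
        # Hyperparameters: stored exactly as passed in, nothing computed here.
        self.k_values = k_values
        self.locate_elbow = locate_elbow

    def fit(self, scores):
        scores = list(scores)
        if len(scores) != len(self.k_values):
            raise ValueError("expected one score per value of k")

        # Learned attribute: exists only after fit() has seen data.
        self.k_scores_ = scores

        if self.locate_elbow:
            # Placeholder rule (lowest score), standing in for a real locator.
            best = min(range(len(scores)), key=scores.__getitem__)
            self.elbow_value_ = self.k_values[best]
            self.elbow_score_ = scores[best]
        return self


def is_fitted(estimator):
    # The presence of a learned attribute signals that fit() was called.
    return hasattr(estimator, "k_scores_")
```

Before fit() is called, is_fitted(ToyElbow()) is False; after fitting with locate_elbow=False, k_scores_ exists but elbow_value_ does not.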


bbengfort · 2019-04-16T20:35:20Z

yellowbrick/utils/kneed.py

+            warnings.warn("No local maxima found in the distance curve\n"
+                          "The line is probably not polynomial, try plotting\n"
+                          "the distance curve with plt.plot(knee.xd, knee.yd)\n"
+                          "Also check that you aren't mistakenly setting the curve argument", RuntimeWarning)


Unfortunately, I think we may have to do something about this warning, otherwise, it could be very confusing to our users.

First, it is probably preferable to raise an exception here - that way we can catch the exception in the KElbowVisualizer and either keep drawing without the detected elbow, issue an elbow-specific warning, or advise the user to set locate_elbow=False. I think this warning will be issued given the convex/decreasing or concave/increasing issue that I mentioned above. So we may want to do things differently depending on if the user has the ability to control these params or something has gone wrong with what we expected for the metric.

If we do go with a warning, instead of a RuntimeWarning, could we please issue a YellowbrickWarning ?

whew, this is very, very legacy... the warning started as a sanity check long ago.

We are running into it because we can be a bit all over the place with convex/increasing or concave/decreasing depending on the metric we're using -- and if the clustering is terrible (e.g. bad features, wrong algorithm, no actual clusters) then things get really wild at that point. Any advice or thoughts you have would be welcome!

@bbengfort I am trying to get familiar with the exceptions handling. Once done I'll accommodate these changes in a PR.
Thanks.
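
As a rough illustration of the exception-then-warning flow discussed in this thread, the sketch below has a locator raise an exception and the caller convert it into a friendlier warning. All names (ElbowNotFound, SketchWarning, find_elbow, fit_scores) are hypothetical stand-ins; in Yellowbrick the warning class would be YellowbrickWarning, and the detection rule here is only a placeholder.

```python
import warnings


class ElbowNotFound(Exception):
    """Hypothetical exception raised when no elbow can be located."""


class SketchWarning(UserWarning):
    """Stand-in for a library-specific warning such as YellowbrickWarning."""


def find_elbow(k_values, scores):
    # Placeholder rule: a perfectly flat curve has no elbow.
    if len(set(scores)) < 2:
        raise ElbowNotFound("no local maximum in the distance curve")
    best = min(range(len(scores)), key=scores.__getitem__)
    return k_values[best]


def fit_scores(k_values, scores, locate_elbow=True):
    """Return the elbow, or None after warning if detection fails."""
    if not locate_elbow:
        return None
    try:
        return find_elbow(k_values, scores)
    except ElbowNotFound:
        # Keep going without an elbow, but tell the user how to silence this.
        warnings.warn(
            "No 'knee' or 'elbow' point detected, "
            "pass `locate_elbow=False` to remove the warning",
            SketchWarning,
        )
        return None
```

Raising inside the locator keeps the low-level code free of user-facing advice, while the caller decides whether to warn, re-raise, or carry on drawing without the elbow.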


pswaldia · 2019-04-18T19:50:16Z

Hello @bbengfort I have made changes suggested by you. Please tell me what further changes need to be done.

In elbow.py

Learned attributes are documented.
Parameter and attribute names are changed to make them more understandable.
Set curve_nature and curve_direction according to the metric we have. It was pretty straightforward.
Modified the legend to use LaTeX math style and moved the legend drawing to finalize().
Warn the user and advise them to set locate_elbow=False when no knee point is detected.

In kneed.py

Changed the docstring to match with yellowbrick style.
Modified the warning message and used YellowbrickWarning instead of RuntimeWarning. The message for the second warning has not been changed; it will be made more meaningful after going through the entire implementation of kneed.

Demo:

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

model = KMeans()

visualizer = KElbowVisualizer(model, k=(4,12))  # defaults: locate_elbow=True, metric='distortion'

visualizer.fit(X)    # Fit the data to the visualizer
visualizer.poof()    # Draw/show/poof the data

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

model = KMeans()

visualizer = KElbowVisualizer(model, metric="calinski_harabaz", k=(4,12))  # default locate_elbow=True

visualizer.fit(X)    # Fit the data to the visualizer
visualizer.poof()    # Draw/show/poof the data

A YellowbrickWarning will be displayed when the knee point is not detected. The curve will be drawn without the elbow point, and the warning will advise the user to pass locate_elbow=False to remove the warning. Note: the plot below has an elbow point, but for demonstration purposes the knee_ value has been set to None.

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

model = KMeans()

visualizer = KElbowVisualizer(model, metric="calinski_harabaz", k=(4,12))  # default locate_elbow=True

visualizer.fit(X)    # Fit the data to the visualizer
visualizer.poof()    # Draw/show/poof the data

bbengfort

Looks great @pswaldia -- we're getting there! I had a few minor, non-blocking comments. But now it's on to tests! Feeling up for it?

Basically, my thoughts are that we need the current batch of tests to continue with locate_elbow=False then we should have a second batch of tests that test all the metrics and do image similarity with locate_elbow=True; does this sound like a reasonable testing strategy?

Additionally, we should also check the documentation to make sure it's updated correctly with the new visual features.

bbengfort · 2019-04-18T20:44:56Z

yellowbrick/cluster/elbow.py

+                self.curve_direction = 'decreasing'
+            elif self.metric=='silhouette' or self.metric=='calinski_harabaz':
+                self.curve_nature = 'concave'
+                self.curve_direction = 'increasing'    


This is not a big deal and won't block merging of this code, but I wanted to make a recommendation because I'm hoping you will be a regular contributor to Yellowbrick, and this is both a code style thing and a user-facing API thing.

I mentioned that learned attributes, suffixed with an underscore, are only set in fit() (nice work with those attrs, by the way!) In YB/sklearn terms, the other type of attributes are hyperparameters - the properties that the user passes into __init__ - unlike other python programming, sklearn and YB treat these specially. All other attributes of the class should be methods, e.g. fit() or they should be marked as private, e.g. prefixed with an _.

Based on this, technically (again not a big deal) curve direction and nature should be self._curve_direction and self._curve_nature.

There is another little item here, though, and that's that the curve nature and direction should depend only on the metric. What happens if the user changes these values? (nothing in the case of your code) It could become confusing for the user.

Therefore I would propose that we simply keep these as local variables when instantiating the KneeLocator. Simply removing the self. is fine - but I also wanted to show you how you might see this in other YB code:

locator_kwargs = {
    'distortion': {'curve_nature': 'convex', 'curve_direction': 'decreasing'},
    'silhouette': {'curve_nature': 'concave', 'curve_direction': 'increasing'},
    'calinski_harabaz': {'curve_nature': 'concave', 'curve_direction': 'increasing'},
}.get(self.metric, {})
elbow_locator = KneeLocator(self.k_values_, self.k_scores_, **locator_kwargs)

This is a jump table pattern and we often use it over if/elif so that it's easy to make modifications on a per-metric basis.

This jump table pattern is excellent. I always wondered what **kwargs was used for; today I learnt how it can be used. It's a lot better than if/else and looks tidy too.
Thank you!
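
For readers new to the pattern, here is a self-contained sketch of a jump table combined with ** unpacking. describe_curve and kwargs_for are hypothetical helpers, and the defaults on describe_curve are chosen only for illustration; the metric names and curve settings are the ones from the snippet above.

```python
# Per-metric keyword arguments, keyed by metric name.
LOCATOR_KWARGS = {
    "distortion": {"curve_nature": "convex", "curve_direction": "decreasing"},
    "silhouette": {"curve_nature": "concave", "curve_direction": "increasing"},
    "calinski_harabaz": {"curve_nature": "concave", "curve_direction": "increasing"},
}


def describe_curve(curve_nature="concave", curve_direction="increasing"):
    # Hypothetical stand-in for a constructor that accepts these keywords.
    return "{} and {}".format(curve_nature, curve_direction)


def kwargs_for(metric):
    # .get() falls back to an empty dict, so unknown metrics use the defaults.
    return LOCATOR_KWARGS.get(metric, {})


# ** unpacks the chosen dict into keyword arguments.
print(describe_curve(**kwargs_for("distortion")))
print(describe_curve(**kwargs_for("unknown_metric")))
```

Adding support for a new metric then means adding one dictionary entry rather than another elif branch.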

bbengfort · 2019-04-18T20:45:30Z

yellowbrick/cluster/elbow.py

+            elif self.metric=='silhouette' or self.metric=='calinski_harabaz':
+                self.curve_nature = 'concave'
+                self.curve_direction = 'increasing'    
+            self.elbow_locator = KneeLocator(self.k_values_,self.k_scores_,curve_nature=self.curve_nature,curve_direction=self.curve_direction)


I thought we agreed that this would be a local variable and not stored on the class? I still think that's the right way to go.

bbengfort · 2019-04-18T20:45:42Z

yellowbrick/cluster/elbow.py

+
+    elbow_score_ : float
+        The silhouette score corresponding to the optimal value of k.         
+


Very nice, thank you!

bbengfort · 2019-04-18T20:46:16Z

yellowbrick/cluster/elbow.py

+                warning_message=\
+                "No 'knee' or 'elbow' point detected, " \
+                "pass `locate_elbow=False` to remove the warning"   
+                warnings.warn(warning_message,YellowbrickWarning) 


Very nice, thank you for changing this warning into a much more understandable signal to the user!

bbengfort · 2019-04-18T20:47:55Z

yellowbrick/cluster/elbow.py

-
+        self.ax.plot(self.k_values_, self.k_scores_, marker="D")
+        if self.locate_elbow and self.elbow_value_!=None:
+            elbow_label = "$elbow\ at\ k={}, score={}$".format(self.elbow_value_, np.round(self.elbow_score_,3))


Again, not a big deal in this case but my preference is "{:0.3f}".format(self.elbow_score_) rather than np.round - I think this might be more expected by other developers. It won't be a blocker for this PR though.
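
The difference is easiest to see side by side. In this sketch the built-in round() stands in for np.round so that the example has no dependencies (an assumption made for illustration), and elbow_label is a hypothetical helper rather than code from this PR.

```python
def elbow_label(k, score):
    # Fixed-point formatting always shows exactly three decimal places.
    return "elbow at k={}, score={:0.3f}".format(k, score)


score = 0.5
# Rounding returns a float, so trailing zeros disappear when it is printed.
print("score={}".format(round(score, 3)))
# The format spec pads the same value to three decimals.
print(elbow_label(7, score))
```

The first line prints score=0.5 while the second prints elbow at k=7, score=0.500, so the format spec gives labels of consistent width across plots.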

bbengfort · 2019-04-18T20:48:33Z

yellowbrick/cluster/elbow.py

+
+        #set the legend if locate_elbow=True
+        if self.locate_elbow and self.elbow_value_!=None:
+            self.ax.legend(loc='best', fontsize='medium')


bbengfort · 2019-04-18T20:49:00Z

yellowbrick/utils/kneed.py

+    -----
+    The KneeLocator is implemented using the "knee point detection algorithm" which can be read at
+    `<https://www1.icsi.berkeley.edu/~barath/papers/kneedle-simplex11.pdf>`
+


pswaldia · 2019-04-18T21:58:13Z

@bbengfort I am done with those minor yet important changes suggested.
And yes, that testing strategy seems good to me too. (Although I am not experienced in writing tests and this will be my first one, I am really excited.) I will also look into the documentation part.
Thanks.

bbengfort · 2019-04-19T14:02:50Z

Thanks for doing that @pswaldia - again, it was no big deal at all, but I do appreciate you making the changes; you're ensuring that this PR is absolutely top notch!

A couple of tips for docs and tests follow:

Documentation

Make sure you have installed the documentation requirements pip install -r docs/requirements.txt (preferably inside a virtual environment)
To build the docs, change into the docs directory and run make html, this should create a folder called _build/html inside of the docs directory. Note this only works in the docs directory! Let me know if there are any errors.
If you're on a Mac or a Linux machine you can open _build/html/index.html on the command line, otherwise, open a browser and go to File -> Open and select that file.
Navigate to Visualizers and API -> Clustering Visualizers -> Elbow Method
To edit the docs, edit the reStructuredText file at docs/api/cluster/elbow.rst

Let me know if there are any errors building the documentation or if the docs look weird.

Tests

Make sure you have the test requirements installed pip install -r tests/requirements.txt (preferably inside a virtual environment)
Run the tests with pytest ... this will take a while. Note that the tests must be run from the root project directory.
Run the elbow visualizer tests with pytest tests/test_cluster/test_elbow.py -- this is probably what you'll want to do most when testing.
You'll be writing/updating tests in TestElbowVisualizer, note that the method name must start with test_ and that the method name and the single line docstring are used in the test reporting.

I suggest the following:

For any tests that are currently failing, update them with locate_elbow=False; if those tests pass, let's merge this PR in and get to the tests and the updates to the docs in another PR. If those tests don't pass, then we'll have to sort something out.

pswaldia · 2019-04-19T21:27:19Z

@bbengfort As you suggested, I followed that approach for the tests. There were some failures related to image similarity, but I was able to get the tests passing by updating them with locate_elbow=False. So in the end there were no issues with the tests for the KElbowVisualizer. I think this PR can now be merged and we can get to further tests in another PR.

Regarding the documentation, it was smooth sailing, but I did notice some weird things that did not match the code as written.

What I expected was locate_elbow : bool, default: True but it was not so. This was the same for every visualizer, not only the KElbowVisualizer that we updated. Is it okay to have this, given that the official documentation available online looks fine?

I changed the documentation a bit so please have a look at it and suggest changes if any. I will be completing the tests today.

Thanks..

saurabhdaalia · 2019-04-19T22:08:43Z

@bbengfort Regarding the documentation, it was smooth sailing, but I did notice some weird things that did not match the code as written.

What I expected was locate_elbow : bool, default: True but it was not so. This was the same for every visualizer, not only the KElbowVisualizer that we updated. Is it okay to have this, given that the official documentation available online looks fine?

I changed the documentation a bit so please have a look at it and suggest changes if any. I will be completing the tests today.

Thanks.

@pswaldia, I am facing the same issue that you mentioned here. I think this is an environment issue: deactivating the virtualenv and then running the make command works fine.

bbengfort

@pswaldia awesome, this looks great - and thank you for adding a reference to kneed in the documentation. As for the build docs weirdness, I'll take a look and see how it builds on Read the Docs to see if we need to make any changes.

I'm going to go ahead and merge this in -- thank you so much for your contribution! But let's make sure we get those tests in for both kneed and the locate_elbow param in soon! Will you be able to work on that next? If not, we can always set the default locate_elbow=False to give us more time (e.g. marking it as an experimental feature).

pswaldia · 2019-04-22T16:48:37Z

Thanks @bbengfort I will be working on writing tests. Though it will not be that fast this time as I will be busy with my exams. But I will open a pull request doing the required. Thanks again. It was really nice having this pull request merged.

arvkevi · 2019-04-22T16:52:48Z

Nice work @pswaldia 😃

pswaldia · 2019-04-22T16:54:08Z

Thanks sir @arvkevi for your kind words.

bbengfort · 2019-04-22T18:41:10Z

@pswaldia it is important for you to be able to focus on your exams, therefore I think there are two options here:

If you create a PR - I can fill in the test stubs to make sure we don't lose the thread while you're taking your exam.
We can set the default locate_elbow=False

I'm fine with either, but option 1 requires a bit more git wrangling for you (e.g. pull from upstream develop, push into your fork, delete your old feature branch, and create a new feature branch). Let me know what you'd prefer!

pswaldia · 2019-04-22T19:52:23Z

@bbengfort we need to complete the tests sooner or later, so let's go with option 1. I will open a new PR soon.

arvkevi · 2019-07-22T11:36:40Z

@bbengfort there was a bug in knee_locator.py.
The algorithm was reporting the last (most recent) knee point detected instead of the first. This caused the sensitivity parameter S to have an insignificant effect.

It prompted a code refactor, but there are no changes to the API. The refactored code now reports the first knee and S now has a significant impact on the knee point detected. See code snippet and plot in the README.
Also included in the 0.4+ release:

removed an additional for loop
improved variable naming
KneeLocator saves all_knees
more tests...

I'm happy to open a new issue and/or PR and modify the yb source code myself. I see you are nearing a v1.0 release 🎉 so you may want to forge ahead and put this on the backburner. Either way, I wanted to bring it to your attention.

Kautumn06 · 2019-07-22T15:39:08Z

Hi @arvkevi, thank you so much for reaching out! Your timing is perfect because yesterday, while reviewing PR #891, I noticed that the number being returned for k appeared to be one less than the optimal value. I tested the KElbowVisualizer again using make_blobs to create a synthetic dataset with 8 clusters and it also showed the optimal value as one less than what I was expecting:

I had planned on reaching out to you today to see if perhaps there was a bug in our port, so your timing couldn't be better. We greatly appreciate your help opening a PR to fix this; we are all pretty focused on our v1.0 release, but this is a great bug to squash before it goes out!

pswaldia added 5 commits March 30, 2019 22:29

Update pcoords.rst

b5fd21c

Merge branch 'develop' into develop

ed7f923

Merge pull request #1 from DistrictDataLabs/develop

a35f1d4

Sync original

Merge pull request #2 from DistrictDataLabs/develop

7b341cf

merge

added elbow detection using 'kneedle' method

d6d1cb1

bbengfort added the ready label Apr 12, 2019

rebeccabilbro removed the ready label Apr 14, 2019

lwgray assigned bbengfort Apr 14, 2019

bbengfort self-requested a review April 15, 2019 13:30

pswaldia added 4 commits April 16, 2019 02:10

Merge branch 'develop' into feature

c90c4fb

Added kneed.py

ce30b20

Added preample as well as code.

Update elbow.py to add knee locator functionality

707112c

Update __init__.py

33cc324

bbengfort reviewed Apr 16, 2019


pswaldia commented Apr 17, 2019


pswaldia added 2 commits April 19, 2019 00:51

Update elbow.py

8e680f2

Update kneed.py

2be56ef

bbengfort reviewed Apr 18, 2019


added minor changes to elbow.py

df7261d

Corrected minor mistake in draw()

3d95348

pswaldia added 2 commits April 20, 2019 02:50

Update elbow.py to add info for locate_knee

182b34f

update elbow.py to doc related info

0ac5556

Update test_elbow.py

cc726a6

rebeccabilbro mentioned this pull request Apr 21, 2019

[WIP] Added n_jobs argument for parallelization in K-Elbow Visualizer #822

Closed

1 task

Merge branch 'develop' into feature

a8b64fc

bbengfort approved these changes Apr 22, 2019


bbengfort merged commit 9a3587a into DistrictDataLabs:develop Apr 22, 2019

pswaldia deleted the feature branch April 22, 2019 19:24

pswaldia mentioned this pull request Apr 22, 2019

Additional tests for elbow detection in Elbow Visualizer #824

Merged

This was referenced Jul 23, 2019

Incorporate kneed refactor in utils/kneed.py #931

Closed

Refactor to be current with kneed v0.4.1 #935

Merged

This was referenced Aug 8, 2019

Small fix to readme image generation script #942

Merged

KElbow raises confusing ValueError when optimal k is outside provided k-range #943

Closed

