Kmeans clustering exception #259

khan990 · 2016-06-29T16:31:07Z

I am getting the following exception.
I believe my input is correct. I was computing cosine similarity over documents using TF-IDF.

The input to cluster.Compute() is a double[][] where 0 <= input[i][j] <= 1:
int[] index = cluster.Compute(inputs);

I am using https://github.com/primaryobjects/TFIDF for TF-IDF.

System.InvalidOperationException was unhandled
HResult=-2146233079
Message=Generated value is not between 0 and 1.
Source=Accord.Statistics
StackTrace:
at Accord.Statistics.Distributions.Univariate.GeneralDiscreteDistribution.Random(Double[] probabilities)
at Accord.MachineLearning.KMeans.Randomize(Double[][] points, Boolean useSeeding)
at Accord.MachineLearning.KMeans.Compute(Double[][] data, Double threshold, Boolean computeInformation)
at TFIDFExample.Program.Main(String[] args) in C:\Users\jasim\Documents\Visual Studio 2015\Projects\TFIDF_TwitterClustering\TFIDFExample\Program.cs:line 49
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:

zgrkpnr · 2016-06-30T19:24:10Z

Did you validate every element of your input? You can check the range like this:

var max = inputs.Select(i => i.Max()).Max();
var min = inputs.Select(i => i.Min()).Min();

Once you have checked the values, please report the results here.

khan990 · 2016-06-30T20:54:49Z

Yes, I did validate them.
The min and max values lie within 0 <= input[i][j] <= 1, as mentioned before.

What I noticed is that when I run k-means on a 1000 x 15000 array of doubles, it works just fine.
But when I run it on 10000 x 15000, it gives me the error above.

cesarsouza · 2016-09-04T17:46:27Z

Hi all,

It is quite possible that this issue has been fixed in the latest release of the framework (3.2, released a few days ago). If you are able to, could you let us know whether you are still experiencing it?

Thanks!

Regards,
Cesar

jbrant · 2016-11-11T08:33:14Z

Hi Cesar,

I'm actually on version 3.3 and experienced the same issue as described above clustering on anywhere from 1,000 - 2,000 dimensions. The data I'm trying to cluster are in the range 0 - 50.

Thanks!
Jonathan

khan990 · 2016-11-11T10:42:37Z

I reduced the dimensions, and it worked.
Try doing that. I understand it may not be an option for you,
but give it a try.

jbrant · 2016-11-14T09:03:10Z

Hi Khan, thank you very much for your reply. Reducing the dimensions did work; however, in my particular situation, I'm unfortunately unable to incur that loss in fidelity. That being said, I switched to uniform seeding (as opposed to the default kmeans++) and didn't have any issues, even at rather high dimensionality.

Thanks again,
Jonathan

ytakashina · 2017-04-20T06:53:12Z

I had the same issue.

This exception is originally thrown by GeneralDiscreteDistribution.Random(), which is used in ClusterCollection.Randomize() for k-means++ seeding.

The exception is thrown when the cumulativeSum in GeneralDiscreteDistribution.Random() stays below the value uniform, which is drawn at random from [0, 1). Maybe something is wrong with the calculation of D in ClusterCollection.Randomize().

version: 3.4.0
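
To illustrate the failure mode described above, here is a minimal sketch of cumulative-sum sampling from a discrete distribution. It is not Accord's actual code: the class and method names are hypothetical, and returning -1 stands in for the point where the library throws. It only shows that when the weights sum to slightly less than one, a uniform draw close to 1 matches no bucket.

```csharp
using System;

public static class DiscreteSampling
{
    // Hypothetical helper, not part of Accord.NET.
    // Returns the first index whose running total of weights exceeds u,
    // or -1 if the running total never exceeds u.
    public static int SampleIndex(double[] weights, double u)
    {
        double cumulativeSum = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            cumulativeSum += weights[i];
            if (u < cumulativeSum)
                return i;
        }
        return -1; // no bucket matched: the situation reported in this issue
    }

    public static void Main()
    {
        // Weights that should sum to 1 but fall slightly short (about 0.9999).
        double[] weights = { 0.25, 0.25, 0.25, 0.2499 };

        Console.WriteLine(SampleIndex(weights, 0.5));     // lands in a bucket
        Console.WriteLine(SampleIndex(weights, 0.99995)); // lands in no bucket
    }
}
```

With many points, each individual weight is tiny, so a shortfall like this seems more likely on larger inputs, which would fit the reports above.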

cesarsouza · 2017-07-02T09:47:35Z

I have not been able to reproduce the error myself yet, but this is probably happening due to a loss of precision when computing the discrete probability weights, making the weight vector not sum up to one. One of the possible reasons for that is the probabilities for each point becoming too small.

I have added some handling to sidestep this issue and also present some better error messages.

Regards,
Cesar
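
For readers on older releases, one way to guard against the problem described above is to check the total before normalizing and fall back to uniform weights when it is unusable. This is only a sketch under that assumption: the names are hypothetical, and it is not necessarily the handling that was added to the framework.

```csharp
using System;
using System.Linq;

public static class SeedingWeights
{
    // Hypothetical helper, not part of Accord.NET.
    // Converts non-negative distances into probabilities. Falls back to a
    // uniform distribution when the total is zero, negative, NaN or infinite.
    public static double[] Normalize(double[] distances)
    {
        double sum = distances.Sum();
        if (sum <= 0 || double.IsNaN(sum) || double.IsInfinity(sum))
            return Enumerable.Repeat(1.0 / distances.Length, distances.Length).ToArray();

        return distances.Select(d => d / sum).ToArray();
    }

    public static void Main()
    {
        // Ordinary case: weights proportional to the distances.
        Console.WriteLine(string.Join(", ", Normalize(new[] { 1.0, 3.0 })));

        // Degenerate cases: the total is 0 or overflows, so use uniform weights.
        Console.WriteLine(string.Join(", ", Normalize(new[] { 0.0, 0.0 })));
        Console.WriteLine(string.Join(", ", Normalize(new[] { double.MaxValue, double.MaxValue })));
    }
}
```

A sampler that consumes these weights can also return the last index when the running total never reaches the drawn value, instead of throwing.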

cesarsouza · 2017-07-07T20:31:13Z

Should have been fixed in release 3.6.0.

Afgankhan · 2017-07-09T17:59:18Z

Hello, I want an array of clustered data points. For example, given array[10]={1,2,3,4,5,6,7,8,9},
after clustering I want the data indexes of the first cluster first, then the second. E.g. with number of clusters=2, first cluster data {2,5,9} and second cluster data {1,3,4,6,7,8},
the resultant array={2,5,9,1,3,4,6,7,8}.
Thanks

cesarsouza added a commit that referenced this issue Jun 30, 2017

Mitigating the impact of a numerical precision issue when normalizing distances to probabilities in the K-Means++ initialization. Updates GH-259: K-means clustering exception

0b8aa30

cesarsouza added the pending-release label Jul 2, 2017

cesarsouza added a commit that referenced this issue Jul 2, 2017

Adding a trace warning message for GH-259.

7fdad28

cesarsouza closed this as completed Jul 7, 2017

cesarsouza removed the pending-release label Jul 9, 2017
