Birch clustering finding a single subcluster #141

Open
rth opened this issue Apr 21, 2017 · 1 comment
Comments

@rth
Contributor

rth commented Apr 21, 2017

The following issue was reported when using Birch clustering on a 55k document collection with the parameters below:

threshold = 0.7 (also 0.3)
Words per title = 3
no. of clusters = 20

which yields

Detailed traceback:


/home/ubuntu/FreeDiscovery/freediscovery/externals/birch.py:610: UserWarning: Number of subclusters found (1) by Birch is less than (20). Decrease the threshold.
  % (len(centroids), self.n_clusters))
[2017-04-21 14:15:05,946] ERROR in app: Exception on /api/v0/clustering/birch [POST]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/views.py", line 84, in view
    return self.dispatch_request(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask/views.py", line 149, in dispatch_request
    return meth(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask_apispec/annotations.py", line 115, in wrapped
    return wrapper(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask_apispec/wrapper.py", line 23, in __call__
    response = self.call_view(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/flask_apispec/wrapper.py", line 39, in call_view
    return self.func(*args, **kwargs)
  File "/home/ubuntu/FreeDiscovery/freediscovery/server/resources.py", line 597, in post
    cl.birch(threshold=threshold, **args)
  File "/home/ubuntu/FreeDiscovery/freediscovery/cluster/base.py", line 349, in birch
    return self._cluster_func(n_clusters, km, pars)
  File "/home/ubuntu/FreeDiscovery/freediscovery/cluster/base.py", line 227, in _cluster_func
    labels_).centroids_
  File "/home/ubuntu/miniconda3/envs/freediscovery-env/lib/python3.6/site-packages/sklearn/neighbors/nearest_centroid.py", line 116, in fit
    raise ValueError('y has less than 2 classes')
ValueError: y has less than 2 classes

This is probably due to the min_similarity parameter being too low, but in any case we should print a more explicit warning message explaining how to fix this. The current situation is confusing because, to decrease the threshold in scikit-learn, one has to increase the min_similarity parameter in the REST API.
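
To illustrate the inverse relationship and the kind of message that might help, here is a minimal sketch. It assumes the document vectors are L2-normalized, in which case a cosine similarity s corresponds to a Euclidean distance of sqrt(2 * (1 - s)); the exact conversion used in FreeDiscovery may differ. Both function names are hypothetical and not part of the existing code base.

```python
import math
import warnings


def similarity_to_distance_threshold(min_similarity):
    """Map a cosine similarity to a Euclidean distance threshold.

    Assumption: vectors are L2-normalized, so that
    ||a - b||^2 = 2 * (1 - cos(a, b)).
    A higher min_similarity therefore gives a lower threshold.
    """
    if not -1.0 <= min_similarity <= 1.0:
        raise ValueError(
            "min_similarity must be in [-1, 1], got %r" % (min_similarity,))
    return math.sqrt(2.0 * (1.0 - min_similarity))


def check_subcluster_count(n_subclusters, n_clusters, min_similarity):
    """Hypothetical guard to call before computing cluster centroids.

    Raises an explicit error when fewer than 2 subclusters were found
    (the case that ends in "y has less than 2 classes"), and only warns
    when some, but fewer than n_clusters, subclusters were found.
    """
    if n_subclusters >= n_clusters:
        return
    msg = (
        "Birch found %d subcluster(s), fewer than the requested "
        "n_clusters=%d. Increase min_similarity (currently %s) in the "
        "request; this lowers the Birch distance threshold."
        % (n_subclusters, n_clusters, min_similarity))
    if n_subclusters < 2:
        raise ValueError(msg)
    warnings.warn(msg, UserWarning)
```

With a check like this, the case above (1 subcluster found, 20 requested) would fail early with a message phrased in terms of min_similarity, rather than surfacing the scikit-learn error.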

@dagr1234
Member

dagr1234 commented Apr 22, 2017 via email
