Commit

Fixing bug: centroids failed for clusters with summary fields
mmerce committed Jan 29, 2015
1 parent cf318c2 commit 333e377
Showing 5 changed files with 43 additions and 1 deletion.
6 changes: 6 additions & 0 deletions HISTORY.rst
@@ -3,6 +3,12 @@
History
-------

1.10.5 (2014-01-29)
~~~~~~~~~~~~~~~~~~~

- Fixing bug: centroids failed when predicted from local clusters with
  summary fields.

1.10.4 (2014-01-17)
~~~~~~~~~~~~~~~~~~~

2 changes: 1 addition & 1 deletion bigml/__init__.py
@@ -1 +1 @@
-__version__ = '1.10.4'
+__version__ = '1.10.5'
3 changes: 3 additions & 0 deletions bigml/cluster.py
@@ -143,6 +143,9 @@ def __init__(self, cluster, api=None):
self.tag_clouds = {}
self.term_analysis = {}
fields = cluster['clusters']['fields']
summary_fields = cluster['summary_fields']
for field_id in summary_fields:
    del fields[field_id]
for field_id, field in fields.items():
    if field['optype'] == 'text':
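
A note on the `bigml/cluster.py` hunk: `del fields[field_id]` mutates the `fields` dict held by the `cluster` argument and raises `KeyError` if a listed id is absent. A minimal sketch of a non-mutating alternative follows. The helper name `drop_summary_fields` and the sample field ids are hypothetical and are not part of this commit.

```python
def drop_summary_fields(fields, summary_fields):
    """Return a new dict of fields without the ids listed in summary_fields.

    Hypothetical helper: it leaves the input dict untouched and tolerates
    a missing or empty summary_fields list.
    """
    excluded = set(summary_fields or [])
    return {field_id: field
            for field_id, field in fields.items()
            if field_id not in excluded}


# Illustrative data only; the ids "000000" and "000001" are made up.
cluster = {
    "summary_fields": ["000001"],
    "clusters": {
        "fields": {
            "000000": {"name": "sepal length", "optype": "numeric"},
            "000001": {"name": "sepal width", "optype": "numeric"},
        }
    },
}

fields = drop_summary_fields(cluster["clusters"]["fields"],
                             cluster.get("summary_fields"))
```

Whether the extra copy is worthwhile depends on whether callers reuse the original `cluster` dict after building the local cluster.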

17 changes: 17 additions & 0 deletions tests/features/compare_predictions.feature
@@ -100,6 +100,23 @@ Feature: Compare Predictions
| ../data/iris_sp_chars.csv | 20 | 20 | 30 | {"fields": {}} |{"pétal.length":1, "pétal&width\u0000": 2, "sépal.length":1, "sépal&width": 2, "spécies": "Iris-setosa"} | Cluster 7 | 0.757736964835 |


Scenario: Successfully comparing centroids with summary fields:
Given I create a data source uploading a "<data>" file
And I wait until the source is ready less than <time_1> secs
And I create a dataset
And I wait until the dataset is ready less than <time_2> secs
And I create a cluster with options "<options>"
And I wait until the cluster is ready less than <time_3> secs
And I create a local cluster
When I create a centroid for "<data_input>"
Then the centroid is "<centroid>" with distance "<distance>"
And I create a local centroid for "<data_input>"
Then the local centroid is "<centroid>" with distance "<distance>"

Examples:
| data | time_1 | time_2 | time_3 | options | data_input | centroid | distance |
| ../data/iris.csv | 20 | 20 | 30 | {"summary_fields": ["sepal width"]} |{"petal length": 1, "petal width": 1, "sepal length": 1, "species": "Iris-setosa"} | Cluster 6 | 0.713698082167 |

Scenario: Successfully comparing predictions with proportional missing strategy for missing_splits models:
Given I create a data source uploading a "<data>" file
And I wait until the source is ready less than <time_1> secs
16 changes: 16 additions & 0 deletions tests/features/create_cluster-steps.py
@@ -49,6 +49,22 @@ def i_create_a_cluster_from_dataset_list(step):
    world.cluster = resource['object']
    world.clusters.append(resource['resource'])


@step(r'I create a cluster with options "(.*)"$')
def i_create_a_cluster_with_options(step, options):
    dataset = world.dataset.get('resource')
    options = json.loads(options)
    options.update({'seed': 'BigML',
                    'cluster_seed': 'BigML',
                    'k': 8})
    resource = world.api.create_cluster(
        dataset, options)
    world.status = resource['code']
    assert world.status == HTTP_CREATED
    world.location = resource['location']
    world.cluster = resource['object']
    world.clusters.append(resource['resource'])

@step(r'I wait until the cluster status code is either (\d) or (-\d) less than (\d+)')
def wait_until_cluster_status_code_is(step, code1, code2, secs):
    start = datetime.utcnow()
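
A note on the new `i_create_a_cluster_with_options` step: it parses the scenario's `<options>` cell as JSON and then overwrites `seed`, `cluster_seed` and `k` with fixed values, so those three keys cannot be changed from the feature file. The merge can be sketched on its own as below. The names `FIXED_OPTIONS` and `build_cluster_options` are hypothetical and are not part of this commit.

```python
import json

# Values the step always applies, copied from the step definition above.
FIXED_OPTIONS = {"seed": "BigML", "cluster_seed": "BigML", "k": 8}


def build_cluster_options(options_json):
    """Parse the options cell and apply the fixed values on top of it."""
    options = json.loads(options_json)
    options.update(FIXED_OPTIONS)  # fixed values win over scenario values
    return options


merged = build_cluster_options('{"summary_fields": ["sepal width"]}')
overridden = build_cluster_options('{"k": 3}')
```

Fixing the seeds and `k` is presumably what keeps the expected centroid and distance in the Examples table reproducible across runs.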