remove redundant language about clustering
Addresses #66
ErinBecker committed Mar 28, 2017
1 parent 9137cbf commit e02235c
Showing 1 changed file with 1 addition and 5 deletions.
6 changes: 1 addition & 5 deletions episodes/01-working-with-openrefine.md
@@ -91,11 +91,7 @@ and how many times that value occurs in the column.

## Cluster

-In OpenRefine, clustering means "finding groups of different values that might be alternative representations of the same thing". For example, the two strings "New York" and "new york" are very likely to refer to the same concept and just have capitalization differences. Likewise, "Gödel" and "Godel" probably refer to the same person. Clustering is a very powerful tool for cleaning datasets which contain misspelled or mistyped entries.
-OpenRefine has several clustering algorithms built in. Experiment with them, and learn more about these algorithms and how they work.
-
-In OpenRefine, clustering refers to the operation of "finding groups of different values that might be alternative representations of the same thing". For example, the two strings "New York" and "new york" are very likely to refer to the same concept and just have capitalization differences. Likewise, "Gödel" and "Godel" probably refer to the same person.
-
+In OpenRefine, clustering means "finding groups of different values that might be alternative representations of the same thing". For example, the two strings "New York" and "new york" are very likely to refer to the same concept and just have capitalization differences. Likewise, "Gödel" and "Godel" probably refer to the same person. Clustering is a very powerful tool for cleaning datasets which contain misspelled or mistyped entries. OpenRefine has several clustering algorithms built in. Experiment with them, and learn more about these algorithms and how they work.

> - In the scientificName Text Facet we created in the step above, click the _Cluster_ button.
> - In the resulting pop-up window, you can change the Method and the Keying Function. Try different combinations to
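
The clustering idea described in the changed paragraph can be illustrated with a small key-collision sketch. This is not OpenRefine's implementation: the `fingerprint` and `cluster` functions below are hypothetical helpers, and the normalization steps (lowercasing, stripping accents and punctuation, sorting tokens) are a simplified assumption about how such a keying function might work. Values that reduce to the same key are grouped as likely variants of the same thing.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical, simplified key that ignores case, accents, punctuation and token order."""
    # Decompose accented characters, then drop the combining marks (ö -> o).
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    no_punct = unaccented.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, then deduplicate and sort the tokens.
    tokens = sorted(set(no_punct.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a key; keep only groups with more than one distinct value."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


names = ["New York", "new york", "Gödel", "Godel", "Boston"]
clusters = cluster(names)
print(clusters)
```

With the sample values from the lesson text, "New York" and "new york" share one key and "Gödel" and "Godel" share another, while "Boston" has no variant and is left out of the result.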
