diff --git a/_episodes/01-working-with-openrefine.md b/_episodes/01-working-with-openrefine.md
index 558cbcdf..d00d9d6d 100644
--- a/_episodes/01-working-with-openrefine.md
+++ b/_episodes/01-working-with-openrefine.md
@@ -91,11 +91,7 @@ and how many times that value occurs in the column.
 
 ## Cluster
 
-In OpenRefine, clustering means "finding groups of different values that might be alternative representations of the same thing". For example, the two strings "New York" and "new york" are very likely to refer to the same concept and just have capitalization differences. Likewise, "Gödel" and "Godel" probably refer to the same person. Clustering is a very powerful tool for cleaning datasets which contain misspelled or mistyped entries.
-OpenRefine has several clustering algorithms built in. Experiment with them, and learn more about these algorithms and how they work.
-
-In OpenRefine, clustering refers to the operation of "finding groups of different values that might be alternative representations of the same thing". For example, the two strings "New York" and "new york" are very likely to refer to the same concept and just have capitalization differences. Likewise, "Gödel" and "Godel" probably refer to the same person.
-
+In OpenRefine, clustering means "finding groups of different values that might be alternative representations of the same thing". For example, the two strings "New York" and "new york" are very likely to refer to the same concept and just have capitalization differences. Likewise, "Gödel" and "Godel" probably refer to the same person. Clustering is a very powerful tool for cleaning datasets which contain misspelled or mistyped entries. OpenRefine has several clustering algorithms built in. Experiment with them, and learn more about these algorithms and how they work.
 > - In the scientificName Text Facet we created in the step above, click the _Cluster_ button.
 > - In the resulting pop-up window, you can change the Method and the Keying Function.
Try different combinations to