- Implement the Bisecting K-Means algorithm.
- Deal with text data (news records) in document-term sparse matrix format.
- Design a proximity function for text data.
- Think about the Curse of Dimensionality.
- Think about the best metrics for evaluating clustering solutions.

The text data was clustered with implementations of both the K-Means and Bisecting K-Means algorithms, and the resulting clusters were evaluated with the silhouette metric. The news records were handled as a document-term sparse matrix, and the curse of dimensionality was mitigated by reducing the number of dimensions with singular value decomposition (SVD).
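
As an illustration of the approach summarized above, here is a minimal sketch of Bisecting K-Means over sparse document rows, using cosine similarity as the proximity function. It is not the project's actual code. The function names (`normalize`, `cosine`, `centroid`, `two_means`, `bisecting_kmeans`), the dict-based sparse row format, the rule of always splitting the largest cluster, and the toy `docs` data are all assumptions made for the example. The SVD and silhouette steps are left out.

```python
import math
import random


def normalize(row):
    """Scale a sparse row {term_id: weight} to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in row.values()))
    return {t: w / norm for t, w in row.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse rows (their dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter row
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(rows):
    """Unit-length mean direction of a group of sparse rows."""
    total = {}
    for row in rows:
        for t, w in row.items():
            total[t] = total.get(t, 0.0) + w
    return normalize(total)


def two_means(rows, rng, n_iter=20):
    """Split rows in two with K-Means (K=2) under cosine similarity."""
    centers = rng.sample(rows, 2)
    labels = None
    for _ in range(n_iter):
        new_labels = [int(cosine(r, centers[1]) > cosine(r, centers[0]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    # Cohesion: total similarity of each row to its own cluster center.
    cohesion = sum(cosine(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, cohesion


def bisecting_kmeans(docs, k, n_trials=5, seed=0):
    """Return one cluster label per document."""
    rng = random.Random(seed)
    rows = [normalize(d) for d in docs]
    clusters = [list(range(len(rows)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # assumption: always split the largest cluster
        if len(target) < 2:
            clusters.append(target)
            break
        members = [rows[i] for i in target]
        # Try several bisections and keep the most cohesive one.
        labels, _ = max((two_means(members, rng) for _ in range(n_trials)),
                        key=lambda trial: trial[1])
        halves = [[i for i, lab in zip(target, labels) if lab == side]
                  for side in (0, 1)]
        if not all(halves):
            clusters.append(target)  # degenerate split, e.g. identical rows
            break
        clusters.extend(halves)
    out = [0] * len(rows)
    for cluster_id, indices in enumerate(clusters):
        for i in indices:
            out[i] = cluster_id
    return out


# Toy data (hypothetical): two topics with disjoint vocabularies.
docs = [
    {0: 3, 1: 1}, {0: 3, 1: 2}, {0: 2, 1: 2, 2: 1},
    {3: 1, 4: 3}, {3: 2, 4: 3}, {3: 2, 4: 2, 5: 1},
]
labels = bisecting_kmeans(docs, k=2)
print(labels)
```

Normalizing every row to unit length means cosine similarity reduces to a sparse dot product, which keeps the proximity computation cheap for high-dimensional text data. Splitting the cluster with the lowest cohesion instead of the largest one is an equally common choice.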