Clustering

Knn from scratch

Task:

Given a dataset of documents with content from 5 different fields (namely business, entertainment, politics, sport, and tech), cluster them using any clustering algorithm of your choice.
Do not use any libraries for this part. You are expected to code your clustering algorithm from scratch.
For feature extraction, you can use the vectorizers provided by sklearn or pre-trained embeddings. (A code snippet for using these embeddings was provided in the previous question.)
You might have to perform some pre-processing on the raw documents before you apply your algorithm.
We have provided ground-truth tags for the documents. Report the accuracy score on these documents.
We will test your score on the documents for which the tags have not been provided.
In the dataset, the number after the ’ ’ symbol in the file name denotes the cluster label.
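
The task leaves the choice of algorithm open. As one possible starting point, here is a minimal sketch of k-means written in plain Python, together with an accuracy helper. It assumes the documents have already been turned into equal-length numeric vectors by a vectorizer or embedding. The names `kmeans` and `cluster_accuracy` are hypothetical, and mapping each cluster to its majority ground-truth tag before scoring is an assumption, since the task does not say how cluster ids should be matched to tags.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` into `k` groups; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return assignments, centroids


def cluster_accuracy(assignments, true_labels):
    """Accuracy after mapping each cluster to its most common true label."""
    counts = {}
    for cluster, label in zip(assignments, true_labels):
        per_cluster = counts.setdefault(cluster, {})
        per_cluster[label] = per_cluster.get(label, 0) + 1
    correct = sum(max(per_cluster.values()) for per_cluster in counts.values())
    return correct / len(true_labels)
```

For this dataset one would call `kmeans(vectors, 5)` on the document vectors and pass the resulting assignments, along with the provided tags, to `cluster_accuracy`. Results depend on the random initialisation, so trying several seeds may be worthwhile.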
