This project clusters data based on the tokens produced by preprocessing with the NLTK library. A Word2Vec model generates the word embeddings, and several clustering algorithms are then applied to find suitable clusters.
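As a rough illustration of the embedding step, the sketch below averages the word vectors of each document's tokens into one feature vector that a clustering algorithm can consume. Averaging is an assumption, not necessarily what the notebook does; the helper names and toy vectors are hypothetical. With gensim 4, a model trained as `Word2Vec(sentences=token_lists, vector_size=100, min_count=1)` could be passed as `model.wv` with `model.vector_size` in place of the toy dictionary, assuming tokenization (done with NLTK in this project) has already produced the token lists.

```python
import numpy as np


def document_vector(tokens, wv, vector_size):
    """Average the embeddings of in-vocabulary tokens (hypothetical helper).

    `wv` is any word -> vector mapping; with gensim this could be `model.wv`.
    """
    vectors = [np.asarray(wv[t], dtype=float) for t in tokens if t in wv]
    if not vectors:
        # No known tokens: fall back to a zero vector so every document
        # still gets a row in the feature matrix.
        return np.zeros(vector_size)
    return np.mean(vectors, axis=0)


def build_feature_matrix(token_lists, wv, vector_size):
    """Stack one averaged vector per document into an (n_docs, dim) array."""
    return np.vstack([document_vector(toks, wv, vector_size) for toks in token_lists])


# Toy word vectors standing in for a trained Word2Vec model.
toy_wv = {
    "payment": [1.0, 0.0],
    "transfer": [0.8, 0.2],
    "salary": [0.0, 1.0],
}
docs = [["payment", "transfer"], ["salary"], ["unknown"]]
X = build_feature_matrix(docs, toy_wv, vector_size=2)
print(X)
```

Documents with no in-vocabulary tokens end up at the origin, so it may be worth filtering them out before clustering.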
The entire project is implemented in `transac-nar-new.ipynb`.
The dependencies required to run this project are listed in [requirements.txt](https://github.com/Asif-droid/Internship/blob/main/requirements.txt).
The following clustering algorithms are used:
- K-means
- Mini-batch K-means
- Bisecting K-means
- DBSCAN
- Hierarchical clustering
To run the project:
- Download or clone the repository.
- Open it on your local machine.
- Install the requirements: `pip install -r requirements.txt`
- Open the `test_script` file.
- Set the paths to the dataset and the trained model.
- Optionally, adjust the parameters for hierarchical clustering and DBSCAN (defaults: `mx_d=1.5` for hierarchical clustering; `eps=.55` and `min_samples=1` for DBSCAN).
- DBSCAN and hierarchical clustering do not need a pretrained model; they generate clusters directly from the given dataset.
- Run the file.
- For more details, see `transac-nar-new.ipynb`.
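The parameter step in the list above can be sketched with scikit-learn's `DBSCAN` and SciPy's `linkage`/`fcluster`, using the stated defaults. Both helpers fit directly on whatever matrix they receive, which is why no pretrained model is involved. Ward linkage, the Euclidean metric, the helper names, and renaming `mx_d` to `max_d` are assumptions for illustration; the notebook may make different choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import DBSCAN


def dbscan_labels(X, eps=0.55, min_samples=1):
    """Cluster rows of X with DBSCAN (Euclidean distance by default)."""
    # min_samples counts the point itself, so with min_samples=1 every point
    # is a core point and no row is labelled -1 (noise).
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)


def hierarchical_labels(X, max_d=1.5, method="ward"):
    """Agglomerative clustering cut at merge distance max_d (hypothetical helper)."""
    Z = linkage(X, method=method)  # full merge tree over all rows
    # Flat clusters whose members have cophenetic distance <= max_d.
    return fcluster(Z, t=max_d, criterion="distance")


# Three tight toy groups standing in for document vectors.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [3.0, 3.0], [3.1, 3.0], [3.0, 3.1],
    [6.0, 0.0], [6.1, 0.0], [6.0, 0.1],
])
db = dbscan_labels(X)
hc = hierarchical_labels(X)
print(db)  # one label per row; noise would be -1
print(hc)  # fcluster labels start at 1
```

Lowering `eps` or `max_d` tends to split the data into more, tighter clusters; raising them merges clusters together.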