Clustering_News_articles_on-facebook

This repository contains a project on clustering news articles and headlines that are shared on Facebook.

Dataset- https://drive.google.com/file/d/1NbB053Q4MulTlzxINrlLe9VyhD9GzF0J/view?usp=sharing

YouTube walkthrough- link to be updated

Dataset snapshot

Data distribution among topics-

Text cleaning procedure

1. Dropped rows with missing values (dropna)
2. Removed stopwords and words with length less than 2
3. Removed numerical text
4. Lemmatized words
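The cleaning steps can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `STOPWORDS` set is a tiny placeholder, the function names are hypothetical, and the `lemmatize` argument defaults to the identity so that a real lemmatizer (for example NLTK's `WordNetLemmatizer().lemmatize`) can be passed in.

```python
import re

# Tiny illustrative stopword list (assumption); a real run would use a full
# list such as the English stopwords shipped with NLTK or scikit-learn.
STOPWORDS = {"the", "a", "an", "is", "in", "on", "of", "and", "to", "for", "are", "were"}


def clean_text(text, lemmatize=lambda word: word):
    """Lowercase, drop numeric tokens, stopwords and words shorter than 2 characters, then lemmatize."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = []
    for tok in tokens:
        if any(ch.isdigit() for ch in tok):  # remove numerical text
            continue
        if len(tok) < 2 or tok in STOPWORDS:  # remove short words and stopwords
            continue
        kept.append(lemmatize(tok))
    return " ".join(kept)


def clean_corpus(texts, lemmatize=lambda word: word):
    """Skip missing entries (the dropna step) and clean the remaining texts."""
    return [clean_text(t, lemmatize) for t in texts if isinstance(t, str) and t.strip()]


docs = ["The 3 markets are rising in 2016", None, "A new phone is on sale"]
print(clean_corpus(docs))
```

With a pandas DataFrame, the first step would typically be `df.dropna()` on the text column before applying `clean_text` row by row.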

Dataset snapshot after cleaning

News articles were clustered using three vectorisation techniques and two clustering algorithms.

To find the optimum number of clusters, the elbow curve method was used.
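As a rough sketch of the elbow method, the snippet below fits K-means for a range of cluster counts and records the inertia (within-cluster sum of squares); the "elbow" is the point where adding clusters stops reducing inertia sharply. The synthetic blobs, the range of k, and the helper name `elbow_inertias` are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values, seed=0):
    """Fit K-means for each k and return the inertia of each fitted model."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias.append(model.inertia_)
    return inertias


# Synthetic stand-in for document vectors: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

k_values = list(range(1, 7))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `k_values` gives the elbow curve; on this toy data the curve flattens after k = 3.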

For dimensionality reduction we used t-SNE.
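A minimal t-SNE sketch with scikit-learn is shown here for orientation. The random matrix stands in for real document vectors, and the perplexity and other settings are assumptions, not the values used in this project.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for 60 document vectors of dimension 50.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # (60, 2)
```

The two columns of `X_2d` can then be used as x/y coordinates in a scatter plot coloured by cluster label.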

To convert word2vec word vectors into a document vector, a new method, MIN-MAX word vector, was employed.
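The README does not spell out the MIN-MAX method. One common reading, assumed here, is to take the element-wise minimum and maximum over all word vectors in a document and concatenate them, which doubles the vector dimension. The function name and toy vectors below are hypothetical.

```python
import numpy as np


def min_max_doc_vector(tokens, word_vectors, dim):
    """Concatenate the element-wise min and max of a document's word vectors."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        # No known words: fall back to a zero vector of the same size.
        return np.zeros(2 * dim)
    mat = np.vstack(vecs)
    return np.concatenate([mat.min(axis=0), mat.max(axis=0)])


# Toy 3-dimensional "embeddings" (assumption, for illustration only).
toy_vectors = {
    "stock": np.array([0.2, -0.1, 0.5]),
    "market": np.array([0.4, 0.3, -0.2]),
    "rally": np.array([-0.3, 0.6, 0.1]),
}

doc_vec = min_max_doc_vector(["stock", "market", "rally", "unknownword"], toy_vectors, dim=3)
print(doc_vec)  # first 3 values are the minima, last 3 the maxima
```

Because the function only relies on `in` and `[]` lookups, a Gensim `KeyedVectors` object could be passed in place of the dictionary.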

Vectorisation

TF-IDF - parameters: n-grams of length 1 to 3, min_df 0.15, max_features 10000
Word2Vec - Google word2vec loaded with Gensim
Doc2Vec
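The TF-IDF settings listed above map onto scikit-learn's `TfidfVectorizer` roughly as follows. This sketch assumes "1,3 ngrams" means `ngram_range=(1, 3)`; the toy documents are placeholders for the cleaned articles.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder cleaned documents (assumption).
docs = [
    "stock market rally continues",
    "stock market falls sharply",
    "new phone launch event",
    "phone maker reports profit",
    "team wins football final",
    "football team signs striker",
]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 3),   # unigrams, bigrams and trigrams
    min_df=0.15,          # float: keep terms found in at least 15% of documents
    max_features=10000,   # cap the vocabulary size
)
X_tfidf = vectorizer.fit_transform(docs)

print(X_tfidf.shape)
```

Note that `min_df` given as a float is a proportion of documents, so on a large corpus 0.15 keeps only fairly common terms.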

Clustering algorithms
1. K-means
2. Agglomerative
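Both algorithms are available in scikit-learn and can be run on the same feature matrix, as sketched below. The synthetic data, the choice of three clusters, and Ward linkage for the agglomerative model are assumptions; the README does not state which linkage was used.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic stand-in for document vectors: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

n_clusters = 3
kmeans_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(X)

# Cluster sizes for each algorithm.
print(np.bincount(kmeans_labels))
print(np.bincount(agglo_labels))
```

`AgglomerativeClustering` expects a dense array, so a sparse TF-IDF matrix would need `.toarray()` before fitting.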

Below are the clustering results

TFIDF KMEANS

TFIDF Agglomerative

Doc2vec KMEANS

Doc2vec Agglomerative

Word2vec KMEANS

Word2vec Agglomerative

Extra: K-means clustering on TF-IDF vectors, with MDS for dimensionality reduction.

Below are the cluster statistics

Observations:

Across the 6 combinations of TF-IDF, Word2Vec and Doc2Vec with the two clustering algorithms, the results are quite different.

Doc2Vec performed well with neither K-means nor agglomerative clustering. Both failed to cluster topics.

TF-IDF performed better than Doc2Vec but failed to cluster topics in 1-2 categories with both K-means and agglomerative clustering.

Word2Vec performed the best, ahead of both TF-IDF and Doc2Vec. Compared to K-means, agglomerative clustering performed well and clustered topics appropriately on the sample.

Word2Vec with K-means performed very well.
