Quora Topic Modelling using Latent Dirichlet Allocation (LDA) and NMF

In this project, I use the Quora questions dataset to fit a Latent Dirichlet Allocation (LDA) model and assign each question to one of 10 topics learned without supervision.

Data

The data consists of 404,289 Quora questions. The following is a snapshot of the data:

Method

1) First, we create a TF-IDF matrix from the given questions.
2) Second, we use LatentDirichletAllocation from sklearn.decomposition to fit an LDA model, which gives each topic a probability distribution over the words in our vocabulary (the unique words taken from all the questions in our dataset). The highest-probability words characterize each topic.
3) Finally, for each question we select the topic that has the highest probability for that question.
We follow a similar approach for Non-Negative Matrix Factorization (NMF).
We factorize the TF-IDF matrix (questions vs. words) into two non-negative matrices:
1) questions vs. topics and 2) topics vs. words

Result

#0 --> Technical/Books/Movies related questions
#1 --> Looks related questions
#2 --> QnA related questions
#3 --> Social Media related questions
#4 --> Life related questions
#5 --> People/Nationality related questions
#6 --> Language/Programming related questions
#7 --> Politics related questions
#8 --> Finance related questions
#9 --> Daily time related questions

Aishwarya4823/Quora-Topic-Modelling-LDA-and-NMF
