Aishwarya4823/Quora-Topic-Modelling-LDA-and-NMF

Quora Topic Modelling using Latent Dirichlet Allocation(LDA) and NMF

In this project, I use the Quora questions dataset to fit Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) models, and assign each question to one of 10 topics learned without supervision.

Data

The data consists of 404,289 Quora questions. The following is a snapshot of the data:
Image1

Method

  1. First, we create a TF-IDF matrix from the given questions.
  2. Second, we use LatentDirichletAllocation from sklearn.decomposition to fit an LDA model. Each topic is described by its highest-probability words from our vocabulary (the unique words taken from all the questions in our dataset).
  3. Finally, for each question, we select the topic that has the highest probability for that question.
  4. We follow a similar approach for Non-Negative Matrix Factorization (NMF):
    We approximately factor the TF-IDF matrix (questions vs. words) into two non-negative matrices:
    1st) questions vs. topics and 2nd) topics vs. words

Result

Image2
#0 --> Technical/Books/Movies related questions
#1 --> Looks related questions
#2 --> QnA related questions
#3 --> Social Media related questions
#4 --> Life related questions
#5 --> People/Nationality related questions
#6 --> Language/Programming related questions
#7 --> Politics related questions
#8 --> Finance related questions
#9 --> Daily time related questions
