# Topic Modeling Assessment Project

Welcome to your Topic Modeling Assessment! For this project you will work with a dataset of over 400,000 Quora questions that have no labeled category, and attempt to find 20 categories to assign these questions to. The .csv file of these questions can be found in the Topic-Modeling folder.

Remember you can always check the solutions notebook and video lecture for any questions.

#### Task: Import pandas and read in the quora_questions.csv file.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('quora_questions.csv')

In [3]:
df.head()

Unnamed: 0,Question
0,What is the step by step guide to invest in sh...
1,What is the story of Kohinoor (Koh-i-Noor) Dia...
2,How can I increase the speed of my internet co...
3,Why am I mentally very lonely? How can I solve...
4,"Which one dissolve in water quikly sugar, salt..."


# Preprocessing

#### Task: Use TF-IDF Vectorization to create a vectorized document term matrix. You may want to explore the max_df and min_df parameters.

In [4]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [5]:
tfidf = TfidfVectorizer(max_df=0.95, min_df=2, stop_words='english')

In [6]:
dtm = tfidf.fit_transform(df['Question']) # dtm = document term matrix

In [7]:
dtm

<404289x38669 sparse matrix of type '<class 'numpy.float64'>'
	with 2002912 stored elements in Compressed Sparse Row format>
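To get a feel for what `max_df` and `min_df` do, the sketch below fits two vectorizers on a tiny made-up corpus (the `toy_docs` list is hypothetical and not taken from the Quora data). It only illustrates the filtering rules; the thresholds here are not tuned for the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, not taken from the Quora data.
toy_docs = [
    "how do I learn python",
    "how do I learn java",
    "how do I cook rice",
    "how do I cook pasta quickly",
]

# No document-frequency filtering: every token of two or more characters is kept.
unfiltered = TfidfVectorizer()
unfiltered.fit(toy_docs)

# min_df=2 drops terms that appear in fewer than 2 documents;
# max_df=0.75 drops terms that appear in more than 75% of documents.
filtered = TfidfVectorizer(max_df=0.75, min_df=2)
filtered.fit(toy_docs)

print(sorted(unfiltered.vocabulary_))
print(sorted(filtered.vocabulary_))
```

In this toy corpus "how" and "do" occur in every document and are removed by `max_df`, while words seen only once are removed by `min_df`, leaving just "cook" and "learn".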

# Non-negative Matrix Factorization

#### Task: Using Scikit-Learn, create an instance of NMF with 20 expected components. (Use random_state=42.)

In [8]:
from sklearn.decomposition import NMF

In [9]:
nmf_model = NMF(n_components=20,random_state=42)

In [10]:
nmf_model.fit(dtm)
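As a rough illustration of what fitting produces, the sketch below runs NMF on a small random non-negative matrix (the matrix `X` and the name `toy_nmf` are made up for this example). It shows the shapes of the two factors: one row of topic weights per document, and one row of term weights per topic.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative "document-term" matrix: 6 documents, 4 terms.
rng = np.random.default_rng(0)
X = rng.random((6, 4))

toy_nmf = NMF(n_components=2, random_state=42, max_iter=500)
W = toy_nmf.fit_transform(X)  # document-topic weights, shape (6, 2)
H = toy_nmf.components_       # topic-term weights, shape (2, 4)

print(W.shape, H.shape)
print(bool((W >= 0).all() and (H >= 0).all()))
```

In the notebook, `nmf_model.components_` plays the role of `H`, with 20 rows and one column per vocabulary term.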



#### Task: Print out the top 15 most common words for each of the 20 topics.

In [12]:
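feature_names = tfidf.get_feature_names_out()  # build the vocabulary array once, not once per word
for index, topic in enumerate(nmf_model.components_):
    # Headings are 1-based; the Topic labels assigned later are 0-based (TOPIC #1 is label 0).
    print(f'TOP 15 MOST COMMON WORDS FOR TOPIC #{index+1}')
    # argsort is ascending, so the last 15 indices are the highest-weighted words (strongest last).
    print([feature_names[i] for i in topic.argsort()[-15:]])
    print('\n')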

TOP 15 MOST COMMON WORDS FOR TOPIC #1
['thing', 'read', 'place', 'visit', 'places', 'phone', 'buy', 'laptop', 'movie', 'ways', '2016', 'books', 'book', 'movies', 'best']


TOP 15 MOST COMMON WORDS FOR TOPIC #2
['majors', 'recruit', 'sex', 'looking', 'differ', 'use', 'exist', 'really', 'compare', 'cost', 'long', 'feel', 'work', 'mean', 'does']


TOP 15 MOST COMMON WORDS FOR TOPIC #3
['add', 'answered', 'needing', 'post', 'easily', 'improvement', 'delete', 'asked', 'google', 'answers', 'answer', 'ask', 'question', 'questions', 'quora']


TOP 15 MOST COMMON WORDS FOR TOPIC #4
['using', 'website', 'investment', 'friends', 'black', 'internet', 'free', 'home', 'easy', 'youtube', 'ways', 'earn', 'online', 'make', 'money']


TOP 15 MOST COMMON WORDS FOR TOPIC #5
['balance', 'earth', 'day', 'death', 'changed', 'live', 'want', 'change', 'moment', 'real', 'important', 'thing', 'meaning', 'purpose', 'life']


TOP 15 MOST COMMON WORDS FOR TOPIC #6
['reservation', 'engineering', 'minister', 'preside
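The loop above relies on `argsort` returning indices in ascending order, which is why the strongest word of each topic (for example 'best' or 'quora') appears last in its list. A minimal NumPy sketch of that slicing, using a made-up vocabulary and made-up weights:

```python
import numpy as np

# Hypothetical vocabulary and one topic's term weights (not from the fitted model).
vocab = np.array(["money", "life", "best", "learn", "india"])
topic_weights = np.array([0.10, 0.70, 0.05, 0.40, 0.20])

# argsort sorts ascending, so the last k indices belong to the k largest weights.
top3_ascending = vocab[topic_weights.argsort()[-3:]]
# Reverse the slice to list the strongest term first.
top3_descending = vocab[topic_weights.argsort()[-3:][::-1]]

print(top3_ascending.tolist())
print(top3_descending.tolist())
```

Reversing the slice is optional; it only changes the display order, not which words are selected.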

#### Task: Add a new column to the original Quora DataFrame that assigns each question to one of the 20 topic categories.

In [13]:
topic_results = nmf_model.transform(dtm)

In [14]:
topic_results

array([[2.74927015e-04, 5.88014577e-05, 6.17412189e-06, ...,
        6.97429434e-04, 2.13458466e-04, 0.00000000e+00],
       [1.95705655e-04, 8.80743307e-05, 0.00000000e+00, ...,
        0.00000000e+00, 5.51003101e-05, 1.05546944e-05],
       [1.77372166e-04, 6.43938862e-04, 1.60463804e-03, ...,
        3.02446249e-03, 1.05890709e-03, 1.23898603e-03],
       ...,
       [0.00000000e+00, 1.61570029e-05, 5.23565129e-06, ...,
        0.00000000e+00, 2.76224751e-06, 0.00000000e+00],
       [5.34282407e-04, 1.01028959e-03, 0.00000000e+00, ...,
        1.28754707e-04, 7.76842889e-04, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 1.25204924e-04]])

In [15]:
topic_results.argmax(axis=1) # Index (0-based) of the highest-weighted topic per row

array([ 5, 16, 17, ..., 11, 11,  9], dtype=int64)
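Two details of `argmax(axis=1)` are worth keeping in mind. It returns 0-based column indices, while the headings printed earlier use `index+1`, so label 0 corresponds to "TOPIC #1". It also returns the first index on ties, so a row whose weights are all zero would be labeled 0. The sketch below uses a small made-up weight matrix to show both points.

```python
import numpy as np

# Hypothetical document-topic weights: 3 documents, 4 topics.
weights = np.array([
    [0.0, 0.2, 0.9, 0.1],
    [0.5, 0.1, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0],  # no topic has any weight; argmax falls back to index 0
])

labels = weights.argmax(axis=1)  # 0-based topic index per row
printed_numbers = labels + 1     # matches the "TOPIC #n" headings printed earlier

print(labels.tolist())
print(printed_numbers.tolist())
```

Whether any all-zero rows exist in the real `topic_results` is not shown in this notebook; checking `topic_results.sum(axis=1) == 0` would be one way to find out.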

In [16]:
df['Topic'] = topic_results.argmax(axis=1)
df.head(10)

Unnamed: 0,Question,Topic
0,What is the step by step guide to invest in sh...,5
1,What is the story of Kohinoor (Koh-i-Noor) Dia...,16
2,How can I increase the speed of my internet co...,17
3,Why am I mentally very lonely? How can I solve...,11
4,"Which one dissolve in water quikly sugar, salt...",14
5,Astrology: I am a Capricorn Sun Cap moon and c...,1
6,Should I buy tiago?,0
7,How can I be a good geologist?,10
8,When do you use シ instead of し?,19
9,Motorola (company): Can I hack my Charter Moto...,17


<b style="color:yellow">Saving the dataframe with topics</b>

In [17]:
df.to_csv('quora_questions_with_topics.csv')
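A side note on `to_csv`: by default pandas writes the index as an extra, unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below shows the difference on a hypothetical two-row frame, writing to a string instead of a file. Whether to pass `index=False` here is a judgment call that depends on whether the row numbers are needed later.

```python
import io
import pandas as pd

# Hypothetical miniature version of the labeled DataFrame.
toy = pd.DataFrame({"Question": ["q one", "q two"], "Topic": [3, 0]})

# Default: the index is written as an extra, unnamed first column.
with_index = toy.to_csv()
# index=False writes only the named columns.
without_index = toy.to_csv(index=False)

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the default output back turns the unnamed column into "Unnamed: 0".
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())
```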

<b style="color:yellow">Getting questions per topic</b>

In [21]:
# Print sample questions per topic
pd.set_option('display.max_colwidth', None)  # or 199
for i in range(20):
    df_topic = df[df['Topic'] == i]
    display(df_topic.head(10))

Unnamed: 0,Question,Topic
6,Should I buy tiago?,0
19,Which is the best digital marketing institution in banglore?,0
34,What is the best travel website in spain?,0
53,What is the best/most memorable thing you've ever eaten and why?,0
60,How do I download content from a kickass torrent without registration?,0
66,What is the best book ever made?,0
88,Which is the best gaming laptop under 60k INR?,0
90,What is the best reference book for physics class 11th?,0
98,What are the best associate product manager (APM) programs that someone in their early 20s can join to learn product management and have a rewarding career in the company?,0
103,Which is the best fiction novel of 2016?,0


Unnamed: 0,Question,Topic
5,Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me?,1
14,"What are the laws to change your status from a student visa to a green card in the US, how do they compare to the immigration laws in Canada?",1
16,What does manipulation mean?,1
27,Does society place too much importance on sports?,1
33,"Does the United States government still blacklist (employment, etc.) some United States citizens because their political views?",1
44,What universities does Rexnord recruit new grads from? What majors are they looking for?,1
47,"What are the stages of breaking up between couple? I mean, what happens after the breaking up emotionally whether its a male or female?",1
69,At what cost does so much privacy as in Germany come? What else is lost to gain so much privacy?,1
95,How does 3D printing work?,1
114,What does taking advantage of someone mean?,1


Unnamed: 0,Question,Topic
10,Method to find separation of slits using fresnel biprism?,2
18,Why are so many Quora users posting questions that are readily answered on Google?,2
22,What are the questions should not ask on Quora?,2
109,How Google helps in spam ranking adjustment of the search results?,2
120,Why my question was marked as needing imrovement?,2
156,"If I do not monetize YouTube videos & upload copyright content, then are there chances that Google may block my account?",2
158,What does the Quora website look like to members of Quora moderation?,2
159,Why nobody answer my questions in Quora?,2
182,What if I hired two private eyes and ordered them to follow each other?,2
220,How do I earn from Quora?,2


Unnamed: 0,Question,Topic
11,How do I read and find my YouTube comments?,3
12,What can make Physics easy to learn?,3
28,What is best way to make money online?,3
42,"Can I make 50,000 a month by day trading?",3
48,What are some examples of products that can be make from crude oil?,3
49,How do I make friends.,3
52,Nd she is always sad?,3
78,How can I make money through the Internet?,3
164,How can I transfer money from Skrill to a PayPal account?,3
224,Will there be another billion dollar lottery Jackpot?,3


Unnamed: 0,Question,Topic
79,What is purpose of life?,4
91,"National Institute of Technology, Kurukshetra: How is the social life at NITK, Surathkal?",4
115,Why do we cry when we are happy and when we are sad?,4
139,What is the ideal life after retirement?,4
169,How do you make life suit you and stop life from abusing you mentally and emotionally?,4
177,"Between Robert De Niro and Al Pacino, who is more successful?",4
188,Is it possible to pursue many different things in life?,4
209,Why is creativity important?,4
299,What the meaning of this all life?,4
353,What are the most important books ever written?,4


Unnamed: 0,Question,Topic
0,What is the step by step guide to invest in share market in india?,5
54,How GST affects the CAs and tax officers?,5
68,What is your review of Performance Testing?,5
82,"If someone wants to open a commercial FM radio station in any city of India, how much does it cost and what is the procedure?",5
84,What are some of the high salary income jobs in the field of biotechnology?,5
89,What is your review of The Next Warrior: Proving Grounds - Part 9?,5
100,Will there really be any war between India and Pakistan over the Uri attack? What will be its effects?,5
129,How do I access Torbox in India?,5
140,What is our stance against Pakistan?,5
150,How many years Britain ruled India?,5


Unnamed: 0,Question,Topic
51,"Will a Blu Ray play on a regular DVD player? If so, how?",6
55,How difficult is it get into RSI?,6
59,What are the best ways to learn French?,6
65,What is Java programming? How To Learn Java Programming Language ?,6
124,What is the alternative to machine learning?,6
219,How can I learn computer security?,6
221,What is my puk code?,6
242,"In the play ""A Raisin in the Sun"", why do Walter ad Beneatha argue?",6
257,What are the best YouTube channels to learn medicine?,6
319,Why should I learn web design?,6


Unnamed: 0,Question,Topic
15,What would a Trump presidency mean for current international master’s students on an F1 visa?,7
37,"When a girlfriend asks her boyfriend ""Why did you choose me? What makes you want to be with me?"", what should one reply to her?",7
46,How did Darth Vader fought Darth Maul in Star Wars Legends?,7
64,Where can I find a European family office database?,7
97,Why did harry become a horcrux?,7
101,Did Ronald Reagan have a mannerism in his speech?,7
117,"Did you find the ending of the novel ""1984"" depressing?",7
142,Which one is better polo diesel or grand i10 petrol?,7
155,Could Snoke secretly be Darth Maul?,7
162,What are stereotypes about the United Kingdom?,7


Unnamed: 0,Question,Topic
32,What Game of Thrones villain would be the most likely to give you mercy?,8
93,What causes a nightmare?,8
102,What were the war strategies of the Union and the Confederates during the Civil War?,8
111,Is USA the most powerful country of the world?,8
127,Can excessive amounts of Vitamin C cause me to have a miscarriage?,8
130,What are some yakshini mantras?,8
185,"What was the significance of the battle of Somme, and how did this battle compare and contrast to the Battle of Rostov?",8
229,How can we make the world a better place to live in for the future generations?,8
258,Did Swami Vivekananda ever eat non-veg or egg during his journey around the world?,8
291,"Is there an end to the universe, and if not, is the universe infinite?",8


Unnamed: 0,Question,Topic
13,What was your first sexual experience like?,9
17,Why do girls want to be friends with the guy they reject?,9
20,Why do rockets look white?,9
30,What's one thing you would like to do better?,9
71,What is a narcissistic personality disorder?,9
80,When will the BJP government strip all the Muslims and the Christians of the Indian citizenship and put them on boats like the Rohingya's of Burma?,9
96,What was it like to attend Caltech with Jeremy Ehrhardt?,9
116,Why do some girls like to stick their their tongues out when taking pictures?,9
128,What is it like to work in Asahi India Glass? What will be the pay scale after one or two years?,9
133,Is it normal for older men to be attracted to young women?,9


Unnamed: 0,Question,Topic
7,How can I be a good geologist?,10
25,What are some tips on making it through the job interview process at Medicines?,10
29,How should I prepare for CA final law?,10
43,Is being a good kid and not being a rebel worth it in the long run?,10
50,Is Career Launcher good for RBI Grade B preparation?,10
57,What are some good rap songs to dance to?,10
141,What are good websites for escorts?,10
186,What is the most creative college admissions essay you've read?,10
190,Which business is good start up in Hyderabad?,10
253,What are the qualities of a good leader?,10


Unnamed: 0,Question,Topic
3,Why am I mentally very lonely? How can I solve it?,11
83,Why do Swiss despise Asians?,11
86,"What were the major effects of the cambodia earthquake, and how do these effects compare to the Kamchatca earthquakes in 1952?",11
105,Will the recent demonetisation results in higher GDP? If so how much?,11
165,Which is the best earphone with deep bass under 1000?,11
199,What are the effects of demonitization of 500 and 1000 rupees notes on real estate sector?,11
208,"Do inkjet printers use color ink when printing black and white documents? If so, why?",11
260,"What exactly is the ""Common Core Initiative/Standards"" and what are the pros and cons?",11
295,What will be the effect of banning 500 and 1000 notes on stock markets in India?,11
321,How will Indian GDP be affected from banning 500 and 1000 rupees notes?,11


Unnamed: 0,Question,Topic
62,How is the new Harry Potter book 'Harry Potter and the Cursed Child'?,12
67,Can we ever store energy produced in lightning?,12
75,If I fire a bullet backward from an aircraft going faster than the bullet; will the bullet be going backwards?,12
99,Why is the number for Skype at 1-855-425-3768 always busy?,12
145,Does Fab currently offer new employees stock options or RSUs?,12
160,What is the funniest joke you know?,12
166,Are government employees eligible to Sukanya Samrudi Yojana?,12
187,"What would happen if you cover one of your eyes with an eye patch for one year, then take the patch off?",12
192,Who was the wife of Lord Krishna?,12
226,How do I create a new shell in a new terminal using C programming (Linux terminal)?,12


Unnamed: 0,Question,Topic
36,I'm a 19-year-old. How can I improve my skills or what should I do to become an entrepreneur in the next few years?,13
72,How I can speak English fluently?,13
92,What are some of the best romantic movies in English?,13
110,Where can I watch gonulcelen with english subtitles?,13
167,Which is correct - 'Looking forward to speak with you' or 'Look forward to speak with you'?,13
198,How can I become more fluent in Chinese?,13
432,How I start prepare for UGC net English literature latest syllabus?,13
454,"What is an alternative for the word ""is""?",13
521,What words rank the highest on Dictionary.com's difficulty index?,13
568,How do I start writing again?,13


Unnamed: 0,Question,Topic
4,"Which one dissolve in water quikly sugar, salt, methane and carbon di oxide?",14
23,How much is 30 kV in HP?,14
40,Why do Slavs squat?,14
147,In how many ways can we distribute 10 identical looking pencils to 4 students so that each student gets at least one pencil?,14
153,At what age should someone lose their virginity?,14
236,How do you potty train a 4 months Pitbull?,14
241,"There are 8 balls. 7 of them weigh the same. 1 of them has a different weight, (you don't know if it's heavier or lighter). How do you find the odd ball with 2 weighs?",14
247,Why can flash run so fast?,14
264,Why do Slavs squat?,14
286,How successful was odd even plan?,14


Unnamed: 0,Question,Topic
24,What does it mean that every time I look at the clock the numbers are the same?,15
31,What are some special cares for someone with a nose that gets stuffy during the night?,15
39,What is the stall speed and AOA of an f-14 with wings fully swept back?,15
41,When can I expect my Cognizant confirmation mail?,15
74,Who is the richest gambler of all time and how can I reach his level?,15
81,What is the right etiquette for wishing a Jehovah Witness happy birthday?,15
106,Have you ever heard of travel hacking?,15
119,Should the toothbrush be wet or dry before applying the toothpaste?,15
123,What is the greatest mystery in the universe?,15
149,Is 7 days too late for rabies vaccine after a possible non-bite exposure?,15


Unnamed: 0,Question,Topic
1,What is the story of Kohinoor (Koh-i-Noor) Diamond?,16
56,Who is israil friend?,16
107,What's the difference between love and pity?,16
181,How can I stop being addicted to love?,16
243,What are the signs of an ultra smart person playing dumb?,16
268,How do I love my body as a guy?,16
272,My ex-girlfriend is suffering from malaria. I have a deep urge of visiting her but I have my placements on. I have no idea how to be with her. What should I do?,16
276,How far would you go for love? Should I wait for the one I love ir move on?,16
282,How do I get over a friend with whom I haven't talked in 3 years but still miss him?,16
361,Are Canada Geese really Canadian?,16


Unnamed: 0,Question,Topic
2,How can I increase the speed of my internet connection while using a VPN?,17
9,Motorola (company): Can I hack my Charter Motorolla DCX3400?,17
38,How do we prepare for UPSC?,17
45,What is the quickest way to increase Instagram followers?,17
58,I was suddenly logged off Gmail. I can't remember my Gmail password and just realized the recovery email is no longer alive. What can I do?,17
76,How do I prevent breast cancer?,17
77,How do I log out of my Gmail account on my friend's phone?,17
85,How can I increase my height after 21 also?,17
104,Can I recover my email if I forgot the password?,17
112,How do you obtain an instant ulcer pain relief?,17


Unnamed: 0,Question,Topic
26,What is web application?,18
61,Is it normal to have a dark ring around the iris of my eye?,18
70,What are the types of immunity?,18
73,How helpful is QuickBooks' auto data recovery support phone number to recover your corrupted data files?,18
87,What is the difference between sincerity and fairness?,18
94,What is abstract expressionism in painting?,18
108,How competitive is the hiring process at Republic Bank?,18
121,What is the difference between a neutral state and a buffer state?,18
122,What mineral holds the highest electrical charge?,18
134,What is the strongest structure or strongest shape under compression?,18


Unnamed: 0,Question,Topic
8,When do you use シ instead of し?,19
21,What's causing someone to be jealous?,19
35,Why do some people think Obama will try to take their guns away?,19
63,Why do I always get depressed?,19
113,What do you think China food?,19
118,What are some mind-blowing computer tools that exist that most people don't know about?,19
131,Was six party talks successful?,19
135,Who are the Rohingya Muslims?,19
151,How can I stop being afraid of working?,19
163,How do I use Twitter as a business source?,19
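As a complement to looping over the topics, the sketch below shows two pandas idioms that may be handy for inspecting the labels: counting questions per topic and taking the first few rows of every topic in one call. The frame `toy` is a made-up stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for the labeled DataFrame.
toy = pd.DataFrame({
    "Question": ["a", "b", "c", "d", "e"],
    "Topic": [1, 0, 1, 1, 0],
})

# Number of questions assigned to each topic, ordered by topic index.
counts = toy["Topic"].value_counts().sort_index()

# First 2 questions of every topic in one call, grouped together by a stable sort.
samples = toy.groupby("Topic").head(2).sort_values("Topic", kind="stable")

print(counts.to_dict())
print(samples["Question"].tolist())
```

On the real data, very uneven counts can be a hint that some topics act as catch-all buckets rather than coherent themes.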
