# Homework 2 (Due Thursday Dec 1, 6:29pm PST)

Please submit as a notebook in the format `HW2_FIRSTNAME_LASTNAME_USCID.ipynb` in a group chat to me and the TAs.

Your `USCID` is your student 10-digit ID.

### Part I.  Topic Modelling and Analysis (5pts)

Pick from **one** of the dataset options below:
* **Negative McDonalds Yelp reviews**: `datasets/mcdonalds-yelp-negative-reviews.csv`
* **[Top 5000 Udemy courses](https://www.kaggle.com/datasets/90eededa5561eee7f62c0e68ecdad14c2bdb58bc923834067025dee655a6083e?resource=download)** - a Kaggle dataset of the course descriptions of the top 5000 Udemy courses in 2022: `datasets/top5000_udemy.csv`

In your notebook, explore the data and perform topic modelling. You may use any vectorization or text preprocessing techniques we have discussed.

In order to earn full credit, you must:

* Show the **# of topics you tried, and explain why you ultimately decided on the final #**.
* Demonstrate **adequate text preprocessing (there are likely obvious stopwords / fuzzy matching / regex groupings that can be done to improve the final results)** - show what you tried.
* In 2-3 sentences: A **business analysis of these topics - what do they reveal as actionable next steps or insights for McDonalds or Udemy?** Please be specific in your recommendations/insights.
    - **Not specific**: *We recommend Amazon look into the quality of their toys, since the reviews show disatisfaction with the value of their product.*
    - **Specific**: *Amazon should explore more durable batteries/hardwares. For example, X% of reviews mention that the toys' batteries were broken or immediately died. This is part of a larger theme of components not being ready to use out the box, which often leads to disappointment on holiday occasions when children open up their gifts. See the following document snippets as examples:...*

#### 1. Loading Data 

In [112]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
reviews = pd.read_csv("../datasets/mcdonalds-yelp-negative-reviews.csv", encoding='latin-1')
text = reviews["review"].values

In [174]:
city_list = list(reviews['city'].unique())

In [113]:
reviews.head()

Unnamed: 0,_unit_id,city,review
0,679455653,Atlanta,"I'm not a huge mcds lover, but I've been to better ones. This is by far the worst one I've ever been too! It's filthy inside and if you get drive through they completely screw up your order every time! The staff is terribly unfriendly and nobody seems to care."
1,679455654,Atlanta,"Terrible customer service. I came in at 9:30pm and stood in front of the register and no one bothered to say anything or help me for 5 minutes. There was no one else waiting for their food inside either, just outside at the window. I left and went to Chickfila next door and was greeted before I was all the way inside. This McDonalds is also dirty, the floor was covered with dropped food. Obviously filled with surly and unhappy workers."
2,679455655,Atlanta,"First they ""lost"" my order, actually they gave it to someone one else than took 20 minutes to figure out why I was still waiting for my order.They after I was asked what I needed I replied, ""my order"".They asked for my ticket and the asst mgr looked at the ticket then incompletely filled it.I had to ask her to check to see if she filled it correctly.She acted as if she couldn't be bothered with that so I asked her again.She begrudgingly checked to she did in fact miss something on the ticket.So after 22 minutes I finally had my breakfast biscuit platter.As I left an woman approached and identified herself as the manager, she was dressed as if she had just awoken in an old t-shirt and sweat pants.She said she had heard what happened and said she'd take care of it.Well why didn't she intervene when she saw I was growing annoyed with the incompetence?"
3,679455656,Atlanta,I see I'm not the only one giving 1 star. Only because there is not a -25 Star!!! That's all I need to say!
4,679455657,Atlanta,"Well, it's McDonald's, so you know what the food is. This review reflects solely on the poor service. I have been to this location countless times over the years. They consistently fail on the service end of things. The order takers tend to be rude, no smiles, and a lot of ""sighs"" and ""lip smacking"" when you talk to them. So why go back you ask? This store benefits from being the only place to eat in this area. The next stop is at least 12 minutes away on the other side of town. Also I strongly believe in 2nd chances and know that not every business can satisfy everyone 100% of the time. I have given them many chances at earning a positive review. I could not recommend this location any less. If you can wait, take a pass. There are better McDonald's stores in Griffin, GA."


### 2. Data Cleaning and Text Preprocessing

In [114]:
reviews['reviews_processed'] = reviews['review']

# Remove punctuation
from textacy.preprocessing.remove import punctuation
reviews['reviews_processed'] = reviews['reviews_processed'].apply(punctuation)

# Convert to lowercase 
reviews['reviews_processed'] = reviews['reviews_processed'].map(lambda x: x.lower())

# Replace common entities/concepts 
from textacy.preprocessing.replace import urls, hashtags, numbers, emails, emojis, currency_symbols
reviews['reviews_processed'] = reviews['reviews_processed'].\
 apply(urls).\
 apply(hashtags).\
 apply(currency_symbols).\
 apply(emojis).\
 apply(emails)
# apply(numbers)
# we won't remove numbers because it gives us valuable information 

# Remove or normalize undesired text elements 
#from collections import Counter
#from textacy.preprocessing.normalize import quotation_marks, bullet_points
#quotes = ['"','“','”']

reviews.head(10)


Unnamed: 0,_unit_id,city,review,reviews_processed
0,679455653,Atlanta,"I'm not a huge mcds lover, but I've been to better ones. This is by far the worst one I've ever been too! It's filthy inside and if you get drive through they completely screw up your order every time! The staff is terribly unfriendly and nobody seems to care.",i m not a huge mcds lover but i ve been to better ones this is by far the worst one i ve ever been too it s filthy inside and if you get drive through they completely screw up your order every time the staff is terribly unfriendly and nobody seems to care
1,679455654,Atlanta,"Terrible customer service. I came in at 9:30pm and stood in front of the register and no one bothered to say anything or help me for 5 minutes. There was no one else waiting for their food inside either, just outside at the window. I left and went to Chickfila next door and was greeted before I was all the way inside. This McDonalds is also dirty, the floor was covered with dropped food. Obviously filled with surly and unhappy workers.",terrible customer service i came in at 9 30pm and stood in front of the register and no one bothered to say anything or help me for 5 minutes there was no one else waiting for their food inside either just outside at the window i left and went to chickfila next door and was greeted before i was all the way inside this mcdonalds is also dirty the floor was covered with dropped food obviously filled with surly and unhappy workers
2,679455655,Atlanta,"First they ""lost"" my order, actually they gave it to someone one else than took 20 minutes to figure out why I was still waiting for my order.They after I was asked what I needed I replied, ""my order"".They asked for my ticket and the asst mgr looked at the ticket then incompletely filled it.I had to ask her to check to see if she filled it correctly.She acted as if she couldn't be bothered with that so I asked her again.She begrudgingly checked to she did in fact miss something on the ticket.So after 22 minutes I finally had my breakfast biscuit platter.As I left an woman approached and identified herself as the manager, she was dressed as if she had just awoken in an old t-shirt and sweat pants.She said she had heard what happened and said she'd take care of it.Well why didn't she intervene when she saw I was growing annoyed with the incompetence?",first they lost my order actually they gave it to someone one else than took 20 minutes to figure out why i was still waiting for my order they after i was asked what i needed i replied my order they asked for my ticket and the asst mgr looked at the ticket then incompletely filled it i had to ask her to check to see if she filled it correctly she acted as if she couldn t be bothered with that so i asked her again she begrudgingly checked to she did in fact miss something on the ticket so after 22 minutes i finally had my breakfast biscuit platter as i left an woman approached and identified herself as the manager she was dressed as if she had just awoken in an old t shirt and sweat pants she said she had heard what happened and said she d take care of it well why didn t she intervene when she saw i was growing annoyed with the incompetence
3,679455656,Atlanta,I see I'm not the only one giving 1 star. Only because there is not a -25 Star!!! That's all I need to say!,i see i m not the only one giving 1 star only because there is not a 25 star that s all i need to say
4,679455657,Atlanta,"Well, it's McDonald's, so you know what the food is. This review reflects solely on the poor service. I have been to this location countless times over the years. They consistently fail on the service end of things. The order takers tend to be rude, no smiles, and a lot of ""sighs"" and ""lip smacking"" when you talk to them. So why go back you ask? This store benefits from being the only place to eat in this area. The next stop is at least 12 minutes away on the other side of town. Also I strongly believe in 2nd chances and know that not every business can satisfy everyone 100% of the time. I have given them many chances at earning a positive review. I could not recommend this location any less. If you can wait, take a pass. There are better McDonald's stores in Griffin, GA.",well it s mcdonald s so you know what the food is this review reflects solely on the poor service i have been to this location countless times over the years they consistently fail on the service end of things the order takers tend to be rude no smiles and a lot of sighs and lip smacking when you talk to them so why go back you ask this store benefits from being the only place to eat in this area the next stop is at least 12 minutes away on the other side of town also i strongly believe in 2nd chances and know that not every business can satisfy everyone 100 of the time i have given them many chances at earning a positive review i could not recommend this location any less if you can wait take a pass there are better mcdonald s stores in griffin ga
5,679455658,Atlanta,This has to be one of the worst and slowest McDonald's franchises there is. Can't figure out why my Egg McMuffin is always on a stale un-toasted English muffin. Bought A chocolate shake today and threw it away.,this has to be one of the worst and slowest mcdonald s franchises there is can t figure out why my egg mcmuffin is always on a stale un toasted english muffin bought a chocolate shake today and threw it away
6,679455659,Atlanta,"I'm not crazy about this McDonald's. This is primarily because they are so slow. My gosh what exactly is the hold up? It's FAST food people. Also, this morning, I guess the worker thought his mic was off, but it wasn't. I now know that he is trying to get as many hours as possible because he needs money BAD. Spread the word. Anyway, this location is on a little access road and you have to go back the way you came because there is no exit from it at the other end. It would have helped if there was one. So, in the end I think I'll avoid this location and find another. This should be easy as there is no shortage of Mickey D's in this piece.",i m not crazy about this mcdonald s this is primarily because they are so slow my gosh what exactly is the hold up it s fast food people also this morning i guess the worker thought his mic was off but it wasn t i now know that he is trying to get as many hours as possible because he needs money bad spread the word anyway this location is on a little access road and you have to go back the way you came because there is no exit from it at the other end it would have helped if there was one so in the end i think i ll avoid this location and find another this should be easy as there is no shortage of mickey d s in this piece
7,679455660,Atlanta,"One Star and I'm beng kind. I blame management. last day of free coffee so ""we"" decide to stop and order breakfast and coffees thru drive-thru. She charged us for coffee and when asked why she said she needed to confirm there were two of us in the car. Now she has to clear the order and that took retraining. Ask next time at tthe speaker.Oh it gets better....We get to next window where Einstein is waiting, pours the coffees with different creams/sugars added. ""Which one is which?"" I asked... and he smiled and said ""One has 2 cream 2 sugar and the other has 3 cream 1 sugar"".. didnt maek the cups... but at least he made sure we had straws for our coffees.... Hello? management? Where are You?",one star and i m beng kind i blame management last day of free coffee so we decide to stop and order breakfast and coffees thru drive thru she charged us for coffee and when asked why she said she needed to confirm there were two of us in the car now she has to clear the order and that took retraining ask next time at tthe speaker oh it gets better we get to next window where einstein is waiting pours the coffees with different creams sugars added which one is which i asked and he smiled and said one has 2 cream 2 sugar and the other has 3 cream 1 sugar didnt maek the cups but at least he made sure we had straws for our coffees hello management where are you
8,679455661,Atlanta,"Never been upset about any fast food drive thru service till I came to this McDonalds.After a long trip from California my wife and I went to McDonalds for a quick bite to eat before our drive back home. We pull up to the drive thru but there are a lot of cars waiting to order. We were guessing there must be a lot demand for McDonalds at this late hour. So we wait about 5 - 10 minutes for our turn to order. We order the specials they were having at that time (20 pcs chicken nuggets for $5) and asked for a cup of water because we didn't want to drink any sodas at that late hour. They take our order but tells us they don't serve cups of water there. So we are a little annoyed but ok we shrug it off and wait our turn to pay and receive our order. However, we wait for almost another 10 min to finally pay at the first window - then another 5 more min to finally receive our food. By far the worst McDonalds in the world. Unfriendly and slow.",never been upset about any fast food drive thru service till i came to this mcdonalds after a long trip from california my wife and i went to mcdonalds for a quick bite to eat before our drive back home we pull up to the drive thru but there are a lot of cars waiting to order we were guessing there must be a lot demand for mcdonalds at this late hour so we wait about 5 10 minutes for our turn to order we order the specials they were having at that time 20 pcs chicken nuggets for _CUR_5 and asked for a cup of water because we didn t want to drink any sodas at that late hour they take our order but tells us they don t serve cups of water there so we are a little annoyed but ok we shrug it off and wait our turn to pay and receive our order however we wait for almost another 10 min to finally pay at the first window then another 5 more min to finally receive our food by far the worst mcdonalds in the world unfriendly and slow
9,679455662,Atlanta,"This McDonald's has gotten much better. Usually my order would be wrong every single time so I would not leave that window until I checked every single item. I only hit up fast food once a month or so and it needs to be worth it. Also the fries used to be cold and the cheese on the burger was never melted. Everything was just lukewarm. Now my order has been right a few times in a row and my food hot. Also, I love dining room. Usually you wouldn't find me actually inside a fast food joint but this place has nice flooring, stacked stone, lots of large windows and a flat screen TV usually on HLN. Sometimes its nice to sneak away for a quick weekend breakfast, you know, a little budget and time friendly mommy and me date.",this mcdonald s has gotten much better usually my order would be wrong every single time so i would not leave that window until i checked every single item i only hit up fast food once a month or so and it needs to be worth it also the fries used to be cold and the cheese on the burger was never melted everything was just lukewarm now my order has been right a few times in a row and my food hot also i love dining room usually you wouldn t find me actually inside a fast food joint but this place has nice flooring stacked stone lots of large windows and a flat screen tv usually on hln sometimes its nice to sneak away for a quick weekend breakfast you know a little budget and time friendly mommy and me date


In [116]:
# Lemmatization 
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

lemmatizer = WordNetLemmatizer()

# function to convert nltk tag to wordnet tag
def nltk_tag_to_wordnet_tag(nltk_tag):
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:          
        return None


# Shoutout to this git repo for providing this code 
# https://gist.github.com/gaurav5430/9fce93759eb2f6b1697883c3782f30de
def lemmatize_sentence(sentence):
    #tokenize the sentence and find the POS tag for each token
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  
    #tuple of (token, wordnet_tag)
    wordnet_tagged = map(lambda x: (x[0], nltk_tag_to_wordnet_tag(x[1])), nltk_tagged)
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag is None:
            #if there is no available tag, append the token as is
            lemmatized_sentence.append(word)
        else:        
            #else use the tag to lemmatize the token
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return " ".join(lemmatized_sentence)

reviews['reviews_processed'] = reviews['reviews_processed'].apply(lemmatize_sentence)

In [117]:
pd.set_option('display.max_colwidth', None)

In [118]:
# Removing stopwords using gensim 
from gensim.parsing.preprocessing import remove_stopwords
reviews['reviews_processed'] = reviews['reviews_processed'].apply(remove_stopwords)
reviews[['review',"reviews_processed"]].head(10)


Unnamed: 0,review,reviews_processed
0,"I'm not a huge mcds lover, but I've been to better ones. This is by far the worst one I've ever been too! It's filthy inside and if you get drive through they completely screw up your order every time! The staff is terribly unfriendly and nobody seems to care.",m huge mcds lover ve good far bad ve s filthy inside drive completely screw order time staff terribly unfriendly care
1,"Terrible customer service. I came in at 9:30pm and stood in front of the register and no one bothered to say anything or help me for 5 minutes. There was no one else waiting for their food inside either, just outside at the window. I left and went to Chickfila next door and was greeted before I was all the way inside. This McDonalds is also dirty, the floor was covered with dropped food. Obviously filled with surly and unhappy workers.",terrible customer service come 9 30pm stand register bother help 5 minute wait food inside outside window leave chickfila door greet way inside mcdonalds dirty floor cover dropped food obviously surly unhappy worker
2,"First they ""lost"" my order, actually they gave it to someone one else than took 20 minutes to figure out why I was still waiting for my order.They after I was asked what I needed I replied, ""my order"".They asked for my ticket and the asst mgr looked at the ticket then incompletely filled it.I had to ask her to check to see if she filled it correctly.She acted as if she couldn't be bothered with that so I asked her again.She begrudgingly checked to she did in fact miss something on the ticket.So after 22 minutes I finally had my breakfast biscuit platter.As I left an woman approached and identified herself as the manager, she was dressed as if she had just awoken in an old t-shirt and sweat pants.She said she had heard what happened and said she'd take care of it.Well why didn't she intervene when she saw I was growing annoyed with the incompetence?",lose order actually 20 minute figure wait order ask need reply order ask ticket asst mgr look ticket incompletely ask check correctly act couldn t bother ask begrudgingly check fact miss ticket 22 minute finally breakfast biscuit platter leave woman approach identify manager dress awake old t shirt sweat pant hear happen d care t intervene saw grow annoy incompetence
3,I see I'm not the only one giving 1 star. Only because there is not a -25 Star!!! That's all I need to say!,m 1 star 25 star s need
4,"Well, it's McDonald's, so you know what the food is. This review reflects solely on the poor service. I have been to this location countless times over the years. They consistently fail on the service end of things. The order takers tend to be rude, no smiles, and a lot of ""sighs"" and ""lip smacking"" when you talk to them. So why go back you ask? This store benefits from being the only place to eat in this area. The next stop is at least 12 minutes away on the other side of town. Also I strongly believe in 2nd chances and know that not every business can satisfy everyone 100% of the time. I have given them many chances at earning a positive review. I could not recommend this location any less. If you can wait, take a pass. There are better McDonald's stores in Griffin, GA.",s mcdonald s know food review reflect solely poor service location countless time year consistently fail service end thing order taker tend rude smile lot sigh lip smacking talk ask store benefit place eat area stop 12 minute away town strongly believe 2nd chance know business satisfy 100 time chance earn positive review recommend location wait pas good mcdonald s store griffin ga
5,This has to be one of the worst and slowest McDonald's franchises there is. Can't figure out why my Egg McMuffin is always on a stale un-toasted English muffin. Bought A chocolate shake today and threw it away.,bad slow mcdonald s franchise t figure egg mcmuffin stale toast english muffin buy chocolate shake today throw away
6,"I'm not crazy about this McDonald's. This is primarily because they are so slow. My gosh what exactly is the hold up? It's FAST food people. Also, this morning, I guess the worker thought his mic was off, but it wasn't. I now know that he is trying to get as many hours as possible because he needs money BAD. Spread the word. Anyway, this location is on a little access road and you have to go back the way you came because there is no exit from it at the other end. It would have helped if there was one. So, in the end I think I'll avoid this location and find another. This should be easy as there is no shortage of Mickey D's in this piece.",m crazy mcdonald s primarily slow gosh exactly hold s fast food people morning guess worker think mic wasn t know try hour possible need money bad spread word location little access road way come exit end help end think ll avoid location easy shortage mickey d s piece
7,"One Star and I'm beng kind. I blame management. last day of free coffee so ""we"" decide to stop and order breakfast and coffees thru drive-thru. She charged us for coffee and when asked why she said she needed to confirm there were two of us in the car. Now she has to clear the order and that took retraining. Ask next time at tthe speaker.Oh it gets better....We get to next window where Einstein is waiting, pours the coffees with different creams/sugars added. ""Which one is which?"" I asked... and he smiled and said ""One has 2 cream 2 sugar and the other has 3 cream 1 sugar"".. didnt maek the cups... but at least he made sure we had straws for our coffees.... Hello? management? Where are You?",star m beng kind blame management day free coffee decide stop order breakfast coffee drive charge coffee ask need confirm car clear order retrain ask time tthe speaker oh window einstein wait pours coffee different cream sugar add ask smile 2 cream 2 sugar 3 cream 1 sugar didnt maek cup sure straw coffee hello management
8,"Never been upset about any fast food drive thru service till I came to this McDonalds.After a long trip from California my wife and I went to McDonalds for a quick bite to eat before our drive back home. We pull up to the drive thru but there are a lot of cars waiting to order. We were guessing there must be a lot demand for McDonalds at this late hour. So we wait about 5 - 10 minutes for our turn to order. We order the specials they were having at that time (20 pcs chicken nuggets for $5) and asked for a cup of water because we didn't want to drink any sodas at that late hour. They take our order but tells us they don't serve cups of water there. So we are a little annoyed but ok we shrug it off and wait our turn to pay and receive our order. However, we wait for almost another 10 min to finally pay at the first window - then another 5 more min to finally receive our food. By far the worst McDonalds in the world. Unfriendly and slow.",upset fast food drive service till come mcdonalds long trip california wife mcdonalds quick bite eat drive home pull drive lot car wait order guess lot demand mcdonalds late hour wait 5 10 minute turn order order special time 20 pc chicken nugget _CUR_5 ask cup water t want drink soda late hour order tell t serve cup water little annoyed ok shrug wait turn pay receive order wait 10 min finally pay window 5 min finally receive food far bad mcdonalds world unfriendly slow
9,"This McDonald's has gotten much better. Usually my order would be wrong every single time so I would not leave that window until I checked every single item. I only hit up fast food once a month or so and it needs to be worth it. Also the fries used to be cold and the cheese on the burger was never melted. Everything was just lukewarm. Now my order has been right a few times in a row and my food hot. Also, I love dining room. Usually you wouldn't find me actually inside a fast food joint but this place has nice flooring, stacked stone, lots of large windows and a flat screen TV usually on HLN. Sometimes its nice to sneak away for a quick weekend breakfast, you know, a little budget and time friendly mommy and me date.",mcdonald s usually order wrong single time leave window check single item hit fast food month need worth fry use cold cheese burger melt lukewarm order right time row food hot love din room usually wouldn t actually inside fast food joint place nice floor stack stone lot large window flat screen tv usually hln nice sneak away quick weekend breakfast know little budget time friendly mommy date


In [119]:
# Vectorize the corpus 
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(ngram_range=(3,3), min_df=3,
                            max_df=0.4, stop_words="english")

X, terms = vectorizer.fit_transform(reviews['reviews_processed']), vectorizer.get_feature_names_out()
tf_idf = pd.DataFrame(X.toarray(), columns=terms)

print(f"TF-IDF: {tf_idf.shape}")
tf_idf.head(5)

TF-IDF: (1525, 223)


Unnamed: 0,10 minute food,10 minute fry,10 minute later,10 minute line,10 minute order,10 piece chicken,10 piece nugget,15 min food,15 minute drive,15 minute later,...,window hand drink,window pick food,wish negative star,work customer service,work drive order,work fast food,worst customer service,write review fast,write review mcdonald,write review mcdonalds
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [168]:
# Fit the NMF Model
nmf = NMF(n_components=3)
W = nmf.fit_transform(X)
H = nmf.components_
print(f"Original shape of X is {X.shape}")
print(f"Decomposed W matrix is {W.shape}")
print(f"Decomposed H matrix is {H.shape}")

Original shape of X is (1525, 223)
Decomposed W matrix is (1525, 3)
Decomposed H matrix is (3, 223)




In [169]:
from typing import List
import numpy as np
# Report Results 
def get_top_tf_idf_tokens_for_topic(H: np.array, feature_names: List[str], num_top_tokens: int = 5):
  """
  Uses the H matrix (K components x M original features) to identify for each
  topic the most frequent tokens.
  """
  for topic, vector in enumerate(H):
    print(f"TOPIC {topic}\n")
    total = vector.sum()
    top_scores = vector.argsort()[::-1][:num_top_tokens]
    token_names = list(map(lambda idx: feature_names[idx], top_scores))
    strengths = list(map(lambda idx: vector[idx] / total, top_scores))
    
    for strength, token_name in zip(strengths, token_names):
      print(f"\b{token_name} ({round(strength * 100, 1)}%)\n")
    print(f"=" * 50)

get_top_tf_idf_tokens_for_topic(H, tf_idf.columns.tolist(), 3)

TOPIC 0

open 24 hour (39.8%)

drive open 24 (8.2%)

24 hour drive (4.8%)

TOPIC 1

wait 10 minute (41.0%)

order wrong time (7.0%)

10 minute order (5.8%)

TOPIC 2

fast food place (21.2%)

fast food restaurant (15.3%)

wait 15 minute (10.1%)



In [178]:
reviews[reviews['city']=='Dallas']

Unnamed: 0,_unit_id,city,review,reviews_processed
533,679456194,Dallas,"So it's fast food and McDonalds at that. So let's say expectations are very, very low. Sometimes though you just want food that's fast and you know what to expect. Not every meal can be a 5 star dinner with an executive chef.Even with appropriate low expectations, this location takes the cake for subpar service. About a week ago I was exhausted and needed something to eat and there are few fast food restaurants near where I live. So off to McDonalds I went. Ordered a value meal and a single apple pie. Surprise...no pie! Ok, mistakes happen. I've worked drive through in a previous life and I get it. Even though it was 10 pm and I was the only car there, so it's not like they were rushed. Still...mistakes happen.Fast forward to tonight. Again about 10 pm and exhausted and decide to go crazy and get McD's twice in 1 week. So I try again to order an apple pie with my value meal. Now, maybe this is my fault, because I should have checked the bag before driving off. But after sitting at the window and waiting for 5 minutes (and remember, I was really tired), silly me thought they'd be able to get it straight this time.Turns out I was wrong. Yet again...no pie. Both times (within a week of each other) I paid for them. Neither time they felt it necessary to put it in the actual bag.So I called to talk to the manager. And get hung up on as soon as I ask for the manager. So I called back, the manager actually answered and I told him what happened. He didn't sound too displeased, just told me ""next time we'll get your order right.""Yeah, if there IS a next time.",s fast food mcdonalds let s expectation low want food s fast know expect meal 5 star dinner executive chef appropriate low expectation location cake subpar service week ago exhaust need eat fast food restaurant near live mcdonalds ordered value meal single apple pie surprise pie ok mistake happen ve work drive previous life 10 pm car s like rush mistake happen fast forward tonight 10 pm exhaust decide crazy mcd s twice 1 week try order apple pie value meal maybe fault check bag drive sit window wait 5 minute remember tire silly think d able straight time turn wrong pie time week pay time felt necessary actual bag talk manager hang soon ask manager manager actually answer tell happen t sound displeased tell time ll order right yeah time
534,679456195,Dallas,"I know what your thinking....""another 1 star review for McD's and you don't even eat there?!? What gives?"" Well I have to say this is an ""informative"" review! Getting right down to it. I was out running by a few client sites today when I stopped and picked up lunch at Riverside (that's right, be jealous). So, a coworker calls and asks if I would pick up lunch for her. Being a team player I said no problem. She then told me she wanted Satan...I mean McDonalds. I sighed, but agreed. She told me what she wanted and off I went. Oh and to get some ketchup and Sweet and Sour sauce. Mkay. I pulled up to the drive thru and ordered after they tried to sell me their new Angus Burger. (Sidenote: Why do they do that in the first place? I mean, if that is what I wanted, I would have ordered it.) ""May I please have #12 with a Sprite."" ""Would you l like to add on an apple pie for $.50?"" ""Um, no."" ""$7.01, please pull up to the first window."" So I pay. Drive to window #2. Man opens the window hands me the drink and the bag of food. I asked for a S&S Sauce packet. The guy responded. ""We charge for that now."" ""Ha, that's hilarious. Just one please."" ""No really, they are $.28"" ""What? You're kidding right?"" ""No ma'am, if you want one you'll have to pull back around the drive thru."" (Totally dumbfounded I say) ""Do you charge for ketchup too?"" Then drive off. I laugh the entire way back to work like a crazy woman...""What do you mean your charging for condiments? "" ""Is it laced with heroin?"" ""Are there diamonds inside these packets?"" ""Is charging for this even legal?"" I mean it wasn't even my order and I'm mad as hell. They've managed to piss me off again. Congrats McSatan. You officially suck more than ever. Take your Sweet and Sour Sauce and shove it where the sun don't shine. *spits on ground*",know think 1 star review mcd s t eat informative review right run client site today stop pick lunch riverside s right jealous coworker ask pick lunch team player problem tell want satan mean mcdonalds sigh agree tell want oh ketchup sweet sour sauce mkay pull drive order try sell new angus burger sidenote place mean want order 12 sprite l like add apple pie _CUR_ 50 um _CUR_7 01 pull window pay drive window 2 man open window hand drink bag food ask s s sauce packet guy respond charge ha s hilarious _CUR_ 28 kid right ma want ll pull drive totally dumbfound charge ketchup drive laugh entire way work like crazy woman mean charge condiment lace heroin diamond inside packet charge legal mean wasn t order m mad hell ve manage piss congrats mcsatan officially suck sweet sour sauce shove sun t shine spit ground
535,679456196,Dallas,"The drive-thru menu board has changed making it impossible to find the items I usually order. Not sure if its just this location or all McDonalds, anything with mayonnaise will have enough mayo for three sandwiches and enough bread for almost 6 sandwiches compared to the amount of meat. If you request mustard instead of mayonnaise make sure you ask for mustard at the second window because you're getting dry sandwiches.",drive menu board change impossible item usually order sure location mcdonalds mayonnaise mayo sandwich bread 6 sandwich compare meat request mustard instead mayonnaise sure ask mustard second window dry sandwich
536,679456197,Dallas,"Terrible customer service. Rude and indifferent manager. My friends and I go there to purchase food and I asked for a courtesy cup of water as it was very hot and I didn't want soda. She looked me square in the eye in front of everyone and said ""no"". That will cost you $.27 cents. Not even after I had patronized the business could I not have a sip of water? Really? I will never shop at that McDonald's ever again. Any McDonalds really. I actually watched the manager and she treated every minority who walked in like crap.",terrible customer service rude indifferent manager friend purchase food ask courtesy cup water hot t want soda look square eye cost _CUR_ 27 cent patronize business sip water shop mcdonald s mcdonalds actually watch manager treat minority walk like crap
537,679456198,Dallas,Definitely not a fan of this McDonald's. Ordered a Big Mac the other day and the sandwich looked like someone sat on it.I've seen better quality of sandwiches from other locations in Dallas.,definitely fan mcdonald s order big mac day sandwich look like sit ve quality sandwich location dallas
...,...,...,...,...
603,679456265,Dallas,"Don't ever go here in the evening between midnight and 3am. The staff are as dumb as rocks, slow, can't understand them when trying to place an order. Oh, and usually between this time it's ""Cash only"" because for some reason it takes 3 or more hours to batch out their credit card machine. Not to mention the wait time. One time it took over 10 minutes for my sweet tea and fruit & yogurt parfait. Terrible service all around, don't let it's ""fancy"" interior fool you.",t evening midnight 3am staff dumb rock slow t understand try place order oh usually time s cash reason 3 hour batch credit card machine mention wait time time 10 minute sweet tea fruit yogurt parfait terrible service t let s fancy interior fool
604,679456266,Dallas,"We stopped at this McDonalds because it was on our way to my sisters house. The food was good, fast as always but I don't know why they took forever to brew up a pot of coffee plus they forgot I had ordered it (probably because the manager and all the employees were chatting and not paying attention to customers). The indoor play area could use a little cleaning up.",stop mcdonalds way sister house food good fast t know forever brew pot coffee plus forget order probably manager employee chat pay attention customer indoor play area use little cleaning
605,679456267,Dallas,"A McD's in a kinda sketchy area. I've been through the drive thru for breakfast a couple of times. The only thing to really note is they have two places you can order from in the drive thru. Don't be an idiot and sit at the first one for too long with out moving to the next one to order, like I did. Doh. Otherwise enjoy!",mcd s kinda sketchy area ve drive breakfast couple time thing note place order drive t idiot sit long order like doh enjoy
606,679456268,Dallas,"I was just so excited to get to work, call you guys, then make a yelp review. Why? Because you mess something up every single time I go there. Why do I keep going there? Because it is the closest thing to my work in the morning for a quick breakfast. Now, I'll just have to get something closer to my house and eat in the car BECAUSE OF HOW TERRIBLE YALL ARE. You have one job...one SUPER EASY JOB. Take my order, take my money, put my order in the bag and make sure it is correct. When I get to the window and ask ""Did you remember the Honey Mustard"" nicely. and you give me this smirk of ""Heck yes I did"", well that is just fantastic, because you guys always forget the sauces I ASK AND PAY FOR. But, this one time you remember. Thats not the part I'm complaining about. I'm complaining because you remembered my Honey Mustard, but forgot my 2 hash browns! I shouldn't have to check my bag at the window when I receive my order. That is just absurd. HOW HARD IS IT TO GET AN ORDER RIGHT WHEN IT IS JUST 2 HASHBROWNS AND HONEY MUSTARD. How much money do they make ezra by saving food and ""forgetting"" your food even though you pay. Then once we check we are long gone and too far to even turn around and WAIST OUR TIME to get our order right. Pathetic place, you will always stay minimum wage because the simplest of things you can NOT complete. I don't care if it is just a Mcdonalds. It's a business that makes thousands of dollars a day. DO IT RIGHT. If I could give this negative stars I would. I wish I could travel back in time, and build something in the lot of this McDonalds, just so I would know this place would never exist here.",excited work guy yelp review mess single time close thing work morning quick breakfast ll close house eat car terrible yall job super easy job order money order bag sure correct window ask remember honey mustard nicely smirk heck yes fantastic guy forget sauce ask pay time remember thats m complain m complain remember honey mustard forget 2 hash brown shouldn t check bag window receive order absurd hard order right 2 hashbrowns honey mustard money ezra save food forget food pay check long far turn waist time order right pathetic place stay minimum wage simple thing complete t care mcdonalds s business thousand dollar day right negative star wish travel time build lot mcdonalds know place exist


In [180]:
vectorizer = TfidfVectorizer(ngram_range=(2,2), min_df=3,
                            max_df=0.4, stop_words="english")

for city in city_list:
    print("================" + city + "================")
    reviews_city = reviews[reviews['city']==city]
    X, terms = vectorizer.fit_transform(reviews_city['reviews_processed']), vectorizer.get_feature_names_out()
    tf_idf = pd.DataFrame(X.toarray(), columns=terms)
    nmf = NMF(n_components=3)
    W = nmf.fit_transform(X)
    H = nmf.components_
    get_top_tf_idf_tokens_for_topic(H, tf_idf.columns.tolist(), 3)

TOPIC 0

fast food (51.5%)

particular location (4.4%)

expect mcdonald (3.1%)

TOPIC 1

customer service (22.7%)

place order (8.9%)

order drive (7.9%)

TOPIC 2

order wrong (23.1%)

northside hospital (21.8%)

ice cream (11.4%)





TOPIC 0

order right (4.0%)

customer service (3.0%)

fast food (3.0%)

TOPIC 1

big mac (21.5%)

look like (6.8%)

order big (4.9%)

TOPIC 2

ice cream (17.7%)

cream machine (9.6%)

sweet tea (6.2%)

TOPIC 0

customer service (60.1%)

play area (12.7%)

place order (6.6%)

TOPIC 1

parking lot (74.1%)

look like (9.9%)

tell want (8.8%)

TOPIC 2

fast food (28.9%)

15 minute (11.5%)

look like (8.4%)

TOPIC 0

fast food (64.2%)

window order (6.5%)

food place (3.9%)

TOPIC 1

24 hour (76.2%)

big mac (14.1%)

food hot (5.3%)

TOPIC 2

time order (23.5%)

chicken sandwich (15.5%)

big mac (12.9%)

TOPIC 0

parking lot (43.3%)

regular mcdonald (5.9%)

read review (4.5%)

TOPIC 1

customer service (32.2%)

dollar menu (6.0%)

great customer (5.7%)

TOPIC 2

look like (4.4%)

french fry (4.0%)

time order (3.6%)

TOPIC 0

bad mcdonalds (35.7%)

mcdonalds ve (16.6%)

long time (9.7%)

TOPIC 1

fast food (67.4%)

10 minute (11.2%)

15 minute (10.



TOPIC 0

customer service (30.2%)

bad customer (6.7%)

long line (6.0%)

TOPIC 1

mcdonald ve (11.0%)

look like (8.1%)

big mac (8.0%)

TOPIC 2

bad mcdonalds (43.9%)

staff like (8.3%)

20 minute (6.3%)



TypeError: can only concatenate str (not "float") to str

In [122]:
import numpy as np
def get_top_documents_for_each_topic(W: np.array, documents: List[str], num_docs: int = 5):
    sorted_docs = W.argsort(axis=0)[::-1]
    top_docs = sorted_docs[:num_docs].T
    per_document_totals = W.sum(axis=1)
    for topic, top_documents_for_topic in enumerate(top_docs):
        print(f"Topic {topic}")
        for doc in top_documents_for_topic:
            score = W[doc][topic]
            percent_about_topic = round(score / per_document_totals[doc] * 100, 1)
            print(f"{percent_about_topic}%", documents[doc])
            print("=" * 50)

In [123]:
get_top_documents_for_each_topic(W, reviews['review'].tolist())

Topic 0
100.0% Open 24 hours, right around the corner from my hotel. Let me just say, the dude working the graveyard shift is NOT very friendly. I don't remember his full name but it started with a C. You can just tell he was about one more "Would you like it Super Sized," away from completely going postal. I had the pleasure of running into him twice during my entire stay (what can i say? i like Mcdonald's.). Our second encounter was the weirdest. He started like mouthing something weird to me. I was too drunk to make it out, but I'm sure it was something along the lines as, "I'm going to shove these Chicken Nuggets up your ass."
100.0% Denied me a vanilla cone b/c it was "too late" assholes. Whats the point in being open for 24 hours?
100.0% pretty simple review, i work the early morning shift and they are open 24 hours. I've gone twice in the past week to get a coffee and one time they had to pull out the pot and start making it and the other time they told me they only were accepti

In [124]:
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=3)
W = lda.fit_transform(X)
H = lda.components_
get_top_tf_idf_tokens_for_topic(H, tf_idf.columns.tolist(), 5)

TOPIC 0

open 24 hour (4.5%)

wait 15 minute (2.8%)

eat fast food (2.6%)

bad mcdonald ve (2.3%)

egg cheese biscuit (2.2%)

TOPIC 1

wait 10 minute (4.4%)

order wrong time (2.8%)

ice cream cone (2.4%)

free wi fi (2.3%)

time order wrong (1.8%)

TOPIC 2

fast food restaurant (3.1%)

fast food place (3.0%)

bad customer service (2.8%)

24 hour drive (2.3%)

ice cream machine (2.1%)



In [125]:
get_top_documents_for_each_topic(W, reviews['review'].tolist())

Topic 0
80.3% Normally I don't review a chain unless something about that particular location stands out, good or bad.This location for some reason, has the most ridiculous service I've ever encountered. It's McDonalds you say, what do you expect? Not much, really but let me tell you...ENCOUNTER #1I ordered 10-piece chicken nuggets, fries, and unsweetened ice tea. I got the nuggets and tea. I went up to the counter, placed my receipt on the counter gently faced away from me, and said "I'm sorry, I --" Before I finished my sentence, the employee SLAMMED her hand on my receipt on the counter and dragged it towards her. Um... WTF?!I didn't know how to react because she looked at my receipt and threw it away without addressing me. I went "um...ok..." and backed away from the counter to fill my tea. I have no clue what I did to piss this employee off but my friend who came with me, asked me what the heck happened. I said "I have no clue, I'm just missing my fries..." My friend was nice and 

### Part II. Emotion Classification (5 pts)

Use the `datasets/emotions_dataset.zip` (see the original Dataset source on [Kaggle](https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp)) to build a classification model that predicts the emotion of sentence. If you would like, you may classify only the top 4 emotions, and group all other classes as `Other`. 

In order to earn full credit, you must:

* Show the performance of your model with `CountVectorizer`, `TfIdfVectorizer`, `word2vec`, and `glove` embeddings.
    - for `word2vec`, make sure not to use the `en_core_web_sm` dataset (these are not real embeddings)
* Perform text preprocessing (or explain why it was not necessary):
    - stopword removal
    - ngram tokenization
    - stemming/lemmatization
    - fuzzy matching / regex cleaning / etc. (as you deem necessary, but show that you analyzed the text to make your decision)
* Show **AUROC / F1 scores** for on the holdout (test + validation) datasets.
* A brief discussion (2-3 sentences) of what could improve your model and why.

### 1. Importing Datasets 

In [148]:
train_df = pd.read_csv('../datasets/emotions/train.txt',header=None, names=['text'])
train_df[['text','emotion']] = train_df['text'].str.split(';',expand=True)
train_df['type'] = "train"
test_df = pd.read_csv('../datasets/emotions/test.txt',header=None, names=['text'])
test_df[['text','emotion']] = test_df['text'].str.split(';',expand=True)
test_df['type'] = "test"
val_df = pd.read_csv('../datasets/emotions/val.txt',header=None, names=['text'])
val_df[['text','emotion']] = val_df['text'].str.split(';',expand=True)
val_df['type'] = "val"

df = train_df.append(test_df.append(val_df , ignore_index=True) , ignore_index=True)

df.head()

  df = train_df.append(test_df.append(val_df , ignore_index=True) , ignore_index=True)


Unnamed: 0,text,emotion,type
0,i didnt feel humiliated,sadness,train
1,i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake,sadness,train
2,im grabbing a minute to post i feel greedy wrong,anger,train
3,i am ever feeling nostalgic about the fireplace i will know that it is still on the property,love,train
4,i am feeling grouchy,anger,train


In [149]:
# Checking if all the data have been appended correctly
print(train_df.shape)
print(test_df.shape)
print(val_df.shape)
print(df.shape)

print(df['type'].unique())

(16000, 3)
(2000, 3)
(2000, 3)
(20000, 3)
['train' 'test' 'val']


### 2. Text Preprocessing


In [150]:
# Removing stopwords using gensim 
from gensim.parsing.preprocessing import remove_stopwords
df['text_clean'] = df['text'].apply(remove_stopwords)
df.head(10)

Unnamed: 0,text,emotion,type,text_clean
0,i didnt feel humiliated,sadness,train,didnt feel humiliated
1,i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake,sadness,train,feeling hopeless damned hopeful cares awake
2,im grabbing a minute to post i feel greedy wrong,anger,train,im grabbing minute post feel greedy wrong
3,i am ever feeling nostalgic about the fireplace i will know that it is still on the property,love,train,feeling nostalgic fireplace know property
4,i am feeling grouchy,anger,train,feeling grouchy
5,ive been feeling a little burdened lately wasnt sure why that was,sadness,train,ive feeling little burdened lately wasnt sure
6,ive been taking or milligrams or times recommended amount and ive fallen asleep a lot faster but i also feel like so funny,surprise,train,ive taking milligrams times recommended ive fallen asleep lot faster feel like funny
7,i feel as confused about life as a teenager or as jaded as a year old man,fear,train,feel confused life teenager jaded year old man
8,i have been with petronas for years i feel that petronas has performed well and made a huge profit,joy,train,petronas years feel petronas performed huge profit
9,i feel romantic too,love,train,feel romantic
