# Shakespeare word frequency

- Make a Python string that contains the text of a Shakespeare play (obtained, for example, from Project Gutenberg)
- You can use requests and BeautifulSoup to get the text or you can read in the content from a file, but do not copy the entire play into a notebook cell
- Tokenize the words and remove stopwords
- Find the top 20 most frequent words in the play
- Comment on whether these words give an accurate sense of the play

In [37]:
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

In [26]:
# Read the file that contains the text of Romeo and Juliet
with open("romeo-and-juliet.txt", "r") as f:
    text = f.read()

## Tokenize the words and remove stopwords

In [27]:
# Tokenize the words
sent = sent_tokenize(text)
print(word_tokenize(sent[1]))

['ACT', 'I', 'Scene', 'I', '.']


In [29]:
words = []
for s in sent:
    for w in word_tokenize(s):
        words.append(w)

In [30]:
# remove stopwords
from nltk.corpus import stopwords
from string import punctuation
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /home/jovyan/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [52]:
print(punctuation)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


In [54]:
myStopWords = list(punctuation) + stopwords.words('english') + ['’']
wordsNoStop = [w for w in words if w not in myStopWords]
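
NLTK's English stop-word list is all lowercase, so the membership test above lets capitalized tokens such as "The" or "And" through. A minimal sketch of a case-insensitive filter follows. The helper name `remove_stopwords` and the short stand-in stop list are assumptions made so the example runs without any NLTK downloads; in the notebook the second argument would be `myStopWords`.

```python
from string import punctuation


def remove_stopwords(tokens, stop_words):
    """Drop tokens whose lowercase form is a stop word (hypothetical helper)."""
    stop_set = {s.lower() for s in stop_words}  # set lookup is faster than list lookup
    return [t for t in tokens if t.lower() not in stop_set]


# Small stand-in stop list, not the real NLTK list.
stop_words = list(punctuation) + ["the", "and", "but", "to", "i"]
tokens = ["The", "love", "And", "thou", "I", ",", "Romeo"]

filtered = remove_stopwords(tokens, stop_words)
print(filtered)
```

The original casing of the surviving tokens is kept, so "Romeo" and "ROMEO" would still be counted separately unless the tokens are also lowercased before counting.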

## Find top 20 most frequent words in play

In [55]:
# Find words frequency
wordsFreq = dict()
for w in wordsNoStop:
    if w not in wordsFreq:
        wordsFreq[w] = 0
    wordsFreq[w] += 1

In [57]:
# Get all the count frequencies and sort
values = list(wordsFreq.values())
values.sort(reverse=True)

top20 = values[:20]
top20words = [key for key, value in wordsFreq.items() if value in top20]
top20words

['ROMEO',
 'JULIET',
 'I',
 'A',
 'Romeo',
 'CAPULET',
 'NURSE',
 'The',
 'love',
 'And',
 'What',
 'shall',
 'But',
 'thou',
 'To',
 'That',
 'thee',
 'thy',
 'O',
 'Project']
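
The selection above keeps every word whose count equals one of the 20 largest counts and returns them in dictionary order, so the list is not ranked and ties could yield more than 20 words. One possible alternative, sketched here with `collections.Counter` on a made-up token list (`sample_words` stands in for `wordsNoStop`):

```python
from collections import Counter

# Made-up tokens standing in for wordsNoStop.
sample_words = ["love", "thou", "love", "thee", "thou", "love", "night"]

word_counts = Counter(sample_words)

# most_common(n) returns at most n (word, count) pairs, sorted by descending count.
top2 = word_counts.most_common(2)
print(top2)
```

In the notebook, `Counter(wordsNoStop).most_common(20)` would give a ranked top 20 together with the counts.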

**Comment on top 20 most frequent words:**

- These 20 most frequent words make sense: they include the characters' names, such as Romeo and Juliet, many archaic English words like "thee", "thy", and "thou", and other words related to the theme of the play, like "love".
- It also makes sense that words like "The", "What", "But", "shall", "That", "To", and "I" appear frequently, since they are common function words (pronouns, prepositions, conjunctions, and the like) that would appear in regular speech throughout the play.

# Yelp sentiments

- Find your favorite restaurant on Yelp and copy 15 of its reviews into your notebook as Python strings
- You don't have to use requests for this, you can just copy and paste from a browser
- Also note the numbers of stars for each review in your notebook
- Use Vader to find the polarity of each review
- Compare Vader's scores against user-specified numbers of stars

## 15 Reviews and Stars of Marugame Udon

In [31]:
review1 = '''
This places makes me want to move to LA. There's so many simple but delicious Asian restaurants, and this is one of them. You pick your flavor of udon and they hand it to you right there, and you move down the line to pick your choice of tempura.

I got the nikutama curry udon and my partner got the tonkotsu. I loved the curry udon but WOW the broth on the tonkotsu was so delicious. Would highly recommend the tonkotsu if you love bold flavors. I loved the beef and the egg on the nikutama though. The regular was more than enough for both of us, especially with the added tempura. The shrimp and potato croquette tempura were my favorite, and we also got the chicken katsu one that was really good as well.

I would definitely come back the next time I'm in LA.
'''
review2 = '''
A staple restaurant.

Marugame has been a go-to since my college days. "Conveyer" belt udon with tempura of your choice.

My favorite has always been the Mentai Kamatama ($8.25 for regular size). It's more of a dry udon dish with minimal sauce and pollock roe. The flavor is clean but still savory and memorable. The crunchy tempura crisps and green onion toppings add another level of umami to the dish.

In terms of the tempura, highly recommend the ebi (shrimp). The batter is light and you can still taste the sweetness of the shrimp. I also tried the potato croquette, squid, and chicken tempura. The shrimp is still the best followed by the chicken, potato, then squid.

That being said, Marugame Udon is definitely a restaurant you should try and come back to more than once.
'''
review3 = '''
This was my first time at Marugame in a while. Everywhere else had long wait times,  but Marugame had a short line and quick service when I went.

FLAVOR (3.5/5):
Kitsune Udon: The Kitsune Udon is a solid choice when going to Marugame. The tofu is sweet (a bit too sweet for my liking), but it's a light dish that still leaves you full at the end of the meal. I also got the potato croquette, and it had a sweet flavor. If I were to come back, I'd definitely want to try the Curry Udon.

In terms of sides, I wish there were more options besides the tempura or maybe just more types of tempura. But overall, I was satisfied.

TAKEAWAYS:
If you're looking for a quick place to eat dinner when the rest of Sawtelle is bustling, Marugame is a good spot to go to. Not my first pick, but a decent one.
'''
review4 = '''
As the biggest udon fan, I love Marugame! The line moves swift and you're able to get your udon super fast. It's so convenient to step into line, grab the tempura and croquettes you want, pay at the counter, and eat your meal to your heart's content at your own speed.

I ordered the Nikutama udon and the meat is so flavorful! Paired with the bouncy udon noodles, it makes for an amazing meal. Even though I got a small, I was so full by the end of it.

I also got the shrimp tempura and potato croquette and both were perfectly crispy.

Would come here again and again -- highly recommend if you're in Sawtelle!
'''
review5 = '''
Throughout my years of college, Marugame is probably the most frequented restaurant, not only on Sawtelle but just generally. This is because it's super convenient. Line moves fast, food comes quick, and is always a solid meal. In my many visits to Marugame, I usually get the Nikutama and I don't really get tired of it. I'm also always a bit worried about trying something new because I've never liked other udons as much as the Nikutama.
This time, I tried the curry nikutama. I am not sure why, but the flavor of the curry nikutama was just not what I was expecting. While it wasn't bad, I don't think I would get it again. Their menu is also not very vast, which can be a plus or not depending on what you like.

I wish I could give it 5 stars, but I have noticed the pricing increase a lot recently, which is kinda disappointing since we often go to Marugame expecting a cheaper meal. I paid ~$17 for the curry nikutama + a potato croquette.
'''
review6 = '''
I mean... you can't go wrong with a $5.5 hearty bowl of udon. For a regular size, it is FILLING and very affordable! With the tempura sides that you can pick to your heart's content, it's a very satisfying meal. The assembly style of ordering is quick, convenient, and practical. Very reminiscent of the udon fast-food chains in Japan.

I've been here a number of times, and even when the line's long, it goes by relatively quickly. On a Thursday around 7:30PM, there was virtually no wait.

If it's your first time here, you can't go wrong with a simple Kake (dashi broth). I added their new Gekikara Fireball for 0.50 cents, and it was worth if you like spice. I only added half and it was more than enough kick.

I don't recommend the curry imo, as I remember it being lackluster in flavor, and salty if anything.

My personal favorite tempura add-ons; a bit pricey, albeit a must:  
- Shrimp ($2.25) - a staple!
- Squid ($2.25) - crunchy on the outside, and juicy and chewy on the inside.
- Sweet Potato ($1.95) - every so slightly sweet to cut the savory.
- Potato Croquette ($2.25) - thiccc. Think Japanese french fry LOL

Despite being self-service - shout-out to the busboys who clear the tables so quickly!!

I look forward to trying their Kitsune (sweet fried tofu) one day and for when they bring back the self-service toppings station especially the tempura bits!
'''
review7 = '''
This place was great! I came in on a Sunday at 4:30pm and there was no line and plenty of seating. (When we left around 5:30 the line was out the door).  The way it was set up was similar to a cafeteria line. I ordered the tonkatsu with all the fixings ~green onion and tempura flakes~ along with some tempura mushrooms and shrimp skewers. I also got the ginger berry oolong tea what they make in house.

I liked how they had a little bar area for chili, ginger, tempura sauce and water as well as a self serve to-go box/bag area. Since there wasn't any table service, you sat yourselves and brought your finished meals to the cleaning area as well. The bathrooms were also very clean.

Moving onto the food, the noodles were very chewy and cooked perfectly. I enjoyed the broth from the tonkatsu since it was really rich to where the noodles soaked up all that flavor. I especially liked the mushroom tempura since they were very juicy. One thing I wish they had was soy sauce since I like my tempura with soy sauce since it has a stronger flavor than the tempura sauce offered, but that is just a personal preference. Overall this place was very good, probably the next udon place I've been to yet.
'''
review8 = '''
I love the nikutama udon here! I tried the curry one and I still prefer the original nikutama. They do require proof of vaccination if you're dining in or outside in the patio.

Their udon is always so chewy and yummy! My fav part is the meat and I'm full off a regular bowl!! They also have tempura if you're in the mood for some of that. It's cafeteria style and the line typically goes by fast if theyre busy.
'''
review9 = '''
I am from OC, and before they opened one in costa mesa, I would come here an unhealthy amount of times. Like everytime I planned a trip to LA, I would make sure I came to Marugame Udon for some delicious beef udon and tempura. I dont know what else to say, but they have the best udon I have ever tried. The egg is literally to die for and the way they crack it in your bowl and you see it just floating on top of your bowl is crazy to me. They put a good amount of meat in your bowl and I just inhale my meal everytime i eat here. You have to add green onions, tempura, and chilli powder to your udon because those toppings are what makes it more flavorful and savory.

For parking, you can park in the structure, it is super cheap if you only come to eat for a bit, I usually pay around $2 or something, so its not one of those super expensive LA parking structures. They have more locations now so it is not just the one in Sawtelle. I am so glad there's one in OC and I dont have to drive an hour to get some delicious udon!
'''
review10 = '''
Noodles... IYKYK... noodles are my THING.  And these were some of the very best.  I hate to even compare to my local Udon spot, but this place killed it.  The noodles were incredibly fresh (tender with the perfect bite!)  and the broth was on point. The regular Kake Udon was the best example of traditional Udon Ive had in a minute! I got the BK udon and added a spring egg- just fantastic.   They constantly making tempura and so the tempura flakes are fresh and really crispy. I truly almost went  back for lunch today- twice in 2 days?? THATS HOW GOOD IT WAS.

Currently requiring vaxx proof for dine in, and offering togo containers for people without.  Cafeteria style ordering.    Really have all the utensils, lids, containers readily accessible- so nice so we could take our leftovers back to the hotel.
'''
review11 = '''
Come here for comforting, authentic udon and more deep fried sides you can shake your chopsticks at. After one bad experience the noodles are always chewy and boiled the right amount. It's hard to choose which bowl to get but a L is just $1 more than the Regular size.

Love the curry udon although the curry is thick like gravy. Even if there's a long line out the door, the wait isn't that long. It's always clean and they have outdoor seating. One request though, lower tables as the booth like seats are too low for me.

For Covid precautions they ask you while ordering if you want chopped scallions and fried batter bits but the toppings station still has water, soy sauce and hot sauce.
'''
review12 = '''
I've past by this place so many times and saw the long lines. I finally decided to try this place out. It turns out that its not a restuarant, but kind of like a convenience store style shop. I got a tonkotsu ramen.

The broth was very flavorful and the udon noodles were chewy and springy. I loved the addition of the tempura flakes, spicy meat, and green onions. They gave the broth a nice kick and the flakes added texture. The slice of meat was soft and slightly fatty, which is how I like it. My only complaint was that the egg was overly cooked.

Overall, the food was delicious and not too expensive. Seating was a bit difficult since the place is generally packed.
'''
review13 = '''
I don't have udon very often... but Marguame is one of the best I've had. The food was definitely worth all 5 stars, but that line... There's gotta be a faster or more efficient way to tame that line. Although we did not wait over an hour, I've heard horror stories of such waits. I mean the food is worth checking out, but the long lines may deter people from trying. It did move quickly, but I can see it getting crazyyy.

It is a cafeteria type of restaurant, soups/bowls first then the sides. Which I think they should switch up... Cause once I sat down the soup wasn't as hot. The sides included: shrimp, squid, chicken & veggies ALL dipped in tempura batter. They did have a chicken katsu type which I didn't get to try. For my bowl I got the Nikatuma which included beef, egg & BK sauce (couldn't tell ya what this is haha) [$10.95- Regular; $12.50-Large] Believe me, that udon was sooo good & filling, I could not finish my regular bowl! The meat reminded me of Yoshina meat- which was always on point- IYKYK! We tried the shrimp, squid & chicken tempura sides and it was also really good, nothing too special about it.

If Udon isn't your thing, they do have chicken or beef rice bowls which I heard were really great too. I'll have to try that next time. Yes, there will be a next time, just hoping for a shorter line! Lots of seating inside & a few tables on their patio. As of writing this review, MASKS are required inside & PROOF OF VAX to enter will be needed.
'''
review14 = '''
I love Marugame! I've been here multiple times now and the food is sooo good. It starts off cafeteria style. You basically pick what noodles you want, move on to picking out tempura (if you want any), then drinks. You pay and then pick up condiments and cutlery after. One of the reasons why I love Marugame is you get your food instantly.

I always get the curry udon. The noodles are fresh and springy. For dinner, make sure you get there around 5:30-6pm which is before the dinner rush. After and you will have to wait in a long line. (But the line does move quickly)
'''
review15 = '''
We ordered the Tonkatsu and the Curry Nikutama. Both great flavors, fresh handmade udon noodles. Loved the Tonkatsu broth!!! My only complaint is that the soft boiled egg in the Tonkatsu bowl was hard boiled. (I'm used to soft boiled eggs but maybe it's supposed to be hard boiled?) There wasn't a long line like the ones in Honolulu. We got in very quickly.
'''
review16 = '''
Cafeteria style of ordering. You wait in line and get a tray. Order your soup and along the way check out the tempura section for any pieces you might want before you pay.

With COVID, there's no longer a self serve of green onions, tempura, ginger, etc. You have to request for it.

For  reasonably price udon, it hits the spot for fast casual.

The only thing that can give you the most trouble is the parking. In Sawtelle area, you hit gold to find parking since everything is nearly street parking.
'''

REVIEWS = [
    review1,
    review2,
    review3,
    review4,
    review6,
    review7,
    review8,
    review9,
    review10,
    review11,
    review12,
    review13,
    review14,
    review15,
    review16
]

In [32]:
STARS = { 
    1: 5,
    2: 4,
    3: 4,
    4: 4,
    5: 4,
    6: 5,
    7: 5,
    8: 5,
    9: 5,
    10: 5,
    11: 4,
    12: 5,
    13: 4,
    14: 5,
    15: 4,
    16: 4
}
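
Note that `REVIEWS` holds 15 of the 16 strings (`review5` is not in the list) while `STARS` is keyed 1 to 16, so a running counter into `STARS` can drift away from the review actually being scored. A minimal sketch of keeping each review tied to its rating is shown below; the short texts, the ratings, and the names `reviews_by_id`, `stars_by_id`, and `selected_ids` are placeholders, not the real data.

```python
# Placeholder texts and ratings, standing in for review1..review16 and STARS.
reviews_by_id = {
    1: "Loved the curry udon.",
    2: "A staple restaurant.",
    3: "A decent pick, not my first.",
}
stars_by_id = {1: 5, 2: 4, 3: 3}

# Look up text and stars by the same id, so skipping a review cannot shift the ratings.
selected_ids = [1, 3]  # review 2 deliberately left out
records = [(rid, reviews_by_id[rid], stars_by_id[rid]) for rid in selected_ids]

for rid, text, stars in records:
    print(rid, stars, text)
```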

## Using Vader to find Polarity and compare with stars specified

In [38]:
from nltk.sentiment import vader
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [39]:
sia = vader.SentimentIntensityAnalyzer()

In [40]:
print(len(REVIEWS))

15


In [41]:
i = 1
for review in REVIEWS:
    print("Vader's polarity score for review", i)
    print(sia.polarity_scores(review))
    print("User specified stars:", STARS[i])
    print("--------------------------------------")
    i += 1

Vader's polarity score for review 1
{'neg': 0.0, 'neu': 0.68, 'pos': 0.32, 'compound': 0.9965}
User specified stars: 5
--------------------------------------
Vader's polarity score for review 2
{'neg': 0.0, 'neu': 0.855, 'pos': 0.145, 'compound': 0.969}
User specified stars: 4
--------------------------------------
Vader's polarity score for review 3
{'neg': 0.0, 'neu': 0.816, 'pos': 0.184, 'compound': 0.9834}
User specified stars: 4
--------------------------------------
Vader's polarity score for review 4
{'neg': 0.011, 'neu': 0.792, 'pos': 0.197, 'compound': 0.9744}
User specified stars: 4
--------------------------------------
Vader's polarity score for review 5
{'neg': 0.025, 'neu': 0.85, 'pos': 0.125, 'compound': 0.9727}
User specified stars: 4
--------------------------------------
Vader's polarity score for review 6
{'neg': 0.007, 'neu': 0.866, 'pos': 0.127, 'compound': 0.9683}
User specified stars: 5
--------------------------------------
Vader's polarity score for review 7
{'

**Comment on Polarity comparison:**
- Most 5-star reviews were scored by Vader as mostly neutral, with the remainder positive and little or no negative content.
- The 4-star reviews, on the other hand, have higher negative scores and lower positive scores, but still remain mostly in the neutral range.
- Overall, the neutral proportion dominates Vader's output, and the negative and positive proportions are rarely overwhelming, even though the compound scores in the output shown are all above 0.96.
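
One way to make the comparison with the star ratings more concrete is to average the compound score per star level. The sketch below uses only the six complete entries printed above, paired with the stars exactly as printed; the grouping itself is an illustration, not part of Vader.

```python
from collections import defaultdict
from statistics import mean

# (compound, stars) pairs copied from the six complete entries in the output above.
printed = [
    (0.9965, 5), (0.969, 4), (0.9834, 4),
    (0.9744, 4), (0.9727, 4), (0.9683, 5),
]

# Collect compound scores under their star rating.
by_stars = defaultdict(list)
for compound, stars in printed:
    by_stars[stars].append(compound)

mean_compound = {stars: mean(values) for stars, values in by_stars.items()}
print(mean_compound)
```

With so few entries the two averages are very close, which fits the observation that the compound score barely separates 4-star from 5-star reviews here.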

# Movie reviews

- Make 5 strings that contain reviews (3 sentences each) of your favorite movie comedies
- Make 5 strings that contain reviews (3 sentences each) of your favorite movie dramas
- Make a Python list that contains these 10 strings
- Replicate the analysis pipeline from "04_news_topics.ipynb"
- You don't have to open any files
- Instead of using "listOfNews", use your list of movie reviews
- Modify the characters in "extrastop" if you want to
- For the LDA model step, use "num_topics = 2"
- Comment on the words that the model chooses to represent the 2 topics, and whether they match with your split between comedies and dramas

## Strings of Reviews for Mean Girls and West Side Story

In [1]:
com1 = '''
I give this a five star i absolutely love this movie, it is my favorite thing to watch i literally watch it EVERY DAY WHEN I GET THE CHANCE!! I absolutely love it. Its the best thing ever And if you are reading this look at my profile picture on the top. ITS LITERALLY REGINA GEORGE! That is literally how much i love the movie.
'''

com2 = '''
My favorite character is Regina George, although she is nice in front of your face she will talk behind your back, and honestly, I think that is a definition of a mean girl. They did a really great job with giving the characters a whole lot of personality and sort of showing why they act the way they do. I still wish to this day that there was a sequel, even though this was so long ago.
'''

com3 = '''
I absolutely love this movie! I am in middle school and at sleepovers this is one of our go-to movies. It has the drama, the comedy, and overall entertainment. 
'''

com4 = '''
The characters, for one, are really interesting and all the actors and actresses did such great performances. The story was also interesting and I never  lost interest in it. The comedy is spot on and first going into it I did not think I'd be laughing so hard.
'''

com5 = '''
one of my  fave movies ever. very quotable and realistic. i still watch it when im free. its a classic and an unforgettable movie.
'''

In [2]:
drama1 = '''
If you are thinking of watching this movie, grab the tissues. There are so many plot twists!!
I know this film is obviously based off of the original Shakespearean play, Romeo and Juliet.
'''

drama2 = '''
The previous film version was more of the stage version on the screen. Filmed on sets. By bringing it to the street of New York City, it enables this film to use the city as a character and accurately represent the plight of immigrants in the 50s and that still resonates to this day.
'''

drama3 = '''
And yet, as revered as that 1961 adaptation is, it is not without its faults, notably the casting of white actors as Latinx characters and a pair of romantic leads (Natalie Wood and Richard Beymer) who don’t sing and are arguably the weakest members of an otherwise ace cast. Adam and Josh make the case that with his thrilling new adaptation, Spielberg more than answers the why, without necessarily fixing all of the earlier film’s weaknesses. Plus a review of Sean Baker’s latest, RED ROCKET, which has Simon Rex’s former adult film star making an ignominious return to his Texas hometown.
'''

drama4 = '''
Spielberg does the impossible here (make a great new film adaptation of a classic stage musical that already had a classic film adaptation) and nearly all of the changes/ updates work. Hell even some of the new stagings are even better here ("Cool" and "I Feel Pretty"). Apparently Spielberg's old school craftsmanship is exactly what the movie musical needed.
'''

drama5 = '''
Having seen the original moving several times and having heard the original B'way recording, seeing this version was like seeing an entirely new fantastic movie. Including the redevelopment of the actual neighborhood area as the backdrop for the movie gave it gravitas. Thank you to all involved for giving us a wonderful movie.
'''

## Pipeline for LDA model

In [42]:
listOfReviews = [com1, com2, com3, com4, com5, drama1, drama2, drama3, drama4, drama5]
print(len(listOfReviews))

10


In [5]:
import pandas as pd
from pathlib import Path  
import glob

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from string import punctuation

In [6]:
extrastop = ['``', "''", "'re", "'s", "'ll", "--", "...",
             "n't", 'one', 'would', 'use', 'subject', 'from', "'m", "'ve"]

In [7]:
myStopWords = list(punctuation) + stopwords.words('english') + extrastop

In [8]:
[w for w in word_tokenize(listOfReviews[0].lower()) if w not in myStopWords]

['give',
 'five',
 'star',
 'absolutely',
 'love',
 'movie',
 'favorite',
 'thing',
 'watch',
 'literally',
 'watch',
 'every',
 'day',
 'get',
 'chance',
 'absolutely',
 'love',
 'best',
 'thing',
 'ever',
 'reading',
 'look',
 'profile',
 'picture',
 'top',
 'literally',
 'regina',
 'george',
 'literally',
 'much',
 'love',
 'movie']

In [9]:
listOfReviewsWords = []
for i in listOfReviews:
    listOfReviewsWords.append([w for w in word_tokenize(i.lower()) if w not in myStopWords])

In [10]:
listOfReviewsWords[0]

['give',
 'five',
 'star',
 'absolutely',
 'love',
 'movie',
 'favorite',
 'thing',
 'watch',
 'literally',
 'watch',
 'every',
 'day',
 'get',
 'chance',
 'absolutely',
 'love',
 'best',
 'thing',
 'ever',
 'reading',
 'look',
 'profile',
 'picture',
 'top',
 'literally',
 'regina',
 'george',
 'literally',
 'much',
 'love',
 'movie']

In [11]:
from nltk.stem.porter import PorterStemmer
#from nltk.stem import LancasterStemmer

In [12]:
p_stemmer = PorterStemmer()

In [13]:
listOfStemmedWords = []
for i in listOfReviewsWords:
    listOfStemmedWords.append([p_stemmer.stem(w) for w in i])

In [14]:
listOfStemmedWords[0]

['give',
 'five',
 'star',
 'absolut',
 'love',
 'movi',
 'favorit',
 'thing',
 'watch',
 'liter',
 'watch',
 'everi',
 'day',
 'get',
 'chanc',
 'absolut',
 'love',
 'best',
 'thing',
 'ever',
 'read',
 'look',
 'profil',
 'pictur',
 'top',
 'liter',
 'regina',
 'georg',
 'liter',
 'much',
 'love',
 'movi']

In [15]:
!pip install gensim

Collecting gensim
  Downloading gensim-4.1.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.0 MB)
Collecting smart-open>=1.8.1
  Downloading smart_open-5.2.1-py3-none-any.whl (58 kB)
Installing collected packages: smart-open, gensim
Successfully installed gensim-4.1.2 smart-open-5.2.1


In [16]:
from gensim import corpora, models
import gensim

In [17]:
dictionary = corpora.Dictionary(listOfStemmedWords)

In [18]:
print(dictionary.token2id)

{'absolut': 0, 'best': 1, 'chanc': 2, 'day': 3, 'ever': 4, 'everi': 5, 'favorit': 6, 'five': 7, 'georg': 8, 'get': 9, 'give': 10, 'liter': 11, 'look': 12, 'love': 13, 'movi': 14, 'much': 15, 'pictur': 16, 'profil': 17, 'read': 18, 'regina': 19, 'star': 20, 'thing': 21, 'top': 22, 'watch': 23, 'act': 24, 'ago': 25, 'although': 26, 'back': 27, 'behind': 28, 'charact': 29, 'definit': 30, 'even': 31, 'face': 32, 'front': 33, 'girl': 34, 'great': 35, 'honestli': 36, 'job': 37, 'long': 38, 'lot': 39, 'mean': 40, 'nice': 41, 'person': 42, 'realli': 43, 'sequel': 44, 'show': 45, 'sort': 46, 'still': 47, 'talk': 48, 'think': 49, 'though': 50, 'way': 51, 'whole': 52, 'wish': 53, 'comedi': 54, 'drama': 55, 'entertain': 56, 'go-to': 57, 'middl': 58, 'overal': 59, 'school': 60, 'sleepov': 61, "'d": 62, 'actor': 63, 'actress': 64, 'also': 65, 'first': 66, 'go': 67, 'hard': 68, 'interest': 69, 'laugh': 70, 'lost': 71, 'never': 72, 'perform': 73, 'spot': 74, 'stori': 75, 'classic': 76, 'fave': 77, 'fr

In [19]:
corpus = [dictionary.doc2bow(text) for text in listOfStemmedWords]

In [44]:
print(len(corpus))

10


In [45]:
print(len(dictionary))

204


In [24]:
ldamodel = gensim.models.ldamodel.LdaModel(corpus, 
                                           num_topics=2, 
                                           id2word = dictionary, 
                                           passes=20)

In [47]:
for i in ldamodel.print_topics(num_topics=2, num_words=20):
    print(i)

(0, '0.020*"’" + 0.019*"movi" + 0.016*"charact" + 0.011*"make" + 0.011*"adapt" + 0.011*"film" + 0.011*"still" + 0.011*"without" + 0.011*"cast" + 0.007*"actor" + 0.007*"realli" + 0.007*"star" + 0.007*"spielberg" + 0.007*"regina" + 0.007*"favorit" + 0.007*"georg" + 0.007*"think" + 0.007*"answer" + 0.007*"ace" + 0.007*"beymer"')
(1, '0.029*"movi" + 0.024*"film" + 0.017*"new" + 0.013*"love" + 0.013*"watch" + 0.013*"origin" + 0.013*"liter" + 0.013*"version" + 0.013*"stage" + 0.013*"interest" + 0.010*"absolut" + 0.010*"classic" + 0.009*"give" + 0.009*"even" + 0.009*"day" + 0.009*"great" + 0.009*"see" + 0.009*"thing" + 0.009*"music" + 0.009*"think"')


**Comment on the model:**
- The model's split does not represent my comedy/drama division well, as both topics contain words from both the comedy and the drama reviews.
- For example, the first topic contains both "spielberg" and "regina", which come from the two different movies that I selected.
- This may be because Mean Girls is somewhat of a comedy-drama rather than purely one genre, so the themes could be mixed.
- The split makes more sense when read as words about the characters, director, and cast of the movies versus words about the origin, story, and production of the movies.
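
To check how much the two topics overlap, the top words can be pulled out of the printed topic strings and intersected. This is a rough sketch: the two strings are copied from the `print_topics` output above, and the helper name `top_words` is an assumption.

```python
import re

# Topic strings copied from the print_topics output above.
topic0 = ('0.020*"’" + 0.019*"movi" + 0.016*"charact" + 0.011*"make" + 0.011*"adapt" + '
          '0.011*"film" + 0.011*"still" + 0.011*"without" + 0.011*"cast" + 0.007*"actor" + '
          '0.007*"realli" + 0.007*"star" + 0.007*"spielberg" + 0.007*"regina" + '
          '0.007*"favorit" + 0.007*"georg" + 0.007*"think" + 0.007*"answer" + 0.007*"ace" + '
          '0.007*"beymer"')
topic1 = ('0.029*"movi" + 0.024*"film" + 0.017*"new" + 0.013*"love" + 0.013*"watch" + '
          '0.013*"origin" + 0.013*"liter" + 0.013*"version" + 0.013*"stage" + '
          '0.013*"interest" + 0.010*"absolut" + 0.010*"classic" + 0.009*"give" + '
          '0.009*"even" + 0.009*"day" + 0.009*"great" + 0.009*"see" + 0.009*"thing" + '
          '0.009*"music" + 0.009*"think"')


def top_words(topic_string):
    """Return the quoted words from a print_topics string (hypothetical helper)."""
    return re.findall(r'"([^"]+)"', topic_string)


# Words that appear among the top 20 of both topics.
shared = set(top_words(topic0)) & set(top_words(topic1))
print(sorted(shared))
```

The shared words are generic review vocabulary rather than genre-specific terms, and the stray "’" token in the first topic suggests that adding it to `extrastop` could be worth trying.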