Assignment 1 - analysis of movie reviews
===

*Due: November 29 2022*

In this assignment you will use web scraping tools to extract the reviews of two movies, Dune and Interstellar, from Rotten Tomatoes.

You will store each review as a text string in a dataset that you save as a CSV file.
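
As a minimal sketch of the saving step, assuming each review is a dict with at least a `quote` field: the helper names `save_reviews` and `load_reviews`, the `movie` column, and the choice of columns are illustrative, not part of the assignment. The sketch uses the standard-library `csv` module; in a notebook that already holds a pandas DataFrame, `DataFrame.to_csv` does the same job.

```python
import csv

def save_reviews(rows, path, fieldnames=("movie", "quote")):
    """Write review dicts to a CSV file, one review per row (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops any keys that are not listed in fieldnames
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

def load_reviews(path):
    """Read the CSV back into a list of dicts to check what was saved."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Opening the file with `newline=""` lets the `csv` module handle line endings itself, and quotes that contain commas or quotation marks are escaped automatically.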

Once you have the dataset of Dune and Interstellar reviews, produce the following for each movie:

- Wordclouds

- Word frequency bar plots of the 20 most frequent words, with words on the x-axis and frequency on the y-axis

- Sentiment scores for each movie using AFINN
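
The counting and scoring behind the last two items can be sketched with the standard library alone. This is a rough illustration under stated assumptions: `STOP_WORDS` and `TOY_LEXICON` are tiny placeholders with illustrative values, not the real lists, and the function names are hypothetical. A real analysis would load a full stop-word list and the complete AFINN lexicon (for example through the `afinn` Python package).

```python
import re
from collections import Counter

# Toy placeholders, NOT the real lists: swap in a full stop-word list
# and the complete AFINN lexicon for actual analysis.
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "it", "to", "in"}
TOY_LEXICON = {"good": 3, "bad": -3, "boring": -3}  # illustrative values

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(texts, n=20):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    counts = Counter(
        word
        for text in texts
        for word in tokenize(text)
        if word not in STOP_WORDS
    )
    return counts.most_common(n)

def sentiment_score(text, lexicon=TOY_LEXICON):
    """AFINN-style score: the sum of the valence of every known word."""
    return sum(lexicon.get(word, 0) for word in tokenize(text))
```

The `(word, count)` pairs returned by `top_words` map directly onto a bar plot: the words become the x-axis categories and the counts the bar heights.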

## Define function to download reviews

Code taken from [this Stack Overflow question](https://stackoverflow.com/questions/69963743/scraping-all-reviews-of-a-movie-from-rotten-tomato-using-soup):

In [8]:
import pandas as pd
import requests
import re
import time

headers = {
    'Referer': 'https://www.rottentomatoes.com/m/notebook/reviews?type=user',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

s = requests.Session()
        
def get_reviews(url):
    # Fetch the reviews page and extract the internal movie ID from the embedded JSON
    r = requests.get(url)
    movie_id = re.findall(r'(?<=movieId":")(.*)(?=","type)', r.text)[0]

    # For user reviews, use reviews/user instead of criticsReviews/all
    api_url = f"https://www.rottentomatoes.com/napi/movie/{movie_id}/criticsReviews/all"

    # Cursor parameters; empty values request the first page
    payload = {
        'direction': 'next',
        'endCursor': '',
        'startCursor': '',
    }

    review_data = []

    while True:
        r = s.get(api_url, headers=headers, params=payload)
        data = r.json()

        # Collect this page before checking for more, so the last page is not dropped
        review_data.extend(data['reviews'])

        if not data['pageInfo']['hasNextPage']:
            break

        # Move the cursors forward to request the next page
        payload['endCursor'] = data['pageInfo']['endCursor']
        payload['startCursor'] = data['pageInfo'].get('startCursor') or ''

        time.sleep(1)  # pause between requests

    return review_data
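
The cursor-based paging pattern in `get_reviews` can be tried offline. The following is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical stand-in for the HTTP call, and `FAKE_PAGES` mimics only the `reviews` and `pageInfo` fields, not the real Rotten Tomatoes response. It collects each page before checking `hasNextPage`, so the final page is kept.

```python
# Hypothetical two-page response, keyed by the cursor that requests each page
FAKE_PAGES = {
    "": {"reviews": ["r1", "r2"],
         "pageInfo": {"hasNextPage": True, "endCursor": "c1"}},
    "c1": {"reviews": ["r3"],
           "pageInfo": {"hasNextPage": False}},
}

def fetch_page(end_cursor):
    """Stand-in for the API request: return the page that follows end_cursor."""
    return FAKE_PAGES[end_cursor]

def collect_all(fetch=fetch_page):
    """Follow endCursor from page to page and gather every review."""
    items, cursor = [], ""
    while True:
        data = fetch(cursor)
        items.extend(data["reviews"])      # keep this page, including the last one
        page_info = data["pageInfo"]
        if not page_info["hasNextPage"]:   # stop once no further page exists
            break
        cursor = page_info["endCursor"]    # advance to the next page
    return items
```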

## Download reviews

In [12]:
movies = [
    'https://www.rottentomatoes.com/m/interstellar_2014', 
    'https://www.rottentomatoes.com/m/dune_2021'
]

results = []

for movie in movies:
    data = get_reviews(movie + '/reviews')
    df = pd.json_normalize(data)
    results.append(df)

Interstellar

In [15]:
results[0].head()

Unnamed: 0,creationDate,isFresh,isRotten,isRtUrl,isTop,reviewUrl,quote,reviewId,scoreOri,scoreSentiment,critic.name,critic.criticPictureUrl,critic.vanity,publication.id,publication.name
0,"Aug 22, 2022",True,False,False,False,https://keithandthemovies.com/2014/11/22/revie...,It&#8217;s a contemplative adventure and an em...,102722313,5/5,POSITIVE,Keith Garlington,http://resizing.flixster.com/aM3SRz1K2wrW0GhcQ...,keith-garlington,100009656,Keith & the Movies
1,"Jun 30, 2022",True,False,False,False,https://deepfocusreview.com/reviews/interstellar/,Rarely do epics of this scope and intelligence...,102705084,4/4,POSITIVE,Brian Eggert,http://resizing.flixster.com/ARlILdRJ3N3i_mqLb...,brian-eggert,100009573,Deep Focus Review
2,"May 27, 2022",True,False,False,False,https://www.nextbestpicture.com/interstellar.html,While not all-together perfect&#44; the film r...,102694093,9/10,POSITIVE,Josh Parham,http://resizing.flixster.com/ewXutVxLpPH4_isfe...,josh-parham,3046,Next Best Picture
3,"Feb 11, 2022",True,False,False,False,https://www.williamsonhomepage.com/community/w...,"Nolans ambition and talent, coupled with argua...",102654899,,POSITIVE,Cory Woodroof,http://resizing.flixster.com/LQEvKNkq51uGw52gK...,cory-woodroof,100009561,Williamson Home Page
4,"Oct 9, 2021",True,False,False,False,https://www.nerdophiles.com/2014/11/05/interst...,"The inherent message of the film brings hope, ...",2830324,3/5,POSITIVE,Therese Lacson,http://resizing.flixster.com/gqCoLiAqOxsvfF8bA...,therese-lacson,3888,Nerdophiles
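
The `quote` column in this output still contains HTML character references such as `&#8217;` and `&#44;`. One way to decode them before counting words is the standard-library `html.unescape`; the helper name `clean_quote` is illustrative.

```python
import html

def clean_quote(text):
    """Decode HTML character references (e.g. &#44; becomes a comma) and trim whitespace."""
    return html.unescape(text).strip()
```

With a pandas DataFrame, the helper can be applied to the whole column with `df['quote'].map(clean_quote)`.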


Dune

In [16]:
results[1].head()

