## Scraping The Prestige Reviews

- Web Automation: Selenium
- Scraping Tags: Beautiful Soup
- Language Used: Python
- Other Libraries:
    - Python Time Module
    - NLTK for analysis


In [23]:
# Importing the required libraries
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import (
    ElementNotInteractableException,
    ElementNotVisibleException,
    NoSuchElementException,
)
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def download_reviews(soup, filename="reviews.txt"):
    '''Append every user review found in the page to a text file.

    soup: bs4.BeautifulSoup object for the fully expanded reviews page
    filename: path of the output file
    '''
    # Each review body sits in a div with these two classes
    reviews = soup.find_all("div", class_="text show-more__control")

    # Open the file once; mode "a" appends, so re-running adds duplicates
    with open(filename, "a", encoding="utf-8") as f:
        for review in reviews:
            f.write(review.text)
            f.write("\n\n")

    print(f"Reviews were successfully downloaded in {filename}")


def get_reviews(url, filename):
    # Selenium 4 takes the chromedriver path through a Service object
    service = Service(executable_path=r"C:\Users\shrut\Downloads\chromedriver.exe")
    driver = webdriver.Chrome(service=service)

    try:
        driver.get(url)

        while True:
            try:
                # Getting the "load more" button by its id on the IMDb page
                loadmore = driver.find_element(By.ID, "load-more-trigger")

                # Wait until the toaster message disappears
                time.sleep(2)
                loadmore.click()

                # Wait for the next batch of reviews to load
                time.sleep(2)

            except (NoSuchElementException,
                    ElementNotVisibleException,
                    ElementNotInteractableException):
                # The button is gone or hidden once every review is loaded
                print("Reached bottom of page")
                break

        soup = BeautifulSoup(driver.page_source, 'html.parser')
    finally:
        # Always close the browser, even if scraping fails
        driver.quit()

    download_reviews(soup, filename)

get_reviews("https://www.imdb.com/title/tt0482571/reviews","reviews.txt")

Reached bottom of page
Reviews were successfully downloaded in reviews.txt


### Reading the Reviews

In [7]:
# Reading the Reviews for basic analysis
with open("reviews.txt","r",encoding='utf-8') as f:
    r=f.read()
type(r)

str
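
Because `download_reviews` writes a blank line after every review, the single string `r` can be cut back into individual reviews when per-review analysis is needed. The sketch below is a minimal illustration and not part of the original notebook: `split_reviews` is a hypothetical helper, and it assumes that no review body contains a blank line of its own, which may not hold for every review.

```python
def split_reviews(text):
    """Split the contents of reviews.txt into a list of individual reviews.

    Assumes reviews are separated by blank lines, as written by
    download_reviews, and that no review contains a blank line itself.
    """
    chunks = text.split("\n\n")
    # Drop empty chunks and strip whitespace left around the separators.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Made-up sample in the same layout as reviews.txt
sample = "Great movie with great actors.\n\nNolan is a genius.\n\n"
reviews = split_reviews(sample)
print(len(reviews), "reviews found")
```

In the notebook, `split_reviews(r)` would give a list that can be looped over, for example to count how many reviews mention a given word.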

### What can we do with what we collected?

Basic demonstration

In [10]:
import nltk
# nltk.download('stopwords')  # run once to fetch the stop-word list

In [8]:
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

 
# Fetching Stop Words
stop_words = set(stopwords.words('english')) 

tokenizer = RegexpTokenizer(r'\w+')
word_tokens=tokenizer.tokenize(r)

# Filtering the reviews to remove unnecessary stop words
filtered_words = [w for w in word_tokens if w.lower() not in stop_words]
print(filtered_words)




In [11]:
from nltk.probability import FreqDist
fdist = FreqDist(filtered_words)
print(fdist)

<FreqDist with 10985 samples and 95735 outcomes>


**The most common keywords include magic, trick, good, and the like. That is expected, since the movie revolves around two magicians, Borden and Angier. Another frequent word is Nolan, the director of the film. Because Nolan is mentioned in so many reviews, we next pull out the lines that contain his name.**

In [14]:
# Getting the 15 most common words
fdist.most_common(15)

[('movie', 2171),
 ('film', 1376),
 ('one', 957),
 ('Nolan', 722),
 ('Borden', 639),
 ('Angier', 630),
 ('Jackman', 623),
 ('Bale', 609),
 ('story', 576),
 ('magic', 575),
 ('time', 566),
 ('like', 532),
 ('trick', 500),
 ('good', 500),
 ('Prestige', 489)]
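
The counts above are case-sensitive: the stop-word filter lowercases a token only for the comparison, so `Nolan` and `nolan` are tallied separately. The following sketch shows one way to merge such variants by lowercasing tokens before counting. It uses `collections.Counter` from the standard library, which `FreqDist` is built on, and a tiny hand-written stop-word set in place of NLTK's list. The sample text and the names `count_keywords` and `SAMPLE_STOP_WORDS` are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; the notebook uses NLTK's English list instead.
SAMPLE_STOP_WORDS = {"the", "is", "a", "of", "and", "by"}


def count_keywords(text, stop_words=SAMPLE_STOP_WORDS):
    """Count lowercase word tokens, skipping stop words."""
    # r"\w+" is the same pattern the notebook passes to RegexpTokenizer.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


sample = "Nolan directed the movie. The movie is a trick by nolan."
counts = count_keywords(sample)

print(counts.most_common(2))
print(len(counts), "distinct words,", sum(counts.values()), "tokens in total")
```

The last line mirrors the `samples` and `outcomes` figures that `print(fdist)` reports: distinct words versus total tokens.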

In [15]:
t = nltk.tokenize.WhitespaceTokenizer()  # or any other Tokenizer
c = nltk.Text(t.tokenize(r))

In [16]:
# Finding instances that contain the word "movie"
for i in c.concordance_list(u"movie"):
    print(i.line,"\n")

Great movie with Great actors writers and twists 

tige are performed. Similarly, for a movie to be a success, its three main aspe 

lls aggrandize the brilliance of the movie ten-fold. Nolan succeeds in having a 

yal as Borden's paramour, Olivia.The movie is a roller-coaster of a ride with i 

sy, is truly exemplary and makes the movie a contemporary classic. The movie is 

he movie a contemporary classic. The movie is a tapestry of twists and turns, w 

scussed it all the way home from the movie theater. This is a winner and should 

better plot. I highly recommend this movie to all ages.New idea for a movie, an 

 a magician on TV!i had to give this movie a 10/10, and i only have 4 movies th 

it as much as i did. What makes this movie so incredible is that while it is in 

redible is that while it is indeed a movie about magicians (or illusionists) it 

ly stays with you. This was the best movie I have seen in at least the past two 

 spent doing better things. Yet this movie was 
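
`concordance_list` returns each occurrence of a word together with a fixed-width window of surrounding text, which is why the lines above start and end mid-word. The idea can be sketched with the standard library alone. This is a simplified illustration rather than NLTK's implementation; `keyword_in_context` and its `width` parameter are hypothetical names, and the sketch matches whole whitespace-separated tokens case-insensitively.

```python
def keyword_in_context(text, keyword, width=30):
    """Return a context snippet for each whitespace-separated token equal to keyword."""
    snippets = []
    target = keyword.lower()
    position = 0
    for token in text.split():
        # Locate this token in the original text so the window uses real offsets.
        start = text.index(token, position)
        end = start + len(token)
        position = end
        if token.lower() == target:
            left = text[max(0, start - width):start]
            right = text[end:end + width]
            snippets.append(f"{left}{token}{right}")
    return snippets


# Made-up sample text
sample = "This movie is a winner. I recommend this Movie to everyone, what a movie!"
for line in keyword_in_context(sample, "movie", width=12):
    print(line)
```

Because tokens are split on whitespace only, `movie!` in the sample is not counted as a match for `movie`; the same limitation applies to any whitespace tokenizer, where punctuation stays attached to the word.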

In [17]:
# Getting lines where Nolan is mentioned
for i in c.concordance_list(u"Nolan"):
    print(i.line,"\n")

e ought to be top-notch. Christopher Nolan incredibly manages to strike all the 

he brilliance of the movie ten-fold. Nolan succeeds in having a dream assemblag 

ate masterpiece. The uncanny feat of Nolan to manifest a motion picture, which  

ection, but if you're willing to let Nolan lead you on the journey into increas 

eed to know anything but Christopher Nolan made the movie and that's the bottom 

I was floored by the deftness of how Nolan weaved and juxtaposed the non linear 

 Prestige is directed by Christopher Nolan and stars Hugh Jackman and Christian 

ovie. One of Nolan's best yet again. Nolan is such an inspiration, he hasn't ma 

e writing in this film is excellent. Nolan brought the script to life with thes 

s/CGI to wow the viewers Christopher Nolan is an unmitigated genius and doesn't 

l have been a statue, and apparently Nolan never told Caine that he wasn't stil 

t just doesn't get any better. chris nolan you are a god. As a fan of old Twili 

uctures have bec