# Web Scraping

Scrape user reviews from imdb.com, sorted by total votes in descending order.

## Import Packages

In [1]:
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

selenium.__version__

'4.22.0'

## 1. The Grand Budapest Hotel - $174,563,280

In [2]:
# Use the absolute path to chromedriver.exe
path = r'C:\Users\Drafter\Documents\Chantika\Tools\chromedriver-win64\chromedriver.exe'
service = Service(executable_path=path)

driver = webdriver.Chrome(service=service)

url = 'https://www.imdb.com/title/tt2278388/reviews?sort=totalVotes&dir=desc&ratingFilter=0'
driver.get(url)

# Define the wait time
wait = WebDriverWait(driver, 10)

# Click the "Load More" button 20 times
for _ in range(20):
    try:
        # Wait until the button is clickable
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="load-more-trigger"]')))
        load_more_button.click()
        # Optionally, wait for some element that indicates new content has loaded
        wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "review-container")]')))
    except Exception as e:
        print(f"Exception occurred while loading more reviews: {e}")
        break

# Get the page source after all content is loaded
page_source = driver.page_source

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')

# Find all review containers
review_containers = soup.find_all('div', class_='lister-item-content')

review_data = {
    'title': [],
    'text': [],
    'rating': []
}

for container in review_containers:
    # Extract review title
    title_elem = container.find('a', class_='title')
    if title_elem:
        review_title = title_elem.get_text(strip=True)
    else:
        review_title = "No title available"
    # Debug
    print(review_title)

    # Extract review text
    text_elem = container.find('div', class_='text show-more__control')
    if text_elem:
        review_text = text_elem.get_text(strip=True)
    else:
        review_text = "No text available"
        
    # Extract review rating
    rating_elem = container.find('span', class_='rating-other-user-rating')
    if rating_elem:
        review_rating = rating_elem.find('span').get_text(strip=True)
    else:
        review_rating = "No rating available"
    
    # Append to dictionary
    review_data['title'].append(review_title)
    review_data['text'].append(review_text)
    review_data['rating'].append(review_rating)

# Close the driver
driver.quit()

A perfect holiday without leaving home.
Absurd, funny, exciting, violent and colourful
Grand Waste of a Stellar Cast
The Emperor's New Clothes
"There are still faint glimmers of civilization left in this barbaric slaughterhouse that was once known as humanity... He was one of them. "
Wes Anderson's Best?
A Grand Adventure
A funny thing happened to me on the way to the Grand Budapest Hotel
Boring Boring Boring
A brilliantly entertaining fantasy outing by Wes Anderson
Another caper for the new millennium
A quirky and visual feast which lacks substance and interesting characters.
Wes Anderson's drug dealer must be rolling in money
what on earth was this movie?
Entertaining, slightly farcical, tale of dark deeds and friendship
So so good.
Too pretentious and boring....
A hotel well worth revisiting more than once
Does not deserve 1 star but nor does it deserve 8.4
Weak and rather pointless movie with a luxury wrapping
Over rated film
Same old Same old
Was so looking forward to this. But sa

In [3]:
df = pd.DataFrame(review_data)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   500 non-null    object
 1   text    500 non-null    object
 2   rating  500 non-null    object
dtypes: object(3)
memory usage: 11.8+ KB
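
`df.info()` reports `rating` as `object` because the scraping loop stores every rating as a string and falls back to the placeholder "No rating available". Below is a minimal sketch of turning those strings into numbers before any analysis. `parse_rating` is a hypothetical helper, not part of this notebook, and it assumes ratings are whole numbers from 1 to 10.

```python
from typing import Optional

# Placeholder string written by the scraping loop when no rating is found.
MISSING_RATING = "No rating available"


def parse_rating(raw: str) -> Optional[int]:
    """Convert a scraped rating string to an int, or None if missing or invalid."""
    raw = raw.strip()
    if raw == MISSING_RATING:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    # Assumption: user ratings fall in the range 1..10.
    return value if 1 <= value <= 10 else None


ratings = [parse_rating(r) for r in ["5", "9", "No rating available"]]
print(ratings)  # [5, 9, None]
```

On the DataFrame itself, `pd.to_numeric(df['rating'], errors='coerce')` is an alternative that turns the placeholder into `NaN` in one call.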


In [4]:
df.sample(5)

|     | title | text | rating |
| --- | --- | --- | --- |
| 16 | Too pretentious and boring.... | The Grand Budapest Hotel is written and direct... | 5 |
| 481 | A silly movie, but with great acting, cinemato... | No text available | 9 |
| 256 | An Instant Classic | No text available | 9 |
| 196 | Frivolous and visually astounding, fun and fun... | No text available | 7 |
| 450 | it's funny and heartwarming | The Grand Budapest Hotel is a sweet adventure ... | No rating available |
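
Several sampled rows carry "No text available" even though the review has a title and a rating. One possible cause, stated here as an assumption rather than something verified against the live page, is that a space-separated string passed to `class_` is compared against the exact value of the `class` attribute, so a review `div` carrying one extra class would be skipped. The sketch below reproduces that behaviour on a made-up HTML fragment (the `clickable` class name is hypothetical) and shows a CSS selector that matches on the individual classes instead.

```python
from bs4 import BeautifulSoup

# Made-up fragment: the second review's text div has one extra class.
html = """
<div class="lister-item-content">
  <div class="text show-more__control">Short review.</div>
</div>
<div class="lister-item-content">
  <div class="text show-more__control clickable">Long review.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("div", class_="lister-item-content")

# Exact-string match on the class attribute: misses the second div.
exact = [c.find("div", class_="text show-more__control") for c in containers]

# CSS selector: matches any div that has both classes, whatever else it has.
by_css = [c.select_one("div.text.show-more__control") for c in containers]

print([e is not None for e in exact])            # [True, False]
print([e.get_text(strip=True) for e in by_css])  # ['Short review.', 'Long review.']
```

If the real page behaves like this fragment, swapping the `find` call for `select_one` in the extraction loop would recover the missing texts. Inspecting the page source for one affected review is the way to confirm.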


In [5]:
df.to_csv('grand_budapest.csv', index=False)
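
The remaining sections repeat this same cell, changing only the title ID in the URL and the output file name. Below is a small sketch of building those URLs from the IDs with the standard library. `FILMS` and `reviews_url` are hypothetical names introduced for illustration. The IDs and file names are the ones used in this notebook.

```python
from urllib.parse import urlencode

# Title IDs taken from the URLs in this notebook, mapped to the CSV each section writes.
FILMS = {
    "tt2278388": "grand_budapest.csv",
    "tt0265666": "royal_tenenbaums.csv",
    "tt1748122": "moonrise_kingdom.csv",
    "tt5104604": "isle_of_dogs.csv",
    "tt0432283": "mr_fox.csv",
    "tt14230388": "asteroid_city.csv",
    "tt8847712": "french_dispatch.csv",
}


def reviews_url(title_id: str) -> str:
    """Build the reviews URL for a title, sorted by total votes, descending."""
    query = urlencode({"sort": "totalVotes", "dir": "desc", "ratingFilter": 0})
    return f"https://www.imdb.com/title/{title_id}/reviews?{query}"


for title_id, csv_name in FILMS.items():
    print(reviews_url(title_id), "->", csv_name)
```

Wrapping the scraping cell in a function that takes one of these URLs would let a single loop replace the seven near-identical cells.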

## 2. The Royal Tenenbaums - $71,441,655

In [6]:
# Use the absolute path to chromedriver.exe
path = r'C:\Users\Drafter\Documents\Chantika\Tools\chromedriver-win64\chromedriver.exe'
service = Service(executable_path=path)

driver = webdriver.Chrome(service=service)

url2 = 'https://imdb.com/title/tt0265666/reviews?sort=totalVotes&dir=desc&ratingFilter=0'
driver.get(url2)

# Define the wait time
wait = WebDriverWait(driver, 10)

# Click the "Load More" button 20 times
for _ in range(20):
    try:
        # Wait until the button is clickable
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="load-more-trigger"]')))
        load_more_button.click()
        # Optionally, wait for some element that indicates new content has loaded
        wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "review-container")]')))
    except Exception as e:
        print(f"Exception occurred while loading more reviews: {e}")
        break

# Get the page source after all content is loaded
page_source = driver.page_source

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')

# Find all review containers
review_containers = soup.find_all('div', class_='lister-item-content')

review_data = {
    'title': [],
    'text': [],
    'rating': []
}

for container in review_containers:
    # Extract review title
    title_elem = container.find('a', class_='title')
    if title_elem:
        review_title = title_elem.get_text(strip=True)
    else:
        review_title = "No title available"
    # Debug
    print(review_title)

    # Extract review text
    text_elem = container.find('div', class_='text show-more__control')
    if text_elem:
        review_text = text_elem.get_text(strip=True)
    else:
        review_text = "No text available"
        
    # Extract review rating
    rating_elem = container.find('span', class_='rating-other-user-rating')
    if rating_elem:
        review_rating = rating_elem.find('span').get_text(strip=True)
    else:
        review_rating = "No rating available"
    
    # Append to dictionary
    review_data['title'].append(review_title)
    review_data['text'].append(review_text)
    review_data['rating'].append(review_rating)

# Close the driver
driver.quit()

The perfect balance of drama and comedy
It's more than quirky!
Thank you Wes Anderson! The film is brilliant!
Beauty found in comic places
Humour for the emotionally handicapped
The most depressing comedy ever made
uninspired ... where am i supposed to laugh?
The Royal Mess
Engaging, ghoulish "Tenenbaums" is comic royalty at its best
Favorite movie, even after several viewings
hilariously freaky, yet heartrendingly poignant
'Quirky' Seems To Be The Most Popular Word To Describe This
Slow-burning masterpiece
It never fails to surprise me!
`Different' comedy that uses it's quirks well to produce a wonderful film
Respect different tastes, please
what a load of crap
What's the big deal?
"Royal" Rubbish.....
Sometimes a film becomes a place
Maybe I missed something
I remember when comedies were suppose to be funny...
I can't believe it....
What the hell was that all about!?!
Dire
Ok, I get it--but it's still awful
Quirky and highly original.
Anderson devotees will cherish this one no matter

In [7]:
df2 = pd.DataFrame(review_data)

df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 499 entries, 0 to 498
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   499 non-null    object
 1   text    499 non-null    object
 2   rating  499 non-null    object
dtypes: object(3)
memory usage: 11.8+ KB


In [8]:
df2.sample(5)

|     | title | text | rating |
| --- | --- | --- | --- |
| 10 | hilariously freaky, yet heartrendingly poignant | No text available | 10 |
| 233 | STAY AWAY !! | Despite Mr. Hackmans good performance...this f... | 1 |
| 350 | Supposed to be funny but doesn't make it | This movie is supposed to be funny and subtle ... | 3 |
| 247 | Second Time Beautiful | I did not like "The Royal Tenenbaums" the firs... | 8 |
| 80 | This movie is more evidence that Movie Critics... | No text available | 1 |


In [9]:
df2.to_csv('royal_tenenbaums.csv', index=False)

## 3. Moonrise Kingdom - $68,264,022

In [10]:
# Use the absolute path to chromedriver.exe
path = r'C:\Users\Drafter\Documents\Chantika\Tools\chromedriver-win64\chromedriver.exe'
service = Service(executable_path=path)

driver = webdriver.Chrome(service=service)

url3 = 'https://www.imdb.com/title/tt1748122/reviews?sort=totalVotes&dir=desc&ratingFilter=0'
driver.get(url3)

# Define the wait time
wait = WebDriverWait(driver, 10)

# Click the "Load More" button 20 times
for _ in range(20):
    try:
        # Wait until the button is clickable
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="load-more-trigger"]')))
        load_more_button.click()
        # Optionally, wait for some element that indicates new content has loaded
        wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "review-container")]')))
    except Exception as e:
        print(f"Exception occurred while loading more reviews: {e}")
        break

# Get the page source after all content is loaded
page_source = driver.page_source

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')

# Find all review containers
review_containers = soup.find_all('div', class_='lister-item-content')

review_data = {
    'title': [],
    'text': [],
    'rating': []
}

for container in review_containers:
    # Extract review title
    title_elem = container.find('a', class_='title')
    if title_elem:
        review_title = title_elem.get_text(strip=True)
    else:
        review_title = "No title available"
    # Debug
    print(review_title)

    # Extract review text
    text_elem = container.find('div', class_='text show-more__control')
    if text_elem:
        review_text = text_elem.get_text(strip=True)
    else:
        review_text = "No text available"
        
    # Extract review rating
    rating_elem = container.find('span', class_='rating-other-user-rating')
    if rating_elem:
        review_rating = rating_elem.find('span').get_text(strip=True)
    else:
        review_rating = "No rating available"
    
    # Append to dictionary
    review_data['title'].append(review_title)
    review_data['text'].append(review_text)
    review_data['rating'].append(review_rating)

# Close the driver
driver.quit()

Moonrise Kingdom will leave you dreamy and smiling, with a hint of melancholy
An ambitious film which for the most part delivers spectacularly
Innocent, beautiful and brilliant fun
Contrived Pretentious Lifeless Style over Substance
Anderson's finest yet?
One of the worst movies I've seen in a long time
Wes Anderson's best? It could well be.
Dull as dishwater
Sweet, beautiful, and funny
Could Have Been Much Better
Boring and uncomfortable
Possibly Anderson's best film in terms of style.
Might be my favourite Wes Anderson film
Highly recommended
Cloyingly annoying. Pretentious high-brow crap w/ wasted talent.
Let's get quirky!
I "get" Wes Anderson but still think it's a poor film!
Pretentious, artsy-fartsy, boring BS
Good Cinematography, Boring Movie
B Movie - Really .... A Bomb
Annoyingly quirky and pretentious
The Emperor's New Clothes
Fabulous Escapist Enteratinment
It's hard to explain why chewing plastic is not tasty
Possibly the worst movie I have ever seen
A waste of $10
Is "hips

In [11]:
df3 = pd.DataFrame(review_data)

df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 499 entries, 0 to 498
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   499 non-null    object
 1   text    499 non-null    object
 2   rating  499 non-null    object
dtypes: object(3)
memory usage: 11.8+ KB


In [12]:
df3.sample(5)

|     | title | text | rating |
| --- | --- | --- | --- |
| 279 | Extraordinary Film | Nice locations and view and a great acting by ... | 7 |
| 202 | Enjoy falling into another Wes Anderson fairy ... | No text available | 8 |
| 4 | Anderson's finest yet? | No text available | 10 |
| 417 | The movie form of that lovely old lady who alw... | This is one of those nice little movies that p... | 8 |
| 240 | Superb | This is definitely one of the best movies I've... | 10 |


In [13]:
df3.to_csv('moonrise_kingdom.csv', index=False)

## 4. Isle of Dogs - $64,656,608

In [2]:
# Use the absolute path to chromedriver.exe
path = r'C:\Users\Drafter\Documents\Chantika\Tools\chromedriver-win64\chromedriver.exe'
service = Service(executable_path=path)

driver = webdriver.Chrome(service=service)

url4 = 'https://www.imdb.com/title/tt5104604/reviews?sort=totalVotes&dir=desc&ratingFilter=0'
driver.get(url4)

# Define the wait time
wait = WebDriverWait(driver, 10)

# Click the "Load More" button 20 times
for _ in range(20):
    try:
        # Wait until the button is clickable
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="load-more-trigger"]')))
        load_more_button.click()
        # Optionally, wait for some element that indicates new content has loaded
        wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "review-container")]')))
    except Exception as e:
        print(f"Exception occurred while loading more reviews: {e}")
        break

# Get the page source after all content is loaded
page_source = driver.page_source

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')

# Find all review containers
review_containers = soup.find_all('div', class_='lister-item-content')

review_data = {
    'title': [],
    'text': [],
    'rating': []
}

for container in review_containers:
    # Extract review title
    title_elem = container.find('a', class_='title')
    if title_elem:
        review_title = title_elem.get_text(strip=True)
    else:
        review_title = "No title available"
    # Debug
    print(review_title)

    # Extract review text
    text_elem = container.find('div', class_='text show-more__control')
    if text_elem:
        review_text = text_elem.get_text(strip=True)
    else:
        review_text = "No text available"
        
    # Extract review rating
    rating_elem = container.find('span', class_='rating-other-user-rating')
    if rating_elem:
        review_rating = rating_elem.find('span').get_text(strip=True)
    else:
        review_rating = "No rating available"
    
    # Append to dictionary
    review_data['title'].append(review_title)
    review_data['text'].append(review_text)
    review_data['rating'].append(review_rating)

# Close the driver
driver.quit()

Exception occurred while loading more reviews: Message: 

Substantive style over Substance
I Love Dogs...and Japan...and Great Films
Visually interesting but ultimately more style over substance.
I went for Wes Anderson and I got Wes Anderson
Meh . . .
How could you NOT love it?
Didn't work. Pick a genre for your narrative.
Isle Of Dogs belongs on Trash Island
Visually interesting but the plot is uninspired
cute and clever but inconsequential
I love dogs, but this sucked
used the images but ignored the history
BIG DIRECTOR alert
A flash of absorbing and unconventional creativity
Isle of Stereotypes
Another Charming And Quirky Wes Anderson Creation
A great animation does not make a great movie.
Another Wes Anderson Classic
A true gem! Best Wes Anderson since "The Royal Tenenbaums"
Just Another anti-Japanese American Movie
This film is as offensive as it is derivative.
A Work of Art
Disappointing
I don't get the good reviews
Wes Anderson You Done It again
Trash Island becomes a trashy mo

In [4]:
df4 = pd.DataFrame(review_data)

df4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   489 non-null    object
 1   text    489 non-null    object
 2   rating  489 non-null    object
dtypes: object(3)
memory usage: 11.6+ KB


In [5]:
df4.sample(5)

|     | title | text | rating |
| --- | --- | --- | --- |
| 27 | Another Wes Anderson masterpiece! | I love Wes Anderson's films and I love animati... | 8 |
| 80 | How can you be sexist with dogs? Here's how | Even when he's directing a bunch of stop-motio... | 2 |
| 121 | If you love dogs.... | Wonderful movie obviously written and directed... | 9 |
| 113 | Lame Review, Great movie | This was a really good movie. It wasn't your a... | 9 |
| 163 | no mutt | Movie night with Iris.Atypical Anderson's work... | 7 |


In [6]:
df4.to_csv('isle_of_dogs.csv', index=False)

## 5. Fantastic Mr. Fox - $58,087,259

In [18]:
# Use the absolute path to chromedriver.exe
path = r'C:\Users\Drafter\Documents\Chantika\Tools\chromedriver-win64\chromedriver.exe'
service = Service(executable_path=path)

driver = webdriver.Chrome(service=service)

url5 = 'https://www.imdb.com/title/tt0432283/reviews?sort=totalVotes&dir=desc&ratingFilter=0'
driver.get(url5)

# Define the wait time
wait = WebDriverWait(driver, 10)

# Click the "Load More" button 20 times
for _ in range(20):
    try:
        # Wait until the button is clickable
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="load-more-trigger"]')))
        load_more_button.click()
        # Optionally, wait for some element that indicates new content has loaded
        wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "review-container")]')))
    except Exception as e:
        print(f"Exception occurred while loading more reviews: {e}")
        break

# Get the page source after all content is loaded
page_source = driver.page_source

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')

# Find all review containers
review_containers = soup.find_all('div', class_='lister-item-content')

review_data = {
    'title': [],
    'text': [],
    'rating': []
}

for container in review_containers:
    # Extract review title
    title_elem = container.find('a', class_='title')
    if title_elem:
        review_title = title_elem.get_text(strip=True)
    else:
        review_title = "No title available"
    # Debug
    print(review_title)

    # Extract review text
    text_elem = container.find('div', class_='text show-more__control')
    if text_elem:
        review_text = text_elem.get_text(strip=True)
    else:
        review_text = "No text available"
        
    # Extract review rating
    rating_elem = container.find('span', class_='rating-other-user-rating')
    if rating_elem:
        review_rating = rating_elem.find('span').get_text(strip=True)
    else:
        review_rating = "No rating available"
    
    # Append to dictionary
    review_data['title'].append(review_title)
    review_data['text'].append(review_text)
    review_data['rating'].append(review_rating)

# Close the driver
driver.quit()

Exception occurred while loading more reviews: Message: 

The Truly 'Fantastic' Fantastic Mr. Fox
A feast for the eyes and ears. No Roald Dahl in sight though
Sartre and Satire
Roald Dahl must be spinning in his grave over this travesty.
Thoroughly Enjoyable
A Wonderful Return to Classic Animation
a marvelous clubhouse of a movie where we're all invited to laugh and feel happy
To be avoided at ALL costs
Utterly insulting travesty
Holy Cuss, This Film Is Great
The Erractic Mr. Fox
Foxed
I tried to like it...
VERY conflicted feelings about this one...
HUGE disappointment
Poor Americanised rubbish
A Nutshell Review: Fantastic Mr. Fox
Top 10 Favorite Movies
Fantastic Mr. Fox -- Digging for Value
Pretentious, Dry, and Unbearable
I didn't laugh once
Fantastic Mr Fox?.......Not so fantastic
Where's Dahl?
a complete waste of time to watch especially if you are not American
Wes Anderson should make a box-set of his films and title it: "Movies to Slit Your Wrists To"
A real mystery of a film
Uno

In [19]:
df5 = pd.DataFrame(review_data)

df5.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 454 entries, 0 to 453
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   454 non-null    object
 1   text    454 non-null    object
 2   rating  454 non-null    object
dtypes: object(3)
memory usage: 10.8+ KB


In [20]:
df5.sample(5)

|     | title | text | rating |
| --- | --- | --- | --- |
| 117 | Sophiscated, Funny, and Strangely Touching at ... | No text available | 10 |
| 218 | Good film seems more Wes Anderson and less Roa... | Wes Anderson takes a Rolad Dahl film and makes... | 6 |
| 266 | An enjoyable adaptation with a good plot, quir... | No text available | 7 |
| 288 | An animation style all it's own, and to back i... | No text available | 8 |
| 327 | Amazing movie! (contains some spoilers) | The film Fantastic Mr. Fox was an amazing film... | 9 |


In [21]:
df5.to_csv('mr_fox.csv', index=False)

## 6. Asteroid City - $53,855,901

In [23]:
# Use the absolute path to chromedriver.exe
path = r'C:\Users\Drafter\Documents\Chantika\Tools\chromedriver-win64\chromedriver.exe'
service = Service(executable_path=path)

driver = webdriver.Chrome(service=service)

url6 = 'https://www.imdb.com/title/tt14230388/reviews?sort=totalVotes&dir=desc&ratingFilter=0'
driver.get(url6)

# Define the wait time
wait = WebDriverWait(driver, 10)

# Click the "Load More" button 20 times
for _ in range(20):
    try:
        # Wait until the button is clickable
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="load-more-trigger"]')))
        load_more_button.click()
        # Optionally, wait for some element that indicates new content has loaded
        wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "review-container")]')))
    except Exception as e:
        print(f"Exception occurred while loading more reviews: {e}")
        break

# Get the page source after all content is loaded
page_source = driver.page_source

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')

# Find all review containers
review_containers = soup.find_all('div', class_='lister-item-content')

review_data = {
    'title': [],
    'text': [],
    'rating': []
}

for container in review_containers:
    # Extract review title
    title_elem = container.find('a', class_='title')
    if title_elem:
        review_title = title_elem.get_text(strip=True)
    else:
        review_title = "No title available"
    # Debug
    print(review_title)

    # Extract review text
    text_elem = container.find('div', class_='text show-more__control')
    if text_elem:
        review_text = text_elem.get_text(strip=True)
    else:
        review_text = "No text available"
        
    # Extract review rating
    rating_elem = container.find('span', class_='rating-other-user-rating')
    if rating_elem:
        review_rating = rating_elem.find('span').get_text(strip=True)
    else:
        review_rating = "No rating available"
    
    # Append to dictionary
    review_data['title'].append(review_title)
    review_data['text'].append(review_text)
    review_data['rating'].append(review_rating)

# Close the driver
driver.quit()

He has out-Wes-Anderson himself
What's the point of this movie?
Worst of his movies, such a waste for this amazing cast
Enjoyable enough for Wes Anderson fans; but lacking in substance or impact
This cast doesn't deserve this script
Unmitigated bore of a film
Unfunny and incomprehensible
It's only up from here
Packed to the Rafters with Nothing of Note
Style Over Substance
What do you see in the pastel Rorschach test?
There is a lot of quality here, but I am afraid 'ASTEROID CITY' might not be everyone's taste,
Late-period Wes is short on heart and soul
An 'homage to a genre'.. at its pretentious best
Be careful : No entertainment, it's movie school teaching
A Wes Anderson overdose in the desert: More of a stiff stage play than a "real" movie
100% Style over substance
Meep-Meep
Has everyone gone mad?
😞 What an Absolute Waste of Time 😖
Break down
Pretentious boring tripe
A non-movie that feels pointless
A feast for the eyes
"You can't wake up if you don't fall asleep."
Visually Beautifu

In [24]:
df6 = pd.DataFrame(review_data)

df6.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   500 non-null    object
 1   text    500 non-null    object
 2   rating  500 non-null    object
dtypes: object(3)
memory usage: 11.8+ KB


In [25]:
df6.sample(5)

|     | title | text | rating |
| --- | --- | --- | --- |
| 475 | I loved it | As a lover of the stage and the inner workings... | 8 |
| 239 | WORST FILM EVER | Asteroid City is in my top 3 worst films ever!... | 1 |
| 372 | Time is the one commodity you can't get back! | Time is the One Commodity You Cannot Get BackW... | 1 |
| 324 | Metatextual, satisfying, aesthetically pleasin... | No text available | 8 |
| 351 | I wanted to like this movie so much, but.... | I was looking forward to quirky, offbeat, ecle... | 4 |


In [26]:
df6.to_csv('asteroid_city.csv', index=False)

## 7. The French Dispatch - $46,333,545

In [27]:
# Use the absolute path to chromedriver.exe
path = r'C:\Users\Drafter\Documents\Chantika\Tools\chromedriver-win64\chromedriver.exe'
service = Service(executable_path=path)

driver = webdriver.Chrome(service=service)

url7 = 'https://www.imdb.com/title/tt8847712/reviews?sort=totalVotes&dir=desc&ratingFilter=0'
driver.get(url7)

# Define the wait time
wait = WebDriverWait(driver, 10)

# Click the "Load More" button 20 times
for _ in range(20):
    try:
        # Wait until the button is clickable
        load_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="load-more-trigger"]')))
        load_more_button.click()
        # Optionally, wait for some element that indicates new content has loaded
        wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "review-container")]')))
    except Exception as e:
        print(f"Exception occurred while loading more reviews: {e}")
        break

# Get the page source after all content is loaded
page_source = driver.page_source

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(page_source, 'html.parser')

# Find all review containers
review_containers = soup.find_all('div', class_='lister-item-content')

review_data = {
    'title': [],
    'text': [],
    'rating': []
}

for container in review_containers:
    # Extract review title
    title_elem = container.find('a', class_='title')
    if title_elem:
        review_title = title_elem.get_text(strip=True)
    else:
        review_title = "No title available"
    # Debug
    print(review_title)

    # Extract review text
    text_elem = container.find('div', class_='text show-more__control')
    if text_elem:
        review_text = text_elem.get_text(strip=True)
    else:
        review_text = "No text available"
        
    # Extract review rating
    rating_elem = container.find('span', class_='rating-other-user-rating')
    if rating_elem:
        review_rating = rating_elem.find('span').get_text(strip=True)
    else:
        review_rating = "No rating available"
    
    # Append to dictionary
    review_data['title'].append(review_title)
    review_data['text'].append(review_text)
    review_data['rating'].append(review_rating)

# Close the driver
driver.quit()

This Is It. Wes Anderson Films Are Not For Me.
Just Sit Back And Enjoy Yourself
Quirky, stylish...and empty
Probably stranger than most folks will want, but Anderson groupies will surely love it.
Wes Anderson at the height of his power
Just say 'no' to Wes
Boring and pretentious but great cast
Basically if you don't like Anderson's style you're really, truly going to hate this movie.
Still has its moments, but a massive drop in quality in the second half keeps this way below Anderson's finest career achievements
Strong in Aesthetics, Weak In Content
A very Andersony film that lacks a real heart or soul
Biggest wes anderson disappointment
A Wes Anderson movie of stories and characters.
love Moses and Simone
This movie is art!
Not for Me
Timothée Chalamet gets his annus mirabilis in Wes Anderson's ode to print journalism
Boring Bill Murray
the most Wes Anderson-y to date
A pretentious bore
Beautiful, but falls short
WHERE is the STORY? There are 3 of them, simply 1 story would have suffi

In [28]:
df7 = pd.DataFrame(review_data)

df7.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   500 non-null    object
 1   text    500 non-null    object
 2   rating  500 non-null    object
dtypes: object(3)
memory usage: 11.8+ KB


In [29]:
df7.sample(5)

|     | title | text | rating |
| --- | --- | --- | --- |
| 9 | Strong in Aesthetics, Weak In Content | No text available | 6 |
| 318 | Quick witted like all Wes Anderson movies | A very studded stars film and usually I don't ... | 8 |
| 225 | Well done but an absolute snooze fest | Very avante garde kind of film, if your not a ... | 6 |
| 266 | Best of the Year! | Brilliant. Fun, funny, engaging. Put your phon... | 10 |
| 372 | Clever | A hodgepodge of 'see how clever I am' with sca... | 6 |


In [30]:
df7.to_csv('french_dispatch.csv', index=False)