# Step 1 — Web Scrape to Get Reviews
We scrape the User Reviews on IMDb with Beautiful Soup and Selenium. Here is a brief introduction to the two:

- **Beautiful Soup** is a Python library that is primarily used for parsing and extracting data from HTML and XML documents. It provides a simple and intuitive interface to navigate and search the parse tree created from the document.
- **Selenium** is a powerful tool for automating web browsers. It provides a browser automation framework that allows you to control web browsers programmatically. Selenium enables tasks such as simulating user interactions, filling out forms, clicking buttons, and navigating through web pages.
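
To make the Beautiful Soup half concrete, here is a minimal sketch that parses a small, made-up HTML fragment. The fragment and the `parse_reviews` helper are assumptions for illustration only; the class names mirror the ones used by the scraper later in this notebook, and the live IMDb markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the real IMDb page may be structured differently.
SAMPLE_HTML = """
<div class="review-container">
  <span class="rating-other-user-rating"><span>8</span><span>/10</span></span>
  <a class="title">Fun and touching</a>
  <div class="text">A colorful, funny movie.</div>
</div>
<div class="review-container">
  <a class="title">No rating given</a>
  <div class="text">I left this one unrated.</div>
</div>
"""


def parse_reviews(html):
    """Hypothetical helper: turn review containers into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for box in soup.find_all("div", class_="review-container"):
        rating = box.find("span", class_="rating-other-user-rating")
        title = box.find("a", class_="title")
        text = box.find("div", class_="text")
        rows.append({
            # Any of these tags can be missing, so guard each lookup.
            "Rating": rating.get_text(strip=True) if rating else None,
            "Title": title.get_text(strip=True) if title else None,
            "Review": text.get_text(strip=True) if text else None,
        })
    return rows


parsed = parse_reviews(SAMPLE_HTML)
print(parsed)
```

Selenium is only needed to get the fully loaded page source; once the HTML is a string, parsing like this works offline.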

In [None]:
pip install textblob

In [None]:
pip install openai

In [6]:
pip install selenium==4.4.3

Collecting selenium==4.4.3
  Downloading selenium-4.4.3-py3-none-any.whl (985 kB)
     ------------------------------------- 986.0/986.0 kB 10.4 MB/s eta 0:00:00
Collecting urllib3[socks]~=1.26
  Downloading urllib3-1.26.16-py2.py3-none-any.whl (143 kB)
     ---------------------------------------- 143.1/143.1 kB ? eta 0:00:00
Installing collected packages: urllib3, selenium
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.0.4
    Uninstalling urllib3-2.0.4:
      Successfully uninstalled urllib3-2.0.4
  Attempting uninstall: selenium
    Found existing installation: selenium 3.141.0
    Uninstalling selenium-3.141.0:
      Successfully uninstalled selenium-3.141.0
Successfully installed selenium-4.4.3 urllib3-1.26.16
Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
anaconda-project 0.11.1 requires ruamel-yaml, which is not installed.
requests 2.23.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.16 which is incompatible.
msedge-selenium-tools 3.141.4 requires selenium==3.141, but you have selenium 4.4.3 which is incompatible.
conda-repo-cli 1.0.20 requires clyent==1.2.1, but you have clyent 1.2.2 which is incompatible.
conda-repo-cli 1.0.20 requires nbformat==5.4.0, but you have nbformat 5.5.0 which is incompatible.
conda-repo-cli 1.0.20 requires requests==2.28.1, but you have requests 2.23.0 which is incompatible.


In [5]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from textblob import TextBlob
import openai



In [None]:
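from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.edge.service import Service as EdgeService
from webdriver_manager.microsoft import EdgeChromiumDriverManager
from webdriver_manager.chrome import ChromeDriverManager

# Requires `pip install webdriver-manager`. Selenium 4 expects the driver path
# wrapped in a Service object; passing it positionally is deprecated.
edge = webdriver.Edge(service=EdgeService(EdgeChromiumDriverManager().install()))
chrome = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))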

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Edge()

driver.get('https://bing.com')

element = driver.find_element(By.ID, 'sb_form_q')
element.send_keys('WebDriver')
element.submit()

time.sleep(5)
driver.quit()

In [None]:
# msedge-selenium-tools requires Selenium 3.141 (see the pip conflict above).
# Selenium 4 drives Chromium-based Edge natively, so use its own classes.
from selenium import webdriver
from selenium.webdriver.edge.options import Options
from selenium.webdriver.edge.service import Service

options = Options()
driver = webdriver.Edge(service=Service(r"C:\Users\SumeetAbhu\msedgedriver.exe"), options=options)
driver.get("https://google.com")
print(driver.title)

In [None]:
from selenium import webdriver
from selenium.webdriver.edge.service import Service

s = Service(r"C:\Users\SumeetAbhu\msedgedriver.exe")
driver = webdriver.Edge(service=s)

In [None]:
from selenium import webdriver
from selenium.webdriver.edge.options import Options
from selenium.webdriver.edge.service import Service

# make Edge headless
edge_options = Options()
edge_options.add_argument("--headless")  # run without opening a browser window
s = Service(r"C:\Users\SumeetAbhu\msedgedriver.exe")
driver = webdriver.Edge(service=s, options=edge_options)

In [29]:
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

PATH = r"C:\Users\SumeetAbhu\msedgedriver.exe"

def scrape_imdb_reviews(url):
    # Set up the Selenium webdriver
    driver = webdriver.Edge(service=Service(PATH))  # Change this line if you are using a different browser

    try:
        # Load the initial URL
        driver.get(url)

        # Wait for the "Load more" button to appear and click it until all reviews are loaded
        while True:
            try:
                load_more_button = WebDriverWait(driver, 10).until(
                    EC.element_to_be_clickable((By.CLASS_NAME, 'ipl-load-more__button'))
                )
                driver.execute_script("arguments[0].click();", load_more_button)
                time.sleep(2)  # Wait for the reviews to load
            except TimeoutException:
                # The button did not become clickable within 10 seconds
                print("No more reviews to load.")
                break

        # Get the page source
        page_source = driver.page_source
    finally:
        # Always close the browser, even if something above fails
        driver.quit()

    # Parse the page source with BeautifulSoup
    soup = BeautifulSoup(page_source, 'html.parser')

    # Find all review containers
    review_containers = soup.find_all('div', class_='review-container')

    # Initialize a list to store the review data
    reviews = []

    # Extract the relevant information from each review container
    for container in review_containers:
        rating = container.find('span', class_='rating-other-user-rating')
        review_title = container.find('a', class_='title')
        review_text = container.find('div', class_='text')

        # Add the review data to the list; any of the three tags may be missing
        reviews.append({
            'Rating': rating.text.strip() if rating else None,
            'Title': review_title.text.strip() if review_title else None,
            'Review': review_text.text.strip() if review_text else None
        })

    return reviews
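
The "Load more" loop in the function above runs until the button stops appearing. As a hedged sketch of one way to bound such a loop, the helper below caps the number of clicks. `click_until_exhausted` and its `click_once` callback are hypothetical names, not part of Selenium, and the fake callback only stands in for a real page.

```python
def click_until_exhausted(click_once, max_clicks=200):
    """Call click_once() until it returns False or max_clicks is reached.

    click_once is a hypothetical callable: it should click the button and
    return True, or return False once the button can no longer be found.
    """
    clicks = 0
    # The cap is checked first, so click_once() is never called past the limit.
    while clicks < max_clicks and click_once():
        clicks += 1
    return clicks


# Stand-in for a page that has three more batches of reviews to load.
remaining = [True, True, True, False]
print(click_until_exhausted(lambda: remaining.pop(0)))
```

With Selenium, `click_once` could wrap the wait-and-click step and return False when the wait times out; the cap value of 200 is an arbitrary assumption.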

In [30]:
# URL of the website with Barbie movie reviews
imdb_url = "https://www.imdb.com/title/tt1517268/reviews"

# Scrape the reviews
reviews = scrape_imdb_reviews(imdb_url)

# Save the reviews to a CSV file
df = pd.DataFrame(reviews)
df.to_csv('imdb_reviews.csv', index=False)



No more reviews to load.


# Step 2 — Sentiment Analysis with TextBlob
- Skimming the reviews, I see that most are negative, with a rating of 1/10. As our goal is to find the positive aspects of this movie, we need to keep only the positive reviews. For this, we can either rely on user ratings or perform sentiment analysis.
- This post uses the second option. Why? Sentiment analysis scores the text itself as positive, negative, or neutral, which gives a more nuanced picture than an overall rating alone. It can surface positive reviews that carry a lower rating for some specific reason, as well as negative reviews that mention positive aspects.
- TextBlob is a Python library we can use for this sentiment analysis. Its default analyzer is lexicon-based (a Naive Bayes classifier is available as an alternative) and returns a polarity score between -1 (most negative) and 1 (most positive).
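
As a small illustration of how a polarity score maps to the positive, negative, or neutral labels mentioned above, here is a sketch in plain Python. The `label_polarity` helper and the 0.1 threshold are assumptions for illustration; TextBlob itself only returns the number.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse label.

    The threshold is an arbitrary assumption: scores close to zero are
    treated as neutral rather than forced into positive or negative.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


for score in (0.469345, 0.05, -0.3):
    print(score, label_polarity(score))
```

This notebook does not use fixed labels; it ranks reviews by polarity and keeps the top slice, which avoids having to pick a threshold.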

In [32]:
import pandas as pd
from textblob import TextBlob


data = pd.read_csv("imdb_reviews.csv")
data.head()

|   | Rating | Title | Review |
|---|--------|-------|--------|
| 0 | 6/10 | Beautiful film, but so preachy | Margot does the best with what she's given, bu... |
| 1 | 6/10 | High Highs, Low Lows. | The first thing you need to know about Barbie ... |
| 2 | 7/10 | 3 reasons FOR seeing it and 1 reason AGAINST. | The first reason to go see it:It's good fun. I... |
| 3 | 10/10 | As a guy I felt some discomfort, and that's ok. | As much as it pains me to give a movie called ... |
| 4 | 9/10 | A Technicolor Dream | Wow, this movie was a love letter to cinema. F... |


In [34]:
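# Keep only the numeric part of 'N/10'. Note that this updates `df` (the frame
# scraped in Step 1), not `data`, so `data.head()` below still shows 'N/10'.
df['Rating'] = df['Rating'].apply(lambda x: x.split('/')[0] if isinstance(x, str) else x)
data.head()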

|   | Rating | Title | Review |
|---|--------|-------|--------|
| 0 | 6/10 | Beautiful film, but so preachy | Margot does the best with what she's given, bu... |
| 1 | 6/10 | High Highs, Low Lows. | The first thing you need to know about Barbie ... |
| 2 | 7/10 | 3 reasons FOR seeing it and 1 reason AGAINST. | The first reason to go see it:It's good fun. I... |
| 3 | 10/10 | As a guy I felt some discomfort, and that's ok. | As much as it pains me to give a movie called ... |
| 4 | 9/10 | A Technicolor Dream | Wow, this movie was a love letter to cinema. F... |
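
The ratings are stored as strings such as `6/10`, and reviews without a rating have no value at all. A minimal sketch of converting them to integers while tolerating missing values follows; `parse_rating` is a hypothetical helper, not part of the notebook's original code.

```python
def parse_rating(raw):
    """Return the integer score from an 'N/10' string, or None if unavailable."""
    # Missing ratings arrive as None or NaN (a float), not as strings.
    if not isinstance(raw, str) or "/" not in raw:
        return None
    score = raw.split("/")[0].strip()
    return int(score) if score.isdigit() else None


for value in ("6/10", "10/10", None, float("nan")):
    print(repr(value), "->", parse_rating(value))
```

With pandas, this could be applied as `data["Rating"].apply(parse_rating)`, assuming `data` is the frame loaded from `imdb_reviews.csv`.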


In [36]:
def get_sentiment(review):
    blob = TextBlob(review)
    return blob.sentiment.polarity

data["Sentiment"] = data["Review"].apply(get_sentiment)
data.sort_values(by="Sentiment", ascending=False, inplace=True)
num_rows = int(len(data) * 0.1)
top_positive_reviews = data.head(num_rows)
top_positive_reviews

|    | Rating | Title | Review | Sentiment |
|----|--------|-------|--------|-----------|
| 13 | 8/10 | Fun and surprisingly touching | I was honestly doubting this movie at first, b... | 0.469345 |
| 10 | 8/10 | Barbie Is A Weirdly Fun Movie! | 8.5/10\nWhile i'm not so sure at first, the mo... | 0.323642 |
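
One caveat with the 10% cut used above: `int(len(data) * 0.1)` truncates, so fewer than ten reviews would yield zero rows. The sketch below shows the same top-slice selection in plain Python with a floor of one row. `top_fraction` is a hypothetical helper, and the floor is a design choice of this sketch, not something the original code does.

```python
def top_fraction(scored, fraction=0.1):
    """Return the highest-scoring items from a list of (review, polarity) pairs.

    Keeps int(len * fraction) items, but at least one when the list is
    non-empty, so small datasets do not produce an empty selection.
    """
    if not scored:
        return []
    keep = max(1, int(len(scored) * fraction))
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:keep]


sample = [("meh", 0.05), ("loved it", 0.47), ("fun", 0.32)]
print(top_fraction(sample))
```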


In [37]:
# Step 2 - Sentiment Analysis with TextBlob
def get_sentiment(review):
    blob = TextBlob(review)
    return blob.sentiment.polarity


data["Sentiment"] = data["Review"].apply(get_sentiment)
data.sort_values(by="Sentiment", ascending=False, inplace=True)
num_rows = int(len(data) * 0.1)
top_positive_reviews = data.head(num_rows)
top_positive_review_content = top_positive_reviews["Review"].tolist()

# Step 3 — Summarize Positive Reviews with OpenAI
In this final step, we capture the key information and important details in these reviews. OpenAI's completion API can handle this task.

- To keep it easy to follow, the code of Step 2 is included as well.
- Note: increase `max_tokens` if you want a longer summary.
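
`max_tokens` only limits the length of the generated summary; the prompt itself also has to fit within the model's context window. As a rough sketch, the prompt can be assembled under a character budget so that very long reviews do not overflow it. `build_prompt` and the 8000-character default are assumptions for illustration, and characters are only a crude proxy for tokens.

```python
def build_prompt(reviews, max_chars=8000):
    """Join reviews into a bulleted prompt, stopping before max_chars is exceeded."""
    header = "Summarize the following movie reviews:"
    lines = [header]
    used = len(header)
    for review in reviews:
        bullet = f"- {review.strip()}"
        # +1 accounts for the newline that joins this bullet to the previous line.
        if used + 1 + len(bullet) > max_chars:
            break
        lines.append(bullet)
        used += 1 + len(bullet)
    return "\n".join(lines)


print(build_prompt(["Great fun.", "Loved it."]))
```

The result can be passed as the `prompt` argument in the cell below in place of the manually joined string.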

In [38]:
# Step 3 - Summarize Positive Reviews with OpenAI
# Note: openai.Completion is the legacy (pre-1.0) interface of the openai package.
import os

openai.api_key = os.environ["OPENAI_API_KEY"]  # Read the key from the environment; never hard-code it

summary_prompts = [f'- {review}' for review in top_positive_review_content]

prompt = '\n'.join(summary_prompts)
summaries = openai.Completion.create(
    engine="text-davinci-003",
    prompt=f'Summarize the following movie reviews: \n{prompt}',
    max_tokens=350,
    temperature=0.3,
    n=1,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0,
)

generated_summary = summaries.choices[0].text.strip()
print("Generated Summary:", generated_summary)

Generated Summary: This movie is a surprisingly enjoyable and heartwarming experience with eye-catching set designs and great performances from Margot Robbie and Ryan Gosling. Billie Eilish's song "What Was I Made For?" is especially beautiful and memorable. The movie is filled with creative and funny moments, with a great cast and soundtrack. It is likely to be nominated for Oscars, and is highly recommended.
