# Sentiment Analysis of Movie Reviews via TMDB API
**Author:** Logan Ash  
**Date:** 2025‑05‑07  

## Introduction
Using the free TMDB (The Movie Database) API, we try to collect 60 user reviews for *The Shawshank Redemption* (TMDB movie ID 278); the fetch loop stops at 60, or earlier if TMDB has fewer. We then clean the text, run sentiment analysis with both TextBlob’s default analyzer and the NaiveBayesAnalyzer, and visualise each sentiment distribution as a donut chart. Finally, we remove stop‑words, build a WordCloud of the 20 most frequent words, and close with insights.


## 1  Install & prepare dependencies

In [None]:
%pip install tmdbv3api textblob wordcloud nltk matplotlib pandas --quiet

import nltk
nltk.download('stopwords')
nltk.download('movie_reviews')
nltk.download('punkt')

## 2  Imports & TMDB API setup

In [None]:
from tmdbv3api import TMDb, Movie
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from nltk.corpus import stopwords
import pandas as pd

# --- TMDB credentials: read from the environment instead of hard-coding secrets ---
# The variable names TMDB_API_KEY and TMDB_ACCESS_TOKEN are assumptions; use whatever names you export.
import os

tmdb = TMDb()
tmdb.api_key = os.environ['TMDB_API_KEY']
tmdb.access_token = os.environ.get('TMDB_ACCESS_TOKEN')

movie = Movie()

## 3  Fetch at least 60 reviews

In [None]:
from tmdbv3api.tmdb import TMDbException

movie_id = 278  # The Shawshank Redemption
reviews = []
page = 1

while len(reviews) < 60 and page <= 500:
    try:
        resp = movie.reviews(movie_id, page=page)
    except TMDbException as e:
        print(f'Stopping fetch due to TMDbException: {e}')
        break

    if not resp:
        break

    for r in resp:
        if hasattr(r, 'content') and r.content:
            reviews.append(r.content)
            if len(reviews) >= 60:
                break

    page += 1

print(f'Collected {len(reviews)} reviews')
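
The stopping rules of the loop above (target reached, empty page, page cap) can be checked without network access. The following is a minimal sketch, not the notebook's actual fetch code: `collect_reviews`, `fake_pages`, and `fake_fetch` are hypothetical names, and the fake fetcher merely stands in for `movie.reviews`.

```python
from typing import Callable, List


def collect_reviews(fetch_page: Callable[[int], List[str]],
                    target: int = 60,
                    max_pages: int = 500) -> List[str]:
    """Collect up to `target` non-empty reviews, stopping at the first empty page."""
    collected: List[str] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty page means there is nothing left to fetch
            break
        for text in batch:
            if text:  # skip reviews with an empty body
                collected.append(text)
                if len(collected) >= target:
                    return collected
    return collected


# Hypothetical stand-in for the API: three pages of made-up reviews, then nothing.
fake_pages = {
    1: ["great", "moving"],
    2: ["", "slow start"],
    3: ["classic", "overrated"],
}


def fake_fetch(page: int) -> List[str]:
    return fake_pages.get(page, [])


print(collect_reviews(fake_fetch, target=4))
# -> ['great', 'moving', 'slow start', 'classic']
```

With `target=60` the same call returns only the five non-empty fake reviews, which mirrors what happens when TMDB holds fewer reviews than requested.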


## 4  Data cleaning

In [None]:
# Build DataFrame, remove NaNs/dupes, basic text sanitation
df = pd.DataFrame({'review': reviews})
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)

# Strip non‑word characters and lowercase; ensure string dtype
df['cleaned'] = (
    df['review'].astype(str)
    .str.replace(r'[^\w\s]', '', regex=True)
    .str.lower()
)
df_cleaned = df['cleaned'].tolist()

print(f'Cleaned list length: {len(df_cleaned)}')
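
To see what the cleaning regex does to a single review, here is a small sketch that applies the same pattern with the standard `re` module. `clean_review` is a hypothetical helper and the sample sentence is made up for illustration.

```python
import re


def clean_review(text: str) -> str:
    """Drop every character that is neither a word character nor whitespace, then lowercase."""
    return re.sub(r"[^\w\s]", "", text).lower()


sample = "Hope is a good thing... maybe the BEST of things!"
print(clean_review(sample))
# -> hope is a good thing maybe the best of things
```

Note that apostrophes are removed as well, so a contraction such as "Don't" becomes "dont".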

## 5  Sentiment analysis – TextBlob *default* analyzer

In [None]:
counts_default = {'positive': 0, 'negative': 0, 'neutral': 0}
for txt in df_cleaned:
    pol = TextBlob(txt).sentiment.polarity
    if pol > 0:
        counts_default['positive'] += 1
    elif pol < 0:
        counts_default['negative'] += 1
    else:
        counts_default['neutral'] += 1

sizes = [counts_default[k] for k in ['positive', 'negative', 'neutral']]
labels = ['Positive', 'Negative', 'Neutral']

total = sum(sizes)
if total == 0:
    print('No sentiment data to plot (TextBlob default).')
else:
    filtered = [(s, l) for s, l in zip(sizes, labels) if s > 0]
    sizes, labels = zip(*filtered)
    fig, ax = plt.subplots()
    ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, wedgeprops={'width':0.3})
    ax.set_title('Sentiment Distribution (TextBlob Default)')
    plt.show()
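
TextBlob's default polarity is a float in the range [-1, 1], and the loop above buckets it by sign. The bucketing itself can be sketched and checked without TextBlob; `label_polarity` is a hypothetical helper and the scores below are made‑up values, not results from the reviews.

```python
from collections import Counter


def label_polarity(polarity: float) -> str:
    """Map a polarity score to the same three buckets used in the loop above."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


example_scores = [0.5, -0.2, 0.0, 0.8]  # illustrative values only
counts = Counter(label_polarity(p) for p in example_scores)
print(counts["positive"], counts["negative"], counts["neutral"])
# -> 2 1 1
```

Because only an exact 0.0 counts as neutral, even a faintly positive review lands in the positive bucket; a small dead zone around zero would be one possible alternative.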

## 6  Sentiment analysis – TextBlob *NaiveBayesAnalyzer*

In [None]:
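counts_nb = {'pos': 0, 'neg': 0}
# Create the analyzer once: every new NaiveBayesAnalyzer instance retrains on the movie_reviews corpus
nb_analyzer = NaiveBayesAnalyzer()
for txt in df_cleaned:
    try:
        analysis = TextBlob(txt, analyzer=nb_analyzer).sentiment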
        if analysis.classification == 'pos':
            counts_nb['pos'] += 1
        else:
            counts_nb['neg'] += 1
    except Exception as e:
        print(f'Skipped a review due to: {e}')

sizes = [counts_nb['pos'], counts_nb['neg']]
labels = ['Positive', 'Negative']

total = sum(sizes)
if total == 0:
    print('No sentiment data to plot (NaiveBayesAnalyzer).')
else:
    filtered = [(s, l) for s, l in zip(sizes, labels) if s > 0]
    sizes, labels = zip(*filtered)
    fig, ax = plt.subplots()
    ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, wedgeprops={'width':0.3})
    ax.set_title('Sentiment Distribution (NaiveBayesAnalyzer)')
    plt.show()
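
The two analyzers use different label sets (three buckets versus two), so their charts are not directly comparable. One possible way to compare them, sketched here under the assumption that per‑review labels have been kept in two parallel lists, is to ignore reviews the default analyzer called neutral and measure agreement on the rest. `agreement_rate` and both label lists are hypothetical.

```python
from typing import List, Optional


def agreement_rate(default_labels: List[str], nb_labels: List[str]) -> Optional[float]:
    """Share of non-neutral reviews on which both analyzers give the same label."""
    mapping = {"positive": "pos", "negative": "neg"}
    pairs = [
        (mapping[d], n)
        for d, n in zip(default_labels, nb_labels)
        if d in mapping  # neutral reviews have no NaiveBayes counterpart
    ]
    if not pairs:
        return None
    return sum(d == n for d, n in pairs) / len(pairs)


# Made-up labels for four reviews, for illustration only.
default_labels = ["positive", "neutral", "negative", "positive"]
nb_labels = ["pos", "pos", "pos", "pos"]
print(agreement_rate(default_labels, nb_labels))
```

In this toy input the analyzers agree on two of the three non‑neutral reviews.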

## 7  WordCloud of top 20 words (stop‑words removed)

In [None]:
all_text = ' '.join(df_cleaned)
tokens = [w for w in all_text.split() if w.isalpha()]
stop_words = set(stopwords.words('english'))
filtered_tokens = [w for w in tokens if w not in stop_words]

if not filtered_tokens:
    print('No words remaining after stop‑word removal.')
else:
    filtered_text = ' '.join(filtered_tokens)
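    # collocations=False keeps the cloud to single words; the default would also count frequent word pairs
    wc = WordCloud(width=800, height=400, max_words=20, collocations=False).generate(filtered_text)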
    plt.figure(figsize=(10,5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()
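
The image alone does not reveal the exact counts behind the 20 words. As a minimal sketch, the same ranking can be computed with `collections.Counter`; `top_words` is a hypothetical helper, and the demo tokens and stop‑word set are made up.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_words(tokens: Iterable[str], stop_words: Set[str], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent alphabetic tokens that are not stop-words."""
    kept = [t for t in tokens if t.isalpha() and t not in stop_words]
    return Counter(kept).most_common(n)


demo_tokens = "hope is a good thing and hope is the best thing hope".split()
demo_stop = {"is", "a", "and", "the"}
print(top_words(demo_tokens, demo_stop, n=2))
# -> [('hope', 3), ('thing', 2)]
```

Applied to the notebook's `tokens` and `stop_words`, the resulting list could be printed as a table, or converted with `dict(...)` and passed to `WordCloud.generate_from_frequencies` so the cloud is drawn from exactly these counts.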

## 8  Conclusion
- We gathered and cleaned user reviews for *The Shawshank Redemption* via the free TMDB API; the data‑cleaning cell prints the exact count that remains after duplicates are dropped.
- TextBlob’s default analyzer showed the share of positive, negative, and neutral opinions, while NaiveBayesAnalyzer gave a simple positive/negative split.
- The WordCloud highlighted the most frequent, meaningful words once stop‑words were removed.

These steps demonstrate an end‑to‑end, cost‑free pipeline for sentiment analysis and basic NLP visualisation using openly available APIs and Python libraries.