## Sammi Beard | DSC 680 Project 3 Code

## Setup
**Import Libraries**

In [1]:
import requests
import praw
from textblob import TextBlob
import nltk
import statistics
import pandas as pd
import streamlit as st
import plotly.express as px

**Download NLTK data for tokenizing and tagging text**

In [2]:
# Download necessary NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\saman\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\saman\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

**API setup**

In [20]:
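# TMDB API key, read from the environment so the secret stays out of the notebook
# (TMDB_API_KEY is an assumed environment variable name)
import os
API_KEY = os.environ["TMDB_API_KEY"]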

In [4]:
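# Set up Reddit API with PRAW
import os

def setup_reddit_api():
    # Credentials are read from environment variables (the variable names
    # are assumptions) so they are not stored in the notebook
    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="showSentimentAnalyzer"
    )
    return reddit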

## TMDB
The goal is to compare original shows from different streaming services to see which service produces the best-received shows. TMDB uses several different codes for each service, and we want original content only, not everything that happens to be available on the platform. I therefore look up one known original show from each service to find the network ID for its original content, then search for the top 10 shows (the first 10 results TMDB returns) from each network.

In [5]:
# List of TV shows to search for
tv_show_names = [
    "Stranger Things",
    "Shrinking",
    "Game of Thrones",
    "The Handmaid's Tale",
    "The Boys",
    "Dexter",
    "Outlander",
    "Godfather of Harlem"
]

In [6]:
def get_tv_show_networks(tv_show_names):
    tv_show_dict = {}  # Initialize an empty dictionary to store results as key-value pairs
    
    for TV_SHOW_NAME in tv_show_names:
        # Step 1: Search for the TV show
        # Pass the query via params so spaces and apostrophes are URL-encoded
        SEARCH_TV_URL = "https://api.themoviedb.org/3/search/tv"
        tv_response = requests.get(SEARCH_TV_URL, params={"api_key": API_KEY, "query": TV_SHOW_NAME})

        if tv_response.status_code == 200:
            tv_data = tv_response.json()
            if tv_data["results"]:
                first_result = tv_data["results"][0]
                show_id = first_result["id"]
                show_title = first_result["name"]

                # Step 2: Get detailed show info (including network ID)
                SHOW_DETAILS_URL = f"https://api.themoviedb.org/3/tv/{show_id}?api_key={API_KEY}"
                show_details_response = requests.get(SHOW_DETAILS_URL)

                if show_details_response.status_code == 200:
                    show_details = show_details_response.json()
                    networks = show_details.get("networks", [])
                    
                    if networks:
                        network_id = networks[0]["id"]  # Get first network's ID
                        network_name = networks[0]["name"]
                        print(f"\nShow: {show_title}, Network: {network_name}, ID: {network_id}")
                        
                        # Step 3: Discover other shows from this network
                        DISCOVER_TV_URL = f"https://api.themoviedb.org/3/discover/tv?api_key={API_KEY}&with_networks={network_id}"
                        discover_response = requests.get(DISCOVER_TV_URL)
                        discover_data = discover_response.json()

                        if "results" in discover_data and discover_data["results"]:
                            print(f"\nTop TV shows from {network_name}:")
                            for show in discover_data["results"][:10]:  # Show top 10 results
                                tv_show_dict[show['name']] = network_name  # Store in dictionary
                                print(f"Title: {show['name']}, Network: {network_name}")
                        else:
                            print("No results found.")
                    else:
                        print("No network information found for this show.")
                else:
                    print("Error fetching show details:", show_details_response.status_code, show_details_response.text)
            else:
                print(f"No TV shows found for {TV_SHOW_NAME}")
        else:
            print("Error:", tv_response.status_code, tv_response.text)

    return tv_show_dict  # Maps each show title to its network name
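
Show titles such as "The Handmaid's Tale" contain spaces and apostrophes, so the query string should be percent-encoded before it is sent. The following is a minimal, offline sketch of building the search and discover URLs with the standard library. `build_tmdb_url` and the `YOUR_KEY` placeholder are hypothetical names used for illustration and are not part of the project code.

```python
from urllib.parse import urlencode

TMDB_BASE = "https://api.themoviedb.org/3"


def build_tmdb_url(path, api_key, **params):
    """Return a TMDB URL with every query parameter percent-encoded.

    Hypothetical helper for illustration; the API key is a placeholder.
    """
    query = urlencode({"api_key": api_key, **params})
    return f"{TMDB_BASE}{path}?{query}"


# Search for a show whose title needs encoding
search_url = build_tmdb_url("/search/tv", "YOUR_KEY", query="The Handmaid's Tale")

# Discover shows for a network ID (213 is the Netflix ID printed in the output)
discover_url = build_tmdb_url("/discover/tv", "YOUR_KEY", with_networks=213)

print(search_url)
print(discover_url)
```

Passing a `params` dictionary to `requests.get` achieves the same encoding without a helper.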

In [11]:
# Call the function with the list of TV shows
tv_show_dict = get_tv_show_networks(tv_show_names)


Show: Stranger Things, Network: Netflix, ID: 213

Top TV shows from Netflix:
Title: Squid Game, Network: Netflix
Title: Cobra Kai, Network: Netflix
Title: Lucifer, Network: Netflix
Title: Zero Day, Network: Netflix
Title: Raw, Network: Netflix
Title: Sex Education, Network: Netflix
Title: Marvel's Daredevil, Network: Netflix
Title: Chelsea, Network: Netflix
Title: Stranger Things, Network: Netflix
Title: Pen Tor, Network: Netflix

Show: Shrinking, Network: Apple TV+, ID: 2552

Top TV shows from Apple TV+:
Title: Severance, Network: Apple TV+
Title: Silo, Network: Apple TV+
Title: Prime Target, Network: Apple TV+
Title: Slow Horses, Network: Apple TV+
Title: See, Network: Apple TV+
Title: Ted Lasso, Network: Apple TV+
Title: Mythic Quest, Network: Apple TV+
Title: For All Mankind, Network: Apple TV+
Title: Disclaimer, Network: Apple TV+
Title: Foundation, Network: Apple TV+

Show: Game of Thrones, Network: HBO, ID: 49

Top TV shows from HBO:
Title: Last Week Tonight with John Oliver, N

## Reddit
Now that we have the top 10 shows from each network, we gauge social sentiment for each show by searching for it on Reddit and running a sentiment analysis on every post returned. We then record the average and median sentiment for each show.

In [7]:
# Function to search Reddit for mentions of a show
def get_reddit_posts_for_show(show_name, reddit_api):
    query = f"{show_name} AND (tv OR series)"
    posts = reddit_api.subreddit("all").search(query, sort="new", limit=100)
    post_data = [
        {"title": post.title, "url": post.url,
         "author": post.author.name if post.author else "Unknown", "text": post.selftext}
        for post in posts
    ]
    return post_data

In [8]:
# Function to analyze sentiment using TextBlob
def analyze_sentiment(text):
    # Analyze the sentiment of the text
    blob = TextBlob(text)
    sentiment = blob.sentiment.polarity  # Returns a value between -1 (negative) and 1 (positive)
    return sentiment

In [9]:
# Function to process posts and perform sentiment analysis
def analyze_reddit_sentiment(show_name, reddit_api):
    posts = get_reddit_posts_for_show(show_name, reddit_api)
    
    sentiment_results = []
    sentiment_values = []  # List to store all sentiment values for median calculation
    total_sentiment = 0
    total_posts = len(posts)
    
    if total_posts == 0:
        return [], 0.0, 0.0  # Return empty data with neutral sentiment if no posts found

    for post in posts:
        # Analyze the sentiment of the post title and content
        title_sentiment = analyze_sentiment(post["title"])
        text_sentiment = analyze_sentiment(post["text"])
        
        # Store individual sentiment values
        sentiment_values.extend([title_sentiment, text_sentiment])

        # Store the results
        sentiment_results.append({
            "title": post["title"],
            "author": post.get("author", "Unknown"),  # Ensure author key exists
            "url": post.get("url", ""),  # Ensure URL key exists
            "title_sentiment": title_sentiment,
            "text_sentiment": text_sentiment
        })
        
        # Add the sentiments to the total sentiment
        total_sentiment += title_sentiment + text_sentiment
    
    # Calculate the average sentiment
    average_sentiment = total_sentiment / len(sentiment_values)

    # Calculate the median sentiment
    median_sentiment = statistics.median(sentiment_values)

    return sentiment_results, average_sentiment, median_sentiment
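
Link posts on Reddit have an empty body, and an empty string scores as neutral (0.0), so counting every body can pull the average and median toward zero. The sketch below, using made-up scores rather than project data, compares the two aggregates with and without empty bodies. `summarize_scores` and its `skip_empty_text` flag are hypothetical names and are not part of the analysis above.

```python
import statistics


def summarize_scores(posts, skip_empty_text=True):
    """Return (mean, median) over title and body sentiment scores.

    When skip_empty_text is True, bodies that are empty or whitespace
    are left out instead of being counted as a neutral 0.0.
    """
    values = []
    for post in posts:
        values.append(post["title_sentiment"])
        if post["text"].strip() or not skip_empty_text:
            values.append(post["text_sentiment"])
    if not values:
        return 0.0, 0.0
    return statistics.mean(values), statistics.median(values)


# Illustrative scores only (not taken from the Reddit results)
sample_posts = [
    {"title_sentiment": 0.5, "text": "", "text_sentiment": 0.0},
    {"title_sentiment": 0.2, "text": "", "text_sentiment": 0.0},
    {"title_sentiment": 0.4, "text": "", "text_sentiment": 0.0},
    {"title_sentiment": 0.3, "text": "Loved the finale", "text_sentiment": 0.7},
]

all_mean, all_median = summarize_scores(sample_posts, skip_empty_text=False)
kept_mean, kept_median = summarize_scores(sample_posts, skip_empty_text=True)

print(f"All bodies counted:   mean={all_mean:.3f}, median={all_median:.3f}")
print(f"Empty bodies skipped: mean={kept_mean:.3f}, median={kept_median:.3f}")
```

Whether to skip empty bodies is a design choice; the results shown in this notebook count every body.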

In [10]:
def analyze_multiple_shows(tv_show_dict, reddit_api):
    all_shows_sentiment = {}

    for show_name, network_name in tv_show_dict.items():
        print(f"\nAnalyzing sentiment for '{show_name}' on the network: {network_name}")
        sentiment_data, avg_sentiment, median_sentiment = analyze_reddit_sentiment(show_name, reddit_api)

        # Store the results in the dictionary
        all_shows_sentiment[show_name] = {
            "network": network_name,
            "average_sentiment": avg_sentiment,
            "sentiment_data": sentiment_data,
            "median_sentiment": median_sentiment
        }

        # Print the average and median sentiment for each show
        print(f"Average Sentiment: {avg_sentiment:.3f}, Median Sentiment: {median_sentiment:.3f}\n")
    
    return all_shows_sentiment

In [12]:
# Set up Reddit API
reddit_api = setup_reddit_api()

# Analyze sentiment for all shows in the dictionary
all_sentiment_results = analyze_multiple_shows(tv_show_dict, reddit_api)

# Optionally, print out the average and median sentiment for each show at the end
for show_name, data in all_sentiment_results.items():
    print(f"Show: {show_name} | Network: {data['network']}")
    print(f"Average Sentiment: {data['average_sentiment']:.3f} | Median Sentiment: {data['median_sentiment']:.3f}")
    # print("-" * 80)


Analyzing sentiment for 'Squid Game' on the network: Netflix
Average Sentiment: 0.002, Median Sentiment: 0.000


Analyzing sentiment for 'Cobra Kai' on the network: Netflix
Average Sentiment: 0.109, Median Sentiment: 0.000


Analyzing sentiment for 'Lucifer' on the network: Netflix
Average Sentiment: 0.098, Median Sentiment: 0.045


Analyzing sentiment for 'Zero Day' on the network: Netflix
Average Sentiment: 0.065, Median Sentiment: 0.035


Analyzing sentiment for 'Raw' on the network: Netflix
Average Sentiment: 0.069, Median Sentiment: 0.009


Analyzing sentiment for 'Sex Education' on the network: Netflix
Average Sentiment: 0.080, Median Sentiment: 0.067


Analyzing sentiment for 'Marvel's Daredevil' on the network: Netflix
Average Sentiment: 0.084, Median Sentiment: 0.000


Analyzing sentiment for 'Chelsea' on the network: Netflix
Average Sentiment: 0.104, Median Sentiment: 0.077


Analyzing sentiment for 'Stranger Things' on the network: Netflix
Average Sentiment: 0.071, Median S

## Dataframe
Now convert the returned results into a DataFrame.

In [13]:
# Convert sentiment results dictionary to a DataFrame
top_shows_df = pd.DataFrame.from_dict(all_sentiment_results, orient='index')

In [14]:
top_shows_df.drop(columns=['sentiment_data'], inplace=True)

In [15]:
# Reset index to make the show names a column
top_shows_df.reset_index(inplace=True)

In [16]:
# Rename columns
top_shows_df.rename(columns={'index': 'Show', 'network': 'Network', 'average_sentiment': 'Average Sentiment', 'median_sentiment': 'Median Sentiment'}, inplace=True)

In [17]:
top_shows_df

|     | Show | Network | Average Sentiment | Median Sentiment |
| --- | --- | --- | --- | --- |
| 0 | Squid Game | Netflix | 0.002411 | 0.000000 |
| 1 | Cobra Kai | Netflix | 0.109062 | 0.000000 |
| 2 | Lucifer | Netflix | 0.098455 | 0.045487 |
| 3 | Zero Day | Netflix | 0.065454 | 0.034997 |
| 4 | Raw | Netflix | 0.068898 | 0.009206 |
| ... | ... | ... | ... | ... |
| 75 | Billy the Kid | Epix | 0.064670 | 0.047198 |
| 76 | Condor | Epix | 0.065846 | 0.000000 |
| 77 | Berlin Station | Epix | 0.068991 | 0.000000 |
| 78 | Chapelwaite | Epix | 0.085694 | 0.000000 |
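
To compare services rather than individual shows, the per-show averages can be rolled up by network. The sketch below uses only the standard library and a handful of rows copied from the table, so its numbers describe that subset and not the full 79-row result. `average_by_network` is a hypothetical helper; on the full DataFrame a pandas `groupby` on the `Network` column would serve the same purpose.

```python
from collections import defaultdict
from statistics import mean

# (show, network, average sentiment): a small subset of the rows shown above
sample_rows = [
    ("Squid Game", "Netflix", 0.002411),
    ("Cobra Kai", "Netflix", 0.109062),
    ("Lucifer", "Netflix", 0.098455),
    ("Billy the Kid", "Epix", 0.064670),
    ("Condor", "Epix", 0.065846),
]


def average_by_network(rows):
    """Group per-show average sentiment by network and return each network's mean."""
    grouped = defaultdict(list)
    for _show, network, avg_sentiment in rows:
        grouped[network].append(avg_sentiment)
    return {network: mean(values) for network, values in grouped.items()}


network_means = average_by_network(sample_rows)
for network, value in sorted(network_means.items()):
    print(f"{network}: {value:.6f}")
```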


In [22]:
# Export to Excel to load in the app.py file
top_shows_df.to_excel("tv_show_sentiment.xlsx", index=False)