## Notebook 3) Sentiment Analysis

#### Section 3.1) Introduction

Initially I wanted to scrape the comments from an interesting LinkedIn post I had read about an updated rail transport system on the island of Ireland. The post gained a fair amount of traction on LinkedIn and attracted a healthy number of comments. Unfortunately, when I looked further into the topic, I found that scraping these comments would violate LinkedIn's user agreement.

I then researched social media sites where I could scrape comments without violating any terms of use, and concluded that Reddit was the most appropriate choice.

In [78]:
# Install if necessary
#pip install praw
#pip install nltk
# VADER also needs its lexicon downloaded once:
#import nltk; nltk.download("vader_lexicon")

In [80]:
import praw
import json
import csv
from nltk.sentiment import SentimentIntensityAnalyzer

#### Section 3.2) Reddit API

In [82]:
def create_reddit_object(json_file="reddit_config.json", json_key="reddit"):
    with open(json_file) as f:
        data = json.load(f)

    user_values = data[json_key]

    reddit = praw.Reddit(
        client_id=user_values["client_id"],
        client_secret=user_values["client_secret"],
        user_agent=user_values["user_agent"],
        username=user_values["username"],
        password=user_values["password"],
    )

    return reddit
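The function above expects `reddit_config.json` to hold a `"reddit"` object with five keys. As a minimal sketch, the layout below is inferred from the keys the function reads; every value is a placeholder, and `missing_config_keys` is a hypothetical helper (not part of PRAW) for checking a config before connecting.

```python
# Keys that create_reddit_object reads from the config file.
REQUIRED_KEYS = ("client_id", "client_secret", "user_agent", "username", "password")

# Hypothetical contents of reddit_config.json, shown as the dict json.load would return.
# All values are placeholders, not real credentials.
example_config = {
    "reddit": {
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
        "user_agent": "sentiment-notebook by u/YOUR_USERNAME",
        "username": "YOUR_USERNAME",
        "password": "YOUR_PASSWORD",
    }
}


def missing_config_keys(data, json_key="reddit"):
    """Return the required keys that are absent from data[json_key]."""
    user_values = data.get(json_key, {})
    return [key for key in REQUIRED_KEYS if key not in user_values]


print(missing_config_keys(example_config))  # prints []
```

Keeping the credentials in a separate JSON file, as the notebook does, means the file can be excluded from version control while the notebook itself is shared.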

In [83]:
def scrape_comments(post_url, json_file="reddit_config.json", json_key="reddit"):
    reddit = create_reddit_object(json_file, json_key)

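    submission = reddit.submission(url=post_url)
    # Drop "load more comments" placeholders, which have no .body attribute
    submission.comments.replace_more(limit=0)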

    # Print post title
    print("Post Title:", submission.title)

    # Print comments
    for comment in submission.comments.list():
        print(comment.body)

In [84]:
import csv

def scrape_comments_to_csv(post_url, output_csv="Comments.csv", json_file="reddit_config.json", json_key="reddit"):
    reddit = create_reddit_object(json_file, json_key)

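    submission = reddit.submission(url=post_url)
    # Drop "load more comments" placeholders, which have no .body attribute
    submission.comments.replace_more(limit=0)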

    # Create a CSV file and write header
    with open(output_csv, 'w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['Post Title', 'Comment'])

        # Write post title
        csv_writer.writerow([submission.title, ''])

        # Write comments
        for comment in submission.comments.list():
            csv_writer.writerow(['', comment.body])
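
Once the comments are saved, they could be read back from the CSV instead of calling the Reddit API again. The sketch below assumes the exact layout written above (a header row, then a title row, then one comment per row); `load_comments_from_csv` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import csv


def load_comments_from_csv(input_csv="Comments.csv"):
    """Return (post_title, comments) from a file written by scrape_comments_to_csv."""
    with open(input_csv, newline="", encoding="utf-8") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)              # skip the ['Post Title', 'Comment'] header
        title_row = next(reader, None)  # the row that carries the post title
        post_title = title_row[0] if title_row else ""
        # Remaining rows hold one comment each in the second column
        comments = [row[1] for row in reader if len(row) > 1 and row[1]]
    return post_title, comments
```

Opening the file with `newline=""` lets the `csv` module handle comments that contain line breaks, which Reddit comments often do.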
            

In [85]:
post_url = "https://www.reddit.com/r/ireland/comments/17zq2dl/railways_in_ireland_map_showing_the_proposals_of/"

In [86]:
scrape_comments_to_csv(post_url)

#### Section 3.3) Analysis on Raw Data

In [87]:
# Fetch the comments from the Reddit post
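reddit = create_reddit_object()  # `reddit` was only defined inside the helper functions
submission = reddit.submission(url=post_url)
submission.comments.replace_more(limit=0)  # drop placeholders that have no .body
comments = submission.comments.list()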

# Initialize the SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()

# Analyze sentiment for each comment
for comment in comments:
    text = comment.body
    sentiment_score = sia.polarity_scores(text)
    
    print(f"Comment: {text}")
    print(f"Sentiment Score: {sentiment_score}")
    print("\n---\n")

Comment: Some really weird gaps in the closed lines that don't make sense to me IMO. Like the small gap preventing Waterford to Cork
Sentiment Score: {'neg': 0.121, 'neu': 0.781, 'pos': 0.098, 'compound': 0.1045}

---

Comment: Look at that tiny spec of green. Pathetic.
Sentiment Score: {'neg': 0.346, 'neu': 0.654, 'pos': 0.0, 'compound': -0.5719}

---

Comment: The madness of needing to go through Dublin to get from Galway to Sligo, and from Wexford to Waterford
Sentiment Score: {'neg': 0.139, 'neu': 0.861, 'pos': 0.0, 'compound': -0.4404}

---

Comment: They really are determined to make sure Donegal has no train.
Sentiment Score: {'neg': 0.145, 'neu': 0.527, 'pos': 0.328, 'compound': 0.4173}

---

Comment: Mullingar to Athlone line, my heart yearns for you every time I'm in Mullingar station and look through the spooky ghost platform gate 😔

Seriously though, that one line would link up such a lovely chunk of the network.
Sentiment Score: {'neg': 0.096, 'neu': 0.813, 'pos': 0.091, '
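
The `compound` value is easier to read once it is turned into a label. The sketch below uses the ±0.05 cut-offs conventionally suggested for VADER; those thresholds and the `label_compound` helper are assumptions added here, not part of the original analysis. The sample scores are the first four compound values printed above.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores of the first four comments in the output above
sample_scores = [0.1045, -0.5719, -0.4404, 0.4173]
labels = [label_compound(score) for score in sample_scores]
print(labels)  # prints ['positive', 'negative', 'negative', 'positive']
```

Note that the Donegal comment is labelled positive even though it reads as sarcastic. VADER scores individual words against a lexicon, so irony of this kind can be misclassified.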

In [90]:
# Fetch the comments from the Reddit post
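reddit = create_reddit_object()  # `reddit` was only defined inside the helper functions
submission = reddit.submission(url=post_url)
submission.comments.replace_more(limit=0)  # drop placeholders that have no .body
comments = submission.comments.list()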

# Initialize the SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()

# Accumulate sentiment scores
total_sentiment_score = 0

# Analyze sentiment for each comment
for comment in comments:
    text = comment.body
    sentiment_score = sia.polarity_scores(text)
    
    # Accumulate the compound sentiment score
    total_sentiment_score += sentiment_score['compound']

# Calculate the average sentiment score (0.0 if the post has no comments)
average_sentiment_score = total_sentiment_score / len(comments) if comments else 0.0

# Print the overall sentiment score
print(f"Overall Sentiment Score: {average_sentiment_score}")

Overall Sentiment Score: 0.06925873605947953
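
A single mean close to zero can hide a mix of strongly positive and strongly negative comments. As a rough sketch, the hypothetical helper `summarise_compound` below reports the median and range alongside the mean; it takes a plain list of compound scores, so it assumes they have been collected into a list rather than only summed.

```python
from statistics import mean, median


def summarise_compound(scores):
    """Summarise a list of VADER compound scores; return None for an empty list."""
    if not scores:
        return None
    return {
        "n": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Illustration using the four compound scores visible in the earlier output
print(summarise_compound([0.1045, -0.5719, -0.4404, 0.4173]))
```

In the notebook, the list could be built inside the loop by appending `sentiment_score['compound']` for each comment.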
