# Homework 1
By Blake Zurman
### Public Sentiment Towards AI
Assume that you are a consultant at a public relations firm, and a client of your firm would like you to evaluate current public sentiment toward AI on social media platforms such as Facebook and Twitter.

In [60]:
pip install nltk vaderSentiment textblob

Note: you may need to restart the kernel to use updated packages.


In [6]:
import nltk
from nltk.tokenize import sent_tokenize
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/blakezurman/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

### Define **VADER Sentiment** Function

In [24]:
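# Build the analyzer once and reuse it, rather than reloading the lexicon on every call
_analyzer = SentimentIntensityAnalyzer()

def vader_sentiment(text):
    # Returns a dict with 'neg', 'neu', 'pos', and 'compound' scores
    return _analyzer.polarity_scores(text)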

### Define **TextBlob Sentiment** Function

In [27]:
def textblob_sentiment(text):
    blob = TextBlob(text)
    return blob.sentiment.polarity  # Returns a score between -1 (negative) and 1 (positive)

### Process and Compare Sentiment

In [30]:
def analyze_sentiments(texts):
    results = []
    
    for text in texts:
        vader_scores = vader_sentiment(text)
        textblob_score = textblob_sentiment(text)
        
        results.append({
            "Text": text,
            "VADER_Compound": vader_scores['compound'],
            "VADER_Positive": vader_scores['pos'],
            "VADER_Neutral": vader_scores['neu'],
            "VADER_Negative": vader_scores['neg'],
            "TextBlob_Score": textblob_score
        })
    
    return results

### Test Text

In [34]:
sample_texts = [
    "AI is the future of technology! It will change everything.",
    "I am scared of AI taking over jobs. It's a huge risk.",
    "AI is just a buzzword. Nothing innovative here.",
    "I love how AI helps in medicine and healthcare.",
    "AI is biased and can be dangerous if not regulated properly.",
]

results = analyze_sentiments(sample_texts)

for res in results:
    print(res)

{'Text': 'AI is the future of technology! It will change everything.', 'VADER_Compound': 0.0, 'VADER_Positive': 0.0, 'VADER_Neutral': 1.0, 'VADER_Negative': 0.0, 'TextBlob_Score': 0.0}
{'Text': "I am scared of AI taking over jobs. It's a huge risk.", 'VADER_Compound': -0.4019, 'VADER_Positive': 0.141, 'VADER_Neutral': 0.552, 'VADER_Negative': 0.307, 'TextBlob_Score': 0.39999999999999997}
{'Text': 'AI is just a buzzword. Nothing innovative here.', 'VADER_Compound': -0.3412, 'VADER_Positive': 0.0, 'VADER_Neutral': 0.744, 'VADER_Negative': 0.256, 'TextBlob_Score': 0.5}
{'Text': 'I love how AI helps in medicine and healthcare.', 'VADER_Compound': 0.7783, 'VADER_Positive': 0.493, 'VADER_Neutral': 0.507, 'VADER_Negative': 0.0, 'TextBlob_Score': 0.5}
{'Text': 'AI is biased and can be dangerous if not regulated properly.', 'VADER_Compound': -0.6369, 'VADER_Positive': 0.0, 'VADER_Neutral': 0.634, 'VADER_Negative': 0.366, 'TextBlob_Score': -0.3}
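
The two tools do not always agree on the samples above. As a rough check, the sketch below copies the VADER compound and TextBlob polarity scores from the printed output and flags the texts where the two scores have opposite signs. The `flag_disagreements` helper is hypothetical and is not part of either library.

```python
# (VADER compound, TextBlob polarity) pairs copied from the printed output above
scores = [
    (0.0, 0.0),
    (-0.4019, 0.39999999999999997),
    (-0.3412, 0.5),
    (0.7783, 0.5),
    (-0.6369, -0.3),
]

def flag_disagreements(pairs):
    """Return indices where the two scores have opposite signs (hypothetical helper)."""
    return [i for i, (vader, textblob) in enumerate(pairs) if vader * textblob < 0]

disagreements = flag_disagreements(scores)
print(disagreements)  # indices into sample_texts
```

The flagged indices point at the "scared of AI" and "buzzword" samples, which VADER scores as negative and TextBlob scores as positive.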


### Kaggle Data Set
Dataset: Chat GPT Daily Tweets NLP, from Kaggle user `esrabicakci`.

In [40]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("esrabicakci/chat-gpt-daily-tweets-nlp-esrabicakci")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/esrabicakci/chat-gpt-daily-tweets-nlp-esrabicakci?dataset_version_number=1...


100%|██████████████████████████████████████| 5.04M/5.04M [00:00<00:00, 16.5MB/s]

Extracting files...
Path to dataset files: /Users/blakezurman/.cache/kagglehub/datasets/esrabicakci/chat-gpt-daily-tweets-nlp-esrabicakci/versions/1





In [42]:
import os

print(os.listdir(path))  # List files in the dataset directory


['ch.png', 'chatgpt_daily_tweets.csv']


### Loading & Cleaning

In [55]:
import pandas as pd
import os

file_path = os.path.join(path, "chatgpt_daily_tweets.csv")
df = pd.read_csv(file_path)

In [46]:
print(df.info())  # Check column names, data types, and missing values
print(df.columns)  # List all column names

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22002 entries, 0 to 22001
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   tweet_id              22002 non-null  object 
 1   tweet_created         22002 non-null  object 
 2   tweet_extracted       22002 non-null  object 
 3   text                  22002 non-null  object 
 4   lang                  22002 non-null  object 
 5   user_id               22002 non-null  object 
 6   user_name             22000 non-null  object 
 7   user_username         22002 non-null  object 
 8   user_location         13048 non-null  object 
 9   user_description      18526 non-null  object 
 10  user_created          21998 non-null  object 
 11  user_followers_count  21996 non-null  float64
 12  user_following_count  21996 non-null  float64
 13  user_tweet_count      21996 non-null  float64
 14  user_verified         21996 non-null  object 
 15  source             

In [48]:
# Select only relevant columns
df_clean = df[['text']]

# Drop rows with missing tweet text
df_clean = df_clean.dropna(subset=['text'])

# Reset index after dropping rows
df_clean.reset_index(drop=True, inplace=True)

# Preview the cleaned data
print(df_clean.head())

                                                text
0  RT @jexep: เทคนิคฝึกภาษากับ ChatGPT ที่ผมลอง (...
1  ChatGPTをもっと活かせるChrome拡張機能4選 https://t.co/hfacF...
2  RT @DarrellLerner: ChatGPT Plugins are the fas...
3  Get an intelligent chatbot for your website in...
4  🔥Hey Guys, #ZenithSwap has launched at just $ ...


In [50]:
# Apply the function to the dataset's 'text' column
sample_texts = df_clean['text'].tolist()[:10]  # Get a sample of 10 tweets for testing
results = analyze_sentiments(sample_texts)

# Print the results
for res in results:
    print(res)

{'Text': 'RT @jexep: เทคนิคฝึกภาษากับ ChatGPT ที่ผมลอง (ผมลองฝึก อังกฤษ - ญี่ปุ่น, อังกฤษ - เยอรมัน) ใช้วิธีเดียวกัน ได้ผลเป็นที่น่าพอใจครับ เหลือแค่…', 'VADER_Compound': 0.0, 'VADER_Positive': 0.0, 'VADER_Neutral': 1.0, 'VADER_Negative': 0.0, 'TextBlob_Score': 0.0}
{'Text': 'ChatGPTをもっと活かせるChrome拡張機能4選 https://t.co/hfacFe570t', 'VADER_Compound': 0.0, 'VADER_Positive': 0.0, 'VADER_Neutral': 1.0, 'VADER_Negative': 0.0, 'TextBlob_Score': 0.0}
{'Text': 'RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023. \n\nI’ve created a step-by-step guide showing you how to earn $10…', 'VADER_Compound': 0.6808, 'VADER_Positive': 0.203, 'VADER_Neutral': 0.797, 'VADER_Negative': 0.0, 'TextBlob_Score': 0.375}
{'Text': "Get an intelligent chatbot for your website in minutes with Chatbase AI. Train ChatGPT on your data and let it answer any question your users have. Simply upload a document or link and add the chat widget - it's that easy!\nMake Money using AI: https://t.co/yLHEqn4w9

### Report: Sentiment Analysis of Tweets Using VADER and TextBlob

---

#### **Introduction**

In this report, I analyze a set of tweets related to ChatGPT, using VADER and TextBlob. These tools assess the sentiment of text data by evaluating the overall emotional tone (positive, negative, neutral).

#### **Data**

The dataset, `chatgpt_daily_tweets.csv` from Kaggle, contains 22,002 tweets related to ChatGPT across 20 columns. Alongside the tweet text, it includes metadata such as retweet, like, and reply counts, as well as user details. For this analysis, I used only the `text` column and dropped rows with missing text.

- **Number of tweets:** 10 (the first 10 rows of the dataset)
- **Key column:** `text` (tweet text)
  
#### **Methods / Models**

Sentiment analysis was performed using two established models:
1. **VADER:** A lexicon- and rule-based method designed for social media text, which provides a compound score (ranging from -1 to 1) as well as positive, neutral, and negative proportions (each between 0 and 1).
2. **TextBlob:** A library that computes a sentiment polarity score (ranging from -1 to 1) and subjectivity score. A polarity score closer to 1 indicates positive sentiment, while closer to -1 indicates negative sentiment.
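
To turn a compound score into a label, a common convention is to treat scores of 0.05 or higher as positive and -0.05 or lower as negative, with everything in between neutral. The sketch below assumes those cut-offs; the `label_from_compound` helper and its `threshold` parameter are hypothetical and were not used in this analysis.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label (hypothetical helper)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores taken from the outputs earlier in this notebook
for score in (0.6808, 0.0, -0.3412):
    print(score, label_from_compound(score))
```

The threshold is a tunable assumption; a wider neutral band would label fewer tweets as positive or negative.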

#### **Experimental Design / Investigation Strategy**

1. **Data Preprocessing:** Tweets with missing text were removed to ensure the analysis was based on complete records.
2. **Sentiment Analysis:**
   - Each tweet's sentiment was evaluated using both VADER and TextBlob.
   - VADER returns four key metrics: Compound, Positive, Neutral, and Negative scores.
   - TextBlob returns a single sentiment polarity score.
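
The preprocessing in this notebook only drops missing text. A possible further step, not applied in the analysis above, is to strip retweet prefixes, links, and mentions before scoring. The sketch below uses only the standard `re` module; `clean_tweet` and the three patterns are hypothetical and may need adjusting for real tweets.

```python
import re

# Assumed patterns: a leading "RT @user:" prefix, http(s) links, and @mentions
RT_RE = re.compile(r"^RT @\w+:\s*")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

def clean_tweet(text):
    """Remove retweet prefix, URLs, and mentions, then collapse whitespace."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())

print(clean_tweet("RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023."))
```

Hashtags and emoji are left in place here, since they can carry sentiment.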

#### **Results / Observations**

The sentiment analysis results for the tweets are summarized below:

| Tweet Text                                                       | VADER Compound | TextBlob Score |
|------------------------------------------------------------------|----------------|----------------|
| RT @jexep: เทคนิคฝึกภาษากับ ChatGPT...                          | 0.0            | 0.0            |
| ChatGPTをもっと活かせるChrome拡張機能4選                          | 0.0            | 0.0            |
| ChatGPT Plugins are the fastest way to get rich in 2023...       | 0.6808         | 0.375          |
| 🔥Hey Guys, #ZenithSwap has launched...                         | 0.0            | 0.0            |
| RT @sinsonetwork: Now! Join #SINSO DataLand^ChatGPT #Airdrop!    | 0.4184         | 0.0            |

- **VADER Scores:**
  - Most tweets scored neutral (0.0) or mildly positive, and a few scored clearly positive (e.g., 0.6808 for "ChatGPT Plugins are the fastest way to get rich in 2023").
  - The Thai and Japanese tweets scored exactly 0.0. VADER's lexicon is English, so these scores likely reflect unrecognized words rather than genuinely neutral sentiment.

- **TextBlob Scores:**
  - TextBlob's polarity scores were mostly neutral (0.0), with a few mildly positive tweets (e.g., 0.375 for the same "ChatGPT Plugins" tweet).
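
The first two rows of the table are Thai and Japanese tweets, which English-oriented tools cannot score meaningfully. One way to handle this is to filter on the dataset's `lang` column before scoring. The sketch below assumes that column uses the code `en` for English, which is not confirmed here; `english_texts` is a hypothetical helper and the `lang` values in the sample rows are illustrative.

```python
def english_texts(rows, lang_code="en"):
    """Keep the text of rows whose `lang` field equals lang_code (hypothetical helper)."""
    return [row["text"] for row in rows if row.get("lang") == lang_code]

# Illustrative rows; the `lang` values are assumed, not read from the dataset
rows = [
    {"text": "ChatGPTをもっと活かせるChrome拡張機能4選", "lang": "ja"},
    {"text": "ChatGPT Plugins are the fastest way to get rich in 2023.", "lang": "en"},
]
print(english_texts(rows))
```

With the notebook's DataFrame, the same rows could be passed in as `df.to_dict("records")`, or the filter could be written directly as `df[df["lang"] == "en"]`.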

#### **Conclusions**

Both VADER and TextBlob scored most of the sampled tweets about ChatGPT as neutral, with some scoring positive. VADER's compound score was helpful in identifying tweets with a stronger emotional tone. TextBlob's polarity generally agreed in direction but was sometimes weaker (e.g., 0.0 versus VADER's 0.4184 for the #SINSO tweet).