#### Sentiment Scoring – Crypto News Text

**Goal**:  
Turn raw news titles/summaries/descriptions into daily sentiment scores  
(positive/negative/neutral) for correlation with price movements.

**Tools used**:
- VADER (rule-based and fast; its lexicon covers emojis, slang, and capitalization emphasis, which suits informal crypto headlines)
- Optional: Hugging Face transformers (local model, higher quality but slower)
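
The optional transformers route is not used in the cells of this notebook. As a rough sketch of what it might look like, the block below assumes the `transformers` package is installed and uses `distilbert-base-uncased-finetuned-sst-2-english` as the model. That model choice is an assumption for illustration: it is a general-purpose English sentiment model, not one tuned for crypto or finance. The helper names `to_signed_score` and `score_with_transformers` are hypothetical.

```python
from typing import Iterable, List


def to_signed_score(result: dict) -> float:
    """Map {'label': 'POSITIVE' | 'NEGATIVE', 'score': p} to a value in [-1, 1].

    Assumption: the model emits exactly these two labels (true for the
    SST-2 DistilBERT checkpoint used below, not for every model).
    """
    sign = 1.0 if result["label"].upper() == "POSITIVE" else -1.0
    return sign * float(result["score"])


def score_with_transformers(
    texts: Iterable[str],
    model_name: str = "distilbert-base-uncased-finetuned-sst-2-english",
) -> List[float]:
    """Score texts with a local Hugging Face pipeline (downloads the model once)."""
    # Imported lazily because transformers is a heavy, optional dependency.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)
    # truncation=True cuts inputs that exceed the model's maximum length.
    results = classifier(list(texts), truncation=True)
    return [to_signed_score(r) for r in results]
```

The signed score is on the same -1 to +1 scale as VADER's compound score, so the two could be aggregated the same way, although the numbers are not directly comparable.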

**Input**:
- raw_crypto_news_rss.csv (from RSS feeds)
- raw_newsdataio_crypto.csv (if you used NewsData.io)

**Output**:
- daily_sentiment_vader.csv (one row per date: mean sentiment, article count, standard deviation, 3-day smoothed mean)

#### 1. Imports & load data

In [3]:
import pandas as pd
import numpy as np
from datetime import datetime

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

print("Phase 2 started:", datetime.now().strftime("%Y-%m-%d %H:%M"))

# Load news data (use whichever file you have – or combine both)
try:
    df_rss = pd.read_csv('raw_crypto_news_rss.csv', parse_dates=['published'])
    print(f"Loaded RSS data: {len(df_rss)} articles")
except FileNotFoundError:
    df_rss = pd.DataFrame()
    print("No RSS file found")

try:
    df_nd = pd.read_csv('raw_newsdataio_crypto.csv', parse_dates=['pubDate'])
    print(f"Loaded NewsData.io data: {len(df_nd)} articles")
    # Standardize column names
    df_nd = df_nd.rename(columns={'pubDate': 'published', 'description': 'summary'})
except FileNotFoundError:
    df_nd = pd.DataFrame()
    print("No NewsData.io file found")

# Combine if both exist
if not df_rss.empty and not df_nd.empty:
    df_news = pd.concat([df_rss, df_nd], ignore_index=True)
elif not df_rss.empty:
    df_news = df_rss
elif not df_nd.empty:
    df_news = df_nd
else:
    raise FileNotFoundError("No news data files found. Run Phase 1 first.")

# Drop duplicates & sort
df_news = df_news.drop_duplicates(subset=['title', 'link']).sort_values('published').reset_index(drop=True)
print(f"Total unique articles after merge: {len(df_news)}")
df_news.tail(3)

Phase 2 started: 2026-02-16 17:31
Loaded RSS data: 91 articles
Loaded NewsData.io data: 40 articles
Total unique articles after merge: 109


| | source | title | link | published | summary | content | tags | article_id | keywords | creator | ... | sentiment_stats | ai_tag | ai_region | ai_org | ai_summary | duplicate | query | btc_relevant | eth_relevant | sol_relevant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 106 | Cointelegraph | When will crypto’s CLARITY Act framework pass ... | https://cointelegraph.com/news/when-will-crypt... | 2026-02-16 13:08:15 | `<p style="float: right; margin: 0 0 10px 15px;...` | | | | | | ... | | | | | | | | | | |
| 107 | CryptoPotato | Shiba Inu (SHIB) Could Explode by 50% But Unde... | https://cryptopotato.com/shiba-inu-shib-could-... | 2026-02-16 13:21:38 | Is SHIB gearing up for a move toward $0.0000099? | `<p>Shiba Inu (SHIB) has been on an evident dow...` | Crypto Bits, Crypto News, Shiba Inu (SHIB) | | | | ... | | | | | | | | | | |
| 108 | CryptoPotato | Ethereum Price Prediction: Is Breakout Imminen... | https://cryptopotato.com/ethereum-price-predic... | 2026-02-16 13:24:15 | Ethereum’s most recent price action reflects a... | `<p>Ethereum’s most recent price action reflect...` | Crypto News, ETH Analysis, Ethereum (ETH) Price | | | | ... | | | | | | | | | | |
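
Section 4 buckets articles by calendar date, so it matters that both feeds express `published` in the same time zone. The notebook does not say which zones the feeds use. The sketch below therefore assumes that timestamps without an offset are already in UTC, and converts anything carrying an offset to UTC. The helper name `to_utc` is hypothetical.

```python
import pandas as pd


def to_utc(series: pd.Series) -> pd.Series:
    """Parse timestamps and express them in UTC.

    Assumption: values without a UTC offset are already in UTC.
    """
    ts = pd.to_datetime(series)
    if ts.dt.tz is None:
        return ts.dt.tz_localize("UTC")  # naive: label as UTC
    return ts.dt.tz_convert("UTC")       # aware: convert to UTC


# The same instant written two ways ends up identical after normalization.
naive = to_utc(pd.Series(["2026-02-16 13:08:15"]))
aware = to_utc(pd.Series(["2026-02-16 18:38:15+05:30"]))
print(naive.iloc[0] == aware.iloc[0])
```

In this notebook the helper would be applied to `df_rss['published']` and `df_nd['published']` separately, before `pd.concat`, so the combined column has one consistent dtype.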


#### 2. Create combined text field & clean

In [4]:
df_news['text'] = (
    df_news['title'].fillna('') + ' ' +
    df_news['summary'].fillna('') + ' ' +
    df_news['content'].fillna('')
).str.strip()

# Remove very short/empty texts
df_news = df_news[df_news['text'].str.len() > 20].copy()

print(f"Articles after filtering short text: {len(df_news)}")

Articles after filtering short text: 109
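
The `summary` and `content` fields shown in Section 1 still contain HTML markup (for example `<p style=...>`), which the cell above passes through unchanged. One possible cleaning step, sketched here with the standard library only, strips tags and decodes entities before scoring. The regex approach is deliberately crude and assumes reasonably well-formed feed HTML; the helper name `strip_html` is hypothetical.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an HTML tag
WS_RE = re.compile(r"\s+")        # runs of whitespace


def strip_html(text: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", text)
    unescaped = html.unescape(no_tags)
    return WS_RE.sub(" ", unescaped).strip()


sample = '<p style="float: right;">Hello &amp; world</p>'
print(strip_html(sample))  # Hello & world
```

Applied to the notebook, this would be `df_news['text'] = df_news['text'].apply(strip_html)` before the length filter. A full HTML parser would be more robust if the feeds contain scripts or malformed markup.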


#### 3. VADER Sentiment Scoring

In [5]:
analyzer = SentimentIntensityAnalyzer()

def get_vader_score(text):
    if not text or len(text) < 10:
        return 0.0
    scores = analyzer.polarity_scores(text)
    return scores['compound']  # -1 (very negative) → +1 (very positive)

print("Computing VADER sentiment scores...")
df_news['sentiment_vader'] = df_news['text'].apply(get_vader_score)

# Quick distribution
print("\nSentiment score distribution:")
print(df_news['sentiment_vader'].describe())

df_news[['published', 'title', 'sentiment_vader']].sort_values('sentiment_vader').tail(8)

Computing VADER sentiment scores...

Sentiment score distribution:
count    109.000000
mean       0.024247
std        0.659123
min       -0.992800
25%       -0.599400
50%        0.000000
75%        0.659700
max        0.997800
Name: sentiment_vader, dtype: float64


| | published | title | sentiment_vader |
|---|---|---|---|
| 0 | 2026-02-13 13:44:37 | US CPI Data for January Shows Cooling Inflatio... | 0.9709 |
| 5 | 2026-02-13 15:39:48 | Pi Network (PI) Jumps 8% in 24 Hours: Is the W... | 0.9785 |
| 20 | 2026-02-14 12:48:48 | Massive 500% PI Surge Forecast as Pi Network L... | 0.9865 |
| 9 | 2026-02-13 19:50:04 | CFTC Appoints Crypto Heavyweights to 35-Person... | 0.9917 |
| 95 | 2026-02-16 10:07:41 | VALR Highlights Africa’s Leadership in Crypto ... | 0.9934 |
| 107 | 2026-02-16 13:21:38 | Shiba Inu (SHIB) Could Explode by 50% But Unde... | 0.9952 |
| 38 | 2026-02-15 08:40:34 | Pi Network Pioneers Celebrate PI’s 35% Daily S... | 0.9959 |
| 42 | 2026-02-15 12:05:58 | Ripple’s February Ledger Update: What It Means... | 0.9978 |
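
The goal mentions positive/negative/neutral classes, while the cell above keeps only the continuous compound score. A minimal sketch of the mapping follows, using the ±0.05 cut-offs suggested in the VADER documentation. The function name `label_from_compound` is hypothetical, and the threshold is a convention that can be tuned.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Bucket a VADER compound score into positive / negative / neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Max, median, and min of the score distribution reported above.
for score in (0.9978, 0.0, -0.9928):
    print(score, label_from_compound(score))
```

In the notebook this could be added as `df_news['label_vader'] = df_news['sentiment_vader'].apply(label_from_compound)`, where the column name is again only a suggestion.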


#### 4. Aggregate to daily level

In [7]:
df_news['date'] = df_news['published'].dt.date

daily_sentiment = df_news.groupby('date')['sentiment_vader'].agg([
    'mean',
    'count',           # how many articles that day
    'std'              # variability
]).rename(columns={'mean': 'sentiment_mean'})

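# 3-day centered rolling mean: center=True averages each day with its neighbours,
# so the value for day t includes day t+1. Treat it as a smoothed descriptive
# series; for predictive comparisons with prices use a trailing window (center=False).
daily_sentiment['sentiment_mean_3d'] = daily_sentiment['sentiment_mean'].rolling(
    window=3,
    min_periods=1,
    center=True
).mean()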

print("Daily sentiment aggregation:")
display(daily_sentiment.tail(10))

# Save to CSV, keeping the date index as a column (index=False would drop the dates)
daily_sentiment.to_csv('daily_sentiment_vader.csv')

Daily sentiment aggregation:


| date | sentiment_mean | count | std | sentiment_mean_3d |
|---|---|---|---|---|
| 2026-02-13 | 0.281008 | 12 | 0.59616 | 0.144897 |
| 2026-02-14 | 0.008786 | 21 | 0.737389 | 0.098551 |
| 2026-02-15 | 0.005858 | 24 | 0.632639 | -0.001877 |
| 2026-02-16 | -0.020275 | 52 | 0.657131 | -0.007208 |
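
Because the smoothed column comes from a centered window, each value mixes in the following day's sentiment. If the series is later used to anticipate price moves, a trailing window may be the safer choice. The sketch below rebuilds the four daily means from the table above and compares the two windows; the variable names are illustrative.

```python
import pandas as pd

# Daily means copied from the aggregation output above.
daily = pd.Series(
    [0.281008, 0.008786, 0.005858, -0.020275],
    index=pd.to_datetime(["2026-02-13", "2026-02-14", "2026-02-15", "2026-02-16"]),
    name="sentiment_mean",
)

# Centered: day t averages t-1, t, t+1 (uses the next day's value).
centered = daily.rolling(window=3, min_periods=1, center=True).mean()

# Trailing: day t averages t-2, t-1, t (past and present only).
trailing = daily.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"centered": centered, "trailing": trailing}).round(6))
```

With `min_periods=1` the first trailing value is just the first day's mean, and every later value is available at the end of its own day without waiting for the next one.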
