# Sentiment Analysis of Donald Trump Rally Speeches

This notebook runs sentiment analysis on Donald Trump's rally speeches from 2019-2020 using **FinBERT**, a BERT model fine-tuned for financial sentiment analysis. FinBERT was trained on financial text, so its three-way classification (positive, negative, neutral) is applied here to political speech as an out-of-domain signal; the scores have not been validated against labeled political text.

## Analysis Overview
- **Model**: ProsusAI/finbert - Pre-trained BERT for sentiment classification
- **Approach**: Chunk long speeches into manageable segments for BERT processing
- **Output**: Sentiment scores (positive, negative, neutral) for each speech
- **Insights**: Temporal trends, location-based patterns, and aggregate sentiment metrics

## Import Libraries

Loading required libraries for deep learning, NLP, and visualization.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

import math
import tensorflow as tf
from tqdm.notebook import tqdm
from typing import List, Tuple, Dict

from transformers import pipeline, AutoTokenizer, TFBertForSequenceClassification
from scipy.special import softmax
from tensorflow.python.ops.numpy_ops import np_config

# Enable NumPy behavior for TensorFlow
np_config.enable_numpy_behavior()

# Set visualization styles
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")


TensorFlow version: 2.20.0
GPU Available: []


## Load and Preview the Stored Dataset

In [2]:
# Load the dataset prepared in the Word Clouds notebook (variable name spelled as stored there)
%store -r DT_rally_speaches_dataset
df = DT_rally_speaches_dataset.copy()

print(f"Loaded {len(df)} speeches")
print(f"Date range: {df['Month'].iloc[0]} {df['Year'].iloc[0]} - {df['Month'].iloc[-1]} {df['Year'].iloc[-1]}")

Loaded 35 speeches
Date range: Jul 2019 - Sep 2020


In [3]:
df.head()

| | Location | Month | Year | filename | content | Month_Num | Date | word_count |
|---|---|---|---|---|---|---|---|---|
| 0 | Greenville | Jul | 2019 | GreenvilleJul17_2019.txt | Thank you very much. Thank you. Thank you. Tha... | 7 | 2019-07-15 | 10605 |
| 1 | Cincinnati | Aug | 2019 | CincinnatiAug1_2019.txt | Thank you all. Thank you very much. Thank you ... | 8 | 2019-08-15 | 8170 |
| 2 | New Hampshire | Aug | 2019 | NewHampshireAug15_2019.txt | Thank you very much everybody. Thank you. Wow... | 8 | 2019-08-15 | 10141 |
| 3 | Texas | Sep | 2019 | TexasSep23_2019.txt | Hello, Houston. I am so thrilled to be here in... | 9 | 2019-09-15 | 2487 |
| 4 | New Mexico | Sep | 2019 | NewMexicoSep16_2019.txt | Wow, thank you. Thank you, New Mexico. Thank ... | 9 | 2019-09-15 | 11498 |


## Model and Tokenizer Setup

We're using **FinBERT** (ProsusAI/finbert), a BERT model fine-tuned for sentiment analysis. It classifies text into three categories:
- **Positive**: Optimistic, confident language
- **Negative**: Critical, pessimistic language  
- **Neutral**: Factual, balanced statements
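
The position of each class in the model output is defined by the checkpoint's `id2label` mapping, so reading scores by label name is less error-prone than relying on a fixed index order. A minimal sketch, assuming a mapping shaped like `model.config.id2label`; the `scores_by_label` helper, the example mapping, and the probabilities are hypothetical and only illustrate the idea:

```python
from typing import Dict, Sequence


def scores_by_label(id2label: Dict[int, str], probs: Sequence[float]) -> Dict[str, float]:
    """Pair each class probability with its (lower-cased) label name."""
    if len(id2label) != len(probs):
        raise ValueError("expected exactly one probability per label")
    return {id2label[i].lower(): float(probs[i]) for i in range(len(probs))}


# Hypothetical mapping and probabilities, for illustration only
example_id2label = {0: "positive", 1: "negative", 2: "neutral"}
example_scores = scores_by_label(example_id2label, [0.2, 0.5, 0.3])

# The dominant class is the label with the highest probability
dominant = max(example_scores, key=example_scores.get)
print(example_scores, dominant)
```

Looking scores up by name this way keeps downstream columns correct even if a different checkpoint orders its classes differently.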

In [6]:
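import ssl
# Workaround attempt for local certificate errors: disables HTTPS certificate verification
# for the standard-library default context (insecure). The download below still fails.
ssl._create_default_https_context = ssl._create_unverified_context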

In [7]:
# Load FinBERT model and tokenizer
MODEL_CHECKPOINT = 'ProsusAI/finbert'

print("Loading FinBERT model and tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)
model = TFBertForSequenceClassification.from_pretrained(MODEL_CHECKPOINT)

# Display model configuration
print(f"\nModel: {MODEL_CHECKPOINT}")
print(f"Labels: {model.config.id2label}")
print(f"Max sequence length: {tokenizer.model_max_length}")

Loading FinBERT model and tokenizer...


'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /ProsusAI/finbert/resolve/main/tokenizer_config.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))"), '(Request ID: c1d42ebb-08ec-4f12-b9b5-83dbc8017653)')' thrown while requesting HEAD https://huggingface.co/ProsusAI/finbert/resolve/main/tokenizer_config.json
Retrying in 1s [Retry 1/5].
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /ProsusAI/finbert/resolve/main/tokenizer_config.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))"), '(Request ID: abbb226a-28a6-4660-b563-c8055a3ea437)')' thrown while requesting HEAD https://huggingface.co/ProsusAI/finbert/resolve/main/tokenizer_config.json
Retryi

SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /ProsusAI/finbert/resolve/main/tokenizer_config.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))"), '(Request ID: 494186cc-5c43-4866-a31c-f2c1dd15352d)')
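
The traceback above is a certificate verification failure (`CERTIFICATE_VERIFY_FAILED`), which usually means Python cannot find a CA certificate that covers the connection, for example behind a TLS-inspecting proxy. One option that keeps verification enabled is to point the HTTP stack at a trusted CA bundle before loading the model. A minimal sketch, assuming the download goes through `requests` (which honors the `REQUESTS_CA_BUNDLE` environment variable) and that a PEM bundle is available on disk; `use_ca_bundle` is a hypothetical helper, not part of any library:

```python
import os


def use_ca_bundle(bundle_path: str) -> str:
    """Tell requests and Python's ssl module which CA bundle to trust."""
    if not os.path.isfile(bundle_path):
        raise FileNotFoundError(f"CA bundle not found: {bundle_path}")
    # requests reads REQUESTS_CA_BUNDLE; OpenSSL-backed default ssl contexts read SSL_CERT_FILE
    os.environ["REQUESTS_CA_BUNDLE"] = bundle_path
    os.environ["SSL_CERT_FILE"] = bundle_path
    return bundle_path
```

The helper would need to run before `from_pretrained` is called, with the path to whichever bundle your network requires. Once the model files are in the local cache, passing `local_files_only=True` to `from_pretrained` avoids the network request altogether.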

## Text Processing Functions

BERT models accept at most 512 tokens per input, including the special `[CLS]` and `[SEP]` tokens, so long speeches must be split into chunks of up to 510 content tokens each.
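
The number of chunks per speech follows directly from its token count: with 510 content tokens per chunk, a speech of `n` tokens yields `ceil(n / 510)` chunks, and only the last one is shorter. A minimal sketch of that windowing on dummy tokens, with no real tokenizer involved; `split_into_chunks` is a hypothetical stand-in for the slicing step, and the 1,200-token input is made up:

```python
import math
from typing import List


def split_into_chunks(tokens: List[str], max_length: int = 510) -> List[List[str]]:
    """Split a token list into consecutive windows of at most max_length tokens."""
    return [tokens[i:i + max_length] for i in range(0, len(tokens), max_length)]


# Stand-in for a tokenized speech: 1,200 dummy tokens
dummy_tokens = [f"tok{i}" for i in range(1200)]
dummy_chunks = split_into_chunks(dummy_tokens)
chunk_sizes = [len(chunk) for chunk in dummy_chunks]

print(len(dummy_chunks), chunk_sizes)
# The chunk count matches the ceiling formula
assert len(dummy_chunks) == math.ceil(len(dummy_tokens) / 510)
```

Because the split happens at fixed token offsets, a chunk boundary can fall in the middle of a sentence, which is worth keeping in mind when reading chunk-level scores.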

In [None]:
def chunk_text_for_bert(text: str, tokenizer, max_length: int = 510) -> List[Dict]:
    """
    Split text into chunks that fit within BERT's token limits.
    
    Parameters:
        text: Input text to chunk
        tokenizer: HuggingFace tokenizer
        max_length: Maximum tokens per chunk (510 to leave room for [CLS] and [SEP])
        
    Returns:
        List of encoded chunks ready for model input
    """
    # Tokenize the full text
    tokens = tokenizer.tokenize(text)
    
    # Split into chunks
    chunks = []
    for i in range(0, len(tokens), max_length):
        chunk_tokens = tokens[i:i + max_length]
        chunk_text = tokenizer.convert_tokens_to_string(chunk_tokens)
        
        # Encode with special tokens
        encoding = tokenizer.encode_plus(
            chunk_text,
            add_special_tokens=True,
            max_length=max_length + 2,  # +2 for [CLS] and [SEP]
            padding='max_length',
            truncation=True,
            return_tensors='tf'
        )
        chunks.append(encoding)
    
    return chunks


def analyze_sentiment(chunks: List[Dict], model) -> Tuple[np.ndarray, np.ndarray]:
    """
    Run sentiment analysis on text chunks.
    
    Parameters:
        chunks: List of encoded text chunks
        model: Loaded sentiment analysis model
        
    Returns:
        Tuple of (all_predictions, mean_sentiment) as numpy arrays
    """
    all_predictions = []
    
    for chunk in chunks:
        # Get model predictions
        outputs = model(chunk)
        
        # Convert logits to probabilities
        probs = tf.nn.softmax(outputs.logits, axis=-1)
        all_predictions.append(probs.numpy())
    
    # Stack all predictions
    all_predictions = np.vstack(all_predictions)
    
    # Calculate mean sentiment across all chunks
    mean_sentiment = np.mean(all_predictions, axis=0)
    
    return all_predictions, mean_sentiment
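
`analyze_sentiment` averages the chunk probabilities with equal weight, so a short final chunk counts as much as a full 510-token chunk. One alternative, not used in this notebook, is to weight each chunk by its token count. A minimal sketch with made-up numbers; `weighted_mean_sentiment` is a hypothetical helper:

```python
from typing import List, Sequence


def weighted_mean_sentiment(chunk_probs: Sequence[Sequence[float]],
                            chunk_lengths: Sequence[int]) -> List[float]:
    """Average per-chunk class probabilities, weighting each chunk by its token count."""
    if not chunk_probs or len(chunk_probs) != len(chunk_lengths):
        raise ValueError("need one length per chunk and at least one chunk")
    total = sum(chunk_lengths)
    if total <= 0:
        raise ValueError("total token count must be positive")
    n_classes = len(chunk_probs[0])
    return [
        sum(probs[k] * length for probs, length in zip(chunk_probs, chunk_lengths)) / total
        for k in range(n_classes)
    ]


# Made-up probabilities for three chunks; the last chunk is short and strongly one-sided
probs = [[0.6, 0.1, 0.3], [0.2, 0.5, 0.3], [0.0, 1.0, 0.0]]
lengths = [510, 510, 180]

weighted = weighted_mean_sentiment(probs, lengths)
unweighted = [sum(column) / len(probs) for column in zip(*probs)]
print(weighted, unweighted)
```

In this toy case the short, one-sided last chunk pulls the unweighted mean further toward its class than the weighted mean does.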

In [None]:
# Process all speeches with progress tracking
print("🔄 Processing all speeches for sentiment analysis...\n")

sentiment_results = []

for idx, row in tqdm(df.iterrows(), total=len(df), desc="Analyzing speeches"):
    try:
        # Chunk the speech text
        chunks = chunk_text_for_bert(row['content'], tokenizer, max_length=510)
        
        # Analyze sentiment
        chunk_predictions, mean_sentiment = analyze_sentiment(chunks, model)
        
        # Store results
        sentiment_results.append({
            'speech_idx': idx,
            'location': row['Location'],
            'month': row['Month'],
            'year': row['Year'],
            'num_chunks': len(chunks),
            'positive': mean_sentiment[0],
            'negative': mean_sentiment[1],
            'neutral': mean_sentiment[2],
            'chunk_predictions': chunk_predictions,
            'dominant_sentiment': model.config.id2label[np.argmax(mean_sentiment)]
        })
        
    except Exception as e:
        print(f"\n⚠️  Error processing speech {idx} ({row['Location']}): {e}")
        continue

print(f"\n✅ Successfully analyzed {len(sentiment_results)} speeches!")

# Create results DataFrame
sentiment_df = pd.DataFrame([{k: v for k, v in r.items() if k != 'chunk_predictions'} 
                              for r in sentiment_results])

Predictions for text 1 shape: (48, 3)
Predictions for text 2 shape: (46, 3)
Predictions for text 3 shape: (25, 3)
Predictions for text 4 shape: (18, 3)
Predictions for text 5 shape: (22, 3)
Predictions for text 6 shape: (32, 3)
Predictions for text 7 shape: (28, 3)
Predictions for text 8 shape: (32, 3)
Predictions for text 9 shape: (45, 3)
Predictions for text 10 shape: (24, 3)
Predictions for text 11 shape: (27, 3)
Predictions for text 12 shape: (29, 3)
Predictions for text 13 shape: (24, 3)
Predictions for text 14 shape: (26, 3)
Predictions for text 15 shape: (37, 3)
Predictions for text 16 shape: (33, 3)
Predictions for text 17 shape: (24, 3)
Predictions for text 18 shape: (25, 3)
Predictions for text 19 shape: (38, 3)
Predictions for text 20 shape: (31, 3)
Predictions for text 21 shape: (39, 3)
Predictions for text 22 shape: (26, 3)
Predictions for text 23 shape: (24, 3)
Predictions for text 24 shape: (17, 3)
Predictions for text 25 shape: (31, 3)
Predictions for text 26 shape: (29

## Sentiment Analysis Results

Let's examine the sentiment scores for each speech.

In [None]:
# Display sentiment scores
print("Sentiment Scores by Speech:\n")
print("="*80)

for idx, row in sentiment_df.iterrows():
    print(f"{row['location']:.<30} ({row['month']} {row['year']})")
    print(f"   Positive: {row['positive']:.3f} | Negative: {row['negative']:.3f} | Neutral: {row['neutral']:.3f}")
    print(f"   Dominant: {row['dominant_sentiment']} | Chunks: {row['num_chunks']}")
    print()

# Show DataFrame
sentiment_df.head(10)

## Interactive Visualizations

Creating interactive charts to explore sentiment patterns.

In [None]:
# Create comprehensive sentiment visualization
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Sentiment Distribution Across All Speeches',
                    'Average Sentiment by Year',
                    'Dominant Sentiment Count',
                    'Sentiment Trends Over Time'),
    specs=[[{"type": "bar"}, {"type": "bar"}],
           [{"type": "pie"}, {"type": "scatter"}]]
)

# 1. Overall sentiment distribution (grouped bars; see barmode='group' in the layout)
fig.add_trace(
    go.Bar(name='Positive', x=sentiment_df['location'], y=sentiment_df['positive'],
           marker_color='#2ecc71', showlegend=True),
    row=1, col=1
)
fig.add_trace(
    go.Bar(name='Negative', x=sentiment_df['location'], y=sentiment_df['negative'],
           marker_color='#e74c3c', showlegend=True),
    row=1, col=1
)
fig.add_trace(
    go.Bar(name='Neutral', x=sentiment_df['location'], y=sentiment_df['neutral'],
           marker_color='#95a5a6', showlegend=True),
    row=1, col=1
)

# 2. Average sentiment by year
year_avg = sentiment_df.groupby('year')[['positive', 'negative', 'neutral']].mean()
fig.add_trace(
    go.Bar(name='Positive', x=year_avg.index, y=year_avg['positive'],
           marker_color='#2ecc71', showlegend=False),
    row=1, col=2
)
fig.add_trace(
    go.Bar(name='Negative', x=year_avg.index, y=year_avg['negative'],
           marker_color='#e74c3c', showlegend=False),
    row=1, col=2
)
fig.add_trace(
    go.Bar(name='Neutral', x=year_avg.index, y=year_avg['neutral'],
           marker_color='#95a5a6', showlegend=False),
    row=1, col=2
)

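# 3. Dominant sentiment pie chart
sentiment_counts = sentiment_df['dominant_sentiment'].value_counts()
# Map colors by label so each slice keeps its sentiment color regardless of count order
sentiment_colors = {'positive': '#2ecc71', 'negative': '#e74c3c', 'neutral': '#95a5a6'}
pie_colors = [sentiment_colors.get(str(label).lower(), '#bdc3c7') for label in sentiment_counts.index]
fig.add_trace(
    go.Pie(labels=sentiment_counts.index, values=sentiment_counts.values,
           marker=dict(colors=pie_colors),
           showlegend=False),
    row=2, col=1
)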

# 4. Sentiment timeline
fig.add_trace(
    go.Scatter(x=sentiment_df['speech_idx'], y=sentiment_df['positive'],
               mode='lines+markers', name='Positive',
               line=dict(color='#2ecc71', width=2), showlegend=False),
    row=2, col=2
)
fig.add_trace(
    go.Scatter(x=sentiment_df['speech_idx'], y=sentiment_df['negative'],
               mode='lines+markers', name='Negative',
               line=dict(color='#e74c3c', width=2), showlegend=False),
    row=2, col=2
)
fig.add_trace(
    go.Scatter(x=sentiment_df['speech_idx'], y=sentiment_df['neutral'],
               mode='lines+markers', name='Neutral',
               line=dict(color='#95a5a6', width=2), showlegend=False),
    row=2, col=2
)

# Update layout
fig.update_layout(
    height=900,
    title_text="Comprehensive Sentiment Analysis Dashboard",
    showlegend=True,
    barmode='group',
    template='plotly_white'
)

fig.update_xaxes(tickangle=-45, row=1, col=1)
fig.update_yaxes(title_text="Probability", row=1, col=1)
fig.update_yaxes(title_text="Probability", row=1, col=2)
fig.update_yaxes(title_text="Sentiment Score", row=2, col=2)
fig.update_xaxes(title_text="Speech Index", row=2, col=2)

fig.show()

## Sentiment Heatmap

Visualizing the three sentiment probabilities for every speech side by side.

In [None]:
# Create sentiment heatmap
heatmap_data = sentiment_df[['positive', 'negative', 'neutral']].T
heatmap_data.columns = [f"{row['location'][:15]}..." if len(row['location']) > 15 
                        else row['location'] 
                        for _, row in sentiment_df.iterrows()]

fig = go.Figure(data=go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns,
    y=['Positive', 'Negative', 'Neutral'],
    colorscale='RdYlGn',
    text=heatmap_data.values,
    texttemplate='%{text:.2f}',
    textfont={"size": 10},
    colorbar=dict(title="Probability")
))

fig.update_layout(
    title='Sentiment Heatmap: All Speeches',
    xaxis_title='Speech Location',
    yaxis_title='Sentiment Type',
    height=400,
    template='plotly_white'
)

fig.update_xaxes(tickangle=-45)
fig.show()

## Chunk-Level Sentiment Analysis

Examining sentiment variation within individual speeches.

In [None]:
# Select a few interesting speeches to examine in detail
selected_speeches = [0, 10, 20, 30]  # First, middle, and later speeches

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=[f"{sentiment_results[i]['location']} ({sentiment_results[i]['month']} {sentiment_results[i]['year']})" 
                    for i in selected_speeches]
)

positions = [(1, 1), (1, 2), (2, 1), (2, 2)]

for idx, (speech_idx, pos) in enumerate(zip(selected_speeches, positions)):
    result = sentiment_results[speech_idx]
    chunks = result['chunk_predictions']
    
    chunk_indices = list(range(len(chunks)))
    
    # Add traces for each sentiment
    fig.add_trace(
        go.Scatter(x=chunk_indices, y=chunks[:, 0],
                   mode='lines+markers', name='Positive',
                   line=dict(color='#2ecc71'), showlegend=(idx == 0)),
        row=pos[0], col=pos[1]
    )
    fig.add_trace(
        go.Scatter(x=chunk_indices, y=chunks[:, 1],
                   mode='lines+markers', name='Negative',
                   line=dict(color='#e74c3c'), showlegend=(idx == 0)),
        row=pos[0], col=pos[1]
    )
    fig.add_trace(
        go.Scatter(x=chunk_indices, y=chunks[:, 2],
                   mode='lines+markers', name='Neutral',
                   line=dict(color='#95a5a6'), showlegend=(idx == 0)),
        row=pos[0], col=pos[1]
    )
    
    fig.update_xaxes(title_text="Chunk Index", row=pos[0], col=pos[1])
    fig.update_yaxes(title_text="Sentiment Score", row=pos[0], col=pos[1])

fig.update_layout(
    height=800,
    title_text="Sentiment Variation Within Individual Speeches",
    template='plotly_white',
    showlegend=True
)

fig.show()

## Temporal Analysis: Sentiment Over Time

Analyzing how sentiment evolved throughout 2019 and 2020.

In [None]:
# Add chronological date information to sentiment_df
month_map = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
             'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
sentiment_df['month_num'] = sentiment_df['month'].map(month_map)
# astype(str) keeps this working whether 'year' holds strings or integers
sentiment_df['date'] = pd.to_datetime(sentiment_df['year'].astype(str) + '-' + 
                                       sentiment_df['month_num'].astype(str) + '-15')
sentiment_df = sentiment_df.sort_values('date')

# Create temporal visualization
fig = go.Figure()

# Add sentiment traces
fig.add_trace(go.Scatter(
    x=sentiment_df['date'],
    y=sentiment_df['positive'],
    mode='lines+markers',
    name='Positive',
    line=dict(color='#2ecc71', width=3),
    marker=dict(size=8),
    hovertemplate='<b>%{text}</b><br>Positive: %{y:.3f}<extra></extra>',
    text=sentiment_df['location']
))

fig.add_trace(go.Scatter(
    x=sentiment_df['date'],
    y=sentiment_df['negative'],
    mode='lines+markers',
    name='Negative',
    line=dict(color='#e74c3c', width=3),
    marker=dict(size=8),
    hovertemplate='<b>%{text}</b><br>Negative: %{y:.3f}<extra></extra>',
    text=sentiment_df['location']
))

fig.add_trace(go.Scatter(
    x=sentiment_df['date'],
    y=sentiment_df['neutral'],
    mode='lines+markers',
    name='Neutral',
    line=dict(color='#95a5a6', width=3),
    marker=dict(size=8),
    hovertemplate='<b>%{text}</b><br>Neutral: %{y:.3f}<extra></extra>',
    text=sentiment_df['location']
))

# Add vertical line to separate years
fig.add_vline(x=pd.Timestamp('2020-01-01'), line_dash="dash", 
              line_color="gray", annotation_text="2020 Begins")

fig.update_layout(
    title='Sentiment Evolution Over Time (2019-2020)',
    xaxis_title='Date',
    yaxis_title='Sentiment Score',
    height=500,
    template='plotly_white',
    hovermode='x unified'
)

fig.show()

# Calculate rolling average
window = 3
sentiment_df['positive_ma'] = sentiment_df['positive'].rolling(window=window, center=True).mean()
sentiment_df['negative_ma'] = sentiment_df['negative'].rolling(window=window, center=True).mean()
sentiment_df['neutral_ma'] = sentiment_df['neutral'].rolling(window=window, center=True).mean()

# Plot with moving average
fig2 = go.Figure()

# Raw data (lighter)
fig2.add_trace(go.Scatter(x=sentiment_df['date'], y=sentiment_df['positive'],
                          mode='markers', name='Positive (raw)',
                          marker=dict(color='#2ecc71', size=6, opacity=0.3),
                          showlegend=True))
fig2.add_trace(go.Scatter(x=sentiment_df['date'], y=sentiment_df['negative'],
                          mode='markers', name='Negative (raw)',
                          marker=dict(color='#e74c3c', size=6, opacity=0.3),
                          showlegend=True))

# Moving averages (bold)
fig2.add_trace(go.Scatter(x=sentiment_df['date'], y=sentiment_df['positive_ma'],
                          mode='lines', name=f'Positive ({window}-speech avg)',
                          line=dict(color='#2ecc71', width=4)))
fig2.add_trace(go.Scatter(x=sentiment_df['date'], y=sentiment_df['negative_ma'],
                          mode='lines', name=f'Negative ({window}-speech avg)',
                          line=dict(color='#e74c3c', width=4)))

fig2.add_vline(x=pd.Timestamp('2020-01-01'), line_dash="dash", 
               line_color="gray", annotation_text="2020 Begins")

fig2.update_layout(
    title=f'Sentiment Trends with {window}-Speech Moving Average',
    xaxis_title='Date',
    yaxis_title='Sentiment Score',
    height=500,
    template='plotly_white'
)

fig2.show()
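
With `center=True` and `window=3`, the rolling mean averages each speech with its immediate neighbours and leaves the first and last positions as `NaN`, which is why the moving-average lines stop short of both ends of the raw data. A pure-Python sketch of the same computation on made-up values; `centered_rolling_mean` is a hypothetical helper that assumes an odd window:

```python
import math
from typing import List, Sequence


def centered_rolling_mean(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; positions without a full window are NaN."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    result = []
    for i in range(len(values)):
        start, stop = i - half, i + half + 1
        if start < 0 or stop > len(values):
            result.append(math.nan)  # not enough neighbours on one side
        else:
            result.append(sum(values[start:stop]) / window)
    return result


# Made-up scores for five consecutive speeches
smoothed = centered_rolling_mean([0.30, 0.10, 0.20, 0.60, 0.10])
print(smoothed)
```

A larger window smooths more but also drops more points at each end, which matters with only 35 speeches.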

## Year-over-Year Comparison

Comparing sentiment patterns between 2019 and 2020.

In [None]:
# Compare sentiment statistics by year
year_stats = sentiment_df.groupby('year').agg({
    'positive': ['mean', 'std', 'min', 'max'],
    'negative': ['mean', 'std', 'min', 'max'],
    'neutral': ['mean', 'std', 'min', 'max'],
    'speech_idx': 'count'
}).round(3)

print("Year-over-Year Sentiment Statistics:")
print("="*80)
print(year_stats)
print()

# Create box plots for sentiment distribution by year
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=('Positive Sentiment', 'Negative Sentiment', 'Neutral Sentiment')
)

for year in sorted(sentiment_df['year'].unique()):
    year_data = sentiment_df[sentiment_df['year'] == year]
    year_label = str(year)  # use a string for trace names; also handles integer years
    is_2019 = year_label == '2019'
    
    fig.add_trace(
        go.Box(y=year_data['positive'], name=year_label, showlegend=True,
               marker_color='#2ecc71' if is_2019 else '#27ae60'),
        row=1, col=1
    )
    fig.add_trace(
        go.Box(y=year_data['negative'], name=year_label, showlegend=False,
               marker_color='#e74c3c' if is_2019 else '#c0392b'),
        row=1, col=2
    )
    fig.add_trace(
        go.Box(y=year_data['neutral'], name=year_label, showlegend=False,
               marker_color='#95a5a6' if is_2019 else '#7f8c8d'),
        row=1, col=3
    )

fig.update_layout(
    title_text='Sentiment Distribution by Year',
    height=400,
    template='plotly_white',
    showlegend=True
)

fig.update_yaxes(title_text="Sentiment Score", row=1, col=1)
fig.update_yaxes(title_text="Sentiment Score", row=1, col=2)
fig.update_yaxes(title_text="Sentiment Score", row=1, col=3)

fig.show()

# Statistical comparison
print("\n📊 Key Insights:")
print("="*80)
for year in sorted(sentiment_df['year'].unique()):
    year_data = sentiment_df[sentiment_df['year'] == year]
    print(f"\n{year}:")
    print(f"  • Average Positive: {year_data['positive'].mean():.3f} (±{year_data['positive'].std():.3f})")
    print(f"  • Average Negative: {year_data['negative'].mean():.3f} (±{year_data['negative'].std():.3f})")
    print(f"  • Average Neutral:  {year_data['neutral'].mean():.3f} (±{year_data['neutral'].std():.3f})")
    print(f"  • Speeches: {len(year_data)}")
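
The per-year means and standard deviations printed above do not by themselves show whether the 2019 and 2020 averages differ by more than speech-to-speech noise. One way to put the gap in context is a standardized effect size such as Cohen's d. A minimal sketch using only the standard library and made-up scores; this is not part of the original analysis, and `cohens_d` is a hypothetical helper:

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up per-speech scores for two groups, for illustration only
scores_a = [0.10, 0.20, 0.30]
scores_b = [0.30, 0.40, 0.50]
d = cohens_d(scores_a, scores_b)
print(round(d, 3))
```

With the real data, the two inputs would be the per-speech scores of each year, for example `year_data['negative']` for 2019 and for 2020. With so few speeches per year, any such figure is a rough guide at best.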

## Summary Statistics and Insights

In [None]:
# Comprehensive summary
print("=" * 80)
print("📊 SENTIMENT ANALYSIS SUMMARY")
print("=" * 80)

# Overall statistics
print(f"\n🎤 Dataset Overview:")
print(f"   Total Speeches Analyzed: {len(sentiment_df)}")
print(f"   Time Period: {sentiment_df['date'].min().strftime('%B %Y')} - {sentiment_df['date'].max().strftime('%B %Y')}")
print(f"   Total Text Chunks Processed: {sentiment_df['num_chunks'].sum():,}")
print(f"   Average Chunks per Speech: {sentiment_df['num_chunks'].mean():.1f}")

# Overall sentiment averages
print(f"\n📈 Overall Sentiment Scores:")
print(f"   Positive: {sentiment_df['positive'].mean():.3f} (±{sentiment_df['positive'].std():.3f})")
print(f"   Negative: {sentiment_df['negative'].mean():.3f} (±{sentiment_df['negative'].std():.3f})")
print(f"   Neutral:  {sentiment_df['neutral'].mean():.3f} (±{sentiment_df['neutral'].std():.3f})")

# Dominant sentiment
dominant_counts = sentiment_df['dominant_sentiment'].value_counts()
print(f"\n🎯 Dominant Sentiment Distribution:")
for sentiment, count in dominant_counts.items():
    percentage = (count / len(sentiment_df)) * 100
    print(f"   {sentiment}: {count} speeches ({percentage:.1f}%)")

# Most positive and most negative speeches
most_positive = sentiment_df.nlargest(3, 'positive')
most_negative = sentiment_df.nlargest(3, 'negative')

print(f"\n✨ Most Positive Speeches:")
for _, row in most_positive.iterrows():
    print(f"   • {row['location']} ({row['month']} {row['year']}): {row['positive']:.3f}")

print(f"\n⚠️  Most Negative Speeches:")
for _, row in most_negative.iterrows():
    print(f"   • {row['location']} ({row['month']} {row['year']}): {row['negative']:.3f}")

# Sentiment volatility (speeches with high variance in chunks)
print(f"\n📊 Sentiment Variation:")
chunk_variances = []
for result in sentiment_results:
    chunks = result['chunk_predictions']
    variance = np.var(chunks, axis=0).mean()
    chunk_variances.append((result['location'], result['month'], result['year'], variance))

chunk_variances.sort(key=lambda x: x[3], reverse=True)
print(f"   Speeches with Most Sentiment Variation:")
for location, month, year, var in chunk_variances[:3]:
    print(f"   • {location} ({month} {year}): variance = {var:.4f}")

print(f"\n   Speeches with Most Consistent Sentiment:")
for location, month, year, var in chunk_variances[-3:]:
    print(f"   • {location} ({month} {year}): variance = {var:.4f}")

print("\n" + "=" * 80)

## Save Results to DataFrame

Adding sentiment scores to the original dataset for further analysis.

In [None]:
# Merge sentiment scores back into the original DataFrame.
# sentiment_df was re-sorted by date earlier, so align on speech_idx (the original
# df index) instead of relying on row order; speeches that failed analysis get NaN.
scores = sentiment_df.set_index('speech_idx')
df_with_sentiment = df.copy()
df_with_sentiment['sentiment_positive'] = scores['positive']
df_with_sentiment['sentiment_negative'] = scores['negative']
df_with_sentiment['sentiment_neutral'] = scores['neutral']
df_with_sentiment['dominant_sentiment'] = scores['dominant_sentiment']

# Store the enhanced dataset
DT_rally_speeches_with_sentiment = df_with_sentiment
%store DT_rally_speeches_with_sentiment

print("✅ Sentiment scores added to DataFrame!")
print(f"\nNew columns: sentiment_positive, sentiment_negative, sentiment_neutral, dominant_sentiment")
print(f"\nDataFrame shape: {df_with_sentiment.shape}")
print("\n📁 Dataset stored as 'DT_rally_speeches_with_sentiment' for use in other notebooks")

# Display sample
df_with_sentiment[['Location', 'Month', 'Year', 'sentiment_positive', 
                    'sentiment_negative', 'sentiment_neutral', 'dominant_sentiment']].head(10)