# 🚀 Stock Sentiment Graph API - Interactive Demo

This notebook demonstrates the **Stock Sentiment Graph API**, a Neo4j-backed system that analyzes the correlation between social media sentiment and stock price movements.

## 📊 What This Project Does

- **Ingests** stock price data and social media tweets into a Neo4j knowledge graph
- **Analyzes** sentiment using Hugging Face FinBERT (financial sentiment model)
- **Correlates** social sentiment with stock price movements
- **Identifies** trending stocks, influencers, and volatility patterns
- **Predicts** price movements based on sentiment trends
- **Performs** advanced graph analytics (PageRank, Louvain communities, etc.)

## 🎯 Key Features We'll Explore

1. **Data Pipeline**: Load stocks and tweets into the graph
2. **Sentiment Analysis**: Real-time sentiment scoring with FinBERT
3. **Quantitative Analysis**: Correlation, trending, predictions, volatility
4. **Graph Analytics**: Network influence, communities, cascades
5. **Visualizations**: Interactive charts and graphs

In [None]:
# Install required packages (if not already installed)
# !pip install requests pandas matplotlib seaborn plotly ipywidgets networkx

In [None]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import json
from typing import Dict, List
import warnings
warnings.filterwarnings('ignore')

# Configure API base URL
API_BASE = "http://localhost:8000/api"

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")
print(f"🌐 API Base URL: {API_BASE}")
print(f"📅 Current Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 1️⃣ Data Pipeline - Ingesting Stocks & Tweets

The unified pipeline loads both stock price data and social media tweets into Neo4j, automatically:
- Creating nodes for Stocks, TradingDays, Tweets, HashTags
- Extracting hashtags and mentions from tweet text
- Calculating daily price changes and volatility
- Linking tweets to stocks and trading days
- Processing sentiment for tweets (if not already present)
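Two of the steps above, hashtag/mention extraction and daily price change, can be sketched in plain Python. This is only an illustration under stated assumptions: the regular expressions, the function names `extract_entities` and `daily_pct_change`, and the sample tweet are hypothetical, and the API's actual extraction rules are not shown in this notebook.

```python
import re

# Hypothetical patterns; the pipeline's real extraction rules may differ.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def extract_entities(text: str) -> dict:
    """Return the lowercase hashtags and mentions found in a tweet."""
    return {
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
        "mentions": [name.lower() for name in MENTION_RE.findall(text)],
    }


def daily_pct_change(closes: list) -> list:
    """Percent change of each close relative to the previous close."""
    return [
        (today - prev) / prev * 100.0
        for prev, today in zip(closes, closes[1:])
    ]


# Made-up inputs for illustration only.
entities = extract_entities("Loading up on #TSLA before earnings, thanks @ExampleTrader #EV")
changes = daily_pct_change([100.0, 110.0, 99.0])
print(entities)
print(changes)
```

Lowercasing the extracted tags is a design choice that lets `#TSLA` and `#tsla` resolve to the same hashtag node; whether the real pipeline normalizes case is an assumption.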

In [None]:
def ingest_stock_data(stock: str, start_date: str = "2021-09-30", end_date: str = "2022-09-30"):
    """
    Ingest stock price data and tweets for a given ticker.
    This is the main pipeline endpoint: it syncs prices, imports tweets,
    and runs sentiment processing in one call.
    """
    url = f"{API_BASE}/pipeline/dataset_to_graph"
    payload = {
        "stock": stock,
        "start_date": start_date,
        "end_date": end_date,
        "chunk_size": 2000
    }
    
    print(f"🔄 Ingesting data for {stock} from {start_date} to {end_date}...")
    response = requests.post(url, json=payload, timeout=300)
    
    if response.status_code == 200:
        result = response.json()
        print("✅ Success!")
        print(f"   📈 Prices synced: {result.get('prices_synced', 0)}")
        print(f"   🐦 Tweets imported: {result.get('tweets_imported', 0)}")
        if 'sentiment_processing' in result:
            sent = result['sentiment_processing']
            print(f"   💭 Sentiment processed: {sent.get('processed', 0)} tweets")
        return result
    else:
        print(f"❌ Error: {response.status_code}")
        print(response.text)
        return None

# Example: Ingest Tesla (TSLA) data
# Uncomment to run (this may take a few minutes)
# tsla_result = ingest_stock_data("TSLA", "2021-09-30", "2022-09-30")

## 2️⃣ Sentiment Analysis - Real-time FinBERT Scoring

Analyze sentiment of any text using Hugging Face's FinBERT model, specifically trained on financial data.
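FinBERT classifies text as positive, negative, or neutral, while the endpoint used in this notebook returns a single `sentiment` score between 0 and 1 plus a `confidence`. How the API collapses the three classes into one score is not shown here, so the sketch below is one plausible mapping and not the service's actual formula. The function name `to_unit_score` and its parameters are hypothetical.

```python
def to_unit_score(p_positive: float, p_negative: float, p_neutral: float) -> dict:
    """Collapse three class probabilities into a 0-1 score plus a confidence.

    Assumed mapping (not confirmed by the API): 0.5 means neutral,
    values above 0.5 lean positive, values below 0.5 lean negative.
    """
    total = p_positive + p_negative + p_neutral
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize in case the inputs do not sum exactly to 1.
    pos, neg, neu = (p / total for p in (p_positive, p_negative, p_neutral))
    sentiment = 0.5 + 0.5 * (pos - neg)
    confidence = max(pos, neg, neu)  # probability of the winning class
    return {"sentiment": sentiment, "confidence": confidence}


# Made-up probabilities for illustration only.
bullish = to_unit_score(0.8, 0.1, 0.1)
neutral = to_unit_score(0.0, 0.0, 1.0)
print(bullish)
print(neutral)
```

Under this mapping a purely neutral text scores exactly 0.5, which is worth keeping in mind when a threshold such as `sentiment > 0.5` is used to label tweets.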

In [None]:
def analyze_sentiment(text: str, api_key: str = None):
    """
    Analyze sentiment of text using FinBERT.
    Returns sentiment score (0-1, where >0.5 is positive) and confidence.
    """
    url = f"{API_BASE}/sentiment/analyze"
    payload = {"text": text}
    if api_key:
        payload["api_key"] = api_key
    
    response = requests.post(url, json=payload)
    
    if response.status_code == 200:
        result = response.json()
        return result
    else:
        print(f"❌ Error: {response.status_code}")
        print(response.text)
        return None

# Example sentiment analysis
sample_tweets = [
    "🚀 TSLA to the moon! Best investment ever!",
    "😰 TSLA is crashing, sell everything!",
    "TSLA earnings report shows steady growth",
    "Not sure about TSLA, mixed signals from analysts"
]

print("💭 Analyzing Sample Tweets:\n")
for tweet in sample_tweets:
    result = analyze_sentiment(tweet)
    if result:
        sentiment = result['sentiment']
        confidence = result['confidence']
        label = "🟢 Positive" if sentiment > 0.5 else "🔴 Negative"
        print(f"Tweet: {tweet}")
        print(f"  {label} | Score: {sentiment:.3f} | Confidence: {confidence:.3f}\n")

## 3️⃣ Quantitative Analysis

### 3.1 Sentiment-Price Correlation

Discover how social media sentiment correlates with actual stock price movements!
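For reference, the Pearson correlation coefficient that the endpoint reports can be computed by hand as the covariance of two series divided by the product of their standard deviations. The sketch below uses only the standard library. The helper name `pearson` and the sample values are made up for illustration, and pairing average sentiment with daily percent price change is an assumption about which series the API correlates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


# Made-up daily values for illustration only.
avg_sentiment = [0.2, 0.4, 0.6, 0.8]
price_change_pct = [-1.0, 0.0, 1.0, 2.0]
r = pearson(avg_sentiment, price_change_pct)
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship; none of these says anything about causation.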

In [None]:
def get_sentiment_price_correlation(stock: str, start_date: str = None, end_date: str = None):
    """
    Get correlation between sentiment and price movements.
    Returns Pearson correlation coefficient and daily data.
    """
    url = f"{API_BASE}/correlation/sentiment-price/{stock}"
    params = {}
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        return None

# Get correlation data
correlation_data = get_sentiment_price_correlation("TSLA", "2021-09-30", "2022-09-30")

if correlation_data:
    print(f"📊 Sentiment-Price Correlation Analysis for {correlation_data['stock']}")
    print(f"   Correlation Coefficient: {correlation_data.get('correlation_coefficient', 'N/A')}")
    print(f"   Data Points: {correlation_data.get('data_points', 0)}")
    print(f"   Interpretation: {correlation_data.get('interpretation', 'N/A')}")
    
    # Create visualization
    if correlation_data.get('daily_data'):
        df = pd.DataFrame(correlation_data['daily_data'])
        df['date'] = pd.to_datetime(df['date'])
        
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
        
        # Plot 1: Price and Sentiment over time
        ax1_twin = ax1.twinx()
        ax1.plot(df['date'], df['close_price'], 'b-', label='Close Price', linewidth=2)
        ax1_twin.plot(df['date'], df['avg_sentiment'], 'r-', label='Avg Sentiment', linewidth=2, alpha=0.7)
        ax1.set_xlabel('Date', fontsize=12)
        ax1.set_ylabel('Close Price ($)', color='b', fontsize=12)
        ax1_twin.set_ylabel('Average Sentiment', color='r', fontsize=12)
        ax1.set_title('Stock Price vs. Social Media Sentiment Over Time', fontsize=14, fontweight='bold')
        ax1.legend(loc='upper left')
        ax1_twin.legend(loc='upper right')
        ax1.grid(True, alpha=0.3)
        
        # Plot 2: Scatter plot showing correlation
        ax2.scatter(df['avg_sentiment'], df['close_price'], 
                   s=df['tweet_count']*2, alpha=0.6, c=df['tweet_count'], cmap='viridis')
        ax2.set_xlabel('Average Sentiment', fontsize=12)
        ax2.set_ylabel('Close Price ($)', fontsize=12)
        ax2.set_title('Sentiment vs. Price Correlation (bubble size = tweet count)', fontsize=14, fontweight='bold')
        plt.colorbar(ax2.collections[0], ax=ax2, label='Tweet Count')
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()

### 3.2 Trending Stocks

Find which stocks are trending based on tweet volume and sentiment!

In [None]:
def get_trending_stocks(window: str = "daily", limit: int = 10):
    """
    Get trending stocks based on tweet volume and sentiment.
    """
    url = f"{API_BASE}/trending/stocks"
    params = {"window": window, "limit": limit}
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        return None

# Get trending stocks
trending = get_trending_stocks("daily", 15)

if trending:
    print(f"🔥 Trending Stocks ({trending['window']} window)")
    print(f"   Window Start: {trending['start_time']}\n")
    
    df_trending = pd.DataFrame(trending['trending_stocks'])
    
    # Create visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Plot 1: Top stocks by trend score
    top_10 = df_trending.head(10)
    ax1.barh(top_10['ticker'], top_10['trend_score'], color='steelblue')
    ax1.set_xlabel('Trend Score', fontsize=12)
    ax1.set_title('Top 10 Trending Stocks by Trend Score', fontsize=14, fontweight='bold')
    ax1.grid(True, alpha=0.3, axis='x')
    
    # Plot 2: Tweet volume vs sentiment
    ax2.scatter(df_trending['avg_sentiment'], df_trending['tweet_volume'], 
               s=df_trending['trend_score']*10, alpha=0.6, c=df_trending['trend_score'], cmap='coolwarm')
    for idx, row in df_trending.head(10).iterrows():
        ax2.annotate(row['ticker'], (row['avg_sentiment'], row['tweet_volume']), 
                    fontsize=8, alpha=0.7)
    ax2.set_xlabel('Average Sentiment', fontsize=12)
    ax2.set_ylabel('Tweet Volume', fontsize=12)
    ax2.set_title('Volume vs. Sentiment (bubble size = trend score)', fontsize=14, fontweight='bold')
    plt.colorbar(ax2.collections[0], ax=ax2, label='Trend Score')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Display table
    print("\n📋 Detailed Trending Stocks Data:")
    print(df_trending.to_string(index=False))

### 3.3 Top Influencers

Identify the most influential users for a specific stock based on their tweet activity and network influence.

In [None]:
def get_top_influencers(stock: str, limit: int = 20):
    """
    Get top influencers for a stock.
    """
    url = f"{API_BASE}/influencers/{stock}"
    params = {"limit": limit}
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        return None

# Get influencers for TSLA
influencers = get_top_influencers("TSLA", 15)

if influencers:
    print(f"👥 Top Influencers for {influencers['stock']}\n")
    
    df_inf = pd.DataFrame(influencers['top_influencers'])
    
    if not df_inf.empty:
        # Create visualization
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        
        # Plot 1: Top influencers by influence score
        top_10 = df_inf.head(10)
        ax1.barh(range(len(top_10)), top_10['influence_score'], color='coral')
        ax1.set_yticks(range(len(top_10)))
        ax1.set_yticklabels(top_10['user_id'], fontsize=9)
        ax1.set_xlabel('Influence Score', fontsize=12)
        ax1.set_title('Top 10 Influencers by Influence Score', fontsize=14, fontweight='bold')
        ax1.grid(True, alpha=0.3, axis='x')
        
        # Plot 2: Tweet count vs sentiment
        ax2.scatter(df_inf['tweet_count'], df_inf['avg_sentiment'], 
                   s=df_inf['influence_count']*5, alpha=0.6, c=df_inf['influence_score'], cmap='plasma')
        ax2.set_xlabel('Tweet Count', fontsize=12)
        ax2.set_ylabel('Average Sentiment', fontsize=12)
        ax2.set_title('Activity vs. Sentiment (bubble size = influence count)', fontsize=14, fontweight='bold')
        plt.colorbar(ax2.collections[0], ax=ax2, label='Influence Score')
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Display table
        print("📋 Detailed Influencers Data:")
        print(df_inf.to_string(index=False))
    else:
        print("⚠️ No influencer data available. Make sure you've ingested data with User information.")

### 3.4 Sentiment-Based Predictions

Predict future price movements based on recent sentiment trends!

In [None]:
def get_sentiment_prediction(stock: str, lookback_days: int = 7):
    """
    Get sentiment-based price prediction.
    """
    url = f"{API_BASE}/prediction/sentiment-based/{stock}"
    params = {"lookback_days": lookback_days}
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        return None

# Get prediction for TSLA
prediction = get_sentiment_prediction("TSLA", 7)

if prediction:
    print(f"🔮 Sentiment-Based Prediction for {prediction['stock']}")
    print(f"   Lookback Period: {prediction['lookback_days']} days")
    print(f"   Prediction: {prediction['prediction'].upper()}")
    print(f"   Confidence: {prediction['confidence']:.1%}")
    print(f"   Average Sentiment: {prediction.get('avg_sentiment', 'N/A')}")
    print(f"   Tweet Volume: {prediction.get('tweet_volume', 0)}")
    print(f"   Sentiment Volatility: {prediction.get('sentiment_volatility', 'N/A')}")
    print(f"\n💡 Interpretation: {prediction.get('interpretation', 'N/A')}")
    
    # Visualize prediction
    fig, ax = plt.subplots(figsize=(10, 6))
    
    pred_type = prediction['prediction']
    confidence = prediction['confidence']
    
    # Color based on prediction
    color = 'green' if pred_type == 'bullish' else ('red' if pred_type == 'bearish' else 'gray')
    
    # Create gauge-like visualization
    ax.barh([0], [confidence], color=color, alpha=0.7, height=0.5)
    ax.barh([0], [1-confidence], left=[confidence], color='lightgray', alpha=0.3, height=0.5)
    ax.set_xlim(0, 1)
    ax.set_ylim(-0.5, 0.5)
    ax.set_xlabel('Confidence Level', fontsize=12)
    ax.set_title(f'{pred_type.upper()} Prediction - {confidence:.1%} Confidence', 
                fontsize=14, fontweight='bold', color=color)
    ax.text(0.5, 0, f"{prediction.get('avg_sentiment', 0):.3f} avg sentiment\n"
           f"{prediction.get('tweet_volume', 0)} tweets analyzed",
           ha='center', va='center', fontsize=11, fontweight='bold')
    ax.axis('off')
    
    plt.tight_layout()
    plt.show()

### 3.5 Social-Driven Volatility

Identify stocks with high volatility driven by social media sentiment variance.
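The core quantity here, the spread of sentiment scores per ticker, can be sketched with the standard library. This is an illustration only: the helper name `sentiment_dispersion`, the `(ticker, score)` input shape, and the sample data are hypothetical, and the API's `volatility_score` formula is not shown in this notebook, so the sketch ranks by standard deviation alone.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sentiment_dispersion(tweets, min_tweets=3):
    """Rank tickers by the standard deviation of their tweet sentiment.

    tweets: iterable of (ticker, sentiment_score) pairs.
    Tickers with fewer than min_tweets tweets are skipped, because a
    spread computed from a handful of tweets is mostly noise.
    """
    by_ticker = defaultdict(list)
    for ticker, score in tweets:
        by_ticker[ticker].append(score)

    rows = []
    for ticker, scores in by_ticker.items():
        if len(scores) < min_tweets:
            continue
        rows.append({
            "ticker": ticker,
            "tweet_count": len(scores),
            "avg_sentiment": mean(scores),
            "sentiment_std": pstdev(scores),  # population standard deviation
        })
    return sorted(rows, key=lambda row: row["sentiment_std"], reverse=True)


# Made-up scores for illustration only.
sample = [
    ("TSLA", 0.9), ("TSLA", 0.1), ("TSLA", 0.5),
    ("AAPL", 0.5), ("AAPL", 0.6), ("AAPL", 0.4),
    ("MSFT", 0.7),
]
ranked = sentiment_dispersion(sample, min_tweets=3)
for row in ranked:
    print(row)
```

The `min_tweets` cutoff plays the same role as the `min_tweets` query parameter used below: it filters out tickers whose sample is too small to be meaningful.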

In [None]:
def get_social_volatility(min_tweets: int = 50, limit: int = 20):
    """
    Get stocks with high social-driven volatility.
    """
    url = f"{API_BASE}/volatility/social-driven"
    params = {"min_tweets": min_tweets, "limit": limit}
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        return None

# Get volatile stocks
volatility = get_social_volatility(50, 20)

if volatility:
    print("📈 Social-Driven Volatility Analysis")
    print(f"   Minimum Tweets Threshold: {volatility['min_tweets_threshold']}\n")
    
    df_vol = pd.DataFrame(volatility['volatile_stocks'])
    
    if not df_vol.empty:
        # Create visualization
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        
        # Plot 1: Top volatile stocks
        top_10 = df_vol.head(10)
        ax1.barh(top_10['ticker'], top_10['volatility_score'], color='firebrick')
        ax1.set_xlabel('Volatility Score', fontsize=12)
        ax1.set_title('Top 10 Most Volatile Stocks (Social-Driven)', fontsize=14, fontweight='bold')
        ax1.grid(True, alpha=0.3, axis='x')
        
        # Plot 2: Sentiment std vs tweet count
        ax2.scatter(df_vol['sentiment_std'], df_vol['tweet_count'], 
                   s=df_vol['volatility_score']*2, alpha=0.6, c=df_vol['volatility_score'], cmap='Reds')
        for idx, row in df_vol.head(10).iterrows():
            ax2.annotate(row['ticker'], (row['sentiment_std'], row['tweet_count']), 
                        fontsize=8, alpha=0.7)
        ax2.set_xlabel('Sentiment Standard Deviation', fontsize=12)
        ax2.set_ylabel('Tweet Count', fontsize=12)
        ax2.set_title('Sentiment Variance vs. Volume (bubble size = volatility score)', 
                     fontsize=14, fontweight='bold')
        plt.colorbar(ax2.collections[0], ax=ax2, label='Volatility Score')
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Display table with interpretations
        print("\n📋 Detailed Volatility Data:")
        display_cols = ['ticker', 'tweet_count', 'avg_sentiment', 'sentiment_std', 'volatility_score', 'interpretation']
        available_cols = [col for col in display_cols if col in df_vol.columns]
        print(df_vol[available_cols].to_string(index=False))
    else:
        print("⚠️ No volatility data available.")

## 4️⃣ Graph Analytics

### 4.1 Stock Clusters

Discover which stocks are clustered together based on hashtag co-occurrence patterns.
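To make "hashtag co-occurrence" concrete, the sketch below scores each pair of tickers by the number of hashtags they share. It is a simplified stand-in and not the API's scoring: the function name `cooccurrence_scores`, the input mapping, and the sample hashtags are hypothetical. The output keys `a`, `b`, and `score` are chosen to match the fields this notebook reads from the cluster response.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(stock_hashtags):
    """Count shared hashtags for every pair of tickers.

    stock_hashtags: dict mapping ticker -> set of hashtags seen with it.
    Returns a list of {"a", "b", "score"} dicts, highest score first.
    """
    scores = Counter()
    for a, b in combinations(sorted(stock_hashtags), 2):
        shared = stock_hashtags[a] & stock_hashtags[b]
        if shared:
            scores[(a, b)] = len(shared)
    return [
        {"a": a, "b": b, "score": score}
        for (a, b), score in scores.most_common()
    ]


# Made-up hashtags for illustration only.
pairs = cooccurrence_scores({
    "TSLA": {"ev", "tech", "earnings"},
    "AAPL": {"tech", "earnings", "iphone"},
    "F": {"ev"},
})
print(pairs)
```

Counting distinct shared hashtags ignores how often each hashtag appears; a weighted variant could count co-occurring tweets instead.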

In [None]:
def get_stock_clusters(limit: int = 10):
    """
    Get stock clusters based on hashtag co-occurrence.
    """
    url = f"{API_BASE}/clusters/stocks"
    params = {"limit": limit}
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        return None

# Get stock clusters
clusters = get_stock_clusters(15)

if clusters:
    print("🔗 Stock Clusters (based on hashtag co-occurrence)\n")
    
    df_clusters = pd.DataFrame(clusters['clusters'])
    
    if not df_clusters.empty:
        try:
            import networkx as nx
            
            # Create network visualization
            G = nx.Graph()
            for _, row in df_clusters.iterrows():
                G.add_edge(row['a'], row['b'], weight=row['score'])
            
            if len(G.nodes()) > 0:
                plt.figure(figsize=(14, 10))
                pos = nx.spring_layout(G, k=1, iterations=50)
                
                # Draw edges with weights
                edges = G.edges()
                weights = [G[u][v]['weight'] for u, v in edges]
                nx.draw_networkx_edges(G, pos, width=[w/10 for w in weights], 
                                      alpha=0.6, edge_color='gray')
                
                # Draw nodes
                nx.draw_networkx_nodes(G, pos, node_color='steelblue', 
                                      node_size=1000, alpha=0.9)
                
                # Draw labels
                nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold')
                
                plt.title('Stock Clusters Network (edge thickness = co-occurrence score)', 
                         fontsize=14, fontweight='bold')
                plt.axis('off')
                plt.tight_layout()
                plt.show()
        except ImportError:
            print("⚠️ NetworkX not installed. Install with: pip install networkx")
        
        # Display table
        print("\n📋 Stock Cluster Pairs:")
        print(df_clusters.to_string(index=False))
    else:
        print("⚠️ No cluster data available.")

### 4.2 GDS Algorithms (Graph Data Science)

Advanced graph algorithms using Neo4j's Graph Data Science library.

#### 4.2.1 Global Influence (PageRank)
Compute global user influence using the PageRank algorithm.
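For intuition about what PageRank measures, here is a small power-iteration sketch in pure Python. It is not the Neo4j GDS implementation, and the edge meaning (a user mentioning another user), the function name `pagerank`, and the sample users are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    edges: list of (source, target) pairs; here a hypothetical
    "source mentions target" relationship between users.
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up mention graph for illustration only.
ranks = pagerank([
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
])
for user, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{user}: {score:.3f}")
```

A user ranks highly when mentioned by many users, or by a few users who themselves rank highly, which is why `bob` ends up well above `carol` and `dave` despite having a single incoming edge.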

In [None]:
def get_global_influence(limit: int = 20):
    """
    Get global user influence using GDS PageRank.
    Requires Neo4j GDS plugin.
    """
    url = f"{API_BASE}/gds/influence/global"
    params = {"limit": limit}
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        print("Note: This requires Neo4j GDS plugin. Error may indicate GDS is not installed.")
        return None

# Get global influence (requires GDS plugin)
# Uncomment to run if GDS is installed
# global_influence = get_global_influence(20)
# if global_influence:
#     print(f"🌍 Global Influence Ranking ({global_influence['algorithm']})\n")
#     df_global = pd.DataFrame(global_influence['top_users'])
#     print(df_global.to_string(index=False))

#### 4.2.2 Stock Communities (Louvain)
Detect stock communities using the Louvain community detection algorithm.
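Louvain works by searching for a partition that maximizes modularity, a score that compares the edge weight inside communities with what a random graph of the same degrees would give. The sketch below does not implement Louvain itself; it only computes the modularity of a given partition, so the quantity being optimized is easier to see. The function name `modularity`, the tickers, and the edges are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected weighted graph.

    edges: list of (a, b, weight) triples, each undirected edge listed once.
    community_of: dict mapping node -> community id.
    """
    total_weight = sum(weight for _, _, weight in edges)
    if total_weight == 0:
        return 0.0

    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for a, b, weight in edges:
        degree[community_of[a]] += weight
        degree[community_of[b]] += weight
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += weight

    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Made-up graph: two triangles of tickers joined by a single edge.
edges = [
    ("TSLA", "NIO", 1.0), ("NIO", "RIVN", 1.0), ("TSLA", "RIVN", 1.0),
    ("AAPL", "MSFT", 1.0), ("MSFT", "GOOG", 1.0), ("AAPL", "GOOG", 1.0),
    ("TSLA", "AAPL", 1.0),
]
split = {"TSLA": 0, "NIO": 0, "RIVN": 0, "AAPL": 1, "MSFT": 1, "GOOG": 1}
merged = {ticker: 0 for ticker in split}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(f"two communities: {q_split:.3f}, one community: {q_merged:.3f}")
```

Splitting the two triangles scores higher than lumping every ticker together, which is the kind of improvement Louvain looks for as it moves nodes between communities.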

In [None]:
def get_stock_communities():
    """
    Get stock communities using GDS Louvain algorithm.
    Requires Neo4j GDS plugin.
    """
    url = f"{API_BASE}/gds/communities/stocks"
    
    response = requests.get(url)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"❌ Error: {response.status_code}")
        print("Note: This requires Neo4j GDS plugin. Error may indicate GDS is not installed.")
        return None

# Get stock communities (requires GDS plugin)
# Uncomment to run if GDS is installed
# communities = get_stock_communities()
# if communities:
#     print(f"🏘️ Stock Communities ({communities['algorithm']})\n")
#     df_comm = pd.DataFrame(communities['stocks'])
#     
#     # Group by community
#     for comm_id in sorted(df_comm['communityId'].unique()):
#         stocks_in_comm = df_comm[df_comm['communityId'] == comm_id]['ticker'].tolist()
#         print(f"Community {comm_id}: {', '.join(stocks_in_comm)}")

## 5️⃣ Comprehensive Analysis Dashboard

Let's create a comprehensive dashboard showing multiple metrics for a stock!

In [None]:
def create_stock_dashboard(stock: str, start_date: str = "2021-09-30", end_date: str = "2022-09-30"):
    """
    Create a comprehensive dashboard for a stock showing:
    - Correlation analysis
    - Prediction
    - Trending status
    - Influencers
    """
    print(f"\n{'='*60}")
    print(f"📊 COMPREHENSIVE DASHBOARD: {stock}")
    print(f"{'='*60}\n")
    
    # 1. Correlation
    print("1️⃣ SENTIMENT-PRICE CORRELATION")
    print("-" * 60)
    corr = get_sentiment_price_correlation(stock, start_date, end_date)
    if corr:
        print(f"   Correlation: {corr.get('correlation_coefficient', 'N/A')}")
        print(f"   Data Points: {corr.get('data_points', 0)}")
        print(f"   {corr.get('interpretation', 'N/A')}\n")
    
    # 2. Prediction
    print("2️⃣ SENTIMENT-BASED PREDICTION")
    print("-" * 60)
    pred = get_sentiment_prediction(stock, 7)
    if pred:
        print(f"   Prediction: {pred['prediction'].upper()}")
        print(f"   Confidence: {pred['confidence']:.1%}")
        print(f"   Avg Sentiment: {pred.get('avg_sentiment', 'N/A')}")
        print(f"   Tweet Volume: {pred.get('tweet_volume', 0)}\n")
    
    # 3. Trending status
    print("3️⃣ TRENDING STATUS")
    print("-" * 60)
    trending = get_trending_stocks("daily", 50)
    if trending:
        df_trend = pd.DataFrame(trending['trending_stocks'])
        stock_trend = df_trend[df_trend['ticker'] == stock]
        if not stock_trend.empty:
            row = stock_trend.iloc[0]
            rank = df_trend.index[df_trend['ticker'] == stock].tolist()[0] + 1
            print(f"   Rank: #{rank} out of {len(df_trend)} trending stocks")
            print(f"   Trend Score: {row['trend_score']:.2f}")
            print(f"   Tweet Volume: {row['tweet_volume']}")
            print(f"   Avg Sentiment: {row.get('avg_sentiment', 'N/A')}\n")
        else:
            print(f"   {stock} is not currently trending\n")
    
    # 4. Influencers
    print("4️⃣ TOP INFLUENCERS")
    print("-" * 60)
    inf = get_top_influencers(stock, 5)
    if inf and inf.get('top_influencers'):
        df_inf = pd.DataFrame(inf['top_influencers'])
        print(f"   Top 5 Influencers:")
        for idx, row in df_inf.head(5).iterrows():
            print(f"   {idx+1}. {row['user_id']} (Score: {row['influence_score']:.2f}, "
                  f"Tweets: {row['tweet_count']})")
    else:
        print(f"   No influencer data available (requires User nodes)\n")
    
    print(f"{'='*60}\n")

# Create dashboard for TSLA
# Uncomment to run
# create_stock_dashboard("TSLA", "2021-09-30", "2022-09-30")

## 6️⃣ Advanced Visualizations

### Multi-Stock Comparison
Compare sentiment and price trends across multiple stocks.

In [None]:
def compare_stocks(stocks: List[str], start_date: str = "2021-09-30", end_date: str = "2022-09-30"):
    """
    Compare multiple stocks side by side.
    """
    # squeeze=False keeps axes two-dimensional even when only one stock is passed
    fig, axes = plt.subplots(len(stocks), 2, figsize=(16, 4*len(stocks)), squeeze=False)
    
    for idx, stock in enumerate(stocks):
        corr_data = get_sentiment_price_correlation(stock, start_date, end_date)
        
        if corr_data and corr_data.get('daily_data'):
            df = pd.DataFrame(corr_data['daily_data'])
            df['date'] = pd.to_datetime(df['date'])
            
            # Plot 1: Price and Sentiment
            ax1 = axes[idx, 0]
            ax1_twin = ax1.twinx()
            ax1.plot(df['date'], df['close_price'], 'b-', label='Price', linewidth=2)
            ax1_twin.plot(df['date'], df['avg_sentiment'], 'r-', label='Sentiment', linewidth=2, alpha=0.7)
            ax1.set_ylabel('Price ($)', color='b', fontsize=10)
            ax1_twin.set_ylabel('Sentiment', color='r', fontsize=10)
            ax1.set_title(f'{stock} - Price vs Sentiment', fontsize=12, fontweight='bold')
            ax1.grid(True, alpha=0.3)
            ax1.legend(loc='upper left')
            ax1_twin.legend(loc='upper right')
            
            # Plot 2: Correlation scatter
            ax2 = axes[idx, 1]
            ax2.scatter(df['avg_sentiment'], df['close_price'], 
                       s=df['tweet_count']*2, alpha=0.6, c=df['tweet_count'], cmap='viridis')
            ax2.set_xlabel('Avg Sentiment', fontsize=10)
            ax2.set_ylabel('Close Price ($)', fontsize=10)
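            # The coefficient may be missing or null, so format it defensively
            corr_coef = corr_data.get('correlation_coefficient')
            corr_label = f'{corr_coef:.3f}' if isinstance(corr_coef, (int, float)) else 'N/A'
            ax2.set_title(f'{stock} - Correlation: {corr_label}', fontsize=12, fontweight='bold')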
            plt.colorbar(ax2.collections[0], ax=ax2, label='Tweets')
            ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Compare multiple stocks
# Uncomment to run
# compare_stocks(["TSLA", "AAPL", "MSFT"], "2021-09-30", "2022-09-30")

## 7️⃣ Key Takeaways & Use Cases

### What Makes This Project Cool? 🎯

1. **Unified Graph Model**: Everything (stocks, tweets, users, hashtags) is connected in a Neo4j graph
2. **Real-time Sentiment**: FinBERT provides financial-domain sentiment analysis
3. **Quantitative Insights**: Correlation, predictions, volatility - all backed by data
4. **Graph Analytics**: Network analysis, communities, influence - discover hidden patterns
5. **Scalable Pipeline**: Handles large datasets with chunking and batch processing
6. **Extensible Schema**: Automatically adapts to additional data (Users, Topics, Events)

### Real-World Applications 💼

- **Trading Signals**: Use sentiment predictions to inform trading decisions
- **Risk Management**: Identify volatile stocks driven by social media
- **Market Research**: Understand which stocks are trending and why
- **Influencer Marketing**: Identify key influencers for specific stocks
- **Portfolio Analysis**: Compare sentiment across your portfolio

### Next Steps 🚀

1. **Ingest More Data**: Add more stocks and time periods
2. **Custom Analysis**: Build your own queries using Neo4j Cypher
3. **Real-time Updates**: Set up streaming to keep data fresh
4. **Advanced ML**: Train custom models on the graph data
5. **Visualization**: Use Neo4j Bloom or other tools for interactive exploration

---

**Happy Analyzing! 📈📊🐦**

## 📝 Notes

- **API Server**: Make sure the FastAPI server is running (`make run` or `uvicorn app.main:app`)
- **Neo4j**: Ensure Neo4j is running (`make up` or Docker)
- **Data**: The pipeline uses CSV files in `data/Stock Tweets Sentiment Analysis/`
- **GDS Plugin**: Some features require Neo4j Graph Data Science plugin
- **Hugging Face**: Sentiment analysis may require an `HF_TOKEN` in `.env` to avoid Hugging Face rate limits

### Quick Start Commands

```bash
# Start Neo4j
make up

# Run API server
make run

# Ingest data (via API or curl)
curl -X POST "http://localhost:8000/api/pipeline/dataset_to_graph" \
  -H "Content-Type: application/json" \
  -d '{"stock": "TSLA", "start_date": "2021-09-30", "end_date": "2022-09-30"}'
```