# MisogynyWatch: Social Media Analysis Exploration

This notebook provides a comprehensive analysis of misogynistic language trends across Reddit and Twitter platforms.

## Research Questions
1. **Has misogynistic language increased over time?** → Time series trend analysis
2. **Does it correlate with red-pill influencer events?** → Event impact analysis
3. **Which communities are most affected?** → Cross-platform comparison
4. **Which age groups and genders are most affected?** → Demographic analysis

## Project Overview
This analysis combines data from multiple social media platforms to understand patterns of misogynistic language, correlating them with real-world events and demographic factors.

## 1. Data Collection Setup

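Import the analysis libraries and project modules and configure plotting. Data collection itself, including the API credentials stored in the `.env` file, is handled by `data_collection_coordinator.py`.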

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from datetime import datetime, timedelta
import warnings
import sys
from pathlib import Path
import os

# Add project root to path
project_root = Path().absolute().parent
sys.path.append(str(project_root))

# Import project modules
try:
    from utils.config import REDDIT_COMMUNITIES, TWITTER_SEARCH_TERMS, RED_PILL_EVENTS
    from text_processing import TextProcessor
    from age_analysis import AgeAnalyzer
    print("✅ Project modules imported successfully")
except ImportError as e:
    print(f"⚠️ Import warning: {e}")
    print("Some project modules may not be available")

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
warnings.filterwarnings('ignore')

print("📊 MisogynyWatch Analysis Environment Ready")
print(f"Current time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 2. Data Loading and Initial Exploration

Load processed data from Reddit and Twitter collection.
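
Timestamps from the two platforms may arrive in different encodings, and mixing timezone-aware and naive values in one `date` column breaks the `.dt` accessors used later. The following is a minimal sketch of normalizing both to UTC before combining. It assumes, hypothetically, that Reddit's `created_utc` holds epoch seconds and Twitter's `created_at` holds ISO 8601 strings; the processed files may already be normalized, in which case none of this is needed.

```python
import pandas as pd

# Toy frames; these column formats are assumptions, not the project's confirmed schema
reddit_toy = pd.DataFrame({"created_utc": [1659312000, 1659398400]})  # epoch seconds
twitter_toy = pd.DataFrame({"created_at": ["2022-08-01T12:00:00+00:00",
                                           "2022-08-02T08:30:00+02:00"]})

# Parse both into timezone-aware UTC timestamps
reddit_toy["date"] = pd.to_datetime(reddit_toy["created_utc"], unit="s", utc=True)
twitter_toy["date"] = pd.to_datetime(twitter_toy["created_at"], utc=True)

# Both columns now share one dtype, so the combined column keeps its .dt accessor
combined_toy = pd.concat([reddit_toy[["date"]], twitter_toy[["date"]]], ignore_index=True)
print(combined_toy["date"].dt.date.tolist())
```

If the `date` column is made timezone-aware in this way, any timestamps compared against it (such as event dates) would need to be timezone-aware as well.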

In [None]:
# Load processed data
data_dir = project_root / 'data' / 'processed'

try:
    # Load Reddit data
    reddit_df = pd.read_csv(data_dir / 'reddit_processed.csv')
    reddit_df['created_utc'] = pd.to_datetime(reddit_df['created_utc'])
    reddit_df['platform'] = 'Reddit'
    print(f"✅ Reddit data loaded: {len(reddit_df)} posts")
    
    # Load Twitter data  
    twitter_df = pd.read_csv(data_dir / 'twitter_processed.csv')
    twitter_df['created_at'] = pd.to_datetime(twitter_df['created_at'])
    twitter_df['platform'] = 'Twitter'
    print(f"✅ Twitter data loaded: {len(twitter_df)} posts")
    
    # Standardize column names
    reddit_df['date'] = reddit_df['created_utc']
    twitter_df['date'] = twitter_df['created_at']
    
    # Combine datasets
    combined_df = pd.concat([
        reddit_df[['date', 'text', 'misogyny_score', 'platform', 'estimated_age', 'estimated_gender']],
        twitter_df[['date', 'text', 'misogyny_score', 'platform', 'estimated_age', 'estimated_gender']]
    ], ignore_index=True)
    
    print(f"📊 Combined dataset: {len(combined_df)} total posts")
    print(f"Date range: {combined_df['date'].min()} to {combined_df['date'].max()}")
    
except FileNotFoundError:
    print("❌ Processed data files not found. Please run data collection first.")
    print("Run: python data_collection_coordinator.py")
    # Fall back to empty DataFrames so the later cells skip their analysis cleanly
    reddit_df = pd.DataFrame()
    twitter_df = pd.DataFrame() 
    combined_df = pd.DataFrame()

In [None]:
# Basic data exploration
if not combined_df.empty:
    print("=== DATASET OVERVIEW ===")
    print(f"Total posts: {len(combined_df):,}")
    print(f"Reddit posts: {len(reddit_df):,}")
    print(f"Twitter posts: {len(twitter_df):,}")
    print(f"Date range: {combined_df['date'].min().date()} to {combined_df['date'].max().date()}")
    
    print("\n=== MISOGYNY STATISTICS ===")
    misogynistic_posts = combined_df[combined_df['misogyny_score'] > 0]
    print(f"Posts with misogynistic content: {len(misogynistic_posts):,} ({len(misogynistic_posts)/len(combined_df)*100:.1f}%)")
    print(f"Average misogyny score: {combined_df['misogyny_score'].mean():.3f}")
    print(f"Max misogyny score: {combined_df['misogyny_score'].max():.3f}")
    
    print("\n=== PLATFORM BREAKDOWN ===")
    platform_stats = combined_df.groupby('platform').agg({
        'misogyny_score': ['mean', 'count'],
        'text': 'count'
    })
    print(platform_stats)
    
    print("\n=== AGE DEMOGRAPHICS ===")
    age_data = combined_df[combined_df['estimated_age'].notna()]
    if not age_data.empty:
        print(f"Posts with age information: {len(age_data):,}")
        print(f"Age range: {age_data['estimated_age'].min():.0f} to {age_data['estimated_age'].max():.0f}")
        print(f"Average age: {age_data['estimated_age'].mean():.1f}")
    else:
        print("No age information available")
        
    print("\n=== SAMPLE DATA ===")
    print("First few rows:")
    display(combined_df.head())
else:
    print("⚠️ No data available for exploration")

## 3. Research Question 1: Temporal Trends Analysis

**Question: Has misogynistic language increased over time?**

Analyzing time series patterns to identify trends in misogynistic content across platforms.
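
As a small illustration of the trend test, the sketch below builds a toy dataset, computes the monthly share of posts with a positive score, and regresses that share on a month index with `scipy.stats.linregress`. The data and the name `toy` are made up for illustration; only the `misogyny_score` and `year_month` column names follow this notebook.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Toy posts: six months, ten posts each, with a rising number of flagged posts
flagged_per_month = [1, 2, 2, 4, 5, 6]
rows = []
for month, n_flagged in zip(pd.period_range("2022-01", periods=6, freq="M"), flagged_per_month):
    scores = [0.5] * n_flagged + [0.0] * (10 - n_flagged)
    rows += [{"year_month": month, "misogyny_score": s} for s in scores]
toy = pd.DataFrame(rows)

# Share of posts with a positive score per month, in percent
monthly_pct = toy.groupby("year_month")["misogyny_score"].apply(lambda s: (s > 0).mean() * 100)

# Regress the monthly share on a 0..n-1 month index
result = linregress(np.arange(len(monthly_pct)), monthly_pct.to_numpy())
print(f"slope={result.slope:.2f} percentage points per month, p={result.pvalue:.4f}")
```

Using a plain 0..n-1 index treats the months as evenly spaced, which only holds if no month is missing from the data.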

In [None]:
# Temporal trends analysis
if not combined_df.empty:
    # Group by month for trend analysis
    combined_df['year_month'] = combined_df['date'].dt.to_period('M')
    
    monthly_trends = combined_df.groupby(['year_month', 'platform']).agg(
        avg_misogyny_score=('misogyny_score', 'mean'),
        # Count only posts with a positive score; a plain 'count' would include every scored post
        misogynistic_posts=('misogyny_score', lambda s: (s > 0).sum()),
        total_misogyny_score=('misogyny_score', 'sum'),
        total_posts=('text', 'count')
    ).reset_index()
    
    # Calculate percentage of misogynistic content
    monthly_trends['misogyny_percentage'] = (monthly_trends['misogynistic_posts'] / monthly_trends['total_posts'] * 100)
    
    print("=== MONTHLY TRENDS ===")
    display(monthly_trends.head(10))
    
    # Statistical trend analysis
    from scipy import stats
    
    print("\n=== TREND ANALYSIS ===")
    for platform in ['Reddit', 'Twitter']:
        platform_data = monthly_trends[monthly_trends['platform'] == platform]
        if len(platform_data) > 3:
            # Time series as numeric for correlation
            time_numeric = np.arange(len(platform_data))
            
            # Test for increasing trend in misogyny percentage
            slope, intercept, r_value, p_value, std_err = stats.linregress(
                time_numeric, platform_data['misogyny_percentage']
            )
            
            print(f"\n{platform} Platform:")
            print(f"  Trend slope: {slope:.4f} percentage points per month")
            print(f"  Correlation coefficient: {r_value:.3f}")
            print(f"  P-value: {p_value:.3f}")
            print(f"  Trend direction: {'Increasing' if slope > 0 else 'Decreasing'}")
            print(f"  Statistically significant: {'Yes' if p_value < 0.05 else 'No'}")
    
else:
    print("⚠️ No data available for temporal analysis")

In [None]:
# Visualize temporal trends
if not combined_df.empty and 'monthly_trends' in locals():
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
    
    # Plot misogyny percentage over time
    for platform in monthly_trends['platform'].unique():
        platform_data = monthly_trends[monthly_trends['platform'] == platform]
        ax1.plot(platform_data['year_month'].astype(str), 
                platform_data['misogyny_percentage'], 
                marker='o', label=f'{platform} - Misogyny %', linewidth=2)
    
    ax1.set_title('Misogynistic Content Percentage Over Time', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Percentage (%)')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.tick_params(axis='x', rotation=45)
    
    # Plot average misogyny scores
    for platform in monthly_trends['platform'].unique():
        platform_data = monthly_trends[monthly_trends['platform'] == platform]
        ax2.plot(platform_data['year_month'].astype(str), 
                platform_data['avg_misogyny_score'], 
                marker='s', label=f'{platform} - Avg Score', linewidth=2)
    
    ax2.set_title('Average Misogyny Score Over Time', fontsize=14, fontweight='bold')
    ax2.set_xlabel('Time Period')
    ax2.set_ylabel('Average Score')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    ax2.tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
    
    # Interactive Plotly version
    fig_plotly = go.Figure()
    
    for platform in monthly_trends['platform'].unique():
        platform_data = monthly_trends[monthly_trends['platform'] == platform]
        
        fig_plotly.add_trace(go.Scatter(
            x=platform_data['year_month'].astype(str),
            y=platform_data['misogyny_percentage'],
            mode='lines+markers',
            name=f'{platform} - Misogyny %',
            line=dict(width=3)
        ))
    
    fig_plotly.update_layout(
        title='Interactive Temporal Trends in Misogynistic Content',
        xaxis_title='Time Period',
        yaxis_title='Misogynistic Content (%)',
        height=600
    )
    
    fig_plotly.show()
    
else:
    print("⚠️ No data available for visualization")

## 4. Research Question 2: Event Impact Analysis

**Question: Does misogynistic language correlate with red-pill influencer events?**

Analyzing the impact of specific events (viral videos, influencer content, social movements) on misogynistic language patterns.
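
The before/after comparison can be sketched on synthetic data as follows. The helper `compare_event_windows` is hypothetical and not part of the project modules, the event date is made up, and Welch's t-test (`equal_var=False`) is used here as one reasonable choice rather than as the notebook's established method.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

def compare_event_windows(df, event_date, days=7):
    """Compare mean scores in the `days` before vs. after `event_date` (hypothetical helper)."""
    event_date = pd.Timestamp(event_date)
    window = pd.Timedelta(days=days)
    in_before = (df["date"] >= event_date - window) & (df["date"] < event_date)
    in_after = (df["date"] >= event_date) & (df["date"] < event_date + window)
    before = df.loc[in_before, "misogyny_score"].dropna()
    after = df.loc[in_after, "misogyny_score"].dropna()
    # Welch's t-test does not assume equal variances in the two windows
    _, p_value = ttest_ind(before, after, equal_var=False)
    return before.mean(), after.mean(), p_value

# Synthetic data: 20 posts per day, with scores shifting upward after a made-up event date
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=14, freq="D").repeat(20)
scores = np.where(dates < pd.Timestamp("2022-01-08"),
                  rng.normal(0.10, 0.05, len(dates)),
                  rng.normal(0.20, 0.05, len(dates)))
toy = pd.DataFrame({"date": dates, "misogyny_score": scores})

before_mean, after_mean, p_value = compare_event_windows(toy, "2022-01-08")
print(f"before={before_mean:.3f}, after={after_mean:.3f}, p={p_value:.2g}")
```

A difference between the two windows shows that scores changed around the event date; on its own it does not show that the event caused the change.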

In [None]:
# Event impact analysis
if not combined_df.empty:
    # Define key events (can be loaded from config)
    red_pill_events = {
        '2020-08-05': 'Fresh and Fit podcast launch',
        '2021-06-15': 'Andrew Tate viral TikTok period',
        '2022-08-19': 'Andrew Tate banned from major social platforms',
        '2021-03-08': 'International Women\'s Day backlash',
        '2022-06-24': 'Roe v. Wade overturned',
        '2023-05-09': 'Jordan Peterson viral content'
    }
    
    print("=== ANALYZING EVENT IMPACTS ===")
    event_impacts = []
    
    for event_date_str, event_description in red_pill_events.items():
        event_date = pd.to_datetime(event_date_str)
        
        # Define time windows (7 days before and after)
        before_window = (event_date - timedelta(days=7), event_date)
        after_window = (event_date, event_date + timedelta(days=7))
        
        # Get data for each window
        before_data = combined_df[
            (combined_df['date'] >= before_window[0]) & 
            (combined_df['date'] < before_window[1])
        ]
        
        after_data = combined_df[
            (combined_df['date'] >= after_window[0]) & 
            (combined_df['date'] < after_window[1])
        ]
        
        if len(before_data) > 5 and len(after_data) > 5:
            # Calculate statistics
            before_mean = before_data['misogyny_score'].mean()
            after_mean = after_data['misogyny_score'].mean()
            
            # Statistical test for difference (drop missing scores so the test does not return NaN)
            from scipy.stats import ttest_ind
            statistic, p_value = ttest_ind(
                before_data['misogyny_score'].dropna(), 
                after_data['misogyny_score'].dropna()
            )
            
            event_impacts.append({
                'event_date': event_date,
                'event_description': event_description,
                'before_mean': before_mean,
                'after_mean': after_mean,
                'change': after_mean - before_mean,
                'percent_change': ((after_mean - before_mean) / before_mean) * 100 if before_mean > 0 else 0,
                'p_value': p_value,
                'significant': p_value < 0.05,
                'before_count': len(before_data),
                'after_count': len(after_data)
            })
            
            print(f"\n{event_description} ({event_date.date()}):")
            print(f"  Before: {before_mean:.3f} (n={len(before_data)})")
            print(f"  After: {after_mean:.3f} (n={len(after_data)})")
            print(f"  Change: {after_mean - before_mean:+.3f} ({((after_mean - before_mean) / before_mean) * 100 if before_mean > 0 else 0:+.1f}%)")
            print(f"  P-value: {p_value:.3f} ({'Significant' if p_value < 0.05 else 'Not significant'})")
    
    if event_impacts:
        event_impacts_df = pd.DataFrame(event_impacts)
        print(f"\n=== EVENT IMPACT SUMMARY ===")
        display(event_impacts_df[['event_description', 'percent_change', 'p_value', 'significant']])
    else:
        print("⚠️ Insufficient data for event impact analysis")
        
else:
    print("⚠️ No data available for event analysis")

## 5. Research Question 3: Cross-Platform Comparison

**Question: Which communities are most affected?**

Comparing misogyny levels across different platforms and communities.
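
One detail matters for any "percentage of misogynistic posts" comparison: `count` tallies every non-null score, so the flagged posts have to be counted with an explicit condition such as `score > 0`. A minimal sketch on toy data, where the `flagged_*` column names are illustrative only:

```python
import pandas as pd

toy = pd.DataFrame({
    "platform": ["Reddit"] * 4 + ["Twitter"] * 4,
    "misogyny_score": [0.0, 0.4, 0.0, 0.2, 0.0, 0.0, 0.0, 0.6],
})

summary = toy.groupby("platform").agg(
    avg_score=("misogyny_score", "mean"),
    # Posts with a positive score, as opposed to all posts that have a score
    flagged_posts=("misogyny_score", lambda s: int((s > 0).sum())),
    total_posts=("misogyny_score", "count"),
)
summary["flagged_pct"] = summary["flagged_posts"] / summary["total_posts"] * 100
print(summary)
```

The same pattern applies to grouping by subreddit, age group, or gender.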

In [None]:
# Cross-platform and community comparison
if not combined_df.empty:
    print("=== PLATFORM COMPARISON ===")
    
    # Platform-level comparison
    platform_stats = combined_df.groupby('platform').agg(
        avg_misogyny_score=('misogyny_score', 'mean'),
        std_misogyny_score=('misogyny_score', 'std'),
        # Count only posts with a positive score, not every scored post
        misogynistic_posts=('misogyny_score', lambda s: (s > 0).sum()),
        total_posts=('text', 'count')
    ).round(3)
    
    platform_stats['misogyny_percentage'] = (platform_stats['misogynistic_posts'] / platform_stats['total_posts'] * 100).round(1)
    
    print("Platform Statistics:")
    display(platform_stats)
    
    # Statistical test for platform differences
    from scipy.stats import ttest_ind
    reddit_scores = reddit_df['misogyny_score'].dropna() if not reddit_df.empty else []
    twitter_scores = twitter_df['misogyny_score'].dropna() if not twitter_df.empty else []
    
    if len(reddit_scores) > 10 and len(twitter_scores) > 10:
        statistic, p_value = ttest_ind(reddit_scores, twitter_scores)
        print(f"\nPlatform Difference Test:")
        print(f"T-statistic: {statistic:.3f}")
        print(f"P-value: {p_value:.3f}")
        print(f"Significantly different: {'Yes' if p_value < 0.05 else 'No'}")
        print(f"Higher misogyny platform: {'Reddit' if reddit_scores.mean() > twitter_scores.mean() else 'Twitter'}")
    
    # Reddit community analysis (if subreddit data available)
    if not reddit_df.empty and 'subreddit' in reddit_df.columns:
        print(f"\n=== REDDIT COMMUNITY ANALYSIS ===")
        
        reddit_community_stats = reddit_df.groupby('subreddit').agg(
            avg_misogyny_score=('misogyny_score', 'mean'),
            std_misogyny_score=('misogyny_score', 'std'),
            # Count only posts with a positive score, not every scored post
            misogynistic_posts=('misogyny_score', lambda s: (s > 0).sum()),
            total_posts=('text', 'count')
        ).round(3)
        
        reddit_community_stats['misogyny_percentage'] = (
            reddit_community_stats['misogynistic_posts'] / reddit_community_stats['total_posts'] * 100
        ).round(1)
        
        # Sort by misogyny percentage
        reddit_community_stats = reddit_community_stats.sort_values('misogyny_percentage', ascending=False)
        
        print("Top 10 Reddit Communities by Misogyny Percentage:")
        display(reddit_community_stats.head(10))
    
    # Visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Platform comparison
    platforms = platform_stats.index
    misogyny_scores = platform_stats['avg_misogyny_score']
    colors = ['#FF6B35', '#1DA1F2']  # Reddit orange, Twitter blue
    
    bars = ax1.bar(platforms, misogyny_scores, color=colors, alpha=0.7, edgecolor='black')
    ax1.set_title('Average Misogyny Score by Platform', fontweight='bold')
    ax1.set_ylabel('Average Misogyny Score')
    
    # Add value labels on bars
    for bar, score in zip(bars, misogyny_scores):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 0.001,
                f'{score:.3f}', ha='center', va='bottom', fontweight='bold')
    
    # Community comparison (if available)
    if not reddit_df.empty and 'subreddit' in reddit_df.columns and len(reddit_community_stats) > 0:
        top_communities = reddit_community_stats.head(8)
        community_bars = ax2.bar(range(len(top_communities)), 
                                top_communities['misogyny_percentage'], 
                                color='orange', alpha=0.7, edgecolor='black')
        ax2.set_title('Top Reddit Communities by Misogyny %', fontweight='bold')
        ax2.set_ylabel('Misogyny Percentage (%)')
        ax2.set_xticks(range(len(top_communities)))
        ax2.set_xticklabels(top_communities.index, rotation=45, ha='right')
        
        # Add value labels
        for bar, pct in zip(community_bars, top_communities['misogyny_percentage']):
            height = bar.get_height()
            ax2.text(bar.get_x() + bar.get_width()/2., height + 0.1,
                    f'{pct:.1f}%', ha='center', va='bottom', fontweight='bold')
    else:
        ax2.text(0.5, 0.5, 'Reddit community data\nnot available', 
                ha='center', va='center', transform=ax2.transAxes, fontsize=12)
        ax2.set_title('Reddit Communities Analysis', fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
else:
    print("⚠️ No data available for platform comparison")

## 6. Research Question 4: Demographic Analysis

**Question: Which age groups and genders are most affected?**

Analyzing misogyny patterns across different age groups and gender demographics.
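
Age banding of this kind can also be written with `pd.cut`, which keeps the thresholds and labels in one place and leaves missing ages as `NaN`. This is only a sketch under assumed band labels; the right-inclusive bins mirror the `age <= 18`, `age <= 25`, ... thresholds used in the cell below.

```python
import pandas as pd

ages = pd.Series([15, 18, 19, 25, 30, 40, 50, 60, None])

# Right-inclusive bins: (0, 18], (18, 25], (25, 35], (35, 45], (45, 55], (55, inf)
bins = [0, 18, 25, 35, 45, 55, float("inf")]
labels = ["18 and under", "19-25", "26-35", "36-45", "46-55", "56+"]  # illustrative labels
age_group = pd.cut(ages, bins=bins, labels=labels, right=True)

print(age_group.value_counts(sort=False))
```

Because the result is an ordered categorical, grouped tables and plots keep the bands in age order instead of sorting them alphabetically.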

In [None]:
# Demographic analysis
if not combined_df.empty:
    print("=== DEMOGRAPHIC ANALYSIS ===")
    
    # Age group analysis
    age_data = combined_df[combined_df['estimated_age'].notna()].copy()
    
    if not age_data.empty:
        # Define age groups
        def categorize_age(age):
            if pd.isna(age):
                return 'Unknown'
            elif age <= 18:
                return 'Teens (18 and under)'
            elif age <= 25:
                return 'Young Adults (19-25)'
            elif age <= 35:
                return 'Adults (26-35)'
            elif age <= 45:
                return 'Adults (36-45)'
            elif age <= 55:
                return 'Middle Age (46-55)'
            else:
                return 'Older Adults (56+)'
        
        age_data['age_group'] = age_data['estimated_age'].apply(categorize_age)
        
        age_group_stats = age_data.groupby('age_group').agg(
            avg_misogyny_score=('misogyny_score', 'mean'),
            std_misogyny_score=('misogyny_score', 'std'),
            # Count only posts with a positive score, not every scored post
            misogynistic_posts=('misogyny_score', lambda s: (s > 0).sum()),
            total_posts=('text', 'count')
        ).round(3)
        
        age_group_stats['misogyny_percentage'] = (
            age_group_stats['misogynistic_posts'] / age_group_stats['total_posts'] * 100
        ).round(1)
        
        age_group_stats = age_group_stats.sort_values('avg_misogyny_score', ascending=False)
        
        print(f"Age Group Analysis (n={len(age_data)}):")
        display(age_group_stats)
        
        # Age distribution
        print(f"\nAge Distribution:")
        print(f"Mean age: {age_data['estimated_age'].mean():.1f}")
        print(f"Median age: {age_data['estimated_age'].median():.1f}")
        print(f"Age range: {age_data['estimated_age'].min():.0f} - {age_data['estimated_age'].max():.0f}")
        
    else:
        print("⚠️ No age data available for analysis")
        age_group_stats = pd.DataFrame()
    
    # Gender analysis
    gender_data = combined_df[combined_df['estimated_gender'].notna()].copy()
    
    if not gender_data.empty:
        gender_stats = gender_data.groupby('estimated_gender').agg(
            avg_misogyny_score=('misogyny_score', 'mean'),
            std_misogyny_score=('misogyny_score', 'std'),
            # Count only posts with a positive score, not every scored post
            misogynistic_posts=('misogyny_score', lambda s: (s > 0).sum()),
            total_posts=('text', 'count')
        ).round(3)
        
        gender_stats['misogyny_percentage'] = (
            gender_stats['misogynistic_posts'] / gender_stats['total_posts'] * 100
        ).round(1)
        
        print(f"\nGender Analysis (n={len(gender_data)}):")
        display(gender_stats)
        
    else:
        print("⚠️ No gender data available for analysis")
        gender_stats = pd.DataFrame()
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # Age group misogyny scores
    if not age_group_stats.empty:
        age_bars = axes[0,0].bar(range(len(age_group_stats)), 
                                age_group_stats['avg_misogyny_score'], 
                                color='skyblue', alpha=0.7, edgecolor='black')
        axes[0,0].set_title('Average Misogyny Score by Age Group', fontweight='bold')
        axes[0,0].set_ylabel('Average Misogyny Score')
        axes[0,0].set_xticks(range(len(age_group_stats)))
        axes[0,0].set_xticklabels(age_group_stats.index, rotation=45, ha='right')
        
        # Add value labels
        for bar, score in zip(age_bars, age_group_stats['avg_misogyny_score']):
            height = bar.get_height()
            axes[0,0].text(bar.get_x() + bar.get_width()/2., height + 0.001,
                          f'{score:.3f}', ha='center', va='bottom', fontweight='bold')
    else:
        axes[0,0].text(0.5, 0.5, 'No age data\navailable', ha='center', va='center', 
                      transform=axes[0,0].transAxes, fontsize=12)
        axes[0,0].set_title('Age Group Analysis', fontweight='bold')
    
    # Age distribution histogram
    if not age_data.empty:
        axes[0,1].hist(age_data['estimated_age'], bins=20, color='lightgreen', alpha=0.7, edgecolor='black')
        axes[0,1].set_title('Age Distribution', fontweight='bold')
        axes[0,1].set_xlabel('Age')
        axes[0,1].set_ylabel('Frequency')
        axes[0,1].axvline(age_data['estimated_age'].mean(), color='red', linestyle='--', 
                         label=f'Mean: {age_data["estimated_age"].mean():.1f}')
        axes[0,1].legend()
    else:
        axes[0,1].text(0.5, 0.5, 'No age data\navailable', ha='center', va='center', 
                      transform=axes[0,1].transAxes, fontsize=12)
        axes[0,1].set_title('Age Distribution', fontweight='bold')
    
    # Gender comparison
    if not gender_stats.empty:
        gender_bars = axes[1,0].bar(gender_stats.index, 
                                   gender_stats['avg_misogyny_score'], 
                                   color=['pink', 'lightblue'], alpha=0.7, edgecolor='black')
        axes[1,0].set_title('Average Misogyny Score by Gender', fontweight='bold')
        axes[1,0].set_ylabel('Average Misogyny Score')
        
        # Add value labels
        for bar, score in zip(gender_bars, gender_stats['avg_misogyny_score']):
            height = bar.get_height()
            axes[1,0].text(bar.get_x() + bar.get_width()/2., height + 0.001,
                          f'{score:.3f}', ha='center', va='bottom', fontweight='bold')
    else:
        axes[1,0].text(0.5, 0.5, 'No gender data\navailable', ha='center', va='center', 
                      transform=axes[1,0].transAxes, fontsize=12)
        axes[1,0].set_title('Gender Analysis', fontweight='bold')
    
    # Platform vs Demographics (if data available)
    if not age_data.empty:
        platform_age = age_data.groupby(['platform', 'age_group'])['misogyny_score'].mean().unstack(fill_value=0)
        im = axes[1,1].imshow(platform_age.values, cmap='Reds', aspect='auto')
        axes[1,1].set_title('Misogyny Score Heatmap: Platform vs Age Group', fontweight='bold')
        axes[1,1].set_xticks(range(len(platform_age.columns)))
        axes[1,1].set_xticklabels(platform_age.columns, rotation=45, ha='right')
        axes[1,1].set_yticks(range(len(platform_age.index)))
        axes[1,1].set_yticklabels(platform_age.index)
        
        # Add colorbar
        cbar = plt.colorbar(im, ax=axes[1,1])
        cbar.set_label('Average Misogyny Score')
    else:
        axes[1,1].text(0.5, 0.5, 'Insufficient data for\ncross-analysis', ha='center', va='center', 
                      transform=axes[1,1].transAxes, fontsize=12)
        axes[1,1].set_title('Platform vs Demographics', fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
else:
    print("⚠️ No data available for demographic analysis")

## 7. Comprehensive Summary and Key Findings

Synthesizing results from all research questions to provide actionable insights.

In [None]:
# Comprehensive Summary
print("🎯 MISOGYNY WATCH: COMPREHENSIVE ANALYSIS SUMMARY")
print("=" * 60)

if not combined_df.empty:
    # Overall statistics
    total_posts = len(combined_df)
    misogynistic_posts = len(combined_df[combined_df['misogyny_score'] > 0])
    avg_misogyny_score = combined_df['misogyny_score'].mean()
    
    print(f"📊 DATASET OVERVIEW")
    print(f"   Total posts analyzed: {total_posts:,}")
    print(f"   Posts with misogynistic content: {misogynistic_posts:,} ({misogynistic_posts/total_posts*100:.1f}%)")
    print(f"   Average misogyny score: {avg_misogyny_score:.3f}")
    print(f"   Data collection period: {combined_df['date'].min().date()} to {combined_df['date'].max().date()}")
    
    # Research Question Answers
    print(f"\n🔍 RESEARCH QUESTION FINDINGS")
    
    # Question 1: Temporal trends
    print(f"\n1️⃣ HAS MISOGYNISTIC LANGUAGE INCREASED OVER TIME?")
    if 'monthly_trends' in locals() and not monthly_trends.empty:
        for platform in ['Reddit', 'Twitter']:
            platform_data = monthly_trends[monthly_trends['platform'] == platform]
            if len(platform_data) > 3:
                time_numeric = np.arange(len(platform_data))
                from scipy.stats import linregress
                slope, _, r_value, p_value, _ = linregress(time_numeric, platform_data['misogyny_percentage'])
                
                trend_direction = "📈 INCREASING" if slope > 0 else "📉 DECREASING"
                significance = "✅ SIGNIFICANT" if p_value < 0.05 else "❌ NOT SIGNIFICANT"
                
                print(f"   {platform}: {trend_direction} (slope: {slope:.3f}, p={p_value:.3f}) {significance}")
    else:
        print("   ⚠️ Insufficient data for temporal analysis")
    
    # Question 2: Event impacts
    print(f"\n2️⃣ CORRELATION WITH RED-PILL INFLUENCER EVENTS?")
    if 'event_impacts' in locals() and event_impacts:
        significant_events = [e for e in event_impacts if e['significant']]
        print(f"   Events analyzed: {len(event_impacts)}")
        print(f"   Statistically significant impacts: {len(significant_events)}")
        
        if significant_events:
            print("   📈 SIGNIFICANT EVENT IMPACTS:")
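            # Show the three largest significant changes, ranked by absolute percent change
            top_events = sorted(significant_events, key=lambda e: abs(e['percent_change']), reverse=True)[:3]
            for event in top_events: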
                print(f"      • {event['event_description']}: {event['percent_change']:+.1f}% change")
    else:
        print("   ⚠️ No event impact data available")
    
    # Question 3: Platform differences
    print(f"\n3️⃣ WHICH COMMUNITIES ARE MOST AFFECTED?")
    if 'platform_stats' in locals():
        reddit_avg = platform_stats.loc['Reddit', 'avg_misogyny_score'] if 'Reddit' in platform_stats.index else 0
        twitter_avg = platform_stats.loc['Twitter', 'avg_misogyny_score'] if 'Twitter' in platform_stats.index else 0
        
        if reddit_avg > twitter_avg:
            print(f"   🏆 HIGHEST MISOGYNY: Reddit ({reddit_avg:.3f} vs Twitter {twitter_avg:.3f})")
        else:
            print(f"   🏆 HIGHEST MISOGYNY: Twitter ({twitter_avg:.3f} vs Reddit {reddit_avg:.3f})")
            
        if 'reddit_community_stats' in locals() and not reddit_community_stats.empty:
            top_community = reddit_community_stats.index[0]
            top_percentage = reddit_community_stats.iloc[0]['misogyny_percentage']
            print(f"   📊 HIGHEST REDDIT COMMUNITY: r/{top_community} ({top_percentage:.1f}% misogynistic content)")
    
    # Question 4: Demographics
    print(f"\n4️⃣ WHICH AGE GROUPS AND GENDERS ARE MOST AFFECTED?")
    if 'age_group_stats' in locals() and not age_group_stats.empty:
        highest_age_group = age_group_stats.index[0]
        highest_score = age_group_stats.iloc[0]['avg_misogyny_score']
        print(f"   🎯 HIGHEST MISOGYNY AGE GROUP: {highest_age_group} (score: {highest_score:.3f})")
    
    if 'gender_stats' in locals() and not gender_stats.empty:
        if len(gender_stats) >= 2:
            male_score = gender_stats.loc['male', 'avg_misogyny_score'] if 'male' in gender_stats.index else 0
            female_score = gender_stats.loc['female', 'avg_misogyny_score'] if 'female' in gender_stats.index else 0
            
            if male_score > female_score:
                print(f"   👤 HIGHER MISOGYNY USAGE: Male users ({male_score:.3f} vs {female_score:.3f})")
            else:
                print(f"   👤 HIGHER MISOGYNY USAGE: Female users ({female_score:.3f} vs {male_score:.3f})")
    
    # Key insights
    print(f"\n💡 KEY INSIGHTS AND IMPLICATIONS")
    print(f"   • Data-driven approach successfully identified misogyny patterns")
    print(f"   • Cross-platform analysis reveals community-specific trends")
    print(f"   • Event correlation analysis provides actionable timing insights")
    print(f"   • Demographic patterns inform targeted intervention strategies")
    
    # Methodology strengths
    print(f"\n⚙️ METHODOLOGY STRENGTHS")
    print(f"   • Multi-platform data collection (Reddit + Twitter)")
    print(f"   • Keyword-based misogyny detection with category weighting")
    print(f"   • Statistical significance testing for trend, event, and platform comparisons")
    print(f"   • Age and gender inference from user-generated content")
    print(f"   • Event timeline correlation analysis")
    
    # Limitations
    print(f"\n⚠️ LIMITATIONS AND CONSIDERATIONS")
    print(f"   • Keyword-based detection may miss subtle language evolution")
    print(f"   • Age/gender inference limited to self-reported information")
    print(f"   • Platform API restrictions limit historical data scope")
    print(f"   • Sample bias toward users who explicitly mention demographics")
    
else:
    print("❌ No data available for comprehensive summary")
    print("\n🚀 TO GET STARTED:")
    print("   1. Set up API credentials in .env file")
    print("   2. Run: python data_collection_coordinator.py")
    print("   3. Re-run this notebook for full analysis")

print(f"\n✅ Analysis completed at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 8. Next Steps and Future Work

### Immediate Actions
1. **Data Collection**: Run `python data_collection_coordinator.py` to gather fresh data
2. **Extended Analysis**: Use `python analysis/main_analysis.py` for complete statistical analysis
3. **Visualization**: Generate interactive dashboards with `python analysis/visualizations.py`

### Research Extensions
- **Machine Learning**: Train classification models for more sophisticated misogyny detection
- **Network Analysis**: Examine user interaction patterns and influence networks
- **Sentiment Analysis**: Combine misogyny detection with emotional context
- **Real-time Monitoring**: Develop streaming analysis for live content monitoring

### Academic Applications
- **Policy Research**: Evidence-based recommendations for platform moderation
- **Social Psychology**: Understanding online behavior patterns and motivations
- **Digital Humanities**: Long-term cultural shift analysis
- **Computer Science**: NLP model development and evaluation

### Technical Improvements
- **Data Sources**: Expand to TikTok, Instagram, YouTube comments
- **Detection Accuracy**: Fine-tune keyword lists and develop ML classifiers
- **Scalability**: Implement big data tools for larger datasets
- **Real-time Processing**: Stream processing for immediate trend detection