# Crisis Network Analysis - Quick Start Guide

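This notebook walks through basic usage of the Crisis Network Analysis framework: collecting Reddit posts, cleaning and validating them, exploring the results, and preparing the data for network analysis.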

## Setup

First, ensure you have installed all dependencies:
```bash
pip install -r requirements.txt
```

In [None]:
# Import required libraries
import sys
from pathlib import Path

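# Add the project's src/ directory to the import path
# (assumes the notebook is run from a folder one level below the project root)
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))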

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")

## 1. Data Collection

Collect data from Reddit using the `WorkingRedditCollector` class.
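
Before constructing the collector, it can help to confirm that the credentials were actually loaded, since `os.getenv` returns `None` for any variable that is not set. The following is a minimal sketch, assuming the three variable names used in the next cell; `missing_credentials` is a hypothetical helper and not part of the framework.

```python
import os

# Variable names taken from the collector cell in this notebook
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")

def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_credentials()
if missing:
    print(f"Missing credentials: {', '.join(missing)}")
```

Running this right after `load_dotenv(...)` makes a misconfigured `api_keys.env` visible before any API call is attempted.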

In [None]:
from collection.working_reddit_collector import WorkingRedditCollector
from dotenv import load_dotenv
import os

# Load API credentials
load_dotenv(project_root / 'config' / 'api_keys.env')

# Initialize collector
collector = WorkingRedditCollector(
    client_id=os.getenv('REDDIT_CLIENT_ID'),
    client_secret=os.getenv('REDDIT_CLIENT_SECRET'),
    user_agent=os.getenv('REDDIT_USER_AGENT')
)

print("✅ Reddit collector initialized!")

In [None]:
# Collect sample data
df = collector.collect_posts(
    subreddit='LosAngeles',
    limit=50,
    time_filter='week'
)

print(f"Collected {len(df)} posts")
df.head()

## 2. Data Preprocessing

Clean and validate the collected data.
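
The internals of `DataCleaner` are not shown in this notebook. As a rough illustration of what a cleaning pass and its report could look like, the sketch below drops duplicate and empty records from a list of dictionaries and computes a retention rate. The `id` and `title` fields, the sample posts, and the `clean_records` helper are assumptions made for this example; only the report keys mirror those printed in the next cell.

```python
def clean_records(records):
    """Drop records with a blank title or a repeated id; return (kept, report)."""
    seen_ids = set()
    kept = []
    duplicates = 0
    for rec in records:
        if not (rec.get("title") or "").strip():
            continue  # no usable title
        rec_id = rec.get("id")
        if rec_id in seen_ids:
            duplicates += 1
            continue
        seen_ids.add(rec_id)
        kept.append(rec)

    initial = len(records)
    report = {
        "initial_rows": initial,
        "final_rows": len(kept),
        "duplicates_removed": duplicates,
        # Percentage of the original rows that survived cleaning
        "retention_rate": 100.0 * len(kept) / initial if initial else 0.0,
    }
    return kept, report

# Made-up sample data for illustration
posts = [
    {"id": "a1", "title": "Road closure on I-5"},
    {"id": "a1", "title": "Road closure on I-5"},
    {"id": "b2", "title": "   "},
    {"id": "c3", "title": "Shelter locations"},
]
kept, report = clean_records(posts)
print(report)
```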

In [None]:
from preprocessing.data_cleaner import DataCleaner
from preprocessing.quality_validator import QualityValidator

# Clean the data
cleaner = DataCleaner()
df_clean = cleaner.clean_dataset(df)

# Get cleaning report
report = cleaner.get_cleaning_report()
print("\n📊 Cleaning Report:")
print(f"Initial rows: {report['initial_rows']}")
print(f"Final rows: {report['final_rows']}")
print(f"Retention rate: {report['retention_rate']:.1f}%")
print(f"Duplicates removed: {report['duplicates_removed']}")

In [None]:
# Validate data quality
validator = QualityValidator()
validation_results = validator.validate_dataset(df_clean)

print(f"\n✅ Overall Quality Score: {validation_results['overall_score']:.2f}/100")
print(f"\nCompleteness Score: {validation_results['completeness']['score']:.2f}/100")
print(f"Consistency Score: {validation_results['consistency']['score']:.2f}/100")
print(f"Content Quality Score: {validation_results['content_quality']['quality_score']:.2f}/100")

## 3. Basic Data Exploration

In [None]:
# Plot score distribution
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(df_clean['score'], bins=20, edgecolor='black')
plt.xlabel('Post Score')
plt.ylabel('Frequency')
plt.title('Distribution of Post Scores')

plt.subplot(1, 2, 2)
plt.hist(df_clean['num_comments'], bins=20, edgecolor='black', color='orange')
plt.xlabel('Number of Comments')
plt.ylabel('Frequency')
plt.title('Distribution of Comment Counts')

plt.tight_layout()
plt.show()

In [None]:
# Top authors by post count
top_authors = df_clean['author'].value_counts().head(10)

plt.figure(figsize=(10, 6))
top_authors.plot(kind='barh')
plt.xlabel('Number of Posts')
plt.ylabel('Author')
plt.title('Top 10 Most Active Authors')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

## 4. Network Analysis (if you have enough data)

Build and analyze interaction networks.
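
How `CrisisNetworkAnalyzer` constructs its layers is not shown here. One common way to derive an interaction network is to add a directed edge from a reply's author to the author being replied to, weighted by the number of replies. The sketch below does this with the standard library only; the `(author, parent_author)` pairs are made-up illustration data, and `build_reply_edges` and `in_degree` are hypothetical helpers.

```python
from collections import Counter

def build_reply_edges(interactions):
    """Count directed (replier, replied_to) pairs, skipping self-replies and unknown users."""
    edges = Counter()
    for author, parent_author in interactions:
        if not author or not parent_author or author == parent_author:
            continue
        edges[(author, parent_author)] += 1
    return edges

def in_degree(edges):
    """Weighted in-degree: total number of replies each user received."""
    received = Counter()
    for (_, target), weight in edges.items():
        received[target] += weight
    return received

# Made-up sample data for illustration
interactions = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("alice", "bob"),
    ("bob", "bob"),   # self-reply, ignored
    ("dave", None),   # unknown parent, ignored
]
edges = build_reply_edges(interactions)
print(edges.most_common())
print(in_degree(edges).most_common(1))
```

A weighted edge list like this can be handed to a graph library later; keeping it as a `Counter` avoids extra dependencies while the dataset is still small.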

In [None]:
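# Save the cleaned data for network analysis
output_path = project_root / 'data' / 'processed' / 'sample_data.csv'
output_path.parent.mkdir(parents=True, exist_ok=True)  # create data/processed if it does not exist yet
df_clean.to_csv(output_path, index=False)
print(f"✅ Cleaned data saved to {output_path}")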

In [None]:
# Note: the small sample collected above (limit=50) is likely too sparse
# to produce meaningful networks. Uncomment once you have collected more data.

# from networks.crisis_network_analyzer import CrisisNetworkAnalyzer

# analyzer = CrisisNetworkAnalyzer(str(output_path))
# networks = analyzer.build_multi_layer_networks()

# for name, network in networks.items():
#     print(f"{name}: {network.number_of_nodes()} nodes, {network.number_of_edges()} edges")

## 5. Next Steps

1. **Collect more data**: Use different subreddits and time ranges
2. **Search for crisis keywords**: Use `collector.search_posts()` with crisis-specific terms
3. **Build networks**: Analyze user interactions and information flow
4. **Run LIWC analysis**: Examine cognitive and emotional processes
5. **Compare crises**: Analyze patterns across different crisis events
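
The signature of `collector.search_posts()` is not shown in this notebook, so it is not called here. As a complement to step 2, the sketch below filters already-collected titles locally against a keyword list. The terms, the sample titles, and the `matches_crisis_terms` helper are illustrative assumptions, not part of the framework.

```python
import re

# Hypothetical keyword list; adjust to the crisis being studied
CRISIS_TERMS = ["wildfire", "evacuation", "earthquake", "power outage"]

# Case-insensitive pattern that matches whole terms only
CRISIS_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in CRISIS_TERMS) + r")\b",
    re.IGNORECASE,
)

def matches_crisis_terms(text):
    """Return the sorted, lowercased crisis terms found in text."""
    return sorted({m.group(0).lower() for m in CRISIS_PATTERN.finditer(text or "")})

# Made-up sample titles for illustration
titles = [
    "Evacuation orders issued as wildfire spreads",
    "Best tacos downtown",
    "Power Outage in Venice",
]
for title in titles:
    print(title, "->", matches_crisis_terms(title))
```

Because the pattern uses word boundaries, inflected forms such as plurals are not matched; extend the term list if those should count.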

For more details, see:
- `docs/api_reference.md` - Complete API documentation
- `docs/methodology.md` - Research methodology
- `README.md` - Project overview