# Hurricane Social Media Analysis Pipeline
## Tweet Geolocation and Temporal Rasterization

This notebook processes hurricane-related tweets and generates time-sliced raster datasets for visualization in ArcGIS Pro.

**Pipeline Overview:**
1. Load hurricane tweet data (Francine & Helene)
2. Build time lookup dictionaries
3. Load reference shapefiles (states, counties, cities)
4. Create hierarchical geographic lookups
5. Expand tweets by multi-level geographic matches
6. Create interval counts per time bin
7. Collect ordered time bins
8. Build master grid canvas
9. Ensure the output directory exists
10. Process Hurricane Francine
11. Process Hurricane Helene

A closing cell then reports summary statistics (GeoTIFF counts per hurricane).

**Output:**
- Incremental rasters (activity per time bin)
- Cumulative rasters (accumulated activity over time)
- GeoTIFF format compatible with ArcGIS Pro

---
## Setup: Import Dependencies

In [2]:
import os
import sys
import warnings
from typing import Tuple

warnings.filterwarnings('ignore')

# Add src to path. Notebooks do not define __file__, so the current working
# directory is treated as the notebook directory.
notebook_dir = os.getcwd()
src_dir = os.path.join(notebook_dir, 'src')
if src_dir not in sys.path:
    sys.path.insert(0, src_dir)

# Import project modules
import config
import data_loader
import geographic_matching
import rasterization

print("✓ All dependencies loaded successfully")
print(f"✓ Project root: {config.LOCAL_PATH}")
print(f"✓ Output directory: {config.OUTPUT_DIR}")

<class 'ModuleNotFoundError'>: No module named 'config'
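
The `ModuleNotFoundError` above typically means that no `src` folder sits next to the notebook's working directory. As a minimal sketch (the helper names `find_src_dir` and `add_src_to_path` are hypothetical and not part of the project), the lookup could walk up the parent directories until it finds `src`:

```python
import sys
from pathlib import Path
from typing import Optional


def find_src_dir(start: Path, name: str = "src") -> Optional[Path]:
    """Return the first `name` directory found in `start` or any of its parents."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        src = candidate / name
        if src.is_dir():
            return src
    return None


def add_src_to_path(start: Path) -> Optional[Path]:
    """Prepend the discovered directory to sys.path once; return it, or None if absent."""
    src = find_src_dir(start)
    if src is not None and str(src) not in sys.path:
        sys.path.insert(0, str(src))
    return src
```

Calling `add_src_to_path(Path.cwd())` before the project imports would then work from the project root or from any subfolder of it.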

---
## Configuration Settings

Review the current settings below; to change them, edit the `config` module and re-run the notebook:

In [None]:
print("Current Configuration:")
print("=" * 60)
print(f"Target CRS: {config.TARGET_CRS}")
print(f"Cell Size: {config.CELL_SIZE_M} meters")
print(f"Time Bin Width: {config.TIME_BIN_HOURS} hours")
print(f"\nWeights:")
for level, weight in config.WEIGHTS.items():
    print(f"  {level:10s}: {weight}")
print(f"\nFuzzy Matching:")
print(f"  Threshold: {config.FUZZY_THRESHOLD}")
print(f"  Contextual: {config.FUZZY_THRESHOLD_CONTEXTUAL}")

---
## Step 1: Load Hurricane Data

Load tweet GeoJSON files for both hurricanes:

In [None]:
print("\n" + "=" * 80)
print("STEP 1/11 - LOADING HURRICANE DATA")
print("=" * 80)

francine_gdf, helene_gdf = data_loader.load_hurricane_data()

print(f"\n✓ Loaded {len(francine_gdf)} Francine tweets")
print(f"✓ Loaded {len(helene_gdf)} Helene tweets")

---
## Step 2: Build Time Lookups

Create fast timestamp dictionaries for temporal aggregation:

In [None]:
print("\n" + "=" * 80)
print("STEP 2/11 - BUILDING TIME LOOKUPS")
print("=" * 80)

francine_dict, helene_dict = data_loader.create_timestamp_dictionaries(
    francine_gdf, helene_gdf
)

print(f"\n✓ Created timestamp dictionaries for both hurricanes")
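
The internals of `create_timestamp_dictionaries` are not shown in this notebook. One plausible shape for such a lookup, sketched under the assumption that it maps each time-bin start to the tweets inside that bin, is below. `floor_to_bin`, `build_timestamp_dict`, and the 4-hour width are illustrative placeholders; the real width comes from `config.TIME_BIN_HOURS`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

TIME_BIN_HOURS = 4  # placeholder; the notebook reads config.TIME_BIN_HOURS


def floor_to_bin(ts: datetime, bin_hours: int = TIME_BIN_HOURS) -> datetime:
    """Round a timestamp down to the start of its bin (bins anchored at the Unix epoch)."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    width = timedelta(hours=bin_hours)
    return epoch + ((ts - epoch) // width) * width


def build_timestamp_dict(timestamps):
    """Map each bin start to the list of row indices that fall inside that bin."""
    lookup = defaultdict(list)
    for idx, ts in enumerate(timestamps):
        lookup[floor_to_bin(ts)].append(idx)
    return dict(lookup)
```

Keying the dictionary by bin start keeps later per-bin aggregation to a single dictionary access instead of a scan over all tweets.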

---
## Step 3: Load Reference Shapefiles

Load administrative boundary layers:

In [None]:
print("\n" + "=" * 80)
print("STEP 3/11 - LOADING REFERENCE SHAPEFILES")
print("=" * 80)

states_gdf, counties_gdf, cities_gdf = data_loader.load_reference_shapefiles()

print(f"\n✓ Loaded {len(states_gdf)} states")
print(f"✓ Loaded {len(counties_gdf)} counties")
print(f"✓ Loaded {len(cities_gdf)} cities")

---
## Step 4: Create Hierarchical Geographic Lookups

Build fast lookup structures for geographic matching:

In [None]:
print("\n" + "=" * 80)
print("STEP 4/11 - CREATING HIERARCHICAL LOOKUPS")
print("=" * 80)

lookups = geographic_matching.create_hierarchical_lookups(
    states_gdf, counties_gdf, cities_gdf
)

print(f"\n✓ Created hierarchical lookup dictionaries")
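
To make the idea of a hierarchical lookup concrete, here is a small sketch that works on plain `(state, county, city)` tuples rather than GeoDataFrames. The structure and the names `normalize` and `build_hierarchical_lookups` are assumptions for illustration; the actual output of `create_hierarchical_lookups` may be organised differently.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Uppercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(name.upper().split())


def build_hierarchical_lookups(records):
    """Build per-level name lookups from (state, county, city) tuples.

    County names map to the states that contain them, and city names map to
    (state, county) pairs, so an ambiguous name keeps all of its parents.
    """
    lookups = {"state": set(), "county": defaultdict(set), "city": defaultdict(set)}
    for state, county, city in records:
        s, c, t = normalize(state), normalize(county), normalize(city)
        lookups["state"].add(s)
        lookups["county"][c].add(s)
        lookups["city"][t].add((s, c))
    return lookups
```

Keeping the parent names alongside each entry is what lets a city such as Perry (found in both Florida and Georgia) be disambiguated later by a state or county mentioned in the same tweet.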

---
## Step 5: Expand Tweets by Geographic Matches

Match tweet text to geographic locations at multiple administrative levels:

In [None]:
print("\n" + "=" * 80)
print("STEP 5/11 - EXPANDING TWEETS BY MATCHES")
print("=" * 80)

print("\nProcessing Francine tweets...")
francine_gdf = geographic_matching.expand_tweets_by_matches(
    francine_gdf, lookups, "FRANCINE"
)

print("\nProcessing Helene tweets...")
helene_gdf = geographic_matching.expand_tweets_by_matches(
    helene_gdf, lookups, "HELENE"
)

print(f"\n✓ Tweet expansion complete")
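
The expansion step turns one tweet into one row per matched place. The sketch below illustrates that with the standard library's `difflib`; the project's own matcher, its tokenisation, and its threshold values are not shown here, so `similarity`, `match_level`, `expand_tweet`, and the `0.85` cutoff are all stand-ins.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # placeholder; stands in for config.FUZZY_THRESHOLD


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def match_level(tokens, names, threshold=FUZZY_THRESHOLD):
    """Return the names that at least one token resembles closely enough."""
    return [n for n in names if any(similarity(tok, n) >= threshold for tok in tokens)]


def expand_tweet(tweet, names_by_level):
    """Return one row per (level, matched name) pair found in the tweet text."""
    tokens = tweet["text"].replace(",", " ").split()
    rows = []
    for level, names in names_by_level.items():
        for name in match_level(tokens, names):
            rows.append({**tweet, "match_level": level, "match_name": name})
    return rows
```

This single-token comparison cannot match multi-word names such as "New Orleans"; a fuller version would also compare sliding windows of two or three tokens.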

---
## Step 6: Create Interval Counts

Aggregate matches by time bin for rasterization:

In [None]:
print("\n" + "=" * 80)
print("STEP 6/11 - CREATING INTERVAL COUNTS")
print("=" * 80)

francine_interval_counts = geographic_matching.create_interval_counts(francine_gdf)
helene_interval_counts = geographic_matching.create_interval_counts(helene_gdf)

print(f"\n✓ Created interval count aggregations")
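
Conceptually, this aggregation is a group-by over the expanded rows. A minimal sketch follows, assuming each row carries `time_bin`, `match_level`, and `match_name` fields; those field names and the helper `count_by_interval` are hypothetical.

```python
from collections import Counter


def count_by_interval(rows):
    """Count matches per (time_bin, match_level, match_name) combination."""
    return Counter((r["time_bin"], r["match_level"], r["match_name"]) for r in rows)
```

Each key then identifies one geographic feature in one time bin, which is the unit the rasterization step needs.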

---
## Step 7: Collect Time Bins

Get ordered list of time bins for raster generation:

In [None]:
print("\n" + "=" * 80)
print("STEP 7/11 - COLLECTING TIME BINS")
print("=" * 80)

francine_time_bins = data_loader.get_time_bins(francine_gdf)
helene_time_bins = data_loader.get_time_bins(helene_gdf)

print(f"\n✓ Francine time bins: {len(francine_time_bins)}")
print(f"✓ Helene time bins: {len(helene_time_bins)}")
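
Collecting the bins amounts to taking the sorted unique bin starts. Whether `get_time_bins` also fills empty bins between the first and last tweet is not visible from this notebook, so the sketch below exposes that as an option; `collect_time_bins` and its parameters are illustrative names.

```python
from datetime import datetime, timedelta


def collect_time_bins(bin_starts, bin_hours, fill_gaps=False):
    """Return bin starts in chronological order, optionally including empty bins."""
    unique = sorted(set(bin_starts))
    if not fill_gaps or not unique:
        return unique
    step = timedelta(hours=bin_hours)
    bins, current = [], unique[0]
    while current <= unique[-1]:
        bins.append(current)
        current += step
    return bins
```

Filling gaps matters for animation: without it, the time slider skips bins with no tweets instead of showing an empty frame.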

---
## Step 8: Build Master Grid

Create the master raster grid with projected geometries:

In [None]:
print("\n" + "=" * 80)
print("STEP 8/11 - CREATING MASTER GRID")
print("=" * 80)

grid_params = rasterization.create_master_grid(
    francine_gdf, helene_gdf, states_gdf, counties_gdf, cities_gdf
)

print(f"\n✓ Master grid created")
print(f"  Grid size: {grid_params['width']} x {grid_params['height']} cells")
print(f"  Cell size: {grid_params['cell_size']} meters")
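
The grid dimensions follow from the projected extent and the cell size. The arithmetic can be sketched as below, assuming a north-up grid whose origin is the top-left corner of the bounding box; `make_grid` and `cell_for_point` are hypothetical helpers, and only the `width`, `height`, and `cell_size` keys are taken from the notebook.

```python
import math


def make_grid(bounds, cell_size):
    """Describe a north-up grid covering bounds = (minx, miny, maxx, maxy) in metres."""
    minx, miny, maxx, maxy = bounds
    width = math.ceil((maxx - minx) / cell_size)
    height = math.ceil((maxy - miny) / cell_size)
    # Affine coefficients in (a, b, c, d, e, f) order:
    # x = a * col + b * row + c, y = d * col + e * row + f
    transform = (cell_size, 0.0, minx, 0.0, -cell_size, maxy)
    return {"width": width, "height": height, "cell_size": cell_size, "transform": transform}


def cell_for_point(grid, x, y):
    """Return the (row, col) of the cell containing the projected point (x, y)."""
    a, _, c, _, e, f = grid["transform"]
    col = int((x - c) // a)
    row = int((y - f) // e)
    return row, col
```

With rasterio, `rasterio.transform.from_origin(minx, maxy, cell_size, cell_size)` yields an equivalent transform. Using one shared grid for both hurricanes keeps every output raster aligned cell for cell.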

---
## Step 9: Ensure Output Directory

Create output directory if it doesn't exist:

In [None]:
print("\n" + "=" * 80)
print("STEP 9/11 - ENSURING OUTPUT DIRECTORY")
print("=" * 80)

os.makedirs(config.OUTPUT_DIR, exist_ok=True)

print(f"\n✓ Output directory ready: {config.OUTPUT_DIR}")

---
## Step 10: Process Hurricane Francine

Generate time-sliced rasters for Hurricane Francine:

**This will create:**
- Incremental rasters (activity per time bin)
- Cumulative rasters (accumulated activity)

In [None]:
print("\n" + "=" * 80)
print("STEP 10/11 - PROCESSING HURRICANE FRANCINE")
print("=" * 80)

francine_output = rasterization.process_hurricane(
    hurricane_name='francine',
    gdf_proj=grid_params['francine_proj'],
    interval_counts=francine_interval_counts,
    time_bins=francine_time_bins,
    timestamp_dict=francine_dict,
    grid_params=grid_params,
)

print(f"\n✓ Francine processing complete")
print(f"  Output: {francine_output}")
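
The relationship between the two raster types is a running sum: each cumulative raster is the cell-wise total of all incremental rasters up to and including that time bin. Here is a dependency-free sketch using nested lists as grids; `add_grids` and `cumulative_from_increments` are illustrative names, not the project's functions.

```python
from itertools import accumulate


def add_grids(a, b):
    """Cell-wise sum of two equally sized grids given as lists of rows."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def cumulative_from_increments(increments):
    """Return the running cell-wise sum of a chronological list of grids."""
    return list(accumulate(increments, add_grids))
```

With NumPy arrays stacked along the first axis, the same result comes from `np.cumsum(stack, axis=0)`.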

---
## Step 11: Process Hurricane Helene

Generate time-sliced rasters for Hurricane Helene:

In [None]:
print("\n" + "=" * 80)
print("STEP 11/11 - PROCESSING HURRICANE HELENE")
print("=" * 80)

helene_output = rasterization.process_hurricane(
    hurricane_name='helene',
    gdf_proj=grid_params['helene_proj'],
    interval_counts=helene_interval_counts,
    time_bins=helene_time_bins,
    timestamp_dict=helene_dict,
    grid_params=grid_params,
)

print(f"\n✓ Helene processing complete")
print(f"  Output: {helene_output}")

---
## Summary and Next Steps

In [None]:
print("\n" + "=" * 80)
print("PIPELINE COMPLETE")
print("=" * 80)

print(f"\nOutput Directories:")
print(f"  Francine: {francine_output}")
print(f"  Helene:   {helene_output}")

print(f"\nRaster Types Created:")
print(f"  - increment:  Tweet activity per time bin")
print(f"  - cumulative: Accumulated tweet activity over time")

print(f"\nNext Steps:")
print(f"  1. Add raster datasets to ArcGIS Pro map")
print(f"  2. Configure symbology (stretch, color ramp)")
print(f"  3. Enable time slider for animation")
print(f"  4. Export animations as needed")

# Count files created
import glob
francine_inc = len(glob.glob(os.path.join(francine_output, 'increment', '*.tif')))
francine_cum = len(glob.glob(os.path.join(francine_output, 'cumulative', '*.tif')))
helene_inc = len(glob.glob(os.path.join(helene_output, 'increment', '*.tif')))
helene_cum = len(glob.glob(os.path.join(helene_output, 'cumulative', '*.tif')))

print(f"\nFiles Created:")
print(f"  Francine: {francine_inc} incremental + {francine_cum} cumulative = {francine_inc + francine_cum} total")
print(f"  Helene:   {helene_inc} incremental + {helene_cum} cumulative = {helene_inc + helene_cum} total")
print(f"  Grand Total: {francine_inc + francine_cum + helene_inc + helene_cum} GeoTIFF files")

---
## Optional: Quick Visualization Check

View a sample raster to verify output:

In [None]:
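import glob
import rasterio
import matplotlib.pyplot as plt
import numpy as np

# Get the first cumulative raster by filename (sorted so the choice is deterministic)
cumulative_files = sorted(glob.glob(os.path.join(francine_output, 'cumulative', '*.tif')))
if not cumulative_files:
    raise FileNotFoundError("No cumulative rasters found; run Step 10 first")
sample_raster = cumulative_files[0]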

with rasterio.open(sample_raster) as src:
    data = src.read(1)
    
    # Mask zeros for better visualization
    data_masked = np.ma.masked_equal(data, 0)
    
    plt.figure(figsize=(12, 8))
    plt.imshow(data_masked, cmap='YlOrRd', interpolation='nearest')
    plt.colorbar(label='Tweet Activity')
    plt.title(f'Sample Raster: {os.path.basename(sample_raster)}')
    plt.tight_layout()
    plt.show()
    
    # Guard against an all-zero raster, where min() and mean() would fail or return NaN
    nonzero = data[data > 0]
    print(f"\nRaster Statistics:")
    if nonzero.size:
        print(f"  Min (non-zero): {nonzero.min():.2f}")
        print(f"  Mean (non-zero): {nonzero.mean():.2f}")
    print(f"  Max: {data.max():.2f}")
    print(f"  Non-zero cells: {np.count_nonzero(data):,}")

---
## ArcGIS Pro Integration Notes

### Adding Rasters to Map:
```python
import glob
import os

import arcpy

# Get current map
aprx = arcpy.mp.ArcGISProject("CURRENT")
map_obj = aprx.activeMap

# Add all cumulative rasters
raster_folder = r"C:\users\colto\documents\github\tweet_project\rasters_output\francine\cumulative"
for raster_path in glob.glob(os.path.join(raster_folder, '*.tif')):
    map_obj.addDataFromPath(raster_path)
```

### Enable Time:
1. Right-click layer → Properties → Time
2. Enable time
3. Set time field or extract from filename
4. Open Time Slider (Time tab → Time Slider)
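
If the time has to be extracted from the filename, a small parser can do it. The sketch below assumes a hypothetical `YYYYMMDD_HHMM` stamp somewhere in the name; the naming scheme actually used by `rasterization.process_hurricane` is not shown in this notebook, so the pattern needs adjusting to match the real files.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical stamp format: eight date digits, an underscore, four time digits
STAMP = re.compile(r"(\d{8})_(\d{4})")


def time_from_filename(name: str) -> Optional[datetime]:
    """Parse the first YYYYMMDD_HHMM stamp in a filename, or return None if absent."""
    match = STAMP.search(name)
    if match is None:
        return None
    return datetime.strptime("".join(match.groups()), "%Y%m%d%H%M")
```

The parsed values can then be written to a date field that the layer's time properties point at.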

### Mosaic Dataset (Alternative):
Run `src/arcgis_mosaic.py` in the ArcGIS Pro Python environment to create mosaic datasets with time enabled automatically.