# Mirror Correlation Analysis

This notebook measures how closely casual riders' behavior aligns with that of members at previously identified anchor stations. By computing the Pearson correlation between the two groups' hourly ride distributions, we can identify which stations exhibit a 'strong mirror' effect: casual riders at these locations start their trips at the same hours as members, suggesting they share the same habitual, commute-like patterns.
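To make the measure concrete, here is one way to write it down (the notation is ours, not taken from the code). Let $n^{\text{casual}}_h$ and $n^{\text{member}}_h$ be the number of rides by each group that start in hour $h$ at a station. The code normalizes each group's counts into shares over the 24 hours and correlates the two resulting vectors:

$$
c_h = \frac{n^{\text{casual}}_h}{\sum_{k=0}^{23} n^{\text{casual}}_k}, \qquad
m_h = \frac{n^{\text{member}}_h}{\sum_{k=0}^{23} n^{\text{member}}_k}, \qquad h = 0, \dots, 23
$$

$$
r = \frac{\sum_{h=0}^{23} (c_h - \bar{c})(m_h - \bar{m})}{\sqrt{\sum_{h=0}^{23} (c_h - \bar{c})^2}\,\sqrt{\sum_{h=0}^{23} (m_h - \bar{m})^2}}
$$

Because each distribution sums to 1 over 24 hours, $\bar{c} = \bar{m} = 1/24$. The coefficient is undefined when one group has no rides at a station, since that group's distribution is then all zeros.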

## 1. Setup & Configuration

Import the required libraries and define the directory that holds the processed input and output data.

In [1]:
import pandas as pd
from pathlib import Path


DATA_DIR = Path("../data/processed")

## 2. Correlation Analysis Logic

The function below performs the following steps:
1. **Data Loading**: Loads the trip data and derives the hour of day from `started_at`.
2. **Anchor Selection**: Keeps only the stations labeled 'Confirmed Behavioral Anchor' in the earlier segmentation output (`station_behavior_segments.csv`).
3. **Distribution Calculation**: Computes normalized hourly ride counts (0-23h) for both casuals and members at each anchor station.
4. **Correlation Measurement**: Computes the Pearson correlation between the two 24-hour distributions and labels stations at or above 0.85 as 'Strong Mirror'.
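Steps 3 and 4 can be illustrated on a tiny hand-made example before running them on the real data. This is a minimal sketch: the `rides` table and the `hourly_share` helper are hypothetical and stand in for a single station's subset of `fact_trips.csv`.

```python
import pandas as pd

# Hypothetical toy data for ONE station: each row is a ride with its start hour and rider type.
rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual", "member", "member", "member", "member"],
    "hour":          [8,        8,        17,       8,        8,        17,       17],
})

def hourly_share(hours: pd.Series) -> pd.Series:
    """Share of rides starting in each hour 0-23; hours with no rides get 0."""
    return hours.value_counts(normalize=True).reindex(range(24), fill_value=0.0)

# Step 3: normalized hourly distributions for each rider type.
casual = hourly_share(rides.loc[rides["member_casual"] == "casual", "hour"])
member = hourly_share(rides.loc[rides["member_casual"] == "member", "hour"])

# Step 4: Pearson correlation between the two 24-element distributions.
r = casual.corr(member)
print(round(r, 4))  # → 0.9444
```

Reindexing to all 24 hours matters: without it, the correlation would be computed only over the hours that happen to appear in both groups, which silently ignores hours where one group rides and the other does not.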

In [2]:
def run_mirror_correlation():
    """Correlate casual and member hourly ride distributions at each confirmed anchor station."""
    # Input files and the output location
    master_path = DATA_DIR / "fact_trips.csv"
    anchor_path = DATA_DIR / "station_behavior_segments.csv"
    output_path = DATA_DIR / "mirror_correlation_results.csv"

    # Stop early if either input file is missing
    if not master_path.exists() or not anchor_path.exists():
        print(f"❌ Error: Required files missing in {DATA_DIR}")
        return None

    print("Starting Behavioral Mirroring Analysis...")

    # Load only the columns this analysis needs
    df = pd.read_csv(master_path, usecols=['start_station_name', 'started_at', 'member_casual'])

    print("Converting timestamps and deriving hour...")
    df['started_at'] = pd.to_datetime(df['started_at'])
    df['hour'] = df['started_at'].dt.hour

    # Keep only the stations confirmed as anchors by the earlier segmentation
    anchors = pd.read_csv(anchor_path)
    is_confirmed = anchors['final_status'] == 'Confirmed Behavioral Anchor'
    elite_anchors = anchors.loc[is_confirmed, 'start_station_name'].tolist()

    if not elite_anchors:
        print("⚠️ No 'Confirmed Behavioral Anchors' found. Check your segmentation thresholds.")
        return None

    full_index = pd.Index(range(24))  # every hour of the day, 0-23
    results = []
    for station in elite_anchors:
        subset = df[df['start_station_name'] == station]

        # Share of each group's rides that start in each hour; hours with no rides are filled with 0
        casual_dist = (subset.loc[subset['member_casual'] == 'casual', 'hour']
                       .value_counts(normalize=True)
                       .reindex(full_index, fill_value=0))
        member_dist = (subset.loc[subset['member_casual'] == 'member', 'hour']
                       .value_counts(normalize=True)
                       .reindex(full_index, fill_value=0))

        # Pearson correlation of the two 24-hour distributions
        # (NaN when one group has no rides here, because its distribution is all zeros)
        correlation = casual_dist.corr(member_dist)

        # Label NaN explicitly instead of letting it fall through to "Weak Alignment"
        if pd.isna(correlation):
            verdict = "Insufficient Data"
        elif correlation >= 0.85:
            verdict = "Strong Mirror"
        else:
            verdict = "Weak Alignment"

        results.append({
            "Station": station,
            "Mirror_Correlation": round(correlation, 4),
            "Verdict": verdict,
        })

    # Rank stations from strongest to weakest alignment and save the table
    mirror_df = pd.DataFrame(results).sort_values(by="Mirror_Correlation", ascending=False)
    mirror_df.to_csv(output_path, index=False)

    print("-" * 50)
    print("ELITE ANCHOR CORRELATION RESULTS:")
    print(mirror_df.to_string(index=False))
    print(f"\n✅ SUCCESS: Mirror analysis saved to {output_path}")
    return mirror_df
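The loop above rescans the full trip table once per anchor station, which is fine for a handful of anchors. As an alternative sketch (not the method used in this notebook), the same correlations can be computed in one pass with `pd.crosstab` and `DataFrame.corrwith`. The function name `mirror_correlations` is hypothetical, and it assumes a DataFrame that already has the derived integer `hour` column.

```python
import pandas as pd

def mirror_correlations(trips: pd.DataFrame, stations: list[str]) -> pd.Series:
    """Pearson correlation of casual vs. member hourly shares for each station.

    Assumes `trips` has 'start_station_name', 'member_casual' and an integer
    'hour' column (0-23), like the DataFrame built in run_mirror_correlation().
    """
    subset = trips[trips["start_station_name"].isin(stations)]
    # Rows: (station, rider type); columns: hour. Make sure all 24 hours exist.
    counts = pd.crosstab(
        [subset["start_station_name"], subset["member_casual"]],
        subset["hour"],
    ).reindex(columns=range(24), fill_value=0)
    # Row-normalize so each (station, rider type) row sums to 1.
    shares = counts.div(counts.sum(axis=1), axis=0)
    casual = shares.xs("casual", level="member_casual")
    member = shares.xs("member", level="member_casual")
    # Row-wise Pearson correlation; a station missing one rider type gets NaN.
    return casual.corrwith(member, axis=1)
```

For example, with a hypothetical station "A" that has both rider types and a station "B" that has only member rides, `mirror_correlations(trips, ["A", "B"])` returns a coefficient for "A" and NaN for "B", matching the per-station loop's behavior.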

## 3. Execution

Run the analysis. It prints the anchor stations ranked by correlation and saves the table to `mirror_correlation_results.csv`.

In [3]:
if __name__ == "__main__":
    run_mirror_correlation()

Starting Behavioral Mirroring Analysis...
Converting timestamps and deriving hour...
--------------------------------------------------
ELITE ANCHOR CORRELATION RESULTS:
                         Station  Mirror_Correlation        Verdict
Wacker Dr & Washington St Corral              0.9881  Strong Mirror
    Clinton St & Washington Blvd              0.9068  Strong Mirror
       Artesian Ave & Hubbard St              0.7820 Weak Alignment
                       NAVY PIER              0.7257 Weak Alignment

✅ SUCCESS: Mirror analysis saved to ..\data\processed\mirror_correlation_results.csv
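The verdict depends on the fixed 0.85 cutoff, and two of the four anchors fall below it. A quick sensitivity check is sketched below. It copies the printed correlations rather than re-reading the saved CSV, and the lower cutoffs (0.75, 0.70) are arbitrary illustrations, not thresholds used in this analysis. The helper `strong_mirrors` is hypothetical.

```python
import pandas as pd

# Correlations copied from the printed results above.
mirror_df = pd.DataFrame({
    "Station": [
        "Wacker Dr & Washington St Corral",
        "Clinton St & Washington Blvd",
        "Artesian Ave & Hubbard St",
        "NAVY PIER",
    ],
    "Mirror_Correlation": [0.9881, 0.9068, 0.7820, 0.7257],
})

def strong_mirrors(df: pd.DataFrame, threshold: float = 0.85) -> list[str]:
    """Stations whose correlation meets the given 'Strong Mirror' cutoff."""
    return df.loc[df["Mirror_Correlation"] >= threshold, "Station"].tolist()

# Compare the notebook's cutoff (0.85) with two illustrative lower cutoffs.
for cutoff in (0.85, 0.75, 0.70):
    print(cutoff, strong_mirrors(mirror_df, cutoff))
```

At 0.75, Artesian Ave & Hubbard St (0.7820) would also qualify, while NAVY PIER (0.7257) joins only at 0.70, so the two 'Strong Mirror' stations are the only ones that stay above every cutoff tried here.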
