# Habitual Analysis: Identifying Commuter-Like Casuals

This notebook analyzes casual ride patterns to identify habits that resemble member (commuter) behavior. For each station and month, it measures rush-hour concentration ($C_h$) and mid-week focus ($C_d$), then combines them into a 'Routine Score' that indicates how routine that station's casual ridership is.

## 1. Setup & Configuration

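Define the input data directory and the start hours that make up the 'Rush Window': the morning peak (07:00 to 09:59) and the evening peak (17:00 to 19:59). A ride counts as rush-hour if its start hour is in this list.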

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path


DATA_DIR = Path("../data/processed")
RUSH_WINDOW = [7, 8, 9, 17, 18, 19]
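Because the notebook derives the rush flag from the ride's start hour (`dt.hour`), the window covers whole clock hours: a ride starting at 09:45 counts, while one starting at 10:00 does not. The quick check below illustrates this on a few hypothetical timestamps; only `RUSH_WINDOW` comes from the notebook.

```python
import pandas as pd

RUSH_WINDOW = [7, 8, 9, 17, 18, 19]

# Hypothetical ride start times placed on either side of the window edges
starts = pd.Series(pd.to_datetime([
    "2024-06-04 06:59",  # hour 6  -> outside
    "2024-06-04 09:45",  # hour 9  -> inside
    "2024-06-04 10:00",  # hour 10 -> outside
    "2024-06-04 17:00",  # hour 17 -> inside
    "2024-06-04 19:59",  # hour 19 -> inside
    "2024-06-04 20:00",  # hour 20 -> outside
]))

# Membership depends on the start hour alone, mirroring the in_rush flag
in_rush = starts.dt.hour.isin(RUSH_WINDOW)
print(in_rush.tolist())
```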

## 2. Habitual Metrics Model

The core analysis involves:
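1. **Volume Filtering**: Keeping only station-month pairs in the top 25% by casual ride volume (at or above the 75th percentile), so the ratios below are computed from enough rides to be meaningful.
2. **Hourly Consistency ($C_h$)**: The proportion of a station-month's casual rides that start during a rush-window hour.
3. **Mid-week Focus ($C_d$)**: The proportion of those rides that start on a Tuesday, Wednesday, or Thursday.
4. **Final Scoring**: A weighted Routine Score, $R = 0.6\,C_h + 0.4\,C_d$, bucketed into tiers: Low ($R \le 0.25$), Emerging ($0.25 < R \le 0.45$), and Strong ($R > 0.45$).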
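To make the four steps concrete, here is a minimal sketch that applies them to a tiny, hypothetical set of casual rides. Station names `A`, `B`, `C`, the timestamps, and the helper names `MIDWEEK` and `KEYS` are invented for illustration; the column names follow the notebook. Station `C` has a single ride, so it should drop out at the volume filter.

```python
import pandas as pd

RUSH_WINDOW = [7, 8, 9, 17, 18, 19]
MIDWEEK = ["Tuesday", "Wednesday", "Thursday"]
KEYS = ["start_station_name", "month"]

# Hypothetical casual rides in June 2024 (1 June 2024 was a Saturday)
rides = pd.DataFrame({
    "start_station_name": ["A"] * 4 + ["B"] * 4 + ["C"],
    "started_at": pd.to_datetime([
        "2024-06-04 08:10",  # A: Tuesday, rush
        "2024-06-05 17:30",  # A: Wednesday, rush
        "2024-06-06 18:05",  # A: Thursday, rush
        "2024-06-08 13:00",  # A: Saturday, off-peak
        "2024-06-01 11:00",  # B: Saturday, off-peak
        "2024-06-02 14:00",  # B: Sunday, off-peak
        "2024-06-04 12:00",  # B: Tuesday, off-peak
        "2024-06-05 08:30",  # B: Wednesday, rush
        "2024-06-07 08:00",  # C: Friday, rush (only one ride)
    ]),
})
rides["month"] = rides["started_at"].dt.month_name()

# Step 1: keep station-months at or above the 75th percentile of ride counts
vol = rides.groupby(KEYS).size().reset_index(name="vol")
valid = vol[vol["vol"] >= vol["vol"].quantile(0.75)]  # A and B (4 rides) pass, C (1 ride) does not

# Steps 2 and 3: Ch and Cd are per-group means of 0/1 indicator columns
rides["in_rush"] = rides["started_at"].dt.hour.isin(RUSH_WINDOW).astype(int)
rides["is_midweek"] = rides["started_at"].dt.day_name().isin(MIDWEEK).astype(int)
scores = rides.groupby(KEYS)[["in_rush", "is_midweek"]].mean()
scores = scores.rename(columns={"in_rush": "Ch", "is_midweek": "Cd"}).reset_index()

# Step 4: weighted Routine Score, computed only for the surviving station-months
result = valid.merge(scores, on=KEYS)
result["routine_score"] = 0.6 * result["Ch"] + 0.4 * result["Cd"]
print(result.round(3))
```

Station `A` has three of four rides in both the rush window and mid-week, so $C_h = C_d = 0.75$ and $R = 0.75$; station `B` has $C_h = 0.25$ and $C_d = 0.5$, giving $R = 0.35$.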

In [2]:
def run_habitual_analysis():
    input_path = DATA_DIR / "fact_trips.csv"
    if not input_path.exists():
        print("❌ Error: fact_trips.csv not found. Run pipeline.py first.")
        return

    
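    # Load only the columns this analysis uses
    df = pd.read_csv(input_path, usecols=['start_station_name', 'started_at', 'member_casual'])

    # Keep casual rides only and derive the time features used below.
    # Note: dt.month_name() drops the year, so if the data spans several years
    # the same calendar month from different years is pooled into one group.
    df = df[df['member_casual'] == 'casual'].copy()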
    df['started_at'] = pd.to_datetime(df['started_at'])
    df['month'] = df['started_at'].dt.month_name()
    df['hour'] = df['started_at'].dt.hour
    df['day_name'] = df['started_at'].dt.day_name()

    print(f"Analyzing habitual patterns for {len(df):,} casual rides...")

   
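    # 1. Volume filtering: keep station-month pairs at or above the 75th
    #    percentile of monthly casual ride volume (i.e. the top 25%)
    station_monthly_vol = df.groupby(['start_station_name', 'month']).size().reset_index(name='vol')
    vol_threshold = station_monthly_vol['vol'].quantile(0.75)
    valid_stations = station_monthly_vol[station_monthly_vol['vol'] >= vol_threshold]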
    
    
    # 2. Hourly consistency (Ch): share of rides starting in a rush-window hour
    df['in_rush'] = df['hour'].isin(RUSH_WINDOW).astype(int)
    ch_scores = df.groupby(['start_station_name', 'month'])['in_rush'].mean().reset_index(name='Ch')

    
    # 3. Mid-week focus (Cd): share of rides starting Tuesday through Thursday
    midweek_days = ['Tuesday', 'Wednesday', 'Thursday']
    df['is_midweek'] = df['day_name'].isin(midweek_days).astype(int)
    cd_scores = df.groupby(['start_station_name', 'month'])['is_midweek'].mean().reset_index(name='Cd')

    
    # 4. Final scoring: inner merges keep only the high-volume station-months
    results = valid_stations.merge(ch_scores, on=['start_station_name', 'month'])
    results = results.merge(cd_scores, on=['start_station_name', 'month'])
    results['routine_score'] = (results['Ch'] * 0.6) + (results['Cd'] * 0.4)
    
    
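    # Bucket scores into tiers. Bins are right-inclusive, so include_lowest=True
    # is needed to place a score of exactly 0 in 'Low' instead of NaN.
    results['tier'] = pd.cut(
        results['routine_score'],
        bins=[0, 0.25, 0.45, 1.0],
        labels=['Low', 'Emerging', 'Strong'],
        include_lowest=True,
    )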

    output_path = DATA_DIR / "habitual_metrics.csv"
    results.sort_values('routine_score', ascending=False).to_csv(output_path, index=False)
    print(f"✅ SUCCESS: Habitual metrics saved to {output_path}")
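The tier boundaries depend on how `pd.cut` treats its bins: by default the intervals are right-inclusive, `(0, 0.25]`, `(0.25, 0.45]` and `(0.45, 1.0]`, so a score of exactly 0 falls outside every bin unless `include_lowest=True` is passed. The sketch below checks this on a few hypothetical scores; `BINS` and `LABELS` are names introduced here, with values copied from the cell above.

```python
import pandas as pd

BINS = [0, 0.25, 0.45, 1.0]
LABELS = ["Low", "Emerging", "Strong"]

# Hypothetical scores placed on and around the tier boundaries
scores = pd.Series([0.0, 0.25, 0.30, 0.45, 0.90])

# Right-inclusive bins: 0.25 -> Low, 0.45 -> Emerging.
# Without include_lowest, 0.0 is outside (0, 0.25] and becomes NaN.
default_tiers = pd.cut(scores, bins=BINS, labels=LABELS)
safe_tiers = pd.cut(scores, bins=BINS, labels=LABELS, include_lowest=True)

print(pd.DataFrame({"score": scores, "default": default_tiers, "include_lowest": safe_tiers}))
```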

## 3. Execution

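Run the analysis end to end: load `fact_trips.csv`, score every high-volume station-month, and write the results to `habitual_metrics.csv`.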

In [3]:
if __name__ == "__main__":
    run_habitual_analysis()

Analyzing habitual patterns for 1,568,655 casual rides...
✅ SUCCESS: Habitual metrics saved to ..\data\processed\habitual_metrics.csv
