# 🧠 Behavioral Refinement

In this step, we sharpen our focus. Starting from the initial habitual metrics, we apply stricter classification thresholds to identify the 'Strong Mirror' stations. We then normalize the routine scores so that behavioral patterns can be compared on a common scale across the entire network.

### 1. Setup and Data Loading
We load the habitual metrics file, which is the output of the previous habitual analysis, and stop early with a clear error if it is missing.

In [1]:
import pandas as pd
from pathlib import Path

In [2]:
DATA_DIR = Path("../data/processed")
input_path = DATA_DIR / "habitual_metrics.csv"
output_path = DATA_DIR / "refined_behavioral_scores.csv"

if not input_path.exists():
    raise FileNotFoundError("\u274c habitual_metrics.csv not found. Run habitual_analysis first.")

df = pd.read_csv(input_path)
print(f"Loaded {len(df):,} rows from habitual_metrics.csv")

Loaded 3,489 rows from habitual_metrics.csv
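
Because every later step reads the `routine_score` column, it can help to confirm the column is present and numeric right after loading. The following is a minimal sketch, not part of the original notebook: `validate_metrics` is a hypothetical helper, and the `station_id` column in the demo frame is assumed purely for illustration.

```python
import pandas as pd

def validate_metrics(frame: pd.DataFrame, required=("routine_score",)) -> None:
    """Raise early if a column needed for refinement is missing or non-numeric."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing required column(s): {missing}")
    for col in required:
        if not pd.api.types.is_numeric_dtype(frame[col]):
            raise TypeError(f"Column {col!r} must be numeric, got {frame[col].dtype}")

# Hypothetical two-row frame standing in for habitual_metrics.csv.
demo = pd.DataFrame({"station_id": ["A", "B"], "routine_score": [0.42, 0.31]})
validate_metrics(demo)  # passes silently when the column is present and numeric
```

In the notebook itself, the call would presumably be `validate_metrics(df)` placed directly after `pd.read_csv`.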


### 2. Tightening the Mirror Classification
To target the highest-potential stations, we raise the bar on the routine score (RS). A station now needs an RS of at least 0.50 to count as a Strong Mirror, at least 0.40 for a Moderate Mirror, and at least 0.30 for a Weak Mirror; anything lower is rejected. The stricter cutoffs filter out stations with leisure-heavy or incidental usage and keep those that truly mirror member-like commuter behavior.

In [3]:
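def classify_mirror(rs):
    """Map a routine score (RS) to a mirror verdict using the tightened thresholds."""
    # Checks run from strictest to loosest, so each score lands in its highest band.
    if rs >= 0.50:
        return "Strong Mirror"
    elif rs >= 0.40:
        return "Moderate Mirror"
    elif rs >= 0.30:
        return "Weak Mirror"
    else:
        # Anything below 0.30 (a NaN score also falls through to here) is rejected.
        return "Reject"

# Label every station row by its routine score.
df['mirror_verdict'] = df['routine_score'].apply(classify_mirror)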
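
The same banding can also be expressed without a row-by-row `apply`. The sketch below is an alternative, not the notebook's method: it uses `pd.cut` with left-closed bins so that the edges behave like the `>=` comparisons in `classify_mirror`. The sample scores are made up for illustration.

```python
import pandas as pd

# Hypothetical sample scores; the real values come from habitual_metrics.csv.
sample = pd.DataFrame({"routine_score": [0.55, 0.50, 0.45, 0.40, 0.30, 0.29]})

# right=False makes each bin [lower, upper), matching the >= checks.
bins = [float("-inf"), 0.30, 0.40, 0.50, float("inf")]
labels = ["Reject", "Weak Mirror", "Moderate Mirror", "Strong Mirror"]

sample["mirror_verdict"] = pd.cut(
    sample["routine_score"], bins=bins, labels=labels, right=False
)

print(sample)
```

One behavioral difference to keep in mind: `pd.cut` leaves a missing score as `NaN`, whereas `classify_mirror` labels it "Reject".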

### 3. Score Normalization and Export
Lastly, we apply min-max scaling to the routine scores, mapping the lowest score to 0 and the highest to 1. This common scale makes it easier to compare stations and prioritize our marketing efforts. The refined dataset is sorted by routine score in descending order and saved for visualization and final reporting.
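
To make the scaling concrete, here is a small sketch of min-max normalization on made-up scores. `min_max_scale` is a hypothetical helper, and the guard for a zero range (returning all zeros when every score is identical) is an assumption added for illustration; without it, the division would yield `NaN` for such a column.

```python
import pandas as pd

def min_max_scale(scores: pd.Series) -> pd.Series:
    """Scale scores to [0, 1]; return all zeros if every score is identical."""
    lo, hi = scores.min(), scores.max()
    span = hi - lo
    if span == 0:
        # Assumption: a constant column carries no ranking information, so map it to 0.0.
        return pd.Series(0.0, index=scores.index)
    return (scores - lo) / span

# Hypothetical scores for illustration only.
example = pd.Series([0.25, 0.50, 0.75])
scaled = min_max_scale(example)
print(scaled.tolist())  # [0.0, 0.5, 1.0]
```

Min-max scaling preserves the ordering of stations, so ranking by the normalized score gives the same order as ranking by the raw routine score. The mirror thresholds (0.50, 0.40, 0.30) refer to the raw score, not the normalized one.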

In [4]:
df['normalized_RS'] = (
    (df['routine_score'] - df['routine_score'].min()) /
    (df['routine_score'].max() - df['routine_score'].min())
)

df = df.sort_values(by='routine_score', ascending=False)
df.to_csv(output_path, index=False)

print("-" * 50)
print(f"\u2705 SUCCESS: Refined scores saved to {output_path}")
print("\nNew Marketing Breakdown:")
print(df['mirror_verdict'].value_counts())

--------------------------------------------------
✅ SUCCESS: Refined scores saved to ..\data\processed\refined_behavioral_scores.csv

New Marketing Breakdown:
mirror_verdict
Weak Mirror        2232
Moderate Mirror     631
Reject              578
Strong Mirror        48
Name: count, dtype: int64
