# Station Segmentation: Defining the Behavioral Portfolio

This notebook categorizes bike stations by behavioral density and consistency. By aggregating refined behavioral scores across multiple months, we identify 'Confirmed Behavioral Anchors' (stations where casual riders consistently mimic commuter behavior) and separate them from high-potential emerging sites and inconsistent noise.

## 1. Setup & Configuration

Define the directory that holds the behavioral scores and receives the segmented results. The individual input and output file paths are built inside the segmentation function.

In [1]:
import pandas as pd
from pathlib import Path


DATA_DIR = Path("../data/processed")

## 2. Segmentation Framework

The segmentation process involves two primary dimensions:
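1. **Density Score**: The fraction (0 to 1) of a station's observed months in which it was classified as a 'Strong Mirror'.
2. **Consistency Score**: The mean routine score across all observed months.

Stations are then classified into one of three tiers based on their density score alone: **Confirmed Behavioral Anchor** (density of at least 0.60), **High-Potential Emerging** (at least 0.30 but below 0.60), or **Inconsistent / Noise** (below 0.30). The consistency score does not affect the tier; it is used only to sort the output.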
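To make the two scores and the cut-offs concrete, the sketch below scores a single hypothetical station by hand. The monthly records are invented for illustration (including the verdict labels other than 'Strong Mirror' and the scale of the routine scores); only the 0.60 and 0.30 thresholds come from the segmentation cell.

```python
# Hypothetical monthly records for one station (illustrative values only).
verdicts = ["Strong Mirror", "Strong Mirror", "Weak Mirror", "Strong Mirror", "No Mirror"]
routine_scores = [0.82, 0.78, 0.55, 0.80, 0.40]

# Density: share of observed months labelled 'Strong Mirror' (3 of 5 here).
density_score = sum(v == "Strong Mirror" for v in verdicts) / len(verdicts)

# Consistency: mean routine score over the same months.
consistency_score = sum(routine_scores) / len(routine_scores)


def classify_station(density):
    # Same cut-offs as the segmentation cell; both bounds are inclusive.
    if density >= 0.60:
        return "Confirmed Behavioral Anchor"
    elif density >= 0.30:
        return "High-Potential Emerging"
    return "Inconsistent / Noise"


print(density_score, round(consistency_score, 2), classify_station(density_score))
```

Because the bounds are inclusive, a station that is a 'Strong Mirror' in exactly 3 of 5 months lands in the top tier, while one at 2 of 5 (0.40) would be classified as High-Potential Emerging.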

In [2]:
def run_station_segmentation():
    input_path = DATA_DIR / "refined_behavioral_scores.csv"
    output_path = DATA_DIR / "station_behavior_segments.csv"

    if not input_path.exists():
        print(f"❌ Error: {input_path} not found.")
        return

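    df = pd.read_csv(input_path)

    # Density: share of each station's rows (assumed one row per observed month)
    # whose verdict is "Strong Mirror".
    density_df = (
        df.groupby("start_station_name")["mirror_verdict"]
        .apply(lambda x: (x == "Strong Mirror").sum() / len(x))
        .reset_index(name="density_score")
    )

    # Consistency: mean routine score per station across the same rows.
    consistency_df = (
        df.groupby("start_station_name")["routine_score"]
        .mean()
        .reset_index(name="consistency_score")
    )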

    final_df = density_df.merge(consistency_df, on="start_station_name")

    def classify_station(density):
        # Tiers depend on density only; both thresholds are inclusive.
        if density >= 0.60:
            return "Confirmed Behavioral Anchor"
        elif density >= 0.30:
            return "High-Potential Emerging"
        else:
            return "Inconsistent / Noise"

    final_df["final_status"] = final_df["density_score"].apply(classify_station)
    # Sort by consistency so the most routine-like stations appear first.
    final_df = final_df.sort_values("consistency_score", ascending=False)
    final_df.to_csv(output_path, index=False)

    print("-" * 50)
    print(f"✅ SUCCESS: High-density segments saved to {output_path}")
    print("\nNew Portfolio Distribution:")
    print(final_df["final_status"].value_counts())

## 3. Execution

Execute the station segmentation process.

In [3]:
if __name__ == "__main__":
    run_station_segmentation()

--------------------------------------------------
✅ SUCCESS: High-density segments saved to ..\data\processed\station_behavior_segments.csv

New Portfolio Distribution:
final_status
Inconsistent / Noise           491
High-Potential Emerging          5
Confirmed Behavioral Anchor      4
Name: count, dtype: int64
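
The printed counts are easier to interpret as shares of the whole portfolio. The snippet below is a small sketch that copies the three counts from the output above into a plain dictionary (rather than re-reading the CSV) and converts them to percentages.

```python
# Tier counts copied from the printed distribution above.
counts = {
    "Inconsistent / Noise": 491,
    "High-Potential Emerging": 5,
    "Confirmed Behavioral Anchor": 4,
}

total = sum(counts.values())  # 500 stations in total
shares = {tier: n / total for tier, n in counts.items()}

for tier, share in shares.items():
    print(f"{tier}: {share:.1%}")
```

Under these counts, fewer than 2% of the 500 stations reach either of the top two tiers.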
