### Cell 1: Setup and Configuration
Purpose: Centralize all settings, file paths, and analysis parameters in a single block. Every later cell reads from CONFIG, so the notebook is easier to read and a parameter only has to be changed in one place.

In [1]:
# ==============================================================================
# CELL 1: SETUP AND CONFIGURATION
# ==============================================================================
import pandas as pd
import numpy as np

print("--- Cell 1: Initializing Configuration and Loading Libraries ---")

# Central dictionary for all settings, file paths, and constants.
CONFIG = {
    "input_csv_path": "00_cleaned_data.csv",
    "output_csv_path": "01_vehicle_residences_with_status.csv",
    "output_map_path": "01_residences_map.html",
    "residence_params": {
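        "start_hour": 0,    # Night window starts at 12:00 AM (inclusive)
        "end_hour": 5,      # Exclusive: keeps hours 0-4, i.e. up to 4:59 AM
        "eps_km": 0.15,     # 150 m neighbourhood radius for DBSCAN
        "min_samples": 10,  # DBSCAN core-point threshold; also the minimum nighttime records per vehicle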
        "earth_radius_km": 6371.0088
    }
}

print("Configuration loaded successfully.")

--- Cell 1: Initializing Configuration and Loading Libraries ---
Configuration loaded successfully.
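scikit-learn's haversine metric expects [latitude, longitude] in radians and measures distance on a unit sphere, which is why Cell 3 divides eps_km by earth_radius_km. The sketch below uses only the standard library to reproduce that conversion and check which fixes fall inside the 150 m neighbourhood. The haversine_rad helper and the sample coordinates are illustrative assumptions, not part of the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # same value as CONFIG["residence_params"]["earth_radius_km"]
EPS_KM = 0.15                # same value as CONFIG["residence_params"]["eps_km"]

# A distance in km becomes an angle in radians on the unit sphere.
eps_rad = EPS_KM / EARTH_RADIUS_KM


def haversine_rad(lat1, lon1, lat2, lon2):
    """Hypothetical helper: central angle (radians) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(math.sqrt(a))


# Hypothetical fixes: 0.0009 degrees of latitude is roughly 100 m, 0.002 roughly 222 m.
home = (27.2372, 77.8733)
nearby = (27.2381, 77.8733)
far = (27.2392, 77.8733)

d_near = haversine_rad(*home, *nearby)
d_far = haversine_rad(*home, *far)

print(f"eps = {eps_rad:.3e} rad")
for name, d in [("nearby", d_near), ("far", d_far)]:
    print(f"{name}: {d * EARTH_RADIUS_KM * 1000:.0f} m, within eps: {d <= eps_rad}")
```

Comparing angles directly against eps_rad is exactly what DBSCAN does internally once the coordinates are in radians, so no conversion back to kilometres is needed during clustering.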


### Cell 2: Data Loading and Pre-processing (Vectorized)
Purpose: Load the cleaned data and flag vehicles whose GPS position never changes, which indicates either a parked vehicle or a GPS failure. The check uses a vectorized groupby().transform('nunique') operation instead of a Python for loop over VINs, which is much faster on large datasets.

In [2]:
# ==============================================================================
# CELL 2: DATA LOADING AND PRE-PROCESSING (STATIONARY VEHICLE DETECTION)
# ==============================================================================

print("\n--- Cell 2: Loading Data and Identifying Stationary Vehicles ---")

# Load the data using the path from CONFIG
try:
    df = pd.read_csv(CONFIG["input_csv_path"], parse_dates=['timestamp'])
    print(f"Successfully loaded '{CONFIG['input_csv_path']}' with {len(df)} records.")
except FileNotFoundError:
    print(f"ERROR: File not found at {CONFIG['input_csv_path']}")
    df = pd.DataFrame()

if not df.empty:
    df.dropna(subset=['fixed_lat', 'fixed_long'], inplace=True)
    
    # --- 1. Vectorized Stationary Vehicle Detection ---
    # Create a temporary location-bin column by snapping each fix to 4 decimal places
    # (roughly 11 m of latitude), so tiny floating-point differences do not count as movement.
    df['location_bin'] = df['fixed_lat'].round(4).astype(str) + '_' + df['fixed_long'].round(4).astype(str)
    
    # Use transform('nunique') to count unique locations for each VIN and broadcast it back.
    df['unique_location_count'] = df.groupby('vin')['location_bin'].transform('nunique')
    
    # --- 2. Create 'movement_status' Flag and 'df_active' ---
    df['movement_status'] = np.where(
        df['unique_location_count'] == 1,
        "Stationary or GPS Failure",
        "Has Movement"
    )
    
    # Clean up the temporary helper columns first so df_active does not inherit them
    df.drop(columns=['location_bin', 'unique_location_count'], inplace=True)
    
    # Create the DataFrame for our main analysis, containing only moving vehicles
    df_active = df[df['movement_status'] == "Has Movement"].copy()
    
    n_stationary = df.loc[df['movement_status'] != 'Has Movement', 'vin'].nunique()
    print(f"Flagged {n_stationary} vehicles as 'Stationary or GPS Failure'.")
    print(f"Created 'df_active' for analysis with {df_active['vin'].nunique()} moving vehicles.")


--- Cell 2: Loading Data and Identifying Stationary Vehicles ---
Successfully loaded '00_cleaned_data.csv' with 154116 records.
Flagged 145 vehicles as 'Stationary or GPS Failure'.
Created 'df_active' for analysis with 1090 moving vehicles.
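To make the loop-versus-transform claim concrete, the sketch below compares a hypothetical loop-based count (an illustration, not the notebook's original loop) with the vectorized groupby().transform('nunique') version on a toy frame. Both give one count per row, broadcast back to the original rows.

```python
import pandas as pd

# Toy fixes for three hypothetical VINs (not from the real dataset).
df = pd.DataFrame({
    "vin": ["A", "A", "A", "B", "B", "C", "C"],
    "fixed_lat": [12.90001, 12.90003, 12.95, 28.5, 28.5, 22.7, 22.70004],
    "fixed_long": [80.1, 80.1, 80.2, 77.4, 77.4, 75.8, 75.8],
})


def count_locations_loop(frame):
    """Hypothetical loop version: one Python iteration per VIN."""
    counts = {}
    for vin, group in frame.groupby("vin"):
        bins = set(zip(group["fixed_lat"].round(4), group["fixed_long"].round(4)))
        counts[vin] = len(bins)
    return frame["vin"].map(counts)


def count_locations_vectorized(frame):
    """Same binning as Cell 2, counted with a single groupby().transform call."""
    location_bin = (frame["fixed_lat"].round(4).astype(str)
                    + "_" + frame["fixed_long"].round(4).astype(str))
    return location_bin.groupby(frame["vin"]).transform("nunique")


loop_counts = count_locations_loop(df)
vec_counts = count_locations_vectorized(df)
print(df.assign(loop=loop_counts, vectorized=vec_counts))
```

Vehicle C shows the effect of the 4-decimal bin: its two fixes differ by about 4 m, so it is counted as a single location and would be flagged as stationary.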


### Cell 3: Residence Analysis on Active Vehicles
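Purpose: Run the core analysis. This cell keeps only nighttime records (from start_hour up to, but not including, end_hour), then clusters each active vehicle's nighttime GPS fixes with DBSCAN using the haversine metric. The largest cluster becomes the vehicle's residence, and the share of nighttime points inside it is reported as a confidence score. The hour filter uses the timestamps exactly as stored, so they must already be in local time. The per-vehicle logic lives in a helper function applied with groupby().apply().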

In [3]:
# ==============================================================================
# CELL 3: RESIDENCE ANALYSIS ON ACTIVE VEHICLES
# ==============================================================================
from sklearn.cluster import DBSCAN

if 'df_active' in locals() and not df_active.empty:
    print("\n--- Cell 3: Running Residence Analysis via DBSCAN ---")

    # --- 1. Filter for Nighttime Hours ---
    cfg = CONFIG["residence_params"]
    df_night = df_active[
        (df_active['timestamp'].dt.hour >= cfg["start_hour"]) & 
        (df_active['timestamp'].dt.hour < cfg["end_hour"])
    ].copy()
    print(f"Found {len(df_night)} nighttime records for active vehicles.")

    # --- 2. Define a function to process each vehicle group ---
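    def find_residence_for_group(group):
        # Too few nighttime points to form even one DBSCAN cluster.
        if len(group) < cfg["min_samples"]:
            return None 

        # scikit-learn's haversine metric expects [lat, lon] in radians.
        coords_rad = np.radians(group[['fixed_lat', 'fixed_long']].values)
        # Convert the neighbourhood radius from km to radians on the unit sphere.
        epsilon = cfg["eps_km"] / cfg["earth_radius_km"]
        db = DBSCAN(eps=epsilon, min_samples=cfg["min_samples"], algorithm='ball_tree', metric='haversine').fit(coords_rad)
        
        group = group.assign(cluster_label=db.labels_)
        
        # DBSCAN labels noise points as -1; keep only clustered points.
        main_cluster = group[group['cluster_label'] != -1]
        
        if main_cluster.empty:
            return None
        
        # The residence is the cluster containing the most nighttime points.
        largest_cluster_id = main_cluster['cluster_label'].value_counts().idxmax()
        residence_cluster = main_cluster[main_cluster['cluster_label'] == largest_cluster_id]
        
        # At a ~150 m scale, the mean of lat/lon is an adequate cluster centroid.
        residence_lat = residence_cluster['fixed_lat'].mean()
        residence_lon = residence_cluster['fixed_long'].mean()
        # Confidence: share of this vehicle's nighttime points inside the residence cluster.
        confidence = (len(residence_cluster) / len(group)) * 100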
        
        return pd.Series({
            'residence_lat': residence_lat,
            'residence_lon': residence_lon,
            'confidence_score_percent': round(confidence, 2)
        })

    # --- 3. Apply the function to each VIN group ---
    print(f"Analyzing {df_night['vin'].nunique()} vehicles with nighttime data...")
    
    # include_groups=False (pandas >= 2.2) keeps the 'vin' grouping column out of each
    # group passed to the helper, matching future pandas behavior and avoiding the
    # deprecation warning. dropna() removes rows for vehicles where the helper returned None.
    residences_df = df_night.groupby('vin').apply(find_residence_for_group, include_groups=False).dropna().reset_index()
    
    print(f"Successfully identified stable residences for {len(residences_df)} vehicles.")
    display(residences_df.head())


--- Cell 3: Running Residence Analysis via DBSCAN ---
Found 29691 nighttime records for active vehicles.
Analyzing 1060 vehicles with nighttime data...
Successfully identified stable residences for 982 vehicles.


|   | vin | residence_lat | residence_lon | confidence_score_percent |
|---|---|---|---|---|
| 0 | MD9EMCDL24F217385 | 27.237259 | 77.873366 | 65.52 |
| 1 | MD9EMCDL24G217002 | 12.887537 | 80.189521 | 50.0 |
| 2 | MD9EMCDL24G217006 | 28.532335 | 77.439679 | 50.0 |
| 3 | MD9EMCDL24G217011 | 22.703841 | 75.807188 | 34.29 |
| 4 | MD9EMCDL24G217012 | 22.715101 | 75.825687 | 72.22 |
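The nighttime filter in Cell 3 reads timestamp.dt.hour exactly as stored. The notebook does not say which timezone the timestamps use; if they were recorded in UTC, the 00:00-04:59 window would need converting first. For the sample coordinates, which fall in India, 00:00-04:59 IST corresponds to 18:30-23:29 UTC on the previous day. A minimal sketch, assuming naive UTC timestamps and the Asia/Kolkata zone (both assumptions, not facts about this dataset):

```python
import pandas as pd

START_HOUR, END_HOUR = 0, 5  # mirrors CONFIG["residence_params"]
LOCAL_TZ = "Asia/Kolkata"    # assumption: the fleet operates in IST

# Hypothetical naive timestamps, assumed to be in UTC.
ts = pd.Series(pd.to_datetime([
    "2024-01-01 20:00",  # 01:30 IST on Jan 2
    "2024-01-02 02:00",  # 07:30 IST on Jan 2
    "2024-01-01 23:29",  # 04:59 IST on Jan 2
]))

# Attach the UTC zone, convert to local time, then take the local hour.
local_hour = ts.dt.tz_localize("UTC").dt.tz_convert(LOCAL_TZ).dt.hour
is_night = (local_hour >= START_HOUR) & (local_hour < END_HOUR)

print(pd.DataFrame({"utc": ts, "local_hour": local_hour, "is_night": is_night}))
```

Filtering on the raw UTC hour would select the second row and reject the other two, the opposite of the local-time result. In Cell 3, the converted hour would replace df_active['timestamp'].dt.hour; if the column is already timezone-aware, skip tz_localize (it raises on aware data) and call tz_convert directly.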


### Cell 4: Final Summary and Output
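Purpose: Combine the pre-processing results (stationary vehicles) and the analysis results (active vehicles) into one summary DataFrame with a row per VIN. A single vectorized np.select call assigns the final status labels, and the result is saved to CSV. The default label, 'Not Enough Night Data', covers every active vehicle without a residence: those with no or too few nighttime records, and those whose nighttime points never formed a dense cluster.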

In [4]:
# ==============================================================================
# CELL 4: CREATE FINAL SUMMARY REPORT AND SAVE TO CSV
# ==============================================================================

if 'df' in locals() and not df.empty:
    print("\n--- Cell 4: Creating Final Summary Report ---")

    # --- 1. Start with the unique status of all vehicles ---
    status_df = df[['vin', 'movement_status']].drop_duplicates()

    # --- 2. Merge with residence results ---
    if 'residences_df' in locals() and not residences_df.empty:
        final_summary_df = pd.merge(status_df, residences_df, on='vin', how='left')
    else:  # No residences were found: add empty result columns to a new frame
        final_summary_df = status_df.assign(
            residence_lat=np.nan,
            residence_lon=np.nan,
            confidence_score_percent=np.nan,
        )

    # --- 3. Create a final, clean status column (Vectorized) ---
    conditions = [
        (final_summary_df['movement_status'] == 'Stationary or GPS Failure'),
        (final_summary_df['residence_lat'].notna())
    ]
    choices = ['Stationary Vehicle', 'Residence Found']
    final_summary_df['final_status'] = np.select(conditions, choices, default='Not Enough Night Data')

    # --- 4. Save to CSV ---
    final_summary_df.to_csv(CONFIG["output_csv_path"], index=False)
    print(f"Saved final summary for all {len(final_summary_df)} vehicles to '{CONFIG['output_csv_path']}'.")
    
    print("\n--- Final Summary Breakdown ---")
    print(final_summary_df['final_status'].value_counts())
    
    print("\n--- Preview of Final Report ---")
    display(final_summary_df.head())


--- Cell 4: Creating Final Summary Report ---
Saved final summary for all 1235 vehicles to '01_vehicle_residences_with_status.csv'.

--- Final Summary Breakdown ---
final_status
Residence Found          982
Stationary Vehicle       145
Not Enough Night Data    108
Name: count, dtype: int64

--- Preview of Final Report ---


|   | vin | movement_status | residence_lat | residence_lon | confidence_score_percent | final_status |
|---|---|---|---|---|---|---|
| 0 | MD9EMCDL24F217385 | Has Movement | 27.237259 | 77.873366 | 65.52 | Residence Found |
| 1 | MD9EMCDL24G217002 | Has Movement | 12.887537 | 80.189521 | 50.0 | Residence Found |
| 2 | MD9EMCDL24G217006 | Has Movement | 28.532335 | 77.439679 | 50.0 | Residence Found |
| 3 | MD9EMCDL24G217010 | Has Movement | NaN | NaN | NaN | Not Enough Night Data |
| 4 | MD9EMCDL24G217011 | Has Movement | 22.703841 | 75.807188 | 34.29 | Residence Found |
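np.select evaluates its conditions in order and uses the first one that is true, which is why a stationary vehicle can never be labelled 'Residence Found' in Cell 4. Because 'Not Enough Night Data' is the default, it also absorbs vehicles that had plenty of nighttime points but no dense cluster. The sketch below shows one way to split that default, assuming a hypothetical night_point_count column; the extra labels are illustrative, not the notebook's.

```python
import numpy as np
import pandas as pd

MIN_SAMPLES = 10  # mirrors CONFIG["residence_params"]["min_samples"]

# Toy summary rows; night_point_count is a hypothetical column.
summary = pd.DataFrame({
    "vin": ["V1", "V2", "V3", "V4", "V5"],
    "movement_status": ["Has Movement", "Stationary or GPS Failure",
                        "Has Movement", "Has Movement", "Has Movement"],
    "residence_lat": [27.24, np.nan, np.nan, np.nan, np.nan],
    "night_point_count": [29, 0, 0, 6, 40],
})

# Conditions are checked top to bottom; the first True one wins.
conditions = [
    summary["movement_status"] == "Stationary or GPS Failure",
    summary["residence_lat"].notna(),
    summary["night_point_count"] == 0,
    summary["night_point_count"] < MIN_SAMPLES,
]
choices = ["Stationary Vehicle", "Residence Found", "No Night Data", "Too Few Night Points"]
summary["final_status"] = np.select(conditions, choices, default="No Stable Cluster")

print(summary[["vin", "night_point_count", "final_status"]])
```

In the notebook, one way to build the count would be final_summary_df['vin'].map(df_night.groupby('vin').size()).fillna(0), which gives 0 to vehicles with no nighttime records at all.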


### Cell 5: Visualization on Interactive Map
Purpose: Present the final results on an interactive map. This cell keeps only the vehicles with an identified residence and plots them with folium, giving an immediate geographic overview of the fleet's home bases.

In [5]:
# ==============================================================================
# CELL 5: VISUALIZE RESIDENCE LOCATIONS ON AN INTERACTIVE MAP
# ==============================================================================
import folium

if 'final_summary_df' in locals():
    print("\n--- Cell 5: Visualizing Residence Locations on a Map ---")
    
    # Filter for only the vehicles where we actually found a residence
    map_data = final_summary_df[final_summary_df['final_status'] == 'Residence Found'].copy()
    
    if not map_data.empty:
        # Create the base map
        residence_map = folium.Map()

        # Add points to the map
        for idx, row in map_data.iterrows():
            folium.Marker(
                location=[row['residence_lat'], row['residence_lon']],
                tooltip=f"<b>VIN:</b> {row['vin']}<br><b>Confidence:</b> {row['confidence_score_percent']:.1f}%",
                icon=folium.Icon(color='blue', icon='home')
            ).add_to(residence_map)

        # Auto-fit the map view to the data
        sw = map_data[['residence_lat', 'residence_lon']].min().values.tolist()
        ne = map_data[['residence_lat', 'residence_lon']].max().values.tolist()
        residence_map.fit_bounds([sw, ne])

        # Save and display
        residence_map.save(CONFIG["output_map_path"])
        print(f"Map of {len(map_data)} residences saved to '{CONFIG['output_map_path']}'. Displaying map below.")
        display(residence_map)
    else:
        print("No residences were found to plot on the map.")


--- Cell 5: Visualizing Residence Locations on a Map ---
Map of 982 residences saved to '01_residences_map.html'. Displaying map below.
