# AADT Confidence Interval - Interstate 605

## FHWA Links
* Guidelines for Obtaining AADT Estimates from Non-Traditional Sources:
    * https://www.fhwa.dot.gov/policyinformation/travel_monitoring/pubs/aadtnt/Guidelines_for_AADT_Estimates_Final.pdf
  
  
## AADT Analysis Locations
* 10 locations were used in the analysis
* Locations were chosen based on where Traffic Operations cameras are installed and recording
    * For additional information, contact Zhenyu Zhu with Traffic Operations

## Traffic Census Data
* https://dot.ca.gov/programs/traffic-operations/census/traffic-volumes
* Back AADT, Peak Month, and Peak Hour usually represent traffic South or West of the count location.  
* Ahead AADT, Peak Month, and Peak Hour usually represent traffic North or East of the count location. Listing of routes with their designated  

* Because the Traffic Census Data includes both the Back and Ahead counts at each location (e.g., "IRWINDALE, ARROW HIGHWAY"), only one [OBJECTID*] per location was pulled; the northbound nodes were used for this analysis.
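
The functions later in this notebook choose between the two census counts by `objectid` parity: even IDs take `back_aadt` and odd IDs take `ahead_aadt`. A minimal sketch of that rule follows; `pick_census_aadt` and the sample counts are hypothetical and only illustrate the selection logic.

```python
def pick_census_aadt(objectid, back_aadt, ahead_aadt):
    """Return the Back count for even objectids and the Ahead count for odd ones."""
    return back_aadt if int(objectid) % 2 == 0 else ahead_aadt


# Hypothetical counts, for illustration only
print(pick_census_aadt("13118", back_aadt=210000, ahead_aadt=205000))  # even -> Back
print(pick_census_aadt("13117", back_aadt=210000, ahead_aadt=205000))  # odd -> Ahead
```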

## StreetLight Analysis Data
* StreetLight locations on Interstate 605 are one-directional, so each location contains two points: northbound and southbound
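
The TCE function in this notebook sums the StreetLight points at each location before comparing them with the census value. A minimal sketch of that step, assuming a plain dict of zone name to volume; `combine_directions`, the zone names, and the volumes are hypothetical. Unlike the notebook's function, this sketch raises on a missing zone rather than counting it as zero.

```python
def combine_directions(zone_volumes, zone_names):
    """Sum the StreetLight volumes for every directional zone at one location."""
    missing = [name for name in zone_names if name not in zone_volumes]
    if missing:
        # A silent default of 0 would understate the location total
        raise KeyError(f"No StreetLight volume for: {missing}")
    return sum(zone_volumes[name] for name in zone_names)


# Hypothetical zone names and volumes, for illustration only
volumes = {"zone_nb": 98000, "zone_sb": 101500}
total = combine_directions(volumes, ["zone_nb", "zone_sb"])
print(total)
```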




In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats

In [3]:
def getdata_and_cleanheaders(path):
    # Read the CSV file
    df = pd.read_csv(path)

    # Clean column headers: remove spaces, convert to lowercase, and strip trailing asterisks
    cleaned_columns = []
    for column in df.columns:
        cleaned_column = column.replace(" ", "").lower().rstrip("*")
        cleaned_columns.append(cleaned_column)

    df.columns = cleaned_columns
    return df

In [2]:
def extract_aadt_comparison_data(df_stl, df_tc, mapping_dicts):
    """
    Extracts AADT counts from StreetLight and Traffic Census datasets 
    based on a mapping of zonename keys and objectid.

    Parameters:
    - df_stl (pd.DataFrame): StreetLight data with 'zonename' and 'averagedailysegmenttraffic(stlvolume)' columns.
    - df_tc (pd.DataFrame): Traffic Census data with 'objectid', 'ahead_aadt', and 'back_aadt' columns.
    - mapping_dicts (list of dict): Each item includes 'objectid' and potentially zonename_0 to zonename_3.

    Returns:
    - Tuple[np.array, np.array]: Two numpy arrays (StreetLight AADT values, Traffic Census AADT values).
    """
    aadt_non_traditional = []
    aadt_true = []

    for item in mapping_dicts:
        objectid = item["objectid"]
        
        # Determine the correct AADT direction from Traffic Census
        tc_row = df_tc[df_tc["objectid"].astype(str) == objectid]
        if tc_row.empty:
            continue
        
        if int(objectid) % 2 == 0:
            aadt_value = tc_row.iloc[0]["back_aadt"]
        else:
            aadt_value = tc_row.iloc[0]["ahead_aadt"]

        # Try to find a matching STL zonename
        matched = False
        for key in item:
            if key.startswith("zonename_"):
                zonename = item[key]
                stl_row = df_stl[df_stl["zonename"] == zonename]
                if not stl_row.empty:
                    stl_volume = stl_row.iloc[0]["averagedailysegmenttraffic(stlvolume)"]
                    aadt_non_traditional.append(stl_volume)
                    aadt_true.append(aadt_value)
                    matched = True
                    break  # Use the first matching zonename found

        if not matched:
            # Optionally print which objectid failed to match for STL
            print(f"No StreetLight match for objectid {objectid}")

    return np.array(aadt_non_traditional), np.array(aadt_true)

In [4]:
def create_non_trad_df_and_aadt_dict(df_tc, df_stl, interstate_605_aadt_locations):
    # Creating the true_aadt_dict
    true_aadt_dict = {
        str(row["objectid"]): row["back_aadt"] if int(row["objectid"]) % 2 == 0 else row["ahead_aadt"]
        for _, row in df_tc.iterrows()
    }
    
    # Creating non_trad_df
    non_trad_records = []

    for loc in interstate_605_aadt_locations:
        daytype = loc.get("daytype", "0: All Days (M-Su)")
        for key in loc:
            if key.startswith("zonename_"):
                zonename = loc[key]
                stl_match = df_stl[df_stl["zonename"] == zonename]
                if not stl_match.empty:
                    aadt_value = stl_match.iloc[0]["averagedailysegmenttraffic(stlvolume)"]
                    non_trad_records.append({
                        "zonename": zonename,
                        "aadt_value": aadt_value,
                        "daytype": daytype
                    })

    non_trad_df = pd.DataFrame(non_trad_records)

    return true_aadt_dict, non_trad_df

In [5]:
def compute_tce_confidence_interval_from_df(
    aadt_locations,
    true_aadt_dict,
    non_trad_df,
    confidence=0.95,
    daytype_filter="0: All Days (M-Su)"
):
    """
    Computes the Traffic Count Error (TCE), Confidence Interval, t-critical value, and t-test statistic.
    Only uses non-traditional AADT values with the specified 'daytype'.

    Parameters:
    - aadt_locations (list[dict]): List of location dictionaries.
    - true_aadt_dict (dict): Traditional AADT values keyed by objectid.
    - non_trad_df (DataFrame): DataFrame with non-traditional AADT data.
    - confidence (float): Confidence level for CI calculation.
    - daytype_filter (str): Daytype to filter on (default: "0: All Days (M-Su)")

    Returns:
    - tuple: (mean_tce, ci_lower, ci_upper, t_critical_value, t_test_statistic)
    """
    if true_aadt_dict is None:
        raise ValueError("true_aadt_dict is required but was None.")

    # Filter DataFrame to include only the specified daytype
    filtered_df = non_trad_df[non_trad_df["daytype"] == daytype_filter]

    # Build dict: zonename -> aadt_value
    non_traditional_aadt_dict = dict(zip(
        filtered_df["zonename"],
        filtered_df["aadt_value"]
    ))

    tce_list = []

    for loc in aadt_locations:
        objectid = loc.get("objectid")
        true_aadt = true_aadt_dict.get(objectid)

        # Skip if missing, NaN, or zero (NaN is truthy, so it needs an explicit check)
        if true_aadt is None or pd.isna(true_aadt) or true_aadt == 0:
            continue

        # Sum non-traditional AADT from all zonenames at this location
        non_trad_aadt = sum(
            non_traditional_aadt_dict.get(loc[key], 0)
            for key in loc if key.startswith("zonename_")
        )

        # Compute TCE
        tce = 100 * (non_trad_aadt - true_aadt) / true_aadt
        tce_list.append(tce)

    # If no valid TCEs were computed
    if not tce_list:
        return None, None, None, None, None

    # Convert to NumPy array and calculate statistics
    tce_array = np.array(tce_list)
    mean_tce = np.mean(tce_array)
    std_tce = np.std(tce_array, ddof=1)
    n = len(tce_array)
    se = std_tce / np.sqrt(n)
    
    # Compute the t-critical value
    df = n - 1
    t_critical = stats.t.ppf((1 + confidence) / 2, df)
    
    # Calculate the Confidence Interval
    ci_lower = mean_tce - t_critical * se
    ci_upper = mean_tce + t_critical * se
    
    # Compute the t-test statistic
    t_test_statistic = mean_tce / se if se != 0 else None

    return mean_tce, ci_lower, ci_upper, t_critical, t_test_statistic

In [6]:
interstate_605_aadt_locations = [
    {
        "objectid": "13117",  # this comes from Traffic Census [objectid]
        "location_description": "IRWINDALE, LOWER AZUSA ROAD/LOS ANGELES STREET",
        "order_number": 0, #
        "zonename_0": "San Gabriel River Freeway / 60501", # northbound
        "zonename_1": "San Gabriel River Freeway / 62770", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13118",  # this comes from Traffic Census [objectid]
        "location_description": "IRWINDALE, LOWER AZUSA ROAD/LOS ANGELES STREET",
        "order_number": 1,
        "zonename_2": "San Gabriel River Freeway / 17825964", # northbound
        "zonename_3": "San Gabriel River Freeway / 19247825", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13113",  # This comes from Traffic Census [objectid]
        "location_description": "BALDWIN PARK, JCT. RTE. 10",
        "order_number": 2,
        "zonename_0": "San Gabriel River Freeway / 19171162", # northbound
        "zonename_1": "San Gabriel River Freeway / 18695412", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13114",  # This comes from Traffic Census [objectid]
        "location_description": "BALDWIN PARK, JCT. RTE. 10",
        "order_number": 3,
        "zonename_2": "San Gabriel River Freeway / 23240119", # northbound
        "zonename_3": "San Gabriel River Freeway / 62623", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13111",  # This comes from Traffic Census [objectid]
        "location_description": "INDUSTRY, VALLEY BOULEVARD",
        "order_number": 4,
        "zonename_0": "San Gabriel River Freeway / 686634", # northbound
        "zonename_1": "San Gabriel River Freeway / 62623", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13112",  # This comes from Traffic Census [objectid]
        "location_description": "INDUSTRY, VALLEY BOULEVARD",
        "order_number": 5,
        "zonename_2": "San Gabriel River Freeway / 703991", # northbound
        "zonename_3": "San Gabriel River Freeway / 705638", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13109",  # This comes from Traffic Census [objectid]
        "location_description": "INDUSTRY, JCT. RTE. 60",
        "order_number": 6,
        "zonename_0": "San Gabriel River Freeway / 18665945", # northbound
        "zonename_1": "San Gabriel River Freeway / 705638", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13110",  # This comes from Traffic Census [objectid]
        "location_description": "INDUSTRY, JCT. RTE. 60",
        "order_number": 7,
        "zonename_2": "San Gabriel River Freeway / 13434006", # northbound
        "zonename_3": "San Gabriel River Freeway / 17555043", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13105",  # This comes from Traffic Census [objectid]
        "location_description": "PICO RIVERA, ROSE HILLS ROAD",
        "order_number": 8,
        "zonename_0": "San Gabriel River Freeway / 19894523", # northbound
        "zonename_1": "San Gabriel River Freeway / 20053622", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13106",  # This comes from Traffic Census [objectid]
        "location_description": "PICO RIVERA, ROSE HILLS ROAD",
        "order_number": 9,
        "zonename_2": "San Gabriel River Freeway / 703999", # northbound
        "zonename_3": "San Gabriel River Freeway / 673791", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13101",  # This comes from Traffic Census [objectid]
        "location_description": "WHITTIER, JCT. RTE. 72",
        "order_number": 10,
        "zonename_0": "San Gabriel River Freeway / 23806759", # northbound
        "zonename_1": "San Gabriel River Freeway / 19085288", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13102",  # This comes from Traffic Census [objectid]
        "location_description": "WHITTIER, JCT. RTE. 72",
        "order_number": 11,
        "zonename_2": "San Gabriel River Freeway / 689516", # northbound
        "zonename_3": "San Gabriel River Freeway / 1283676", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13089",  # This comes from Traffic Census [objectid]
        "location_description": "NORWALK, IMPERIAL HIGHWAY",
        "order_number": 12,
        "zonename_0": "San Gabriel River Freeway / 1292558", # northbound
        "zonename_1": "San Gabriel River Freeway / 43294", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13090",  # This comes from Traffic Census [objectid]
        "location_description": "NORWALK, IMPERIAL HIGHWAY",
        "order_number": 13,
        "zonename_2": "San Gabriel River Freeway / 36182", # northbound
        "zonename_3": "San Gabriel River Freeway / 699066", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13085",  # This comes from Traffic Census [objectid]
        "location_description": "NORWALK, ROSECRANS AVENUE",
        "order_number": 14,
        "zonename_0": "San Gabriel River Freeway / 18987883", # northbound
        "zonename_1": "San Gabriel River Freeway / 657856", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13086",  # This comes from Traffic Census [objectid]
        "location_description": "NORWALK, ROSECRANS AVENUE",
        "order_number": 15,
        "zonename_2": "San Gabriel River Freeway / 13303859", # northbound
        "zonename_3": "San Gabriel River Freeway / 23409847", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13084",  # This comes from Traffic Census [objectid]
        "location_description": "NORWALK, ALONDRA BOULEVARD",
        "order_number": 16,
        "zonename_0": "San Gabriel River Freeway / 13815791", # northbound
        "zonename_1": "San Gabriel River Freeway / 35978", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13083",  # This comes from Traffic Census [objectid]
        "location_description": "NORWALK, ALONDRA BOULEVARD",
        "order_number": 17,
        "zonename_2": "San Gabriel River Freeway / 15650683", # northbound
        "zonename_3": "San Gabriel River Freeway / 15851885", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13081",  # This comes from Traffic Census [objectid]
        "location_description": "CERRITOS, JCT. RTE. 91",
        "order_number": 18,
        "zonename_0": "San Gabriel River Freeway / 15650683", # northbound
        "zonename_1": "San Gabriel River Freeway / 17325939", # southbound
        "daytype": "0: All Days (M-Su)"
    },
    {
        "objectid": "13082",  # This comes from Traffic Census [objectid]
        "location_description": "CERRITOS, JCT. RTE. 91",
        "order_number": 19,
        "zonename_2": "San Gabriel River Freeway / 45330", # northbound
        "zonename_3": "San Gabriel River Freeway / 45717", # southbound
        "daytype": "0: All Days (M-Su)"
    },
]

In [7]:
# Identify the GCS path to the data
gcs_path = "gs://calitp-analytics-data/data-analyses/big_data/compare_traffic_counts/5_confidence_interval_i605_2022/"

In [8]:
# pull in the data & create dataframes
df_tc = getdata_and_cleanheaders(f"{gcs_path}caltrans_traffic_census_605_2022.csv")  # Traffic Census
df_stl = getdata_and_cleanheaders(f"{gcs_path}streetlight_605_all_vehicles_2022_v2_np.csv")  # StreetLight



In [9]:
# run the "extract_aadt_comparison_data" function
aadt_non_traditional, aadt_true = extract_aadt_comparison_data(df_stl, df_tc, interstate_605_aadt_locations)

In [10]:
true_aadt_dict, non_trad_df = create_non_trad_df_and_aadt_dict(df_tc, df_stl, interstate_605_aadt_locations)

In [11]:
mean_tce, ci_lower, ci_upper, t_critical, t_test_statistic = compute_tce_confidence_interval_from_df(
    aadt_locations=interstate_605_aadt_locations,
    true_aadt_dict=true_aadt_dict,
    non_trad_df=non_trad_df  # built by create_non_trad_df_and_aadt_dict above
)

In [12]:
# Print the results
print("Mean TCE:", mean_tce)
print("95% Confidence Interval:", (ci_lower, ci_upper))
print("t-test statistic:", t_test_statistic)

Mean TCE: -3.6213543095203575
95% Confidence Interval: (-10.78115021710635, 3.5384415980656345)
t-test statistic: -1.0586309689820026


### Mean TCE: -3.62
Traffic Count Error (TCE)
* A TCE of -3.62% means that, on average, the StreetLight AADT estimates are about 3.62% lower than the official Caltrans Traffic Census counts.
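
For reference, the notebook computes TCE at each location as the percent difference between the non-traditional and traditional AADT, then averages across locations. A small worked sketch with hypothetical volumes (`tce_percent` is an illustrative name, not a function from the notebook):

```python
def tce_percent(non_trad_aadt, true_aadt):
    """Percent error of a non-traditional AADT estimate relative to the census AADT."""
    if true_aadt == 0:
        raise ValueError("true_aadt must be non-zero")
    return 100 * (non_trad_aadt - true_aadt) / true_aadt


# Hypothetical volumes: an estimate of 96,400 against a census count of 100,000
print(tce_percent(96_400, 100_000))  # negative: the estimate is below the census count
```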

### 95% Confidence Interval (-10.78%, 3.54%)
* Based on the sample of locations, the results suggest 95% confidence that the true average TCE (i.e., the average percent difference between StreetLight and Census across the entire population) falls somewhere between -10.78% and +3.54%.
    * Since this interval includes zero, it's possible that the true average error is zero, meaning StreetLight might not be significantly over- or underestimating, on average.
    * But the range is quite wide (~14 percentage points), which indicates some variability in the data or a small sample size.
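
The interval is the sample mean plus or minus the t critical value times the standard error, as computed in `compute_tce_confidence_interval_from_df`. A minimal standalone sketch of that calculation, assuming a hypothetical list of per-location TCE values (not the notebook's data); `mean_confidence_interval` is an illustrative name:

```python
import numpy as np
import scipy.stats as stats


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) for a two-sided t-based confidence interval."""
    arr = np.asarray(values, dtype=float)
    n = arr.size
    if n < 2:
        raise ValueError("at least two values are needed to estimate the standard error")
    mean = arr.mean()
    se = arr.std(ddof=1) / np.sqrt(n)  # sample standard deviation / sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean, mean - t_crit * se, mean + t_crit * se


# Hypothetical TCE values in percent, for illustration only
sample_tce = [-12.0, -5.0, 1.5, 4.0, -8.0, 2.5]
mean, lower, upper = mean_confidence_interval(sample_tce)
print(f"mean={mean:.2f}, CI=({lower:.2f}, {upper:.2f})")
```

Adding locations shrinks the standard error and lowers the critical value, so the interval narrows on both counts.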

### T-Test Statistic  
* **-1.059**: The observed sample mean is about **1.059 standard errors** below zero, the mean TCE expected if StreetLight were unbiased. Because its absolute value is smaller than the two-tailed critical value (2.093 at 95% confidence with 19 degrees of freedom), the result is **not statistically significant**.
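
The 2.093 threshold is the two-tailed 95% critical value of the t distribution, which the notebook computes as `t_critical` but does not print. As a rough consistency check, it can be recovered from the printed outputs alone: the standard error is the mean divided by the t statistic, and the critical value is the interval half-width divided by the standard error. The variable names below are illustrative.

```python
# Values copied from the printed notebook output
mean_tce = -3.6213543095203575
ci_lower, ci_upper = -10.78115021710635, 3.5384415980656345
t_stat = -1.0586309689820026

se = mean_tce / t_stat                    # standard error of the mean TCE
half_width = (ci_upper - ci_lower) / 2    # equals t_critical * se
t_critical = half_width / se

print(f"se={se:.3f}, t_critical={t_critical:.3f}")
print("significant at 95%:", abs(t_stat) > t_critical)
```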

### Summary
* On average, StreetLight data is underestimating AADT by about 3.6% on this subset of locations.
* But with 95% confidence, the actual average error could be as much as 10.8% under or 3.5% over the true value.
* Because zero is in that range, the data do not show that StreetLight is underestimating; the difference is not statistically significant at the 95% level.

