# AADT Confidence Interval - Interstate 80

## FHWA Links
* Guidelines for Obtaining AADT Estimates from Non-Traditional Sources:
    * https://www.fhwa.dot.gov/policyinformation/travel_monitoring/pubs/aadtnt/Guidelines_for_AADT_Estimates_Final.pdf
  
  
## AADT Analysis Locations
* 10 locations were used in the analysis
* Locations were chosen based on where Traffic Operations cameras are installed and recording
    * For additional information, contact Zhenyu Zhu with Traffic Operations

## Traffic Census Data
* Back AADT, Peak Month, and Peak Hour usually represent traffic South or West of the count location.  
* Ahead AADT, Peak Month, and Peak Hour usually represent traffic North or East of the count location. Listing of routes with their designated  

* Because the Back and Ahead counts are both included at each location in the Traffic Census Data (e.g., "IRWINDALE, ARROW HIGHWAY"), only one [OBJECTID*] per location was pulled; the northbound nodes were used for this analysis.

## StreetLight Analysis Data
* StreetLight locations on Interstate 80 are one-directional, so each location contains two points: northbound and southbound




In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats

In [2]:
# Example AADT data from non-traditional sources and permanent counters
aadt_non_traditional = np.array([10500, 9800, 10200, 9500, 9900])
aadt_true = np.array([10000, 10000, 10000, 10000, 10000])

# Step 1: Compute TCEs
tce = 100 * (aadt_non_traditional - aadt_true) / aadt_true

# Step 2: Compute sample mean and standard error
mean_tce = np.mean(tce)
std_tce = np.std(tce, ddof=1)  # sample standard deviation
n = len(tce)
se = std_tce / np.sqrt(n)

# Step 3: Compute 95% confidence interval
confidence = 0.95
t_score = stats.t.ppf((1 + confidence) / 2, df=n - 1)
ci_lower = mean_tce - t_score * se
ci_upper = mean_tce + t_score * se

# Print results
print(f"Mean TCE: {mean_tce:.2f}%")
print(f"95% Confidence Interval for TCE: ({ci_lower:.2f}%, {ci_upper:.2f}%)")

Mean TCE: -0.20%
95% Confidence Interval for TCE: (-4.96%, 4.56%)


* Mean TCE: -0.20%
* 95% Confidence Interval for TCE: (-4.96%, 4.56%)

### What does this mean?
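* Traffic Count Error (TCE) quantifies the percentage difference between AADT estimates from non-traditional sources and true AADT values from permanent counters:

$$\text{TCE} = 100 \times \frac{\text{AADT}_{\text{non-traditional}} - \text{AADT}_{\text{true}}}{\text{AADT}_{\text{true}}}$$
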
In this case:
* Mean TCE of -0.20% indicates that, on average, the non-traditional AADT estimates are 0.20% lower than the true AADT values. This suggests a slight underestimation.
* The 95% confidence interval of (-4.96%, 4.56%) is the range in which we can be 95% confident the true mean TCE lies. In other words, if this sampling process were repeated many times, about 95% of the confidence intervals calculated this way would contain the true mean TCE.
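
The interval reported above can be reproduced by hand. The block below is a worked sketch of the standard two-sided t-interval applied to the five example TCE values (5, -2, 2, -5, -1); the symbols $\bar{x}$, $s$, and $n$ are notation introduced here for the sample mean, sample standard deviation, and sample size.

$$
\begin{aligned}
\text{CI} &= \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}} \\
\bar{x} &= \frac{5 - 2 + 2 - 5 - 1}{5} = -0.20 \\
s &= \sqrt{\frac{58.8}{4}} \approx 3.834, \qquad \frac{s}{\sqrt{5}} \approx 1.715 \\
t_{0.975,\,4} &\approx 2.776 \\
\text{CI} &\approx -0.20 \pm 2.776 \times 1.715 \approx (-4.96,\ 4.56)
\end{aligned}
$$

Here 58.8 is the sum of squared deviations of the five TCE values from their mean.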

### Interpretation
* The confidence interval includes zero, so the mean difference between the non-traditional and true AADT values is not statistically significant at the 5% level. In other words, this sample gives no evidence that the non-traditional estimates systematically overestimate or underestimate the true values.
* The confidence interval spans roughly ±5% around the mean, which indicates that the mean TCE is estimated with reasonable precision even from five observations, and that the non-traditional AADT estimates align closely with the true AADT values.
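
The link between "the interval includes zero" and "not statistically significant" can be checked directly. The sketch below runs a one-sample, two-sided t-test of the hypothesis that the mean TCE is zero, using the same illustrative values as the example cell; it is a cross-check under those example numbers, not part of the FHWA procedure itself.

```python
import numpy as np
import scipy.stats as stats

# Same illustrative values as the example cell above.
aadt_non_traditional = np.array([10500, 9800, 10200, 9500, 9900])
aadt_true = np.array([10000, 10000, 10000, 10000, 10000])
tce = 100 * (aadt_non_traditional - aadt_true) / aadt_true

# One-sample, two-sided t-test of H0: mean TCE = 0.
result = stats.ttest_1samp(tce, popmean=0)

# Two-sided critical value at the 95% confidence level.
t_crit = stats.t.ppf(0.975, df=len(tce) - 1)

print(f"t-statistic: {result.statistic:.3f}")
print(f"critical value: {t_crit:.3f}")
print(f"p-value: {result.pvalue:.3f}")
```

The absolute t-statistic falls below the critical value and the p-value is well above 0.05, which is the same conclusion as a 95% interval that contains zero.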

### Practical Implications
* **Accuracy**: The non-traditional AADT estimates are, on average, very close to the true values, with only a slight underestimation.
* **Precision**: The narrow confidence interval suggests that the estimates are consistently close to the true values across different samples.
* **Reliability**: Given the small mean error and narrow confidence interval, non-traditional AADT estimation methods can be considered reliable for practical applications.

In [3]:
def extract_aadt_comparison_data(df_stl, df_tc, mapping_dicts):
    """
    Extracts AADT counts from StreetLight and Traffic Census datasets based on a mapping of zonename and objectid.
    Odd objectid strings use 'ahead_aadt', even ones use 'back_aadt'.

    Parameters:
    - df_stl (pd.DataFrame): StreetLight data with a 'zonename' and 'averagedailysegmenttraffic(stlvolume)' column.
    - df_tc (pd.DataFrame): Traffic Census data with 'objectid', 'ahead_aadt', and 'back_aadt' columns.
    - mapping_dicts (list of dict): Contains mappings with keys 'objectid', 'zonename', and 'order_number'.

    Returns:
    - Tuple[np.array, np.array]: A tuple with two numpy arrays (StreetLight AADT values, Traffic Census AADT values).
    """
    aadt_non_traditional = []
    aadt_true = []

    for item in mapping_dicts:
        objectid = item["objectid"]
        zonename = item["zonename"]

        # Get the corresponding STL volume from df_stl
        stl_row = df_stl[df_stl["zonename"] == zonename]
        if not stl_row.empty:
            stl_volume = stl_row.iloc[0]["averagedailysegmenttraffic(stlvolume)"]
        else:
            continue  # Skip if no match found in StreetLight

        # Get the corresponding Traffic Census AADT based on even/odd objectid string
        tc_row = df_tc[df_tc["objectid"].astype(str) == objectid]
        if not tc_row.empty:
            if int(objectid) % 2 == 0:
                aadt_value = tc_row.iloc[0]["back_aadt"]
            else:
                aadt_value = tc_row.iloc[0]["ahead_aadt"]
        else:
            continue  # Skip if no match found in Traffic Census

        aadt_non_traditional.append(stl_volume)
        aadt_true.append(aadt_value)

    return np.array(aadt_non_traditional), np.array(aadt_true)
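
The pairing logic can also be written with joins instead of a loop. The sketch below is one possible vectorized equivalent, shown on toy data: the `*_demo` names and every value in the tables are made up for illustration, and only the column names are taken from the function above.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real tables; all values are hypothetical.
df_stl_demo = pd.DataFrame({
    "zonename": ["Zone A", "Zone B"],
    "averagedailysegmenttraffic(stlvolume)": [17000, 16000],
})
df_tc_demo = pd.DataFrame({
    "objectid": [101, 102],
    "back_aadt": [35000.0, 36000.0],
    "ahead_aadt": [34000.0, 33000.0],
})
mapping_demo = pd.DataFrame([
    {"objectid": "101", "order_number": 0, "zonename": "Zone A"},
    {"objectid": "102", "order_number": 1, "zonename": "Zone B"},
])

# The mapping stores objectid as a string, so cast the census key to match.
df_tc_demo = df_tc_demo.assign(objectid=df_tc_demo["objectid"].astype(str))

# Inner joins drop mapping rows with no match in either table, mirroring the
# `continue` branches in the loop; drop_duplicates keeps the first match,
# mirroring `.iloc[0]`.
paired = (
    mapping_demo
    .merge(df_stl_demo.drop_duplicates("zonename"), on="zonename", how="inner")
    .merge(df_tc_demo.drop_duplicates("objectid"), on="objectid", how="inner")
    .sort_values("order_number")
)

# Odd objectid -> ahead_aadt, even objectid -> back_aadt.
is_odd = paired["objectid"].astype(int) % 2 == 1
paired["aadt_true"] = np.where(is_odd, paired["ahead_aadt"], paired["back_aadt"])

aadt_non_traditional_demo = paired["averagedailysegmenttraffic(stlvolume)"].to_numpy()
aadt_true_demo = paired["aadt_true"].to_numpy()
print(paired[["order_number", "zonename", "objectid", "aadt_true"]])
```

Keeping the result in one DataFrame also keeps `order_number` attached to each pair, which makes it easier to trace a value back to its location.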

In [4]:
# Identify the GCS path to the data
gcs_path = "gs://calitp-analytics-data/data-analyses/big_data/compare_traffic_counts/5_confidence_interval_i80_2022/"

In [5]:
def getdata_and_cleanheaders(path):
    # Read the CSV file
    df = pd.read_csv(path)

    # Clean column headers: remove spaces, convert to lowercase, and strip trailing asterisks
    cleaned_columns = []
    for column in df.columns:
        cleaned_column = column.replace(" ", "").lower().rstrip("*")
        cleaned_columns.append(cleaned_column)

    df.columns = cleaned_columns
    return df
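
To see what the header cleaning does, the sketch below applies the same three steps to a small in-memory CSV. The raw header spellings and the single data row are assumptions chosen so that the cleaned names match the columns used elsewhere in this notebook; the real files may be spelled differently.

```python
import io

import pandas as pd

# Hypothetical raw headers and a made-up data row.
raw_csv = io.StringIO(
    "OBJECTID*,BACK_AADT,AHEAD_AADT,Average Daily Segment Traffic (StL Volume)\n"
    "1,100,200,50\n"
)
df_demo = pd.read_csv(raw_csv)

# Same steps as getdata_and_cleanheaders: drop spaces, lowercase,
# then strip trailing asterisks.
df_demo.columns = [c.replace(" ", "").lower().rstrip("*") for c in df_demo.columns]
print(list(df_demo.columns))
```

Note that only spaces are removed; underscores and parentheses are kept, which is why the StreetLight volume column ends up as `averagedailysegmenttraffic(stlvolume)`.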

In [6]:
# pull in the data & create dataframes
df_tc = getdata_and_cleanheaders(
    f"{gcs_path}caltrans_traffic_census_80_d3_2022.csv"
)  # Traffic Census
df_stl = getdata_and_cleanheaders(
    f"{gcs_path}streetlight_i80_aadt_2022.csv"
)  # StreetLight



In [7]:
df_stl.shape

(1200, 17)

In [8]:
df_tc.shape

(172, 16)

In [9]:
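# NOTE: the variable name refers to Interstate 605, but the entries below
# map Interstate 80 locations.
interstate_605_aadt_locations = [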
    {
        "objectid": "6365",  # this comes from Traffic Census [objectid]
        "order_number": 0,
        "zonename": "Alan S. Hart Freeway / 683187",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6366",  # this comes from Traffic Census [objectid]
        "order_number": 1,
        "zonename": "Alan S. Hart Freeway / 70888",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6349",  # This comes from Traffic Census [objectid]
        "order_number": 2,
        "zonename": "Alan S. Hart Freeway / 17768734",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6350",  # This comes from Traffic Census [objectid]
        "order_number": 3,
        "zonename": "Alan S. Hart Freeway / 13663182",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "13596",  # This comes from Traffic Census [objectid]
        "order_number": 4,
        "zonename": "Alan S. Hart Freeway / 70480",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "13779",  # This comes from Traffic Census [objectid]
        "order_number": 5,
        "zonename": "Alan S. Hart Freeway / 19625302",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6339",  # This comes from Traffic Census [objectid]
        "order_number": 6,
        "zonename": "Alan S. Hart Freeway / 17897855",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6340",  # This comes from Traffic Census [objectid]
        "order_number": 7,
        "zonename": "Alan S. Hart Freeway / 13954444",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6331",  # This comes from Traffic Census [objectid]
        "order_number": 8,
        "zonename": "Alan S. Hart Freeway / 70016",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6332",  # This comes from Traffic Census [objectid]
        "order_number": 9,
        "zonename": "Alan S. Hart Freeway / 69088",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6305",  # This comes from Traffic Census [objectid]
        "order_number": 10,
        "zonename": "Alan S. Hart Freeway / 712613",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6306",  # This comes from Traffic Census [objectid]
        "order_number": 11,
        "zonename": "Alan S. Hart Freeway / 21926012",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6291",  # This comes from Traffic Census [objectid]
        "order_number": 12,
        "zonename": "Alan S. Hart Freeway / 15369559",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6292",  # This comes from Traffic Census [objectid]
        "order_number": 13,
        "zonename": "Alan S. Hart Freeway / 20541977",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6271",  # This comes from Traffic Census [objectid]
        "order_number": 14,
        "zonename": "Alan S. Hart Freeway / 21646790",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6272",  # This comes from Traffic Census [objectid]
        "order_number": 15,
        "zonename": "Alan S. Hart Freeway / 22175291",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6259",  # This comes from Traffic Census [objectid]
        "order_number": 16,
        "zonename": "Alan S. Hart Freeway / 15552281",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6260",  # This comes from Traffic Census [objectid]
        "order_number": 17,
        "zonename": "Alan S. Hart Freeway / 14374733",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6249",  # This comes from Traffic Census [objectid]
        "order_number": 18,
        "zonename": "I 80 / 23724550",  # this comes from StreetLight [zonename]
    },
    {
        "objectid": "6250",  # This comes from Traffic Census [objectid]
        "order_number": 19,
        "zonename": "I 80 / 16281553",  # this comes from StreetLight [zonename]
    },
]

In [10]:
# run the "extract_aadt_comparison_data" function
aadt_non_traditional, aadt_true = extract_aadt_comparison_data(
    df_stl, df_tc, interstate_605_aadt_locations
)

In [12]:
def compute_tce_confidence_interval(aadt_non_traditional, aadt_true, confidence=0.95, locations=None):
    """
    Computes the Traffic Count Error (TCE) and returns the confidence interval 
    along with the t-test number (critical value), excluding entries with nan in aadt_true.

    Parameters:
    - aadt_non_traditional (np.array): AADT estimates from a non-traditional source.
    - aadt_true (np.array): True AADT values from permanent counters.
    - confidence (float): Confidence level (default is 0.95).
    - locations (list of dicts): Optional. Contains metadata including 'order_number'.

    Returns:
    - tuple: (mean_tce, ci_lower, ci_upper, t_test_number, excluded_order_numbers)
    """
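    aadt_non_traditional = np.asarray(aadt_non_traditional, dtype=float)
    aadt_true = np.asarray(aadt_true, dtype=float)

    # Identify valid (non-nan) indices
    valid_idx = ~np.isnan(aadt_true)

    # Get list of excluded order_numbers if locations provided
    excluded_order_numbers = []
    if locations is not None:
        # locations must align one-to-one with the arrays; entries skipped by
        # extract_aadt_comparison_data would otherwise shift the indices.
        if len(locations) != len(aadt_true):
            raise ValueError("locations and AADT arrays must have the same length")
        excluded_order_numbers = [
            loc["order_number"] for i, loc in enumerate(locations)
            if not valid_idx[i]
        ]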

    # Filter the arrays
    aadt_non_traditional = aadt_non_traditional[valid_idx]
    aadt_true = aadt_true[valid_idx]

    tce = 100 * (aadt_non_traditional - aadt_true) / aadt_true
    mean_tce = np.mean(tce)
    std_tce = np.std(tce, ddof=1)
    n = len(tce)
    se = std_tce / np.sqrt(n)
    df = n - 1  # degrees of freedom
    t_test_number = stats.t.ppf((1 + confidence) / 2, df)
    ci_lower = mean_tce - t_test_number * se
    ci_upper = mean_tce + t_test_number * se

    return mean_tce, ci_lower, ci_upper, t_test_number, excluded_order_numbers

In [14]:
mean_tce, ci_lower, ci_upper, t_val, excluded = compute_tce_confidence_interval(
    aadt_non_traditional, aadt_true, confidence=0.95, locations=interstate_605_aadt_locations)

In [16]:
print(f"Mean TCE: {mean_tce:.2f}%")
print(f"95% Confidence Interval: ({ci_lower:.2f}%, {ci_upper:.2f}%)")
print(f"t-test number (critical value): {t_val:.4f}")
print(f"Excluded order_numbers due to NaN: {excluded}")

Mean TCE: -47.37%
95% Confidence Interval: (-51.41%, -43.34%)
t-test number (critical value): 2.1009
Excluded order_numbers due to NaN: [0]


In [17]:
aadt_non_traditional

array([17189, 16055, 18419, 15860, 17199, 14906, 17344, 16274, 18823,
       16869, 23456, 21212, 30010, 25949, 43378, 44862, 90490, 86054,
       94506, 95652])

In [18]:
aadt_true

array([    nan,  35500.,  29000.,  26000.,  31500.,  29500.,  28500.,
        29000.,  27500.,  27500.,  38500.,  44000.,  57000.,  60000.,
        81000.,  90000., 219000., 203000., 212000., 230000.])
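
As a cross-check, the reported interval can be recomputed from the two arrays printed above using SciPy's built-in interval helper rather than the manual formula. This is a sketch for verification only; the short names `stl` and `tc` are introduced here for the StreetLight and Traffic Census arrays.

```python
import numpy as np
import scipy.stats as stats

# Values copied from the two array outputs above.
stl = np.array([17189, 16055, 18419, 15860, 17199, 14906, 17344, 16274, 18823,
                16869, 23456, 21212, 30010, 25949, 43378, 44862, 90490, 86054,
                94506, 95652], dtype=float)
tc = np.array([np.nan, 35500, 29000, 26000, 31500, 29500, 28500,
               29000, 27500, 27500, 38500, 44000, 57000, 60000,
               81000, 90000, 219000, 203000, 212000, 230000])

# Drop pairs whose Traffic Census value is missing.
valid = ~np.isnan(tc)
tce = 100 * (stl[valid] - tc[valid]) / tc[valid]

mean_tce = tce.mean()
se = stats.sem(tce)  # standard error with ddof=1, i.e. std / sqrt(n)

# Two-sided 95% t-interval with n - 1 degrees of freedom.
ci_lower, ci_upper = stats.t.interval(0.95, len(tce) - 1, loc=mean_tce, scale=se)

print(f"n = {len(tce)}")
print(f"Mean TCE: {mean_tce:.2f}%")
print(f"95% Confidence Interval: ({ci_lower:.2f}%, {ci_upper:.2f}%)")
```

With one location excluded for a missing census value, 19 pairs remain, which is why the critical value corresponds to 18 degrees of freedom.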

### What does this mean?
#### 1. Mean TCE: -47.37%
* TCE (Traffic Count Error) here is the percentage difference in AADT between the two datasets: non-traditional counts relative to traditional counts.
* A mean of -47.37% means that the non-traditional AADT source shows 47.37% fewer vehicles on average than the traditional source. In general, the non-traditional data reports much lower traffic counts.
#### 2. 95% Confidence Interval: (-51.41%, -43.34%)
* The 95% confidence interval is the range within which we are 95% confident the true mean TCE lies.
* In this case, we are 95% confident that the non-traditional AADT counts are, on average, between 43.34% and 51.41% lower than the traditional AADT counts.
* Because the whole interval is negative (below 0%), the non-traditional source consistently reports lower traffic volumes than the traditional source.
#### 3. t-test number (critical value): 2.1009
* The critical value of 2.1009 is the threshold from the t-distribution with 18 degrees of freedom (19 valid locations minus 1) for a two-sided test at the 95% confidence level.
* If the absolute value of the t-statistic (the mean TCE divided by its standard error) is larger than 2.1009, the mean difference between the traditional and non-traditional counts is unlikely to be due to random chance alone.
* Since the confidence interval does not include 0, the difference between the two data sources is statistically significant: the non-traditional counts are reliably lower than the traditional counts.
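
The t-statistic itself is not printed by the notebook, but it can be recovered from the reported numbers. The worked sketch below backs out the standard error (SE) from the width of the confidence interval and then divides the mean TCE by it; the results are approximate because the inputs are rounded to two decimals.

$$
\begin{aligned}
\text{SE} &= \frac{\text{CI}_{\text{upper}} - \text{CI}_{\text{lower}}}{2\,t_{\text{crit}}}
          = \frac{-43.34 - (-51.41)}{2 \times 2.1009} \approx 1.92 \\
t &= \frac{\overline{\text{TCE}}}{\text{SE}} \approx \frac{-47.37}{1.92} \approx -24.7 \\
|t| &\approx 24.7 > 2.1009
\end{aligned}
$$

The magnitude is far above the critical value, consistent with an interval that lies entirely below zero.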

#### How This Relates to the AADT Data Comparison:
* This analysis compares two methods for estimating AADT: traditional (Traffic Census) and non-traditional (StreetLight) sources.
* The mean TCE of -47.37% tells us that, on average, the non-traditional AADT counts show about 47.37% fewer vehicles than the traditional counts.
* The 95% confidence interval of (-51.41%, -43.34%) means that we are 95% confident the non-traditional counts are, on average, between 43.34% and 51.41% lower than the traditional counts.
* The confidence interval excludes zero, so this difference is statistically significant and the non-traditional counts are reliably reporting fewer vehicles than the traditional counts.

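This suggests that non-traditional AADT data can be expected to report consistently lower traffic volumes than traditional methods at these locations, and that the difference is unlikely to be due to random chance. Note, however, that the StreetLight locations are one-directional (see StreetLight Analysis Data above); if the Traffic Census AADT values are two-way totals, a mean TCE near -50% would be expected from that directional mismatch alone, so the comparison should be confirmed on a like-for-like basis.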