# AADT Confidence Interval

## FHWA Links
* Guidelines for Obtaining AADT Estimates from Non-Traditional Sources:
    * https://www.fhwa.dot.gov/policyinformation/travel_monitoring/pubs/aadtnt/Guidelines_for_AADT_Estimates_Final.pdf
  
  
## AADT Analysis Locations
* 10 locations were used in the analysis
* Locations were selected based on where Traffic Operations cameras are installed and recording
    * For additional information, contact Zhenyu Zhu with Traffic Operations

## Traffic Census Data
* Back AADT, Peak Month, and Peak Hour usually represent traffic south or west of the count location.
* Ahead AADT, Peak Month, and Peak Hour usually represent traffic north or east of the count location. Listing of routes with their designated

* Because the Traffic Census Data includes both Back and Ahead counts at each location (e.g., "IRWINDALE, ARROW HIGHWAY"), only one [OBJECTID*] per location was pulled; the northbound nodes were used for this analysis.

## StreetLight Analysis Data
* StreetLight locations on Interstate 605 are one-directional, so each location contains two points: northbound and southbound.




```python
import numpy as np
import pandas as pd
import scipy.stats as stats
```

```python
# Example AADT data from non-traditional sources and permanent counters
aadt_non_traditional = np.array([10500, 9800, 10200, 9500, 9900])
aadt_true = np.array([10000, 10000, 10000, 10000, 10000])

# Step 1: Compute TCEs
tce = 100 * (aadt_non_traditional - aadt_true) / aadt_true

# Step 2: Compute sample mean and standard error
mean_tce = np.mean(tce)
std_tce = np.std(tce, ddof=1)  # sample standard deviation
n = len(tce)
se = std_tce / np.sqrt(n)

# Step 3: Compute 95% confidence interval
confidence = 0.95
t_score = stats.t.ppf((1 + confidence) / 2, df=n - 1)
ci_lower = mean_tce - t_score * se
ci_upper = mean_tce + t_score * se

# Print results
print(f"Mean TCE: {mean_tce:.2f}%")
print(f"95% Confidence Interval for TCE: ({ci_lower:.2f}%, {ci_upper:.2f}%)")
```

```
Mean TCE: -0.20%
95% Confidence Interval for TCE: (-4.96%, 4.56%)
```



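### What does this mean?
Traffic Count Error (TCE) quantifies the percentage difference between AADT estimates from non-traditional sources and true AADT values from permanent counters:

$$\text{TCE} = 100 \times \frac{\text{AADT}_{\text{non-traditional}} - \text{AADT}_{\text{true}}}{\text{AADT}_{\text{true}}}$$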
 
In this case:

* A mean TCE of -0.20% indicates that, on average, the non-traditional AADT estimates are 0.20% lower than the true AADT values, a slight underestimation.
* A 95% confidence interval of (-4.96%, 4.56%) means that we can be 95% confident that the true mean TCE lies within this range. More precisely, if this sampling process were repeated many times, about 95% of the intervals computed this way would contain the true mean TCE.
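
The repeated-sampling reading of a confidence interval can be checked with a small simulation. This is a sketch under stated assumptions: TCEs are drawn as independent normal values with a hypothetical true mean and standard deviation (loosely based on the example above), a t-interval is built for each simulated sample of five, and the fraction of intervals containing the true mean is counted. The helper name `t_interval_coverage` is hypothetical.

```python
import numpy as np
import scipy.stats as stats

def t_interval_coverage(true_mean, true_sd, n, confidence=0.95, reps=4000, seed=0):
    """Fraction of simulated t-intervals that contain true_mean.

    Assumes TCEs are i.i.d. normal(true_mean, true_sd); the parameters are
    hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    t_score = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    samples = rng.normal(true_mean, true_sd, size=(reps, n))  # one row per repeated study
    means = samples.mean(axis=1)
    ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
    covered = (means - t_score * ses <= true_mean) & (true_mean <= means + t_score * ses)
    return covered.mean()

coverage = t_interval_coverage(true_mean=-0.2, true_sd=3.8, n=5)
print(f"Simulated coverage: {coverage:.3f}")
```

With normally distributed TCEs the simulated coverage should land close to 0.95; with strongly skewed or outlier-prone errors it can drift away from that value.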

### Interpretation
The confidence interval includes zero, so the average difference between the non-traditional and true AADT values is not statistically significant at the 5% level. In other words, these data show no evidence that the non-traditional estimates systematically overestimate or underestimate the true values.
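
"The interval contains zero" and "not statistically significant" are two views of the same calculation: at the 5% level, a two-sided one-sample t-test of mean TCE = 0 rejects exactly when the 95% t-interval excludes zero. A sketch using SciPy's `ttest_1samp` and `t.interval` on the example TCEs:

```python
import numpy as np
import scipy.stats as stats

# TCEs from the illustrative example: 100 * (estimate - true) / true
aadt_non_traditional = np.array([10500, 9800, 10200, 9500, 9900])
aadt_true = np.array([10000, 10000, 10000, 10000, 10000])
tce = 100 * (aadt_non_traditional - aadt_true) / aadt_true

n = len(tce)
se = stats.sem(tce)  # sample std (ddof=1) / sqrt(n)
ci_lower, ci_upper = stats.t.interval(0.95, n - 1, loc=tce.mean(), scale=se)

# Two-sided one-sample t-test of H0: mean TCE = 0
result = stats.ttest_1samp(tce, popmean=0.0)

print(f"95% CI: ({ci_lower:.2f}%, {ci_upper:.2f}%)")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
# The CI contains 0 exactly when p >= 0.05, so both say "no significant bias"
print("CI contains 0:", ci_lower <= 0 <= ci_upper, "| p >= 0.05:", result.pvalue >= 0.05)
```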

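The confidence interval is relatively narrow (the mean TCE ±4.76 percentage points), so the average error of the non-traditional estimates is pinned down fairly precisely. This interval describes the mean TCE, not individual locations: in this example, individual TCEs range from -5% to +5%.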
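
Because the confidence interval describes the mean TCE, it shrinks as more locations are added while the scatter of individual estimates does not. One common way to describe where a single new location's TCE might fall is a prediction interval, mean ± t·s·√(1 + 1/n). This is a swapped-in technique, not part of the procedure above, and the sketch assumes roughly normal TCEs:

```python
import numpy as np
import scipy.stats as stats

# TCEs from the illustrative example (percent)
tce = np.array([5.0, -2.0, 2.0, -5.0, -1.0])
n = len(tce)
mean, sd = tce.mean(), tce.std(ddof=1)
t_score = stats.t.ppf(0.975, df=n - 1)

ci_half_width = t_score * sd / np.sqrt(n)          # uncertainty in the mean TCE
pi_half_width = t_score * sd * np.sqrt(1 + 1 / n)  # spread expected for one new location

print(f"95% CI for mean TCE:     {mean:.2f}% +/- {ci_half_width:.2f}")
print(f"95% prediction interval: {mean:.2f}% +/- {pi_half_width:.2f}")
```

The prediction interval is more than twice as wide as the confidence interval here, which is why a narrow confidence interval alone says little about how close any single estimate will be.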

### Practical Implications
* Accuracy: On average, the non-traditional AADT estimates are very close to the true values, with only a slight underestimation.
* Precision: The narrow confidence interval suggests that the average error is estimated consistently, although individual estimates can still differ from the true values by several percent.
* Reliability: Given the small mean error and narrow confidence interval, the non-traditional AADT estimation method can be considered reliable for practical applications, at least for data resembling this illustrative example.

```python
def extract_aadt_comparison_data(df_stl, df_tc, mapping_dicts):
    """
    Extracts AADT counts from the StreetLight and Traffic Census datasets based on a mapping of zonename to objectid.
    Odd objectids use 'ahead_aadt'; even objectids use 'back_aadt'.

    Parameters:
    - df_stl (pd.DataFrame): StreetLight data with 'zonename' and 'averagedailysegmenttraffic(stlvolume)' columns.
    - df_tc (pd.DataFrame): Traffic Census data with 'objectid', 'ahead_aadt', and 'back_aadt' columns.
    - mapping_dicts (list of dict): Mappings with keys 'objectid', 'zonename', and 'order_number'.

    Returns:
    - Tuple[np.ndarray, np.ndarray]: (StreetLight AADT values, Traffic Census AADT values).
      Locations with no match in either dataset are skipped.
    """
    aadt_non_traditional = []
    aadt_true = []

    for item in mapping_dicts:
        objectid = item['objectid']
        zonename = item['zonename']

        # Get the corresponding STL volume from df_stl
        stl_row = df_stl[df_stl['zonename'] == zonename]
        if stl_row.empty:
            continue  # Skip if no match found in StreetLight
        stl_volume = stl_row.iloc[0]['averagedailysegmenttraffic(stlvolume)']

        # Get the corresponding Traffic Census AADT: even objectid -> back_aadt, odd -> ahead_aadt
        tc_row = df_tc[df_tc['objectid'].astype(str) == objectid]
        if tc_row.empty:
            continue  # Skip if no match found in Traffic Census
        if int(objectid) % 2 == 0:
            aadt_value = tc_row.iloc[0]['back_aadt']
        else:
            aadt_value = tc_row.iloc[0]['ahead_aadt']

        aadt_non_traditional.append(stl_volume)
        aadt_true.append(aadt_value)

    return np.array(aadt_non_traditional), np.array(aadt_true)
```
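
The same matching can also be written with pandas joins instead of a row-by-row loop. The function below, `extract_aadt_comparison_data_merge`, is a hypothetical variant (not part of the original analysis) that is intended to follow the same rules: the first matching row is used, odd objectids take `ahead_aadt`, even objectids take `back_aadt`, and unmatched locations are dropped. It is exercised on small toy DataFrames rather than the real data.

```python
import numpy as np
import pandas as pd

def extract_aadt_comparison_data_merge(df_stl, df_tc, mapping_dicts):
    """Hypothetical merge-based variant of extract_aadt_comparison_data."""
    mapping = pd.DataFrame(mapping_dicts)
    # Keep the first row per key, mirroring .iloc[0] in the loop version
    stl = df_stl.drop_duplicates("zonename")[["zonename", "averagedailysegmenttraffic(stlvolume)"]]
    tc = df_tc.assign(objectid=df_tc["objectid"].astype(str)).drop_duplicates("objectid")
    # Inner joins keep only mapped locations found in both datasets, in mapping order
    merged = mapping.merge(stl, on="zonename", how="inner").merge(tc, on="objectid", how="inner")
    is_even = merged["objectid"].astype(int) % 2 == 0
    true_aadt = np.where(is_even, merged["back_aadt"], merged["ahead_aadt"])
    return merged["averagedailysegmenttraffic(stlvolume)"].to_numpy(), true_aadt

# Toy inputs for illustration only
df_stl_demo = pd.DataFrame({"zonename": ["A", "B", "C"],
                            "averagedailysegmenttraffic(stlvolume)": [100, 200, 300]})
df_tc_demo = pd.DataFrame({"objectid": [11, 12, 99],
                           "ahead_aadt": [1000, 2000, 9000],
                           "back_aadt": [1100, 2100, 9100]})
mapping_demo = [{"objectid": "11", "order_number": 0, "zonename": "A"},
                {"objectid": "12", "order_number": 1, "zonename": "B"},
                {"objectid": "13", "order_number": 2, "zonename": "C"}]  # 13 has no Traffic Census row

stl_vol, true_aadt = extract_aadt_comparison_data_merge(df_stl_demo, df_tc_demo, mapping_demo)
print(stl_vol, true_aadt)
```

Objectid 11 (odd) picks its `ahead_aadt`, 12 (even) picks its `back_aadt`, and zone C is dropped because objectid 13 is missing from the toy Traffic Census table.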

```python
# Identify the GCS path to the data
gcs_path = "gs://calitp-analytics-data/data-analyses/big_data/compare_traffic_counts/5_confidence_interval_i605_2022/"
```

```python
def getdata_and_cleanheaders(path):
    """Read a CSV file and normalize its column headers."""
    df = pd.read_csv(path)

    # Clean column headers: remove spaces, convert to lowercase, and strip trailing asterisks
    df.columns = [column.replace(" ", "").lower().rstrip("*") for column in df.columns]
    return df
```
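
To see what the header cleaning does without reading from GCS, the same rule can be applied to an in-memory CSV. The header names below are hypothetical stand-ins, chosen so that they clean to the column names the later code expects (`objectid`, `back_aadt`, `ahead_aadt`, `averagedailysegmenttraffic(stlvolume)`); the real files' headers may differ.

```python
import io
import pandas as pd

def clean_headers(df):
    """Same header rule as getdata_and_cleanheaders: drop spaces, lowercase, strip trailing '*'."""
    df = df.copy()
    df.columns = [col.replace(" ", "").lower().rstrip("*") for col in df.columns]
    return df

# Hypothetical header row with dummy values
raw_csv = io.StringIO(
    "OBJECTID*,BACK_AADT,AHEAD_AADT,Average Daily Segment Traffic (StL Volume)\n"
    "1,2,3,4\n"
)
df = clean_headers(pd.read_csv(raw_csv))
print(list(df.columns))
```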

```python
# Pull in the data and create DataFrames
df_tc = getdata_and_cleanheaders(f"{gcs_path}caltrans_traffic_census_605_2022.csv")  # Traffic Census
df_stl = getdata_and_cleanheaders(f"{gcs_path}streetlight_605_all_vehicles_2022_v2_np.csv")  # StreetLight
```
```python
df_stl.shape
```

```
(411, 36)
```

```python
df_tc.shape
```

```
(56, 16)
```

```python
# Each entry pairs a Traffic Census [objectid] with a StreetLight [zonename]
interstate_605_aadt_locations = [
    {"objectid": "13117", "order_number": 0, "zonename": "San Gabriel River Freeway/22736179"},
    {"objectid": "13118", "order_number": 1, "zonename": "San Gabriel River Freeway / 21923758"},
    {"objectid": "13113", "order_number": 2, "zonename": "San Gabriel River Freeway / 13815791"},
    {"objectid": "13114", "order_number": 3, "zonename": "San Gabriel River Freeway / 690054"},
    {"objectid": "13111", "order_number": 4, "zonename": "San Gabriel River Freeway / 671965"},
    {"objectid": "13112", "order_number": 5, "zonename": "San Gabriel River Freeway / 1283924"},
    {"objectid": "13109", "order_number": 6, "zonename": "San Gabriel River Freeway / 1292558"},
    {"objectid": "13110", "order_number": 7, "zonename": "San Gabriel River Freeway / 43294"},
    {"objectid": "13105", "order_number": 8, "zonename": "San Gabriel River Freeway / 689515"},
    {"objectid": "13106", "order_number": 9, "zonename": "San Gabriel River Freeway / 41598"},
    {"objectid": "13103", "order_number": 10, "zonename": "San Gabriel River Freeway / 16496201"},
    {"objectid": "13104", "order_number": 11, "zonename": "San Gabriel River Freeway / 19811124"},
    {"objectid": "12089", "order_number": 12, "zonename": "San Gabriel River Freeway / 703991"},
    {"objectid": "12090", "order_number": 13, "zonename": "San Gabriel River Freeway / 705638"},
    {"objectid": "13085", "order_number": 14, "zonename": "San Gabriel River Freeway / 62521"},
    {"objectid": "13086", "order_number": 15, "zonename": "San Gabriel River Freeway / 682535"},
    {"objectid": "13084", "order_number": 16, "zonename": "San Gabriel River Freeway / 19171162"},
    {"objectid": "13083", "order_number": 17, "zonename": "San Gabriel River Freeway / 18695412"},
    {"objectid": "13081", "order_number": 18, "zonename": "San Gabriel River Freeway / 17134947"},
    {"objectid": "13082", "order_number": 19, "zonename": "San Gabriel River Freeway / 19247826"},
]
```
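
Because `extract_aadt_comparison_data` silently skips any mapping entry it cannot match, it can be worth listing those entries explicitly before computing the confidence interval. The helper `find_unmatched_locations` below is hypothetical and uses the same exact string comparisons as the extraction function. The first zonename in the list, "San Gabriel River Freeway/22736179", has no spaces around the slash, unlike the others; a check like this would surface it if the StreetLight file spells it differently. The demo DataFrames are toy data.

```python
import pandas as pd

def find_unmatched_locations(df_stl, df_tc, mapping_dicts):
    """Return (zonenames missing from StreetLight, objectids missing from Traffic Census)."""
    stl_zones = set(df_stl["zonename"])
    tc_ids = set(df_tc["objectid"].astype(str))  # same str comparison as the extraction function
    missing_zone = [m["zonename"] for m in mapping_dicts if m["zonename"] not in stl_zones]
    missing_id = [m["objectid"] for m in mapping_dicts if m["objectid"] not in tc_ids]
    return missing_zone, missing_id

# Toy data for illustration only
df_stl_demo = pd.DataFrame({"zonename": ["San Gabriel River Freeway / 21923758"]})
df_tc_demo = pd.DataFrame({"objectid": [13118]})
mapping_demo = [
    {"objectid": "13117", "order_number": 0, "zonename": "San Gabriel River Freeway/22736179"},
    {"objectid": "13118", "order_number": 1, "zonename": "San Gabriel River Freeway / 21923758"},
]
missing_zone, missing_id = find_unmatched_locations(df_stl_demo, df_tc_demo, mapping_demo)
print("Zonenames not in StreetLight:", missing_zone)
print("Objectids not in Traffic Census:", missing_id)
```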


```python
# Run the extract_aadt_comparison_data function
aadt_non_traditional, aadt_true = extract_aadt_comparison_data(df_stl, df_tc, interstate_605_aadt_locations)
```

```python
def compute_tce_confidence_interval(aadt_non_traditional, aadt_true, confidence=0.95):
    """
    Computes the Traffic Count Error (TCE) and returns the confidence interval.

    Parameters:
    - aadt_non_traditional (np.array): AADT estimates from a non-traditional source.
    - aadt_true (np.array): True AADT values from permanent counters.
    - confidence (float): Confidence level (default is 0.95).

    Returns:
    - tuple: (mean_tce, ci_lower, ci_upper)
    """
    tce = 100 * (aadt_non_traditional - aadt_true) / aadt_true
    mean_tce = np.mean(tce)
    std_tce = np.std(tce, ddof=1)
    n = len(tce)
    se = std_tce / np.sqrt(n)
    t_score = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    ci_lower = mean_tce - t_score * se
    ci_upper = mean_tce + t_score * se

    return mean_tce, ci_lower, ci_upper
```
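
As a quick consistency check, the function should reproduce the hand-worked example from the start of the notebook. The computation is repeated here in condensed form so the snippet runs on its own.

```python
import numpy as np
import scipy.stats as stats

def compute_tce_confidence_interval(aadt_non_traditional, aadt_true, confidence=0.95):
    """Same computation as above: mean TCE and its t-based confidence interval."""
    tce = 100 * (aadt_non_traditional - aadt_true) / aadt_true
    n = len(tce)
    se = np.std(tce, ddof=1) / np.sqrt(n)
    t_score = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    mean_tce = np.mean(tce)
    return mean_tce, mean_tce - t_score * se, mean_tce + t_score * se

example_est = np.array([10500, 9800, 10200, 9500, 9900])
example_true = np.array([10000, 10000, 10000, 10000, 10000])
mean, lower, upper = compute_tce_confidence_interval(example_est, example_true)
print(f"Mean TCE: {mean:.2f}%, 95% CI: ({lower:.2f}%, {upper:.2f}%)")
# Matches the earlier cell: Mean TCE: -0.20%, 95% CI: (-4.96%, 4.56%)
```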

```python
mean, lower, upper = compute_tce_confidence_interval(aadt_non_traditional, aadt_true)
print(f"Mean TCE: {mean:.2f}%")
print(f"95% Confidence Interval: ({lower:.2f}%, {upper:.2f}%)")
```

```
Mean TCE: -54.48%
95% Confidence Interval: (-59.26%, -49.71%)
```


### Understanding the Metrics
* Traffic Count Error (TCE) quantifies the percentage difference between AADT estimates from non-traditional sources and true AADT values from permanent counters.
* Mean TCE of -54.48% indicates that, on average, the non-traditional AADT estimates are 54.48% lower than the true AADT values. This suggests a substantial underestimation.
* A 95% confidence interval of (-59.26%, -49.71%) means that we can be 95% confident that the true mean TCE lies within this range. More precisely, if this sampling process were repeated many times, about 95% of the intervals computed this way would contain the true mean TCE.

### Interpretation
* The confidence interval does not include zero, so the average difference between the non-traditional and true AADT values is statistically significant at the 5% level: the non-traditional estimates systematically underestimate the true values.
* The confidence interval is relatively narrow (about ±4.8 percentage points around the mean), so the size of the average underestimation is estimated fairly precisely.

### Practical Implications
* Accuracy: The non-traditional AADT estimates are significantly lower than the true values, with an average underestimation of over 50%.
* Precision: The narrow confidence interval suggests that the average underestimation is estimated precisely; on its own, it does not show that every location is underestimated by a similar amount.
* Reliability: Given the significant and consistent underestimation, non-traditional AADT estimation methods may not be reliable without adjustments or calibrations.
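
The t-interval assumes the mean TCE is approximately normally distributed, which can be shaky with a small number of locations if a few have extreme errors. A percentile bootstrap is one alternative check. This is a swapped-in technique, not the method used above, and the TCE values below are made-up placeholders rather than the I-605 results; `bootstrap_mean_ci` is a hypothetical helper.

```python
import numpy as np

def bootstrap_mean_ci(values, confidence=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement; each row of idx is one bootstrap sample
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = 1 - confidence
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper

# Placeholder TCEs (percent); substitute the TCEs computed from the real data
tce_placeholder = np.array([-60.0, -48.0, -55.0, -52.0, -70.0, -45.0, -58.0, -50.0])
mean, lower, upper = bootstrap_mean_ci(tce_placeholder)
print(f"Mean TCE: {mean:.2f}%, bootstrap 95% CI: ({lower:.2f}%, {upper:.2f}%)")
```

If the bootstrap interval and the t-interval roughly agree, the normality assumption is unlikely to be driving the conclusion.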
### Recommendations
* Investigate Potential Causes: Examine the methodology of the non-traditional data sources to identify factors contributing to the underestimation.
* Calibration: Consider applying correction factors or calibration techniques to adjust the non-traditional estimates closer to the true values.
* Supplement with Additional Data: Use supplementary data sources or methods to validate and enhance the accuracy of the non-traditional AADT estimates.
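
One simple form of calibration is a single multiplicative factor. The sketch below assumes a factor k = 1 / mean(estimate / true), which by construction makes the in-sample mean TCE zero; it is a hypothetical example rather than an FHWA-prescribed method. Because an in-sample fit always looks unbiased, the factor is estimated on some locations and checked on held-out ones. All values are placeholders, and `fit_ratio_correction` and `mean_tce` are hypothetical helpers.

```python
import numpy as np

def fit_ratio_correction(aadt_est, aadt_true):
    """Multiplicative factor that makes the mean of corrected/true equal 1."""
    ratio = np.asarray(aadt_est, dtype=float) / np.asarray(aadt_true, dtype=float)
    return 1.0 / ratio.mean()

def mean_tce(aadt_est, aadt_true):
    """Mean Traffic Count Error in percent."""
    aadt_est = np.asarray(aadt_est, dtype=float)
    aadt_true = np.asarray(aadt_true, dtype=float)
    return np.mean(100 * (aadt_est - aadt_true) / aadt_true)

# Placeholder paired values: "fit" locations and held-out "check" locations
fit_est, fit_true = np.array([45000, 52000, 47000]), np.array([100000, 110000, 105000])
check_est, check_true = np.array([50000, 44000]), np.array([108000, 98000])

k = fit_ratio_correction(fit_est, fit_true)
print(f"Correction factor k = {k:.3f}")
print(f"Held-out mean TCE before: {mean_tce(check_est, check_true):.2f}%")
print(f"Held-out mean TCE after:  {mean_tce(k * check_est, check_true):.2f}%")
```

A single factor can only remove an average bias; if the underestimation varies by location or volume level, a factor per segment type or a regression-based adjustment may be needed.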