# Final Project - Optimal Parking Location Based on Time of Day and Location

Ethan Defilippi

ECD57@pitt.edu

## Abstract

Attending the University of Pittsburgh can be a challenge if you are a
commuter with early classes. It is common to spend several minutes looking for
parking once you arrive on campus. This project asks, “Where is the best place
to park on Pitt’s campus based on time of day?” Using data from the Pennsylvania Department
of Transportation and the Pittsburgh Parking Authority, I plan to train a model that predicts the
best meter location for a given time of day and destination within the city of Pittsburgh.
I consider this project useful because it helps alleviate a real-world problem that
many people face (especially Pitt commuters). Anyone who commutes to work or school and
has trouble finding parking could benefit from this model. I plan to use two datasets: one with
meter information for almost all parking meters within the city of Pittsburgh, and one with traffic
density information throughout the city at all times of day. I plan to train a Random Forest
model on these datasets because random forests are resistant to missing data and can
output multiple parking options, each with a probability score.

In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
import folium
from folium import plugins
from IPython.display import display, clear_output
import scipy.stats as stats


clicked_coords = []

traffic = pd.read_csv('data-trafficcounts.csv')
meters = pd.read_csv('parking-meters.csv')
# Keep only the coordinate, location, and rate columns
meters = meters[['x', 'y', 'location', 'rate']]
# Drop rows with a missing value in any of the kept columns
meters = meters.dropna(subset=['x', 'y', 'rate', 'location'])

### Getting rid of NA's

My first dataset (the traffic counts) had zero NA values in it (amazing!). The second dataset (the parking meters) had a few NAs for x and y. These columns are the meter coordinates, and there is no way to recover or estimate them, so the affected rows were dropped entirely. Missing values in created_user/data and last_edited_user/data do not matter for this analysis, so no rows were dropped on their account.
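As a minimal sketch of this kind of NA handling, the snippet below counts missing values per column and then drops only the rows that lack a coordinate. The toy rows, rate strings, and the `created_user` column values are made up for illustration and do not come from the real meter file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the parking meter data
toy_meters = pd.DataFrame({
    "x": [-79.95, np.nan, -79.96],
    "y": [40.44, 40.45, np.nan],
    "location": ["A", "B", "C"],
    "rate": ["$1/HR", "$1/HR", "$2/HR"],
    "created_user": [None, "u1", None],
})

# Count missing values per column before dropping anything
na_counts = toy_meters.isna().sum()
print(na_counts)

# Keep only rows where both coordinates are present;
# NAs in other columns (such as created_user) do not cause a drop
cleaned = toy_meters.dropna(subset=["x", "y"])
print(cleaned)
```

Restricting `subset` to the coordinate columns is what keeps rows whose only missing values are in columns the analysis never reads.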

###  Dealing with outliers


In [None]:
# Set up subplots for before/after comparison
fig, axes = plt.subplots(2, 6, figsize=(20, 8))
fig.suptitle('Distribution Before and After Outlier Removal', fontsize=16)

# Lists to store test results
before_tests = []
after_tests = []

hours = ['7a', '8a', '9a', '10a', '11a', '12p']

# Before outlier removal
for idx, hour in enumerate(hours):
    # Store normality test results
    stat, p_value = stats.normaltest(traffic[hour])
    before_tests.append({'hour': hour, 'p_value': p_value})
    
    # Plot distribution
    sns.histplot(data=traffic[hour], kde=True, ax=axes[0, idx])
    axes[0, idx].set_title(f'{hour}\np={p_value:.3f}')

# Clip outliers to the 1.5 * IQR fences (values are capped, not dropped)
traffic_clean = traffic.copy()
for hour in hours:
    Q1 = traffic[hour].quantile(0.25)
    Q3 = traffic[hour].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR
    upper = Q3 + 1.5 * IQR
    traffic_clean[hour] = traffic_clean[hour].clip(lower=lower, upper=upper)

# After outlier removal
for idx, hour in enumerate(hours):
    # Store normality test results
    stat, p_value = stats.normaltest(traffic_clean[hour])
    after_tests.append({'hour': hour, 'p_value': p_value})
    
    # Plot distribution
    sns.histplot(data=traffic_clean[hour], kde=True, ax=axes[1, idx])
    axes[1, idx].set_title(f'{hour}\np={p_value:.3f}')

plt.tight_layout()
plt.show()

# Print test results
print("\nNormality Test Results (p-values):")
print("Hour\tBefore\tAfter")
for before, after in zip(before_tests, after_tests):
    print(f"{before['hour']}\t{before['p_value']:.3f}\t{after['p_value']:.3f}")

I chose not to remove any outliers from the data-trafficcounts.csv file. To check for them, I clipped each hourly column to the 1.5 × IQR fences and compared the distributions before and after. From the visualizations shown, there do not seem to be any major outliers, because the data is largely unchanged. Outliers here could represent sensor errors, but none appear to be present (great job, PennDOT). The other dataset has no outliers to remove.
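To make the IQR check concrete, here is a small sketch on made-up counts (not the real traffic data). It contrasts clipping, which caps extreme values but keeps every row, with filtering, which drops the rows. The helper name `iqr_bounds` is hypothetical.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up hourly counts with one obviously extreme reading
counts = pd.Series([10, 12, 11, 13, 12, 11, 500])
lower, upper = iqr_bounds(counts)

# Clipping caps the extreme value at the fence but keeps the row
clipped = counts.clip(lower=lower, upper=upper)

# Filtering removes the row entirely
filtered = counts[(counts >= lower) & (counts <= upper)]

print(f"fences: [{lower}, {upper}]")
print(f"rows after clipping: {len(clipped)}, after filtering: {len(filtered)}")
```

If the before and after histograms look the same, as reported here, few or no values fell outside the fences.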

Standard visualizations do not show strong correlations in this data because almost all of the meter values are independent of one another. All of the sensor columns in my feature list appear to be strongly positively skewed.

### Feature Selection

I plan to use all of the traffic hours as features from the trafficcounts dataset in order to find real world traffic counts at important commute times for people. I also plan on using all of the meter GPS coordinates to train the model on meters close to each other. The response features for this model will be the longitudes and latitudes of parking zones within the city in order to give users a parking area to use.
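Relating meters to nearby traffic counts requires a distance between coordinate pairs. The sketch below shows one way to do that with a vectorized Haversine distance and `np.argsort` to pick the three closest sensors. The coordinates are invented for illustration, and `haversine_km` is a hypothetical helper name.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Invented sensor coordinates (assumption, not from the dataset)
sensor_lat = np.array([40.4443, 40.4406, 40.4570, 40.4300])
sensor_lon = np.array([-79.9532, -79.9959, -79.9300, -79.9000])

# Invented meter location
meter_lat, meter_lon = 40.4440, -79.9550

# One call computes the distance from the meter to every sensor
distances = haversine_km(meter_lat, meter_lon, sensor_lat, sensor_lon)

# Indices of the three nearest sensors, closest first
nearest_three = np.argsort(distances)[:3]
print(nearest_three, distances[nearest_three])
```

Computing all sensor distances in one array operation avoids a Python loop per meter and sensor pair, which matters when there are thousands of meters.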

## The Machine Learning Model

### Random Forest Regression Model

For my first model, I used a random forest because random forests work well with non-linear data (such as this data), and the spread of the individual trees' predictions gives the confidence ratings that I need for the interactive map. Random forests also handle skewed data very well, so this model was a natural choice for the project.
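As a rough illustration of how a confidence rating can come out of a random forest, the sketch below fits a forest on synthetic data and uses the standard deviation of the per-tree predictions as an uncertainty measure. The data, feature count, and the `1 - spread` scaling are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic features and a score in (0, 1], standing in for real data
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 1.0 / (1.0 + 5 * X[:, 0])

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Collect each tree's prediction for a batch of new points:
# shape is (n_trees, n_points)
X_new = rng.uniform(0, 1, size=(5, 3))
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])

mean_pred = per_tree.mean(axis=0)   # matches forest.predict(X_new)
spread = per_tree.std(axis=0)       # disagreement between trees
confidence = 1 - spread             # lower spread = higher confidence

print(np.round(mean_pred, 3))
print(np.round(confidence, 3))
```

Predicting the whole batch with each tree, rather than one point at a time, gives the same numbers with far fewer calls.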

In [None]:
class RandForestRegression:
    def __init__(self):
        self.traffic_data = None
        self.meter_data = None
        self.model = None
        self.scaler = StandardScaler()
        self.confidence_threshold = 0.6  # Minimum confidence to show recommendation
        
    def load_data(self, traffic_df, meters_df):
        """Load and prepare the traffic and parking meter datasets"""
        self.traffic_data = traffic_df
        self.meter_data = meters_df
        
        self.time_columns = ['1a','2a','3a','4a','5a','6a','7a', '8a', '9a', '10a', '11a', '12p','1p','2p','3p','4p','5p','6p','7p','8p','9p','10p','11p','12a']
        
        print(f"Loaded {len(self.traffic_data)} traffic sensors and {len(self.meter_data)} parking meters")
        
    def calculate_distance(self, start_latitude, start_longitude, destination_latitude, destination_longitude):
        """
        Calculate distance between two points using Haversine formula
        
        a = sin²((φB - φA)/2) + cos φA * cos φB * sin²((λB - λA)/2)
        c = 2 * asin(√a)    (equivalent to 2 * atan2(√a, √(1−a)))
        d = R ⋅ c
        
        """     
        r = 6371  # Earth's radius in kilometers

        start_latitude, start_longitude, destination_latitude, destination_longitude = map(math.radians, [start_latitude, start_longitude, destination_latitude, destination_longitude])
        delta_latitude = destination_latitude - start_latitude
        delta_longitude = destination_longitude - start_longitude
        
        haversine_component = math.sin(delta_latitude/2)**2 + math.cos(start_latitude) * math.cos(destination_latitude) * math.sin(delta_longitude/2)**2
        central_angle = 2 * math.asin(math.sqrt(haversine_component))
        return r * central_angle

    def prepare_features(self):
        """Prepare features for the machine learning model"""
        features = []
        labels = []
        
        for _, meter in self.meter_data.iterrows():
            # Find nearest traffic sensors
            distances = []
            traffic_values = []
            
            for _, sensor in self.traffic_data.iterrows():
                dist = self.calculate_distance(meter['y'], meter['x'], 
                                            sensor['Latitude'], sensor['Longitude'])
                distances.append(dist)
                
                # Get traffic values for all time periods
                time_values = [sensor[time_col] for time_col in self.time_columns]
                traffic_values.append(time_values)
            
            # Get 3 nearest sensors
            nearest_indices = np.argsort(distances)[:3]
            
            for time_idx, time_col in enumerate(self.time_columns):
                # Features for this time period
                feature_row = [
                    meter['x'],  # longitude
                    meter['y'],  # latitude
                    distances[nearest_indices[0]],  # distance to nearest sensor
                    distances[nearest_indices[1]],  # distance to 2nd nearest
                    distances[nearest_indices[2]],  # distance to 3rd nearest
                    traffic_values[nearest_indices[0]][time_idx],  # nearest sensor traffic
                    traffic_values[nearest_indices[1]][time_idx],  # 2nd nearest traffic
                    traffic_values[nearest_indices[2]][time_idx],  # 3rd nearest traffic
                    time_idx  # time of day encoded as 0-23 (index into time_columns)
                ]
                
                features.append(feature_row)
                
                # Label: inverse of average traffic (higher is better for parking)
                avg_traffic = np.mean([traffic_values[i][time_idx] for i in nearest_indices])
                parking_score = 1.0 / (1.0 + avg_traffic/100)  # Normalize to 0-1
                labels.append(parking_score)
        
        return np.array(features), np.array(labels)

    def train_model(self):
        """Train the Random Forest model"""
        print("Preparing features and training model...")
        
        X, y = self.prepare_features()
        X = self.scaler.fit_transform(X)
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Train Random Forest model
        self.model = RandomForestRegressor(n_estimators=100, random_state=42)
        self.model.fit(X_train, y_train)
        
    def predict_parking_spots(self, target_lat, target_lon, time, max_distance=1.0):
        """
        Predict best parking spots and their confidence scores
        
        Parameters:
        - target_lat: float, destination latitude
        - target_lon: float, destination longitude
        - time: str, time of day (e.g., '8a')
        - max_distance: float, maximum walking distance in km
        
        Returns: DataFrame with predictions and confidence scores
        """
        if time not in self.time_columns:
            raise ValueError(f"Time must be one of {self.time_columns}")
        
        time_idx = self.time_columns.index(time)
        
        # Prepare prediction data
        pred_features = []
        valid_meters = []
        
        for idx, meter in self.meter_data.iterrows():
            # Calculate distance to destination
            dist = self.calculate_distance(target_lat, target_lon, meter['y'], meter['x'])
            
            if dist <= max_distance:
                # Find nearest traffic sensors
                sensor_distances = []
                traffic_values = []
                
                for _, sensor in self.traffic_data.iterrows():
                    sensor_dist = self.calculate_distance(meter['y'], meter['x'], 
                                                        sensor['Latitude'], sensor['Longitude'])
                    sensor_distances.append(sensor_dist)
                    traffic_values.append(sensor[time])
                
                # Get 3 nearest sensors
                nearest_indices = np.argsort(sensor_distances)[:3]
                
                feature_row = [
                    meter['x'],
                    meter['y'],
                    sensor_distances[nearest_indices[0]],
                    sensor_distances[nearest_indices[1]],
                    sensor_distances[nearest_indices[2]],
                    traffic_values[nearest_indices[0]],
                    traffic_values[nearest_indices[1]],
                    traffic_values[nearest_indices[2]],
                    time_idx
                ]
                
                pred_features.append(feature_row)
                valid_meters.append(idx)
        
        if not pred_features:
            return pd.DataFrame()
        
        # Scale features and make predictions
        X_pred = self.scaler.transform(np.array(pred_features))
        predictions = self.model.predict(X_pred)
        
        # Get confidence scores using tree variance
        confidences = []
        for x in X_pred:
            tree_predictions = [tree.predict(x.reshape(1, -1))[0] for tree in self.model.estimators_]
            confidence = 1 - np.std(tree_predictions)  # Lower variance = higher confidence
            confidences.append(confidence)
        
        # Create results dataframe
        # valid_meters holds index labels from iterrows(), so select with .loc
        results = self.meter_data.loc[valid_meters].copy()
        results['prediction'] = predictions
        results['confidence'] = confidences
        results['distance_to_dest'] = [
            self.calculate_distance(target_lat, target_lon, row['y'], row['x'])
            for _, row in results.iterrows()
        ]
        
        # Filter by confidence threshold and sort
        results = results[results['confidence'] >= self.confidence_threshold]
        results = results.sort_values('prediction', ascending=False)
        
        return results

    def create_map(self, target_lat, target_lon, recommendations):
        """Create a folium map with the recommended parking spots"""
        # Create base map centered on target location
        m = folium.Map(location=[target_lat, target_lon], zoom_start=15)
        
        # Add destination marker
        folium.Marker(
            [target_lat, target_lon],
            popup='Destination',
            icon=folium.Icon(color='red', icon='info-sign')
        ).add_to(m)
        
        # Add recommended parking spots
        for _, spot in recommendations.iterrows():
            # Color based on confidence (green=high, orange=not high)
            color = 'green' if spot['confidence'] > 0.7 else 'orange'
            
            # Create popup content
            popup_html = f"""
                <b>{spot['location']}</b><br>
                Rate: {spot['rate']}<br>
                Distance: {spot['distance_to_dest']:.2f} km<br>
                Confidence: {spot['confidence']:.2f}<br>
                Score: {spot['prediction']:.2f}
            """

            folium.Marker(
                [spot['y'], spot['x']],
                popup=popup_html,
                icon=folium.Icon(color=color, icon='parking', prefix='fa')
            ).add_to(m)
        
        # Add search box
        plugins.Geocoder().add_to(m)
        
        # Add layer control
        folium.LayerControl().add_to(m)
        
        return m
    
    def create_interactive_map(self):
        global clicked_coords
        clicked_coords = []
        
        # Create base map focused on Pittsburgh
        m = folium.Map(
            location=[40.4406, -79.9959],
            zoom_start=12
        )
        
        m.add_child(folium.LatLngPopup())
        # Add clickable marker functionality
        m.add_child(folium.ClickForLatLng())
        
        return m

RandForest = RandForestRegression()

# Load data
RandForest.load_data(traffic, meters)

# Train model
RandForest.train_model()

### Decision Tree Model

For my second model, I used a decision tree because decision trees are outlier resistant, handle non-linear data well, and handle missing values well.
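A single tree has no ensemble spread to measure, so one possible proxy is how deep each prediction travels in the tree. The sketch below, on synthetic data, fits a `DecisionTreeRegressor` and reads the path length from scikit-learn's `decision_path`. Treating depth as confidence is a heuristic assumption, not a calibrated probability.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and a score in (0, 1], standing in for real data
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 1.0 / (1.0 + 5 * X[:, 0])

tree = DecisionTreeRegressor(max_depth=10, min_samples_split=5, random_state=42)
tree.fit(X, y)  # the tree must be fitted before predict or decision_path

X_new = rng.uniform(0, 1, size=(5, 3))

# decision_path returns a sparse indicator of the nodes each sample visits;
# the number of visited nodes minus one is the number of splits taken
node_indicator = tree.decision_path(X_new)
path_lengths = np.asarray(node_indicator.sum(axis=1)).ravel() - 1

# Ratio of each sample's path length to the deepest path in the tree
depth_ratio = path_lengths / tree.get_depth()
print(path_lengths, np.round(depth_ratio, 2))
```

Using `decision_path` avoids walking `tree_.children_left` and `tree_.children_right` by hand and gives the same split count per sample.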

In [None]:
class DecisionTree:
    def __init__(self):
        self.traffic_data = None
        self.meter_data = None
        self.model = None
        self.scaler = StandardScaler()
        self.confidence_threshold = 0.6  # Minimum confidence to show recommendation
        
    def load_data(self, traffic_df, meters_df):
        """Load and prepare the traffic and parking meter dataframes"""
        self.traffic_data = traffic_df
        self.meter_data = meters_df
        
        self.time_columns = ['1a','2a','3a','4a','5a','6a','7a', '8a', '9a', '10a', '11a', '12p','1p','2p','3p','4p','5p','6p','7p','8p','9p','10p','11p','12a']
        
        print(f"Loaded {len(self.traffic_data)} traffic sensors and {len(self.meter_data)} parking meters")
        
    def calculate_distance(self, start_latitude, start_longitude, destination_latitude, destination_longitude):
        """
        Calculate distance between two points using Haversine formula
        
        a = sin²((φB - φA)/2) + cos φA * cos φB * sin²((λB - λA)/2)
        c = 2 * asin(√a)    (equivalent to 2 * atan2(√a, √(1−a)))
        d = R ⋅ c
        
        """     
        r = 6371  # Earth's radius in kilometers

        start_latitude, start_longitude, destination_latitude, destination_longitude = map(math.radians, [start_latitude, start_longitude, destination_latitude, destination_longitude])
        delta_latitude = destination_latitude - start_latitude
        delta_longitude = destination_longitude - start_longitude
        
        haversine_component = math.sin(delta_latitude/2)**2 + math.cos(start_latitude) * math.cos(destination_latitude) * math.sin(delta_longitude/2)**2
        central_angle = 2 * math.asin(math.sqrt(haversine_component))
        return r * central_angle

    def prepare_features(self):
        """Prepare features for the machine learning model"""
        features = []
        labels = []
        
        for _, meter in self.meter_data.iterrows():
            # Find nearest traffic sensors
            distances = []
            traffic_values = []
            
            for _, sensor in self.traffic_data.iterrows():
                dist = self.calculate_distance(meter['y'], meter['x'], 
                                            sensor['Latitude'], sensor['Longitude'])
                distances.append(dist)
                
                # Get traffic values for all time periods
                time_values = [sensor[time_col] for time_col in self.time_columns]
                traffic_values.append(time_values)
            
            # Get 3 nearest sensors
            nearest_indices = np.argsort(distances)[:3]
            
            for time_idx, time_col in enumerate(self.time_columns):
                # Features for this time period
                feature_row = [
                    meter['x'],  # longitude
                    meter['y'],  # latitude
                    distances[nearest_indices[0]],  # distance to nearest sensor
                    distances[nearest_indices[1]],  # distance to 2nd nearest
                    distances[nearest_indices[2]],  # distance to 3rd nearest
                    traffic_values[nearest_indices[0]][time_idx],  # nearest sensor traffic
                    traffic_values[nearest_indices[1]][time_idx],  # 2nd nearest traffic
                    traffic_values[nearest_indices[2]][time_idx],  # 3rd nearest traffic
                    time_idx  # time of day encoded as 0-23 (index into time_columns)
                ]
                
                features.append(feature_row)
                
                # Label: inverse of average traffic (higher is better for parking)
                avg_traffic = np.mean([traffic_values[i][time_idx] for i in nearest_indices])
                parking_score = 1.0 / (1.0 + avg_traffic/100)  # Normalize to 0-1
                labels.append(parking_score)
        
        return np.array(features), np.array(labels)

    def train_model(self):
        """Train the Decision Tree model"""
        print("Preparing features and training model...")
        
        X, y = self.prepare_features()
        X = self.scaler.fit_transform(X)
                
        # Train the Decision Tree model (fit is required before predict)
        self.model = DecisionTreeRegressor(max_depth=10, min_samples_split=5, random_state=42)
        self.model.fit(X, y)
        
    def predict_parking_spots(self, target_lat, target_lon, time, max_distance=1.0):
        """
        Predict best parking spots and their confidence scores
        
        Parameters:
        - target_lat: float, destination latitude
        - target_lon: float, destination longitude
        - time: str, time of day (e.g., '8a')
        - max_distance: float, maximum walking distance in km
        
        Returns: DataFrame with predictions and confidence scores
        """
        if time not in self.time_columns:
            raise ValueError(f"Time must be one of {self.time_columns}")
        
        time_idx = self.time_columns.index(time)
        
        # Prepare prediction data
        pred_features = []
        valid_meters = []
        
        for idx, meter in self.meter_data.iterrows():
            # Calculate distance to destination
            dist = self.calculate_distance(target_lat, target_lon, meter['y'], meter['x'])
            
            if dist <= max_distance:
                # Find nearest traffic sensors
                sensor_distances = []
                traffic_values = []
                
                for _, sensor in self.traffic_data.iterrows():
                    sensor_dist = self.calculate_distance(meter['y'], meter['x'], 
                                                        sensor['Latitude'], sensor['Longitude'])
                    sensor_distances.append(sensor_dist)
                    traffic_values.append(sensor[time])
                
                # Get 3 nearest sensors
                nearest_indices = np.argsort(sensor_distances)[:3]
                
                feature_row = [
                    meter['x'],
                    meter['y'],
                    sensor_distances[nearest_indices[0]],
                    sensor_distances[nearest_indices[1]],
                    sensor_distances[nearest_indices[2]],
                    traffic_values[nearest_indices[0]],
                    traffic_values[nearest_indices[1]],
                    traffic_values[nearest_indices[2]],
                    time_idx
                ]
                
                pred_features.append(feature_row)
                valid_meters.append(idx)
        
        if not pred_features:
            return pd.DataFrame()
        
        # Scale features and make predictions
        X_pred = self.scaler.transform(np.array(pred_features))
        predictions = self.model.predict(X_pred)
        
        # Use tree depth as proxy for confidence
        confidences = []
        for x in X_pred:
            # Calculate path length for this prediction
            path_length = 0
            node_id = 0
            while node_id != -1:  # Until we reach a leaf
                if self.model.tree_.feature[node_id] == -2:  # Leaf node
                    break
                if x[self.model.tree_.feature[node_id]] <= self.model.tree_.threshold[node_id]:
                    node_id = self.model.tree_.children_left[node_id]
                else:
                    node_id = self.model.tree_.children_right[node_id]
                path_length += 1
            
            # Normalize confidence based on max possible depth
            confidence = path_length / self.model.get_depth()
            confidences.append(confidence)
        
        # Create results dataframe
        # valid_meters holds index labels from iterrows(), so select with .loc
        results = self.meter_data.loc[valid_meters].copy()
        results['prediction'] = predictions
        results['confidence'] = confidences
        results['distance_to_dest'] = [
            self.calculate_distance(target_lat, target_lon, row['y'], row['x'])
            for _, row in results.iterrows()
        ]
        
        # Filter by confidence threshold and sort
        results = results[results['confidence'] >= self.confidence_threshold]
        results = results.sort_values('prediction', ascending=False)
        
        return results

    def create_map(self, target_lat, target_lon, recommendations):
        """Create a folium map with the recommended parking spots"""
        # Create base map centered on target location
        m = folium.Map(location=[target_lat, target_lon], zoom_start=15)
        
        # Add destination marker
        folium.Marker(
            [target_lat, target_lon],
            popup='Destination',
            icon=folium.Icon(color='red', icon='info-sign')
        ).add_to(m)
        
        # Add recommended parking spots
        for _, spot in recommendations.iterrows():
            # Color based on confidence (green=high, orange=not high)
            color = 'green' if spot['confidence'] > 0.7 else 'orange'
            
            # Create popup content
            popup_html = f"""
                <b>{spot['location']}</b><br>
                Rate: {spot['rate']}<br>
                Distance: {spot['distance_to_dest']:.2f} km<br>
                Confidence: {spot['confidence']:.2f}<br>
                Score: {spot['prediction']:.2f}
            """

            folium.Marker(
                [spot['y'], spot['x']],
                popup=popup_html,
                icon=folium.Icon(color=color, icon='parking', prefix='fa')
            ).add_to(m)
        
        # Add search box
        plugins.Geocoder().add_to(m)
        
        # Add layer control
        folium.LayerControl().add_to(m)
        
        return m
    
    def create_interactive_map(self):
        global clicked_coords
        clicked_coords = []
        
        # Create base map focused on Pittsburgh
        m = folium.Map(
            location=[40.4406, -79.9959],
            zoom_start=12
        )
        
        m.add_child(folium.LatLngPopup())
        # Add clickable marker functionality
        m.add_child(folium.ClickForLatLng())
        
        return m

DTree = DecisionTree()

# Load data
DTree.load_data(traffic, meters)

# Train model
DTree.train_model()

### KNN Model

Lastly, I chose a KNN model because it is simple to implement on top of my existing structure and easy to update with new data. Because traffic data is always being collected, this model can be expanded later to use that new data.
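For KNN, the distances to the nearest training points can serve as a rough confidence signal. The following sketch uses synthetic data and an assumed normalization by the largest neighbor distance in the batch. It fits a `KNeighborsRegressor` on scaled features and converts `kneighbors` distances into a score where closer neighbors mean higher confidence.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic features and a score in (0, 1], standing in for real data
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 1.0 / (1.0 + 5 * X[:, 0])

# KNN is distance based, so features are scaled first
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

knn = KNeighborsRegressor(n_neighbors=5, weights="distance", metric="euclidean")
knn.fit(X_scaled, y)  # fit stores the training points for later lookups

X_new = scaler.transform(rng.uniform(0, 1, size=(4, 3)))
predictions = knn.predict(X_new)

# Distances from each query point to its 5 nearest training points
neighbor_dist, _ = knn.kneighbors(X_new)

# Closer neighbors = higher confidence, scaled by the batch maximum
confidence = 1 - neighbor_dist.mean(axis=1) / neighbor_dist.max()
print(np.round(predictions, 3), np.round(confidence, 3))
```

Because the score is scaled by the largest distance in the current batch, the same point can receive a different confidence depending on which other points are queried with it.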

In [None]:
class KNN:
    def __init__(self):
        self.traffic_data = None
        self.meter_data = None
        self.model = None
        self.scaler = StandardScaler()
        self.confidence_threshold = 0.6  # Minimum confidence to show recommendation
        
    def load_data(self, traffic_df, meters_df):
        """Load and prepare the traffic and parking meter datasets"""
        self.traffic_data = traffic_df
        self.meter_data = meters_df
        
        self.time_columns = ['1a','2a','3a','4a','5a','6a','7a', '8a', '9a', '10a', '11a', '12p','1p','2p','3p','4p','5p','6p','7p','8p','9p','10p','11p','12a']
        
        print(f"Loaded {len(self.traffic_data)} traffic sensors and {len(self.meter_data)} parking meters")
        
    def calculate_distance(self, start_latitude, start_longitude, destination_latitude, destination_longitude):
        """
        Calculate distance between two points using Haversine formula
        
        a = sin²((φB - φA)/2) + cos φA * cos φB * sin²((λB - λA)/2)
        c = 2 * asin(√a)    (equivalent to 2 * atan2(√a, √(1−a)))
        d = R ⋅ c
        
        """     
        r = 6371  # Earth's radius in kilometers

        start_latitude, start_longitude, destination_latitude, destination_longitude = map(math.radians, [start_latitude, start_longitude, destination_latitude, destination_longitude])
        delta_latitude = destination_latitude - start_latitude
        delta_longitude = destination_longitude - start_longitude
        
        haversine_component = math.sin(delta_latitude/2)**2 + math.cos(start_latitude) * math.cos(destination_latitude) * math.sin(delta_longitude/2)**2
        central_angle = 2 * math.asin(math.sqrt(haversine_component))
        return r * central_angle

    def prepare_features(self):
        """Prepare features for the machine learning model"""
        features = []
        labels = []
        
        for _, meter in self.meter_data.iterrows():
            # Find nearest traffic sensors
            distances = []
            traffic_values = []
            
            for _, sensor in self.traffic_data.iterrows():
                dist = self.calculate_distance(meter['y'], meter['x'], 
                                            sensor['Latitude'], sensor['Longitude'])
                distances.append(dist)
                
                # Get traffic values for all time periods
                time_values = [sensor[time_col] for time_col in self.time_columns]
                traffic_values.append(time_values)
            
            # Get 3 nearest sensors
            nearest_indices = np.argsort(distances)[:3]
            
            for time_idx, time_col in enumerate(self.time_columns):
                # Features for this time period
                feature_row = [
                    meter['x'],  # longitude
                    meter['y'],  # latitude
                    distances[nearest_indices[0]],  # distance to nearest sensor
                    distances[nearest_indices[1]],  # distance to 2nd nearest
                    distances[nearest_indices[2]],  # distance to 3rd nearest
                    traffic_values[nearest_indices[0]][time_idx],  # nearest sensor traffic
                    traffic_values[nearest_indices[1]][time_idx],  # 2nd nearest traffic
                    traffic_values[nearest_indices[2]][time_idx],  # 3rd nearest traffic
                    time_idx  # time of day encoded as 0-23 (index into time_columns)
                ]
                
                features.append(feature_row)
                
                # Label: inverse of average traffic (higher is better for parking)
                avg_traffic = np.mean([traffic_values[i][time_idx] for i in nearest_indices])
                parking_score = 1.0 / (1.0 + avg_traffic/100)  # Normalize to 0-1
                labels.append(parking_score)
        
        return np.array(features), np.array(labels)

    def train_model(self):
        """Train the KNN model"""
        print("Preparing features and training model...")
        
        X, y = self.prepare_features()
        X = self.scaler.fit_transform(X)
                
        # Initialize and train KNN (fit is required before predict or kneighbors)
        self.model = KNeighborsRegressor(n_neighbors=5, weights='distance', metric='euclidean')
        self.model.fit(X, y)
        
    def predict_parking_spots(self, target_lat, target_lon, time, max_distance=1.0):
        """
        Predict best parking spots and their confidence scores
        
        Parameters:
        - target_lat: float, destination latitude
        - target_lon: float, destination longitude
        - time: str, time of day (e.g., '8a')
        - max_distance: float, maximum walking distance in km
        
        Returns: DataFrame with predictions and confidence scores
        """
        if time not in self.time_columns:
            raise ValueError(f"Time must be one of {self.time_columns}")
        
        time_idx = self.time_columns.index(time)
        
        # Prepare prediction data
        pred_features = []
        valid_meters = []
        
        for idx, meter in self.meter_data.iterrows():
            # Calculate distance to destination
            dist = self.calculate_distance(target_lat, target_lon, meter['y'], meter['x'])
            
            if dist <= max_distance:
                # Find nearest traffic sensors
                sensor_distances = []
                traffic_values = []
                
                for _, sensor in self.traffic_data.iterrows():
                    sensor_dist = self.calculate_distance(meter['y'], meter['x'], 
                                                        sensor['Latitude'], sensor['Longitude'])
                    sensor_distances.append(sensor_dist)
                    traffic_values.append(sensor[time])
                
                # Get 3 nearest sensors
                nearest_indices = np.argsort(sensor_distances)[:3]
                
                feature_row = [
                    meter['x'],
                    meter['y'],
                    sensor_distances[nearest_indices[0]],
                    sensor_distances[nearest_indices[1]],
                    sensor_distances[nearest_indices[2]],
                    traffic_values[nearest_indices[0]],
                    traffic_values[nearest_indices[1]],
                    traffic_values[nearest_indices[2]],
                    time_idx
                ]
                
                pred_features.append(feature_row)
                valid_meters.append(idx)
        
        if not pred_features:
            return pd.DataFrame()
        
        # Scale features and make predictions
        X_pred = self.scaler.transform(np.array(pred_features))

        # Get predictions and distances to nearest neighbors
        predictions = self.model.predict(X_pred)
        distances, _ = self.model.kneighbors(X_pred)

        # Convert distances to confidence scores (closer = more confident)
        # Guard against division by zero when every neighbor distance is 0
        max_dist = np.max(distances) or 1.0
        confidences = 1 - np.mean(distances / max_dist, axis=1)
        
        # Create results dataframe (valid_meters holds index labels from iterrows, so use .loc)
        results = self.meter_data.loc[valid_meters].copy()
        results['prediction'] = predictions
        results['confidence'] = confidences
        results['distance_to_dest'] = [
            self.calculate_distance(target_lat, target_lon, row['y'], row['x'])
            for _, row in results.iterrows()
        ]
        
        # Filter by confidence threshold and sort
        results = results[results['confidence'] >= self.confidence_threshold]
        results = results.sort_values('prediction', ascending=False)
        
        return results

    def create_map(self, target_lat, target_lon, recommendations):
        """Create a folium map with the recommended parking spots"""
        # Create base map centered on target location
        m = folium.Map(location=[target_lat, target_lon], zoom_start=15)
        
        # Add destination marker
        folium.Marker(
            [target_lat, target_lon],
            popup='Destination',
            icon=folium.Icon(color='red', icon='info-sign')
        ).add_to(m)
        
        # Add recommended parking spots
        for _, spot in recommendations.iterrows():
            # Color based on confidence (green=high, orange=not high)
            color = 'green' if spot['confidence'] > 0.7 else 'orange'
            
            # Create popup content
            popup_html = f"""
                <b>{spot['location']}</b><br>
                Rate: {spot['rate']}<br>
                Distance: {spot['distance_to_dest']:.2f} km<br>
                Confidence: {spot['confidence']:.2f}<br>
                Score: {spot['prediction']:.2f}
            """

            folium.Marker(
                [spot['y'], spot['x']],
                popup=popup_html,
                icon=folium.Icon(color=color, icon='parking', prefix='fa')
            ).add_to(m)
        
        # Add search box
        plugins.Geocoder().add_to(m)
        
        # Add layer control
        folium.LayerControl().add_to(m)
        
        return m
    
    def create_interactive_map(self):
        global clicked_coords
        clicked_coords = []
        
        # Create base map focused on Pittsburgh
        m = folium.Map(
            location=[40.4406, -79.9959],
            zoom_start=12
        )
        
        m.add_child(folium.LatLngPopup())
        # Add clickable marker functionality
        m.add_child(folium.ClickForLatLng())
        
        return m

knnModel = KNN()

# Load data
knnModel.load_data(traffic, meters)

# Train model
knnModel.train_model()

### User-Interactable Model - Random Forest Regression Model Backbone

In [None]:
clicked_coords = []

while True:
    initial_map = RandForest.create_interactive_map()
    display(initial_map)
    
    user_input = input("After placing a marker, paste the coordinates shown on the pin here and press Enter: ")
    clear_output(wait=True)

    # split() with no separator returns [] for empty input, so the check below works
    clicked_coords = user_input.split()
    
    if len(clicked_coords) >= 4:
        try:
            lat = float(clicked_coords[1])
            lon = float(clicked_coords[3])
            user_time = input("Enter a time in the format <hour><a/p> (e.g., 8a): ")
            max_distance = float(input("Enter maximum walking distance (km): "))
        except ValueError:
            print("Invalid input. Please try again.")
            continue
        if not user_time or max_distance <= 0:
            print("Invalid input. Please try again.")
            continue
        
        # Make prediction
        recommendations = RandForest.predict_parking_spots(lat, lon, user_time, max_distance)
        break
    else:
        print("No location selected. Please try again.")
        

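print("\nTop recommendations:")
if recommendations.empty:
    # predict_parking_spots returns an empty DataFrame when no meter is in range
    print("No meters found within the requested distance.")
else:
    print(recommendations[['location', 'rate', 'distance_to_dest', 'prediction', 'confidence']].head())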

# Create and save map
m = RandForest.create_map(lat, lon, recommendations)
m.save('parking_recommendations.html')
print("\nMap saved as 'parking_recommendations.html'")


# Conclusion

In my testing, the random forest regression model performed best, which I attribute to its ability to handle skewed and non-linear data.


This model outputs its top 5 predicted parking spots, ranked by predicted occupancy (on a scale of 0-1), along with its confidence in each prediction. Using the `recommendations` data from the model, an interactive map is created that shows the meter locations within the user-defined distance that pass the confidence threshold.
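
As a rough illustration of how the confidence score attached to each recommendation behaves, the sketch below mirrors the distance-to-confidence step from `predict_parking_spots` in plain Python. The function name `confidence_from_distances` and the sample distances are made up for illustration; the notebook itself does this with NumPy on the output of `kneighbors`.

```python
def confidence_from_distances(neighbor_distances):
    """Map each row of neighbor distances to a 0-1 confidence score.

    Hypothetical helper: rows whose neighbors are close (relative to the
    largest distance in the batch) score near 1, distant rows near 0.
    """
    max_dist = max(max(row) for row in neighbor_distances)
    if max_dist == 0:
        # Every query sits exactly on its neighbors; treat as fully confident.
        return [1.0 for _ in neighbor_distances]
    return [1 - (sum(row) / len(row)) / max_dist for row in neighbor_distances]


# Made-up distances from three candidate meters to their 3 nearest training points
sample = [
    [0.0, 0.1, 0.2],  # very close neighbors
    [0.5, 0.5, 0.5],
    [0.8, 0.9, 1.0],  # far neighbors
]
scores = confidence_from_distances(sample)
print([round(s, 2) for s in scores])
```

Because the scores are normalized by the largest distance in the batch, a meter's confidence depends on which other meters fall within the search radius, so the values are best read as relative rather than absolute.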

## Usage: 
* Click "Run All"
* Wait for all of the models to train and load (takes a while, but I figured they should all at least run)
* Once the interactive map pops up, find the location in the city you want to park near and click on it (e.g., The Pete).
* Copy all of the text shown on the map pin and paste it into the input prompt.
* Enter a time at the next prompt (e.g., 8a)
* Enter a max walking distance in kilometers (float)
* Look at the printed recommendations, or open the generated HTML file (`parking_recommendations.html`) to get a map of parking meter suggestions
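
The copy-and-paste step above depends on the pin text having a predictable shape. A minimal sketch of a more forgiving parser follows, assuming the pin text looks like `Latitude: 40.4406 Longitude: -79.9959`. The helper name `parse_pin_text` is hypothetical and not part of the notebook.

```python
import re

# Matches signed decimal numbers such as 40.4406 or -79.9959
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_pin_text(text):
    """Return (lat, lon) from pasted pin text, or None if it cannot be parsed.

    Hypothetical helper; assumes latitude appears before longitude.
    """
    numbers = _NUMBER.findall(text)
    if len(numbers) < 2:
        return None
    lat, lon = float(numbers[0]), float(numbers[1])
    # Reject values outside the valid coordinate ranges
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


print(parse_pin_text("Latitude: 40.4406 Longitude: -79.9959"))
print(parse_pin_text(""))
```

Pulling the numbers out with a regular expression, rather than relying on fixed word positions, means the paste still works if the pin text arrives with a line break or extra spaces between the two values.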

# Sources/Help During Project

Anthropic Claude for Helping with Some of the Boilerplate Code
* Code Comments Between Each Model
* Calculating Distance in Circle
* Specific Styling Choices for the Folium Map

Folium Guide For How to Use the Interactive Map
* https://python-visualization.github.io/folium/latest/user_guide.html