# 🚗 Dynamic Pricing for Urban Parking Lots

## 📘 Project Objective
This notebook simulates dynamic pricing strategies for 14 urban parking lots using real-time factors such as occupancy, queue length, traffic congestion, special event indicators, and nearby competitor prices.

We build:
- 📈 **Model 1:** Simple Linear Occupancy-Based Pricing
- 📊 **Model 2:** Multi-Factor Demand-Based Pricing
- 🧭 **Model 3:** Competition-Aware Pricing (Geo-distance based)

🔁 Real-time simulation powered by **Pathway**  
📊 Live visualization using **Bokeh**


In [1]:
import pandas as pd
import numpy as np
from math import radians, cos, sin, sqrt, atan2


## 📥 Step 1: Load and Preview Data

We begin by loading the dataset and reviewing its structure. The data consists of:
- Parking lot metadata (capacity, lat/long)
- Real-time state (occupancy, queue)
- Environmental features (traffic, event day)
- Vehicle information


In [3]:
# Load dataset into a pandas DataFrame and view initial rows
df = pd.read_csv("/content/dataset.csv")
df.head()


|  | ID | SystemCodeNumber | Capacity | Latitude | Longitude | Occupancy | VehicleType | TrafficConditionNearby | QueueLength | IsSpecialDay | LastUpdatedDate | LastUpdatedTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 61 | car | low | 1 | 0 | 04-10-2016 | 07:59:00 |
| 1 | 1 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 64 | car | low | 1 | 0 | 04-10-2016 | 08:25:00 |
| 2 | 2 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 80 | car | low | 2 | 0 | 04-10-2016 | 08:59:00 |
| 3 | 3 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 107 | car | low | 2 | 0 | 04-10-2016 | 09:32:00 |
| 4 | 4 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 150 | bike | low | 2 | 0 | 04-10-2016 | 09:59:00 |


## 🔧 Step 2: Feature Engineering

We transform raw features into numerical and normalized formats:
- Occupancy → `OccupancyRate`
- Queue → normalized `QueueNorm`
- Traffic → mapped and normalized `TrafficNorm`
- VehicleType → encoded as `VehicleWeight`


In [4]:
# Map vehicle type to weights: larger vehicles are assumed to pay more
vehicle_weights = {'bike': 0.5, 'car': 1.0, 'truck': 1.5}
df['VehicleWeight'] = df['VehicleType'].map(vehicle_weights).fillna(1.0)

# Calculate occupancy rate as a ratio between 0 and 1
df['OccupancyRate'] = df['Occupancy'] / df['Capacity']

# Normalize queue length (to make it comparable across lots)
q_min, q_max = df['QueueLength'].min(), df['QueueLength'].max()
df['QueueNorm'] = (df['QueueLength'] - q_min) / (q_max - q_min)

# Map traffic condition (low, medium, high) to numeric scores and normalize
traffic_map = {'low': 0.2, 'medium': 0.5, 'high': 0.9}
df['TrafficValue'] = df['TrafficConditionNearby'].map(traffic_map).fillna(0.5)

t_min, t_max = df['TrafficValue'].min(), df['TrafficValue'].max()
df['TrafficNorm'] = (df['TrafficValue'] - t_min) / (t_max - t_min)
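The inline min-max formulas above divide by zero when a column is constant (for example, if the data were filtered to rows that all share one queue length). A minimal guarded helper, assuming a zero-range column should simply map to 0; `minmax_norm` is a hypothetical name, not part of the notebook:

```python
import pandas as pd


def minmax_norm(s: pd.Series) -> pd.Series:
    """Scale a numeric Series to [0, 1]; constant (or all-NaN) Series map to 0.0."""
    lo, hi = s.min(), s.max()
    span = hi - lo
    if span == 0 or pd.isna(span):
        # No spread to normalize: return zeros instead of dividing by zero
        return pd.Series(0.0, index=s.index)
    return (s - lo) / span


# Hypothetical drop-in replacements for the inline formulas above:
# df['QueueNorm'] = minmax_norm(df['QueueLength'])
# df['TrafficNorm'] = minmax_norm(df['TrafficValue'])
```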


---

## 📈 Model 1 – Linear Occupancy-Based Pricing

**Logic:**  
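Each update adds α × (Occupancy / Capacity) to the lot's previous price, so the price accumulates with occupancy over time. No consideration of demand context or environment.

**Formula:**  
Price(t) = Price(t−1) + α × (Occupancy / Capacity), with Price(0) = BasePrice, clipped to [$5, $20] (the code uses α = 5 and BasePrice = 10)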

In [5]:
# Combine date and time columns into one timestamp
df['Timestamp'] = pd.to_datetime(
    df['LastUpdatedDate'] + ' ' + df['LastUpdatedTime'],
    format='%d-%m-%Y %H:%M:%S',
    errors='coerce'
)

# Sort by timestamp just in case
df = df.sort_values(by='Timestamp')


In [6]:
df = df.sort_values(['SystemCodeNumber', 'Timestamp'])

# Initialize price column
df['Price_Model1'] = np.nan

# Pricing parameters: alpha scales each occupancy increment, base_price seeds every lot
alpha = 5
base_price = 10

# Apply recursively per parking lot
for lot in df['SystemCodeNumber'].unique():
    prev_price = base_price
    for idx in df[df['SystemCodeNumber'] == lot].index:
        occ = df.loc[idx, 'Occupancy']
        cap = df.loc[idx, 'Capacity']
        occ_rate = occ / cap if cap > 0 else 0
        new_price = prev_price + alpha * occ_rate
        df.loc[idx, 'Price_Model1'] = min(max(new_price, 5), 20)
        prev_price = df.loc[idx, 'Price_Model1']
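Because every increment α × OccupancyRate is non-negative, the loop above never lowers a price: the $5 floor never binds, and once a lot reaches $20 it stays there. The recursion therefore has a closed form per lot, Price(t) = min(BasePrice + α × (running sum of occupancy rates), 20). A vectorized sketch, assuming positive capacities and a base price at or above the $5 floor; `model1_vectorized` is a hypothetical helper:

```python
import pandas as pd


def model1_vectorized(df: pd.DataFrame, alpha: float = 5.0, base_price: float = 10.0,
                      lo: float = 5.0, hi: float = 20.0) -> pd.Series:
    """Closed form of the Model 1 recursion, computed per lot.

    Assumes df is sorted by ['SystemCodeNumber', 'Timestamp'], every Capacity > 0,
    and base_price >= lo (so the lower bound can never bind).
    """
    occ_rate = (df['Occupancy'] / df['Capacity']).clip(lower=0)
    # Running sum of occupancy rates within each lot, in row (time) order
    cum_rate = occ_rate.groupby(df['SystemCodeNumber']).cumsum()
    # Non-decreasing prices: clipping the closed form equals clipping every step
    return (base_price + alpha * cum_rate).clip(lower=lo, upper=hi)
```

`df['Price_Model1'] = model1_vectorized(df)` should reproduce the loop's column (up to floating-point rounding) without per-row `df.loc` lookups. Writing it this way also makes Model 1's ratcheting behavior explicit.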


## 📊 Model 2 – Multi-Factor Demand-Based Pricing

**Logic:**  
This model considers:
- Occupancy Rate
- Queue Length
- Traffic Congestion
- Special Event Indicator
- Vehicle Type

A demand score is calculated using a weighted sum of these features.

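**Formula:**  
Demand = α·OccupancyRate + β·QueueNorm − γ·TrafficNorm + δ·IsSpecialDay + ε·VehicleWeight

Price = Base × (1 + λ × NormalizedDemand)

In the cells below, every weight (α through ε) and λ equals 1, Base = 10, and NormalizedDemand = Demand / 5.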
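The formula above has tunable weights, but the cells below hard-code them to 1. A minimal single-observation sketch that exposes α through ε and λ as parameters; `model2_price` is a hypothetical helper, and its defaults (all weights 1, divisor 5, base 10) are chosen to mirror the notebook's cells:

```python
def model2_price(occ_rate: float, queue_norm: float, traffic_norm: float,
                 is_special_day: int, vehicle_weight: float,
                 weights=(1.0, 1.0, 1.0, 1.0, 1.0), lam: float = 1.0,
                 base: float = 10.0, scale: float = 5.0,
                 lo: float = 5.0, hi: float = 20.0) -> float:
    """Model 2 price for one observation; weights = (alpha, beta, gamma, delta, epsilon)."""
    a, b, g, d, e = weights
    # Weighted demand score; traffic congestion lowers demand
    demand = (a * occ_rate + b * queue_norm - g * traffic_norm
              + d * is_special_day + e * vehicle_weight)
    normalized = demand / scale              # same divisor the notebook uses
    price = base * (1 + lam * normalized)
    return min(max(price, lo), hi)           # clip to [lo, hi]
```

Tuning a weight then becomes a matter of passing, for example, `weights=(2.0, 1.0, 0.5, 1.0, 1.0)`; the "Future Enhancements" idea of learning these weights would plug into the same parameters.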

In [7]:
# Raw demand score: unweighted sum of the factors (traffic is subtracted)
df['RawDemand'] = (
    df['OccupancyRate'] +
    df['QueueNorm'] -
    df['TrafficNorm'] +
    df['IsSpecialDay'] +
    df['VehicleWeight']
)


In [8]:
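# Scale the raw score by 5. Assuming Occupancy <= Capacity, RawDemand actually ranges
# from -0.5 (bike, empty lot, max traffic) to 4.5 (truck, full lot, max queue, special day),
# so NormalizedDemand lies roughly in [-0.1, 0.9] rather than exactly 0-1
df['NormalizedDemand'] = df['RawDemand'] / 5.0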

# Calculate price: base × (1 + demand)
df['Price_Model2'] = base_price * (1 + df['NormalizedDemand'])
df['Price_Model2'] = df['Price_Model2'].clip(lower=5, upper=20)

df[['SystemCodeNumber', 'NormalizedDemand', 'Price_Model2']].head()


|  | SystemCodeNumber | NormalizedDemand | Price_Model2 |
|---|---|---|---|
| 0 | BHMBCCMKT01 | 0.234477 | 12.344772 |
| 1 | BHMBCCMKT01 | 0.235517 | 12.35517 |
| 2 | BHMBCCMKT01 | 0.254396 | 12.543963 |
| 3 | BHMBCCMKT01 | 0.263755 | 12.637551 |
| 4 | BHMBCCMKT01 | 0.17866 | 11.786597 |


## 🧭 Model 3 – Competition-Aware Pricing

**Logic:**  
Incorporates spatial awareness using latitude/longitude.
- Checks for nearby lots within 0.5 km.
- Compares its Model 2 price with the average Model 2 price of those neighbors.
- Cuts the price by 5% if the lot is full and pricier than that average; raises it by 5% if it is cheaper.

**Bonus Feature:**  
Optional rerouting suggestion if this lot is full and cheaper lots are nearby.
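The cells below only adjust prices. The rerouting idea could be sketched as follows, assuming each lot's state at a given moment is available as a plain dict; `suggest_reroute`, `haversine_km`, and the dict keys are hypothetical names chosen for this sketch:

```python
from math import radians, sin, cos, sqrt, atan2


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (same formula as the notebook's haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * atan2(sqrt(a), sqrt(1 - a))


def suggest_reroute(current, snapshot, radius_km=0.5):
    """Return the id of the cheapest nearby lot with free space, or None.

    current:  dict with keys id, lat, lon, occupancy, capacity, price
    snapshot: list of such dicts for all lots at the same moment
    """
    if current['occupancy'] < current['capacity']:
        return None  # lot still has space, no reroute needed
    candidates = [
        lot for lot in snapshot
        if lot['id'] != current['id']
        and lot['occupancy'] < lot['capacity']      # has free spaces
        and lot['price'] < current['price']          # is cheaper
        and haversine_km(current['lat'], current['lon'],
                         lot['lat'], lot['lon']) <= radius_km
    ]
    return min(candidates, key=lambda lot: lot['price'])['id'] if candidates else None
```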




In [9]:
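# Keep only the first 500 rows to validate the Model 3 logic quickly.
# Note: df is sorted by lot, so this subset may contain just the first lot,
# in which case no competitor rows exist within 0.5 km.
df = df.head(500)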


In [10]:
# Helper function to compute distance between two geo-coordinates
def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Radius of Earth in km
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat/2)**2 + cos(lat1)*cos(lat2)*sin(dlon/2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c  # distance in km
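Lot coordinates never change, so distances only need computing once per pair of lots rather than once per pair of rows. A sketch using NumPy broadcasting; `haversine_matrix` and `neighbor_map` are hypothetical helpers, and the 0.5 km default mirrors the Model 3 radius:

```python
import numpy as np


def haversine_matrix(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise great-circle distances (km) between points given in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column vector against a row vector yields every pair (i, j)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    a = np.clip(a, 0.0, 1.0)  # guard against rounding just outside [0, 1]
    return 2 * radius_km * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


def neighbor_map(lot_ids, lat_deg, lon_deg, max_km=0.5):
    """Map each lot id to the other lot ids within max_km of it."""
    ids = list(lot_ids)
    dist = haversine_matrix(lat_deg, lon_deg)
    return {
        ids[i]: [ids[j] for j in np.flatnonzero(dist[i] <= max_km) if j != i]
        for i in range(len(ids))
    }


# Hypothetical usage on the notebook's data (one row per lot):
# lots = df.drop_duplicates('SystemCodeNumber')
# neighbors = neighbor_map(lots['SystemCodeNumber'], lots['Latitude'], lots['Longitude'])
```

For 14 lots this is a 14 × 14 matrix, independent of how many rows the dataset has.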


In [11]:
# Create placeholder for Model 3 price
prices_model3 = []

for idx, row in df.iterrows():
    lat1, lon1 = row['Latitude'], row['Longitude']
    this_price = row['Price_Model2']
    nearby_prices = []

    # Compare with every row of every other lot (all timestamps, not only the current one)
    for _, other in df.iterrows():
        if row['SystemCodeNumber'] == other['SystemCodeNumber']:
            continue  # skip self
        dist = haversine(lat1, lon1, other['Latitude'], other['Longitude'])
        if dist <= 0.5:  # within 0.5 km
            nearby_prices.append(other['Price_Model2'])

    # Competitive adjustment logic
    if len(nearby_prices) > 0:
        avg_comp_price = np.mean(nearby_prices)

        # If full and own price > competitors → decrease price
        if row['Occupancy'] >= row['Capacity'] and this_price > avg_comp_price:
            this_price *= 0.95

        # If competitors more expensive → increase price
        elif this_price < avg_comp_price:
            this_price *= 1.05

    prices_model3.append(min(max(this_price, 5), 20))  # clip to [5, 20]

df['Price_Model3'] = prices_model3
df[['SystemCodeNumber', 'Price_Model2', 'Price_Model3']].head()


|  | SystemCodeNumber | Price_Model2 | Price_Model3 |
|---|---|---|---|
| 0 | BHMBCCMKT01 | 12.344772 | 12.344772 |
| 1 | BHMBCCMKT01 | 12.35517 | 12.35517 |
| 2 | BHMBCCMKT01 | 12.543963 | 12.543963 |
| 3 | BHMBCCMKT01 | 12.637551 | 12.637551 |
| 4 | BHMBCCMKT01 | 11.786597 | 11.786597 |
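The loop above compares each row against every row of every other lot across all timestamps, which is quadratic in the number of rows. A swapped-in variant (not the notebook's method) compares a lot only with its neighbors' prices at the same `Timestamp`, using a precomputed neighbor dictionary such as one built from the unique lot coordinates. `model3_same_time` is a hypothetical helper; it assumes lots report at shared timestamps. If they do not, round `Timestamp` to a common interval first (for example with `.dt.floor('30min')`).

```python
import numpy as np
import pandas as pd


def model3_same_time(df, neighbors, lo=5.0, hi=20.0):
    """Model 3 rules, comparing each row only with neighbor prices at the same Timestamp.

    df:        columns SystemCodeNumber, Timestamp, Occupancy, Capacity, Price_Model2
    neighbors: dict mapping lot id -> list of lot ids within the competition radius
    """
    # One row per (lot, neighbor) pair
    pairs = pd.DataFrame(
        [(lot, nb) for lot, nbs in neighbors.items() for nb in nbs],
        columns=['SystemCodeNumber', 'Neighbor'],
    )
    # Neighbor prices keyed by neighbor id and timestamp
    comp = df[['SystemCodeNumber', 'Timestamp', 'Price_Model2']].rename(
        columns={'SystemCodeNumber': 'Neighbor', 'Price_Model2': 'CompPrice'}
    )
    # Average competitor price per lot at each timestamp
    avg = (
        pairs.merge(comp, on='Neighbor')
        .groupby(['SystemCodeNumber', 'Timestamp'], as_index=False)['CompPrice']
        .mean()
    )
    # Left join keeps every row of df in order; CompPrice is NaN without competitors
    out = df.merge(avg, on=['SystemCodeNumber', 'Timestamp'], how='left')
    price = out['Price_Model2'].to_numpy(dtype=float)
    comp_price = out['CompPrice'].to_numpy(dtype=float)
    full = (out['Occupancy'] >= out['Capacity']).to_numpy()

    has_comp = ~np.isnan(comp_price)
    cut = has_comp & full & (price > comp_price)   # full and pricier: -5%
    raise_ = has_comp & (price < comp_price)       # cheaper than rivals: +5%
    adjusted = np.where(cut, price * 0.95, np.where(raise_, price * 1.05, price))
    return pd.Series(np.clip(adjusted, lo, hi), index=df.index, name='Price_Model3')
```

The two merges replace the nested `iterrows()` loops, and the ±5% rules are the same as in the cell above.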


In [12]:
# Compare all model prices
df[['SystemCodeNumber', 'OccupancyRate', 'Price_Model1', 'Price_Model2', 'Price_Model3']].head(10)


|  | SystemCodeNumber | OccupancyRate | Price_Model1 | Price_Model2 | Price_Model3 |
|---|---|---|---|---|---|
| 0 | BHMBCCMKT01 | 0.105719 | 10.528596 | 12.344772 | 12.344772 |
| 1 | BHMBCCMKT01 | 0.110919 | 11.083189 | 12.35517 | 12.35517 |
| 2 | BHMBCCMKT01 | 0.138648 | 11.77643 | 12.543963 | 12.543963 |
| 3 | BHMBCCMKT01 | 0.185442 | 12.70364 | 12.637551 | 12.637551 |
| 4 | BHMBCCMKT01 | 0.259965 | 14.003466 | 11.786597 | 11.786597 |
| 5 | BHMBCCMKT01 | 0.306759 | 15.537262 | 13.013518 | 13.013518 |
| 6 | BHMBCCMKT01 | 0.379549 | 17.435009 | 12.559099 | 12.559099 |
| 7 | BHMBCCMKT01 | 0.428076 | 19.57539 | 12.665676 | 12.665676 |
| 8 | BHMBCCMKT01 | 0.448873 | 20.0 | 12.707271 | 12.707271 |
| 9 | BHMBCCMKT01 | 0.461005 | 20.0 | 10.988677 | 10.988677 |


## 📊 Visualization with Bokeh

We now visualize pricing behavior using **Bokeh**:

- 📈 Time-series plots for Model 1, 2, and 3
- 🔄 Comparison of price fluctuations for a selected parking lot


In [13]:
# If you're in Colab or need Bokeh for the first time, uncomment and run this:
# !pip install bokeh

from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.layouts import column
from bokeh.palettes import Category10

output_notebook()


### 🕓 Combine Date and Time

We'll convert the `LastUpdatedDate` and `LastUpdatedTime` columns into a single datetime column to allow proper time-series plotting.


In [14]:
# Convert string date and time into a datetime format
df['Timestamp'] = pd.to_datetime(
    df['LastUpdatedDate'] + ' ' + df['LastUpdatedTime'],
    format='%d-%m-%Y %H:%M:%S',
    errors='coerce'
)

# sort_values places unparseable (NaT) timestamps last, so forward-fill them from the
# preceding row (fillna(method=...) is deprecated in recent pandas)
df = df.sort_values('Timestamp')
df['Timestamp'] = df['Timestamp'].ffill()




### 📈 Visualization Function

This function generates an interactive Bokeh line chart comparing the prices from Model 1, 2, and 3 for any given parking lot.


In [15]:
def plot_price_models_for_lot(lot_code):
    subset = df[df['SystemCodeNumber'] == lot_code]

    source = ColumnDataSource(data={
        'time': subset['Timestamp'],
        'model1': subset['Price_Model1'],
        'model2': subset['Price_Model2'],
        'model3': subset['Price_Model3']
    })

    p = figure(title=f"Dynamic Pricing Over Time – {lot_code}",
               x_axis_label='Time', y_axis_label='Price ($)',
               x_axis_type='datetime', width=800, height=400)

    p.line(x='time', y='model1', source=source, line_width=2, color=Category10[3][0], legend_label='Model 1')
    p.line(x='time', y='model2', source=source, line_width=2, color=Category10[3][1], legend_label='Model 2')
    p.line(x='time', y='model3', source=source, line_width=2, color=Category10[3][2], legend_label='Model 3')

    hover = HoverTool(
        tooltips=[
            ('Time', '@time{%F %T}'),
            ('Model 1', '@model1'),
            ('Model 2', '@model2'),
            ('Model 3', '@model3')
        ],
        formatters={'@time': 'datetime'},
        mode='vline'
    )
    p.add_tools(hover)
    p.legend.location = 'top_left'
    return p


### 📍 View Plot for a Sample Parking Lot

Use any lot code from the dataset to display a Bokeh line chart.


In [16]:
# Pick a parking lot code (first one by default)
sample_lot = df['SystemCodeNumber'].unique()[0]

# Show interactive Bokeh plot
show(plot_price_models_for_lot(sample_lot))


## 📊 Visualization – Price Trends Over Time

We use Bokeh to visualize pricing behavior across the 3 models for a selected parking lot.

**Color Code:**
- 🔵 Model 1
- 🟠 Model 2
- 🟢 Model 3

Observe how the models react to changing demand and context.


## ⏱ Real-Time Streaming with Pathway

Pathway is used to simulate real-time ingestion of parking lot records. We define a schema matching the preprocessed stream file, and Pathway processes records incrementally as they are read rather than as one batch computation over the whole DataFrame.

This lets us apply the pricing logic as if new data were arriving live, mimicking real-world sensor input.


In [17]:
# Only run this once in Colab or local environment
!pip install pathway



In [32]:
import pathway as pw
import pandas as pd


In [33]:
# Define the input schema for each parking lot entry
class ParkingRecord(pw.Schema):
    timestamp: str
    parking_lot_id: str
    latitude: float
    longitude: float
    capacity: int
    occupancy: int
    queue_length: int
    vehicle_type: str
    traffic: float
    is_special_day: int


### 🧮 Defining Pricing Logic as a User-Defined Function (UDF)

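We use a UDF to calculate the price for each streaming record. It applies a streaming variant of Model 2 (demand-based pricing): because the minimum and maximum of a stream are not known in advance, queue length and traffic are scaled by a fixed constant (÷10) instead of the dataset-wide min-max normalization used in the batch version, so streamed prices will not exactly match `Price_Model2`.

The function returns a price that is:
- recomputed from each record's occupancy, queue, traffic, event flag, and vehicle type
- based on a demand score clamped to [0, 1]
- bounded between $5 and $20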


In [35]:
@pw.udf
def compute_price(occupancy: int, capacity: int, queue_length: int,
                  traffic: float, is_special_day: int, vehicle_type: str) -> float:
    # Safely calculate occupancy rate
    occ_rate = occupancy / capacity if capacity > 0 else 0.0

    # Scale queue length and traffic by fixed constants (stream min/max are unknown)
    queue_norm = min(queue_length / 10, 1)
    traffic_norm = min(traffic / 10, 1)

    # Map vehicle type to weight
    vehicle_weight = {'bike': 0.5, 'car': 1.0, 'truck': 1.5}.get(vehicle_type.lower(), 1.0)

    # Linear demand score (all weights = 1, traffic subtracted)
    demand = occ_rate + queue_norm - traffic_norm + is_special_day + vehicle_weight

    # Normalize demand to [0, 1]
    demand = max(0.0, min(demand / 5, 1.0))

    # Apply price formula with clipping to [5, 20]
    price = 10 * (1 + demand)
    return round(min(max(price, 5), 20), 2)
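The pricing arithmetic can be checked without starting a Pathway pipeline by keeping an undecorated copy of it. A minimal sketch; `compute_price_plain` is a hypothetical name, and it repeats the UDF's logic with the same six inputs:

```python
def compute_price_plain(occupancy: int, capacity: int, queue_length: int,
                        traffic: float, is_special_day: int, vehicle_type: str) -> float:
    """Same arithmetic as the compute_price UDF, as an ordinary function for testing."""
    occ_rate = occupancy / capacity if capacity > 0 else 0.0
    queue_norm = min(queue_length / 10, 1)   # fixed scale: a queue of 10+ maps to 1
    traffic_norm = min(traffic / 10, 1)      # traffic scores 2 / 5 / 9 -> 0.2 / 0.5 / 0.9
    vehicle_weight = {'bike': 0.5, 'car': 1.0, 'truck': 1.5}.get(vehicle_type.lower(), 1.0)
    demand = occ_rate + queue_norm - traffic_norm + is_special_day + vehicle_weight
    demand = max(0.0, min(demand / 5, 1.0))  # clamp to [0, 1]
    price = 10 * (1 + demand)
    return round(min(max(price, 5), 20), 2)
```

For the first raw record (occupancy 61, capacity 577, queue 1, low traffic mapped to 2, car) this returns 12.01, while batch Model 2 gave 12.34 for the same row; the gap comes from the fixed ÷10 scaling of queue and traffic.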


### 📡 Streaming Data Ingestion

We now read the stream file with `pw.io.csv.read` in streaming mode, so Pathway keeps watching the input for new data instead of reading it once as a static batch.

Each ingested row is processed as it arrives and passed to our pricing function.
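Streaming mode does not, by itself, pace rows by their timestamps. To get a feel for "one record at a time, in time order" without Pathway, a plain-Python replay generator is enough. A minimal sketch, assuming the stream file's timestamps are ISO-like strings (`YYYY-MM-DD HH:MM:SS`, as pandas writes datetimes), which sort correctly as text; `replay_rows` and the sample lot ids are hypothetical:

```python
import csv
import io
import time
from typing import Dict, Iterator, TextIO


def replay_rows(f: TextIO, time_key: str = 'timestamp',
                delay_s: float = 0.0) -> Iterator[Dict[str, str]]:
    """Yield CSV rows one at a time in timestamp order, pausing delay_s between rows."""
    rows = sorted(csv.DictReader(f), key=lambda r: r[time_key])
    for row in rows:
        yield row
        if delay_s:
            time.sleep(delay_s)  # simulate the gap between sensor updates


sample = io.StringIO(
    "timestamp,parking_lot_id\n"
    "2016-10-04 08:25:00,LOT_B\n"
    "2016-10-04 07:59:00,LOT_A\n"
)
for rec in replay_rows(sample):
    print(rec['timestamp'], rec['parking_lot_id'])
```

With a real file, `open('dataset_stream.csv', newline='')` can be passed instead of the in-memory sample.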


In [34]:
# Stream input CSV (real-time simulation mode)
input_table = pw.io.csv.read(
    'dataset_stream.csv',   # Created by the conversion cell (In [28]); run that cell first
    schema=ParkingRecord,
    mode='streaming'        # Keep watching the input instead of a one-off static read
)


## 📤 Output: Real-Time Price Calculation and Stream

We now compute the dynamic price for each incoming record using the `compute_price()` function.  
Instead of passing the whole record, we explicitly pass each required feature as a separate parameter:

- Occupancy and Capacity → for occupancy rate
- Queue Length
- Traffic Score
- Special Event Indicator
- Vehicle Type → to adjust base cost

The output stream contains only the `timestamp`, `parking_lot_id`, and calculated `price`.

This output can be written to a JSON file or pushed to a real-time dashboard.
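A downstream consumer, such as a dashboard, usually only needs the latest price per lot. A minimal sketch of reading a JSON-lines output file, assuming each line is a JSON object with at least `parking_lot_id` and `price`, and that lines appear in processing order. Pathway's writers may also add bookkeeping fields such as `time` and `diff`; this sketch skips records with a negative `diff` (retractions) and ignores the rest. `latest_prices` is a hypothetical helper:

```python
import io
import json
from typing import Dict, TextIO


def latest_prices(f: TextIO) -> Dict[str, float]:
    """Return the last seen price per parking lot from a JSON-lines stream."""
    latest: Dict[str, float] = {}
    for line in f:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get('diff', 1) < 0:
            continue  # skip retractions, if the writer emits them
        latest[rec['parking_lot_id']] = rec['price']
    return latest


sample = io.StringIO('{"parking_lot_id": "LOT_A", "price": 10.5, "diff": 1}\n')
print(latest_prices(sample))
```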


In [38]:
# ✅ Apply compute_price to each row, passing the feature columns explicitly
output_table = input_table.select(
    timestamp=pw.this.timestamp,              # Original timestamp
    parking_lot_id=pw.this.parking_lot_id,    # Unique parking lot ID

    # Compute price with the UDF; arguments follow its parameter order
    price=compute_price(
        pw.this.occupancy,                    # Current occupancy
        pw.this.capacity,                     # Total capacity
        pw.this.queue_length,                 # Vehicles in queue
        pw.this.traffic,                      # Traffic score (2 / 5 / 9, scaled inside the UDF)
        pw.this.is_special_day,               # Event/holiday flag
        pw.this.vehicle_type,                 # Vehicle type (car, bike, truck)
    ),
)


In [27]:
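# Write every computed price to a JSON-lines file (file name chosen for this example);
# without an output connector the pipeline has nothing to deliver
pw.io.jsonlines.write(output_table, "prices_stream.jsonl")

# Launch the real-time simulation (in streaming mode this keeps running until interrupted)
pw.run()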



In [28]:
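# Prerequisite for the Pathway ingestion cell above: convert the original CSV into
# the ParkingRecord layout and save it as dataset_stream.csv (note: this reassigns df)
df = pd.read_csv("dataset.csv")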

df_out = pd.DataFrame({
    "timestamp": pd.to_datetime(df["LastUpdatedDate"] + ' ' + df["LastUpdatedTime"],
                                format="%d-%m-%Y %H:%M:%S", errors='coerce'),
    "parking_lot_id": df["SystemCodeNumber"],
    "latitude": df["Latitude"],
    "longitude": df["Longitude"],
    "capacity": df["Capacity"],
    "occupancy": df["Occupancy"],
    "queue_length": df["QueueLength"],
    "vehicle_type": df["VehicleType"],
    "traffic": df["TrafficConditionNearby"].map({"low": 2, "medium": 5, "high": 9}),
    "is_special_day": df["IsSpecialDay"]
})

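# Drop rows with unparseable timestamps or unmapped traffic levels, order by time, and save
df_out = df_out.dropna().sort_values("timestamp")
df_out.to_csv("dataset_stream.csv", index=False)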


---

## ✅ Conclusion

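- Model 1: Easy to interpret, but naive: it ignores context, and because each update adds α × OccupancyRate to the previous price, it never decreases (in the comparison table above it hits the $20 cap by the ninth record)
- Model 2: Reacts to occupancy, queue length, traffic, special days, and vehicle type in every record
- Model 3: Designed to be the most realistic, adding location-aware competitive adjustments on top of Model 2 (in the rows shown above no adjustment was triggered, so its prices equal Model 2's)

✅ All models ensure price stays between $5 and $20  
📈 The Pathway pipeline computes a price for each record as it is ingested  
📊 The Bokeh plots make each model's price trajectory easy to compare for a given lot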

---

## 🔮 Future Enhancements

- Learn feature weights (α–ε) using regression or ML
- Integrate live traffic/event APIs
- Use RL for long-term optimization
- Build dashboard/web app interface

---
