# Dynamic Pricing for Urban Parking Lots - Capstone Project

### 1. Project Setup and Dependencies
First, install the necessary libraries: `pathway` for building the real-time data pipeline, and `bokeh` together with `panel` for visualization in the notebook.

In [1]:
!pip install pathway bokeh panel --quiet

### 2. Importing Libraries and Defining the Data Schema
Next, we import the required modules and define a schema for our input data. The schema ensures that our data pipeline correctly interprets the data types from the CSV file.

In [4]:
import pathway as pw
from datetime import datetime
from pathway.stdlib.ml.index import KNNIndex
# Define the schema for our parking data
class ParkingData(pw.Schema):
    ParkingSpaceID: int
    Timestamp: str
    Occupancy: int
    Capacity: int
    IsSpecialDay: bool
    TrafficConditionNearby: str
    CompetitorPrice: float

ModuleNotFoundError: No module named 'pathway.stdlib'
This is not the real Pathway package.
Visit https://pathway.com/developers/ to get Pathway.
Already tried that? Visit https://pathway.com/troubleshooting/ to get help.
Note: your platform is Windows-11-10.0.26100-SP0, your Python is CPython 3.13.5.

### 3. Building the Real-Time Data Pipeline
We now build the data pipeline with Pathway. It reads `dataset.csv` as a stream, parses the timestamps, and feeds the pricing models defined in the following sections.

In [None]:
# Read the data from the CSV file as a real-time stream
parking_stream = pw.io.csv.read(
    './dataset.csv',
    schema=ParkingData,
    mode='streaming',
    autocommit_duration_ms=1000,
)
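# Parse the timestamp strings into datetimes so they can drive time-based windows.
# Using the built-in .dt.strptime keeps the column typed as a datetime,
# whereas pw.apply with a plain lambda leaves the result type unspecified.
parking_stream = parking_stream.with_columns(
    Timestamp=parking_stream.Timestamp.dt.strptime('%Y-%m-%d %H:%M:%S')
)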
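
As a quick sanity check outside the pipeline, the format string can be tried on a single value with the standard library. This is a minimal sketch; the sample timestamp and the helper `matches_format` are made up for illustration and are not taken from `dataset.csv`.

```python
from datetime import datetime

TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'

# Hypothetical sample value, used only to exercise the format string
sample = '2016-10-04 07:59:00'
parsed = datetime.strptime(sample, TIMESTAMP_FORMAT)
print(parsed.isoformat())


def matches_format(text, fmt=TIMESTAMP_FORMAT):
    """Return True if text parses with fmt, False otherwise."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False
```

If the CSV uses a different layout (for example day-first dates), `matches_format` returns `False` and the format string needs adjusting before the stream is parsed.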

### 4. Model 1: Baseline Linear Model
We start with a simple linear model where the price increases with occupancy. This serves as a baseline to compare against more advanced models.

In [None]:
# Define the baseline pricing function
def baseline_price_model(occupancy, capacity):
    base_price = 10.0
    alpha = 5.0  # Sensitivity factor
    
    if capacity == 0:
        return base_price
    
    occupancy_ratio = occupancy / capacity
    price = base_price + alpha * occupancy_ratio
    return round(price, 2)

# Apply the baseline pricing model
baseline_prices = parking_stream.with_columns(
    price=pw.apply(baseline_price_model, parking_stream.Occupancy, parking_stream.Capacity)
)

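# Visualize the baseline prices with Pathway's Table.plot and Bokeh
import bokeh.plotting
import panel as pn

pn.extension()  # enable Panel rendering in the notebook

def baseline_plotter(source):
    fig = bokeh.plotting.figure(
        title='Model 1: Baseline Linear Pricing',
        x_axis_label='Time',
        y_axis_label='Price (units)',
        x_axis_type='datetime',
    )
    fig.line('Timestamp', 'price', source=source)
    return fig

baseline_viz = baseline_prices.plot(baseline_plotter, sorting_col='Timestamp')
pn.Column(baseline_viz).servable()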
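
To make the linear relationship concrete, here is a small standalone sketch of the same formula evaluated at a few occupancy levels. The function name `baseline_price` and the capacity of 500 are illustrative assumptions; the base price of 10 and sensitivity of 5 come from the model above.

```python
def baseline_price(occupancy, capacity, base_price=10.0, alpha=5.0):
    """Price grows linearly with the occupancy ratio."""
    if capacity == 0:
        return base_price
    return round(base_price + alpha * occupancy / capacity, 2)


# Hypothetical lot with 500 spaces: empty, half full, and full
for occupancy in (0, 250, 500):
    print(occupancy, baseline_price(occupancy, 500))
```

With these parameters the price ranges from 10.0 for an empty lot to 15.0 for a full one, so `alpha` sets the maximum premium over the base price.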

### 5. Model 2: Demand-Based Price Function
Now, we implement a more advanced model that considers demand volatility, traffic, special days, and competitor pricing.

In [None]:
from datetime import timedelta

# Group each parking space's readings into 24-hour tumbling windows and
# aggregate min and max occupancy per window.
daily_occupancy_stats = parking_stream.windowby(
    parking_stream.Timestamp,
    window=pw.temporal.tumbling(duration=timedelta(hours=24)),
    instance=parking_stream.ParkingSpaceID,
).reduce(
    ParkingSpaceID=pw.this._pw_instance,
    time=pw.this._pw_window_end,  # window end, used as the x-axis when plotting
    min_occupancy=pw.reducers.min(pw.this.Occupancy),
    max_occupancy=pw.reducers.max(pw.this.Occupancy),
    capacity=pw.reducers.max(pw.this.Capacity),  # Capacity is constant for a space
    # latest() keeps the most recently processed value in each window
    is_special_day=pw.reducers.latest(pw.this.IsSpecialDay),
    traffic_condition=pw.reducers.latest(pw.this.TrafficConditionNearby),
    competitor_price=pw.reducers.latest(pw.this.CompetitorPrice),
)
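
The idea behind a tumbling window can be sketched in plain Python: every timestamp falls into exactly one fixed-length, non-overlapping bucket, and the min and max are tracked per bucket. This is only an illustration under stated assumptions: the helper `window_start`, the alignment of windows to midnight via the `origin` argument, and the sample readings are all hypothetical and do not describe Pathway's internals.

```python
from datetime import datetime, timedelta


def window_start(ts, duration=timedelta(hours=24), origin=datetime(1970, 1, 1)):
    """Return the start of the tumbling window that contains ts."""
    steps = (ts - origin) // duration  # whole windows elapsed since origin
    return origin + steps * duration


# Made-up (timestamp, occupancy) readings for a single parking space
readings = [
    (datetime(2016, 10, 4, 7, 59), 61),
    (datetime(2016, 10, 4, 16, 30), 264),
    (datetime(2016, 10, 5, 8, 0), 120),
]

# Track (min_occupancy, max_occupancy) per window
stats = {}
for ts, occupancy in readings:
    start = window_start(ts)
    low, high = stats.get(start, (occupancy, occupancy))
    stats[start] = (min(low, occupancy), max(high, occupancy))

for start, (low, high) in sorted(stats.items()):
    print(start.date(), low, high)
```

The first two readings land in the same 24-hour window, so that window's spread is 264 - 61; the third reading opens a new window with a spread of zero.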

#### Demand Function
Our dynamic pricing model now incorporates multiple factors to determine the price:

*   **Occupancy Volatility**: `(MaxOccupancy - MinOccupancy) / Capacity` - This captures the daily demand fluctuation.
*   **Traffic Conditions**: A multiplier is applied based on nearby traffic (`Low`: 1.0x, `Medium`: 1.2x, `High`: 1.5x).
*   **Special Days**: A surcharge of 5 units is added on special days (e.g., holidays, events).
*   **Competitor Pricing**: The final price is adjusted to be competitive, staying within a 10% range of the competitor's price.

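Starting from a `BasePrice` of 10, the price is computed as `(BasePrice + 10 * OccupancyVolatility + SpecialDaySurcharge) * TrafficMultiplier` and then clamped to between 90% and 110% of the competitor's price. Because the surcharge is added before the traffic multiplier is applied, it is scaled by that multiplier as well.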

#### Assumptions
1.  **Data represents demand**: `Occupancy` is a proxy for demand.
2.  **Factor Importance**: The chosen multipliers and surcharges reflect the assumed importance of each factor.
3.  **Competitor Influence**: We aim to be competitive, not necessarily the cheapest.
4.  **Real-time simulation**: The CSV data is streamed to simulate a live environment.

In [None]:
# Define the pricing function
def calculate_price(max_occupancy, min_occupancy, capacity, is_special_day, traffic_condition, competitor_price):
    base_price = 10.0

    if capacity == 0:
        return base_price

    # Demand factor based on occupancy volatility
    demand_factor = (max_occupancy - min_occupancy) / capacity

    # Traffic condition multiplier
    traffic_multiplier = 1.0
    if traffic_condition == 'Medium':
        traffic_multiplier = 1.2
    elif traffic_condition == 'High':
        traffic_multiplier = 1.5

    # Special day surcharge
    special_day_surcharge = 0.0
    if is_special_day:
        special_day_surcharge = 5.0

    # Calculate price based on our factors
    price = base_price + (demand_factor * 10) + special_day_surcharge
    price *= traffic_multiplier

    # Adjust based on competitor price, ensuring our price is competitive
    if price > competitor_price * 1.1:
        price = competitor_price * 1.1
    elif price < competitor_price * 0.9:
        price = competitor_price * 0.9
    
    return round(price, 2)

# Apply the pricing model to our aggregated data
daily_prices = daily_occupancy_stats.with_columns(
    price=pw.apply(calculate_price, 
                   daily_occupancy_stats.max_occupancy, 
                   daily_occupancy_stats.min_occupancy, 
                   daily_occupancy_stats.capacity,
                   daily_occupancy_stats.is_special_day,
                   daily_occupancy_stats.traffic_condition,
                   daily_occupancy_stats.competitor_price)
)
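
A few hand-checkable cases help show how the factors interact, in particular how often the competitor band decides the final price. The sketch below restates the formula compactly in a standalone function; the name `sketch_price`, the lookup table, and all input values are illustrative assumptions rather than rows from the dataset.

```python
TRAFFIC_MULTIPLIER = {'Low': 1.0, 'Medium': 1.2, 'High': 1.5}


def sketch_price(max_occ, min_occ, capacity, is_special_day,
                 traffic_condition, competitor_price, base_price=10.0):
    """Compact restatement of the demand-based formula, for illustration."""
    if capacity == 0:
        return base_price
    volatility = (max_occ - min_occ) / capacity
    price = base_price + 10 * volatility + (5.0 if is_special_day else 0.0)
    price *= TRAFFIC_MULTIPLIER.get(traffic_condition, 1.0)
    # Clamp to within 10% of the competitor's price
    price = min(max(price, competitor_price * 0.9), competitor_price * 1.1)
    return round(price, 2)


# Hypothetical lot with 500 spaces and a competitor charging 15 units
ordinary = sketch_price(400, 100, 500, False, 'Low', 15.0)   # inside the band
busy = sketch_price(400, 100, 500, True, 'High', 15.0)       # capped at +10%
flat = sketch_price(100, 100, 500, False, 'Low', 15.0)       # raised to -10%
print(ordinary, busy, flat)
```

In the busy case the unclamped price would be (10 + 6 + 5) * 1.5 = 31.5, so the cap at 110% of the competitor's price is what determines the result. If the demand terms should have more influence, the width of the band is the parameter to revisit.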

### 6. Visualization for Model 2
This visualization shows the prices calculated by the more advanced demand-based model.

In [None]:
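import bokeh.plotting
import panel as pn

# Enable Panel rendering so the Bokeh plot displays in the notebook
pn.extension()

# Build the Bokeh figure that Pathway refreshes as new windows arrive
def demand_plotter(source):
    fig = bokeh.plotting.figure(
        title='Model 2: Demand-Based Dynamic Pricing',
        x_axis_label='Time',
        y_axis_label='Price (units)',
        x_axis_type='datetime',
        width=800,
        height=400,
    )
    fig.line('time', 'price', source=source)
    return fig

# Set up the real-time visualization, ordering points by the 'time' column
daily_viz = daily_prices.plot(demand_plotter, sorting_col='time')
pn.Column(daily_viz).servable()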

### 7. Running the Pipelines
Finally, we run the pipelines. This starts data streaming, processing, and visualization for both models. The two Bokeh plots update in real time as new data arrives.

In [None]:
# Run the pipeline. This is a blocking call and will run indefinitely.
# In a notebook, you may need to interrupt the kernel to stop it.
pw.run()