In [1]:
import pandas as pd
import numpy as np
import io
from google.colab import files
uploaded = files.upload()

Saving dataset.csv to dataset.csv


In [2]:
filename = list(uploaded.keys())[0]
df = pd.read_csv(io.BytesIO(uploaded[filename]))
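# Note: a notebook cell only echoes its last expression; df.info() prints explicitly
df.head()          # preview of the first rows
df.info()          # column names, dtypes, and non-null counts
df.shape           # (rows, columns)
df.isnull().sum()  # null count per column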

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18368 entries, 0 to 18367
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      18368 non-null  int64  
 1   SystemCodeNumber        18368 non-null  object 
 2   Capacity                18368 non-null  int64  
 3   Latitude                18368 non-null  float64
 4   Longitude               18368 non-null  float64
 5   Occupancy               18368 non-null  int64  
 6   VehicleType             18368 non-null  object 
 7   TrafficConditionNearby  18368 non-null  object 
 8   QueueLength             18368 non-null  int64  
 9   IsSpecialDay            18368 non-null  int64  
 10  LastUpdatedDate         18368 non-null  object 
 11  LastUpdatedTime         18368 non-null  object 
dtypes: float64(2), int64(5), object(5)
memory usage: 1.7+ MB


Unnamed: 0,0
ID,0
SystemCodeNumber,0
Capacity,0
Latitude,0
Longitude,0
Occupancy,0
VehicleType,0
TrafficConditionNearby,0
QueueLength,0
IsSpecialDay,0


## Loading and Exploring the Dataset

We successfully loaded the dataset, which contains **18,368 entries** across **12 columns**.

### Key Observations:

- Each record corresponds to a unique timestamped update from a parking lot.
- The columns are categorized as follows:


#### Parking Lot Features
- **Capacity**: Total number of parking spaces in the lot
- **Occupancy**: Number of currently occupied spaces
- **QueueLength**: Vehicles waiting to enter the lot

#### Vehicle and Environment Information
- **VehicleType**: Incoming vehicle type (e.g., Car, Bike, Truck)
- **TrafficConditionNearby**: Text description of congestion
- **IsSpecialDay**: 1 if it’s a holiday or event day, else 0

#### Timestamps and Identifiers
- **LastUpdatedDate**, **LastUpdatedTime**: Used to order records chronologically when simulating the real-time stream
- **SystemCodeNumber**: Acts as a unique parking lot identifier
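
To order the stream chronologically, the two text columns can be combined into a single datetime. The sketch below assumes a day-first `DD-MM-YYYY` date and an `HH:MM:SS` time, which this notebook does not confirm; the sample rows and the lot codes `LOT_A` and `LOT_B` are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; the real date/time formats may differ.
sample = pd.DataFrame({
    "SystemCodeNumber": ["LOT_A", "LOT_A", "LOT_B"],
    "LastUpdatedDate": ["04-10-2016", "04-10-2016", "04-10-2016"],
    "LastUpdatedTime": ["08:25:42", "07:59:42", "07:59:42"],
})

# Join the two text columns and parse them with an explicit (assumed) format.
sample["Timestamp"] = pd.to_datetime(
    sample["LastUpdatedDate"] + " " + sample["LastUpdatedTime"],
    format="%d-%m-%Y %H:%M:%S",
)

# Sort by time first, then by lot, so simultaneous updates have a stable order.
ordered = sample.sort_values(["Timestamp", "SystemCodeNumber"]).reset_index(drop=True)
print(ordered[["Timestamp", "SystemCodeNumber"]])
```

Passing an explicit `format` avoids silent day/month swaps, but it should be adjusted to whatever the actual columns contain.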

### Data Quality

- No missing values in any column
- Data types are consistent with expected formats:
  - Numeric for counts and coordinates
  - Categorical/text for types and timestamps

The dataset is clean and ready for modeling.
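
A null count alone does not catch implausible values. As a minimal sketch, the checks below look for zero capacity, occupancy above capacity, and negative queue lengths; the `sample` frame is invented to trigger each check and does not reflect the actual dataset.

```python
import pandas as pd

# Invented rows, chosen so that each check finds exactly one problem.
sample = pd.DataFrame({
    "Capacity": [577, 577, 0, 120],
    "Occupancy": [61, 600, 0, 40],
    "QueueLength": [1, 2, 0, -1],
})

checks = {
    "missing_values": int(sample.isnull().sum().sum()),
    "zero_capacity": int((sample["Capacity"] <= 0).sum()),
    "over_capacity": int((sample["Occupancy"] > sample["Capacity"]).sum()),
    "negative_queue": int((sample["QueueLength"] < 0).sum()),
}
print(checks)
```

The zero-capacity check matters for the models that follow, because both divide `Occupancy` by `Capacity`.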


In [3]:
# Model 1: Linear Pricing Based on Occupancy

ALPHA = 2.0  # linear increase factor
BASE_PRICE = 10  # initial price

def linear_pricing(row, current_price=BASE_PRICE):
    occupancy_ratio = row['Occupancy'] / row['Capacity']
    price = current_price + ALPHA * occupancy_ratio
    return round(min(max(price, 5), 15), 2)

# Applying model to DataFrame
df['Price_Model_1'] = df.apply(linear_pricing, axis=1)

# Previewing results
df[['SystemCodeNumber', 'Occupancy', 'Capacity', 'Price_Model_1']].head()

Unnamed: 0,SystemCodeNumber,Occupancy,Capacity,Price_Model_1
0,BHMBCCMKT01,61,577,10.21
1,BHMBCCMKT01,64,577,10.22
2,BHMBCCMKT01,80,577,10.28
3,BHMBCCMKT01,107,577,10.37
4,BHMBCCMKT01,150,577,10.52


## Model 1 : Linear Pricing Based on Occupancy

We implemented a simple baseline pricing model in which the price increases linearly with the occupancy rate of each parking lot.

### Formula:
$$
\text{Price}_{t+1} = \text{Price}_t + \alpha \cdot \left( \frac{\text{Occupancy}}{\text{Capacity}} \right)
$$

### Where:
- $\text{Price}_t$: Current price (starts at \$10)
- $\alpha$: Tuning factor that controls how steeply price increases (we used 2.0)
- $\frac{\text{Occupancy}}{\text{Capacity}}$: Utilization ratio of the lot


- **Base Price**: \$10
- **Tuning Factor (α)**: 2.0
- **Price Range**: Clamped between \$5 and \$15

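This model simulates basic supply-demand behavior: higher occupancy drives prices up to manage congestion.

We applied this model to each row in the dataset, treating every record as an independent pricing event. Because every row starts from the base price rather than from the previous row's price, the formula reduces to Price = 10 + 2 × (Occupancy / Capacity). As long as occupancy does not exceed capacity, prices stay between \$10 and \$12, so the \$5–\$15 clamp never takes effect.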
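
Since each row is priced independently, the same result can be computed as plain column arithmetic instead of a row-wise `apply`. The sketch below re-creates the five preview rows by hand (values copied from the output above); `sample` is a stand-in name, not part of the notebook.

```python
import pandas as pd

ALPHA = 2.0      # linear increase factor, as in the notebook
BASE_PRICE = 10  # initial price, as in the notebook

# The five preview rows for lot BHMBCCMKT01.
sample = pd.DataFrame({
    "Occupancy": [61, 64, 80, 107, 150],
    "Capacity": [577] * 5,
})

# Vectorized form of linear_pricing: base + alpha * ratio, clamped and rounded.
ratio = sample["Occupancy"] / sample["Capacity"]
sample["Price_Model_1"] = (BASE_PRICE + ALPHA * ratio).clip(5, 15).round(2)
print(sample)
```

On a frame of this size the difference is negligible, but column arithmetic generally scales better than `apply(axis=1)` on larger data.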

In [4]:
# Converting all strings to lowercase
df['VehicleType'] = df['VehicleType'].str.lower()
df['TrafficConditionNearby'] = df['TrafficConditionNearby'].str.lower()

In [5]:
# Model 2 : Demand-Based Pricing

# Step 1 : Mapping vehicle weights and traffic levels
vehicle_weights = {
    'bike': 0.3,
    'car': 0.6,
    'truck': 1.0,
    'cycle': 0.3  # treat like bike
}

traffic_weights = {
    'low': 0.2,
    'medium': 0.5,
    'high': 0.9,
    'average': 0.4
}

df['VehicleWeight'] = df['VehicleType'].map(vehicle_weights)
df['Traffic'] = df['TrafficConditionNearby'].map(traffic_weights)

# Step 2: Defining parameters
params = {
    'alpha': 1.0,
    'beta': 0.4,
    'gamma': 0.7,
    'delta': 0.5,
    'epsilon': 1.2,
    'lambda': 0.8,
    'base_price': 10
}

# Step 3: Calculating Demand Score
def compute_demand(row):
    occ_ratio = row['Occupancy'] / row['Capacity']
    demand = (
        params['alpha'] * occ_ratio +
        params['beta'] * row['QueueLength'] -
        params['gamma'] * row['Traffic'] +
        params['delta'] * row['IsSpecialDay'] +
        params['epsilon'] * row['VehicleWeight']
    )
    return demand

df['Raw_Demand'] = df.apply(compute_demand, axis=1)

# Step 4: Normalize demand (0–1)
min_d, max_d = df['Raw_Demand'].min(), df['Raw_Demand'].max()
df['Norm_Demand'] = (df['Raw_Demand'] - min_d) / (max_d - min_d + 1e-6)

# Step 5: Final Price
df['Price_Model_2'] = params['base_price'] * (1 + params['lambda'] * df['Norm_Demand'])
df['Price_Model_2'] = df['Price_Model_2'].clip(5, 15).round(2)


# Preview
df[['SystemCodeNumber', 'Occupancy', 'QueueLength', 'VehicleType', 'IsSpecialDay', 'TrafficConditionNearby', 'Price_Model_2']].head()

Unnamed: 0,SystemCodeNumber,Occupancy,QueueLength,VehicleType,IsSpecialDay,TrafficConditionNearby,Price_Model_2
0,BHMBCCMKT01,61,1,car,0,low,10.94
1,BHMBCCMKT01,64,1,car,0,low,10.94
2,BHMBCCMKT01,80,2,car,0,low,11.41
3,BHMBCCMKT01,107,2,car,0,low,11.47
4,BHMBCCMKT01,150,2,bike,0,low,11.15


## Model 2 : Demand-Based Pricing

### Formula:

Model 2 pricing depends on a demand score computed from multiple features:

#### Demand Function:
$$
\text{Demand} = \alpha \cdot \left( \frac{\text{Occupancy}}{\text{Capacity}} \right) + \beta \cdot \text{QueueLength} - \gamma \cdot \text{Traffic} + \delta \cdot \text{IsSpecialDay} + \epsilon \cdot \text{VehicleTypeWeight}
$$

#### Final Price Calculation:
$$
\text{Price}_{t} = \text{BasePrice} \times \left( 1 + \lambda \cdot \text{NormalizedDemand} \right)
$$



### Where:
- $\alpha = 1.0$: Weight for occupancy
- $\beta = 0.4$: Weight for queue length
- $\gamma = 0.7$: Weight for traffic (penalizes demand)
- $\delta = 0.5$: Boost for special days
- $\epsilon = 1.2$: Weight for vehicle type (e.g., truck > car > cycle)
- $\lambda = 0.8$: Scaling factor for how strongly demand affects price
- $\text{BasePrice}$ = \$10


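### Price is Clamped Between:
- **Minimum**: \$5  
- **Maximum**: \$15  

The clamp keeps prices within a realistic range. Because normalized demand lies between 0 and 1, the unclamped price lies between \$10 and \$18, so in practice only the upper bound takes effect.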
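
To see how the individual terms combine, the sketch below computes the raw demand score for the five preview rows, one column per term. The rows are re-created by hand from the previews above (capacity 577 comes from the Model 1 preview); `sample` and `terms` are illustrative names.

```python
import pandas as pd

# Weight tables and coefficients as defined in the notebook cell.
vehicle_weights = {"bike": 0.3, "car": 0.6, "truck": 1.0, "cycle": 0.3}
traffic_weights = {"low": 0.2, "average": 0.4, "medium": 0.5, "high": 0.9}

sample = pd.DataFrame({
    "Occupancy": [61, 64, 80, 107, 150],
    "Capacity": [577] * 5,
    "QueueLength": [1, 1, 2, 2, 2],
    "VehicleType": ["car", "car", "car", "car", "bike"],
    "IsSpecialDay": [0] * 5,
    "TrafficConditionNearby": ["low"] * 5,
})

# Each column holds one signed term of the demand formula.
terms = pd.DataFrame({
    "occupancy": 1.0 * sample["Occupancy"] / sample["Capacity"],
    "queue": 0.4 * sample["QueueLength"],
    "traffic": -0.7 * sample["TrafficConditionNearby"].map(traffic_weights),
    "special_day": 0.5 * sample["IsSpecialDay"],
    "vehicle": 1.2 * sample["VehicleType"].map(vehicle_weights),
})
terms["Raw_Demand"] = terms.sum(axis=1)
print(terms.round(3))
```

Between the second and third rows the queue term doubles from 0.4 to 0.8, which accounts for most of the jump in demand. In the last row the switch from car to bike halves the vehicle term from 0.72 to 0.36, consistent with the price drop seen in the preview. Turning these scores into prices still requires the dataset-wide minimum and maximum, which this five-row sample does not have.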

In [None]:
!pip install bokeh



In [6]:
# Importing necessary libraries
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource

output_notebook()

# Filtering Data for a Specific Lot
lot_id = "BHMBCCMKT01"

# Filtering rows for selected lot and reset index safely
lot_df = df[df['SystemCodeNumber'] == lot_id].reset_index(drop=True)

# Creating Bokeh-compatible data source
source = ColumnDataSource(data={
    'Time': lot_df.index,  # simple numeric index for time steps
    'Price': lot_df['Price_Model_2']
})
# Creating Bokeh Line Plot
p = figure(
    title=f"Model 2 Dynamic Price – Lot: {lot_id}",
    x_axis_label='Time Step',
    y_axis_label='Price ($)',
    width=800,
    height=400
)

p.line('Time', 'Price', source=source, line_width=2, color="navy", legend_label="Model 2 Price")
# scatter() draws circle markers by default; circle(size=...) is deprecated since Bokeh 3.4
p.scatter('Time', 'Price', source=source, size=6, color="orange")

p.legend.location = "top_left"
p.grid.grid_line_alpha = 0.3

show(p)



### Dynamic Price Visualization using Bokeh for Model 2

This interactive plot shows how the **Model 2 (Demand-Based)** price changes over time for a selected parking lot (**BHMBCCMKT01**). Each data point corresponds to a 30-minute interval from the real-time simulation dataset.

#### Key Insights:
- Price variation is smooth and bounded between **\$5 and \$15** as required.
- Demand is computed using a combination of features:
  - Occupancy rate
  - Queue length
  - Traffic conditions
  - Special day indicator
  - Vehicle type

#### Plot Details:
- **Blue line**: Represents the price calculated at each time step using the demand-based formula.
- **Orange circles**: Represent individual time step values (for clearer visibility of variations).

This visualization confirms that the model reacts dynamically to real-time features and maintains realistic price fluctuations.

In [7]:
# Replace with any lot ID you'd like to visualize
lot_id = "BHMBCCMKT01"

# Filtering data for selected lot
lot_df = df[df['SystemCodeNumber'] == lot_id].reset_index(drop=True)

source = ColumnDataSource(data={
    'Time': lot_df.index,
    'Model1': lot_df['Price_Model_1'],
    'Model2': lot_df['Price_Model_2'],
})

p = figure(
    title=f"📊 Price Comparison – Model 1 vs Model 2 ({lot_id})",
    x_axis_label='Time Step',
    y_axis_label='Price ($)',
    width=850,
    height=400
)

# Model 1 line (scatter() replaces circle(size=...), which is deprecated since Bokeh 3.4)
p.line('Time', 'Model1', source=source, line_width=2, color="green", legend_label="Model 1 – Linear")
p.scatter('Time', 'Model1', source=source, size=5, color="green", alpha=0.5)

# Model 2 line
p.line('Time', 'Model2', source=source, line_width=2, color="navy", legend_label="Model 2 – Demand-Based")
p.scatter('Time', 'Model2', source=source, size=5, color="orange", alpha=0.6)

p.legend.location = "top_left"
p.grid.grid_line_alpha = 0.3

show(p)



### Model 1 vs Model 2 : Pricing Comparison

This interactive plot compares pricing behavior between:

- **Model 1 – Linear Pricing**: A baseline model where price increases linearly based on occupancy.
- **Model 2 – Demand-Based Pricing**: A more intelligent model that adjusts price based on multiple demand-related factors including queue length, traffic, special events, and vehicle type.

#### Observations:
- **Model 1 (green)** exhibits smoother, lower-range variation, since it considers only occupancy.
- **Model 2 (blue)** shows sharper, more dynamic price changes as it reacts to multiple real-time features.
- Both models stay within the **\$5–\$15** required bounds.

#### Visual Elements:
- Green line and dots: Model 1 price over time  
- Navy blue line with orange dots: Model 2 price over time

This comparison helps evaluate the effectiveness of incorporating multi-factor demand analysis into dynamic pricing.
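
The visual comparison can be backed by a few summary numbers. As a rough sketch, the code below computes the price range and the mean absolute step-to-step change for each model, using only the five preview rows printed earlier; these rows are not representative of the whole lot, and `summary` is an illustrative name.

```python
import pandas as pd

# Prices copied from the Model 1 and Model 2 previews for lot BHMBCCMKT01.
prices = pd.DataFrame({
    "Price_Model_1": [10.21, 10.22, 10.28, 10.37, 10.52],
    "Price_Model_2": [10.94, 10.94, 11.41, 11.47, 11.15],
})

summary = pd.DataFrame({
    "min": prices.min(),
    "max": prices.max(),
    "range": prices.max() - prices.min(),
    # Average size of the change from one time step to the next.
    "mean_abs_step": prices.diff().abs().mean(),
})
print(summary.round(4))
```

Running the same summary on the full `lot_df` would give a fairer picture than these five rows.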

In [9]:
!pip install pathway --quiet

In [10]:
import pathway as pw

In [7]:
import pandas as pd

df = pd.read_csv("dataset.csv")

stream_cols = [
    'SystemCodeNumber',
    'Occupancy',
    'Capacity',
    'QueueLength',
    'VehicleType',
    'IsSpecialDay',
    'TrafficConditionNearby'
]

df_stream = df[stream_cols].copy()
df_stream.insert(0, 'Timestamp', range(1, len(df_stream) + 1))
df_stream.to_csv("stream_input.csv", index=False)

@pw.udf
def compute_price(occupancy, capacity, queue, traffic_str, special_day, vehicle_type_str):
    traffic_weights = {'low': 0.2, 'average': 0.4, 'medium': 0.5, 'high': 0.9}
    vehicle_weights = {'car': 0.6, 'bike': 0.3, 'cycle': 0.3, 'truck': 1.0}
    traffic = traffic_weights.get(str(traffic_str).lower(), 0.4)
    vehicle_weight = vehicle_weights.get(str(vehicle_type_str).lower(), 0.5)

    alpha, beta, gamma, delta, epsilon = 1.0, 0.4, 0.7, 0.5, 1.2
    base_price, lambd = 10, 0.8
    occ_ratio = occupancy / capacity if capacity else 0
    demand = (
        alpha * occ_ratio +
        beta * queue -
        gamma * traffic +
        delta * special_day +
        epsilon * vehicle_weight
    )
    # Fixed normalization bounds; unlike Model 2 above, no dataset-wide min/max is computed here
    min_d, max_d = 0, 4
    norm_d = (demand - min_d) / (max_d - min_d + 1e-6)
    price = base_price * (1 + lambd * norm_d)
    return round(min(max(price, 5), 15), 2)

class ParkingEvent(pw.Schema):
    Timestamp: int
    SystemCodeNumber: str
    Occupancy: int
    Capacity: int
    QueueLength: int
    VehicleType: str
    IsSpecialDay: int
    TrafficConditionNearby: str

input_stream = pw.io.csv.read(
    "stream_input.csv",
    schema=ParkingEvent,
    mode="streaming"
)

@pw.table_transformer
def pricing_model(events: pw.Table[ParkingEvent]) -> pw.Table:
    return events.select(
        Timestamp=events.Timestamp,
        SystemCodeNumber=events.SystemCodeNumber,
        Price=compute_price(
            events.Occupancy,
            events.Capacity,
            events.QueueLength,
            events.TrafficConditionNearby,
            events.IsSpecialDay,
            events.VehicleType
        )
    )

output = pricing_model(input_stream)
pw.io.jsonlines.write(output, filename="output_stream.jsonl")
pw.run()




KeyboardInterrupt: 

We simulate real-time behavior using Pathway's streaming mode. The system remains active after processing the dataset, mimicking a real-time feed. Manual interruption is used after stream completion.

### Real-Time Streaming with Pathway

I implemented real-time processing using Pathway’s `streaming` mode. A sequential `Timestamp` column is added to the CSV to simulate chronological data flow.

Pathway continuously processes incoming rows and applies our pricing logic, as would happen in a live system.

Since the dataset is finite, we stop the execution manually after it completes processing.
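
One difference from the batch version of Model 2 is worth illustrating: the `compute_price` UDF normalizes demand with the fixed bounds `0` and `4` instead of the dataset-wide minimum and maximum. The sketch below repeats that normalization in plain Python to show how quickly the \$15 cap is reached as the queue grows. The helper name `streaming_price` and the assumption that all non-queue terms sum to `0.5` are hypothetical.

```python
def streaming_price(demand, d_min=0.0, d_max=4.0, base_price=10.0, lam=0.8):
    """Price from a raw demand score, using the UDF's fixed normalization bounds."""
    norm = (demand - d_min) / (d_max - d_min + 1e-6)
    price = base_price * (1 + lam * norm)
    return round(min(max(price, 5), 15), 2)

BETA = 0.4         # queue-length weight from the notebook
OTHER_TERMS = 0.5  # hypothetical sum of the occupancy, traffic, special-day, and vehicle terms

prices = {}
for queue in (0, 2, 4, 6):
    prices[queue] = streaming_price(BETA * queue + OTHER_TERMS)
    print(f"queue={queue}: price={prices[queue]}")
```

With these bounds the cap is reached once the raw demand is about 2.5, which may explain why many rows in the streamed output sit at exactly 15.0.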


In [8]:
df_output = pd.read_json("output_stream.jsonl", lines=True)
df_output.head()

Unnamed: 0,Timestamp,SystemCodeNumber,Price,diff,time
0,13868,Others-CCCPS202,15.0,1,1751820069390
1,648,BHMBCCMKT01,13.23,1,1751820069390
2,10042,Others-CCCPS105a,15.0,1,1751820069390
3,9745,Others-CCCPS105a,15.0,1,1751820069390
4,10450,Others-CCCPS105a,15.0,1,1751820069390
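
The streamed rows arrive out of order and carry two extra columns, `diff` and `time`. In Pathway's output, `diff` marks whether a row was inserted (`1`) or retracted (`-1`), and `time` is a processing timestamp. A minimal sketch of cleaning this up is shown below, assuming only insertions matter here; the JSON lines are copied from the preview above, and `clean` is an illustrative name.

```python
import io
import pandas as pd

# The five preview rows, written back out as JSON lines.
jsonl = io.StringIO(
    '{"Timestamp": 13868, "SystemCodeNumber": "Others-CCCPS202", "Price": 15.0, "diff": 1, "time": 1751820069390}\n'
    '{"Timestamp": 648, "SystemCodeNumber": "BHMBCCMKT01", "Price": 13.23, "diff": 1, "time": 1751820069390}\n'
    '{"Timestamp": 10042, "SystemCodeNumber": "Others-CCCPS105a", "Price": 15.0, "diff": 1, "time": 1751820069390}\n'
    '{"Timestamp": 9745, "SystemCodeNumber": "Others-CCCPS105a", "Price": 15.0, "diff": 1, "time": 1751820069390}\n'
    '{"Timestamp": 10450, "SystemCodeNumber": "Others-CCCPS105a", "Price": 15.0, "diff": 1, "time": 1751820069390}\n'
)
out = pd.read_json(jsonl, lines=True)

# Keep insertions only, restore chronological order, and drop bookkeeping columns.
clean = (
    out[out["diff"] == 1]
    .sort_values("Timestamp")
    .drop(columns=["diff", "time"])
    .reset_index(drop=True)
)
print(clean)
```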


In [12]:
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool
output_notebook()

def plot_price_trend(df, lot_id):
    df_lot = df[df['SystemCodeNumber'] == lot_id].sort_values('Timestamp')
    source = ColumnDataSource(df_lot)

    p = figure(
        title=f"Price Trend for Lot {lot_id}",
        x_axis_label='Time',
        y_axis_label='Price',
        width=700,
        height=400
    )

    p.line(x='Timestamp', y='Price', source=source, line_width=2, color='navy')
    # scatter() draws circle markers by default; circle(size=...) is deprecated since Bokeh 3.4
    p.scatter(x='Timestamp', y='Price', source=source, size=4, color='red')

    p.add_tools(HoverTool(tooltips=[("Time", "@Timestamp"), ("Price", "@Price")]))
    show(p)

plot_price_trend(df_output, 'BHMBCCMKT01')



### Real-Time Price Visualization

This section visualizes how prices change over time for individual parking lots. The plot below shows the output of our dynamic pricing model as processed by Pathway in streaming mode.

Each point represents the price at a specific timestamp, allowing us to observe how demand (influenced by factors like traffic, queue, and occupancy) impacts the pricing strategy.

Hover over the points to see exact price and time values.

In [13]:
from google.colab import files
files.download("output_stream.jsonl")


## Conclusion

This project delivers a complete dynamic pricing pipeline for urban parking lots, built on real-world data and run as a simulated real-time stream.

The work began with two pricing strategies:
- **Model 1**, a simple linear approach based on occupancy, offering interpretability and a solid baseline.
- **Model 2**, a more responsive demand-based model that factors in queue length, traffic conditions, special day indicators, and vehicle type weights.

To validate their behavior, I visualized pricing trends using Bokeh, both individually and comparatively, before integrating the final logic into a live streaming pipeline using **Pathway**. This allowed me to simulate real-time price updates based on continuous data ingestion.

The result is a responsive, adaptive system that adjusts pricing based on real-time demand signals, demonstrating how data, logic, and streaming can converge to solve tangible problems in urban planning.

> A scalable foundation for smarter cities, and a strong proof of concept for data-driven dynamic pricing.