##**Dynamic Pricing Strategy: Overview**
- In a dynamic pricing strategy, the aim is to maximize revenue and profitability by pricing items at the right level that balances supply and demand dynamics. It allows businesses to adjust prices dynamically based on factors like time of day, day of the week, customer segments, inventory levels, seasonal fluctuations, competitor pricing, and market conditions.

- To implement a data-driven dynamic pricing strategy, businesses typically require data that can provide insights into customer behaviour, market trends, and other influencing factors. So to create a dynamic pricing strategy, we need to have a dataset based on:

- historical sales data
- customer purchase patterns
- market demand forecasts
- cost data
- customer segmentation data,
- real-time market data.

#Importing Library

In [310]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [311]:
df=pd.read_csv('/content/dynamic_pricing.csv')

In [312]:
df.head()

Unnamed: 0,Number_of_Riders,Number_of_Drivers,Location_Category,Customer_Loyalty_Status,Number_of_Past_Rides,Average_Ratings,Time_of_Booking,Vehicle_Type,Expected_Ride_Duration,Historical_Cost_of_Ride
0,90,45,Urban,Silver,13,4.47,Night,Premium,90,284.257273
1,58,39,Suburban,Silver,72,4.06,Evening,Economy,43,173.874753
2,42,31,Rural,Silver,0,3.99,Afternoon,Premium,76,329.795469
3,89,28,Rural,Regular,67,4.31,Afternoon,Premium,134,470.201232
4,78,22,Rural,Regular,74,3.77,Afternoon,Economy,149,579.681422


In [313]:
df.tail()

Unnamed: 0,Number_of_Riders,Number_of_Drivers,Location_Category,Customer_Loyalty_Status,Number_of_Past_Rides,Average_Ratings,Time_of_Booking,Vehicle_Type,Expected_Ride_Duration,Historical_Cost_of_Ride
995,33,23,Urban,Gold,24,4.21,Morning,Premium,11,91.389526
996,84,29,Urban,Regular,92,4.55,Morning,Premium,94,424.155987
997,44,6,Suburban,Gold,80,4.13,Night,Premium,40,157.36483
998,53,27,Suburban,Regular,78,3.63,Night,Premium,58,279.095048
999,78,63,Rural,Gold,14,4.21,Afternoon,Economy,147,655.065106


In [314]:
df.shape

(1000, 10)

In [315]:
df.describe()

Unnamed: 0,Number_of_Riders,Number_of_Drivers,Number_of_Past_Rides,Average_Ratings,Expected_Ride_Duration,Historical_Cost_of_Ride
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,60.372,27.076,50.031,4.25722,99.588,372.502623
std,23.701506,19.068346,29.313774,0.435781,49.16545,187.158756
min,20.0,5.0,0.0,3.5,10.0,25.993449
25%,40.0,11.0,25.0,3.87,59.75,221.365202
50%,60.0,22.0,51.0,4.27,102.0,362.019426
75%,81.0,38.0,75.0,4.6325,143.0,510.497504
max,100.0,89.0,100.0,5.0,180.0,836.116419


In [316]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Number_of_Riders         1000 non-null   int64  
 1   Number_of_Drivers        1000 non-null   int64  
 2   Location_Category        1000 non-null   object 
 3   Customer_Loyalty_Status  1000 non-null   object 
 4   Number_of_Past_Rides     1000 non-null   int64  
 5   Average_Ratings          1000 non-null   float64
 6   Time_of_Booking          1000 non-null   object 
 7   Vehicle_Type             1000 non-null   object 
 8   Expected_Ride_Duration   1000 non-null   int64  
 9   Historical_Cost_of_Ride  1000 non-null   float64
dtypes: float64(2), int64(4), object(4)
memory usage: 78.3+ KB


In [317]:
df.isnull().sum()

Unnamed: 0,0
Number_of_Riders,0
Number_of_Drivers,0
Location_Category,0
Customer_Loyalty_Status,0
Number_of_Past_Rides,0
Average_Ratings,0
Time_of_Booking,0
Vehicle_Type,0
Expected_Ride_Duration,0
Historical_Cost_of_Ride,0


In [318]:
df.duplicated()

Unnamed: 0,0
0,False
1,False
2,False
3,False
4,False
...,...
995,False
996,False
997,False
998,False


#Relationship between expected ride duration and the historical cost of the ride

In [319]:
fig = px.scatter(df, x='Expected_Ride_Duration',
                 y='Historical_Cost_of_Ride',
                 title='Expected Ride Duration vs. Historical Cost of Ride',
                 trendline='ols')
fig.show()

#Distribution of the historical cost of rides based on the vehicle type

In [320]:
fig = px.box(df, x='Vehicle_Type',
             y='Historical_Cost_of_Ride',
             title='Historical Cost of Ride Distribution by Vehicle Type')
fig.show()

In [321]:
# Select only numeric columns from df
numeric_df = df.select_dtypes(include=['number'])

# Compute correlation matrix only on numeric columns
corr_matrix = numeric_df.corr()

# Plot the heatmap
fig = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns,
    y=corr_matrix.columns,
    colorscale='Viridis'
))
fig.update_layout(title='Correlation Matrix')
fig.show()


#Implementing a Dynamic Pricing Strategy:
- The data provided by the company states that the company uses a pricing model that only takes the expected ride duration as a factor to determine the price for a ride. - Now, we will implement a dynamic pricing strategy aiming to adjust the ride costs dynamically based on the demand and supply levels observed in the data.
- It will capture high-demand periods and low-supply scenarios to increase prices, while low-demand periods and high-supply situations will lead to price reductions.

In [322]:
# Calculate demand_multiplier based on percentile for high and low demand
high_demand_percentile = 75
low_demand_percentile = 25

df['demand_multiplier'] = np.where(df['Number_of_Riders'] > np.percentile(df['Number_of_Riders'], high_demand_percentile),
                                     df['Number_of_Riders'] / np.percentile(df['Number_of_Riders'], high_demand_percentile),
                                     df['Number_of_Riders'] / np.percentile(df['Number_of_Riders'], low_demand_percentile))

# Calculate supply_multiplier based on percentile for high and low supply
high_supply_percentile = 75
low_supply_percentile = 25

df['supply_multiplier'] = np.where(df['Number_of_Drivers'] > np.percentile(df['Number_of_Drivers'], low_supply_percentile),
                                     np.percentile(df['Number_of_Drivers'], high_supply_percentile) / df['Number_of_Drivers'],
                                     np.percentile(df['Number_of_Drivers'], low_supply_percentile) / df['Number_of_Drivers'])

# Define price adjustment factors for high and low demand/supply
demand_threshold_high = 1.2
demand_threshold_low = 0.8
supply_threshold_high = 0.8
supply_threshold_low = 1.2

# Calculate adjusted_ride_cost for dynamic pricing
df['adjusted_ride_cost'] = df['Historical_Cost_of_Ride'] * (
    np.maximum(df['demand_multiplier'], demand_threshold_low) *
    np.maximum(df['supply_multiplier'], supply_threshold_high)
)

- Ithe above code, we first calculated the demand multiplier by comparing the number of riders to percentiles representing high and low demand levels. If the number of riders exceeds the percentile for high demand, the demand multiplier is set as the number of riders divided by the high-demand percentile. Otherwise, if the number of riders falls below the percentile for low demand, the demand multiplier is set as the number of riders divided by the low-demand percentile.

- Next, we calculated the supply multiplier by comparing the number of drivers to percentiles representing high and low supply levels. If the number of drivers exceeds the low-supply percentile, the supply multiplier is set as the high-supply percentile divided by the number of drivers. On the other hand, if the number of drivers is below the low-supply percentile, the supply multiplier is set as the low-supply percentile divided by the number of drivers.

In [323]:
#Calculating the profit percentage for each ride
df['profit_percentage']=((df['adjusted_ride_cost']-df['Historical_Cost_of_Ride'])/df['Historical_Cost_of_Ride'])*100
profitable_rides=df[df['profit_percentage'] >0]

#Identify loss rides where profit percentage is negative
loss_rides=df[df['profit_percentage'] <0]

#calculate the count of profitable and loss rides
profitable_count=len(profitable_rides)
loss_count=len(loss_rides)

#create a dount chart to show the distributiond of profitable and loss rides

labels=['profitable_rides', 'loss_rides']
values= [profitable_count,loss_count]

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.4)])
fig.update_layout(title='Profitability of Rides (Dynamic Pricing vs Historical Pricing)')
fig.show()

#Relationship between the expected ride duration and the cost of the ride based on the dynamic pricing strategy

In [324]:
fig= px.scatter(df,
                x='Expected_Ride_Duration',
                y= 'adjusted_ride_cost',
                title= 'Expected Ride Duration vs. cost of Ride',
                trendline='ols')

fig.show()

#Training a Predictive Model

In [325]:
from sklearn.preprocessing import StandardScaler

In [326]:
def data_preprocessing_pioline(df):
  #identify the numeric and categorical columns
  numeric_features= df.select_dtypes(include=['float', 'int']).columns
  categorical_features= df.select_dtypes(include=['object']).columns

  #Handel missing value in numeric columns
  df['numeric_features']= df[numeric_features].fillna(df[numeric_features])

  #Detected and handel outliers in numeric feature using IQR

  for feature in numeric_features:
    Q1=df[feature].quantile(0.25)
    Q3=df[feature].quantile(0.75)

    IQR=Q3-Q1

    lower_bound= Q1 - (1.5*IQR)
    upper_bound= Q3 + (1.5* IQR)

    df[feature]= np.where((df[feature] < lower_bound) | (df[feature]> upper_bound),
                          df[feature].mean(), df[feature])

    #Handel missing value in categorical coulmns
    df[categorical_features] =df[categorical_features].fillna(df[categorical_features].mode().iloc[0])

    return df

In [327]:
df['Vechile_Type']= df['Vehicle_Type'].map({'Premium':1,
                                            'Economy':0})

In [328]:
from sklearn.model_selection import train_test_split

In [329]:
x= np.array(df[['Number_of_Drivers','Number_of_Riders','Vechile_Type', 'Expected_Ride_Duration']])
y= np.array(df[['adjusted_ride_cost']])

In [330]:
x

array([[ 45,  90,   1,  90],
       [ 39,  58,   0,  43],
       [ 31,  42,   1,  76],
       ...,
       [  6,  44,   1,  40],
       [ 27,  53,   1,  58],
       [ 63,  78,   0, 147]])

In [331]:
y

array([[ 266.71052777],
       [ 245.65381735],
       [ 424.47868424],
       [ 701.15545229],
       [1952.47242714],
       [ 544.41422748],
       [ 105.58526755],
       [ 356.13637855],
       [2687.92924551],
       [ 768.0622268 ],
       [1588.42985569],
       [ 622.4851571 ],
       [ 539.45179389],
       [ 422.14609829],
       [1505.86681251],
       [1374.36008126],
       [ 315.27699297],
       [ 147.68823481],
       [ 168.28714084],
       [ 481.42048154],
       [  82.84091894],
       [  51.25693804],
       [1716.50088986],
       [ 537.48852993],
       [ 565.57952089],
       [1663.75837665],
       [ 378.21972154],
       [ 625.22348482],
       [ 719.69927557],
       [ 766.34342472],
       [ 359.70811179],
       [ 727.581995  ],
       [ 203.87746369],
       [ 673.14410777],
       [ 594.99201458],
       [ 411.44865118],
       [ 364.94475743],
       [1018.45942346],
       [ 358.18590376],
       [2589.31087885],
       [ 656.27285978],
       [ 232.774

In [332]:
x_train, x_test, y_train, y_test =train_test_split(x, y, train_size=0.2 , random_state=42)

In [333]:
#Reshape y to 1D array
y_train= y_train.ravel()
y_test= y_test.ravel()

In [334]:
from sklearn.ensemble import RandomForestRegressor
model=RandomForestRegressor()
model.fit(x_train, y_train)

In [335]:
def get_vehicle_type_numeric(vehicle_type):
    vehicle_type_mapping = {
        "Premium": 1,
        "Economy": 0
    }
    vehicle_type_numeric = vehicle_type_mapping.get(vehicle_type)
    return vehicle_type_numeric

# Predicting using user input values
def predict_price(number_of_riders, number_of_drivers, vehicle_type, Expected_Ride_Duration):
    vehicle_type_numeric = get_vehicle_type_numeric(vehicle_type)
    if vehicle_type_numeric is None:
        raise ValueError("Invalid vehicle type")

    input_data = np.array([[number_of_riders, number_of_drivers, vehicle_type_numeric, Expected_Ride_Duration]])
    predicted_price = model.predict(input_data)
    return predicted_price

# Example prediction using user input values
user_number_of_riders = 50
user_number_of_drivers = 25
user_vehicle_type = "Economy"
Expected_Ride_Duration = 30
predicted_price = predict_price(user_number_of_riders, user_number_of_drivers, user_vehicle_type, Expected_Ride_Duration)
print("Predicted price:", predicted_price)

Predicted price: [186.7864729]


In [336]:
#predict on the test set
y_pred= model.predict(x_test)
fig= go.Figure()

fig.add_trace(go.Scatter(x=y_test.flatten(), y=y_pred, mode='markers',name='Actual vs Predicted'))

# Add a line representing the ideal case
fig.add_trace(go.Scatter(
    x=[min(y_test.flatten()), max(y_test.flatten())],
    y=[min(y_test.flatten()), max(y_test.flatten())],
    mode='lines',
    name='Ideal',
    line=dict(color='red', dash='dash')
))

fig.update_layout(
    title='Actual vs Predicted Values',
    xaxis_title='Actual Values',
    yaxis_title='Predicted Values',
    showlegend=True,
)

fig.show()

#Conclusion
- In a dynamic pricing strategy, the aim is to maximize revenue and profitability by pricing items at the right level that balances supply and demand dynamics.
- It allows businesses to adjust prices dynamically based on factors like time of day, day of the week, customer segments, inventory levels, seasonal fluctuations, competitor pricing, and market conditions.