#**DYNAMIC PRICING STRATEGY**


DATASET=['https://www.kaggle.com/datasets/arashnic/dynamic-pricing-dataset?select=dynamic_pricing.csv']


---



#**Introduction:**
In this project, we aim to optimize ride-sharing fares through dynamic pricing using data-driven techniques. Our goal is to develop a predictive model that can dynamically adjust prices based on real-time market conditions. We'll analyze a dataset containing historical ride data and build a model to predict optimal fares for rides based on various factors.

#**Dataset Review:**

The dataset includes features such as number of riders, number of drivers, location category, customer loyalty status, number of past rides, average ratings, time of booking, vehicle type, expected ride duration, and historical cost of the rides.

We conducted exploratory data analysis (EDA) to understand the distribution of features, identify patterns, and preprocess the data for modeling.

The dataset was split into training and testing sets for model evaluation.

**IMPORTING LIBRARIES**

In [57]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

In [58]:
data=pd.read_csv('/content/dynamic_pricing.csv')
data

Unnamed: 0,Number_of_Riders,Number_of_Drivers,Location_Category,Customer_Loyalty_Status,Number_of_Past_Rides,Average_Ratings,Time_of_Booking,Vehicle_Type,Expected_Ride_Duration,Historical_Cost_of_Ride
0,90,45,Urban,Silver,13,4.47,Night,Premium,90,284.257273
1,58,39,Suburban,Silver,72,4.06,Evening,Economy,43,173.874753
2,42,31,Rural,Silver,0,3.99,Afternoon,Premium,76,329.795469
3,89,28,Rural,Regular,67,4.31,Afternoon,Premium,134,470.201232
4,78,22,Rural,Regular,74,3.77,Afternoon,Economy,149,579.681422
...,...,...,...,...,...,...,...,...,...,...
995,33,23,Urban,Gold,24,4.21,Morning,Premium,11,91.389526
996,84,29,Urban,Regular,92,4.55,Morning,Premium,94,424.155987
997,44,6,Suburban,Gold,80,4.13,Night,Premium,40,157.364830
998,53,27,Suburban,Regular,78,3.63,Night,Premium,58,279.095048


In [59]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Number_of_Riders         1000 non-null   int64  
 1   Number_of_Drivers        1000 non-null   int64  
 2   Location_Category        1000 non-null   object 
 3   Customer_Loyalty_Status  1000 non-null   object 
 4   Number_of_Past_Rides     1000 non-null   int64  
 5   Average_Ratings          1000 non-null   float64
 6   Time_of_Booking          1000 non-null   object 
 7   Vehicle_Type             1000 non-null   object 
 8   Expected_Ride_Duration   1000 non-null   int64  
 9   Historical_Cost_of_Ride  1000 non-null   float64
dtypes: float64(2), int64(4), object(4)
memory usage: 78.2+ KB


In [60]:
data.isnull().sum()

Number_of_Riders           0
Number_of_Drivers          0
Location_Category          0
Customer_Loyalty_Status    0
Number_of_Past_Rides       0
Average_Ratings            0
Time_of_Booking            0
Vehicle_Type               0
Expected_Ride_Duration     0
Historical_Cost_of_Ride    0
dtype: int64

In [61]:
data.describe()

Unnamed: 0,Number_of_Riders,Number_of_Drivers,Number_of_Past_Rides,Average_Ratings,Expected_Ride_Duration,Historical_Cost_of_Ride
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,60.372,27.076,50.031,4.25722,99.588,372.502623
std,23.701506,19.068346,29.313774,0.435781,49.16545,187.158756
min,20.0,5.0,0.0,3.5,10.0,25.993449
25%,40.0,11.0,25.0,3.87,59.75,221.365202
50%,60.0,22.0,51.0,4.27,102.0,362.019426
75%,81.0,38.0,75.0,4.6325,143.0,510.497504
max,100.0,89.0,100.0,5.0,180.0,836.116419


1. Number of Riders: The mean is about 60 riders per record, with a minimum of 20 and a maximum of 100. The standard deviation is approximately 23.7, indicating some variability in the number of riders.

2. Number of Drivers: The mean is about 27 drivers per record, with a minimum of 5 and a maximum of 89. The standard deviation is around 19.1.

3. Number of Past Rides: The average number of past rides for customers is approximately 50, with a minimum of 0 and a maximum of 100. The standard deviation is about 29.3.

4. Average Ratings: The average rating is about 4.26, with a minimum of 3.5 and a maximum of 5. The standard deviation is approximately 0.44.

5. Expected Ride Duration: The average expected ride duration is around 99.6 minutes, with a minimum of 10 minutes and a maximum of 180 minutes. The standard deviation is about 49.2 minutes.

6. Historical Cost of Ride: The average historical cost of rides is approximately $372.50, with a minimum of about $26 and a maximum of about $836. The standard deviation is around $187.16.

###**EDA**

###**Expected Ride Duration vs. Historical Cost of Ride**

In [63]:
fig = px.scatter(data, x='Expected_Ride_Duration',
                 y='Historical_Cost_of_Ride',
                 title='Expected Ride Duration vs. Historical Cost of Ride',
                 trendline='ols')
fig.show()

##**Historical Cost of Ride Distribution by Vehicle Type**

In [64]:
fig = px.box(data, x='Vehicle_Type',
             y='Historical_Cost_of_Ride',
             title='Historical Cost of Ride Distribution by Vehicle Type')
fig.show()

##**CORRELATION MATRIX**

In [67]:
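# Restrict the correlation to numeric columns; recent pandas versions
# raise an error when object (string) columns are included.
corr_matrix = data.corr(numeric_only=True)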

colorscale = [[0.0, 'rgb(255,255,255)'],
              [0.2, 'rgb(235,255,235)'],
              [0.4, 'rgb(200,255,200)'],
              [0.6, 'rgb(150,255,150)'],
              [0.8, 'rgb(100,255,100)'],
              [1.0, 'rgb(0,200,0)']]

fig = go.Figure(data=go.Heatmap(z=corr_matrix.values,
                                 x=corr_matrix.columns,
                                 y=corr_matrix.columns,
                                 colorscale=colorscale))

fig.update_layout(title='Correlation Matrix',
                  xaxis=dict(title='Features'),
                  yaxis=dict(title='Features'))
fig.show()






**Calculate adjusted ride costs based on dynamic pricing strategies considering demand and supply factors.**

**DEFINE PERCENTILE FOR HIGH & LOW DEMAND**

In [80]:
high_demand_percentile = 75
low_demand_percentile = 25


**DEMAND MULTIPLIER BASED ON NUMBER OF RIDERS**

We calculate a demand multiplier based on the number of riders. This multiplier indicates the level of demand for the ride relative to the given percentile thresholds.

In [81]:
data['demand_multiplier'] = np.where(
    data['Number_of_Riders'] > np.percentile(data['Number_of_Riders'], high_demand_percentile),
    data['Number_of_Riders'] / np.percentile(data['Number_of_Riders'], high_demand_percentile),
    data['Number_of_Riders'] / np.percentile(data['Number_of_Riders'], low_demand_percentile)
)

high_supply_percentile = 75
low_supply_percentile = 25
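
To see how the demand multiplier rule behaves, here is a minimal sketch on a small hypothetical array of rider counts. The values are illustrative assumptions, not rows from the dataset; only the `np.where` logic mirrors the notebook.

```python
import numpy as np

# Hypothetical rider counts chosen so the percentiles are easy to read.
riders = np.array([20, 40, 60, 80, 100])

p75 = np.percentile(riders, 75)  # high-demand cutoff
p25 = np.percentile(riders, 25)  # low-demand cutoff

# Same rule as the notebook: divide by the 75th percentile when strictly
# above it, otherwise divide by the 25th percentile.
demand_multiplier = np.where(riders > p75, riders / p75, riders / p25)

for r, m in zip(riders.tolist(), demand_multiplier.tolist()):
    print(f"riders={r:3d}  multiplier={m:.2f}")
```

With these numbers, a count of 80 sits exactly at the 75th percentile, is divided by the 25th percentile, and gets 2.00, while a count of 100 gets 1.25. The multiplier is therefore not monotonic in the rider count across the cutoff, which is worth keeping in mind when reading the adjusted costs.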

**supply multiplier based on number of drivers**

We calculate a supply multiplier based on the number of drivers. This multiplier indicates the level of supply availability relative to the given percentile thresholds.

In [82]:
data['supply_multiplier'] = np.where(
    data['Number_of_Drivers'] > np.percentile(data['Number_of_Drivers'], low_supply_percentile),
    np.percentile(data['Number_of_Drivers'], high_supply_percentile) / data['Number_of_Drivers'],
    np.percentile(data['Number_of_Drivers'], low_supply_percentile) / data['Number_of_Drivers']
)
demand_threshold_high = 1.2
demand_threshold_low = 0.8
supply_threshold_high = 0.8
supply_threshold_low = 1.2
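
The supply rule can be checked the same way. The sketch below uses a hypothetical array of driver counts (an assumption for illustration, not dataset rows) and applies the same `np.where` expression as the supply cell.

```python
import numpy as np

# Hypothetical driver counts, not taken from the dataset.
drivers = np.array([5, 10, 20, 40, 80])

p75 = np.percentile(drivers, 75)  # high-supply percentile
p25 = np.percentile(drivers, 25)  # low-supply percentile

# Same rule as the notebook: above the 25th percentile, scale by the
# 75th percentile; otherwise scale by the 25th percentile.
supply_multiplier = np.where(drivers > p25, p75 / drivers, p25 / drivers)

for d, m in zip(drivers.tolist(), supply_multiplier.tolist()):
    print(f"drivers={d:3d}  multiplier={m:.2f}")
```

Because the driver count is in the denominator, fewer drivers within each branch give a larger multiplier, so scarce supply pushes the price up.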


**CALCULATE ADJUSTED RIDE COST FOR DYNAMIC PRICING**

The thresholds defined above limit how far fares can be adjusted: demand_threshold_low and supply_threshold_high act as lower bounds on the demand and supply multipliers, so neither multiplier can fall below 0.8.

We then calculate the adjusted ride cost by multiplying the historical cost of the ride by the two bounded multipliers.

In [84]:
data['adjusted_ride_cost'] = data['Historical_Cost_of_Ride'] * (
    np.maximum(data['demand_multiplier'], demand_threshold_low) *
    np.maximum(data['supply_multiplier'], supply_threshold_high)
)
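
As a worked example, the sketch below redoes the calculation by hand for the first row of the data preview (90 riders, 45 drivers, historical cost 284.257273), using the quartiles printed by `data.describe()`. It assumes those printed quartiles match what `np.percentile` returns on the same columns, which holds for the default linear interpolation in both.

```python
# Values copied from the notebook output (first data row and describe()).
riders, drivers, historical_cost = 90, 45, 284.257273
riders_p25, riders_p75 = 40.0, 81.0
drivers_p25, drivers_p75 = 11.0, 38.0

demand_threshold_low = 0.8
supply_threshold_high = 0.8

# Demand: above the 75th percentile, so divide by the 75th percentile.
if riders > riders_p75:
    demand_multiplier = riders / riders_p75
else:
    demand_multiplier = riders / riders_p25

# Supply: above the 25th percentile, so use the 75th percentile on top.
if drivers > drivers_p25:
    supply_multiplier = drivers_p75 / drivers
else:
    supply_multiplier = drivers_p25 / drivers

# Both multipliers are floored at their thresholds before being applied.
adjusted = (historical_cost
            * max(demand_multiplier, demand_threshold_low)
            * max(supply_multiplier, supply_threshold_high))

print(round(demand_multiplier, 3), round(supply_multiplier, 3), round(adjusted, 2))
```

For this row the demand multiplier is above 1 and the supply multiplier is below 1, and their product is slightly under 1, so the adjusted cost comes out at roughly 266.71, about 6% below the historical cost.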



---



###**Analyzing Profitability of Dynamic Pricing Strategy**

Profit Percentage Calculation: Compute profit percentage for each ride by comparing adjusted ride costs with historical costs.

Identify Profitable and Loss Rides: Segment rides into profitable and loss categories based on positive and negative profit percentages.

Count Profitable and Loss Rides: Determine the number of rides in each category.

In [69]:
data['profit_percentage'] = ((data['adjusted_ride_cost'] - data['Historical_Cost_of_Ride']) / data['Historical_Cost_of_Ride']) * 100

profitable_rides = data[data['profit_percentage'] > 0]
loss_rides = data[data['profit_percentage'] < 0]

profitable_count = len(profitable_rides)
loss_count = len(loss_rides)

labels = ['Profitable Rides', 'Loss Rides']
values = [profitable_count, loss_count]

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.4)])

fig.update_layout(title='Profitability of Rides (Dynamic Pricing vs. Historical Pricing)',
                  paper_bgcolor='rgb(240, 240, 240)')
fig.update_traces(marker=dict(colors=['rgb(65, 185, 130)', 'rgb(235, 90, 90)']))
fig.show()
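
The profit calculation and the two filters can be illustrated on a tiny hypothetical frame. The three rows below are made-up values (an assumption for illustration); the column names and formula follow the notebook.

```python
import pandas as pd

# Hypothetical rides: one priced up, one priced down, one unchanged.
rides = pd.DataFrame({
    "Historical_Cost_of_Ride": [100.0, 200.0, 300.0],
    "adjusted_ride_cost": [120.0, 180.0, 300.0],
})

# Relative change of the adjusted cost versus the historical cost, in percent.
rides["profit_percentage"] = (
    (rides["adjusted_ride_cost"] - rides["Historical_Cost_of_Ride"])
    / rides["Historical_Cost_of_Ride"]
) * 100

profitable_count = int((rides["profit_percentage"] > 0).sum())
loss_count = int((rides["profit_percentage"] < 0).sum())

print(rides)
print("profitable:", profitable_count, "loss:", loss_count)
```

A ride whose adjusted cost equals its historical cost has a profit percentage of exactly 0 and falls into neither group, because both filters use strict inequalities.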


**Expected Ride Duration vs. Cost of Ride**

In [70]:
import plotly.express as px

fig = px.scatter(data,
                 x='Expected_Ride_Duration',
                 y='adjusted_ride_cost',
                 title='Expected Ride Duration vs. Cost of Ride',
                 trendline='ols')

fig.update_traces(marker_color='rgb(65, 185, 130)',)
fig.update_layout(template='plotly_white')
fig.show()


###**preprocessing and training model**

**identifying numeric and categorical variables**

In [85]:
def data_preprocessing_pipeline(data):
    numeric_features = data.select_dtypes(include=['float', 'int']).columns
    categorical_features = data.select_dtypes(include=['object']).columns

    return data, numeric_features, categorical_features


**handling missing values**

In [86]:
def handle_missing_values_numeric(data, numeric_features):
    # Fill missing numeric values with each column's mean.
    data[numeric_features] = data[numeric_features].fillna(data[numeric_features].mean())
    return data


def handle_missing_values_categorical(data, categorical_features):
    # Fill missing categorical values with each column's most frequent value (mode).
    data[categorical_features] = data[categorical_features].fillna(data[categorical_features].mode().iloc[0])
    return data
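
A minimal sketch of the mean and mode imputation used by these helpers, on a hypothetical three-row frame. The rows are illustrative assumptions; the dataset itself has no missing values according to `data.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing numeric and one missing categorical value.
toy = pd.DataFrame({
    "Number_of_Riders": [10.0, np.nan, 30.0],
    "Vehicle_Type": ["Premium", None, "Premium"],
})

numeric_cols = ["Number_of_Riders"]
categorical_cols = ["Vehicle_Type"]

# Numeric gaps are filled with the column mean.
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())

# Categorical gaps are filled with the most frequent value in the column.
toy[categorical_cols] = toy[categorical_cols].fillna(toy[categorical_cols].mode().iloc[0])

print(toy)
```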


**handling outliers using iqr method**

In [87]:
def handle_outliers_numeric(data, numeric_features):
    # Replace values outside 1.5 * IQR of the quartiles with the column mean.
    for feature in numeric_features:
        Q1 = data[feature].quantile(0.25)
        Q3 = data[feature].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - (1.5 * IQR)
        upper_bound = Q3 + (1.5 * IQR)
        data[feature] = np.where((data[feature] < lower_bound) | (data[feature] > upper_bound),
                                 data[feature].mean(), data[feature])
    return data
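
The IQR rule can be traced on a short hypothetical series. The numbers below are assumptions picked so the bounds are easy to verify by hand; the logic follows the outlier function.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one obvious outlier (100.0).
values = pd.Series([10.0, 11.0, 12.0, 12.0, 13.0, 100.0])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag values outside the bounds and replace them with the column mean.
is_outlier = (values < lower) | (values > upper)
cleaned = pd.Series(np.where(is_outlier, values.mean(), values))

print("bounds:", lower, upper)
print(cleaned.tolist())
```

Here the bounds are 9.0 and 15.0, and the outlier is replaced by the mean of the original series, about 26.33. That replacement is still above the upper bound because the mean includes the outlier itself, so replacing with the median or clipping to the bounds may be worth considering as alternatives.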


**mapping vehicle function**

Maps the categorical variable Vehicle_Type to numeric values, where "Premium" is mapped to 1 and "Economy" is mapped to 0.

In [73]:
data["Vehicle_Type"] = data["Vehicle_Type"].map({"Premium": 1, "Economy": 0})
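
One thing to watch with `Series.map` and a dictionary: any value that is not a key becomes NaN. The sketch below shows this with a hypothetical third category, "Luxury", which does not appear in the notebook's mapping.

```python
import pandas as pd

# "Luxury" is a hypothetical category used only to show unmapped values.
vehicle = pd.Series(["Premium", "Economy", "Luxury"])

mapped = vehicle.map({"Premium": 1, "Economy": 0})

# Collect the original labels that the dictionary did not cover.
unmapped = vehicle[mapped.isna()].unique().tolist()

print(mapped.tolist())
print("unmapped labels:", unmapped)
```

The same thing happens if the mapping cell is run twice, since the already mapped values 1 and 0 are not keys of the dictionary. Checking for NaN after mapping is a cheap guard before training.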


**splitting data and model training**

In [74]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

x = data[["Number_of_Riders", "Number_of_Drivers", "Vehicle_Type", "Expected_Ride_Duration"]].values
y = data["adjusted_ride_cost"].values

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

y_train = y_train.ravel()
y_test = y_test.ravel()

model = RandomForestRegressor()
model.fit(x_train, y_train)
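
The split above fixes `random_state`, but the forest itself is created without one, so refitting can give slightly different predictions and metrics. The sketch below shows one way to make the fit repeatable. It uses synthetic data generated with NumPy (an assumption for illustration, not the ride dataset), and `n_estimators=50` is an arbitrary choice to keep it quick.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic features and target, standing in for the four model inputs.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 4))
y = 3.0 * X[:, 3] + rng.normal(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Two forests with the same seed, trained on the same data.
model_a = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_train, y_train)
model_b = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_train, y_train)

same = bool(np.allclose(model_a.predict(X_test), model_b.predict(X_test)))
print("Identical predictions:", same)
```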


In [75]:

def get_vehicle_type_numeric(vehicle_type):
    vehicle_type_mapping = {
        "Premium": 1,
        "Economy": 0
    }
    vehicle_type_numeric = vehicle_type_mapping.get(vehicle_type)
    if vehicle_type_numeric is None:
        raise ValueError("Invalid vehicle type")
    return vehicle_type_numeric

def predict_price(number_of_riders, number_of_drivers, vehicle_type, Expected_Ride_Duration):

    vehicle_type_numeric = get_vehicle_type_numeric(vehicle_type)


    input_data = np.array([[number_of_riders, number_of_drivers, vehicle_type_numeric, Expected_Ride_Duration]])


    predicted_price = model.predict(input_data)
    return predicted_price

user_number_of_riders = 50
user_number_of_drivers = 25
user_vehicle_type = "Economy"
user_expected_ride_duration = 30

predicted_price = predict_price(user_number_of_riders, user_number_of_drivers, user_vehicle_type, user_expected_ride_duration)
print("Predicted price:", predicted_price)


Predicted price: [265.07339438]


In [76]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(x_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("R-squared (R2) Score:", r2)


Mean Absolute Error (MAE): 122.04917910443898
Mean Squared Error (MSE): 30542.414676065688
R-squared (R2) Score: 0.8666709242957986
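
For reference, the three metrics can be computed by hand with NumPy. The sketch below uses four hypothetical actual and predicted values (assumptions for illustration) and then takes the square root of the MSE reported above to express it in the same units as the fare.

```python
import numpy as np

# Hypothetical actual and predicted fares.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 320.0, 380.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))   # average absolute error
mse = np.mean(errors ** 2)      # average squared error
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot      # share of variance explained

# RMSE derived from the MSE printed by the notebook.
reported_mse = 30542.414676065688
reported_rmse = np.sqrt(reported_mse)

print(mae, mse, r2)
print(round(reported_rmse, 2))
```

The square root of the reported MSE is about 174.8, which can be read alongside the MAE of about 122 since both are in fare units.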


In [89]:
y_pred = model.predict(x_test)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=y_test.flatten(),
    y=y_pred,
    mode='markers',
    name='Actual vs Predicted'
))

fig.add_trace(go.Scatter(
    x=[min(y_test.flatten()), max(y_test.flatten())],
    y=[min(y_test.flatten()), max(y_test.flatten())],
    mode='lines',
    name='Ideal',
    line=dict(color='red', dash='dash')
))

fig.update_layout(
    title='Actual vs Predicted Values',
    xaxis_title='Actual Values',
    yaxis_title='Predicted Values',
    showlegend=True,
)

fig.show()



---



###**Conclusion and Summary:**
Regression model used: Random Forest Regression.

The Random Forest Regression model performed well on the accuracy metrics. On the test set it reached an R-squared (R2) score of about 0.87, with a Mean Absolute Error (MAE) of about 122.05 and a Mean Squared Error (MSE) of about 30,542.

The Random Forest model demonstrated good predictive capabilities, capturing complex relationships and interactions in the data.

Therefore, we recommend using the Random Forest Regression model for predicting ride fares in the ride-sharing platform, as it explains most of the variance in the adjusted ride cost on the test set.

This predictive model can help optimize pricing strategies, improve revenue, and enhance customer satisfaction in the ride-sharing service.



