# ***Decision Tree Regression for Taxi Trip Price Prediction***

***Problem Statement***

Predict the price of a taxi trip (`Trip_Price`) from trip-related features such as distance, duration, time of day, traffic, weather, and fare rates.

Learning Type: Supervised Learning

Problem Type: Regression

Algorithm: Decision Tree Regressor

Dataset: Taxi Trip Pricing Dataset (CSV)
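Because the whole notebook rests on this algorithm, here is a minimal sketch of how a regression tree chooses one split. It assumes scikit-learn's default `squared_error` criterion for `DecisionTreeRegressor`. Each candidate threshold on a feature divides the samples into two groups, each group predicts its own mean, and the threshold with the lowest total squared error wins. The function `best_split` and the toy distances and prices are hypothetical and only illustrate the idea. scikit-learn's real implementation is heavily optimized and applies this search recursively across all features.

```python
import numpy as np


def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split of one feature.

    Hypothetical helper: each child predicts the mean of its samples, and a
    split is scored by the summed squared error of both children.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate identical feature values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            # midpoint between neighbouring values, as scikit-learn also uses
            threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
            best = (threshold, sse)
    return best


# Made-up toy data: short trips are cheap, long trips are expensive
distance = np.array([2.0, 3.0, 4.0, 20.0, 25.0, 30.0])
price = np.array([8.0, 9.0, 10.0, 40.0, 45.0, 50.0])

threshold, sse = best_split(distance, price)
print("best threshold:", threshold, "total SSE:", sse)
```

On this toy data the best cut falls between 4 km and 20 km, where both groups are tight around their means.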

***Step 1: Import Required Libraries***

In [138]:
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder


***Step 2: Load the Dataset***

In [139]:
df = pd.read_csv("taxi_trip_pricing.csv")
df.head()


| | Trip_Distance_km | Time_of_Day | Day_of_Week | Passenger_Count | Traffic_Conditions | Weather | Base_Fare | Per_Km_Rate | Per_Minute_Rate | Trip_Duration_Minutes | Trip_Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19.35 | Morning | Weekday | 3.0 | Low | Clear | 3.56 | 0.8 | 0.32 | 53.82 | 36.2624 |
| 1 | 47.59 | Afternoon | Weekday | 1.0 | High | Clear | NaN | 0.62 | 0.43 | 40.57 | NaN |
| 2 | 36.87 | Evening | Weekend | 1.0 | High | Clear | 2.7 | 1.21 | 0.15 | 37.27 | 52.9032 |
| 3 | 30.33 | Evening | Weekday | 4.0 | Low | NaN | 3.48 | 0.51 | 0.15 | 116.81 | 36.4698 |
| 4 | NaN | Evening | Weekday | 3.0 | High | Clear | 2.93 | 0.63 | 0.32 | 22.64 | 15.618 |
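The displayed rows suggest that `Trip_Price` follows a fare rule rather than being noisy: Base_Fare + Per_Km_Rate × Trip_Distance_km + Per_Minute_Rate × Trip_Duration_Minutes. The check below reproduces the price of the three complete rows shown (0, 2 and 3). Whether the rule holds for every row of the file is an assumption you would need to verify on the full `df`. If it does hold, a decision tree can only approximate this sum of products with piecewise-constant steps.

```python
import numpy as np
import pandas as pd

# Rows 0, 2 and 3 copied from the df.head() output (the rows with no missing values)
rows = pd.DataFrame({
    "Trip_Distance_km": [19.35, 36.87, 30.33],
    "Base_Fare": [3.56, 2.70, 3.48],
    "Per_Km_Rate": [0.80, 1.21, 0.51],
    "Per_Minute_Rate": [0.32, 0.15, 0.15],
    "Trip_Duration_Minutes": [53.82, 37.27, 116.81],
    "Trip_Price": [36.2624, 52.9032, 36.4698],
})

# Hypothesised pricing rule: base fare + distance charge + time charge
reconstructed = (
    rows["Base_Fare"]
    + rows["Per_Km_Rate"] * rows["Trip_Distance_km"]
    + rows["Per_Minute_Rate"] * rows["Trip_Duration_Minutes"]
)

print(np.allclose(reconstructed, rows["Trip_Price"]))
```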


In [140]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Trip_Distance_km       950 non-null    float64
 1   Time_of_Day            950 non-null    object 
 2   Day_of_Week            950 non-null    object 
 3   Passenger_Count        950 non-null    float64
 4   Traffic_Conditions     950 non-null    object 
 5   Weather                950 non-null    object 
 6   Base_Fare              950 non-null    float64
 7   Per_Km_Rate            950 non-null    float64
 8   Per_Minute_Rate        950 non-null    float64
 9   Trip_Duration_Minutes  950 non-null    float64
 10  Trip_Price             951 non-null    float64
dtypes: float64(7), object(4)
memory usage: 86.1+ KB


In [141]:
# Drop every row that contains at least one missing value
df = df.dropna()
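Called with no arguments, `dropna()` discards a row if any of its columns is missing. That is why the category counts in Step 4 add up to 562 rows out of the original 1000. The toy frame below uses made-up values to contrast this with `dropna(subset=[...])`, which removes only rows missing the listed columns. The subset version keeps more data, but the remaining gaps then have to be imputed or handled some other way.

```python
import numpy as np
import pandas as pd

# Made-up toy frame with gaps in different columns
toy = pd.DataFrame({
    "Trip_Distance_km": [19.35, np.nan, 36.87, 30.33],
    "Weather": ["Clear", "Clear", None, "Rain"],
    "Trip_Price": [36.26, 15.62, 52.90, np.nan],
})

all_complete = toy.dropna()                       # drops a row if ANY column is missing
target_known = toy.dropna(subset=["Trip_Price"])  # drops only rows without a target

print(len(toy), len(all_complete), len(target_known))
```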

***Step 3: Separate Features and Target***

In [142]:
X = df.drop("Trip_Price", axis=1)
y = df["Trip_Price"]


***Step 4: Handle Categorical Data (Encoding)***

In [143]:
df['Traffic_Conditions'].value_counts()

Traffic_Conditions
Medium    236
Low       218
High      108


In [144]:
print(df["Time_of_Day"].value_counts())
print("-----------------------------")
print(df["Day_of_Week"].value_counts())
print("-----------------------------")
print(df["Weather"].value_counts())

Time_of_Day
Afternoon    220
Morning      157
Evening      124
Night         61
Name: count, dtype: int64
-----------------------------
Day_of_Week
Weekday    381
Weekend    181
Name: count, dtype: int64
-----------------------------
Weather
Clear    386
Rain     134
Snow      42
Name: count, dtype: int64


***Ordinal Encoding***

In [145]:
# Traffic has a natural order, so the categories are listed from lowest to highest
ordinal_features = ["Traffic_Conditions"]

ordinal_encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
X[ordinal_features] = ordinal_encoder.fit_transform(X[ordinal_features])
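With an explicit `categories` list, `OrdinalEncoder` assigns codes in list order, so Low → 0, Medium → 1 and High → 2. Without that list it sorts the values alphabetically (High, Low, Medium), which would scramble the traffic order. The toy column below is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

toy = pd.DataFrame({"Traffic_Conditions": ["High", "Low", "Medium", "Low"]})

# Explicit order: codes follow the list, not the alphabet
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
codes = encoder.fit_transform(toy[["Traffic_Conditions"]])

# Default order ('auto'): categories are sorted alphabetically
auto_encoder = OrdinalEncoder()
auto_codes = auto_encoder.fit_transform(toy[["Traffic_Conditions"]])

print(codes.ravel(), auto_encoder.categories_[0])
```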


***Label Encoding***

In [146]:
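label_cols = ["Time_of_Day", "Day_of_Week", "Weather"]

# Fit a separate LabelEncoder per column and keep it, so each column's mapping
# can be reused later (e.g. to encode new trips or to inverse_transform codes)
label_encoders = {}
for col in label_cols:
    le = LabelEncoder()
    X[col] = le.fit_transform(X[col])
    label_encoders[col] = le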
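`LabelEncoder` numbers classes in sorted (alphabetical) order of `classes_`, so the `Time_of_Day` codes do not follow the clock: Afternoon → 0, Evening → 1, Morning → 2, Night → 3. scikit-learn documents `LabelEncoder` as a tool for target labels, and `OrdinalEncoder` is the feature-oriented equivalent. For a decision tree, arbitrary codes are usually workable because repeated splits can isolate any value, but the ordering carries no meaning. The sketch uses hypothetical input values and the name `demo_le`.

```python
from sklearn.preprocessing import LabelEncoder

demo_le = LabelEncoder()
codes = demo_le.fit_transform(["Morning", "Afternoon", "Evening", "Night"])

# classes_ is sorted alphabetically; each code is the position in classes_
print(demo_le.classes_.tolist(), codes.tolist())
```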

***Step 5: Train-Test Split***

In [147]:
# Hold out 30% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
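For a float `test_size`, scikit-learn rounds the test set size up and gives the rest to training. The 562 rows left after `dropna()` (the total implied by the Step 4 value counts) therefore split into 169 test rows and 393 training rows. The sketch below reproduces this with placeholder arrays of the same length.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 562 placeholder rows, matching the row count after dropna() in this notebook
X_demo = np.arange(562 * 2).reshape(562, 2)
y_demo = np.arange(562)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
print(len(X_tr), len(X_te))
```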


***Step 6: Create Decision Tree Regression Model***

In [148]:
model = DecisionTreeRegressor(
    max_depth=11,     # cap the tree depth to limit overfitting
    random_state=42   # make the fitted tree reproducible
)
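`max_depth=11` is the only hyperparameter set here, and the notebook does not show how it was chosen. One common approach, sketched below under assumptions, is to compare cross-validated R² across several depths on the training data. The synthetic data stands in for `X_train` and `y_train`: the value ranges are made up, and prices follow the base + per-km + per-minute rule seen in the `df.head()` rows. In the notebook you would pass `X_train` and `y_train` instead.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in for the taxi features (made-up ranges)
distance = rng.uniform(1, 50, n)
duration = rng.uniform(5, 120, n)
base = rng.uniform(2, 5, n)
per_km = rng.uniform(0.5, 2.0, n)
per_min = rng.uniform(0.1, 0.5, n)
X_syn = np.column_stack([distance, duration, base, per_km, per_min])
y_syn = base + per_km * distance + per_min * duration

scores = {}
for depth in [3, 5, 7, 9, 11, None]:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    # mean R^2 over 5 cross-validation folds
    scores[depth] = cross_val_score(tree, X_syn, y_syn, cv=5, scoring="r2").mean()

for depth, score in scores.items():
    print(depth, round(score, 3))
```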


***Step 7: Train the Model***

In [149]:
model.fit(X_train, y_train)
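After `fit`, you can inspect the tree's actual size and the features it relied on with `get_depth()`, `get_n_leaves()` and `feature_importances_` (impurity-based importances that sum to 1). The sketch fits on made-up data in which the third column is pure noise, so its importance should come out near zero. It uses the name `demo_tree` to avoid overwriting the notebook's `model`. With the real model, you could pair `model.feature_importances_` with `X_train.columns`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 50, size=(300, 3))             # made-up features
y_demo = 3 + 1.2 * X_demo[:, 0] + 0.3 * X_demo[:, 1]   # third column is unused noise

demo_tree = DecisionTreeRegressor(max_depth=11, random_state=42).fit(X_demo, y_demo)

print("depth:", demo_tree.get_depth())
print("leaves:", demo_tree.get_n_leaves())
print("importances:", demo_tree.feature_importances_.round(3))
```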


***Step 8: Make Predictions***

In [150]:
y_pred = model.predict(X_test)
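A regression tree predicts the mean training target of the leaf a row lands in, so `y_pred` can take at most as many distinct values as the tree has leaves. The sketch illustrates this on made-up one-feature data with a deliberately shallow tree (`max_depth=3`, at most 2³ = 8 leaves). With a smooth pricing rule, this staircase shape is one plausible source of the remaining prediction error.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 50, size=(200, 1))   # made-up "distance" values
y_demo = 2 + 1.5 * X_demo[:, 0]              # smooth, linear made-up "price"

demo_tree = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_demo, y_demo)

grid = np.linspace(0, 50, 1000).reshape(-1, 1)
grid_pred = demo_tree.predict(grid)
leaf_ids = demo_tree.apply(grid)             # index of the leaf each grid point falls into

print("distinct predictions:", len(np.unique(grid_pred)))
print("leaves visited:", len(np.unique(leaf_ids)), "of", demo_tree.get_n_leaves())
```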


***Step 9: Model Evaluation***

In [151]:
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))


MAE: 8.924990785700983
MSE: 165.9608254838747
R2 Score: 0.9236284607026547
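The three scores follow standard definitions. MAE is the mean absolute error, MSE is the mean squared error, and R² = 1 − SS_res / SS_tot. Because MSE is in squared price units, its square root (RMSE) is often easier to read. From the MSE reported above, RMSE is about 12.88, noticeably larger than the MAE of about 8.92, which hints that a minority of test trips have large errors. The sketch recomputes the metrics by hand on made-up values to show they match scikit-learn, then derives RMSE from the reported MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true prices and predictions, only to illustrate the formulas
y_true = np.array([36.26, 52.90, 36.47, 15.62])
y_hat = np.array([34.00, 55.00, 40.00, 15.00])

mae = np.mean(np.abs(y_true - y_hat))            # mean absolute error
mse = np.mean((y_true - y_hat) ** 2)             # mean squared error
ss_res = np.sum((y_true - y_hat) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot

print(np.isclose(mae, mean_absolute_error(y_true, y_hat)),
      np.isclose(mse, mean_squared_error(y_true, y_hat)),
      np.isclose(r2, r2_score(y_true, y_hat)))

# RMSE from the MSE printed in Step 9, in the same units as Trip_Price
rmse_reported = np.sqrt(165.9608254838747)
print("RMSE:", round(float(rmse_reported), 2))
```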
