# **Linear Regression using Taxi Trip Pricing Dataset**

***Problem Statement***

Predict the Trip Price of a taxi ride using distance, time, traffic, weather, and fare-related features.

Learning Type: Supervised Learning

Problem Type: Regression

Target Variable: Trip_Price

***Step 1: Import Required Libraries***

In [62]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error


***Step 2: Load the Dataset***

In [63]:
df = pd.read_csv("taxi_trip_pricing.csv")


***Step 3: Basic Data Inspection***

In [64]:
df.head()
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Trip_Distance_km       950 non-null    float64
 1   Time_of_Day            950 non-null    object 
 2   Day_of_Week            950 non-null    object 
 3   Passenger_Count        950 non-null    float64
 4   Traffic_Conditions     950 non-null    object 
 5   Weather                950 non-null    object 
 6   Base_Fare              950 non-null    float64
 7   Per_Km_Rate            950 non-null    float64
 8   Per_Minute_Rate        950 non-null    float64
 9   Trip_Duration_Minutes  950 non-null    float64
 10  Trip_Price             951 non-null    float64
dtypes: float64(7), object(4)
memory usage: 86.1+ KB


***Step 4: Handle Missing Values***

In [65]:
df = df.dropna()
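
`dropna()` removes every row that has at least one missing value. When the gaps are spread across different rows, this can discard far more rows than any single column is missing. The sketch below uses a small made-up DataFrame (the values are illustrative and not taken from `taxi_trip_pricing.csv`) to show how to count the rows lost. It also shows one possible alternative, which the notebook does not use: keep rows with a known target and fill feature gaps with the column median.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from taxi_trip_pricing.csv
toy = pd.DataFrame({
    "Trip_Distance_km": [19.35, np.nan, 36.87, 8.64],
    "Base_Fare": [3.56, 2.70, np.nan, 2.55],
    "Trip_Price": [36.26, 52.90, 11.50, np.nan],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows that survive when any row with a missing value is dropped
rows_before = len(toy)
rows_after = len(toy.dropna())
print(f"dropna keeps {rows_after} of {rows_before} rows")

# Alternative (an assumption, not the notebook's method):
# drop only rows without a target, then impute features with the median
feature_cols = ["Trip_Distance_km", "Base_Fare"]
imputed = toy.dropna(subset=["Trip_Price"]).copy()
imputed[feature_cols] = imputed[feature_cols].fillna(imputed[feature_cols].median())
print(imputed)
```

Running `df.isna().sum()` and comparing `len(df)` before and after `dropna()` on the real data would show how much of the 1000-row dataset this step keeps.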


***Step 5: Separate Features and Target***

In [66]:
X = df.drop("Trip_Price", axis=1)
y = df["Trip_Price"]


***Step 6: Encoding Categorical Data***

In [67]:
ordinal_features = ["Traffic_Conditions"]

ordinal_encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
X[ordinal_features] = ordinal_encoder.fit_transform(X[ordinal_features])
X


| | Trip_Distance_km | Time_of_Day | Day_of_Week | Passenger_Count | Traffic_Conditions | Weather | Base_Fare | Per_Km_Rate | Per_Minute_Rate | Trip_Duration_Minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19.35 | Morning | Weekday | 3.0 | 0.0 | Clear | 3.56 | 0.80 | 0.32 | 53.82 |
| 2 | 36.87 | Evening | Weekend | 1.0 | 2.0 | Clear | 2.70 | 1.21 | 0.15 | 37.27 |
| 5 | 8.64 | Afternoon | Weekend | 2.0 | 1.0 | Clear | 2.55 | 1.71 | 0.48 | 89.33 |
| 12 | 41.79 | Night | Weekend | 3.0 | 2.0 | Clear | 4.60 | 1.77 | 0.11 | 86.95 |
| 14 | 9.91 | Evening | Weekday | 2.0 | 2.0 | Clear | 2.32 | 1.26 | 0.34 | 41.72 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 990 | 40.17 | Evening | Weekday | 3.0 | 0.0 | Clear | 3.81 | 0.66 | 0.42 | 62.66 |
| 992 | 14.34 | Afternoon | Weekday | 1.0 | 1.0 | Clear | 3.23 | 1.01 | 0.29 | 45.07 |
| 994 | 18.69 | Evening | Weekday | 3.0 | 1.0 | Clear | 4.90 | 1.79 | 0.17 | 79.41 |
| 995 | 5.49 | Afternoon | Weekend | 4.0 | 1.0 | Clear | 2.39 | 0.62 | 0.49 | 58.39 |
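
The `categories` argument fixes the order of the levels, so `Low`, `Medium`, and `High` map to 0, 1, and 2 rather than to an alphabetical ordering. A minimal sketch on a hypothetical four-row column (the rows are invented for illustration) makes the mapping explicit:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample of the Traffic_Conditions column
toy = pd.DataFrame({"Traffic_Conditions": ["Low", "High", "Medium", "High"]})

# The explicit category list defines the order: Low < Medium < High
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
encoded = encoder.fit_transform(toy[["Traffic_Conditions"]])
print(encoded.ravel())

# The fitted encoder can map codes back to the original labels
decoded = encoder.inverse_transform([[0.0], [1.0], [2.0]]).ravel()
print(decoded)
```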


In [69]:
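label_cols = ["Time_of_Day", "Day_of_Week", "Weather"]

# Fit a separate encoder per column so each mapping stays available
# (a single shared encoder only remembers the last column it was fitted on)
label_encoders = {}
for col in label_cols:
    label_encoders[col] = LabelEncoder()
    X[col] = label_encoders[col].fit_transform(X[col])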
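
Label encoding assigns integer codes in alphabetical order, and a linear model treats those codes as magnitudes, even though columns such as `Weather` have no natural order. One common alternative, shown here only as a sketch and not as the notebook's method, is one-hot encoding with `pd.get_dummies`. The toy rows below are invented, and the `Rain` and `Snow` values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows; the category values are illustrative only
toy = pd.DataFrame({
    "Time_of_Day": ["Morning", "Evening", "Afternoon", "Night"],
    "Day_of_Week": ["Weekday", "Weekend", "Weekend", "Weekday"],
    "Weather": ["Clear", "Rain", "Clear", "Snow"],
})

nominal_cols = ["Time_of_Day", "Day_of_Week", "Weather"]

# One indicator column per category; drop_first avoids perfectly
# collinear columns when the model also fits an intercept
one_hot = pd.get_dummies(toy, columns=nominal_cols, drop_first=True, dtype=float)
print(one_hot.columns.tolist())
print(one_hot.shape)
```

With four, two, and three categories and `drop_first=True`, the three columns expand to 3 + 1 + 2 = 6 indicator columns.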


***Step 7: Feature Scaling***

In [70]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)


***Step 8: Train-Test Split***

In [71]:
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42
)
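
In the steps above, the scaler is fitted on all rows before the split, so the test rows contribute to the means and standard deviations. A common way to avoid this is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic data (the arrays and the `_toy` names are hypothetical). For ordinary least squares the predictions do not depend on feature scaling, so the metrics in this notebook should not be affected. The order does matter for regularized or distance-based models.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix and target
rng = np.random.default_rng(42)
X_toy = rng.normal(loc=10.0, scale=3.0, size=(100, 3))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 1. Split the raw features first
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42
)

# 2. Fit the scaler on the training rows only
scaler_toy = StandardScaler()
X_tr_scaled = scaler_toy.fit_transform(X_tr)

# 3. Reuse the training statistics to transform the test rows
X_te_scaled = scaler_toy.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(6))  # approximately zero by construction
print(X_te_scaled.mean(axis=0).round(3))  # close to zero, but not exactly
```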


***Step 9: Create Linear Regression Model***

In [72]:
model = LinearRegression()


***Step 10: Train the Model***

In [73]:
model.fit(X_train, y_train)


***Step 11: Make Predictions***

In [74]:
y_pred = model.predict(X_test)


***Step 12: Model Evaluation***

In [76]:
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)
print("R2 Score:", r2)


Mean Squared Error: 239.6829393089775
Mean Absolute Error: 9.469188647315125
R2 Score: 0.8897031575676431
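
MSE is expressed in squared price units, which makes it hard to compare with the MAE. Taking the square root gives the RMSE in the same units as `Trip_Price`. A small sketch using the MSE value printed above:

```python
import numpy as np

# MSE value copied from the output above
mse = 239.6829393089775

# RMSE is in the same units as Trip_Price
rmse = np.sqrt(mse)
print(f"RMSE: {rmse:.2f}")  # RMSE: 15.48
```

The RMSE (about 15.48) is larger than the MAE (about 9.47), which usually indicates that a few trips have much larger errors than the typical one.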


In [77]:
# R² on the training set
print(model.score(X_train, y_train))
# R² on the test set (same value as r2_score above)
print(model.score(X_test, y_test))

0.8857394335363984
0.8897031575676431
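
The second value equals the R2 Score from Step 12 because `LinearRegression.score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below checks this equivalence on synthetic data; the arrays and the `toy_model` name are hypothetical and unrelated to the taxi dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic regression data for illustration only
rng = np.random.default_rng(0)
X_toy = rng.uniform(1, 50, size=(200, 2))
y_toy = 3.0 + 1.2 * X_toy[:, 0] + 0.3 * X_toy[:, 1] + rng.normal(scale=5.0, size=200)

toy_model = LinearRegression().fit(X_toy, y_toy)
pred = toy_model.predict(X_toy)

# R² computed from its definition
ss_res = np.sum((y_toy - pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_toy - y_toy.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)
print(r2_score(y_toy, pred))
print(toy_model.score(X_toy, y_toy))
```

All three printed values should agree up to floating-point rounding. In the notebook's own results, the training and test scores are close (about 0.886 and 0.890), which gives no sign of overfitting on this split.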
