
# 🚕 NYC Taxi Fare Estimator

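This project predicts the **total cost** of a taxi ride in New York City from pickup time, trip distance, trip duration, and passenger count. The model simulates how a platform could estimate fares in real time, giving riders more transparency and helping the business run more efficiently. Because the actual trip duration is only known once a ride ends, a live estimator would need to supply a predicted duration instead.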


## 📂 Load Dataset (Colab Compatible)

In [None]:

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import warnings
warnings.filterwarnings('ignore')

# Download dataset from GitHub if running in Colab
if "google.colab" in str(get_ipython()):
    !wget https://raw.githubusercontent.com/Rafsun-Chowdhury/NYC-Taxi-Fare-Prediction/main/taxi_fare_data.csv

df = pd.read_csv("taxi_fare_data.csv")
df.head()


## 🔍 Initial Exploration & Cleaning

In [None]:

# Drop rows with a missing value in any column
df.dropna(inplace=True)

# Convert datetime
df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'])
df['dropoff_datetime'] = pd.to_datetime(df['dropoff_datetime'])

# Create trip duration in minutes
df['duration'] = (df['dropoff_datetime'] - df['pickup_datetime']).dt.total_seconds() / 60

# Extract hour and day
df['hour'] = df['pickup_datetime'].dt.hour
df['day_of_week'] = df['pickup_datetime'].dt.dayofweek

# Filter outliers
df = df[(df['total_amount'] > 0) & (df['total_amount'] < 300)]
df = df[(df['trip_distance'] > 0) & (df['trip_distance'] < 100)]
df = df[(df['duration'] > 0) & (df['duration'] < 180)]

df[['trip_distance', 'duration', 'hour', 'day_of_week', 'total_amount']].describe()
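
To make the derived columns concrete, the sketch below applies the same transformations to two made-up trips. The timestamps are illustrative and not taken from the dataset, and `toy` is a hypothetical name used only here.

```python
import pandas as pd

# Two made-up trips; the column names mirror the ones used in this notebook.
toy = pd.DataFrame({
    "pickup_datetime": ["2023-01-02 08:15:00", "2023-01-07 23:50:00"],
    "dropoff_datetime": ["2023-01-02 08:45:00", "2023-01-08 00:05:00"],
})
toy["pickup_datetime"] = pd.to_datetime(toy["pickup_datetime"])
toy["dropoff_datetime"] = pd.to_datetime(toy["dropoff_datetime"])

# Trip duration in minutes (the second trip crosses midnight)
toy["duration"] = (toy["dropoff_datetime"] - toy["pickup_datetime"]).dt.total_seconds() / 60

# Hour of pickup (0-23) and day of week (Monday=0 ... Sunday=6)
toy["hour"] = toy["pickup_datetime"].dt.hour
toy["day_of_week"] = toy["pickup_datetime"].dt.dayofweek

print(toy[["duration", "hour", "day_of_week"]])
```

Note that `dayofweek` starts at 0 for Monday, so a value of 5 or 6 marks a weekend pickup.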


## 🎯 Feature Selection & Target

In [None]:

features = ['trip_distance', 'duration', 'hour', 'day_of_week', 'passenger_count']
X = df[features]
y = df['total_amount']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


## 🤖 Model Training & Prediction

In [None]:

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

# Random Forest
rf = RandomForestRegressor(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)


## 📈 Model Evaluation

In [None]:

def evaluate(y_test, y_pred, model_name):
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)
    print(f"{model_name} Results:")
    print(f"MAE: ${mae:.2f}")
    print(f"RMSE: ${rmse:.2f}")
    print(f"R² Score: {r2:.2f}\n")

evaluate(y_test, y_pred_lr, "Linear Regression")
evaluate(y_test, y_pred_rf, "Random Forest")
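
The three metrics printed by `evaluate` can be checked by hand on a tiny example. The sketch below computes MAE, RMSE, and R² with NumPy on made-up numbers (not model output), assuming the standard definitions that the scikit-learn functions implement.

```python
import numpy as np

# Made-up actual and predicted fares, for illustration only
y_true = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 33.0])

residuals = y_true - y_hat

# MAE: average absolute error, in dollars
mae = np.mean(np.abs(residuals))

# RMSE: square root of the average squared error; penalizes large misses more
rmse = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE: ${mae:.2f}  RMSE: ${rmse:.2f}  R²: {r2:.3f}")
```

RMSE is never smaller than MAE on the same data, so a large gap between the two suggests that a few rides are predicted much worse than the rest.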


## 📊 Prediction Visualization

In [None]:

plt.figure(figsize=(8,6))
sns.scatterplot(x=y_test, y=y_pred_rf, alpha=0.3)
plt.xlabel("Actual Total Fare")
plt.ylabel("Predicted Fare")
plt.title("Random Forest: Actual vs Predicted Fare")
plt.grid(True)
plt.show()


## ⏱️ Average Fare by Hour of Day

In [None]:

plt.figure(figsize=(10,5))
sns.barplot(x='hour', y='total_amount', data=df)
plt.title("Average Total Fare by Hour of Day")
plt.xlabel("Hour of Pickup")
plt.ylabel("Average Fare ($)")
plt.grid(True)
plt.show()
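
By default, `sns.barplot` draws the mean of `total_amount` for each hour, with an error bar around it. The same averages can be computed directly with a group-by, which is handy for checking how many rides sit behind each bar. The sketch below uses a made-up frame (`toy` and its values are hypothetical).

```python
import pandas as pd

# Made-up rides: two at 08:00 and three at 18:00
toy = pd.DataFrame({
    "hour": [8, 8, 18, 18, 18],
    "total_amount": [10.0, 14.0, 20.0, 22.0, 27.0],
})

# Mean fare and ride count per pickup hour
hourly = toy.groupby("hour")["total_amount"].agg(["mean", "count"])
print(hourly)
```

Hours with few rides produce noisy averages, so the `count` column is worth reading alongside the chart.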


## 📉 Prediction Error Distribution

In [None]:

errors = y_test - y_pred_rf
plt.figure(figsize=(8,4))
sns.histplot(errors, kde=True)
plt.title("Distribution of Prediction Errors (Random Forest)")
plt.xlabel("Prediction Error ($)")
plt.grid(True)
plt.show()
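
The histogram can be complemented with a few summary numbers. The following is a minimal sketch, assuming the same sign convention as above (actual minus predicted, so a negative mean indicates over-prediction on average). `summarize_errors` is a hypothetical helper, and the inputs are made up.

```python
import numpy as np

def summarize_errors(actual, predicted):
    """Return bias and spread statistics for prediction errors."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return {
        "mean_error": float(errors.mean()),          # systematic bias
        "median_error": float(np.median(errors)),    # robust to a few large misses
        "p90_abs_error": float(np.percentile(np.abs(errors), 90)),  # 90% of rides are within this
    }

# Made-up fares for illustration
summary = summarize_errors([10, 20, 30, 40], [12, 18, 30, 44])
print(summary)
```

On the notebook's data, the call would be `summarize_errors(y_test, y_pred_rf)`.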


## 💡 Predict Fare for Custom Input

In [None]:

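def predict_fare(model, distance, duration, hour, day, passengers):
    # Build a one-row DataFrame with the training column names so the model
    # receives the features under the same names and in the same order as during fit
    row = pd.DataFrame([[distance, duration, hour, day, passengers]], columns=features)
    pred = model.predict(row)[0]
    print(f"Estimated Fare: ${pred:.2f}")

# Example usage: a 12-minute trip at 18:00 on a Wednesday (day=2, Monday=0) with one passenger
predict_fare(rf, distance=4.0, duration=12, hour=18, day=2, passengers=1)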



## ✅ Conclusion

This notebook demonstrates a baseline fare estimation workflow on public taxi data, comparing a linear regression with a random forest. Adding geolocation and external features (such as weather or taxi zones) could extend the project toward a production-level application or dashboard.
