# 🚕 Taxi Fare Prediction Challenge

**Dive into the world of ride-sharing economics and build a model to predict taxi trip fares!**

This challenge uses a realistic synthetic dataset, well suited to honing your regression skills and exploring pricing dynamics in the taxi industry. Put your data science skills to the test and develop a robust fare prediction model.

[![Taxi Dataset](https://img.shields.io/badge/Dataset-Kaggle-blueviolet)](https://www.kaggle.com/datasets/denkuznetz/taxi-price-prediction)


## Dataset Overview

This dataset simulates taxi trip data and includes key factors that influence fare calculation. Your goal is to accurately predict the trip price (`Trip_Price`) from the provided features.


## Feature Breakdown

| Feature                 | Description                                          | Data Type   |
|-------------------------|------------------------------------------------------|-------------|
| `Trip_Distance_km`      | Trip length in km                                    | Numeric     |
| `Time_of_Day`           | Time-of-day bucket (e.g., Morning, Afternoon, Evening) | Categorical |
| `Day_of_Week`           | Type of day (e.g., Weekday, Weekend)                 | Categorical |
| `Passenger_Count`       | Number of passengers                                 | Numeric     |
| `Traffic_Conditions`    | Traffic level (e.g., Low, High)                      | Categorical |
| `Weather`               | Weather status (e.g., Clear)                         | Categorical |
| `Base_Fare`             | Flat starting fare                                   | Numeric     |
| `Per_Km_Rate`           | Charge per km                                        | Numeric     |
| `Per_Minute_Rate`       | Charge per minute                                    | Numeric     |
| `Trip_Duration_Minutes` | Total trip time in minutes                           | Numeric     |
| `Trip_Price`            | **Target variable:** the total cost of the trip      | Numeric     |



## Challenge Objectives

Construct a linear regression model to predict taxi fares.  Follow these steps:

1. **Data Ingestion & Exploration:**  Load the dataset and perform thorough Exploratory Data Analysis (EDA) to understand the data distribution, identify potential outliers, and uncover relationships between features.

2. **Feature Engineering:** Derive new features from the existing ones to improve model performance. Consider encoding the time-based categories (`Time_of_Day`, `Day_of_Week`) and building combined features such as average speed (`Trip_Distance_km` / `Trip_Duration_Minutes`).

3. **Preprocessing:**
    * **Data Cleaning:** Handle missing values, outliers, and any inconsistencies in the data. The preview already shows missing values in several columns, including the target `Trip_Price`.
    * **Data Splitting:** Partition the data into training and testing sets.

4. **Model Training & Evaluation:**
    * Implement linear regression both from scratch with NumPy and with scikit-learn, using the methods covered in today's session.
    * Consider regularization (Lasso, Ridge) to reduce overfitting and improve generalization.
    * Evaluate model performance using appropriate metrics (e.g., RMSE, MAE, R-squared).
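
For reference, you can compute the three metrics in step 4 directly in NumPy. This is a minimal sketch. The function names `rmse`, `mae`, and `r2` are illustrative choices, not part of the challenge. scikit-learn offers equivalents in `sklearn.metrics` (`mean_squared_error`, `mean_absolute_error`, `r2_score`).

```python
import numpy as np


def _as_float_arrays(y_true, y_pred):
    """Convert inputs (lists, Series, arrays) to float NumPy arrays."""
    return np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true, y_pred = _as_float_arrays(y_true, y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    y_true, y_pred = _as_float_arrays(y_true, y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = _as_float_arrays(y_true, y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hand-checkable example: the errors are -2, 2, 0, -4
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 30.0, 44.0]
print(round(rmse(y_true, y_pred), 3))  # sqrt(24 / 4) = sqrt(6) -> 2.449
print(round(mae(y_true, y_pred), 3))   # (2 + 2 + 0 + 4) / 4 -> 2.0
print(round(r2(y_true, y_pred), 3))    # 1 - 24 / 500 -> 0.952
```

RMSE penalizes large errors more heavily than MAE. A large gap between the two therefore suggests that a few badly predicted trips dominate the error.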


## Get Started!

Download the dataset, explore the data, and start building your predictive model! Good luck! 🚕💨

## Read the data

```python
# importing libraries
import kagglehub
import pandas as pd
import os
```

```python
# Download the dataset from Kaggle
path = kagglehub.dataset_download("denkuznetz/taxi-price-prediction")
print("Path to dataset files:", path)
```

```text
Path to dataset files: /root/.cache/kagglehub/datasets/denkuznetz/taxi-price-prediction/versions/1
```


```python
# Read the data using pandas
csv_path = os.path.join(path, "taxi_trip_pricing.csv")
taxiPricing = pd.read_csv(csv_path)
taxiPricing.head()
```

| | Trip_Distance_km | Time_of_Day | Day_of_Week | Passenger_Count | Traffic_Conditions | Weather | Base_Fare | Per_Km_Rate | Per_Minute_Rate | Trip_Duration_Minutes | Trip_Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19.35 | Morning | Weekday | 3.0 | Low | Clear | 3.56 | 0.8 | 0.32 | 53.82 | 36.2624 |
| 1 | 47.59 | Afternoon | Weekday | 1.0 | High | Clear | NaN | 0.62 | 0.43 | 40.57 | NaN |
| 2 | 36.87 | Evening | Weekend | 1.0 | High | Clear | 2.7 | 1.21 | 0.15 | 37.27 | 52.9032 |
| 3 | 30.33 | Evening | Weekday | 4.0 | Low | NaN | 3.48 | 0.51 | 0.15 | 116.81 | 36.4698 |
| 4 | NaN | Evening | Weekday | 3.0 | High | Clear | 2.93 | 0.63 | 0.32 | 22.64 | 15.618 |


## Perform Exploratory Data Analysis (EDA)



```python
# Show plots and heatmaps
```
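
A minimal pandas-only starting point for this cell, assuming the column names shown in the preview above. The helpers `eda_summary` and `iqr_outliers` are hypothetical names. The IQR rule (flag values more than 1.5 × IQR outside the quartiles) is one common outlier heuristic, not the only choice.

```python
import pandas as pd

TARGET = "Trip_Price"


def eda_summary(df: pd.DataFrame, target: str = TARGET) -> dict:
    """Collect basic EDA tables for a taxi-pricing DataFrame."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    corr = numeric.corr()  # pairwise Pearson correlation, NaNs skipped per pair
    return {
        # Share of missing values per column, largest first
        "missing_share": df.isna().mean().sort_values(ascending=False),
        # count/mean/std/min/quartiles/max for each numeric column
        "numeric_summary": numeric.describe().T,
        # Category frequencies, with NaN counted as its own bucket
        "category_counts": {c: df[c].value_counts(dropna=False) for c in categorical.columns},
        # Correlation of each numeric feature with the target
        "target_corr": corr[target].drop(target).sort_values(ascending=False),
        # Full correlation matrix: what a heatmap would display
        "corr_matrix": corr,
    }


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN maps to False."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)
```

In the notebook, run `summary = eda_summary(taxiPricing)` and then inspect `summary["target_corr"]`. This ranks the numeric features by their linear correlation with the target. For the plots themselves, two common choices are `taxiPricing.hist(figsize=(12, 8))` (requires matplotlib) and `seaborn.heatmap(summary["corr_matrix"], annot=True)` (requires seaborn).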

## Do Feature Engineering

```python
# Add, edit, and remove features, if applicable.
```
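
The preview has three fully observed rows (rows 0, 2, and 3). In each of them, `Trip_Price` equals `Base_Fare + Per_Km_Rate × Trip_Distance_km + Per_Minute_Rate × Trip_Duration_Minutes` exactly. For row 0, for example, 3.56 + 0.8 × 19.35 + 0.32 × 53.82 = 36.2624. Whether this holds across the whole dataset is an assumption you should verify.

A linear model cannot form products of its inputs on its own. The sketch below therefore adds the two rate × quantity products as features, along with average speed and one-hot encoded categoricals. All new column and function names are hypothetical.

```python
import numpy as np
import pandas as pd

CATEGORICAL = ["Time_of_Day", "Day_of_Week", "Traffic_Conditions", "Weather"]


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with engineered columns (names are illustrative)."""
    out = df.copy()
    # Interaction terms: give the linear model rate * quantity directly
    out["Distance_Cost"] = out["Per_Km_Rate"] * out["Trip_Distance_km"]
    out["Time_Cost"] = out["Per_Minute_Rate"] * out["Trip_Duration_Minutes"]
    # Average speed in km/h; zero durations become NaN instead of inf
    minutes = out["Trip_Duration_Minutes"].replace(0, np.nan)
    out["Avg_Speed_kmh"] = out["Trip_Distance_km"] / (minutes / 60.0)
    return out


def implied_price(df: pd.DataFrame) -> pd.Series:
    """Price given by the base + per-km + per-minute structure seen in the preview."""
    return (df["Base_Fare"]
            + df["Per_Km_Rate"] * df["Trip_Distance_km"]
            + df["Per_Minute_Rate"] * df["Trip_Duration_Minutes"])


def formula_match_rate(df: pd.DataFrame, atol: float = 1e-3) -> float:
    """Share of fully observed rows where Trip_Price matches implied_price within atol."""
    implied = implied_price(df)
    observed = implied.notna() & df["Trip_Price"].notna()
    return float(np.isclose(implied[observed], df.loc[observed, "Trip_Price"], atol=atol).mean())


def one_hot(df: pd.DataFrame, columns=CATEGORICAL) -> pd.DataFrame:
    """One-hot encode categoricals, dropping the first level to avoid collinearity with the intercept."""
    present = [c for c in columns if c in df.columns]
    return pd.get_dummies(df, columns=present, drop_first=True, dtype=float)
```

`formula_match_rate(taxiPricing)` reports the share of complete rows that follow the formula. Apply `one_hot` after cleaning missing values, so that a missing category is not silently encoded as an all-zero row. Then align the test split with `test.reindex(columns=train.columns, fill_value=0)`. This alignment matters because `pd.get_dummies` only creates columns for the categories it actually sees.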

## Preprocess the data

```python
# Show missing values, clean the data, split it to train and test sets
```
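
One way to fill this cell, sketched with pandas and NumPy only. The approach has three parts:

- It drops rows whose target is missing, since they can neither train nor score a model.
- It splits the data before imputing.
- It learns the fill values (median for numeric columns, most frequent value otherwise) from the training split alone, so no test information leaks into training.

The 80/20 split, the seed, and the names `train_test_split_df`, `fit_imputer`, and `preprocess` are assumptions. `sklearn.model_selection.train_test_split` and `sklearn.impute.SimpleImputer` are library alternatives.

```python
import numpy as np
import pandas as pd

TARGET = "Trip_Price"


def train_test_split_df(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Shuffle row positions with a fixed seed and return (train, test)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = int(round(len(df) * test_size))
    return df.iloc[idx[n_test:]].copy(), df.iloc[idx[:n_test]].copy()


def fit_imputer(train: pd.DataFrame) -> dict:
    """Learn fill values from the training split only."""
    fills = {}
    for col in train.columns:
        if pd.api.types.is_numeric_dtype(train[col]):
            fills[col] = train[col].median()
        else:
            modes = train[col].mode(dropna=True)
            if not modes.empty:
                fills[col] = modes.iloc[0]
    return fills


def preprocess(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Drop unlabeled rows, split, then impute both splits with training statistics."""
    labeled = df.dropna(subset=[TARGET])
    train, test = train_test_split_df(labeled, test_size, seed)
    fills = fit_imputer(train.drop(columns=[TARGET]))
    return train.fillna(fills), test.fillna(fills)
```

Usage: `train, test = preprocess(taxiPricing)`. You can add outlier handling (for example, capping values flagged by an IQR rule) after the split in the same train-only fashion.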

## Train regression models

```python
# Fit your models on the training set and evaluate them on the test set.
# Apply the techniques you learned today, including a NumPy-only implementation and a scikit-learn implementation.
```
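
A sketch covering both halves of this cell. It assumes numeric feature matrices `X_train` and `X_test`, with targets `y_train` and `y_test`, prepared in the earlier steps (e.g., after one-hot encoding).

- **NumPy side:** the normal equations, with an optional ridge penalty that leaves the intercept unpenalized, plus batch gradient descent.
- **scikit-learn side:** `LinearRegression`, `Ridge`, and `Lasso`, each behind a `StandardScaler`.

The alpha values are illustrative, not tuned. The demo at the bottom runs on synthetic toy data generated inside the block, not on the challenge dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


# ---- NumPy-only implementations -----------------------------------------
def standardize(X_train, X_test):
    """Scale both splits with the training mean/std (avoids test leakage)."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma[sigma == 0] = 1.0                       # leave constant columns unscaled
    return (X_train - mu) / sigma, (X_test - mu) / sigma


def fit_closed_form(X, y, alpha=0.0):
    """Normal equations; alpha > 0 adds a ridge (L2) penalty. Returns (intercept, coef)."""
    Xb = np.column_stack([np.ones(len(X)), X])    # prepend intercept column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                           # do not shrink the intercept
    w, *_ = np.linalg.lstsq(Xb.T @ Xb + penalty, Xb.T @ y, rcond=None)
    return w[0], w[1:]


def fit_gradient_descent(X, y, lr=None, epochs=2000):
    """Batch gradient descent on MSE; expects zero-mean (standardized) X."""
    n, d = X.shape
    if lr is None:
        # Step 1/L, where L is the largest curvature of the MSE:
        # top eigenvalue of (2/n) X^T X for the weights, 2 for the intercept.
        lr = 1.0 / max(np.linalg.eigvalsh((2.0 / n) * X.T @ X).max(), 2.0)
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = X @ w + b - y                       # residuals of the current fit
        w -= lr * (2.0 / n) * (X.T @ err)         # dMSE/dw
        b -= lr * (2.0 / n) * err.sum()           # dMSE/db
    return b, w


def scores(y_true, y_pred):
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }


# ---- Compare NumPy and scikit-learn models ------------------------------
def compare_models(X_train, y_train, X_test, y_test):
    X_train, X_test = np.asarray(X_train, float), np.asarray(X_test, float)
    y_train, y_test = np.asarray(y_train, float), np.asarray(y_test, float)
    Xs_train, Xs_test = standardize(X_train, X_test)
    results = {}
    b, w = fit_closed_form(Xs_train, y_train)
    results["numpy_ols"] = scores(y_test, Xs_test @ w + b)
    b, w = fit_gradient_descent(Xs_train, y_train)
    results["numpy_gd"] = scores(y_test, Xs_test @ w + b)
    b, w = fit_closed_form(Xs_train, y_train, alpha=1.0)
    results["numpy_ridge"] = scores(y_test, Xs_test @ w + b)
    sk_models = {
        "sklearn_ols": LinearRegression(),
        "sklearn_ridge": Ridge(alpha=1.0),
        "sklearn_lasso": Lasso(alpha=0.1),
    }
    for name, model in sk_models.items():
        pipe = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
        results[name] = scores(y_test, pipe.predict(X_test))
    return results


# ---- Toy demo on synthetic data (NOT the challenge dataset) -------------
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 5 + X @ np.array([2.0, -3.0, 0.5]) + rng.normal(scale=0.1, size=200)
results = compare_models(X[:160], y[:160], X[160:], y[160:])
for name, metrics in results.items():
    print(f"{name:14s}", {k: round(v, 3) for k, v in metrics.items()})
```

Both sides standardize with the training-set mean and standard deviation. As a result, the NumPy ridge fit with `alpha=1.0` solves the same objective as `Ridge(alpha=1.0)`, which gives a handy correctness check for the from-scratch code. On real features that are strongly correlated (such as `Trip_Distance_km` and a distance-cost product), gradient descent may need more epochs to match the closed-form solution.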

This challenge is made by [Ali Alqutayfi](https://www.linkedin.com/in/ali-alqutayfi/).