![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 1. Chicago Cab Fare Predictor using Linear Regression

Accurately predicting the cost of a taxi ride can provide valuable insights for both riders and service providers, enabling more informed decisions and better financial planning. In this project, we focus on building a **Linear Regression model** to predict taxi fares in **Chicago, Illinois**. By analyzing patterns in historical data, we aim to create a model that can reliably estimate the fare for a given trip.

The [dataset used in this project](https://download.mlcc.google.com/mledu-datasets/chicago_taxi_train.csv) is a **subset of the [City of Chicago Taxi Trips dataset](https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew)** that covers a **two-day period in May 2022**. It contains features such as trip distance, pickup and dropoff locations, and ride duration, which we use to train the predictive model.

**Project Objectives:**

- **Dataset:** A cleaned and preprocessed subset of taxi trips over a two-day period in May 2022.

- **Model:** A Linear Regression model that predicts the fare based on input features like trip distance, time of day, and other relevant variables.

- **Goal:** To build an accurate fare predictor that can assist in understanding taxi fare dynamics in Chicago.

This project not only serves as a practical application of regression modeling but also offers insights into the pricing structure of taxi services in a major metropolitan area.
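
To make the idea of a linear fare model concrete, here is a minimal sketch that fits a straight line `fare ≈ weight * trip_miles + bias` with NumPy's least-squares `polyfit`. It uses only five sample trips copied from the dataset preview in this notebook, and the helper name `predict_fare` is hypothetical. It is an illustration of the model form, not the model trained later in the project.

```python
import numpy as np

# Five sample trips copied from the dataset preview (TRIP_MILES and FARE columns).
trip_miles = np.array([2.57, 1.18, 1.29, 3.70, 1.15])
fare = np.array([31.99, 9.75, 10.25, 23.75, 10.00])

# Fit fare ≈ weight * trip_miles + bias by ordinary least squares.
# np.polyfit returns coefficients from the highest degree down: [weight, bias].
weight, bias = np.polyfit(trip_miles, fare, deg=1)


def predict_fare(miles):
    """Hypothetical helper: fare estimate from the fitted line."""
    return weight * miles + bias


# Residuals show how far the line is from each observed fare.
residuals = fare - predict_fare(trip_miles)
rmse = np.sqrt(np.mean(residuals ** 2))

print(f"weight={weight:.2f}, bias={bias:.2f}, RMSE={rmse:.2f}")
```

With so few points the fitted numbers are not meaningful on their own; the sketch only shows what "learning a weight and a bias" means for a single feature.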

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 2. Part I: Initial Setup

## 1. Import Required Libraries

In [1]:
# General
import io

# Data
import numpy as np
import pandas as pd

# Machine Learning
import keras

# Data Visualization
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import seaborn as sns

## 2. Load the Dataset

In [2]:
chicago_taxi_dataset = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/chicago_taxi_train.csv")

In [3]:
print(f"Shape of dataset: {chicago_taxi_dataset.shape}")

Shape of dataset: (31694, 18)


In [4]:
chicago_taxi_dataset.head()

| | TRIP_START_TIMESTAMP | TRIP_END_TIMESTAMP | TRIP_START_HOUR | TRIP_SECONDS | TRIP_MILES | TRIP_SPEED | PICKUP_CENSUS_TRACT | DROPOFF_CENSUS_TRACT | PICKUP_COMMUNITY_AREA | DROPOFF_COMMUNITY_AREA | FARE | TIPS | TIP_RATE | TOLLS | EXTRAS | TRIP_TOTAL | PAYMENT_TYPE | COMPANY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 05/17/2022 7:15:00 AM | 05/17/2022 7:45:00 AM | 7.25 | 2341 | 2.57 | 4.0 | NaN | NaN | NaN | 17.0 | 31.99 | 2.0 | 6.3 | 0.0 | 0.0 | 33.99 | Mobile | Flash Cab |
| 1 | 05/17/2022 5:15:00 PM | 05/17/2022 5:30:00 PM | 17.25 | 1074 | 1.18 | 4.0 | NaN | 17031080000.0 | NaN | 8.0 | 9.75 | 3.0 | 27.9 | 0.0 | 1.0 | 14.25 | Credit Card | Flash Cab |
| 2 | 05/17/2022 5:15:00 PM | 05/17/2022 5:30:00 PM | 17.25 | 1173 | 1.29 | 4.0 | 17031320000.0 | 17031080000.0 | 32.0 | 8.0 | 10.25 | 0.0 | 0.0 | 0.0 | 0.0 | 10.25 | Cash | Sun Taxi |
| 3 | 05/17/2022 6:00:00 PM | 05/17/2022 7:00:00 PM | 18.0 | 3360 | 3.7 | 4.0 | 17031320000.0 | 17031240000.0 | 32.0 | 24.0 | 23.75 | 0.0 | 0.0 | 0.0 | 1.0 | 24.75 | Cash | Choice Taxi Association |
| 4 | 05/17/2022 5:00:00 PM | 05/17/2022 5:30:00 PM | 17.0 | 1044 | 1.15 | 4.0 | 17031320000.0 | 17031080000.0 | 32.0 | 8.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | Cash | Flash Cab |
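
The preview above shows two things worth checking before modeling: some location fields are missing, and the timestamp columns are stored as text. The sketch below counts missing values per column and parses the timestamps on a three-row sample copied from the preview. The names `sample_csv`, `sample_df`, and `timestamp_format` are illustrative assumptions, and the format string is inferred from the values shown rather than from the dataset's documentation.

```python
import io

import pandas as pd

# Three rows and five columns copied from the preview; empty fields are missing values.
sample_csv = io.StringIO(
    "TRIP_START_TIMESTAMP,TRIP_END_TIMESTAMP,TRIP_SECONDS,PICKUP_COMMUNITY_AREA,FARE\n"
    "05/17/2022 7:15:00 AM,05/17/2022 7:45:00 AM,2341,,31.99\n"
    "05/17/2022 5:15:00 PM,05/17/2022 5:30:00 PM,1074,,9.75\n"
    "05/17/2022 5:15:00 PM,05/17/2022 5:30:00 PM,1173,32.0,10.25\n"
)
sample_df = pd.read_csv(sample_csv)

# Count missing values in each column.
missing_counts = sample_df.isna().sum()
print(missing_counts)

# Convert the timestamp strings to datetime64 values (12-hour clock with AM/PM).
timestamp_format = "%m/%d/%Y %I:%M:%S %p"
for column in ["TRIP_START_TIMESTAMP", "TRIP_END_TIMESTAMP"]:
    sample_df[column] = pd.to_datetime(sample_df[column], format=timestamp_format)

print(sample_df.dtypes)
```

The same two calls, `isna().sum()` and `pd.to_datetime`, should apply unchanged to `chicago_taxi_dataset` if the full file follows the format seen in the preview.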


## 3. Update the DataFrame

In [7]:
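# Keep only the columns used in this project. The .copy() call makes training_df
# an independent DataFrame, so later edits to it do not raise SettingWithCopyWarning.
training_df = chicago_taxi_dataset[['TRIP_MILES', 'TRIP_SECONDS', 'FARE', 'COMPANY', 'PAYMENT_TYPE', 'TIP_RATE']].copy()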

print(f"Total number of rows: {len(training_df.index)}")
print(f"Shape of dataset: {training_df.shape}\n\n")
training_df.head(200)

Total number of rows: 31694
Shape of dataset: (31694, 6)




| | TRIP_MILES | TRIP_SECONDS | FARE | COMPANY | PAYMENT_TYPE | TIP_RATE |
|---|---|---|---|---|---|---|
| 0 | 2.57 | 2341 | 31.99 | Flash Cab | Mobile | 6.3 |
| 1 | 1.18 | 1074 | 9.75 | Flash Cab | Credit Card | 27.9 |
| 2 | 1.29 | 1173 | 10.25 | Sun Taxi | Cash | 0.0 |
| 3 | 3.70 | 3360 | 23.75 | Choice Taxi Association | Cash | 0.0 |
| 4 | 1.15 | 1044 | 10.00 | Flash Cab | Cash | 0.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 195 | 1.13 | 821 | 9.00 | Blue Ribbon Taxi Association | Mobile | 22.9 |
| 196 | 0.57 | 414 | 6.00 | Flash Cab | Cash | 0.0 |
| 197 | 1.22 | 886 | 9.00 | City Service | Cash | 0.0 |
| 198 | 1.68 | 1219 | 9.00 | Sun Taxi | Mobile | 23.0 |
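
A quick way to judge which numeric columns may be useful predictors is to look at their correlation with `FARE`. The sketch below does this on the nine rows visible in the preview above, so the values are only indicative; `sample_df` and `correlations` are illustrative names, and running the same call on the full `training_df` is the meaningful check.

```python
import pandas as pd

# Numeric columns of the nine rows shown in the training_df preview.
sample_df = pd.DataFrame(
    {
        "TRIP_MILES": [2.57, 1.18, 1.29, 3.70, 1.15, 1.13, 0.57, 1.22, 1.68],
        "TRIP_SECONDS": [2341, 1074, 1173, 3360, 1044, 821, 414, 886, 1219],
        "FARE": [31.99, 9.75, 10.25, 23.75, 10.00, 9.00, 6.00, 9.00, 9.00],
        "TIP_RATE": [6.3, 27.9, 0.0, 0.0, 0.0, 22.9, 0.0, 0.0, 23.0],
    }
)

# Pearson correlation of every numeric column with FARE, strongest first.
correlations = (
    sample_df.corr()["FARE"]
    .drop("FARE")  # a column's correlation with itself is always 1
    .sort_values(ascending=False)
)
print(correlations)
```

Columns with a correlation close to 1 or -1 are natural first candidates for a single-feature linear model, while values near 0 suggest little linear relationship with the fare.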


![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)
