<h1> Business Problem </h1>

<h3>Background:</h3>

In the highly competitive taxi service industry, operational efficiency, customer satisfaction, and strategic market positioning are crucial for maintaining and enhancing profitability. As urban populations grow and mobility demands evolve, taxi companies face increasing challenges in managing fleet operations, adjusting pricing dynamically, and meeting fluctuating customer demands efficiently.

<h3>Problem Statement:</h3>
Companies seek to optimize taxi fleet management, resource allocation, and customer service by accurately predicting taxi demand from historical data and key variables such as pickup location and time. The objective is to use data from the previous year (January to March 2015) to forecast taxi pickups for the same period in the following year (January to March 2016), giving a strategic advantage in operational planning and market responsiveness.

<h3> Objectives:</h3>

<h4>Optimization of Taxi Fleet Management:</h4>

- Improve the positioning and scheduling of taxis to reduce operational costs and minimize passenger wait times.
- Enhance the utilization rate of the fleet by deploying taxis where and when they are most needed.

<h4>Dynamic Pricing Strategy:</h4>

- Implement dynamic pricing adjustments during peak demand periods to balance the market, enhance revenue, and manage customer expectations.

<h4>Resource Allocation and Driver Scheduling:</h4>

- Optimize driver schedules based on predictive demand insights, thereby reducing overheads related to underutilized labor and increasing earnings during high-demand periods.

<h4>Urban Planning and Traffic Contribution:</h4>

- Provide data-driven insights to city planners and traffic management authorities to aid in the design of more efficient public transportation systems and traffic congestion mitigation.

<h4>Customer Service Enhancement:</h4>

- Achieve superior customer satisfaction levels by ensuring taxi availability aligns closely with demand, thereby reducing wait times and improving service reliability.
  
<h4>Innovation and Service Development:</h4>

- Utilize predictive insights to identify market trends and develop innovative service offerings such as on-demand carpooling or special event transportation solutions.

<h4>Competitive Advantage and Market Share:</h4>

- Strengthen market positioning by offering timely and dependable services, differentiating the company from competitors, and potentially increasing market share.
  
<h3>Expected Outcomes:</h3>

By addressing these objectives through predictive analytics, companies anticipate not only an improvement in operational efficiencies and financial performance but also a notable enhancement in customer service quality.

<h2> Data Source </h2>

https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

## Information on taxis:

<h5> Yellow Taxi: Yellow Medallion Taxicabs</h5>
<p> These are the famous NYC yellow taxis that provide transportation exclusively through street-hails. The number of taxicabs is limited by a finite number of medallions issued by the TLC. You access this mode of transportation by standing in the street and hailing an available taxi with your hand. The pickups are not pre-arranged.</p>

<h5> For Hire Vehicles (FHVs) </h5>
<p> FHV transportation is accessed by a pre-arrangement with a dispatcher or limo company. These FHVs are not permitted to pick up passengers via street hails, as those rides are not considered pre-arranged. </p>

<h5> Green Taxi: Street Hail Livery (SHL) </h5>
<p>  The SHL program will allow livery vehicle owners to license and outfit their vehicles with green borough taxi branding, meters, credit card machines, and ultimately the right to accept street hails in addition to pre-arranged rides. </p>
<p> Credits: Quora</p>

<h3>Footnote:</h3>
This notebook considers only yellow taxis, for the periods January to March 2015 and January to March 2016.
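As a rough illustration of the six monthly files this scope implies, the sketch below builds their names. It assumes every month follows the same `yellow_tripdata_YYYY-MM.parquet` naming as the January 2015 file used in this notebook; the helper name `monthly_files` is hypothetical.

```python
def monthly_files(years=(2015, 2016), months=(1, 2, 3)):
    """Return one parquet file name per (year, month) pair.

    Assumption: all months use the yellow_tripdata_YYYY-MM.parquet pattern.
    """
    return [
        f"yellow_tripdata_{year}-{month:02d}.parquet"
        for year in years
        for month in months
    ]


files = monthly_files()
for name in files:
    print(name)
```

Each name could then be joined to a local download folder, read with `pd.read_parquet`, and the resulting frames combined with `pd.concat`.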

<h2> Exploring data</h2>

In [1]:
import pandas as pd

# Load the January 2015 yellow taxi trip records from a local parquet file.
df = pd.read_parquet('/Users/pinakshome/Downloads/yellow_tripdata_2015-01.parquet', engine='pyarrow')

In [2]:
df.head()

| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2015-01-01 00:11:33 | 2015-01-01 00:16:48 | 1 | 1.0 | 1 | N | 41 | 166 | 1 | 5.7 | 0.5 | 0.5 | 1.4 | 0.0 | 0.0 | 8.4 | | |
| 1 | 1 | 2015-01-01 00:18:24 | 2015-01-01 00:24:20 | 1 | 0.9 | 1 | N | 166 | 238 | 3 | 6.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 7.3 | | |
| 2 | 1 | 2015-01-01 00:26:19 | 2015-01-01 00:41:06 | 1 | 3.5 | 1 | N | 238 | 162 | 1 | 13.2 | 0.5 | 0.5 | 2.9 | 0.0 | 0.0 | 17.4 | | |
| 3 | 1 | 2015-01-01 00:45:26 | 2015-01-01 00:53:20 | 1 | 2.1 | 1 | N | 162 | 263 | 1 | 8.2 | 0.5 | 0.5 | 2.37 | 0.0 | 0.0 | 11.87 | | |
| 4 | 1 | 2015-01-01 00:59:21 | 2015-01-01 01:05:24 | 1 | 1.0 | 1 | N | 236 | 141 | 3 | 6.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 7.3 | | |
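Since the goal is to forecast pickups by location and time, one possible way to turn trip records like these into a demand table is to count pickups per zone per time bin. The sketch below is only an illustration on a toy frame whose values are copied from the preview above; the 10-minute bin width, the helper name `pickups_per_bin`, and the output columns `pickup_bin` and `pickups` are assumptions, not choices stated in this notebook.

```python
import pandas as pd

# Toy stand-in for the trip records: pickup times and zones from the preview above.
trips = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2015-01-01 00:11:33",
        "2015-01-01 00:18:24",
        "2015-01-01 00:26:19",
        "2015-01-01 00:45:26",
        "2015-01-01 00:59:21",
    ]),
    "PULocationID": [41, 166, 238, 162, 236],
})


def pickups_per_bin(frame, freq="10min"):
    """Count pickups per (zone, time bin); the bin width is an assumed parameter."""
    # Round each pickup time down to the start of its bin.
    binned = frame.assign(pickup_bin=frame["tpep_pickup_datetime"].dt.floor(freq))
    # One row per zone and bin, with the number of trips that started there.
    return (
        binned.groupby(["PULocationID", "pickup_bin"])
        .size()
        .reset_index(name="pickups")
    )


demand = pickups_per_bin(trips)
print(demand)
```

On the full data, the same call would take `df` in place of `trips`; a coarser or finer `freq` trades temporal resolution against sparsity of counts.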


<h3> Features in the Dataset</h3>

### Data Dictionary - Yellow Taxi Trip Records

| Field Name           | Description |
|----------------------|-------------|
| **VendorID**         | A code indicating the TPEP provider that provided the record. <br>1= Creative Mobile Technologies, LLC <br>2= VeriFone Inc. |
| **tpep_pickup_datetime** | The date and time when the meter was engaged. |
| **tpep_dropoff_datetime** | The date and time when the meter was disengaged. |
| **passenger_count** | The number of passengers in the vehicle. This is a driver-entered value. |
| **trip_distance** | The elapsed trip distance in miles reported by the taximeter. |
| **PULocationID** | TLC Taxi Zone in which the taximeter was engaged. |
| **DOLocationID** | TLC Taxi Zone in which the taximeter was disengaged. |
| **RatecodeID** | The final rate code in effect at the end of the trip. <br>1= Standard rate <br>2= JFK <br>3= Newark <br>4= Nassau or Westchester <br>5= Negotiated fare <br>6= Group ride |
| **store_and_fwd_flag** | This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. <br>Y= store and forward trip <br>N= not a store and forward trip |
| **payment_type** | A numeric code signifying how the passenger paid for the trip. <br>1= Credit card <br>2= Cash <br>3= No charge <br>4= Dispute <br>5= Unknown <br>6= Voided trip |
| **fare_amount** | The time-and-distance fare calculated by the meter. |
| **extra** | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges. |
| **mta_tax** | $0.50 MTA tax that is automatically triggered based on the metered rate in use. |
| **improvement_surcharge** | $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015. |
| **tip_amount** | The tip amount. This field is automatically populated for credit card tips; cash tips are not included. |
| **tolls_amount** | Total amount of all tolls paid during the trip. |
| **total_amount** | The total amount charged to passengers. Does not include cash tips. |
| **congestion_surcharge** | Total amount collected during the trip for the NYS congestion surcharge. |
| **airport_fee** | $1.25 for pickups only at LaGuardia and John F. Kennedy Airports. |
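The coded fields in this dictionary can be made easier to read by mapping them to their labels. The following is a minimal sketch on a toy frame (payment and rate codes copied from the five preview rows); the dictionaries mirror the table above, while the helper name `add_labels` and the new columns `payment_label` and `rate_label` are hypothetical.

```python
import pandas as pd

# Code-to-label mappings taken from the data dictionary above.
PAYMENT_TYPES = {
    1: "Credit card", 2: "Cash", 3: "No charge",
    4: "Dispute", 5: "Unknown", 6: "Voided trip",
}
RATE_CODES = {
    1: "Standard rate", 2: "JFK", 3: "Newark",
    4: "Nassau or Westchester", 5: "Negotiated fare", 6: "Group ride",
}

# Toy stand-in for the trip records, using the codes from the preview rows.
sample = pd.DataFrame({
    "payment_type": [1, 3, 1, 1, 3],
    "RatecodeID": [1, 1, 1, 1, 1],
})


def add_labels(frame):
    """Return a copy with human-readable label columns (hypothetical names)."""
    out = frame.copy()
    # Codes missing from the mappings become NaN rather than raising an error.
    out["payment_label"] = out["payment_type"].map(PAYMENT_TYPES)
    out["rate_label"] = out["RatecodeID"].map(RATE_CODES)
    return out


labelled = add_labels(sample)
print(labelled)
```

Because unmapped codes turn into NaN, counting missing values in the label columns is a quick way to spot codes that fall outside the documented set.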
