# Taxi Data Analysis

## Table of Contents
* [Business Problem Statement](#business-problem-statement)
* [Task](#task)
* [Dataset Information](#dataset-information)
* [Question 1](#question-1)
* [Question 2](#question-2)
* [Question 3](#question-3)
* [Question 4](#question-4)
* [Question 5](#question-5)
* [Question 6](#question-6)


# Business Problem Statement

Our business is centered around the operations of a taxi service, particularly on a busy Saturday. We aim to optimize our bonus payout system and enhance driver performance based on the provided dataset (`dataset_2.csv`). The key objectives are to:

1. **Bonus Payout Optimization:**
   - Determine the total bonus payout under two different options (Option 1 and Option 2) to assess their impact on overall driver incentives.

2. **Performance Comparison:**
   - Identify the number of drivers qualifying for a bonus under Option 1 but not under Option 2, shedding light on the effectiveness of each bonus structure.

3. **Driver Performance Metrics:**
   - Analyze the operational performance of drivers by calculating the percentages of those who completed fewer than 10 trips, maintained an acceptance rate below 90%, and held a rating of 4.7 or higher.

4. **Strategic Decision-Making:**
   - Provide actionable insights to the management for strategic decision-making based on the analyzed data, aiming to enhance overall efficiency and profitability.

Questions 1-4 rely solely on the data in `dataset_2.csv`. Questions 5 and 6 are scenario-based and do not use the dataset. The goal is to derive insights that inform business decisions, improve driver satisfaction, and optimize the bonus payout structure for sustainable growth.

![Image](http://taxiwala.co/theme/images/taxi-cab-service.png)

# Task
**Based on the operational scenarios and the provided dataset, respond to the following inquiries:**

1. What is the total bonus payout under Option 1?
2. What is the total bonus payout under Option 2?
3. How many drivers qualify for a bonus under Option 1 but not under Option 2?
4. Determine the percentages of online drivers who completed fewer than 10 trips, maintained an acceptance rate below 90%, and held a rating of 4.7 or higher.
5. Calculate the annual earnings (after expenses) for a taxi driver without partnering with Uber.
6. If you are persuading the aforementioned driver to invest in a Town Car and collaborate with Uber, considering the new car costs $40,000, calculate the weekly increase needed in gross fares to fully cover the car expenses in the first year while maintaining the same yearly profit margin as before.

# Dataset Information
1. The dataset, accessible in the file `dataset_2.csv`, compiles information from rides conducted on a bustling Saturday and is aggregated on a per-driver basis. The dataset includes details such as the total number of completed trips, the driver's acceptance rate, the total hours on duty, and the average rating.

2. Questions 1-4 should be addressed exclusively using this dataset. Unfortunately, no dataset is provided for questions 5 and 6, as these are dependent on the provided scenario alone.

This Python code utilizes the Pandas library to read a CSV file (`dataset_2.csv`) into a DataFrame and then displays the first 10 rows of the DataFrame. The file path is specified as './datasets/dataset_2.csv'.


In [9]:
import pandas as pd
import numpy as np

# Read the CSV file into a Pandas DataFrame
file_path = './datasets/dataset_2.csv'
data_frame = pd.read_csv(file_path)

# Display the first 10 rows of the DataFrame
data_frame.head(10)



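|   | Name | Trips Completed | Accept Rate | Supply Hours | Rating |
|---|------|-----------------|-------------|--------------|--------|
| 0 | Abdul | 1 | 100% | 3 | 4.8 |
| 1 | Abraham | 12 | 83% | 5 | 4.7 |
| 2 | Adelina | 1 | 100% | 2 | 4.7 |
| 3 | Akilah | 1 | 100% | 2 | 4.9 |
| 4 | Alec | 21 | 76% | 11 | 5.0 |
| 5 | Alesha | 7 | 100% | 4 | 4.8 |
| 6 | Alvaro | 17 | 88% | 11 | 4.6 |
| 7 | Andra | 16 | 94% | 11 | 4.6 |
| 8 | Augusta | 19 | 84% | 11 | 4.7 |
| 9 | Aurora | 10 | 90% | 4 | 4.6 |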


The code uses the Pandas `info()` method to display detailed information about the DataFrame, including data types, non-null counts, and memory usage.


In [10]:
# Display column dtypes, non-null counts, and memory usage.
# info() prints its report directly and returns None, so there is nothing to assign.
data_frame.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             119 non-null    object 
 1   Trips Completed  119 non-null    int64  
 2   Accept Rate      119 non-null    object 
 3   Supply Hours     119 non-null    int64  
 4   Rating           119 non-null    float64
dtypes: float64(1), int64(2), object(2)
memory usage: 4.8+ KB


# Question 1

### What is the total bonus payout under Option 1?

Option 1 pays a flat $50 bonus to each driver who meets all of the following criteria:
- Online for at least 8 hours
- Accepts at least 90% of requests
- Completes at least 10 trips
- Has a rating of 4.7 or better

Steps to calculate the total bonus payout with Option 1:

1. Identify the subset of drivers in the dataset that meet all the criteria for Option 1.
2. Count the number of drivers in this subset.
3. Multiply the count by the bonus amount.

Mathematically, if "N" is the number of drivers meeting Option 1 criteria, the total bonus payout (Payout_Option1) can be calculated as:

\[ Payout_{Option1} = N \times \$50 \]



This code snippet transforms the 'Accept Rate' column values from strings to floats by removing the percentage symbol and applying a conversion function. The updated DataFrame is then displayed.


In [11]:
# Convert 'Accept Rate' column values from string to float for later conditions
data_frame['Accept Rate'] = data_frame['Accept Rate'].apply(lambda x: float(x[:-1]))

# Display the updated DataFrame
data_frame


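|   | Name | Trips Completed | Accept Rate | Supply Hours | Rating |
|---|------|-----------------|-------------|--------------|--------|
| 0 | Abdul | 1 | 100.0 | 3 | 4.8 |
| 1 | Abraham | 12 | 83.0 | 5 | 4.7 |
| 2 | Adelina | 1 | 100.0 | 2 | 4.7 |
| 3 | Akilah | 1 | 100.0 | 2 | 4.9 |
| 4 | Alec | 21 | 76.0 | 11 | 5.0 |
| ... | ... | ... | ... | ... | ... |
| 114 | Virgen | 4 | 100.0 | 6 | 4.5 |
| 115 | Yang | 7 | 71.0 | 2 | 4.5 |
| 116 | Yessenia | 8 | 88.0 | 5 | 5.0 |
| 117 | Yukiko | 9 | 78.0 | 6 | 4.5 |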
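The same conversion can also be written with pandas' vectorized string methods rather than a Python-level `apply`. The sketch below is one possible alternative, not the method used in this analysis; the `sample` frame and its values are made up for illustration and are not rows from `dataset_2.csv`.

```python
import pandas as pd

# Illustrative data only; not rows from dataset_2.csv
sample = pd.DataFrame({'Accept Rate': ['100%', '83%', '76%']})

# Strip the trailing percent sign, then cast the whole column to float
sample['Accept Rate'] = sample['Accept Rate'].str.rstrip('%').astype(float)

print(sample['Accept Rate'].tolist())  # prints [100.0, 83.0, 76.0]
```

Unlike slicing with `x[:-1]`, `str.rstrip('%')` leaves a value unchanged when it has no trailing percent sign, which may be safer if the column is not uniformly formatted.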


The code filters the DataFrame based on specified conditions, including minimum supply hours, trips completed, acceptance rate, and rating. The resulting filtered DataFrame is displayed.


In [12]:
# Filter the DataFrame based on given conditions
filtered_data_frame = data_frame[
    (data_frame['Supply Hours'] >= 8) &
    (data_frame['Trips Completed'] >= 10) &
    (data_frame['Accept Rate'] >= 90) &
    (data_frame['Rating'] >= 4.7)
]

# Display the filtered DataFrame
filtered_data_frame


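|   | Name | Trips Completed | Accept Rate | Supply Hours | Rating |
|---|------|-----------------|-------------|--------------|--------|
| 11 | Byron | 15 | 100.0 | 10 | 4.9 |
| 12 | Carlota | 14 | 100.0 | 8 | 5.0 |
| 19 | Dannette | 14 | 100.0 | 9 | 4.9 |
| 23 | Demetrius | 14 | 100.0 | 9 | 5.0 |
| 26 | Dimple | 15 | 100.0 | 10 | 4.9 |
| 32 | Emil | 12 | 100.0 | 9 | 5.0 |
| 37 | Garth | 15 | 100.0 | 10 | 5.0 |
| 40 | Hanh | 14 | 94.0 | 9 | 4.9 |
| 53 | Keshia | 20 | 100.0 | 11 | 4.8 |
| 57 | Latonia | 13 | 100.0 | 9 | 5.0 |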


This code calculates the total payout for the first option by multiplying a fixed rate (50) with the number of rows in the filtered DataFrame. The result is displayed as the total payout in dollars.


In [13]:
# Calculate the total payout for the first option based on a fixed rate and the length of the filtered DataFrame
fixed_rate = 50
first_option_total_payout = fixed_rate * len(filtered_data_frame)

# Display the total payout in dollars
total_payout_string = '$' + str(first_option_total_payout)
print(total_payout_string)


$1050
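As a check on the printed total, dividing it by the $50 bonus recovers the number of qualifying drivers. The filtered listing above shows only ten of them, so this step makes the full count explicit:

\[ N = \frac{\$1050}{\$50} = 21, \qquad Payout_{Option1} = 21 \times \$50 = \$1050 \]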


# Question 2

###  What is the total bonus payout under Option 2?


Under Option 2, as implemented in the code below, a driver qualifies by completing at least 12 trips with a rating of 4.7 or better, and earns $4 for each completed trip. The code first filters the DataFrame on these two conditions and displays the result.


In [15]:
# Filter the DataFrame for the second option based on given conditions
second_option_filtered_df = data_frame[
    (data_frame['Trips Completed'] >= 12) &
    (data_frame['Rating'] >= 4.7)
]

# Display the filtered DataFrame for the second option
(second_option_filtered_df)


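|   | Name | Trips Completed | Accept Rate | Supply Hours | Rating |
|---|------|-----------------|-------------|--------------|--------|
| 1 | Abraham | 12 | 83.0 | 5 | 4.7 |
| 4 | Alec | 21 | 76.0 | 11 | 5.0 |
| 8 | Augusta | 19 | 84.0 | 11 | 4.7 |
| 10 | Buffy | 13 | 54.0 | 6 | 5.0 |
| 11 | Byron | 15 | 100.0 | 10 | 4.9 |
| 12 | Carlota | 14 | 100.0 | 8 | 5.0 |
| 15 | Chu | 14 | 71.0 | 7 | 4.8 |
| 19 | Dannette | 14 | 100.0 | 9 | 4.9 |
| 21 | Deane | 22 | 77.0 | 9 | 4.7 |
| 23 | Demetrius | 14 | 100.0 | 9 | 5.0 |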


This code calculates the total payout for the second option by multiplying a fixed rate (4) with the sum of 'Trips Completed' in the filtered DataFrame. The result is displayed as the total payout for the second option in dollars.


In [16]:
# Calculate the total payout for the second option based on a fixed rate and the sum of 'Trips Completed'
fixed_rate_second_option = 4
second_option_total_payout = fixed_rate_second_option * second_option_filtered_df['Trips Completed'].sum()

# Display the total payout for the second option in dollars
total_payout_second_option_string = '$' + str(second_option_total_payout)
print(total_payout_second_option_string)


$2976
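The printed total also implies how many trips the qualifying drivers completed between them. Writing \(T\) for that trip count (a symbol introduced here for illustration):

\[ T = \frac{\$2976}{\$4} = 744, \qquad Payout_{Option2} = 744 \times \$4 = \$2976 \]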


# Question 3
###  How many drivers would qualify for a bonus under Option 1 but not under Option 2?

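The code left-merges the Option 1 DataFrame with the Option 2 DataFrame on all shared columns and adds a '_merge' column that records each row's source. Because this is a left merge, every Option 1 driver is kept and the indicator is either 'both' (also qualifies under Option 2) or 'left_only' (qualifies under Option 1 only); 'right_only' cannot occur. The resulting merged DataFrame is displayed.


In [19]:
# Left-merge the Option 1 drivers with the Option 2 drivers; indicator=True adds a '_merge' column showing each row's source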
merged_data_frame = filtered_data_frame.merge(
    second_option_filtered_df,
    on=filtered_data_frame.columns.tolist(),
    how='left',
    indicator=True
)

# Display the merged DataFrame
(merged_data_frame)


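|   | Name | Trips Completed | Accept Rate | Supply Hours | Rating | _merge |
|---|------|-----------------|-------------|--------------|--------|--------|
| 0 | Byron | 15 | 100.0 | 10 | 4.9 | both |
| 1 | Carlota | 14 | 100.0 | 8 | 5.0 | both |
| 2 | Dannette | 14 | 100.0 | 9 | 4.9 | both |
| 3 | Demetrius | 14 | 100.0 | 9 | 5.0 | both |
| 4 | Dimple | 15 | 100.0 | 10 | 4.9 | both |
| 5 | Emil | 12 | 100.0 | 9 | 5.0 | both |
| 6 | Garth | 15 | 100.0 | 10 | 5.0 | both |
| 7 | Hanh | 14 | 94.0 | 9 | 4.9 | both |
| 8 | Keshia | 20 | 100.0 | 11 | 4.8 | both |
| 9 | Latonia | 13 | 100.0 | 9 | 5.0 | both |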


This code extracts the rows present in Option 1 but not in Option 2 from the merged DataFrame, using the '_merge' column. The resulting DataFrame contains the drivers who qualify only under Option 1, so its row count answers the question: the output lists two drivers.


In [21]:
# Include only the rows that are present in option 1 and not in option 2 based on the '_merge' column
option1_only_data_frame = merged_data_frame[merged_data_frame["_merge"] == 'left_only']

# Display the DataFrame with only option 1 entries
(option1_only_data_frame)


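|   | Name | Trips Completed | Accept Rate | Supply Hours | Rating | _merge |
|---|------|-----------------|-------------|--------------|--------|--------|
| 16 | Oren | 11 | 91.0 | 9 | 4.8 | left_only |
| 17 | Phyllis | 10 | 90.0 | 8 | 4.8 | left_only |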
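The same "Option 1 but not Option 2" set can also be obtained without a merge, by combining boolean masks on a single DataFrame. The sketch below is an alternative approach under stated assumptions: the `drivers` frame and its four rows are made up for illustration, and the thresholds are copied from the filters used earlier in this analysis.

```python
import pandas as pd

# Illustrative data only; not rows from dataset_2.csv
drivers = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Trips Completed': [15, 11, 10, 20],
    'Accept Rate': [100.0, 91.0, 90.0, 80.0],
    'Supply Hours': [10, 9, 8, 11],
    'Rating': [4.9, 4.8, 4.8, 4.7],
})

# Option 1: all four conditions must hold
option1 = (
    (drivers['Supply Hours'] >= 8)
    & (drivers['Trips Completed'] >= 10)
    & (drivers['Accept Rate'] >= 90)
    & (drivers['Rating'] >= 4.7)
)

# Option 2: trip count and rating only
option2 = (drivers['Trips Completed'] >= 12) & (drivers['Rating'] >= 4.7)

# Keep drivers who satisfy Option 1 and do not satisfy Option 2
option1_only = drivers[option1 & ~option2]

print(len(option1_only))               # prints 2
print(option1_only['Name'].tolist())   # prints ['B', 'C']
```

Working with masks avoids joining on float columns and keeps the original row index, at the cost of restating both sets of conditions in one place.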


# Question 4
### What percentages of drivers online completed fewer than 10 trips, had an acceptance rate of less than 90%, and had a rating of 4.7 or higher?

The code filters the DataFrame for drivers with fewer than 10 trips completed, an acceptance rate below 90%, and a rating of 4.7 or higher. The resulting DataFrame contains only the drivers who meet all three conditions at once.


In [22]:
# Filter the DataFrame based on given conditions for less trips
less_trips_filtered_df = data_frame[
    (data_frame['Trips Completed'] < 10) &
    (data_frame['Accept Rate'] < 90) &
    (data_frame['Rating'] >= 4.7)
]

# Display the filtered DataFrame for less trips
(less_trips_filtered_df)


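|   | Name | Trips Completed | Accept Rate | Supply Hours | Rating |
|---|------|-----------------|-------------|--------------|--------|
| 17 | Cris | 7 | 71.0 | 5 | 5.0 |
| 18 | Dalila | 7 | 57.0 | 2 | 4.8 |
| 20 | Dannielle | 3 | 67.0 | 5 | 5.0 |
| 22 | Delfina | 4 | 50.0 | 3 | 4.7 |
| 27 | Domenica | 9 | 89.0 | 5 | 4.9 |
| 36 | Floyd | 3 | 67.0 | 1 | 4.8 |
| 41 | Hee | 9 | 89.0 | 7 | 4.7 |
| 45 | Ingrid | 7 | 43.0 | 4 | 4.8 |
| 66 | Lilla | 9 | 89.0 | 8 | 4.7 |
| 67 | Loree | 9 | 89.0 | 8 | 4.7 |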


This code calculates and prints the share of all drivers in the dataset who meet all three conditions (fewer than 10 trips, acceptance rate below 90%, rating of 4.7 or higher). The result is printed to two decimal places.


In [26]:
# Calculate and print the percentage of drivers with fewer trips based on the given condition
percentage_less_trips = len(less_trips_filtered_df) / len(data_frame) * 100
print(f'{percentage_less_trips:.2f}% of drivers have fewer trips based on the given condition.')


10.92% of drivers have fewer trips based on the given condition.
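The question asks for "percentages" in the plural, so it can also be read as asking for each condition separately rather than all three combined. The sketch below shows one way to report both readings; the `drivers` frame and its values are made up for illustration and are not rows from `dataset_2.csv`.

```python
import pandas as pd

# Illustrative data only; not rows from dataset_2.csv
drivers = pd.DataFrame({
    'Trips Completed': [7, 12, 9, 3],
    'Accept Rate': [71.0, 83.0, 95.0, 67.0],
    'Rating': [5.0, 4.7, 4.6, 4.8],
})

few_trips = drivers['Trips Completed'] < 10
low_accept = drivers['Accept Rate'] < 90
high_rating = drivers['Rating'] >= 4.7

# The mean of a boolean Series is the fraction of True values
pct_few_trips = few_trips.mean() * 100
pct_low_accept = low_accept.mean() * 100
pct_high_rating = high_rating.mean() * 100
pct_all_three = (few_trips & low_accept & high_rating).mean() * 100

print(f'Fewer than 10 trips: {pct_few_trips:.2f}%')
print(f'Accept rate below 90%: {pct_low_accept:.2f}%')
print(f'Rating of 4.7 or higher: {pct_high_rating:.2f}%')
print(f'All three conditions: {pct_all_three:.2f}%')
```

Which reading is intended depends on the original task wording; the combined figure corresponds to the calculation used in this section.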


# Question 5

### How much money (after expenses) does the taxi driver make per year without partnering with Uber?

Assuming an earnings_per_trip of $15, a total of 1000 trips per year, and annual expenses of $8000, the code calculates and displays the total income per year and the net income after deducting expenses.


In [24]:
# Assuming the driver's earnings per trip and other relevant details
earnings_per_trip = 15  # Replace this with the actual earnings per trip in dollars
total_trips_per_year = 1000  # Replace this with the actual number of trips per year
annual_expenses = 8000  # Replace this with the actual annual expenses in dollars

# Calculate the total income and net income per year
total_income = earnings_per_trip * total_trips_per_year
net_income = total_income - annual_expenses

# Display the results
print(f'Total Income per Year: ${total_income}')
print(f'Net Income per Year (After Expenses): ${net_income}')


Total Income per Year: $15000
Net Income per Year (After Expenses): $7000


# Question 6
###  You are convincing the same driver above to buy a Town Car and partner with Uber. Assuming the new car is $40,000, how much would the driver's gross fares need to increase per week to fully pay for the car in year 1 and maintain the same yearly profit margin as before?



Assuming an earnings_per_trip of $15, 1000 trips per year, and annual expenses of $8000, along with additional expenses for a new car ($40,000 cost and $5000 per year), the code calculates the additional gross fares needed per week to cover the new car expenses and maintain the same profit margin.


In [25]:
# Assuming current financial details
earnings_per_trip = 15  # Replace this with the actual earnings per trip in dollars
total_trips_per_year = 1000  # Replace this with the actual number of trips per year
annual_expenses = 8000  # Replace this with the actual annual expenses in dollars

# Additional expenses for the new car
new_car_cost = 40000  # Replace this with the actual cost of the new car in dollars
additional_expenses_per_year = 5000  # Replace this with the estimated additional expenses in dollars

# Calculate the total income and net income per year before the new car
total_income_before_new_car = earnings_per_trip * total_trips_per_year
net_income_before_new_car = total_income_before_new_car - annual_expenses

# Calculate the gross income needed with the new car. "Same profit margin" is interpreted here
# as the same yearly net income, so gross income must cover the existing expenses, the
# additional yearly expenses, and the full car cost, and still leave the previous net income.
total_income_with_new_car = (
    net_income_before_new_car
    + annual_expenses
    + additional_expenses_per_year
    + new_car_cost
)

# Calculate the additional gross fares needed per week (52 weeks per year)
additional_gross_fares_per_week = (total_income_with_new_car - total_income_before_new_car) / 52

# Display the results
print(f'Additional Gross Fares Needed per Week: ${additional_gross_fares_per_week:.2f}')


Additional Gross Fares Needed per Week: $865.38
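The phrase "same yearly profit margin" admits two readings: the same net income in dollars, or the same ratio of net income to gross fares. The sketch below wraps both in one function so they can be compared. The function name `weekly_fare_increase` and its `keep` parameter are hypothetical, introduced here for illustration, and the inputs are the assumed figures used in this section ($15,000 gross, $8,000 expenses, $5,000 additional expenses, $40,000 car).

```python
def weekly_fare_increase(gross, expenses, extra_expenses, car_cost,
                         keep='net_income', weeks=52):
    """Weekly rise in gross fares needed to pay off the car within one year.

    keep='net_income'   keeps the same yearly net income in dollars.
    keep='margin_ratio' keeps the same net-to-gross ratio.
    """
    net = gross - expenses
    new_costs = expenses + extra_expenses + car_cost

    if keep == 'net_income':
        # Gross must cover all costs plus the previous net income
        required_gross = net + new_costs
    elif keep == 'margin_ratio':
        # Solve (required_gross - new_costs) / required_gross == net / gross
        margin = net / gross
        required_gross = new_costs / (1 - margin)
    else:
        raise ValueError("keep must be 'net_income' or 'margin_ratio'")

    return (required_gross - gross) / weeks


same_net = weekly_fare_increase(15000, 8000, 5000, 40000)
same_ratio = weekly_fare_increase(15000, 8000, 5000, 40000, keep='margin_ratio')

print(f'Same net income:   ${same_net:.2f} per week')    # $865.38
print(f'Same margin ratio: ${same_ratio:.2f} per week')  # $1622.60
```

Under either reading the required increase is positive, since the driver must earn more to absorb the car cost; a negative figure would indicate that the car cost or the existing expenses were left out of the calculation.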
