# Hopper Ride Sharing Company Queries (Part 2)

Driver engagement is central to how a ride-hailing platform operates. How many of the drivers who could be working actually are? Are they accepting ride requests? A platform that wants to balance supply with demand, and to retain both drivers and customers, needs answers to these questions.

In this post, we measure the percentage of working drivers for each month of a given year. A working driver is one who accepted at least one ride during the month, while an available driver is one who had joined the platform by the end of that month.

Through this analytical lens, we aim to explore:
- How to define and measure driver engagement over time.
- The importance of balancing the supply of active drivers with user demand.
- The challenges of handling edge cases, such as months with no available drivers.

By analyzing driver engagement on a month-by-month basis, we uncover actionable insights that could influence resource allocation, performance optimization, and even policy decisions.

Let's work through the problem step by step with pandas, building the monthly driver counts first and the working percentage last.

### Table: Drivers
| Column Name | Type    |
|-------------|---------|
| driver_id   | int     |
| join_date   | date    |

- `driver_id` is the column with unique values for this table.
- Each row of this table contains the driver's ID and the date they joined the Hopper company.

---

### Table: Rides
| Column Name  | Type    |
|--------------|---------|
| ride_id      | int     |
| user_id      | int     |
| requested_at | date    |

- `ride_id` is the column with unique values for this table.
- Each row of this table contains the ID of a ride, the user's ID that requested it, and the day they requested it.
- There may be some ride requests in this table that were not accepted.

---

### Table: AcceptedRides
| Column Name   | Type    |
|---------------|---------|
| ride_id       | int     |
| driver_id     | int     |
| ride_distance | int     |
| ride_duration | int     |

- `ride_id` is the column with unique values for this table.
- Each row of this table contains some information about an accepted ride.
- It is guaranteed that each accepted ride exists in the `Rides` table.

---

### Task
Write a solution to report the percentage of working drivers (`working_percentage`) for each month of 2020 where:

- **Working drivers** are those who accepted at least one ride during the month.
- **Available drivers** are those who joined on or before the last day of the month. The explanation and the code below call these "active drivers" (`active_drivers`).

- If the number of available drivers during a month is zero, the `working_percentage` is 0.

- Return the result table ordered by month in ascending order, where `month` is the month's number (January is 1, February is 2, etc.).
- Round `working_percentage` to 2 decimal places.

---

### Example

#### Input

**Drivers Table**

| driver_id | join_date  |
|-----------|------------|
| 10        | 2019-12-10 |
| 8         | 2020-1-13  |
| 5         | 2020-2-16  |
| 7         | 2020-3-8   |
| 4         | 2020-5-17  |
| 1         | 2020-10-24 |
| 6         | 2021-1-5   |

**Rides Table**

| ride_id | user_id | requested_at |
|---------|---------|--------------|
| 6       | 75      | 2019-12-9    |
| 1       | 54      | 2020-2-9     |
| 10      | 63      | 2020-3-4     |
| 19      | 39      | 2020-4-6     |
| 3       | 41      | 2020-6-3     |
| 13      | 52      | 2020-6-22    |
| 7       | 69      | 2020-7-16    |
| 17      | 70      | 2020-8-25    |
| 20      | 81      | 2020-11-2    |
| 5       | 57      | 2020-11-9    |
| 2       | 42      | 2020-12-9    |
| 11      | 68      | 2021-1-11    |
| 15      | 32      | 2021-1-17    |
| 12      | 11      | 2021-1-19    |
| 14      | 18      | 2021-1-27    |

**AcceptedRides Table**

| ride_id | driver_id | ride_distance | ride_duration |
|---------|-----------|---------------|---------------|
| 10      | 10        | 63            | 38            |
| 13      | 10        | 73            | 96            |
| 7       | 8         | 100           | 28            |
| 17      | 7         | 119           | 68            |
| 20      | 1         | 121           | 92            |
| 5       | 7         | 42            | 101           |
| 2       | 4         | 6             | 38            |
| 11      | 8         | 37            | 43            |
| 15      | 8         | 108           | 82            |
| 12      | 8         | 38            | 34            |
| 14      | 1         | 90            | 74            |

---

#### Output

**Result Table**

| month | working_percentage |
|-------|--------------------|
| 1     | 0.00               |
| 2     | 0.00               |
| 3     | 25.00              |
| 4     | 0.00               |
| 5     | 0.00               |
| 6     | 20.00              |
| 7     | 20.00              |
| 8     | 20.00              |
| 9     | 0.00               |
| 10    | 0.00               |
| 11    | 33.33              |
| 12    | 16.67              |

---

#### Explanation

1. **January**: Two active drivers (10, 8), no accepted rides → Percentage = 0%.
2. **February**: Three active drivers (10, 8, 5), no accepted rides → Percentage = 0%.
3. **March**: Four active drivers (10, 8, 5, 7), one accepted ride by driver 10 → Percentage = (1 / 4) * 100 = 25%.
4. **April**: Four active drivers (10, 8, 5, 7), no accepted rides → Percentage = 0%.
5. **May**: Five active drivers (10, 8, 5, 7, 4), no accepted rides → Percentage = 0%.
6. **June**: Five active drivers (10, 8, 5, 7, 4), one accepted ride by driver 10 → Percentage = (1 / 5) * 100 = 20%.
7. **July**: Five active drivers (10, 8, 5, 7, 4), one accepted ride by driver 8 → Percentage = (1 / 5) * 100 = 20%.
8. **August**: Five active drivers (10, 8, 5, 7, 4), one accepted ride by driver 7 → Percentage = (1 / 5) * 100 = 20%.
9. **September**: Five active drivers (10, 8, 5, 7, 4), no accepted rides → Percentage = 0%.
10. **October**: Six active drivers (10, 8, 5, 7, 4, 1), no accepted rides → Percentage = 0%.
11. **November**: Six active drivers (10, 8, 5, 7, 4, 1), two accepted rides by drivers 1, 7 → Percentage = (2 / 6) * 100 = 33.33%.
12. **December**: Six active drivers (10, 8, 5, 7, 4, 1), one accepted ride by driver 4 → Percentage = (1 / 6) * 100 = 16.67%.
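
The month-by-month reasoning above can be checked directly from the two definitions. The sketch below is a minimal illustration, not the solution developed in this post: `month_percentage` is a hypothetical helper, and the `accepted` frame hard-codes the 2020 accepted rides together with their request dates from the example.

```python
import pandas as pd

# Sample drivers from the example above.
drivers = pd.DataFrame({
    "driver_id": [10, 8, 5, 7, 4, 1, 6],
    "join_date": pd.to_datetime([
        "2019-12-10", "2020-01-13", "2020-02-16", "2020-03-08",
        "2020-05-17", "2020-10-24", "2021-01-05",
    ]),
})
# Accepted rides requested in 2020, paired by hand with their request dates.
accepted = pd.DataFrame({
    "driver_id": [10, 10, 8, 7, 1, 7, 4],
    "requested_at": pd.to_datetime([
        "2020-03-04", "2020-06-22", "2020-07-16", "2020-08-25",
        "2020-11-02", "2020-11-09", "2020-12-09",
    ]),
})


def month_percentage(month: int, year: int = 2020) -> float:
    """Hypothetical helper: working percentage for one month, from the definitions."""
    month_end = pd.Timestamp(year=year, month=month, day=1) + pd.offsets.MonthEnd(0)
    # Available: joined on or before the last day of the month.
    available = int((drivers["join_date"] <= month_end).sum())
    # Working: distinct drivers with at least one accepted ride in the month.
    in_month = (accepted["requested_at"].dt.year == year) & (
        accepted["requested_at"].dt.month == month
    )
    working = int(accepted.loc[in_month, "driver_id"].nunique())
    return 0.0 if available == 0 else round(working / available * 100, 2)


march = month_percentage(3)
november = month_percentage(11)
print(march, november)
```

March has four available drivers and one working driver, and November has six available and two working, matching items 3 and 11 in the list.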


In [75]:
import pandas as pd
import numpy as np

data = [[10, '2019-12-10'], 
        [8, '2020-1-13'], 
        [5, '2020-2-16'], 
        [7, '2020-3-8'], 
        [4, '2020-5-17'], 
        [1, '2020-10-24'], 
        [6, '2021-1-5']]
drivers = pd.DataFrame(
        data, 
        columns=['driver_id', 
                 'join_date']).astype({'driver_id':'Int64', 
                 'join_date':'datetime64[ns]'})
display(drivers)


Unnamed: 0,driver_id,join_date
0,10,2019-12-10
1,8,2020-01-13
2,5,2020-02-16
3,7,2020-03-08
4,4,2020-05-17
5,1,2020-10-24
6,6,2021-01-05


In [76]:
data = [[6, 75, '2019-12-9'],
        [1, 54, '2020-2-9'], 
        [10, 63, '2020-3-4'], 
        [19, 39, '2020-4-6'], 
        [3, 41, '2020-6-3'], 
        [13, 52, '2020-6-22'], 
        [7, 69, '2020-7-16'], 
        [17, 70, '2020-8-25'], 
        [20, 81, '2020-11-2'], 
        [5, 57, '2020-11-9'], 
        [2, 42, '2020-12-9'], 
        [11, 68, '2021-1-11'], 
        [15, 32, '2021-1-17'], 
        [12, 11, '2021-1-19'], 
        [14, 18, '2021-1-27']]
rides = pd.DataFrame(
        data, 
        columns=['ride_id', 
                 'user_id', 
                 'requested_at']).astype({'ride_id':'Int64', 
                 'user_id':'Int64', 
                 'requested_at':'datetime64[ns]'})
display(rides)

Unnamed: 0,ride_id,user_id,requested_at
0,6,75,2019-12-09
1,1,54,2020-02-09
2,10,63,2020-03-04
3,19,39,2020-04-06
4,3,41,2020-06-03
5,13,52,2020-06-22
6,7,69,2020-07-16
7,17,70,2020-08-25
8,20,81,2020-11-02
9,5,57,2020-11-09


In [77]:
data = [[10, 10, 63, 38], 
        [13, 10, 73, 96], 
        [7, 8, 100, 28], 
        [17, 7, 119, 68], 
        [20, 1, 121, 92], 
        [5, 7, 42, 101], 
        [2, 4, 6, 38], 
        [11, 8, 37, 43], 
        [15, 8, 108, 82], 
        [12, 8, 38, 34], 
        [14, 1, 90, 74]]
accepted_rides = pd.DataFrame(
        data, 
        columns=['ride_id', 
                 'driver_id', 
                 'ride_distance', 
                 'ride_duration']).astype({'ride_id':'Int64', 
                 'driver_id':'Int64', 
                 'ride_distance':'Int64', 
                 'ride_duration':'Int64'})
display(accepted_rides)

Unnamed: 0,ride_id,driver_id,ride_distance,ride_duration
0,10,10,63,38
1,13,10,73,96
2,7,8,100,28
3,17,7,119,68
4,20,1,121,92
5,5,7,42,101
6,2,4,6,38
7,11,8,37,43
8,15,8,108,82
9,12,8,38,34


**Step 1. Create a DataFrame for Months**
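- Creates a DataFrame `months` with a single column `month`, containing the numbers 1 to 12, one row per month of the year. The later left joins start from this frame, so months with no new drivers or no accepted rides still appear in the result.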
    

In [78]:
months = pd.DataFrame({'month': range(1, 13)})
display(months.head())

Unnamed: 0,month
0,1
1,2
2,3
3,4
4,5


**Step 2. Filter Drivers by Joining Year (Up to 2020)**
- Filters the `drivers` DataFrame to include only rows where `join_date` is in the year 2020 or earlier. Drivers who joined after 2020 were never available during 2020, so they are dropped.

**Step 3. Assign Driver Month**
- Creates a new column `driver_month`:
    - If the driver joined in or before 2019, assigns 1 (January), because that driver is already available from the first month of 2020.
    - Otherwise, assigns the month of the `join_date`.

In [79]:
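# Keep drivers who joined by the end of 2020; .copy() avoids a SettingWithCopyWarning below
drivers = drivers[drivers['join_date'].dt.year <= 2020].copy()
# Drivers who joined before 2020 count from January (month 1)
drivers['driver_month'] = drivers['join_date'].apply(lambda x: 1 if x.year <= 2019 else x.month)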
display(drivers)

Unnamed: 0,driver_id,join_date,driver_month
0,10,2019-12-10,1
1,8,2020-01-13,1
2,5,2020-02-16,2
3,7,2020-03-08,3
4,4,2020-05-17,5
5,1,2020-10-24,10
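
The same `driver_month` column can be built without a row-wise `apply`. The sketch below is one possible alternative, not the approach used in this post: it rebuilds the sample `drivers` data so it runs on its own, and the names `eligible` and `joined_before_2020` are illustrative.

```python
import pandas as pd

# Sample drivers from the example above.
drivers = pd.DataFrame({
    "driver_id": [10, 8, 5, 7, 4, 1, 6],
    "join_date": pd.to_datetime([
        "2019-12-10", "2020-01-13", "2020-02-16", "2020-03-08",
        "2020-05-17", "2020-10-24", "2021-01-05",
    ]),
})

# Keep drivers who joined by the end of 2020; .copy() gives an independent frame.
eligible = drivers[drivers["join_date"].dt.year <= 2020].copy()

# Take the join month, then overwrite it with 1 for anyone who joined before 2020.
joined_before_2020 = eligible["join_date"].dt.year < 2020
eligible["driver_month"] = eligible["join_date"].dt.month.mask(joined_before_2020, 1)

driver_months = eligible["driver_month"].tolist()
print(driver_months)
```

`Series.mask` replaces values where the condition is true, so it plays the role of the `1 if x.year <= 2019 else x.month` branch in the lambda.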


**Step 4. Count Active Drivers by Month**
- Groups the `drivers` DataFrame by `driver_month`.
- Counts the number of `driver_id` entries in each group. At this stage the count is the number of drivers who *became* available in that month, not yet the running total.
- Resets the index and names the resulting column `active_drivers`.

In [80]:
active_drivers=drivers.groupby('driver_month')[
    'driver_id'].count().reset_index(name='active_drivers')
display(active_drivers)

Unnamed: 0,driver_month,active_drivers
0,1,2
1,2,1
2,3,1
3,5,1
4,10,1


**Step 5. Merge with Months and Fill Missing Values**
- Merges the `months` DataFrame with the `active_drivers` DataFrame: matches `month` in `months` with `driver_month` in `active_drivers`, using a left join so that every month is kept even if no driver joined in it.
- Fills missing values (NaN) with 0 for months in which no driver joined. The fill applies to every column, which is why `driver_month` shows 0.0 for those months.


In [81]:
df = months.merge(active_drivers,
                  left_on='month',
                  right_on='driver_month',
                  how='left').fillna(0)
display(df)

Unnamed: 0,month,driver_month,active_drivers
0,1,1.0,2
1,2,2.0,1
2,3,3.0,1
3,4,0.0,0
4,5,5.0,1
5,6,0.0,0
6,7,0.0,0
7,8,0.0,0
8,9,0.0,0
9,10,10.0,1


**Step 6. Calculate Cumulative Active Drivers**
- Computes the cumulative sum of `active_drivers` across months. This turns the per-month count of new joiners into the running total of drivers available by the end of each month.


In [82]:
df['active_drivers']=df['active_drivers'].cumsum()
display(df)

Unnamed: 0,month,driver_month,active_drivers
0,1,1.0,2
1,2,2.0,3
2,3,3.0,4
3,4,0.0,4
4,5,5.0,5
5,6,0.0,5
6,7,0.0,5
7,8,0.0,5
8,9,0.0,5
9,10,10.0,6
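
As a cross-check on Steps 4 to 6, the available-driver count can also be computed straight from the definition by comparing each `join_date` with the last day of each month. This is a minimal sketch under the sample data, not the method used above; `month_ends` and `available` are illustrative names.

```python
import pandas as pd

# Join dates from the sample Drivers table.
join_dates = pd.to_datetime([
    "2019-12-10", "2020-01-13", "2020-02-16", "2020-03-08",
    "2020-05-17", "2020-10-24", "2021-01-05",
])

# Last calendar day of each month in 2020 (MonthEnd(0) rolls the 1st forward to month end).
month_ends = [
    pd.Timestamp(year=2020, month=m, day=1) + pd.offsets.MonthEnd(0)
    for m in range(1, 13)
]

# A driver is available in a month if they joined on or before its last day.
available = pd.DataFrame({
    "month": range(1, 13),
    "active_drivers": [int((join_dates <= end).sum()) for end in month_ends],
})

counts = available["active_drivers"].tolist()
print(counts)
```

The groupby, merge and cumulative-sum route does one pass over the drivers, while this version does one comparison per month. With 12 months either is fine, and the direct version is easier to read against the task statement.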


**Step 7. Filter Accepted Rides for 2020**
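- Merges the `rides` DataFrame with `accepted_rides` using `ride_id` as the key. The right join keeps every accepted ride and attaches its `requested_at` date; ride requests that were never accepted are dropped.
- Filters the resulting `accept_rides` DataFrame to include only rows where `requested_at` is in the year 2020.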

In [83]:
accept_rides = rides.merge(accepted_rides,
                           how='right',
                           on='ride_id')
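# .copy() so that adding the month column in the next step does not raise SettingWithCopyWarning
accept_rides = accept_rides[accept_rides['requested_at'].dt.year == 2020].copy()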
display(accept_rides)

Unnamed: 0,ride_id,user_id,requested_at,driver_id,ride_distance,ride_duration
0,10,63,2020-03-04,10,63,38
1,13,52,2020-06-22,10,73,96
2,7,69,2020-07-16,8,100,28
3,17,70,2020-08-25,7,119,68
4,20,81,2020-11-02,1,121,92
5,5,57,2020-11-09,7,42,101
6,2,42,2020-12-09,4,6,38


**Step 8. Add Ride Request Month**
- Creates a new column month in accept_rides, extracting the month from the requested_at datetime field.

In [84]:
accept_rides['month']=accept_rides['requested_at'].dt.month
display(accept_rides)

Unnamed: 0,ride_id,user_id,requested_at,driver_id,ride_distance,ride_duration,month
0,10,63,2020-03-04,10,63,38,3
1,13,52,2020-06-22,10,73,96,6
2,7,69,2020-07-16,8,100,28,7
3,17,70,2020-08-25,7,119,68,8
4,20,81,2020-11-02,1,121,92,11
5,5,57,2020-11-09,7,42,101,11
6,2,42,2020-12-09,4,6,38,12


**Step 9. Remove Duplicate Month-Driver Combinations**
- Removes duplicate entries based on the combination of month and driver_id to ensure that each driver contributes only once per month.

In [85]:
# Keep one row per (month, driver); reassigning avoids mutating a filtered slice in place
accept_rides = accept_rides.drop_duplicates(['month', 'driver_id'])
display(accept_rides)

Unnamed: 0,ride_id,user_id,requested_at,driver_id,ride_distance,ride_duration,month
0,10,63,2020-03-04,10,63,38,3
1,13,52,2020-06-22,10,73,96,6
2,7,69,2020-07-16,8,100,28,7
3,17,70,2020-08-25,7,119,68,8
4,20,81,2020-11-02,1,121,92,11
5,5,57,2020-11-09,7,42,101,11
6,2,42,2020-12-09,4,6,38,12


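**Step 10. Count Working Drivers by Month**
- Groups the `accept_rides` DataFrame by `month`.
- Counts the rows in each group. Because Step 9 left one row per driver per month, this count is the number of distinct working drivers in that month, not the number of rides.
- Resets the index and names the resulting column `accept_rides`.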

In [86]:
accept_rides=accept_rides.groupby('month')['ride_id'].count().reset_index(name='accept_rides')
display(accept_rides)

Unnamed: 0,month,accept_rides
0,3,1
1,6,1
2,7,1
3,8,1
4,11,2
5,12,1
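
Steps 9 and 10 together amount to counting distinct drivers per month, which `nunique` does in one call. The sketch below illustrates that on the 2020 accepted rides plus one hypothetical extra row (ride 99, a second November ride for driver 7) added only to show why deduplication matters; it is not part of the sample data.

```python
import pandas as pd

# 2020 accepted rides from the example, with their request month.
accepted_2020 = pd.DataFrame({
    "ride_id": [10, 13, 7, 17, 20, 5, 2, 99],   # ride 99 is hypothetical
    "driver_id": [10, 10, 8, 7, 1, 7, 4, 7],
    "month": [3, 6, 7, 8, 11, 11, 12, 11],
})

# Distinct drivers per month: a driver with several rides in a month counts once.
working = (
    accepted_2020.groupby("month")["driver_id"]
    .nunique()
    .reset_index(name="working_drivers")
)

# For comparison: counting rides instead would overstate November.
rides_per_month = accepted_2020.groupby("month")["ride_id"].count()

working_by_month = dict(zip(working["month"].tolist(), working["working_drivers"].tolist()))
print(working_by_month)
print(int(rides_per_month.loc[11]))
```

November has three rides in this frame but only two distinct drivers (1 and 7), so the numerator for the working percentage stays at 2.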


**Step 11. Merge Accepted Rides with Main DataFrame**
- Merges the df DataFrame with the accept_rides DataFrame: Matches on the month column. Performs a left join to keep all months from df.
- Fills missing values (NaN) with 0 for months without accepted rides.

In [87]:
df= df.merge(accept_rides,on='month',how='left').fillna(0)
display(df)

Unnamed: 0,month,driver_month,active_drivers,accept_rides
0,1,1.0,2,0
1,2,2.0,3,0
2,3,3.0,4,1
3,4,0.0,4,0
4,5,5.0,5,0
5,6,0.0,5,1
6,7,0.0,5,1
7,8,0.0,5,1
8,9,0.0,5,0
9,10,10.0,6,0


**Step 12. Calculate Working Percentage**
- Computes the `working_percentage` for each month.
- If `active_drivers` is 0, assigns 0, as the task requires for months with no available drivers.
- Otherwise, divides `accept_rides` (after Step 9, the number of distinct working drivers) by `active_drivers`, multiplies by 100, and rounds to two decimal places.

In [88]:
df['working_percentage'] = np.where(df['active_drivers']==0,0, (df['accept_rides']/df['active_drivers']*100).round(2))
display(df)

Unnamed: 0,month,driver_month,active_drivers,accept_rides,working_percentage
0,1,1.0,2,0,0.0
1,2,2.0,3,0,0.0
2,3,3.0,4,1,25.0
3,4,0.0,4,0,0.0
4,5,5.0,5,0,0.0
5,6,0.0,5,1,20.0
6,7,0.0,5,1,20.0
7,8,0.0,5,1,20.0
8,9,0.0,5,0,0.0
9,10,10.0,6,0,0.0
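
The sample data never has a month with zero available drivers, so the zero branch is not exercised above. The sketch below tries it on a small hypothetical frame and uses a slightly different guard: zero denominators are replaced with NaN before dividing, and the resulting NaN is mapped back to 0. The numbers in the first row are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "month": [1, 2, 3],
    "active_drivers": [0, 4, 6],   # month 1: hypothetical month with no available drivers
    "accept_rides": [0, 1, 2],     # distinct working drivers per month
})

# Turn zero denominators into NaN so no division by zero takes place.
ratio = df["accept_rides"] / df["active_drivers"].replace(0, np.nan)

# Scale to a percentage, round, and report 0 where there were no available drivers.
df["working_percentage"] = (ratio * 100).round(2).fillna(0)

percentages = df["working_percentage"].tolist()
print(percentages)
```

`np.where` evaluates both branches before choosing between them, so the division in the original cell still runs for every row. Masking the denominator first keeps the zero case out of the arithmetic altogether.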


**Step 13. Select Final Columns**
- Selects only the month and working_percentage columns for the final DataFrame.


In [89]:
df = df[['month','working_percentage']]
display(df)

Unnamed: 0,month,working_percentage
0,1,0.0
1,2,0.0
2,3,25.0
3,4,0.0
4,5,0.0
5,6,20.0
6,7,20.0
7,8,20.0
8,9,0.0
9,10,0.0
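
For reuse, the thirteen steps can be folded into a single function. The version below is a compact sketch rather than a line-for-line copy of the cells above: it counts available drivers by comparing against month ends and counts working drivers with `nunique`. The function name `working_percentage`, its `year` parameter and the intermediate names are assumptions made for illustration.

```python
import pandas as pd


def working_percentage(drivers, rides, accepted_rides, year=2020):
    """Sketch: percentage of working drivers for each month of `year`."""
    month_numbers = list(range(1, 13))
    out = pd.DataFrame({"month": month_numbers})

    # Available drivers: joined on or before the last day of each month.
    month_ends = [
        pd.Timestamp(year=year, month=m, day=1) + pd.offsets.MonthEnd(0)
        for m in month_numbers
    ]
    out["active_drivers"] = [int((drivers["join_date"] <= end).sum()) for end in month_ends]

    # Working drivers: distinct drivers with an accepted ride requested in that month.
    accepted = accepted_rides.merge(rides, on="ride_id", how="inner")
    accepted = accepted[accepted["requested_at"].dt.year == year]
    working = (
        accepted.assign(month=accepted["requested_at"].dt.month)
        .groupby("month")["driver_id"]
        .nunique()
        .reset_index(name="working_drivers")
    )

    out = out.merge(working, on="month", how="left")
    out["working_drivers"] = out["working_drivers"].fillna(0)

    # Percentage is 0 when no driver is available; otherwise working / available * 100.
    out["working_percentage"] = 0.0
    has_drivers = out["active_drivers"] > 0
    out.loc[has_drivers, "working_percentage"] = (
        out.loc[has_drivers, "working_drivers"] / out.loc[has_drivers, "active_drivers"] * 100
    ).round(2)
    return out[["month", "working_percentage"]]


# Sample data from the example above.
drivers = pd.DataFrame({
    "driver_id": [10, 8, 5, 7, 4, 1, 6],
    "join_date": pd.to_datetime([
        "2019-12-10", "2020-01-13", "2020-02-16", "2020-03-08",
        "2020-05-17", "2020-10-24", "2021-01-05",
    ]),
})
rides = pd.DataFrame({
    "ride_id": [6, 1, 10, 19, 3, 13, 7, 17, 20, 5, 2, 11, 15, 12, 14],
    "user_id": [75, 54, 63, 39, 41, 52, 69, 70, 81, 57, 42, 68, 32, 11, 18],
    "requested_at": pd.to_datetime([
        "2019-12-09", "2020-02-09", "2020-03-04", "2020-04-06", "2020-06-03",
        "2020-06-22", "2020-07-16", "2020-08-25", "2020-11-02", "2020-11-09",
        "2020-12-09", "2021-01-11", "2021-01-17", "2021-01-19", "2021-01-27",
    ]),
})
accepted_rides = pd.DataFrame({
    "ride_id": [10, 13, 7, 17, 20, 5, 2, 11, 15, 12, 14],
    "driver_id": [10, 10, 8, 7, 1, 7, 4, 8, 8, 8, 1],
    "ride_distance": [63, 73, 100, 119, 121, 42, 6, 37, 108, 38, 90],
    "ride_duration": [38, 96, 28, 68, 92, 101, 38, 43, 82, 34, 74],
})

result = working_percentage(drivers, rides, accepted_rides)
percentages = result["working_percentage"].tolist()
print(result)
```

Run on the sample tables, the function returns one row per month, and the values can be compared against the Result Table in the Example section.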


References: [1] https://leetcode.com/problems/hopper-company-queries-ii/description/?lang=pythondata