<a href="https://colab.research.google.com/github/ClaytonSdS/UberDriver/blob/main/UberDriverViz_Sep2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **My Personal Uber Driver Stats**
### Share Phase
---

### **General code for plotting**

**Importing general-purpose libraries**


In [2]:
from datetime import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

In [3]:
data2viz = pd.read_csv('UberDriverData_June2023_Processed_V1.csv', sep=",", quotechar='"')

# Rename the parsed timestamp column to 'date' and drop the unnamed CSV index column
data2viz = data2viz.rename(columns={'parsed_datatime': 'date'}).drop(columns='Unnamed: 0')
# 'date' stays a regular column (not the index) because later cells read data2viz['date']

# The weekday is the text before the first comma in 'data_time_merged'
data2viz['day_of_the_week'] = data2viz['data_time_merged'].str.split(',').str[0]
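The weekday above is read from the text before the first comma of `data_time_merged`, which assumes that column always begins with an English day name. A minimal cross-check, assuming the `date` column holds strings in the `%Y-%m-%d %H:%M:%S` format that the part-of-day function below also parses, is to derive the weekday directly from the timestamp with pandas' `dt.day_name()`. The helper `weekday_from_timestamp` and the sample timestamps are hypothetical.

```python
import pandas as pd

def weekday_from_timestamp(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: English weekday name for each 'YYYY-MM-DD HH:MM:SS' string."""
    parsed = pd.to_datetime(dates, format="%Y-%m-%d %H:%M:%S")
    return parsed.dt.day_name()

# Made-up timestamps for illustration, not taken from the dataset
sample_dates = pd.Series(["2023-06-05 08:15:00", "2023-06-09 22:40:00"])
print(weekday_from_timestamp(sample_dates).tolist())
# ['Monday', 'Friday']
```

Comparing `weekday_from_timestamp(data2viz['date'])` with `data2viz['day_of_the_week']` would flag any row where the string split picked up something other than a weekday.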

Getting all neighborhood names

In [4]:
# Sorted list of the distinct ride origins (neighborhoods)
neighborhoods = list(data2viz.groupby('origin')['origin'].count().index)
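Both the neighborhood list and the per-neighborhood ride counts (computed a few cells below as `rides`) come from the same grouping. A minimal sketch on a small made-up DataFrame with only an `origin` column shows that `groupby(...).size()` yields the sorted names and their counts in a single pass; the `example_`/`rides_per_origin` names are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
import pandas as pd

# Made-up rides for illustration; only the 'origin' column matters here
example_rides = pd.DataFrame({"origin": ["B", "A", "B", "C", "B"]})

rides_per_origin = example_rides.groupby("origin").size()  # Series indexed by origin, sorted alphabetically
example_neighborhoods = list(rides_per_origin.index)       # distinct origins
example_counts = rides_per_origin.tolist()                 # ride counts, aligned with the names above
print(example_neighborhoods, example_counts)
# ['A', 'B', 'C'] [1, 3, 1]
```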


In [5]:
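days_of_the_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]  # only weekdays are analyzed

# groupby sorts day names alphabetically, so reindex to calendar order before using the values by position
rides_per_dayWeek = data2viz.groupby('day_of_the_week')['day_of_the_week'].count().reindex(days_of_the_week).values
# Total profit for each weekday
profit_per_dayWeek = {day: data2viz.loc[data2viz['day_of_the_week'] == day, 'profit'].sum() for day in days_of_the_week}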
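`groupby` sorts its keys by default, so weekday names come out in alphabetical order (Friday, Monday, Thursday, Tuesday, Wednesday) rather than calendar order. Any array built from such a grouping and later placed next to `days_of_the_week` by position, as `rides_per_dayWeek` is in the `ProfitRides` cell, needs to be reindexed first. A minimal sketch with made-up rides:

```python
import pandas as pd

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Made-up rides for illustration, one profit value per ride
example_rides = pd.DataFrame({
    "day_of_the_week": ["Friday", "Monday", "Monday", "Wednesday", "Tuesday", "Thursday", "Friday"],
    "profit": [10.0, 5.0, 7.0, 8.0, 6.0, 9.0, 12.0],
})

grouped = example_rides.groupby("day_of_the_week")["profit"]
print(list(grouped.size().index))
# ['Friday', 'Monday', 'Thursday', 'Tuesday', 'Wednesday']  (alphabetical, not calendar order)

# Reindex both aggregates to calendar order before combining them with anything positional
rides_per_day = grouped.size().reindex(weekdays, fill_value=0)
profit_per_day = grouped.sum().reindex(weekdays, fill_value=0.0)
print(rides_per_day.tolist())   # [2, 1, 1, 1, 2]
print(profit_per_day.tolist())  # [12.0, 6.0, 8.0, 9.0, 22.0]
```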

Defining a function that returns the part of the day for a given timestamp

In [6]:
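def filter(x):
  """Map a 'YYYY-MM-DD HH:MM:SS' timestamp string to a part of the day.

  Note: this name shadows Python's built-in filter() inside the notebook.
  """
  # Keep only hours and minutes; zero-padded "HH:MM" strings compare correctly as text
  time = datetime.strptime(x, "%Y-%m-%d %H:%M:%S").strftime("%H:%M")

  if "01:00" <= time <= "11:59":
    return "morning"

  elif time == "12:00":
    return "noon"

  elif "12:00" < time <= "17:59":
    return "afternoon"

  elif "18:00" <= time <= "20:59":
    return "evening"

  else:
    # 21:00 to 00:59
    return "night"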


In [7]:
parts_of_day = [filter(i) for i in data2viz['date'].values]  # apply the filter function to every timestamp (list comprehension)
data2viz['parts_of_the_day'] = parts_of_day  # create the new column in the dataframe
rides = data2viz.groupby('origin')['origin'].count().loc[neighborhoods].tolist()  # number of rides per neighborhood, in the order of neighborhoods
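The list comprehension above calls `filter` once per row. A vectorized sketch with the same boundaries, assuming the `date` strings follow `%Y-%m-%d %H:%M:%S`, converts each timestamp to minutes since midnight and picks a label with `numpy.select`. The function name `parts_of_day_vectorized` and the sample timestamps are hypothetical.

```python
import numpy as np
import pandas as pd

def parts_of_day_vectorized(dates: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart of filter(): same boundaries, whole column at once."""
    parsed = pd.to_datetime(dates, format="%Y-%m-%d %H:%M:%S")
    minutes = parsed.dt.hour * 60 + parsed.dt.minute  # seconds ignored, as with "%H:%M"

    conditions = [
        (minutes >= 1 * 60) & (minutes <= 11 * 60 + 59),   # 01:00-11:59
        minutes == 12 * 60,                                 # exactly 12:00
        (minutes > 12 * 60) & (minutes <= 17 * 60 + 59),   # 12:01-17:59
        (minutes >= 18 * 60) & (minutes <= 20 * 60 + 59),  # 18:00-20:59
    ]
    choices = ["morning", "noon", "afternoon", "evening"]
    labels = np.select(conditions, choices, default="night")  # everything else: 21:00-00:59
    return pd.Series(labels, index=dates.index)

# Made-up timestamps covering each boundary
sample_times = pd.Series([
    "2023-06-05 00:30:00", "2023-06-05 01:00:00", "2023-06-05 12:00:59",
    "2023-06-05 12:01:00", "2023-06-05 18:00:00", "2023-06-05 21:00:00",
])
print(parts_of_day_vectorized(sample_times).tolist())
# ['night', 'morning', 'noon', 'afternoon', 'evening', 'night']
```

Under the stated format assumption, `data2viz['parts_of_the_day'] = parts_of_day_vectorized(data2viz['date'])` should give the same labels as the row-by-row version.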

Summing the total profit of each neighborhood with groupby() and sum()

In [8]:
profit_per_neighborhod = data2viz.groupby('origin')['profit'].sum().to_dict()  # {neighborhood: total profit}

Finding the 10 neighborhoods with the highest total profit and the 10 with the lowest

In [9]:
from heapq import nlargest, nsmallest
nlargest_names = nlargest(10, profit_per_neighborhod, key=profit_per_neighborhod.get)
nsmallest_names = nsmallest(10, profit_per_neighborhod, key=profit_per_neighborhod.get)
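`heapq.nlargest` and `heapq.nsmallest` iterate over the dictionary's keys and rank them with `key=profit_per_neighborhod.get`. When the totals are held in a pandas Series instead, `Series.nlargest` and `Series.nsmallest` give the same ranking. A small comparison with made-up totals and `n=2`:

```python
import pandas as pd
from heapq import nlargest, nsmallest

# Made-up profit totals per neighborhood, for illustration only
example_profit = {"A": 120.0, "B": 15.5, "C": 87.0, "D": 4.0, "E": 230.0}

# heapq ranks the dict keys by their values
top_heapq = nlargest(2, example_profit, key=example_profit.get)
bottom_heapq = nsmallest(2, example_profit, key=example_profit.get)

# pandas equivalent on a Series indexed by neighborhood
profit_series = pd.Series(example_profit)
top_pandas = list(profit_series.nlargest(2).index)
bottom_pandas = list(profit_series.nsmallest(2).index)

print(top_heapq, top_pandas)        # ['E', 'A'] ['E', 'A']
print(bottom_heapq, bottom_pandas)  # ['D', 'B'] ['D', 'B']
```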

In [10]:
def infostack(neighborhood, dataframe=data2viz):
  """Return (neighborhood, morning, noon, afternoon, evening, night, total) profits for one neighborhood."""
  rides_here = dataframe.loc[dataframe['origin'] == neighborhood]

  # Profit summed by part of the day; parts with no rides count as 0
  by_part = rides_here.groupby('parts_of_the_day')['profit'].sum()
  morning, noon, afternoon, evening, night = (by_part.get(part, 0) for part in ["morning", "noon", "afternoon", "evening", "night"])

  total = rides_here['profit'].sum()
  return (neighborhood, morning, noon, afternoon, evening, night, total)

In [11]:
neighborhood_profit = [infostack(x) for x in neighborhoods]
profit_df = pd.DataFrame(neighborhood_profit, columns = ['neighborhood', 'morning', 'noon', 'afternoon', 'evening', 'night', 'total_profit'])
profit_df['total_rides'] = rides
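Calling `infostack` once per neighborhood filters the whole DataFrame repeatedly. A sketch of a single-pass alternative, assuming the same `origin`, `parts_of_the_day` and `profit` columns, builds the same table layout with `pivot_table`; the rides and the `example_`/`PARTS` names below are made up for illustration.

```python
import pandas as pd

PARTS = ["morning", "noon", "afternoon", "evening", "night"]

# Made-up rides for illustration: origin, part of the day and profit
example_rides = pd.DataFrame({
    "origin": ["A", "A", "B", "B", "B"],
    "parts_of_the_day": ["morning", "night", "morning", "morning", "evening"],
    "profit": [10.0, 4.0, 6.0, 3.0, 8.0],
})

# One row per origin, one column per part of the day, profit summed in each cell
example_profit_df = (
    example_rides.pivot_table(index="origin", columns="parts_of_the_day",
                              values="profit", aggfunc="sum", fill_value=0)
    .reindex(columns=PARTS, fill_value=0)  # keep all five parts, even those with no rides
)
example_profit_df["total_profit"] = example_profit_df[PARTS].sum(axis=1)
example_profit_df["total_rides"] = example_rides.groupby("origin").size()  # aligned on the origin index

# Match the column layout of profit_df
example_profit_df.columns.name = None
example_profit_df.index.name = "neighborhood"
example_profit_df = example_profit_df.reset_index()
print(example_profit_df)
```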

### **Results**

In [12]:
profitable_neighborhoods = profit_df.loc[profit_df['neighborhood'].isin(nlargest_names)]
profitable_neighborhoods = profitable_neighborhoods.sort_values('total_profit')


fig = px.bar(profitable_neighborhoods, x='neighborhood', y=['morning', 'noon', 'afternoon', 'evening', 'night'], title="Figure 1: Possible Profitable Neighborhoods", hover_data=['total_profit', 'total_rides'],
             color_discrete_sequence=px.colors.qualitative.Dark2)
fig.show()

In [13]:
unprofitable_neighborhoods = profit_df.loc[profit_df['neighborhood'].isin(nsmallest_names)]
unprofitable_neighborhoods = unprofitable_neighborhoods.sort_values('total_profit', ascending=False)

fig = px.bar(unprofitable_neighborhoods, x='neighborhood', y=['morning', 'noon', 'afternoon', 'evening', 'night'], title="Figure 2: Possible Non-Profitable Neighborhoods", hover_data=['total_profit', 'total_rides'],
             color_discrete_sequence=px.colors.qualitative.Dark2)
fig.show()

In [14]:
ProfitRides = pd.DataFrame(list(profit_per_dayWeek.items()), columns=['day_of_the_week', 'total_profit'])
ProfitRides['rides'] = rides_per_dayWeek
ProfitRides['profit/ride'] = ProfitRides['total_profit']/ProfitRides['rides']
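The cell above relies on `profit_per_dayWeek` and `rides_per_dayWeek` lining up by position. A minimal sketch with made-up rides builds the same three columns from one `groupby` using named aggregation, then reindexes to calendar order, so the ride counts cannot drift out of alignment with the profits:

```python
import pandas as pd

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Made-up rides for illustration
example_rides = pd.DataFrame({
    "day_of_the_week": ["Monday", "Monday", "Tuesday", "Friday", "Friday", "Friday"],
    "profit": [8.0, 4.0, 9.0, 6.0, 7.0, 5.0],
})

example_summary = (
    example_rides.groupby("day_of_the_week")
    .agg(total_profit=("profit", "sum"), rides=("profit", "size"))
    .reindex(weekdays)  # calendar order; days without rides become NaN
)
example_summary["profit/ride"] = example_summary["total_profit"] / example_summary["rides"]
print(example_summary)
```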

In [15]:
figBar = px.bar(ProfitRides, x='day_of_the_week', y='total_profit', title="Figure 3: Total profit by day of the week",
             color_discrete_sequence=px.colors.sequential.RdBu, color = 'day_of_the_week')
figBar.show()

In [16]:
figBar = px.bar(ProfitRides, x='day_of_the_week', y='rides', title="Figure 4: Total rides by day of the week",
             color_discrete_sequence=px.colors.sequential.RdBu, color = 'day_of_the_week')
figBar.show()

In [19]:
figBar = px.bar(ProfitRides, x='day_of_the_week', y='profit/ride', title="Figure 5: Average profit per ride by day of the week",
             color_discrete_sequence=px.colors.sequential.RdBu, color='day_of_the_week')
figBar.show()

### **About the Trends**



After analyzing the visualizations presented earlier and the collected data, we present the following interpretation:

Regarding the data, some records occur very infrequently. This imbalance is visible in Figure 2, "Possible Non-Profitable Neighborhoods," where every neighborhood has only a single recorded ride. These records may therefore not accurately represent the profitability of those neighborhoods, which is why the figure titles use the word "possible."

The opposite holds for the more profitable neighborhoods. The most profitable ones are located in the central region of the city, such as Centro, Vila Redentora, and Boa Vista, among others, as shown in Figure 1. These neighborhoods also have many recorded rides, so their totals do not hinge on a single ride, which points to a genuine profitability trend.
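Because some of the lowest-profit neighborhoods have only a single recorded ride, one way to make the ranking less sensitive to such sparse records is to rank only neighborhoods with a minimum number of rides. A sketch, assuming a hypothetical `MIN_RIDES` threshold and made-up totals in the same column layout as `profit_df`:

```python
import pandas as pd

MIN_RIDES = 3  # hypothetical threshold, chosen only for this illustration

# Made-up per-neighborhood totals in the same layout as profit_df
example_profit_df = pd.DataFrame({
    "neighborhood": ["A", "B", "C", "D"],
    "total_profit": [5.0, 120.0, 2.0, 60.0],
    "total_rides": [1, 14, 1, 6],
})

# Without the filter, the single-ride neighborhoods C and A would be the two lowest
reliable = example_profit_df[example_profit_df["total_rides"] >= MIN_RIDES]
lowest_reliable = reliable.nsmallest(2, "total_profit")["neighborhood"].tolist()
print(lowest_reliable)  # ['D', 'B']
```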

Next, Figure 3 shows total profit growing over the week, whereas Figure 4 shows the number of rides oscillating up and down across the weekdays. Taken separately, these two figures are hard to interpret; the pattern becomes clearer when we divide the total profit by the number of rides for each day of the week (Figure 5).

Figure 5 shows profit per ride increasing over the week. Because it normalizes profit by the number of rides, this metric allows a fairer comparison of performance across the days of the week.
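Written out, the metric plotted in Figure 5 is, for each weekday $d$,

$$\bar{p}_d = \frac{P_d}{R_d},$$

where $P_d$ is the total profit on day $d$ (Figure 3) and $R_d$ is the number of rides on that day (Figure 4). Two days with the same total profit therefore differ in $\bar{p}_d$ only through their ride counts: the day with fewer rides has the higher profit per ride.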

In short, profitability per ride increases over the week: fewer rides were needed to achieve a higher net revenue.



### **Final Result and Recommendations**

We conclude that the main variables affecting profitability are neighborhood location and day of the week. Therefore, to maximize profitability and make effective use of working time, we recommend staying near the central regions, since these areas combine a higher number of rides with higher profits.

Regarding weekdays, early in the week a greater number of rides is needed to achieve higher revenue, because on those days the data show low profit per ride combined with a high number of rides per day.