### Uber Cancellation and No Cab Availability Problem


<b>Problem Statement:</b> You may have some experience of travelling to and from the airport. Have you ever used Uber or any other cab service for this travel? Did you at any time face the problem of cancellation by the driver or non-availability of cars?
<br>
Well, if these are the problems faced by customers, these very issues also impact the business of Uber. If drivers cancel the request of riders or if cars are unavailable, Uber loses out on its revenue. Let’s hear more about such problems that Uber faces during its operations.
<br>
As an analyst, you decide to address the problem Uber is facing: driver cancellations and non-availability of cars, which lead to a loss of potential revenue.
<br><br>
<b>Business Objective:</b> The aim of the analysis is to identify the root causes of the problem (i.e. cancellations and non-availability of cars) and recommend ways to improve the situation. As a result of your analysis, you should be able to present the root cause(s) and possible hypotheses to the client, along with recommendations.  

<b>NOTE - </b> This data set is a masked data set which is similar to what data analysts at Uber handle. This is just a sample dataset which can be used for analysis.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [None]:
#read the data 
df = pd.read_csv('/kaggle/input/uber-rides-data-bw-city-and-airport/Uber Request Data.csv')

In [None]:
df.head()

In [None]:
#describe the dataset
df.info()

#### Convert the timestamp columns to datetime

In [None]:
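# assumption: timestamps are written day-first (dd/mm/yyyy); format='mixed' (pandas >= 2.0)
# parses each value individually in case rows use different layouts
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True, format='mixed')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], dayfirst=True, format='mixed')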
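Reading a timestamp day-first instead of month-first changes the date, which in turn changes the weekday used later in this notebook. The minimal sketch below parses one hypothetical string both ways with `pd.to_datetime`. It only illustrates the ambiguity and makes no claim about every row in this dataset.

```python
import pandas as pd

# hypothetical timestamp string: ambiguous between dd/mm/yyyy and mm/dd/yyyy
ambiguous = '12/7/2016 08:30'

month_first = pd.to_datetime(ambiguous)               # pandas default: month first
day_first = pd.to_datetime(ambiguous, dayfirst=True)  # read the first field as the day

print(month_first.date(), month_first.day_name())
print(day_first.date(), day_first.day_name())
```

Read month-first, the string becomes 7 December 2016 (a Wednesday). Read day-first, it is 12 July 2016 (a Tuesday). Any weekday breakdown therefore depends on using the convention the source data actually follows.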

In [None]:
df.info()

In [None]:
df.head()

In [None]:
print(f"There are {df['Request id'].nunique()} unique request ids")

In [None]:
# Request id is not needed for the analysis, so drop this column
df.drop('Request id',axis=1,inplace=True)

In [None]:
df.head()

In [None]:
df.head()

In [None]:
df.info()

The output shows missing values in the Driver id and Drop timestamp columns, so let's analyse those first. Our assumptions are:
<ul>
    <li>Driver id is null for all requests with status 'No Cars Available'</li>
    <li>Drop timestamp is null for all requests with status 'Cancelled' (and, since no trip took place, for 'No Cars Available' as well)</li>
</ul>

In [None]:
df.groupby(['Status']).count()

The table above confirms these assumptions. Functionally, we don't have any missing values: the nulls simply correspond to requests that never became trips.
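Comparing non-null counts per status works, but counting the nulls directly makes the check explicit. Below is a sketch on a hypothetical four-row frame. It is named `toy` so that it does not overwrite the notebook's `df`.

```python
import numpy as np
import pandas as pd

# hypothetical rows that mimic the structure of the Uber request data
toy = pd.DataFrame({
    'Status': ['Trip Completed', 'Cancelled', 'No Cars Available', 'Cancelled'],
    'Driver id': [12.0, 7.0, np.nan, 3.0],
    'Drop timestamp': pd.to_datetime(['2016-07-11 10:00', None, None, None]),
})

# number of missing values in each column, broken down by Status
null_counts = toy[['Driver id', 'Drop timestamp']].isna().groupby(toy['Status']).sum()
print(null_counts)
```

Each cell is the number of missing values for that status. On the real data, the assumptions hold if `Driver id` nulls appear only under 'No Cars Available' and `Drop timestamp` nulls appear only under statuses other than 'Trip Completed'.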


### Univariate Analysis
<hr>

In [None]:
df.describe()

The summary statistics of Driver id suggest that requests are spread roughly uniformly across driver ids, so each of the 300 drivers has a similar number of requests.

Next, we look for drivers who may be performing poorly, cancelling rides, or deliberately switching off their devices.

In [None]:
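# distplot is deprecated in recent seaborn; histplot is its replacement.
# Driver id is NaN for 'No Cars Available', so this effectively shows cancelled requests per driver
sns.histplot(df.loc[df['Status'] != 'Trip Completed', 'Driver id'].dropna(), bins=30, kde=True)
plt.show()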

The distribution is roughly uniform. This suggests that no small group of drivers accounts for a disproportionate share of cancellations or deliberately turns their devices off.
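A histogram is only a visual check. A more direct check is each driver's cancellation rate, sorted so that any outlier drivers come first. The sketch below uses hypothetical data; in the notebook, the same expression could be applied to `df`.

```python
import pandas as pd

# hypothetical requests for three drivers
toy = pd.DataFrame({
    'Driver id': [1, 1, 2, 2, 3, 3, 3],
    'Status': ['Cancelled', 'Trip Completed', 'Cancelled', 'Cancelled',
               'Trip Completed', 'Trip Completed', 'Cancelled'],
})

# fraction of each driver's requests that ended in 'Cancelled'
cancel_rate = (toy['Status'] == 'Cancelled').groupby(toy['Driver id']).mean()

# highest rates first: drivers far above the rest would be candidates for follow-up
ranked = cancel_rate.sort_values(ascending=False)
print(ranked)
```

On real data, drivers with only a handful of requests can show extreme rates by chance. It may help to look at each driver's request count alongside the rate.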

In [None]:
df['Status'].value_counts(normalize=True) * 100

More than half of the requests are not fulfilled by Uber.
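The share of unfulfilled requests is simply 100 minus the completed share. Here is a minimal sketch with `value_counts(normalize=True)` on a hypothetical series of five requests.

```python
import pandas as pd

# hypothetical request statuses
statuses = pd.Series(['Trip Completed', 'Cancelled', 'No Cars Available',
                      'No Cars Available', 'Trip Completed'])

# percentage of requests per status
share = statuses.value_counts(normalize=True) * 100

# everything that is not 'Trip Completed' counts as unfulfilled
unfulfilled_pct = 100 - share['Trip Completed']
print(share)
print(f'Unfulfilled: {unfulfilled_pct:.1f}%')
```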

In [None]:
# derive extra columns from the timestamps for easier analysis
df['r_date'] = df['Request timestamp'].dt.date
df['r_time'] = df['Request timestamp'].dt.time
df['d_date'] = df['Drop timestamp'].dt.date
df['d_time'] = df['Drop timestamp'].dt.time
df['r_hour'] = df['Request timestamp'].dt.hour
# ordered weekday labels so that plots run Mon..Sun instead of alphabetically
day_of_week = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
df['dayofweek'] = pd.Categorical(df['Request timestamp'].dt.dayofweek.map(day_of_week),
                                 categories=list(day_of_week.values()), ordered=True)
# time-of-day slots as half-open hour ranges [0,3), [3,11), [11,15), [15,17), [17,24);
# binning on the hour keeps midnight requests, which the default right=True would drop
bins = [0, 3, 11, 15, 17, 24]
labels = ['Late Night', 'Early Morning', 'Mid-Day', 'Evening', 'Late Evening']
df['timeofday'] = pd.cut(df['r_hour'], bins=bins, labels=labels, right=False)
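The time-of-day slots are a binning of the request hour. The sketch below uses hypothetical hours to show how `pd.cut` with `right=False` assigns boundary hours. Each interval includes its left edge and excludes its right one, so hour 3 falls in 'Early Morning' and hour 0 is not dropped. With the default `right=True`, the lowest edge is excluded unless `include_lowest=True` is passed.

```python
import pandas as pd

# hypothetical request hours, including the bin edges
hours = pd.Series([0, 2, 3, 10, 11, 15, 16, 17, 23])

labels = ['Late Night', 'Early Morning', 'Mid-Day', 'Evening', 'Late Evening']
# half-open intervals [0,3), [3,11), [11,15), [15,17), [17,24)
slots = pd.cut(hours, bins=[0, 3, 11, 15, 17, 24], labels=labels, right=False)

for h, s in zip(hours, slots):
    print(h, s)
```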


In [None]:
#drop the columns which are not required for analysis
df.drop(['Request timestamp','Drop timestamp','d_time','r_time'],axis=1,inplace=True)

In [None]:
df.head()

We have derived new columns: r_date, d_date, r_hour, dayofweek and timeofday.

##### Now let's focus on the trips that were not completed and dig into the underlying problems


In [None]:
df_new = df[df['Status']!='Trip Completed']
len(df_new)

In [None]:
#by pickup points
df_temp = df_new.groupby(['Status','Pickup point']).agg({'r_date':'count'}).reset_index(level=1)
pd.pivot_table(df_temp, values ='Pickup point',index=['Pickup point'],columns =['Status']).plot(kind='bar',stacked=True)
plt.show()

In [None]:
display(df_temp)

Observations:
<ol>
    <li>Cancellations are more frequent for requests from the City to the Airport</li>
    <li>'No Cars Available' is more frequent for requests from the Airport to the City</li>
</ol>
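The stacked bars compare raw counts, and the two pickup points may not receive the same number of requests. Normalising each row with `pd.crosstab(..., normalize='index')` compares the mix of failure types instead. The sketch below uses hypothetical rows.

```python
import pandas as pd

# hypothetical unfulfilled requests
toy = pd.DataFrame({
    'Pickup point': ['City', 'City', 'Airport', 'Airport', 'Airport', 'City'],
    'Status': ['Cancelled', 'Cancelled', 'No Cars Available',
               'No Cars Available', 'Cancelled', 'No Cars Available'],
})

# raw counts per pickup point and status
counts = pd.crosstab(toy['Pickup point'], toy['Status'])
# each row divided by its total, so every pickup point sums to 1
shares = pd.crosstab(toy['Pickup point'], toy['Status'], normalize='index')
print(counts)
print(shares)
```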

In [None]:
#by dayofweek
df_temp = df_new.groupby(['Status','dayofweek']).agg({'r_date':'count'}).reset_index(level=1)
pd.pivot_table(df_temp, values ='dayofweek',index=['dayofweek'],columns =['Status']).plot(kind='bar',stacked=True)
plt.show()

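Voila! There is a huge spike in both cancellations and 'No Cars Available' on Wednesday. Weekday results depend on the request dates being parsed correctly. An ambiguous date such as 12/7/2016 is 12 July (a Tuesday) when read day-first but 7 December (a Wednesday) when read month-first.

Let's analyse the Wednesday requests that were not completed and see if they yield further insights.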
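Before reading too much into a weekday spike, it helps to check how many distinct dates fall on each weekday. If one weekday covers more dates than the others, its bar will be taller even when the per-day volume is the same. The sketch below uses hypothetical dates stored as Timestamps; `nunique` works the same way on the plain dates held in the notebook's `r_date`.

```python
import pandas as pd

# hypothetical request dates
requests = pd.DataFrame({'r_date': pd.to_datetime(
    ['2016-07-12', '2016-07-13', '2016-07-13', '2016-12-07'])})
requests['dayofweek'] = requests['r_date'].dt.day_name()

# distinct dates and total requests per weekday
summary = requests.groupby('dayofweek')['r_date'].agg(['nunique', 'size'])
# requests per distinct date: a fairer basis for comparing weekdays
summary['per_date'] = summary['size'] / summary['nunique']
print(summary)
```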

In [None]:
#by timeofday
df_temp = df_new.groupby(['Status','timeofday']).agg({'r_date':'count'}).reset_index(level=1)
pd.pivot_table(df_temp, values ='timeofday',index=['timeofday'],columns =['Status']).plot(kind='bar',stacked=True)
plt.show()

Early mornings and late evenings are clearly the main areas of concern and need further investigation:
<ol>
    <li>Late evenings (5 PM - 12 AM): there is a surge in 'No Cars Available'</li>
    <li>Early mornings (3 AM - 11 AM): cancellations outnumber 'No Cars Available'</li>
</ol>


#### Findings by Univariate analysis
<ol>
    <li>Approximately 60% of requests are not completed, which is a pressing issue for the business</li>
    <li>Requests from the Airport to the City suffer from 'No Cars Available' more than requests from the City to the Airport</li>
    <li>In the provided sample, Wednesday shows a spike in unfulfilled requests that needs more attention</li>
    <li>Late evenings (5 PM - 12 AM): there is a surge in 'No Cars Available'</li>
    <li>Early mornings (3 AM - 11 AM): cancellations outnumber 'No Cars Available'</li>
</ol>

### Bivariate Analysis
<hr>

In [None]:
# validate findings 2 and 3: unfulfilled requests by day of week, status and pickup point
df_temp = df_new.groupby(['dayofweek','Status','Pickup point']).agg({'r_date':'count'}).reset_index(level=0)
df_temp.sort_values('r_date',ascending=False)

In [None]:
# validate findings 2, 4 and 5: unfulfilled requests by time of day, status and pickup point
df_temp = df_new.groupby(['timeofday','Status','Pickup point']).agg({'r_date':'count'}).reset_index(level=0)
df_temp.sort_values('r_date',ascending=False)

In [None]:
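# hourly count of unfulfilled requests on Wednesday, one line per status
wed = df_new[df_new['dayofweek'] == 'Wed']
hourly = wed.groupby(['r_hour', 'Status']).size().unstack('Status', fill_value=0)
hourly = hourly.reindex(range(24), fill_value=0)  # show every hour, even those with no requests
hourly.plot(figsize=(15, 7))
plt.xticks(np.arange(0, 24, 1))
plt.xlabel('Request hour')
plt.ylabel('Unfulfilled requests')
plt.show()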

The graph above shows that on Wednesday, 'No Cars Available' and 'Cancelled' requests peak between 3 AM and 11 AM and between 4 PM and 12 AM.
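The peak windows above were read off the chart. They can also be extracted programmatically. The sketch below uses hypothetical hourly counts: it flags hours above the mean (an arbitrary threshold chosen for illustration) and groups consecutive flagged hours into windows.

```python
import pandas as pd

# hypothetical unfulfilled-request counts for hours 0..11
hourly = pd.Series([2, 1, 1, 8, 9, 7, 3, 2, 2, 10, 12, 4], index=range(12))

# flag hours whose count exceeds the mean (assumed threshold)
above = hourly > hourly.mean()

# a new run starts whenever the flag changes; cumsum gives each run an id
run_id = above.ne(above.shift(fill_value=False)).cumsum()

# first and last hour of every run of flagged hours
windows = [(int(g.index.min()), int(g.index.max()))
           for _, g in hourly[above].groupby(run_id[above])]
print(windows)
```

This prints `[(3, 5), (9, 10)]`. A window that wraps past midnight (e.g. 22:00 to 01:00) would come out as two separate runs, so real data may need the first and last runs merged.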

#### Recommendations to the business:
<ol>
    <li>On Wednesdays and in the late evening, many requests end in 'No Cars Available'. Uber should try to increase the supply of cabs at these times, since unfulfilled ('No Cars Available' and 'Cancelled') requests peak between 3 AM and 11 AM and between 4 PM and 12 AM.</li>
    <li>Most cancellations happen in the early morning on trips from the City to the Airport. Possible reasons include drivers being unwilling to take these rides, perhaps because they cannot easily find a return fare back to the city in the morning.</li>
    <li>One hypothesis is that many flights land or take off on Wednesdays, creating high demand in the early morning and late night. Further analysis could identify the exact time frames in which more drivers and cabs should be made available, for example by offering drivers an additional bonus for Airport-to-City trips and vice versa.</li>
    
</ol>