# Exploratory Data Analysis for 'Yellow' and 'Pink' Cab Companies
## G2M Case Study for the two Cab Companies

### In Total We Have 4 Datasets

The First One Containing
* Date of Travel
* Company (Pink Cab, or Yellow Cab)
* Kilometers Travelled per Trip
* Price Charged per Trip
* Cost of Trip
* Transaction ID

The Second One Containing

* Gender of Customer
* Age of Customer
* Income (USD/Month)
* Customer ID

The Third One Containing 

* Mode of Payment for Each Trip
* Customer ID
* Transaction ID

The Fourth One Containing 

* Names of 20 Cities 
* Population of Each City
* Number of Cab Users for Each City 

#### We begin by setting up the notebook and importing the tools we will need

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib.ticker import FuncFormatter
import pandas as pd
import datetime as dt

#### We load the first dataset directly from its URL

In [None]:
dataset = pd.read_csv("https://raw.githubusercontent.com/EniasVontas/DataSets/main/Cab_Data.csv")
dataset.dtypes

##### Date of Travel is stored as an integer: the number of days since 1899/12/30, the origin date used by Excel
##### We convert it to a datetime, keep only dates from 2016/01/31 until 2018/12/31, and verify the result

In [None]:
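# Excel serial dates count days from 1899-12-30
dataset['Date of Travel'] = pd.to_datetime(dataset['Date of Travel'], unit='D', origin='1899-12-30')
# .copy() avoids SettingWithCopyWarning when new columns are added to the filtered frame
dataset = dataset[dataset['Date of Travel'] > '2016-01-30 00:00:00'].copy()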

In [None]:
dataset['Date of Travel'].min(), dataset['Date of Travel'].max()
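
The conversion of Excel serial dates can be checked on a few hand-picked values. The sketch below uses `pd.to_datetime` with an explicit `origin`; the serial numbers are illustrative and are not taken from `Cab_Data.csv`.

```python
import pandas as pd

# Illustrative Excel serial dates (days since 1899-12-30), not values from the dataset
serials = pd.Series([42370, 42736, 43465])

# unit="D" treats the integers as day counts; origin sets day zero
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")
formatted = dates.dt.strftime("%Y-%m-%d").tolist()
print(formatted)
```

If the printed dates fall outside the expected range, the origin or the unit is probably wrong.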

##### We extract the year, month, day and weekday from 'Date of Travel' column

In [None]:
dataset['year'] = dataset['Date of Travel'].dt.year
dataset['month'] = dataset['Date of Travel'].dt.month
dataset['day'] = dataset["Date of Travel"].dt.day

dayOfWeek={0:'Monday', 1:'Tuesday', 2:'Wednesday', 3:'Thursday', 4:'Friday', 5:'Saturday', 6:'Sunday'}
dataset['weekday'] = dataset['Date of Travel'].dt.dayofweek.map(dayOfWeek)
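
As a possible alternative to the manual mapping, pandas can return weekday names directly, and an ordered categorical keeps them in calendar order when sorting or grouping. This is a minimal sketch on made-up dates; `calendar_order` is a name introduced here for illustration.

```python
import pandas as pd

# Illustrative dates, not taken from the dataset
dates = pd.Series(pd.to_datetime(["2016-01-01", "2018-12-30", "2018-12-31"]))

# day_name() returns the English weekday name for each date
names = dates.dt.day_name()

# An ordered categorical sorts by calendar position instead of alphabetically
calendar_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]
weekday = pd.Series(pd.Categorical(names, categories=calendar_order, ordered=True))

name_list = names.tolist()
sorted_list = weekday.sort_values().tolist()
print(name_list)
print(sorted_list)
```

Without the ordered categorical, a groupby on weekday names returns Friday first, because plain strings sort alphabetically.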

#### We load the rest of the datasets

In [None]:
df_customers = pd.read_csv("https://raw.githubusercontent.com/EniasVontas/DataSets/main/Customer_ID.csv")
df_transactions = pd.read_csv("https://raw.githubusercontent.com/EniasVontas/DataSets/main/Transaction_ID.csv")

#### We merge the Customer_ID and the Transaction_ID datasets on the 'Customer ID' key
#### We do this so that every transaction carries the record of the customer who made it

In [None]:
df = pd.merge(df_customers, df_transactions, on="Customer ID")

#### Next we merge our dataset with the already merged datasets on the 'Transaction ID' key and compute the Profit column for each transaction

In [None]:
dataset = pd.merge(dataset,df,on="Transaction ID")

dataset['Profit'] = dataset['Price Charged'] - dataset['Cost of Trip']
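
Merges can silently drop or duplicate rows, so it may be worth checking them. The sketch below uses hypothetical miniature tables with the same key columns; `validate` and `indicator` are standard `pd.merge` parameters, while the table contents are made up.

```python
import pandas as pd

# Hypothetical miniature versions of the three tables
trips = pd.DataFrame({"Transaction ID": [1, 2, 3],
                      "Price Charged": [30.0, 50.0, 20.0],
                      "Cost of Trip": [20.0, 35.0, 18.0]})
transactions = pd.DataFrame({"Transaction ID": [1, 2, 3],
                             "Customer ID": [10, 11, 10]})
customers = pd.DataFrame({"Customer ID": [10, 11],
                          "Gender": ["Male", "Female"]})

# validate= raises MergeError if the key relationship is not the expected one
tx = pd.merge(transactions, customers, on="Customer ID", validate="many_to_one")

# how="left" keeps every trip; indicator=True adds a _merge column showing where each row matched
merged = pd.merge(trips, tx, on="Transaction ID", how="left",
                  validate="one_to_one", indicator=True)
unmatched = int((merged["_merge"] != "both").sum())

merged["Profit"] = merged["Price Charged"] - merged["Cost of Trip"]
print(len(merged), unmatched, merged["Profit"].tolist())
```

A non-zero `unmatched` count would mean some trips have no customer record after the merge.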

##### We load the Cities dataset and remove the thousands separators (",") from the numeric columns, because pandas reads these values as strings (object dtype)

In [None]:
df_cities = pd.read_csv("https://raw.githubusercontent.com/EniasVontas/DataSets/main/City.csv")

df_cities["Population"] = df_cities.Population.str.replace("," , "")
df_cities["Users"] = df_cities.Users.str.replace("," , "")

##### We convert the two columns to integers

In [None]:
df_cities["Users"] = df_cities["Users"].astype(str).astype(int)
df_cities["Population"] = df_cities["Population"].astype(str).astype(int)
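
The clean-up can sometimes be done at load time instead. The sketch below assumes the numbers are quoted and carry no surrounding spaces, which may not hold for the real `City.csv`; the two rows are made up.

```python
import io
import pandas as pd

# Hypothetical two-row extract in the same shape as City.csv (figures are made up)
csv_text = (
    'City,Population,Users\n'
    '"CITY A","1,250,000","30,500"\n'
    '"CITY B","900,000","12,000"\n'
)

# thousands="," tells the parser to treat commas inside numbers as separators
cities_demo = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(cities_demo.dtypes)
print(cities_demo["Users"].sum())
```

If the columns still come back as `object`, the string replacement followed by `astype(int)` remains the safer route.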

##### We drop the row for 'San Francisco CA' as there are no observations in our dataset for this city
##### We only discover this later in the analysis, but we remove the row here

In [None]:
df_cities = df_cities.drop([14],axis=0)
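
Cities without any observations can also be found programmatically and removed by name rather than by row position. A minimal sketch on hypothetical tables; the city names are placeholders.

```python
import pandas as pd

# Hypothetical miniature tables; city names are placeholders
cities_demo = pd.DataFrame({"City": ["CITY A", "CITY B", "CITY C"],
                            "Users": [100, 200, 300]})
trips_demo = pd.DataFrame({"City": ["CITY A", "CITY C", "CITY A"]})

# Cities listed in the city table that never appear in the trip data
missing = sorted(set(cities_demo["City"]) - set(trips_demo["City"]))

# Filtering by name does not depend on the order of rows in the file
kept = cities_demo[cities_demo["City"].isin(trips_demo["City"])]
kept_names = kept["City"].tolist()
print(missing, kept_names)
```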

### The areas and the hypotheses we are going to investigate are provided below

* What are the Profits of each company for each year (Total and Mean)?
* Is there any seasonality in the Profit? By year, weekday or quarter?
* What are our customers like (Age, Monthly Income)?
* What is the number of customers for each company? Does it change from year to year?
* Is there a difference in customer segmentation across the cities? Do people prefer different cab companies in each city?


### Profit Analysis
##### Calculate Total Profit for each company for every year and plot the results
We use the function from https://dfrieds.com/data-visualizations/how-format-large-tick-values.html to display large y-axis tick values in Millions (M) and Thousands (K)

Yellow Cab has a greater total profit than Pink Cab in every year. Both companies peak in 2017, Yellow Cab at 16.5 Million and Pink Cab at 2 Million

In [None]:
def reformat_large_tick_values(tick_val, pos):
    """
    Turns large tick values (in the billions, millions and thousands) such as 4500 into 4.5K and also appropriately turns 4000 into 4K (no zero after the decimal).
    """
    if tick_val >= 1000000000:
        val = round(tick_val/1000000000, 1)
        new_tick_format = '{:}B'.format(val)
    elif tick_val >= 1000000:
        val = round(tick_val/1000000, 1)
        new_tick_format = '{:}M'.format(val)
    elif tick_val >= 1000:
        val = round(tick_val/1000, 1)
        new_tick_format = '{:}K'.format(val)
    elif tick_val < 1000:
        new_tick_format = round(tick_val, 1)
    else:
        new_tick_format = tick_val

    # make new_tick_format into a string value
    new_tick_format = str(new_tick_format)
    
    # code below will keep 4.5M as is but change values such as 4.0M to 4M since that zero after the decimal isn't needed
    index_of_decimal = new_tick_format.find(".")
    
    if index_of_decimal != -1:
        value_after_decimal = new_tick_format[index_of_decimal+1]
        if value_after_decimal == "0":
            # remove the 0 after the decimal point since it's not needed
            new_tick_format = new_tick_format[0:index_of_decimal] + new_tick_format[index_of_decimal+2:]
            
    return new_tick_format

names = [2016,2017,2018]

### Profit of each company for each year  ###
df = dataset.groupby(['year','Company'])['Profit'].sum()
df.unstack().plot(linestyle='--',marker='o')
plt.title("Total Profit per Year")
ax = plt.gca()
ax.yaxis.set_major_formatter(ticker.FuncFormatter(reformat_large_tick_values))
ax.set_ylabel("Total Profit ($)")
# Hide the x-axis label on the current axes (plt.axes() would add a new, empty axes on top of the plot)
ax.set_xlabel("")
plt.xticks(names, rotation=45)
plt.show()
df
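
For comparison, a more compact tick formatter can be sketched with the `g` format specifier, which drops a trailing `.0` by itself. `compact_tick` is a hypothetical name and is not the function from the linked article; unlike that function, it also abbreviates negative values.

```python
def compact_tick(value, pos=None):
    """Format 4500 as '4.5K', 4000 as '4K' and 16500000 as '16.5M'."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            # the 'g' format drops a trailing '.0', so 4.0 prints as '4'
            return f"{round(value / threshold, 1):g}{suffix}"
    return f"{round(value, 1):g}"

formatted_ticks = [compact_tick(v) for v in (4500, 4000, 16_500_000, 999, -2_000_000)]
print(formatted_ticks)
```

It takes the same `(value, pos)` arguments, so it could be wrapped in `ticker.FuncFormatter` in the same way.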

#### Calculate Mean Trip Profit for each company for every year and plot the results
Yellow Cab also appears to earn a greater mean profit per trip than Pink Cab

In [None]:
by_year = (dataset
           .groupby(['year','Company'])
           .agg({'Profit':'mean'}))
           
by_year.unstack().plot(linestyle='--',marker='o')
plt.title("Mean Trip Profit per Year")
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_ylabel("Mean Profit ($)")
ax1.set_xlabel("")  # hide the x-axis label
L = plt.legend()
L.get_texts()[0].set_text('Pink Cab')
L.get_texts()[1].set_text('Yellow Cab')
plt.xticks([2016,2017,2018], rotation=45)
plt.show()
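
Total and mean profit can be computed in a single pass with named aggregation. The sketch below uses made-up trips, and the output column names are chosen here for illustration.

```python
import pandas as pd

# Hypothetical trips; the numbers are made up for illustration
trips = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017, 2017],
    "Company": ["Pink Cab", "Yellow Cab", "Yellow Cab", "Pink Cab", "Yellow Cab"],
    "Profit": [10.0, 30.0, 50.0, 20.0, 60.0],
})

# Named aggregation: one output column per statistic
summary = (trips
           .groupby(["year", "Company"])
           .agg(total_profit=("Profit", "sum"),
                mean_profit=("Profit", "mean"),
                n_trips=("Profit", "count")))
print(summary)
```

Keeping the trip count next to the mean helps show whether a high total comes from more trips or from more profit per trip.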

#### Calculate mean trip profit for every day of the month, regardless of year
The Profit fluctuates somewhat (the next plot suggests a reason why), but it remains relatively constant for both companies

In [None]:
by_day = (dataset
              .groupby(['day','Company'])
              .agg({"Profit":'mean'}))
by_day.unstack().plot(linestyle='--',marker='o')
plt.title("Mean Profit for each Day of Month")
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_ylabel("Mean Profit ($)")
ax1.set_xlabel("Day of Month")
L = plt.legend()
L.get_texts()[0].set_text('Pink Cab')
L.get_texts()[1].set_text('Yellow Cab')

#### Calculate mean profit for weekdays and plot the results
We see that the highest mean Profit is on Sunday, followed by Friday and Saturday, then Monday, and finally the remaining three days of the week.
This pattern holds for both companies, which suggests that demand is highest from Friday through Sunday.

In [None]:
by_weekday = (dataset
              .groupby(['weekday','Company'])
              .agg({"Profit":'mean'}))
# Reorder the rows from alphabetical to calendar order (Monday to Sunday) before plotting
by_weekday.unstack().reindex(list(dayOfWeek.values())).plot(linestyle='--',marker='o')
plt.title("Mean Profit for Weekdays")
plt.xticks(rotation=45)
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_ylabel("Mean Profit ($)")
ax1.set_xlabel("")  # hide the x-axis label
L = plt.legend()
L.get_texts()[0].set_text('Pink Cab')
L.get_texts()[1].set_text('Yellow Cab')

#### Calculate the total profit per quarter of every year for each company
The vertical lines mark Q4 of 2016 and 2017. Profit peaks in Q4 of each year, which indicates seasonality in the profit data for both companies

In [None]:
dataset["quarter"] = dataset['Date of Travel'].dt.to_period("Q")

df = dataset.groupby(["quarter","Company"])["Profit"].sum()
df.unstack().plot(linestyle='-',marker='o')
plt.title("Total Profit per Quarter of each Year")
plt.axvline(x='2016Q4',color="k",linestyle="--")
plt.axvline(x='2017Q4',color="k",linestyle="--")
ax = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax.yaxis.set_major_formatter(ticker.FuncFormatter(reformat_large_tick_values))
ax.set_ylabel("Total Profit ($)")
ax.set_xlabel("")  # hide the x-axis label
plt.show()
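
One way to probe the seasonality further is to pool the data by quarter number across years. This is a small sketch on hypothetical trips, not a result from the dataset.

```python
import pandas as pd

# Hypothetical trips spread over two years
trips = pd.DataFrame({
    "Date of Travel": pd.to_datetime(["2016-02-10", "2016-11-05",
                                      "2017-03-01", "2017-12-24"]),
    "Profit": [10.0, 40.0, 15.0, 55.0],
})

# to_period("Q") labels each date with its calendar quarter, e.g. 2016Q4
trips["quarter"] = trips["Date of Travel"].dt.to_period("Q")
quarter_labels = trips["quarter"].astype(str).tolist()

# Pooling by quarter number (1 to 4) across years exposes a repeating pattern
by_quarter_number = trips.groupby(trips["Date of Travel"].dt.quarter)["Profit"].mean().to_dict()
print(quarter_labels)
print(by_quarter_number)
```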

### Trips and Customer Analysis
##### Calculate total number of trips by company for weekdays

In [None]:
by_weekday = dataset.groupby(['weekday','Company'])["weekday"].count().unstack("Company").fillna(0)
by_weekday = by_weekday.sort_values("Yellow Cab",ascending=False)
by_weekday[["Pink Cab","Yellow Cab"]].plot(kind='bar',stacked=False)  
plt.title("Total Number of Trips for Weekdays")
plt.xticks(rotation=45)
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_xlabel("")  # hide the x-axis label
ax1.set_ylabel("Number of Trips")
plt.show()

##### Calculate Number of Customers by Company for Weekdays
Together with the previous plot, this shows that Yellow Cab has more customers using its services than Pink Cab, and that Yellow Cab's customers take more rides than Pink Cab's.

In [None]:
by_weekday = dataset.groupby(['weekday','Company'])["Customer ID"].nunique().unstack("Company").fillna(0)
by_weekday = by_weekday.sort_values("Yellow Cab",ascending=False)
by_weekday[["Pink Cab","Yellow Cab"]].plot(kind='bar',stacked=False)
plt.title("Total Number of Customers for Weekdays")
plt.ylim(0,25000)             
plt.xticks(rotation=45)
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_xlabel("")  # hide the x-axis label
ax1.set_ylabel("Number of Customers")
plt.show()
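
The claim about rides per customer can be checked directly by dividing trips by distinct customers. A minimal sketch on made-up trips; `trips_per_customer` is a column name introduced here.

```python
import pandas as pd

# Hypothetical trips: customer 1 rides with both companies
trips = pd.DataFrame({
    "Company": ["Pink Cab", "Pink Cab", "Yellow Cab",
                "Yellow Cab", "Yellow Cab", "Yellow Cab"],
    "Customer ID": [1, 2, 1, 1, 3, 3],
})

per_company = trips.groupby("Company").agg(
    n_trips=("Customer ID", "count"),        # rows are trips
    n_customers=("Customer ID", "nunique"),  # distinct customers
)
per_company["trips_per_customer"] = per_company["n_trips"] / per_company["n_customers"]
print(per_company)
```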

##### Calculate Unique Customers per year for each company
Again we can see more customers for Yellow Cab than for Pink Cab

In [None]:
item_counts = dataset.groupby(["year","Company"])["Customer ID"].nunique().unstack("Company").fillna(0)

ax = item_counts.plot(kind='bar',stacked=False)
ax.set_alpha(0.8)
plt.title("Number of Customers for each Year")
plt.ylim(0,37000)
plt.xticks(rotation=45)
# ax is the axes returned by .plot(), so no call to plt.axes() is needed
ax.set_xlabel("")  # hide the x-axis label
ax.set_ylabel("Total Number of Customers")
for i in ax.patches:
    
    ax.text(i.get_x(), i.get_height()+500, \
            str(round((i.get_height()), 2)), fontsize=9, color='dimgrey')
plt.legend(loc="upper left")        
plt.show()

##### Calculate total number of unique customers of each company for every city

In [None]:
item_counts = dataset.groupby(["Company","City"])["Customer ID"].nunique().unstack("Company").fillna(0)
item_counts = item_counts.sort_values("Yellow Cab",ascending=False)
item_counts.plot(kind='bar')
plt.title("Customers per City")
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_xlabel("")  # hide the x-axis label
ax1.set_ylabel("Customers")

#### Calculate each company's customers as a percentage of the cab users in each of the 19 cities
We can see that Yellow Cab's lead over Pink Cab in number of customers is largest in Washington, New York, Chicago and Boston. Interestingly, Pink Cab's share of customers is greater than Yellow Cab's in San Diego, Sacramento, Pittsburgh and Nashville.

In [None]:
by_ct = dataset.groupby(['City','Company'])['Customer ID'].nunique().unstack("Company").fillna(0)

cities = pd.merge(df_cities,by_ct,on="City")
# Multiply by 100 so the values match the "%" in the axis label
cities["Pink Cab"] = cities["Pink Cab"] / cities["Users"] * 100
cities["Yellow Cab"] = cities["Yellow Cab"] / cities["Users"] * 100
cities = cities.drop("Population",axis=1)
cities = cities.drop("Users",axis=1)
cities = cities.sort_values("Yellow Cab",ascending=False)
labels = cities["City"]
ax = cities.plot(kind='bar')  # keep the axes returned by .plot(); plt.axes() would add a new, empty one
plt.title("% of Customers per City Cab Users")
ax.set_ylabel("% of Customers")
ax.set_xticks(range(0,len(labels)))
ax.set_xticklabels(labels)
plt.tight_layout()
plt.show()

#### Calculate the percentage of cab users covered by each company across all 19 cities
We observe that, in total, Yellow Cab reaches more customers than Pink Cab in these 19 cities

In [None]:
item_counts = dataset.groupby("Company")["Customer ID"].nunique()
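# Select each company by label; integer positions on a string index are deprecated in recent pandas
total_users = df_cities["Users"].sum()
a = item_counts["Yellow Cab"] / total_users
b = item_counts["Pink Cab"] / total_users
Users = [a*100, b*100]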
labels = ["Yellow Cab","Pink Cab"]
ax = pd.Series(Users).plot(kind='bar',color=["darkorange","royalblue"])
plt.title("% of Cab Users Covered in all 19 Cities")
plt.ylim(0.4)
ax.set_xticks(range(0,2))
ax.set_xticklabels(labels,rotation = 0)
rects = ax.patches
labels = [str(Users[0].round(3)),str(Users[1].round(3))]
for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width() / 2, height, label,
            ha='center', va='bottom')

### Analysis by Customer Demographics
#### Calculate the number of trips taken by Male and Female customers in total and plot the results
Each row of the dataset is a trip, so these counts are trips rather than distinct customers: Male customers account for more trips (201948) than Female customers (153074). The plot shows the same split for each company

In [None]:
dataset.groupby(["Gender"]).size()
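
Because each row is a trip, `size()` counts trips, while `nunique()` on the customer key counts people. The sketch below illustrates the difference on made-up data.

```python
import pandas as pd

# Hypothetical trips: three customers, one of whom rides three times
trips = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 3],
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
})

# Rows per gender (trips) versus distinct customers per gender
trips_by_gender = trips.groupby("Gender").size().to_dict()
customers_by_gender = trips.groupby("Gender")["Customer ID"].nunique().to_dict()
print(trips_by_gender, customers_by_gender)
```

In this toy data, Male customers take more trips even though there are more Female customers, so the two counts can point in different directions.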

In [None]:
df = dataset.groupby(["Gender","Company"]).size()
df.unstack("Company").plot(kind='bar',stacked=False)
plt.title("Number of Trips by Gender")
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_ylabel("Trips")  # size() counts rows, and each row is a trip
ax1.set_xlabel("")  # hide the x-axis label
plt.xticks(rotation=45)

#### Calculate profit by gender for each company
We can see that most Profit for each company is generated by Male customers 

In [None]:
df = dataset.groupby(["Gender","Company"])["Profit"].sum()
df.unstack("Company").plot(kind='bar',stacked=False)
plt.title("Total Profit by Gender")
ax = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax.yaxis.set_major_formatter(ticker.FuncFormatter(reformat_large_tick_values))
ax.set_ylabel("Total Profit ($)")
ax.set_xlabel("")  # hide the x-axis label
plt.xticks(rotation=45)

#### Calculate profit by age group of customers
We observe that for both companies the greatest contribution to Total Profit comes from the two age groups [27-40] and [18-26], again with Yellow Cab earning more in total

In [None]:
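# pd.cut bins are right-closed: (17,26], (26,40], (40,55], (55,99], so the edges match the labels and age 18 is included
dataset["Age Group"] = pd.cut(dataset.Age,bins=[17,26,40,55,99],labels=["[18-26]","[27-40]","[41-55]","[56+]"])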
by_agegroup = (dataset
              .groupby(["Age Group","year",'Company'])
              .agg({"Profit":'sum'}))
labels = ["2016,Pink Cab","2016,Yellow Cab","2017,Pink Cab","2017,Yellow Cab","2018,Pink Cab","2018,Yellow Cab"]
by_agegroup.unstack("Age Group").plot(kind='bar',stacked=False)
plt.title("Total Profit by Age Group")
ax = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax.yaxis.set_major_formatter(ticker.FuncFormatter(reformat_large_tick_values))
ax.set_ylabel("Total Profit ($)")
ax.set_xticks(range(0,6))
ax.set_xticklabels(labels,rotation = 45)
ax.set_xlabel("")  # hide the x-axis label
plt.ylim(0,8500000)
L = plt.legend()
L.get_texts()[0].set_text('[18-26]')
L.get_texts()[1].set_text('[27-40]')
L.get_texts()[2].set_text('[41-55]')
L.get_texts()[3].set_text('[56+]')
plt.show()
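
`pd.cut` uses right-closed intervals by default, so the bin edges need care if labels such as `[18-26]` are to mean what they say. The sketch below compares two sets of edges on a few illustrative ages.

```python
import pandas as pd

ages = pd.Series([18, 26, 27, 40, 41, 56])
age_labels = ["[18-26]", "[27-40]", "[41-55]", "[56+]"]

# Right-closed intervals: (17, 26], (26, 40], (40, 55], (55, 99]
groups = pd.cut(ages, bins=[17, 26, 40, 55, 99], labels=age_labels)
group_list = groups.astype(str).tolist()

# With a first edge of 18 the first interval is (18, 27], so age 18 falls outside every bin
shifted = pd.cut(ages, bins=[18, 27, 40, 55, 99], labels=age_labels)
shifted_missing = shifted.isna().tolist()

print(group_list)
print(shifted_missing)
```

Passing `include_lowest=True` is another way to keep the lowest edge, though it does not change where age 27 lands.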

#### Calculate customer percentage by age group
Again we observe that the largest shares for both companies come from the age groups [27-40] and [18-26]. The shares are computed over trips, so a frequent rider counts once per trip

In [None]:
by_agegroup = (dataset
              .groupby(["Age Group","year",'Company'])
              .agg({"Profit":'count'}))

labels = ["2016,Pink Cab","2016,Yellow Cab","2017,Pink Cab","2017,Yellow Cab","2018,Pink Cab","2018,Yellow Cab"]
by_agegroup["user"] = by_agegroup["Profit"] / by_agegroup["Profit"].sum()
by_agegroup = by_agegroup.drop("Profit",axis=1)
by_agegroup.unstack("Age Group").plot(kind='bar',stacked=True)
plt.title("Customer % by Age Group")
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_ylabel("% of Customers")
ax1.set_xticks(range(0,6))
ax1.set_xticklabels(labels,rotation = 45)
ax1.set_xlabel("")  # hide the x-axis label
plt.ylim(0,0.35)
L = plt.legend(loc='upper left')
L.get_texts()[0].set_text('[18-26]')
L.get_texts()[1].set_text('[27-40]')
L.get_texts()[2].set_text('[41-55]')
L.get_texts()[3].set_text('[56+]')
plt.show()

#### Calculate profit by income group for each company
We can see that Middle and Upper class customers provide the majority of the profit for each company

In [None]:
dataset["Income Group"] = pd.cut(dataset["Income (USD/Month)"],bins=[0,3000,15000,50000],labels=["[0-3000]","[3000-15000]","[15000+]"])
by_income = (dataset
              .groupby(["Income Group","year",'Company'])
              .agg({"Profit":'sum'}))
labels = ["2016,Pink Cab","2016,Yellow Cab","2017,Pink Cab","2017,Yellow Cab","2018,Pink Cab","2018,Yellow Cab"]
by_income.unstack("Income Group").plot(kind='bar',stacked=False)
plt.title("Total Profit by Monthly Income Group")
ax = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax.yaxis.set_major_formatter(ticker.FuncFormatter(reformat_large_tick_values))
ax.set_ylabel("Total Profit ($)")
ax.set_xticks(range(0,6))
ax.set_xticklabels(labels,rotation = 45)
ax.set_xlabel("")  # hide the x-axis label
plt.ylim(0,10000000)
L = plt.legend()
L.get_texts()[0].set_text('[0-3000]')
L.get_texts()[1].set_text('[3000-15000]')
L.get_texts()[2].set_text('[15000+]')
plt.show()

#### Calculate customer base for each income class by year by company
Again, the majority of customers come from Middle and Upper class citizens

In [None]:
dataset["Income Group"] = pd.cut(dataset["Income (USD/Month)"],bins=[0,3000,15000,50000],labels=["[0-3000]","[3000-15000]","[15000+]"])
by_income = (dataset
              .groupby(["Income Group","year",'Company'])
              .agg({"Profit":'count'}))
labels = ["2016,Pink Cab","2016,Yellow Cab","2017,Pink Cab","2017,Yellow Cab","2018,Pink Cab","2018,Yellow Cab"]
by_income.unstack("Income Group").plot(kind='bar',stacked=False)
plt.title("Number of Customers by Monthly Income Group")
ax1 = plt.gca()  # reuse the plotted axes; plt.axes() would add a new, empty one
ax1.set_ylabel("Customers")
ax1.set_xticks(range(0,6))
ax1.set_xticklabels(labels,rotation = 45)
ax1.set_xlabel("")  # hide the x-axis label
L = plt.legend()
L.get_texts()[0].set_text('[0-3000]')
L.get_texts()[1].set_text('[3000-15000]')
L.get_texts()[2].set_text('[15000+]')
plt.show()

### Customer Retention Analysis
We would like to see how many customers use the cab services repeatedly.
For each company and year, we split customers into those with 1 to 5 trips and those with more than 5 trips, and count each group

In [None]:
df = dataset.groupby(['Customer ID',"Company",'year']).size().unstack("Company").fillna(0)

# Customers with at least one Pink Cab trip in a year, binned by their yearly trip count
df_P = df[df["Pink Cab"] != 0]
df_P = pd.cut(df_P["Pink Cab"],bins=[0,5,np.inf],labels=["[1-5]","[5+]"]).to_frame()
# Same for Yellow Cab
df_Y = df[df["Yellow Cab"] != 0]
df_Y = pd.cut(df_Y["Yellow Cab"],bins=[0,5,np.inf],labels=["[1-5]","[5+]"]).to_frame()

# Split each company into occasional (1-5 trips) and frequent (more than 5 trips) customers
df_P1 = df_P[df_P['Pink Cab'] == "[1-5]"]
df_P2 = df_P[df_P['Pink Cab'] == "[5+]"]

df_Y1 = df_Y[df_Y['Yellow Cab'] == "[1-5]"]
df_Y2 = df_Y[df_Y['Yellow Cab'] == "[5+]"]

p1 = df_P1.groupby(['year']).size()
p2 = df_P2.groupby(['year']).size()
y1 = df_Y1.groupby(['year']).size()
y2 = df_Y2.groupby(['year']).size()
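
The same bucketing can be sketched in long format, which keeps both companies in one table. The data below is hypothetical and the names `counts`, `bucket` and `retention` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical trips: customer 1 takes 6 Yellow Cab trips in 2016, the others ride less
trips = pd.DataFrame({
    "Customer ID": [1] * 6 + [2, 2, 3],
    "Company": ["Yellow Cab"] * 6 + ["Yellow Cab", "Yellow Cab", "Pink Cab"],
    "year": [2016] * 9,
})

# Trips per customer, company and year
counts = (trips.groupby(["Customer ID", "Company", "year"])
          .size().rename("trips").reset_index())

# (0, 5] becomes "[1-5]" and (5, inf) becomes "[5+]"; stored as plain strings
counts["bucket"] = pd.cut(counts["trips"], bins=[0, 5, np.inf],
                          labels=["[1-5]", "[5+]"]).astype(str)

# Number of customers in each bucket, per year and company
retention = (counts.groupby(["year", "Company", "bucket"])
             .size().unstack("bucket", fill_value=0))
print(retention)
```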

#### Plot the number of customers with 1-5 trips with each  company 
We can see that while there is still a difference between the companies, with Yellow company having more customers, it is not that substantial

In [None]:
labels = ["2016","2017","2018"]
yellow = [y1.iloc[0],y1.iloc[1],y1.iloc[2]]
pink = [p1.iloc[0],p1.iloc[1],p1.iloc[2]]
x = np.arange(len(labels))
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(x+width/2,pink,width,label="Pink Cab")
rects2 = ax.bar(x-width/2,yellow,width,label="Yellow Cab")
ax.set_ylabel('Number of Customers')
ax.set_title('Less than or equal to 5 trips')
plt.ylim(0,27000)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
for i in ax.patches:
    
    ax.text(i.get_x()+0.05, i.get_height()+55, \
            str(round((i.get_height()), 2)), fontsize=9, color='dimgrey')
plt.tight_layout()
plt.show()


#### Plot the number of customers with more than 5 trips with each company
Here we can see a very big difference in customer willingness to choose Yellow company over Pink company repeatedly

In [None]:
labels = ["2016","2017","2018"]
yellow = [y2.iloc[0],y2.iloc[1],y2.iloc[2]]
pink = [p2.iloc[0],p2.iloc[1],p2.iloc[2]]
x = np.arange(len(labels))
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(x+width/2,pink,width,label="Pink Cab")
rects2 = ax.bar(x-width/2,yellow,width,label="Yellow Cab")
ax.set_ylabel('Number of Customers')
ax.set_title('Greater than 5 trips')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
for i in ax.patches:
    
    ax.text(i.get_x()+0.06, i.get_height()+55, \
            str(round((i.get_height()), 2)), fontsize=9, color='dimgrey')
plt.tight_layout()
plt.show()