# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual/Team
name - Deepika k

# **Project Summary -**

Uber-related projects span a wide range, from studies of the company's business and technological evolution to data science work on Uber data. Data projects commonly use machine learning to predict ride prices or cluster pickup locations, while broader business projects analyze market position, strategic growth, and challenges such as regulatory battles. This notebook belongs to the data category: it explores a ride-request dataset (`Uber.csv`) and looks at request status, pickup point, and the request and drop timestamps.

# **GitHub Link -**

Provide your GitHub Link here.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler
import plotly.express as px
import random
from wordcloud import WordCloud
import ast
import statsmodels as stat
import geopandas as gpd
import missingno as ms

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
df=pd.read_csv('/content/drive/MyDrive/Uber.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
ms.bar(df)
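
The null counts above can also be expressed as percentages, which makes it easier to judge how much of each column is affected. The following is a minimal sketch on a small hypothetical frame (`toy`); the helper name `missing_summary` is an assumption for illustration and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Uber data; column names follow the notebook.
toy = pd.DataFrame({
    "Request id": [1, 2, 3, 4],
    "Driver id": [10.0, np.nan, 12.0, np.nan],
    "Status": ["Trip Completed", "No Cars Available", "Trip Completed", "Cancelled"],
})

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the missing-value count and percentage per column, largest first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(df)` on the loaded dataset would give the same table for the real columns.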

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

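The columns used in this notebook are:

* `Request id`: numeric identifier of a ride request.
* `Pickup point`: where the ride was requested, for example `Airport` or `City`.
* `Driver id`: numeric identifier of the driver; it contains missing values that are filled in the wrangling step.
* `Status`: outcome of the request, such as `Trip Completed`, `Cancelled`, or `No Cars Available`.
* `Request timestamp`: date and time at which the request was made.
* `Drop timestamp`: date and time of the drop; it contains missing values that are filled in the wrangling step.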

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
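# Fill missing driver ids with the most frequent driver id
df['Driver id']=df['Driver id'].fillna(df['Driver id'].mode()[0])

In [None]:
# Parse request timestamps (day/month/two-digit year, hour:minute)
df["Request timestamp"] = pd.to_datetime(df["Request timestamp"], format='%d/%m/%y %H:%M')

In [None]:
# Fill missing drop timestamps with the most frequent value
# (the column is left as text; it is not parsed to datetime here)
df['Drop timestamp']=df['Drop timestamp'].fillna(df['Drop timestamp'].mode()[0])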


In [None]:
df.isnull().sum()
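
The `format='%d/%m/%y %H:%M'` string used above matches a single layout with a two-digit year. The following is a minimal sketch, not the notebook's method, of one way to handle a column that mixes layouts. The sample strings, the two format strings, and the helper name `parse_mixed` are all assumptions for illustration; the real file may use different layouts.

```python
import pandas as pd

# Hypothetical raw values mixing two layouts, plus one missing entry.
raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", None])

def parse_mixed(series, formats=("%d/%m/%Y %H:%M", "%d-%m-%Y %H:%M:%S")):
    """Try each format in turn; values that match no format stay NaT."""
    parsed = pd.to_datetime(series, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        attempt = pd.to_datetime(series, format=fmt, errors="coerce")
        parsed = parsed.fillna(attempt)  # only fills entries still unparsed
    return parsed

parsed = parse_mixed(raw)
hours = parsed.dt.hour  # hour of day, NaN where the timestamp is missing
print(parsed)
```

Keeping unparsed or missing entries as `NaT` makes them easy to count with `isnull()` before deciding how to treat them.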

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.style.use('dark_background')
plt.figure(figsize=(8,8))
ax=sns.countplot(x='Status',data=df,hue='Status',palette="dark",legend=False)
# Label each bar with its count
for p in ax.patches:
  count=int(p.get_height())
  x=p.get_x()+p.get_width()/2
  y=p.get_height()
  ax.annotate(str(count),(x,y),ha='center',va='bottom')
plt.title('Count of requests by status')
plt.xlabel('Status')
plt.ylabel('Count')
plt.grid(False)
plt.tight_layout()
plt.show()

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(8,8))
ax=sns.countplot(x='Pickup point',data=df,hue='Pickup point',palette="coolwarm",legend=False)
# Label each bar with its count
for p in ax.patches:
  count=int(p.get_height())
  x=p.get_x()+p.get_width()/2
  y=p.get_height()
  ax.annotate(str(count),(x,y),ha='center',va='bottom')
plt.title('Count of requests by pickup point')
plt.xlabel('Pickup point')
plt.ylabel('Count')
plt.grid(False)
plt.tight_layout()
plt.show()


#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Count completed trips per pickup point
trip_c_1=df[df['Status']=='Trip Completed'].groupby('Pickup point').size().reset_index(name='count')
plt.figure(figsize=(8,8))
ax=sns.barplot(x='Pickup point',y='count',data=trip_c_1,hue='Pickup point',palette="coolwarm",legend=False)
# Label each bar with its count
for p in ax.patches:
  count=int(p.get_height())
  x=p.get_x()+p.get_width()/2
  y=p.get_height()
  ax.annotate(str(count),(x,y),ha='center',va='bottom')
plt.title('Completed trips by pickup point')
plt.xlabel('Pickup point')
plt.ylabel('Count')
plt.grid(False)
plt.tight_layout()
plt.show()

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Count cancelled requests per pickup point
cancel_1=df[df['Status']=='Cancelled'].groupby('Pickup point').size().reset_index(name='count')
plt.figure(figsize=(8,8))
ax=sns.barplot(x='Pickup point',y='count',data=cancel_1,hue='Pickup point',palette="coolwarm",legend=False)
# Label each bar with its count
for p in ax.patches:
  count=int(p.get_height())
  x=p.get_x()+p.get_width()/2
  y=p.get_height()
  ax.annotate(str(count),(x,y),ha='center',va='bottom')
plt.title('Cancelled requests by pickup point')
plt.xlabel('Pickup point')
plt.ylabel('Count')
plt.grid(False)
plt.tight_layout()
plt.show()


#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Count "No Cars Available" requests per pickup point
nocars_1=df[df['Status']=='No Cars Available'].groupby('Pickup point').size().reset_index(name='count')
plt.figure(figsize=(8,8))
ax=sns.barplot(x='Pickup point',y='count',data=nocars_1,hue='Pickup point',palette="coolwarm",legend=False)
# Label each bar with its count
for p in ax.patches:
  count=int(p.get_height())
  x=p.get_x()+p.get_width()/2
  y=p.get_height()
  ax.annotate(str(count),(x,y),ha='center',va='bottom')
plt.title('"No Cars Available" requests by pickup point')
plt.xlabel('Pickup point')
plt.ylabel('Count')
plt.grid(False)
plt.tight_layout()
plt.show()
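
Charts 3 to 5 repeat the same filter, count, and bar-labelling steps for each status. One possible way to factor that out is sketched below on a small hypothetical frame (`toy`). The helper names `counts_by_pickup` and `annotate_bars` are assumptions for illustration, and plain matplotlib bars are used instead of seaborn to keep the sketch short.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Uber data; column names follow the notebook.
toy = pd.DataFrame({
    "Pickup point": ["Airport", "City", "Airport", "City", "City"],
    "Status": ["Trip Completed", "Cancelled", "Trip Completed",
               "Trip Completed", "No Cars Available"],
})

def counts_by_pickup(frame, status):
    """Count requests with the given status for each pickup point."""
    subset = frame[frame["Status"] == status]
    return (subset["Pickup point"].value_counts()
            .rename_axis("Pickup point")
            .reset_index(name="count"))

def annotate_bars(ax):
    """Write each bar's height above it."""
    for patch in ax.patches:
        x = patch.get_x() + patch.get_width() / 2
        y = patch.get_height()
        ax.annotate(str(int(y)), (x, y), ha="center", va="bottom")

completed = counts_by_pickup(toy, "Trip Completed")
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(completed["Pickup point"], completed["count"])
annotate_bars(ax)
ax.set_xlabel("Pickup point")
ax.set_ylabel("Count")
ax.set_title("Completed trips by pickup point")
```

The same two helpers could then be called with `'Cancelled'` or `'No Cars Available'` to reproduce the other two charts.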


#### Chart - 6

In [None]:
# Chart - 6 visualization code
airport_df=df[df['Pickup point'].str.strip()=='Airport']
type_count=airport_df['Status'].value_counts().reset_index()
type_count.columns=['Status','count']
fig=px.treemap(type_count,path=['Status'],values='count',title='Status Count in Airport')
fig.show()

#### Chart - 7

In [None]:
# Chart - 7 visualization code
city_df=df[df['Pickup point'].str.strip()=='City']
city_count=city_df['Status'].value_counts().reset_index()
city_count.columns=['Status','count']
fig=px.treemap(city_count,path=['Status'],values='count',title='Status Count in city')
fig.show()


#### Chart - 8

In [None]:
# Chart - 8 visualization code
p_1=df.groupby(["Request timestamp","Pickup point"]).size().unstack(fill_value=0)
p_2=p_1.div(p_1.sum(axis=1), axis=0)*100

top_p=p_1.sum(axis=1).sort_values(ascending=False).head(20).index
p_top20=p_2.loc[top_p]

fig, ax=plt.subplots(figsize=(15,8))
p_top20.plot(kind="barh",stacked=True,colormap="summer",ax=ax)

plt.xlabel("Share of requests (%)")
plt.ylabel("Request Timestamp")
plt.title("Pickup point share for the 20 busiest request timestamps")
plt.legend(title="Pickup point",bbox_to_anchor=(1.05,1),loc="upper left")

plt.tight_layout()
plt.show()

#### Chart - 9

In [None]:
# Chart - 9 visualization code
s_1=df.groupby(["Request timestamp","Status"]).size().unstack(fill_value=0)
s_2=s_1.div(s_1.sum(axis=1), axis=0)*100

top_s=s_1.sum(axis=1).sort_values(ascending=False).head(20).index
s_top20=s_2.loc[top_s]

fig, ax=plt.subplots(figsize=(15,8))
s_top20.plot(kind="barh",stacked=True,colormap="plasma",ax=ax)

plt.xlabel("Share of requests (%)")
plt.ylabel("Request Timestamp")
plt.title("Status share for the 20 busiest request timestamps")
plt.legend(title="Status",bbox_to_anchor=(1.05,1),loc="upper left")

plt.tight_layout()
plt.show()
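
Charts 8 and 9 group by the exact request timestamp, so each group holds only the requests made in the same minute. An alternative view, sketched here as an assumption rather than as the notebook's method, is to group by hour of day and compute the status share per hour with `pd.crosstab`. The frame `toy` and the column name `request_hour` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the Uber data; column names follow the notebook.
toy = pd.DataFrame({
    "Request timestamp": pd.to_datetime([
        "2016-07-11 05:10", "2016-07-11 05:45",
        "2016-07-11 18:20", "2016-07-12 18:05",
    ]),
    "Status": ["Cancelled", "Trip Completed",
               "No Cars Available", "No Cars Available"],
})

# Hour of day (0-23) extracted from the parsed timestamp
toy["request_hour"] = toy["Request timestamp"].dt.hour

# Row-normalised crosstab: each row sums to 100 (%)
hourly_share = pd.crosstab(toy["request_hour"], toy["Status"], normalize="index") * 100
print(hourly_share)
```

The resulting table can be passed to `.plot(kind="barh", stacked=True)` in the same way as `s_top20` above.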

#### Chart - 10

In [None]:
# Chart - 10 visualization code
pd_1=df.groupby(["Drop timestamp","Pickup point"]).size().unstack(fill_value=0)
pd_2=pd_1.div(pd_1.sum(axis=1), axis=0)*100

top_pd=pd_1.sum(axis=1).sort_values(ascending=False).head(20).index
pd_top20=pd_2.loc[top_pd]

fig, ax=plt.subplots(figsize=(15,8))
pd_top20.plot(kind="barh",stacked=True,colormap="coolwarm",ax=ax)

plt.xlabel("Share of requests (%)")
plt.ylabel("Drop Timestamp")
plt.title("Pickup point share for the 20 most frequent drop timestamps")
plt.legend(title="Pickup point",bbox_to_anchor=(1.05,1),loc="upper left")

plt.tight_layout()
plt.show()

#### Chart - 11

In [None]:
# Chart - 11 visualization code
sd_1=df.groupby(["Drop timestamp","Status"]).size().unstack(fill_value=0)
sd_2=sd_1.div(sd_1.sum(axis=1), axis=0)*100

top_sd=sd_1.sum(axis=1).sort_values(ascending=False).head(20).index
sd_top20=sd_2.loc[top_sd]

fig, ax=plt.subplots(figsize=(15,8))
sd_top20.plot(kind="barh",stacked=True,colormap="plasma",ax=ax)

plt.xlabel("Share of requests (%)")
plt.ylabel("Drop Timestamp")
plt.title("Status share for the 20 most frequent drop timestamps")
plt.legend(title="Status",bbox_to_anchor=(1.05,1),loc="upper left")

plt.tight_layout()
plt.show()

#### Chart - 12

In [None]:
# Chart - 12 visualization code
co= df['Request timestamp'].value_counts().head(10)
plt.figure(figsize=(15,7))
co.plot(
    kind='pie',
    labels=co.index,
    autopct=lambda p:'{:.0f}%'.format(p),
    colors=['yellow','crimson'],
    startangle=90,
    wedgeprops={'edgecolor':'black'},
    textprops={'fontsize':14}
)
plt.title("Top 10 Request timestamp")
plt.ylabel('')
plt.tight_layout()
plt.show()

#### Chart - 13

In [None]:
# Chart - 13 visualization code
df.columns=df.columns.str.strip()

df['Driver id']=pd.to_numeric(df['Driver id'], errors='coerce')
df['Request id']=pd.to_numeric(df['Request id'], errors='coerce')

df=df.dropna(subset=['Driver id','Request id'])
top=df[['Request id','Driver id']].sort_values(by='Driver id',ascending=False).head(15)
plt.figure(figsize=(15,7))

plt.plot(top['Request id'], top['Driver id'],marker='o',color='tab:blue',linestyle='-')

plt.xlabel("Request id")
plt.ylabel("Driver id")
plt.title("Top 15 Request id vs Driver id")
plt.grid(True)
plt.show()

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
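# Keep numeric columns only; correlations that cannot be computed are shown as 0
numeric=df.select_dtypes(include='number')
corr=numeric.corr().fillna(0)
plt.figure(figsize=(15,10))

# A diverging colormap fixed to [-1, 1] suits correlation values better than a qualitative one
sns.heatmap(corr,annot=True,cmap='coolwarm',vmin=-1,vmax=1,fmt=".2f",linewidths=0.5)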
plt.title('Correlation Heatmap')
plt.show()

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df)

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Uber's business objectives are to expand its global transportation and delivery network, provide a reliable and convenient platform for users, and achieve profitability through innovation, safety, and strategic expansion into new markets such as freight and food delivery.

# **Conclusion**

An analysis of Uber's business model and growth strategy suggests that the company successfully leveraged technology to disrupt the traditional transportation industry, but it continues to face challenges related to profitability, regulation, and ethics.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***

In [None]:
print("Thank you")