# PROJECT NAME
# UBER REQUEST DATA - EDA

# PROBLEM STATEMENT
#### Perform EDA on Uber supply and demand using pandas on the Uber request data. The data consists of 6 columns and 6745 entries, including pickup point (Airport, City), status (Trip Completed, Cancelled, No Cars Available), request time and drop time.




# SUMMARY
#### The analysis aims to uncover how many rides were completed, how many were not served (either because no cars were available or because drivers cancelled), at what time of day rides are requested, and on which day the most rides are completed.

# GitHub link - https://github.com/Anoushka-Thakur/My-activities

# KNOW YOUR DATA

### IMPORTING LIBRARIES

In [253]:
import pandas as pd


### DATA LOADING

In [254]:
# data loading
print('\n--Data loading and reviewing head and checking missing values--')
uber_df= pd.read_csv("/content/drive/MyDrive/Uber Request Data (2).csv")
print(uber_df.head())





--Data loading and reviewing head and checking missing values--
   Request id Pickup point  Driver id          Status    Request timestamp  \
0         619      Airport        1.0  Trip Completed      11/7/2016 11:51   
1         867      Airport        1.0  Trip Completed      11/7/2016 17:57   
2        1807         City        1.0  Trip Completed       12/7/2016 9:17   
3        2532      Airport        1.0  Trip Completed      12/7/2016 21:08   
4        3112         City        1.0  Trip Completed  13-07-2016 08:33:16   

        Drop timestamp  
0      11/7/2016 13:00  
1      11/7/2016 18:47  
2       12/7/2016 9:58  
3      12/7/2016 22:03  
4  13-07-2016 09:25:47  


### DATASET FIRST REVIEW

In [None]:
# REVIEW
print(uber_df.head())

### DATASET INFORMATION

In [255]:
# display info
print('\n--Info--')
# info() prints its report itself and returns None, so it is not wrapped in print()
uber_df.info()


--Info--
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Request id         6745 non-null   int64  
 1   Pickup point       6745 non-null   object 
 2   Driver id          4095 non-null   float64
 3   Status             6745 non-null   object 
 4   Request timestamp  6745 non-null   object 
 5   Drop timestamp     2831 non-null   object 
dtypes: float64(1), int64(1), object(4)
memory usage: 316.3+ KB


### DATA SET ROWS AND COLUMNS COUNT

In [265]:
# COUNT ROWS AND COLUMN
print('\n--Rows and columns--')
print(uber_df.shape)


--Rows and columns--
(6745, 10)


### UNDERSTANDING VARIABLES

In [256]:
# Perform descriptive Analysis
print('\n--Descriptive Analysis')
print(uber_df.describe())



--Descriptive Analysis
        Request id    Driver id
count  6745.000000  4095.000000
mean   3384.644922   149.501343
std    1955.099667    86.051994
min       1.000000     1.000000
25%    1691.000000    75.000000
50%    3387.000000   149.000000
75%    5080.000000   224.000000
max    6766.000000   300.000000
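
By default `describe()` summarises only numeric columns, and here both numeric columns are identifiers, so their mean and standard deviation say little about the requests. A minimal sketch of a categorical summary, using a five-row sample copied from the head of the data (the frame name `sample` is an assumption for illustration):

```python
import pandas as pd

# Five rows copied from the head() output above.
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "City", "Airport", "City"],
    "Status": ["Trip Completed"] * 5,
})

# include="all" adds count / unique / top / freq for non-numeric columns.
summary = sample.describe(include="all")
print(summary)
```

On the full data, `uber_df[['Pickup point', 'Status']].describe(include='all')` would report the number of distinct values and the most frequent value of each column.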


###  DATA PREPROCESSING, CHECKING MISSING VALUES, EXTRACTING INSIGHTS

#### This section is where the data manipulation is done. I extracted the request hour from the request timestamp, the day of the week from the request date, and the time of day (Midnight, Morning, Noon, Evening, Night) using a helper function.

In [262]:
# --- Data Cleaning/Preprocessing ---
# Convert 'Request timestamp' and 'Drop timestamp' to datetime
# Using errors='coerce' will turn unparseable dates into NaT (Not a Time)
uber_df['Request timestamp'] = pd.to_datetime(uber_df['Request timestamp'], errors='coerce')
uber_df['Drop timestamp'] = pd.to_datetime(uber_df['Drop timestamp'], errors='coerce')


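# Missing driver ids are already NaN after read_csv; label them explicitly.
# Note: filling with a string turns the column into mixed (object) dtype.
uber_df['Driver id'] = uber_df['Driver id'].replace('NA', pd.NA)
uber_df['Driver id'] = uber_df['Driver id'].fillna('No Driver')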
print('\n--Missing null values--')
print(uber_df.isnull().sum())

# Extract 'Request Hour'
uber_df['Request Hour'] = uber_df['Request timestamp'].dt.hour

# Extract 'Request Day of Week' (Monday=0, Sunday=6)
uber_df['Request Day of Week'] = uber_df['Request timestamp'].dt.dayofweek

# Map to names for better readability
print('\n--Request day name--')
day_names = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
uber_df['Request Day Name'] = uber_df['Request Day of Week'].map(day_names)
print(uber_df['Request Day Name'])

# Extract time of day (Midnight, Morning, Noon, Evening, Night)
print('\n--Extracting day of the time--')
def get_time_slot(hour):
    if 0 <= hour <= 4:
        return 'Midnight'
    elif 5 <= hour <= 9:
        return 'Morning'
    elif 10 <= hour <= 14:
        return 'Noon'
    elif 15 <= hour <= 19:
        return 'Evening'
    else:
        # Hours 20-23; a missing hour (NaN) fails every comparison and also lands here
        return 'Night'

uber_df['Time Slot'] = uber_df['Request Hour'].apply(get_time_slot)
time_slot_counts = uber_df['Time Slot'].value_counts()

print(time_slot_counts)
pickup_time_slot_counts = uber_df.groupby(['Pickup point', 'Time Slot']).size().unstack(fill_value=0)
display(pickup_time_slot_counts)


--Missing null values--
Request id                0
Pickup point              0
Driver id                 0
Status                    0
Request timestamp      4071
Drop timestamp         5595
Request Hour           4071
Request Day of Week    4071
Request Day Name       4071
Time Slot                 0
dtype: int64
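
The 4071 missing `Request timestamp` values are probably not missing in the file, since `info()` reported 6745 non-null values before conversion. The head of the data shows two layouts, `11/7/2016 11:51` and `13-07-2016 08:33:16`, and a single `pd.to_datetime` call with `errors='coerce'` can settle on one layout and turn the other into `NaT`. A minimal sketch of parsing each layout separately, assuming the dates are day-first and only these two layouts occur (the helper name `parse_mixed_timestamps` is hypothetical):

```python
import pandas as pd

def parse_mixed_timestamps(col: pd.Series) -> pd.Series:
    """Parse strings that mix 'd/m/Y H:M' and 'd-m-Y H:M:S' layouts (assumed day-first)."""
    slash = pd.to_datetime(col, format="%d/%m/%Y %H:%M", errors="coerce")
    dash = pd.to_datetime(col, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    # Each row matches at most one layout; take whichever parse succeeded.
    return slash.fillna(dash)

# Values copied from the head() output, plus one genuinely missing entry.
sample = pd.Series(["11/7/2016 11:51", "12/7/2016 9:17", "13-07-2016 08:33:16", None])
parsed = parse_mixed_timestamps(sample)
print(parsed)
print(parsed.dt.day_name())
```

With this reading the first sample row is 11 July 2016, a Monday, and the second is 12 July 2016, a Tuesday. The notebook output labels the `12/7/2016` row Wednesday, which is what a month-first reading (7 December 2016) would give, so the day-first assumption is worth checking against the data source.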

--Request day name--
0          Monday
1          Monday
2       Wednesday
3       Wednesday
4             NaN
          ...    
6740          NaN
6741          NaN
6742          NaN
6743          NaN
6744          NaN
Name: Request Day Name, Length: 6745, dtype: object

--Extracting day of the time--
Time Slot
Night       4626
Morning      841
Evening      704
Noon         371
Midnight     203
Name: count, dtype: int64


Time Slot     Evening  Midnight  Morning  Night  Noon
Pickup point                                         
Airport           511        97      200   2278   152
City              193       106      641   2348   219
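
In the counts above, `Night` (4626) includes every row whose hour is missing: `NaN` fails each comparison in `get_time_slot` and falls through to the `else` branch. The hourly counts for 20:00 to 23:00 sum to only 555, and 4626 minus the 4071 missing hours is also 555. A sketch of the same binning with `pd.cut`, which leaves missing hours as `NaN` instead of labelling them (the bin edges mirror `get_time_slot`; the sample hours are made up for illustration):

```python
import numpy as np
import pandas as pd

# Same slot boundaries as get_time_slot: 0-4, 5-9, 10-14, 15-19, 20-23.
# pd.cut uses right-closed intervals, so the first edge is -1 to include hour 0.
bins = [-1, 4, 9, 14, 19, 23]
labels = ["Midnight", "Morning", "Noon", "Evening", "Night"]

hours = pd.Series([0, 4, 5, 12, 19, 20, 23, np.nan])  # illustrative values
slots = pd.cut(hours, bins=bins, labels=labels)

print(slots.tolist())
print(slots.value_counts(dropna=False))
```

Applied to the notebook's data, this would be `pd.cut(uber_df['Request Hour'], bins=bins, labels=labels)`, and the rows with unparsed timestamps would then show up as a separate missing group rather than inflating `Night`.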


# UNIVARIATE ANALYSIS
#### This analysis shows the distribution of individual columns.
#### The first analysis gives the status counts: Trip Completed 2831, No Cars Available 2650, Cancelled 1264.
#### The second analysis covers the pickup point distribution: 3507 requests from the City and 3238 from the Airport, 6745 in total.
#### The third analysis counts requests by hour of the day. Only the 2674 requests whose timestamp was parsed appear here; 00:00 to 04:00 has the fewest requests, as it is midnight.
#### Lastly, the day-of-week analysis shows Monday (1367) and Wednesday (1307) as the busy days, though these are the only two days present among the 2674 parsed requests.

In [263]:
# --- 4. Univariate Analysis ---

print("\n--- Status Distribution ---")
print(uber_df['Status'].value_counts())
print("\n--- Pickup Point Distribution ---")
print(uber_df['Pickup point'].value_counts())
print("\n--- Request Hour Distribution ---")
print(uber_df['Request Hour'].value_counts().sort_index())
print("\n--- Request Day of Week Distribution ---")
print(uber_df['Request Day Name'].value_counts())


--- Status Distribution ---
Status
Trip Completed       2831
No Cars Available    2650
Cancelled            1264
Name: count, dtype: int64

--- Pickup Point Distribution ---
Pickup point
City       3507
Airport    3238
Name: count, dtype: int64

--- Request Hour Distribution ---
Request Hour
0.0      32
1.0      28
2.0      30
3.0      31
4.0      82
5.0     171
6.0     171
7.0     150
8.0     159
9.0     190
10.0     92
11.0     81
12.0     81
13.0     59
14.0     58
15.0     67
16.0     71
17.0    168
18.0    205
19.0    193
20.0    195
21.0    186
22.0    107
23.0     67
Name: count, dtype: int64

--- Request Day of Week Distribution ---
Request Day Name
Monday       1367
Wednesday    1307
Name: count, dtype: int64
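
Raw counts are easier to compare as shares of all requests. A small sketch that rebuilds the status counts printed above and converts them to percentages (the variable names are assumptions for illustration):

```python
import pandas as pd

# Status counts copied from the output above.
status_counts = pd.Series(
    {"Trip Completed": 2831, "No Cars Available": 2650, "Cancelled": 1264},
    name="count",
)

total = status_counts.sum()
status_share = (status_counts / total * 100).round(1)
print(status_share)

# Requests that did not end in a completed trip (the supply gap).
unmet = total - status_counts["Trip Completed"]
print(f"Requests not served: {unmet} of {total}")
```

On the full frame, `uber_df['Status'].value_counts(normalize=True)` would give the same shares as fractions.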


# BIVARIATE ANALYSIS

#### This analysis shows the relationship between two columns. We examine Pickup Point against Status and Request Hour against Status, using the crosstab function to perform a simple cross tabulation between two columns.
#### The first relationship, between Pickup Point and Status, shows which category carries the most weight: trips completed from the City are the largest group (1504), and cancellations at the Airport are the smallest (198).
#### The next relationship is between request hour and status, which shows the status by hour; we can see that during 18:00 to 20:00 cars were not available.

In [264]:
# --- 5. Bivariate Analysis ---

print("\n--- Relationship between Pickup Point and Status ---")
# Using pd.crosstab to get counts of Status for each Pickup Point
pickup_status_crosstab = pd.crosstab(uber_df['Pickup point'], uber_df['Status'])
print(pickup_status_crosstab)

print("\n--- Relationship between Request Hour and Status ---")
# Using pd.crosstab to see status breakdown by hour
hour_status_crosstab = pd.crosstab(uber_df['Request Hour'], uber_df['Status'])
print(hour_status_crosstab)


--- Relationship between Pickup Point and Status ---
Status        Cancelled  No Cars Available  Trip Completed
Pickup point                                              
Airport             198               1713            1327
City               1066                937            1504
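
Because the two pickup points have different request totals, row percentages make the comparison clearer than raw counts. A sketch that rebuilds the table above from its printed values and normalises each row (the names `pickup_status` and `row_share` are assumptions for illustration):

```python
import pandas as pd

# Counts copied from the Pickup point x Status crosstab above.
pickup_status = pd.DataFrame(
    {
        "Cancelled": [198, 1066],
        "No Cars Available": [1713, 937],
        "Trip Completed": [1327, 1504],
    },
    index=pd.Index(["Airport", "City"], name="Pickup point"),
)

# Divide each row by its own total to get the share of each status per pickup point.
row_share = pickup_status.div(pickup_status.sum(axis=1), axis=0).mul(100).round(1)
print(row_share)
```

In these figures, about half of Airport requests end in No Cars Available, while City requests are cancelled far more often than Airport ones. On the full frame, `pd.crosstab(uber_df['Pickup point'], uber_df['Status'], normalize='index')` would produce the same shares as fractions.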

--- Relationship between Request Hour and Status ---
Status        Cancelled  No Cars Available  Trip Completed
Request Hour                                              
0.0                   1                 18              13
1.0                   3                 17               8
2.0                   2                 21               7
3.0                   1                 18              12
4.0                  14                 31              37
5.0                  66                 30              75
6.0                  64                 35              72
7.0                  64                 16              70
8.0                  64                 37              58
9.0    

# CONCLUSION