# Data Analysis of cab rides to and from NYC airport

This is a basic data analysis project written in Python, using the NumPy library.

The data set used here was fetched from https://drive.google.com/file/d/1Zu1xLzjcRKgWJ-B2k1j4BdMVFutkiqic/view.

Our data set contains pickup year, month, time, pickup location, drop off location, distance between the locations, time taken and the fares and tips of the rides.
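
The cells below address columns by position. As a reading aid, here is a small sketch that gives names to the positions this notebook actually uses (1, 6, 7, 8 and 12). The constant names are hypothetical labels chosen for illustration, not headers taken from the CSV, and the sample row is made up.

```python
import numpy as np

# Hypothetical names for the column positions used in this notebook.
MONTH_COL = 1      # pickup month
DROPOFF_COL = 6    # drop-off location code
DISTANCE_COL = 7   # trip distance
DURATION_COL = 8   # trip duration in seconds
TIP_COL = 12       # tip amount

# One made-up row, only to show how the named positions are used.
sample = np.zeros((1, 15))
sample[0, MONTH_COL] = 2
sample[0, TIP_COL] = 11.65

month_of_first_ride = sample[0, MONTH_COL]
tip_of_first_ride = sample[0, TIP_COL]
print(month_of_first_ride, tip_of_first_ride)
```

Named positions such as `taxi[:, TIP_COL]` can be easier to read than `taxi[:, 12]`, at the cost of a few extra lines.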

Using this data we will answer a few questions like:
* Mean speed of our cab rides.
* Rides in a particular month
* Number of rides where drivers were tipped over a certain amount
* Months in which drivers were tipped this amount
* Number of rides where riders were dropped off at the airport

Use the "Run" button to execute the code.

<p><img src="https://cdn.vox-cdn.com/thumbor/b3d9ki9vlWdYuGTkx0ojgKA7WcQ=/0x0:640x427/920x613/filters:focal(0x0:640x427):format(webp)/cdn.vox-cdn.com/assets/1335956/taxis_nyc.jpg"></p>

In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [3]:
# Execute this to save new versions of the notebook
jovian.commit(project="numpy-cabs-from-jfk-nyc-taxi")

<IPython.core.display.Javascript object>

[jovian] Updating notebook "kinnarkarchinmaya/numpy-cabs-from-jfk-nyc-taxi" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/kinnarkarchinmaya/numpy-cabs-from-jfk-nyc-taxi


'https://jovian.ai/kinnarkarchinmaya/numpy-cabs-from-jfk-nyc-taxi'

In [4]:
import numpy as np

In [5]:
# Load the CSV as a 2-D float array, skipping the one-line header
taxi = np.genfromtxt('nyc_taxis.csv', delimiter=',', skip_header=1)

In [6]:
taxi

array([[2.016e+03, 1.000e+00, 1.000e+00, ..., 1.165e+01, 6.999e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 1.000e+00, ..., 8.000e+00, 5.430e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 1.000e+00, ..., 0.000e+00, 3.780e+01,
        2.000e+00],
       ...,
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 5.000e+00, 6.334e+01,
        1.000e+00],
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 8.950e+00, 4.475e+01,
        1.000e+00],
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 0.000e+00, 5.484e+01,
        2.000e+00]])
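
To see why the header line is skipped when loading, here is a minimal sketch that runs `np.genfromtxt` on a tiny in-memory CSV. The column names and values are invented for illustration and are not taken from `nyc_taxis.csv`.

```python
import io
import numpy as np

# A made-up CSV with a header line and two data rows.
csv_text = "year,month,tip\n2016,1,11.65\n2016,2,0\n"

# Without skipping, the header text cannot be parsed as floats and becomes nan.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
header_is_nan = bool(np.isnan(with_header[0]).all())

# skip_header=1 drops the first line, so only numeric rows remain.
demo = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(header_is_nan)
print(demo.shape, demo.dtype)
```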

**Q1: Calculate the mean speed of taxis**

Hint: use taxi_length column to calculate time

In [7]:
# Speed = distance / time
# taxi_distance (column 7) is in km and taxi_length (column 8) is in seconds,
# so dividing the duration by 3600 converts it to hours
speed = taxi[:,7] / (taxi[:,8]/3600)

In [8]:
mean_speed = np.mean(speed)
mean_speed

32.24258580925573

In [9]:
# Same calculation using the ndarray method instead of np.mean()
mean_speed = speed.mean()
mean_speed

32.24258580925573
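
The speed formula divides by the trip duration, so a ride recorded with a duration of zero would produce an infinite value and distort the mean. The notebook does not say whether such rows exist in this data set; the sketch below only shows, on made-up numbers, one way to guard against them with a boolean mask.

```python
import numpy as np

distance = np.array([10.0, 5.0, 2.0])      # made-up distances
seconds = np.array([1800.0, 900.0, 0.0])   # made-up durations; the last one is zero

hours = seconds / 3600
valid = hours > 0                          # keep only rides with a positive duration
speed = distance[valid] / hours[valid]     # distance per hour for the valid rides
mean_speed = speed.mean()
print(mean_speed)
```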

**Q2: Calculate the rides in the month of February**

In [10]:
# Using the built-in NumPy function count_nonzero()

month = taxi[:,1]
feb_rides = np.count_nonzero(month == 2)

In [11]:
print(f"The number of rides in the month of February is {feb_rides}.")

The number of rides in the month of February is 13333.


In [12]:
# Using boolean indexing: the comparison builds a boolean mask, and indexing with it keeps only the matching rows.
# The shape attribute then gives the number of rides.

rides_feb = taxi[taxi[:,1]==2,1].shape[0]

In [13]:
print(f"The number of rides in the month of February is {rides_feb}.")

The number of rides in the month of February is 13333.
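
Both approaches above count the `True` entries of the same boolean mask, which is why they agree. A minimal sketch on a made-up month column shows three equivalent ways to get that count:

```python
import numpy as np

month = np.array([1.0, 2.0, 2.0, 3.0, 2.0, 1.0])  # made-up month values

is_feb = month == 2                      # boolean mask, True where the month is February
count_a = np.count_nonzero(is_feb)       # counts the True entries
count_b = int(is_feb.sum())              # True counts as 1 when summed
count_c = month[is_feb].shape[0]         # length of the filtered array
print(count_a, count_b, count_c)
```

`np.count_nonzero` and `sum` avoid building the filtered array, which can matter on large data sets.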


**Q3: Calculate number of rides with tip greater than `$50`**

In [14]:
tips_over_50=taxi[taxi[:,12] >=50,12].shape[0]

In [15]:
print("Number of rides with tips greater than $50 is {}".format(tips_over_50))

Number of rides with tips greater than $50 is 20


In [16]:
tips = taxi[taxi[:,12]>50,12].shape[0]

In [17]:
print(f"Correct ans is {tips}")

Correct ans is 16


## Tips difference
>The first answer, 20, was printed because we used `>=`, whereas the question asked for tips greater than $50. This means there are **4** rides where the tip was exactly $50.

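The difference between the two comparison operators can be reproduced on a small made-up array of tips. The values below are invented for illustration and are not from the data set.

```python
import numpy as np

tips = np.array([49.99, 50.0, 50.0, 50.01, 75.0])  # made-up tip amounts

at_least_50 = np.count_nonzero(tips >= 50)  # includes tips of exactly 50
over_50 = np.count_nonzero(tips > 50)       # strictly greater than 50
exactly_50 = at_least_50 - over_50          # the gap between the two counts
print(at_least_50, over_50, exactly_50)
```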
**Q4: Find out the months where tip was greater than $50**

In [18]:
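# Note: `>=` also counts the rides tipped exactly $50 (20 rides in total, as in Q3);
# switch to `>` to keep only tips strictly greater than $50.
months,counts=np.unique(taxi[taxi[:,12]>=50,1],return_counts=True)
dict(zip(months,counts))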

{1.0: 5, 2.0: 1, 3.0: 3, 4.0: 1, 5.0: 4, 6.0: 6}
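
For comparison, here is a sketch of the same grouping with a strict `>` comparison and integer keys, run on a made-up two-column array (month, tip). The array layout and values are assumptions for illustration only; on the real data the month is column 1 and the tip is column 12.

```python
import numpy as np

# Made-up rides: column 0 is the month, column 1 is the tip.
rides = np.array([
    [1, 60.0],
    [1, 50.0],   # exactly 50, excluded by the strict comparison
    [2, 55.0],
    [3, 10.0],
    [2, 80.0],
])

big_tip_months = rides[rides[:, 1] > 50, 0]              # months of rides tipped over 50
months, counts = np.unique(big_tip_months, return_counts=True)
result = {int(m): int(c) for m, c in zip(months, counts)}
print(result)
```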

**Q5: Calculate the number of rides where drop off location is JFK**

`Hint` : Drop off code for JFK is 2

In [19]:
jfk_drop = taxi[taxi[:,6] == 2,6].shape[0]

In [20]:
print(f"The number of rides dropped off at JFK was {jfk_drop}.")

The number of rides dropped off at JFK was 11832.


## *Where to go from here?*

We can extend this project by finding:
 
* Busiest month for cab drivers
* Most tipped routes
* Least tipped routes
* Longest time taken in a ride
* Shortest time taken to reach the destination

We can also present the findings visually with libraries like matplotlib and seaborn.

Once done with this dataset, we can apply the same analysis to any other city in the USA or any other part of the world.
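
A few of the ideas listed above can be sketched with the same NumPy tools used in this notebook. The example below works on a made-up two-column array (month, duration in seconds); on the real data the month is column 1 and the duration is column 8.

```python
import numpy as np

# Made-up rides: column 0 is the month, column 1 is the duration in seconds.
rides = np.array([
    [1, 1200.0],
    [2, 900.0],
    [2, 2400.0],
    [3, 600.0],
    [2, 1500.0],
])

# Busiest month: count rides per month, then pick the month with the largest count.
months, counts = np.unique(rides[:, 0], return_counts=True)
busiest_month = int(months[np.argmax(counts)])

# Longest and shortest ride durations.
longest = rides[:, 1].max()
shortest = rides[:, 1].min()
print(busiest_month, longest, shortest)
```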

#### Hope you enjoyed this project!

# *Thank You*

 -[Chinmaya Kinnarkar](https://in.linkedin.com/in/chinmaya-kinnarkar-ab0620ba)
  
  [Github](https://github.com/ChinmayaKinnarkar)

In [21]:
jovian.commit(project="numpy-cabs-from-jfk-nyc-taxi")

<IPython.core.display.Javascript object>

[jovian] Updating notebook "kinnarkarchinmaya/numpy-cabs-from-jfk-nyc-taxi" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/kinnarkarchinmaya/numpy-cabs-from-jfk-nyc-taxi


'https://jovian.ai/kinnarkarchinmaya/numpy-cabs-from-jfk-nyc-taxi'