# **NYC Taxi Data Analysis**

This notebook performs a basic analysis of NYC taxi ride data loaded from a CSV file with the NumPy library. The dataset is assumed to contain details about each ride, such as distance traveled, time taken, tip amount, and payment type. Below is an overview of each section.

## **Loading Data:**

* NumPy's genfromtxt() function loads the data from the CSV file (nyc_taxis.csv) into a 2D array.
* The delimiter=',' parameter specifies that the data is comma-separated, and skip_header=True skips the header row (skip_header expects the number of lines to skip, so True is treated as 1).

In [9]:
# Importing the necessary library
import numpy as np

# Loading NYC taxi data from a CSV file
taxi = np.genfromtxt('/nyc_taxis.csv', delimiter = ',', skip_header = True)
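To see what the two parameters do without the real file, here is a minimal sketch that parses a tiny in-memory CSV. The column names and values are hypothetical stand-ins, not rows from nyc_taxis.csv.

```python
import io
import numpy as np

# Hypothetical three-column sample standing in for nyc_taxis.csv
sample_csv = io.StringIO(
    "pickup_month,trip_distance,trip_length\n"
    "1,21.0,2037\n"
    "2,16.29,1520\n"
)

# skip_header is the number of leading lines to skip (1 = the header row);
# delimiter="," splits each remaining line on commas
sample = np.genfromtxt(sample_csv, delimiter=",", skip_header=1)

print(sample.shape)  # (2, 3)
print(sample.dtype)  # float64
```

Every value is parsed as a float, which is why the later cells compare columns against plain numbers such as 2 or 50.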

## **Calculating Mean Speed:**

* The code calculates the speed for each ride in miles per hour by dividing the distance (taxi[:, 7]) by the time taken (taxi[:, 8]), which is converted from seconds to hours by dividing by 3600.
* The mean speed of all rides is then computed using NumPy's mean() function.

In [10]:
# Calculating the speed for each ride in miles per hour
speed = taxi[:, 7] / (taxi[:, 8] / 3600)

In [11]:
# Calculating Mean Speed of all rides
mean_speed = speed.mean()
print(mean_speed)

32.24258580925573
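The speed calculation divides by the ride duration, so a ride with a recorded duration of zero would produce an infinite or undefined speed and distort the mean. The source does not say whether such rows exist. The sketch below, on hypothetical values, shows one way to guard against them by filtering before dividing.

```python
import numpy as np

# Hypothetical distances (miles) and durations (seconds); the last ride has zero duration
distance = np.array([10.0, 3.0, 2.0])
duration = np.array([1800.0, 900.0, 0.0])

# Keep only rides with a positive duration before dividing
valid = duration > 0
speed_mph = distance[valid] / (duration[valid] / 3600)

print(speed_mph)         # [20. 12.]
print(speed_mph.mean())  # 16.0
```

Whether dropping such rides is appropriate depends on the analysis; this only illustrates the masking step.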


## **Counting Rides in February:**

* Rides occurring in February are identified by filtering rows where the second column (index 1, since NumPy indexing is zero-based) has a value of 2.
* The total number of rides in February is printed.

In [14]:
# Filtering and counting the number of rides taken in February
rides_feb = taxi[taxi[:, 1] == 2, 1]
print(rides_feb.shape[0])

13333
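The filter works because the comparison produces a boolean array that NumPy uses as a row mask. As a small illustration on a hypothetical month column, the same count can be read directly from the mask without building the filtered array:

```python
import numpy as np

# Hypothetical month column (1 = January, 2 = February, ...)
months = np.array([1, 2, 2, 3, 2, 1])

# Boolean array, True where the ride took place in February
feb_mask = months == 2

print(np.count_nonzero(feb_mask))  # 3
print(months[feb_mask].shape[0])   # 3, the same count via boolean indexing
```

Both approaches give the same result; counting the mask simply avoids creating an intermediate array.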


## **Counting Rides with Tip > $50:**

* Rides with a tip amount greater than $50 are identified by filtering rows where the third-to-last column (index -3) has a value exceeding 50.
* The total number of such rides is printed.

In [18]:
# Filtering and counting the number of rides with a tip greater than $50
print(taxi[taxi[:, -3] > 50, -3].shape[0])

16


## **Counting Rides with Payment Type 2:**

* Rides with a payment type of 2 are identified by filtering rows where the seventh column (index 6), which represents the payment type, has a value of 2.
* The total number of rides with payment type 2 is printed.

In [17]:
# Filtering and counting the number of rides with payment type 2
print(taxi[taxi[:, 6] == 2, 6].shape[0])


11832
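If counts for every payment type are of interest rather than a single value, np.unique with return_counts=True tallies them in one call. This is a sketch on a hypothetical payment-type column, not output from the dataset.

```python
import numpy as np

# Hypothetical payment-type column
payment_type = np.array([1, 2, 2, 1, 3, 2])

# Distinct values and how many times each one occurs
values, counts = np.unique(payment_type, return_counts=True)
counts_by_type = {int(v): int(c) for v, c in zip(values, counts)}

print(counts_by_type)  # {1: 2, 2: 3, 3: 1}
```

Applied to taxi[:, 6], the entry for payment type 2 should match the count printed above.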
