## **NYC Taxi Rides**
#### **Chloé Blanchard | chb2132 | 5210 Python**


*Data Provided by the New York City Taxi and Limousine Commission.*

*The full dataset contains 170 million taxi trips and requires 100GB of free disk space.*
*Our subset is 0.5% of all trips, about 850,000 rides.*

```
Resources:

Download the dataset from Cyrille Rossant on GitHub: (https://github.com/ipython-books/minibook-2nd-data)
NYC Taxi & Limousine Commission website (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml) 
NYC Gov data description website (http://www.nyc.gov/html/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf)
Markdown basics (http://daringfireball.net/projects/markdown/basics)
```

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
%matplotlib inline


In [None]:
# Put your path here

data_filename = '../data/nyc_data.csv' 

In [None]:
data = pd.read_csv(data_filename, 
                   parse_dates=['pickup_datetime', 'dropoff_datetime'] )

*The head() method of a DataFrame displays the first five rows of the table by default.*

In [None]:
data.head()
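
A minimal sketch of how head() behaves, using a tiny made-up DataFrame rather than the taxi data (the `toy` table and its values are assumptions for illustration only):

```python
import pandas as pd

# Tiny made-up table standing in for the taxi data (illustrative values only)
toy = pd.DataFrame({"trip_distance": [0.5, 1.2, 3.4, 0.9, 7.8, 2.1, 4.4]})

default_rows = toy.head()   # no argument: returns the first five rows
three_rows = toy.head(3)    # explicit argument: returns the first three rows

print(len(default_rows), len(three_rows))
```

Passing an integer to head() is a convenient way to control how many rows are shown in the notebook.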

### **Get the actual coordinates:**

*Four DataFrame columns*


These four variables are all Series objects:

In [None]:
p_lng = data.pickup_longitude
p_lat = data.pickup_latitude
d_lng = data.dropoff_longitude
d_lat = data.dropoff_latitude

In [None]:
# a Series is an indexed list of values

p_lng.head()

In [None]:
# Get the coordinates of points in pixels from geographical coordinates

def lat_lng_to_pixels(lat, lng):
    lat_rad = lat * np.pi / 180.0
    lat_rad = np.log(np.tan((lat_rad + np.pi / 2.0) / 2.0))
    x = 100 * (lng + 180.0) / 360.0
    y = 100 * (lat_rad - np.pi) / (2.0 * np.pi)
    return (x, y)
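
As a rough sanity check, the function can be tried on a single point. The sketch below repeats the definition so it runs on its own, and uses an approximate Midtown Manhattan coordinate (40.75, -73.99), which is an assumption chosen for illustration and not a value taken from the dataset:

```python
import numpy as np

def lat_lng_to_pixels(lat, lng):
    # Same projection as in the cell above, repeated so this sketch is self-contained
    lat_rad = lat * np.pi / 180.0
    lat_rad = np.log(np.tan((lat_rad + np.pi / 2.0) / 2.0))
    x = 100 * (lng + 180.0) / 360.0
    y = 100 * (lat_rad - np.pi) / (2.0 * np.pi)
    return (x, y)

# Approximate Midtown Manhattan location (illustrative assumption)
midtown_x, midtown_y = lat_lng_to_pixels(40.75, -73.99)
print(midtown_x, midtown_y)

# The function is built from NumPy operations, so it also works element-wise on arrays
xs, ys = lat_lng_to_pixels(np.array([40.70, 40.80]), np.array([-74.00, -73.95]))
print(xs, ys)
```

Because only NumPy arithmetic is involved, the same call accepts scalars, arrays, or pandas Series and returns outputs of the matching shape.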

In [None]:
# Get pickup coordinates from pickup latitude and longitude

px, py = lat_lng_to_pixels(p_lat, p_lng)
py.head()

*Display a scatter plot of pickup locations*

Matplotlib's scatter function makes a scatter plot of x vs. y, where x and y are sequence-like objects of the same length.

```
Documentation:

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter
```

In [None]:
plt.scatter(px, py)

### **Customize our plot:**
- Make markers smaller
- Reduce overplotting by making the points semi-transparent
- Zoom in around Manhattan
- Make figure bigger
- Don't display the axes

*plt, or matplotlib.pyplot, is a collection of command-style functions. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.*

In [None]:
# Specify the figure size

plt.figure(figsize=(8, 6))

# s argument is used to make the marker size smaller
# alpha specifies opacity

plt.scatter(px, py, s=.1, alpha=0.03)

# equal aspect ratio

plt.axis('equal')

# zoom in

plt.xlim(29.40, 29.55)
plt.ylim(-37.63, -37.54)

# remove the axes

plt.axis('off')

### **Display a histogram of the trip distances**

```pandas Series hist()``` 
Draws histogram of the input Series using Matplotlib.

```numpy linspace()```
Returns evenly spaced numbers over a specified interval.

**Parameters:**

- start - interval start
- stop - interval stop
- num - number of samples to generate

```
Documentation:
(https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)
(http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.hist.html)
```

In [None]:
bin_array = np.linspace(start=0., stop=10., num=100)
bin_array
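
One detail worth noting: when the output of linspace() is used as bin edges, `num` points define `num - 1` bins. The sketch below compares the call above with a hypothetical alternative using `num=101` (the names `edges_100` and `edges_101` are introduced here for illustration only):

```python
import numpy as np

# 100 edges between 0 and 10 give 99 bins, each 10/99 wide
edges_100 = np.linspace(start=0., stop=10., num=100)

# 101 edges between 0 and 10 give 100 bins, each 0.1 wide
edges_101 = np.linspace(start=0., stop=10., num=101)

for edges in (edges_100, edges_101):
    n_bins = len(edges) - 1
    width = edges[1] - edges[0]
    print(n_bins, width)
```

Either choice works for the histogram; the second simply yields round bin boundaries (0.0, 0.1, 0.2, ...), which some readers may find easier to interpret.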

In [None]:
data.trip_distance.hist(bins=bin_array)

### **Filter with boolean indexing**

*Select long rides*

In [None]:
data.loc[data.trip_distance > 100]
#End
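
To see what boolean indexing does step by step, here is a minimal sketch on a small made-up DataFrame (the `toy` table, its values, and the second condition on `passenger_count` are assumptions for illustration, not part of the analysis above):

```python
import pandas as pd

# Small made-up table (illustrative values only)
toy = pd.DataFrame({
    "trip_distance": [0.8, 120.0, 3.5, 250.0, 15.0],
    "passenger_count": [1, 2, 1, 4, 3],
})

# The comparison produces a boolean Series with one True/False per row
mask = toy.trip_distance > 100
print(mask)

# Passing the mask to .loc keeps only the rows where it is True
long_rides = toy.loc[mask]
print(long_rides)

# Conditions can be combined with & (and) or | (or); each needs its own parentheses
long_shared = toy.loc[(toy.trip_distance > 100) & (toy.passenger_count > 2)]
print(long_shared)
```

The original row labels are preserved in the result, which makes it easy to trace filtered rows back to the full table.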