## CRQ2: Visualize taxi movements!

"NYC is divided into many taxi zones. For each yellow cab trip we know the zone where the taxi picked up the passengers and the zone where it dropped them off."
We are going to visualize on a choropleth map:
* the number of trips that start in each zone;
* the number of trips that end in each zone.

We also want to compare these maps, so we create a third map showing, for each zone, the difference between 'start_trips' and 'end_trips'.

To perform this task we use the folium library.
The GeoJSON file we use to draw the zones is taxi_zones.json, from the homework's repository.
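A GeoJSON file is a `FeatureCollection` whose `features` each carry a `geometry` and a `properties` dict. folium's `Choropleth` colors a zone by matching one of those properties, selected with a `key_on` path such as `feature.properties.zone`, against the key column of the data. The sketch below uses a tiny hand-written collection. The property names `LocationID` and `zone` are an assumption modeled on the NYC TLC taxi-zone layer, so check the real keys in taxi_zones.json before relying on them. The helper names are hypothetical.

```python
import json

# Hypothetical two-feature collection; the real taxi_zones.json has many more
# zones and full polygon coordinates. Property names here are assumed.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"LocationID": 1, "zone": "Newark Airport"},
     "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"LocationID": 132, "zone": "JFK Airport"},
     "geometry": {"type": "Polygon", "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}}
  ]
}
""")


def property_keys(collection):
    """Return the sorted property names used across all features."""
    keys = set()
    for feature in collection["features"]:
        keys.update(feature["properties"].keys())
    return sorted(keys)


def unmatched_zones(collection, counts, key="zone"):
    """List zones that appear in the GeoJSON but have no entry in `counts`.

    Such zones get no data value on the map.
    """
    return [feature["properties"][key] for feature in collection["features"]
            if feature["properties"][key] not in counts]


print(property_keys(sample))
print(unmatched_zones(sample, {"JFK Airport": 10}))
```

Listing the property keys first is a cheap way to pick the right `key_on` path. Listing unmatched zones shows which polygons will be left without a color.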

In [1]:
import folium
import json
import requests
import pandas as pd
import CRQ2_functions

We download the 'taxi_zones.json' file from the homework's repository.
We also use the data prepared and cleaned during the previous EDA (filtration.py).

In [14]:
geojson_url = "https://github.com/CriMenghini/ADM-2018/raw/master/Homework_2/taxi_zones.json"
response = requests.get(geojson_url)
response.raise_for_status()  # fail early if the download did not succeed
geojson = json.loads(response.text)

In [10]:
all_data_df = CRQ2_functions.get_cleaned_data()

cleaned_yellow_tripdata_2018-01.csv
cleaned_yellow_tripdata_2018-02.csv
cleaned_yellow_tripdata_2018-03.csv
cleaned_yellow_tripdata_2018-04.csv
cleaned_yellow_tripdata_2018-05.csv
cleaned_yellow_tripdata_2018-06.csv


In [15]:
taxi_zone_lookup = pd.read_csv("taxi_zone_lookup.csv")

To count the trips that start and end in each zone we need to:
* join the trip data with the taxi_zone_lookup file on LocationID (matching PULocationID for starts and DOLocationID for ends);
* group by the "Zone" column and count the trips.
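The two steps can be tried on a toy frame before running them on six months of data. The trips and lookup rows below are made up. Only the column names `PULocationID`, `DOLocationID`, `LocationID` and `Zone` come from the notebook, and `trips_per_zone` is a hypothetical helper. `groupby(...).size()` counts rows per zone. Here it gives the same numbers as the notebook's `count` aggregation, because after the inner merge the ID column has no missing values.

```python
import pandas as pd

# Made-up trips: each row is one ride with pick-up and drop-off zone IDs.
trips = pd.DataFrame({
    "PULocationID": [138, 138, 132, 161],
    "DOLocationID": [161, 161, 161, 138],
})
# Made-up excerpt of taxi_zone_lookup.csv (LocationID -> Zone).
lookup = pd.DataFrame({
    "LocationID": [132, 138, 161],
    "Zone": ["JFK Airport", "LaGuardia Airport", "Midtown Center"],
})


def trips_per_zone(trips, lookup, id_column):
    """Join trips to the lookup on `id_column` and count trips per Zone."""
    merged = trips.merge(lookup, left_on=id_column, right_on="LocationID")
    return merged.groupby("Zone").size()


start_counts = trips_per_zone(trips, lookup, "PULocationID")
end_counts = trips_per_zone(trips, lookup, "DOLocationID")
# start_counts: JFK Airport 1, LaGuardia Airport 2, Midtown Center 1
# end_counts:   LaGuardia Airport 1, Midtown Center 3
print(start_counts)
print(end_counts)
```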

Then, with the function **prepare_and_draw_colormap** (CRQ2_functions.py), we create a choropleth map colored by the number of trips.
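The `count` aggregation returns columns with two levels, for example `('PULocationID', 'count')`, as the output tables further down show. `folium.Choropleth`, however, takes a DataFrame together with a two-element `columns=[key_column, value_column]` list. The helper below is a hypothetical sketch of that preparation step, not the actual code in CRQ2_functions.py. It flattens the grouped result into a plain two-column table. The `key_on` path in the closing comment assumes the GeoJSON stores the zone name under `properties.zone`.

```python
import pandas as pd


def flatten_counts(grouped, column):
    """Turn a MultiIndex-column count table into two plain columns: Zone, <column>.

    Hypothetical helper; assumes `grouped` is indexed by Zone and has a
    (column, 'count') entry, as produced by groupby('Zone').agg(['count']).
    """
    flat = grouped[(column, "count")].rename(column)
    return flat.reset_index()


# Toy grouped table shaped like start_grouped_by_zone.
raw = pd.DataFrame({"Zone": ["JFK Airport", "LaGuardia Airport", "LaGuardia Airport"],
                    "PULocationID": [132, 138, 138]})
grouped = raw.groupby("Zone").agg(["count"])
table = flatten_counts(grouped, "PULocationID")
print(table)

# The flat table could then be passed to folium (not executed here):
# folium.Choropleth(geo_data=geojson, data=table,
#                   columns=["Zone", "PULocationID"],
#                   key_on="feature.properties.zone",  # assumed property name
#                   fill_color="YlOrRd").add_to(folium_map)  # folium_map: hypothetical
```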

In [16]:
# Attach zone names: pick-up IDs for trip starts, drop-off IDs for trip ends
start_trip_location = pd.merge(all_data_df, taxi_zone_lookup, left_on='PULocationID', right_on='LocationID')
end_trip_location = pd.merge(all_data_df, taxi_zone_lookup, left_on='DOLocationID', right_on='LocationID')

# Count trips per zone; columns become a MultiIndex such as ('PULocationID', 'count')
start_grouped_by_zone = start_trip_location[['Zone', 'PULocationID']].groupby(['Zone']).agg(['count'])
end_grouped_by_zone = end_trip_location[['Zone', 'DOLocationID']].groupby(['Zone']).agg(['count'])

________
* To see the map colored by the number of 'start_trips', open the file **"Start_trips_map.html"** from the repository.

In [182]:
start_map = CRQ2_functions.prepare_and_draw_colormap(start_grouped_by_zone, column='PULocationID', file='Start_trips_map.html')

'Freshkills Park'


_________
* To see the map colored by the number of 'end_trips', open the file **"End_trips_map.html"** from the repository.

In [183]:
end_map = CRQ2_functions.prepare_and_draw_colormap(end_grouped_by_zone, column='DOLocationID', file='End_trips_map.html')

'Great Kills Park'


______________
To visualize the differences between these maps, we compute, for each zone, the number of trips that start there minus the number of trips that end there.

**'difference'** = **number of trips that start** - **number of trips that end**
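`DataFrame.join` is a left join by default. A zone with drop-offs but no pick-ups is therefore dropped from the difference table, and a zone with pick-ups but no drop-offs gets `NaN`. This is also why `DOLocationID` can turn into a float column. If missing trips on one side should count as zero, one option is to align the two count Series with `fill_value=0`. A minimal sketch on made-up counts, with hypothetical zone names:

```python
import pandas as pd

# Made-up per-zone counts: "Zone A" has only pick-ups, "Zone C" only drop-offs.
start_trips = pd.Series({"Zone A": 5, "Zone B": 7})
end_trips = pd.Series({"Zone B": 4, "Zone C": 3})

# Left join, as in the notebook: Zone C disappears and Zone A gets NaN.
left = start_trips.to_frame("start").join(end_trips.to_frame("end"))
left["difference"] = left["start"] - left["end"]

# Outer alignment, treating a missing count as zero trips.
difference = start_trips.sub(end_trips, fill_value=0)

print(left)
print(difference)
```

Which version is right depends on whether a missing count means "zero trips" or "no data". The outer version keeps every zone on the difference map.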

In [181]:
joined_grouped_by_zone = start_grouped_by_zone.join(end_grouped_by_zone)
joined_grouped_by_zone['difference'] = joined_grouped_by_zone['PULocationID'] - joined_grouped_by_zone['DOLocationID']

joined_grouped_by_zone.sort_values(['difference'], ascending=False).head()

| Zone | PULocationID (count) | DOLocationID (count) | difference |
|---|---|---|---|
| LaGuardia Airport | 1497471 | 583484.0 | 913987.0 |
| JFK Airport | 1219862 | 398411.0 | 821451.0 |
| Penn Station/Madison Sq West | 1775762 | 1385359.0 | 390403.0 |
| Garment District | 1102072 | 822772.0 | 279300.0 |
| Upper East Side South | 2216156 | 1940867.0 | 275289.0 |


In [189]:
joined_grouped_by_zone.sort_values(['difference'], ascending=True).head()

| Zone | PULocationID (count) | DOLocationID (count) | difference |
|---|---|---|---|
| East Harlem South | 379709 | 642722.0 | -263013.0 |
| West Chelsea/Hudson Yards | 753557 | 981386.0 | -227829.0 |
| East Harlem North | 212792 | 418717.0 | -205925.0 |
| Two Bridges/Seward Park | 88200 | 250158.0 | -161958.0 |
| Astoria | 85583 | 231153.0 | -145570.0 |


* To see the map colored by the difference between the numbers of 'start_trips' and 'end_trips', open the file **"Diff_trips_map.html"** from the repository.

In [184]:
diff_map = CRQ2_functions.prepare_and_draw_colormap(joined_grouped_by_zone, column='difference', file='Diff_trips_map.html')

'Freshkills Park'
Thresholds are not sorted.
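The difference takes both positive values (more pick-ups) and negative values (more drop-offs). A diverging palette centered on zero therefore tends to read more clearly than a sequential one. One way to get that, sketched below under the assumption that the map is drawn with `folium.Choropleth`, is to pass explicit, increasing thresholds through its `bins` parameter together with a diverging ColorBrewer palette such as `RdBu`. The helper `symmetric_bins` is hypothetical.

```python
def symmetric_bins(values, n_per_side=3):
    """Return increasing bin edges symmetric around zero.

    Hypothetical helper: splits [-m, m] into 2 * n_per_side equal bins, where
    m is the largest absolute value, so zero is always an edge and the middle
    of a diverging palette lines up with "no difference".
    """
    m = max(abs(v) for v in values)
    if m == 0:
        raise ValueError("all values are zero; no color scale needed")
    edges = [m * k / n_per_side for k in range(-n_per_side, n_per_side + 1)]
    # Pin the outer edges to -m and m so rounding cannot leave the extremes outside.
    edges[0], edges[-1] = -m, m
    return edges


# Extreme differences taken from the two tables above.
differences = [913987.0, 821451.0, -263013.0, -227829.0]
edges = symmetric_bins(differences)
print(edges)

# Possible use with folium (not executed here; difference_table is hypothetical):
# folium.Choropleth(geo_data=geojson, data=difference_table,
#                   columns=["Zone", "difference"], bins=edges,
#                   fill_color="RdBu").add_to(folium_map)
```

Because the range is symmetric, skewed data leaves some bins on the smaller side nearly empty. In exchange, the same color always means the same magnitude of imbalance in either direction.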


### Basic Results
________
**LaGuardia Airport and JFK Airport show the biggest difference between pick-ups and drop-offs, with more pick-ups than drop-offs.** This seems a sensible result, since taxi traffic at airports is usually high.


**East Harlem South, West Chelsea/Hudson Yards and East Harlem North also show a large difference, but with more drop-offs than pick-ups.**

**We also observed high taxi traffic in most zones of Manhattan.**