# Data Visualization

Python has a seemingly endless number of tools for data visualization. Today, we're going to focus on Matplotlib and Seaborn.

> Matplotlib is a standard Python 2D plotting library (https://matplotlib.org/) <br>
> Seaborn is a Python data visualization library built atop Matplotlib (https://seaborn.pydata.org/)

We'll also delve into some geographic plotting using geopandas and [Bokeh](https://bokeh.pydata.org/en/latest/index.html).

## Data

Today we are going to use the NYC Vehicle Collisions '[accidents.csv](https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions-Crashes/h9gi-nx95)' dataset again. Remember that the curl command below takes a while to run, so I recommend uploading the CSV from Brightspace directly into your Colab environment instead.

In [None]:
# !curl 'https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv?accessType=DOWNLOAD' -o accidents.csv

## Dtypes

As usual, we need to take a moment to convert some of our dtypes. As we did previously, let's create our new DATETIME column and convert "CRASHTIME" and "DATE" to datetime format.
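As a reminder, here is a minimal sketch of that conversion on a tiny stand-in DataFrame. The column names "DATE" and "CRASHTIME" follow the text above, and the date and time formats are assumptions, so adjust them to match your copy of the file.

```python
import pandas as pd

# Tiny stand-in for accidents.csv. The column names and the
# MM/DD/YYYY and H:MM formats are assumptions about the real file.
df = pd.DataFrame({
    "DATE": ["01/15/2019", "07/04/2020"],
    "CRASHTIME": ["8:05", "23:30"],
})

# Combine the two text columns into a single timestamp column.
df["DATETIME"] = pd.to_datetime(
    df["DATE"] + " " + df["CRASHTIME"], format="%m/%d/%Y %H:%M"
)

# Convert the original columns as well. A time without a date
# is stored with the placeholder date 1900-01-01.
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")
df["CRASHTIME"] = pd.to_datetime(df["CRASHTIME"], format="%H:%M")

print(df.dtypes)
```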

---

# ⭕ **QUESTIONS?**

---

## Feature Creation

We also want to create two new columns: 'Injury', which is True if there was at least one injury in an accident, and 'Death', which is True if there was at least one death in an accident.
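One way to sketch this is with a simple comparison on the injury and death counts. The source column names used here ("NUMBER OF PERSONS INJURED" and "NUMBER OF PERSONS KILLED") are assumptions, so check them against your DataFrame.

```python
import pandas as pd

# Stand-in rows; the two count column names are assumptions.
df = pd.DataFrame({
    "NUMBER OF PERSONS INJURED": [0, 2, 1],
    "NUMBER OF PERSONS KILLED": [0, 0, 1],
})

# A comparison returns a boolean Series, one True/False per accident.
df["Injury"] = df["NUMBER OF PERSONS INJURED"] > 0
df["Death"] = df["NUMBER OF PERSONS KILLED"] > 0

print(df[["Injury", "Death"]])
```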

## Overplotting

As you can see, when we try to plot our Lat/Long there is clearly an issue: we seem to have overplotted.

To solve this, we can create a mask that restricts the Lat/Long data to what Google tells us are the bounds of NYC.
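A minimal sketch of such a mask is below. The bounding-box numbers are rough, assumed values for NYC rather than official ones, and "LATITUDE" and "LONGITUDE" are assumed column names.

```python
import pandas as pd

# Stand-in rows, including two obviously bad coordinates.
df = pd.DataFrame({
    "LATITUDE": [40.71, 0.0, 40.85, 40.60],
    "LONGITUDE": [-74.00, 0.0, -73.90, -201.23],
})

# Approximate bounding box for NYC (assumed values; tweak as needed).
mask = (
    df["LATITUDE"].between(40.4, 41.0)
    & df["LONGITUDE"].between(-74.3, -73.6)
)

# Keep only the rows that fall inside the box.
nyc = df[mask]
print(len(df), "rows before,", len(nyc), "rows after")
```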

This is definitely better. Let's try increasing the figure size, too.
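With the pandas plotting interface, the figure size can be passed through the `figsize` parameter. The sketch below uses synthetic points in place of the masked accident data, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic points standing in for the masked accident locations.
rng = np.random.default_rng(0)
nyc = pd.DataFrame({
    "LONGITUDE": rng.uniform(-74.3, -73.6, 500),
    "LATITUDE": rng.uniform(40.4, 41.0, 500),
})

# figsize is (width, height) in inches.
ax = nyc.plot(kind="scatter", x="LONGITUDE", y="LATITUDE", figsize=(12, 10))
```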

## Addressing Overplotting

Other than using our mask and increasing the figure size, there are a few other ways to address overplotting:

## `sampling` 

We can specify how many points we want to plot by passing either an integer (a number of rows) or a fraction (a share of the rows).
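Assuming the sampling is done with pandas' `DataFrame.sample`, the integer goes in `n` and the fraction goes in `frac`. A small sketch on synthetic points (assumed column names):

```python
import numpy as np
import pandas as pd

# Synthetic points standing in for the masked accident locations.
rng = np.random.default_rng(0)
nyc = pd.DataFrame({
    "LONGITUDE": rng.uniform(-74.3, -73.6, 1000),
    "LATITUDE": rng.uniform(40.4, 41.0, 1000),
})

# Either ask for an exact number of rows...
by_count = nyc.sample(n=100, random_state=0)

# ...or for a fraction of the rows (here 10%).
by_fraction = nyc.sample(frac=0.1, random_state=0)

print(len(by_count), len(by_fraction))
```

Setting `random_state` is optional; it just makes the sample repeatable between runs.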

## `marker size`
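For a pandas scatter plot, the marker size is controlled by the `s` parameter. A minimal sketch on synthetic points (assumed column names):

```python
import numpy as np
import pandas as pd

# Synthetic points standing in for the masked accident locations.
rng = np.random.default_rng(0)
nyc = pd.DataFrame({
    "LONGITUDE": rng.uniform(-74.3, -73.6, 1000),
    "LATITUDE": rng.uniform(40.4, 41.0, 1000),
})

# s sets the marker area in points squared; small markers overlap less.
ax = nyc.plot(kind="scatter", x="LONGITUDE", y="LATITUDE", s=1)
```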

## `marker transparency`
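Transparency is controlled by the `alpha` parameter, which runs from 0 (invisible) to 1 (opaque). With a low alpha, dense areas show up darker because many faint markers stack on top of each other. A minimal sketch on synthetic points (assumed column names):

```python
import numpy as np
import pandas as pd

# Synthetic points standing in for the masked accident locations.
rng = np.random.default_rng(0)
nyc = pd.DataFrame({
    "LONGITUDE": rng.normal(-73.95, 0.08, 1000),
    "LATITUDE": rng.normal(40.72, 0.06, 1000),
})

# alpha=0.05 makes each marker 5% opaque.
ax = nyc.plot(kind="scatter", x="LONGITUDE", y="LATITUDE", alpha=0.05)
```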

---

# ⭕ **QUESTIONS?**

---

## Histograms, Density Plots, and Contour Plots

The hexbin (Hexagonal Bin Plot) creates a 2-D histogram, where the color signals the number of points within a particular area. The gridsize parameter sets the number of hexagons along the x-axis, so a larger value gives smaller bins.
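Here is a minimal hexbin sketch using the pandas plotting interface on synthetic points. The column names and the `gridsize` value are assumptions; try a few values to see the effect.

```python
import numpy as np
import pandas as pd

# Synthetic points standing in for the masked accident locations.
rng = np.random.default_rng(0)
nyc = pd.DataFrame({
    "LONGITUDE": rng.normal(-73.95, 0.08, 2000),
    "LATITUDE": rng.normal(40.72, 0.06, 2000),
})

# Each hexagon is colored by how many points fall inside it.
ax = nyc.plot(
    kind="hexbin",
    x="LONGITUDE",
    y="LATITUDE",
    gridsize=40,
    cmap="viridis",
    figsize=(10, 8),
)
```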

## Density Plots

## Contour Plots

## Combining plots

We can combine multiple plots using the ax parameter (think of 'ax' as representative of an individual plot). 
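As a sketch of the idea, we can create one Axes object up front and pass it to two different plot calls so that both draw onto the same plot. The data is synthetic and the column names are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic points standing in for the masked accident locations.
rng = np.random.default_rng(0)
nyc = pd.DataFrame({
    "LONGITUDE": rng.normal(-73.95, 0.08, 2000),
    "LATITUDE": rng.normal(40.72, 0.06, 2000),
})

# One figure with a single Axes that both plots will share.
fig, ax = plt.subplots(figsize=(10, 8))

# First layer: hexbin of all points.
nyc.plot(kind="hexbin", x="LONGITUDE", y="LATITUDE", gridsize=40, cmap="Blues", ax=ax)

# Second layer: a small scatter sample drawn on the same Axes.
nyc.sample(n=200, random_state=0).plot(
    kind="scatter", x="LONGITUDE", y="LATITUDE", s=2, color="red", alpha=0.5, ax=ax
)
```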

## Adding Geographic Boundaries using Bokeh

We'll create a truncated version of our dataset that only has certain columns...

For Bokeh, we'll then cast these columns as lists...
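A minimal sketch of both steps is below. The list names match the ones used in the Bokeh cell further down, but the DataFrame column names are assumptions, so swap in the names from your own data.

```python
import pandas as pd

# Stand-in rows; all column names here are assumptions.
df = pd.DataFrame({
    "LATITUDE": [40.71, 40.85],
    "LONGITUDE": [-74.00, -73.90],
    "DATE": ["01/15/2019", "07/04/2020"],
    "TIME": ["8:05", "23:30"],
    "BOROUGH": ["MANHATTAN", "BRONX"],
    "VEHICLE TYPE CODE 1": ["Sedan", "Taxi"],
    "ZIP CODE": ["10007", "10458"],
})

# Truncated copy with only the columns we want on the map.
cols = ["LATITUDE", "LONGITUDE", "DATE", "TIME", "BOROUGH", "VEHICLE TYPE CODE 1"]
small = df[cols].dropna()

# Cast each column to a plain Python list.
lat_list = small["LATITUDE"].tolist()
lon_list = small["LONGITUDE"].tolist()
date_list = small["DATE"].tolist()
time_list = small["TIME"].tolist()
borough_list = small["BOROUGH"].tolist()
vehicle_list = small["VEHICLE TYPE CODE 1"].tolist()
```

Plotting every accident on an interactive map can be very slow, so you may want to take a sample of the rows first.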

Note: If you want to avoid the "For Dev Purposes Only" message on the following map, go [here](https://developers.google.com/maps/get-started) and follow the instructions to set up a Google API account.

In [None]:
# https://docs.bokeh.org/en/latest/

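import bokeh.io

# Explicit imports instead of a wildcard, so it is clear where each name comes from
from bokeh.models import (
    BoxSelectTool, BoxZoomTool, Circle, ColumnDataSource, GMapOptions,
    GMapPlot, HoverTool, PanTool, Range1d, WheelZoomTool,
)

# Render Bokeh output inline in the notebook
bokeh.io.output_notebook()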


map_options = GMapOptions(lat=40.7128, lng=-74.0060, map_type="roadmap", zoom=11)

plot = GMapPlot(x_range=Range1d(), y_range=Range1d(), map_options=map_options,api_key = "AIzaSyDmyE8tAty-Lhd-rJQvIsGk8ocOIdHwYSE")

source = ColumnDataSource(
    data = dict(
        lat=lat_list,
        lon=lon_list,
        date = date_list,
        time = time_list,
        borough = borough_list, 
        vehicle = vehicle_list
    ))

circle = Circle(x="lon", y="lat", size=15, fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)

plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool(), BoxZoomTool())

plot.title.text="NYC Accidents"

plot.add_tools(HoverTool(
    tooltips=[
        ( 'date',   '@date' ),
        ( 'time',  '@time' ), 
        ( 'borough', '@borough' ), 
        ( 'vehicle', '@vehicle' )
    ],

    formatters={
        'date' : 'datetime', # use 'datetime' formatter for 'date' field
        'time' : 'printf',
        'borough' : 'numeral',
        'vehicle' : 'numeral'
    },

    mode='vline'
))

#output_file("gmap_plot.html")

bokeh.io.show(plot)

---

# ⭕ **QUESTIONS?**

---

# Example: Analyzing Citibike Station Activity using Pandas

We are going to download 201306-citibike-tripdata.csv from [this AWS s3 bucket](https://s3.amazonaws.com/tripdata/index.html).

---

## Examining Time Series per Station

Let's create a pivot table to examine the time series for individual stations.
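One possible shape for that pivot table is one row per hour, one column per station, and the number of trips started as the value. The sketch below uses a few made-up rows. The column names ("starttime", "start station name", "tripduration") are assumptions about the Citibike file, and the original analysis may aggregate differently.

```python
import pandas as pd

# A few made-up trips; the column names are assumptions about the CSV.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2013-06-01 08:05", "2013-06-01 08:40",
        "2013-06-01 08:15", "2013-06-01 09:10",
    ]),
    "start station name": [
        "Mercer St & Bleecker St", "Mercer St & Bleecker St",
        "LaGuardia Pl & W 3 St", "LaGuardia Pl & W 3 St",
    ],
    "tripduration": [300, 600, 450, 700],
})

# Round each start time down to the hour so trips can be grouped.
trips["hour"] = trips["starttime"].dt.floor("h")

# Rows are hours, columns are stations, values are trip counts.
pivot = trips.pivot_table(
    index="hour",
    columns="start station name",
    values="tripduration",
    aggfunc="count",
    fill_value=0,
)

print(pivot)
```

Each column of `pivot` is then a time series for one station, so a single station can be selected with `pivot["Mercer St & Bleecker St"]`.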

---

# Exercise 2:

Let's limit our plot to just two stations:
* Station at "Mercer St & Bleecker St"
* Station at "LaGuardia Pl & W 3 St"

which are nearby and tend to exhibit similar behavior. Remember that the list of stations is [available as a JSON](https://feeds.citibikenyc.com/stations/stations.json) 

In [None]:
# your code here

# Solution

----