# Daredevil Module
---

In this lab, we will explore the potential privacy concerns regarding location data that is supposedly anonymous. We will use a modified version of NYC Taxi data (which is made public and can be found [here](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml)) and modified NYC complaints data (found [here](https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Map-Year-to-Date-/2fra-mtpn)).

Our target is the fictional Marvel superhero Daredevil (originally an environmentally-conscious Batman). We will use these two datasets to find Daredevil's identity and location, pretending we do not already know who he is.

While this is a seemingly trivial example, it turns out that even a small amount of outside knowledge can be combined with a dataset to reveal much more than [intended](https://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/).

**We will look at past crime data. Because Daredevil is blind and cannot drive himself (assume Uber does not yet exist), he must take a taxi to reach crimes far from his home.**

*Estimated Time: X minutes*

---

**Topics Covered:**
- Short sentence topic 1
- Short sentence topic 2
- Short sentence topic 3

**Parts:**
- Loading and processing data
- Visualization
- Analyzing Data

**Dependencies:**

In [1]:
# Just run this cell. It imports all of the packages we will use
import numpy as np
from datascience import *
import folium
import helpers

## Loading and processing data

In [2]:
# The lines below will load the data
taxis = Table.read_table("taxi_data_draft.csv")
complaints = Table.read_table("january_complaints.csv")

# Use the .show(x) method to display the first x rows of a table
print("Taxi Data:")
taxis.show(5)
print("Complaints Data:")
complaints.show(5)

Taxi Data:


Pickup_latitude,Pickup_longitude,Dropoff_latitude,Dropoff_longitude,Passenger_count,pickup_dt,dropoff_dt
40.83428192138672,-73.852294921875,40.83419799804688,-73.87448120117188,1,2016-01-15 17:58:18,2016-01-15 18:04:28
40.67536163330078,-73.98857879638672,40.67994689941406,-73.99510192871094,1,2016-01-15 20:26:20,2016-01-15 20:33:29
40.79811477661133,-73.95232391357422,40.78320693969727,-73.95670318603516,1,2016-01-09 08:43:10,2016-01-09 08:46:54
40.82011032104492,-73.93660736083984,40.80520629882813,-73.93932342529298,1,2016-01-17 17:48:42,2016-01-17 17:52:24
40.787330627441406,-73.9540786743164,40.79885864257813,-73.96993255615234,1,2016-01-29 15:14:41,2016-01-29 15:24:53


Complaints Data:


OFNS_DESC,PD_DESC,LAW_CAT_CD,BORO_NM,Longitude,Latitude,TIME
FRAUDS,"FRAUD,UNCLASSIFIED-MISDEMEANOR",MISDEMEANOR,QUEENS,-73.7408,40.6536,2016-01-20 08:00:00
PETIT LARCENY,"LARCENY,PETIT FROM BUILDING,UN",MISDEMEANOR,BRONX,-73.8586,40.8881,2016-01-19 06:00:00
FORGERY,"FORGERY,ETC.,UNCLASSIFIED-FELO",FELONY,MANHATTAN,-73.979,40.7601,2016-01-07 16:27:00
FRAUDS,"FRAUD,UNCLASSIFIED-MISDEMEANOR",MISDEMEANOR,MANHATTAN,-73.979,40.7601,2016-01-07 16:27:00
GRAND LARCENY,"LARCENY,GRAND BY DISHONEST EMP",FELONY,MANHATTAN,-73.988,40.7623,2016-01-04 09:00:00


Look at the data given and make sure you understand each column. Think about what parts of the data we might look into and about what parts of the data we might not need. 

We will now start processing the data so that it is easier to work with.

In [3]:
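# This cell removes columns that are not needed.
# Add the labels of the taxi columns you want to drop to the list below.
unneeded_columns_taxi = []
taxis = taxis.drop(unneeded_columns_taxi)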
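
Part of processing the data is making the time columns comparable. The previews above show timestamps as text such as `2016-01-15 17:58:18`, so one possible approach is to parse them into `datetime` objects. The sketch below is only an illustration: the helper name `parse_time` is hypothetical, and the format string is an assumption based on the rows shown in the previews.

```python
from datetime import datetime

# Assumed format, inferred from the timestamp strings in the table previews
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_time(text):
    """Convert a timestamp string such as '2016-01-15 17:58:18' to a datetime."""
    return datetime.strptime(text, TIME_FORMAT)


# Values copied from the first taxi row in the preview
pickup = parse_time("2016-01-15 17:58:18")
dropoff = parse_time("2016-01-15 18:04:28")

# Subtracting two datetimes gives a timedelta, which we convert to minutes
trip_minutes = (dropoff - pickup).total_seconds() / 60
print(round(trip_minutes, 2))
```

With the datascience package, a call along the lines of `taxis.apply(parse_time, "pickup_dt")` should return the parsed values for a whole column, though you may prefer a different representation of time for your analysis.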


## Visualization

Before we begin trying to find Daredevil, we will explore some of the visualization tools that make the data easier to see. For technical reasons, we will use folium rather than the built-in mapping function in the datascience package. You can look through the folium [quickstart guide](https://folium.readthedocs.io/en/latest/) or use some of the helper functions we provide.

In [12]:
# This is the syntax to create an empty map centered at coordinates 40.7128,-74.0059
# These are the coordinates of NYC, so you can reuse them in any other maps for this lab
map_example = folium.Map(location=[40.7128,-74.0059])

# to display the map simply type the name
map_example

To plot points on a map, folium provides a class called `Marker`. You can read more documentation [here](https://folium.readthedocs.io/en/latest/quickstart.html#markers). The basics are shown below.

In [5]:
# Creating a new marker at coordinates (40.8436, -73.5633)
marker_example = folium.Marker([40.8436, -73.5633])
# adds the marker to the map
marker_example.add_to(map_example)
# Note that there is no easy way to remove a marker once you add it to the map
# If you want to reset a map, simply run map_example = folium.Map(location=[40.7128,-74.0059])
# to create a new one instead

# display the map
map_example

We have provided a function addMarkers in the helpers.py file (already imported) that you may find useful. This function will automatically add markers to a map from a given table assuming the table has 2 columns called 'Latitude' and 'Longitude'.

In [6]:
# helpers.addMarkers will automatically add up to 100 points from a table
# Here we add the first 100 complaints to our map_example
helpers.addMarkers(map_example, complaints)

map_example



In [14]:
# You can also change the color and icon of the markers using the syntax below
helpers.addMarkers(map_example, taxis, color='red', icon='cloud')
# type help(folium.Icon) to get some details of what you can put in color and icon

map_example



## Analyzing Data

We now look at how we are going to analyze the data. We will use the latitude and longitude columns from the complaints and taxi tables, as well as the time columns of each table (so if you dropped these columns earlier, go back and change your selection so that they are included).
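
One way to combine location and time is to ask whether a taxi dropoff is both close to a complaint and shortly after it. The sketch below is an illustration under stated assumptions, not the lab's solution: the function names, the 0.5 km radius, and the 30-minute window are all hypothetical choices, and the taxi dropoff in the first example is made up. The complaint values are copied from the preview above.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_candidate(dropoff_lat, dropoff_lon, dropoff_time,
                 crime_lat, crime_lon, crime_time,
                 max_km=0.5, max_wait=timedelta(minutes=30)):
    """True if the dropoff is within max_km of the crime and happens
    no earlier than the complaint and no later than max_wait after it.
    Both thresholds are assumptions chosen for illustration."""
    close = haversine_km(dropoff_lat, dropoff_lon, crime_lat, crime_lon) <= max_km
    delay = dropoff_time - crime_time
    soon = timedelta(0) <= delay <= max_wait
    return close and soon


# Complaint copied from the preview (MANHATTAN, 2016-01-07 16:27:00)
crime = (40.7601, -73.979, datetime(2016, 1, 7, 16, 27, 0))

# Made-up dropoff a short distance away, 13 minutes later
near = is_candidate(40.7610, -73.9800, datetime(2016, 1, 7, 16, 40, 0), *crime)

# Dropoff from the first taxi row in the preview: far away and on another day
far = is_candidate(40.83419799804688, -73.87448120117188,
                   datetime(2016, 1, 15, 18, 4, 28), *crime)

print(near, far)
```

Tightening or loosening the radius and the time window trades false matches against missed ones, so it is worth trying a few values.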

Our rationale is that Daredevil uses complaints sent to the NYPD to decide which crime scenes to visit. Thus, if we look at a crime at which Daredevil was present, we expect to find a corresponding taxi trip that ends in the general area shortly afterwards. Then, if we look at where this taxi trip originated, we should (in theory) be able to find where Daredevil starts his journeys and thus get closer to identifying him.
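
Once candidate trips have been found, their pickup points can be tallied to see whether one origin stands out. The sketch below is a minimal illustration with made-up pickup coordinates; the helper names `grid_cell` and `most_common_origins` are hypothetical, and rounding to three decimal places (roughly 100 m of latitude) is an assumed grid size.

```python
from collections import Counter


def grid_cell(lat, lon, decimals=3):
    """Round a coordinate to a coarse grid cell.
    Three decimals is roughly 100 m of latitude (an assumed grid size)."""
    return (round(lat, decimals), round(lon, decimals))


def most_common_origins(pickups, top=3, decimals=3):
    """Count how many pickups fall in each grid cell and return the busiest cells."""
    counts = Counter(grid_cell(lat, lon, decimals) for lat, lon in pickups)
    return counts.most_common(top)


# Made-up pickup coordinates of trips that ended near crimes
pickups = [
    (40.7624, -73.9901),
    (40.7622, -73.9899),
    (40.7623, -73.9903),
    (40.8001, -73.9500),
]

ranked = most_common_origins(pickups)
for cell, count in ranked:
    print(cell, count)
```

A simple grid can split one real location across neighbouring cells when it sits near a cell boundary, so a coarser grid or a proper clustering method may work better on the real data.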

In [1]:
# CODE

---

## Bibliography

---

#### Notes for Notebook Style:

- Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guide for Python
- No two successive cells of the same type (code or markdown)
- Run all cells with no errors
- Clear all cell output before pushing
- Create a binder for the repo on [mybinder.org](http://mybinder.org) and paste the badge to the top of the README markdown file