<a href="https://colab.research.google.com/github/afeld/nyu-python-public-policy/blob/master/lecture_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NYU Wagner - Python Coding for Public Policy**
In this course, we will work extensively with the [311 Service Requests dataset published on NYC Open Data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9).


# Class 0: Opening data and exploring the contents

## Dependencies

In [0]:
!pip install --quiet gdown
import pandas as pd # package that makes it easier to use and manipulate data tables
import gdown # package that allows you to access files in google drive

## LECTURE: Opening data and exploring contents

### Start by importing the 311 Service Requests dataset from Google Drive

A subset of the NYC 311 data is uploaded to Google Drive for you.

In [10]:
import gdown
gdown.download("https://drive.google.com/uc?id=1KvIfY2u6aJ27GhUAImqESzeCczSRjnOH", "311.csv", quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1KvIfY2u6aJ27GhUAImqESzeCczSRjnOH
To: /content/311.csv
359MB [00:01, 226MB/s]


'311.csv'

We'll use pandas to open the CSV file and save it as a DataFrame called `df`. What's a DataFrame? It is the pandas table structure: rows of records with named columns. What is `df`? It is just a variable name. You can name it whatever you want, but many developers use `df`.


In [0]:
df = pd.read_csv('311.csv', header='infer', low_memory=False)
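
To make the idea of a DataFrame concrete, here is a minimal sketch that builds a tiny one by hand. The name `toy` and its three rows are made up for illustration and are not taken from the 311 file.

```python
import pandas as pd

# Hypothetical toy data, NOT from the 311 dataset
toy = pd.DataFrame({
    "Agency": ["NYPD", "DOT", "HPD"],
    "Status": ["Closed", "Open", "Closed"],
})

print(type(toy))              # the DataFrame class
print(toy.columns.tolist())   # the column names, as a plain Python list
print(toy["Agency"])          # a single column, which pandas calls a Series
```

`pd.read_csv` produces the same kind of object; it just fills in the rows and columns from a file instead of a dictionary.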

### Preview the data contents using head( ), tail( ), and sample(n)

In [27]:
df.head() # defaults to providing the first 5 if you don't specify a number

Unnamed: 0,Unique Key,Created Date,Closed Date,Agency,Agency Name,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Intersection Street 1,Intersection Street 2,Address Type,City,Landmark,Facility Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,BBL,Borough,X Coordinate (State Plane),Y Coordinate (State Plane),Open Data Channel Type,Park Facility Name,Park Borough,Vehicle Type,Taxi Company Borough,Taxi Pick Up Location,Bridge Highway Name,Bridge Highway Direction,Road Ramp,Bridge Highway Segment,Latitude,Longitude,Location
0,44988339,11/22/2019 12:00:05 AM,11/22/2019 03:37:08 AM,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10032.0,525 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,AMSTERDAM AVENUE,AUDUBON AVENUE,,NEW YORK,WEST 169 STREET,,Closed,,The Police Department responded to the complai...,11/22/2019 08:37:13 AM,12 MANHATTAN,1021260000.0,MANHATTAN,1001543.0,245653.0,MOBILE,Unspecified,MANHATTAN,,,,,,,,40.840919,-73.937502,"(40.840919022450876, -73.93750151244578)"
1,44988338,11/22/2019 12:00:29 AM,11/22/2019 03:38:07 AM,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10032.0,525 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,AMSTERDAM AVENUE,AUDUBON AVENUE,,NEW YORK,WEST 169 STREET,,Closed,,The Police Department responded to the complai...,11/22/2019 08:38:17 AM,12 MANHATTAN,1021260000.0,MANHATTAN,1001543.0,245653.0,MOBILE,Unspecified,MANHATTAN,,,,,,,,40.840919,-73.937502,"(40.840919022450876, -73.93750151244578)"
2,44981960,11/22/2019 12:00:52 AM,11/22/2019 03:36:45 AM,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10032.0,525 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,AMSTERDAM AVENUE,AUDUBON AVENUE,,NEW YORK,WEST 169 STREET,,Closed,,The Police Department responded to the complai...,11/22/2019 08:37:00 AM,12 MANHATTAN,1021260000.0,MANHATTAN,1001543.0,245653.0,MOBILE,Unspecified,MANHATTAN,,,,,,,,40.840919,-73.937502,"(40.840919022450876, -73.93750151244578)"
3,44984911,11/22/2019 12:01:21 AM,11/22/2019 12:56:20 AM,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,10025.0,50 MANHATTAN AVENUE,MANHATTAN AVENUE,WEST 102 STREET,WEST 103 STREET,WEST 102 STREET,WEST 103 STREET,,NEW YORK,MANHATTAN AVENUE,,Closed,,The Police Department responded to the complai...,11/22/2019 05:56:27 AM,07 MANHATTAN,1018380000.0,MANHATTAN,994403.0,229319.0,MOBILE,Unspecified,MANHATTAN,,,,,,,,40.796098,-73.963331,"(40.79609788727609, -73.96333083470218)"
4,44986039,11/22/2019 12:01:22 AM,11/22/2019 07:04:27 AM,NYPD,New York City Police Department,Illegal Parking,Blocked Bike Lane,Street/Sidewalk,10451.0,255 EAST 138 STREET,EAST 138 STREET,RIDER AVENUE,3 AVENUE,RIDER AVENUE,3 AVENUE,,BRONX,EAST 138 STREET,,Closed,,The Police Department responded to the complai...,11/22/2019 12:04:48 PM,01 BRONX,2023338000.0,BRONX,1004278.0,234888.0,ONLINE,Unspecified,BRONX,,,,,,,,40.811366,-73.927649,"(40.811366393480526, -73.92764911000482)"


In [28]:
df.tail(15) # last 15 records in the dataframe

Unnamed: 0,Unique Key,Created Date,Closed Date,Agency,Agency Name,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Intersection Street 1,Intersection Street 2,Address Type,City,Landmark,Facility Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,BBL,Borough,X Coordinate (State Plane),Y Coordinate (State Plane),Open Data Channel Type,Park Facility Name,Park Borough,Vehicle Type,Taxi Company Borough,Taxi Pick Up Location,Bridge Highway Name,Bridge Highway Direction,Road Ramp,Bridge Highway Segment,Latitude,Longitude,Location
641180,45874527,03/21/2020 11:51:19 PM,03/22/2020 01:33:28 AM,NYPD,New York City Police Department,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10002.0,40 RIVINGTON STREET,RIVINGTON STREET,FORSYTH STREET,ELDRIDGE STREET,FORSYTH STREET,ELDRIDGE STREET,,NEW YORK,RIVINGTON STREET,,Closed,,The Police Department responded to the complai...,03/22/2020 05:33:30 AM,03 MANHATTAN,1004210000.0,MANHATTAN,986720.0,201924.0,MOBILE,Unspecified,MANHATTAN,,,,,,,,40.720911,-73.991089,"(40.720911110528775, -73.9910892685573)"
641181,45873519,03/21/2020 11:51:33 PM,,NYPD,New York City Police Department,Blocked Driveway,No Access,Street/Sidewalk,10466.0,4121 DE REIMER AVENUE,DE REIMER AVENUE,EDENWALD AVENUE,BUSSING AVENUE,EDENWALD AVENUE,BUSSING AVENUE,,BRONX,DE REIMER AVENUE,,In Progress,,,,12 BRONX,2050200000.0,BRONX,1027635.0,264964.0,ONLINE,Unspecified,BRONX,,,,,,,,40.893832,-73.843078,"(40.89383211241913, -73.84307774788059)"
641182,45870835,03/21/2020 11:51:49 PM,,NYPD,New York City Police Department,Noise - Residential,Loud Talking,Residential Building/House,10027.0,550 WEST 126 STREET,WEST 126 STREET,OLD BROADWAY,BEND,OLD BROADWAY,BEND,,NEW YORK,WEST 126 STREET,,In Progress,,,,09 MANHATTAN,1019820000.0,MANHATTAN,996150.0,236439.0,ONLINE,Unspecified,MANHATTAN,,,,,,,,40.815638,-73.957009,"(40.81563814243862, -73.95700864831318)"
641183,45873505,03/21/2020 11:52:11 PM,,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,11206.0,390 BUSHWICK AVENUE,BUSHWICK AVENUE,MOORE STREET,VARET STREET,MOORE STREET,VARET STREET,,BROOKLYN,BUSHWICK AVENUE,,In Progress,,,,01 BROOKLYN,3031290000.0,BROOKLYN,1001504.0,195586.0,ONLINE,Unspecified,BROOKLYN,,,,,,,,40.703498,-73.937771,"(40.703498332474695, -73.93777100634578)"
641184,45870310,03/21/2020 11:52:45 PM,,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10453.0,159 WEST 175 STREET,WEST 175 STREET,MONTGOMERY AVENUE,POPHAM AVENUE,MONTGOMERY AVENUE,POPHAM AVENUE,,BRONX,WEST 175 STREET,,In Progress,,,,05 BRONX,2028770000.0,BRONX,1006143.0,248400.0,ONLINE,Unspecified,BRONX,,,,,,,,40.848449,-73.920868,"(40.84844852235429, -73.9208677302526)"
641185,45874112,03/21/2020 11:52:51 PM,,NYPD,New York City Police Department,Noise - Residential,Loud Television,Residential Building/House,11102.0,28-05 33 STREET,33 STREET,28 AVENUE,28 ROAD,28 AVENUE,28 ROAD,,ASTORIA,33 STREET,,In Progress,,,,01 QUEENS,4006270000.0,QUEENS,1006793.0,218932.0,ONLINE,Unspecified,QUEENS,,,,,,,,40.767565,-73.918617,"(40.76756541833179, -73.91861726165381)"
641186,45871486,03/21/2020 11:52:53 PM,,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11366.0,73-09 162 STREET,162 STREET,73 AVENUE,75 AVENUE,73 AVENUE,75 AVENUE,,FRESH MEADOWS,162 STREET,,In Progress,,Your complaint has been received by the Police...,03/22/2020 04:15:40 AM,08 QUEENS,4068440000.0,QUEENS,1037741.0,204380.0,MOBILE,Unspecified,QUEENS,,,,,,,,40.727491,-73.807008,"(40.72749084460407, -73.80700753761462)"
641187,45870937,03/21/2020 11:54:08 PM,,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11211.0,383 SOUTH THIRD STREET,SOUTH THIRD STREET,HOOPER STREET,HEWES STREET,HOOPER STREET,HEWES STREET,,BROOKLYN,SOUTH 3 STREET,,In Progress,,,,01 BROOKLYN,3024250000.0,BROOKLYN,997511.0,197493.0,MOBILE,Unspecified,BROOKLYN,,,,,,,,40.708739,-73.952169,"(40.708739481884855, -73.95216856726594)"
641188,45874712,03/21/2020 11:55:03 PM,,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,10456.0,1372 FRANKLIN AVENUE,FRANKLIN AVENUE,EAST 169 STREET,JEFFERSON PLACE,EAST 169 STREET,JEFFERSON PLACE,,BRONX,FRANKLIN AVENUE,,In Progress,,,,03 BRONX,2029330000.0,BRONX,1011726.0,242919.0,MOBILE,Unspecified,BRONX,,,,,,,,40.833389,-73.90071,"(40.8333891772294, -73.90071044228503)"
641189,45873512,03/21/2020 11:55:35 PM,,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11230.0,1495 EAST 10 STREET,EAST 10 STREET,AVENUE N,AVENUE O,AVENUE N,AVENUE O,,BROOKLYN,EAST 10 STREET,,In Progress,,Your complaint has been received by the Police...,03/22/2020 04:39:40 AM,12 BROOKLYN,3065920000.0,BROOKLYN,994264.0,162578.0,ONLINE,Unspecified,BROOKLYN,,,,,,,,40.61291,-73.963932,"(40.61290979614288, -73.9639321049533)"


In [29]:
df.sample(5) # random sample of size determined by you

Unnamed: 0,Unique Key,Created Date,Closed Date,Agency,Agency Name,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Intersection Street 1,Intersection Street 2,Address Type,City,Landmark,Facility Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,BBL,Borough,X Coordinate (State Plane),Y Coordinate (State Plane),Open Data Channel Type,Park Facility Name,Park Borough,Vehicle Type,Taxi Company Borough,Taxi Pick Up Location,Bridge Highway Name,Bridge Highway Direction,Road Ramp,Bridge Highway Segment,Latitude,Longitude,Location
541364,45754828,03/03/2020 10:43:19 PM,03/04/2020 09:25:00 PM,DOT,Department of Transportation,Street Condition,Pothole,,10003.0,WASHINGTON PLACE,WASHINGTON PLACE,BROADWAY,MERCER STREET,,,BLOCKFACE,NEW YORK,,,Closed,,The Department of Transportation inspected thi...,03/04/2020 09:25:00 PM,02 MANHATTAN,,MANHATTAN,985872.0,204946.0,UNKNOWN,Unspecified,MANHATTAN,,,,,,,,40.729206,-73.994148,"(40.72920596301535, -73.9941477710689)"
563111,45780436,03/07/2020 07:26:56 PM,03/08/2020 06:14:36 PM,HPD,Department of Housing Preservation and Develop...,HEAT/HOT WATER,ENTIRE BUILDING,RESIDENTIAL BUILDING,10467.0,3414 KNOX PLACE,KNOX PLACE,,,,,ADDRESS,BRONX,,,Closed,,The Department of Housing Preservation and Dev...,03/08/2020 06:14:36 PM,07 BRONX,2033240000.0,BRONX,1016402.0,260533.0,MOBILE,Unspecified,BRONX,,,,,,,,40.881719,-73.883728,"(40.88171850260691, -73.88372839111564)"
487319,45673299,02/22/2020 02:23:34 AM,02/22/2020 06:53:03 AM,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11103.0,43-03 BROADWAY,BROADWAY,43 STREET,44 STREET,43 STREET,44 STREET,,ASTORIA,BROADWAY,,Closed,,The Police Department reviewed your complaint ...,02/22/2020 11:53:09 AM,01 QUEENS,4006930000.0,QUEENS,1007392.0,215372.0,ONLINE,Unspecified,QUEENS,,,,,,,,40.757793,-73.916467,"(40.75779259130661, -73.9164670619896)"
265234,45359551,01/10/2020 02:03:35 PM,01/16/2020 01:50:04 PM,DPR,Department of Parks and Recreation,Damaged Tree,Branch or Limb Has Fallen Down,Street,11204.0,1631 44 STREET,44 STREET,16 AVENUE,17 AVENUE,16 AVENUE,17 AVENUE,,BROOKLYN,44 STREET,,Closed,,The Department of Parks and Recreation visited...,01/16/2020 06:50:06 PM,12 BROOKLYN,3053790000.0,BROOKLYN,989330.0,170007.0,PHONE,Unspecified,BROOKLYN,,,,,,,,40.633305,-73.981698,"(40.63330504997503, -73.98169753351674)"
532612,45739792,03/02/2020 11:05:00 AM,03/02/2020 02:01:00 PM,DOT,Department of Transportation,Street Light Condition,Street Light Out,,11226.0,,,,,CLARKSON AVENUE,NOSTRAND AVENUE,INTERSECTION,BROOKLYN,,,Closed,,Service Request status for this request is ava...,03/02/2020 02:01:00 PM,09 BROOKLYN,,BROOKLYN,998105.0,178069.0,UNKNOWN,Unspecified,BROOKLYN,,,,,,,,40.655424,-73.950066,"(40.655424121152336, -73.95006599184083)"
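
Because `sample()` picks rows at random, rerunning the cell usually shows different records. If you want the same rows every time (for example, to compare notes with a classmate), one option is the `random_state` parameter. The sketch below uses a made-up ten-row frame called `toy`, not the 311 data.

```python
import pandas as pd

# Hypothetical toy data: one column holding the numbers 0 through 9
toy = pd.DataFrame({"n": range(10)})

# Using the same random_state twice should select the same rows both times
first = toy.sample(3, random_state=0)
second = toy.sample(3, random_state=0)

print(first)
print(first.equals(second))
```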


### How many records are in the dataset?

#### size method

In [30]:
# how many cells are there in the data table?
df.size

26288995

In [31]:
# what if I only care about how many rows there are?
# the columns in the dataframe are like a list. 
# you can use a column name as an index to get one column from the dataframe

# size does include null (empty) values
df['Facility Type'].size

641195
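
Picking a column and asking for its `size` works, but there are more direct ways to get the row count. As a sketch on a made-up frame (`toy` is hypothetical, not the 311 data), `shape` gives rows and columns together and `len()` gives rows only:

```python
import pandas as pd

# Hypothetical toy data: 3 rows, 2 columns
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = toy.shape   # (rows, columns) as a tuple
print(n_rows, n_cols)
print(len(toy))              # number of rows only
print(toy.size)              # rows * columns, i.e. the number of cells
```

On the 311 data, `df.shape` would report both numbers at once, so there is no need to choose a column first.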

#### count( ) method

You can also use the `count()` method, which gives the number of values in each column. Unlike `size`, `count()` doesn't include null (empty) values.

In [33]:
df.count()

Unique Key                        641195
Created Date                      641195
Closed Date                       584568
Agency                            641195
Agency Name                       641195
Complaint Type                    641195
Descriptor                        630744
Location Type                     445669
Incident Zip                      620625
Incident Address                  596773
Street Name                       596757
Cross Street 1                    418926
Cross Street 2                    418405
Intersection Street 1             375431
Intersection Street 2             375396
Address Type                      317489
City                              593305
Landmark                          305005
Facility Type                      20575
Status                            641195
Due Date                            7133
Resolution Description            608834
Resolution Action Updated Date    622428
Community Board                   641195
BBL             

To get the count for just the "Unique Key" column:

In [35]:
df['Unique Key'].count()

641195
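
The difference between `size` and `count()` only shows up when a column has empty values. Here is a small sketch with a made-up column that has one missing entry (the data is hypothetical; only the column name is borrowed from the 311 dataset):

```python
import pandas as pd

# Hypothetical toy data: the middle record has no closed date
toy = pd.DataFrame({"Closed Date": ["11/22/2019", None, "03/22/2020"]})
col = toy["Closed Date"]

total = col.size            # counts every row, including the empty one
filled = col.count()        # counts only rows that have a value
missing = col.isna().sum()  # counts only the empty rows

print(total, filled, missing)
```

This is why "Closed Date" shows a smaller number than "Unique Key" in the `df.count()` output above: requests that are still open have no closed date yet.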

#### info( ) method

In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 641195 entries, 0 to 641194
Data columns (total 41 columns):
Unique Key                        641195 non-null int64
Created Date                      641195 non-null object
Closed Date                       584568 non-null object
Agency                            641195 non-null object
Agency Name                       641195 non-null object
Complaint Type                    641195 non-null object
Descriptor                        630744 non-null object
Location Type                     445669 non-null object
Incident Zip                      620625 non-null float64
Incident Address                  596773 non-null object
Street Name                       596757 non-null object
Cross Street 1                    418926 non-null object
Cross Street 2                    418405 non-null object
Intersection Street 1             375431 non-null object
Intersection Street 2             375396 non-null object
Address Type                      

### What are the distinct sets of values in columns that seem most useful?

#### set( ) function for getting a set of unique values

Let's look at the "Status" column. What are the status options for these 311 complaints?

In [39]:
set(df['Status'])

{'Assigned', 'Closed', 'In Progress', 'Open', 'Pending', 'Started'}

In [40]:
set(df['Open Data Channel Type'])


{'MOBILE', 'ONLINE', 'OTHER', 'PHONE', 'UNKNOWN'}

In [41]:
set(df['Agency'])

{'DCA',
 'DEP',
 'DHS',
 'DOB',
 'DOE',
 'DOHMH',
 'DOITT',
 'DOT',
 'DPR',
 'DSNY',
 'EDC',
 'HPD',
 'NYPD',
 'TLC'}

In [42]:
set(df['Complaint Type'])

{'APPLIANCE',
 'Abandoned Vehicle',
 'Air Quality',
 'Animal Facility - No Permit',
 'Animal in a Park',
 'Animal-Abuse',
 'Asbestos',
 'BEST/Site Safety',
 'Beach/Pool/Sauna Complaint',
 'Bike Rack Condition',
 'Bike/Roller/Skate Chronic',
 'Blocked Driveway',
 'Boilers',
 'Borough Office',
 'Bottled Water',
 'Bridge Condition',
 'Broken Parking Meter',
 'Building Condition',
 'Building Marshals office',
 'Building/Use',
 'Bus Stop Shelter Complaint',
 'Bus Stop Shelter Placement',
 'Calorie Labeling',
 'Construction Lead Dust',
 'Consumer Complaint',
 'Cooling Tower',
 'Cranes and Derricks',
 'Curb Condition',
 'DEP Street Condition',
 'DOOR/WINDOW',
 'Damaged Tree',
 'Day Care',
 'Dead/Dying Tree',
 'Dept of Investigations',
 'Derelict Bicycle',
 'Derelict Vehicles',
 'Dirty Conditions',
 'Disorderly Youth',
 'Drinking',
 'Drinking Water',
 'Drug Activity',
 'ELECTRIC',
 'ELEVATOR',
 'Electrical',
 'Elevator',
 'Emergency Response Team (ERT)',
 'Employee Behavior',
 'Executive Inspe

FYI you can also use `df['column_name'].unique()` to get an array of the unique values.
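
As a rough comparison of the two approaches, the sketch below runs both on a made-up column (`toy` is hypothetical, not the 311 data). It also uses `nunique()`, which returns just the number of distinct values.

```python
import pandas as pd

# Hypothetical toy data with one repeated value
toy = pd.DataFrame({"Status": ["Closed", "Open", "Closed", "Pending"]})

as_set = set(toy["Status"])          # Python set: no duplicates, no guaranteed order
as_array = toy["Status"].unique()    # pandas: unique values in order of first appearance
how_many = toy["Status"].nunique()   # just the number of distinct values

print(as_set)
print(list(as_array))
print(how_many)
```

Either approach answers "what values appear in this column?"; `unique()` is handy when you care about the order in which values first show up.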

## HOMEWORK 1

### Tutorials
* Tutorials from Mode:
 * [Python Basics: Lists, Dictionaries, & Booleans](https://mode.com/python-tutorial/python-basics). NOTE: Remember we're using Google Colab, not Mode, to run Python. Ignore the “Using Mode Python Notebooks” section and run these code snippets in Google Colab instead. We're also using Python 3 instead of Python 2, so use parentheses with the print function.
 * [Python Methods, Functions, & Libraries](https://mode.com/python-tutorial/python-methods-functions-and-libraries). NOTE: The same applies here: skip the “Using Mode Python Notebooks” section, run the code snippets in Google Colab, and use parentheses with the print function because we're using Python 3.
* [Data aggregation using Pandas](https://data36.com/pandas-tutorial-2-aggregation-and-grouping/). Ignore the "Before we start" section. To code along, you can load the dataset using `zoo = pd.read_csv('http://46.101.230.157/datacoding101/zoo.csv', delimiter = ',')`. Remember to import pandas and numpy first.

### Coding
* Create a Google Colab notebook called “HW1”. In it,  
  1. Read in the 311 dataset and save it as "df". Remember there are packages you need to import in order for this to work.
  2. Answer these two questions about the data, showing the code that produced each result in its own code cell:
    - What is the minimum value in the "Created Date" column?
    - What is the maximum value in the "Created Date" column?
  3. Create a text cell where you will briefly discuss the min and max results: does anything about the resulting min and max values surprise you? What do you think causes this?
    - Hint: Look at the year. If you use .head() and .tail() you can find the real min and max dates. We'll learn more about how to properly handle dates in a later lecture.
  4. Create another text cell and write at least 1 question you have from doing the tutorials.
  5. Once finished, upload the Google Colab notebook to your “nyu-python-public-policy” GitHub repository and then submit the link to your repository via NYU Classes.
  
### Resources to Reference
* [Today's setup example notebook](https://colab.research.google.com/drive/1nP_4NfBpHfGbguAosam7ECPqrB5Qq9ZL)
* [Lecture 0 notebook](https://drive.google.com/file/d/1J0Wa8dOaGb9vd9dhppmuthF7Fshf0Qxi/view?usp=sharing)
* [Lecture 0 slides](https://docs.google.com/presentation/d/1XlBJ3pPjQpWOOo2098-1_B9iTWu-DEQ_HPNdNty_H7s/edit?usp=sharing)
