# Data munging of the SF Police Department incidents

This notebook will:

* Select the incident categories that are a threat to tourists
* Save them in appropriate formats for use by the webpage

In [1]:
import numpy as np
from pandas import DataFrame
import pandas as pd

## Overview

In [2]:
sf_df = pd.read_csv('PDI_2016.csv')
sf_df.head()

Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
0,160021160,PROSTITUTION,INMATE/KEEPER OF HOUSE OF PROSTITUTION,Friday,01/08/2016 12:00:00 AM,12:00,CENTRAL,"ARREST, BOOKED",300 Block of KEARNY ST,-122.404195,37.791226,"(37.7912255174119, -122.404195424821)",16002116013010
1,160021160,PROSTITUTION,HUMAN TRAFFICKING,Friday,01/08/2016 12:00:00 AM,12:00,CENTRAL,"ARREST, BOOKED",300 Block of KEARNY ST,-122.404195,37.791226,"(37.7912255174119, -122.404195424821)",16002116013045
2,160021160,PROSTITUTION,SOLICITS FOR ACT OF PROSTITUTION,Friday,01/08/2016 12:00:00 AM,12:00,CENTRAL,"ARREST, BOOKED",300 Block of KEARNY ST,-122.404195,37.791226,"(37.7912255174119, -122.404195424821)",16002116013060
3,160021160,OTHER OFFENSES,MASSAGE ESTABLISHMENT PERMIT VIOLATION,Friday,01/08/2016 12:00:00 AM,12:00,CENTRAL,"ARREST, BOOKED",300 Block of KEARNY ST,-122.404195,37.791226,"(37.7912255174119, -122.404195424821)",16002116030011
4,160021160,NON-CRIMINAL,SEARCH WARRANT SERVICE,Friday,01/08/2016 12:00:00 AM,12:00,CENTRAL,"ARREST, BOOKED",300 Block of KEARNY ST,-122.404195,37.791226,"(37.7912255174119, -122.404195424821)",16002116075025
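
The `Unnamed: 0` column in the output above is what `read_csv` produces when the first header cell of the file is empty, which usually means an index was written by an earlier `to_csv` call. A minimal sketch of how `index_col=0` would absorb it, using a made-up two-row CSV rather than the real `PDI_2016.csv`:

```python
import io

import pandas as pd

# Hypothetical two-row CSV shaped like PDI_2016.csv (values are made up).
csv_text = ",IncidntNum,Category\n0,160021160,PROSTITUTION\n1,160021161,ASSAULT\n"

# Default read: the unnamed first column becomes a regular data column.
plain = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats that first column as the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))
print(list(indexed.columns))
```

Either form works for this notebook, since the extra column is dropped later when the columns of interest are selected.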


## Filter relevant categories and columns

Not all incident types are a threat to tourists. For instance, burglary is not a major problem for non-residents.

In [3]:
sf_df.Category.unique()

array(['PROSTITUTION', 'OTHER OFFENSES', 'NON-CRIMINAL', 'LARCENY/THEFT',
       'WARRANTS', 'DRUG/NARCOTIC', 'SECONDARY CODES', 'BURGLARY',
       'VEHICLE THEFT', 'ASSAULT', 'VANDALISM', 'ROBBERY', 'FRAUD',
       'MISSING PERSON', 'SUSPICIOUS OCC', 'STOLEN PROPERTY',
       'DRIVING UNDER THE INFLUENCE', 'WEAPON LAWS', 'DRUNKENNESS',
       'TRESPASS', 'BAD CHECKS', 'FORGERY/COUNTERFEITING',
       'SEX OFFENSES, FORCIBLE', 'DISORDERLY CONDUCT', 'RECOVERED VEHICLE',
       'RUNAWAY', 'ARSON', 'LOITERING', 'LIQUOR LAWS', 'KIDNAPPING',
       'EXTORTION', 'FAMILY OFFENSES', 'EMBEZZLEMENT', 'BRIBERY',
       'SEX OFFENSES, NON FORCIBLE', 'PORNOGRAPHY/OBSCENE MAT', 'SUICIDE',
       'GAMBLING', 'TREA'], dtype=object)

In [4]:
crimes_tourist_dont_care = [
    'OTHER OFFENSES', 
    'NON-CRIMINAL', 
    'WARRANTS',
    'SECONDARY CODES', 
    'BURGLARY',
    'VANDALISM', 
    'FRAUD',
    'SUSPICIOUS OCC', 
    'DRUNKENNESS',
    'TRESPASS', 
    'BAD CHECKS', 
    'FORGERY/COUNTERFEITING',
    'DISORDERLY CONDUCT', 
    'RECOVERED VEHICLE',
    'RUNAWAY', 
    'LOITERING', 
    'LIQUOR LAWS',
    'EXTORTION', 
    'FAMILY OFFENSES', 
    'EMBEZZLEMENT', 
    'BRIBERY',
    'SEX OFFENSES, NON FORCIBLE', 
    'PORNOGRAPHY/OBSCENE MAT', 
    'SUICIDE',
    'GAMBLING', 
    'TREA'
]

In [5]:
# Drop every incident whose category is in the exclusion list (exact match)
sf_df = sf_df[~sf_df.Category.isin(crimes_tourist_dont_care)]

# Keep only the columns used by the webpage (.ix has been removed from pandas, so use .loc)
sf_df = sf_df.loc[:, ['Category', 'DayOfWeek', 'Date', 'Time', 'PdDistrict', 'Resolution', 'Address', 'Location']]
sf_df.Category.unique()

array(['PROSTITUTION', 'LARCENY/THEFT', 'DRUG/NARCOTIC', 'VEHICLE THEFT',
       'ASSAULT', 'ROBBERY', 'MISSING PERSON', 'STOLEN PROPERTY',
       'DRIVING UNDER THE INFLUENCE', 'WEAPON LAWS',
       'SEX OFFENSES, FORCIBLE', 'ARSON', 'KIDNAPPING'], dtype=object)
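
When filtering on category names, it matters whether a name is matched exactly or as a substring. The sketch below uses a made-up frame and a hypothetical exclusion list (`toy` and `excluded` are not part of the notebook) to show how the two approaches can differ:

```python
import pandas as pd

# Made-up categories; 'THEFT' is deliberately a substring of two of them.
toy = pd.DataFrame({
    'Category': ['LARCENY/THEFT', 'VEHICLE THEFT', 'BURGLARY', 'ASSAULT']
})
excluded = ['BURGLARY', 'THEFT']  # hypothetical exclusion list

# Exact matching: only rows whose category equals an excluded name are dropped.
exact = toy[~toy['Category'].isin(excluded)]

# Substring matching: any category containing an excluded name is dropped.
substring = toy
for name in excluded:
    substring = substring[~substring['Category'].str.contains(name, regex=False)]

print(list(exact['Category']))
print(list(substring['Category']))
```

With the category names in this dataset, both approaches give the same result, because no excluded name is contained in a kept one. Exact matching is the safer default if the list changes.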

## Rename categories

Rename some categories to make identification of the type of crime easier

In [6]:
sf_df = sf_df.replace({'Category': {'LARCENY/THEFT': 'THEFT'} })
sf_df = sf_df.replace({'Category': {'DRUG/NARCOTIC': 'DRUG'} })
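
As a quick check of how this renaming behaves, here is a small sketch on made-up rows (`toy` is hypothetical). With a nested dictionary, `DataFrame.replace` only touches the named column and only replaces whole values, so `VEHICLE THEFT` is left alone:

```python
import pandas as pd

# Made-up rows covering renamed and untouched categories.
toy = pd.DataFrame({
    'Category': ['LARCENY/THEFT', 'DRUG/NARCOTIC', 'VEHICLE THEFT', 'ASSAULT'],
    'PdDistrict': ['CENTRAL', 'MISSION', 'CENTRAL', 'NORTHERN'],
})

# One mapping for the 'Category' column; other columns are not affected.
renamed = toy.replace({'Category': {'LARCENY/THEFT': 'THEFT',
                                    'DRUG/NARCOTIC': 'DRUG'}})

print(list(renamed['Category']))
```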

## Crime locations

This is the current solution to plot the safety circle. **It will be deprecated in the near future**.

In [7]:
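# 'Y' and 'X' were dropped when the columns were selected, so parse the
# "(latitude, longitude)" strings in Location into two float columns
locations_df = sf_df.Location.str.strip('()').str.split(', ', expand=True).astype(float)
locations = locations_df.to_numpy()  # column 0: latitude, column 1: longitude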

### Save js arrays

Save the locations in JavaScript arrays. **This is a poor solution and it will be deprecated in the future**.

In [8]:
with open('../../js/crimeLatitudes.js', 'w') as f:
    f.write('var crimeLatitudes = [\n')
    for location in locations[:-1]:
        f.write('{0},\n'.format(location[0]))
    f.write('{0}\n'.format(locations[-1][0])) # No comma
    f.write(']')

In [9]:
with open('../../js/crimeLongitudes.js', 'w') as f:
    f.write('var crimeLongitudes = [\n')
    for location in locations[:-1]:
        f.write('{0},\n'.format(location[1]))
    f.write('{0}\n'.format(locations[-1][1])) # No comma
    f.write(']')
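
The two cells above write each array element by hand and treat the last one separately to avoid a trailing comma. A possible alternative, sketched here with a hypothetical helper `to_js_array` and made-up coordinates, is to let `json.dumps` build the array literal, since a JSON array of numbers is also a valid JavaScript array:

```python
import json


def to_js_array(var_name, values):
    """Return JavaScript source that declares `var_name` as an array of numbers."""
    # float() converts NumPy scalars to plain Python floats for json.dumps.
    literal = json.dumps([float(v) for v in values])
    return 'var {0} = {1};\n'.format(var_name, literal)


# Made-up latitudes standing in for locations[:, 0].
latitudes = [37.791226, 37.786852]
source = to_js_array('crimeLatitudes', latitudes)
print(source)
```

The returned string could then be written to the `.js` file with a single `f.write(source)`.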

## New solution

Save the filtered dataframe as JSON. Unlike the JavaScript arrays above, each record keeps more than the coordinates, for example the category, date and district of the incident.

In [10]:
sf_df.to_json(
    path_or_buf='sfCrimeTourist2016.json', 
    orient='records')
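
To illustrate the shape of the resulting file, the sketch below applies `orient='records'` to a made-up two-row frame (`toy` is hypothetical). Each row becomes one JSON object keyed by column name, and the row index is not written:

```python
import json

import pandas as pd

# Made-up rows with a non-contiguous index, as left behind by filtering.
toy = pd.DataFrame(
    {'Category': ['THEFT', 'ASSAULT'], 'PdDistrict': ['CENTRAL', 'MISSION']},
    index=[0, 5],
)

# Without a path, to_json returns the JSON text instead of writing a file.
records = json.loads(toy.to_json(orient='records'))
print(records)
```

On the webpage side, this format parses into a plain array of objects, so each incident can be read by field name.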

In [13]:
sf_df.Location

0         (37.7912255174119, -122.404195424821)
1         (37.7912255174119, -122.404195424821)
2         (37.7912255174119, -122.404195424821)
5         (37.7868516471087, -122.421441956086)
8         (37.7816542806076, -122.415508242782)
9         (37.7816542806076, -122.415508242782)
11        (37.7852071918419, -122.406690592261)
14        (37.7857828233879, -122.428151140162)
17        (37.7830791191471, -122.393622739343)
18         (37.786137125238, -122.419858277339)
20        (37.7640756627972, -122.403399653371)
23        (37.7830791191471, -122.393622739343)
24         (37.781863316866, -122.414302052544)
28        (37.7902088255652, -122.423156806774)
30        (37.7817511307229, -122.411071423064)
31        (37.7817511307229, -122.411071423064)
32        (37.7980205895825, -122.435679786498)
34          (37.78123525699, -122.418809581375)
36         (37.742315306831, -122.388192921769)
41         (37.775420706711, -122.403404791479)
42        (37.7745145380854, -122.452839