# San Francisco and Seattle - Similarities in Criminal Offences


## Methodology

This IPython notebook looks for similarities, if any, between San Francisco and Seattle in monthly crime patterns and in offences per capita (where the categories are comparable) during the summer months of 2014.

The code below works towards that aim, with each step described as it appears.

## Development

In [158]:
import numpy as np
import pandas as pd
from plotly.graph_objs import Figure, Scatter
from plotly.offline import init_notebook_mode, iplot
import warnings
# Enable inline Plotly rendering in the notebook and silence library warnings.
init_notebook_mode()
warnings.filterwarnings('ignore')
# Month names, indexed from 0 (January) to 11 (December).
month = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
         'August', 'September', 'October', 'November', 'December']

In [113]:
san_francisco = pd.read_csv("https://raw.githubusercontent.com/uwescience/datasci_course_materials/master/assignment6/sanfrancisco_incidents_summer_2014.csv")
seattle = pd.read_csv("https://raw.githubusercontent.com/uwescience/datasci_course_materials/master/assignment6/seattle_incidents_summer_2014.csv")
san_francisco['month'] = pd.DatetimeIndex(san_francisco['Date']).month

The libraries are imported and the datasets loaded. A month column is added to the San Francisco dataset, which lacks one.
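As a minimal sketch of the month-extraction step, the snippet below applies the same `pd.DatetimeIndex(...).month` call to a small hand-made frame. The sample dates and the name `toy` are made up for illustration; only the `Date` column name is taken from the code above.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the CSV files loaded above.
toy = pd.DataFrame({"Date": ["06/15/2014", "07/04/2014", "08/31/2014"]})

# Parse the date strings and keep only the month number (1 = January).
toy["month"] = pd.DatetimeIndex(toy["Date"]).month

print(toy)
```

Because the resulting month numbers start at 1, they need an offset of one before they can index a zero-based list of month names.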

In [156]:
# Distinct offence categories reported by each city.
type_crimes_sf = san_francisco['Category'].unique()
type_crimes_seattle = seattle['Summarized Offense Description'].unique()
# Incident counts per month (the variable names say "day", but the grouping is monthly).
crimes_per_day_sf = san_francisco.groupby('month')['Category'].count().reset_index()
crimes_per_day_seattle = seattle.groupby('Month')['RMS CDW ID'].count().reset_index()
# Population figures used as denominators for the per-capita rates.
san_francisco_population = 4.335 * 1e6
seattle_population = 662400
# Categories whose names match exactly in both datasets.
common_crimes = set(type_crimes_seattle).intersection(type_crimes_sf)
# Offences per 100,000 inhabitants, restricted to the shared categories.
sf_common = san_francisco[san_francisco['Category'].isin(common_crimes)]
seattle_common = seattle[seattle['Summarized Offense Description'].isin(common_crimes)]
crime_per_capita = pd.DataFrame({
    "San Francisco": sf_common.groupby('Category')['PdId'].count()
                     / san_francisco_population * 100000,
    "Seattle": seattle_common.groupby('Summarized Offense Description')['RMS CDW ID'].count()
               / seattle_population * 100000,
})

At this point we have the working datasets, the list of offence categories common to both cities, and the crime rate per 100,000 people, calculated from the estimated 2014 population of each city as published on their data portals.
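The per-capita step can be illustrated on toy numbers. This is only a sketch: the counts, the populations, and the names `counts_a`, `counts_b`, "City A" and "City B" are hypothetical and chosen to keep the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical incident counts per category for two made-up cities.
counts_a = pd.Series({"ASSAULT": 200, "ROBBERY": 50, "HOMICIDE": 2})
counts_b = pd.Series({"ASSAULT": 30, "ROBBERY": 10, "FRAUD": 5})
# Hypothetical populations.
population_a = 1_000_000
population_b = 100_000

# Keep only the categories reported by both cities.
common = sorted(set(counts_a.index) & set(counts_b.index))

# Rate per 100,000 inhabitants = count / population * 100,000.
rates = pd.DataFrame({
    "City A": counts_a[common] / population_a * 100_000,
    "City B": counts_b[common] / population_b * 100_000,
})

print(rates)
```

Categories that appear in only one city (here `HOMICIDE` and `FRAUD`) drop out of the comparison, which mirrors the effect of `common_crimes` in the cell above.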

In [157]:
data = [
    Scatter(
        y=crimes_per_day_sf['Category'],  # number of incidents per month
        x=[month[m - 1] for m in crimes_per_day_sf['month']],  # month numbers are 1-based
        name='San Francisco'
    ),
    Scatter(
        y=crimes_per_day_seattle['RMS CDW ID'],  # number of incidents per month
        x=[month[m - 1] for m in crimes_per_day_seattle['Month']],  # month numbers are 1-based
        name='Seattle'
    ),
]
layout = dict(title='Criminal offences by month in San Francisco Vs Seattle',
              xaxis=dict(title='Month'),
              yaxis=dict(title='Number of offences'),
              )
fig = Figure(data=data, layout=layout)

## Conclusions and Findings


We can briefly draw two broad conclusions here.

- From the chart below, it becomes apparent that over the course of the summer the overall crime figures decrease in Seattle and increase in San Francisco. The reason for this seemingly seasonal behaviour remains to be explained, perhaps through a correlation analysis of resident flux (people may leave the city over the summer, although this is uncertain) or similar.

- From the summary table, Seattle appears to be a far more dangerous place to live than San Francisco in every shared category except Disorderly Conduct. Of course, the population figures used here differ widely (4.335 million for San Francisco against 662,400 for Seattle, a factor of roughly six and a half), so this could be a case of statistical imbalance, as we are comparing two very different populations. Nonetheless, in the same period and as per the chart, Seattle also tops San Francisco in the sheer number of reported offences.

There is a final caveat: in the provided corpus, the category 'Homicide' is present for San Francisco but not for Seattle. As the table below also shows, most of the compared offences are offences against property, with the exception of Assault. This is no minor finding, and it needs to be taken into consideration before concluding that one place is more dangerous to live in than the other.
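The caveat about unmatched categories can be checked mechanically with set differences. The sketch below uses made-up category sets and a hypothetical helper, `unmatched_categories`; with the real data one would presumably pass the unique category values of each dataset instead.

```python
def unmatched_categories(categories_a, categories_b):
    """Return the categories found only in A and only in B, each sorted."""
    set_a, set_b = set(categories_a), set(categories_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical category names, for illustration only.
sf_categories = ["ASSAULT", "BURGLARY", "HOMICIDE", "ROBBERY"]
seattle_categories = ["ASSAULT", "BURGLARY", "ROBBERY", "BIKE THEFT"]

only_sf, only_seattle = unmatched_categories(sf_categories, seattle_categories)
print("Only in the first list:", only_sf)
print("Only in the second list:", only_seattle)
```

Listing what falls outside the intersection makes it easier to judge how much of each city's crime the per-capita table leaves out.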

### Criminal trends (overall), summer 2014, San Francisco Vs Seattle

In [160]:
iplot(fig)

![](./sanfrancisco.png)

### Crime Per Capita - San Francisco vs Seattle

Offences per 100,000 inhabitants.


In [155]:
crime_per_capita

| Category | San Francisco | Seattle |
|---|---|---|
| ASSAULT | 66.482122 | 304.649758 |
| BURGLARY | 0.138408 | 484.903382 |
| DISORDERLY CONDUCT | 0.71511 | 0.301932 |
| FRAUD | 5.582468 | 222.373188 |
| PROSTITUTION | 2.583622 | 30.495169 |
| ROBBERY | 7.10496 | 111.111111 |
| STOLEN PROPERTY | 0.184544 | 171.497585 |
| TRESPASS | 6.482122 | 73.369565 |
| VEHICLE THEFT | 45.351788 | 461.503623 |
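To see how far apart the two columns are, one option is to compute the Seattle-to-San-Francisco ratio for each category. The sketch below re-enters the rates from the table by hand; the names `crime_table` and `ratio` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Rates per 100,000 inhabitants, copied from the table above.
crime_table = pd.DataFrame(
    {
        "San Francisco": [66.482122, 0.138408, 0.71511, 5.582468, 2.583622,
                          7.10496, 0.184544, 6.482122, 45.351788],
        "Seattle": [304.649758, 484.903382, 0.301932, 222.373188, 30.495169,
                    111.111111, 171.497585, 73.369565, 461.503623],
    },
    index=["ASSAULT", "BURGLARY", "DISORDERLY CONDUCT", "FRAUD", "PROSTITUTION",
           "ROBBERY", "STOLEN PROPERTY", "TRESPASS", "VEHICLE THEFT"],
)

# A ratio above 1 means the Seattle rate is higher; below 1 means San Francisco's is.
ratio = (crime_table["Seattle"] / crime_table["San Francisco"]).sort_values(ascending=False)

print(ratio.round(2))
```

Ratios in the hundreds or thousands, as for some categories here, may point to differences in how each city labels offences rather than to real differences in crime, so they are worth inspecting before drawing conclusions.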
