<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-data" data-toc-modified-id="Load-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Load data</a></span><ul class="toc-item"><li><span><a href="#Crimes" data-toc-modified-id="Crimes-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Crimes</a></span></li><li><span><a href="#Blocks" data-toc-modified-id="Blocks-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Blocks</a></span></li></ul></li><li><span><a href="#Selection-of-crimes" data-toc-modified-id="Selection-of-crimes-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Selection of crimes</a></span><ul class="toc-item"><li><span><a href="#Plot-missing" data-toc-modified-id="Plot-missing-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Plot missing</a></span></li></ul></li><li><span><a href="#Average-area-per-block" data-toc-modified-id="Average-area-per-block-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Average area per block</a></span></li></ul></div>

**Description**: This notebook contains code to reproduce some of the numbers and figures from the Appendix.

In [1]:
import pickle
import sys
from pathlib import Path

import altair as alt
import pandas as pd

sys.path.append('../..')
from src.prepare_data.crime_database import load_crimes
from src.analysis.figures import format_chart

In [2]:
project_root = Path('../..')
data_path = (project_root / 'data')

In [3]:
alt.renderers.enable('notebook')

RendererRegistry.enable('notebook')

In [4]:
def n_rows(df):
    """Print the number of rows, using ',' as the thousands separator.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame whose number of rows should be printed

    Returns
    -------
    None
    """
    print(format(df.shape[0], ','))
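A variant that returns the formatted string instead of printing it can be handy for assertions. `format_n_rows` below is a hypothetical helper, not part of the project's `src` package; it relies only on Python's `','` format specifier, which inserts a thousands separator.

```python
import pandas as pd


def format_n_rows(df: pd.DataFrame) -> str:
    """Return the number of rows of `df` with ',' as the thousands separator.

    Hypothetical companion to `n_rows` that returns instead of printing.
    """
    return format(len(df), ',')


toy = pd.DataFrame({'ID': range(1234)})
print(format_n_rows(toy))  # 1,234
```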

In [5]:
project_folder = Path('../..')

# Load data
## Crimes
The loaded dataset contains all crimes from the raw dataset that occurred during the analysis period, 1 January 2006 to 30 June 2016.

In [6]:
query = """select ID, Date, "Primary Type", Latitude, Longitude, "FBI Code"
from crimes
where Date between '2006-01-01' and '2016-06-30'"""
crimes = load_crimes(query, sqldb_path=str(data_path / 'processed/crimes.db'))
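In SQLite, `between` on text values compares strings lexicographically. If the `Date` column stores timestamps such as `'2016-06-30 14:00:00'` (an assumption; the date format in `crimes.db` is not shown here), every timestamp on 30 June 2016 sorts after the bare string `'2016-06-30'` and falls outside the range. The toy in-memory table below illustrates this and a half-open alternative; its contents are made up.

```python
import sqlite3

# Toy in-memory table; assumes Date is ISO-8601 text ('YYYY-MM-DD HH:MM:SS').
# The real crimes.db schema and date format may differ.
con = sqlite3.connect(':memory:')
con.execute('create table crimes (ID integer, Date text)')
con.executemany(
    'insert into crimes values (?, ?)',
    [
        (1, '2005-12-31 23:59:00'),  # before the period
        (2, '2006-01-01 00:30:00'),  # first day of the period
        (3, '2016-06-30 00:00:00'),  # last day, midnight
        (4, '2016-06-30 14:00:00'),  # last day, afternoon
    ],
)

# Closed range on bare dates: strings with a time part compare greater than
# '2016-06-30', so every timestamp on 30 June 2016 is dropped.
between_ids = [row[0] for row in con.execute(
    "select ID from crimes "
    "where Date between '2006-01-01' and '2016-06-30' order by ID")]

# Half-open range: keeps all of 30 June 2016.
half_open_ids = [row[0] for row in con.execute(
    "select ID from crimes "
    "where Date >= '2006-01-01' and Date < '2016-07-01' order by ID")]

print(between_ids)    # [2]
print(half_open_ids)  # [2, 3, 4]
con.close()
```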

In [7]:
n_rows(crimes)

3,736,428


# Selection of crimes

In [8]:
with (data_path / 'processed/figures/blocks_with_dummies.pkl').open('rb') as f:
    blocks = pickle.load(f)
blocks.head()

| | tract_bloc | school_year | statefp10 | countyfp10 | tractce10 | geoid10 | blockce10 | name10 | r_numbers | treated_backup | geometry | route_number | school_name | treated | one_over | two_over | three_over | info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101001000 | SY0506 | 17 | 31 | 10100 | 170310101001000 | 1000 | Block 1000 | | 0 | POLYGON ((-87.66635499979151 42.02252199950325... | | | 0.0 | 0 | 0 | 0 | - |
| 1 | 101001001 | SY0506 | 17 | 31 | 10100 | 170310101001001 | 1001 | Block 1001 | | 0 | POLYGON ((-87.66753999955125 42.02223700032794... | | | 0.0 | 0 | 0 | 0 | - |
| 2 | 101001002 | SY0506 | 17 | 31 | 10100 | 170310101001002 | 1002 | Block 1002 | | 0 | POLYGON ((-87.67008600039445 42.02226200030603... | | | 0.0 | 0 | 0 | 0 | - |
| 3 | 101001003 | SY0506 | 17 | 31 | 10100 | 170310101001003 | 1003 | Block 1003 | | 0 | POLYGON ((-87.67009499920478 42.0211490002601,... | | | 0.0 | 0 | 0 | 0 | - |
| 4 | 101002000 | SY0506 | 17 | 31 | 10100 | 170310101002000 | 2000 | Block 2000 | | 0 | POLYGON ((-87.67188399967968 42.02298600014132... | | | 0.0 | 0 | 0 | 0 | - |
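In the rows shown, `geoid10` is a 15-digit 2010 Census block GEOID (2-digit state, 3-digit county, 6-digit tract, 4-digit block), and `tract_bloc` appears to be the tract and block codes concatenated, with leading zeros dropped. This is inferred from the five rows above rather than documented, so the sketch below only checks it against them. `split_block_geoid` and `tract_bloc_from_geoid` are hypothetical helpers.

```python
def split_block_geoid(geoid10) -> dict:
    """Split a 2010 Census block GEOID into state, county, tract and block codes."""
    geoid = str(geoid10).zfill(15)  # restore leading zeros lost to int conversion
    return {
        'statefp10': geoid[:2],
        'countyfp10': geoid[2:5],
        'tractce10': geoid[5:11],
        'blockce10': geoid[11:],
    }


def tract_bloc_from_geoid(geoid10) -> int:
    """Rebuild tract_bloc as int(tract code + block code), as observed above."""
    parts = split_block_geoid(geoid10)
    return int(parts['tractce10'] + parts['blockce10'])


# (geoid10, tract_bloc) pairs copied from blocks.head()
rows = [
    (170310101001000, 101001000),
    (170310101001001, 101001001),
    (170310101001002, 101001002),
    (170310101001003, 101001003),
    (170310101002000, 101002000),
]
for geoid10, tract_bloc in rows:
    assert tract_bloc_from_geoid(geoid10) == tract_bloc

print(split_block_geoid(170310101001000))
# {'statefp10': '17', 'countyfp10': '031', 'tractce10': '010100', 'blockce10': '1000'}
```

The numeric `countyfp10` (31) and `tractce10` (10100) values in the table are the same codes with their leading zeros removed.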


In [9]:
violent_crime = {
    '01A': 'Homicide 1st & 2nd Degree',
    '02': 'Criminal Sexual Assault',
    '03': 'Robbery',
    '04A': 'Aggravated Assault',
    '04B': 'Aggravated Battery'
}
property_crime = {
    '05': 'Burglary',
    '06': 'Larceny',
    '07': 'Motor Vehicle Theft',
    '09': 'Arson'
}

crime_categories = list(violent_crime.keys()) + list(property_crime.keys())
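The two dictionaries can also serve as lookup tables. A minimal sketch, on made-up rows, of attaching a readable description and a violent/property label to each FBI code with `Series.map`; the column names `description` and `category` are hypothetical. Codes outside the nine selected ones map to `NaN`.

```python
import pandas as pd

violent_crime = {
    '01A': 'Homicide 1st & 2nd Degree',
    '02': 'Criminal Sexual Assault',
    '03': 'Robbery',
    '04A': 'Aggravated Assault',
    '04B': 'Aggravated Battery',
}
property_crime = {
    '05': 'Burglary',
    '06': 'Larceny',
    '07': 'Motor Vehicle Theft',
    '09': 'Arson',
}

# Merge both dicts into one code -> description lookup
descriptions = {**violent_crime, **property_crime}
# Code -> category lookup
categories = {
    **{code: 'violent' for code in violent_crime},
    **{code: 'property' for code in property_crime},
}

toy = pd.DataFrame({'FBI Code': ['03', '06', '08B', '04B']})  # made-up rows
toy['description'] = toy['FBI Code'].map(descriptions)
toy['category'] = toy['FBI Code'].map(categories)
print(toy)
```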

In [10]:
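# Keep only the selected FBI codes; .copy() avoids a SettingWithCopyWarning below
crimes = crimes[crimes['FBI Code'].isin(crime_categories)].copy()
# 1 for violent crimes, 0 for property crimes
crimes['violent'] = crimes['FBI Code'].isin(list(violent_crime.keys())).astype(int)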
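On a toy frame, the selection and the `violent` dummy behave as follows. pandas may flag the result of a boolean-mask selection as a potential view of the original frame, so assigning a new column to it can raise `SettingWithCopyWarning`; calling `.copy()` makes the intent explicit. The rows are made up.

```python
import pandas as pd

violent_codes = ['01A', '02', '03', '04A', '04B']
property_codes = ['05', '06', '07', '09']
crime_categories = violent_codes + property_codes

# Made-up rows: '26' is a code outside the selection
crimes_toy = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'FBI Code': ['01A', '06', '26', '04B', '07'],
})

# Keep only the selected codes; copy so the new column is set on an independent frame
selected = crimes_toy[crimes_toy['FBI Code'].isin(crime_categories)].copy()
# 1 for violent crimes, 0 for property crimes
selected['violent'] = selected['FBI Code'].isin(violent_codes).astype(int)

print(selected)
```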

In [11]:
n_rows(crimes)

1,502,468


Number of violent crimes:

In [12]:
format(crimes['violent'].sum(), ',')

'313,250'

Number of property crimes:

In [13]:
format(crimes.shape[0] - crimes['violent'].sum(), ',')

'1,189,218'
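Because `violent` is a 0/1 dummy, its sum counts violent crimes and the remaining rows are property crimes, so the two figures above should add up to the 1,502,468 selected crimes. The check below uses the numbers reported above; `value_counts` on a made-up column shows how both counts can be obtained in one call.

```python
import pandas as pd

# Figures reported above
n_selected = 1_502_468
n_violent = 313_250
n_property = 1_189_218
assert n_violent + n_property == n_selected

# Made-up dummy column: value_counts returns the count for each value (1 = violent)
violent = pd.Series([1, 0, 0, 1, 0], name='violent')
counts = violent.value_counts()
print(format(counts.loc[1], ','), format(counts.loc[0], ','))  # 2 3
```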

In [14]:
crimes.isnull().sum()

ID                  0
Date                0
Primary Type        0
Latitude        14661
Longitude       14661
FBI Code            0
violent             0
dtype: int64
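Of the 1,502,468 selected crimes, 14,661 (just under 1%) have no coordinates. Latitude and Longitude have the same number of missing values, which suggests, but does not prove, that they are missing together. A minimal sketch on made-up rows of how to check that, and one way to keep only geolocated crimes:

```python
import numpy as np
import pandas as pd

# Made-up rows with some missing coordinates
crimes_toy = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Latitude': [41.88, np.nan, 41.79, np.nan],
    'Longitude': [-87.63, np.nan, -87.60, np.nan],
})

coords = crimes_toy[['Latitude', 'Longitude']]
# Rows missing at least one coordinate vs. rows missing both
missing_any = coords.isnull().any(axis=1)
missing_both = coords.isnull().all(axis=1)
# Equal counts here mean the coordinates are always missing together
print(missing_any.sum(), missing_both.sum())  # 2 2

# Share of the selected crimes without coordinates, using the counts reported above
share = 14_661 / 1_502_468
print(f'{share:.2%}')  # 0.98%

# Keep only crimes with both coordinates present
geolocated = crimes_toy.dropna(subset=['Latitude', 'Longitude'])
```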