### NYPD Historic Arrests Data
#### Using Python and Altair to Create a Dashboard

The arrests data is large and complex. To keep it manageable, we filter it to:
* 2015 to 2017
* precincts 90 and 94 (North Brooklyn)
* felonies only
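Because the file is large, one option is to read only the columns the dashboard needs. This notebook does not do that; the snippet below is a sketch. It applies the same three filters as the list above. The column names come from the sample output further down. The small inline CSV is made-up stand-in data so the snippet runs on its own, and `df_small` is a hypothetical name.

```python
import io
import pandas as pd

# Made-up stand-in for the NYPD arrests CSV (column names match the real file)
csv_text = """ARREST_DATE,OFNS_DESC,LAW_CAT_CD,ARREST_PRECINCT,AGE_GROUP,PERP_RACE,Latitude,Longitude,PD_DESC
07/09/2015,GRAND LARCENY OF MOTOR VEHICLE,F,90,18-24,WHITE HISPANIC,40.710679,-73.957471,X
01/27/2017,FORGERY,F,94,25-44,WHITE HISPANIC,40.717449,-73.940281,Y
03/02/2014,FORGERY,F,94,25-44,BLACK,40.70,-73.95,Z
05/05/2016,PETIT LARCENY,M,90,25-44,BLACK,40.70,-73.95,W
"""

# Load only the columns the dashboard uses; the others are skipped while parsing
cols = ['ARREST_DATE', 'OFNS_DESC', 'LAW_CAT_CD', 'ARREST_PRECINCT',
        'AGE_GROUP', 'PERP_RACE', 'Latitude', 'Longitude']
df_small = pd.read_csv(io.StringIO(csv_text), usecols=cols)
df_small['date'] = pd.to_datetime(df_small['ARREST_DATE'], format='%m/%d/%Y')

# The three filters from the list above, combined into one boolean mask
mask = (
    df_small['date'].dt.year.between(2015, 2017)
    & df_small['ARREST_PRECINCT'].isin([90, 94])
    & (df_small['LAW_CAT_CD'] == 'F')
)
df_small = df_small[mask].copy()
print(len(df_small))
```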

We will use Python's Altair library (a Python wrapper for Vega-Lite) to get results quickly. Altair's geographic capabilities are limited, but it is a good tool for putting a dashboard together quickly.

##### Before we start, note that the arrest data does not show where crimes occurred; it shows where arrests were made. It is well known that the data is as biased as the police officers making the arrests.

```python
import pandas as pd
import altair as alt
pd.options.display.max_columns = None

df = pd.read_csv( r"C:\Users\csucuogl\Desktop\DATA\NYC\NYPD_Arrests_Data__Historic_.csv" )
df['date'] = pd.to_datetime(df['ARREST_DATE'], format='%m/%d/%Y')

# Apply all filters first, then copy, so the new columns below are added
# to an independent DataFrame (avoids SettingWithCopyWarning)
df = df[
    df['date'].dt.year.between(2015, 2017)      # Limit time: 2015 to 2017 (inclusive)
    & (df['LAW_CAT_CD'] == 'F')                 # Only felonies
    & df['ARREST_PRECINCT'].isin([90, 94])      # Only North Brooklyn (precincts 90 and 94)
].copy()

df['week_day'] = df['date'].dt.dayofweek  # Day of the week (Monday=0 ... Sunday=6)
df['wmonth'] = df['date'].dt.month        # Month (1-12)

df.sample(3)
```

| Unnamed: 0 | ARREST_KEY | ARREST_DATE | PD_CD | PD_DESC | KY_CD | OFNS_DESC | LAW_CODE | LAW_CAT_CD | ARREST_BORO | ARREST_PRECINCT | JURISDICTION_CODE | AGE_GROUP | PERP_SEX | PERP_RACE | X_COORD_CD | Y_COORD_CD | Latitude | Longitude | date | week_day | wmonth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 762189 | 144365680 | 07/09/2015 | 441.0 | LARCENY,GRAND OF AUTO | 110.0 | GRAND LARCENY OF MOTOR VEHICLE | PL 1553008 | F | K | 90 | 0.0 | 18-24 | M | WHITE HISPANIC | 996041.0 | 198196.0 | 40.710679 | -73.957471 | 2015-07-09 | 3 | 7 |
| 263496 | 161139546 | 01/27/2017 | 729.0 | FORGERY,ETC.,UNCLASSIFIED-FELONY | 113.0 | FORGERY | PL 1701003 | F | K | 94 | 0.0 | 25-44 | M | WHITE HISPANIC | 1000805.0 | 200665.0 | 40.717449 | -73.940281 | 2017-01-27 | 4 | 1 |
| 594737 | 149177682 | 01/08/2016 | 779.0 | PUBLIC ADMINISTRATION,UNCLASSIFIED FELONY | 126.0 | MISCELLANEOUS PENAL LAW | PL 2155100 | F | K | 90 | 2.0 | 25-44 | M | BLACK | 998016.0 | 196598.0 | 40.70629 | -73.95035 | 2016-01-08 | 4 | 1 |
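The `week_day` column comes from pandas' `dt.dayofweek`, where Monday is 0 and Sunday is 6. In the sample above, 07/09/2015 (a Thursday) has `week_day` 3. If readable labels are preferred for the Day of the Week chart, `dt.day_name()` can supply them. The sketch below uses the three sample dates; the `week_day_name` column is a hypothetical addition and is not part of the original notebook.

```python
import pandas as pd

# The three arrest dates from the sample above
dates = pd.Series(pd.to_datetime(['07/09/2015', '01/27/2017', '01/08/2016'], format='%m/%d/%Y'))

frame = pd.DataFrame({'date': dates})
frame['week_day'] = frame['date'].dt.dayofweek         # Monday=0 ... Sunday=6
frame['week_day_name'] = frame['date'].dt.day_name()   # hypothetical label column
print(frame)
```

An Altair axis built on such a column would need an explicit `sort` list with the seven day names in order, because nominal axes are sorted alphabetically by default.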


#### Formatting and Cleaning


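```python
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')  # Make sure 'date' has a datetime dtype

# String formatting: keep only the text before the first '/' in each race label
df['PERP_RACE'] = df['PERP_RACE'].str.split('/').str[0].str.strip()
df = df[df['Longitude'] < -73.90]  # Drop a single data point located far outside the area

# By default Altair raises a MaxRowsError for datasets with more than 5,000 rows;
# this line lifts that limit so the full dataset can be used
alt.data_transformers.disable_max_rows()
```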


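#### Creating the dashboard
The dashboard has four plots laid out in a two-by-two grid. <br>
Selections filter the other plots. The scatter plot and the race and age plot filter each other and both bottom charts. The day-of-week chart also filters the description chart.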

```python
# ------------------   SCATTER PLOT  ------------------------------------
# This is not a geographic plot. We use latitude and longitude as plain x/y values,
# so the visual has no map projection.

# Interval selection bound to the x/y scales: panning and zooming the scatter plot
# changes the selected region, which then filters the other plots.
brush = alt.selection_interval( empty='all' , bind = 'scales')
brush_age = alt.selection_interval( empty='all' )  # Brush used on the race and age plot

chart = alt.Chart(df).mark_point( filled=True ).encode(
    x = alt.X( 'Longitude:Q' , scale=alt.Scale(zero=False) ),
    y = alt.Y( 'Latitude:Q', scale = alt.Scale(zero=False) ),
    size = alt.Size('count()', scale = alt.Scale( range=[1, 800] ,type='sqrt' )), # count() groups rows by location and counts them
    opacity=alt.value(0.4),
    tooltip='count(*):Q',
    color=alt.condition(brush, alt.value('#2b7076') , alt.value('lightgray'))
    ).add_selection( brush  # The selection filters the data in the other plots
    ).properties(
        width=400,
        height=400,
        title = 'Location Based Clustering of NYPD Arrests Data'
    ).transform_filter( brush_age  # The race and age selection filters the scatter plot as well
    ).interactive()  # Allow zooming in and out

# ------------------   RACE - AGE   ------------------------------------
# Bubble plot: one circle per age group and race, sized by the number of arrests
bar = alt.Chart(df).mark_point(filled=True).encode(
    x='AGE_GROUP:N',
    y=alt.Y( 'PERP_RACE:N', axis= alt.Axis( values = df['PERP_RACE'].unique().tolist() ) ),
    size = alt.Size('count()', scale=alt.Scale( range=[1, 1500] )), # Group and count
    color=alt.condition(brush_age, alt.value('#2b7076') , alt.value('lightgray'))
    ).properties(
        width=300,
        height=400,
        title = 'Race and Age'
    ).transform_filter( brush ).add_selection(brush_age)  # Filtered by the scatter plot; defines brush_age

# ===================== DAY OF THE WEEK ================================

brush_desc = alt.selection_interval( encodings=['x'] , empty='all' )  # Select a range of days on the x axis

time_hist = alt.Chart(df).mark_bar().encode(
    alt.X("week_day:N" , axis= alt.Axis( values = [0,1,2,3,4,5,6] ), title='Day of the week (0 = Monday)' ),
    y=alt.Y('count()'),
    color=alt.condition(brush_desc, alt.value('#2b7076') , alt.value('lightgray'))
    ).properties(
        width=400,
        height=200,
        title = 'Day of the Week'
    ).add_selection(brush_desc).transform_filter( brush ).transform_filter( brush_age )

#======================= DESCRIPTION ================================
counts = df['OFNS_DESC'].value_counts()
crimes = counts[counts > 10].index.tolist()  # Offense types with more than 10 arrests

brush_which = alt.selection_multi( fields=['OFNS_DESC'], empty='all' )  # Click a bar (shift-click to add more)
desc = alt.Chart(df).mark_bar().encode(
    alt.X("OFNS_DESC:N" , sort='-y' ), # Sort bars by count, i.e. by bar height
    y=alt.Y('count()'),
    color = alt.condition(brush_which, alt.value('#2b7076') , alt.value('lightgray'))  # Highlight the clicked offenses
    ).properties(
        width=400,
        height=200,
        title = 'Description'
    ).add_selection(brush_which
    ).transform_filter( alt.FieldOneOfPredicate(field='OFNS_DESC', oneOf=crimes)  # Only the frequent offenses
    ).transform_filter( brush ).transform_filter( brush_age ).transform_filter( brush_desc )
```

#### Plotting
`|` puts the charts next to each other, <br>
`&` puts them on the next line.

```python
(chart | bar) & (time_hist | desc)
```

#### Saving Dashboards
There are two ways of saving interactive dashboards:
* Save the dashboard directly to an HTML file. This creates a working HTML document.
* Save the dashboard as JSON with the same `.save` method, then use the Vega-Embed library to render it in an HTML document.

Although the second option takes more work, it produces a cleaner HTML file.
Remove the `#` from either line and run the code.

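```python
# The outer parentheses are needed: without them, .save() applies only to (time_hist | desc)
#((chart | bar) & (time_hist | desc)).save('dashboard.html')
#((chart | bar) & (time_hist | desc)).save('dashboard.json')
```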