### The purpose of this notebook is to answer question three of my analysis questions:

#### How many marijuana-involved incidents did SFPD report in different police districts across the city?

In [1]:
#import modules
import pandas as pd
import altair as alt



Import our cleaned dataset that contains all of our marijuana incidents. We made this .csv file in the data_cleaning notebook.

In [2]:
mari_incidents = pd.read_csv('all_data_marijuana.csv', dtype=str)

Convert our incident dates to a datetime data format.

In [3]:
mari_incidents['incident_date'] = pd.to_datetime(mari_incidents['incident_date'])
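
Because we read the file with dtype=str, every column arrives as text, dates included. Here is a minimal sketch of the conversion on a few made-up rows (the `toy` dataframe and its values are hypothetical, not taken from our file). It also shows one way to check for unparseable dates: with `errors='coerce'`, bad values become NaT instead of raising an error.

```python
import pandas as pd

# Toy rows made up for illustration; the real file has many more columns.
toy = pd.DataFrame({'incident_date': ['2016-07-08', '2017-12-21', 'not a date']})

# errors='coerce' turns anything that can't be parsed into NaT instead of raising.
toy['incident_date'] = pd.to_datetime(toy['incident_date'], errors='coerce')

# Count the rows whose date could not be parsed.
n_bad = toy['incident_date'].isna().sum()
print(toy.dtypes)
print(n_bad)
```

If a check like this came back nonzero on the real data, those rows would be worth inspecting before doing any date filtering.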

Check the earliest and latest incident dates.

In [4]:
mari_incidents['incident_date'].min()

Timestamp('2003-01-01 00:00:00')

In [5]:
mari_incidents['incident_date'].max()

Timestamp('2021-10-09 00:00:00')

Our earliest date is January 1, 2003, so it looks like we've got a full year of data for 2003. But the 2021 data stops on October 9, so we can't do a full annual analysis on that year. Let's make a dataframe that keeps only our full years of data, 2003 through 2020.

In [6]:
full_years = mari_incidents[
    (mari_incidents['incident_date'] >= '2003-01-01') &
    (mari_incidents['incident_date'] < '2021-01-01')
].reset_index(drop=True)
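
To see what that date filter keeps at the edges, here is a small sketch on made-up dates (the `toy` dataframe is hypothetical). It compares the half-open date range used above with an equivalent filter on the calendar year. Both are just illustrations of the same idea, not a change to our analysis.

```python
import pandas as pd

# Made-up incidents sitting right on the boundaries of our date window.
toy = pd.DataFrame({
    'incident_number': ['a', 'b', 'c', 'd'],
    'incident_date': pd.to_datetime(
        ['2002-12-31', '2003-01-01', '2020-12-31', '2021-01-01']
    ),
})

# Half-open range: includes 2003-01-01, excludes 2021-01-01.
by_range = toy[
    (toy['incident_date'] >= '2003-01-01') &
    (toy['incident_date'] < '2021-01-01')
]

# The same filter written on the calendar year; between() includes both ends.
by_year = toy[toy['incident_date'].dt.year.between(2003, 2020)]

print(by_range['incident_number'].tolist())
print(by_range.equals(by_year))
```

Using `<` on the first day of the following year avoids having to spell out the last day (or last timestamp) of 2020.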

We know from our data dictionary that some individual incidents have multiple row entries. But we also know that the incident_number stays the same across all entries related to the same incident. Since we're just counting how many incidents there were in each district, we can go ahead and drop all the duplicates in the incident_number column. This keeps the first row for each incident:

In [7]:
full_years_incidents = full_years.drop_duplicates(subset=['incident_number'])
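
Here is a quick sketch of what that deduplication does, using made-up rows (the `toy` dataframe and its ids are hypothetical). One incident appears twice, and after dropping duplicates the row count matches the number of unique incident numbers.

```python
import pandas as pd

# Made-up rows: incident 100 has two entries, the others have one each.
toy = pd.DataFrame({
    'row_id': ['1001', '1002', '2001', '3001'],
    'incident_number': ['100', '100', '200', '300'],
    'police_district': ['mission', 'mission', 'park', 'park'],
})

# By default drop_duplicates keeps the first row it sees for each incident_number.
deduped = toy.drop_duplicates(subset=['incident_number'])

# Sanity check: one row per unique incident number.
print(len(toy), len(deduped), toy['incident_number'].nunique())
```

Comparing `len(deduped)` with `nunique()` is a cheap way to confirm the step did what we wanted.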

In [8]:
full_years_incidents.head()

,row_id,incident_number,incident_code,incident_category,incident_description,day_of_week,incident_date,incident_time,police_district,resolution,longitude,latitude,the_geom
0,16055139916010,160551399,16010,DRUG/NARCOTIC,possession of marijuana,Friday,2016-07-08,08:00,mission,"ARREST, BOOKED",-122.42326589360349,37.765649515945,POINT (-122.42326589360349 37.765649515945)
1,17102985016010,171029850,16010,DRUG/NARCOTIC,possession of marijuana,Thursday,2017-12-21,10:40,taraval,"ARREST, BOOKED",-122.45364594949392,37.72327255110331,POINT (-122.45364594949392 37.72327255110331)
2,17026584716010,170265847,16010,DRUG/NARCOTIC,possession of marijuana,Saturday,2017-04-01,02:10,northern,"ARREST, BOOKED",-122.43959183986,37.783850873845424,POINT (-122.43959183986001 37.783850873845424)
3,16071288616010,160712886,16010,DRUG/NARCOTIC,possession of marijuana,Friday,2016-09-02,17:30,park,"ARREST, BOOKED",-122.45351291112613,37.76869697865512,POINT (-122.45351291112611 37.76869697865512)
4,16054757016030,160547570,16030,DRUG/NARCOTIC,possession of marijuana for sales,Wednesday,2016-07-06,18:32,richmond,NONE,-122.46620466789288,37.77254053915932,POINT (-122.46620466789287 37.772540539159316)


In [9]:
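#count the rows in each police district (count() tallies the non-null values in every column)
incidents_by_district = full_years_incidents.groupby(['police_district']).count()

In [10]:
#keep a single count column; row_id should be filled in for every row, so its count is the number of incidents
clean_incidents_by_district = incidents_by_district[['row_id']].copy()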

In [11]:
clean_incidents_by_district = clean_incidents_by_district.reset_index()

In [12]:
#rename columns
clean_incidents_by_district.columns = ['police_district', 'number_of_incidents']

In [13]:
#sort by number of incidents
clean_incidents_by_district = clean_incidents_by_district.sort_values(by=['number_of_incidents'], ascending=False).reset_index(drop=True)

In [14]:
clean_incidents_by_district

,police_district,number_of_incidents
0,southern,4310
1,tenderloin,3073
2,park,2797
3,mission,2247
4,bayview,1999
5,ingleside,1174
6,taraval,1059
7,northern,1025
8,richmond,728
9,central,597
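
The counting, renaming, and sorting steps above can also be written as one chain. This is only a sketch on made-up rows (the `toy` dataframe is hypothetical), and it uses `size()` rather than `count()`: `size()` counts rows per group, while `count()` counts non-null values per column. The two agree whenever the counted column has no missing values.

```python
import pandas as pd

# Made-up, already deduplicated incidents.
toy = pd.DataFrame({
    'incident_number': ['100', '200', '300', '400'],
    'police_district': ['park', 'mission', 'park', 'park'],
})

counts = (
    toy.groupby('police_district')
       .size()                                      # rows per district
       .reset_index(name='number_of_incidents')     # back to a two-column dataframe
       .sort_values('number_of_incidents', ascending=False)
       .reset_index(drop=True)
)
print(counts)
```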


So there we have it! Those are all the marijuana-related incidents the SF Police Department responded to from 2003 through 2020, broken down by police district. The incidents are heavily concentrated in a few districts: Southern, Tenderloin, and Park together account for a little over half of the 19,009 total. An interesting follow-up question would be to investigate why. Do more people live in those districts? Are more marijuana crimes committed there? Do the police enforce marijuana laws differently there than in other parts of the city?
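
To put a number on how concentrated the incidents are, we can turn the counts into shares of the total. This sketch re-enters the counts from the table above by hand so that it runs on its own; the `share_pct` column and the `top_three_share` variable are names made up for this example.

```python
import pandas as pd

# Counts copied from the table above.
counts = pd.DataFrame({
    'police_district': ['southern', 'tenderloin', 'park', 'mission', 'bayview',
                        'ingleside', 'taraval', 'northern', 'richmond', 'central'],
    'number_of_incidents': [4310, 3073, 2797, 2247, 1999, 1174, 1059, 1025, 728, 597],
})

total = counts['number_of_incidents'].sum()

# Each district's share of all incidents, as a percentage.
counts['share_pct'] = (counts['number_of_incidents'] / total * 100).round(1)

# Combined share of the three districts with the most incidents.
top_three_share = counts['number_of_incidents'].head(3).sum() / total

print(counts)
print(total, round(top_three_share * 100, 1))
```

Shares only describe where incidents were recorded. They don't account for how many people live in each district, so they can't answer the follow-up questions on their own.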

Let's visualize our data:

In [15]:
alt.Chart(clean_incidents_by_district).mark_bar().encode(
    x='police_district',
    y='number_of_incidents'
).properties(
    title='San Francisco Police: Marijuana Incidents by Police District 2003-2020'
)

That's the end of this analysis!