### The purpose of this notebook is to answer one of my analysis questions:

#### How many marijuana-involved incidents did the SFPD respond to each year over time?

In [1]:
# import modules
import pandas as pd



Import our cleaned dataset that contains all of our marijuana incidents. We made this .csv file in the data_cleaning notebook.

In [2]:
mari_incidents = pd.read_csv('all_data_marijuana.csv', dtype=str)

Convert our incident dates to a datetime data format.

In [3]:
mari_incidents['incident_date'] = pd.to_datetime(mari_incidents['incident_date'])

Check the date range the data covers.

In [4]:
mari_incidents['incident_date'].min()

Timestamp('2003-01-01 00:00:00')

In [5]:
mari_incidents['incident_date'].max()

Timestamp('2021-10-09 00:00:00')

It looks like we have a full year of data for 2003, our earliest year. But since the 2021 data ends in October, we can't do a full annual analysis for that year. So let's make a dataframe that holds only the complete years, 2003 through 2020.

In [6]:
full_years = mari_incidents[
    (mari_incidents['incident_date'] >= '2003-01-01') &
    (mari_incidents['incident_date'] < '2021-01-01')
].reset_index(drop=True)
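As a quick sanity check on the filter above, here is a minimal sketch on made-up rows (the `toy` frame and its values are hypothetical, not from our dataset). It shows that the start bound is inclusive and the end bound is exclusive, so December 31, 2020 stays in and everything from 2021 drops out.

```python
import pandas as pd

# Made-up rows standing in for mari_incidents (illustration only).
toy = pd.DataFrame({
    'incident_number': ['a', 'b', 'c', 'd'],
    'incident_date': pd.to_datetime(
        ['2003-01-01', '2020-12-31', '2021-01-01', '2021-10-09']
    ),
})

# Same half-open filter as in the cell above: >= start, < end.
toy_full_years = toy[
    (toy['incident_date'] >= '2003-01-01') &
    (toy['incident_date'] < '2021-01-01')
].reset_index(drop=True)

# Only the 2003 and 2020 rows should remain.
print(toy_full_years)
```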

We know from our data dictionary that some individual incidents have multiple row entries, and that the incident_number stays the same across all entries for the same incident. Since we're only counting how many incidents there were each year, we can drop the duplicates in the incident_number column so that each incident appears once:

In [7]:
full_years_incidents = full_years.drop_duplicates(subset=['incident_number'])
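To see what that does, here is a small sketch on made-up rows (the `toy` frame is hypothetical). By default `drop_duplicates` keeps the first row it finds for each incident_number, so the result is fine for counting incidents, but the other columns only reflect that first row.

```python
import pandas as pd

# Made-up rows: incident '100' was entered twice with different descriptions.
toy = pd.DataFrame({
    'incident_number': ['100', '100', '101'],
    'incident_description': ['possession', 'sale', 'possession'],
})

# keep='first' is the default, so the first row per incident_number survives.
toy_incidents = toy.drop_duplicates(subset=['incident_number'])

# Row count after dropping should match the number of unique incident numbers.
print(len(toy), len(toy_incidents), toy['incident_number'].nunique())
```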

Now we want to get annual counts of incidents using groupby, with one group per calendar year.

In [8]:
# freq='A' groups by calendar year (year-end); pandas 2.2+ prefers the alias 'YE'
annual_mari_incidents = full_years_incidents.groupby(pd.Grouper(key='incident_date', freq='A')).count()

In [9]:
annual_mari_incidents

incident_date,row_id,incident_number,incident_code,incident_category,incident_description,day_of_week,incident_time,police_district,resolution,longitude,latitude,the_geom
2003-12-31,1701,1701,1701,1701,1701,1701,1701,1701,1701,1701,1701,1701
2004-12-31,1579,1579,1579,1579,1579,1579,1579,1579,1579,1579,1579,1579
2005-12-31,1125,1125,1125,1125,1125,1125,1125,1125,1125,1125,1125,1125
2006-12-31,1216,1216,1216,1216,1216,1216,1216,1216,1216,1216,1216,1216
2007-12-31,1636,1636,1636,1636,1636,1636,1636,1636,1636,1636,1636,1636
2008-12-31,1880,1880,1880,1880,1880,1880,1880,1880,1880,1880,1880,1880
2009-12-31,1973,1973,1973,1973,1973,1973,1973,1973,1973,1973,1973,1973
2010-12-31,1654,1654,1654,1654,1654,1654,1654,1654,1654,1654,1654,1654
2011-12-31,1086,1086,1086,1086,1086,1086,1086,1086,1086,1086,1086,1086
2012-12-31,1073,1073,1073,1073,1073,1073,1073,1073,1073,1073,1073,1073
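Every column in the table above shows the same number because `count()` tallies non-null values per column, and none of these columns has missing values in the years shown. If we only want one number per year, a sketch like the following may be simpler. It uses made-up rows (the `toy` frame and the `year` series name are hypothetical) and groups on the year number instead of a frequency alias.

```python
import pandas as pd

# Made-up rows with one missing resolution (illustration only).
toy = pd.DataFrame({
    'incident_number': ['100', '101', '102', '103'],
    'incident_date': pd.to_datetime(
        ['2003-02-01', '2003-07-15', '2004-03-09', '2004-11-30']
    ),
    'resolution': ['NONE', None, 'ARREST', 'NONE'],
})

# Pull the calendar year out of each date to use as the group key.
year = toy['incident_date'].dt.year.rename('year')

# count() reports non-null values per column, so columns can disagree.
per_column = toy.groupby(year).count()

# size() counts rows per group and returns a single Series.
per_year = toy.groupby(year).size()

print(per_column)
print(per_year)
```

With missing values, `count()` would undercount in the affected column while `size()` would not, which is why `size()` can be the safer choice for a plain incident tally.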


### Next steps: look at the Berkeley calls, part 2. I need to subset to just the date and case number columns, then clean up the result and probably visualize it.