# San Francisco Crime Data Analysis 

### Load some crime-data into your Jupyter notebook
The data we will be working with is a perfect fit for `pandas`, so if you don't know it yet, a good approach is to work through the [Pandas Tutorial](https://www.w3schools.com/python/pandas/default.asp) and figure out how to load data into `pandas`.
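
As a rough illustration of what loading looks like, here is a minimal sketch that reads a tiny, made-up CSV from memory instead of the real file. Only the `Category` column name is confirmed by this notebook; `Date` and `PdDistrict` are assumed names used for illustration, and the rows are invented.

```python
import io

import pandas as pd

# Made-up sample standing in for the real export. Only "Category" is
# confirmed by this notebook; the other column names are assumptions.
sample_csv = io.StringIO(
    "Category,Date,PdDistrict\n"
    "ROBBERY,01/01/2003,MISSION\n"
    "ASSAULT,01/02/2003,CENTRAL\n"
    "ROBBERY,01/03/2003,MISSION\n"
)

# usecols restricts loading to the columns you need, which can help
# with memory when the file is large.
df = pd.read_csv(sample_csv, usecols=["Category", "Date"])

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # names of the loaded columns
```

With the real download you would pass the path of the CSV file to `pd.read_csv` instead of the in-memory sample.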

> *Exercise 1*
>
> * Go to https://datasf.org/opendata/
> * Click on "Public Safety"
> * You will notice that the SF crime data is divided into two periods: one from 2003 to May 2018, and one covering all of 2018 to the present. **Today, to keep things easy, we will just work with the data from 2003 to 2018** (from January 1st 2003 to December 31st 2017 to be exact).
> * Thus, you may simply download all police incident reports, historical 2003 to May 2018. You can get everything as a big CSV file if you press the *Export* button (it's a snappy little ~500MB file).
> * To get this thing into `pandas`, you can use the tips and tricks described [here](https://www.shanelynn.ie/python-pandas-read_csv-load-data-from-csv-files/). If you want to try your luck without `pandas`, you can use the `csv` package to load the file.
> * Now generate the following simple statistics
>   - Report the total number of crimes in the dataset.
>   - List the various categories of crime. How many are there? 
>   - List the number of crimes in each category.
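
For those who want to try the `csv` route mentioned above, the three statistics could be computed along these lines. This is a minimal sketch, not the reference solution: the rows are made up, the `Descript` column name is an assumption, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the real file. With the actual download
# you would use open(path_to_csv, newline="") instead of io.StringIO.
sample_csv = io.StringIO(
    "Category,Descript\n"
    "ROBBERY,example a\n"
    "ASSAULT,example b\n"
    "ROBBERY,example c\n"
)

# DictReader yields one dict per row, keyed by the header names.
reader = csv.DictReader(sample_csv)
category_counts = Counter(row["Category"] for row in reader)

total_crimes = sum(category_counts.values())
print("Total number of crimes:", total_crimes)
print("Number of categories:", len(category_counts))

# most_common() lists categories from most to least frequent.
for category, count in category_counts.most_common():
    print(category, count)
```

Because `Counter` consumes the reader one row at a time, this approach never holds the whole file in memory.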

---

In order to do awesome *predictive policing*, we're going to dissect the SF crime-data quite thoroughly to figure out what has been going on over the years on the San Francisco crime scene.

> *Exercise 2*: The types of crimes. The first field we'll dig into is the column "Category".
> * We have already counted the number of crimes in each category. What is the most commonly occurring category of crime? What is the least frequently occurring?
> * Create a bar-plot over crime occurrences. First essential lesson regarding data visualization: **For a plot to be informative you need to label the axes** (the police chief will be furious if you forget). It can also be nice to add other relevant pieces of info (title, labels, etc.). Mine looks like this (but yours doesn't have to look exactly like mine; the important thing is that you clearly communicate the information in the dataset).

<div>
<img src="https://raw.githubusercontent.com/suneman/socialdata2022/main/files/CrimeOccurrencesByCategory.png" width="700"/>
</div>

<small><div style='text-align:right'>(C) Sune Lehmann (https://github.com/suneman/socialdata2024/blob/main/lectures/Week1.ipynb)</div></small>
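
One possible way to produce a labeled bar plot is sketched below with `matplotlib`. To keep the example self-contained it uses only the seven largest category counts printed in this notebook's output rather than the full dataset; the axis labels, title, figure size, and output file name are illustrative choices, not part of the exercise.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# The seven largest counts as printed in this notebook's output.
category_counts = pd.Series(
    {
        "LARCENY/THEFT": 477975,
        "OTHER OFFENSES": 301874,
        "NON-CRIMINAL": 236928,
        "ASSAULT": 167042,
        "VEHICLE THEFT": 126228,
        "DRUG/NARCOTIC": 117821,
        "VANDALISM": 114718,
    }
)

fig, ax = plt.subplots(figsize=(10, 5))
category_counts.sort_values(ascending=False).plot(kind="bar", ax=ax)

# Label the axes and add a title so the plot can be read on its own.
ax.set_xlabel("Crime category")
ax.set_ylabel("Number of incidents")
ax.set_title("Crime occurrences by category (partial data)")

fig.tight_layout()
fig.savefig("crime_by_category.png")  # hypothetical output file name
```

With the full dataset you would pass the result of `value_counts()` on the `Category` column in place of the hand-typed series.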

```python
# !pip install pandas
import pandas as pd

dataFile = pd.read_csv('PD_Reports_2003-2018.csv')

# Get the total number of rows/crimes
total_rows = dataFile.shape[0]

print("Total number of rows:", total_rows)

# Get the unique categories
unique_categories = dataFile['Category'].unique()

# Get the number of unique categories
num_unique_categories = len(unique_categories)

print("Number of different categories:", num_unique_categories)
print("List of categories:", unique_categories)

# Get the counts of each category
category_counts = dataFile['Category'].value_counts()

# Print the number of times each category appears
print("Number of crimes each category:")
print(category_counts)
```

Total number of rows: 2129525
Number of different categories: 37
List of categories: ['ROBBERY' 'VEHICLE THEFT' 'ARSON' 'ASSAULT' 'TRESPASS' 'BURGLARY'
 'LARCENY/THEFT' 'WARRANTS' 'OTHER OFFENSES' 'DRUG/NARCOTIC'
 'SUSPICIOUS OCC' 'LIQUOR LAWS' 'VANDALISM' 'WEAPON LAWS' 'NON-CRIMINAL'
 'MISSING PERSON' 'FRAUD' 'SEX OFFENSES, FORCIBLE' 'SECONDARY CODES'
 'DISORDERLY CONDUCT' 'RECOVERED VEHICLE' 'KIDNAPPING'
 'FORGERY/COUNTERFEITING' 'PROSTITUTION' 'DRUNKENNESS' 'BAD CHECKS'
 'DRIVING UNDER THE INFLUENCE' 'LOITERING' 'STOLEN PROPERTY' 'SUICIDE'
 'BRIBERY' 'EXTORTION' 'EMBEZZLEMENT' 'GAMBLING' 'PORNOGRAPHY/OBSCENE MAT'
 'SEX OFFENSES, NON FORCIBLE' 'TREA']
Number of crimes each category:
Category
LARCENY/THEFT                  477975
OTHER OFFENSES                 301874
NON-CRIMINAL                   236928
ASSAULT                        167042
VEHICLE THEFT                  126228
DRUG/NARCOTIC                  117821
VANDALISM                      114718
WARRANTS                       