## Misclassified Crimes in LAPD Data (Los Angeles Times)

- [Times Investigation: LAPD misclassified nearly 1,200 violent crimes as minor offenses](https://www.latimes.com/local/la-me-crimestats-lapd-20140810-story.html)
- [LAPD underreported serious assaults, skewing crime stats for 8 years](https://www.latimes.com/local/cityhall/la-me-crime-stats-20151015-story.html)
- [How we reported this story](https://www.latimes.com/local/cityhall/la-me-crime-stats-side-20151015-story.html)

## Definitions

>**Aggravated Assault:** An unlawful attack by one person upon another for the purpose of inflicting severe or aggravated bodily injury. This type of assault usually is accompanied by the use of a weapon or by means likely to produce death or great bodily harm.


>**Other Assault:** Simple, Not Aggravated. Includes all assaults which do not involve the use of a firearm, knife, cutting instrument, or other dangerous weapon and in which the victim did not sustain serious or aggravated injuries. 
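Read together, the two definitions suggest a rough decision rule: a weapon or a serious injury points to Aggravated Assault, and anything else is Other Assault. The sketch below is one naive reading of that rule. It is not LAPD's or the FBI's actual procedure. The keyword lists and the `rule_of_thumb_label` helper are hypothetical, and real incident descriptions need far more care than keyword matching.

```python
import re

# Hypothetical, illustrative keyword lists loosely derived from the definitions above;
# they are NOT an official taxonomy.
WEAPON_TERMS = ["gun", "firearm", "pistol", "rifle", "knife", "blade", "bat", "machete", "stabbed", "shot"]
SERIOUS_INJURY_TERMS = ["fracture", "broken", "unconscious", "hospitalized", "stitches", "bleeding"]


def rule_of_thumb_label(description: str) -> str:
    """Return 'Aggravated Assault' if the text mentions a weapon or serious injury, else 'Other Assault'."""
    text = description.lower()
    terms = WEAPON_TERMS + SERIOUS_INJURY_TERMS
    # Whole-word matches only, so "bat" does not fire on "battery"
    if any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms):
        return "Aggravated Assault"
    return "Other Assault"


print(rule_of_thumb_label("Suspect stabbed victim with a knife"))       # Aggravated Assault
print(rule_of_thumb_label("Suspect pushed victim during an argument"))  # Other Assault
```

A rule this blunt will misfire in both directions, and that is part of why the exercise below asks people, and then a language model, to do the labeling.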

## Our Data Sample

The full dataset has hundreds of thousands of rows. For now we work with a sample of 100, stored in this Google Sheet: https://docs.google.com/spreadsheets/d/1LZ72b3cgVi7mhryMiromE3eT86DSnfna1cXjX-jLvGk/edit#gid=0
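The sample here contains 50 incidents that LAPD labeled Aggravated Assault and 50 that it labeled Other Assault, as the `support` column in the classification report at the end shows. We don't know how the sheet's rows were actually chosen. One way to draw a balanced sample like this from the full data is per-label sampling with pandas' `groupby(...).sample(...)` (pandas 1.1+). The sketch below assumes the full data sits in a DataFrame with the same `lapd` label column. `balanced_sample` and `full_df` are hypothetical names, and the demo frame is synthetic.

```python
import pandas as pd


def balanced_sample(full_df: pd.DataFrame, label_col: str = "lapd",
                    per_label: int = 50, seed: int = 0) -> pd.DataFrame:
    """Draw the same number of rows for each label; a fixed seed makes it reproducible."""
    return (
        full_df.groupby(label_col)
        .sample(n=per_label, random_state=seed)
        .reset_index(drop=True)
    )


# Synthetic stand-in for the full dataset (deliberately imbalanced)
full_df = pd.DataFrame({
    "description": [f"incident {i}" for i in range(300)],
    "lapd": ["Aggravated Assault"] * 100 + ["Other Assault"] * 200,
})
sample = balanced_sample(full_df)
print(sample["lapd"].value_counts())
```

A balanced sample makes the per-class metrics easier to compare. It does not reflect how often each label occurs in the real data.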

## Load the data

In [20]:
%matplotlib inline
import csv, requests, os
import pandas as pd
import numpy as np

In [21]:
def make_regular_gsheet_url(doc_id, sheet_id):
    return f"https://docs.google.com/spreadsheets/d/{doc_id}/edit#gid={sheet_id}"

def make_csv_gsheet_url(doc_id, sheet_id):
    return f"https://docs.google.com/spreadsheets/d/{doc_id}/export?format=csv&id={doc_id}&gid={sheet_id}"

GOOGLE_SHEET_ID = '1LZ72b3cgVi7mhryMiromE3eT86DSnfna1cXjX-jLvGk'
print("Querying Doc:", make_regular_gsheet_url(GOOGLE_SHEET_ID, "0"))
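response = requests.get(make_csv_gsheet_url(GOOGLE_SHEET_ID, "0"))
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
reader = csv.reader(response.text.splitlines())
header = next(reader)  # first row holds the column names
df = pd.DataFrame(list(reader), columns=header)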


Querying Doc: https://docs.google.com/spreadsheets/d/1LZ72b3cgVi7mhryMiromE3eT86DSnfna1cXjX-jLvGk/edit#gid=0
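The `csv.reader` loop above works. A shorter alternative is to let pandas parse the CSV text directly. In the sketch below, `csv_text_to_frame` is a hypothetical helper. `dtype=str` and `keep_default_na=False` are set so the result matches the `csv.reader` version, where every cell is a string and empty cells stay as `""`. In this notebook you would call it as `csv_text_to_frame(response.text)`.

```python
import io

import pandas as pd


def csv_text_to_frame(csv_text: str) -> pd.DataFrame:
    """Parse CSV text (e.g. a Google Sheets CSV export) into a DataFrame of strings."""
    # dtype=str: keep every column as text, like csv.reader does;
    # keep_default_na=False: keep empty cells as "" rather than NaN
    return pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)


example = (
    "description,lapd\n"
    "Suspect hit victim with a bat,Aggravated Assault\n"
    "Suspect shoved victim,Other Assault\n"
)
frame = csv_text_to_frame(example)
print(frame.shape)  # (2, 2)
```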


# You are the classifier 👈


Using the definitions above, label each incident you have been assigned as either `Other Assault` or `Aggravated Assault`.
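Once you have labeled your rows, you can measure how often you agree with LAPD. The sketch below assumes you record your answers in a new column. `my_label` and `agreement_rate` are hypothetical names, and the rows are made up for illustration.

```python
import pandas as pd


def agreement_rate(frame: pd.DataFrame, mine: str = "my_label", theirs: str = "lapd") -> float:
    """Fraction of rows where your label matches the LAPD label."""
    return float((frame[mine] == frame[theirs]).mean())


labeled = pd.DataFrame({
    "lapd": ["Aggravated Assault", "Other Assault", "Other Assault", "Aggravated Assault"],
    "my_label": ["Aggravated Assault", "Aggravated Assault", "Other Assault", "Aggravated Assault"],
})
print(agreement_rate(labeled))  # 0.75 (3 of 4 rows match)
```

This is the same quantity that `accuracy_score` reports for the model further down.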

## ChatGPT as the classifier 🤖

In [22]:
import os
from dotenv import load_dotenv
load_dotenv()
openai_key = os.getenv("OPENAI_API_KEY")

In [23]:
from openai import OpenAI
client = OpenAI(api_key=openai_key)  # key loaded from .env by load_dotenv() above

MODEL = 'gpt-3.5-turbo'

def ask_chatgpt_to_classify(text_description):
    """Ask the model to label one incident description as Aggravated or Other Assault."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a classifier that helps to classify between two categories.\n\n"
                    "Aggravated Assault: An unlawful attack by one person upon another for the purpose "
                    "of inflicting severe or aggravated bodily injury. This type of assault usually is "
                    "accompanied by the use of a weapon or by means likely to produce death or great "
                    "bodily harm.\n\n"
                    "Other Assault: Simple, Not Aggravated. Includes all assaults which do not involve "
                    "the use of a firearm, knife, cutting instrument, or other dangerous weapon and in "
                    "which the victim did not sustain serious or aggravated injuries.\n\n"
                    "I'll give you various snippets and I'd like for you to categorize them as one or "
                    "the other. Please provide only the response 'Aggravated Assault' or 'Other Assault'"
                ),
            },
            {"role": "user", "content": text_description},
        ],
        temperature=0,  # minimize randomness for a classification task
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )

    # Strip surrounding whitespace so the label can match the `lapd` column exactly
    return response.choices[0].message.content.strip()
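Even at `temperature=0`, a chat model is not guaranteed to return exactly one of the two strings. It might add a period, change the capitalization, or explain itself. The comparison with the `lapd` column below is an exact string match, so it may be worth mapping raw replies onto the two labels first. In the sketch below, `normalize_label` is a hypothetical helper. Any reply it cannot map unambiguously is flagged as `'Unparsed'` rather than guessed.

```python
VALID_LABELS = ("Aggravated Assault", "Other Assault")


def normalize_label(raw_reply: str) -> str:
    """Map a free-text model reply onto one of the two labels, or 'Unparsed'."""
    text = raw_reply.strip().lower()
    matches = [label for label in VALID_LABELS if label.lower() in text]
    # Accept only replies that mention exactly one of the labels
    if len(matches) == 1:
        return matches[0]
    return "Unparsed"


print(normalize_label("aggravated assault."))                              # Aggravated Assault
print(normalize_label("Other Assault"))                                    # Other Assault
print(normalize_label("It could be Aggravated Assault or Other Assault"))  # Unparsed
```

After the model has run, `df[MODEL] = df[MODEL].map(normalize_label)` would apply it. Any `'Unparsed'` rows would then show up as their own column in the crosstab instead of silently counting as errors.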

In [24]:
from tqdm.notebook import tqdm
tqdm.pandas()
df[MODEL] = df['description'].progress_apply(ask_chatgpt_to_classify)
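The cell above makes one API call per row in sequence. A transient failure partway through, such as a rate limit, would stop the whole `progress_apply`. A generic retry wrapper with exponential backoff is one way to guard against that. This is a sketch: `with_retries` is a hypothetical helper, it retries on any exception rather than on specific OpenAI error classes, and the delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[str], T], max_attempts: int = 4,
                 base_delay: float = 1.0) -> Callable[[str], T]:
    """Wrap a one-argument function so failures are retried with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def wrapper(arg: str) -> T:
        attempt = 1
        while True:
            try:
                return func(arg)
            except Exception:
                if attempt >= max_attempts:
                    raise  # out of attempts: re-raise the last error
                time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ... with the defaults
                attempt += 1

    return wrapper
```

Usage: `df[MODEL] = df['description'].progress_apply(with_retries(ask_chatgpt_to_classify))`.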


## Calculate precision and recall against LAPD's labels

In [25]:
pd.crosstab(df['lapd'], df[MODEL])

| lapd (rows) / gpt-3.5-turbo (columns) | Aggravated Assault | Other Assault |
|---|---|---|
| Aggravated Assault | 50 | 0 |
| Other Assault | 35 | 15 |
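If Aggravated Assault is treated as the positive class, the crosstab contains everything the metrics need. Of the 85 incidents the model called aggravated, 50 were aggravated according to LAPD, so precision is 50/85. The model also caught all 50 of LAPD's aggravated cases, so recall is 50/50. Only the four counts come from the table above. The arithmetic below is a hand check of what scikit-learn reports next, not a new result.

```python
# Counts copied from the crosstab above (rows: LAPD label, columns: model label),
# with "Aggravated Assault" as the positive class
tp = 50  # LAPD aggravated, model aggravated
fn = 0   # LAPD aggravated, model other
fp = 35  # LAPD other, model aggravated
tn = 15  # LAPD other, model other

precision = tp / (tp + fp)  # 50 / 85
recall = tp / (tp + fn)     # 50 / 50
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# precision=0.59 recall=1.00 f1=0.74 accuracy=0.65
```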


In [33]:
# use sklearn to calculate precision, recall, f1 and accuracy
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
print(f"Accuracy score is: {accuracy_score(df['lapd'], df[MODEL])}\n\n\n")
print(classification_report(df['lapd'], df[MODEL]))

Accuracy score is: 0.65



                    precision    recall  f1-score   support

Aggravated Assault       0.59      1.00      0.74        50
     Other Assault       1.00      0.30      0.46        50

          accuracy                           0.65       100
         macro avg       0.79      0.65      0.60       100
      weighted avg       0.79      0.65      0.60       100
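The errors all run in one direction. The model never labels an LAPD Aggravated Assault as Other Assault, but it calls 35 of LAPD's 50 Other Assault cases aggravated. Those 35 rows deserve a close read. The Times reporting linked at the top found that LAPD itself had downgraded serious assaults, so a disagreement does not automatically mean the model is the one that is wrong. The sketch below filters those rows out. `disagreements` is a hypothetical helper, and the demo frame is synthetic. In this notebook the call would be `disagreements(df, MODEL)`.

```python
import pandas as pd


def disagreements(frame: pd.DataFrame, model_col: str, truth_col: str = "lapd") -> pd.DataFrame:
    """Rows LAPD filed as Other Assault that the model called Aggravated Assault."""
    mask = (frame[truth_col] == "Other Assault") & (frame[model_col] == "Aggravated Assault")
    return frame.loc[mask, ["description", truth_col, model_col]]


demo = pd.DataFrame({
    "description": ["hit with a bottle", "shoved", "punched once"],
    "lapd": ["Other Assault", "Other Assault", "Aggravated Assault"],
    "gpt-3.5-turbo": ["Aggravated Assault", "Other Assault", "Aggravated Assault"],
})
result = disagreements(demo, "gpt-3.5-turbo")
print(result)
```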

