# Assault detection

Using police crime reports, we're going to see whether we can detect the severity of an assault from its written description. It's part of a larger analysis [done by the LA Times](https://www.latimes.com/la-me-g-lapd-reclass-htmlstory.html).

## Imports

First we'll set some options up to make everything display correctly. It's mostly because these assault descriptions can be quite long, and the default is to truncate text after a few words.

In [2]:
import pandas as pd

pd.set_option('display.max_colwidth', 200)
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 300)
pd.set_option('display.min_rows', 20)

## Reading in our data

Our dataset is going to be a database of crimes committed between 2008 and 2012. The data has been cleaned and filtered a bit, though, so we're only left with two columns:

* `CCDESC`, what criminal code was violated
* `DO_NARRATIVE`, a short text description of what happened

We're going to use this description to see if we can separate serious cases of assault from non-serious ones.

In [3]:
df = pd.read_csv("../2008-2012.csv")
df.head(10)

Unnamed: 0,CCDESC,DO_NARRATIVE
0,SHOPLIFTING - PETTY THEFT ($950 & UNDER),DO-SUSP WAS SEEN THROUGH SURVAILANCE CONCEALING SEVERAL ITEMS INTO HER SHOPPING AND PERSONAL BAG LEAVING WITHOUT PAYING DEPT STORE
1,VIOLATION OF COURT ORDER,DO-SUSP ARRIVED AT VICTS RESID AND ENTERED VICTS RESID IN VIOLATION OF RESTRAINING ORDER
2,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S APPRCHED V AND STATED ARE YOU GOING TO FCK ME V REPLIED NO SUSP PULL ED OUT A KNIFE AND STATED IM HERE TO HURT YOU BTCH S USED PROFANITIES
3,THEFT PLAIN - PETTY ($950 & UNDER),DO-UNK SUSP TOOK VICT PREPAID GIFT CARD SUSP PURCHASED PRODUCTS WITH ITEM
4,BATTERY - SIMPLE ASSAULT,DO-SUSP USED RIGHT FIST TO PUNCH VICT IN THE HEAD ONCE N PULL VICT HAIR FOR APPRX 15 SECONDS
5,THEFT OF IDENTITY,DO-UNK SUSP USED VICTS PERSONAL INFO FOR GAIN WITHOUT THE VICTS CONSENT ORKNOWLEDGE
6,SHOPLIFTING - PETTY THEFT ($950 & UNDER),DO-SUSP ENTERED MKT AND SEL ITEMS SUSP CONCEALED ITEMS AND EXITED STORE WOPAYING
7,BURGLARY,DO-UNK SUSP ENTERED VICTS RESIDENCE BY UNLOCKED FRONT DOOR SUSP REMOVED VCTICTS PROPERTY SUSP FLED LOC
8,OTHER MISCELLANEOUS CRIME,DO-SUSP ADMITTED TO PLACING 2010 REG TAG HE ILLEGALLY OBTAINED ON HIS LIC PLATE HIS VEH REG WAS STILL EXP
9,BATTERY - SIMPLE ASSAULT,DO-S APPROACHED V IN VEH S SLAPPED AND LUNGGED AT V


## Filtering for assaults

First, filter the dataset so it **only includes assaults.** No burglary, no identity theft, no shoplifting: just every kind of assault.

In [4]:
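# .copy() gives us an independent DataFrame, so adding columns to it later
# doesn't trigger pandas' SettingWithCopyWarning
df = df[df.CCDESC.str.contains("ASSAULT")].copy()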

In [5]:
df.head()

Unnamed: 0,CCDESC,DO_NARRATIVE
2,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S APPRCHED V AND STATED ARE YOU GOING TO FCK ME V REPLIED NO SUSP PULL ED OUT A KNIFE AND STATED IM HERE TO HURT YOU BTCH S USED PROFANITIES
4,BATTERY - SIMPLE ASSAULT,DO-SUSP USED RIGHT FIST TO PUNCH VICT IN THE HEAD ONCE N PULL VICT HAIR FOR APPRX 15 SECONDS
9,BATTERY - SIMPLE ASSAULT,DO-S APPROACHED V IN VEH S SLAPPED AND LUNGGED AT V
11,BATTERY - SIMPLE ASSAULT,DO-V STATED THAT SUSP CONFRT HER WHEN SHE TRIED TO APPR HER HUSBAND SUSP AND V HUSBAND ARE FRNDS SUSP YELLED STAY AWAY FROM HIM AND PUSHED V
16,BATTERY - SIMPLE ASSAULT,DO-SUSPS WERE VERBALLY ABUSING VICT DURING WHICH TIME S1 STRUCK VICT THREETIMES ON THE BACK OF HIS LEFT SHOULDER


In [6]:
# What are our possible crime descriptions?
df.CCDESC.value_counts()

BATTERY - SIMPLE ASSAULT                          71951
ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT    43385
INTIMATE PARTNER - SIMPLE ASSAULT                 42102
CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT            4297
INTIMATE PARTNER - AGGRAVATED ASSAULT              1606
CHILD ABUSE (PHYSICAL) - AGGRAVATED ASSAULT        1481
ASSAULT WITH DEADLY WEAPON ON POLICE OFFICER        749
OTHER ASSAULT                                       394
Name: CCDESC, dtype: int64

In [7]:
df['is_aggravated'] = df.CCDESC.replace({
    'BATTERY - SIMPLE ASSAULT': 0,
    'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT': 1,
    'INTIMATE PARTNER - SIMPLE ASSAULT': 0,
    'CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT': 0,
    'INTIMATE PARTNER - AGGRAVATED ASSAULT': 1,
    'CHILD ABUSE (PHYSICAL) - AGGRAVATED ASSAULT': 1,
    'ASSAULT WITH DEADLY WEAPON ON POLICE OFFICER': 1,
    'OTHER ASSAULT': 0
})
df.head()

Unnamed: 0,CCDESC,DO_NARRATIVE,is_aggravated
2,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S APPRCHED V AND STATED ARE YOU GOING TO FCK ME V REPLIED NO SUSP PULL ED OUT A KNIFE AND STATED IM HERE TO HURT YOU BTCH S USED PROFANITIES,1
4,BATTERY - SIMPLE ASSAULT,DO-SUSP USED RIGHT FIST TO PUNCH VICT IN THE HEAD ONCE N PULL VICT HAIR FOR APPRX 15 SECONDS,0
9,BATTERY - SIMPLE ASSAULT,DO-S APPROACHED V IN VEH S SLAPPED AND LUNGGED AT V,0
11,BATTERY - SIMPLE ASSAULT,DO-V STATED THAT SUSP CONFRT HER WHEN SHE TRIED TO APPR HER HUSBAND SUSP AND V HUSBAND ARE FRNDS SUSP YELLED STAY AWAY FROM HIM AND PUSHED V,0
16,BATTERY - SIMPLE ASSAULT,DO-SUSPS WERE VERBALLY ABUSING VICT DURING WHICH TIME S1 STRUCK VICT THREETIMES ON THE BACK OF HIS LEFT SHOULDER,0
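
The lookup table above spells out every charge by hand. As a rough cross-check, here's a minimal sketch that derives the same label from keywords instead. The keyword rule is our own assumption, not something from the original analysis, and it only covers the eight charge descriptions listed in the `value_counts()` output.

```python
import pandas as pd

# The eight charge descriptions and the labels assigned to them above
labels = {
    'BATTERY - SIMPLE ASSAULT': 0,
    'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT': 1,
    'INTIMATE PARTNER - SIMPLE ASSAULT': 0,
    'CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT': 0,
    'INTIMATE PARTNER - AGGRAVATED ASSAULT': 1,
    'CHILD ABUSE (PHYSICAL) - AGGRAVATED ASSAULT': 1,
    'ASSAULT WITH DEADLY WEAPON ON POLICE OFFICER': 1,
    'OTHER ASSAULT': 0,
}
ccdesc = pd.Series(list(labels.keys()))

# Hypothetical rule: aggravated if the charge mentions AGGRAVATED or DEADLY WEAPON
by_keyword = ccdesc.str.contains("AGGRAVATED|DEADLY WEAPON").astype(int)
by_lookup = ccdesc.map(labels)

# Do the two approaches agree on every charge?
print((by_keyword == by_lookup).all())
```

If a new charge description showed up in the data, the two approaches could disagree, so the explicit lookup is the safer choice.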


## Prepare our dataset

In [8]:
# Drop missing data
print('Started with', df.shape)
# We can either drop the rows with missing data in specific columns
# Or we could just do df = df.dropna()
df = df.dropna(subset=['DO_NARRATIVE', 'is_aggravated'])
print('Ended with', df.shape)

Started with (165965, 3)
Ended with (165965, 3)


In [9]:
# Vectorize
from sklearn.feature_extraction.text import TfidfVectorizer

# You are only going to take 2000 words to use!
# (it will be the 2000 most common words)
vectorizer = TfidfVectorizer(max_features=2000)
vectors = vectorizer.fit_transform(df.DO_NARRATIVE)
# On scikit-learn 1.2 and later, use get_feature_names_out() instead
words_df = pd.DataFrame(vectors.toarray(), columns=vectorizer.get_feature_names())
words_df.head()

Unnamed: 0,10,10yrs,11,12,13,14,15,16,18,18th,1x,1yr,20,2nd,2x,2yrs,30,38th,390,3rd,3x,3yrs,40,4th,4x,4yrs,54th,5th,5x,5yrs,6th,6yrs,7th,7yrs,8th,911,9mm,abdomen,able,about,above,abrasion,abrasions,abuse,abused,abusive,abv,accelerated,accidentally,accused,...,wilshire,wind,window,windows,windshield,wine,wit,wit1,wit2,with,witha,within,without,witness,wits,wndw,wo,woke,woman,wood,wooden,words,work,worker,workers,working,works,would,wound,wounds,wrapped,wre,wrench,wrestled,wrist,wrists,ws,wth,yard,year,years,yell,yelled,yelling,yells,you,your,youre,yr,yrs
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.353481,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.360997,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.236437,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
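
That table is almost all zeros, so it can be hard to see what the vectorizer did. Here's a small sketch on three made-up narratives (the sentences and the `toy_` names are ours, not from the dataset) showing that a rare word gets a higher score than a word that appears everywhere.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up narratives, written in the style of the dataset
docs = [
    "SUSP PUNCHED VICT",
    "SUSP STABBED VICT WITH KNIFE",
    "SUSP PUSHED VICT",
]

toy_vectorizer = TfidfVectorizer()
toy_vectors = toy_vectorizer.fit_transform(docs)  # a scipy sparse matrix

vocab = toy_vectorizer.vocabulary_   # word -> column index
row = toy_vectors[1].toarray()[0]    # scores for the "stabbed" narrative

# "knife" appears in one narrative and "susp" in all three,
# so within this row "knife" carries more weight
print(row[vocab["knife"]] > row[vocab["susp"]])
```

Note that `fit_transform` returns a sparse matrix. The classifiers we try later should accept that sparse matrix directly, which can save a lot of memory compared with calling `.toarray()` on the full dataset.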


In [10]:
# !pip install pystemmer

from sklearn.feature_extraction.text import TfidfVectorizer
import Stemmer

# English stemmer from pyStemmer
stemmer = Stemmer.Stemmer('en')

# Override TfidfVectorizer so every token is stemmed after the usual tokenizing
class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        analyzer = super().build_analyzer()
        return lambda doc: stemmer.stemWords(analyzer(doc))

# Create a new StemmedTfidfVectorizer
# vectorizer = StemmedTfidfVectorizer(max_features=2000)
vectorizer = StemmedTfidfVectorizer(max_features=1000)
vectors = vectorizer.fit_transform(df.DO_NARRATIVE)
words_df = pd.DataFrame(vectors.toarray(), columns=vectorizer.get_feature_names())
words_df.head()

Unnamed: 0,10,11,12,13,15,18th,1x,1yr,20,2x,2yr,390,3x,3yr,911,abdomen,abl,about,abov,abras,abus,abv,acceler,accus,across,adn,adv,advis,after,again,against,aggress,ago,aid,air,alcohol,all,alleg,alley,allow,almost,along,also,alt,alterc,am,an,and,anger,angri,...,want,warn,was,watch,water,way,wb,weapon,week,went,were,west,westbound,western,what,when,where,whi,which,while,white,who,wife,will,window,windshield,wit,wit1,with,witha,without,woke,wood,wooden,word,work,would,wound,wrap,wrench,wrestl,wrist,wth,yard,year,yell,you,your,yr,yrs
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103665,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.369356,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.439399,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.128562,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.113155,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.177124,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.219612,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.436526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.211785,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.334494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
# Stemming trims a word down to a root, so related forms count as one feature
stemmer.stemWord('shot')

'shot'

In [12]:
stemmer.stemWord('shots')

'shot'

In [13]:
stemmer.stemWord('shoot')

'shoot'

In [14]:
stemmer.stemWord('shooting')

'shoot'

In [15]:
stemmer.stemWord('oranges')

'orang'

In [89]:
stemmer.stemWord('sing')

'sing'

In [90]:
# An alternative to stemming is lemmatization, e.g. with spaCy

## Train/test split

In [11]:
# Set X (features) and y (labels)
# X would be df.DO_NARRATIVE, but that's text, not numbers!
# What we just did was use our vectorizer to convert df.DO_NARRATIVE
# into tf-idf word scores, and that's what we end up giving our model
X = words_df
y = df.is_aggravated

In [12]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
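
The split above is random, so the results will shift a little on every run. Here's a minimal sketch of a repeatable split that also keeps the class balance the same on both sides. It uses made-up labels with roughly the same imbalance as our data, and the seed of 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 28% aggravated (1), 72% simple (0)
y_toy = np.array([1] * 280 + [0] * 720)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy,
    stratify=y_toy,    # keep the class ratio the same in train and test
    random_state=42,   # arbitrary seed so the split is repeatable
)

# Share of aggravated cases in each half
print(y_tr.mean(), y_te.mean())
```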

## Create a classifier

Now we'll use this dataset to build a classifier that can detect whether an assault is aggravated assault or simple assault. What kind should we use?

In [13]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB

In [14]:
%%time
# clf = LogisticRegression(C=1e9, solver='lbfgs', max_iter=4000)
# clf = RandomForestClassifier(n_estimators=50)
clf = LinearSVC()
# clf = MultinomialNB()
clf.fit(X_train, y_train)

CPU times: user 2.55 s, sys: 22 ms, total: 2.58 s
Wall time: 2.59 s


LinearSVC()
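
Instead of commenting classifiers in and out by hand, we could loop over them. This is only a sketch on a handful of made-up narratives (the texts, labels and default settings are our assumptions), so the scores it prints say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Made-up narratives: 1 = aggravated, 0 = simple
texts = [
    "SUSP STABBED VICT WITH KNIFE", "SUSP SHOT VICT WITH HANDGUN",
    "SUSP HIT VICT WITH BAT", "SUSP FIRED ROUNDS AT VICT",
    "SUSP PUSHED VICT", "SUSP SLAPPED VICT",
    "SUSP PUNCHED VICT ONCE", "SUSP GRABBED VICT ARM",
] * 5
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 5

X_toy = TfidfVectorizer().fit_transform(texts)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
}

# Average accuracy over 4 cross-validation folds for each classifier
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X_toy, labels, cv=4).mean()
    print(name, round(scores[name], 3))
```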

## Evaluating our classifier

In [15]:
clf.score(X_test, y_test)

0.8959076448471994
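
Is 90% good? It helps to compare against a model that always guesses "not aggravated." A quick sketch using the counts from the `CCDESC.value_counts()` output earlier:

```python
# Counts taken from the CCDESC value_counts() output earlier in the notebook
simple = 71951 + 42102 + 4297 + 394       # every charge we labeled 0
aggravated = 43385 + 1606 + 1481 + 749    # every charge we labeled 1

# Accuracy of always predicting "not aggravated"
baseline = simple / (simple + aggravated)
print(round(baseline, 4))
```

That works out to roughly 72%, so the classifier's score is well above what the lazy guess would get.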

In [16]:
from sklearn.metrics import confusion_matrix

In [17]:
y_true = y_test
y_pred = clf.predict(X_test)
matrix = confusion_matrix(y_true, y_pred)

label_names = pd.Series(['not aggravated', 'aggravated'])
pd.DataFrame(matrix,
     columns='Predicted ' + label_names,
     index='Is ' + label_names)

Unnamed: 0,Predicted not aggravated,Predicted aggravated
Is not aggravated,28386,1168
Is aggravated,3151,8787


In [18]:
y_true = y_test
y_pred = clf.predict(X_test)
matrix = confusion_matrix(y_true, y_pred)

label_names = pd.Series(['not aggravated', 'aggravated'])
pd.DataFrame(matrix,
     columns='Predicted ' + label_names,
     index='Is ' + label_names).div(matrix.sum(axis=1), axis=0)

Unnamed: 0,Predicted not aggravated,Predicted aggravated
Is not aggravated,0.960479,0.039521
Is aggravated,0.263947,0.736053
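
The usual summary numbers can be read straight off the confusion matrix. Here's a short sketch that recomputes them from the counts shown above (the variable names are ours):

```python
import numpy as np

# Rows are the true label, columns are the prediction (copied from the output above)
matrix = np.array([[28386, 1168],
                   [3151, 8787]])
tn, fp, fn, tp = matrix.ravel()

accuracy = (tp + tn) / matrix.sum()
recall = tp / (tp + fn)       # share of real aggravated assaults the model caught
precision = tp / (tp + fp)    # share of "aggravated" predictions that were right

print(round(accuracy, 3), round(recall, 3), round(precision, 3))
```

The accuracy matches `clf.score`, and the recall matches the bottom-right cell of the normalized matrix: the model misses about a quarter of the aggravated assaults.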


## Explaining our classifier

In [19]:
# inside of our model are things called WEIGHTS
# and they're the things that make the decision
import eli5

eli5.show_weights(clf, top=(10, 10), feature_names=vectorizer.get_feature_names())

Weight?,Feature
+4.362,shot
+3.898,stab
+3.810,fire
+3.188,handgun
+2.993,bat
+2.928,shoot
+2.876,knife
+2.695,machet
+2.320,point
+2.279,pistol


## Which assaults get misclassified?

In [20]:
df.head(2)

Unnamed: 0,CCDESC,DO_NARRATIVE,is_aggravated
2,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S APPRCHED V AND STATED ARE YOU GOING TO FCK ME V REPLIED NO SUSP PULL ED OUT A KNIFE AND STATED IM HERE TO HURT YOU BTCH S USED PROFANITIES,1
4,BATTERY - SIMPLE ASSAULT,DO-SUSP USED RIGHT FIST TO PUNCH VICT IN THE HEAD ONCE N PULL VICT HAIR FOR APPRX 15 SECONDS,0


In [21]:
df['prediction'] = clf.predict(X)

# We also want a number that describes how sure the model is,
# i.e. how simple vs. how aggravated it thinks each assault is.

# Classifiers with predict_proba give a probability between 0 and 1
# (0 -> negative, 1 -> positive, 0.7 -> mostly positive)
# for aggravated
# df['pred_prob'] = clf.predict_proba(X)[:,1]
# for simple assaults
# df['pred_prob'] = clf.predict_proba(X)[:,0]

# LinearSVC has no predict_proba, so we use decision_function instead.
# Despite the column name, this is a score, not a probability:
# negative numbers -> simple assault
# positive numbers -> aggravated assault
# 0 is "i'm kind of not sure"
df['pred_prob'] = clf.decision_function(X)
df.head()

Unnamed: 0,CCDESC,DO_NARRATIVE,is_aggravated,prediction,pred_prob
2,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S APPRCHED V AND STATED ARE YOU GOING TO FCK ME V REPLIED NO SUSP PULL ED OUT A KNIFE AND STATED IM HERE TO HURT YOU BTCH S USED PROFANITIES,1,1,0.133621
4,BATTERY - SIMPLE ASSAULT,DO-SUSP USED RIGHT FIST TO PUNCH VICT IN THE HEAD ONCE N PULL VICT HAIR FOR APPRX 15 SECONDS,0,0,-0.876334
9,BATTERY - SIMPLE ASSAULT,DO-S APPROACHED V IN VEH S SLAPPED AND LUNGGED AT V,0,0,-0.028058
11,BATTERY - SIMPLE ASSAULT,DO-V STATED THAT SUSP CONFRT HER WHEN SHE TRIED TO APPR HER HUSBAND SUSP AND V HUSBAND ARE FRNDS SUSP YELLED STAY AWAY FROM HIM AND PUSHED V,0,0,-1.182245
16,BATTERY - SIMPLE ASSAULT,DO-SUSPS WERE VERBALLY ABUSING VICT DURING WHICH TIME S1 STRUCK VICT THREETIMES ON THE BACK OF HIS LEFT SHOULDER,0,0,-0.941609
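
To see how the score and the prediction relate, here's a small sketch on made-up narratives (texts and names are ours). For a two-class `LinearSVC`, the prediction is just whether the score is above zero.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Made-up narratives: 1 = aggravated, 0 = simple
texts = [
    "SUSP STABBED VICT WITH KNIFE", "SUSP SHOT VICT WITH HANDGUN",
    "SUSP HIT VICT WITH BAT", "SUSP FIRED ROUNDS AT VICT",
    "SUSP PUSHED VICT", "SUSP SLAPPED VICT",
    "SUSP PUNCHED VICT ONCE", "SUSP GRABBED VICT ARM",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X_toy = TfidfVectorizer().fit_transform(texts)
toy_clf = LinearSVC().fit(X_toy, labels)

scores = toy_clf.decision_function(X_toy)   # signed distance from the boundary
preds = toy_clf.predict(X_toy)

# A positive score means class 1, anything else means class 0
print(np.array_equal(preds, (scores > 0).astype(int)))
```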


In [131]:
# Recorded as aggravated, but the model reads the narrative as a simple assault
# These were potentially upgraded
df[(df.is_aggravated == 1) & (df.prediction == 0)]

Unnamed: 0,CCDESC,DO_NARRATIVE,is_aggravated,prediction,pred_prob
87,CHILD ABUSE (PHYSICAL) - AGGRAVATED ASSAULT,DO-V WAS STRUCK WITH BELT,1,0,-0.221965
187,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-BUSINESS DISPUTE SUPS PUNCHED VICT THREE TIMES IN THE FACE VICT FELL TOTHE GROUND SUPS THEN KICKED VICT IN THE FACE,1,0,-0.478885
349,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S PUNCHED V IN FACE AND STATED I DONT WANT YOU HERE V FELL TO GROUND S STRUCK V WITH TWO BY FOUR WOOD FOUR TO FIVE TIMES IN FACE AND BODY,1,0,-0.273212
499,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-V1 AND V2 WERE AT A FRIEND HOUSE WITH A FEW FRIENDS SUSP IT V2 IN THE FACE WITH PALM OF HIS HAND SUSP TRIED TO PUNCH V1 WITH FIST,1,0,-0.686014
568,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO- S1 STRUCK VICT W UNK HAND CAUSING HIM TO FALL TO THE GROUND WHICH RESULTED IN BROKEN ARM WHILE VICT WAS ON GROUND S2 KICKED HIM IN THE HEAD MULTIPLE,1,0,-0.034504
930,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S AND V INVLD IN ARGUMENT S HIT V ADDITIONAL SUSPS ALSO BEGAN TO HIT V CAUSING INJURY AS V RAN AWAY HE POSSIBLY BROKE HIS ANKLE,1,0,-0.445381
1219,CHILD ABUSE (PHYSICAL) - AGGRAVATED ASSAULT,DO-S1 S2 USED UNK OBJ TO HIT V LEAVING VIS INJ V WENT TO SCHL AND ADV PERSONNEL OF THE INC SCHOOL PHONED PD,1,0,-0.437677
1339,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-SUSP AND VICT INVOLVED IN VERBAL AGRUMENT SUSP PUNCHED VICT IN THE FACE AND VICT LOSES CONSCIOUSNESS VICT,1,0,-0.264528
1359,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",DO-S APPR V AND PUNCHED HIM ONCE IN THE FACE DURING THE,1,0,-0.864081
1493,CHILD ABUSE (PHYSICAL) - AGGRAVATED ASSAULT,DO- S BECAME ANGRY AT V AND GRABBED V LEAVING A VISIBLE MARK,1,0,-1.172484


In [132]:
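# Recorded as simple, but the model reads the narrative as an aggravated assault
# These were potentially downgraded (most confident predictions first)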
df[(df.is_aggravated == 0) & (df.prediction == 1)].sort_values(by='pred_prob', ascending=False)

Unnamed: 0,CCDESC,DO_NARRATIVE,is_aggravated,prediction,pred_prob
509809,BATTERY - SIMPLE ASSAULT,DO-S ATT TO STAB V WITH A KNIFE,0,1,3.392492
549646,INTIMATE PARTNER - SIMPLE ASSAULT,DO-SUSP STABBED VICT WITH GLASS BOTTLE,0,1,3.266768
57764,INTIMATE PARTNER - SIMPLE ASSAULT,DO-SUSP STABBED V,0,1,3.038757
9099,INTIMATE PARTNER - SIMPLE ASSAULT,DO-S AND V ENGAGED IN A VERBAL ALTERCATION S FIRED 4 ROUNDS INTO VICTS VEHS WALKED ONTO SIDEWALK AND FIRED 2 ADDTL ROUNDS TOWARDS VICTS RESID S FLED,0,1,2.950887
531444,BATTERY - SIMPLE ASSAULT,DO-S SHOT TWO ROUND AT VICT CAR S THEN SHOT AT THE VICT TORSO S FLED W B,0,1,2.942379
228905,INTIMATE PARTNER - SIMPLE ASSAULT,DO- V AND S WERE INVOLV IN VERBL ARGUMENT S THEN HIT V ON THE HEAD WITH A PAN AND ATT TO STAB V WITH KNIFE S CUT V WITH KNIFE,0,1,2.899513
671229,BATTERY - SIMPLE ASSAULT,DO-SUSP APPROACHED VICTS VEH AND FIRED TWO ROUNDS AT VICTS,0,1,2.822635
131514,INTIMATE PARTNER - SIMPLE ASSAULT,DO-SUSP STABBED VICT,0,1,2.791435
695489,INTIMATE PARTNER - SIMPLE ASSAULT,DO-SUSP STABBED VICT,0,1,2.791435
789560,INTIMATE PARTNER - SIMPLE ASSAULT,DO-S APP V WITH KITCHEN KNIFE S ATT TO STAB V S CUT V HANDS AS V BLOCKEDKNIFE,0,1,2.706461


In [133]:
df[(df.is_aggravated == 0) & (df.prediction == 1)].shape

(4707, 5)

In [134]:
df[df.is_aggravated == 0].shape

(118744, 5)
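
Putting those two numbers together gives the share of recorded simple assaults that the model thinks look aggravated. A quick sketch of the arithmetic:

```python
flagged = 4707        # recorded as simple, predicted aggravated
all_simple = 118744   # every assault recorded as simple

share = flagged / all_simple
print(f"{share:.1%}")
```

That's about 4%. Keep in mind that `clf` was trained on most of these rows, so this figure mixes training and test data and is probably a bit optimistic about how well the model agrees with the recorded labels.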

In [135]:
# inside of our model are things called WEIGHTS
# and they're the things that make the decision
# Each feature is a single token, so a two-word name only counts as
# one feature if we join it first: new orleans -> neworleans
import eli5

eli5.show_weights(clf, top=(10, 10), feature_names=vectorizer.get_feature_names())

Weight?,Feature
+4.726,shot
+4.141,stab
+4.001,fire
+3.446,handgun
+3.164,bat
+2.992,multp
+2.951,knife
+2.904,round
+2.754,shoot
+2.688,crowbar
