## Checkpoint-5: Natural Language Processing

We start by importing the libraries we need.

In [1]:
import nltk
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize 
nltk.download('punkt')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import time, os

import numpy as np # linear algebra
import pandas as pd # data processing
# pd.set_option('display.max_colwidth', -1) # display the entire contents of each cell

import seaborn as sns # visualization
import matplotlib.pyplot as plt
%matplotlib inline
import re
import string
# from sklearn.cluster import KMeans
# from sklearn.feature_extraction.text import TfidfTransformer

import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!




We use VADER (Valence Aware Dictionary and sEntiment Reasoner) for sentiment analysis. NLTK exposes it through the SentimentIntensityAnalyzer class, which assigns sentiment scores to a text based on the words it contains.

In [2]:
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

Reading the data (obtained by running the following SQL query in DataGrip):


```
SELECT data_officerallegation.officer_id,
       data_allegation.summary,
       data_allegation.incident_date::TIMESTAMP::DATE as incident_date,
       data_officer.appointed_date,
       data_allegation.incident_date::TIMESTAMP::DATE-data_officer.appointed_date as time_of_employment
FROM data_allegation
    left join data_officerallegation on data_officerallegation.allegation_id = data_allegation.crid
    left join data_officer on data_officerallegation.officer_id = data_officer.id
WHERE data_allegation.summary != ''
  AND officer_id is not NULL
  AND incident_date is not NULL AND data_officerallegation.officer_id = data_officer.id;
```




In [3]:
# Load the query results exported to CSV (read_csv already returns a DataFrame)
df = pd.read_csv('NLP_csv.csv')

In [4]:
df

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment
0,16061,Four on?duty CPD officers ordered a crowd to d...,2014-08-25,2004-11-29,3556
1,25864,"On 06 August 2009, a complaint was registered ...",2009-08-04,2000-12-18,3151
2,23361,In an incident involving an off-duty CPD offic...,2009-11-25,2001-08-27,3012
3,21210,"On June 5, 2011, a complaint was registered wi...",2011-06-05,2007-07-30,1406
4,16338,"On January 4, 2007, a complaint was registered...",2006-01-25,2003-04-28,1003
...,...,...,...,...,...
1922,10317,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719
1923,27962,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719
1924,19785,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453
1925,26373,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453


In [5]:
df.columns

Index(['officer_id', 'summary', 'incident_date', 'appointed_date',
       'time_of_employment'],
      dtype='object')

Checking for nulls and duplicates, and removing any that exist:

In [6]:
df.isnull().sum() 

officer_id            0
summary               0
incident_date         0
appointed_date        0
time_of_employment    0
dtype: int64

In [7]:
duplicateRowsDF = df[df.duplicated()]
duplicateRowsDF, duplicateRowsDF.shape

(Empty DataFrame
 Columns: [officer_id, summary, incident_date, appointed_date, time_of_employment]
 Index: [], (0, 5))

In [8]:
df = df[df['officer_id'].notna()]
df.shape

(1927, 5)

In [9]:
df['summary']

0       Four on?duty CPD officers ordered a crowd to d...
1       On 06 August 2009, a complaint was registered ...
2       In an incident involving an off-duty CPD offic...
3       On June 5, 2011, a complaint was registered wi...
4       On January 4, 2007, a complaint was registered...
                              ...                        
1922    On December 23, 2006, a complaint was register...
1923    On December 23, 2006, a complaint was register...
1924    On December 23, 2006, a complaint was register...
1925    On December 23, 2006, a complaint was register...
1926    On December 23, 2006, a complaint was register...
Name: summary, Length: 1927, dtype: object

In [10]:
df['summary'][0]

'Four on?duty CPD officers ordered a crowd to disperse from a City sidewalk. One of the individuals, refused that order and subsequent orders to disperse. In response, the officers approached - who then took a fighting posture. The officers then arrested -.'

In [11]:
df['summary'][1]

'On 06 August 2009, a complaint was registered with the Independent Police Review Authority (IPRA) regarding an incident that occurred in the 16th District, on August 4, 2009. It was alleged that a Chicago Police Department officer improperly handcuffed the complainant; handcuffed the complainant without justification; searched the residence of the complainant without consent and; threatened to arrest complainant in the event that the complainant should register a complaint against him. IPRA recommended to SUSTAIN the following allegations of violations committed by the accused: improperly handcuffed the complainant; handcuffed the complainant without justification and; searched the residence of the complainant without consent against the accused member based on corroborating witness statements, the officers statement, and reports. IPRA recommended a five (5) day suspension for the accused member.'

Text pre-processing:
1. Replacing digits and uncommon characters with spaces
2. Keeping sentences with at least 2 words

In [12]:
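def text_to_word_list(text):
    # Despite the name, this returns a cleaned string, not a list
    text = str(text)

    # Replace every character that is not a letter or one of ^ , ! ? . / ' +
    # with a space (this drops digits, hyphens, underscores and other symbols)
    text = re.sub(r"[^A-Za-z^,!?.\/'+]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution
    text = re.sub(r"\s{2,}", " ", text)
    return text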


In [13]:
df['cleaned_summary'] = df['summary'].apply(text_to_word_list)


In [14]:
df['cleaned_summary'][0]

'Four on?duty CPD officers ordered a crowd to disperse from a City sidewalk. One of the individuals, refused that order and subsequent orders to disperse. In response, the officers approached who then took a fighting posture. The officers then arrested .'
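The cleaning cell replaces characters rather than filtering sentences, so the second pre-processing step (keeping sentences with at least 2 words) is not visible in the cells shown here. As a rough sketch of how both steps might fit together, the snippet below reuses the same regular expressions and adds a hypothetical `keep_multiword_sentences` helper. The naive punctuation-based sentence split is an assumption made for illustration and stands in for `sent_tokenize`.

```python
import re

def clean_text(text):
    # Same two substitutions as text_to_word_list above
    text = re.sub(r"[^A-Za-z^,!?.\/'+]", " ", str(text))
    return re.sub(r"\s{2,}", " ", text)

def keep_multiword_sentences(text, min_words=2):
    # Hypothetical helper: split after ., ! or ? and drop very short sentences.
    # A naive split is assumed here; sent_tokenize handles abbreviations better.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if len(s.split()) >= min_words]
    return " ".join(kept)

cleaned = clean_text("On 06 August 2009, a complaint was registered. Sustained.")
print(cleaned)
print(keep_multiword_sentences(cleaned))
```

Note that a stray punctuation token such as the trailing " ." in the output above counts as a word under whitespace splitting, so a stricter filter might count alphabetic tokens only.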

Next we tokenize each summary into sentences and words so that we can compute the lexical diversity of the vocabulary, i.e. the ratio of distinct words to the total number of words in the text.

In [15]:
words, sents = [], []
for text in df['cleaned_summary']:
    # sentence tokenization
    sents.append(sent_tokenize(text))
    # word tokenization
    words.append(word_tokenize(text))

In [16]:
sents[0]

['Four on?duty CPD officers ordered a crowd to disperse from a City sidewalk.',
 'One of the individuals, refused that order and subsequent orders to disperse.',
 'In response, the officers approached who then took a fighting posture.',
 'The officers then arrested .']

In [17]:
# Number of sentences in each summary
length_sents = [len(sublist) for sublist in sents]

In [18]:
# Average number of sentences per summary
avg = sum(length_sents)/len(length_sents)
avg

7.0435910742086145

In [19]:
word_list = [item for sublist in words for item in sublist]
# total number of tokens across all summaries (punctuation tokens included)
text_words = len(word_list)
text_words

463250

In [20]:
unique_list = list(set(word_list)) 
# number of distinct tokens (case-sensitive, punctuation tokens included)
vocab_words = len(unique_list) 
vocab_words

5285

In [21]:
lexical_diversity = vocab_words/text_words
lexical_diversity

0.011408526713437669
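Lexical diversity as computed here is the type-token ratio: the number of distinct tokens divided by the total number of tokens. The toy example below is a minimal sketch on a made-up sentence with whitespace tokenization (an assumption; the notebook uses `word_tokenize`). It also shows how lower-casing the tokens first, which the notebook does not do, would change the result.

```python
def lexical_diversity(tokens):
    # Ratio of distinct tokens to total tokens (type-token ratio)
    return len(set(tokens)) / len(tokens)

tokens = "The officer stopped the car and the driver".split()
print(lexical_diversity(tokens))                       # case-sensitive
print(lexical_diversity([t.lower() for t in tokens]))  # case-folded
```

Because this ratio generally shrinks as a corpus grows, the value of roughly 0.011 above mostly reflects a vocabulary of 5,285 tokens being reused across 463,250 tokens, and it is best compared only against texts of similar size.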

In [22]:
sid = SentimentIntensityAnalyzer()

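The function sid.polarity_scores returns a dictionary with four entries:

neg : the proportion of the text rated negative.

neu : the proportion of the text rated neutral.

pos : the proportion of the text rated positive.

compound : the sum of the valence scores of the individual words, adjusted by VADER's rules and normalised to lie between -1 (most negative) and +1 (most positive).

A compound value close to zero indicates neutral sentiment.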
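To make the compound score more concrete: in the VADER implementation, the summed word valences are squashed into the range [-1, 1] with a normalisation of the form x / sqrt(x^2 + alpha), where alpha defaults to 15. The sketch below reproduces that formula in isolation. The function name `normalize_valence` is ours, and the input sums are made-up values rather than outputs of the analyzer.

```python
import math

def normalize_valence(valence_sum, alpha=15):
    # Squash an unbounded sum of word valences into [-1, 1]
    score = valence_sum / math.sqrt(valence_sum ** 2 + alpha)
    return max(-1.0, min(1.0, score))

for total in (-30.0, -4.0, 0.0, 4.0):
    print(total, round(normalize_valence(total), 4))
```

Since the sum grows with the number of sentiment-bearing words, long summaries that contain many negative words can be expected to land close to -1 regardless of how strongly any single sentence is worded.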

In [23]:
df['scores'] = df['cleaned_summary'].apply(lambda summary: sid.polarity_scores(summary))

# Pull the 'compound' value out of each scores dict
df['compound'] = df['scores'].apply(lambda s: s['compound'])

# compound ranges from -1 to 1; label 1 for non-negative scores
# (including exactly neutral) and 0 for negative scores
df['positive_or_negative'] = df['compound'].apply(lambda n: 1 if n >= 0 else 0)

df.head()


Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative
0,16061,Four on?duty CPD officers ordered a crowd to d...,2014-08-25,2004-11-29,3556,Four on?duty CPD officers ordered a crowd to d...,"{'neg': 0.187, 'neu': 0.813, 'pos': 0.0, 'comp...",-0.7783,0
1,25864,"On 06 August 2009, a complaint was registered ...",2009-08-04,2000-12-18,3151,"On August , a complaint was registered with th...","{'neg': 0.242, 'neu': 0.69, 'pos': 0.068, 'com...",-0.9535,0
2,23361,In an incident involving an off-duty CPD offic...,2009-11-25,2001-08-27,3012,In an incident involving an off duty CPD offic...,"{'neg': 0.161, 'neu': 0.823, 'pos': 0.016, 'co...",-0.9833,0
3,21210,"On June 5, 2011, a complaint was registered wi...",2011-06-05,2007-07-30,1406,"On June , , a complaint was registered with th...","{'neg': 0.162, 'neu': 0.758, 'pos': 0.08, 'com...",-0.7351,0
4,16338,"On January 4, 2007, a complaint was registered...",2006-01-25,2003-04-28,1003,"On January , , a complaint was registered with...","{'neg': 0.213, 'neu': 0.739, 'pos': 0.048, 'co...",-0.9874,0
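The binary label above treats an exactly neutral score as positive. The VADER authors' documentation suggests a three-way split instead: compound >= 0.05 is positive, compound <= -0.05 is negative, and anything in between is neutral. A minimal sketch of that alternative, using a hypothetical `label_sentiment` helper:

```python
def label_sentiment(compound, threshold=0.05):
    # Three-way labelling using the commonly cited VADER thresholds
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

for value in (-0.7783, 0.0, 0.5147):
    print(value, label_sentiment(value))
```

If that split were preferred, it could be applied with `df['compound'].apply(label_sentiment)`.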


In [24]:
max(df['compound'])

0.9935

In [25]:
min(df['compound'])

-0.9999

In [76]:
# Rookies: the incident occurred within 365 days of the officer's appointment
rookie_df = df[df['time_of_employment'] <= 365]

In [27]:
rookie_df

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative
62,430,Subject 1 was pulled over for driving a vehicl...,2013-08-26,2012-10-05,325,Subject was pulled over for driving a vehicle ...,"{'neg': 0.037, 'neu': 0.963, 'pos': 0.0, 'comp...",-0.466,0
89,32658,"On July 23, 2017, at approximately 10:00 PM, S...",2017-07-23,2017-07-17,6,"On July , , at approximately PM, Subject and h...","{'neg': 0.048, 'neu': 0.917, 'pos': 0.035, 'co...",-0.6525,0
150,4486,"On June 18, 2017, several concerned citizens p...",2017-06-18,2016-06-27,356,"On June , , several concerned citizens placed ...","{'neg': 0.073, 'neu': 0.916, 'pos': 0.011, 'co...",-0.9268,0
163,9614,Body worn camera and Police Observation Device...,2016-11-01,2016-08-29,64,Body worn camera and Police Observation Device...,"{'neg': 0.261, 'neu': 0.739, 'pos': 0.0, 'comp...",-0.5574,0
291,33034,"Complainant, Subject 1, alleged he was unlawfu...",2018-04-16,2017-07-17,273,"Complainant, Subject , alleged he was unlawful...","{'neg': 0.296, 'neu': 0.704, 'pos': 0.0, 'comp...",-0.8442,0
327,15903,"On 18 April 2007, a complaint was registered w...",2007-04-18,2006-08-28,233,"On April , a complaint was registered with the...","{'neg': 0.169, 'neu': 0.797, 'pos': 0.034, 'co...",-0.9954,0
383,4255,"On 25 September 2007, a complaint was register...",2007-09-24,2006-11-27,301,"On September , a complaint was registered with...","{'neg': 0.125, 'neu': 0.829, 'pos': 0.046, 'co...",-0.9403,0
406,6538,"On 30 December 2007, a complaint was registere...",2007-12-30,2007-06-04,209,"On December , a complaint was registered with ...","{'neg': 0.058, 'neu': 0.88, 'pos': 0.062, 'com...",-0.1027,0
437,28136,"On 30 March 2008, a complaint was registered w...",2008-03-30,2007-04-02,363,"On March , a complaint was registered with the...","{'neg': 0.147, 'neu': 0.791, 'pos': 0.062, 'co...",-0.9501,0
901,3694,"On April 5th, 2011, a complaint was registered...",2011-04-03,2010-04-16,352,"On April th, , a complaint was registered with...","{'neg': 0.205, 'neu': 0.737, 'pos': 0.057, 'co...",-0.9919,0


In [28]:
rookie_df.shape

(24, 9)

In [29]:
# Non-rookies: the incident occurred more than 365 days after the appointment
nonrookie_df = df[df['time_of_employment'] > 365]

In [30]:
nonrookie_df

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative
0,16061,Four on?duty CPD officers ordered a crowd to d...,2014-08-25,2004-11-29,3556,Four on?duty CPD officers ordered a crowd to d...,"{'neg': 0.187, 'neu': 0.813, 'pos': 0.0, 'comp...",-0.7783,0
1,25864,"On 06 August 2009, a complaint was registered ...",2009-08-04,2000-12-18,3151,"On August , a complaint was registered with th...","{'neg': 0.242, 'neu': 0.69, 'pos': 0.068, 'com...",-0.9535,0
2,23361,In an incident involving an off-duty CPD offic...,2009-11-25,2001-08-27,3012,In an incident involving an off duty CPD offic...,"{'neg': 0.161, 'neu': 0.823, 'pos': 0.016, 'co...",-0.9833,0
3,21210,"On June 5, 2011, a complaint was registered wi...",2011-06-05,2007-07-30,1406,"On June , , a complaint was registered with th...","{'neg': 0.162, 'neu': 0.758, 'pos': 0.08, 'com...",-0.7351,0
4,16338,"On January 4, 2007, a complaint was registered...",2006-01-25,2003-04-28,1003,"On January , , a complaint was registered with...","{'neg': 0.213, 'neu': 0.739, 'pos': 0.048, 'co...",-0.9874,0
...,...,...,...,...,...,...,...,...,...
1921,8382,"On December 23, 2006, a complaint was register...",2006-12-23,2004-11-29,754,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0
1922,10317,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0
1923,27962,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0
1924,19785,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0


In [31]:
nonrookie_df.shape

(1903, 9)

In [32]:
df

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative
0,16061,Four on?duty CPD officers ordered a crowd to d...,2014-08-25,2004-11-29,3556,Four on?duty CPD officers ordered a crowd to d...,"{'neg': 0.187, 'neu': 0.813, 'pos': 0.0, 'comp...",-0.7783,0
1,25864,"On 06 August 2009, a complaint was registered ...",2009-08-04,2000-12-18,3151,"On August , a complaint was registered with th...","{'neg': 0.242, 'neu': 0.69, 'pos': 0.068, 'com...",-0.9535,0
2,23361,In an incident involving an off-duty CPD offic...,2009-11-25,2001-08-27,3012,In an incident involving an off duty CPD offic...,"{'neg': 0.161, 'neu': 0.823, 'pos': 0.016, 'co...",-0.9833,0
3,21210,"On June 5, 2011, a complaint was registered wi...",2011-06-05,2007-07-30,1406,"On June , , a complaint was registered with th...","{'neg': 0.162, 'neu': 0.758, 'pos': 0.08, 'com...",-0.7351,0
4,16338,"On January 4, 2007, a complaint was registered...",2006-01-25,2003-04-28,1003,"On January , , a complaint was registered with...","{'neg': 0.213, 'neu': 0.739, 'pos': 0.048, 'co...",-0.9874,0
...,...,...,...,...,...,...,...,...,...
1922,10317,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0
1923,27962,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0
1924,19785,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0
1925,26373,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0


In [33]:
min(df['time_of_employment'])

-3428

In [34]:
df[df['time_of_employment']<0]

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative
1710,30806,In an incident involving an on-duty CPD Probat...,2004-12-08,2014-04-28,-3428,In an incident involving an on duty CPD Probat...,"{'neg': 0.085, 'neu': 0.848, 'pos': 0.067, 'co...",-0.128,0


In [35]:
# Drop the row whose incident date precedes the appointment date (negative tenure)
df = df[df['time_of_employment'] > 0]

In [36]:
df = df.reset_index(drop = True)
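Note that the rookie and non-rookie splits above appear to have been taken before this row was removed: 24 + 1903 = 1927 accounts for every row, so the record with a tenure of -3428 days satisfies `<= 365` and would have been counted as a rookie. A sketch of validating tenure before splitting is shown below, using plain Python records. The record layout, the officer ids, and the helper name `split_by_tenure` are assumptions for illustration.

```python
def split_by_tenure(records, rookie_days=365):
    # Drop records whose incident precedes the appointment (non-positive tenure)
    valid = [r for r in records if r["time_of_employment"] > 0]
    rookies = [r for r in valid if r["time_of_employment"] <= rookie_days]
    veterans = [r for r in valid if r["time_of_employment"] > rookie_days]
    return rookies, veterans

records = [
    {"officer_id": 1, "time_of_employment": 325},
    {"officer_id": 2, "time_of_employment": 3556},
    {"officer_id": 3, "time_of_employment": -3428},  # invalid: incident before appointment
]
rookies, veterans = split_by_tenure(records)
print(len(rookies), len(veterans))
```

In pandas terms, this corresponds to applying the `> 0` filter before building `rookie_df` and `nonrookie_df`.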

In [37]:
# Split time_of_employment into four equal-frequency bins (quartiles);
# the remaining qcut arguments are left at their defaults
df_bins = pd.qcut(df['time_of_employment'], 4)

In [38]:
df['bins'] = df_bins

In [39]:
df_bins

0       (2047.75, 3918.0]
1       (2047.75, 3918.0]
2       (2047.75, 3918.0]
3        (5.999, 2047.75]
4        (5.999, 2047.75]
              ...        
1921     (5.999, 2047.75]
1922     (5.999, 2047.75]
1923     (5.999, 2047.75]
1924     (5.999, 2047.75]
1925     (5.999, 2047.75]
Name: time_of_employment, Length: 1926, dtype: category
Categories (4, interval[float64]): [(5.999, 2047.75] < (2047.75, 3918.0] < (3918.0, 6032.5] <
                                    (6032.5, 14423.0]]
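`pd.qcut` with 4 bins splits the values at the 25th, 50th and 75th percentiles so that each bin holds roughly the same number of rows, which is why the bin edges above are uneven in width. The standard-library sketch below imitates that idea on made-up tenure values. `quartile_codes` is a hypothetical helper, and its cut points may differ from pandas in edge cases such as heavy ties.

```python
import bisect
import statistics

def quartile_codes(values):
    # Cut points at the 25th, 50th and 75th percentiles (linear interpolation)
    cuts = statistics.quantiles(values, n=4, method="inclusive")
    # Right-closed bins: a value equal to a cut point goes to the lower bin
    codes = [bisect.bisect_left(cuts, v) for v in values]
    return codes, cuts

tenure_days = [6, 100, 500, 1000, 2000, 3000, 4000, 6000]
codes, cuts = quartile_codes(tenure_days)
print(cuts)
print(codes)
```

Equal-frequency bins keep the group sizes comparable, at the cost of grouping very different tenures together in the last bin.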

In [40]:
df

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative,bins
0,16061,Four on?duty CPD officers ordered a crowd to d...,2014-08-25,2004-11-29,3556,Four on?duty CPD officers ordered a crowd to d...,"{'neg': 0.187, 'neu': 0.813, 'pos': 0.0, 'comp...",-0.7783,0,"(2047.75, 3918.0]"
1,25864,"On 06 August 2009, a complaint was registered ...",2009-08-04,2000-12-18,3151,"On August , a complaint was registered with th...","{'neg': 0.242, 'neu': 0.69, 'pos': 0.068, 'com...",-0.9535,0,"(2047.75, 3918.0]"
2,23361,In an incident involving an off-duty CPD offic...,2009-11-25,2001-08-27,3012,In an incident involving an off duty CPD offic...,"{'neg': 0.161, 'neu': 0.823, 'pos': 0.016, 'co...",-0.9833,0,"(2047.75, 3918.0]"
3,21210,"On June 5, 2011, a complaint was registered wi...",2011-06-05,2007-07-30,1406,"On June , , a complaint was registered with th...","{'neg': 0.162, 'neu': 0.758, 'pos': 0.08, 'com...",-0.7351,0,"(5.999, 2047.75]"
4,16338,"On January 4, 2007, a complaint was registered...",2006-01-25,2003-04-28,1003,"On January , , a complaint was registered with...","{'neg': 0.213, 'neu': 0.739, 'pos': 0.048, 'co...",-0.9874,0,"(5.999, 2047.75]"
...,...,...,...,...,...,...,...,...,...,...
1921,10317,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]"
1922,27962,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]"
1923,19785,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]"
1924,26373,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]"


In [41]:
# Integer code (0 to 3) for each quartile bin, in order of increasing tenure
df['bin_code'] = df['bins'].cat.codes

In [42]:
df

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative,bins,bin_code
0,16061,Four on?duty CPD officers ordered a crowd to d...,2014-08-25,2004-11-29,3556,Four on?duty CPD officers ordered a crowd to d...,"{'neg': 0.187, 'neu': 0.813, 'pos': 0.0, 'comp...",-0.7783,0,"(2047.75, 3918.0]",1
1,25864,"On 06 August 2009, a complaint was registered ...",2009-08-04,2000-12-18,3151,"On August , a complaint was registered with th...","{'neg': 0.242, 'neu': 0.69, 'pos': 0.068, 'com...",-0.9535,0,"(2047.75, 3918.0]",1
2,23361,In an incident involving an off-duty CPD offic...,2009-11-25,2001-08-27,3012,In an incident involving an off duty CPD offic...,"{'neg': 0.161, 'neu': 0.823, 'pos': 0.016, 'co...",-0.9833,0,"(2047.75, 3918.0]",1
3,21210,"On June 5, 2011, a complaint was registered wi...",2011-06-05,2007-07-30,1406,"On June , , a complaint was registered with th...","{'neg': 0.162, 'neu': 0.758, 'pos': 0.08, 'com...",-0.7351,0,"(5.999, 2047.75]",0
4,16338,"On January 4, 2007, a complaint was registered...",2006-01-25,2003-04-28,1003,"On January , , a complaint was registered with...","{'neg': 0.213, 'neu': 0.739, 'pos': 0.048, 'co...",-0.9874,0,"(5.999, 2047.75]",0
...,...,...,...,...,...,...,...,...,...,...,...
1921,10317,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0
1922,27962,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0
1923,19785,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0
1924,26373,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0


In [62]:
bin_group = df.groupby('bin_code')

In [63]:
# Time of employment in the range (5.999, 2047.75] days ~ up to 5 years 7 months
bin_group.get_group(0)

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative,bins,bin_code
3,21210,"On June 5, 2011, a complaint was registered wi...",2011-06-05,2007-07-30,1406,"On June , , a complaint was registered with th...","{'neg': 0.162, 'neu': 0.758, 'pos': 0.08, 'com...",-0.7351,0,"(5.999, 2047.75]",0
4,16338,"On January 4, 2007, a complaint was registered...",2006-01-25,2003-04-28,1003,"On January , , a complaint was registered with...","{'neg': 0.213, 'neu': 0.739, 'pos': 0.048, 'co...",-0.9874,0,"(5.999, 2047.75]",0
8,8690,"On July 28, 2016, at approximately 10:30 pm, a...",2016-07-28,2014-10-27,640,"On July , , at approximately pm, at S. Eberhar...","{'neg': 0.086, 'neu': 0.86, 'pos': 0.054, 'com...",-0.6440,0,"(5.999, 2047.75]",0
14,4978,"On July 26, 2106, was loudly arguing with his ...",2016-07-26,2013-05-01,1182,"On July , , was loudly arguing with his girlfr...","{'neg': 0.14, 'neu': 0.767, 'pos': 0.093, 'com...",-0.9846,0,"(5.999, 2047.75]",0
16,31298,"On July 28, 2016,_was arrested for Possession ...",2016-07-28,2011-10-17,1746,"On July , , was arrested for Possession of a C...","{'neg': 0.147, 'neu': 0.809, 'pos': 0.044, 'co...",-0.9805,0,"(5.999, 2047.75]",0
...,...,...,...,...,...,...,...,...,...,...,...
1921,10317,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0
1922,27962,"On December 23, 2006, a complaint was register...",2006-12-23,2005-01-03,719,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0
1923,19785,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0
1924,26373,"On December 23, 2006, a complaint was register...",2006-12-23,2005-09-26,453,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(5.999, 2047.75]",0


In [64]:
# Time of employment in the range (2047.75, 3918.0] days ~ between 5 years 7 months and 10 years 9 months
bin_group.get_group(1)

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative,bins,bin_code
0,16061,Four on?duty CPD officers ordered a crowd to d...,2014-08-25,2004-11-29,3556,Four on?duty CPD officers ordered a crowd to d...,"{'neg': 0.187, 'neu': 0.813, 'pos': 0.0, 'comp...",-0.7783,0,"(2047.75, 3918.0]",1
1,25864,"On 06 August 2009, a complaint was registered ...",2009-08-04,2000-12-18,3151,"On August , a complaint was registered with th...","{'neg': 0.242, 'neu': 0.69, 'pos': 0.068, 'com...",-0.9535,0,"(2047.75, 3918.0]",1
2,23361,In an incident involving an off-duty CPD offic...,2009-11-25,2001-08-27,3012,In an incident involving an off duty CPD offic...,"{'neg': 0.161, 'neu': 0.823, 'pos': 0.016, 'co...",-0.9833,0,"(2047.75, 3918.0]",1
9,30599,"On July 28, 2016, at approximately 10:30 pm, a...",2016-07-28,2006-11-27,3531,"On July , , at approximately pm, at S. Eberhar...","{'neg': 0.086, 'neu': 0.86, 'pos': 0.054, 'com...",-0.6440,0,"(2047.75, 3918.0]",1
19,10273,"On November 12, 2016, Of?cer (?Of?cer and Offi...",2016-11-12,2007-07-30,3393,"On November , , Of?cer ?Of?cer and Officer ?Of...","{'neg': 0.06, 'neu': 0.92, 'pos': 0.019, 'comp...",-0.9197,0,"(2047.75, 3918.0]",1
...,...,...,...,...,...,...,...,...,...,...,...
1897,23566,"On 9 April 2007, a complaint was registered wi...",2007-04-09,1998-03-16,3311,"On April , a complaint was registered with the...","{'neg': 0.164, 'neu': 0.794, 'pos': 0.043, 'co...",-0.9524,0,"(2047.75, 3918.0]",1
1903,31242,"On 05 April 2007, a complaint was registered w...",2007-04-05,1998-06-08,3223,"On April , a complaint was registered with the...","{'neg': 0.201, 'neu': 0.744, 'pos': 0.055, 'co...",-0.9965,0,"(2047.75, 3918.0]",1
1907,2022,_an from the accused officers after they saw h...,2016-06-19,2010-09-01,2118,an from the accused officers after they saw h...,"{'neg': 0.078, 'neu': 0.884, 'pos': 0.038, 'co...",-0.7569,0,"(2047.75, 3918.0]",1
1908,8059,"On December 23, 2006, a complaint was register...",2006-12-23,2001-03-26,2098,"On December , , a complaint was registered wit...","{'neg': 0.216, 'neu': 0.74, 'pos': 0.044, 'com...",-0.9994,0,"(2047.75, 3918.0]",1


In [65]:
# Time of employment in the range (3918.0, 6032.5] days ~ between 10 years 9 months and 16 years 6 months
bin_group.get_group(2)

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative,bins,bin_code
5,27176,"On April 3, 2016, Officers and were on routine...",2016-04-03,2003-04-28,4724,"On April , , Officers and were on routine patr...","{'neg': 0.151, 'neu': 0.826, 'pos': 0.023, 'co...",-0.9626,0,"(3918.0, 6032.5]",2
6,14739,was involved in a traffic collision with Based...,2015-05-31,2000-09-11,5375,was involved in a traffic collision with Based...,"{'neg': 0.038, 'neu': 0.905, 'pos': 0.058, 'co...",0.5147,1,"(3918.0, 6032.5]",2
7,26865,"On April 3, 2016, Officers and were on routine...",2016-04-03,2002-09-30,4934,"On April , , Officers and were on routine patr...","{'neg': 0.151, 'neu': 0.826, 'pos': 0.023, 'co...",-0.9626,0,"(3918.0, 6032.5]",2
11,21479,"On July 26, 2016, ?called the Independent Poli...",2015-05-14,2000-01-24,5589,"On July , , ?called the Independent Police Rev...","{'neg': 0.0, 'neu': 0.924, 'pos': 0.076, 'comp...",0.2640,1,"(3918.0, 6032.5]",2
12,20651,"On August 22, 2016, at approximately 1:10 am, ...",2016-08-22,2004-09-27,4347,"On August , , at approximately am, members of ...","{'neg': 0.11, 'neu': 0.831, 'pos': 0.059, 'com...",-0.9755,0,"(3918.0, 6032.5]",2
...,...,...,...,...,...,...,...,...,...,...,...
1893,22838,"On March 30, 2007, a complaint was registered ...",2007-03-02,1992-09-28,5268,"On March , , a complaint was registered with t...","{'neg': 0.18, 'neu': 0.769, 'pos': 0.052, 'com...",-0.9986,0,"(3918.0, 6032.5]",2
1899,32330,"On May 31st, 2011, a complaint was registered ...",2011-05-29,1999-05-10,4402,"On May st, , a complaint was registered with t...","{'neg': 0.302, 'neu': 0.662, 'pos': 0.037, 'co...",-0.9986,0,"(3918.0, 6032.5]",2
1902,28821,"On 05 April 2007, a complaint was registered w...",2007-04-05,1993-11-22,4882,"On April , a complaint was registered with the...","{'neg': 0.201, 'neu': 0.744, 'pos': 0.055, 'co...",-0.9965,0,"(3918.0, 6032.5]",2
1904,4599,"On 20 September 2007, a complaint was register...",2007-09-20,1994-12-05,4672,"On September , a complaint was registered with...","{'neg': 0.145, 'neu': 0.774, 'pos': 0.081, 'co...",-0.8636,0,"(3918.0, 6032.5]",2


In [68]:
# Time of employment in the range (6032.5, 14423.0] days ~ between 16 years 6 months and 39 years 5 months
bin_group.get_group(3)

Unnamed: 0,officer_id,summary,incident_date,appointed_date,time_of_employment,cleaned_summary,scores,compound,positive_or_negative,bins,bin_code
10,25442,"On July 9, 2016, Subject 1 held a birthday par...",2016-07-09,1999-10-04,6123,"On July , , Subject held a birthday party at h...","{'neg': 0.088, 'neu': 0.873, 'pos': 0.039, 'co...",-0.6486,0,"(6032.5, 14423.0]",3
18,31631,In?Car Camera footage depicts Accused CPD Serg...,2016-10-27,1988-12-05,10188,In?Car Camera footage depicts Accused CPD Serg...,"{'neg': 0.2, 'neu': 0.8, 'pos': 0.0, 'compound...",-0.6124,0,"(6032.5, 14423.0]",3
27,19485,"On 20 September 2016, at approximately 12:15 P...",2016-09-20,1998-10-13,6552,"On September , at approximately P.M., Subject ...","{'neg': 0.075, 'neu': 0.925, 'pos': 0.0, 'comp...",-0.6553,0,"(6032.5, 14423.0]",3
28,29236,"On September 20, 2016, at approximately 8:01 p...",2016-09-20,1998-02-17,6790,"On September , , at approximately pm, in the v...","{'neg': 0.112, 'neu': 0.826, 'pos': 0.062, 'co...",-0.3400,0,"(6032.5, 14423.0]",3
33,3383,"On November 27, 2016, at approximately 10:00 a...",2016-11-27,1994-05-02,8245,"On November , , at approximately am, Officer A...","{'neg': 0.144, 'neu': 0.821, 'pos': 0.035, 'co...",-0.9517,0,"(6032.5, 14423.0]",3
...,...,...,...,...,...,...,...,...,...,...,...
1873,20418,"On 25 February 2007, a complaint was registere...",2007-02-21,1982-02-01,9151,"On February , a complaint was registered with ...","{'neg': 0.276, 'neu': 0.688, 'pos': 0.036, 'co...",-0.9837,0,"(6032.5, 14423.0]",3
1894,7208,"On March 30, 2007, a complaint was registered ...",2007-03-02,1986-11-17,7410,"On March , , a complaint was registered with t...","{'neg': 0.18, 'neu': 0.769, 'pos': 0.052, 'com...",-0.9986,0,"(6032.5, 14423.0]",3
1895,25440,"On March 30, 2007, a complaint was registered ...",2007-03-02,1976-10-18,11092,"On March , , a complaint was registered with t...","{'neg': 0.18, 'neu': 0.769, 'pos': 0.052, 'com...",-0.9986,0,"(6032.5, 14423.0]",3
1900,29575,"On September 13, 2011, a complaint was registe...",2011-09-13,1986-10-13,9101,"On September , , a complaint was registered wi...","{'neg': 0.191, 'neu': 0.758, 'pos': 0.051, 'co...",-0.9816,0,"(6032.5, 14423.0]",3


In [66]:
bin_group.compound.count()

bin_code
0    482
1    481
2    481
3    482
Name: compound, dtype: int64

In [67]:
bin_group.compound.mean()

bin_code
0   -0.763613
1   -0.800647
2   -0.781778
3   -0.737321
Name: compound, dtype: float64

In [69]:
bin_group.compound.median()

bin_code
0   -0.95205
1   -0.95490
2   -0.93200
3   -0.89410
Name: compound, dtype: float64

In [70]:
bin_group.compound.std()

bin_code
0    0.425798
1    0.358609
2    0.362523
3    0.401559
Name: compound, dtype: float64