## Crime Rate Prediction based on Lunar Cycle

#### Purpose:
To predict the crime rate for days closer to the Full Moon Days for any given year.

#### Assumption:
1. The thesis that proves the existance of relationship between Lunar Cycle and Criminal behavior of human beings is true.
2. The Lunar Day information obtained from the website https://www.timeanddate.com is accurate.
3. The Census information obtained from the website https://www.google.com/publicdata/explore?ds=kf7tgg1uo9ude is accurate.

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
import warnings;
warnings.simplefilter('ignore')

In [3]:
import pandas as pd
import numpy as np
import datetime as dt

In [4]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Do The following steps for scraping Lunar Cycle Data from www.timeanddate.com <br/><br/>
Step#1: Open Anaconda Command Prompt<br/><br/>
Step#2: Run the below command<br/>
<p>
*        scrapy runspider C:\Users\dhars\Documents\GL_ML\Capstone\Crime_Rate_Prediction_And_Lunar_Cycle\Code\Web_Scrapping\Lunar_Cycle_WebScrapper_v3.py -o output.json</p><br/>

Step#3: When the code prompts you for user input, provide the name of the city and country along with the years between which you need the lunar data.<br/>
<p>
*        Eg. City: Atlanta<br/>
*        Country: USA<br/>
*        Starting Year: 2009<br/>
*        Ending Year: 2018
</p><br/>

Step#4: The file Lunar_Cycle_2009_2018.csv will be created in the below location<br/>
*     "C:\Users\dhars\Documents\GL_ML\Capstone\Crime_Rate_Prediction_And_Lunar_Cycle\Code\Web_Scrapping"<br/>

In [5]:
Lunar_Cycle = pd.read_csv('../Code/Web_Scrapping/Lunar_Cycle_2009_2018.csv')

In [6]:
np.sort(Lunar_Cycle.Full_Moon.unique())

array(['01-Aug', '01-Jan', '01-Jul', '01-Mar', '02-Dec', '02-Jun',
       '02-Nov', '03-Dec', '03-Feb', '03-Jul', '03-May', '04-Apr',
       '04-Jan', '04-Jun', '04-Nov', '04-Oct', '04-Sep', '05-Aug',
       '05-Mar', '05-May', '05-Oct', '06-Apr', '06-Dec', '06-Nov',
       '06-Sep', '07-Aug', '07-Feb', '07-Jul', '07-Jun', '08-Mar',
       '08-Oct', '08-Sep', '09-Apr', '09-Feb', '09-Jan', '09-Jul',
       '09-Jun', '09-May', '10-Aug', '10-Dec', '10-Feb', '10-Jan',
       '10-Mar', '10-May', '10-Nov', '11-Apr', '11-Oct', '12-Dec',
       '12-Jan', '12-Jul', '12-Mar', '12-Sep', '13-Aug', '13-Dec',
       '13-Jun', '13-Nov', '14-Feb', '14-May', '14-Nov', '14-Oct',
       '15-Apr', '15-Jan', '15-Jul', '15-Jun', '15-Sep', '16-Aug',
       '16-Mar', '16-Oct', '16-Sep', '17-Apr', '17-Dec', '17-May',
       '17-Nov', '18-Aug', '18-Feb', '18-Jul', '18-Jun', '18-Oct',
       '19-Jan', '19-Jul', '19-Mar', '19-May', '19-Sep', '20-Apr',
       '20-Aug', '20-Feb', '20-Jun', '21-Dec', '21-Mar', '21-M

In [7]:
Lunar_Cycle.drop(index=Lunar_Cycle[Lunar_Cycle.Full_Moon=='\xa0'].index,inplace=True)
Lunar_Cycle.Full_Moon=Lunar_Cycle.Full_Moon.str.cat(Lunar_Cycle.Year.astype('str'),'-')
Lunar_Cycle.Full_Moon=pd.to_datetime(Lunar_Cycle.Full_Moon)
Lunar_Cycle.drop(columns=['SL','Lunation'],inplace=True)
Lunar_Cycle.head()

Unnamed: 0,Year,Full_Moon
0,2008,2008-01-22
1,2008,2008-02-20
2,2008,2008-03-21
3,2008,2008-04-20
4,2008,2008-05-19


In [8]:
Calendar = pd.DataFrame(columns=['Year','Date'])
Calendar.Date = pd.date_range(start='1/1/2009', end='31/12/2018')
Calendar.Year = Calendar.Date.dt.year
Calendar['Lunar_Day']=0
Calendar = Calendar.merge(Lunar_Cycle,how='left',left_on=Calendar['Date'],right_on=Lunar_Cycle.Full_Moon)
Calendar['Year'] = Calendar['Year_x']
Calendar.drop(columns=['key_0','Year_x','Year_y'],inplace=True)
Calendar.Full_Moon = pd.to_datetime(Calendar.Full_Moon)
Calendar.loc[Calendar[Calendar.Full_Moon.notna()].index,'Lunar_Day'] = 1
Calendar.head()

Unnamed: 0,Date,Lunar_Day,Full_Moon,Year
0,2009-01-01,0,NaT,2009
1,2009-01-02,0,NaT,2009
2,2009-01-03,0,NaT,2009
3,2009-01-04,0,NaT,2009
4,2009-01-05,0,NaT,2009


In [9]:
for i in range(Calendar.Year.count()-1):
    if ((Calendar.loc[i,'Lunar_Day'] != 0)&(Calendar.loc[i+1,'Lunar_Day'] != 1)):
        Calendar.loc[i+1,'Lunar_Day'] = Calendar.loc[i,'Lunar_Day'] + 1

In [10]:
prev_lunar = Lunar_Cycle[Lunar_Cycle.Year == Calendar[Calendar.Lunar_Day == 0].Year.unique()[0]-1].Full_Moon.dt.date.values[-1]
Calendar.loc[Calendar[Calendar.Lunar_Day == 0].index,'Lunar_Day'] = (Calendar[Calendar.Lunar_Day == 0].Date.dt.date - prev_lunar).dt.days
Calendar.drop(columns=['Full_Moon','Year'],inplace=True)
Calendar.head(15)

Unnamed: 0,Date,Lunar_Day
0,2009-01-01,20
1,2009-01-02,21
2,2009-01-03,22
3,2009-01-04,23
4,2009-01-05,24
5,2009-01-06,25
6,2009-01-07,26
7,2009-01-08,27
8,2009-01-09,28
9,2009-01-10,1


In [11]:
CrimeReport = pd.read_csv('../Dataset/Atlanta_2009.csv')
CrimeReport.columns=['Report_Number','Report_Date','Occur_Date','Occur_Time','Possible_Date',
                     'Possible_Time','Beat','Apartment_Office_Prefix','Apartment_Number',
                     'Location','Shift_Occurence','Location_Type','UCR_Literal','UCR_Code',
                     'IBR_Code','Neighborhood','NPU','Latitude','Longitude']

In [12]:
CrimeReport.head().transpose()

Unnamed: 0,0,1,2,3,4
Report_Number,90010061,90020500,153221163,90010939,90020008
Report_Date,01-01-2009,02-01-2009,18-11-2015,01-01-2009,02-01-2009
Occur_Date,01-01-2009,01-01-2009,01-01-2009,01-01-2009,01-01-2009
Occur_Time,'0010,'1800,'0000,'0200,'1630
Possible_Date,01-01-2009,02-01-2009,19-09-2015,01-01-2009,02-01-2009
Possible_Time,'0020,'0600,'1525,'1154,'0002
Beat,309,309,414,408,309
Apartment_Office_Prefix,,,,,
Apartment_Number,,,,R-2,517
Location,501 TUFTON TRL SE,3065 BROWNS MILL RD SE,3301 N CAMP CREEK,3000 STONE HOGAN CONN SW,2762 VINEYARDS DR SE


In [13]:
CrimeReport.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Report_Number,39327.0,91921890.0,1361731.0,61681980.0,90992230.0,91871280.0,92770860.0,153221200.0
Beat,39307.0,369.607,165.5563,101.0,208.0,406.0,506.0,612.0
UCR_Code,39327.0,588.6044,112.8458,110.0,511.0,640.0,670.0,730.0
Latitude,39327.0,33.75626,0.04387195,33.64835,33.72975,33.75605,33.77953,33.88606
Longitude,39327.0,-84.4064,0.04773341,-84.55041,-84.43276,-84.39517,-84.37206,-84.29013


In [14]:
CrimeReport['Lat1']=np.round(CrimeReport.Latitude,1)
CrimeReport['Long1']=np.round(CrimeReport.Longitude,1)
CrimeReport['Coordinates1']=list(zip(CrimeReport.Lat1,CrimeReport.Long1))
CrimeReport['Lat3']=np.round(CrimeReport.Latitude,3)
CrimeReport['Long3']=np.round(CrimeReport.Longitude,3)
CrimeReport['Coordinates3']=list(zip(CrimeReport.Lat3,CrimeReport.Long3))
CrimeReport['Occur_Date']=pd.to_datetime(CrimeReport['Occur_Date'])

In [15]:
CrimeReport=CrimeReport.merge(Calendar,how='left',left_on=CrimeReport.Occur_Date,right_on=Calendar.Date)
CrimeReport.drop(columns=['key_0','Date'],inplace=True)
CrimeReport.head()

Unnamed: 0,Report_Number,Report_Date,Occur_Date,Occur_Time,Possible_Date,Possible_Time,Beat,Apartment_Office_Prefix,Apartment_Number,Location,...,NPU,Latitude,Longitude,Lat1,Long1,Coordinates1,Lat3,Long3,Coordinates3,Lunar_Day
0,90010061,01-01-2009,2009-01-01,'0010,01-01-2009,'0020,309.0,,,501 TUFTON TRL SE,...,Z,33.66477,-84.3843,33.7,-84.4,"(33.7, -84.4)",33.665,-84.384,"(33.665, -84.384)",20
1,90020500,02-01-2009,2009-01-01,'1800,02-01-2009,'0600,309.0,,,3065 BROWNS MILL RD SE,...,Z,33.67107,-84.3824,33.7,-84.4,"(33.7, -84.4)",33.671,-84.382,"(33.671, -84.382)",20
2,153221163,18-11-2015,2009-01-01,'0000,19-09-2015,'1525,414.0,,,3301 N CAMP CREEK,...,P,33.6721,-84.50122,33.7,-84.5,"(33.7, -84.5)",33.672,-84.501,"(33.672, -84.501)",20
3,90010939,01-01-2009,2009-01-01,'0200,01-01-2009,'1154,408.0,,R-2,3000 STONE HOGAN CONN SW,...,R,33.67424,-84.49697,33.7,-84.5,"(33.7, -84.5)",33.674,-84.497,"(33.674, -84.497)",20
4,90020008,02-01-2009,2009-01-01,'1630,02-01-2009,'0002,309.0,,517,2762 VINEYARDS DR SE,...,Z,33.67957,-84.36996,33.7,-84.4,"(33.7, -84.4)",33.68,-84.37,"(33.68, -84.37)",20


In [16]:
CrimeReport.head().transpose()

Unnamed: 0,0,1,2,3,4
Report_Number,90010061,90020500,153221163,90010939,90020008
Report_Date,01-01-2009,02-01-2009,18-11-2015,01-01-2009,02-01-2009
Occur_Date,2009-01-01 00:00:00,2009-01-01 00:00:00,2009-01-01 00:00:00,2009-01-01 00:00:00,2009-01-01 00:00:00
Occur_Time,'0010,'1800,'0000,'0200,'1630
Possible_Date,01-01-2009,02-01-2009,19-09-2015,01-01-2009,02-01-2009
Possible_Time,'0020,'0600,'1525,'1154,'0002
Beat,309,309,414,408,309
Apartment_Office_Prefix,,,,,
Apartment_Number,,,,R-2,517
Location,501 TUFTON TRL SE,3065 BROWNS MILL RD SE,3301 N CAMP CREEK,3000 STONE HOGAN CONN SW,2762 VINEYARDS DR SE


In [17]:
UCR=CrimeReport.loc[:,['UCR_Literal','UCR_Code']]
UCR=UCR.drop_duplicates().sort_values(by=['UCR_Literal','UCR_Code']).reset_index().drop(columns='index')
UCR.head()

Unnamed: 0,UCR_Literal,UCR_Code
0,AGG ASSAULT,410
1,AGG ASSAULT,420
2,AGG ASSAULT,430
3,AGG ASSAULT,440
4,AUTO THEFT,710


In [18]:
CrimeReport1=CrimeReport.drop(columns=['Apartment_Number',
'Apartment_Office_Prefix',
'Beat',
'Latitude','Lat1','Lat3',
'Location',
'Longitude','Long1','Long3',
'Occur_Time',
'Possible_Date',
'Possible_Time',
'Report_Date',
'Report_Number',
'Shift_Occurence',
'UCR_Literal'])

In [19]:
CrimeReport1.head()

Unnamed: 0,Occur_Date,Location_Type,UCR_Code,IBR_Code,Neighborhood,NPU,Coordinates1,Coordinates3,Lunar_Day
0,2009-01-01,20.0,511,2202,Glenrose Heights,Z,"(33.7, -84.4)","(33.665, -84.384)",20
1,2009-01-01,,511,2202,Glenrose Heights,Z,"(33.7, -84.4)","(33.671, -84.382)",20
2,2009-01-01,26.0,410,1314,,P,"(33.7, -84.5)","(33.672, -84.501)",20
3,2009-01-01,18.0,710,2404,Greenbriar,R,"(33.7, -84.5)","(33.674, -84.497)",20
4,2009-01-01,18.0,720,2424,Rosedale Heights,Z,"(33.7, -84.4)","(33.68, -84.37)",20


In [20]:
from sklearn.preprocessing import LabelEncoder
lb_make = LabelEncoder()

Coord1=pd.DataFrame()
Coord1['Coordinates1']=CrimeReport1['Coordinates1']
Coord1['Categories']=lb_make.fit_transform(CrimeReport1['Coordinates1'])
CrimeReport1['Coordinates1']=lb_make.fit_transform(CrimeReport1['Coordinates1'])
Coord1=Coord1.drop_duplicates().sort_values(by='Categories').reset_index(drop=True)
Coord1.head()

Coord3=pd.DataFrame()
Coord3['Coordinates3']=CrimeReport1['Coordinates3']
Coord3['Categories']=lb_make.fit_transform(CrimeReport1['Coordinates3'])
CrimeReport1['Coordinates3']=lb_make.fit_transform(CrimeReport1['Coordinates3'])
Coord3=Coord3.drop_duplicates().sort_values(by='Categories').reset_index(drop=True)
Coord3.head()

Unnamed: 0,Coordinates1,Categories
0,"(33.6, -84.4)",0
1,"(33.7, -84.6)",1
2,"(33.7, -84.5)",2
3,"(33.7, -84.4)",3
4,"(33.7, -84.3)",4


Unnamed: 0,Coordinates3,Categories
0,"(33.648, -84.366)",0
1,"(33.648, -84.364)",1
2,"(33.649, -84.391)",2
3,"(33.649, -84.378)",3
4,"(33.649, -84.366)",4


In [21]:
CrimeReport1.head()

Unnamed: 0,Occur_Date,Location_Type,UCR_Code,IBR_Code,Neighborhood,NPU,Coordinates1,Coordinates3,Lunar_Day
0,2009-01-01,20.0,511,2202,Glenrose Heights,Z,3,175,20
1,2009-01-01,,511,2202,Glenrose Heights,Z,3,289,20
2,2009-01-01,26.0,410,1314,,P,2,304,20
3,2009-01-01,18.0,710,2404,Greenbriar,R,2,358,20
4,2009-01-01,18.0,720,2424,Rosedale Heights,Z,3,527,20


In [22]:
CrimeReport1.dtypes

Occur_Date       datetime64[ns]
Location_Type            object
UCR_Code                  int64
IBR_Code                 object
Neighborhood             object
NPU                      object
Coordinates1              int64
Coordinates3              int64
Lunar_Day                 int64
dtype: object

In [23]:
CrimeReport2=pd.DataFrame(data=CrimeReport1.groupby(by=['Occur_Date','Coordinates1','Coordinates3','Lunar_Day']).count()['UCR_Code'].values,
                   index=CrimeReport1.groupby(by=['Occur_Date','Coordinates1','Coordinates3','Lunar_Day']).count()['UCR_Code'].index)
CrimeReport2=CrimeReport2.reset_index()
CrimeReport2=CrimeReport2.reset_index(drop=True)
CrimeReport2['Crime_Count'] = CrimeReport2.loc[:,0]
CrimeReport2=CrimeReport2.drop(columns=0)
CrimeReport2.head()

Unnamed: 0,Occur_Date,Coordinates1,Coordinates3,Lunar_Day,Crime_Count
0,2009-01-01,2,304,20,1
1,2009-01-01,2,358,20,1
2,2009-01-01,2,535,20,1
3,2009-01-01,2,714,20,2
4,2009-01-01,2,738,20,1


In [24]:
CrimeReport2['Occur_Date']=lb_make.fit_transform(CrimeReport2['Occur_Date'])
#CrimeReport2=CrimeReport2[((CrimeReport2.Lunar_Day.between(25,30)) | (CrimeReport2.Lunar_Day.between(1,5)))]

In [25]:
CrimeReport2['Crime_Rate']=(CrimeReport2.Crime_Count*100000)/483450
CrimeReport2.sort_values(by='Crime_Rate',ascending=False).head()

Unnamed: 0,Occur_Date,Coordinates1,Coordinates3,Lunar_Day,Crime_Count,Crime_Rate
37555,363,6,10327,29,9,1.86162
34368,330,6,10327,26,7,1.447926
37353,361,6,10327,27,6,1.24108
17118,166,6,10327,10,6,1.24108
37196,359,6,10327,25,6,1.24108


In [26]:
#CrimeReport2.plot.scatter(x='Lunar_Day',y='Crime_Count',c='Coordinates1')

In [27]:
#sns.pairplot(data=CrimeReport2,hue='Crime_Count',size=5)

In [28]:
CrimeReport2.sort_values(by='Crime_Count',ascending=False).head()

Unnamed: 0,Occur_Date,Coordinates1,Coordinates3,Lunar_Day,Crime_Count,Crime_Rate
37555,363,6,10327,29,9,1.86162
34368,330,6,10327,26,7,1.447926
37353,361,6,10327,27,6,1.24108
17118,166,6,10327,10,6,1.24108
37196,359,6,10327,25,6,1.24108


In [29]:
x=CrimeReport2.drop(columns=['Crime_Count','Crime_Rate'])
y=CrimeReport2.Crime_Count

In [30]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)

In [31]:
x_train.reset_index(inplace=True,drop=True)
y_train.reset_index(inplace=True,drop=True)
x_test.reset_index(inplace=True,drop=True)
y_test.reset_index(inplace=True,drop=True)

In [32]:
from sklearn.tree import DecisionTreeRegressor
import sklearn.metrics as met

In [33]:
DTR_Model=DecisionTreeRegressor(max_depth=2,random_state=0)
DTR_Model.fit(x_train,y_train)
y_pred=DTR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.regression.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.regression.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.regression.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.regression.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.regression.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.regression.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.regression.r2_score(y_test,y_pred),3))
print('Test Accuracy Score:',round(met.accuracy_score(y_test,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_test,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_test,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_test.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf
y_pred=DTR_Model.predict(x_train)
y_pred=np.round(y_pred,0)
print('Train Accuracy Score:',round(met.accuracy_score(y_train,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_train,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_train,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_train.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf

DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=0, splitter='best')

Explained Variance Score: 0.112
Mean Absolute Error: 0.045
Mean Squared Error: 0.058
Root Mean Squared Error: 0.24
Mean Squared Log Error: 0.008
Median Absolute Error: 0.0
R2 Score: 0.088
Test Accuracy Score: 0.961
Confusion Matrix:
 [[10838    34     0     0     0]
 [  339    21     0     0     0]
 [   39    12     0     0     0]
 [    6    10     0     0     0]
 [    0     2     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.965782,0.996873,0.981081,10872
1,2,0.265823,0.058333,0.095672,360
2,3,0.0,0.0,0.0,51
3,4,0.0,0.0,0.0,16
4,5,0.0,0.0,0.0,2


Train Accuracy Score: 0.963
Confusion Matrix:
 [[25346    83     0     0     0     0     0     0]
 [  749    51     0     0     0     0     0     0]
 [   67    40     0     0     0     0     0     0]
 [    9     8     0     0     0     0     0     0]
 [    2     8     0     0     0     0     0     0]
 [    0     4     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.968403,0.996736,0.982365,25429
1,2,0.260204,0.06375,0.10241,800
2,3,0.0,0.0,0.0,107
3,6,0.0,0.0,0.0,17
4,4,0.0,0.0,0.0,10
5,5,0.0,0.0,0.0,4
6,9,0.0,0.0,0.0,1
7,7,0.0,0.0,0.0,1


In [34]:
from sklearn.linear_model import LinearRegression
LR_Model=LinearRegression()

In [35]:
LR_Model.fit(x_train,y_train)
LR_Model.score(x_test,y_test)
y_pred=LR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.regression.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.regression.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.regression.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.regression.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.regression.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.regression.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.regression.r2_score(y_test,y_pred),3))
print('Test Accuracy Score:',round(met.accuracy_score(y_test,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_test,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_test,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_test.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf
y_pred=DTR_Model.predict(x_train)
y_pred=np.round(y_pred,0)
print('Train Accuracy Score:',round(met.accuracy_score(y_train,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_train,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_train,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_train.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

0.004161856136745357

Explained Variance Score: -0.0
Mean Absolute Error: 0.046
Mean Squared Error: 0.065
Root Mean Squared Error: 0.256
Mean Squared Log Error: 0.009
Median Absolute Error: 0.0
R2 Score: -0.033
Test Accuracy Score: 0.962
Confusion Matrix:
 [[10872     0     0     0     0]
 [  360     0     0     0     0]
 [   51     0     0     0     0]
 [   16     0     0     0     0]
 [    2     0     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.962039,1.0,0.980652,10872
1,2,0.0,0.0,0.0,360
2,3,0.0,0.0,0.0,51
3,4,0.0,0.0,0.0,16
4,5,0.0,0.0,0.0,2


Train Accuracy Score: 0.963
Confusion Matrix:
 [[25346    83     0     0     0     0     0     0]
 [  749    51     0     0     0     0     0     0]
 [   67    40     0     0     0     0     0     0]
 [    9     8     0     0     0     0     0     0]
 [    2     8     0     0     0     0     0     0]
 [    0     4     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.968403,0.996736,0.982365,25429
1,2,0.260204,0.06375,0.10241,800
2,3,0.0,0.0,0.0,107
3,6,0.0,0.0,0.0,17
4,4,0.0,0.0,0.0,10
5,5,0.0,0.0,0.0,4
6,9,0.0,0.0,0.0,1
7,7,0.0,0.0,0.0,1


In [36]:
from sklearn.ensemble import RandomForestRegressor
RFReg_Model=RandomForestRegressor(n_estimators=int(x_train.Occur_Date.count()/100),max_depth=2,random_state=0)

  from numpy.core.umath_tests import inner1d


In [37]:
RFReg_Model.fit(x_train,y_train)
y_pred=RFReg_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.regression.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.regression.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.regression.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.regression.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.regression.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.regression.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.regression.r2_score(y_test,y_pred),3))
print('Test Accuracy Score:',round(met.accuracy_score(y_test,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_test,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_test,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_test.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf
y_pred=DTR_Model.predict(x_train)
y_pred=np.round(y_pred,0)
print('Train Accuracy Score:',round(met.accuracy_score(y_train,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_train,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_train,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_train.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=2,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=263, n_jobs=1,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

Explained Variance Score: 0.112
Mean Absolute Error: 0.045
Mean Squared Error: 0.058
Root Mean Squared Error: 0.24
Mean Squared Log Error: 0.008
Median Absolute Error: 0.0
R2 Score: 0.088
Test Accuracy Score: 0.961
Confusion Matrix:
 [[10838    34     0     0     0]
 [  339    21     0     0     0]
 [   39    12     0     0     0]
 [    6    10     0     0     0]
 [    0     2     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.965782,0.996873,0.981081,10872
1,2,0.265823,0.058333,0.095672,360
2,3,0.0,0.0,0.0,51
3,4,0.0,0.0,0.0,16
4,5,0.0,0.0,0.0,2


Train Accuracy Score: 0.963
Confusion Matrix:
 [[25346    83     0     0     0     0     0     0]
 [  749    51     0     0     0     0     0     0]
 [   67    40     0     0     0     0     0     0]
 [    9     8     0     0     0     0     0     0]
 [    2     8     0     0     0     0     0     0]
 [    0     4     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.968403,0.996736,0.982365,25429
1,2,0.260204,0.06375,0.10241,800
2,3,0.0,0.0,0.0,107
3,6,0.0,0.0,0.0,17
4,4,0.0,0.0,0.0,10
5,5,0.0,0.0,0.0,4
6,9,0.0,0.0,0.0,1
7,7,0.0,0.0,0.0,1


In [38]:
x=CrimeReport2.drop(columns=['Crime_Count','Crime_Rate'])
y=CrimeReport2.Crime_Count
x.reset_index(inplace=True,drop=True)
y.reset_index(inplace=True,drop=True)

In [39]:
from sklearn.model_selection import KFold
LRModel=LinearRegression()
nsplits=int(x_train.Occur_Date.count()/100)

In [40]:
%%time
kfold_summary=pd.DataFrame()
for i in range(0,10):
    KF=KFold(n_splits=nsplits,shuffle=True,random_state=i)
    for train,test in KF.split(x,y):
        x_train,x_test=x.iloc[train,:],x.iloc[test,:]
        y_train,y_test=y[train],y[test]
        LRModel=LRModel.fit(x_train,y_train)
        y_pred1=LRModel.predict(x_test)
        y_pred1=np.round(y_pred1)
        y_pred2=LRModel.predict(x_train)
        y_pred2=np.round(y_pred2)
        #kfold_summary=kfold_summary.append(pd.DataFrame(data=[i,train,test,met.regression.explained_variance_score(y_test,y_pred1),met.accuracy_score(y_train,y_pred2)]).transpose())
        kfold_summary=kfold_summary.append(pd.DataFrame(data=[i,train,test,met.accuracy_score(y_test,y_pred1),met.accuracy_score(y_train,y_pred2)]).transpose())

Wall time: 1min 9s


In [41]:
print("No.of splits:",KF.get_n_splits(x,y))
kfold_summary.columns=['random_state','train','test','test_score','train_score']

kfold_summary=kfold_summary.sort_values(by=['test_score','train_score'],ascending=False).reset_index(drop=True)

train1=np.array(kfold_summary.head(1).train.tolist()).flatten()
test1=np.array(kfold_summary.head(1).test.tolist()).flatten()

x_train,x_test=x.iloc[train1,:],x.iloc[test1,:]
y_train,y_test=y.iloc[train1],y.iloc[test1]

No.of splits: 263


In [42]:
kfold_summary.loc[kfold_summary[kfold_summary.test_score != 1].index,:].head()

Unnamed: 0,random_state,train,test,test_score,train_score
7,0,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[487, 545, 651, 1007, 1167, 1603, 1613, 1759, ...",0.993056,0.963545
8,0,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[666, 934, 1878, 2032, 3393, 3913, 4168, 4441,...",0.993056,0.963545
9,0,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[91, 226, 281, 577, 583, 1011, 1026, 1304, 150...",0.993056,0.963545
10,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[22, 150, 1093, 1317, 1453, 2017, 2089, 2830, ...",0.993056,0.963545
11,3,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[490, 584, 1026, 1367, 2049, 2290, 2483, 2632,...",0.993056,0.963545


In [43]:
kfold_summary.test_score.value_counts()

0.965035    369
0.972028    343
0.958042    303
0.979021    258
0.951049    207
0.986014    158
0.944056    150
0.972222    114
0.965278    111
0.937063    102
0.958333    102
0.979167     79
0.951389     63
0.993007     53
0.930070     40
0.944444     36
0.986111     35
0.937500     29
0.923077     17
0.993056     17
0.930556     12
0.916084      8
0.923611      7
1.000000      7
0.909091      4
0.916667      4
0.902098      2
Name: test_score, dtype: int64

In [44]:
LRModel=LinearRegression()
LRModel.fit(x_train,y_train)
y_pred=LRModel.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.regression.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.regression.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.regression.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.regression.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.regression.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.regression.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.regression.r2_score(y_test,y_pred),3))
print('Test Accuracy Score:',round(met.accuracy_score(y_test,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_test,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_test,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_test.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf
y_pred=DTR_Model.predict(x_train)
y_pred=np.round(y_pred,0)
print('Train Accuracy Score:',round(met.accuracy_score(y_train,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_train,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_train,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_train.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

Explained Variance Score: 1.0
Mean Absolute Error: 0.0
Mean Squared Error: 0.0
Root Mean Squared Error: 0.0
Mean Squared Log Error: 0.0
Median Absolute Error: 0.0
R2 Score: 1.0
Test Accuracy Score: 1.0
Confusion Matrix:
 [[143]]

Precision Recall FScore Support Matrix:



Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,1.0,1.0,1.0,143


Train Accuracy Score: 0.962
Confusion Matrix:
 [[36041   117     0     0     0     0     0     0]
 [ 1088    72     0     0     0     0     0     0]
 [  106    52     0     0     0     0     0     0]
 [   15    18     0     0     0     0     0     0]
 [    2    10     0     0     0     0     0     0]
 [    0     4     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.967492,0.996764,0.98191,36158
1,2,0.261818,0.062069,0.100348,1160
2,5,0.0,0.0,0.0,158
3,4,0.0,0.0,0.0,33
4,3,0.0,0.0,0.0,12
5,6,0.0,0.0,0.0,4
6,7,0.0,0.0,0.0,1
7,9,0.0,0.0,0.0,1


In [45]:
DTR_Model=DecisionTreeRegressor(max_depth=2,random_state=0)
DTR_Model.fit(x_train,y_train)
y_pred=DTR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.regression.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.regression.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.regression.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.regression.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.regression.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.regression.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.regression.r2_score(y_test,y_pred),3))
print('Test Accuracy Score:',round(met.accuracy_score(y_test,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_test,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_test,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_test.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf
y_pred=DTR_Model.predict(x_train)
y_pred=np.round(y_pred,0)
print('Train Accuracy Score:',round(met.accuracy_score(y_train,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_train,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_train,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_train.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf

DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=0, splitter='best')

Explained Variance Score: 1.0
Mean Absolute Error: 0.0
Mean Squared Error: 0.0
Root Mean Squared Error: 0.0
Mean Squared Log Error: 0.0
Median Absolute Error: 0.0
R2 Score: 1.0
Test Accuracy Score: 1.0
Confusion Matrix:
 [[143]]

Precision Recall FScore Support Matrix:



Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,1.0,1.0,1.0,143


Train Accuracy Score: 0.962
Confusion Matrix:
 [[36041   117     0     0     0     0     0     0]
 [ 1088    72     0     0     0     0     0     0]
 [  106    52     0     0     0     0     0     0]
 [   15    18     0     0     0     0     0     0]
 [    2    10     0     0     0     0     0     0]
 [    0     4     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.967492,0.996764,0.98191,36158
1,2,0.261818,0.062069,0.100348,1160
2,5,0.0,0.0,0.0,158
3,4,0.0,0.0,0.0,33
4,3,0.0,0.0,0.0,12
5,6,0.0,0.0,0.0,4
6,7,0.0,0.0,0.0,1
7,9,0.0,0.0,0.0,1


In [46]:
RFReg_Model.fit(x_train,y_train)
y_pred=RFReg_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.regression.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.regression.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.regression.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.regression.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.regression.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.regression.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.regression.r2_score(y_test,y_pred),3))
print('Test Accuracy Score:',round(met.accuracy_score(y_test,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_test,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_test,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_test.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf
y_pred=DTR_Model.predict(x_train)
y_pred=np.round(y_pred,0)
print('Train Accuracy Score:',round(met.accuracy_score(y_train,y_pred),3))
print('Confusion Matrix:\n',met.confusion_matrix(y_train,y_pred))
precision, recall, f_score, support = met.classification.precision_recall_fscore_support(y_train,y_pred)
prf = pd.DataFrame()
prf['Crime_Count']=y_train.unique().flatten()
prf['Precision']=precision.flatten()
prf['Recall']=recall.flatten()
prf['Fscore']=f_score.flatten()
prf['Support']=support.flatten()
print('\nPrecision Recall FScore Support Matrix:\n')
prf

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=2,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=263, n_jobs=1,
           oob_score=False, random_state=0, verbose=0, warm_start=False)

Explained Variance Score: 1.0
Mean Absolute Error: 0.0
Mean Squared Error: 0.0
Root Mean Squared Error: 0.0
Mean Squared Log Error: 0.0
Median Absolute Error: 0.0
R2 Score: 1.0
Test Accuracy Score: 1.0
Confusion Matrix:
 [[143]]

Precision Recall FScore Support Matrix:



Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,1.0,1.0,1.0,143


Train Accuracy Score: 0.962
Confusion Matrix:
 [[36041   117     0     0     0     0     0     0]
 [ 1088    72     0     0     0     0     0     0]
 [  106    52     0     0     0     0     0     0]
 [   15    18     0     0     0     0     0     0]
 [    2    10     0     0     0     0     0     0]
 [    0     4     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]
 [    0     1     0     0     0     0     0     0]]

Precision Recall FScore Support Matrix:



  'precision', 'predicted', average, warn_for)


Unnamed: 0,Crime_Count,Precision,Recall,Fscore,Support
0,1,0.967492,0.996764,0.98191,36158
1,2,0.261818,0.062069,0.100348,1160
2,5,0.0,0.0,0.0,158
3,4,0.0,0.0,0.0,33
4,3,0.0,0.0,0.0,12
5,6,0.0,0.0,0.0,4
6,7,0.0,0.0,0.0,1
7,9,0.0,0.0,0.0,1
