## Crime Rate Prediction based on Lunar Cycle

#### Purpose:
To predict the crime rate on days close to a Full Moon day for any given year.

#### Assumption:
1. The thesis asserting a relationship between the Lunar Cycle and the criminal behavior of human beings is true.
2. The Lunar Day information obtained from the website https://www.timeanddate.com is accurate.
3. The Census information obtained from the website https://www.google.com/publicdata/explore?ds=kf7tgg1uo9ude is accurate.

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity="all"

In [2]:
import warnings
warnings.simplefilter('ignore')

In [3]:
import pandas as pd
import numpy as np
import datetime as dt

In [4]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [5]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
import sklearn.metrics as met
from sklearn.model_selection import GridSearchCV

Follow these steps to scrape Lunar Cycle data from www.timeanddate.com <br/><br/>
Step#1: Open Anaconda Command Prompt<br/><br/>
Step#2: Run the below command<br/>
<p>
*        scrapy runspider C:\Users\dhars\Documents\GL_ML\Capstone\Crime_Rate_Prediction_And_Lunar_Cycle\Code\Web_Scrapping\Lunar_Cycle_WebScrapper_v3.py -o output.json</p><br/>

Step#3: When the code prompts you for user input, provide the name of the city and country along with the years between which you need the lunar data.<br/>
<p>
*        Eg. City: Atlanta<br/>
*        Country: USA<br/>
*        Starting Year: 2009<br/>
*        Ending Year: 2018
</p><br/>

Step#4: The file Lunar_Cycle_2009_2018.csv is created in the location below<br/>
*     "C:\Users\dhars\Documents\GL_ML\Capstone\Crime_Rate_Prediction_And_Lunar_Cycle\Code\Web_Scrapping"<br/>

##### Comment: <br/><br/>Import the full moon day information from the scraped data into a data frame.<br/><br/> Remove records with special characters and drop features that are not needed.

In [6]:
Lunar_Cycle=pd.read_csv('../Code/Web_Scrapping/Lunar_Cycle_2009_2018.csv')
# The commented line below revealed records containing only the '\xa0'
# (non-breaking space) character; those records are removed in the next step.
#np.sort(Lunar_Cycle.Full_Moon.unique())
Lunar_Cycle.drop(index=Lunar_Cycle[Lunar_Cycle.Full_Moon=='\xa0'].index,inplace=True)
Lunar_Cycle.Full_Moon=Lunar_Cycle.Full_Moon.str.cat(Lunar_Cycle.Year.astype('str'),'-')
Lunar_Cycle.Full_Moon=pd.to_datetime(Lunar_Cycle.Full_Moon)
Lunar_Cycle.drop(columns=['SL','Lunation'],inplace=True)
Lunar_Cycle.head()

Unnamed: 0,Year,Full_Moon
0,2008,2008-01-22
1,2008,2008-02-20
2,2008,2008-03-21
3,2008,2008-04-20
4,2008,2008-05-19
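The cleaning step above can be illustrated on toy data. This is only a minimal sketch under stated assumptions: the scraped `Full_Moon` text is assumed to look like `11 Jan` (day and abbreviated month), which the source does not show, and the helper name `clean_full_moons` is hypothetical.

```python
import pandas as pd

def clean_full_moons(raw):
    """Drop placeholder rows and build real dates from 'day month' text plus the year."""
    # Rows holding only a non-breaking space carry no full moon date
    kept = raw[raw["Full_Moon"] != "\xa0"].copy()
    # Assumed text format: '11 Jan'; an explicit format avoids ambiguous parsing
    text = kept["Full_Moon"].str.strip() + " " + kept["Year"].astype(str)
    kept["Full_Moon"] = pd.to_datetime(text, format="%d %b %Y")
    return kept.reset_index(drop=True)

raw = pd.DataFrame({
    "Year": [2009, 2009, 2009],
    "Full_Moon": ["11 Jan", "\xa0", "9 Feb"],  # toy values, not scraped data
})
cleaned = clean_full_moons(raw)
print(cleaned)
```

Passing an explicit `format` is a design choice for this sketch; the notebook itself relies on the default parser.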


##### Comment: <br/><br/> Create a new data frame named Calendar with all days between 2009 and 2018 and calculate the Lunar Day by merging Calendar with the Lunar Cycle data frame. <br/><br/> P.S.: Lunar Day is the number assigned to each date based on how far it is from the previous Full Moon day. For example, every Full Moon day has a Lunar Day value of 1, the day after a Full Moon day is 2, and so on, up to 28, 29 or 30 depending on when the next Full Moon day falls.

In [7]:
#%%time

# The %%time magic command shows how long it takes to run the code in this cell
    # and it needs to be the first line in the cell even before comments.

# Code:
Calendar=pd.DataFrame({'Date':pd.date_range(start='2009-01-01',end='2018-12-31')})
Calendar['Year']=Calendar.Date.dt.year
Calendar['Lunar_Day']=0
Calendar=Calendar.merge(Lunar_Cycle,how='left',left_on=Calendar['Date'],right_on=Lunar_Cycle.Full_Moon)
Calendar['Year']=Calendar['Year_x']
Calendar.drop(columns=['key_0','Year_x','Year_y'],inplace=True)
Calendar.Full_Moon=pd.to_datetime(Calendar.Full_Moon)
# Full Moon days are Lunar Day 1
Calendar.loc[Calendar.Full_Moon.notna(),'Lunar_Day']=1
# Count forward from each Full Moon day until the next one
for i in range(len(Calendar)-1):
    if Calendar.loc[i,'Lunar_Day'] != 0 and Calendar.loc[i+1,'Lunar_Day'] != 1:
        Calendar.loc[i+1,'Lunar_Day']=Calendar.loc[i,'Lunar_Day'] + 1
# Days before the first Full Moon of 2009 are still 0; measure them from the last
# Full Moon of the previous year (computed once, outside the loop).
# Note: this is the plain day difference, without the +1 implied by the loop above.
leading=Calendar.Lunar_Day == 0
prev_lunar=Lunar_Cycle[Lunar_Cycle.Year == Calendar[leading].Year.unique()[0]-1].Full_Moon.iloc[-1]
Calendar.loc[leading,'Lunar_Day']=(Calendar.loc[leading,'Date'] - prev_lunar).dt.days
Calendar.drop(columns=['Full_Moon','Year'],inplace=True)
Calendar.head()

Unnamed: 0,Date,Lunar_Day
0,2009-01-01,20
1,2009-01-02,21
2,2009-01-03,22
3,2009-01-04,23
4,2009-01-05,24
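The Lunar Day numbering described above (1 on a Full Moon day, 2 on the following day, and so on) can also be computed without a row-by-row loop. The sketch below is one possible approach, not the notebook's method: the function name `lunar_day` and the two full moon dates are made up for illustration. Under the stated definition, the Lunar Day is the day difference to the latest full moon plus 1.

```python
import numpy as np
import pandas as pd

def lunar_day(dates, full_moons):
    """Return 1 on each full moon, 2 on the next day, and so on until the next full moon."""
    days = pd.to_datetime(pd.Series(dates)).values.astype("datetime64[D]")
    moons = np.sort(pd.to_datetime(pd.Series(full_moons)).values.astype("datetime64[D]"))
    if days.min() < moons[0]:
        raise ValueError("every date needs a full moon on or before it")
    # Index of the latest full moon on or before each date
    idx = np.searchsorted(moons, days, side="right") - 1
    return (days - moons[idx]).astype(int) + 1

# Toy full moon dates chosen for illustration, not taken from the scraped file
calendar = pd.DataFrame({"Date": pd.date_range("2020-01-01", "2020-01-05")})
calendar["Lunar_Day"] = lunar_day(calendar["Date"], ["2019-12-30", "2020-01-03"])
print(calendar)
```

Because the lookup needs a full moon on or before every date, the full moon list should start with the last full moon of the year before the calendar begins.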


In [8]:
Population_Atlanta=pd.read_csv('../Code/Web_Scrapping/Population_Atlanta.csv')
Population_Atlanta.drop(columns=['Unnamed: 0'],inplace=True)
Population_Atlanta.Year=Population_Atlanta.Year.astype('int64')
Population_Atlanta.Population=Population_Atlanta.Population.astype('int64')
Population_Atlanta.head()

Unnamed: 0,Year,Population
0,2008,474509
1,2009,483450
2,2010,422849
3,2011,431729
4,2012,443008


##### Comment: <br/><br/> Import the Crime Data for Atlanta for all years between 2009 and 2018. We can filter the data frame into a smaller subset when necessary. <br/><br/> Assign the column names as given below, so that the column names do not contain special characters and are easier to access through the data frame. <br/><br/> Keep a copy of the data frame for future use.

In [9]:
CrimeReport=pd.read_csv('../Dataset/Atlanta_2009_2018.csv')
CrimeReport.columns=['Report_Number','Report_Date','Occur_Date','Occur_Time','Possible_Date',
                     'Possible_Time','Beat','Apartment_Office_Prefix','Apartment_Number',
                     'Location','Shift_Occurence','Location_Type','UCR_Literal','UCR_Code',
                     'IBR_Code','Neighborhood','NPU','Latitude','Longitude']
# .copy() keeps an independent snapshot; plain assignment would only alias the same object
CrimeReport_Org=CrimeReport.copy()

##### Comment: <br/><br/> 1. The column 'Beat' sometimes has nulls. Replace those null values with 0 and convert the column to int. <br/><br/> 2. Convert the column 'Occur_Date' to date. <br/><br/> 3. Create a new column called 'Crime_Year' to capture the year in which the crime actually happened. <br/><br/> 4. Filter the data frame to the subset of crimes between the years 2009 and 2018. <br/><br/> 5. Merge the Calendar data frame into Crime Report to get the Lunar Day in Crime Report. <br/><br/> 6. Create new columns Coordinates1 and Coordinates3 by rounding Latitude and Longitude to 1 and 3 decimal places and combining the respective rounded values.

In [10]:
%%time

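# Assign the result back instead of calling fillna(inplace=True) on a column,
# which newer pandas versions may apply to a temporary copy.
CrimeReport['Beat']=CrimeReport['Beat'].fillna(0).astype('int64')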

CrimeReport['Occur_Date']=pd.to_datetime(CrimeReport['Occur_Date'])

CrimeReport['Crime_Year']=CrimeReport.Occur_Date.dt.year
CrimeReport['Crime_Month']=CrimeReport.Occur_Date.dt.month
CrimeReport['Crime_Day']=CrimeReport.Occur_Date.dt.day

# Crime_Year is an integer column, so compare against integers rather than strings
CrimeReport=CrimeReport[CrimeReport.Crime_Year.between(2009,2018)].reset_index(drop=True)

CrimeReport=CrimeReport.merge(Calendar,how='left',left_on=CrimeReport.Occur_Date,right_on=Calendar.Date)
CrimeReport.drop(columns=['key_0','Date'],inplace=True)

CrimeReport.Lunar_Day=CrimeReport.Lunar_Day.astype('int64')

Wall time: 49.3 s


In [11]:
CrimeReport.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Report_Number,317506.0,133336400.0,28564170.0,51401020.0,110621200.0,131811100.0,160231600.0,190032100.0
Occur_Time,317506.0,1359.988,668.2579,0.0,900.0,1455.0,1900.0,2447.0
Possible_Time,317487.0,1310.559,642.0945,0.0,830.0,1352.0,1830.0,3015.0
Beat,317506.0,358.1682,166.1098,0.0,208.0,401.0,505.0,612.0
UCR_Code,317506.0,593.9587,112.4762,110.0,511.0,640.0,670.0,730.0
Latitude,317506.0,33.7567,0.06921599,16.59605,33.72961,33.75617,33.78153,33.88613
Longitude,317506.0,-84.40734,0.1390977,-84.55049,-84.43276,-84.39677,-84.37383,-41.82593
Crime_Year,317506.0,2013.139,2.854092,2009.0,2011.0,2013.0,2016.0,2018.0
Crime_Month,317506.0,6.587526,3.4149,1.0,4.0,7.0,10.0,12.0
Crime_Day,317506.0,15.80912,8.71709,1.0,8.0,16.0,23.0,31.0


In [13]:
lb_make=LabelEncoder()

Crime_Date=pd.DataFrame()
Crime_Date['Occur_Date']=CrimeReport['Occur_Date']
Crime_Date['Date_Category']=lb_make.fit_transform(Crime_Date['Occur_Date'])
Crime_Date=Crime_Date.drop_duplicates().sort_values(by=['Occur_Date']).reset_index(drop=True)
Crime_Date.head()

Coordinates=CrimeReport.loc[:,['Latitude','Longitude']]
Coordinates['Coordinates1']=list(zip(np.round(Coordinates.Latitude,1),np.round(Coordinates.Longitude,1)))
Coordinates['Coordinates1']=Coordinates['Coordinates1'].astype('str')
Coordinates['Coord1_Category']=lb_make.fit_transform(Coordinates['Coordinates1'])
Coordinates['Coordinates3']=list(zip(np.round(Coordinates.Latitude,3),np.round(Coordinates.Longitude,3)))
Coordinates['Coordinates3']=Coordinates['Coordinates3'].astype('str')
Coordinates['Coord3_Category']=lb_make.fit_transform(Coordinates['Coordinates3'])
Coordinates=Coordinates.drop_duplicates().sort_values(by=['Latitude','Longitude']).reset_index(drop=True)
Coordinates.head()

UCR=CrimeReport.loc[:,['UCR_Literal','UCR_Code']].copy()
UCR['UCR_Code_Category']=lb_make.fit_transform(UCR['UCR_Code'])
UCR=UCR.drop_duplicates().sort_values(by=['UCR_Literal','UCR_Code']).reset_index(drop=True)
UCR.head()

IBR=pd.DataFrame()
CrimeReport['IBR_Code']=CrimeReport['IBR_Code'].astype('str')
IBR['IBR_Code']=CrimeReport['IBR_Code']
IBR['IBR_Category']=lb_make.fit_transform(IBR['IBR_Code'])
IBR=IBR.drop_duplicates().sort_values(by=['IBR_Code']).reset_index(drop=True)
IBR.head()

Loc_Type=pd.DataFrame()
CrimeReport['Location_Type']=CrimeReport['Location_Type'].fillna(0)
CrimeReport['Location_Type']=CrimeReport['Location_Type'].astype('str')
Loc_Type['Location_Type']=CrimeReport['Location_Type']
Loc_Type['Location_Type_Code']=lb_make.fit_transform(Loc_Type['Location_Type'])
Loc_Type=Loc_Type.drop_duplicates().sort_values(by=['Location_Type']).reset_index(drop=True)
Loc_Type.head()

CrimeReport['Occur_Date']=lb_make.fit_transform(CrimeReport['Occur_Date'])
CrimeReport['UCR_Code']=lb_make.fit_transform(CrimeReport['UCR_Code'])
CrimeReport['IBR_Code']=lb_make.fit_transform(CrimeReport['IBR_Code'])
CrimeReport['Location_Type']=lb_make.fit_transform(CrimeReport['Location_Type'])
CrimeReport['Coordinates1']=list(zip(np.round(CrimeReport.Latitude,1),np.round(CrimeReport.Longitude,1)))
CrimeReport['Coordinates1']=CrimeReport['Coordinates1'].astype('str')
CrimeReport['Coordinates1']=lb_make.fit_transform(CrimeReport['Coordinates1'])
CrimeReport['Coordinates3']=list(zip(np.round(CrimeReport.Latitude,3),np.round(CrimeReport.Longitude,3)))
CrimeReport['Coordinates3']=CrimeReport['Coordinates3'].astype('str')
CrimeReport['Coordinates3']=lb_make.fit_transform(CrimeReport['Coordinates3'])

Unnamed: 0,Occur_Date,Date_Category
0,0,0
1,1,1
2,2,2
3,3,3
4,4,4


array(['(33.7, -84.5)', '(33.8, -84.4)', '(33.8, -84.5)', '(33.8, -84.3)',
       '(33.7, -84.4)', '(33.9, -84.4)', '(33.7, -84.3)', '(33.6, -84.4)',
       '(33.9, -84.3)', '(33.7, -84.6)', '(33.9, -84.5)', '(16.6, -41.8)',
       '(16.7, -41.9)', '(33.5, -84.2)'], dtype=object)

Unnamed: 0,Latitude,Longitude,Coordinates1,Coord1_Category,Coordinates3,Coord3_Category
0,16.59605,-41.84585,"(16.6, -41.8)",0,"(16.596, -41.846)",0
1,16.62054,-41.82593,"(16.6, -41.8)",0,"(16.621, -41.826)",1
2,16.65279,-41.87132,"(16.7, -41.9)",1,"(16.653, -41.871)",2
3,33.47,-84.23517,"(33.5, -84.2)",2,"(33.47, -84.235)",3
4,33.6375,-84.44768,"(33.6, -84.4)",3,"(33.638, -84.448)",4


Unnamed: 0,UCR_Literal,UCR_Code,UCR_Code_Category
0,AGG ASSAULT,30,30
1,AGG ASSAULT,31,31
2,AGG ASSAULT,32,32
3,AGG ASSAULT,33,33
4,AUTO THEFT,49,49


Unnamed: 0,IBR_Code,IBR_Category
0,0,0
1,1,1
2,10,2
3,11,3
4,12,4


Unnamed: 0,Location_Type,Location_Type_Code
0,0,0
1,1,1
2,10,2
3,11,3
4,12,4
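The coordinate binning and label encoding used above can be shown on a few toy points. Rounding to 1 decimal place groups locations into cells of roughly 11 km in latitude, and rounding to 3 decimal places into cells of roughly 110 m. The helper name `coordinate_bin` and the sample points are invented for this sketch.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def coordinate_bin(df, decimals):
    """Label each row with its rounded (Latitude, Longitude) pair as text."""
    lat = df["Latitude"].round(decimals).astype(str)
    lon = df["Longitude"].round(decimals).astype(str)
    return "(" + lat + ", " + lon + ")"

# Toy points near Atlanta, invented for illustration
points = pd.DataFrame({
    "Latitude": [33.7567, 33.7012, 33.7690],
    "Longitude": [-84.4073, -84.5210, -84.3921],
})
points["Coordinates1"] = coordinate_bin(points, 1)

encoder = LabelEncoder()
points["Coord1_Category"] = encoder.fit_transform(points["Coordinates1"])
# classes_ is sorted, so position i is the text behind integer code i
lookup = dict(enumerate(encoder.classes_))
print(points)
print(lookup)
```

Keeping a lookup such as `lookup` makes it possible to translate encoded categories back into coordinates after modelling. Note that the encoder sorts text values, which is why string codes such as '10' follow '1' in the IBR and Location_Type tables above.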


In [14]:
CrimeReport.head().transpose()

Unnamed: 0,0,1,2,3,4
Report_Number,90010930,90011083,90011208,90011218,90011289
Report_Date,01-01-2009,01-01-2009,01-01-2009,01-01-2009,01-01-2009
Occur_Date,0,0,0,0,0
Occur_Time,1145,1330,1500,1450,1600
Possible_Date,01-01-2009,01-01-2009,01-01-2009,01-01-2009,01-01-2009
Possible_Time,1148,1330,1520,1510,1700
Beat,408,506,413,204,408
Apartment_Office_Prefix,,,,,
Apartment_Number,,,,,
Location,2841 GREENBRIAR PKWY,12 BROAD ST SW,3500 MARTIN L KING JR DR SW,3393 PEACHTREE RD NE,2841 GREENBRIAR PKWY SW


In [None]:
# Number of reports in each one-decimal coordinate bin
df=CrimeReport.groupby('Coordinates1')['Report_Number'].count().reset_index(name='Count')
df.to_csv('Location_vs_Count.csv',index=False)

In [None]:
#CrimeReport=CrimeReport[((CrimeReport.Lunar_Day.between(25,30)) | (CrimeReport.Lunar_Day.between(1,5)))]
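The commented-out filter above would keep only the days closest to a Full Moon day (Lunar Days 25 to 30 and 1 to 5). A minimal sketch of that selection on toy data, with the hypothetical helper name `near_full_moon`:

```python
import pandas as pd

def near_full_moon(lunar_day):
    """True for Lunar Days 25-30 (just before a full moon) and 1-5 (on or just after it)."""
    return lunar_day.between(25, 30) | lunar_day.between(1, 5)

# Toy Lunar_Day values for illustration
toy = pd.DataFrame({"Lunar_Day": [1, 5, 6, 15, 24, 25, 30]})
subset = toy[near_full_moon(toy["Lunar_Day"])]
print(subset)
```

`Series.between` includes both end points by default, so days 1, 5, 25 and 30 are all kept.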

In [None]:
group_cols=['Crime_Year','Crime_Month','Coordinates1','Coordinates3','Lunar_Day']
CrimeReport4=CrimeReport.groupby(group_cols)['Report_Number'].count().reset_index(name='Crime_Count')
# ascending must hold booleans; the strings 'True' and 'False' are both truthy
CrimeReport4=CrimeReport4.sort_values(by=['Lunar_Day','Crime_Count','Coordinates1','Coordinates3'],ascending=[True,False,True,True]).reset_index(drop=True)
CrimeReport4.head()

In [None]:
CrimeReport41=CrimeReport4[CrimeReport4.Lunar_Day==1]
CrimeReport42=CrimeReport4[(CrimeReport4.Lunar_Day==14)|(CrimeReport4.Lunar_Day==15)]
CrimeReport43=CrimeReport4[(CrimeReport4.Lunar_Day!=1)&(CrimeReport4.Lunar_Day!=14)&(CrimeReport4.Lunar_Day!=15)]

In [None]:
x=CrimeReport4.drop(columns=['Crime_Count'])
y=CrimeReport4.Crime_Count
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)
x_train.reset_index(inplace=True,drop=True)
y_train.reset_index(inplace=True,drop=True)
x_test.reset_index(inplace=True,drop=True)
y_test.reset_index(inplace=True,drop=True)
np.sort(y.unique())

In [None]:
from mlxtend.regressor import StackingRegressor
DTR_Model=DecisionTreeRegressor(random_state=0)
stregr = StackingRegressor(regressors=[DTR_Model, DTR_Model, DTR_Model], meta_regressor=DTR_Model)
stregr.fit(x_train,y_train)
y_pred=stregr.predict(x_test)
y_pred=np.round(y_pred,0)
# The public metric functions live directly in sklearn.metrics
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))
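The same seven metrics are printed after every model in this notebook. One way to avoid repeating the block is a small helper; the sketch below uses the hypothetical name `regression_report` and toy values, and it skips the log-based error when a prediction is negative, since that metric is undefined there.

```python
import numpy as np
import sklearn.metrics as met

def regression_report(y_true, y_pred):
    """Collect the regression metrics printed in this notebook into one dictionary."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = met.mean_squared_error(y_true, y_pred)
    report = {
        "Explained Variance Score": met.explained_variance_score(y_true, y_pred),
        "Mean Absolute Error": met.mean_absolute_error(y_true, y_pred),
        "Mean Squared Error": mse,
        "Root Mean Squared Error": np.sqrt(mse),
        "Median Absolute Error": met.median_absolute_error(y_true, y_pred),
        "R2 Score": met.r2_score(y_true, y_pred),
    }
    # The log-based error is undefined for negative values, which a linear model can predict
    if (y_true >= 0).all() and (y_pred >= 0).all():
        report["Mean Squared Log Error"] = met.mean_squared_log_error(y_true, y_pred)
    return {name: round(float(value), 3) for name, value in report.items()}

# Toy values for illustration only
report = regression_report([1, 2, 3, 4], [1, 2, 3, 5])
for name, value in report.items():
    print(name + ":", value)
```

With such a helper, each model cell would reduce to fitting, predicting, and one call such as `regression_report(y_test, y_pred)`.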

In [None]:
#CrimeReport4.to_csv('../Dataset/CrimeReport4.csv')

In [None]:
DTR_Model=DecisionTreeRegressor(max_depth=18,random_state=0)
DTR_Model.fit(x_train,y_train)
y_pred=DTR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))
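The `max_depth` values in these cells are fixed by hand, while `GridSearchCV` is imported at the top of the notebook but not used. A possible way to choose the depth by cross-validation is sketched below; the synthetic data stands in for `x_train` and `y_train`, and the candidate depths are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for x_train / y_train: a step-shaped target plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = np.floor(X[:, 0]) + rng.normal(0, 0.1, size=300)

search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 5, 10, 15, 18]},  # candidate depths, chosen arbitrarily
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X, y)
print("Best max_depth:", search.best_params_["max_depth"])
print("Cross-validated MAE:", round(-search.best_score_, 3))
```

The scorer is negated so that larger is better, which is why the sign is flipped before printing.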

In [None]:
RFReg_Model=RandomForestRegressor(n_estimators=10,max_depth=18,random_state=0)
RFReg_Model.fit(x_train,y_train)
y_pred=RFReg_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
group_cols=['Coordinates1','Coordinates3','IBR_Code','Lunar_Day']
CrimeReport5=CrimeReport.groupby(group_cols)['Report_Number'].count().reset_index(name='Crime_Count')
# ascending must hold booleans; the strings 'True' and 'False' are both truthy
CrimeReport5=CrimeReport5.sort_values(by=['Lunar_Day','Crime_Count','Coordinates1','Coordinates3'],ascending=[True,False,True,True]).reset_index(drop=True)
CrimeReport5.head()

In [None]:
x=CrimeReport5.drop(columns=['Crime_Count'])
y=CrimeReport5.Crime_Count
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)
x_train.reset_index(inplace=True,drop=True)
y_train.reset_index(inplace=True,drop=True)
x_test.reset_index(inplace=True,drop=True)
y_test.reset_index(inplace=True,drop=True)
np.sort(y.unique())

In [None]:
DTR_Model=DecisionTreeRegressor(max_depth=15,random_state=0)
DTR_Model.fit(x_train,y_train)
y_pred=DTR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
RFReg_Model=RandomForestRegressor(n_estimators=5,max_depth=15,random_state=0)
RFReg_Model.fit(x_train,y_train)
y_pred=RFReg_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
CrimeReport2=pd.DataFrame(data=CrimeReport.groupby(by=['Crime_Year','Occur_Date','Coordinates1','Coordinates3','Lunar_Day']).count()['Report_Number'].values,
                   index=CrimeReport.groupby(by=['Crime_Year','Occur_Date','Coordinates1','Coordinates3','Lunar_Day']).count()['Report_Number'].index)
CrimeReport2=CrimeReport2.reset_index()
CrimeReport2['Crime_Count']=CrimeReport2.loc[:,0]
CrimeReport2=CrimeReport2.drop(columns=0)
# 483450 is the 2009 population shown in Population_Atlanta; it is applied to every year here
CrimeReport2['Crime_Rate']=(CrimeReport2.Crime_Count*100000)/483450
CrimeReport2=CrimeReport2.sort_values(by=['Occur_Date','Crime_Rate'],ascending=[True,False]).reset_index(drop=True)
CrimeReport2.head()

In [None]:
#CrimeReport2.plot.scatter(x='Lunar_Day',y='Crime_Count',c='Coordinates1')

In [None]:
#sns.pairplot(data=CrimeReport2,hue='Crime_Count',size=5)

In [None]:
CrimeReport2.sort_values(by='Crime_Count',ascending=False).head()

In [None]:
x=CrimeReport2.drop(columns=['Crime_Count','Crime_Rate'])
y=CrimeReport2.Crime_Count
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)
x_train.reset_index(inplace=True,drop=True)
y_train.reset_index(inplace=True,drop=True)
x_test.reset_index(inplace=True,drop=True)
y_test.reset_index(inplace=True,drop=True)

In [None]:
DTR_Model=DecisionTreeRegressor(max_depth=2,random_state=0)
DTR_Model.fit(x_train,y_train)
y_pred=DTR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
LR_Model=LinearRegression()
LR_Model.fit(x_train,y_train)
LR_Model.score(x_test,y_test)
y_pred=LR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
RFReg_Model=RandomForestRegressor(n_estimators=int(x_train.Occur_Date.count()/100),max_depth=2,random_state=0)
RFReg_Model.fit(x_train,y_train)
y_pred=RFReg_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
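# Halts "Run All" at this point (a bare `break` outside a loop only does so through a SyntaxError)
raise RuntimeError('Stop here before the K-Fold cells below')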

In [None]:
x=CrimeReport2.drop(columns=['Crime_Count','Crime_Rate'])
y=CrimeReport2.Crime_Count
x.reset_index(inplace=True,drop=True)
y.reset_index(inplace=True,drop=True)

In [None]:
%%time
LRModel=LinearRegression()
nsplits=int(x_train.Occur_Date.count()/100)
# Collect one row per fold and build the data frame once at the end;
# DataFrame.append is not available in recent pandas versions.
fold_rows=[]
for i in range(0,10):
    KF=KFold(n_splits=nsplits,shuffle=True,random_state=i)
    for train,test in KF.split(x,y):
        x_train,x_test=x.iloc[train,:],x.iloc[test,:]
        y_train,y_test=y.iloc[train],y.iloc[test]
        LRModel=LRModel.fit(x_train,y_train)
        test_y_pred=np.round(LRModel.predict(x_test))
        train_y_pred=np.round(LRModel.predict(x_train))
        fold_rows.append({'random_state':i,'train':train,'test':test})
kfold_summary=pd.DataFrame(fold_rows)

In [None]:
print("No.of splits:",KF.get_n_splits(x,y))
kfold_summary.columns=['random_state','train','test']

#kfold_summary=kfold_summary.sort_values(by=['test_score','train_score'],ascending=False).reset_index(drop=True)

train1=np.array(kfold_summary.head(1).train.tolist()).flatten()
test1=np.array(kfold_summary.head(1).test.tolist()).flatten()

x_train,x_test=x.iloc[train1,:],x.iloc[test1,:]
y_train,y_test=y.iloc[train1],y.iloc[test1]
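The K-Fold loop above fits a model on every fold but does not record a score per fold (the sort on `test_score` is commented out). One possible way to obtain per-fold scores is scikit-learn's `cross_val_score`; the sketch below uses synthetic data as a stand-in for `x` and `y`, and the number of splits is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for x and y: an exactly linear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

folds = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=folds,
                         scoring="neg_mean_absolute_error")
mae_per_fold = -scores  # the scorer returns negated errors so that larger is better
print("MAE per fold:", np.round(mae_per_fold, 3))
print("Mean MAE:", round(mae_per_fold.mean(), 3))
```

Averaging the per-fold errors gives a steadier estimate than the single train/test split taken from the first fold.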

In [None]:
LRModel=LinearRegression()
LRModel.fit(x_train,y_train)
y_pred=LRModel.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
DTR_Model=DecisionTreeRegressor(max_depth=2,random_state=0)
DTR_Model.fit(x_train,y_train)
y_pred=DTR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
RFReg_Model.fit(x_train,y_train)
y_pred=RFReg_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
group_cols=['Crime_Year','Occur_Date','Beat','UCR_Code','IBR_Code','Coordinates1','Coordinates3','Lunar_Day']
CrimeReport3=CrimeReport.groupby(group_cols)['Report_Number'].count().reset_index(name='Crime_Count')
# ascending must hold booleans; the strings 'False' and 'True' are both truthy
CrimeReport3=CrimeReport3.sort_values(by=['Crime_Count','Crime_Year','Occur_Date','Coordinates1','Coordinates3'],ascending=[False,True,True,True,True]).reset_index(drop=True)
CrimeReport3.head()

In [None]:
Crime_Rate_Yr=pd.DataFrame(data=CrimeReport3.groupby(by=['Crime_Year']).sum()['Crime_Count']).reset_index()
Crime_Rate_Yr=Crime_Rate_Yr.merge(right=Population_Atlanta,left_on=Crime_Rate_Yr.Crime_Year,right_on=Population_Atlanta.Year).drop(columns=['key_0','Year'])
Crime_Rate_Yr['Crime_Rate']=((Crime_Rate_Yr.Crime_Count*100000)/Crime_Rate_Yr.Population)
Crime_Rate_Yr
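The crime rate computed above is the number of crimes per 100,000 residents: the crime count multiplied by 100,000 and divided by that year's population. A small worked sketch follows; the two populations are the 2009 and 2010 values shown in the Population_Atlanta output, while the crime counts are hypothetical.

```python
import pandas as pd

# Yearly populations as shown in the Population_Atlanta output of this notebook
population = pd.DataFrame({"Year": [2009, 2010], "Population": [483450, 422849]})
# Hypothetical yearly crime counts, for illustration only
counts = pd.DataFrame({"Crime_Year": [2009, 2010], "Crime_Count": [48345, 42000]})

# Match each year's count with that year's population, then scale to 100,000 residents
rates = counts.merge(population, left_on="Crime_Year", right_on="Year").drop(columns="Year")
rates["Crime_Rate"] = rates["Crime_Count"] * 100000 / rates["Population"]
print(rates)
```

Merging on column names rather than on Series objects avoids the extra `key_0` column that then has to be dropped.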

In [None]:
Crime_Rate_Dt=pd.DataFrame(data=CrimeReport3.groupby(by=['Crime_Year','Occur_Date','Lunar_Day']).sum()['Crime_Count']).reset_index()
Crime_Rate_Dt=Crime_Rate_Dt.merge(right=Population_Atlanta,left_on=Crime_Rate_Dt.Crime_Year,right_on=Population_Atlanta.Year).drop(columns=['key_0','Year'])
Crime_Rate_Dt['Crime_Rate']=((Crime_Rate_Dt.Crime_Count*100000)/Crime_Rate_Dt.Population)
Crime_Rate_Dt.head()

In [None]:
x=CrimeReport3.drop(columns=['Crime_Count'])
y=CrimeReport3.Crime_Count
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)
x_train.reset_index(inplace=True,drop=True)
y_train.reset_index(inplace=True,drop=True)
x_test.reset_index(inplace=True,drop=True)
y_test.reset_index(inplace=True,drop=True)
np.sort(y.unique())

In [None]:
DTR_Model=DecisionTreeRegressor(max_depth=5,random_state=0)
DTR_Model.fit(x_train,y_train)
y_pred=DTR_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))

In [None]:
RFReg_Model=RandomForestRegressor(n_estimators=10,max_depth=5,random_state=0)
RFReg_Model.fit(x_train,y_train)
y_pred=RFReg_Model.predict(x_test)
y_pred=np.round(y_pred,0)
print('Explained Variance Score:',round(met.explained_variance_score(y_test,y_pred),3))
print('Mean Absolute Error:',round(met.mean_absolute_error(y_test,y_pred),3))
print('Mean Squared Error:',round(met.mean_squared_error(y_test,y_pred),3))
print('Root Mean Squared Error:',round(np.sqrt(met.mean_squared_error(y_test,y_pred)),3))
print('Mean Squared Log Error:',round(met.mean_squared_log_error(y_test,y_pred),3))
print('Median Absolute Error:',round(met.median_absolute_error(y_test,y_pred),3))
print('R2 Score:',round(met.r2_score(y_test,y_pred),3))