# Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods (Week 1)
Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve.

## 1) Introduction/Business Problem
Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

__The idea of this study is to help people plan their journeys in San Francisco. It will help them choose routes that pass through areas where crime is low.__


## 2) Data
Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

__To help these travelers, we will use the incident report dataset published by the San Francisco Police Department. It contains data from 2003 to 2018.  
Site: https://datasf.org/__

## Import Libraries

In this section we import the libraries that will be required to process the data.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np

In [2]:
%matplotlib inline 

import matplotlib as mpl
import matplotlib.pyplot as plt

import seaborn as sns

In [3]:
# Use Folium to display the Maps for Visualisation
import folium
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster
from folium import plugins


In [4]:
# Module to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

In [5]:
# All the SciKit Learn Libraries Required
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
from sklearn.model_selection import KFold, cross_val_score


## Import the DataSet

In [15]:
# Data were taken from https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry/data
df = pd.read_csv('Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv')
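The file is large (the notebook later reports more than two million rows and 33 columns), and most columns are dropped during cleanup. As a minimal sketch, assuming the column names shown in this notebook, `usecols` can restrict loading to the columns that are actually needed. The inline text below is a tiny stand-in for the real CSV, used only so the example runs on its own.

```python
import io
import pandas as pd

# Columns kept later in the notebook; loading only these reduces memory use.
WANTED_COLUMNS = ['IncidntNum', 'Category', 'Descript', 'DayOfWeek', 'Date',
                  'Time', 'PdDistrict', 'Address', 'X', 'Y', 'PdId']

# Tiny stand-in for the real file (illustrative only).
# 'Resolution' is an extra column that usecols will skip.
sample_csv = io.StringIO(
    "IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,PdId\n"
    "146196161,NON-CRIMINAL,LOST PROPERTY,Tuesday,09/23/2014,01:00,SOUTHERN,NONE,"
    "800 Block of BRYANT ST,-122.403405,37.775421,14619616171000\n"
    "150045675,ASSAULT,BATTERY,Thursday,01/15/2015,17:00,TARAVAL,NONE,"
    "1800 Block of VICENTE ST,-122.485604,37.738821,15004567504134\n"
)

df_sample = pd.read_csv(sample_csv, usecols=WANTED_COLUMNS)
print(df_sample.shape)
```

With the real file, the same `usecols` argument would be passed to the `pd.read_csv` call above in place of the in-memory sample.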

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2215024 entries, 0 to 2215023
Data columns (total 33 columns):
IncidntNum                                              int64
Category                                                object
Descript                                                object
DayOfWeek                                               object
Date                                                    object
Time                                                    object
PdDistrict                                              object
Resolution                                              object
Address                                                 object
X                                                       float64
Y                                                       float64
Location                                                object
PdId                                                    int64
SF Find Neighborhoods                                   float64
Curr

In [9]:
df.head()

Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,DELETE - Fire Prevention Districts,DELETE - Police Districts,DELETE - Supervisor Districts,DELETE - Zip Codes,DELETE - Neighborhoods,DELETE - 2017 Fix It Zones,Civic Center Harm Reduction Project Boundary,Fix It Zones as of 2017-11-06,DELETE - HSOC Zones,Fix It Zones as of 2018-02-07,"CBD, BID and GBD Boundaries as of 2017","Areas of Vulnerability, 2016",Central Market/Tenderloin Boundary,Central Market/Tenderloin Boundary Polygon - Updated,HSOC Zones as of 2018-06-05,OWED Public Spaces
0,146196161,NON-CRIMINAL,LOST PROPERTY,Tuesday,09/23/2014,01:00,SOUTHERN,NONE,800 Block of BRYANT ST,-122.403405,37.775421,POINT (-122.403404791479 37.775420706711),14619616171000,32.0,1.0,10.0,34.0,14.0,2.0,9.0,28853.0,34.0,,,,,,,2.0,,,,
1,150045675,ASSAULT,BATTERY,Thursday,01/15/2015,17:00,TARAVAL,NONE,1800 Block of VICENTE ST,-122.485604,37.738821,POINT (-122.48560378101 37.7388214326705),15004567504134,40.0,10.0,7.0,35.0,1.0,8.0,3.0,29491.0,35.0,,,,,,,1.0,,,,
2,140632022,SUSPICIOUS OCC,INVESTIGATIVE DETENTION,Wednesday,07/30/2014,09:32,BAYVIEW,NONE,100 Block of GILLETTE AV,-122.396535,37.71066,POINT (-122.396535107224 37.7106603302503),14063202264085,89.0,2.0,9.0,1.0,10.0,3.0,8.0,309.0,1.0,,,,,,,1.0,,,,
3,150383259,ASSAULT,BATTERY,Saturday,05/02/2015,23:10,BAYVIEW,"ARREST, BOOKED",2400 Block of PHELPS ST,-122.400131,37.730093,POINT (-122.400130573297 37.7300925390327),15038325904134,87.0,2.0,9.0,1.0,10.0,3.0,8.0,58.0,1.0,,,,,,,2.0,,,,
4,40753980,OTHER OFFENSES,RECKLESS DRIVING,Friday,07/02/2004,13:43,BAYVIEW,NONE,I-280 / CESAR CHAVEZ ST,-120.5,90.0,POINT (-120.5 90),4075398065020,,,,,,,,,,,,,,,,,,,,


In [11]:
df.shape

(2215024, 33)

## Clean Up and Prepare the Data

In [12]:
df.isnull().sum()

IncidntNum                                                    0
Category                                                      0
Descript                                                      0
DayOfWeek                                                     0
Date                                                          0
Time                                                          0
PdDistrict                                                    1
Resolution                                                    0
Address                                                       0
X                                                             0
Y                                                             0
Location                                                      0
PdId                                                          0
SF Find Neighborhoods                                      6077
Current Police Districts                                   1112
Current Supervisor Districts            
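With 33 columns, the full null count is long to read. The sketch below shows one way to list only the columns that contain missing values, together with their share of all rows. The small `demo` frame and its values are invented for illustration; in the notebook the same steps would apply to `df`.

```python
import numpy as np
import pandas as pd

# Illustrative frame only; the values are invented for the example.
demo = pd.DataFrame({
    'Category': ['ASSAULT', 'NON-CRIMINAL', 'ASSAULT', 'OTHER OFFENSES'],
    'PdDistrict': ['TARAVAL', 'SOUTHERN', None, 'BAYVIEW'],
    'SF Find Neighborhoods': [40.0, 32.0, np.nan, np.nan],
})

missing = demo.isnull().sum()
# Keep only columns with at least one missing value, largest count first.
missing = missing[missing > 0].sort_values(ascending=False)
# Express the counts as a percentage of all rows.
summary = pd.DataFrame({'missing': missing,
                        'percent': 100 * missing / len(demo)})
print(summary)
```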

In [50]:
# Keep only the columns that matter for the analysis
df_import=df[['IncidntNum','Category','Descript','DayOfWeek','Date','Time','PdDistrict','Address','X','Y','PdId']]

In [51]:
df_import.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2215024 entries, 0 to 2215023
Data columns (total 11 columns):
IncidntNum    int64
Category      object
Descript      object
DayOfWeek     object
Date          object
Time          object
PdDistrict    object
Address       object
X             float64
Y             float64
PdId          int64
dtypes: float64(2), int64(2), object(7)
memory usage: 185.9+ MB
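The `+` in the reported memory usage means that the text held in the `object` columns is not fully counted. Columns such as `DayOfWeek` and `PdDistrict` repeat a small set of labels, so converting them to the `category` dtype is one possible way to reduce memory. The sketch below uses invented data to show the effect; it is not a step the notebook itself performs.

```python
import pandas as pd

# Illustrative data only: a few labels repeated many times,
# similar in shape to columns such as DayOfWeek and PdDistrict.
demo = pd.DataFrame({
    'DayOfWeek': ['Tuesday', 'Thursday', 'Wednesday', 'Saturday', 'Friday'] * 2000,
    'PdDistrict': ['SOUTHERN', 'TARAVAL', 'BAYVIEW', 'BAYVIEW', 'BAYVIEW'] * 2000,
})

# deep=True counts the actual string contents, not just the pointers.
before = demo.memory_usage(deep=True).sum()

# Store each distinct label once and keep small integer codes per row.
compact = demo.astype({'DayOfWeek': 'category', 'PdDistrict': 'category'})
after = compact.memory_usage(deep=True).sum()

print(f"{before} bytes -> {after} bytes")
```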


In [52]:
df_import.head(3)

| | IncidntNum | Category | Descript | DayOfWeek | Date | Time | PdDistrict | Address | X | Y | PdId |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 146196161 | NON-CRIMINAL | LOST PROPERTY | Tuesday | 09/23/2014 | 01:00 | SOUTHERN | 800 Block of BRYANT ST | -122.403405 | 37.775421 | 14619616171000 |
| 1 | 150045675 | ASSAULT | BATTERY | Thursday | 01/15/2015 | 17:00 | TARAVAL | 1800 Block of VICENTE ST | -122.485604 | 37.738821 | 15004567504134 |
| 2 | 140632022 | SUSPICIOUS OCC | INVESTIGATIVE DETENTION | Wednesday | 07/30/2014 | 09:32 | BAYVIEW | 100 Block of GILLETTE AV | -122.396535 | 37.71066 | 14063202264085 |


We need to add new columns that will later help us build the models.  
The new columns hold the day, month, and year of each crime:  
1. Day  
2. Month number  
3. Year  
4. Year and month  

In [53]:
df_import['Date'] =  pd.to_datetime(df_import['Date'], format='%m/%d/%Y')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [54]:
df_import['Day'] = df_import['Date'].dt.day
df_import['month'] = df_import['Date'].dt.month
df_import['year'] = df_import['Date'].dt.year
df_import['year_month'] = df_import['Date'].dt.to_period('M')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.

In [55]:
df_import.isna().sum()

IncidntNum    0
Category      0
Descript      0
DayOfWeek     0
Date          0
Time          0
PdDistrict    1
Address       0
X             0
Y             0
PdId          0
Day           0
month         0
year          0
year_month    0
dtype: int64
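The `SettingWithCopyWarning` messages printed by the cells above appear because `df_import` was created by selecting columns from `df`, so pandas cannot tell whether the assignments are meant to affect the original frame. A common remedy is to take an explicit copy when selecting. The sketch below shows this on a small invented frame (`raw` and `subset` are hypothetical names) and derives the same four date columns.

```python
import pandas as pd

# Illustrative rows only, using the same date format as the dataset.
raw = pd.DataFrame({
    'Category': ['NON-CRIMINAL', 'ASSAULT'],
    'Date': ['09/23/2014', '01/15/2015'],
    'Resolution': ['NONE', 'NONE'],
})

# .copy() makes the selection an independent DataFrame, so the
# assignments below are unambiguous and leave `raw` unchanged.
subset = raw[['Category', 'Date']].copy()

subset['Date'] = pd.to_datetime(subset['Date'], format='%m/%d/%Y')
subset['Day'] = subset['Date'].dt.day
subset['month'] = subset['Date'].dt.month
subset['year'] = subset['Date'].dt.year
subset['year_month'] = subset['Date'].dt.to_period('M')
print(subset)
```

In the notebook, the equivalent change would be to append `.copy()` to the column selection that creates `df_import`.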

In [56]:
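# Drop the single row with a missing PdDistrict, then renumber the index.
# (reindex() without arguments has no effect; reset_index does the renumbering.)
df_import.dropna(inplace=True)
df_import.reset_index(drop=True, inplace=True)
df_import.head()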

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Address,X,Y,PdId,Day,month,year,year_month
0,146196161,NON-CRIMINAL,LOST PROPERTY,Tuesday,2014-09-23,01:00,SOUTHERN,800 Block of BRYANT ST,-122.403405,37.775421,14619616171000,23,9,2014,2014-09
1,150045675,ASSAULT,BATTERY,Thursday,2015-01-15,17:00,TARAVAL,1800 Block of VICENTE ST,-122.485604,37.738821,15004567504134,15,1,2015,2015-01
2,140632022,SUSPICIOUS OCC,INVESTIGATIVE DETENTION,Wednesday,2014-07-30,09:32,BAYVIEW,100 Block of GILLETTE AV,-122.396535,37.71066,14063202264085,30,7,2014,2014-07
3,150383259,ASSAULT,BATTERY,Saturday,2015-05-02,23:10,BAYVIEW,2400 Block of PHELPS ST,-122.400131,37.730093,15038325904134,2,5,2015,2015-05
4,40753980,OTHER OFFENSES,RECKLESS DRIVING,Friday,2004-07-02,13:43,BAYVIEW,I-280 / CESAR CHAVEZ ST,-120.5,90.0,4075398065020,2,7,2004,2004-07
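
Row 4 above has `X = -120.5` and `Y = 90.0`. A latitude of 90 degrees cannot be a San Francisco location, so this looks like a placeholder coordinate. As a hedged sketch, such rows could be filtered out with a rough bounding box before any mapping. The bounds below are assumptions chosen for illustration, not values taken from the dataset documentation.

```python
import pandas as pd

# Rows copied from the output above (only the columns needed here).
demo = pd.DataFrame({
    'IncidntNum': [146196161, 150045675, 40753980],
    'X': [-122.403405, -122.485604, -120.5],  # longitude
    'Y': [37.775421, 37.738821, 90.0],        # latitude
})

# Assumed rough bounding box around San Francisco (illustrative values).
LON_MIN, LON_MAX = -123.0, -122.0
LAT_MIN, LAT_MAX = 37.0, 38.0

# True for rows whose coordinates fall inside the box.
in_city = (demo['X'].between(LON_MIN, LON_MAX)
           & demo['Y'].between(LAT_MIN, LAT_MAX))
valid = demo[in_city].reset_index(drop=True)
print(valid)
```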
