# Capstone Project: The Battle of Neighborhoods (Week 1)

For this week, you are required to submit the following: 

_Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem._

This submission will eventually become your **'Introduction/Business Problem'** section in your final report. 

_It is recommended that you push the report (having your Introduction/Business Problem section only for now) to your Github repository and submit a link to it._

1. A description of the problem and a discussion of the background. (15 marks)
2. A description of the data and how it will be used to solve the problem. (15 marks)

# 1 Introduction

This section describes the problem and discusses its background.

### 1.1 Background Information
Almost everyone needs to move at some point in their lifetime, be it to a new neighborhood, a new city or even a new continent. When you move to a new city, several factors may influence where you settle and where you are likely to set up a business. Besides the price of housing, access to facilities and proximity to important resources, one very important factor is safety. Even though different people put emphasis on different things, safety mostly comes first. Therefore most people moving to a new city have a common problem: how to find a neighborhood to settle in that is safe and secure? In this report we will attempt to answer this question using data science.

### 1.2 Problem statement: Where to settle and start a business in Vancouver, Canada
In this report, we try to find the safest borough in Vancouver for Alex, one where he is most likely to feel comfortable, safe and secure. Alex is 30 years old and has just moved to Vancouver. He has heard good stories about living in Vancouver, but he recently came across an article that reported increased burglaries and crime rates in the city. Alex ran a grocery store in Toronto, where he lived before moving, and intends to open a similar store in his new city.

The aim of this project is to find a safe and secure location for opening commercial establishments in Vancouver, Canada. Specifically, this report is targeted at stakeholders interested in opening a business in Vancouver.

The first task is to choose the safest borough by analysing crime data. The second is to shortlist a neighbourhood in that borough where grocery stores are not amongst the most common venues, and which is as close to the city as possible.

We will use data science tools to identify the safest borough, then explore its neighborhoods and the 10 most common venues in each one, so that we can select the best neighborhood in which grocery stores are not amongst the most common venues.

### 1.3 Target audience for this report
This report analyses the boroughs of Vancouver to establish the safest area in which to open commercial establishments, specifically grocery stores. The information gathered from Foursquare, combined with data science methods, forms a good basis for data-driven decisions about which boroughs best fit the specific needs at hand. Newcomers to Vancouver could use a similar approach to find the right home for their own business in the city.

# 2 Data 

Based on the definition of our problem, the factors that will influence our decision are as follows:

- Finding the safest borough in Vancouver based on crime statistics
- Finding the most common venues
- Choosing the right neighbourhood within the borough

We will be using the geographical coordinates of Vancouver to plot neighbourhoods in a borough that is safe and in the city's vicinity, and finally cluster our neighborhoods and present our findings.

The following data sources will be needed to extract or generate the required information:

1. _Part 1_: Using a real-world dataset from Kaggle containing Vancouver crimes from 2003 to 2019. The dataset consists of crime records for each neighbourhood in Vancouver, along with the type of crime and the recorded year, month and hour.

2. _Part 2_: Gathering the list of officially categorized boroughs in Vancouver from Wikipedia. The borough information will be used to assign each neighbourhood in the existing data to the right borough.

3. _Part 3_: Creating a new consolidated dataset of the neighbourhoods, along with their boroughs, crime data and coordinates. The coordinates will be fetched using OpenCage Geocoder. This dataset will be used to find the safest borough, to explore its neighbourhoods by plotting them on maps using Folium, and to perform exploratory data analysis.

4. _Part 4_: Creating a new consolidated dataset of the neighbourhoods, their boroughs, their coordinates and their most common venues. The venue data will be fetched using the Foursquare API. This dataset will be used to explore the neighbourhood venues, to apply a machine learning algorithm that clusters the neighbourhoods, and to present the findings by plotting them on maps using Folium.
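The borough assignment and ranking described in Parts 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the report's actual pipeline: the crime rows are toy data, the names `crime_df` and `borough_map` are made up for the example, and the borough labels are hypothetical placeholders rather than the Wikipedia categories.

```python
import pandas as pd

# Toy crime records. The neighbourhood names appear in the dataset,
# but these rows are illustrative only.
crime_df = pd.DataFrame({
    "NEIGHBOURHOOD": ["West End", "West End", "West End", "Kitsilano", "Sunset"],
})

# Hypothetical neighbourhood-to-borough mapping (placeholder labels,
# NOT the real borough assignment from Wikipedia).
borough_map = {
    "West End": "Borough A",
    "Kitsilano": "Borough B",
    "Sunset": "Borough B",
}

# Assign each record to a borough, then count records per borough.
crime_df["BOROUGH"] = crime_df["NEIGHBOURHOOD"].map(borough_map)
borough_counts = crime_df.groupby("BOROUGH").size().sort_values()

# The borough with the fewest recorded crimes is the candidate "safest" one.
safest_borough = borough_counts.idxmin()
print(borough_counts)
print("Safest borough (by raw count):", safest_borough)
```

Ranking by raw counts is the simplest possible choice here; it only reflects the number of recorded incidents in each borough.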

### Part 1

#### Using a real-world dataset from Kaggle containing the Vancouver crimes from 2003 to 2019

Properties of the Vancouver Crime Report:

- TYPE - Crime type
- YEAR - Recorded year
- MONTH - Recorded month
- DAY - Recorded day
- HOUR - Recorded hour
- MINUTE - Recorded minute
- HUNDRED_BLOCK - Recorded block
- NEIGHBOURHOOD - Recorded neighborhood
- X - GPS longitude
- Y - GPS latitude

Data set URL: https://www.kaggle.com/agilesifaka/vancouver-crime-report/version/2

### Importing all the necessary libraries

In [2]:
import numpy as np
import pandas as pd

#Command to install OpenCage Geocoder for fetching Lat and Lng of Neighborhood
!pip install opencage

#Importing OpenCage Geocoder
from opencage.geocoder import OpenCageGeocode

# use the inline backend to generate the plots within the browser
%matplotlib inline 

#Importing Matplot lib and associated packages to perform Data Visualisation and Exploratory Data Analysis
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot') # optional: for ggplot-like style

# check for latest version of Matplotlib
print ('Matplotlib version: ', mpl.__version__) # >= 2.0.0

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Importing folium to visualise Maps and plot based on Lat and Lng
import folium

#Requests to request web pages by making get requests to FourSquare REST Client
import requests

#To normalise data returned by FourSquare API
#(pandas >= 1.0 exposes json_normalize at the top level; pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

#Importing KMeans from SciKit library to Classify neighborhoods into clusters
from sklearn.cluster import KMeans

print('Libraries imported')

Matplotlib version:  3.3.2
Libraries imported


### Reading from the Dataset
Since the full dataset is very large (about 600,000 rows), processing all of it was not practical for this project. Instead, we consider only the Vancouver crime report for 2018.
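One way a single-year subset like this could be extracted from a large CSV without loading every row at once is chunked reading. The sketch below is an assumption about how that might be done, not the method actually used to build the 2018 file: it reads a small in-memory stand-in for the full file (`csv_text`, with illustrative rows) in chunks and keeps only the 2018 records.

```python
import io
import pandas as pd

# In-memory stand-in for the full CSV (illustrative rows only).
# In practice this would be the path to the downloaded Kaggle file.
csv_text = """TYPE,YEAR,MONTH,DAY,HOUR,NEIGHBOURHOOD
Break and Enter Commercial,2017,3,2,6,West End
Break and Enter Commercial,2018,6,16,18,West End
Break and Enter Commercial,2003,12,12,0,West End
Break and Enter Commercial,2018,4,9,6,Central Business District
"""

# Read the file a few rows at a time; each chunk is an ordinary DataFrame.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Keep only the 2018 rows from each chunk, then stitch the pieces together.
crime_2018 = pd.concat(
    [chunk[chunk["YEAR"] == 2018] for chunk in chunks],
    ignore_index=True,
)
print(crime_2018)
```

With a real file, a much larger `chunksize` (tens of thousands of rows) would be more typical; the value 2 is only there so the toy data spans more than one chunk.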

In [3]:
vnc_crime_df = pd.read_csv('https://raw.githubusercontent.com/RamanujaSVL/Coursera_Capstone/master/vancouver_crime_records_2018.csv', index_col=None)

#Dropping columns that are not needed. X and Y hold the coordinates, but that data seems to be corrupt
vnc_crime_df.drop(['Unnamed: 0','MINUTE', 'HUNDRED_BLOCK', 'X', 'Y'], axis = 1, inplace = True)

#vnc_crime_df.columns

vnc_crime_df.head()

| | TYPE | YEAR | MONTH | DAY | HOUR | NEIGHBOURHOOD |
|---|---|---|---|---|---|---|
| 0 | Break and Enter Commercial | 2018 | 3 | 2 | 6 | West End |
| 1 | Break and Enter Commercial | 2018 | 6 | 16 | 18 | West End |
| 2 | Break and Enter Commercial | 2018 | 12 | 12 | 0 | West End |
| 3 | Break and Enter Commercial | 2018 | 4 | 9 | 6 | Central Business District |
| 4 | Break and Enter Commercial | 2018 | 10 | 2 | 18 | Central Business District |


#### Total Crime count in different Neighborhoods in Vancouver

In [5]:
vnc_crime_df['NEIGHBOURHOOD'].value_counts()

Central Business District    10857
West End                      3031
Mount Pleasant                2396
Strathcona                    1987
Kitsilano                     1802
Fairview                      1795
Renfrew-Collingwood           1762
Grandview-Woodland            1761
Kensington-Cedar Cottage      1391
Hastings-Sunrise              1270
Sunset                         967
Riley Park                     866
Marpole                        828
Victoria-Fraserview            600
Killarney                      565
Oakridge                       499
Dunbar-Southlands              474
Kerrisdale                     417
Shaughnessy                    414
West Point Grey                372
Arbutus Ridge                  311
South Cambie                   292
Stanley Park                   154
Musqueam                        17
Name: NEIGHBOURHOOD, dtype: int64
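For later merging with borough and coordinate data, counts like these are often easier to handle as a two-column table than as a Series. A minimal sketch, assuming a column name `CRIME_COUNT` chosen for illustration and using a few toy rows in place of `vnc_crime_df`:

```python
import pandas as pd

# Toy stand-in for vnc_crime_df['NEIGHBOURHOOD'] (illustrative rows only).
neighbourhoods = pd.Series(
    [
        "West End",
        "West End",
        "West End",
        "Central Business District",
        "Central Business District",
    ],
    name="NEIGHBOURHOOD",
)

# value_counts() returns a Series indexed by neighbourhood, sorted in
# descending order. Naming the index and resetting it gives a tidy
# DataFrame with one row per neighbourhood.
crime_counts = (
    neighbourhoods.value_counts()
    .rename_axis("NEIGHBOURHOOD")
    .reset_index(name="CRIME_COUNT")
)
print(crime_counts)
```

The resulting frame can then be joined to other per-neighbourhood tables on the `NEIGHBOURHOOD` column.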