# Capstone Project - The Battle of the Neighborhoods (Week 2)
## Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

As of 2016, Detroit had the fourth-highest murder rate among major cities in the United States, after St. Louis, Baltimore and New Orleans, and the 42nd-highest murder rate in the world. The crime rate has decreased over the years, but the city is still burdened by economic downturns and high unemployment. This analysis will help local government agencies as well as tourists to identify geographical areas of interest. Government agencies will be able to make informed and focused decisions and reach the desired outcomes more efficiently. Tourists will be aware of the areas that should be avoided for safe travel and stay. Entrepreneurs can also understand the demographics of various areas around the city to make better investments for growth and profit.

There are hundreds, maybe even thousands, of travel sites on the Internet, including FourSquare, that will tell you all about places to go, things to see, restaurants to eat at, bars to drink in, nightclubs to party the night away in, and where to go in the morning for breakfast and a strong coffee. The problem with these sites is that they are one-dimensional. If you want to find out all this information about a city you plan to visit next month, you have to do the hard work yourself. Also, just because a venue is the hottest place to go for a night out does not mean that the unwitting tourist should ramble in unprepared. The areas surrounding the venue might be riddled with crime, including muggings, car theft and assault. Approach the venue from any direction other than the north, for example, and you could be putting your life in danger. This is where my idea comes in.

Imagine the following scenario:

Where to eat at a particular time of the day?

Where to invest for promising business growth in a new city?

Which areas are crime-prone and require better supervision and monitoring?

### What do you do ... ?

#### Project Idea

My idea for the Capstone Project is to show that, when driven by venue and location data from FourSquare and backed up with open source crime data, it is possible to present the cautious and nervous traveller with a list of attractions to visit, supplemented with graphics showing the occurrence of crime in the region of each venue.

A high level approach is as follows:

- The traveller decides on a city location (in this case Detroit)
- The FourSquare website is scraped for the top venues in the city
- This list of top venues is augmented with additional geographical data
- Using this additional geographical data, the top nearby restaurants are selected
- The historical crime within a predetermined distance of each venue is obtained
- A map is presented to the traveller showing the selected venues and the crime statistics of the area
- The future probability of a crime happening near or around the selected top sites is also presented to the user
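
The "predetermined distance" step above can be sketched with a great-circle distance filter. This is only an illustration under stated assumptions: the haversine formula is one common choice and is not necessarily the method used later in this project, the function names `haversine_m` and `crimes_near` are hypothetical, and the incident coordinates are made-up points rather than real crime records.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def crimes_near(venue, incidents, max_distance_m):
    """Keep the incidents that lie within max_distance_m of the venue."""
    return [
        point for point in incidents
        if haversine_m(venue[0], venue[1], point[0], point[1]) <= max_distance_m
    ]


# Illustrative venue location and two made-up incident locations.
venue = (42.331575, -83.046598)
incidents = [(42.3320, -83.0470), (42.4000, -83.2000)]
nearby = crimes_near(venue, incidents, max_distance_m=500)
print(len(nearby), "incident(s) within 500 m")
```

The radius is left as a parameter because the right cut-off depends on how far a visitor is expected to walk around a venue.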

#### Beneficiaries

This solution is targeted at people who need to make informed decisions. They want to see all the main sites of a city that they have never visited before but, at the same time, for whatever reason, they want to do all they can to stay clear of trouble, i.e. is it safe to visit this venue and this restaurant at 4:00 pm?

Some examples of envisioned users include:

1. Government Agencies
2. Traveller
3. Entrepreneur

#### There are many data science aspects of this project, including:

- Data Acquisition
- Data Cleansing
- Data Analysis
- Machine Learning

## Data <a name="data"></a>

In this section, I will describe the data used to solve the problem as described previously.

As noted below in the Further Development Section, it is possible to attempt quite complex and sophisticated scenarios when approaching this problem. However, given the size of the project and for simplicity only the following scenario will be addressed:

1. Query the FourSquare website for the top sites in Detroit

2. Use the FourSquare API to get supplemental geographical data about the top sites

3. Use the FourSquare API to get the top restaurant recommendations closest to each of the top sites

4. Use open source Detroit Crime data to provide the user with additional crime data

### Data Acquisition
The first phase of the project is to acquire all of the data that is needed for this project. The initial data required can be broken down into three separate data sets:

- The FourSquare Top 30 Venues to Visit in Detroit
- For each of the Top Sites, a list of restaurants in the surrounding area
- The Detroit Police Department Crime Data for 2018

### Import Libraries

In this section we import the libraries that will be required to process the data.

**Pandas**.
Pandas is an open source, BSD-licensed library, providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Pandas will be used to house each of the data sets.

**Requests**. Requests is a Python HTTP library, released under the Apache 2.0 License. The goal of the project is to make HTTP requests simpler and more human-friendly.

**BeautifulSoup**. Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.

**json**. Library to handle JSON files.

**Nominatim**. Geocoder from the geopy package that converts an address into latitude and longitude values.

In [62]:
# Import Pandas to provide DataFrame support
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Import Requests to send GET requests and examine the results
import requests

# Import BeautifulSoup
from bs4 import BeautifulSoup

# Import json to handle JSON data
import json

# Install geopy if it is not already available (comment this line out once it is installed)
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

Solving environment: done

# All requested packages already installed.



### FourSquare Top 30 Venues to Visit in Detroit

FourSquare does not actually provide an API that returns a list of the top venues to visit in a city. To get this list we can, however, use the FourSquare website directly to request the top sites in Detroit and then use BeautifulSoup to scrape the data we need. Once we have this starting data, the supplemental data needed to complete the dataset can be retrieved using the FourSquare Venue API.

#### Define Foursquare Credentials and Version

In [63]:
CLIENT_ID = 'Y5PGXOKZOSY5YPEH2DL3JY1DMADMGC5IYUOL2002DOQ2IAP3' # your Foursquare ID
CLIENT_SECRET = 'E3HPKJDZJMLJWV43GM5NSO32VPMQQBFMWHWC13EENQZ4FFQB' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 50
radius = 50000
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: Y5PGXOKZOSY5YPEH2DL3JY1DMADMGC5IYUOL2002DOQ2IAP3
CLIENT_SECRET:E3HPKJDZJMLJWV43GM5NSO32VPMQQBFMWHWC13EENQZ4FFQB
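
Hard-coding and printing API secrets makes them visible to anyone who reads the notebook. One alternative, sketched below, is to read them from environment variables and only ever print a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are assumptions made for this illustration and are not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment-style variables."""
    env = os.environ if env is None else env
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")  # hypothetical names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing credential variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


def mask(secret, visible=4):
    """Show only the first few characters so a secret can be printed safely."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with an in-memory mapping; real use would rely on os.environ.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```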


In [64]:
address = 'Detroit'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)


42.3315509 -83.0466403


In [65]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=Y5PGXOKZOSY5YPEH2DL3JY1DMADMGC5IYUOL2002DOQ2IAP3&client_secret=E3HPKJDZJMLJWV43GM5NSO32VPMQQBFMWHWC13EENQZ4FFQB&ll=42.3315509,-83.0466403&v=20180604&radius=50000&limit=50'
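
As an alternative to formatting the query string by hand, the same URL can be assembled from a dictionary of parameters, which keeps each parameter name next to its value. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper, and the `"ID"` and `"SECRET"` strings are placeholders rather than real credentials.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Assemble the venues/explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    # safe="," leaves the comma in the ll parameter unescaped.
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


explore_url = build_explore_url(
    "ID", "SECRET", 42.3315509, -83.0466403, "20180604", 50000, 50
)
print(explore_url)
```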

In [66]:
results = requests.get(url).json()

In [67]:
results

{'meta': {'code': 200, 'requestId': '5d0730bbd9a6e60033f809c5'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-456fd59ff964a520333e1fe3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d163941735',
         'name': 'Park',
         'pluralName': 'Parks',
         'primary': True,
         'shortName': 'Park'}],
       'id': '456fd59ff964a520333e1fe3',
       'location': {'address': 'Woodward Ave',
        'cc': 'US',
        'city': 'Detroit',
        'country': 'United States',
        'crossStreet': 'at Michigan Ave',
        'distance': 4,
        'formattedAddress': ['Woodward Ave (at Michigan Ave)',
         'Detroit, MI 48226',
         'United States'],
        'labeledLatLngs': [{'
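
Before indexing into a response like the one above, it can help to confirm that the request succeeded. The following is a minimal sketch that assumes the `meta` / `response` / `groups` / `items` layout shown in the output; `extract_items` is a hypothetical helper and the sample payload is a cut-down stand-in, not a real API response.

```python
def extract_items(payload):
    """Return the venue items of the first group, or raise if the call failed."""
    meta = payload.get("meta", {})
    if meta.get("code") != 200:
        raise ValueError(f"FourSquare request failed: {meta}")
    groups = payload.get("response", {}).get("groups", [])
    return groups[0]["items"] if groups else []


# Cut-down payload with the same nesting as the response shown above.
sample_payload = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [{"venue": {"name": "Campus Martius"}}]}]},
}
demo_items = extract_items(sample_payload)
print(len(demo_items), "item(s) returned")
```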

From this HTML the following data will be extracted:

- Venue Name
- Venue Score
- Venue Category
- Venue HREF
- Venue ID (Extracted from the HREF)

### Create Top Venues Dataframe

The top_venues list, a sample of which is shown above, only contains some of the data required. In addition to the attributes extracted directly from the HTML code the following attributes are also required:

- Venue Address
- Venue Postalcode
- Venue City
- Venue Latitude
- Venue Longitude

These attributes will be obtained directly from FourSquare using the venues API. The process is as follows:

1. Create a new empty Pandas dataframe to hold the data for the Top Sites / Venues
2. Extract the available attributes from the HTML code
3. For each venue
    - Construct a URL to interrogate the FourSquare Venue API for each top site
    - Using the venues API and the URL, request the data from FourSquare
    - Get the properly formatted address and the latitude and longitude data from the returned JSON
    - Write the data for each venue to the top venues dataframe
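
The "get the address and coordinates, then write one row per venue" part of this process can be sketched as a small function that turns one venue record into a flat dictionary. It assumes the `venue` / `location` / `categories` nesting that FourSquare returns for each item; `flatten_venue` is a hypothetical helper, and the sample item is abbreviated.

```python
def flatten_venue(item):
    """Map one FourSquare item to a flat record; absent fields become None."""
    venue = item["venue"]
    location = venue.get("location", {})
    categories = venue.get("categories", [])
    return {
        "id": venue["id"],
        "name": venue["name"],
        "category": categories[0]["name"] if categories else None,
        "address": location.get("address"),
        "postalCode": location.get("postalCode"),
        "city": location.get("city"),
        "lat": location.get("lat"),
        "lng": location.get("lng"),
    }


# Abbreviated item in the shape returned by the API.
sample_item = {
    "venue": {
        "id": "456fd59ff964a520333e1fe3",
        "name": "Campus Martius",
        "categories": [{"name": "Park"}],
        "location": {
            "address": "Woodward Ave",
            "postalCode": "48226",
            "city": "Detroit",
            "lat": 42.33157500935305,
            "lng": -83.04659843444824,
        },
    }
}
rows = [flatten_venue(sample_item)]
print(rows[0]["name"], rows[0]["category"])
```

A list of such dictionaries can be passed straight to `pd.DataFrame(rows)` to build the top venues dataframe.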

In [68]:
items = results['response']['groups'][0]['items']
items[0]

{'reasons': {'count': 0,
  'items': [{'reasonName': 'globalInteractionReason',
    'summary': 'This spot is popular',
    'type': 'general'}]},
 'referralId': 'e-0-456fd59ff964a520333e1fe3-0',
 'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
     'suffix': '.png'},
    'id': '4bf58dd8d48988d163941735',
    'name': 'Park',
    'pluralName': 'Parks',
    'primary': True,
    'shortName': 'Park'}],
  'id': '456fd59ff964a520333e1fe3',
  'location': {'address': 'Woodward Ave',
   'cc': 'US',
   'city': 'Detroit',
   'country': 'United States',
   'crossStreet': 'at Michigan Ave',
   'distance': 4,
   'formattedAddress': ['Woodward Ave (at Michigan Ave)',
    'Detroit, MI 48226',
    'United States'],
   'labeledLatLngs': [{'label': 'display',
     'lat': 42.33157500935305,
     'lng': -83.04659843444824}],
   'lat': 42.33157500935305,
   'lng': -83.04659843444824,
   'postalCode': '48226',
   'state': 'MI'},
  'name': 'Campus Marti

#### Process JSON and convert it to a clean dataframe

In [69]:
# transform the JSON response into a pandas DataFrame
# (pandas >= 1.0; older releases expose this as pandas.io.json.json_normalize)
from pandas import json_normalize

In [70]:
dataframe = json_normalize(items) # flatten JSON
dataframe.head()

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.delivery.id,venue.delivery.provider.icon.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.name,venue.delivery.url,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,venue.location.distance,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.neighborhood,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups,venue.venuePage.id
0,0,"[{'reasonName': 'globalInteractionReason', 'ty...",e-0-456fd59ff964a520333e1fe3-0,"[{'primary': True, 'icon': {'suffix': '.png', ...",,,,,,,456fd59ff964a520333e1fe3,Woodward Ave,US,Detroit,United States,at Michigan Ave,4,"[Woodward Ave (at Michigan Ave), Detroit, MI 4...","[{'lat': 42.33157500935305, 'label': 'display'...",42.331575,-83.046598,,48226,MI,Campus Martius,0,[],
1,0,"[{'reasonName': 'globalInteractionReason', 'ty...",e-0-4a64d346f964a520c6c61fe3-1,"[{'primary': True, 'icon': {'suffix': '.png', ...",,,,,,,4a64d346f964a520c6c61fe3,Atwater St,US,Detroit,United States,at Civic Center Dr,600,"[Atwater St (at Civic Center Dr), Detroit, MI ...","[{'lat': 42.32649077625808, 'label': 'display'...",42.326491,-83.044109,,48226,MI,Detroit RiverWalk,0,[],
2,0,"[{'reasonName': 'globalInteractionReason', 'ty...",e-0-53b98b96498ef894b65e877c-2,"[{'primary': True, 'icon': {'suffix': '.png', ...",609876.0,/delivery_provider_grubhub_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",grubhub,https://www.grubhub.com/restaurant/wright--com...,53b98b96498ef894b65e877c,1500 Woodward Ave Fl 2,US,Detroit,United States,at John R St,439,"[1500 Woodward Ave Fl 2 (at John R St), Detroi...","[{'lat': 42.33500043335515, 'label': 'display'...",42.335,-83.04923,,48226,MI,Wright & Co.,0,[],
3,0,"[{'reasonName': 'globalInteractionReason', 'ty...",e-0-58ef8f318f2c1a27684b29cb-3,"[{'primary': True, 'icon': {'suffix': '.png', ...",,,,,,,58ef8f318f2c1a27684b29cb,1049 Woodward,US,Detroit,United States,,167,"[1049 Woodward, Detroit, MI 48226, United States]","[{'lat': 42.33283408579149, 'label': 'display'...",42.332834,-83.047694,,48226,MI,Avalon Cafe and Bakery,0,[],
4,0,"[{'reasonName': 'globalInteractionReason', 'ty...",e-0-4ae4de7df964a520f59e21e3-4,"[{'primary': True, 'icon': {'suffix': '.png', ...",,,,,,,4ae4de7df964a520f59e21e3,2211 Woodward Ave,US,Detroit,United States,at Montcalm St.,905,"[2211 Woodward Ave (at Montcalm St.), Detroit,...","[{'lat': 42.33857952985405, 'label': 'display'...",42.33858,-83.052181,,48201,MI,Fox Theatre,0,[],33279924.0
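
The dotted column names in this output, such as `venue.location.lat`, come from flattening nested dictionaries. The hand-rolled function below mimics that naming for a single record to show where the names come from. It is an illustration only, not pandas' actual implementation, and `flatten` is a hypothetical helper.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys; lists are kept whole."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {
    "referralId": "e-0-456fd59ff964a520333e1fe3-0",
    "venue": {
        "name": "Campus Martius",
        "location": {"lat": 42.33157500935305, "lng": -83.04659843444824},
        "categories": [{"name": "Park"}],
    },
}
flat_record = flatten(nested)
print(sorted(flat_record))
```

This also shows why `venue.categories` still holds a list after flattening, which is the reason a separate step is needed to pull out the category name.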


In [71]:
# function that extracts the category of the venue
def get_category_type(row):
    # fall back to the flattened 'venue.categories' column when 'categories' is absent
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# filter columns; .copy() avoids a SettingWithCopyWarning on the assignment below
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(10)

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Campus Martius,Park,Woodward Ave,US,Detroit,United States,at Michigan Ave,4,"[Woodward Ave (at Michigan Ave), Detroit, MI 4...","[{'lat': 42.33157500935305, 'label': 'display'...",42.331575,-83.046598,,48226.0,MI,456fd59ff964a520333e1fe3
1,Detroit RiverWalk,Waterfront,Atwater St,US,Detroit,United States,at Civic Center Dr,600,"[Atwater St (at Civic Center Dr), Detroit, MI ...","[{'lat': 42.32649077625808, 'label': 'display'...",42.326491,-83.044109,,48226.0,MI,4a64d346f964a520c6c61fe3
2,Wright & Co.,Cocktail Bar,1500 Woodward Ave Fl 2,US,Detroit,United States,at John R St,439,"[1500 Woodward Ave Fl 2 (at John R St), Detroi...","[{'lat': 42.33500043335515, 'label': 'display'...",42.335,-83.04923,,48226.0,MI,53b98b96498ef894b65e877c
3,Avalon Cafe and Bakery,Café,1049 Woodward,US,Detroit,United States,,167,"[1049 Woodward, Detroit, MI 48226, United States]","[{'lat': 42.33283408579149, 'label': 'display'...",42.332834,-83.047694,,48226.0,MI,58ef8f318f2c1a27684b29cb
4,Fox Theatre,Theater,2211 Woodward Ave,US,Detroit,United States,at Montcalm St.,905,"[2211 Woodward Ave (at Montcalm St.), Detroit,...","[{'lat': 42.33857952985405, 'label': 'display'...",42.33858,-83.052181,,48201.0,MI,4ae4de7df964a520f59e21e3
5,Detroit Athletic Club (DAC),Gym,241 Madison St,US,Detroit,United States,,639,"[241 Madison St, Detroit, MI 48226, United Sta...","[{'lat': 42.33728538599592, 'label': 'display'...",42.337285,-83.047067,,48226.0,MI,4b5da7a7f964a520546529e3
6,The Belt,Art Gallery,,US,Detroit,United States,,318,"[Detroit, MI, United States]","[{'lat': 42.33438623492856, 'label': 'display'...",42.334386,-83.046087,,,MI,55bd50ef498ed9ea3ebec47b
7,Comerica Park,Baseball Stadium,2100 Woodward Ave,US,Detroit,United States,,880,"[2100 Woodward Ave, Detroit, MI 48201, United ...",,42.339241,-83.04914,,48201.0,MI,4b15507bf964a520a7b023e3
8,The Fillmore Detroit,Concert Hall,2115 Woodward Ave,US,Detroit,United States,at W Elizabeth St.,813,"[2115 Woodward Ave (at W Elizabeth St.), Detro...","[{'lat': 42.33752629478861, 'label': 'display'...",42.337526,-83.052324,,48201.0,MI,4aac7b54f964a520195e20e3
9,Windsor Riverwalk,Park,Ouellette Ave.,CA,Windsor,Canada,,1385,"[Ouellette Ave., Windsor ON, Canada]","[{'lat': 42.31958952170523, 'label': 'display'...",42.31959,-83.041985,,,ON,4b96d534f964a52096e734e3


In [72]:
# Verify the shape of the top venues dataframe
dataframe_filtered.shape

(50, 16)

In [73]:
# Verify the dtypes of the top venues dataframe
dataframe_filtered.dtypes

name                 object
categories           object
address              object
cc                   object
city                 object
country              object
crossStreet          object
distance              int64
formattedAddress     object
labeledLatLngs       object
lat                 float64
lng                 float64
neighborhood         object
postalCode           object
state                object
id                   object
dtype: object

#### Removing the irrelevant columns

In [75]:
# drop the columns that are not needed; errors='ignore' keeps the cell safe to re-run
dataframe_filtered.drop(['country','crossStreet','labeledLatLngs','formattedAddress','cc'], axis=1, inplace=True, errors='ignore')

In [81]:
dataframe_filtered.head()

Unnamed: 0,name,categories,address,city,distance,lat,lng,neighborhood,postalCode,state,id
0,Campus Martius,Park,Woodward Ave,Detroit,4,42.331575,-83.046598,,48226,MI,456fd59ff964a520333e1fe3
1,Detroit RiverWalk,Waterfront,Atwater St,Detroit,600,42.326491,-83.044109,,48226,MI,4a64d346f964a520c6c61fe3
2,Wright & Co.,Cocktail Bar,1500 Woodward Ave Fl 2,Detroit,439,42.335,-83.04923,,48226,MI,53b98b96498ef894b65e877c
3,Avalon Cafe and Bakery,Café,1049 Woodward,Detroit,167,42.332834,-83.047694,,48226,MI,58ef8f318f2c1a27684b29cb
4,Fox Theatre,Theater,2211 Woodward Ave,Detroit,905,42.33858,-83.052181,,48201,MI,4ae4de7df964a520f59e21e3


## Import and process the Detroit Crime DataSet

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Detroit in the last year, minus the most recent seven days. Data is extracted from the Detroit Police Department's open data platform. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified.

### Import the 2018 DataSet

The full dataset contains crime data from 2016 to the present. The following clean-up is required: not all of the columns are needed, so only these columns are kept:
- Crime ID
- Census Block GEOID 
- Incident Date & Time
- Neighborhood
- Latitude
- Longitude
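
The column selection above, combined with the restriction to 2018, could look like the sketch below. The real header names of the Detroit file are not shown in this notebook, so every column name here (`crime_id`, `block_id`, `incident_timestamp`, and so on) is a hypothetical stand-in, the timestamp is assumed to be in ISO format, and the two sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical header names standing in for the six columns listed above.
KEPT_COLUMNS = [
    "crime_id", "block_id", "incident_timestamp",
    "neighborhood", "latitude", "longitude",
]


def load_crimes(handle, year=2018):
    """Read a crime CSV, keep only KEPT_COLUMNS and rows from the given year."""
    rows = []
    for record in csv.DictReader(handle):
        when = datetime.fromisoformat(record["incident_timestamp"])
        if when.year != year:
            continue
        rows.append({column: record[column] for column in KEPT_COLUMNS})
    return rows


# Made-up sample with one extra column that should be dropped.
sample_csv = io.StringIO(
    "crime_id,block_id,incident_timestamp,neighborhood,latitude,longitude,offense\n"
    "1,261635172001,2017-12-31T23:10:00,Downtown,42.3316,-83.0466,LARCENY\n"
    "2,261635172002,2018-03-14T16:00:00,Midtown,42.3500,-83.0600,ASSAULT\n"
)
crimes_2018 = load_crimes(sample_csv)
print(len(crimes_2018), "row(s) kept")
```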

In [250]:
pd.read_csv(r'DPD__All_Crime_Incidents__December_6__2016_-_Present.csv')

FileNotFoundError: File b'DPD__All_Crime_Incidents__December_6__2016_-_Present.csv' does not exist

In [244]:
import os


In [245]:
pwd


'/home/dsxuser/work'
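
The `FileNotFoundError` above means the CSV is not in the current working directory. A small check like the one below can make that failure easier to diagnose by listing the CSV files that are actually present. It is a sketch only, and `require_file` is a hypothetical helper.

```python
from pathlib import Path


def require_file(name, directory=None):
    """Return the path to `name` inside `directory`, or raise with a helpful message."""
    directory = Path.cwd() if directory is None else Path(directory)
    path = directory / name
    if not path.is_file():
        available = sorted(p.name for p in directory.glob("*.csv"))
        raise FileNotFoundError(
            f"{name} not found in {directory}; CSV files present: {available}"
        )
    return path


try:
    require_file("DPD__All_Crime_Incidents__December_6__2016_-_Present.csv")
except FileNotFoundError as error:
    print(error)
```

Once the check passes, the returned path can be handed to `pd.read_csv`.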