# Capstone Project - The Battle of Neighborhoods

## Week 1 

For this week, you will be required to submit the following:

1. A description of the problem and a discussion of the background. (15 marks)
2. A description of the data and how it will be used to solve the problem. (15 marks)

# Section 1: Introduction

## Description of the Problem

### Background

There are numerous travel sites on the Internet, FourSquare being one of them, that provide information about restaurants, bars, nightclubs, and places to get breakfast and a good cup of coffee in the morning. The problem with these sites is that they usually describe only one aspect of a venue. A venue may be the most popular place for a night out, but that does not mean a tourist or someone new to the city should visit it without more information. The area surrounding the venue may have high rates of crime such as robbery, drug activity, or assault, and these risks may vary with the timing of the planned visit. The idea of this project is to couple venue information with readily available crime data, so that prospective customers can make an informed and safe decision and have an enjoyable experience.

### Project Concept

The concept for this project is to provide visitors to the Orlando area with venue options, based on a FourSquare API query and accompanied by crime data, so that they can choose places where they feel comfortable and safe, whether travelling with their families or as a single adult.

The project will follow the approach outlined below:

1. The traveller decides on a city (in this case, Orlando, FL).
2. The FourSquare API is queried for the top venues in Orlando.
3. The list of venues is supplied with geographical data.
4. Historical crimes within a given distance of each venue are presented.
5. A map is produced showing the selected venues and the crime statistics in the area.
6. The probability of a crime occurring near the selected top sites is also presented.
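Steps 4 and 6 both depend on measuring how far each crime lies from a venue. The following is a minimal sketch, assuming venues and crimes are available as plain (latitude, longitude) pairs in degrees and using the haversine great-circle formula with a mean Earth radius of 6,371 km. The function names and the 2 km default radius (borrowed from the FourSquare query radius used in the data section) are illustrative assumptions, not part of the original notebook.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed spherical Earth)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def crimes_within(venue: Tuple[float, float],
                  crimes: Iterable[Tuple[float, float]],
                  radius_m: float = 2000) -> int:
    """Count crime locations lying within radius_m metres of the venue (hypothetical helper)."""
    lat, lon = venue
    return sum(1 for c_lat, c_lon in crimes
               if haversine_m(lat, lon, c_lat, c_lon) <= radius_m)


# Hypothetical points: one crime at the venue itself, one about 111 km to the north
venue = (28.543852, -81.373176)
crimes = [(28.543852, -81.373176), (29.543852, -81.373176)]
print(crimes_within(venue, crimes))  # 1
```

Haversine is used here only because it is simple and accurate enough at city scale; a library such as geopy offers comparable distance functions.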


### Target Audience

The target audience of this project is the tens of thousands of visitors to the Orlando, FL area. The weather and the many attractions, including Disney World and Universal Studios, which consistently rank among the top attractions in the US, bring many visitors every year who are unfamiliar with the local area. Alongside these attractions there are thousands of restaurants and nightclubs for visitors to enjoy, but they are not always located in areas that are safe for families or single visitors. These visitors include the elderly and single women, who tend to let their "guard down" when visiting the area.

### Import libraries to read data

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

### Setup credentials for call of the FourSquare API

In [2]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (never publish a real one)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (keep it private)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID


### Retrieve Orlando, FL latitude and longitude

In [3]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

address = 'Orlando, FL'

geolocator = Nominatim(user_agent="orlando_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Orlando, FL are {}, {}.'.format(latitude, longitude))

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

The geographical coordinates of Orlando, FL are 28.5421109, -81.3790304.


# Section 2: Data

### FourSquare Data

Set up the URL and parameters for the FourSquare API call.

In [12]:
LIMIT = 100 # limit the number of venues returned from Foursquare
radius = 2000 # search radius in metres around the Orlando coordinates
sortByPopularity = 1 # ask the explore endpoint to sort venues by popularity
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&sortByPopularity={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT,
    sortByPopularity)

results = requests.get(url).json() # send the GET request and parse the JSON response into a dict
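Building the query string with `.format()` works, but it does not URL-encode the values. A minimal alternative sketch uses the standard library's `urllib.parse.urlencode` to assemble the same explore URL from a dictionary. The helper name, the placeholder credentials, and the reuse of the coordinates printed by the geocoding step are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=2000, limit=100, sort_by_popularity=1):
    """Return the explore URL with every query parameter URL-encoded (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude as one parameter
        'radius': radius,
        'limit': limit,
        'sortByPopularity': sort_by_popularity,
    }
    return BASE_URL + '?' + urlencode(params)


# Placeholder credentials; coordinates taken from the geocoding output
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                        28.5421109, -81.3790304)
print(url)
```

Passing the same dictionary as `requests.get(BASE_URL, params=params)` would let `requests` perform the encoding instead of building the string by hand.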

In [14]:
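def get_category_type(row):
    # return the name of the venue's primary category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        # after json_normalize the column is named 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']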

The top **100** most popular venues within a 2 km radius of central Orlando, FL, as ranked by the FourSquare API, are extracted and flattened into a table.

In [15]:
venues = results['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues) # flatten JSON (replaces the deprecated pandas.io.json import)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# keep only the primary category name for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns: strip the 'venue.' and 'venue.location.' prefixes
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Lake Eola Park | Park | 28.543852 | -81.373176 |
| 1 | SunRail Station at Lynx Central Station | Train Station | 28.548547 | -81.380897 |
| 2 | Publix | Grocery Store | 28.541963 | -81.372978 |
| 3 | Amway Center | Basketball Stadium | 28.539210 | -81.383798 |
| 4 | Dr. Phillips Center for the Performing Arts | Theater | 28.537622 | -81.377202 |
| ... | ... | ... | ... | ... |
| 95 | Royal Tea | Tea Room | 28.553753 | -81.364569 |
| 96 | Chillers | Bar | 28.540427 | -81.379681 |
| 97 | Texas Fried Chicken | Fried Chicken Joint | 28.537945 | -81.397543 |
| 98 | Cuban-american Cafe | Cuban Restaurant | 28.542527 | -81.381526 |
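To make the flattening step concrete: each item in `results['response']['groups'][0]['items']` nests the venue under a `venue` key, which is why `json_normalize` produces column names such as `venue.location.lat`. The sketch below performs the same extraction with plain dictionaries. The sample item is a hand-made stand-in containing only the fields the notebook selects (not a real API response), and `flatten_venue` is a hypothetical helper.

```python
def flatten_venue(item):
    """Extract name, primary category, lat and lng from one explore item (hypothetical helper)."""
    venue = item['venue']
    categories = venue.get('categories', [])
    # mirror get_category_type: first category's name, or None if there is none
    category = categories[0]['name'] if categories else None
    location = venue.get('location', {})
    return {
        'name': venue.get('name'),
        'categories': category,
        'lat': location.get('lat'),
        'lng': location.get('lng'),
    }


# Hypothetical item shaped like the fields selected in the notebook
sample_item = {
    'venue': {
        'name': 'Lake Eola Park',
        'categories': [{'name': 'Park'}],
        'location': {'lat': 28.543852, 'lng': -81.373176},
    }
}
print(flatten_venue(sample_item))
# {'name': 'Lake Eola Park', 'categories': 'Park', 'lat': 28.543852, 'lng': -81.373176}
```

Applied to every item, `pd.DataFrame([flatten_venue(i) for i in venues])` should yield the same four columns as the table above.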


### Description of the FourSquare data

In [16]:
nearby_venues.categories.describe()

count                     100
unique                     56
top       American Restaurant
freq                        8
Name: categories, dtype: object

### Orlando Crime Data

#### Read the OPD_Crimes.csv file downloaded from the City of Orlando open data website

In [18]:
Orlando_crime = pd.read_csv("OPD_Crimes.csv")
Orlando_crime.head()

| | Case Number | Case Date Time | Case Location | Case Offense Location Type | Case Offense Category | Case Offense Type | Case Offense Charge Type | Case Disposition | Status | Location |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-00000700 | 01/01/2020 03:44:00 PM | 9400 Block of JEFF FUQUA BLVD | Airport | Theft | All other larceny | Committed | Open | Unmapped | |
| 1 | 2019-00002818 | 01/03/2019 08:22:00 AM | 4900 Block of FIJI CIR | Residence/Single | Theft | All other larceny | Committed | Closed | Mapped | (28.60235426, -81.43691172) |
| 2 | 2020-00004890 | 01/05/2020 09:48:00 AM | 9300 Block of JEFF FUQUA BLVD | Airport | Theft | All other larceny | Committed | Closed | Unmapped | |
| 3 | 2020-00007368 | 01/07/2020 01:04:00 PM | 4700 Block of N PINE HILLS RD | Apartment/Condo | Robbery | Robbery | Committed | Arrest | Mapped | (28.60018728, -81.45147832) |
| 4 | 2020-00008073 | 01/08/2020 01:18:00 AM | 9400 Block of JEFF FUQUA BLVD | Airport | Theft | All other larceny | Committed | Closed | Unmapped | |


In [27]:
Orlando_crime.columns = Orlando_crime.columns.str.replace(' ', '') # remove spaces, e.g. 'Case Number' -> 'CaseNumber'

### Description of Orlando Crime data

In [37]:
Orlando_crime.describe()

| | CaseNumber | CaseDateTime | CaseLocation | CaseOffenseLocationType | CaseOffenseCategory | CaseOffenseType | CaseOffenseChargeType | CaseDisposition | Status | Location |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 230312 | 230310 | 230312 | 230312 | 230312 | 230312 | 230312 | 230312 | 230312 | 221022 |
| unique | 230311 | 224480 | 14645 | 79 | 13 | 25 | 3 | 5 | 3 | 17765 |
| top | CaseNumber | 04/25/2018 09:16:00 AM | 4900 Block of INTERNATIONAL DR | Apartment/Condo | Theft | All other larceny | Committed | Closed | Mapped | (28.43180352, -81.30852827) |
| freq | 2 | 5 | 4552 | 40447 | 116334 | 59069 | 217391 | 163651 | 221022 | 4736 |
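The `describe()` output hints at a data-quality issue: the most frequent `CaseNumber` value is the literal string `CaseNumber` (frequency 2), and the offense-category counts below also list `CaseOffenseCategory` twice, which suggests the CSV contains two repeated header rows. A minimal sketch for dropping such rows, assuming the space-stripped column names produced above; `drop_repeated_headers` is a hypothetical helper and the sample frame is made up.

```python
import pandas as pd


def drop_repeated_headers(df: pd.DataFrame, key: str = 'CaseNumber') -> pd.DataFrame:
    """Drop rows whose value in `key` equals the column name itself (stray header lines)."""
    return df[df[key] != key].reset_index(drop=True)


# Tiny hypothetical frame containing one stray header row
sample = pd.DataFrame({
    'CaseNumber': ['2020-00000700', 'CaseNumber', '2019-00002818'],
    'CaseOffenseCategory': ['Theft', 'CaseOffenseCategory', 'Theft'],
})
clean = drop_repeated_headers(sample)
print(len(clean))  # 2
```

In the notebook this would be applied as `Orlando_crime = drop_repeated_headers(Orlando_crime)` before computing any frequencies.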


### Frequency of criminal offenses - Orlando, FL

In [38]:
Orlando_crime['CaseOffenseCategory'].value_counts()

Theft                  116334
Burglary                29594
Assault                 25713
Narcotics               24529
Fraud                   15200
Vehicle Theft           11811
Robbery                  6458
Arson                     290
Homicide                  197
Kidnapping                123
Embezzlement               58
Bribery                     3
CaseOffenseCategory         2
Name: CaseOffenseCategory, dtype: int64
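Raw counts are easier to compare as shares of all recorded offenses. The sketch below works from the counts printed above, leaving out the two `CaseOffenseCategory` header-artifact rows; the dictionary is copied by hand from that output rather than computed from the CSV. Within the notebook, `Orlando_crime['CaseOffenseCategory'].value_counts(normalize=True)` would give proportions directly, although it would still include the artifact rows unless they are dropped first.

```python
# Offense counts copied from the value_counts() output above
# (the two 'CaseOffenseCategory' header-artifact rows are left out)
offense_counts = {
    'Theft': 116334, 'Burglary': 29594, 'Assault': 25713, 'Narcotics': 24529,
    'Fraud': 15200, 'Vehicle Theft': 11811, 'Robbery': 6458, 'Arson': 290,
    'Homicide': 197, 'Kidnapping': 123, 'Embezzlement': 58, 'Bribery': 3,
}

total = sum(offense_counts.values())
shares = {name: count / total for name, count in offense_counts.items()}

print(total)  # 230310
# Three most common categories with their share of all recorded offenses
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print('{:<10} {:.1%}'.format(name, share))
```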

### Frequency of criminal offenses at top locations

These locations will be converted to latitude/longitude for mapping and clustering activities.

In [39]:
Orlando_crime['CaseLocation'].value_counts()

4900 Block of INTERNATIONAL DR         4552
2500 Block of S KIRKMAN RD             2190
4200 Block of CONROY RD                1876
5900 Block of S GOLDENROD RD           1787
1000 Block of UNIVERSAL STUDIOS PLZ    1766
                                       ... 
CONROY RD / SOUTHGATE DR                  1
100 Block of W NEW HAMPSHIRE ST           1
RIO GRANDE / 24TH ST                      1
C R SMITH ST/S JOHN YOUNG PKWY            1
3500 Block of CHELSEA ST                  1
Name: CaseLocation, Length: 14645, dtype: int64
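Before geocoding any block addresses, note that the `Location` column already carries coordinates for rows whose `Status` is `Mapped`, stored as a string such as `"(28.60235426, -81.43691172)"`, while unmapped rows have no value there. A minimal sketch that parses those strings into a (latitude, longitude) pair, assuming that exact `(lat, lon)` format; `parse_location` is a hypothetical helper, and unmapped rows would still need a geocoder such as the Nominatim one used earlier.

```python
import math
import re
from typing import Optional, Tuple

# Matches strings like "(28.60235426, -81.43691172)"
_LOCATION_RE = re.compile(r'^\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)$')


def parse_location(value) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from a Location string, or None if missing or unparseable."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # pandas reads empty Location cells (unmapped rows) as NaN
    match = _LOCATION_RE.match(str(value).strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_location('(28.60235426, -81.43691172)'))  # (28.60235426, -81.43691172)
print(parse_location(float('nan')))  # None
```

In the notebook, `Orlando_crime['Location'].apply(parse_location)` would produce one coordinate pair (or `None`) per case.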