# Capstone Project - Washington activities
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for activities in Washington. 

Since there are many activities, such as restaurants, theaters and bars, near the visiting scientist's hotel in Washington, we will try to detect **locations where they can do as many activities as possible, as close to the city center as possible**.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing venues in the neighborhood (restaurants or any other type of activity)
* number of and distance to Italian restaurants in the neighborhood, if any
* distance of the neighborhood from the city center

We decided to use a regularly spaced grid of locations, centered around the city center, to define our neighborhoods.
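A regularly spaced grid of candidate centers can be generated with a local flat-Earth approximation: one degree of latitude is roughly 111.32 km, and one degree of longitude shrinks with the cosine of the latitude. The sketch below is illustrative only; the function name `candidate_grid`, the 400 m radius, the 200 m step and the center coordinates are assumptions, not values taken from the original analysis.

```python
import math

# Approximate metres per degree of latitude (assumption: spherical Earth).
M_PER_DEG_LAT = 111_320.0


def candidate_grid(center_lat, center_lon, radius_m, step_m):
    """Return (lat, lon) centres of a square grid lying within radius_m of the centre."""
    # A degree of longitude gets shorter as we move away from the equator.
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(center_lat))
    n = int(radius_m // step_m)
    points = []
    for i in range(-n, n + 1):          # north/south offsets
        for j in range(-n, n + 1):      # east/west offsets
            dx, dy = j * step_m, i * step_m
            if math.hypot(dx, dy) <= radius_m:
                points.append((center_lat + dy / M_PER_DEG_LAT,
                               center_lon + dx / m_per_deg_lon))
    return points


# Hypothetical centre, roughly in central Washington, DC.
grid = candidate_grid(38.9, -77.04, radius_m=400, step_m=200)
print(len(grid))  # 13
```

A square grid is the simplest choice; a hexagonal grid covers the circle more evenly, at the cost of slightly more bookkeeping.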

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Google Maps API reverse geocoding**
* the number, type and location of activities in every neighborhood will be obtained using the **Foursquare API**
* coordinates of the starting point will be obtained by geocoding a well-known Washington address (the hotel at 1731 New Hampshire Ave NW) with **Nominatim**, through the geopy library

```python
# install third-party packages (notebook shell commands)
!pip install geopy
!pip install folium==0.5.0

import requests  # library to handle HTTP requests
import pandas as pd  # library for data analysis
import numpy as np  # library to handle data in a vectorized manner

from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
from sklearn.cluster import KMeans  # k-means clustering

import folium  # map plotting library

print('Folium installed')
print('Libraries imported.')
```

```
Folium installed
Libraries imported.
```


```python
# Foursquare credentials: replace the placeholders with your own keys and never publish real ones
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN'  # your Foursquare Access Token
VERSION = '20180604'  # API version date
LIMIT = 50  # maximum number of venues returned per query
radius = 400  # search radius in metres

# geocode the hotel address to get the centre of the search
address = '1731 New Hampshire Ave NW, Washington, DC'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

search_query = 'theater'
```
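Rather than pasting keys into the notebook, one option is to read them from environment variables. This is a minimal sketch; the variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_ACCESS_TOKEN` and the helper `load_credentials` are hypothetical and can be renamed freely.

```python
import os

# Hypothetical environment variable names.
CREDENTIAL_VARS = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET', 'FOURSQUARE_ACCESS_TOKEN')


def load_credentials(env=None):
    """Return (client_id, client_secret, access_token) read from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


# Usage in the notebook, after exporting the three variables in the shell:
# CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN = load_credentials()
```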

We will work with this information, focusing on three types of venues: theaters, restaurants and bars nearest to where the scientist is staying.
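Ranking venues by how near they are to the hotel needs a distance between two latitude/longitude points. A common choice is the haversine (great-circle) formula; the sketch below assumes a spherical Earth with a mean radius of 6,371 km, and the venue names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(venues, origin, k=3):
    """Return the k venues closest to origin as (name, distance_m) pairs."""
    lat0, lon0 = origin
    ranked = sorted(
        ((name, haversine_m(lat0, lon0, lat, lon)) for name, lat, lon in venues),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


# Hypothetical hotel and venue coordinates, for illustration only.
hotel = (38.9130, -77.0390)
venues = [
    ("Theater A", 38.9150, -77.0390),
    ("Bar B", 38.9131, -77.0391),
    ("Restaurant C", 38.9100, -77.0420),
]
for name, d in nearest(venues, hotel, k=2):
    print(f"{name}: {d:.0f} m")
```

At city scale the difference between haversine and a flat approximation is negligible, but haversine stays correct at any distance.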

```python
SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'


def search_venues(query):
    """Query the Foursquare v2 venue search endpoint around the hotel."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'oauth_token': ACCESS_TOKEN,
        'v': VERSION,
        'll': '{},{}'.format(latitude, longitude),
        'query': query,
        'radius': radius,
        'limit': LIMIT,
    }
    response = requests.get(SEARCH_URL, params=params)
    response.raise_for_status()  # fail early on HTTP errors (e.g. bad credentials)
    return response.json()


TheaterResults = search_venues('theater')
RestaurantResults = search_venues('Restaurant')
BarResults = search_venues('Bar')
```
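Indexing `['response']['venues']` directly fails with an unhelpful `KeyError` when a request is rejected. A small guard can check the response envelope first. This sketch assumes the Foursquare v2 envelope of the form `{'meta': {'code': ...}, 'response': {'venues': [...]}}`; the helper name `extract_venues` and the sample responses are hypothetical.

```python
def extract_venues(result):
    """Return the venue list from a Foursquare v2 search response, or raise a clear error."""
    meta = result.get('meta', {})
    if meta.get('code') != 200:
        detail = meta.get('errorDetail', 'no error detail')
        raise RuntimeError(f"Foursquare request failed ({meta.get('code')}): {detail}")
    return result.get('response', {}).get('venues', [])


# Hypothetical responses shaped like the assumed v2 envelope.
ok = {'meta': {'code': 200}, 'response': {'venues': [{'name': 'Bar B'}]}}
bad = {'meta': {'code': 401, 'errorDetail': 'Missing access credentials.'}, 'response': {}}

print(extract_venues(ok))  # [{'name': 'Bar B'}]
try:
    extract_venues(bad)
except RuntimeError as err:
    print(err)
```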

```python
venuesA = TheaterResults['response']['venues']
venuesB = RestaurantResults['response']['venues']
venuesC = BarResults['response']['venues']

# transform venues into dataframes and stack them with a fresh 0..n-1 index
dataframeA = pd.json_normalize(venuesA)
dataframeB = pd.json_normalize(venuesB)
dataframeC = pd.json_normalize(venuesC)
dataframe = pd.concat([dataframeA, dataframeB, dataframeC], ignore_index=True)

# keep only the venue name, its categories, every location field and the id
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the (first) category name of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# replace the list of categories in each row with the category name
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only the last term ('location.lat' -> 'lat')
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
```
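To see what the cell above does to the data, here is the same pipeline on two hypothetical venues. `pd.json_normalize` flattens nested dictionaries into dotted columns such as `location.lat`, while lists (like `categories`) stay as single cells; the venues and their fields are made up for illustration.

```python
import pandas as pd

# Two hypothetical venues shaped like Foursquare search results.
venues = [
    {'id': 'a1', 'name': 'Theater A',
     'categories': [{'name': 'Theater'}],
     'location': {'lat': 38.915, 'lng': -77.039, 'distance': 222}},
    {'id': 'b2', 'name': 'Bar B',
     'categories': [],
     'location': {'lat': 38.9131, 'lng': -77.0391, 'distance': 14}},
]

# Nested dicts become 'location.lat', 'location.lng', 'location.distance'.
df = pd.json_normalize(venues)
cols = ['name', 'categories'] + [c for c in df.columns if c.startswith('location.')] + ['id']
df = df.loc[:, cols].copy()

# Keep only the first category name (None when the list is empty).
df['categories'] = df['categories'].apply(lambda cats: cats[0]['name'] if cats else None)

# 'location.lat' -> 'lat', and so on.
df.columns = [c.split('.')[-1] for c in df.columns]
print(df)
```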


```python
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)  # generate map centred around the hotel

# add a red circle marker to represent the hotel
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Carlyle Suite Hotel',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map)

# add every venue (theater, restaurant or bar) as a blue circle marker labelled with its category
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
```

After this initial exploration of the data, we need to determine where the scientist should go to get to know the city.
We will therefore cluster the venues to find the best area to visit, the one that allows visiting the most places in a short time.
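One simple way to score "more places in a short time" is to count, for each venue, how many other venues lie within walking distance of it. The sketch below computes all pairwise haversine distances with NumPy broadcasting; the 100 m radius and the coordinates are assumptions for illustration, and the O(n²) distance matrix is fine for the few hundred venues a search like this returns.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # spherical Earth assumption


def neighbour_counts(lats, lngs, radius_m=100):
    """For each venue, count the other venues within radius_m (haversine distance)."""
    lat = np.radians(np.asarray(lats, dtype=float))
    lng = np.radians(np.asarray(lngs, dtype=float))
    # Pairwise differences via broadcasting: shape (n, n).
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    # Subtract 1 so that a venue does not count itself.
    return (dist <= radius_m).sum(axis=1) - 1


# Hypothetical venues: three close together, one about 1.9 km away.
lats = [38.9130, 38.9135, 38.9132, 38.9300]
lngs = [-77.0390, -77.0390, -77.0392, -77.0390]
counts = neighbour_counts(lats, lngs, radius_m=100)
best = int(np.argmax(counts))
print(counts, 'best starting venue index:', best)
```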

## Methodology <a name="methodology"></a>

In this project we will direct our efforts at detecting areas of Washington, DC that the scientist can visit quickly and safely.

In the first step we collected the required **data: location and type (category) of every theater, restaurant and bar within 400 m of the hotel** (1731 New Hampshire Ave NW). We also **identified the category of each venue** (according to Foursquare categorization).

The second step in our analysis is the calculation and exploration of '**venue density**' across the area around the hotel: we will use **heatmaps** to identify a few promising areas close to the hotel with a high number of venues, and focus our attention on those areas.
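The numbers behind a density heatmap are simply venue counts per cell of a regular grid. folium provides a `HeatMap` plugin (`folium.plugins.HeatMap`) for drawing; the counting itself can be sketched with `np.histogram2d`, as below. The 2 x 2 grid and the coordinates are hypothetical, chosen so the result is easy to check by hand.

```python
import numpy as np


def venue_density(lats, lngs, bins=4):
    """Count venues per cell of a regular latitude/longitude grid."""
    counts, lat_edges, lng_edges = np.histogram2d(lats, lngs, bins=bins)
    return counts, lat_edges, lng_edges


# Hypothetical venues: three in one corner of the area, one in the opposite corner.
lats = [38.910, 38.911, 38.912, 38.930]
lngs = [-77.040, -77.041, -77.040, -77.020]
counts, lat_edges, lng_edges = venue_density(lats, lngs, bins=2)

# Locate the densest cell and report its bounds.
i, j = np.unravel_index(np.argmax(counts), counts.shape)
print('densest cell: lat', lat_edges[i], '-', lat_edges[i + 1],
      'lng', lng_edges[j], '-', lng_edges[j + 1], 'venues:', int(counts[i, j]))
```

The bin size controls the trade-off: small cells show fine detail but become noisy when only a few dozen venues are available.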

In the third and final step we will focus on the most promising areas and, within those, create **clusters of locations that meet some basic requirements**: we will take into consideration venues within a **radius of 400 meters** of the hotel. We will present a map of all such locations and also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses that should be the starting point for a final 'street level' exploration by the scientist.
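k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points, until nothing changes. The sketch below is a bare-bones version of that loop (Lloyd's algorithm) in NumPy, for illustration only; it is not scikit-learn's implementation, which uses k-means++ initialisation and several restarts. Here a simpler deterministic farthest-point initialisation is swapped in, and the coordinates are hypothetical.

```python
import numpy as np


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point farthest from those chosen."""
    centroids = [points[0]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen centroid
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[d2.argmax()])
    return np.array(centroids)


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on an (n, 2) array of coordinates; returns (labels, centroids)."""
    points = np.asarray(points, dtype=float)
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: mean of each cluster (keep the old centroid if a cluster is empty)
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical venue coordinates forming two well-separated groups.
pts = [(38.910, -77.040), (38.911, -77.041), (38.930, -77.020), (38.931, -77.021)]
labels, centroids = kmeans(pts, k=2)
print(labels)
```

Like the scikit-learn call used in the analysis, this treats degrees of latitude and longitude as equal units, which slightly stretches distances east-west at Washington's latitude; for zones a few hundred metres across, the effect is small.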

## Analysis <a name="analysis"></a>

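```python
# average coordinates of the venues in each category (one row per category)
venues_grouped = dataframe_filtered.groupby('categories')[['lat', 'lng']].mean().reset_index()

# features used for clustering (assumed: the average coordinates of each category)
venues_grouped_clustering = venues_grouped[['lat', 'lng']]

# set number of clusters; n_init is set explicitly because scikit-learn's default changed in 1.4
kclusters = 5
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(venues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30]
```

```
array([3, 0, 3, 3, 2, 1, 0, 1, 3, 2, 2, 2, 0, 4, 3, 2, 2, 0, 0, 1],
      dtype=int32)
```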
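Note that `kmeans.labels_` has one entry per row of `venues_grouped`, that is, per category rather than per individual venue. A typical next step is to attach the labels and summarise each cluster. The sketch below does this on a hypothetical four-row table with hand-written labels standing in for `kmeans.labels_`.

```python
import pandas as pd

# Hypothetical per-category table standing in for venues_grouped.
venues_grouped = pd.DataFrame({
    'categories': ['Theater', 'Bar', 'Italian Restaurant', 'Coffee Shop'],
    'lat': [38.915, 38.913, 38.910, 38.931],
    'lng': [-77.039, -77.039, -77.042, -77.021],
})
labels = [0, 0, 0, 1]  # stands in for kmeans.labels_

# Put the cluster label in front, then summarise each cluster.
venues_grouped.insert(0, 'Cluster Labels', labels)
summary = venues_grouped.groupby('Cluster Labels').agg(
    n_categories=('categories', 'count'),
    lat=('lat', 'mean'),
    lng=('lng', 'mean'),
)
print(summary)
```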

```python
from folium import plugins

venues_map_Clusters = folium.Map(location=[latitude, longitude], zoom_start=17)  # generate map centred around the hotel

# (latitude, longitude) pairs of all venues
data = dataframe_filtered[['lat', 'lng']].values

N = len(data)  # one popup per venue
popups = [str(i) for i in range(N)]  # popup texts are simple numbers

# red marker for the hotel
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Carlyle Suite Hotel',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map_Clusters)

# group nearby venues into clusters that split apart as you zoom in
plugins.MarkerCluster(data, popups=popups).add_to(venues_map_Clusters)

venues_map_Clusters
```

Our scientist should head to the corner of Q Street Northwest and 17th Street Northwest, where they can visit 7 different places; by moving about 100 m further, they can visit around six more places that are not registered.


## Results and Discussion <a name="results"></a>

One venue of interest is about 250 m away, and within about 100 m of it the scientist could visit another 6 sites; this is the best option on the map.

## Conclusion <a name="conclusion"></a>

The scientist has many options, but using Python we identified the best option for getting to know venues in the city: around 14 places can be visited.