# Capstone Project - The Battle of Neighborhoods

---
## (Location-based Personalized Advertisements)

* ### **Introduction/Business Problem:**

Marketing is essential to every business. To grow a business, you have to present your products to an audience or, better still, to a targeted audience. Commercials, advertisements, and coupons are everywhere in today's digital world. People cannot avoid them, yet most of this information fails to hold their attention, because attention is limited and people would rather spend their time on things they actually want. That is why customers welcome relevant marketing messages rather than being bombarded with a plethora of deals that do not excite them. A customer in the kids' section is likely to be interested in offers on kids' garments. However, a standee displaying offers on kids' garments is not relevant to most customers walking past it.

So the question is: if you are a business owner and want to advertise your products, how can your ads be targeted more efficiently at potential customers? How can they be distributed to interested parties, including customers, with more precision and, if possible, at lower cost? These are the business problems we would like to solve in this project.

Geo-targeted mobile ads could be a solution! Geo-targeting services from providers such as Google and Yahoo! allow advertisers to allocate search campaign resources at a local level. With the geo-targeting options available today, audiences can be targeted at the country, state, city, and ZIP code level to determine the best potential ad placements.

* ### **Data:**

We mentioned above that geo-targeted mobile ads let us target potential customers better. Now we need to know the characteristics of each neighborhood in the city so that we know where to place different kinds of mobile ads.

In this case, let's use **Los Angeles** as our subject. There are many great resources online for obtaining data. We chose the 'Los Angeles Times' and used its 'Mapping L.A. Boundaries API' to download the neighborhood boundaries covering Los Angeles.

(Find more info about the dataset used in this project: http://boundaries.latimes.com/set/la-county-regions-v6/)

Now we can start cleaning the data and organizing it to show neighborhoods and their coordinates. Moreover, we can combine the venue and category information from Foursquare with the geolocation data processed earlier to better understand which regions are more likely to contain, for example, convenience stores. We can then distribute 7-11 ads or coupons to the areas where more people are interested in convenience stores. The money spent on ads is thus used more efficiently and brings a better return on investment to business owners.
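
One way to make this targeting idea concrete is to rank neighborhoods by the share of their venues that fall in a given category. The sketch below is only an illustration: the (neighborhood, category) pairs are made up rather than taken from Foursquare, and `category_share` is a hypothetical helper name, not part of this project's code.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, venue category) pairs standing in for Foursquare results.
venues = [
    ("westside", "Convenience Store"),
    ("westside", "Thai Restaurant"),
    ("westside", "Convenience Store"),
    ("westside", "Comic Shop"),
    ("south-la", "Convenience Store"),
    ("south-la", "Bar"),
    ("south-la", "Bar"),
    ("south-la", "Pet Service"),
    ("south-la", "Indian Restaurant"),
]


def category_share(pairs, category):
    """Return {neighborhood: fraction of its venues that belong to `category`}."""
    counts = defaultdict(Counter)
    for neighborhood, cat in pairs:
        counts[neighborhood][cat] += 1
    return {n: c[category] / sum(c.values()) for n, c in counts.items()}


shares = category_share(venues, "Convenience Store")

# Neighborhoods with the highest share come first: candidate areas for the ad.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
for neighborhood, share in ranked:
    print(f"{neighborhood}: {share:.0%}")
```

A share is used instead of a raw count so that neighborhoods with many venues overall do not dominate the ranking; whether that is the right normalization for a real campaign is an open design choice.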

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
!wget -q -O 'la_data.json' http://s3-us-west-2.amazonaws.com/boundaries.latimes.com/archive/1.0/boundary-set/la-county-neighborhoods-v6.geojson
print('Data downloaded!')

Data downloaded!


In [3]:
with open('la_data.json') as json_data:
    la_data = json.load(json_data)

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [4]:
neighborhoods_data = la_data['features']

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
ladata = pd.DataFrame(columns=column_names)

In [6]:
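rows = []  # collect one dict per region, then build the dataframe once
for data in neighborhoods_data:
    borough = data['properties']['metadata']['county']
    neighborhood_name = data['properties']['metadata']['region']

    # GeoJSON positions are [longitude, latitude]; the first vertex of the
    # boundary polygon is used as the neighborhood's representative point
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[0][0][0][1]
    neighborhood_lon = neighborhood_latlon[0][0][0][0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the list
ladata = pd.DataFrame(rows, columns=column_names)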
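
The loop above takes the first vertex of each boundary polygon as the neighborhood's coordinate, which sits on the edge of the area rather than inside it. A rough alternative, sketched below, averages the vertices of the outer ring. It assumes MultiPolygon geometry (which the `[0][0][0]` indexing suggests), uses a made-up rectangle instead of real boundary data, and `ring_vertex_mean` is a hypothetical helper. A vertex mean is not a true area centroid, so treat it as an approximation.

```python
def ring_vertex_mean(multipolygon_coords):
    """Average the vertices of the first polygon's outer ring.

    GeoJSON stores each position as [longitude, latitude];
    the result is returned as (latitude, longitude).
    """
    outer_ring = multipolygon_coords[0][0]
    # A closed ring repeats its first vertex at the end; drop the duplicate.
    if len(outer_ring) > 1 and outer_ring[0] == outer_ring[-1]:
        outer_ring = outer_ring[:-1]
    lon = sum(point[0] for point in outer_ring) / len(outer_ring)
    lat = sum(point[1] for point in outer_ring) / len(outer_ring)
    return lat, lon


# Hypothetical feature: a small rectangle, not real boundary data.
feature = {
    "properties": {"metadata": {"county": "los-angeles", "region": "example"}},
    "geometry": {
        "type": "MultiPolygon",
        "coordinates": [[[
            [-118.5, 34.0], [-118.3, 34.0], [-118.3, 34.2],
            [-118.5, 34.2], [-118.5, 34.0],
        ]]],
    },
}

coords = feature["geometry"]["coordinates"]
first_lon, first_lat = coords[0][0][0]   # what the loop above would use
mean_lat, mean_lon = ring_vertex_mean(coords)

print("first vertex:", first_lat, first_lon)
print("vertex mean: ", mean_lat, mean_lon)
```

For the rectangle, the first vertex is a corner while the vertex mean falls in the middle, which matters when venues are later searched within a fixed radius of the point.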

In [7]:
ladata.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | los-angeles | antelope-valley | 34.539023 | -118.207034 |
| 1 | los-angeles | south-la | 34.037396 | -118.308002 |
| 2 | los-angeles | santa-monica-mountains | 34.168157 | -118.776212 |
| 3 | los-angeles | northwest-county | 34.488109 | -118.378224 |
| 4 | los-angeles | san-gabriel-valley | 34.10504 | -118.121747 |


In [8]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(ladata['Borough'].unique()),
        ladata.shape[0]
    )
)

The dataframe has 2 boroughs and 318 neighborhoods.


In [9]:
address = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="la_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Los Angeles are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Los Angeles are 34.0536834, -118.2427669.


Create a map of Los Angeles with neighborhoods superimposed on top.

In [10]:
# create map of Los Angeles using latitude and longitude values
map_la = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(ladata['Latitude'], ladata['Longitude'], ladata['Borough'], ladata['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la)  
    
map_la

Let's take one example from the LA dataset and explore a single neighborhood in our dataframe.

In [11]:
import os
# read credentials from the environment instead of hard-coding them (the variable names are an assumption)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [12]:
ladata.loc[66, 'Neighborhood']

'westside'

In [13]:
neighborhood_latitude = ladata.loc[66, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = ladata.loc[66, 'Longitude'] # neighborhood longitude value
neighborhood_name = ladata.loc[66, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of westside are 34.00177200000013, -118.41164100000016.


With Foursquare, we can use the latitude and longitude of a neighborhood to find the surrounding venues!

Let's get the top 100 venues that are in *'westside'* within a radius of 500 meters.

In [14]:
LIMIT = 100 
radius = 500 

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
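
The same request URL can also be assembled from a dictionary of parameters, which avoids mistakes in a long format string and percent-encodes values where needed. This is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper, and the credentials shown are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with the same query parameters as the cell above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll value readable instead of %2C.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


example_url = build_explore_url(
    "YOUR_ID", "YOUR_SECRET", "20180605", 34.001772, -118.411641, 500, 100
)
print(example_url)
```

If the `requests` library is used to send the call, passing the same dictionary through its `params` argument achieves a similar result without building the string by hand.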

In [15]:
results = requests.get(url).json()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [16]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Adopt and Shop | Pet Service | 34.005017 | -118.4108 |
| 1 | Pulp Fiction | Comic Shop | 34.004228 | -118.408749 |
| 2 | Samosa House | Indian Restaurant | 34.000999 | -118.416162 |
| 3 | Pho Show | Vietnamese Restaurant | 34.003726 | -118.408382 |
| 4 | Samy's Camera | Electronics Store | 34.00304 | -118.407244 |
| 5 | Swanya Thai Cuisine | Thai Restaurant | 34.004232 | -118.408884 |
| 6 | Culver City Home Brewing Supply | Hobby Shop | 34.003676 | -118.407685 |
| 7 | Green Peas | Vegetarian / Vegan Restaurant | 34.002707 | -118.406756 |
| 8 | Jasmine Market | Indian Restaurant | 34.006001 | -118.412433 |
| 9 | Dear John's | American Restaurant | 34.004292 | -118.409985 |


In this example, we can clearly see the venues around 'westside' in LA. Foursquare returns 21 venues near the westside of Los Angeles, and we have categorized them. Most of the venues are restaurants, in a variety of styles. Bars, pet shops, comic shops, and convenience stores also appear. For each venue we attached its coordinates; we will need this dataset later to cluster neighborhoods with similar characteristics.
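
Before neighborhoods can be clustered, the venue list has to become one numeric row per neighborhood. A common approach, sketched here under assumptions, is to one-hot encode the categories and average them per neighborhood, giving the frequency of each category. The rows below are made up for illustration and the column names `Neighborhood` and `Category` are assumptions; the resulting table is the kind of input a clustering step such as the imported `KMeans` could take.

```python
import pandas as pd

# Made-up venue rows for two neighborhoods; not real Foursquare output.
venues = pd.DataFrame({
    "Neighborhood": ["westside", "westside", "westside", "westside",
                     "south-la", "south-la"],
    "Category": ["Indian Restaurant", "Thai Restaurant", "Comic Shop",
                 "Indian Restaurant", "Bar", "Convenience Store"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues["Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the one-hot columns = share of each category in the neighborhood.
profile = onehot.groupby("Neighborhood").mean()
print(profile.round(2))
```

Each row of `profile` sums to 1, so neighborhoods with very different venue counts remain comparable.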