# Capstone Project - The Battle of Neighborhoods

## Introduction

**Business problem and project use case**

"Would you suggest, based on empirical data, where to open a cinema in Los Angeles, and whether it is even a profitable endeavor?"

The stakeholder wants to open a new cinema as the company's new line of business.

**Factors to analyze**
- Transportation
- Accessibility / amenities in close proximity
- Food (places to eat nearby)



The stakeholder wants me to concentrate on choosing a cinema location based on its surrounding environment. The cinema's facilities and rental price are outside my scope. He has provided a list of his top 10 favorite cinemas in Hong Kong, each with a rating.

Working with my teammates, I selected 5 possible locations for the new cinema. Which of them should be recommended to the stakeholder?

## Data

This section describes the data used to solve the problem and where it comes from.

To answer the question, the following data are required.

1. Geographic coordinates of Hong Kong cinemas
To compare the 5 possible locations with the existing cinemas in Hong Kong, I need a list of Hong Kong cinemas and their geographic coordinates. Both are available from the website https://hkmovie6.com/cinema .

In [None]:
import numpy as np # library to handle data in a vectorized manner
import time
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a flat pandas dataframe (pandas >= 1.0)
# !conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed yet
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library
from folium import plugins
import matplotlib.cm as cm
import matplotlib.colors as colors

import seaborn as sns
from sklearn.cluster import KMeans



print('Libraries imported.')

In [None]:
# Import necessary library
import json
import pandas as pd

# Download the cinema list
!wget -O hk_cinema_list.json https://hkmovie6.com/api/cinemas/lists

# Convert the JSON data into a DataFrame
cinemas_json = None
with open('hk_cinema_list.json', 'r', encoding='utf-8') as f:
    cinemas_json = json.load(f)
    
cinemas = []
for data in cinemas_json['data']:    
    cinemas.append({
        'Name': data['name'],
        'ChiName': data['chiName'],
        'Address': data['address'],
        'Latitude': data['lat'],
        'Longitude': data['lon']
    })
df_cinemas = pd.DataFrame(cinemas, columns=['Name','ChiName','Address','Latitude','Longitude'])



In [None]:
print('There are {} cinemas in Hong Kong'.format(len(df_cinemas)))

In [None]:
df_cinemas.head()

In [None]:
possible_locations = [
    { 'Location': 'L1', 'Address': 'Sau Mau Ping Shopping Centre, Sau Mau Ping'},
    { 'Location': 'L2', 'Address': 'Tuen Mun Ferry, Tuen Mun'},
    { 'Location': 'L3', 'Address': 'Un Chau Shopping Centre, Cheung Sha Wan'},
    { 'Location': 'L4', 'Address': 'Prosperity Millennia Plaza, North Point'},
    { 'Location': 'L5', 'Address': 'Tsuen Fung Centre Shopping Arcade, Tsuen Wan'},
]


In [None]:
!pip3 install -U googlemaps

In [None]:
!ls 

In [None]:
google_act = None
with open('google.json', 'r') as f:
    google_act = json.load(f)
    
GOOGLE_MAP_API_KEY = google_act['key'] 

In [None]:
import googlemaps
gmaps = googlemaps.Client(key=GOOGLE_MAP_API_KEY)

In [None]:
def getLatLng(address):
    # Geocode the address within Hong Kong and return (latitude, longitude)
    results = gmaps.geocode('{}, Hong Kong'.format(address))
    location = results[0]['geometry']['location']  # use the first (best) match
    return (location['lat'], location['lng'])

In [None]:
for loc in possible_locations:        
    (lat, lng) = getLatLng(loc['Address'])
    loc['Latitude'] = lat
    loc['Longitude'] = lng
    
df_possible_locations = pd.DataFrame(possible_locations, columns=['Location', 'Address', 'Latitude', 'Longitude'])
df_possible_locations

In [None]:
jefe_tops = [
    {'Name': 'Broadway Circuit - MONGKOK', 'Rating': 4.5},
    {'Name': 'Broadway Circuit - The ONE', 'Rating': 4.5},
    {'Name': 'Grand Ocean', 'Rating': 4.3},
    {'Name': 'The Grand Cinema', 'Rating': 3.4},
    {'Name': 'AMC Pacific Place', 'Rating': 2.3},
    {'Name': 'UA IMAX @ Airport', 'Rating': 1.5},
]

df_jefe_tops = pd.DataFrame(jefe_tops, columns=['Name','Rating'])
df_jefe_tops

## Food and other factors to consider

Ideally, a cinema has shops, bars, eateries and similar venues nearby.



In [None]:
fs_categories = {
    'Food': '4d4b7105d754a06374d81259',
    'Shop & Service': '4d4b7105d754a06378d81259',
    'Bus Stop': '52f2ab2ebcbc57f1066b8b4f',
    'Metro Station': '4bf58dd8d48988d1fd931735',
    'Nightlife Spot': '4d4b7105d754a06376d81259',
    'Arts & Entertainment': '4d4b7104d754a06370d81259',
    'Comedy Club': '4bf58dd8d48988d18e941735',
    'Arcade': '4bf58dd8d48988d1e1931735',
    'Go Kart Track': '52e81612bcbc57f1066b79ea'
}

In [None]:
', '.join(fs_categories)

In [None]:
cinema = df_cinemas.loc[0]

In [None]:
print('Example: exploring the places near "{}"'.format(cinema['Name']))

In [None]:
!pip3 install foursquare

In [None]:
import foursquare as fs

In [None]:
User = None
with open('foursquare.json', 'r') as four:
    User = json.load(four)
clientid = User['id'] 
clientsecret = User['secret']

In [None]:
client = fs.Foursquare(client_id=clientid, client_secret=clientsecret)

In [None]:
from pandas import json_normalize

RADIUS = 800


In [None]:

# Define a function to search nearby venues and return the result as a dataframe
def venues_nearby(latitude, longitude, category, verbose=True):
    # Use the authenticated client created above, not the foursquare module itself
    results = client.venues.search(
        params = {
            'query': category,
            'll': '{},{}'.format(latitude, longitude),
            'radius': RADIUS,
            'categoryId': fs_categories[category]
        }
    )
    df = json_normalize(results['venues'])
    cols = ['Name','Latitude','Longitude','Tips','Users','Visits']
    if( len(df) == 0 ):
        df = pd.DataFrame(columns=cols)
    else:
        df = df[['name','location.lat','location.lng','stats.tipCount','stats.usersCount','stats.visitsCount']]
        df.columns = cols
    if( verbose ):
        print('{} "{}" venues are found within {}m of location'.format(len(df), category, RADIUS))
    return df

# Example: food venues near the first cinema in the list
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Food').head()

## Methodology

This section describes the data preparation and analysis that were performed, the machine learning technique that was used, and why.

Using the data above, a **content-based recommendation** technique is used in the solution.

1. Use the Foursquare API to collect data about the venues around each cinema, which gives insight into where best to build a new cinema.

2. Create a matrix that captures the characteristics of the venues near each cinema.

3. Combine the stakeholder's ratings of his favorite cinemas with that matrix to create a weighted matrix of favorite cinemas.

4. Use the weighted matrix to rank the prospective locations, so that a single best choice can be recommended to the stakeholder.

5. Before the matrix is built, data preparation and data analysis must be conducted.
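The steps above can be sketched in a few lines of plain Python. This is only an illustration of one common way to do content-based scoring, not necessarily the exact computation used later in this notebook. All cinema names, venue counts and ratings below are made up, and `normalize`, `build_profile` and `score` are hypothetical helper names.

```python
# Hypothetical venue counts per category around each rated cinema (made-up numbers)
cinema_counts = {
    'Cinema A': {'Food': 30, 'Bus Stop': 5, 'Metro Station': 2},
    'Cinema B': {'Food': 10, 'Bus Stop': 2, 'Metro Station': 0},
}
ratings = {'Cinema A': 4.5, 'Cinema B': 1.5}  # hypothetical stakeholder ratings

def normalize(counts):
    """Turn raw values into shares that sum to 1 (all zeros if the total is 0)."""
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}

def build_profile(cinema_counts, ratings):
    """Rating-weighted mix of the cinemas' category shares, normalized to sum to 1."""
    profile = {}
    for name, counts in cinema_counts.items():
        for cat, share in normalize(counts).items():
            profile[cat] = profile.get(cat, 0.0) + ratings[name] * share
    return normalize(profile)

def score(candidate_counts, profile):
    """Dot product between a candidate's category shares and the profile."""
    shares = normalize(candidate_counts)
    return sum(shares.get(cat, 0.0) * weight for cat, weight in profile.items())

profile = build_profile(cinema_counts, ratings)

# Hypothetical venue counts around two candidate locations
candidates = {
    'L1': {'Food': 25, 'Bus Stop': 4, 'Metro Station': 1},
    'L2': {'Food': 3, 'Bus Stop': 6, 'Metro Station': 0},
}

# Rank candidates from best to worst match with the stakeholder profile
ranking = sorted(candidates, key=lambda loc: score(candidates[loc], profile), reverse=True)
print(ranking)
```

Normalizing counts into shares keeps a location with many venues overall from dominating purely because of its size. Whether that is desirable here is a design choice, and raw counts could be used instead.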

In [None]:
duplicated = df_cinemas.duplicated('Address')
df_cinemas[duplicated].sort_values('Address')




In [None]:
# The Grand SC Starsuite -> The Grand Cinema
df_cinemas.loc[29, 'Name'] = 'The Grand Cinema'

# XXX @ UA MegaBox -> UA MegaBox
df_cinemas.loc[44, 'Name'] = 'UA MegaBox'
df_cinemas.loc[45, 'Name'] = 'UA MegaBox'

# BEA IMAX @ UA Cine Moko -> UA Cine Moko
df_cinemas.loc[42, 'Name'] = 'UA Cine Moko'

# XXX @ UA iSQUARE -> UA iSQUARE
df_cinemas.loc[43, 'Name'] = 'UA iSQUARE'
df_cinemas.loc[46, 'Name'] = 'UA iSQUARE'

# Emperor Cinemas - Entertainment Building
df_cinemas.loc[1, 'Name'] = 'Emperor Cinemas - Entertainment Building'

# Cinema City VICTORIA (Causeway Bay)
df_cinemas.loc[6, 'Name'] = 'Cinema City VICTORIA (Causeway Bay)'

In [None]:
df_cinemas[duplicated]

In [None]:
df_cinemas.drop_duplicates('Address', inplace=True, keep='first')

In [None]:
df_cinemas[df_cinemas.duplicated('Name')]

In [None]:
df_cinemas.head()

In [None]:
df_cinemas['ChiName'].to_frame()

In [None]:
df_cinemas.drop(index=[65,67], inplace=True)

In [None]:
df_cinemas.head()

In [None]:
df_cinemas.shape

In [None]:
from pathlib import Path

In [None]:
venues_csv = Path('./cinemas_venues.csv')

# Reuse the venues data if it has already been explored and downloaded
if( venues_csv.exists() ):
    df_venues = pd.read_csv(venues_csv)
else:
    # Collect the nearby venues of every cinema, one category at a time
    frames = []
    for _, row in df_cinemas.iterrows():
        for cat in fs_categories:
            df = venues_nearby(row['Latitude'], row['Longitude'], cat, verbose=False)
            df['Cinema Name'] = row['Name']
            df['Category'] = cat
            frames.append(df)
    # DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat
    df_venues = pd.concat(frames, ignore_index=True, sort=True)

In [None]:
print('A total of {} venues were found'.format(len(df_venues)))
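
Once the venues are collected in long format (one row per venue), the matrix of venue characteristics per cinema can be derived by counting venues per cinema and category. The sketch below assumes the `Cinema Name` and `Category` columns used above and runs on a small made-up frame, `toy_venues`, rather than the real `df_venues`.

```python
import pandas as pd

# Hypothetical rows in the same long format as df_venues (one row per venue found)
toy_venues = pd.DataFrame({
    'Cinema Name': ['Cinema A', 'Cinema A', 'Cinema A', 'Cinema B'],
    'Category': ['Food', 'Food', 'Bus Stop', 'Food'],
    'Name': ['Noodle Bar', 'Cafe', 'Stop 1', 'Diner'],
})

# Count venues per cinema and category; combinations with no venue become 0
venue_counts = pd.crosstab(toy_venues['Cinema Name'], toy_venues['Category'])
print(venue_counts)
```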