## Data

This section describes the data used to solve the problem and where it comes from.

To answer the question, I need the following data.

#### 1. Geographic coordinates of Hong Kong cinemas

I need to **compare 5 possible locations with the current cinemas** in Hong Kong, so I need a list of Hong Kong cinemas and their geographic coordinates. Both are available from the website https://hkmovie6.com/cinema .

In [2]:
import json
import pandas as pd
from geopy.geocoders import Nominatim

In [3]:
!wget -O hk_cinema_list.json https://hkmovie6.com/api/cinemas/lists

--2019-08-28 17:00:05--  https://hkmovie6.com/api/cinemas/lists
Resolving hkmovie6.com (hkmovie6.com)... 104.31.67.1, 104.31.66.1, 2606:4700:30::681f:4201, ...
Connecting to hkmovie6.com (hkmovie6.com)|104.31.67.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘hk_cinema_list.json’

hk_cinema_list.json     [  <=>               ]  55.16K   260KB/s    in 0.2s    

2019-08-28 17:00:06 (260 KB/s) - ‘hk_cinema_list.json’ saved [56482]



In [4]:
# Load the downloaded cinema list
with open('hk_cinema_list.json', 'r', encoding='utf-8') as f:
    cinemas_json = json.load(f)

# Keep only the fields needed for the analysis
cinemas = []
for data in cinemas_json['data']:
    try:
        cinemas.append({
            'Name': data['name'],
            'ChiName': data['chiName'],
            'Address': data['address'],
            'Latitude': data['lat'],
            'Longitude': data['lon']
        })
    except KeyError:
        # Skip records that lack one of the required fields
        continue
df_cinemas = pd.DataFrame(cinemas, columns=['Name','ChiName','Address','Latitude','Longitude'])

In [10]:
df_cinemas

Unnamed: 0,Name,ChiName,Address,Latitude,Longitude
0,Emperor Cinemas - Entertainment Building,英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.154230
1,Emperor Cinemas - Ma On Shan,英皇戲院 - 馬鞍山新港城中心,"L2, MOSTown, Sai Sha Road, Ma On Shan, N.T.",22.424120,114.230957
2,Emperor Cinemas - Tuen Mun,英皇戲院 - 屯門新都商場,"3/F, New Town Commercial Arcade, 2 Tuen Lee St...",22.390776,113.975983
3,The Coronet @ Emperor Cinemas - Entertainment ...,The Coronet @ 英皇戲院 - 娛樂行,"3/F, Emperor Cinemas Entertainment Building, 3...",22.281453,114.154230
4,Festival Grand Cinema,Festival Grand Cinema,"Level UG, Festival Walk, 80 Tat Chee Avenue, K...",22.337882,114.174325
5,Grand Kornhill Cinema,康怡戲院,"4/F, Kornhill Plaza South, 2 Kornhill Road, Qu...",22.284218,114.216428
6,Grand Windsor Cinema,皇室戲院,"4/F, Windsor House, 311 Gloucester Road, Cause...",22.280358,114.186519
7,MCL CHEUNG SHA WAN CINEMA,MCL 長沙灣戲院,"Unit G56 & G58-59, G/F, Lai Sun Commercial Cen...",22.338997,114.116883
8,MCL Green Code Cinema,MCL 粉嶺戲院,"Shop No.G12, Ground Floor, Green Code Plaza, N...",22.500943,114.145833
9,MCL Metro City Cinema,MCL 新都城戲院,"Cinema Area, G/F, Metro City, Phase 2, 8 Yan K...",22.323050,114.257581
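
Before using the coordinates, it can help to check that every row has a plausible position. The sketch below is not part of the original notebook: the helper name `drop_implausible_coordinates` is hypothetical, and the `HK_LAT` and `HK_LON` ranges are only a rough bounding box for Hong Kong that I assume for illustration.

```python
import pandas as pd

# Approximate bounding box for Hong Kong (an assumption for illustration only).
HK_LAT = (22.1, 22.6)
HK_LON = (113.8, 114.5)

def drop_implausible_coordinates(df):
    """Return only rows whose Latitude/Longitude are numbers inside the box."""
    out = df.copy()
    # Coerce strings such as "22.28" to floats; unparsable values become NaN.
    out['Latitude'] = pd.to_numeric(out['Latitude'], errors='coerce')
    out['Longitude'] = pd.to_numeric(out['Longitude'], errors='coerce')
    # NaN never satisfies between(), so missing coordinates are dropped too.
    in_box = out['Latitude'].between(*HK_LAT) & out['Longitude'].between(*HK_LON)
    return out[in_box].reset_index(drop=True)

# Small made-up sample: one valid row, one missing latitude, one far outside the box.
sample = pd.DataFrame({
    'Name': ['Grand Kornhill Cinema', 'Row with missing latitude', 'Row outside the box'],
    'Latitude': [22.284218, None, 0.0],
    'Longitude': [114.216428, 114.2, 0.0],
})
clean = drop_implausible_coordinates(sample)
print(clean)
```

Applied to `df_cinemas`, a check like this would flag records that slipped through with empty or misplaced coordinates before any distance is computed.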


#### 2. Geographic coordinates of 5 possible cinema addresses
I also need to know the geographic coordinates of the 5 possible cinema locations. The code below looks them up with geopy's Nominatim geocoder.

In [22]:
possible_locations = [
    { 'Location': 'L1', 'Address': 'Sau Mau Ping Shopping Centre, Sau Mau Ping'},
    { 'Location': 'L2', 'Address': '148 Wu Chui Rd, Tuen Mun, Hong Kong'},
    { 'Location': 'L3', 'Address': 'Un Chau Shopping Centre, Cheung Sha Wan'},
    { 'Location': 'L4', 'Address': 'Prosperity Millennia Plaza, North Point'},
    { 'Location': 'L5', 'Address': '138-168 Sai Lau Kok Rd, Tsuen Wan, Hong Kong'},
]

In [19]:
geolocator = Nominatim(user_agent="foursquare_agent")

In [23]:
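# Look up the latitude and longitude of an address
def getLatLng(address):
    print(address)
    location = geolocator.geocode(address)
    if location is None:
        # geocode() returns None when the address cannot be resolved
        raise ValueError('No geocoding result for: {}'.format(address))
    print(location.latitude, location.longitude)
    return (location.latitude, location.longitude)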

Dataframe of the 5 possible locations with their geographic coordinates:

In [24]:
for loc in possible_locations:        
    (lat, lng) = getLatLng(loc['Address'])
    loc['Latitude'] = lat
    loc['Longitude'] = lng
    
df_possible_locations = pd.DataFrame(possible_locations, columns=['Location', 'Address', 'Latitude', 'Longitude'])
df_possible_locations

Sau Mau Ping Shopping Centre, Sau Mau Ping
22.3191453 114.2315369
148 Wu Chui Rd, Tuen Mun, Hong Kong
22.3727017 113.9691698
Un Chau Shopping Centre, Cheung Sha Wan
22.3375837 114.156045882797
Prosperity Millennia Plaza, North Point
22.29165445 114.208269503639
138-168 Sai Lau Kok Rd, Tsuen Wan, Hong Kong
22.3742678 114.1143613


Unnamed: 0,Location,Address,Latitude,Longitude
0,L1,"Sau Mau Ping Shopping Centre, Sau Mau Ping",22.319145,114.231537
1,L2,"148 Wu Chui Rd, Tuen Mun, Hong Kong",22.372702,113.96917
2,L3,"Un Chau Shopping Centre, Cheung Sha Wan",22.337584,114.156046
3,L4,"Prosperity Millennia Plaza, North Point",22.291654,114.20827
4,L5,"138-168 Sai Lau Kok Rd, Tsuen Wan, Hong Kong",22.374268,114.114361
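
Geocoding can fail for an address, and the public Nominatim service asks clients to limit their request rate. As a minimal sketch (not the notebook's own code), the hypothetical helper `geocode_addresses` below accepts any geocoding function and records `None` coordinates when nothing is found. The `FakeLocation` stand-in only exists so the example runs offline; the commented lines show how a throttled geopy geocoder could be passed in instead.

```python
from collections import namedtuple

def geocode_addresses(addresses, geocode_fn):
    """Geocode each address; store None coordinates when nothing is found."""
    rows = []
    for address in addresses:
        location = geocode_fn(address)  # expected to return None for no match
        found = location is not None
        rows.append({
            'Address': address,
            'Latitude': location.latitude if found else None,
            'Longitude': location.longitude if found else None,
        })
    return rows

# With geopy, a throttled geocoder could be passed in (requires network access):
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   geolocator = Nominatim(user_agent="foursquare_agent")
#   geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
#   rows = geocode_addresses(addresses, geocode)

# Offline demonstration with a stand-in geocoder (hypothetical, for illustration).
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
known = {
    'Sau Mau Ping Shopping Centre, Sau Mau Ping': FakeLocation(22.3191453, 114.2315369),
}
rows = geocode_addresses(
    ['Sau Mau Ping Shopping Centre, Sau Mau Ping', 'An address that cannot be resolved'],
    known.get,
)
print(rows)
```

Rows with `None` coordinates can then be reviewed by hand instead of stopping the whole loop.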


#### 3. Favorite cinema list of stakeholder

The stakeholder's favorite cinema list is important because I can **use it as a profile to select the best location**.  
Ratings range from 1.0 (worst) to 5.0 (best).

In [25]:
boss_favorite = [
    {'Name': 'Broadway Circuit - MONGKOK', 'Rating': 4.5},
    {'Name': 'Broadway Circuit - The ONE', 'Rating': 4.5},
    {'Name': 'Grand Ocean', 'Rating': 4.3},
    {'Name': 'The Grand Cinema', 'Rating': 3.4},
    {'Name': 'AMC Pacific Place', 'Rating': 2.3},
    {'Name': 'UA IMAX @ Airport', 'Rating': 1.5},
]

df_boss_favorite = pd.DataFrame(boss_favorite, columns=['Name','Rating'])
df_boss_favorite

Unnamed: 0,Name,Rating
0,Broadway Circuit - MONGKOK,4.5
1,Broadway Circuit - The ONE,4.5
2,Grand Ocean,4.3
3,The Grand Cinema,3.4
4,AMC Pacific Place,2.3
5,UA IMAX @ Airport,1.5


#### 4. Eating, shopping and public transport facilities around each cinema
The recommended cinema location needs many eating and shopping venues nearby. Convenient public transport is also required.  
I can use the Foursquare API to find these venues around each location.

A 5-minute walk covers about 500 m, which I take as a suitable radius for searching nearby venues.
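
To make the 500 m radius concrete, the distance between two coordinate pairs can be computed with the haversine formula. This is a sketch of my own rather than something the Foursquare API requires: `haversine_m` is a hypothetical helper, and the Earth radius of 6,371 km is the usual mean-radius approximation. The coordinates are the ones listed in this section for the first cinema and two nearby MTR stations.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Emperor Cinemas - Entertainment Building
cinema_pos = (22.281453, 114.154230)
stations = {
    'MTR Central Station': (22.28231, 114.15818),
    'MTR Hong Kong Station': (22.285054, 114.158379),
}

distances = {name: haversine_m(*cinema_pos, *pos) for name, pos in stations.items()}
for name, dist in distances.items():
    # Compare each straight-line distance with the 500 m walking radius.
    print('{}: {:.0f} m, within 500 m: {}'.format(name, dist, dist <= 500))
```

The result is a straight-line distance, so the actual walking route will usually be somewhat longer.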

The API returns at most 50 results per request, so it is better to search for venues one category at a time. The following categories will be used to find the target venues. Full list of categories: https://developer.foursquare.com/docs/resources/categories
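
The per-category idea can be sketched as one search call per category, with the results counted separately. The helper `count_venues_by_category` and the stand-in `fake_search` below are hypothetical names used only for illustration; in the notebook the search function would wrap the Foursquare client, and the category IDs here are placeholders rather than real Foursquare IDs.

```python
def count_venues_by_category(search_fn, latitude, longitude, category_ids, radius=500):
    """Return {category name: number of venues}, using one search per category."""
    counts = {}
    for name, category_id in category_ids.items():
        venues = search_fn(latitude, longitude, category_id, radius)
        counts[name] = len(venues)
    return counts

# Stand-in for a real API call (hypothetical): returns canned venue lists.
def fake_search(latitude, longitude, category_id, radius):
    canned = {'food-id': ['a', 'b', 'c'], 'metro-id': ['x']}
    return canned.get(category_id, [])

counts = count_venues_by_category(
    fake_search, 22.281453, 114.154230,
    {'Food': 'food-id', 'Metro Station': 'metro-id', 'Bus Stop': 'bus-id'},
)
print(counts)
```

Because each request is capped, a count that reaches the cap should be read as "at least that many" rather than as an exact total.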

In [26]:
cinema = df_cinemas.loc[0]

In [27]:
print('Use the first cinema "{}" in the list as example to explore venues nearby'.format(cinema['Name']))

Use the first cinema "Emperor Cinemas - Entertainment Building" in the list as example to explore venues nearby


In [28]:
fs_categories = {
    'Food': '4d4b7105d754a06374d81259',
    'Shop & Service': '4d4b7105d754a06378d81259',
    'Bus Stop': '52f2ab2ebcbc57f1066b8b4f',
    'Metro Station': '4bf58dd8d48988d1fd931735',
    'Nightlife Spot': '4d4b7105d754a06376d81259',
    'Arts & Entertainment': '4d4b7104d754a06370d81259'
}

In [29]:
import os
import foursquare
from pandas.io.json import json_normalize # transform JSON into a flat pandas dataframe (pandas 1.0+ exposes this as pandas.json_normalize)

# Read the credentials from the environment instead of hard-coding them
# (the environment variable names are placeholders; use your own)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180604'
fs = foursquare.Foursquare(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)

In [30]:
RADIUS = 500 # 500m, around 5 minutes walking time

In [42]:
# Search for venues of one category near a location and return them as a dataframe
def venues_nearby(latitude, longitude, category):
    results = fs.venues.search(
        params = {
            # Note: 'query' is matched against venue names, on top of the category filter
            'query': category,
            'll': '{},{}'.format(latitude, longitude),
            'radius': RADIUS,
            'categoryId': fs_categories[category]
        }
    )
    df = json_normalize(results['venues'])
    cols = ['Name','Latitude','Longitude']
    if len(df) == 0:
        # No venues found: return an empty dataframe with the expected columns
        df = pd.DataFrame(columns=cols)
    else:
        df = df[['name','location.lat','location.lng']]
        df.columns = cols
    print('{} "{}" venues are found within {}m of location'.format(len(df), category, RADIUS))
    return df
    

Find the MTR stations around the cinema

In [43]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Metro Station').head()

Unnamed: 0,Name,Latitude,Longitude
0,MTR Hong Kong Station (港鐵香港站),22.285054,114.158379
1,MTR Central Station (港鐵中環站),22.28231,114.15818


Find the bus stops around the cinema

In [45]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Bus Stop').head()

Unnamed: 0,Name,Latitude,Longitude
0,HSBC Main Building / Queen's Road Central Bus ...,22.280329,114.159638
1,Douglas Street Bus Stop 德忌利士街巴士站,22.283131,114.15701
2,The Landmark / Central Station / Des Voeux Roa...,22.281178,114.158233
3,Dr. Sun Yat-Sen Museum Bus Stop 孫中山紀念館巴士站,22.279132,114.152743
4,Statue Square Bus Stop (Statue Square Bus Stop...,22.280852,114.159918


Find eating places around the cinema

In [46]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Food').head()

Unnamed: 0,Name,Latitude,Longitude
0,Mana! Fast Slow Food,22.282921,114.154651
1,nood food,22.283088,114.155551
2,Soul Food,22.281668,114.152495
3,Chiu Lung Fast Food (昭隆美食),22.282659,114.156753
4,Delicious Food Shop 東方小食店,22.279824,114.154215


In [47]:
venues_nearby(cinema['Latitude'], cinema['Longitude'], 'Arts & Entertainment').head()

Unnamed: 0,Name,Latitude,Longitude
0,Tai Kwun Centre for Heritage and Arts (大館古蹟及藝術館),22.281224,114.154032
1,Wah Tung China Arts Limited (華通陶瓷藝術有限公司),22.283046,114.152723
2,Ravenel Fine Arts Limited 睿芙奧,22.281819,114.156906
3,KONG Arts Space,22.281751,114.1533
4,State Of The Arts,22.282225,114.155006


With the above data, I can build a **content-based recommender system** to solve the problem.  

By using the Foursquare API to count the venues of each type (food, transport, nightlife) around every cinema in the Hong Kong cinema list, a **cinema nearby venues matrix** can be built. The stakeholder's favorite cinema ratings serve as the **profile**, which is combined with the cinema nearby venues matrix to form a **weighted matrix of favorite cinemas**.

The weighted matrix can be applied to the **5 possible locations with venue information** to generate a ranking. The **top-ranked location** can then be recommended as the site for a new cinema.
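
The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions and not the final implementation: all venue counts and the two ratings are made-up numbers, the names `Favorite A` and `Favorite B` are placeholders, and scaling each feature by its maximum is one possible normalisation that I assume here.

```python
import pandas as pd

features = ['Food', 'Shop & Service', 'Bus Stop', 'Metro Station']

# Hypothetical venue counts around two favorite cinemas (not real data).
favorite_venues = pd.DataFrame(
    [[40, 30, 10, 2],
     [10, 5, 3, 0]],
    index=['Favorite A', 'Favorite B'], columns=features)
ratings = pd.Series([4.5, 1.5], index=['Favorite A', 'Favorite B'])

# Hypothetical venue counts around two candidate locations (not real data).
candidate_venues = pd.DataFrame(
    [[35, 25, 8, 1],
     [5, 10, 2, 0]],
    index=['L1', 'L2'], columns=features)

# Scale each feature to [0, 1] so that large counts do not dominate the score.
max_counts = pd.concat([favorite_venues, candidate_venues]).max().replace(0, 1)
fav_scaled = favorite_venues / max_counts
cand_scaled = candidate_venues / max_counts

# Weighted matrix: each favorite cinema's features multiplied by its rating.
weighted = fav_scaled.mul(ratings, axis=0)

# Profile: summed weights per feature, normalised to sum to 1.
profile = weighted.sum()
profile = profile / profile.sum()

# Score: profile-weighted average of each candidate's scaled features.
scores = cand_scaled.mul(profile, axis=1).sum(axis=1).sort_values(ascending=False)
print(scores)
```

With these illustrative numbers the candidate whose surroundings resemble the highly rated cinema ranks first, which is the behaviour the weighted matrix is meant to capture.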
