# Singapore hospitality data extraction

## Author: Ankur Shanker

## Student ID: 21159916

### **Section 1:** Importing packages necessary for workbook execution.

The libraries used for geospatial data reading, plotting, and statistical analysis must be imported before the rest of this workbook can run.

In [None]:
# Import packages required for analysis
import pandas as pd
from sklearn.model_selection import train_test_split
import rfpimp
from sklearn import model_selection

# Import packages required for numeric operations
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant
import statsmodels.api as sm

In [None]:
# Import packages required for geospatial analysis
import geopandas as gpd
import pyproj
import geopy.distance
pyproj.datadir.get_data_dir()

'C:\\Users\\ankur\\AppData\\Roaming\\jupyterlab-desktop\\jlab_server\\Library\\share\\proj'

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn import tree

In [None]:
# Import packages required for data visualisation
import matplotlib.pyplot as plt
import seaborn as sn
sn.set_style("darkgrid", {"grid.color": ".6", "grid.linestyle": ":"})

### **Section 2:** Reading and formatting data

Data related to hotel rates in Singapore is read in below.

In [None]:
singapore_hotels = pd.read_csv('Data/Singapore Hotels/Hotel Rates/hotel_rates.csv')
singapore_hotels.set_index('Name', inplace=True)
singapore_hotels

Name,Price
"Hotel Boss, Singapore",121
YOTELAIR Singapore Changi Airport,167
"PARKROYAL COLLECTION Marina Bay, Singapore",315
"The Fullerton Hotel, Singapore\t352",352
"Ibis Budget, Singapore Clarke Quay",99
...,...
"Atelier, Chinatown, Singapore",58
"K Space Inn Owen, Kallang, Singapore",49
"K Space Inn 569, Singapore",51
"K Space Inn 14, Singapore",51


Following this, the coordinates of all 321 hotels are ascertained by geocoding each hotel name with Nominatim.

In [None]:
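from geopy.geocoders import Nominatim
import time

geolocator = Nominatim(user_agent="ankur")

# Note: 'x' holds latitude and 'y' holds longitude throughout this workbook
xValues = []
yValues = []

for index in singapore_hotels.index:
    # Geocode the hotel name; the result is None when no match is found
    location = geolocator.geocode(index)
    if location is not None:
        xValues.append(location.latitude)
        yValues.append(location.longitude)
    else:
        xValues.append(np.nan)
        yValues.append(np.nan)
    # Pause between requests (Nominatim's usage policy allows at most one per second)
    time.sleep(1)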
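
The geocoding loop above can also be written as a small function that takes the lookup as an argument, which makes it possible to try the logic without contacting the Nominatim service. This is only a minimal sketch: `geocode_names` and `fake_geocode` are hypothetical names introduced for illustration, and the stub returns a single hard-coded coordinate copied from this workbook's output.

```python
import time
from types import SimpleNamespace


def geocode_names(names, geocode, delay_seconds=1.0):
    """Return (latitudes, longitudes); NaN marks names that could not be geocoded."""
    latitudes, longitudes = [], []
    for position, name in enumerate(names):
        # Wait between consecutive lookups to avoid flooding the service
        if position > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)
        location = geocode(name)
        if location is None:
            latitudes.append(float("nan"))
            longitudes.append(float("nan"))
        else:
            latitudes.append(location.latitude)
            longitudes.append(location.longitude)
    return latitudes, longitudes


def fake_geocode(name):
    """Hypothetical stand-in for geolocator.geocode, used only for this demo."""
    known = {"Hotel Boss, Singapore": (1.305792, 103.860105)}
    if name not in known:
        return None
    latitude, longitude = known[name]
    return SimpleNamespace(latitude=latitude, longitude=longitude)


lats, lons = geocode_names(
    ["Hotel Boss, Singapore", "No such hotel"], fake_geocode, delay_seconds=0
)
```

With the real service, the call would presumably look like `geocode_names(singapore_hotels.index, geolocator.geocode)`, keeping the default one-second delay.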

The x (latitude) and y (longitude) coordinates are then assigned to columns in the "singapore_hotels" dataframe, and hotels that could not be geocoded are dropped.

In [None]:
singapore_hotels['x'] = xValues
singapore_hotels['y'] = yValues

singapore_hotels.dropna(inplace=True)

In [None]:
singapore_hotels


Name,Price,x,y
"Hotel Boss, Singapore",121,1.305792,103.860105
YOTELAIR Singapore Changi Airport,167,1.359882,103.990528
"PARKROYAL COLLECTION Marina Bay, Singapore",315,1.291610,103.856816
"The Fullerton Hotel, Singapore\t352",352,1.286202,103.853073
"Carlton Hotel, Singapore",190,1.295722,103.852592
...,...,...,...
"Betel Box Backpackers Hostel, Singapore",34,1.312270,103.900268
"Spacepod, Kallang, Singapore",39,1.310519,103.861933
"The Bohemian, Chinatown, Singapore",53,1.283818,103.844921
"Atelier, Chinatown, Singapore",58,1.280687,103.846771


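The "singapore_hotels" dataframe is then converted into a geodataframe and redundant columns are dropped. Because `points_from_xy` takes the horizontal coordinate first, the longitude column (y) is passed before the latitude column (x).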

In [None]:
singapore_hotels = gpd.GeoDataFrame(singapore_hotels, geometry=gpd.points_from_xy(singapore_hotels['y'], singapore_hotels['x']))
singapore_hotels = singapore_hotels[['Price', 'geometry']]
singapore_hotels

Name,Price,geometry
"Hotel Boss, Singapore",121,POINT (103.86010 1.30579)
YOTELAIR Singapore Changi Airport,167,POINT (103.99053 1.35988)
"PARKROYAL COLLECTION Marina Bay, Singapore",315,POINT (103.85682 1.29161)
"The Fullerton Hotel, Singapore\t352",352,POINT (103.85307 1.28620)
"Carlton Hotel, Singapore",190,POINT (103.85259 1.29572)
...,...,...
"Betel Box Backpackers Hostel, Singapore",34,POINT (103.90027 1.31227)
"Spacepod, Kallang, Singapore",39,POINT (103.86193 1.31052)
"The Bohemian, Chinatown, Singapore",53,POINT (103.84492 1.28382)
"Atelier, Chinatown, Singapore",58,POINT (103.84677 1.28069)


The locations of casinos in Singapore are then specified in a dataframe below.

In [None]:
singapore_casino_locations = pd.DataFrame()

singapore_casino_locations['casino_name'] = ['Marina Bay Sands',
                                             'Resorts World Casino Sentosa']
singapore_casino_locations['x'] = [1.2847, 1.2552]
singapore_casino_locations['y'] = [103.8610, 103.8218]

singapore_casino_locations = gpd.GeoDataFrame(singapore_casino_locations,
                                              geometry = gpd.points_from_xy(singapore_casino_locations['y'],
                                                                 singapore_casino_locations['x']))
singapore_casino_locations = singapore_casino_locations[['casino_name', 'geometry']]
singapore_casino_locations


Unnamed: 0,casino_name,geometry
0,Marina Bay Sands,POINT (103.86100 1.28470)
1,Resorts World Casino Sentosa,POINT (103.82180 1.25520)


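The geodesic distance, in metres, between each hotel in Singapore and its nearest casino is determined and added to the dataframe. One column is created per casino name, but because the full casino table is passed on every call, both columns hold the same nearest-casino distance.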

In [None]:
def CalculateDistance(attribute, tag):
    """Add a column `tag` with each hotel's distance (m) to the nearest point in `attribute`."""
    distances = []
    for i, row in singapore_hotels.iterrows():
        # Start from infinity so the first measured distance always replaces it
        closest_distance = float('inf')
        coords1 = (row['geometry'].y, row['geometry'].x)  # (latitude, longitude)

        for j, location in attribute.iterrows():
            coords2 = (location['geometry'].y, location['geometry'].x)
            distance = geopy.distance.geodesic(coords1, coords2).m
            if distance < closest_distance:
                closest_distance = distance
        distances.append(closest_distance)
    singapore_hotels[tag] = distances
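
The workbook's output table shows identical values in both distance columns, since the search runs over every casino each time. If a separate distance per casino is wanted, it could be sketched as follows. This is an illustration under stated assumptions, not the workbook's method: it uses the haversine formula on a spherical Earth rather than `geopy.distance.geodesic`, so values differ from the geodesic ones by a fraction of a percent. `haversine_m` and `distances_per_casino` are hypothetical helper names, and the coordinates are copied from the tables in this workbook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# (latitude, longitude) pairs taken from the workbook's own tables
casinos = {
    "Marina Bay Sands": (1.2847, 103.8610),
    "Resorts World Casino Sentosa": (1.2552, 103.8218),
}
hotels = {"Hotel Boss, Singapore": (1.305792, 103.860105)}


def distances_per_casino(hotels, casinos):
    """Map each hotel to one distance per casino, instead of only the nearest."""
    return {
        hotel: {
            "Distance to " + casino: haversine_m(*hotel_pos, *casino_pos)
            for casino, casino_pos in casinos.items()
        }
        for hotel, hotel_pos in hotels.items()
    }


per_casino = distances_per_casino(hotels, casinos)
```

Taking the minimum over each inner dictionary recovers the nearest-casino distance that `CalculateDistance` stores.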

In [None]:
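# One column is created per casino name. The full casino table is passed each
# time, so every column holds the distance to the nearest casino.
for i, casino in singapore_casino_locations.iterrows():
    name = casino['casino_name']
    tag = 'Closest distance to ' + name
    CalculateDistance(singapore_casino_locations, tag)
singapore_hotels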

Name,Price,geometry,Closest distance to Marina Bay Sands,Closest distance to Resorts World Casino Sentosa
"Hotel Boss, Singapore",121,POINT (103.86010 1.30579),2334.426627,2334.426627
YOTELAIR Singapore Changi Airport,167,POINT (103.99053 1.35988),16640.528401,16640.528401
"PARKROYAL COLLECTION Marina Bay, Singapore",315,POINT (103.85682 1.29161),894.799660,894.799660
"The Fullerton Hotel, Singapore\t352",352,POINT (103.85307 1.28620),897.743613,897.743613
"Carlton Hotel, Singapore",190,POINT (103.85259 1.29572),1536.588312,1536.588312
...,...,...,...,...
"Betel Box Backpackers Hostel, Singapore",34,POINT (103.90027 1.31227),5328.427528,5328.427528
"Spacepod, Kallang, Singapore",39,POINT (103.86193 1.31052),2856.819956,2856.819956
"The Bohemian, Chinatown, Singapore",53,POINT (103.84492 1.28382),1792.112758,1792.112758
"Atelier, Chinatown, Singapore",58,POINT (103.84677 1.28069),1644.609297,1644.609297


Finally, the dataframe is exported to CSV. It is read in by the "Singapore hospitality analysis [Ankur Shanker].ipynb" file and by the "Singapore housing data extraction [Ankur Shanker].ipynb" file, which also makes use of this data.

In [None]:
#singapore_hotels.drop('geometry', axis=1, inplace=True)
singapore_hotels.to_csv('Singapore hotels.csv')
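
Since the geometry column is kept in the export, a notebook that reads the CSV back with pandas receives it as plain text such as `POINT (103.86010 1.30579)` rather than as point objects. A minimal sketch of recovering the coordinates, assuming the text follows the `POINT (x y)` pattern seen in the tables above; `parse_wkt_point` is a hypothetical helper written for illustration, not a geopandas function.

```python
import re

# Matches text of the form "POINT (<longitude> <latitude>)"
_POINT_PATTERN = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def parse_wkt_point(text):
    """Return (longitude, latitude) parsed from a 'POINT (x y)' string."""
    match = _POINT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Not a WKT point: {text!r}")
    longitude, latitude = (float(value) for value in match.groups())
    return longitude, latitude


longitude, latitude = parse_wkt_point("POINT (103.86010 1.30579)")
```

The order matters here: the first number is the longitude (the workbook's `y` column) and the second is the latitude (its `x` column).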