# Coursera Capstone Project (Week 2)
### IBM Applied Data Science Capstone Project

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find the optimal venue ideas for potential investors looking to invest in the area around Egypt's new megaproject, the Grand Egyptian Museum (GEM), which is projected to open in 2021.

The Giza Pyramids, one of the wonders of the world, stand in the same area as the project. Needless to say, it is one of the most visited tourist areas in the country. Therefore, we want to find venue ideas that are not yet common around the area, as well as the kinds of venues that have proved successful around comparable projects.

How will we find that out? Using our data science toolkit, we will look at the kinds of venues that have proved successful around the most visited museums in the world. Then, we will identify the museums whose surrounding mix of venues is most similar to the GEM's. Finally, we will generate a list of the venue categories that already exist around these museums, from which potential investors can choose the most promising one.

## Data <a name="data"></a>

Based on the problem described above, we will use a dataset of the art museums around the world that had more than one million visitors in 2019. The table is available on this Wikipedia page:
https://en.wikipedia.org/wiki/List_of_most_visited_art_museums 

Using web scraping techniques, we will parse the table containing the museum data with **BeautifulSoup**, save it to a CSV file, and read that file into a pandas dataframe so we can perform some data wrangling. Then, we will get the location data of these museums using **geopy**. Afterwards, we will explore the venues around these museums using the **Foursquare API**.
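To make the scraping step concrete without extra dependencies, here is a minimal sketch that uses the standard library's `html.parser.HTMLParser` in place of BeautifulSoup to collect the text of each `<td>` cell, row by row, while skipping header rows that contain only `<th>` cells. The class name `TableRowCollector` and the sample HTML are hypothetical; unlike `cell.get_text()` in the notebook, it strips surrounding whitespace, and it assumes every `<td>` has an explicit closing tag.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows that contained at least one <td>
        self._row = None    # cells of the <tr> currently being parsed
        self._cell = None   # text fragments of the <td> currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # only keep text that sits inside a <td> (including nested tags like <a>)
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells produce no entries
                self.rows.append(self._row)
            self._row = None


sample_html = """
<table>
  <tr><th>No.</th><th>Name</th><th>City</th><th>Visitors</th></tr>
  <tr><td>1</td><td>Musée du Louvre</td><td>Paris</td><td>9,600,000 (2019)</td></tr>
  <tr><td>2</td><td>National Museum of China</td><td>Beijing</td><td>7,390,000 (2019)</td></tr>
</table>
"""

parser = TableRowCollector()
parser.feed(sample_html)
print(parser.rows)
```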



**First, we import the necessary modules for our project**

In [1]:
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd
import numpy as np
import geocoder
import folium
from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from sklearn import linear_model
print('Packages imported successfully!')

Packages imported successfully!


**Now we scrape the table we need from the webpage using BeautifulSoup**

In [3]:
#fetch the page HTML using the requests library
source = requests.get('https://en.wikipedia.org/wiki/List_of_most_visited_art_museums').text

#parse the HTML of the web page using the built-in html parser
soup = BeautifulSoup(source, 'html.parser')

#find the first table on the page, which holds the museum list
table = soup.find('table')

#open the csv file for writing; the with block closes it when done
with open('museum_data.csv', 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)
    #write the column headers
    csv_writer.writerow(['no.', 'name', 'city', 'visitors', 'image'])

    #write one csv row per table row, skipping header rows that have no <td> cells
    for row in table.find_all('tr'):
        csv_row = [cell.get_text() for cell in row.find_all('td')]
        if csv_row:
            csv_writer.writerow(csv_row)
print('data loaded to csv file')

data loaded to csv file
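The visitor counts contain commas (`9,600,000 (2019)`), and the comma is also the CSV delimiter. The sketch below, which uses only the standard library's `csv` module with an in-memory buffer (`io.StringIO`) instead of `museum_data.csv`, shows that `csv.writer` quotes such fields automatically, so they survive the round trip intact. The sample rows are illustrative.

```python
import csv
import io

rows = [
    ["no.", "name", "city", "visitors"],
    ["1", "Musée du Louvre", "Paris", "9,600,000 (2019)"],
    ["10", "National Gallery of Art", "Washington, D.C.", "4,074,403 (2019)"],
]

# write the rows to an in-memory "file" instead of museum_data.csv
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# read them back; the quoted fields keep their embedded commas
restored = list(csv.reader(io.StringIO(text)))
print(text)
```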


**Now we read the csv file into a pandas dataframe**

In [4]:
museums_df = pd.read_csv('museum_data.csv')
museums_df.head(10)

| | no. | name | city | visitors | image |
|---|---|---|---|---|---|
| 0 | 1 | Musée du Louvre | Paris | 9,600,000 (2019)\n | \n |
| 1 | 2 | National Museum of China | Beijing | 7,390,000 (2019)\n | \n |
| 2 | 3 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n | \n |
| 3 | 4 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n | \n |
| 4 | 5 | British Museum | London | 6,239,983 (2019)\n | \n |
| 5 | 6 | Tate Modern | London | 6,098,340 (2019)\n | \n |
| 6 | 7 | National Gallery | London | 6,011,007 (2019)\n | \n |
| 7 | 8 | State Hermitage Museum | Saint Petersburg | 4,956,529 (2019)\n | \n |
| 8 | 9 | Museo Reina Sofía | Madrid | 4,425,699 (2019)\n | \n |
| 9 | 10 | National Gallery of Art | Washington, D.C. | 4,074,403 (2019)\n | \n |


**Now we drop the unnecessary features from our dataframe**

In [5]:
museums_df = museums_df.drop(['no.','image'],axis=1)
museums_df.head(10)

| | name | city | visitors |
|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n |
| 4 | British Museum | London | 6,239,983 (2019)\n |
| 5 | Tate Modern | London | 6,098,340 (2019)\n |
| 6 | National Gallery | London | 6,011,007 (2019)\n |
| 7 | State Hermitage Museum | Saint Petersburg | 4,956,529 (2019)\n |
| 8 | Museo Reina Sofía | Madrid | 4,425,699 (2019)\n |
| 9 | National Gallery of Art | Washington, D.C. | 4,074,403 (2019)\n |


**Now we add the GEM data to our dataset so we can run the same geoanalysis on it**

In [6]:
gem = {'name':'Grand Egyptian Museum', 'city':'Cairo', 'visitors':'none'}
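#DataFrame.append was removed in pandas 2.0, so concatenate a one-row dataframe instead
museums_df = pd.concat([museums_df, pd.DataFrame([gem])], ignore_index=True)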
museums_df

| | name | city | visitors |
|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n |
| 4 | British Museum | London | 6,239,983 (2019)\n |
| ... | ... | ... | ... |
| 73 | Whitney Museum | New York City | 1,030,945 (2019) |
| 74 | Musée de l'Orangerie | Paris | 1,029,975 (2019) |
| 75 | Museum of Contemporary Art Australia | Sydney | 1,014,021 (2019) |
| 76 | Albertina | Vienna | 1001294 |


**Now we run the geolocator function on the dataset in order to get the location data** 

We'll notice that the output includes the detailed address of each museum in the local language.

In [7]:
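from geopy.extra.rate_limiter import RateLimiter
#name a user agent, as Nominatim's usage policy requires
geolocator = Nominatim(user_agent="museum_pjt")
#space the requests at least one second apart, as Nominatim's usage policy asks
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
#apply the geocoder to the dataset based on the museum name
museums_df['address'] = museums_df['name'].apply(geocode)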

museums_df.head(10)

| | name | city | visitors | address |
|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n | (Musée du Louvre, Rue Saint-Honoré, Quartier d... |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n | (Musei Vaticani, Stradone dei Giardini, Città ... |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n | (The Metropolitan Museum of Art, 5th Avenue, M... |
| 4 | British Museum | London | 6,239,983 (2019)\n | (British Museum, Great Russell Street, Holborn... |
| 5 | Tate Modern | London | 6,098,340 (2019)\n | (Tate Modern, Bankside, Borough, London Boroug... |
| 6 | National Gallery | London | 6,011,007 (2019)\n | (National Gallery, St Martin's Street, St. Jam... |
| 7 | State Hermitage Museum | Saint Petersburg | 4,956,529 (2019)\n | (Государственный Эрмитаж, набережная Зимней ка... |
| 8 | Museo Reina Sofía | Madrid | 4,425,699 (2019)\n | (Museo Nacional Centro de Arte Reina Sofía, Ca... |
| 9 | National Gallery of Art | Washington, D.C. | 4,074,403 (2019)\n | (National Gallery of Art, Madison Drive Northw... |
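Nominatim's usage policy allows at most one request per second, and geocoding the same name twice is wasted traffic. geopy ships `RateLimiter` (in `geopy.extra.rate_limiter`) for the spacing; the sketch below shows the same idea plus caching in plain Python, so it can be tested without network access. `make_polite_geocoder` and the stub `fake_geocode` are hypothetical names; in the notebook one would wrap `geolocator.geocode` instead. Note that failed lookups (`None`) are cached too.

```python
import time
from functools import lru_cache


def make_polite_geocoder(geocode_fn, min_delay_seconds=1.0):
    """Wrap geocode_fn so repeated queries are cached and real calls are spaced out.

    Hypothetical helper: in practice geopy's RateLimiter handles the spacing part.
    """
    last_call = [float("-inf")]  # monotonic time of the previous real request

    @lru_cache(maxsize=None)
    def polite_geocode(query):
        # sleep just long enough to keep min_delay_seconds between real requests
        wait = min_delay_seconds - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)
        last_call[0] = time.monotonic()
        return geocode_fn(query)

    return polite_geocode


# stub standing in for geolocator.geocode: returns (lat, lon), or None for unknown names
calls = []

def fake_geocode(query):
    calls.append(query)
    return {"Grand Egyptian Museum": (29.9937269, 31.119626270969988)}.get(query)


geocode = make_polite_geocoder(fake_geocode, min_delay_seconds=0.01)
results = [geocode(name) for name in ["Grand Egyptian Museum", "Unknown Place", "Grand Egyptian Museum"]]
print(results)
```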


**Now we need to remove the rows with no location data retrieved**

In [8]:
#geocode returns None when it finds no match; also map any literal 'None' strings to NaN
museums_df.replace("None", np.nan, inplace = True)

#drop rows with missing values in the address column
museums_df.dropna(subset=["address"], axis=0, inplace=True)

#reset index of the rows
museums_df.reset_index(drop=True, inplace=True)


museums_df

| | name | city | visitors | address |
|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n | (Musée du Louvre, Rue Saint-Honoré, Quartier d... |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n | (Musei Vaticani, Stradone dei Giardini, Città ... |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n | (The Metropolitan Museum of Art, 5th Avenue, M... |
| 4 | British Museum | London | 6,239,983 (2019)\n | (British Museum, Great Russell Street, Holborn... |
| ... | ... | ... | ... | ... |
| 66 | Whitney Museum | New York City | 1,030,945 (2019) | (Whitney Museum of American Art, Gansevoort St... |
| 67 | Musée de l'Orangerie | Paris | 1,029,975 (2019) | (Musée de l'Orangerie, Quai des Tuileries, Qua... |
| 68 | Museum of Contemporary Art Australia | Sydney | 1,014,021 (2019) | (Museum of Contemporary Art, George Street, Th... |
| 69 | Albertina | Vienna | 1001294 | (Albertina, Albertinaplatz, Kärntner Viertel, ... |


**Now we get the latitude and longitude by applying the lambda function to the address column**

In [9]:
museums_df['latlong'] = museums_df['address'].apply(lambda x: (x.latitude, x.longitude))
museums_df.head(10)

| | name | city | visitors | address | latlong |
|---|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n | (Musée du Louvre, Rue Saint-Honoré, Quartier d... | (48.8611473, 2.33802768704666) |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... | (39.903745900000004, 116.39538964666366) |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n | (Musei Vaticani, Stradone dei Giardini, Città ... | (41.90496095, 12.454661671968115) |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n | (The Metropolitan Museum of Art, 5th Avenue, M... | (40.779443650000005, -73.96336411385192) |
| 4 | British Museum | London | 6,239,983 (2019)\n | (British Museum, Great Russell Street, Holborn... | (51.51929365, -0.12801772178494725) |
| 5 | Tate Modern | London | 6,098,340 (2019)\n | (Tate Modern, Bankside, Borough, London Boroug... | (51.507456649999995, -0.09934365539515741) |
| 6 | National Gallery | London | 6,011,007 (2019)\n | (National Gallery, St Martin's Street, St. Jam... | (51.5088392, -0.12844740211268446) |
| 7 | State Hermitage Museum | Saint Petersburg | 4,956,529 (2019)\n | (Государственный Эрмитаж, набережная Зимней ка... | (59.9412076, 30.315486517765795) |
| 8 | Museo Reina Sofía | Madrid | 4,425,699 (2019)\n | (Museo Nacional Centro de Arte Reina Sofía, Ca... | (40.4081606, -3.6934515355572533) |
| 9 | National Gallery of Art | Washington, D.C. | 4,074,403 (2019)\n | (National Gallery of Art, Madison Drive Northw... | (38.89129405, -77.01988353942042) |


**Now we separate the latitude and longitude into separate columns**

In [10]:
#convert the latlong tuples to strings
museums_df['latlong'] = museums_df['latlong'].astype(str)
#use the split function to separate the latitude and the longitude
museums_df[['lat','long']] = museums_df['latlong'].str.split(",", expand=True)

museums_df.head(10)

| | name | city | visitors | address | latlong | lat | long |
|---|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n | (Musée du Louvre, Rue Saint-Honoré, Quartier d... | (48.8611473, 2.33802768704666) | (48.8611473 | 2.33802768704666) |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... | (39.903745900000004, 116.39538964666366) | (39.903745900000004 | 116.39538964666366) |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n | (Musei Vaticani, Stradone dei Giardini, Città ... | (41.90496095, 12.454661671968115) | (41.90496095 | 12.454661671968115) |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n | (The Metropolitan Museum of Art, 5th Avenue, M... | (40.779443650000005, -73.96336411385192) | (40.779443650000005 | -73.96336411385192) |
| 4 | British Museum | London | 6,239,983 (2019)\n | (British Museum, Great Russell Street, Holborn... | (51.51929365, -0.12801772178494725) | (51.51929365 | -0.12801772178494725) |
| 5 | Tate Modern | London | 6,098,340 (2019)\n | (Tate Modern, Bankside, Borough, London Boroug... | (51.507456649999995, -0.09934365539515741) | (51.507456649999995 | -0.09934365539515741) |
| 6 | National Gallery | London | 6,011,007 (2019)\n | (National Gallery, St Martin's Street, St. Jam... | (51.5088392, -0.12844740211268446) | (51.5088392 | -0.12844740211268446) |
| 7 | State Hermitage Museum | Saint Petersburg | 4,956,529 (2019)\n | (Государственный Эрмитаж, набережная Зимней ка... | (59.9412076, 30.315486517765795) | (59.9412076 | 30.315486517765795) |
| 8 | Museo Reina Sofía | Madrid | 4,425,699 (2019)\n | (Museo Nacional Centro de Arte Reina Sofía, Ca... | (40.4081606, -3.6934515355572533) | (40.4081606 | -3.6934515355572533) |
| 9 | National Gallery of Art | Washington, D.C. | 4,074,403 (2019)\n | (National Gallery of Art, Madison Drive Northw... | (38.89129405, -77.01988353942042) | (38.89129405 | -77.01988353942042) |


**We then remove the parentheses from the latitude and longitude values**

In [11]:
#remove the parentheses left over from the tuple string; regex=False treats '(' and ')' as literal characters
museums_df['lat'] = museums_df['lat'].str.replace("(", "", regex=False).str.strip()
museums_df['long'] = museums_df['long'].str.replace(")", "", regex=False).str.strip()

museums_df.head(10)

| | name | city | visitors | address | latlong | lat | long |
|---|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n | (Musée du Louvre, Rue Saint-Honoré, Quartier d... | (48.8611473, 2.33802768704666) | 48.8611473 | 2.33802768704666 |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... | (39.903745900000004, 116.39538964666366) | 39.9037459 | 116.39538964666366 |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n | (Musei Vaticani, Stradone dei Giardini, Città ... | (41.90496095, 12.454661671968115) | 41.90496095 | 12.454661671968116 |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n | (The Metropolitan Museum of Art, 5th Avenue, M... | (40.779443650000005, -73.96336411385192) | 40.779443650000005 | -73.96336411385192 |
| 4 | British Museum | London | 6,239,983 (2019)\n | (British Museum, Great Russell Street, Holborn... | (51.51929365, -0.12801772178494725) | 51.51929365 | -0.1280177217849472 |
| 5 | Tate Modern | London | 6,098,340 (2019)\n | (Tate Modern, Bankside, Borough, London Boroug... | (51.507456649999995, -0.09934365539515741) | 51.50745665 | -0.0993436553951574 |
| 6 | National Gallery | London | 6,011,007 (2019)\n | (National Gallery, St Martin's Street, St. Jam... | (51.5088392, -0.12844740211268446) | 51.5088392 | -0.1284474021126844 |
| 7 | State Hermitage Museum | Saint Petersburg | 4,956,529 (2019)\n | (Государственный Эрмитаж, набережная Зимней ка... | (59.9412076, 30.315486517765795) | 59.9412076 | 30.315486517765795 |
| 8 | Museo Reina Sofía | Madrid | 4,425,699 (2019)\n | (Museo Nacional Centro de Arte Reina Sofía, Ca... | (40.4081606, -3.6934515355572533) | 40.4081606 | -3.693451535557253 |
| 9 | National Gallery of Art | Washington, D.C. | 4,074,403 (2019)\n | (National Gallery of Art, Madison Drive Northw... | (38.89129405, -77.01988353942042) | 38.89129405 | -77.01988353942042 |
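The detour through strings (tuple, then `str`, split, strip the parentheses, then `float`) works, but geopy's `Location` objects already expose `.latitude` and `.longitude` as numbers, so in pandas something like `museums_df['address'].apply(lambda loc: loc.latitude)` would give a numeric column directly. The sketch below compares both routes in plain Python; `FakeLocation` is a hypothetical stand-in for geopy's `Location`, and both helper functions are illustrative.

```python
from collections import namedtuple

# hypothetical stand-in for geopy's Location, which also has .latitude and .longitude
FakeLocation = namedtuple("FakeLocation", ["address", "latitude", "longitude"])


def coords_from_attributes(loc):
    """Direct route: read the numeric attributes."""
    return loc.latitude, loc.longitude


def coords_from_string(latlong_text):
    """String route used in the notebook: '(lat, long)' -> two floats."""
    lat_text, long_text = latlong_text.split(",")
    return float(lat_text.strip().lstrip("(")), float(long_text.strip().rstrip(")"))


louvre = FakeLocation("Musée du Louvre, Rue Saint-Honoré, Paris", 48.8611473, 2.33802768704666)
direct = coords_from_attributes(louvre)
via_string = coords_from_string(str((louvre.latitude, louvre.longitude)))
print(direct, via_string)
```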


**Now we drop the latlong column as we don't need it anymore**

In [12]:
#drop the latlong column using the drop function
museums_df = museums_df.drop(['latlong'],axis=1)
museums_df.head(10)

| | name | city | visitors | address | lat | long |
|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9,600,000 (2019)\n | (Musée du Louvre, Rue Saint-Honoré, Quartier d... | 48.8611473 | 2.33802768704666 |
| 1 | National Museum of China | Beijing | 7,390,000 (2019)\n | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... | 39.9037459 | 116.39538964666366 |
| 2 | Vatican Museums | Vatican City (Rome) | 6,882,931 (2019)\n | (Musei Vaticani, Stradone dei Giardini, Città ... | 41.90496095 | 12.454661671968116 |
| 3 | Metropolitan Museum of Art | New York City | 6,479,548 (2019)\n | (The Metropolitan Museum of Art, 5th Avenue, M... | 40.779443650000005 | -73.96336411385192 |
| 4 | British Museum | London | 6,239,983 (2019)\n | (British Museum, Great Russell Street, Holborn... | 51.51929365 | -0.1280177217849472 |
| 5 | Tate Modern | London | 6,098,340 (2019)\n | (Tate Modern, Bankside, Borough, London Boroug... | 51.50745665 | -0.0993436553951574 |
| 6 | National Gallery | London | 6,011,007 (2019)\n | (National Gallery, St Martin's Street, St. Jam... | 51.5088392 | -0.1284474021126844 |
| 7 | State Hermitage Museum | Saint Petersburg | 4,956,529 (2019)\n | (Государственный Эрмитаж, набережная Зимней ка... | 59.9412076 | 30.315486517765795 |
| 8 | Museo Reina Sofía | Madrid | 4,425,699 (2019)\n | (Museo Nacional Centro de Arte Reina Sofía, Ca... | 40.4081606 | -3.693451535557253 |
| 9 | National Gallery of Art | Washington, D.C. | 4,074,403 (2019)\n | (National Gallery of Art, Madison Drive Northw... | 38.89129405 | -77.01988353942042 |


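**To make the numbers easier to read, we remove the year note and the thousands separators from the visitors column**

In [13]:
#drop the '(2019)' year note (a regex) and the thousands separators (a literal comma) from the 'visitors' column
museums_df['visitors'] = (museums_df['visitors']
                          .str.replace(r"\s*\([^()]*\)", "", regex=True)
                          .str.replace(",", "", regex=False)
                          .str.strip())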
museums_df.head(10)

| | name | city | visitors | address | lat | long |
|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9600000 | (Musée du Louvre, Rue Saint-Honoré, Quartier d... | 48.8611473 | 2.33802768704666 |
| 1 | National Museum of China | Beijing | 7390000 | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... | 39.9037459 | 116.39538964666366 |
| 2 | Vatican Museums | Vatican City (Rome) | 6882931 | (Musei Vaticani, Stradone dei Giardini, Città ... | 41.90496095 | 12.454661671968116 |
| 3 | Metropolitan Museum of Art | New York City | 6479548 | (The Metropolitan Museum of Art, 5th Avenue, M... | 40.779443650000005 | -73.96336411385192 |
| 4 | British Museum | London | 6239983 | (British Museum, Great Russell Street, Holborn... | 51.51929365 | -0.1280177217849472 |
| 5 | Tate Modern | London | 6098340 | (Tate Modern, Bankside, Borough, London Boroug... | 51.50745665 | -0.0993436553951574 |
| 6 | National Gallery | London | 6011007 | (National Gallery, St Martin's Street, St. Jam... | 51.5088392 | -0.1284474021126844 |
| 7 | State Hermitage Museum | Saint Petersburg | 4956529 | (Государственный Эрмитаж, набережная Зимней ка... | 59.9412076 | 30.315486517765795 |
| 8 | Museo Reina Sofía | Madrid | 4425699 | (Museo Nacional Centro de Arte Reina Sofía, Ca... | 40.4081606 | -3.693451535557253 |
| 9 | National Gallery of Art | Washington, D.C. | 4074403 | (National Gallery of Art, Madison Drive Northw... | 38.89129405 | -77.01988353942042 |
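The cleaned visitors column still holds strings. If numeric counts are needed, a helper like the hypothetical `parse_visitors` below converts them; in pandas, `pd.to_numeric(museums_df['visitors'], errors='coerce')` would do the final step and turn the GEM's `'none'` placeholder into NaN. The sketch applies the same regular expression as the cell above.

```python
import re

YEAR_NOTE = re.compile(r"\s*\([^()]*\)")  # matches a parenthesised note such as ' (2019)'


def parse_visitors(raw):
    """Turn strings like '9,600,000 (2019)\\n' into 9600000; return None if there is no number."""
    text = YEAR_NOTE.sub("", str(raw))       # drop the year note
    digits = text.replace(",", "").strip()   # drop thousands separators and whitespace
    return int(digits) if digits.isdecimal() else None


for raw in ["9,600,000 (2019)\n", "1001294", "none"]:
    print(repr(raw), "->", parse_visitors(raw))
```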


**Now we need to transform the latitude and longitude to the type float so we can display the locations on the map**

In [14]:
#change the type of the latitude and longitude to float
museums_df[['lat','long']] = museums_df[['lat','long']].astype(float)

**Now we want to display our museums on the world map using folium**

In [15]:
#declare a map variable without specifying the location 
museums_map = folium.Map(zoom_start=2)
#iterate over the latitude and longitude values in the dataset
for lat, lng, label in zip(museums_df['lat'],museums_df['long'], museums_df['name']):
    #create a popup so the museum name appears when its marker is clicked
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='crimson',
        fill=True,
        fill_opacity=0.5,
        parse_html=False).add_to(museums_map)
museums_map
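The map above is created without a `location`, so its initial view does not depend on the data. One option is to compute the bounding box of the markers and pass it to the map's `fit_bounds` method, which takes `[[south, west], [north, east]]`. The helper `bounding_box` below is a hypothetical plain-Python sketch of that computation (it ignores the antimeridian); the sample coordinates are three of the museums from the tables above.

```python
def bounding_box(coords, padding=0.0):
    """Return [[min_lat, min_lon], [max_lat, max_lon]] for (lat, lon) pairs, padded in degrees."""
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return [[min(lats) - padding, min(lons) - padding],
            [max(lats) + padding, max(lons) + padding]]


sample = [
    (48.8611473, 2.33802768704666),     # Musée du Louvre
    (39.9037459, 116.39538964666366),   # National Museum of China
    (38.89129405, -77.01988353942042),  # National Gallery of Art
]
bounds = bounding_box(sample)
print(bounds)
# with folium: museums_map.fit_bounds(bounds)
```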

Looks great! Now that we have prepared our dataset and visualized the locations of the world's top museums on the map, it's time to dig into our exploratory analysis of the venues around these museums using the **Foursquare API**.

But first we have to set the credentials needed to communicate with the API. To avoid publishing them, we read them from environment variables instead of hard-coding and printing them (the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are our own choice).

In [21]:
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


**Now, let's explore the venues that currently exist around the GEM, as a starting point**

In [22]:
#look up the GEM's row by name instead of relying on a hard-coded index
gem_idx = museums_df.index[museums_df['name'] == 'Grand Egyptian Museum'][0]
museums_df.loc[gem_idx, 'name']

'Grand Egyptian Museum'

In [23]:
museum_lat = museums_df.loc[gem_idx, 'lat'] # museum latitude
museum_lon = museums_df.loc[gem_idx, 'long'] # museum longitude

museum_name = museums_df.loc[gem_idx, 'name'] # museum name

print('Lat and long of {} are {}, {}.'.format(museum_name, museum_lat, museum_lon))

Lat and long of Grand Egyptian Museum are 29.9937269, 31.119626270969988.


**Now that we have declared variables containing the location data of the GEM, let's explore the top 100 venues within 5 km of the project**

In [24]:
radius = 5000
LIMIT = 100
url1 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, museum_lat, museum_lon, VERSION, radius, LIMIT)
url1

'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=29.9937269,31.119626270969988&v=20180605&radius=5000&limit=100'
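Building the query string by hand works, but `urllib.parse.urlencode` from the standard library escapes each value for us, and a small redaction step keeps the secret out of anything we print. `build_explore_url` and `redact_secret` are hypothetical helpers; the endpoint and parameter names are the ones used in the cell above, and the ID and secret are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lon, version, radius, limit):
    """Assemble the explore request URL with properly escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lon}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def redact_secret(url, secret):
    """Replace the client secret with a placeholder before displaying or logging the URL."""
    return url.replace(secret, "<CLIENT_SECRET>")


url = build_explore_url("MY_ID", "MY_SECRET", 29.9937269, 31.119626270969988, "20180605", 5000, 100)
print(redact_secret(url, "MY_SECRET"))
print(parse_qs(urlsplit(url).query)["ll"])
```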

**Let's write a function that gets the category of each venue from the JSON response**

In [25]:
results = requests.get(url1).json()

#function that extracts the venue category; a row may use either the 'categories'
#or the flattened 'venue.categories' column name
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

**Now let's take a look at the resulting dataframe of the venues around the GEM**

In [26]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)


# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



| | id | name | categories | lat | lng |
|---|---|---|---|---|---|
| 0 | 4fb0f892e4b085862f70f56c | Pyramid View | Scenic Lookout | 29.98907 | 31.130397 |
| 1 | 4fb791b2e4b0cbce351e2eb2 | Grand Egyptian Museum (المتحف المصري الكبير) | History Museum | 29.991405 | 31.118413 |
| 2 | 50d7b88ae4b0fb6f061011d4 | Garden at Marriott Mena House | Garden | 29.985365 | 31.133882 |
| 3 | 4e719435fa76b23d318333cc | On The Run | Convenience Store | 29.986562 | 31.134501 |
| 4 | 56c1fa69cd100eb96d0b6e9d | Pyramid of Cheops (Khufu) | Historic Site | 29.97897 | 31.134245 |
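`json_normalize` flattens nested JSON objects into columns whose names join the keys with dots, which is why the columns are called `venue.name`, `venue.location.lat`, and so on, and why the cell keeps only the part after the final dot. A minimal plain-Python version of that flattening, applied to a hypothetical item shaped like the explore response, looks like this; like `json_normalize` by default, it leaves lists (such as `categories`) as values.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys; lists are kept as values."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# hypothetical item shaped like results['response']['groups'][0]['items'][0]
item = {
    "venue": {
        "id": "4fb0f892e4b085862f70f56c",
        "name": "Pyramid View",
        "categories": [{"name": "Scenic Lookout"}],
        "location": {"lat": 29.98907, "lng": 31.130397},
    }
}

flat = flatten(item)
# keep only the part after the last dot, as the notebook does when cleaning column names
short = {key.split(".")[-1]: value for key, value in flat.items()}
print(sorted(flat))
```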


**Now we create a new column containing the name and category of each venue so we can use it as a label on our map visualization**

In [27]:
#create a label column containing the name and category of each venue
nearby_venues['label'] = nearby_venues['name'] + ', ' + nearby_venues['categories']

In [28]:
#print how many venues were returned by Foursquare
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

91 venues were returned by Foursquare.


**Let's visualize the distribution of the venues around the GEM**

In [29]:
#assign the coordinates of the city of Giza
giza_latlong = [30.0166666, 31.2166658]
#declare a map variable with coordinates of Giza
gem_map = folium.Map(location=giza_latlong, zoom_start=12)
#iterate over the venues in the dataframe
for lat, lng, label in zip(nearby_venues['lat'],nearby_venues['lng'], nearby_venues['label']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='crimson',
        fill=True,
        fill_opacity=0.5,
        parse_html=False).add_to(gem_map)
#display map
gem_map
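Foursquare was asked for venues within `radius=5000` metres, so every returned venue should lie at most about 5 km from the GEM. A quick way to check that is the haversine great-circle distance. The sketch below is a generic implementation that assumes a mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper name, and the test points are the GEM coordinates and the Pyramid of Cheops row from the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; accurate enough for distances of a few km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


gem = (29.9937269, 31.119626270969988)
cheops = (29.97897, 31.134245)  # Pyramid of Cheops (Khufu), from the venues table
d = haversine_km(*gem, *cheops)
print(f"{d:.2f} km")
```

Applied to every row of `nearby_venues`, the same function would flag any venue farther than 5 km away.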

**Looks great! Now let's see how we can do this for all the other museums in our dataset**

## Methodology <a name="methodology"></a>

In this project we will try to understand the mix of venues around the most visited museums in the world. First, we have to explore each museum and the top venues around it. Then, we can run a cluster analysis with the K-means algorithm to measure how similar these museums are to each other and, more importantly, to find out which cluster the GEM belongs to. Afterwards, we can take a closer look at the venues around the museums in the same cluster as the GEM. Finally, we will compare the categories of venues around the GEM with the categories of venues around these museums, so we can determine which venues seem to succeed under similar circumstances.
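The notebook imports scikit-learn's `KMeans` for the clustering step (`KMeans(n_clusters=k, random_state=0).fit(X).labels_`). To show what the algorithm does, here is a minimal plain-Python version of Lloyd's k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points, on a toy 2-D dataset. `kmeans` and the sample points are hypothetical; scikit-learn's version adds k-means++ initialization and multiple restarts. In the notebook, each museum's row of category frequencies would play the role of a point.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centroids) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k data points as starting centroids
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # update step: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```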

**For now, let's write a function that gets the nearby venues for all our museums**

In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue;
        # a venue without any category gets None instead of raising an IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['museum', 
                  'museum Latitude', 
                  'museum Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
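The list comprehension in `getNearbyVenues` turns each API item into a tuple. Pulled out into a small function, the same extraction can be tested offline. The sketch below (hypothetical name `venue_rows`, hypothetical sample items) also shows one way to guard against venues that come back with an empty `categories` list.

```python
def venue_rows(museum, lat, lng, items):
    """Turn explore-API items into (museum, lat, lng, venue, v_lat, v_lng, category) tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            museum, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,  # no category -> None
        ))
    return rows


sample_items = [
    {"venue": {"name": "Cour Napoléon", "categories": [{"name": "Plaza"}],
               "location": {"lat": 48.861172, "lng": 2.335088}}},
    {"venue": {"name": "Unnamed spot", "categories": [],
               "location": {"lat": 48.86, "lng": 2.34}}},
]
rows = venue_rows("Musée du Louvre", 48.861147, 2.338028, sample_items)
for row in rows:
    print(row)
```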

In [31]:
#run the function on our dataframe
museums_venues = getNearbyVenues(names=museums_df['name'],
                                   latitudes=museums_df['lat'],
                                   longitudes=museums_df['long']
                                  )

Musée du Louvre
National Museum of China
Vatican Museums
Metropolitan Museum of Art
British Museum
Tate Modern
National Gallery
State Hermitage Museum
Museo Reina Sofía
National Gallery of Art
Victoria and Albert Museum
National Palace Museum
Musée d'Orsay
Museo del Prado
National Museum of Korea
Moscow Kremlin Museums
National Gallery of Victoria
Tokyo Metropolitan Art Museum
Somerset House
Tretyakov Gallery
Rijksmuseum
Tokyo National Museum
Centro Cultural Banco do Brasil
State Russian Museum
Galleria degli Uffizi
National Folk Museum of Korea
National Museum of Scotland
Van Gogh Museum
Shanghai Museum
Museum of Modern Art
The National Art Center, Tokyo
Kunsthistorisches Museum
Kelvingrove Art Gallery and Museum
National Gallery Singapore
Tate Britain
Acropolis Museum
Belvedere
Galleria dell'Accademia
Art Institute of Chicago
National Portrait Gallery
National Museum of Western Art
Scottish National Gallery
Centro Cultural Banco do Brasil
Pushkin State Museum of Fine Arts
Getty Cente

In [32]:
print(museums_venues.shape)
museums_venues

(4132, 7)


| | museum | museum Latitude | museum Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | 48.861147 | 2.338028 | Cour Carrée du Louvre | 48.860360 | 2.338543 | Pedestrian Plaza |
| 1 | Musée du Louvre | 48.861147 | 2.338028 | Musée du Louvre | 48.860847 | 2.336440 | Art Museum |
| 2 | Musée du Louvre | 48.861147 | 2.338028 | La Vénus de Milo (Vénus de Milo) | 48.859943 | 2.337234 | Exhibit |
| 3 | Musée du Louvre | 48.861147 | 2.338028 | Place du Palais Royal | 48.862523 | 2.336688 | Plaza |
| 4 | Musée du Louvre | 48.861147 | 2.338028 | Palais Royal | 48.863236 | 2.337127 | Historic Site |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4127 | Albertina | 48.204636 | 16.368261 | Flanagans Irish Pub | 48.203174 | 16.373880 | Irish Pub |
| 4128 | Albertina | 48.204636 | 16.368261 | Café de l'Europe | 48.208225 | 16.371121 | Café |
| 4129 | Albertina | 48.204636 | 16.368261 | The Third Man Sewer Tour (Der dritte Mann Kana... | 48.200688 | 16.367314 | Tour Provider |
| 4130 | Albertina | 48.204636 | 16.368261 | Secession | 48.200504 | 16.365767 | Art Museum |


**Let's exclude the venues whose categories are museums or historic sites, since they are not candidate investment ideas**

In [33]:
museums_venues = museums_venues[~museums_venues['Venue Category'].isin(['Museum','Art Museum','Historic Site'])]
museums_venues

| | museum | museum Latitude | museum Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | 48.861147 | 2.338028 | Cour Carrée du Louvre | 48.860360 | 2.338543 | Pedestrian Plaza |
| 2 | Musée du Louvre | 48.861147 | 2.338028 | La Vénus de Milo (Vénus de Milo) | 48.859943 | 2.337234 | Exhibit |
| 3 | Musée du Louvre | 48.861147 | 2.338028 | Place du Palais Royal | 48.862523 | 2.336688 | Plaza |
| 5 | Musée du Louvre | 48.861147 | 2.338028 | Cour Napoléon | 48.861172 | 2.335088 | Plaza |
| 6 | Musée du Louvre | 48.861147 | 2.338028 | Comédie-Française | 48.863088 | 2.336612 | Theater |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4126 | Albertina | 48.204636 | 16.368261 | Peek & Cloppenburg | 48.205615 | 16.371418 | Clothing Store |
| 4127 | Albertina | 48.204636 | 16.368261 | Flanagans Irish Pub | 48.203174 | 16.373880 | Irish Pub |
| 4128 | Albertina | 48.204636 | 16.368261 | Café de l'Europe | 48.208225 | 16.371121 | Café |
| 4129 | Albertina | 48.204636 | 16.368261 | The Third Man Sewer Tour (Der dritte Mann Kana... | 48.200688 | 16.367314 | Tour Provider |


In [34]:
#group the venues by museum to see how many venues returned for each museum
museums_venues.groupby('museum').count()

| museum | museum Latitude | museum Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Acropolis Museum | 82 | 82 | 82 | 82 | 82 | 82 |
| Albertina | 43 | 43 | 43 | 43 | 43 | 43 |
| Art Gallery of New South Wales | 21 | 21 | 21 | 21 | 21 | 21 |
| Art Institute of Chicago | 48 | 48 | 48 | 48 | 48 | 48 |
| Belvedere | 32 | 32 | 32 | 32 | 32 | 32 |
| ... | ... | ... | ... | ... | ... | ... |
| Tretyakov Gallery | 47 | 47 | 47 | 47 | 47 | 47 |
| Van Gogh Museum | 92 | 92 | 92 | 92 | 92 | 92 |
| Vatican Museums | 37 | 37 | 37 | 37 | 37 | 37 |
| Victoria and Albert Museum | 96 | 96 | 96 | 96 | 96 | 96 |
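`groupby('museum').count()` repeats the same number in every column, as long as no column has missing values; `museums_venues.groupby('museum').size()` returns that count once per museum. The tally itself is just a counter keyed by museum, as this plain-Python sketch with a few hypothetical rows shows.

```python
from collections import Counter

# hypothetical (museum, venue category) pairs standing in for museums_venues rows
pairs = [
    ("Albertina", "Café"),
    ("Albertina", "Irish Pub"),
    ("Albertina", "Clothing Store"),
    ("Musée du Louvre", "Plaza"),
    ("Musée du Louvre", "Theater"),
]

venues_per_museum = Counter(museum for museum, _ in pairs)
print(venues_per_museum.most_common())
```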


**Now let's check how many unique venue categories there are in our dataset**

In [35]:
print('There are {} unique categories.'.format(len(museums_venues['Venue Category'].unique())))

There are 351 unique categories.


**Now, we start the data preprocessing for cluster analysis**

In [36]:
#one hot encoding
museums_onehot = pd.get_dummies(museums_venues[['Venue Category']], prefix="", prefix_sep="")

# add the museum column back to the dataframe
museums_onehot['museum'] = museums_venues['museum']

# move the museum column to the first position
fixed_columns = [museums_onehot.columns[-1]] + list(museums_onehot.columns[:-1])
museums_onehot = museums_onehot[fixed_columns]

museums_onehot.head()

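| | museum | ATM | Accessories Store | African Restaurant | American Restaurant | Amphitheater | Antique Shop | Aquarium | Arcade | Arepa Restaurant | ... | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yoga Studio | Yoshoku Restaurant | Yunnan Restaurant | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Musée du Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Musée du Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Musée du Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | Musée du Louvre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |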


In [37]:
museums_grouped = museums_onehot.groupby('museum').mean().reset_index()
museums_grouped

| | museum | ATM | Accessories Store | African Restaurant | American Restaurant | Amphitheater | Antique Shop | Aquarium | Arcade | Arepa Restaurant | ... | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yoga Studio | Yoshoku Restaurant | Yunnan Restaurant | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acropolis Museum | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.012195 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.012195 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Albertina | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.023256 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Art Gallery of New South Wales | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Art Institute of Chicago | 0.0 | 0.0 | 0.0 | 0.041667 | 0.020833 | 0.000000 | 0.0 | 0.0 | 0.020833 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Belvedere | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65 | Tretyakov Gallery | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 66 | Van Gogh Museum | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.010870 | 0.0 | 0.0 | 0.0 | 0.0 |
| 67 | Vatican Museums | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 68 | Victoria and Albert Museum | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
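Averaging the one-hot columns per museum turns them into the share of each category among that museum's venues, so every row sums to 1. The toy sketch below uses hypothetical data. It checks that `groupby(...).mean()` gives the same table as `pd.crosstab(..., normalize='index')`, an equivalent one-step way to build the frequency table.

```python
import pandas as pd

# hypothetical venue rows for two museums
toy_venues = pd.DataFrame({
    "museum": ["Museum A", "Museum A", "Museum A", "Museum A", "Museum B"],
    "Venue Category": ["Café", "Hotel", "Café", "Park", "Park"],
})

# approach used above: one-hot encode, then average per museum
onehot = pd.get_dummies(toy_venues["Venue Category"]).astype(int)
onehot["museum"] = toy_venues["museum"]
freq_groupby = onehot.groupby("museum").mean()

# alternative: a row-normalized contingency table of museum x category
freq_crosstab = pd.crosstab(toy_venues["museum"], toy_venues["Venue Category"],
                            normalize="index")

print(freq_groupby)
```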


**Next, we write a function that returns the most common venue categories for each museum**

In [38]:
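```python
def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent categories in a row."""
    row_categories = row.iloc[1:]  # skip the 'museum' column
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```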
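When a museum has only a few venues, most of its category frequencies are 0. Sorting leaves those ties in no meaningful order. That is one plausible explanation for runs such as Falafel Restaurant, Farmers Market, Fast Food Restaurant, Field, Fish & Chips Shop in the top-10 lists of the Gyeongju and Cairo rows in the cluster tables further down.

A hedged variant, `top_nonzero_categories`, keeps only categories that actually occur and uses a stable sort so ties keep their column order. It is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Hypothetical variant of return_most_common_venues.

    Returns up to num_top_venues category names, most frequent first,
    skipping categories with frequency 0. The stable sort keeps tied
    categories in their original column order.
    """
    row_categories = row.iloc[1:].astype(float)  # skip the 'museum' column
    nonzero = row_categories[row_categories > 0]
    ranked = nonzero.sort_values(ascending=False, kind="stable")
    return list(ranked.index[:num_top_venues])

# hypothetical frequency row for a museum with very few venues
row = pd.Series({"museum": "Museum X", "Bakery": 0.0, "Café": 0.25,
                 "History Museum": 0.5, "Park": 0.25, "Zoo": 0.0})

print(top_nonzero_categories(row, 10))  # only three categories occur, so three names
```

This variant can return fewer than `num_top_venues` names. Its results would therefore need padding (for example with `None`) before being written into a fixed-width table.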

**Now let's create a new dataframe with the top 10 most common venue categories for each museum**

In [39]:
```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names: '1st', '2nd', '3rd', then 'th' for the rest
columns = ['museum']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th most Common Venue'.format(ind+1))

# create a new dataframe
museums_venues_sorted = pd.DataFrame(columns=columns)
museums_venues_sorted['museum'] = museums_grouped['museum']

# fill each row with that museum's top venue categories
for ind in np.arange(museums_grouped.shape[0]):
    museums_venues_sorted.iloc[ind, 1:] = return_most_common_venues(museums_grouped.iloc[ind, :], num_top_venues)

museums_venues_sorted.head(50)
```

| | museum | 1st most Common Venue | 2nd most Common Venue | 3rd most Common Venue | 4th most Common Venue | 5th most Common Venue | 6th most Common Venue | 7th most Common Venue | 8th most Common Venue | 9th most Common Venue | 10th most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acropolis Museum | Café | Hotel | Greek Restaurant | Coffee Shop | Ice Cream Shop | Vegetarian / Vegan Restaurant | Bistro | Grilled Meat Restaurant | Pedestrian Plaza | Restaurant |
| 1 | Albertina | Plaza | Café | Restaurant | Hotel | Cocktail Bar | Pedestrian Plaza | Supermarket | Palace | Brewery | Park |
| 2 | Art Gallery of New South Wales | Italian Restaurant | Sandwich Place | Café | Hotel | Pub | Australian Restaurant | Park | Tunnel | Pie Shop | Coffee Shop |
| 3 | Art Institute of Chicago | Park | Pub | Plaza | Theater | Middle Eastern Restaurant | Concert Hall | Hotel | Garden | American Restaurant | Music Venue |
| 4 | Belvedere | Hotel | Pizza Place | Café | Supermarket | Garden | Bakery | Palace | Exhibit | Train Station | Botanical Garden |
| 5 | British Museum | Coffee Shop | Café | Hotel | Pub | Bookstore | Italian Restaurant | Plaza | Bar | Cocktail Bar | Exhibit |
| 6 | Centro Cultural Banco do Brasil | Bookstore | Coffee Shop | Café | Brazilian Restaurant | Church | Bar | Tram Station | Salad Place | Snack Place | History Museum |
| 7 | De Young Museum | Garden | Park | Gift Shop | Science Museum | Exhibit | Lake | Outdoor Sculpture | Sculpture Garden | Botanical Garden | Bus Stop |
| 8 | Galleria degli Uffizi | Hotel | Italian Restaurant | Plaza | Ice Cream Shop | Café | Boutique | Art Gallery | Cocktail Bar | Sandwich Place | Bridge |
| 9 | Galleria dell'Accademia | Italian Restaurant | Café | Hotel | Ice Cream Shop | Pizza Place | Plaza | Sandwich Place | Bakery | Vegetarian / Vegan Restaurant | Scenic Lookout |
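The column-naming loop relies on `indicators` having three entries. That works for ten columns but would produce names like "21th" or "22th" if `num_top_venues` grew past 20.

A small `ordinal` helper (hypothetical, not from the original notebook) applies the English suffix rules, including 11th to 13th. The sketch below uses it to build the same column names:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, including 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["museum"] + [f"{ordinal(i)} most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns[:4])
```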


**Great! Now let's start our cluster analysis**

In [40]:
```python
# set the number of clusters
kclusters = 5

# keep only the category-frequency columns as features
museums_grouped_clustering = museums_grouped.drop(columns='museum')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(museums_grouped_clustering)

# check the cluster labels generated for the first ten rows
kmeans.labels_[0:10]
```

```
array([2, 2, 2, 0, 2, 3, 3, 0, 2, 2], dtype=int32)
```
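The number of clusters is fixed at 5 here without further justification. One common way to sanity-check such a choice is scikit-learn's silhouette score, computed for several candidate values of k.

The sketch below does this on synthetic, clearly separated data rather than on the museum table. On that data the best k should match the true number of groups; on the real frequency table the scores may be far less decisive.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data: three tight, well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# silhouette score for each candidate number of clusters (higher is better)
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```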

In [41]:
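```python
# add clustering labels
# (when re-running this cell, uncomment the next line first to drop the old labels)
# museums_venues_sorted.drop('Cluster Labels', axis=1, inplace=True)
museums_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# rename the 'name' column so it matches the 'museum' key used above
museums_df = museums_df.rename(columns={'name': 'museum'})

# join the cluster labels and top venues onto the museum table
museums_merged = museums_df.join(museums_venues_sorted.set_index('museum'), on='museum')

# drop museums without a cluster label (e.g. museums for which no venues were returned)
museums_merged = museums_merged.dropna(subset=['Cluster Labels'])

museums_merged['Cluster Labels'] = museums_merged['Cluster Labels'].astype(int)

museums_merged.head(20)
```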

| | museum | city | visitors | address | lat | long | Cluster Labels | 1st most Common Venue | 2nd most Common Venue | 3rd most Common Venue | 4th most Common Venue | 5th most Common Venue | 6th most Common Venue | 7th most Common Venue | 8th most Common Venue | 9th most Common Venue | 10th most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Musée du Louvre | Paris | 9600000 | (Musée du Louvre, Rue Saint-Honoré, Quartier d... | 48.861147 | 2.338028 | 3 | French Restaurant | Plaza | Hotel | Garden | Italian Restaurant | Cosmetics Shop | Boutique | Furniture / Home Store | Bakery | Perfume Shop |
| 1 | National Museum of China | Beijing | 7390000 | (16, 东长安街, 北京市, 东城区, 北京市, 100006, China 中国, (3... | 39.903746 | 116.39539 | 0 | Chinese Restaurant | Yunnan Restaurant | Metro Station | French Restaurant | Castle | History Museum | Scenic Lookout | Park | Plaza | Monument / Landmark |
| 2 | Vatican Museums | Vatican City (Rome) | 6882931 | (Musei Vaticani, Stradone dei Giardini, Città ... | 41.904961 | 12.454662 | 2 | Sandwich Place | Art Gallery | Italian Restaurant | Ice Cream Shop | Church | Pizza Place | Hobby Shop | Pastry Shop | Monument / Landmark | Platform |
| 3 | Metropolitan Museum of Art | New York City | 6479548 | (The Metropolitan Museum of Art, 5th Avenue, M... | 40.779444 | -73.963364 | 3 | Exhibit | Park | Food Truck | Outdoor Sculpture | Music Venue | Monument / Landmark | Italian Restaurant | Chocolate Shop | Baseball Field | Coffee Shop |
| 4 | British Museum | London | 6239983 | (British Museum, Great Russell Street, Holborn... | 51.519294 | -0.128018 | 3 | Coffee Shop | Café | Hotel | Pub | Bookstore | Italian Restaurant | Plaza | Bar | Cocktail Bar | Exhibit |
| 5 | Tate Modern | London | 6098340 | (Tate Modern, Bankside, Borough, London Boroug... | 51.507457 | -0.099344 | 3 | Italian Restaurant | Hotel | Pub | Gym / Fitness Center | Coffee Shop | Music Venue | Performing Arts Venue | Wine Bar | Grocery Store | Tapas Restaurant |
| 6 | National Gallery | London | 6011007 | (National Gallery, St Martin's Street, St. Jam... | 51.508839 | -0.128447 | 0 | Plaza | Ice Cream Shop | Hotel | Pub | Theater | Burger Joint | Fountain | Steakhouse | Spanish Restaurant | Cocktail Bar |
| 7 | State Hermitage Museum | Saint Petersburg | 4956529 | (Государственный Эрмитаж, набережная Зимней ка... | 59.941208 | 30.315487 | 0 | Furniture / Home Store | Garden | Exhibit | History Museum | Hotel | Outdoor Sculpture | Palace | Flower Shop | Canal | Theater |
| 8 | Museo Reina Sofía | Madrid | 4425699 | (Museo Nacional Centro de Arte Reina Sofía, Ca... | 40.408161 | -3.693452 | 2 | Spanish Restaurant | Hotel | Café | Garden | Restaurant | Sandwich Place | Plaza | Train Station | Bookstore | Electronics Store |
| 9 | National Gallery of Art | Washington, D.C. | 4074403 | (National Gallery of Art, Madison Drive Northw... | 38.891294 | -77.019884 | 3 | Exhibit | Science Museum | History Museum | Deli / Bodega | Plaza | Indian Restaurant | Gift Shop | Theme Park Ride / Attraction | Theater | Food Truck |
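One pandas pitfall matters at this join step. Assigning a DataFrame to a new name (for example `museums_merged = museums_df`) does not copy it, so an in-place operation through either name changes the one shared object. The toy sketch below, with hypothetical data, contrasts a plain alias with `.copy()`.

```python
import pandas as pd

original = pd.DataFrame({"name": ["Museum A", "Museum B"], "city": ["Paris", "Cairo"]})

alias = original                 # no copy: both names point to the same DataFrame
alias.rename({"name": "museum"}, axis=1, inplace=True)

independent = original.copy()    # a separate DataFrame with its own data
independent.rename({"city": "town"}, axis=1, inplace=True)

print(list(original.columns))    # shows the rename made through the alias
```

The in-place rename made through `alias` is visible in `original`, while the one made on `independent` is not.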


**Now let's visualize our clusters on the map**

In [42]:
```python
# center on the whole world, since the museums span several continents
map_clusters = folium.Map(location=[20, 0], zoom_start=2)

# set color scheme for the clusters: one distinct color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; labels run from 0 to kclusters - 1, so they index rainbow directly
for lat, lon, poi, cluster in zip(museums_merged['lat'], museums_merged['long'], museums_merged['museum'], museums_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=False)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```
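Cluster labels from `KMeans` run from 0 to `kclusters - 1`, so they can index a list of `kclusters` colors directly, with no offset. The minimal sketch below builds the palette with the same Matplotlib calls used in the map cell (`cm.rainbow` and `colors.rgb2hex`). It checks that each label gets its own hex color.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5

# sample kclusters evenly spaced RGBA colors from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# cluster label k maps straight to rainbow[k]
label_to_color = {label: rainbow[label] for label in range(kclusters)}
print(label_to_color)
```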

**Notice that the cluster containing the GEM (cluster 1) includes only one other museum, the Gyeongju National Museum, which suggests that the mix of venues around the two museums is very similar.**

**Now let's look at the data for each cluster to better understand what the museums in each cluster have in common**

In [43]:
```python
museums_merged.loc[museums_merged['Cluster Labels'] == 0, museums_merged.columns[[1] + list(range(5, museums_merged.shape[1]))]]
```

| | city | long | Cluster Labels | 1st most Common Venue | 2nd most Common Venue | 3rd most Common Venue | 4th most Common Venue | 5th most Common Venue | 6th most Common Venue | 7th most Common Venue | 8th most Common Venue | 9th most Common Venue | 10th most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Beijing | 116.39539 | 0 | Chinese Restaurant | Yunnan Restaurant | Metro Station | French Restaurant | Castle | History Museum | Scenic Lookout | Park | Plaza | Monument / Landmark |
| 6 | London | -0.128447 | 0 | Plaza | Ice Cream Shop | Hotel | Pub | Theater | Burger Joint | Fountain | Steakhouse | Spanish Restaurant | Cocktail Bar |
| 7 | Saint Petersburg | 30.315487 | 0 | Furniture / Home Store | Garden | Exhibit | History Museum | Hotel | Outdoor Sculpture | Palace | Flower Shop | Canal | Theater |
| 11 | Taipei | 121.548878 | 0 | History Museum | Convenience Store | Hostel | Dim Sum Restaurant | Chinese Restaurant | Park | Bus Station | Taiwanese Restaurant | Theater | Gift Shop |
| 15 | Moscow | 37.61573 | 0 | Plaza | History Museum | Palace | Art Gallery | Park | Government Building | Event Space | Dumpling Restaurant | Ice Cream Shop | Garden |
| 18 | London | -0.118109 | 0 | Theater | Pub | Park | Restaurant | Hotel | Event Space | Clothing Store | Shopping Plaza | Skate Park | Electronics Store |
| 38 | Chicago | -87.623083 | 0 | Park | Pub | Plaza | Theater | Middle Eastern Restaurant | Concert Hall | Hotel | Garden | American Restaurant | Music Venue |
| 39 | London | -0.128228 | 0 | Ice Cream Shop | Plaza | Hotel | Theater | Pub | Bakery | Steakhouse | Liquor Store | Burger Joint | Electronics Store |
| 64 | San Francisco | -122.469128 | 0 | Garden | Park | Gift Shop | Science Museum | Exhibit | Lake | Outdoor Sculpture | Sculpture Garden | Botanical Garden | Bus Stop |


In [44]:
```python
museums_merged.loc[museums_merged['Cluster Labels'] == 1, museums_merged.columns[[1] + list(range(5, museums_merged.shape[1]))]]
```

| | city | long | Cluster Labels | 1st most Common Venue | 2nd most Common Venue | 3rd most Common Venue | 4th most Common Venue | 5th most Common Venue | 6th most Common Venue | 7th most Common Venue | 8th most Common Venue | 9th most Common Venue | 10th most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | Gyeongju | 129.227933 | 1 | History Museum | Intersection | Zoo Exhibit | Flower Shop | Exhibit | Falafel Restaurant | Farmers Market | Fast Food Restaurant | Field | Fish & Chips Shop |
| 70 | Cairo | 31.119626 | 1 | History Museum | Garden | Exhibit | Falafel Restaurant | Farmers Market | Fast Food Restaurant | Field | Fish & Chips Shop | Fish Market | Flea Market |


In [45]:
```python
museums_merged.loc[museums_merged['Cluster Labels'] == 2, museums_merged.columns[[1] + list(range(5, museums_merged.shape[1]))]]
```

| | city | long | Cluster Labels | 1st most Common Venue | 2nd most Common Venue | 3rd most Common Venue | 4th most Common Venue | 5th most Common Venue | 6th most Common Venue | 7th most Common Venue | 8th most Common Venue | 9th most Common Venue | 10th most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Vatican City (Rome) | 12.454662 | 2 | Sandwich Place | Art Gallery | Italian Restaurant | Ice Cream Shop | Church | Pizza Place | Hobby Shop | Pastry Shop | Monument / Landmark | Platform |
| 8 | Madrid | -3.693452 | 2 | Spanish Restaurant | Hotel | Café | Garden | Restaurant | Sandwich Place | Plaza | Train Station | Bookstore | Electronics Store |
| 10 | London | -0.171572 | 2 | Café | Exhibit | Science Museum | Garden | Italian Restaurant | Hotel | Coffee Shop | Ice Cream Shop | French Restaurant | Sandwich Place |
| 13 | Madrid | -3.692005 | 2 | Spanish Restaurant | Hotel | Restaurant | Café | Plaza | Tapas Restaurant | Park | Garden | Bar | Italian Restaurant |
| 14 | Seoul | 126.980209 | 2 | Sandwich Place | Lake | Theater | Gym / Fitness Center | Bus Stop | Park | Gym | Café | Zoo Exhibit | Flower Shop |
| 16 | Melbourne | 144.968909 | 2 | Bar | Hotel | Italian Restaurant | Art Gallery | Theater | Performing Arts Venue | Café | Park | Grocery Store | Australian Restaurant |
| 19 | Moscow | 37.62022 | 2 | Café | Coffee Shop | Plaza | Fountain | Flower Shop | Park | Road | Restaurant | Dance Studio | Spa |
| 24 | Florence | 11.255801 | 2 | Hotel | Italian Restaurant | Plaza | Ice Cream Shop | Café | Boutique | Art Gallery | Cocktail Bar | Sandwich Place | Bridge |
| 26 | Edinburgh | -3.189336 | 2 | Hotel | Restaurant | Café | Pub | Scottish Restaurant | Art Gallery | Gastropub | Scenic Lookout | Bar | Indian Restaurant |
| 29 | New York City | 14.498766 | 2 | Café | Eastern European Restaurant | Restaurant | Hotel | Bistro | Hostel | Park | Thai Restaurant | Bar | Art Gallery |
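Two rows in these tables look like geocoding errors rather than genuine venue profiles. The New York City row (index 29) in cluster 2 has a longitude of 14.498766, and the Los Angeles row (index 44) in cluster 4 has 121.104524. New York and Los Angeles, however, lie near longitudes -74 and -118. Because the venues were fetched around the geocoded coordinates, these rows probably describe some other place.

The hedged sketch below computes the great-circle (haversine) distance from a geocoded point to a reference coordinate for its city and flags anything farther than a threshold. The reference city coordinates and the 50 km threshold are assumptions for illustration. The latitude used for the suspicious point is hypothetical, since only its longitude appears in the table.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# assumed reference coordinates (approximate city centers)
CITY_CENTERS = {"New York City": (40.7128, -74.0060), "Los Angeles": (34.0522, -118.2437)}

def is_suspicious(city, lat, lon, max_km=50.0):
    """Flag a geocoded point lying more than max_km from its city's reference center."""
    ref_lat, ref_lon = CITY_CENTERS[city]
    return haversine_km(lat, lon, ref_lat, ref_lon) > max_km

# the Met's geocoded coordinates from the merged table: plausible
print(is_suspicious("New York City", 40.779444, -73.963364))
# longitude from row 29 with a hypothetical latitude: flagged
print(is_suspicious("New York City", 40.779444, 14.498766))
```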


In [46]:
```python
museums_merged.loc[museums_merged['Cluster Labels'] == 3, museums_merged.columns[[1] + list(range(5, museums_merged.shape[1]))]]
```

| | city | long | Cluster Labels | 1st most Common Venue | 2nd most Common Venue | 3rd most Common Venue | 4th most Common Venue | 5th most Common Venue | 6th most Common Venue | 7th most Common Venue | 8th most Common Venue | 9th most Common Venue | 10th most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Paris | 2.338028 | 3 | French Restaurant | Plaza | Hotel | Garden | Italian Restaurant | Cosmetics Shop | Boutique | Furniture / Home Store | Bakery | Perfume Shop |
| 3 | New York City | -73.963364 | 3 | Exhibit | Park | Food Truck | Outdoor Sculpture | Music Venue | Monument / Landmark | Italian Restaurant | Chocolate Shop | Baseball Field | Coffee Shop |
| 4 | London | -0.128018 | 3 | Coffee Shop | Café | Hotel | Pub | Bookstore | Italian Restaurant | Plaza | Bar | Cocktail Bar | Exhibit |
| 5 | London | -0.099344 | 3 | Italian Restaurant | Hotel | Pub | Gym / Fitness Center | Coffee Shop | Music Venue | Performing Arts Venue | Wine Bar | Grocery Store | Tapas Restaurant |
| 9 | Washington, D.C. | -77.019884 | 3 | Exhibit | Science Museum | History Museum | Deli / Bodega | Plaza | Indian Restaurant | Gift Shop | Theme Park Ride / Attraction | Theater | Food Truck |
| 12 | Paris | 2.326583 | 3 | French Restaurant | Garden | Food Truck | Boat or Ferry | Asian Restaurant | Fountain | Brasserie | Pedestrian Plaza | Bookstore | Bistro |
| 17 | Tokyo | 139.77314 | 3 | Zoo Exhibit | History Museum | Café | Plaza | Science Museum | Convenience Store | Coffee Shop | Concert Hall | Restaurant | Buddhist Temple |
| 20 | Amsterdam | 4.885058 | 3 | Hotel | Café | Men's Store | Restaurant | Coffee Shop | Jewelry Store | Smoke Shop | Bar | French Restaurant | Asian Restaurant |
| 21 | Tokyo | 139.776019 | 3 | History Museum | Zoo Exhibit | Sake Bar | Science Museum | Café | Coffee Shop | Concert Hall | Outdoor Sculpture | Art Gallery | BBQ Joint |
| 22 | Rio de Janeiro | -43.176319 | 3 | Bookstore | Coffee Shop | Café | Brazilian Restaurant | Church | Bar | Tram Station | Salad Place | Snack Place | History Museum |


In [47]:
```python
museums_merged.loc[museums_merged['Cluster Labels'] == 4, museums_merged.columns[[1] + list(range(5, museums_merged.shape[1]))]]
```

| | city | long | Cluster Labels | 1st most Common Venue | 2nd most Common Venue | 3rd most Common Venue | 4th most Common Venue | 5th most Common Venue | 6th most Common Venue | 7th most Common Venue | 8th most Common Venue | 9th most Common Venue | 10th most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 44 | Los Angeles | 121.104524 | 4 | Park | Bubble Tea Shop | Convenience Store | Intersection | Beach | Food | Farmers Market | Fast Food Restaurant | Field | Fish & Chips Shop |
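The five selection cells above differ only in the cluster label. A more compact pattern loops over `groupby('Cluster Labels')` and also reports how many museums fall in each cluster, which makes single-member clusters easy to spot. The sketch below shows it on a small hypothetical table with the same kind of columns as `museums_merged`.

```python
import pandas as pd

# hypothetical merged table (names and labels invented for illustration)
merged = pd.DataFrame({
    "museum": ["Museum A", "Museum B", "Museum C", "Museum D"],
    "city": ["Paris", "London", "Cairo", "Gyeongju"],
    "Cluster Labels": [3, 3, 1, 1],
    "1st most Common Venue": ["French Restaurant", "Pub", "History Museum", "History Museum"],
})

# number of museums per cluster, ordered by label
sizes = merged["Cluster Labels"].value_counts().sort_index()

# one small table per cluster, instead of one cell per label
for label, group in merged.groupby("Cluster Labels"):
    print(f"Cluster {label} ({len(group)} museums)")
    print(group[["museum", "city", "1st most Common Venue"]].to_string(index=False))
```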


**Okay! Now let's take a closer look at the venues around the Gyeongju National Museum**

In [48]:
```python
museums_df.loc[51, 'museum']
```

```
'Gyeongju National Museum'
```

In [49]:
```python
museum_lat = museums_df.loc[51, 'lat']      # museum latitude
museum_lon = museums_df.loc[51, 'long']     # museum longitude
museum_name = museums_df.loc[51, 'museum']  # museum name

print('Lat and long of {} are {}, {}.'.format(museum_name, museum_lat, museum_lon))
```

```
Lat and long of Gyeongju National Museum are 35.82980445, 129.22793290584045.
```


In [50]:
```python
radius = 5000  # search radius in meters
LIMIT = 100    # maximum number of venues to request
url2 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, museum_lat, museum_lon, VERSION, radius, LIMIT)
url2  # displays the full URL, including the credentials (redacted in the output below)
```

```
'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=35.82980445,129.22793290584045&v=20180605&radius=5000&limit=100'
```
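Building the request URL with `str.format` works, but `urllib.parse.urlencode` escapes the values and keeps the parameter list readable. The sketch below uses a hypothetical `explore_url` helper to produce the same kind of query string. Its parameter names are copied from the URL above, and the credential values are placeholders. `urlencode` percent-encodes the comma in `ll` as `%2C`, which is the standard encoding of the same value.

Passing the dictionary to `requests.get(endpoint, params=...)` would build an equivalent URL without displaying the secrets in the notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def explore_url(client_id, client_secret, lat, lon, version, radius=5000, limit=100):
    """Hypothetical helper: build a venues/explore URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lon}",  # latitude,longitude of the search center
        "v": version,          # API version date, as in the notebook ('20180605')
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues to request
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

url = explore_url("MY_ID", "MY_SECRET", 35.82980445, 129.22793290584045, "20180605")
print(url)
```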

In [51]:
```python
results = requests.get(url2).json()

# return the name of a venue's first listed category (or None if it has none)
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
```

In [52]:
```python
venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues)  # flatten JSON

# filter columns
filtered_columns = ['venue.id', 'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace each category list with the name of its first category
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns: 'venue.location.lat' -> 'lat', etc.
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues
```



| | id | name | categories | lat | lng |
|---|---|---|---|---|---|
| 0 | 4bd0261da8b3a5936155635f | Donggung Palace and Wolji Pond in Gyeongju (경주... | Historic Site | 35.833413 | 129.226794 |
| 1 | 4e2facd4b0fbdc2b6503f146 | 국립경주박물관 미술관 | Art Museum | 35.829041 | 129.228176 |
| 2 | 4b9c7ccef964a520f96b36e3 | Cheomseongdae (첨성대) | Historic Site | 35.834715 | 129.218987 |
| 3 | 4bbecc0fb083a5931c92a2e9 | Gyeongju National Museum (국립경주박물관) | History Museum | 35.829856 | 129.228263 |
| 4 | 4cbf9165b6c4224b4d13f494 | 계림 | Other Great Outdoors | 35.832598 | 129.219219 |
| ... | ... | ... | ... | ... | ... |
| 69 | 4deaf7fb22713dd973a32982 | 금오산(경주남산, 468m) | Mountain | 35.788797 | 129.222806 |
| 70 | 4bc003bf461576b08caf7932 | 보문호 | Lake | 35.844667 | 129.277453 |
| 71 | 5711aa99498e83b7bf22b70a | 남정부일기사식당 | Korean Restaurant | 35.790892 | 129.205672 |
| 72 | 4ff029b5e4b0730ed84cdcdd | 보문수상공연장 | Dance Studio | 35.850547 | 129.274322 |
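The Gyeongju and GEM cells repeat the same flatten, filter and clean steps. Below, those steps are gathered into one hypothetical function, `venues_from_explore`. It runs on a tiny hand-made response that mimics only the fields this notebook reads: `response` → `groups` → `items` → `venue` with `id`, `name`, `categories` and `location`. The venue values are invented for the test, and real Foursquare responses contain many more fields.

```python
import pandas as pd

def venues_from_explore(results):
    """Flatten an explore response into id, name, categories, lat, lng columns."""
    items = results["response"]["groups"][0]["items"]
    df = pd.json_normalize(items)  # nested dicts become dotted column names
    df = df.loc[:, ["venue.id", "venue.name", "venue.categories",
                    "venue.location.lat", "venue.location.lng"]].copy()
    # keep only the first category name (None when the list is empty)
    df["venue.categories"] = df["venue.categories"].apply(
        lambda cats: cats[0]["name"] if len(cats) > 0 else None)
    df.columns = [col.split(".")[-1] for col in df.columns]
    return df

# hand-made response with the same nesting the notebook reads (values invented)
toy_results = {"response": {"groups": [{"items": [
    {"venue": {"id": "v1", "name": "Venue One", "categories": [{"name": "Café"}],
               "location": {"lat": 30.0, "lng": 31.1}}},
    {"venue": {"id": "v2", "name": "Venue Two", "categories": [],
               "location": {"lat": 30.1, "lng": 31.2}}},
]}]}}

print(venues_from_explore(toy_results))
```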


In [53]:
```python
# count the unique venue categories
print('There are {} unique categories.'.format(len(nearby_venues['categories'].unique())))
```

```
There are 30 unique categories.
```
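`len(series.unique())` counts a missing value as one more "category", while `series.nunique()` ignores missing values by default. The difference can matter here because `get_category_type` returns `None` for venues without a category. A toy comparison with made-up values:

```python
import pandas as pd

# made-up category column with one venue that has no category
categories = pd.Series(["Café", "Park", None, "Café"])

print(len(categories.unique()))          # counts None as a value
print(categories.nunique())              # ignores missing values
print(categories.nunique(dropna=False))  # same count as len(unique())
```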


In [54]:
```python
gyeongju_venues = nearby_venues['categories'].unique()
gyeongju_venues
```

```
array(['Historic Site', 'Art Museum', 'History Museum',
       'Other Great Outdoors', 'Korean Restaurant', 'Bakery', 'Café',
       'Coffee Shop', 'Market', 'Pizza Place', 'Garden', 'Train Station',
       'Park', 'Bed & Breakfast', 'Botanical Garden', 'Bunsik Restaurant',
       'Hotel', 'Fast Food Restaurant', 'Rest Area', 'Ice Cream Shop',
       'Noodle House', 'Supermarket', 'Zoo', 'Museum', 'BBQ Joint',
       'Lake', 'Italian Restaurant', 'Steakhouse', 'Mountain',
       'Dance Studio'], dtype=object)
```

**Now let's take a closer look at the venues around the GEM**

In [56]:
```python
museums_df.loc[70, 'museum']
```

```
'Grand Egyptian Museum'
```

In [57]:
```python
museum_lat = museums_df.loc[70, 'lat']      # museum latitude
museum_lon = museums_df.loc[70, 'long']     # museum longitude
museum_name = museums_df.loc[70, 'museum']  # museum name

print('Lat and long of {} are {}, {}.'.format(museum_name, museum_lat, museum_lon))
```

```
Lat and long of Grand Egyptian Museum are 29.9937269, 31.119626270969988.
```


In [58]:
```python
radius = 5000  # search radius in meters
LIMIT = 100    # maximum number of venues to request
url3 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, museum_lat, museum_lon, VERSION, radius, LIMIT)
url3  # displays the full URL, including the credentials (redacted in the output below)
```

```
'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=29.9937269,31.119626270969988&v=20180605&radius=5000&limit=100'
```

In [59]:
```python
results = requests.get(url3).json()

venues = results['response']['groups'][0]['items']

gem_venues = pd.json_normalize(venues)  # flatten JSON

# filter columns
filtered_columns = ['venue.id', 'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
gem_venues = gem_venues.loc[:, filtered_columns]

# replace each category list with the name of its first category
gem_venues['venue.categories'] = gem_venues.apply(get_category_type, axis=1)

# clean columns: 'venue.location.lat' -> 'lat', etc.
gem_venues.columns = [col.split(".")[-1] for col in gem_venues.columns]

gem_venues
```



| | id | name | categories | lat | lng |
|---|---|---|---|---|---|
| 0 | 4fb0f892e4b085862f70f56c | Pyramid View | Scenic Lookout | 29.989070 | 31.130397 |
| 1 | 4fb791b2e4b0cbce351e2eb2 | Grand Egyptian Museum (المتحف المصري الكبير) | History Museum | 29.991405 | 31.118413 |
| 2 | 50d7b88ae4b0fb6f061011d4 | Garden at Marriott Mena House | Garden | 29.985365 | 31.133882 |
| 3 | 4e719435fa76b23d318333cc | On The Run | Convenience Store | 29.986562 | 31.134501 |
| 4 | 56c1fa69cd100eb96d0b6e9d | Pyramid of Cheops (Khufu) | Historic Site | 29.978970 | 31.134245 |
| ... | ... | ... | ... | ... | ... |
| 86 | 594ec0762be42528bcafc3df | Sun City Cafe | Café | 29.956413 | 31.103987 |
| 87 | 51c4e2e7498ed84020411a77 | mal3b el sha3er | Soccer Field | 29.962746 | 31.150207 |
| 88 | 4f217d6ce4b0b238b89c977f | Souq Kirdasah (سوق كرداسة) | Flea Market | 30.035706 | 31.118066 |
| 89 | 5d15c2dd6d54f80023b65063 | الاهرامات | Memorial Site | 29.959431 | 31.151353 |


In [60]:
```python
print('There are {} unique categories.'.format(len(gem_venues['categories'].unique())))
```

```
There are 38 unique categories.
```


In [63]:
```python
# show unique venue categories
GEM_venues = gem_venues['categories'].unique()
GEM_venues
```

```
array(['Scenic Lookout', 'History Museum', 'Garden', 'Convenience Store',
       'Historic Site', 'Resort', 'Pastry Shop', 'Gym / Fitness Center',
       'Restaurant', 'Souvenir Shop', 'Café', 'Kebab Restaurant',
       'Falafel Restaurant', 'Burger Joint', 'Coffee Shop', 'Lounge',
       'Seafood Restaurant', 'Amphitheater', 'Hotel', 'Film Studio',
       'Art Museum', 'Bed & Breakfast', 'Supermarket', 'Sports Club',
       'Mobile Phone Shop', 'Pizza Place', 'Pharmacy', 'Gift Shop',
       'Fried Chicken Joint', 'Arts & Crafts Store',
       'Fast Food Restaurant', 'Shopping Mall', 'Moving Target',
       'Stables', 'Bakery', 'Soccer Field', 'Flea Market',
       'Memorial Site'], dtype=object)
```

**Now let's find the venue categories that exist around the Gyeongju National Museum but not around the GEM**

In [66]:
```python
# categories present around the Gyeongju National Museum but not around the GEM
suggested_venues = np.setdiff1d(gyeongju_venues, GEM_venues)
suggested_venues
```

```
array(['BBQ Joint', 'Botanical Garden', 'Bunsik Restaurant',
       'Dance Studio', 'Ice Cream Shop', 'Italian Restaurant',
       'Korean Restaurant', 'Lake', 'Market', 'Mountain', 'Museum',
       'Noodle House', 'Other Great Outdoors', 'Park', 'Rest Area',
       'Steakhouse', 'Train Station', 'Zoo'], dtype=object)
```
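`np.setdiff1d(a, b)` returns the sorted, de-duplicated values of `a` that are not in `b`. That is why the suggested categories appear in alphabetical order. The short sketch below takes a few category names from the two arrays above and checks that a plain Python set difference gives the same result.

```python
import numpy as np

# a few category names taken from the Gyeongju and GEM arrays above
gyeongju = np.array(["Park", "Café", "Zoo", "Bakery", "Lake"])
gem = np.array(["Café", "Bakery", "Garden"])

suggested = np.setdiff1d(gyeongju, gem)           # sorted unique values of gyeongju not in gem
same_with_sets = sorted(set(gyeongju) - set(gem))  # equivalent set-based version

print(suggested)
```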

## Results and Discussion <a name="results"></a>

As our cluster analysis suggests, the area around the GEM has a venue demographic very similar to that of the Gyeongju National Museum; the two museums form a cluster of their own. Both areas have historic sites and a history museum nearby, which is one of the factors shaping the mix of venues around them. The venue categories found around the Gyeongju National Museum but not around the GEM include more outdoor options that round out the cultural experience for visitors, such as Park, Botanical Garden, Lake, Mountain and Zoo. This suggests that investing in an outdoor activity venue in the area around the GEM is very promising, especially considering Egypt's warm and sunny weather year-round. We can also notice a more diverse food and beverage scene around the Gyeongju National Museum, with categories such as BBQ Joint, Steakhouse, Noodle House, Bunsik Restaurant (Korean snack food), Italian Restaurant and Ice Cream Shop that do not appear around the GEM. This suggests that the area around the GEM could benefit from more diversity in food and beverage options.
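The two themes in this discussion, outdoor activities and food and beverage, come from reading the suggested categories by hand. To make that grouping explicit and repeatable, here is a sketch with a hypothetical theme mapping. The theme assignments are an editorial judgment, not Foursquare metadata, and categories outside both themes are listed separately.

```python
# the suggested categories printed by the setdiff1d cell
suggested_venues = ['BBQ Joint', 'Botanical Garden', 'Bunsik Restaurant',
                    'Dance Studio', 'Ice Cream Shop', 'Italian Restaurant',
                    'Korean Restaurant', 'Lake', 'Market', 'Mountain', 'Museum',
                    'Noodle House', 'Other Great Outdoors', 'Park', 'Rest Area',
                    'Steakhouse', 'Train Station', 'Zoo']

# hypothetical theme assignment (an editorial judgment, not Foursquare metadata)
THEMES = {
    "outdoor": {"Botanical Garden", "Lake", "Mountain", "Other Great Outdoors", "Park", "Zoo"},
    "food and beverage": {"BBQ Joint", "Bunsik Restaurant", "Ice Cream Shop", "Italian Restaurant",
                          "Korean Restaurant", "Noodle House", "Steakhouse"},
}

# keep the original (alphabetical) order within each theme
grouped = {theme: [c for c in suggested_venues if c in members] for theme, members in THEMES.items()}
unassigned = [c for c in suggested_venues if not any(c in m for m in THEMES.values())]

for theme, cats in grouped.items():
    print(f"{theme}: {', '.join(cats)}")
print("other:", ", ".join(unassigned))
```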

## Conclusion <a name="conclusion"></a>

To conclude, our capstone project used our data science toolkit to find out what kind of venue a potential investor should invest in around the Grand Egyptian Museum, which is expected to open its doors to visitors in 2021. The project gave us valuable insight into the demographic of venues around the GEM. According to our brief study, and in comparison with the area around the Gyeongju National Museum, the area around the GEM could benefit from outdoor activity venues that would give visitors a more well-rounded cultural experience. In addition, the area could benefit from a more diverse food and beverage scene, especially considering how diverse the museum's expected visitors are.