## Business Problem

LoadsOfFun is a travel company that aims to deliver unforgettable short excursions to its customers. Its customers are usually business travelers who are in town briefly for work or a conference and have a day or half a day of free time to explore the country they are visiting. They want to make the most of that limited time, but their busy schedules leave no room for research or advance planning.

These customers will not join a regular tour on a fixed schedule. They want to see the major sights, but after taking a memorable picture they do not want to spend a full day there. They want to experience what locals enjoy, immerse themselves briefly in the culture, and take home the memory of a day in the life of a local.

LoadsOfFun decided to fill the gap by offering short excursions that deliver a local cultural experience. The product development team lead, Helga, forms a small team to work on a pilot tour package for a customer trial. Her team is composed of young IT professionals with data science skills who are enthusiastic travelers themselves. Helga picked Iceland as the location: it is her home and is filled with adventurous venues, and she is eager to see what her team can come up with.



## How the Data is used to solve the problem

The development team members have not personally traveled to Iceland. To understand the country and what there is to do, they decided to gather data from the internet. The team's plan is as follows.

1. Get the postal codes, city names, and neighborhood names of Iceland from a website.
2. Get the latitude and longitude of each city and neighborhood.
3. Get the top 10 most popular venues in each neighborhood from Foursquare using its API.
4. Get the category of each venue.
5. Perform machine learning on the neighborhood data using K-means.
6. Display the clusters on a map to show similarities.
7. Analyze the data to create the excursions and recommendations.

The result will be reviewed by LoadsOfFun stakeholders who funded the project. The feedback is used to improve the pilot and develop the final product.

If the pilot is successful, LoadsOfFun plans to develop similar day-trip packages in other countries, starting in Europe and the US and possibly Asia.

In [1]:
from bs4 import BeautifulSoup

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install geopy

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Analysis
#### Postal codes carry information about a country, such as its cities and neighborhoods.  Using postal codes lets us drill down to individual streets and what is on them.  This is the information LoadsOfFun focuses on when creating excursion packages.
#### The source is https://en.wikipedia.org/wiki/List_of_postal_codes_in_Iceland, which contains data that many organizations reference.
### Action
#### Get postal code data from https://en.wikipedia.org/wiki/List_of_postal_codes_in_Iceland 
 

In [2]:
# This is to get the remote region postal code

#import pandas as pd
#table_MN = pd.read_html('https://en.m.wikipedia.org/wiki/List_of_postal_codes_in_Iceland')
#print(len(table_MN))
#df1=table_MN[0]
#df1.rename(columns={"Area served (district)":"Area served"},inplace=True)
#df2=pd.concat(table_MN[1:])
#df1.head

In [3]:
html_data=requests.get('https://en.m.wikipedia.org/wiki/List_of_postal_codes_in_Iceland#2xx:_Capital_Region_and_Southern_Peninsula').text

In [4]:
# read in data using beautiful soup

soup=BeautifulSoup(html_data, 'lxml')
soup

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of postal codes in Iceland - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"47a7c645-7b20-44cf-aef3-3fe3ca0bba96","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_in_Iceland","wgTitle":"List of postal codes in Iceland","wgCurRevisionId":1002032729,"wgRevisionId":1002032729,"wgArticleId":2196618,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgRelevantPageName":"List_of_postal_codes_in_Iceland","wgRelevantArt

### Analysis
#### After reviewing the data from Wikipedia, we learn that Icelandic postal codes are divided by region. We are interested in the Capital Region and Southern Peninsula.
### Action
#### Create dataframe to store postal code, city, and neighborhood data

In [5]:
# create dataframe

column_names = ['Postal Code', 'City', 'PO address']
df = pd.DataFrame(columns=column_names)
df

Unnamed: 0,Postal Code,City,PO address


In [6]:
# populate the dataframe

table = soup.find('table')
rows = []
for row in table.find("tbody").find_all("tr"):
    col = row.find_all("td")
    if col:  # header rows contain only <th> cells, so they yield an empty list
        rows.append({'Postal Code': col[0].text,
                     'City': col[1].text,
                     'PO address': col[2].text})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
df = pd.DataFrame(rows, columns=column_names)
        

In [7]:
df.head(30)

Unnamed: 0,Postal Code,City,PO address
0,101\n,Reykjavík (Miðborg)\n,Hagatorg 1\n
1,102\n,Reykjavík (Vesturbær)\n,Hagatorg 1\n
2,103\n,Reykjavík (Háaleiti og Bústaðir)\n,"Síðumúla 3-5, 108 Reykjavík\n"
3,104\n,Reykjavík (Laugardalur)\n,"Síðumúla 3-5, 108 Reykjavík\n"
4,105\n,Reykjavík (Hlíðar)\n,"Síðumúla 3-5, 108 Reykjavík\n"
5,107\n,Reykjavík (Vesturbær)\n,"Eiðistorgi 15, 170 Seltjarnarnes\n"
6,108\n,Reykjavík (Háaleiti og Bústaðir)\n,Síðumúla 3-5\n
7,109\n,Reykjavík (Breiðholt)\n,Þönglabakka 4\n
8,110\n,Reykjavík (Árbær)\n,Hraunbæ 119\n
9,111\n,Reykjavík (Breiðholt)\n,"Þönglabakka 4, 109 Reykjavík\n"


### Analysis
#### There are 27 postal codes in the Capital Region and Southern Peninsula. However, some postal codes are used for post office boxes or belong to government and private organizations.  These are not helpful for the LoadsOfFun project, so the team decided to omit them.
#### The team learns that Reykjavik is a major city that business people travel to.
### Action
#### Select the city and neighborhood data for further analysis using slice

In [8]:
# slice a portion of data to work on from the dataframe

slice_df=df.iloc[0:12, 0:2].replace('\n',' ',regex=True) # remove unwanted \n breakline characters
slice_df

Unnamed: 0,Postal Code,City
0,101,Reykjavík (Miðborg)
1,102,Reykjavík (Vesturbær)
2,103,Reykjavík (Háaleiti og Bústaðir)
3,104,Reykjavík (Laugardalur)
4,105,Reykjavík (Hlíðar)
5,107,Reykjavík (Vesturbær)
6,108,Reykjavík (Háaleiti og Bústaðir)
7,109,Reykjavík (Breiðholt)
8,110,Reykjavík (Árbær)
9,111,Reykjavík (Breiðholt)


### Analysis
#### The data needs wrangling (cleaning) before the team can work with it
### Action
1. Split the City information into two columns, city and neighborhood.
2. Remove unwanted columns
3. Remove duplicated rows
4. Remove unneeded () characters
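
Splitting on whitespace, as the next cells do, keeps only the first word of a multi-word neighborhood name, so "Háaleiti og Bústaðir" ends up as "Háaleiti". The following is a minimal sketch of one alternative, assuming every entry has the form `City (Neighborhood)`; the names `CITY_PATTERN` and `split_city` are hypothetical and not part of the notebook. The same pattern could likely be used with pandas' `Series.str.extract`.

```python
import re

# Hypothetical helper, not from the notebook: matches "City (Neighborhood)"
# and keeps everything inside the parentheses, including multi-word names.
CITY_PATTERN = re.compile(r"^\s*(?P<city>[^()]+?)\s*\((?P<neighborhood>[^()]*)\)\s*$")


def split_city(raw):
    """Return (city, neighborhood); neighborhood is None when there are no parentheses."""
    match = CITY_PATTERN.match(raw)
    if match is None:
        return raw.strip(), None
    return match.group("city"), match.group("neighborhood").strip()


# The first two samples mirror rows of the scraped table (trailing newline included);
# the third is an illustrative entry without parentheses.
samples = [
    "Reykjavík (Miðborg)\n",
    "Reykjavík (Háaleiti og Bústaðir)\n",
    "Seltjarnarnes\n",
]
pairs = [split_city(s) for s in samples]
for pair in pairs:
    print(pair)
```

Because the parentheses are consumed by the pattern, this approach would also make the separate "remove () characters" step unnecessary.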

In [9]:
## Split the City column into City and Neighborhoods columns
ice_df=pd.DataFrame(slice_df.City.str.split(expand=True))
print('ice_df shape',ice_df.shape)

ice_df.head()

ice_df shape (12, 4)


Unnamed: 0,0,1,2,3
0,Reykjavík,(Miðborg),,
1,Reykjavík,(Vesturbær),,
2,Reykjavík,(Háaleiti,og,Bústaðir)
3,Reykjavík,(Laugardalur),,
4,Reykjavík,(Hlíðar),,


In [10]:
## Remove unwanted column
header=['City', 'Neighborhood', '2', '3']
ice_df.columns=header
ice_df=ice_df.drop(['2','3'], axis=1)

## Remove duplicates rows
ice_df=ice_df.drop_duplicates(subset=None, keep='first', inplace=False)

print('ice_df shape',ice_df.shape)
ice_df.head(10)

ice_df shape (9, 2)


Unnamed: 0,City,Neighborhood
0,Reykjavík,(Miðborg)
1,Reykjavík,(Vesturbær)
2,Reykjavík,(Háaleiti
3,Reykjavík,(Laugardalur)
4,Reykjavík,(Hlíðar)
7,Reykjavík,(Breiðholt)
8,Reykjavík,(Árbær)
10,Reykjavík,(Grafarvogur)
11,Reykjavík,(Grafarholt


In [11]:
## Remove unnecessary characters in Neighborhood text
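# regex=False treats the parentheses as literal characters on every pandas version
ice_df['Neighborhood']=ice_df['Neighborhood'].str.replace('(', ' ', regex=False)
ice_df['Neighborhood']=ice_df['Neighborhood'].str.replace(')', ' ', regex=False)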

print('ice_df shape',ice_df.shape)

ice_df shape (9, 2)


In [12]:
#ice_df=ice_df.drop('index', inplace=True)

ice_df=ice_df.reset_index()

In [13]:
ice_df=ice_df.drop('index', axis=1)
ice_df.head(10)

Unnamed: 0,City,Neighborhood
0,Reykjavík,Miðborg
1,Reykjavík,Vesturbær
2,Reykjavík,Háaleiti
3,Reykjavík,Laugardalur
4,Reykjavík,Hlíðar
5,Reykjavík,Breiðholt
6,Reykjavík,Árbær
7,Reykjavík,Grafarvogur
8,Reykjavík,Grafarholt


### Analysis
#### The team wants to know where Reykjavik is in Iceland, so they need a map
### Action
#### Get the Latitude and Longitude of Reykjavik to plot a map

In [14]:
# plot a map of Reykjavik, Iceland

address = 'Reykjavik, Iceland'

geolocator = Nominatim(user_agent="Iceland_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Reykjavik are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Reykjavik are 64.145981, -21.9422367.


In [15]:
# create map of Reykjavik using latitude and longitude values
map_reykjavik = folium.Map(location=[latitude, longitude], zoom_start=11, tiles='Stamen Terrain')
map_reykjavik

### Analysis
#### The team wants to know where the neighborhoods are located within the city of Reykjavik, so they need to put markers on the Reykjavik map.
#### The team learns that they need the latitude and longitude of each neighborhood to place the markers
### Action
#### Get the latitude and longitude of each Reykjavik neighborhood
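
The cell below geocodes each neighborhood by its bare name. A minimal sketch of a variant that adds the city and country to the query and tolerates a missing result is shown here. The function name `geocode_neighborhood` and the stand-in geocoder are hypothetical; the only assumption about geopy is that `Nominatim.geocode` takes a query string and returns an object with `latitude` and `longitude` attributes, or `None` when nothing is found. Whether the extra context changes any particular result is not guaranteed.

```python
def geocode_neighborhood(name, city, country, geocode):
    """Geocode 'name, city, country' and return (lat, lon), or None if nothing is found.

    `geocode` is any callable shaped like geopy's Nominatim.geocode.
    """
    query = "{}, {}, {}".format(name.strip(), city.strip(), country)
    location = geocode(query)
    if location is None:
        return None
    return (location.latitude, location.longitude)


# Stand-in geocoder for illustration only (hypothetical, no network access).
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    known = {"Miðborg, Reykjavík, Iceland": FakeLocation(64.135984, -21.938189)}
    return known.get(query)


found = geocode_neighborhood(" Miðborg ", "Reykjavík", "Iceland", fake_geocode)
missing = geocode_neighborhood("Nowhere", "Reykjavík", "Iceland", fake_geocode)
print(found, missing)
```

With geopy installed, `Nominatim(user_agent="Iceland_explorer").geocode` could be passed in place of `fake_geocode`.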

In [16]:
#Get latitude and longitude

latitudes = []
longitudes = []

for i in ice_df['Neighborhood']:
    address = i
    geolocator = Nominatim(user_agent="Iceland_explorer")
    location = geolocator.geocode(address)
    
    latitudes.append(location.latitude)
    longitudes.append(location.longitude)

#print('Latitudes is ', latitudes)
#print('Longitudes is ', longitudes)    

ice1=ice_df.assign(Latitude=latitudes)
ice=ice1.assign(Longitude=longitudes)

#ice=ice.reset_index(drop=True)

ice.head(50)


Unnamed: 0,City,Neighborhood,Latitude,Longitude
0,Reykjavík,Miðborg,64.135984,-21.938189
1,Reykjavík,Vesturbær,64.145461,-21.958172
2,Reykjavík,Háaleiti,64.135067,-21.884649
3,Reykjavík,Laugardalur,64.142568,-21.867004
4,Reykjavík,Hlíðar,65.957517,-14.624533
5,Reykjavík,Breiðholt,64.10252,-21.832057
6,Reykjavík,Árbær,64.114373,-21.795208
7,Reykjavík,Grafarvogur,64.147253,-21.789674
8,Reykjavík,Grafarholt,64.126563,-21.755767


### Analysis
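#### The team reviews the latitude and longitude data for the 9 neighborhoods and sees that the coordinates returned for Hlíðar (65.957517, -14.624533) lie far outside the city area, most likely because the geocoder matched a different place with the same name.  They decide to remove it from their excursion planning and work with the remaining 8 neighborhoods.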
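
The outlier can also be found programmatically rather than by eye. The sketch below computes the great-circle (haversine) distance from the Reykjavik coordinates obtained earlier to each neighborhood and flags anything beyond a cutoff. The 25 km cutoff `MAX_KM` and the helper name `haversine_km` are assumptions made for illustration, not values from the project.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


city_center = (64.145981, -21.9422367)  # Reykjavik, from the geocoding cell
coords = {
    "Miðborg": (64.135984, -21.938189),
    "Vesturbær": (64.145461, -21.958172),
    "Háaleiti": (64.135067, -21.884649),
    "Laugardalur": (64.142568, -21.867004),
    "Hlíðar": (65.957517, -14.624533),
    "Breiðholt": (64.10252, -21.832057),
    "Árbær": (64.114373, -21.795208),
    "Grafarvogur": (64.147253, -21.789674),
    "Grafarholt": (64.126563, -21.755767),
}

MAX_KM = 25  # assumed cutoff for "inside the city area"
outliers = [name for name, (lat, lon) in coords.items()
            if haversine_km(*city_center, lat, lon) > MAX_KM]
print(outliers)
```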

### Action
#### Remove index 4 and reset index

In [17]:
ice=ice.drop(labels=4, axis=0)
ice= ice.reset_index(drop=True)
ice.head(50)

Unnamed: 0,City,Neighborhood,Latitude,Longitude
0,Reykjavík,Miðborg,64.135984,-21.938189
1,Reykjavík,Vesturbær,64.145461,-21.958172
2,Reykjavík,Háaleiti,64.135067,-21.884649
3,Reykjavík,Laugardalur,64.142568,-21.867004
4,Reykjavík,Breiðholt,64.10252,-21.832057
5,Reykjavík,Árbær,64.114373,-21.795208
6,Reykjavík,Grafarvogur,64.147253,-21.789674
7,Reykjavík,Grafarholt,64.126563,-21.755767


### Analysis
#### The team has finished cleaning the data and is ready to put neighborhood markers on the Reykjavik map.

### Action
#### Mark the 8 neighborhoods on the map of Reykjavik

In [18]:
# create map of Reykjavik using latitude and longitude values
map_reykjavik = folium.Map(location=[latitude, longitude], zoom_start=10, tiles='Stamen Terrain')

# add markers to map
for lat, lng, city, neighborhood in zip(ice['Latitude'], ice['Longitude'], 
                                           ice['City'], ice['Neighborhood']):
    label = '{}, {}'.format(neighborhood, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_reykjavik)  
    
map_reykjavik

#### Label the neighborhoods

In [19]:
# create map of Reykjavik using latitude and longitude values
map_reykjavik = folium.Map(location=[latitude, longitude], zoom_start=10, tiles='Stamen Terrain')

# add markers to map
for lat, lng, city, neighborhood in zip(ice['Latitude'], ice['Longitude'], 
                                           ice['City'], ice['Neighborhood']):
    label = '{}, {}'.format(neighborhood, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_reykjavik)
    folium.Marker(
        [lat, lng],
        popup=neighborhood).add_to(map_reykjavik)
    
map_reykjavik

### Analysis
#### The excursion development team now has a good idea of where Reykjavik's neighborhoods are.  The next step is to find out what is in each neighborhood, so they turn to Foursquare to retrieve some data.
#### Foursquare is an independent location data platform for understanding how people move through neighborhoods and the world.  The team can get popular and recommended places to go from locals and visitors.
### Action
#### Explore Reykjavik using the Foursquare API.
#### The team sends a request to the Foursquare API and receives a JSON response containing the data.
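
The request URL used in the cells below is assembled with string formatting. As a minimal sketch, the same URL can be built with the standard library's `urlencode`, which escapes each parameter. The endpoint and parameter names are the ones that appear in the notebook; the function name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build an explore URL; urlencode takes care of escaping the values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues to return
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials for illustration only.
example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                                64.135984, -21.938189, 600, 100)
query = parse_qs(urlsplit(example_url).query)
print(query["ll"], query["radius"], query["limit"])
```

Passing the credentials as arguments also makes it easier to read them from somewhere other than the notebook source, such as environment variables.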

In [20]:
#connect to Foursquare as a developer

CLIENT_ID = 'IPFKGSTEYOILEGG2MJRFNKDX2YV5WEH44MR2KZDKZOWFBXWM' # your Foursquare ID
CLIENT_SECRET = '21MX2NLSSB4PWVFR3Y04NYJJOQCNDEWTM0TMVVLVKXK2B3CD' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: IPFKGSTEYOILEGG2MJRFNKDX2YV5WEH44MR2KZDKZOWFBXWM
CLIENT_SECRET:21MX2NLSSB4PWVFR3Y04NYJJOQCNDEWTM0TMVVLVKXK2B3CD


#### Get the first neighborhood's name, latitude, and longitude

In [21]:
neighborhood=ice.loc[0, 'Neighborhood'] # get the first row of column 'Neighborhood'
n_lat=ice.loc[0, 'Latitude'] # get the first row of column 'Latitude'
n_lng=ice.loc[0, 'Longitude'] # get the first row of column 'Longitude'

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood, n_lat, n_lng)) 


Latitude and longitude values of  Miðborg  are 64.13598400000001, -21.938188660088876.


#### Get the top 100 venues in Miðborg within 600 meters

In [22]:

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 600 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    n_lat, 
    n_lng, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=IPFKGSTEYOILEGG2MJRFNKDX2YV5WEH44MR2KZDKZOWFBXWM&client_secret=21MX2NLSSB4PWVFR3Y04NYJJOQCNDEWTM0TMVVLVKXK2B3CD&v=20180605&ll=64.13598400000001,-21.938188660088876&radius=600&limit=100'

#### Send a GET request to Foursquare and examine the resulting JSON

In [23]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6095ce3cf9b8d65152eb2db8'},
 'response': {'headerLocation': 'Miõborg',
  'headerFullLocation': 'Miõborg, Reykjavik',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 9,
  'suggestedBounds': {'ne': {'lat': 64.14138400540001,
    'lng': -21.92583316037864},
   'sw': {'lat': 64.1305839946, 'lng': -21.95054415979911}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c606da3924b76b0b8e1f0b9',
       'name': 'Hljómskálagarðurinn',
       'location': {'address': 'Sóleyjargata 1',
        'lat': 64.14094531679677,
        'lng': -21.940229707492005,
        'labeledLatLngs': [{'label': 'display',
          'lat': 64.14094531679677,
          'lng': -21.940229707492005}],
        'distance': 561,
        'postalCode': '101',
 

### Analysis
#### The development team reviews the JSON response and understands that they need to extract the data that is useful to them
### Action
####  Clean the JSON data
####  Create a function called **get_category_type** to extract venue category data
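
To make the extraction concrete, here is a minimal sketch that pulls the same four fields (name, first category, latitude, longitude) out of the nested `items` list using plain Python. The helper names are hypothetical. The first sample item follows the structure of the response shown above; the second is an invented item with no categories, included to show the `None` case.

```python
def first_category_name(categories):
    """Return the name of the first category, or None when the list is empty."""
    if not categories:
        return None
    return categories[0]["name"]


def extract_venues(items):
    """Flatten Foursquare-style items into one small dict per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        rows.append({
            "name": venue["name"],
            "category": first_category_name(venue.get("categories", [])),
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows


sample_items = [
    {"venue": {"name": "Hljómskálagarðurinn",
               "location": {"lat": 64.14094531679677, "lng": -21.940229707492005},
               "categories": [{"name": "Park"}]}},
    # Hypothetical venue without categories, for illustration only.
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 64.0, "lng": -21.9},
               "categories": []}},
]
rows = extract_venues(sample_items)
for row in rows:
    print(row)
```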

In [24]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows use the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [25]:
# create dataframe to store nearby venues

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(50)



Unnamed: 0,name,categories,lat,lng
0,Hljómskálagarðurinn,Park,64.140945,-21.94023
1,Norræna Húsið,Opera House,64.138162,-21.946746
2,AALTO Bistro,Scandinavian Restaurant,64.138363,-21.946862
3,Serrano (N1 Hringbraut),Burrito Place,64.138683,-21.937948
4,Mýrin Mathús,Diner,64.137481,-21.934801
5,Galtafell Guesthouse,Bed & Breakfast,64.141343,-21.936922
6,Matstofa FÍ,Restaurant,64.132115,-21.944246
7,Hljómskálagarður,Garden,64.140792,-21.941457
8,Flugumsjón,Airport Terminal,64.132024,-21.945856


In [26]:
print('In Miðborg there are {} venues returned by Foursquare.'.format(nearby_venues.shape[0]))

In Miðborg there are 9 venues returned by Foursquare.


### Analysis
#### Based on the venue categories, the team believes they can develop an itinerary for Miðborg for a visitor who does not want to travel outside the neighborhood.  They continue to explore.

##### For example
##### Miðborg Half Day Trip
1. Get a quick bite at AALTO Bistro in the morning
2. Take the bus to explore Miðborg's surroundings
3. Have dinner at Mýrin Mathús
4. Go to the opera house

##### Miðborg Full Day Trip
1. Stay at Galtafell Guesthouse
2. Eat at Serrano
3. Visit Hljómskálagarðurinn
4. Have dinner at Matstofa FÍ
5. Go to the opera house
6. Go to the airport

### Action
#### Explore other neighborhoods in Reykjavik and get the venues 
#### Create a function getNearbyVenues 

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [28]:
#Create Reykjavik venues dataframe using getNearbyVenues function

Reykjavik_venues = getNearbyVenues(names=ice['Neighborhood'],
                                   latitudes=ice['Latitude'],
                                   longitudes=ice['Longitude']
                                  )
print(Reykjavik_venues.shape)
Reykjavik_venues.head(70)

 Miðborg 
 Vesturbær 
 Háaleiti
 Laugardalur 
 Breiðholt 
 Árbær 
 Grafarvogur 
 Grafarholt
(65, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Miðborg,64.135984,-21.938189,Norræna Húsið,64.138162,-21.946746,Opera House
1,Miðborg,64.135984,-21.938189,AALTO Bistro,64.138363,-21.946862,Scandinavian Restaurant
2,Miðborg,64.135984,-21.938189,Serrano (N1 Hringbraut),64.138683,-21.937948,Burrito Place
3,Miðborg,64.135984,-21.938189,Mýrin Mathús,64.137481,-21.934801,Diner
4,Vesturbær,64.145461,-21.958172,Melabúðin,64.144684,-21.960164,Grocery Store
5,Vesturbær,64.145461,-21.958172,Kaffihús Vesturbæjar,64.144237,-21.961138,Café
6,Vesturbær,64.145461,-21.958172,Vesturbæjarlaug,64.144549,-21.962506,Pool
7,Vesturbær,64.145461,-21.958172,Ísbúð Vesturbæjar,64.145884,-21.962309,Ice Cream Shop
8,Vesturbær,64.145461,-21.958172,Vesturbær,64.145248,-21.95857,Neighborhood
9,Vesturbær,64.145461,-21.958172,Björnsbakarí,64.143986,-21.950875,Bakery


### Analysis
#### The development team was delighted to see so many venues offered across the neighborhoods.
### Action
#### Group the venue category by neighborhood so the data is more manageable 

In [29]:
#group by neighborhood

Reykjavik_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Breiðholt,4,4,4,4,4,4
Grafarholt,6,6,6,6,6,6
Grafarvogur,9,9,9,9,9,9
Háaleiti,13,13,13,13,13,13
Laugardalur,5,5,5,5,5,5
Miðborg,4,4,4,4,4,4
Vesturbær,16,16,16,16,16,16
Árbær,8,8,8,8,8,8


In [30]:
# find out unique venues 
print('There are {} unique venue categories.'.format(len(Reykjavik_venues['Venue Category'].unique())))

There are 33 unique venue categories.


### Observation/Discussion
#### After grouping the venues by neighborhood, the team learns that Grafarvogur, Háaleiti, and Vesturbær have the largest selection of venues.  They are good neighborhoods to spend half a day or a full day in.

#### In summary
* Grafarvogur has a shopping mall and a pharmacy
* Háaleiti has lots of hotels and is close to Laugardalur
* Vesturbær has Asian restaurants and soccer stadiums

##### For someone staying in Háaleiti, visiting Laugardalur can be a fun excursion. Here is a sample itinerary.

##### Háaleiti and Laugardalur Half Day Trip
1. Pick up baked goods at Mosfellsbakarí in the morning
2. Take a walk in the Laugardalurinn park
3. Have lunch at Dirty Burger & Ribs
4. Have some ice cream at Ísbúð Háaleitis


##### Háaleiti and Laugardalur Full Day Trip
1. Pick up baked goods at Mosfellsbakarí in the morning
2. Take a walk in the Grasagarðurinn garden
3. Go to Café Flóran for lunch
4. Try skating at Skautahöllin Laugardal
5. Have some ice cream at Ísbúð Háaleitis
6. Have dinner at Mulakaffi

### Action
#### The team continues to examine the data to analyze each neighborhood
#### Use one-hot encoding to find the frequency of occurrence of each category in the neighborhood
One-hot encoding represents each category as its own 0/1 column. Many machine learning algorithms cannot work with categorical data directly, so the categories must be converted into numbers. This applies to both input and output variables that are categorical.
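
As a small illustration of what one-hot encoding followed by a per-neighborhood mean produces, the sketch below does both steps in plain Python instead of pandas. The four Miðborg venues are taken from the results table above; the "Example" neighborhood and the function names are hypothetical.

```python
# (neighborhood, venue category) pairs
venues = [
    ("Miðborg", "Opera House"),
    ("Miðborg", "Scandinavian Restaurant"),
    ("Miðborg", "Burrito Place"),
    ("Miðborg", "Diner"),
    ("Example", "Bakery"),   # hypothetical neighborhood for illustration
    ("Example", "Bakery"),
    ("Example", "Café"),
    ("Example", "Pool"),
]

categories = sorted({category for _, category in venues})


def one_hot(category):
    """Return a 0/1 vector with a single 1 in the position of `category`."""
    return [1 if category == c else 0 for c in categories]


encoded = [(hood, one_hot(category)) for hood, category in venues]


def mean_by_neighborhood(encoded_rows):
    """Average the one-hot vectors per neighborhood, giving category frequencies."""
    sums, counts = {}, {}
    for hood, vector in encoded_rows:
        acc = sums.setdefault(hood, [0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[hood] = counts.get(hood, 0) + 1
    return {hood: dict(zip(categories, [s / counts[hood] for s in acc]))
            for hood, acc in sums.items()}


frequencies = mean_by_neighborhood(encoded)
print(frequencies["Miðborg"]["Diner"], frequencies["Example"]["Bakery"])
```

Each row of the result sums to 1, so the values can be read as the share of a neighborhood's venues that fall into each category.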


In [31]:
# drop rows whose category is 'Neighborhood' (a place label rather than a venue type), then one-hot encode
Reykjavik_venues= Reykjavik_venues[Reykjavik_venues['Venue Category']!='Neighborhood']
Reykjavik_onehot = pd.get_dummies(Reykjavik_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Reykjavik_onehot['Neighborhood'] = Reykjavik_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Reykjavik_onehot.columns[-1]] + list(Reykjavik_onehot.columns[:-1])
Reykjavik_onehot = Reykjavik_onehot[fixed_columns]

Reykjavik_onehot.head()

Unnamed: 0,Neighborhood,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Burrito Place,Café,Convenience Store,Diner,Electronics Store,Fast Food Restaurant,Garden,Grocery Store,Gym,Health & Beauty Service,Hotel,Ice Cream Shop,Indian Restaurant,Mobile Phone Shop,Opera House,Park,Pharmacy,Pizza Place,Playground,Pool,Restaurant,Sandwich Place,Scandinavian Restaurant,Skating Rink,Soccer Stadium,Supermarket,Thai Restaurant,Vacation Rental
0,Miðborg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Miðborg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
2,Miðborg,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Miðborg,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Vesturbær,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [32]:
Reykjavik_onehot.shape

(64, 33)

In [33]:
# Group venues for each neighborhood by taking the mean of the frequency of occurrence of each category

Reykjavik_grouped = Reykjavik_onehot.groupby('Neighborhood').mean().reset_index()
Reykjavik_grouped

Unnamed: 0,Neighborhood,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Burrito Place,Café,Convenience Store,Diner,Electronics Store,Fast Food Restaurant,Garden,Grocery Store,Gym,Health & Beauty Service,Hotel,Ice Cream Shop,Indian Restaurant,Mobile Phone Shop,Opera House,Park,Pharmacy,Pizza Place,Playground,Pool,Restaurant,Sandwich Place,Scandinavian Restaurant,Skating Rink,Soccer Stadium,Supermarket,Thai Restaurant,Vacation Rental
0,Breiðholt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Grafarholt,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0
2,Grafarvogur,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.111111,0.0
3,Háaleiti,0.0,0.076923,0.076923,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.384615,0.076923,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Laugardalur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0
5,Miðborg,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
6,Vesturbær,0.066667,0.0,0.2,0.066667,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.066667
7,Árbær,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0


In [34]:
Reykjavik_grouped.shape

(8, 33)

### Analysis
#### The one-hot encoded data gives the frequency of each venue category in a neighborhood; the team wants to see which categories stand out
### Action
#### Print, for each neighborhood, the 12 most common venue categories and their frequencies

In [35]:
num_top_venues = 12

for hood in Reykjavik_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Reykjavik_grouped[Reykjavik_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Breiðholt ----
                      venue  freq
0             Grocery Store  0.50
1      Fast Food Restaurant  0.25
2               Pizza Place  0.25
3          Asian Restaurant  0.00
4                      Park  0.00
5                  Pharmacy  0.00
6                Playground  0.00
7                      Pool  0.00
8                Restaurant  0.00
9         Mobile Phone Shop  0.00
10           Sandwich Place  0.00
11  Scandinavian Restaurant  0.00


---- Grafarholt----
                      venue  freq
0                    Bakery  0.33
1               Supermarket  0.17
2   Health & Beauty Service  0.17
3                Playground  0.17
4             Grocery Store  0.17
5          Asian Restaurant  0.00
6                      Pool  0.00
7                  Pharmacy  0.00
8               Pizza Place  0.00
9            Sandwich Place  0.00
10               Restaurant  0.00
11              Opera House  0.00


---- Grafarvogur ----
                      venue  freq
0               

### Observation/Discussion 
#### This information is very useful for excursion development: it shows the popular venue categories in each neighborhood
### Action
#### Store the data in a dataframe and list the top 10 most common venues by neighborhood
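
When a neighborhood has fewer than 10 distinct categories, a top-10 list is padded with categories whose frequency is 0, and the order among equal frequencies depends on the sort. A minimal sketch of a ranking that breaks ties alphabetically and can optionally skip zero-frequency categories is shown below. The function `top_categories` and its `include_zero` flag are hypothetical; the sample frequencies are the Breiðholt values printed above.

```python
def top_categories(freqs, n, include_zero=False):
    """Return up to n category names, highest frequency first.

    Ties are broken alphabetically so the result is reproducible.
    Categories with frequency 0 are skipped unless include_zero is True.
    """
    items = [(name, freq) for name, freq in freqs.items() if include_zero or freq > 0]
    ranked = sorted(items, key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:n]]


breidholt = {
    "Grocery Store": 0.5,
    "Fast Food Restaurant": 0.25,
    "Pizza Place": 0.25,
    "Asian Restaurant": 0.0,
    "Park": 0.0,
}

top_nonzero = top_categories(breidholt, 5)
top_padded = top_categories(breidholt, 4, include_zero=True)
print(top_nonzero)
print(top_padded)
```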


In [36]:
#  function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [37]:
# create dataframe to show top 10 common venues

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Reykjavik_grouped['Neighborhood']

for ind in np.arange(Reykjavik_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Reykjavik_grouped.iloc[ind, :], 
                                                                          num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Breiðholt,Grocery Store,Pizza Place,Fast Food Restaurant,Vacation Rental,Hotel,BBQ Joint,Bakery,Burger Joint,Burrito Place,Café
1,Grafarholt,Bakery,Health & Beauty Service,Supermarket,Grocery Store,Playground,Diner,Gym,Garden,Fast Food Restaurant,Electronics Store
2,Grafarvogur,Pharmacy,Grocery Store,Supermarket,Thai Restaurant,Indian Restaurant,Sandwich Place,Burrito Place,Gym,Garden,Fast Food Restaurant
3,Háaleiti,Hotel,Restaurant,Mobile Phone Shop,Electronics Store,Pizza Place,Ice Cream Shop,Burrito Place,Bakery,BBQ Joint,Café
4,Laugardalur,Park,Garden,Skating Rink,Scandinavian Restaurant,Vacation Rental,Diner,Grocery Store,Fast Food Restaurant,Electronics Store,Café
5,Miðborg,Opera House,Scandinavian Restaurant,Burrito Place,Diner,Vacation Rental,Electronics Store,Gym,Grocery Store,Garden,Fast Food Restaurant
6,Vesturbær,Bakery,Soccer Stadium,Grocery Store,Vacation Rental,Pizza Place,Burger Joint,Café,Convenience Store,Ice Cream Shop,Asian Restaurant
7,Árbær,Gym,Supermarket,Soccer Stadium,Grocery Store,Bakery,Burger Joint,Pool,Pizza Place,Diner,Garden


### Analysis
#### The team is satisfied with the venue data they have. However, some developers want to see what machine learning can offer for excursion development, for example by determining similarities between neighborhoods and designing excursions based on them.
### Action
#### Cluster the neighborhoods using the K-means machine learning algorithm

In [38]:
# set number of clusters
kclusters = 2

Reykjavik_grouped_clustering = Reykjavik_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Reykjavik_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 1, 1, 0, 0])

#### Create a new dataframe that includes the cluster label and the top 10 venues for each neighborhood

In [None]:
# only run this if Cluster Labels exists
neighborhoods_venues_sorted.drop(['Cluster Labels'], axis=1, inplace=True) 

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Reykjavik_merged = ice

# merge neighborhoods_venues_sorted with ice to add latitude/longitude for each neighborhood
Reykjavik_merged = Reykjavik_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Reykjavik_merged.head(10) # check the last columns!

Unnamed: 0,City,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Reykjavík,Miðborg,64.135984,-21.938189,1,Opera House,Scandinavian Restaurant,Burrito Place,Diner,Vacation Rental,Electronics Store,Gym,Grocery Store,Garden,Fast Food Restaurant
1,Reykjavík,Vesturbær,64.145461,-21.958172,0,Bakery,Soccer Stadium,Grocery Store,Vacation Rental,Pizza Place,Burger Joint,Café,Convenience Store,Ice Cream Shop,Asian Restaurant
2,Reykjavík,Háaleiti,64.135067,-21.884649,0,Hotel,Restaurant,Mobile Phone Shop,Electronics Store,Pizza Place,Ice Cream Shop,Burrito Place,Bakery,BBQ Joint,Café
3,Reykjavík,Laugardalur,64.142568,-21.867004,1,Park,Garden,Skating Rink,Scandinavian Restaurant,Vacation Rental,Diner,Grocery Store,Fast Food Restaurant,Electronics Store,Café
4,Reykjavík,Breiðholt,64.10252,-21.832057,0,Grocery Store,Pizza Place,Fast Food Restaurant,Vacation Rental,Hotel,BBQ Joint,Bakery,Burger Joint,Burrito Place,Café
5,Reykjavík,Árbær,64.114373,-21.795208,0,Gym,Supermarket,Soccer Stadium,Grocery Store,Bakery,Burger Joint,Pool,Pizza Place,Diner,Garden
6,Reykjavík,Grafarvogur,64.147253,-21.789674,0,Pharmacy,Grocery Store,Supermarket,Thai Restaurant,Indian Restaurant,Sandwich Place,Burrito Place,Gym,Garden,Fast Food Restaurant
7,Reykjavík,Grafarholt,64.126563,-21.755767,0,Bakery,Health & Beauty Service,Supermarket,Grocery Store,Playground,Diner,Gym,Garden,Fast Food Restaurant,Electronics Store


### Analysis
#### The team experimented with several values of K (the number of K-means centroids) and determined that a lower number works better
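
One way to put a number on the comparison between different values of K is the silhouette score, which is closer to 1 when clusters are compact and well separated. The sketch below is only an illustration on synthetic data. It is an assumption that this metric suits the analysis, and it is not necessarily how the team compared their runs. The array `X` stands in for the frequency table passed to `KMeans`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic stand-in for the 8 neighborhoods: two tight groups of 4 points each
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.05, size=(4, 3)),
    rng.normal(1.0, 0.05, size=(4, 3)),
])

# score each candidate K; silhouette needs between 2 and n_samples - 1 clusters
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print('best K:', best_k)
```

With only eight neighborhoods, any such score rests on very few points, so it is best treated as a rough guide alongside a manual look at the clusters.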
### Action
#### Create a map to visualize the resulting clusters

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Reykjavik_merged['Latitude'], Reykjavik_merged['Longitude'], 
                                  Reykjavik_merged['Neighborhood'], Reykjavik_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
    folium.Marker(
        [lat, lon],
        popup=poi).add_to(map_clusters)   
map_clusters

### Examine Clusters

#### Cluster 0

In [41]:
Reykjavik_merged.loc[Reykjavik_merged['Cluster Labels'] == 0, Reykjavik_merged.columns[[1] + list(range(5, Reykjavik_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Vesturbær,Bakery,Soccer Stadium,Grocery Store,Vacation Rental,Pizza Place,Burger Joint,Café,Convenience Store,Ice Cream Shop,Asian Restaurant
2,Háaleiti,Hotel,Restaurant,Mobile Phone Shop,Electronics Store,Pizza Place,Ice Cream Shop,Burrito Place,Bakery,BBQ Joint,Café
4,Breiðholt,Grocery Store,Pizza Place,Fast Food Restaurant,Vacation Rental,Hotel,BBQ Joint,Bakery,Burger Joint,Burrito Place,Café
5,Árbær,Gym,Supermarket,Soccer Stadium,Grocery Store,Bakery,Burger Joint,Pool,Pizza Place,Diner,Garden
6,Grafarvogur,Pharmacy,Grocery Store,Supermarket,Thai Restaurant,Indian Restaurant,Sandwich Place,Burrito Place,Gym,Garden,Fast Food Restaurant
7,Grafarholt,Bakery,Health & Beauty Service,Supermarket,Grocery Store,Playground,Diner,Gym,Garden,Fast Food Restaurant,Electronics Store


#### Cluster 1

In [42]:
Reykjavik_merged.loc[Reykjavik_merged['Cluster Labels'] == 1, Reykjavik_merged.columns[[1] + list(range(5, Reykjavik_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Miðborg,Opera House,Scandinavian Restaurant,Burrito Place,Diner,Vacation Rental,Electronics Store,Gym,Grocery Store,Garden,Fast Food Restaurant
3,Laugardalur,Park,Garden,Skating Rink,Scandinavian Restaurant,Vacation Rental,Diner,Grocery Store,Fast Food Restaurant,Electronics Store,Café


#### Cluster 2

In [43]:
Reykjavik_merged.loc[Reykjavik_merged['Cluster Labels'] == 2, Reykjavik_merged.columns[[1] + list(range(5, Reykjavik_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


#### Cluster 3

In [44]:
Reykjavik_merged.loc[Reykjavik_merged['Cluster Labels'] == 3, Reykjavik_merged.columns[[1] + list(range(5, Reykjavik_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
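
Besides listing the members of each cluster, it can help to look at the cluster centroids to see which venue categories characterize a cluster. The sketch below shows the idea on a small invented frequency table. The neighborhoods `A` to `D` and their values are hypothetical, and the names `freq`, `km` and `centroids` are introduced only for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# hypothetical venue-frequency table: rows are neighborhoods, columns are categories
freq = pd.DataFrame(
    {
        'Grocery Store': [0.6, 0.5, 0.1, 0.0],
        'Park':          [0.1, 0.0, 0.6, 0.7],
        'Bakery':        [0.3, 0.5, 0.3, 0.3],
    },
    index=['A', 'B', 'C', 'D'],
)

km = KMeans(n_clusters=2, random_state=0, n_init=10).fit(freq)

# each centroid is the mean frequency vector of the neighborhoods in that cluster
centroids = pd.DataFrame(km.cluster_centers_, columns=freq.columns)

# the category with the highest mean frequency in each cluster
top_category = centroids.idxmax(axis=1)
print(pd.Series(km.labels_, index=freq.index))
print(top_category)
```

The cluster numbers themselves are arbitrary, so only the grouping and the dominant categories carry meaning, not whether a group is called 0 or 1.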


### Conclusion
 
#### Using data analysis, we learned that the JSON data returned by Foursquare lists the popular venues in each neighborhood. Using the K-means machine learning algorithm, we created a cluster map, which is a good way to find the similarities and differences between neighborhoods.
#### By changing the number of clusters in K-means, we learned the following:
##### If K is set to 2, Miðborg and Laugardalur have similar venues, and Vesturbær, Háaleiti, Breiðholt, Árbær, Grafarvogur, and Grafarholt have similar venues.
##### If K is set to 5, Vesturbær, Háaleiti, Árbær, Grafarvogur, and Grafarholt have similar venues, while Miðborg, Laugardalur, and Breiðholt each have some unique venues.

#### The cluster data gave the team some insight for developing excursions. Since Háaleiti is where most hotels are, the excursions will focus on venues in Miðborg, Laugardalur, and Breiðholt, which are a short distance outside of Háaleiti.
#### The team felt that visitors can try out any of the popular venues in these neighborhoods to discover and experience a true day in the life in Iceland, one that is both unique and memorable. So they built some pilot excursion packages for team review. The packages were approved by their team lead and are ready for customers to try out.
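
The "short distance" between Háaleiti and the three target neighborhoods can be checked against the latitude and longitude values in the merged table. The sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an approximation, and it gives straight-line distances rather than travel distances by road or bus. The helper `haversine_km` is written for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# coordinates copied from the merged table in the notebook
coords = {
    'Háaleiti':    (64.135067, -21.884649),
    'Miðborg':     (64.135984, -21.938189),
    'Laugardalur': (64.142568, -21.867004),
    'Breiðholt':   (64.10252, -21.832057),
}

hub = coords['Háaleiti']
distances_km = {
    name: haversine_km(*hub, *pos)
    for name, pos in coords.items()
    if name != 'Háaleiti'
}

# print the neighborhoods from nearest to farthest
for name, dist in sorted(distances_km.items(), key=lambda kv: kv[1]):
    print(f'{name}: {dist:.1f} km')
```

All three neighborhoods come out within a few kilometres of Háaleiti, which is consistent with planning them as half-day or full-day outings from the hotel area.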

### Result 

#### The excursion development team at LoadsOfFun came up with the following trip packages and invites anyone traveling to Iceland to try them out. Enjoy!
#### Excursion packages
##### Miðborg Half Day Trip
1. Get a quick bite at AALTO Bistro in the morning
2. Take the bus to explore Miðborg's surroundings
3. Have dinner at Myrin Mathus
4. Go to the Opera House

##### Miðborg Full Day Trip
1. Stay at Galtafell Guesthouse
2. Eat at Serrano
3. Visit Hljómskálagarðurinn
4. Dinner at Matstofa FÍ
5. Go to the Opera House
6. Go to airport

##### Háaleiti and Laugardalur Half Day Trip
1. Pick up baked goods at Mosfellsbakarí in the morning
2. Take a walk in the Laugardalurinn park
3. Lunch at Dirty Burger & Ribs
4. Have some ice cream at Ísbúð Háaleitis


##### Háaleiti and Laugardalur Full Day Trip
1. Pick up baked goods at Mosfellsbakarí in the morning
2. Take a walk in the Grasagarðurinn garden
3. Go to Café Flóran for lunch
4. Try out skating at Skautahöllin Laugardal
5. Have some ice cream at Ísbúð Háaleitis
6. Dinner at Mulakaffi


##### Breiðholt Half Day Trip
1. Stay in vacation rental
2. Visit grocery store
3. Go to Indian Restaurant
