## Battle of Neighbourhoods - Week 1 Part 2

### Data Exploration and Description

After a series of discussions and meetings, it was decided that the data on top-rated attraction centers and museums in Oslo would be fetched from PlanetWare Inc., which is based in Richmond Hill, Ontario, Canada. A summary of the first dataset is shown below. The second dataset will be a list of affordable hotel accommodation suitable for penultimate-year high school students.

_**First Dataset**_ contains [14 Top-Rated Tourist Attractions in Oslo](https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm). The project stipulates no more than 14 destinations in the area of interest, to make it easier for students to work with one another and complete their projects within the shortest possible time.

Oslo is one of the world's largest capitals in terms of area, but only 20 percent of this land mass has been developed - the remainder consists of parks, protected forests, hills, and hundreds of lakes. Parks and open spaces are an integral part of Oslo's cityscape, and are easily accessible from almost anywhere in the city. The center is a joy to explore on foot thanks to the numerous pathways and trails connecting its public spaces, as well as its many pedestrian-friendly areas, including the city's main street, Karl Johans gate. Stretching from Oslo Central Station near the waterfront all the way up to the Royal Palace, this wide avenue passes many of Oslo's tourist attractions, including the palace, the National Theatre, the old university buildings, and Oslo Cathedral. Regularly ranked as one of the best cities in the world in which to live, Oslo boasts a rich cultural scene and numerous things to do, and is famous for its theater, museums, and galleries.

By the end of the trip, every student should be better equipped and more enlightened as they enter their final year of school, with clearer goals regarding their future History, Geography, Science & Engineering careers.

#### Get Required Libraries

In [17]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
#!conda install -c conda-forge geopy
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library
import json # library to handle JSON files
print('Folium installed')
print('Libraries imported.')


Collecting package metadata: done
Solving environment: \ 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/linux-64::anaconda==5.3.1=py37_0
  - defaults/linux-64::astropy==3.0.4=py37h14c3975_0
  - defaults/linux-64::bkcharts==0.2=py37_0
  - defaults/linux-64::blaze==0.11.3=py37_0
  - defaults/linux-64::bokeh==0.13.0=py37_0
  - defaults/linux-64::bottleneck==1.2.1=py37h035aef0_1
  - defaults/linux-64::dask==0.19.1=py37_0
  - defaults/linux-64::datashape==0.5.4=py37_1
  - defaults/linux-64::mkl-service==1.1.2=py37h90e4bf4_5
  - defaults/linux-64::numba==0.39.0=py37h04863e7_0
  - defaults/linux-64::numexpr==2.6.8=py37hd89afb7_0
  - defaults/linux-64::odo==0.5.1=py37_0
  - defaults/linux-64::pytables==3.4.4=py37ha205bf6_0
  - defaults/linux-64::pytest-arraydiff==0.2=py37h39e3cac_0
  - defaults/linux-64::pytest-astropy==0.4.0=py37_0
  - defaults/linux-64::pytest-doctestplus==0.1.3=py37_0
  - defaults

### 1. Download and Explore 1st Dataset

#### Import BeautifulSoup

In [48]:
# Import the class under its own name, since the cells below call BeautifulSoup(...)
from bs4 import BeautifulSoup
print('bs4 imported.')

bs4 imported.


#### Load and explore the data from PlanetWare

In [50]:
url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')

In [51]:
print(soup.find('title').text)

14 Top-Rated Tourist Attractions in Oslo | PlanetWare


In [32]:
type(soup)

bs4.BeautifulSoup

In [89]:
attraction_place = soup.find_all('h2', class_="sitename")

# Print each heading; keep attraction_place as the list of tags (print() returns None)
for h2 in attraction_place:
    print(h2.text)

1  Vigeland Sculpture Park
2  Akershus Fortress
3  Viking Ship Museum
4  The National Museum
5  Munch Museum
6  Royal Palace
7  The Museum of Cultural History
8  Fram Museum
9  Holmenkollen Ski Jump and Museum
10  Oslo Cathedral
11  City Hall (Rådhuset)
12  Aker Brygge
13  Natural History Museum & Botanical Gardens
14  Oslo Opera House and Annual Music Festivals
 Where to Stay in Oslo for Sightseeing
 Tips and Tours: How to Make the Most of Your Visit to Oslo
 More Related Articles on PlanetWare.com


### Summary of Data to be further Explored and Used
We need just three main features from the website to get started: the list of attraction places, their addresses, and their official websites. Together with the recommended hotel accommodation (the second dataset), these will be sufficient for our exploration and analysis. Backed by a well-grounded facilities plan, this should ultimately convince the Irish Ministry of Education to approve our proposal.

**Foursquare API** will be utilised to obtain the latitude and longitude of the listed attraction places.

#### List of Attraction Centers or Museums

#### Addresses of Centers or Museums

### Note
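Note that BeautifulSoup has picked up three extra h2 tags after the 14th and last museum ("Where to Stay in Oslo for Sightseeing", "Tips and Tours", and "More Related Articles"). We have to clean these up before the final data exploration and analysis.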
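One possible way to do this clean-up is to keep only the headings that begin with a rank number and strip that number off. The sketch below is a minimal example under that assumption; the helper name `clean_attraction_names` is hypothetical, and the sample strings are copied from the printed output above rather than scraped live.

```python
import re

def clean_attraction_names(titles):
    """Keep only headings that start with a rank number and drop the number.

    Assumes every genuine attraction heading looks like '<rank>  <name>',
    as in the printed output, and that the extra headings have no rank.
    """
    names = []
    for title in titles:
        match = re.match(r"^\s*\d+\s+(.+?)\s*$", title)
        if match:
            names.append(match.group(1))
    return names

# Sample headings copied from the output above (abbreviated).
sample_titles = [
    "1  Vigeland Sculpture Park",
    "2  Akershus Fortress",
    "14  Oslo Opera House and Annual Music Festivals",
    " Where to Stay in Oslo for Sightseeing",
    " More Related Articles on PlanetWare.com",
]

cleaned = clean_attraction_names(sample_titles)
print(cleaned)
```

In the notebook, the same function could be applied to `[h2.text for h2 in attraction_place]`.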

In [86]:
official_site=soup.find_all('div', class_="web")
print(official_site[0:2])

[<div class="web">
<span>Official site: </span><a href="http://www.vigeland.museum.no/en/vigeland-park" onclick="ga('send', 'event', 'externalsite', 'site', 'http://www.vigeland.museum.no/en/vigeland-park');" rel="nofollow" target="_blank">www.vigeland.museum.no/en/vigeland-park</a>
</div>, <div class="web">
<span>Official site: </span><a href="http://www.khm.uio.no/english/visit-us/viking-ship-museum/" onclick="ga('send', 'event', 'externalsite', 'site', 'http://www.khm.uio.no/english/visit-us/viking-ship-museum/');" rel="nofollow" target="_blank">www.khm.uio.no/english/visit-us/viking-ship-museum/</a>
</div>]
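The output above shows that the second link belongs to the Viking Ship Museum, while the second attraction in the list is Akershus Fortress, so not every attraction appears to have an official site and the links cannot simply be paired with the names by position. As a first step, the bare URLs can be pulled out of each block. The sketch below uses Python's standard-library `HTMLParser` instead of BeautifulSoup so that it runs on its own; the class name `OfficialSiteParser` is hypothetical, and the sample HTML is a trimmed copy of the output above.

```python
from html.parser import HTMLParser

class OfficialSiteParser(HTMLParser):
    """Collect the href of the first link inside each <div class="web">."""

    def __init__(self):
        super().__init__()
        self.in_web_div = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and "web" in (attrs.get("class") or "").split():
            self.in_web_div = True
        elif tag == "a" and self.in_web_div and "href" in attrs:
            self.links.append(attrs["href"])
            self.in_web_div = False  # only the first link per block

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_web_div = False

# Trimmed copy of the two blocks printed above (onclick attribute omitted).
sample_html = """<div class="web">
<span>Official site: </span><a href="http://www.vigeland.museum.no/en/vigeland-park" rel="nofollow" target="_blank">www.vigeland.museum.no/en/vigeland-park</a>
</div><div class="web">
<span>Official site: </span><a href="http://www.khm.uio.no/english/visit-us/viking-ship-museum/" rel="nofollow" target="_blank">www.khm.uio.no/english/visit-us/viking-ship-museum/</a>
</div>"""

parser = OfficialSiteParser()
parser.feed(sample_html)
official_links = parser.links
print(official_links)
```

With the `official_site` list already in the notebook, the likely BeautifulSoup equivalent is `[div.a['href'] for div in official_site]`. Matching each link to its attraction would still need the page structure to be inspected.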


In [96]:
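address_tags = soup.find_all('p', class_="nospc")
address = []
for tag in address_tags:
    # Skip paragraphs without an "Address:" label instead of raising IndexError
    if "Address:" in tag.text:
        address.append(tag.text.split("Address:")[1].strip())
print(address)
type(address)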


['Nobels gate 32, N-0268 Oslo', 'Akershus Festning, 0015 Oslo', 'Frederiks gate 2, 0164 Oslo', 'Universitetsgata 13, Oslo', 'Tøyengata 53, 0578 Oslo', 'Bellevue, Oslo', 'Frederiks gate 2, 0164 Oslo', 'Bygdøynesveien 39, 0286 Oslo', 'Kongeveien 5, 0787 Oslo', 'Karl Johansgt. 11, 0154 Oslo', 'Rådhuset, 0037 Oslo', 'Bryggegata 9, 0120 Oslo', 'Sars gate 1, 0562 Oslo', 'Kirsten Flagstads Plass 1, 0150 Oslo']


list

#### Official websites of Centers or Museums

In [35]:
CLIENT_ID = 'YVPBNCUL4G2JQ15FSH1WO234FVZHAS31GSEWRRVBHZ4MMMRI' # your Foursquare ID
CLIENT_SECRET = 'LSEXPCJQZIXVEFYLSCAXDBVNVW0H10BX3L1U5IW2M23EU2EX' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials: MOJ')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials: MOJ
CLIENT_ID: YVPBNCUL4G2JQ15FSH1WO234FVZHAS31GSEWRRVBHZ4MMMRI
CLIENT_SECRET:LSEXPCJQZIXVEFYLSCAXDBVNVW0H10BX3L1U5IW2M23EU2EX
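Since the notebook is shared, one option is to read the Foursquare credentials from environment variables and print only a masked form. The sketch below is a minimal example of that idea; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions for illustration, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping (by default the process environment).

    The variable names are hypothetical; use whatever names your setup defines.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Demonstration with a stand-in mapping instead of real credentials.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id-1234", "FOURSQUARE_CLIENT_SECRET": "demo-secret-5678"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
print("CLIENT_SECRET:", mask(demo_secret))
```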


#### Using Nominatim (geopy) to obtain Museum Locations

In [34]:
address = 'Nobels gate 32, N-0268 Oslo'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

59.922816 10.700466


#### Location of the First Venue

Let's take a look at the location (latitude and longitude) of the first venue in this list, 'Nobels gate 32, N-0268 Oslo', located in the heart of Oslo.
In the main data exploration we shall use a JSON file or a Python `for` loop to generate the locations of all 14 venues.
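The `for` loop mentioned above could look roughly like the sketch below. It is a minimal example, not the final implementation: the helper `geocode_all` is hypothetical, it takes the geocoding function as a parameter so that it can be tried without network access, and it pauses between lookups because the public Nominatim service asks for at most about one request per second.

```python
import time

def geocode_all(addresses, geocode, pause_seconds=1.0):
    """Return {address: (latitude, longitude) or None} for each distinct address.

    `geocode` is any callable that returns an object with .latitude and
    .longitude attributes, or None when the address cannot be resolved
    (this is how geopy's Nominatim.geocode behaves).
    """
    coordinates = {}
    # dict.fromkeys drops duplicate addresses while keeping their order
    for index, addr in enumerate(dict.fromkeys(addresses)):
        if index and pause_seconds:
            time.sleep(pause_seconds)  # be polite to the geocoding service
        location = geocode(addr)
        coordinates[addr] = (location.latitude, location.longitude) if location else None
    return coordinates

# In the notebook this might be called as follows (needs network access;
# `addresses` stands for the list of 14 scraped address strings):
# geolocator = Nominatim(user_agent="foursquare_agent")
# coordinates = geocode_all(addresses, geolocator.geocode)
```

Returning `None` for unresolved addresses keeps vague entries such as 'Bellevue, Oslo' visible so they can be corrected by hand if the geocoder fails on them.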

In [37]:
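# Download the museum list
# Note: the server redirects this URL to the HTML page (see the log below),
# so the saved file contains HTML rather than JSON despite its .json name.
!wget -O museum_list.json https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm/lists
print('Museum json file downloaded!')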

--2019-06-05 09:41:34--  https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm/lists
Resolving www.planetware.com (www.planetware.com)... 52.22.101.10, 54.88.227.104, 107.23.224.123
Connecting to www.planetware.com (www.planetware.com)|52.22.101.10|:443... connected.
HTTP request sent, awaiting response... 301 Permanently Moved
Location: https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm [following]
--2019-06-05 09:41:34--  https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm
Reusing existing connection to www.planetware.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘museum_list.json’

museum_list.json        [ <=>                ] 135.31K  --.-KB/s    in 0.06s   

2019-06-05 09:41:35 (2.32 MB/s) - ‘museum_list.json’ saved [138554]

Museum json file downloaded!
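The log above reports a redirect and a `text/html` response, so the saved file is very likely not valid JSON. Before passing such a file to `json.load`, a quick check like the sketch below can help; the helper name `parse_json_or_none` is hypothetical.

```python
import json

def parse_json_or_none(text):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# A JSON document parses; an HTML page does not.
print(parse_json_or_none('{"venues": []}'))
print(parse_json_or_none('<!DOCTYPE html><html></html>'))

# In the notebook the check could be applied to the downloaded file:
# with open('museum_list.json', encoding='utf-8') as fh:
#     data = parse_json_or_none(fh.read())
```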


In [97]:
df = pd.DataFrame(columns=['Attraction_Center', 'Address', 'Official_website', 'Longitude', 'Latitude'])
df

Unnamed: 0,Attraction_Center,Address,Official_website,Longitude,Latitude
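To give an idea of how the empty DataFrame might later be filled, the sketch below assembles one record per attraction from the scraped pieces. It is only an illustration: the helper `build_rows` is hypothetical, the column names are assumed to follow the DataFrame above, and the sample values are the first two attractions and the one coordinate pair shown earlier in this notebook.

```python
COLUMNS = ['Attraction_Center', 'Address', 'Official_website', 'Longitude', 'Latitude']

def build_rows(names, addresses, websites, coordinates):
    """Combine the scraped pieces into one dict per attraction.

    names and addresses are parallel lists; websites maps a name to its URL
    and coordinates maps an address to (latitude, longitude). Missing
    entries become None.
    """
    rows = []
    for name, addr in zip(names, addresses):
        latitude, longitude = coordinates.get(addr) or (None, None)
        rows.append({
            'Attraction_Center': name,
            'Address': addr,
            'Official_website': websites.get(name),
            'Longitude': longitude,
            'Latitude': latitude,
        })
    return rows

rows = build_rows(
    names=['Vigeland Sculpture Park', 'Akershus Fortress'],
    addresses=['Nobels gate 32, N-0268 Oslo', 'Akershus Festning, 0015 Oslo'],
    websites={'Vigeland Sculpture Park': 'http://www.vigeland.museum.no/en/vigeland-park'},
    coordinates={'Nobels gate 32, N-0268 Oslo': (59.922816, 10.700466)},
)
print(rows[0])

# With pandas available, the records can be turned into the DataFrame:
# df = pd.DataFrame(rows, columns=COLUMNS)
```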


### We proceed to Battle of Neighbourhoods Week 2 with an instantiated DataFrame to work with