# Capitals from Europe

## 1. Collecting the Data
The first step is to colect the data that we are going to use to solve the problem. For that, we are going to use the BeautifulSoup library to scrape a wikipedia page to get all the countries of Europe with its respectives capitals. Then we will use Nominatim to get the latitude and longitude of all capitals. Finally, we will use the Foursquare API to search for venues in this locals. In the end, we will write the data in a file named **europe_venues_complete.csv**.  

 ## Importing the libraries

In [3]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import folium
import json

### Making a request to the Wikipedia page

In [4]:
url = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Europe"
r = requests.get(url)
html = r.text

### Using Beautiful Soup to extract all the relevant information

In [5]:
soup = BeautifulSoup(html,'lxml')
table = soup.find_all('tr')##get all the items that are in the table
rows = table[2:51]
Countries = []
Capitals = []
for row in rows:
    columns = row.find_all('td')
    country = columns[2].contents[0].text
    try:
        capital = columns[4].contents[0].text
    except:
        capital = columns[4].contents[0]
    Countries.append(country)
    Capitals.append(capital)    

### Creating a Dataframe with all the data that we got

In [6]:
df_europe = pd.DataFrame({'Country':Countries,'Capital':Capitals})
df_europe.head(10)

Unnamed: 0,Country,Capital
0,Albania,Tirana
1,Andorra,Andorra la Vella
2,Armenia,Yerevan
3,Austria,Vienna
4,Azerbaijan,Baku
5,Belarus,Minsk
6,Belgium,Brussels
7,Bosnia and Herzegovina,Sarajevo
8,Bulgaria,Sofia
9,Croatia,Zagreb


### Getting Latitude and Longitude of the countries

In [8]:
Latitude = []
Longitude = []
for country, city in zip(df_europe['Country'],df_europe['Capital']):
    adress = "{}, {}".format(city,country)
    print(adress)
    geolocator = Nominatim(user_agent = "europe_explorer",timeout = 5)
    local = geolocator.geocode(adress)
    Latitude.append(local.latitude)
    Longitude.append(local.longitude)
df_europe.insert(len(df_europe.columns),'Latitude', Latitude)
df_europe.insert(len(df_europe.columns),'Longitude', Longitude)
df_europe.head()

Tirana, Albania
Andorra la Vella, Andorra
Yerevan, Armenia
Vienna, Austria
Baku, Azerbaijan
Minsk, Belarus
Brussels, Belgium
Sarajevo, Bosnia and Herzegovina
Sofia, Bulgaria
Zagreb, Croatia
Nicosia, Cyprus
Prague, Czech Republic
Copenhagen, Denmark
Tallinn, Estonia
Helsinki, Finland
Paris, France
Tbilisi, Georgia
Berlin, Germany
Athens, Greece
Budapest, Hungary
Reykjavík, Iceland
Dublin, Ireland
Rome, Italy
Astana, Kazakhstan
Riga, Latvia
Vaduz, Liechtenstein
Vilnius, Lithuania
Luxembourg, Luxembourg
Valletta, Malta
Chișinău, Moldova
Monaco, Monaco
Podgorica, Montenegro
Amsterdam, Netherlands
Skopje, North Macedonia
Oslo, Norway
Warsaw, Poland
Lisbon, Portugal
Bucharest, Romania
Moscow, Russia
San Marino, San Marino
Belgrade, Serbia
Bratislava, Slovakia
Ljubljana, Slovenia
Madrid, Spain
Stockholm, Sweden
Bern, Switzerland
Ankara, Turkey
Kiev, Ukraine
London, United Kingdom


Unnamed: 0,Country,Capital,Latitude,Longitude
0,Albania,Tirana,41.327946,19.818532
1,Andorra,Andorra la Vella,42.506939,1.521247
2,Armenia,Yerevan,40.177612,44.512585
3,Austria,Vienna,48.208354,16.372504
4,Azerbaijan,Baku,40.375443,49.832675


## 2. Creating a Map for visualization

In [9]:
adress = "Europe"
geolocator = Nominatim(user_agent = "europe_explorer",timeout=3)
local = geolocator.geocode(adress)
latitude = local.latitude
longitude = local.longitude
print(latitude,longitude)
europe = folium.Map(location=[latitude,longitude], zoom_start = 3)
for lat,long,country,city in zip(df_europe['Latitude'],df_europe['Longitude'],df_europe['Country'],df_europe['Capital']):
    label = "{}, {}".format(city,country)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,long],
        radius = 5,
        popup = label,
        color= 'red',
        fill = True,
        fill_opacity=0.7,
        parse_html= False).add_to(europe)
europe

51.0 10.0


## 3. Using the Fousquare API to colect the venues

### Setting the Fousquare parameters 
Note: The Foursquare_Developer.json is a file in my local machine that contain My client_Id and the my client_secret for the Foursquare API.

In [10]:
with open('Foursquare_Developer.json') as fs:
    credentials = json.load(fs)
CLIENT_ID = credentials["Client ID"] 
CLIENT_SECRET = credentials["Client SECRET"] 
VERSION = '20180605'
RADIUS = 20000
LIMIT = 200

The function below, makes requests to the API to get the nearby venues for each latitude and longitude given and return a dataframe with the data.

In [11]:
def getNearbyVenues(countries, cities, latitudes, longitudes, radius):
    
    venues_list=[]
    section = ['food','drinks','coffee','shops','arts','outdoors','sights'] 
    for country, city, lat, lng in zip(countries, cities, latitudes, longitudes):
        print(country)
        for s in section:
            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&section={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                radius, 
                LIMIT)
            
            # make the GET request
            try:
                results = requests.get(url).json()["response"]['groups'][0]['items']
            except:
                print(requests.get(url).json())
            # return only relevant information for each nearby venue
            venues_list.append([(
                country,
                city, 
                lat, 
                lng,
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name'],
                v['venue']['id']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Country',
                             'Capital', 
                  'Capital Latitude', 
                  'Capital Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                 'Venue Id']
    
    return(nearby_venues)

Calling the function above for each neighborhood in our datframe to get the venues

In [None]:
europe_venues = getNearbyVenues(countries =df_europe['Country'],
                                cities = df_europe['Capital'], 
                               latitudes= df_europe['Latitude'],
                               longitudes=df_europe['Longitude'], radius = RADIUS)
europe_venues.shape

### Removing Possible Duplicates


In [43]:
print(europe_venues.shape)
europe_venues = europe_venues.drop_duplicates()
print(europe_venues.shape)

(31148, 8)
(25322, 8)


In [None]:
## Unmark the comments in case you want to explore the data.
# pd.set_option('display.max_rows', 600)
# europe_venues.loc[europe_venues['Capital']== 'Lisbon'].head(10)

### Writing our Data in a File

In [44]:
europe_venues.to_csv('europe_venues_complete.csv')
europe_venues.shape

(25322, 8)