# Get Coordinates - Using Nominatim

### Content:
1. Introduction
2. Get capital cities' coordinates
3. Data preparation for cluster analysis
4. Summary
5. References

### 1. Introduction:

### Library installation and import preparation:

In [1]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import folium # Import folium to visualize the data on a map
import time

from geopy.geocoders import Nominatim # Module to convert an address into latitude and longitude values

### 2. Get capital cities' coordinates:

In [2]:
# Load "Capital_Cities.csv" data:
capital_raw = pd.read_csv('Capital_Cities.csv')

capital_raw.head()

Unnamed: 0,Country,Capital City
0,Abkhazia,Sukhumi
1,Afghanistan,Kabul
2,Akrotiri and Dhekelia,Episkopi Cantonment
3,Albania,Tirana
4,Algeria,Algiers


#### 2.1. *get_loc_coords()* function:  
This function will make it easier to get coordinates for the cities in our dataframe.  
The return value of this function is a tuple with a single city's latitude and longitude values.

In [3]:
# This function will return the coordinates of the point of interest
def get_loc_coords (city, country):
    
    '''
    Takes a city's name and its country and uses Nominatim to get its coordinates.
    '''
    
    # Set the search variable - "City" + "Country":
    address = city + ', ' + country
    
    # Get coordinates of "address":
    geolocator = Nominatim(user_agent = 'coordinate_finder')
    location = geolocator.geocode(address)
    
    # Assign variables for latitude and longitude:
    lat = location.latitude
    lon = location.longitude
    
    return (lat, lon)
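
One detail worth keeping in mind: geopy's `geocode()` returns `None` when it finds no match, so the attribute access above would raise an `AttributeError` for an unknown place. The sketch below shows one possible way to return NaN values instead. It is only an illustration: `coords_or_nan`, `FakeLocation` and `fake_geocode` are hypothetical names, and the geocoder is passed in as a callable so the example runs offline.

```python
import math

def coords_or_nan(geocode, city, country):
    """Return (lat, lon) for 'city, country', or (nan, nan) when nothing is found.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude / .longitude, or None. It is a hypothetical
    stand-in for Nominatim(...).geocode.
    """
    location = geocode(f"{city}, {country}")
    if location is None:
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)


# Offline stand-in for a geocoder, for illustration only.
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

def fake_geocode(address):
    known = {"Tirana, Albania": FakeLocation(41.315886, 19.900912)}
    return known.get(address)  # None when the address is unknown

found = coords_or_nan(fake_geocode, "Tirana", "Albania")
missing = coords_or_nan(fake_geocode, "Nowhere", "Atlantis")
```

In the notebook itself, the `geocode` argument would presumably be `geolocator.geocode`.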

#### 2.2. *add_city_coords()* function:  
This function uses *__get_loc_coords()__* to add coordinates to every location in a dataframe.  
The *__add_city_coords()__* function takes a dataframe, a country column name and a city column name, then returns that dataframe with a latitude and a longitude column added for each city.

In [5]:
def add_city_coords (dataframe, country_colname, city_colname):
    
    """
    Takes a dataframe, country column name and a city column name,
    and adds each city's coordinates to the dataframe.
    """
    
    # A list to store coordinates:
    coordinates_list = []
    
    # Call the "get_loc_coords()" function for each city in the dataframe:
    for i in range (len(dataframe)):
        
        # Use "try" in case a city/country name is rejected or not found:
        try:
            coordinates_list.append(get_loc_coords(dataframe[city_colname][i], dataframe[country_colname][i]))
            
        except Exception:
            coordinates_list.append((np.nan, np.nan))
            
        time.sleep(1)

    
    # Add coordinates_list to the dataframe:
    coordinates = pd.DataFrame(coordinates_list)
    dataframe['City Latitude'] = coordinates[0]
    dataframe['City Longitude'] = coordinates[1]
    
    return dataframe
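
Note that `dataframe[city_colname][i]` looks rows up by index label, so the loop above assumes the default `0..n-1` index. With any other index the lookups raise a `KeyError`, which the `try` block turns into NaN values. The following is a minimal sketch of a position-based variant, not a replacement for the function above. `add_coords_positional` and `stub_lookup` are hypothetical names, and the one-second pause is left out only because the stub never contacts Nominatim.

```python
import numpy as np
import pandas as pd

def add_coords_positional(dataframe, lookup, country_colname, city_colname):
    """Variant of add_city_coords() that does not depend on the index labels.

    `lookup(city, country)` is a hypothetical callable returning (lat, lon);
    in the notebook it would be get_loc_coords.
    """
    coords = []
    for city, country in zip(dataframe[city_colname], dataframe[country_colname]):
        try:
            coords.append(lookup(city, country))
        except Exception:
            coords.append((np.nan, np.nan))

    result = dataframe.copy()
    # Build the columns from plain lists so no index alignment happens.
    result['City Latitude'] = [lat for lat, _ in coords]
    result['City Longitude'] = [lon for _, lon in coords]
    return result


def stub_lookup(city, country):
    table = {('Kabul', 'Afghanistan'): (34.526013, 69.177648)}
    return table[(city, country)]  # KeyError stands in for a failed lookup

demo = pd.DataFrame(
    {'Country': ['Afghanistan', 'Atlantis'], 'Capital City': ['Kabul', 'Nowhere']},
    index=[10, 20],  # deliberately not 0..n-1
)
demo_coords = add_coords_positional(demo, stub_lookup, 'Country', 'Capital City')
```

Unlike the original, this version returns a copy and leaves the input dataframe unchanged, which is a design choice rather than a requirement.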

#### 2.3. Add city coordinates to the dataframe:

In [6]:
# Create new "Capital Coordinates" dataframe:
capital_coordinates = capital_raw.copy()

# Set column names variables for "add_city_coords" function:
country_colname = 'Country'
city_colname = 'Capital City'

We will now call *__add_city_coords()__* on our dataframe.

In [7]:
capital_coordinates = add_city_coords(capital_coordinates, country_colname, city_colname)

In [8]:
capital_coordinates.dropna(inplace = True)
capital_coordinates.head()

Unnamed: 0,Country,Capital City,City Latitude,City Longitude
0,Abkhazia,Sukhumi,43.003363,41.019274
1,Afghanistan,Kabul,34.526013,69.177648
2,Akrotiri and Dhekelia,Episkopi Cantonment,34.670434,32.901855
3,Albania,Tirana,41.315886,19.900912
4,Algeria,Algiers,36.775361,3.060188
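
Before dropping rows it can be useful to see which lookups failed. Also, `dropna()` without a `subset` removes rows with a missing value in any column, not only in the coordinate columns. A small sketch, using made-up rows rather than the real data:

```python
import numpy as np
import pandas as pd

# Illustrative data only: the second row mimics a failed lookup.
sample = pd.DataFrame({
    'Country': ['Albania', 'Atlantis'],
    'Capital City': ['Tirana', 'Nowhere'],
    'City Latitude': [41.315886, np.nan],
    'City Longitude': [19.900912, np.nan],
})

coord_cols = ['City Latitude', 'City Longitude']

# Rows where at least one coordinate is missing:
failed = sample[sample[coord_cols].isna().any(axis=1)]

# Drop rows based on the coordinate columns only:
cleaned = sample.dropna(subset=coord_cols)
```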


Let's export the *__capital_coordinates__* dataframe as a csv file for future analysis.

In [17]:
capital_coordinates.to_csv(r'Capital_Coordinates.csv', index = False)

### 3. Data preparation for cluster analysis:

#### 3.1. Setting up the dataframe:
The cluster analysis planned for later will focus on three regions:  
* Europe 
* Latin America & Caribbean  
* North America

For this reason we will now create a dataframe of capital cities from these three regions and their coordinates.

First, let's load our regions data.

In [10]:
# Load "Country_Region_DB.csv" data:
regions = pd.read_csv('Country_Region_DB.csv')

regions.head()

Unnamed: 0,Country,Region
0,Afghanistan,South Asia
1,Albania,Europe & Central Asia
2,Algeria,Middle East & North Africa
3,Angola,Sub-Saharan Africa
4,Antigua and Barbuda,Latin America & Caribbean


In [11]:
print(\
      'There are {} rows in "capital_coordinates" and {} rows in "region"'\
      .format(capital_coordinates.shape[0], regions.shape[0])\
     )

There are 238 rows in "capital_coordinates" and 191 rows in "region"


Now join *__capital_coordinates__* and *__regions__* on the country name.

In [12]:
# Create new "Capital Region" dataframe by adding regions to the capital coordinates (inner join on "Country"):
capital_regions = capital_coordinates.merge(regions, on = 'Country')

# Create a list of regions to keep in dataframe:
regions_list = ['Europe & Central Asia', 'Latin America & Caribbean', 'North America']

# Keep only selected regions in dataframe:
capital_regions = capital_regions.loc[capital_regions['Region'].isin(regions_list)]


capital_regions.head()

Unnamed: 0,Country,Capital City,City Latitude,City Longitude,Region
1,Albania,Tirana,41.315886,19.900912,Europe & Central Asia
4,Antigua and Barbuda,St. John's,17.118457,-61.844851,Latin America & Caribbean
5,Argentina,Buenos Aires,-34.607568,-58.437089,Latin America & Caribbean
6,Armenia,Yerevan,40.177612,44.512585,Europe & Central Asia
8,Austria,Vienna,48.208354,16.372504,Europe & Central Asia


In [13]:
print(\
'{} are left after the merge between the two dataframes and the focus on the three regions.'\
      .format(capital_regions.shape[0])\
     )

77 are left after the merge between the two dataframes and the focus on the three regions.
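
The merge uses pandas' default inner join, so any country whose name is spelled differently in the two files, or is missing from the regions data, disappears without a warning. One way to list such countries is a left join with `indicator=True`. The sketch below uses a few made-up rows, not the notebook's CSV files, and the variable names are illustrative:

```python
import pandas as pd

# Illustrative data only, not the notebook's CSV files.
capitals = pd.DataFrame({'Country': ['Albania', 'Abkhazia', 'Austria'],
                         'Capital City': ['Tirana', 'Sukhumi', 'Vienna']})
regions_demo = pd.DataFrame({'Country': ['Albania', 'Austria'],
                             'Region': ['Europe & Central Asia'] * 2})

# A left join keeps every capital; "_merge" records where each row was found.
checked = capitals.merge(regions_demo, on='Country', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Country'].tolist()
```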


#### 3.2. Remove Central Asia countries from the dataframe:  
* We have to make one more fix to our data.  
* The Europe region is combined with Central Asia, so we will remove the Central Asian countries from our dataframe by setting an eastern (right) border longitude and a western (left) border longitude.  
* This manually removes any country whose capital lies outside the frame we want to cluster later on.

In [15]:
# Keep countries west of this longitude value:
capital_regions = capital_regions[capital_regions['City Longitude'] < 33.38]

# Keep countries east of this longitude value:
capital_regions = capital_regions[capital_regions['City Longitude'] > -99.2]

# Reset index:
capital_regions.reset_index(drop = True, inplace = True)
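
The two filters above can also be written as a single longitude window. This is a sketch under the assumption that pandas 1.3 or newer is available, since the string form of `inclusive` was added in that version. The constant names are illustrative, and the three rows reuse coordinates shown earlier in this notebook:

```python
import pandas as pd

WEST_BOUND = -99.2   # values taken from the cell above
EAST_BOUND = 33.38

demo = pd.DataFrame({
    'Capital City': ['Vienna', 'Yerevan', 'Buenos Aires'],
    'City Longitude': [16.372504, 44.512585, -58.437089],
})

# Strictly between the two bounds, matching the "<" and ">" comparisons above.
in_window = demo['City Longitude'].between(WEST_BOUND, EAST_BOUND, inclusive='neither')
kept = demo[in_window].reset_index(drop=True)
```

Here Yerevan falls east of the window and is removed, while Vienna and Buenos Aires are kept.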

In [16]:
print(\
'Final dataframe has {} countries.'\
     .format(capital_regions.shape[0])\
     )

Final dataframe has 70 countries.


Let's export the *__capital_regions__* dataframe as a csv file for future analysis.

In [18]:
capital_regions.to_csv(r'Capital_Regions.csv', index = False)

#### 3.3. Visualizing the dataframe:

In [20]:
# create map of capital cities:
world_map = folium.Map(location=[0, 0], zoom_start = 2, tiles='stamentoner')

# Add markers to map
for lat, lng, city, country in zip(capital_regions['City Latitude'],
                                   capital_regions['City Longitude'],
                                   capital_regions['Capital City'],
                                   capital_regions['Country']):
    label = '{}, {}'.format(city, country)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 2,
        popup = label,
        color = '#3765F4',
        fill = True,
        fill_color = '#3765F4',
        fill_opacity = 1).add_to(world_map)
    
world_map

__This image is provided so you can see the results of the last cell's code.__

<img src="Get_Coordinates.jpg">
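
The map above is centered on `[0, 0]`, although the remaining capitals cover only part of the globe. A possible refinement is to compute the bounding box of the plotted points and pass it to folium's `fit_bounds` method. The helper below is a hypothetical sketch in plain Python, and the folium call is only mentioned in a comment, so the example runs without a map:

```python
def bounding_box(latitudes, longitudes):
    """Return [[south, west], [north, east]] for the given points."""
    lats = list(latitudes)
    lons = list(longitudes)
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

# Three capitals from the table in section 3.1.
box = bounding_box([41.315886, 17.118457, -34.607568],
                   [19.900912, -61.844851, -58.437089])

# In the notebook this could be used as: world_map.fit_bounds(box)
```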

### 4. Summary:

In this notebook we:
1. Added coordinate values to the Capital Cities data.  
2. Created the dataframe we will use for the clustering analysis later on, focusing on these regions:  
    * Europe  
    * Latin America & Caribbean  
    * North America 

__Thank you for taking your time to read through this!__

### 5. References

Wikipedia contributors. (2019, December). *List of national capitals*. Retrieved December 28, 2019, from Wikipedia: https://en.wikipedia.org/wiki/List_of_national_capitals

World Bank, Doing Business. (n.d.-a). *Doing Business Indicators [Data file]*. Retrieved from The World Bank: https://databank.worldbank.org/source/doing-business