<a href="https://colab.research.google.com/github/ArfaKhalid/Geospatial-Analysis/blob/main/Manipulating_Geospatial_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Manipulation In Geospatial Data Analysis

# Introduction
- Finding the location with just the name of a place
- Joining data based on spatial relationships

In [1]:
# Import libraries
import pandas as pd
import geopandas as gpd
import numpy as np
import folium
from folium import Marker
import warnings
warnings.filterwarnings('ignore')

# Geocoding
- Geocoding is the process of converting the name of a place or an address to a location on a map.
- When you look up a place by its landmark description on Google Maps, Bing Maps, or Baidu Maps, you are using a geocoder.

In [2]:
from geopy.geocoders import Nominatim

# Nominatim
- Nominatim is the geocoding software that will be used to generate locations.
- Begin by instantiating the geocoder. Then we need only supply the name or address as a Python string. (In this case, we supply "Pyramid of Khufu", also known as the Great Pyramid of Giza.)
- If the geocoding is successful, it returns a geopy.location.Location object with two important attributes:

1. The "point" attribute contains the (latitude, longitude) location, and
2. The "address" attribute contains the full address.
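The list above covers the successful case. When nothing matches the query, geocode() returns None, so reading the attributes directly would raise an AttributeError. The sketch below shows one way to guard against that. The summarize helper and the stand-in object (including its address string) are hypothetical and make no network call; they only mimic the two attributes described above.

```python
from types import SimpleNamespace


def summarize(location):
    """Return (latitude, longitude, address), or None if geocoding found nothing."""
    if location is None:  # geocode() returns None when there is no match
        return None
    return (location.point.latitude, location.point.longitude, location.address)


# Hypothetical stand-in that mimics a Location object; the address is illustrative
stand_in = SimpleNamespace(
    point=SimpleNamespace(latitude=29.97916, longitude=31.134215625236113),
    address="Pyramid of Khufu, Giza, Egypt",
)

found = summarize(stand_in)
missing = summarize(None)
print(found)
print(missing)
```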

In [3]:
geolocator = Nominatim(user_agent='kaggle_learn')
location = geolocator.geocode("Pyramid of Khufu")
print(location.point)
print(location.address)

29 58m 44.976s N, 31 8m 3.17625s E
هرم خوفو, شارع ابو الهول السياحي, كوم الأخضر, الجيزة, 12125, مصر


# Attributes
- The value for the "point" attribute is a geopy.point.Point object.
- We can get the latitude and longitude from the latitude and longitude attributes.

In [4]:
point = location.point
print("Latitude:", point.latitude)
print("Longitude:", point.longitude)

Latitude: 29.97916
Longitude: 31.134215625236113
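These decimal values agree with the degrees/minutes/seconds string printed earlier for location.point. As a quick check, the conversion can be sketched by hand: decimal degrees = degrees + minutes/60 + seconds/3600. The helper name dms_to_decimal is an assumption made for this illustration and is not part of geopy.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention
    return -value if hemisphere in ("S", "W") else value


# Values taken from the printed point: 29 58m 44.976s N, 31 8m 3.17625s E
latitude = dms_to_decimal(29, 58, 44.976, "N")
longitude = dms_to_decimal(31, 8, 3.17625, "E")
print(latitude, longitude)
```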


In [6]:
# It's often the case that we'll need to geocode many different addresses.
universities = pd.read_csv("top_universities.csv")
universities.head()

Unnamed: 0,Name
0,University of Oxford
1,University of Cambridge
2,Imperial College London
3,ETH Zurich
4,UCL


# Geocoder
- We can use a lambda function to apply the geocoder to every row in the DataFrame. (We use a try/except statement to handle the case where geocoding is unsuccessful.)
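Applying the geocoder row by row sends one request per address. The public Nominatim server asks clients to stay at or below one request per second, so it can help to space the calls out. The sketch below shows the idea with the standard library only. The throttled wrapper and the fake_geocode stand-in are hypothetical names used for illustration, and no network call is made. geopy also provides geopy.extra.rate_limiter.RateLimiter for the same purpose.

```python
import time


def throttled(func, min_delay_seconds=1.0):
    """Wrap func so that successive calls start at least min_delay_seconds apart."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


def fake_geocode(name):
    """Hypothetical stand-in for geolocator.geocode; returns None when nothing matches."""
    known = {"ETH Zurich": (47.413218, 8.537491)}
    return known.get(name)


# A short delay keeps the demonstration fast; a real run would use about 1 second
geocode = throttled(fake_geocode, min_delay_seconds=0.05)

start = time.monotonic()
results = [geocode(n) for n in ["ETH Zurich", "Nowhere University", "ETH Zurich"]]
elapsed = time.monotonic() - start
print(results)
```

With a real geocoder, the wrapped function would be passed to apply in place of the direct geolocator.geocode call.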

In [7]:
def my_geocoder(name):
    """Return the latitude and longitude for a place name, or NaN values on failure."""
    try:
        # geocode() returns None when nothing is found, so .point raises AttributeError
        point = geolocator.geocode(name).point
        return pd.Series({'Latitude': point.latitude, 'Longitude': point.longitude})
    except Exception:
        return pd.Series({'Latitude': np.nan, 'Longitude': np.nan})

universities[['Latitude', 'Longitude']] = universities.apply(lambda x: my_geocoder(x['Name']), axis=1)

print("{}% of addresses were geocoded!".format(
    (1 - universities["Latitude"].isna().sum() / len(universities)) * 100))

# Drop universities that were not successfully geocoded
universities = universities.loc[universities["Latitude"].notna()]
universities = gpd.GeoDataFrame(
    universities,
    geometry=gpd.points_from_xy(universities.Longitude, universities.Latitude),
    crs="EPSG:4326")
universities.head()




95.0% of addresses were geocoded!


Unnamed: 0,Name,Latitude,Longitude,geometry
0,University of Oxford,33.65029,-117.828179,POINT (-117.82818 33.65029)
1,University of Cambridge,52.210946,0.092005,POINT (0.09200 52.21095)
2,Imperial College London,51.498959,-0.175641,POINT (-0.17564 51.49896)
3,ETH Zurich,47.413218,8.537491,POINT (8.53749 47.41322)
4,UCL,51.521785,-0.135151,POINT (-0.13515 51.52179)


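# Visualizing the locations returned by the geocoder
- Notice that a few of the locations are clearly inaccurate because they are not in Europe. For example, the table above places the University of Oxford at latitude 33.65, longitude -117.83, which is in North America.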

In [8]:
# Create a map
m = folium.Map(location=[54,15], tiles='openstreetmap', zoom_start=2)

# Add points to the map
for idx, row in universities.iterrows():
    Marker([row['Latitude'], row['Longitude']], popup=row['Name']).add_to(m)

# Display the map
m
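Besides spotting stray markers on the map, the out-of-Europe results can be flagged programmatically. The sketch below is a minimal version that compares each coordinate with a rough bounding box. The box limits and the helper name outside_bbox are assumptions chosen for illustration, and a rectangle is only a coarse approximation of Europe.

```python
# Rough bounding box for Europe in degrees; these limits are an illustrative assumption
EUROPE_BBOX = {"min_lat": 34.0, "max_lat": 72.0, "min_lon": -25.0, "max_lon": 45.0}


def outside_bbox(rows, bbox=EUROPE_BBOX):
    """Return the names of rows whose (latitude, longitude) fall outside bbox."""
    flagged = []
    for name, lat, lon in rows:
        in_lat = bbox["min_lat"] <= lat <= bbox["max_lat"]
        in_lon = bbox["min_lon"] <= lon <= bbox["max_lon"]
        if not (in_lat and in_lon):
            flagged.append(name)
    return flagged


# Three rows copied from the geocoding output above
rows = [
    ("University of Oxford", 33.65029, -117.828179),
    ("University of Cambridge", 52.210946, 0.092005),
    ("ETH Zurich", 47.413218, 8.537491),
]

suspicious = outside_bbox(rows)
print(suspicious)
```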

# Table joins
How to combine data from different sources.
## Attribute join
- pd.DataFrame.join() is used to combine information from multiple DataFrames with a shared index.
- Joining data by simply matching values in the index is called an attribute join.
- To perform an attribute join with a GeoDataFrame, it is best to use gpd.GeoDataFrame.merge().
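To make the two approaches concrete, here is a small sketch using plain pandas. The tables are tiny stand-ins whose population figures are copied from the europe_stats output in this notebook, and Belarus is deliberately left out of the stats table to show what happens to an unmatched row. The how, indicator, and validate arguments are optional extras and are not used elsewhere in this notebook.

```python
import pandas as pd

# Tiny stand-in tables; Belarus is intentionally missing from stats
boundaries = pd.DataFrame({"name": ["Norway", "Sweden", "Belarus"]})
stats = pd.DataFrame({"name": ["Norway", "Sweden"], "pop_est": [5347896.0, 10285453.0]})

# Attribute join on a shared column; how="left" keeps every row of boundaries,
# indicator=True adds a _merge column showing where each row came from,
# and validate raises an error if a name appears more than once on either side
merged = boundaries.merge(stats, on="name", how="left", indicator=True, validate="one_to_one")

# The same join expressed through a shared index with DataFrame.join()
joined = boundaries.set_index("name").join(stats.set_index("name"))

print(merged)
print(joined)
```

Rows marked left_only in the _merge column had no partner in the stats table, which is a quick way to find names that are spelled differently in the two sources.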

In [9]:
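# Get the boundaries of every country in Europe
# (gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0; newer versions need a local copy of this dataset)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
europe = world.loc[world.continent == 'Europe'].reset_index(drop=True)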
europe_stats = europe[["name", "pop_est", "gdp_md_est"]]
europe_boundaries = europe[["name","geometry"]]

In [10]:
europe_boundaries.head()

Unnamed: 0,name,geometry
0,Russia,"MULTIPOLYGON (((180.00000 71.51571, 180.00000 ..."
1,Norway,"MULTIPOLYGON (((15.14282 79.67431, 15.52255 80..."
2,France,"MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3..."
3,Sweden,"POLYGON ((11.02737 58.85615, 11.46827 59.43239..."
4,Belarus,"POLYGON ((28.17671 56.16913, 29.22951 55.91834..."


In [11]:
# We'll join it with a DataFrame europe_stats containing the estimated population and gross domestic product (GDP) for each country.
europe_stats.head()

Unnamed: 0,name,pop_est,gdp_md_est
0,Russia,144373535.0,1699876
1,Norway,5347896.0,403336
2,France,67059887.0,2715518
3,Sweden,10285453.0,530883
4,Belarus,9466856.0,63080


# Performing the attribute join
- The on argument is set to the column name that is used to match rows in europe_boundaries to rows in europe_stats.

In [12]:
# Use an attribute join to merge data about countries in Europe
europe = europe_boundaries.merge(europe_stats, on="name")
europe.head()

Unnamed: 0,name,geometry,pop_est,gdp_md_est
0,Russia,"MULTIPOLYGON (((180.00000 71.51571, 180.00000 ...",144373535.0,1699876
1,Norway,"MULTIPOLYGON (((15.14282 79.67431, 15.52255 80...",5347896.0,403336
2,France,"MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3...",67059887.0,2715518
3,Sweden,"POLYGON ((11.02737 58.85615, 11.46827 59.43239...",10285453.0,530883
4,Belarus,"POLYGON ((28.17671 56.16913, 29.22951 55.91834...",9466856.0,63080


## Spatial join
- With a spatial join, we combine GeoDataFrames based on the spatial relationship between the objects in the "geometry" columns.
- The function for this is gpd.sjoin().


In [14]:
# We already have a GeoDataFrame, universities, containing the geocoded locations of European universities.
# A spatial join with gpd.sjoin() matches each university to its corresponding country in Europe.
european_universities = gpd.sjoin(universities, europe)

# Investigate the result
print("We located {} universities.".format(len(universities)))
print("Only {} of the universities were located in Europe (in {} different countries).".format(
    len(european_universities), len(european_universities.name.unique())))

european_universities.head()

We located 95 universities.
Only 89 of the universities were located in Europe (in 15 different countries).


Unnamed: 0,Name,Latitude,Longitude,geometry,index_right,name,pop_est,gdp_md_est
1,University of Cambridge,52.210946,0.092005,POINT (0.09200 52.21095),28,United Kingdom,66834405.0,2829108
2,Imperial College London,51.498959,-0.175641,POINT (-0.17564 51.49896),28,United Kingdom,66834405.0,2829108
4,UCL,51.521785,-0.135151,POINT (-0.13515 51.52179),28,United Kingdom,66834405.0,2829108
5,London School of Economics and Political Science,51.514261,-0.116734,POINT (-0.11673 51.51426),28,United Kingdom,66834405.0,2829108
6,University of Edinburgh,55.944076,-3.188374,POINT (-3.18837 55.94408),28,United Kingdom,66834405.0,2829108


# Interpretation
- The spatial join looks at the "geometry" columns in both GeoDataFrames.
- If a Point object from the universities GeoDataFrame intersects a Polygon object from the europe GeoDataFrame, the corresponding rows are combined and added as a single row of the european_universities GeoDataFrame.
- Countries without a matching university (and universities without a matching country) are omitted from the results, because the default is an inner join.
- The gpd.sjoin() function is customizable for different types of joins through the how and predicate arguments (predicate was named op in older GeoPandas releases). For instance, you can do the equivalent of a SQL left (or right) join by setting how='left' (or how='right').
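To illustrate the matching logic described above, the following is a conceptual sketch in plain Python. It is not how GeoPandas implements gpd.sjoin(), which relies on a spatial index and proper geometry predicates. The rectangles, the site names, and both helper functions are hypothetical, and points lying exactly on a boundary are not handled carefully here.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons, how="inner"):
    """Pair each point with the polygons containing it; how='left' keeps unmatched points."""
    rows = []
    for point_name, (x, y) in points.items():
        matches = [name for name, poly in polygons.items() if point_in_polygon(x, y, poly)]
        if matches:
            rows.extend((point_name, match) for match in matches)
        elif how == "left":
            rows.append((point_name, None))
    return rows


# Hypothetical rectangular regions and sites, not real borders or coordinates
regions = {
    "region_a": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "region_b": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
sites = {"site_1": (5, 5), "site_2": (25, 3), "site_3": (15, 5)}

inner = spatial_join(sites, regions)
left = spatial_join(sites, regions, how="left")
print(inner)
print(left)
```

In the inner result, site_3 disappears because it falls inside neither region. With how="left" it is kept and paired with None, which mirrors the SQL-style left join mentioned above.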