# Activity 5.01: Plotting Geospatial Data on a Map

In this activity, we will apply the geoplotlib plotting skills from the previous exercises to the new world_cities_pop.csv dataset. The goal is to find the areas in Europe with a dense concentration of cities that have a population of at least 100,000 people.

In [48]:
import numpy as np
import pandas as pd
import geoplotlib
from geoplotlib.utils import read_csv, DataAccessObject, BoundingBox

In [2]:
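# read Region as a string so codes such as '06' keep their leading zero
# (np.str was only an alias for str and has been removed from recent NumPy releases)
df_raw = pd.read_csv('../../Datasets/world_cities_pop.csv', dtype={'Region': str})
df_raw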

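|         | Country | City        | AccentCity  | Region | Population | Latitude   | Longitude |
|---------|---------|-------------|-------------|--------|------------|------------|-----------|
| 0       | ad      | aixas       | Aixàs       | 06     | NaN        | 42.483333  | 1.466667  |
| 1       | ad      | aixirivali  | Aixirivali  | 06     | NaN        | 42.466667  | 1.500000  |
| 2       | ad      | aixirivall  | Aixirivall  | 06     | NaN        | 42.466667  | 1.500000  |
| 3       | ad      | aixirvall   | Aixirvall   | 06     | NaN        | 42.466667  | 1.500000  |
| 4       | ad      | aixovall    | Aixovall    | 06     | NaN        | 42.466667  | 1.483333  |
| ...     | ...     | ...         | ...         | ...    | ...        | ...        | ...       |
| 3173953 | zw      | zimre park  | Zimre Park  | 04     | NaN        | -17.866111 | 31.213611 |
| 3173954 | zw      | ziyakamanas | Ziyakamanas | 00     | NaN        | -18.216667 | 27.950000 |
| 3173955 | zw      | zizalisari  | Zizalisari  | 04     | NaN        | -17.758889 | 31.010556 |
| 3173956 | zw      | zuzumba     | Zuzumba     | 06     | NaN        | -20.033333 | 27.933333 |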


In [3]:
df_raw.dtypes

Country        object
City           object
AccentCity     object
Region         object
Population    float64
Latitude      float64
Longitude     float64
dtype: object

In [4]:
# geoplotlib expects the coordinate columns to be named 'lat' and 'lon'
df = df_raw.rename(columns={'Latitude': 'lat', 'Longitude': 'lon'})
df

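|         | Country | City        | AccentCity  | Region | Population | lat        | lon       |
|---------|---------|-------------|-------------|--------|------------|------------|-----------|
| 0       | ad      | aixas       | Aixàs       | 06     | NaN        | 42.483333  | 1.466667  |
| 1       | ad      | aixirivali  | Aixirivali  | 06     | NaN        | 42.466667  | 1.500000  |
| 2       | ad      | aixirivall  | Aixirivall  | 06     | NaN        | 42.466667  | 1.500000  |
| 3       | ad      | aixirvall   | Aixirvall   | 06     | NaN        | 42.466667  | 1.500000  |
| 4       | ad      | aixovall    | Aixovall    | 06     | NaN        | 42.466667  | 1.483333  |
| ...     | ...     | ...         | ...         | ...    | ...        | ...        | ...       |
| 3173953 | zw      | zimre park  | Zimre Park  | 04     | NaN        | -17.866111 | 31.213611 |
| 3173954 | zw      | ziyakamanas | Ziyakamanas | 00     | NaN        | -18.216667 | 27.950000 |
| 3173955 | zw      | zizalisari  | Zizalisari  | 04     | NaN        | -17.758889 | 31.010556 |
| 3173956 | zw      | zuzumba     | Zuzumba     | 06     | NaN        | -20.033333 | 27.933333 |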


In [5]:
dataset = DataAccessObject(df)
dataset

DataAccessObject(['Country', 'City', 'AccentCity', 'Region', 'Population', 'lat', 'lon'] x 3173958)
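geoplotlib layers read the coordinates from the `lat` and `lon` keys, which is why the columns were renamed before building the DataAccessObject. Before handing millions of rows to a plot, a quick sanity check can catch missing or out-of-range coordinates. The helper `check_coordinates` below is a hypothetical sketch, not part of geoplotlib; the first two sample rows are copied from the output above and the third is made up to trigger the check.

```python
import pandas as pd


def check_coordinates(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: flag rows whose lat/lon are missing or outside valid ranges."""
    missing = frame[["lat", "lon"]].isna().any(axis=1)
    bad_lat = ~frame["lat"].between(-90, 90)     # latitude must lie in [-90, 90]
    bad_lon = ~frame["lon"].between(-180, 180)   # longitude must lie in [-180, 180]
    return missing | bad_lat | bad_lon


sample = pd.DataFrame({
    "City": ["aixas", "zuzumba", "broken"],      # "broken" is a made-up bad row
    "lat": [42.483333, -20.033333, 123.0],
    "lon": [1.466667, 27.933333, 1.0],
})
bad = check_coordinates(sample)
print(sample[bad])
```

On the full dataset, `df[check_coordinates(df)]` would list any rows worth inspecting before plotting.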

In [6]:
# plot the data points on a dot density plot
geoplotlib.dot(
    dataset
)
geoplotlib.show()

In [7]:
# get the number of cities per country
df.groupby(['Country']).agg({'City': 'count'})

| Country | City  |
|---------|-------|
| ad      | 92    |
| ae      | 446   |
| af      | 88749 |
| ag      | 183   |
| ai      | 42    |
| ...     | ...   |
| yt      | 122   |
| za      | 12693 |
| zm      | 13027 |
| zr      | 23012 |


In [8]:
df.groupby(['Country']).size()

Country
ad       92
ae      446
af    88749
ag      183
ai       42
      ...  
yt      122
za    12693
zm    13027
zr    23012
zw     1341
Length: 234, dtype: int64
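`agg({'City': 'count'})` and `size()` agree for the rows shown above, but they are not interchangeable: `count` skips missing values in the chosen column, while `size` counts every row in the group. A small made-up frame shows the difference.

```python
import numpy as np
import pandas as pd

# Made-up rows: one City value is missing in country "ad"
toy = pd.DataFrame({
    "Country": ["ad", "ad", "ae"],
    "City": ["aixas", np.nan, "dubai"],
})

counted = toy.groupby("Country").agg({"City": "count"})["City"]  # non-missing City values per group
sized = toy.groupby("Country").size()                            # all rows per group
print(counted)
print(sized)
```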

In [9]:
# get the average number of cities per country
df.groupby(['Country']).size().mean()

13563.923076923076
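The mean of the per-country group sizes is simply the total row count divided by the number of countries. With the figures printed above, 3,173,958 rows / 234 countries ≈ 13,563.92. The sketch checks that identity on a few made-up rows and then reproduces the arithmetic.

```python
import pandas as pd

# Made-up country codes: ad has 2 rows, ae has 1, af has 3
toy = pd.DataFrame({"Country": ["ad", "ad", "ae", "af", "af", "af"]})

mean_per_country = toy.groupby("Country").size().mean()
shortcut = len(toy) / toy["Country"].nunique()   # total rows / number of groups

# The real dataset: 3,173,958 rows (from the DataAccessObject output) over 234 countries
real_mean = 3_173_958 / 234
print(mean_per_country, shortcut, real_mean)
```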

In [20]:
# extract the countries that have population data for at least one city
country_population = df.groupby('Country')['Population'].sum()
countries_to_include = list(country_population[country_population > 0].index)
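This mask works because pandas' `sum` skips NaN: a country whose Population values are all missing sums to 0.0 rather than NaN, so the `> 0` comparison drops it. A minimal sketch on made-up rows (the country codes `aa` and `bb` are hypothetical):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Country": ["aa", "aa", "bb", "bb"],              # hypothetical country codes
    "Population": [np.nan, np.nan, 20430.0, np.nan],
})

country_pop = toy.groupby("Country")["Population"].sum()   # NaN skipped; all-NaN group -> 0.0
countries_to_include = list(country_pop[country_pop > 0].index)
print(country_pop)
print(countries_to_include)
```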

In [21]:
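# keep those countries and drop rows without a population value;
# chaining dropna avoids modifying a slice in place (SettingWithCopyWarning)
df_filtered = df[df.Country.isin(countries_to_include)].dropna(subset=['Population'])
df_filtered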

|         | Country | City             | AccentCity       | Region | Population | lat        | lon       |
|---------|---------|------------------|------------------|--------|------------|------------|-----------|
| 6       | ad      | andorra la vella | Andorra la Vella | 07     | 20430.0    | 42.500000  | 1.516667  |
| 20      | ad      | canillo          | Canillo          | 02     | 3292.0     | 42.566667  | 1.600000  |
| 32      | ad      | encamp           | Encamp           | 03     | 11224.0    | 42.533333  | 1.583333  |
| 49      | ad      | la massana       | La Massana       | 04     | 7211.0     | 42.550000  | 1.516667  |
| 53      | ad      | les escaldes     | Les Escaldes     | 08     | 15854.0    | 42.500000  | 1.533333  |
| ...     | ...     | ...              | ...              | ...    | ...        | ...        | ...       |
| 3173646 | zw      | redcliffe        | Redcliffe        | 06     | 38231.0    | -19.033333 | 29.783333 |
| 3173676 | zw      | rusape           | Rusape           | 04     | 23761.0    | -18.533333 | 32.116667 |
| 3173737 | zw      | shurugwi         | Shurugwi         | 07     | 17107.0    | -19.666667 | 30.000000 |
| 3173892 | zw      | victoria falls   | Victoria Falls   | 00     | 36702.0    | -17.933333 | 25.833333 |


In [18]:
# plot the filtered data points on a dot density plot
geoplotlib.dot(
    df_filtered
)
geoplotlib.show()

In [47]:
# extract the cities that have a population of at least 100,000
# note: grouping by City sums the populations of all places that share a name
cities_mask = (df.groupby(['City']).agg({'Population': 'sum'}) >= 100000)
cities_to_include = list(cities_mask[cities_mask['Population']].index)
df_filtered2 = df[df.City.isin(cities_to_include)].dropna(subset=['Population'])
df_filtered2

|         | Country | City      | AccentCity | Region | Population | lat        | lon       |
|---------|---------|-----------|------------|--------|------------|------------|-----------|
| 93      | ae      | abu dhabi | Abu Dhabi  | 01     | 603687.0   | 24.466667  | 54.366667 |
| 242     | ae      | dubai     | Dubai      | 03     | 1137376.0  | 25.258172  | 55.304717 |
| 490     | ae      | sharjah   | Sharjah    | 06     | 543942.0   | 25.357310  | 55.403304 |
| 6644    | af      | baglan    | Baglan     | 03     | 108481.0   | 36.130684  | 68.708286 |
| 24457   | af      | gardez    | Gardez     | 36     | 103732.0   | 33.597439  | 69.225922 |
| ...     | ...     | ...       | ...        | ...    | ...        | ...        | ...       |
| 3173009 | zw      | gweru     | Gweru      | 06     | 201879.0   | -19.450000 | 29.816667 |
| 3173019 | zw      | harare    | Harare     | 04     | 2213701.0  | -17.817778 | 31.044722 |
| 3173109 | zw      | kadoma    | Kadoma     | 04     | 100276.0   | -18.350000 | 29.916667 |
| 3173161 | zw      | kwekwe    | Kwekwe     | 04     | 116332.0   | -18.916667 | 29.816667 |


In [23]:
# plot the filtered data points on a dot density plot
geoplotlib.dot(
    df_filtered2
)
geoplotlib.show()

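To get a better sense of how densely the cities are packed, add a Voronoi tessellation layer. A Voronoi tessellation divides the map into one cell per city, where each cell covers the area closer to that city than to any other, so small cells indicate dense clusters of cities.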

In [49]:
# plot a Voronoi tessellation
geoplotlib.dot(
    df_filtered2,
    color='b',
    point_size=1
)
geoplotlib.voronoi(
    df_filtered2,
    cmap='hot_r',
    max_area=1e5,
    alpha=200
)
geoplotlib.set_smoothing(True)
geoplotlib.set_bbox(BoundingBox.WORLD)
geoplotlib.show()
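geoplotlib documents `max_area` as a scaling constant for the colours of the Voronoi cells, so with `cmap='hot_r'` small (dense) and large (sparse) cells should be easy to tell apart. The sketch below does not use geoplotlib: it approximates each cell's area with plain NumPy by sampling a regular grid and assigning every sample to its nearest site. The function name and the site coordinates are made up for illustration.

```python
import numpy as np


def voronoi_cell_areas(sites, xmin, xmax, ymin, ymax, n=200):
    """Approximate the area of each site's Voronoi cell inside a rectangle.

    The rectangle is split into n x n equal cells; each cell centre is given
    to its nearest site, and the per-site counts are converted back to areas.
    """
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    xs = xmin + (np.arange(n) + 0.5) * dx
    ys = ymin + (np.arange(n) + 0.5) * dy
    gx, gy = np.meshgrid(xs, ys)
    samples = np.column_stack([gx.ravel(), gy.ravel()])              # (n*n, 2)
    d2 = ((samples[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2)  # squared distances
    nearest = d2.argmin(axis=1)                                       # closest site per sample
    counts = np.bincount(nearest, minlength=len(sites))
    return counts * dx * dy


# Three clustered sites and one isolated site in a 10 x 10 box (made-up coordinates)
sites = np.array([[1.0, 1.0], [1.5, 1.0], [1.0, 1.5], [8.0, 8.0]])
areas = voronoi_cell_areas(sites, 0.0, 10.0, 0.0, 10.0)
print(areas.round(2))
```

The isolated site ends up with the largest cell, which is the same visual cue the Voronoi layer gives on the map: clusters of small cells mark dense groups of cities.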

Filter the data further to include only the cities in Germany (de) and Great Britain (gb).
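In this dataset the country codes are stored as lowercase two-letter codes (de, gb), so the list passed to `isin` must use the same case. A minimal check on made-up rows:

```python
import pandas as pd

# Made-up rows; only the lowercase country codes mirror the dataset
toy = pd.DataFrame({
    "Country": ["de", "gb", "fr"],
    "City": ["aachen", "worcester", "paris"],
})

wanted = toy[toy["Country"].isin(["de", "gb"])]
wrong_case = toy[toy["Country"].isin(["DE", "GB"])]   # matches nothing: codes are lowercase
normalised = toy[toy["Country"].isin([c.lower() for c in ["DE", "GB"]])]
print(wanted)
```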

In [45]:

df_filtered3 = df_filtered2[df_filtered2.Country.isin(['de', 'gb'])]
df_filtered3

|         | Country | City              | AccentCity        | Region | Population | lat       | lon       |
|---------|---------|-------------------|-------------------|--------|------------|-----------|-----------|
| 722416  | de      | aachen            | Aachen            | 07     | 251104.0   | 50.770833 | 6.105278  |
| 725123  | de      | augsburg          | Augsburg          | 02     | 261842.0   | 48.366667 | 10.883333 |
| 726965  | de      | bergen            | Bergen            | 06     | 13586.0    | 52.816667 | 9.966667  |
| 726970  | de      | bergen            | Bergen            | 12     | 14621.0    | 54.416667 | 13.433333 |
| 727047  | de      | bergisch gladbach | Bergisch Gladbach | 07     | 106611.0   | 50.983333 | 7.133333  |
| ...     | ...     | ...               | ...               | ...    | ...        | ...       | ...       |
| 1003432 | gb      | weymouth          | Weymouth          | D6     | 50253.0    | 50.600000 | -2.450000 |
| 1003718 | gb      | winchester        | Winchester        | F2     | 44094.0    | 51.016667 | -1.316667 |
| 1003869 | gb      | wolverhampton     | Wolverhampton     | Q3     | 252792.0   | 52.583333 | -2.133333 |
| 1003982 | gb      | worcester         | Worcester         | Q4     | 100023.0   | 52.166667 | -2.166667 |

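Several rows in the output above have populations well below 100,000, for example Bergen (13,586), Weymouth (50,253) and Winchester (44,094). They pass the filter because the mask groups by City name across the whole dataset, so the populations of every place with the same name are added together before the comparison. A row-level filter avoids this. The sketch compares both approaches on made-up rows; the third bergen row and its country code xx are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["de", "de", "xx", "gb"],
    "City": ["bergen", "bergen", "bergen", "worcester"],
    "Population": [13586.0, 14621.0, 250000.0, 100023.0],  # the xx row is hypothetical
})

# Name-based filter, as in the notebook: sums populations of same-named places
city_pop = toy.groupby("City")["Population"].sum()
by_name = toy[toy["City"].isin(city_pop[city_pop >= 100000].index)]

# Row-level filter: each place is judged on its own population
by_row = toy[toy["Population"] >= 100000]
print(by_name)
print(by_row)
```

Applied to the notebook, the row-level version would be `df[df['Population'] >= 100000]`; the tables and plots shown here were produced with the name-based filter.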

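Finally, use a Delaunay triangulation layer to find the areas with the highest density of large cities. The triangulation connects neighboring cities with edges, so dense clusters appear as meshes of short edges and small triangles.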

In [59]:
# plot a Delaunay triangulation
geoplotlib.delaunay(
    df_filtered3, 
    cmap='hot_r'
)
geoplotlib.set_smoothing(True)
geoplotlib.set_bbox(BoundingBox.from_nominatim('EUROPE'))
geoplotlib.show()

('bbox from Nominatim:', 26.0, 76.0, -15.0, 35.0)
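A Delaunay triangulation keeps exactly those triangles whose circumcircle contains no other point, which is why tightly clustered points end up joined by small triangles. The sketch below is a brute-force O(n^4) illustration of that definition on a handful of made-up points; it is not how geoplotlib computes its layer, is only practical for tiny inputs, and assumes no four points lie on a common circle.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc; positive when a, b, c are counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of the counter-clockwise triangle abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def brute_force_delaunay(points):
    """Keep every triangle whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        o = orient(a, b, c)
        if abs(o) < 1e-12:
            continue  # collinear triple, not a triangle
        if o < 0:
            b, c = c, b  # reorder so the circle test sees a counter-clockwise triangle
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((i, j, k))
    return triangles


def triangle_area(points, tri):
    i, j, k = tri
    return abs(orient(points[i], points[j], points[k])) / 2


# Made-up coordinates: points 0-2 form a tight cluster, points 3-5 are spread out
points = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.1), (10.0, -6.0), (6.0, 10.0), (-8.0, 3.0)]
tris = brute_force_delaunay(points)
smallest = min(tris, key=lambda t: triangle_area(points, t))
print(smallest, triangle_area(points, smallest))
```

The smallest triangle is the one formed by the clustered points, which mirrors how dense groups of cities stand out in the Delaunay layer above.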
