# Where is the population center of the world? 
*In which we look for a circle on the globe that contains 50 percent of the world's people*

Much of the power in the world is concentrated in Western countries, such as the US and those in Europe, making those places very prominent on the world stage. But where is the center of the world if we go by population numbers? Inspired by a [Real Life Lore YouTube video](https://www.youtube.com/watch?v=mcqq8eAufXk) I went looking for the smallest circle one can draw on the Earth such that 50 percent of people live inside it and 50 percent outside. The center of that circle could be dubbed the population center of the world.

In this article we will delve into a lot of geography-related topics, an old passion of mine dating back to my days as a PhD student. You will learn how to:

- Read grid data into Python
- Intersect a grid dataset with a polygon
- Transform data between geographic projections
- Draw circles across a round Earth on a flat map, also known as great-circle distances

In the next few sections I will slowly work towards our solution, starting with the population data underlying all of our analyses.

# The population source data
At the core of our analysis is the population data. I chose to use a 1km grid population dataset that was [published on WorldPop](https://www.worldpop.org/doi/10.5258/SOTON/WP00647). This dataset was collated by a number of universities under a Bill and Melinda Gates Foundation grant. It covers the entire globe, and provides the number of people living in a particular gridcell:

![](world_pop.png)

I chose to use [the most recent dataset they provide](https://data.worldpop.org/GIS/Population/Global_2000_2020/2020/0_Mosaicked/ppp_2020_1km_Aggregated.tif), which is for 2020:

In [2]:
import rasterio
import numpy as np

population_raster = rasterio.open('data/ppp_2020_1km_Aggregated.tif')
population_data = population_raster.read(1)
population_data

array([[-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
        -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
       [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
        -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
       [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
        -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
       ...,
       [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
        -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
       [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
        -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
       [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
        -3.4028235e+38, -3.4028235e+38, -3.4028235e+38]], dtype=float32)

The `read` command produces a NumPy array with all the population data. A striking feature is the large number of `-3.4028235e+38` values, which the metadata of the file reveals to be NA (no data) values. These mostly cover the water bodies, where nobody lives. Next we convert these values to proper NA values and sum up all the non-NA values:

In [5]:
population_data[population_data == -3.4028234663852886e+38] = np.nan
np.sum(population_data, where=~np.isnan(population_data)) / 1e9

7.966754816

So according to the dataset, just under 8 billion people (7.97 billion) lived on Earth in 2020. This is nicely in line with what I expected. For later use we repackage this code into a function that determines the total population size of a `.tif` grid file:

In [6]:
def get_population(tif_file, na_value=-3.4028234663852886e+38):
    # Open the raster, read the first band into a numpy array and close the file again
    with rasterio.open(tif_file) as population_raster:
        population_data = population_raster.read(1)
    # Replace the nodata marker with NaN so these cells are skipped in the sum
    population_data[population_data == na_value] = np.nan
    return np.sum(population_data, where=~np.isnan(population_data))

total_population = get_population('data/ppp_2020_1km_Aggregated.tif')
total_population

7966755000.0
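
The rounded total above is a hint that the sum is carried in the array's own `float32` type. As a minimal sketch of an alternative, assuming nothing beyond NumPy, the nodata cells can be skipped with a boolean mask while the sum is accumulated in `float64`. The toy grid and the helper name `masked_total` are made up for illustration; with a real file, the nodata value could be taken from the `nodata` attribute of the opened rasterio dataset instead of being hard-coded.

```python
import numpy as np

# Nodata marker as reported for the population grid in this article
NODATA = np.float32(-3.4028234663852886e+38)

def masked_total(grid, nodata=NODATA):
    """Sum all cells that are not nodata, accumulating in float64."""
    valid = grid != nodata
    return float(np.sum(grid, where=valid, dtype=np.float64))

# Toy 2x3 grid: three populated cells and three nodata cells (made-up numbers)
toy = np.array([[NODATA, 1.5, 2.5],
                [NODATA, NODATA, 4.0]], dtype=np.float32)
print(masked_total(toy))  # prints 8.0
```

This variant leaves the input array untouched, whereas overwriting the nodata cells with NaN modifies it in place.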

# Boxing in the population data
In our first step we added up all the population gridcells on the map. Obviously, in our quest to find the circle containing half the population we do not want to add up the entire map. As a first step we are going to determine the total population inside a small bounding box. To do this we need to determine which of our gridcells are inside the box and which are not. In geography jargon this is called intersecting the bounding box with the gridcells. First we construct the bounding box we are interested in:

In [30]:
# Helper function to draw folium maps
import folium

def plot_folium(shape, center, zoom):
    f = folium.Figure(width=800, height=800)
    m = folium.Map(center, zoom_start=zoom, tiles='cartodbpositron').add_to(f)
    folium.GeoJson(shape).add_to(m)
    folium.LatLngPopup().add_to(m)
    return m

In [32]:
# https://gis.stackexchange.com/questions/294206/create-a-polygon-from-coordinates-in-geopandas-with-python
# Bbox from https://gist.github.com/graydon/11198540: (3.31497114423, 50.803721015, 7.09205325687, 53.5104033474)
# The points are from top left, clockwise

import geopandas as gpd
from shapely.geometry import Polygon

lat_point_list = [53.5104033474, 53.5104033474, 50.803721015, 50.803721015]
lon_point_list = [3.31497114423, 7.09205325687, 7.09205325687, 3.31497114423]

polygon_geom = Polygon(zip(lon_point_list, lat_point_list))
crs = 'EPSG:4326'  # the {'init': ...} dict form is deprecated and triggers a FutureWarning
bbox_nl = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])

plot_folium(bbox_nl, [51.854457, 4.377184], 7)


which nicely shows that we are using the bounding box of the Netherlands as our first target. 

The first solution [I explored](https://gis.stackexchange.com/questions/388047/get-coordinates-of-all-pixels-in-a-raster-with-rasterio) involved converting each of the gridcells to a geographic point, and performing the intersection between the box and the points. The issue with this solution is that it takes an enormous amount of RAM, and my Python session would simply crash each time I tried to generate the points. To remedy the situation, I chose to outsource the work to GDAL. GDAL is a geospatial processing library with bindings for Python. 

In GDAL I can use the warp tool to perform the required analysis. However, it expects all the input data to be files on disk. So first we have to save the bounding box to disk, and then we can call GDAL:

In [12]:
from osgeo import gdal

bbox_nl.to_file(filename='tmp/bbox_nl.shp', driver="ESRI Shapefile")
ds = gdal.Warp('tmp/bbox_nl.tif', 'data/ppp_2020_1km_Aggregated.tif', cutlineDSName='tmp/bbox_nl.shp', cropToCutline=True)
del ds

Here we use the cutline ability of Warp to cut down our grid to only the gridcells inside the bounding box, and write that result to a new tif file. After that, we can simply use our `get_population` function to calculate the total number of people living inside the box:

In [13]:
get_population('tmp/bbox_nl.tif') / 1e6

31.186402

which shows that 31 million people live inside the box. This aligns nicely with the 17 million people living in the Netherlands, combined with the population centers in Belgium and Germany that also fall inside the box. For convenience we wrap these steps into a new function for later use:

In [29]:
def get_population_in_shape(shape_obj):
    shape_obj.to_file(filename='tmp/tmp.shp', driver="ESRI Shapefile")
    ds = gdal.Warp('tmp/tmp.tif', 'data/ppp_2020_1km_Aggregated.tif', cutlineDSName='tmp/tmp.shp', cropToCutline=True)
    del ds
    return get_population('tmp/tmp.tif')
get_population_in_shape(bbox_nl) / 1e6

31.186402
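
For an axis-aligned box like this one, the intersection can also be sketched as plain index arithmetic on a north-up grid, without creating any point geometries. The sketch below assumes a hypothetical grid whose top-left corner sits at longitude -180 and latitude 84, with cells of 1/120 of a degree (30 arc-seconds, roughly 1 km at the equator). These are illustrative assumptions, not values read from the WorldPop file, and `bbox_to_window` is a made-up helper name.

```python
import math

def bbox_to_window(min_lon, min_lat, max_lon, max_lat,
                   origin_lon=-180.0, origin_lat=84.0, cell_size=1 / 120):
    """Return (row_start, row_stop, col_start, col_stop) of the cells a box touches.

    Assumes a north-up grid: rows count downwards from origin_lat and
    columns count eastwards from origin_lon. The stop indices are exclusive.
    """
    col_start = math.floor((min_lon - origin_lon) / cell_size)
    col_stop = math.ceil((max_lon - origin_lon) / cell_size)
    row_start = math.floor((origin_lat - max_lat) / cell_size)
    row_stop = math.ceil((origin_lat - min_lat) / cell_size)
    return row_start, row_stop, col_start, col_stop

# The bounding box of the Netherlands used above
r0, r1, c0, c1 = bbox_to_window(3.31497114423, 50.803721015,
                                7.09205325687, 53.5104033474)
print(f"rows {r0}:{r1}, cols {c0}:{c1}, cells: {(r1 - r0) * (c1 - c0)}")
```

With the real origin and cell size of the raster, such a window could be used to slice the population array directly. This shortcut only works for rectangles, so the GDAL cutline route remains the one that generalises to arbitrary polygons.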

# Upgrading the box to a circle
But our challenge was to perform this operation with a circle, not a box. Luckily, the entire workflow given above is not limited to a box, but works for any polygon shape. So, our next challenge is to construct a circle around a specific center point with a particular radius.

A problem here is that the population grid we use is defined in latitude-longitude coordinates. This projection of a round Earth on a flat plane prevents us from simply drawing circles on the map. The [solution I found](https://gis.stackexchange.com/questions/121256/creating-a-circle-with-radius-in-metres) takes the lat-lon center point we provide and reprojects it to an azimuthal equidistant projection centered on that point, in which distances from the center are measured in meters. Then we construct a circle in that projection, and project the circle back to latitude-longitude (WGS84):

In [34]:
import pyproj
from shapely.geometry import Point
from shapely.ops import transform

def get_circle_with_radius(lon, lat, radius, resolution=16):
    # https://gis.stackexchange.com/questions/121256/creating-a-circle-with-radius-in-metres
    # Azimuthal equidistant projection centered on the requested point, in meters
    local_azimuthal_projection = f"+proj=aeqd +R=6371000 +units=m +lat_0={lat} +lon_0={lon}"
    wgs84 = "+proj=longlat +datum=WGS84 +no_defs"
    # pyproj.transform is deprecated, so we use Transformer objects instead.
    # always_xy=True keeps the (lon, lat) axis order that shapely uses.
    wgs84_to_aeqd = pyproj.Transformer.from_crs(
        wgs84, local_azimuthal_projection, always_xy=True
    ).transform
    aeqd_to_wgs84 = pyproj.Transformer.from_crs(
        local_azimuthal_projection, wgs84, always_xy=True
    ).transform

    center = Point(float(lon), float(lat))
    point_transformed = transform(wgs84_to_aeqd, center)
    # Buffer the projected center point to get a circle with the radius in meters
    buffer = point_transformed.buffer(radius, resolution)
    # Get the polygon with lat lon coordinates
    circle_poly = transform(aeqd_to_wgs84, buffer)

    return gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[circle_poly])

greenland_circle = get_circle_with_radius(-45, 63, 2200e3, resolution=64)
plot_folium(greenland_circle, [63, -45], 2)

Which nicely shows that our perfect circle with a radius of 2200 kilometers translates to an oval in the latitude-longitude projection. The same distortion explains why [Greenland is shown so much bigger on maps than it really is relative to Africa](https://www.visualcapitalist.com/map-true-size-of-africa/). With our Greenland circle ready for use, we can call the `get_population_in_shape` function:

In [22]:
get_population_in_shape(greenland_circle) / 1e6

1.81848775

This confirms that not a lot of people live in or around Greenland, 1.8 million to be precise.
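
The oval shape can also be reproduced without any projection library. The following is a minimal sketch, assuming a spherical Earth with the same 6371 km radius that appears in the projection string above: it walks a fixed distance from the center along a set of bearings using the standard great-circle destination formula, and then compares how many degrees of latitude and longitude the resulting ring spans. The function names are my own and not part of any library.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical Earth, same radius as the +R parameter

def destination(lat, lon, bearing_deg, distance_m):
    """Point reached by travelling distance_m from (lat, lon) along a bearing."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat-lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Same center and radius as the Greenland circle
center_lat, center_lon, radius = 63.0, -45.0, 2200e3
ring = [destination(center_lat, center_lon, b, radius) for b in range(0, 360, 5)]

lat_span = max(p[0] for p in ring) - min(p[0] for p in ring)
lon_span = max(p[1] for p in ring) - min(p[1] for p in ring)
print(f"latitude span: {lat_span:.1f} degrees, longitude span: {lon_span:.1f} degrees")
```

Every point on the ring is the same great-circle distance from the center, yet the ring covers far more degrees of longitude than of latitude, which is exactly the stretching visible on the map. The sketch does not handle circles that cross the antimeridian or a pole.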

# Finding the center of population
In the YouTube video, the presenter names a circle with a radius of 3300 km around the town of Mong Khet as the smallest circle that contains 50 percent of the world's population. Using our tools and data we can check this:

In [36]:
mong_khet_circle = get_circle_with_radius(99.38, 21.72, 3300e3, resolution=64)
mk_circle_population = get_population_in_shape(mong_khet_circle) 
print(mk_circle_population / 1e6)
print(mk_circle_population / total_population)
plot_folium(mong_khet_circle, [21.72, 99.38], 3)

3885.511936
0.48771578


which shows that according to our data we are close, but only 48.8 percent of the world lives inside the circle. A bit of tinkering with the radius shows that we can get almost exactly 50 percent if we expand the circle a bit:

In [27]:
mong_khet_circle = get_circle_with_radius(99.38, 21.72, 3412e3, resolution=64)
mk_circle_population = get_population_in_shape(mong_khet_circle) 
print(mk_circle_population / 1e6)
print(mk_circle_population / total_population)

3983.822592
0.5000559


The fact that my circle needs to be roughly 110 km larger in radius could be down to the particular dataset I used.
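
The tinkering with the radius could be automated with a bisection search, because the population inside a circle around a fixed center can only grow as the radius grows. The sketch below assumes a callable `population_within(radius)`. In this article's setting that callable could wrap `get_circle_with_radius` and `get_population_in_shape`, but here a made-up toy function stands in so that the block runs on its own. The names `find_radius_for_share` and `toy_population_within` are hypothetical.

```python
def find_radius_for_share(population_within, target, lo, hi, tolerance=1e3):
    """Bisect on the radius (meters) until the bracket is narrower than tolerance.

    population_within must be non-decreasing in the radius, and the target
    must lie between population_within(lo) and population_within(hi).
    """
    if not population_within(lo) <= target <= population_within(hi):
        raise ValueError("target is not bracketed by lo and hi")
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if population_within(mid) < target:
            lo = mid  # too few people inside: the radius must be larger
        else:
            hi = mid  # enough people inside: try a smaller radius
    return hi

# Toy stand-in (made up): population grows with the square of the radius
def toy_population_within(radius_m):
    return 8e9 * min(1.0, (radius_m / 5000e3) ** 2)

radius = find_radius_for_share(toy_population_within, target=4e9,
                               lo=3000e3, hi=4000e3)
print(f"{radius / 1e3:.0f} km")
```

Each halving of the bracket costs one population evaluation, so narrowing a 1000 km bracket down to about 1 km takes roughly a dozen evaluations, which matters when every evaluation is a GDAL warp over the global grid.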

Based on these tools, a good next step could be to use an optimisation algorithm to try and determine the exact location and radius where we can find the smallest circle that covers 50 percent of the Earth's population. 
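
As a rough illustration of what such a search could look like, the sketch below works on a handful of made-up weighted points instead of the raster, and uses brute force over a coarse grid of candidate centers rather than a proper optimisation algorithm. It relies on one observation: for a fixed center, the smallest radius that covers half the population is the distance at which the cumulative population, sorted by distance from that center, first reaches 50 percent. All names and data here are hypothetical, and a spherical Earth is assumed.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical Earth, an assumption of this sketch

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat-lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def radius_for_share(center, points, share=0.5):
    """Smallest radius around center that holds at least `share` of the population.

    points is a list of (lat, lon, population) tuples.
    """
    total = sum(pop for _, _, pop in points)
    by_distance = sorted((haversine(center[0], center[1], lat, lon), pop)
                         for lat, lon, pop in points)
    running = 0.0
    for distance, pop in by_distance:
        running += pop
        if running >= share * total:
            return distance
    return by_distance[-1][0]

def best_center(candidates, points, share=0.5):
    """Brute force: the candidate center with the smallest covering radius."""
    return min(candidates, key=lambda c: radius_for_share(c, points, share))

# Toy data (made up): a dense cluster near (20, 100) and two remote points
points = [(20, 100, 30), (21, 101, 30), (19, 99, 30), (50, 0, 20), (-30, -60, 20)]
candidates = [(lat, lon) for lat in range(-60, 61, 10) for lon in range(-180, 180, 10)]

center = best_center(candidates, points)
print(center, round(radius_for_share(center, points) / 1e3), "km")
```

On the real raster every candidate center would require its own set of population evaluations, so a coarse grid followed by a local refinement around the best candidates is probably more practical than an exhaustive search.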

# Population data attribution
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00647