# Download population for every country in the world

This tutorial explains how to download and prepare the population data used in the TropiDash dashboard in Section 3 - Impacts variables.

The data is downloaded from: https://www.worldpop.org/datacatalog/

## Global dataset

The code below downloads the global dataset used in the dashboard: the Unconstrained global mosaics 2000-2020 from https://hub.worldpop.org/geodata/summary?id=24777. The file contains the estimated total number of people per grid cell. The dataset is available in GeoTIFF format at a resolution of 30 arc-seconds (approximately 1 km at the equator). The projection is Geographic Coordinate System, WGS84. The units are number of people per pixel. The mapping approach is Random Forest-based dasymetric redistribution.

In [3]:
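import pandas as pd  # used later to filter the list of ISO codes
import requests

url = "https://data.worldpop.org/GIS/Population/Global_2000_2020/2020/0_Mosaicked/ppp_2020_1km_Aggregated.tif"
# stream the file to disk in chunks so the whole raster is never held in memory
with requests.get(url, stream=True, timeout=60) as response:
    response.raise_for_status()  # stop here if the server returned an error
    with open("globalmosaicspop.tif", "wb") as tiffile:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            tiffile.write(chunk)
print("Download complete")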
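
The "approximately 1 km at the equator" figure for a 30 arc-second cell can be checked with a little arithmetic. The sketch below is only an illustration: the helper name `cell_width_km` is hypothetical, and it assumes a spherical Earth with the WGS84 equatorial radius, so it gives the east-west width of a cell, which shrinks with the cosine of the latitude.

```python
import math

EARTH_RADIUS_KM = 6378.137  # WGS84 equatorial radius (spherical approximation)

def cell_width_km(arc_seconds, latitude_deg=0.0):
    """East-west width of a grid cell of the given angular size (hypothetical helper)."""
    degrees = arc_seconds / 3600.0
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    # a degree of longitude shrinks with the cosine of the latitude
    return degrees * km_per_degree * math.cos(math.radians(latitude_deg))

print(round(cell_width_km(30), 2))      # width at the equator, a little under 1 km
print(round(cell_width_km(30, 60), 2))  # roughly half of that at 60 degrees latitude
```

Because the grid is defined in degrees, the cells are only nominally "1 km": their ground width depends on latitude.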

The dataset was then resampled to a raster with 10 km x 10 km cells. The operation was performed in QGIS: a function for this step (`resample_raster()`) was written in `TropiDash\utils_impacts.py`, but it did not work properly, so we resorted to using QGIS. The nodata value was set to 999999999.0.
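
For readers who want to see what such a resampling does to the numbers, here is a minimal NumPy sketch of aggregating a count grid by summing blocks of cells. It is not the QGIS procedure used for the dashboard, nor the `resample_raster()` function mentioned above. The helper `block_sum` is hypothetical, and summation is assumed because the values are counts of people and the file names used later in this notebook contain `resampled_10km_sum`. Georeferencing is ignored here.

```python
import numpy as np

def block_sum(grid, factor, nodata):
    """Sum factor x factor blocks of a 2D count grid (hypothetical helper)."""
    rows, cols = grid.shape
    # trim the edges so both dimensions divide evenly by the factor
    rows -= rows % factor
    cols -= cols % factor
    trimmed = grid[:rows, :cols].astype("float64")
    valid = ~np.isnan(trimmed) & (trimmed != nodata)
    # invalid cells contribute zero people to the block total
    filled = np.where(valid, trimmed, 0.0)
    shape = (rows // factor, factor, cols // factor, factor)
    sums = filled.reshape(shape).sum(axis=(1, 3))
    any_valid = valid.reshape(shape).any(axis=(1, 3))
    # blocks without a single valid cell stay nodata
    return np.where(any_valid, sums, nodata)

fine = np.array([
    [1.0, 2.0, np.nan, np.nan],
    [3.0, 4.0, np.nan, np.nan],
    [5.0, np.nan, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
coarse = block_sum(fine, factor=2, nodata=999999999.0)
print(coarse)
```

Going from 1 km to 10 km cells would correspond to `factor=10`. Summing rather than averaging keeps the total population unchanged, apart from cells trimmed at the edges.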

## Fix nodata issue

The resampled raster had a problem in its nodata definition: rasterio read the nodata cells as NaN, which the plot function could not interpret as nodata. The code below was run to solve this.

In [1]:
import rasterio
import numpy as np

# rewrite the raster so that NaN cells carry an explicit nodata value
def fix_no_data_value(input_file, output_file, no_data_value=0):
    with rasterio.open(input_file) as src:
        # copy the profile and declare the nodata value without modifying the input file
        profile = src.profile.copy()
        profile.update(nodata=no_data_value)
        with rasterio.open(output_file, "w", **profile) as dst:
            for i in range(1, src.count + 1):
                band = src.read(i)
                band[np.isnan(band)] = no_data_value  # replace NaN with the nodata value
                dst.write(band, i)

input_file = "data/impacts/ppp_2020_1km_Aggregated_resampled_10km_sum_clipped_3402na.tif"
output_file = "data/impacts/ppp_2020_1km_Aggregated_resampled_10km_sum_clipped_3402na_fix.tif"
no_data_value=999999999.0
fix_no_data_value(input_file, output_file,no_data_value)
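
The reason NaN cells need this explicit replacement can be shown on a small in-memory array, without any raster file. The sketch below is an illustration under assumptions: the array values are made up, and masking with `np.ma.masked_equal` stands in for whatever the plot function does with the declared nodata value.

```python
import numpy as np

nodata = 999999999.0
band = np.array([[12.0, np.nan], [nodata, 7.5]])

# NaN never compares equal to anything, so an equality test cannot find it
found_by_equality = (band == np.nan).any()
print(found_by_equality)

# replace NaN with the sentinel, then mask every cell equal to the sentinel
cleaned = np.where(np.isnan(band), nodata, band)
masked = np.ma.masked_equal(cleaned, nodata)
print(masked.count())  # number of valid cells left
print(masked.sum())    # total over valid cells only
```

Once every missing cell holds the same sentinel value, a single equality test is enough to exclude it from sums and plots.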

## Single countries data

After the imports, the first download cell below fetches the population counts of every country available from WorldPop. To download only a subset, look up the three-letter ISO codes of the countries (e.g. `ITA`, `AUS`) and run one of the two cells that follow it instead. The whole dataset weighs approximately 75 GB.

**Attention**: to run the code below you will need to install the wpgpDownload package, which is not listed in the project requirements.
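
Since the package is not part of the project requirements, it may be convenient to check whether it is importable before running the download cells. This is a minimal sketch using only the standard library. The helper name `is_installed` is hypothetical, and the module name `wpgpDownload` is taken from the import statements in this notebook.

```python
from importlib.util import find_spec

def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None

if not is_installed("wpgpDownload"):
    print("wpgpDownload is not installed; install it before running the download cells")
```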

In [None]:
from wpgpDownload.utils.wpcsv import ISO_LIST
from wpgpDownload.utils.wpcsv import Product
from wpgpDownload.utils.convenience_functions import download_country_covariates as dl
from wpgpDownload.utils.convenience_functions import refresh_csv
import time
refresh_csv()

In [1]:
#download population raster grids for each country in the list
for iso in ISO_LIST:
    products = Product(iso) #get a list of all available products for that country
    for x in list(products.products.items()):
        if x[1].dataset_name == "ppp_2020":
            print("Downloading ", iso, " ppp_2020 ...")
            start_time = time.time()
            dl(ISO = iso, out_folder = "C:/Users/user/Desktop/countries", prod_name = "ppp_2020")
            print(iso, " complete - ", round(time.time()-start_time, 2), "s")

In [2]:
#download population raster grids for every country except the excluded ones (pandas was imported as pd in the first cell)

isolist = pd.Series(ISO_LIST)
isolist = isolist[~isolist.isin(["ITA"])] # this will download every country except Italy
isolist.reset_index(drop = True, inplace = True)

for iso in isolist:
    products = Product(iso)
    for x in list(products.products.items()):
        if x[1].dataset_name == "ppp_2020":
            print("Downloading ", iso, " ppp_2020 ...")
            start_time = time.time()
            dl(ISO = iso, out_folder = "C:/Users/user/Desktop/countries", prod_name = "ppp_2020")
            print(iso, " complete - ", round(time.time()-start_time, 2), "s")

In [3]:
for iso in ["AUS"]: #this will only download Australia
    products = Product(iso)
    for x in list(products.products.items()):
        if x[1].dataset_name == "ppp_2020":
            print("Downloading ", iso, " ppp_2020 ...")
            start_time = time.time()
            dl(ISO = iso, out_folder = "C:/Users/user/Desktop/countries", prod_name = "ppp_2020")
            print(iso, " complete - ", round(time.time()-start_time, 2), "s")

Downloading  AUS  ppp_2020 ...
AUS  complete -  1055.8 s
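
The three download cells above differ only in the list of ISO codes they iterate over. As a sketch, that selection could be factored into one small function. The name `select_isos` and its parameters are hypothetical, and a short sample list stands in for `ISO_LIST` so the example runs without wpgpDownload.

```python
def select_isos(all_isos, include=None, exclude=()):
    """Return the ISO codes to download (hypothetical helper)."""
    excluded = set(exclude)
    if include is None:
        wanted = list(all_isos)
    else:
        # fail early on codes that are not in the reference list
        known = set(all_isos)
        missing = [iso for iso in include if iso not in known]
        if missing:
            raise ValueError(f"Unknown ISO codes: {missing}")
        wanted = list(include)
    return [iso for iso in wanted if iso not in excluded]

sample = ["AUS", "BRA", "ITA", "JPN"]  # stand-in for ISO_LIST
print(select_isos(sample))                   # every country
print(select_isos(sample, exclude=["ITA"]))  # every country except Italy
print(select_isos(sample, include=["AUS"]))  # only Australia
```

The returned list can then be passed to the same download loop. Given that a single country took about 1056 s in the run above, checking the list before starting a long download is worthwhile.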
