# Plotting WOfS data across Africa with country-boundary shapefile

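This notebook compares the surface water extent detected by Water Observations from Space (WOfS) in a chosen year against the 2000-2020 mean for each polygon in a vector file, then repeats the comparison for CHIRPS rainfall.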

**Caveats**

* This uses the WOfS annual summaries. It does not attribute differences to longer-term conditions (such as drought), which should be investigated at a regional scale.
* The choice of resolution seems to have a fairly large impact on the results: resampling from the native resolution (30 m for WOfS) to a coarser grid causes the loss of the smaller or more ephemeral waterbodies. The resolution and the resampling technique can be changed or specified in the `dc.load` statement. Running at native resolution may require a larger sandbox instance.
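
To illustrate the second caveat, the sketch below builds a small synthetic water mask and compares the water area at the original grid with the area after a nearest-neighbour style subsampling to a grid three times coarser. The mask, the helper names and the subsampling rule are illustrative assumptions. The sketch uses plain Python lists, not the resampling performed by `dc.load`.

```python
# Synthetic 6x6 water mask at 30 m pixels (1 = water, 0 = dry).
# One 3x3 lake plus two isolated single-pixel waterbodies.
fine = [
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]


def water_area_km2(mask, pixel_size_m):
    """Count water pixels and convert the total to square kilometres."""
    n_water = sum(sum(row) for row in mask)
    return n_water * pixel_size_m**2 / 1000**2


def subsample(mask, factor):
    """Keep the centre pixel of every factor x factor block."""
    offset = factor // 2
    return [
        [mask[r + offset][c + offset] for c in range(0, len(mask[0]), factor)]
        for r in range(0, len(mask), factor)
    ]


coarse = subsample(fine, 3)              # 2x2 grid of 90 m pixels
area_fine = water_area_km2(fine, 30)     # 11 pixels of 30 m
area_coarse = water_area_km2(coarse, 90) # 1 pixel of 90 m

print(f"30 m grid: {area_fine:.4f} km2, 90 m grid: {area_coarse:.4f} km2")
```

In this toy case the lake survives as a single coarse pixel, while both single-pixel waterbodies disappear from the coarse mask.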

## Load packages, connect to datacube

In [1]:
%matplotlib inline

import datacube
import gc
import matplotlib.pyplot as plt
import geopandas as gpd
from datacube.utils import geometry
import pandas as pd
import matplotlib as mpl
import xarray as xr
from odc.algo import xr_reproject
from datacube.utils.geometry import assign_crs

from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.dask import create_local_dask_cluster

dc = datacube.Datacube(app="WOfS-figure")

import warnings
warnings.filterwarnings("ignore", message="Iteration over multi-part geometries is deprecated and will be removed in ")

In [2]:
create_local_dask_cluster()

Client: Scheduler: tcp://127.0.0.1:40001, Dashboard: /user/chad/proxy/8787/status
Cluster: Workers: 1, Cores: 15, Memory: 104.37 GB


### Load shapefile

In [3]:
vector_file = "data/africa_basins_100000km2.shp"
output_suffix = 'basins'
gdf = gpd.read_file(vector_file)
gdf.head()

| | BASIN_ID | AREA_SQKM | geometry |
|---|---|---|---|
| 0 | 9871 | 2916242.0 | MULTIPOLYGON (((26.45833 6.08333, 26.45833 6.0... |
| 1 | 12034 | 305367.9 | MULTIPOLYGON (((18.29167 20.64167, 18.28333 20... |
| 2 | 12306 | 158529.3 | POLYGON ((23.87500 20.60000, 23.87500 20.64167... |
| 3 | 19050 | 103734.4 | POLYGON ((26.11667 20.62500, 26.10833 20.62500... |
| 4 | 22891 | 211663.9 | POLYGON ((11.26667 18.47500, 11.26667 18.52500... |


### Analysis parameters

In [4]:
# Set the year you want to compare
year = '2020'

In [5]:
attribute_col = 'BASIN_ID'
measurements = ['frequency']
resolution = (-30,30)
output_crs = 'EPSG:6933'

In [6]:
query = {'measurements': measurements,
         'resolution': resolution,
         'output_crs': output_crs,
         'dask_chunks':dict(x=1000,y=1000)
         }

### Loop through polygons and process WOfS data



In [7]:
#store results in dict
year_area = {}
area_mean = {}
area_std = {}
area_min = {}
area_idxmin = {}
area_max = {}
area_idxmax = {}
diff = {}
change_perc = {} 
anomaly = {}

In [8]:
g0, g1, g2 = gc.get_threshold()
gc.set_threshold(g0*2, g1*2, g2*2)
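
The cell above doubles the thresholds that control how often Python's garbage collector runs automatically, so collections are triggered less often during the processing loop. As a minimal sketch of the same pattern (the variable names are illustrative), the original thresholds can be saved and restored once the long-running work is finished:

```python
import gc

# Save the current collection thresholds
original = gc.get_threshold()

# Double each threshold so automatic collection runs less frequently
gc.set_threshold(*(t * 2 for t in original))
raised = gc.get_threshold()

# ... long-running processing would go here ...

# Restore the original thresholds and collect anything left over
gc.set_threshold(*original)
gc.collect()
restored = gc.get_threshold()

print(original, raised, restored)
```

Restoring the thresholds avoids stacking the multiplier each time the cell is re-run in the same session.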

In [9]:
# A progress indicator
i = 0

# Loop through polygons in geodataframe and extract satellite data
for index, row in gdf.iterrows():

    print(" Feature {:02}/{:02}\r".format(i + 1, len(gdf)),
                  end='')

    # Get the geometry
    geom = geometry.Geometry(row.geometry.__geo_interface__,
                             geometry.CRS(f'EPSG:{gdf.crs.to_epsg()}'))

    # Update dc query with geometry      
    query.update({'geopolygon': geom}) 

    # Load wofs
    ds = dc.load(product='wofs_ls_summary_annual',
                 time=('2000', '2020'),
                 **query)
    
    # Generate a polygon mask to keep only data within the polygon
    mask = xr_rasterize(gdf.iloc[[index]], ds, verbose=False)
    ds = ds.where(mask)
    
    #threshold
    ds = ds.frequency > 0.01
    
    #calculate area of pixels
    area_per_pixel = query["resolution"][1]**2 / 1000**2

    ds_area = (ds.sum(dim=['x', 'y']) * area_per_pixel).compute()
    
    #summary stats for area of water from 2000-2020
    ds_area_mean = ds_area.mean().values.item()
    ds_area_std = ds_area.std().values.item()
    ds_area_max = ds_area.max().values.item()
    ds_area_idxmax = int(ds_area.idxmax().dt.year.values)
    ds_area_min = ds_area.min().values.item()
    ds_area_idxmin = int(ds_area.idxmin().dt.year.values)
    
    #area in year of interest
    ds_area_year = ds_area.sel(time=year).values.item()
    
    #compare area
    diff_area = ds_area_year - ds_area_mean
    diff_percent = diff_area / ds_area_mean * 100
    anomaly_year = (ds_area_year - ds_area_mean) / ds_area_std
    
    #output results in dict
    country=str(row[attribute_col])
    
    year_area.update({country: ds_area_year})
    area_mean.update({country: ds_area_mean})
    area_std.update({country: ds_area_std})
    area_max.update({country: ds_area_max})
    area_min.update({country: ds_area_min})
    area_idxmin.update({country: ds_area_idxmin})
    area_idxmax.update({country: ds_area_idxmax}) 
    diff.update({country: diff_area})
    change_perc.update({country: diff_percent})
    anomaly.update({country: anomaly_year})
    i += 1

 Feature 27/27
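
The per-polygon statistics computed in the loop can be checked by hand on a toy series. The sketch below uses made-up water-pixel counts (not real WOfS values) and only the standard library. `statistics.pstdev` is used because xarray's `.std()` defaults to the population standard deviation (`ddof=0`).

```python
import statistics

# Area of one 30 m pixel in km2, as in the loop above
area_per_pixel = 30**2 / 1000**2

# Hypothetical number of water pixels per year (illustrative values only)
water_pixels = {"2017": 100_000, "2018": 120_000, "2019": 80_000, "2020": 140_000}
areas = {yr: n * area_per_pixel for yr, n in water_pixels.items()}

year = "2020"
mean_area = statistics.mean(list(areas.values()))
std_area = statistics.pstdev(list(areas.values()))

# Same three comparisons as in the loop
diff_area = areas[year] - mean_area
diff_percent = diff_area / mean_area * 100
anomaly = diff_area / std_area

# Years with the largest and smallest extent
year_of_max = max(areas, key=areas.get)
year_of_min = min(areas, key=areas.get)

print(f"{year}: {areas[year]:.1f} km2, mean {mean_area:.1f} km2, "
      f"difference {diff_percent:.1f}%, anomaly {anomaly:.2f}")
```

The percentage difference is relative to the mean, while the standardised anomaly is relative to the year-to-year variability, so a polygon with highly variable water extent can show a large percentage difference but a small anomaly.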

In [10]:
df = pd.DataFrame.from_dict(
    [
        year_area,
        area_mean,
        area_std,
        area_max,
        area_min,
        area_idxmax,
        area_idxmin,
        diff,
        change_perc,
        anomaly,
    ]
).T.rename(
    {
        0: "Water Extent " + year + " (km2)",
        1: "Mean Water Extent 2000-2020 (km2)",
        2: "Std. Dev. Water Extent 2000-2020 (km2)",
        3: "Max Water Extent 2000-2020 (km2)",
        4: "Min Water Extent 2000-2020 (km2)",
        5: "Year of Max Water Extent 2000-2020",
        6: "Year of Min Water Extent 2000-2020",
        7: "Difference in Water Extent: " + year + " - Mean (km2)",
        8: "Percentage Difference in Water Extent: " + year + " - Mean (km2)",
        9: "Standardised Anomaly in Water Extent: (" + year + " - Mean) / Std.dev.",
    },
    axis=1,
)

df.head()

| | Water Extent 2020 (km2) | Mean Water Extent 2000-2020 (km2) | Std. Dev. Water Extent 2000-2020 (km2) | Max Water Extent 2000-2020 (km2) | Min Water Extent 2000-2020 (km2) | Year of Max Water Extent 2000-2020 | Year of Min Water Extent 2000-2020 | Difference in Water Extent: 2020 - Mean (km2) | Percentage Difference in Water Extent: 2020 - Mean (km2) | Standardised Anomaly in Water Extent: (2020 - Mean) / Std.dev. |
|---|---|---|---|---|---|---|---|---|---|---|
| 9871 | 142688.6577 | 115671.029271 | 13466.888789 | 142688.6577 | 99001.1259 | 2020.0 | 2004.0 | 27017.628429 | 23.357299 | 2.006226 |
| 12034 | 106.3746 | 217.111157 | 377.912377 | 1850.3964 | 28.6056 | 2002.0 | 2013.0 | -110.736557 | -51.004545 | -0.293022 |
| 12306 | 15.6915 | 73.434429 | 293.222991 | 1384.479 | 0.6372 | 2002.0 | 2012.0 | -57.742929 | -78.631957 | -0.196925 |
| 19050 | 0.7047 | 169.169271 | 548.720796 | 2607.6222 | 0.423 | 2002.0 | 2018.0 | -168.464571 | -99.583435 | -0.307013 |
| 22891 | 28.4049 | 41.200329 | 110.427457 | 532.359 | 0.8649 | 2002.0 | 2012.0 | -12.795429 | -31.056618 | -0.115872 |


## Join results onto geodataframe and export

In [None]:
gdf = gdf.join(df, attribute_col)
gdf.head()

In [83]:
gdf.to_file('results/geojsons/wofs_water_extent_africa_'+output_suffix+'_'+year+'.geojson')

## Explore results

In [85]:
gdf = gpd.read_file('results/geojsons/wofs_water_extent_africa_'+output_suffix+'_'+year+'.geojson')

In [86]:
#simplify so plotting is fast
gdf_simple = gdf.to_crs('epsg:6933')
gdf_simple['geometry'] = gdf_simple['geometry'].simplify(2500)


In [87]:
gdf_simple.explore(column='Standardised Anomaly in Water Extent: (2020 - Mean) / Std.dev.', cmap='RdBu', vmin=-2.5, vmax=2.5, style_kwds={'fillOpacity':1.0}, tiles='CartoDB positron')

# Rainfall


In [88]:
year='2020'
vector_file = 'results/geojsons/wofs_water_extent_africa_'+output_suffix+'_'+year+'.geojson'
gdf = gpd.read_file(vector_file)

In [93]:
attribute_col = 'BASIN_ID'

resolution = (-5000, 5000) # native CHIRPS resolution
output_crs = 'EPSG:6933'
measurements = ['rainfall']

query = {'measurements': measurements,
         'resolution': resolution,
         'output_crs': output_crs,
         'dask_chunks':dict(x=1000,y=1000)
         }

In [94]:
year_rain = {}
rain_mean = {}
rain_std = {}
rain_anomaly = {}
rain_min = {}
rain_idxmin = {}
rain_max = {}
rain_idxmax = {}

In [95]:
g0, g1, g2 = gc.get_threshold()
gc.set_threshold(g0*3, g1*3, g2*3)

In [96]:
i = 0

# Loop through polygons in geodataframe and extract satellite data
for index, row in gdf.iterrows():
    print(" Feature {:02}/{:02}\r".format(i + 1, len(gdf)),
                  end='')
    
    #skip some countries as no-data
    country=str(row[attribute_col])
    if (country == 'Cape Verde') | (country == 'Mauritius'):
        pass
    
    else:
        # Get the geometry
        geom = geometry.Geometry(row.geometry.__geo_interface__,
                                 geometry.CRS(f'EPSG:{gdf.crs.to_epsg()}'))

        # Update dc query with geometry      
        query.update({'geopolygon': geom}) 

        ds_all = dc.load(product=['rainfall_chirps_monthly'],
                         time=('1981', '2020'),
                         **query).rainfall

        #select out different components of calcs
        ds = ds_all.sel(time=year)
        ds_clim=ds_all.sel(time=slice('1981','2010'))
        ds_match_wofs=ds_all.sel(time=slice('2000','2020'))

        # set -9999 no-data values to NaN
        ds = ds.where(ds !=-9999.)
        ds_clim = ds_clim.where(ds_clim !=-9999.)
        ds_match_wofs = ds_match_wofs.where(ds_match_wofs !=-9999.)

        # Generate a polygon mask to keep only data within the polygon
        mask = xr_rasterize(gdf.iloc[[index]], ds)
        ds = ds.where(mask)
        ds_clim = ds_clim.where(mask)
        ds_match_wofs = ds_match_wofs.where(mask)

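        # find sum over 12 months (i.e. total for that year), then average over the polygon
        # NOTE: `sum` skips NaN, so pixels masked out above become 0 (not NaN) and are
        # included in the spatial mean; passing `min_count=1` to `sum` would keep them as NaN
        ds = ds.sum(dim='time').mean(dim=['x','y']).compute()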

        # Climatologies 
        ds_clim_mean_year = ds_clim.groupby('time.year').sum(dim=['time']).mean(dim=['x','y']).compute()
        ds_clim_std = ds_clim_mean_year.std('year')
        ds_clim_mean = ds_clim_mean_year.mean('year')

        #extra summary stats to match wofs summary plots
        ds_match_wofs_mean_year = ds_match_wofs.groupby('time.year').sum(dim=['time']).mean(dim=['x','y']).compute()
        ds_max = ds_match_wofs_mean_year.max('year')
        ds_idxmax = int(ds_match_wofs_mean_year.idxmax('year').values)
        ds_min = ds_match_wofs_mean_year.min('year')
        ds_idxmin = int(ds_match_wofs_mean_year.idxmin('year').values)

        #anomalies
        anomalies = xr.apply_ufunc(
            lambda x, m, s: (x - m) / s,
            ds,
            ds_clim_mean,
            ds_clim_std,
            output_dtypes=[ds.dtype],
            dask="allowed"
        )

        #add results to dict
        year_rain.update({country: ds.values.item()})
        rain_mean.update({country: ds_clim_mean.values.item()})
        rain_std.update({country: ds_clim_std.values.item()})
        rain_anomaly.update({country: anomalies.values.item()})
        rain_max.update({country: ds_max.item()})
        rain_min.update({country: ds_min.item()})
        rain_idxmin.update({country: ds_idxmin})
        rain_idxmax.update({country: ds_idxmax}) 

    # Update counter
    i += 1

 Feature 27/27
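
The rainfall loop follows the same pattern as the WOfS loop, except that monthly totals are first summed to annual totals and the anomaly is measured against a fixed climatology (1981-2010 in the notebook) rather than the full record. The sketch below walks through those steps for a single location with made-up monthly values, two months per year instead of twelve, and a shortened hypothetical baseline period. It uses only the standard library.

```python
import statistics

NODATA = -9999.0

# Hypothetical monthly rainfall (mm) for one location (illustrative values only)
monthly = {
    2001: [10.0, 30.0],
    2002: [20.0, 40.0],
    2003: [NODATA, 50.0],  # one missing month
    2004: [35.0, 45.0],
}


def annual_total(values):
    """Sum the valid months, skipping no-data values."""
    return sum(v for v in values if v != NODATA)


annual = {yr: annual_total(vals) for yr, vals in monthly.items()}

# Climatology from a baseline period (2001-2003 here, an assumption)
baseline = [annual[yr] for yr in (2001, 2002, 2003)]
clim_mean = statistics.mean(baseline)
clim_std = statistics.pstdev(baseline)

# Standardised anomaly of the year of interest against the baseline
anomaly_2004 = (annual[2004] - clim_mean) / clim_std

print(f"2004 total {annual[2004]:.0f} mm, baseline mean {clim_mean:.0f} mm, "
      f"anomaly {anomaly_2004:.2f}")
```

A missing month contributes nothing to the annual total in this sketch, so a year with gaps has a lower total than it would with complete data.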

In [97]:
df_rain = pd.DataFrame.from_dict(
    [
        year_rain,
        rain_mean,
        rain_std,
        rain_anomaly,
        rain_max,
        rain_min,
        rain_idxmin,
        rain_idxmax
    ]
).T.rename(
    {
        0: "Total Rainfall " + year + " (mm)",
        1: "Mean Yearly Rainfall 1981-2010 (mm)",
        2: "Std. Dev. Yearly Rainfall 1981-2010 (mm)",
        3: "Standardised Yearly Rainfall Anomaly "+year,
        4: "Max Rainfall 2000-2020 (mm)",
        5: "Min Rainfall 2000-2020 (mm)",
        6: "Year of Min Rainfall 2000-2020",
        7: "Year of Max Rainfall 2000-2020",
    },
    axis=1,
)

df_rain.head()

| | Total Rainfall 2020 (mm) | Mean Yearly Rainfall 1981-2010 (mm) | Std. Dev. Yearly Rainfall 1981-2010 (mm) | Standardised Yearly Rainfall Anomaly 2020 | Max Rainfall 2000-2020 (mm) | Min Rainfall 2000-2020 (mm) | Year of Min Rainfall 2000-2020 | Year of Max Rainfall 2000-2020 |
|---|---|---|---|---|---|---|---|---|
| 9871 | 322.623169 | 285.228851 | 21.400887 | 1.747326 | 337.296783 | 247.652756 | 2009.0 | 2019.0 |
| 12034 | 4.626991 | 4.810228 | 0.370024 | -0.495203 | 5.221259 | 4.171667 | 2010.0 | 2018.0 |
| 12306 | 1.333627 | 1.333342 | 0.094319 | 0.003016 | 1.515414 | 1.206486 | 2002.0 | 2018.0 |
| 19050 | 1.93643 | 1.779869 | 0.053662 | 2.917545 | 1.967283 | 1.698374 | 2012.0 | 2019.0 |
| 22891 | 9.237149 | 9.402452 | 0.631329 | -0.261834 | 11.139598 | 8.396553 | 2001.0 | 2018.0 |


In [111]:
# df_rain.index = df_rain.index.astype(int)

In [112]:
gdf = gdf.set_index('BASIN_ID').join(df_rain, attribute_col)
gdf.head()

| BASIN_ID | Water Extent 2020 (km2) | Mean Water Extent 2000-2020 (km2) | Std. Dev. Water Extent 2000-2020 (km2) | Max Water Extent 2000-2020 (km2) | Min Water Extent 2000-2020 (km2) | Year of Max Water Extent 2000-2020 | Year of Min Water Extent 2000-2020 | Difference in Water Extent: 2020 - Mean (km2) | Percentage Difference in Water Extent: 2020 - Mean (km2) | Standardised Anomaly in Water Extent: (2020 - Mean) / Std.dev. | geometry | Total Rainfall 2020 (mm) | Mean Yearly Rainfall 1981-2010 (mm) | Std. Dev. Yearly Rainfall 1981-2010 (mm) | Standardised Yearly Rainfall Anomaly 2020 | Max Rainfall 2000-2020 (mm) | Min Rainfall 2000-2020 (mm) | Year of Min Rainfall 2000-2020 | Year of Max Rainfall 2000-2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9871 | 142688.6577 | 115671.029271 | 13466.888789 | 142688.6577 | 99001.1259 | 2020.0 | 2004.0 | 27017.628429 | 23.357299 | 2.006226 | MULTIPOLYGON (((26.45833 6.08333, 26.45833 6.0... | 322.623169 | 285.228851 | 21.400887 | 1.747326 | 337.296783 | 247.652756 | 2009.0 | 2019.0 |
| 12034 | 106.3746 | 217.111157 | 377.912377 | 1850.3964 | 28.6056 | 2002.0 | 2013.0 | -110.736557 | -51.004545 | -0.293022 | MULTIPOLYGON (((18.29167 20.64167, 18.28333 20... | 4.626991 | 4.810228 | 0.370024 | -0.495203 | 5.221259 | 4.171667 | 2010.0 | 2018.0 |
| 12306 | 15.6915 | 73.434429 | 293.222991 | 1384.479 | 0.6372 | 2002.0 | 2012.0 | -57.742929 | -78.631957 | -0.196925 | POLYGON ((23.87500 20.60000, 23.87500 20.64167... | 1.333627 | 1.333342 | 0.094319 | 0.003016 | 1.515414 | 1.206486 | 2002.0 | 2018.0 |
| 19050 | 0.7047 | 169.169271 | 548.720796 | 2607.6222 | 0.423 | 2002.0 | 2018.0 | -168.464571 | -99.583435 | -0.307013 | POLYGON ((26.11667 20.62500, 26.10833 20.62500... | 1.93643 | 1.779869 | 0.053662 | 2.917545 | 1.967283 | 1.698374 | 2012.0 | 2019.0 |
| 22891 | 28.4049 | 41.200329 | 110.427457 | 532.359 | 0.8649 | 2002.0 | 2012.0 | -12.795429 | -31.056618 | -0.115872 | POLYGON ((11.26667 18.47500, 11.26667 18.52500... | 9.237149 | 9.402452 | 0.631329 | -0.261834 | 11.139598 | 8.396553 | 2001.0 | 2018.0 |


In [113]:
# Export results (overwriting the WOfS file exported earlier, with the new rainfall data appended)
gdf.to_file('results/geojsons/wofs_water_extent_africa_'+output_suffix+'_'+year+'.geojson')

In [114]:
gdf_simple = gdf.to_crs('epsg:6933')
gdf_simple['geometry'] = gdf_simple['geometry'].simplify(2500)

In [116]:
gdf_simple.columns

Index(['Water Extent 2020 (km2)', 'Mean Water Extent 2000-2020 (km2)',
       'Std. Dev. Water Extent 2000-2020 (km2)',
       'Max Water Extent 2000-2020 (km2)', 'Min Water Extent 2000-2020 (km2)',
       'Year of Max Water Extent 2000-2020',
       'Year of Min Water Extent 2000-2020',
       'Difference in Water Extent: 2020 - Mean (km2)',
       'Percentage Difference in Water Extent: 2020 - Mean (km2)',
       'Standardised Anomaly in Water Extent: (2020 - Mean) / Std.dev.',
       'geometry', 'Total Rainfall 2020 (mm)',
       'Mean Yearly Rainfall 1981-2010 (mm)',
       'Std. Dev. Yearly Rainfall 1981-2010 (mm)',
       'Standardised Yearly Rainfall Anomaly 2020',
       'Max Rainfall 2000-2020 (mm)', 'Min Rainfall 2000-2020 (mm)',
       'Year of Min Rainfall 2000-2020', 'Year of Max Rainfall 2000-2020'],
      dtype='object')

In [121]:
gdf_simple.explore(column='Year of Max Rainfall 2000-2020', cmap='viridis', vmax=2020, style_kwds={'fillOpacity':1.0}, tiles='CartoDB positron')

# Figures code

### Plot WOfS with country boundaries
**Use this section to edit existing plot title, bounds, etc.**

If you have already exported the results file but started a new instance, there is no need to re-process the data. It can be read back in by uncommenting and running the code below. Make sure the vector file path points to the correct file.

In [None]:
# vector_file = "Plot_WOfS_by_country-data/wofs_pc_change_2020_vs_alltime_60-100.geojson"
# gdf = gpd.read_file(vector_file)

In [None]:
gdf.head()

### Customise the plot

In [None]:
# Define plot and colourbar axes
fig, ax = plt.subplots(1,1, figsize=(10,10))
fig.subplots_adjust(bottom=0.2)
cax = fig.add_axes([0.16, 0.15, 0.70, 0.03])

# Define colour map
cmap = mpl.cm.RdYlBu
bounds = list(range(-100, 101, 10))
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)#, extend='both')
cbar = mpl.colorbar.ColorbarBase(cax, cmap=cmap,
                                norm=norm,
                                orientation='horizontal')

# Define colourbar labelling
cbar.set_ticks([]) #list(range(-100, 101, 25)))
# cbar.set_ticklabels(list('{:2}%'.format(i) for i in (list(range(-100, 101, 25)))))
# cbar.set_label("% change", fontsize='14')

# Turn off lon-lat ticks and labels
ax.set_yticklabels([])
ax.set_xticklabels([])
ax.set_xticks([])
ax.set_yticks([])

# Remove frame
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# plot 'pc_change' and 'geometry' boundary lines
gdf.plot('pc_change', ax=ax, cmap=cmap, vmin=-100, vmax=100, norm=norm)
gdf.geometry.plot(ax=ax, linewidth=0.8, edgecolor='black', facecolor="none")

# Additional text boxes
ax.text(-25, -52, s="Less water than usual", color='black', fontsize='14')
ax.text(34, -52, s="More water than usual", color='black', fontsize='14')

# Plot title - concise title
ax.set_title("Surface water area - "+year+"\nDifference from all-time summary", fontsize='18')
# Plot title - include frequency range (requires `lower` and `upper`, which are not defined in this notebook)
# ax.set_title("Surface water area "+str(int(lower*100))+"-"+str(int(upper*100))+"% "+year+"\nDifference from all-time summary", fontsize='18')


# Export figure
# fig.savefig('wofs_pc_change_'+year+'_vs_alltime_'+str(int(lower*100))+'-'+str(int(upper*100))+"_"+str(resolution[1])+'m-res.png', 
#             bbox_inches='tight',
#             dpi=200, 
#             facecolor="white")

## Plot rainfall anomalies, to compare to WOfS

If the data for that year has previously been generated, uncomment and import the data using the file read code below.

In [None]:
# vector_file = "results/geojsons/chirps_per_country_"+year+"_5km.geojson"
# gdf = gpd.read_file(vector_file)

In [None]:
# Define plot and colourbar axes
fig, ax = plt.subplots(1,1, figsize=(10,10))
fig.subplots_adjust(bottom=0.2)
cax = fig.add_axes([0.16, 0.15, 0.70, 0.03])

# Define colour map
cmap = mpl.cm.RdYlBu
bounds = list(range(-100, 101, 10))
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)#, extend='both')
cbar = mpl.colorbar.ColorbarBase(cax, cmap=cmap,
                                norm=norm,
                                orientation='horizontal')

# Define colourbar labelling
cbar.set_ticks([])
# cbar.set_ticklabels(list('{:2}%'.format(i) for i in (list(range(-100, 101, 25)))))
# cbar.set_label("Rainfall (mm)", fontsize='14')

# Turn off lon-lat ticks and labels
ax.set_yticklabels([])
ax.set_xticklabels([])
ax.set_xticks([])
ax.set_yticks([])

# Remove frame
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# plot 'pc_rain' and 'geometry' boundary lines
gdf.plot('pc_rain', ax=ax, cmap=cmap, norm=norm)
gdf.geometry.plot(ax=ax, linewidth=0.8, edgecolor='black', facecolor="none")

# Additional text boxes
ax.text(-25, -52, s="Less rain than usual", color='black', fontsize='14')
ax.text(34, -52, s="More rain than usual", color='black', fontsize='14')

# Plot title
ax.set_title("Rainfall anomaly "+year+" - CHIRPS", fontsize='18')

# Export figure
fig.savefig('rainfall_anomaly_per_country_'+year+'_5km.png', 
            bbox_inches='tight',
            dpi=200, 
            facecolor="white")

---

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).

**Compatible datacube version:**

In [None]:
print(datacube.__version__)

**Last Tested:**

In [None]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')