# Plotting WOfS data across Africa with country-boundary shapefile

This notebook compares the surface water extent detected by Water Observations from Space (WOfS) in a chosen year against the 2000-2020 annual statistics for each African country.

**Caveats**

* This analysis uses the WOfS annual summaries. It does not attribute differences to longer-term conditions (such as drought), which should be investigated at a regional scale.
* The load resolution seems to have a fairly large impact on the results: resampling to a grid coarser than the native 30 m causes the loss of the smaller or more ephemeral waterbodies. The resolution and the resampling technique can be changed or specified in the `dc.load` statement. Running at native resolution may require a larger sandbox instance.
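The resolution caveat can be illustrated with a small synthetic grid. The sketch below is not what `dc.load` does internally; it uses simple block averaging as a stand-in for resampling, and the grid values are made up for illustration.

```python
import numpy as np

# Synthetic 6x6 grid of 30 m pixels: 1 = water, 0 = dry land (made-up values).
fine = np.zeros((6, 6), dtype=float)
fine[0:3, 0:3] = 1.0   # a 3x3-pixel lake (90 m x 90 m)
fine[4, 4] = 1.0       # a single-pixel pond (30 m x 30 m)


def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks (a simple stand-in for resampling)."""
    rows, cols = grid.shape
    return grid.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))


coarse = block_mean(fine, 3)                 # 2x2 grid of 90 m pixels
water_fine = int(fine.sum())                 # number of water pixels at 30 m
water_coarse = int((coarse > 0.5).sum())     # coarse pixels that are mostly water

area_fine_km2 = water_fine * 30**2 / 1000**2
area_coarse_km2 = water_coarse * 90**2 / 1000**2
```

In this toy case the lake survives the coarsening but the single-pixel pond does not, so the coarse grid reports a smaller water area than the 30 m grid.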

## Load packages, connect to datacube

In [1]:
%matplotlib inline

import datacube
import gc
import matplotlib.pyplot as plt
import geopandas as gpd
from datacube.utils import geometry
import pandas as pd
import matplotlib as mpl
import xarray as xr
from odc.algo import xr_reproject
from datacube.utils.geometry import assign_crs

from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.dask import create_local_dask_cluster

dc = datacube.Datacube(app="WOfS-figure")

import warnings
warnings.filterwarnings("ignore", message="Iteration over multi-part geometries is deprecated and will be removed in ")

In [2]:
create_local_dask_cluster()

| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:46509, Dashboard: /user/chad/proxy/8787/status | Workers: 1, Cores: 31, Memory: 254.70 GB |


## 1 - Processing WOfS per country
The final step of this section exports a new vector file (GeoJSON) specific to the data parameters and time periods identified. Once you have generated the file, you can run **2 - Plot WOfS with country boundaries** without repeating this section.

Country boundary shapefile sourced from [openAFRICA](https://open.africa/dataset/africa-shapefiles), with the file last updated on April 21, 2020.

### Load shapefile

In [3]:
vector_file = "data/african_countries.shp"
gdf = gpd.read_file(vector_file)
gdf.head()

| | name | geometry |
|---|---|---|
| 0 | Sudan | MULTIPOLYGON (((38.58148 18.02542, 38.58203 18... |
| 1 | Angola | MULTIPOLYGON (((11.79481 -16.81925, 11.79375 -... |
| 2 | Benin | MULTIPOLYGON (((1.86343 6.28872, 1.86292 6.288... |
| 3 | Botswana | POLYGON ((25.17447 -17.77881, 25.18476 -17.783... |
| 4 | Burkina Faso | POLYGON ((-0.45567 15.08082, -0.45411 15.07937... |


### Analysis parameters

In [4]:
# Set the year you want to compare
year = '2020'

In [5]:
attribute_col = 'name'
measurements = ['frequency']
resolution = (-30,30)
output_crs = 'EPSG:6933'

In [6]:
query = {'measurements': measurements,
         'resolution': resolution,
         'output_crs': output_crs,
         'dask_chunks':dict(x=1000,y=1000)
         }

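### Calculate surface water per polygon
The loop below calculates the water extent for each country polygon. To test on a smaller number of polygons first, replace `gdf.iterrows()` with `gdf.iloc[0:5].iterrows()`.

Arguments:
* `gdf` the geodataframe of the country boundary shapefile
* `year` the designated year to investigate
* `lower` frequency (0-1) lower bound for water (anything above this is included)
* `upper` frequency upper bound (anything above this is excluded) - max value of `1.0`

Note: the loop as written applies a fixed lower bound of `0.01` (`ds.frequency > 0.01`) and no upper bound; `lower` and `upper` are not defined as variables in this notebook.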
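As a minimal sketch of how a lower and an upper frequency bound select water pixels, the example below applies both to a made-up array. The variable names `lower` and `upper` follow the list above; the frequency values are hypothetical.

```python
import numpy as np

# Hypothetical annual WOfS frequencies (0-1); NaN marks a masked pixel.
frequency = np.array([0.0, 0.005, 0.01, 0.3, 0.6, 0.95, 1.0, np.nan])

lower = 0.01   # assumed lower bound: values above this are included
upper = 1.0    # assumed upper bound: values above this are excluded

# Comparisons with NaN evaluate to False, so masked pixels are never counted as water.
is_water = (frequency > lower) & (frequency <= upper)
n_water = int(is_water.sum())
```

Note that a pixel exactly at the lower bound (`0.01`) is excluded because the comparison is strict, which matches the `>` comparison used in the loop.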

In [7]:
#store results in dict
year_area = {}
area_mean = {}
area_std = {}
area_min = {}
area_idxmin = {}
area_max = {}
area_idxmax = {}
diff = {}
change_perc = {} 
anomaly = {}

In [None]:
# A progress indicator
i = 0

# Loop through polygons in geodataframe and extract satellite data
for index, row in gdf.iterrows():

    print(" Feature {:02}/{:02}\r".format(i + 1, len(gdf)),
                  end='')

    # Get the geometry
    geom = geometry.Geometry(row.geometry.__geo_interface__,
                             geometry.CRS(f'EPSG:{gdf.crs.to_epsg()}'))

    # Update dc query with geometry      
    query.update({'geopolygon': geom}) 

    # Load wofs
    ds = dc.load(product='wofs_ls_summary_annual',
                 time=('2000', '2020'),
                 **query)
    
    # Generate a polygon mask to keep only data within the polygon
    mask = xr_rasterize(gdf.iloc[[index]], ds, verbose=False)
    ds = ds.where(mask)
    
    #threshold
    ds = ds.frequency > 0.01
    
    #calculate area of pixels
    area_per_pixel = query["resolution"][1]**2 / 1000**2
    
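    # relax the garbage-collection thresholds once, on the first iteration only
    # (doubling them on every pass would grow them without bound)
    if i == 0:
        g0, g1, g2 = gc.get_threshold()
        gc.set_threshold(g0*2, g1*2, g2*2)

    #area of water for each year
    ds_area = (ds.sum(dim=['x', 'y']) * area_per_pixel).compute()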
    
    #summary stats for area of water from 2000-2020
    ds_area_mean = ds_area.mean().values.item()
    ds_area_std = ds_area.std().values.item()
    ds_area_max = ds_area.max().values.item()
    ds_area_idxmax = int(ds_area.idxmax().dt.year.values)
    ds_area_min = ds_area.min().values.item()
    ds_area_idxmin = int(ds_area.idxmin().dt.year.values)
    
    #area in year of interest
    ds_area_year = ds_area.sel(time=year).values.item()
    
    #compare area
    diff_area = ds_area_year - ds_area_mean
    diff_percent = diff_area / ds_area_mean * 100
    anomaly_year = (ds_area_year - ds_area_mean) / ds_area_std
    
    #output results in dict
    country=str(row[attribute_col])
    
    year_area.update({country: ds_area_year})
    area_mean.update({country: ds_area_mean})
    area_std.update({country: ds_area_std})
    area_max.update({country: ds_area_max})
    area_min.update({country: ds_area_min})
    area_idxmin.update({country: ds_area_idxmin})
    area_idxmax.update({country: ds_area_idxmax}) 
    diff.update({country: diff_area})
    change_perc.update({country: diff_percent})
    anomaly.update({country: anomaly_year})
    i += 1

 Feature 01/55
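The arithmetic inside the loop can be checked by hand on a small example. The sketch below uses made-up wet-pixel counts for one country and only four years; it converts counts to area in km2 and derives the same difference, percentage difference and standardised anomaly as the loop. The population standard deviation is used because xarray's `std` defaults to `ddof=0`.

```python
from statistics import mean, pstdev

# Hypothetical wet-pixel counts per year for one country (not real WOfS results).
wet_pixels = {"2017": 1_000_000, "2018": 1_200_000, "2019": 900_000, "2020": 1_500_000}

pixel_size_m = 30
area_per_pixel_km2 = pixel_size_m**2 / 1000**2   # 0.0009 km2 per 30 m pixel

# Water area per year in km2
area_km2 = {yr: n * area_per_pixel_km2 for yr, n in wet_pixels.items()}

area_mean = mean(area_km2.values())
area_std = pstdev(area_km2.values())   # population std (ddof=0)

year = "2020"
diff_area = area_km2[year] - area_mean          # km2 above or below the mean
diff_percent = diff_area / area_mean * 100      # percentage difference from the mean
anomaly = diff_area / area_std                  # standardised anomaly
```

With these numbers the 2020 extent is about 30% above the mean, or roughly 1.5 standard deviations.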

In [None]:
df = pd.DataFrame.from_dict(
    [
        year_area,
        area_mean,
        area_std,
        area_max,
        area_min,
        area_idxmax,
        area_idxmin,
        diff,
        change_perc,
        anomaly,
    ]
).T.rename(
    {
        0: "Water Extent " + year + " (km2)",
        1: "Mean Water Extent 2000-2020 (km2)",
        2: "Std. Dev. Water Extent 2000-2020 (km2)",
        3: "Max Water Extent 2000-2020 (km2)",
        4: "Min Water Extent 2000-2020 (km2)",
        5: "Year of Max Water Extent 2000-2020",
        6: "Year of Min Water Extent 2000-2020",
        7: "Difference in Water Extent: " + year + " - Mean (km2)",
        8: "Percentage Difference in Water Extent: " + year + " - Mean (%)",
        9: "Standardised Anomaly in Water Extent: (" + year + " - Mean) / Std.dev.",
    },
    axis=1,
)

df.head()

### Join results onto geodataframe and export

In [None]:
gdf = gdf.join(df, 'name')
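The way the result dictionaries become a table and are joined onto the country names can be sketched with plain pandas. The values and the short column names below are hypothetical, and a plain `DataFrame` stands in for the GeoDataFrame (geometry omitted).

```python
import pandas as pd

# Hypothetical per-country results, keyed by country name as in the loop above.
year_area = {"Angola": 120.0, "Benin": 45.0}
area_mean = {"Angola": 100.0, "Benin": 50.0}

# Each dict becomes one row; transposing gives one row per country.
df = pd.DataFrame([year_area, area_mean]).T.rename(
    {0: "year_km2", 1: "mean_km2"}, axis=1
)

# Stand-in for the country GeoDataFrame (geometry omitted for brevity).
countries = pd.DataFrame({"name": ["Benin", "Angola", "Chad"]})

# on="name" matches the `name` column against the index of `df`;
# countries without results (here Chad) receive NaN.
joined = countries.join(df, on="name")
```

Because the join is keyed on the country name rather than on row position, the order of rows in the results table does not need to match the order of the polygons.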

In [None]:
gdf.to_file('results/geojsons/wofs_water_extent_africa_'+year+'.geojson', driver='GeoJSON')

### Explore results

In [None]:
#simplify so plotting is fast
gdf = gdf.to_crs('epsg:6933')
gdf['geometry'] = gdf['geometry'].simplify(2500)


In [None]:
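# map one of the results columns; the standardised anomaly is used here as an example
gdf.explore(column="Standardised Anomaly in Water Extent: (" + year + " - Mean) / Std.dev.")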

## 2 - Plot WOfS with country boundaries
**Use this section to edit existing plot title, bounds, etc.**

If you have already exported the results file in Section 1 but have started a new instance, there is no need to re-process the data. The results can be read back in by uncommenting and running the code below. Make sure the vector file path points to the correct file.

In [None]:
# vector_file = "Plot_WOfS_by_country-data/wofs_pc_change_2020_vs_alltime_60-100.geojson"
# gdf = gpd.read_file(vector_file)

In [None]:
gdf.head()

### Customise the plot

In [None]:
# Define plot and colourbar axes
fig, ax = plt.subplots(1,1, figsize=(10,10))
fig.subplots_adjust(bottom=0.2)
cax = fig.add_axes([0.16, 0.15, 0.70, 0.03])

# Define colour map
cmap = mpl.cm.RdYlBu
bounds = list(range(-100, 101, 10))
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)#, extend='both')
cbar = mpl.colorbar.ColorbarBase(cax, cmap=cmap,
                                norm=norm,
                                orientation='horizontal')

# Define colourbar labelling
cbar.set_ticks([]) #list(range(-100, 101, 25)))
# cbar.set_ticklabels(list('{:2}%'.format(i) for i in (list(range(-100, 101, 25)))))
# cbar.set_label("% change", fontsize='14')

# Turn off lon-lat ticks and labels
ax.set_yticklabels([])
ax.set_xticklabels([])
ax.set_xticks([])
ax.set_yticks([])

# Remove frame
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

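# plot 'pc_change' and 'geometry' boundary lines
# ('pc_change' must be a column in gdf; substitute the percentage-difference column from Section 1 if needed)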
gdf.plot('pc_change', ax=ax, cmap=cmap, vmin=-100, vmax=100, norm=norm)
gdf.geometry.plot(ax=ax, linewidth=0.8, edgecolor='black', facecolor="none")

# Additional text boxes
ax.text(-25, -52, s="Less water than usual", color='black', fontsize='14')
ax.text(34, -52, s="More water than usual", color='black', fontsize='14')

# Plot title - concise title
ax.set_title("Surface water area - "+year+"\nDifference from all-time summary", fontsize='18')
# Plot title - include frequency range (requires `lower` and `upper` to be defined first)
# ax.set_title("Surface water area "+str(int(lower*100))+"-"+str(int(upper*100))+"% "+year+"\nDifference from all-time summary", fontsize='18')


# Export figure
fig.savefig('wofs_pc_change_'+year+'_vs_alltime_'+str(resolution[1])+'m-res.png', 
            bbox_inches='tight',
            dpi=200, 
            facecolor="white")
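The colour scale above groups percentage changes into 10%-wide classes between -100 and 100. The following is a conceptual sketch of that binning using only the standard library; it is not matplotlib's implementation, and the helper name `class_index` is made up for illustration.

```python
from bisect import bisect_right

bounds = list(range(-100, 101, 10))   # same class edges as the plot: -100, -90, ..., 100


def class_index(value, bounds):
    """Return the index of the half-open class [bounds[i], bounds[i+1]) holding value,
    or None when the value lies outside the colour scale."""
    if value < bounds[0] or value >= bounds[-1]:
        return None
    return bisect_right(bounds, value) - 1


# Class assigned to a few example percentage changes
examples = {pc: class_index(pc, bounds) for pc in (-100, -35, 0, 9.9, 10, 250)}
```

A country whose water extent more than doubles (for example +250%) falls outside the scale, so how it is drawn depends on the colormap's out-of-range handling rather than on one of the 20 classes.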

## 3 - Average rainfall and anomalies for the given year
This can be used to sense-check a country whose chosen year deviates strongly from the all-time statistics. Check the `Plot_WOfS_by_country-data` folder to see if CHIRPS for that year has already been pre-calculated. If so, skip down to the plotting section.

In [None]:
vector_file = "Plot_WOfS_by_country-data/african_countries.shp"
gdf = gpd.read_file(vector_file)

In [None]:
attribute_col = 'name'
year = '2015'

resolution = (-5000, 5000) # native CHIRPS resolution
output_crs = 'EPSG:6933'
measurements = ['rainfall']

query = {'measurements': measurements,
         'resolution': resolution,
         'output_crs': output_crs,
         }

### Long-term average (climatology) per country using CHPclim
This gives a suitable baseline to compare the yearly CHIRPS data to. The CHPclim netcdf summary file is generated in the `digitalearthafrica/thematic-layers` repository and has been copied here to preserve filepaths.

In [None]:
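import contextlib
import io

def HiddenPrints():
    """Minimal stand-in helper (not defined elsewhere in this notebook): suppress print output."""
    return contextlib.redirect_stdout(io.StringIO())

def rainfall_per_polygon(gdf, year):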
    num_pixels_year = {}
    num_pixels_alltime = {}

    # A progress indicator
    i = 0

    # Loop through polygons in geodataframe and extract satellite data
    for index, row in gdf.iterrows():

        print(" Feature {:02}/{:02}\r".format(i + 1, len(gdf)),
                      end='')

        # Get the geometry
        geom = geometry.Geometry(row.geometry.__geo_interface__,
                                 geometry.CRS(f'EPSG:{gdf.crs.to_epsg()}'))

        # Update dc query with geometry      
        query.update({'geopolygon': geom}) 

        # Load CHIRPS monthly for this polygon only (hide print statements);
        # `query` carries measurements, resolution, output_crs and the geopolygon
        with HiddenPrints():
            ds = dc.load(product='rainfall_chirps_monthly',
                         time=year,
                         group_by='solar_day',
                         **query)

        # set -9999 no-data values to NaN
        ds = ds.where(ds !=-9999.)      

        # find sum over 12 months (ie total for that year);
        # min_count=1 keeps pixels with no valid months as NaN instead of 0
        ds = ds.sum(dim='time', min_count=1)

        # Load CHPclim climatology and set to same projection as monthly
        ds_chpclim = assign_crs(xr.open_dataarray('Plot_WOfS_by_country-data/chpclim_africa_12month_sum.nc'),  
                                    crs='epsg:4326')
        ds_chpclim = xr_reproject(ds_chpclim, ds.geobox)
        ds_chpclim = assign_crs(ds_chpclim, crs=ds.geobox.crs)

        # Generate a polygon mask to keep only data within the polygon
        with HiddenPrints():
            mask = xr_rasterize(gdf.iloc[[index]], ds)
            mask_chpclim = xr_rasterize(gdf.iloc[[index]], ds_chpclim)

        # Mask dataset to set pixels outside the polygon to `NaN`
        ds = ds.where(mask)
        ds_chpclim = ds_chpclim.where(mask_chpclim)

        # Find mean rainfall over country area 
        ds = ds.mean(dim=['x','y'])
        ds_chpclim = ds_chpclim.mean(dim=['x','y'])

        # Append results to dictionary
        num_pixels_year.update({str(row[attribute_col]) : ds.rainfall.values.flatten()[0]})
        num_pixels_alltime.update({str(row[attribute_col]) : ds_chpclim.values.flatten()[0]})

        # Update counter
        i += 1
        
    return num_pixels_year, num_pixels_alltime
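The no-data handling in the function matters because a sentinel value such as `-9999` would otherwise be added into the annual total. The NumPy sketch below uses made-up monthly values to show the effect, and why pixels with no valid months are best kept as NaN (xarray's `sum` offers a `min_count` argument for the same purpose).

```python
import numpy as np

NODATA = -9999.0

# Hypothetical monthly rainfall (mm) for 3 pixels over 4 months; shape (time, pixel).
monthly = np.array([
    [10.0, 20.0, NODATA],
    [15.0, NODATA, NODATA],
    [5.0, 30.0, NODATA],
    [20.0, 10.0, NODATA],
])

# Replace the sentinel with NaN before aggregating
masked = np.where(monthly == NODATA, np.nan, monthly)

naive_total = monthly.sum(axis=0)        # sentinel values corrupt the totals
nan_total = np.nansum(masked, axis=0)    # NaNs skipped; an all-NaN pixel sums to 0

# Keep pixels with no valid months as NaN so they do not drag down a spatial mean.
valid_months = np.sum(~np.isnan(masked), axis=0)
total = np.where(valid_months > 0, nan_total, np.nan)

country_mean = np.nanmean(total)
```

Here the third pixel has no valid data at all. Treating its total as 0 mm would lower the spatial mean, whereas keeping it as NaN leaves the mean based on the two valid pixels only.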

In [None]:
chirps_year, chpclim_year = rainfall_per_polygon(gdf, year)

In [None]:
# Assign to geodataframe
pd_chirps = pd.Series(chirps_year)
pd_chpclim = pd.Series(chpclim_year)
pdrain_diff = pd_chirps - pd_chpclim

gdf = gdf.assign(avg_chirps=pd_chirps.values)
gdf = gdf.assign(avg_chpclim=pd_chpclim.values)
gdf = gdf.assign(rain_diff=pdrain_diff.values)

# Calculate percentage difference
rain_pc_change = (gdf.rain_diff/gdf.avg_chpclim*100).fillna(0)
gdf = gdf.assign(pc_rain=rain_pc_change.values)
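The percentage difference computed above is the yearly total relative to the climatological baseline. A minimal standard-library sketch with made-up rainfall values (not real CHIRPS or CHPclim figures) is shown below; the helper name `percent_anomaly` is hypothetical.

```python
import math

# Hypothetical country-mean annual rainfall (mm); not real CHIRPS or CHPclim values.
avg_chirps = {"Somalia": 210.0, "Chad": 330.0, "Gabon": 1800.0}
avg_chpclim = {"Somalia": 280.0, "Chad": 300.0, "Gabon": 1800.0}


def percent_anomaly(observed, baseline):
    """Percentage difference from the baseline; NaN when the baseline is zero or missing."""
    if baseline is None or math.isnan(baseline) or baseline == 0:
        return math.nan
    return (observed - baseline) / baseline * 100


pc_rain = {c: percent_anomaly(avg_chirps[c], avg_chpclim[c]) for c in avg_chirps}
```

This sketch returns NaN when no baseline is available. The notebook instead fills missing values with 0, which plots such countries as "no change"; which behaviour is preferable depends on how the map will be read.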

In [None]:
gdf.head()

In [None]:
# Export average CHIRPS for that year, and CHPclim, per country as geojson
gdf.to_file("Plot_WOfS_by_country-data/chirps_per_country_"+year+"_5km.geojson", driver='GeoJSON')

## 4 - Plot rainfall anomalies, to compare to WOfS

If the data for that year has previously been generated, uncomment and import the data using the file read code below.

In [None]:
# vector_file = "Plot_WOfS_by_country-data/chirps_per_country_"+year+"_5km.geojson"
# gdf = gpd.read_file(vector_file)

In [None]:
# Define plot and colourbar axes
fig, ax = plt.subplots(1,1, figsize=(10,10))
fig.subplots_adjust(bottom=0.2)
cax = fig.add_axes([0.16, 0.15, 0.70, 0.03])

# Define colour map
cmap = mpl.cm.RdYlBu
bounds = list(range(-100, 101, 10))
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)#, extend='both')
cbar = mpl.colorbar.ColorbarBase(cax, cmap=cmap,
                                norm=norm,
                                orientation='horizontal')

# Define colourbar labelling
cbar.set_ticks([])
# cbar.set_ticklabels(list('{:2}%'.format(i) for i in (list(range(-100, 101, 25)))))
# cbar.set_label("Rainfall (mm)", fontsize='14')

# Turn off lon-lat ticks and labels
ax.set_yticklabels([])
ax.set_xticklabels([])
ax.set_xticks([])
ax.set_yticks([])

# Remove frame
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# plot 'pc_rain' and 'geometry' boundary lines
gdf.plot('pc_rain', ax=ax, cmap=cmap, norm=norm)
gdf.geometry.plot(ax=ax, linewidth=0.8, edgecolor='black', facecolor="none")

# Additional text boxes
ax.text(-25, -52, s="Less rain than usual", color='black', fontsize='14')
ax.text(34, -52, s="More rain than usual", color='black', fontsize='14')

# Plot title
ax.set_title("Rainfall anomaly "+year+" - CHIRPS", fontsize='18')

# Export figure
fig.savefig('rainfall_anomaly_per_country_'+year+'_5km.png', 
            bbox_inches='tight',
            dpi=200, 
            facecolor="white")

## 5 - Investigate a single country

Use this section to do any of the following:
* plot a single country of data
* verify the per-country calculation in Part (1) 
* check the effects of a boundary buffer on the results 

In [None]:
# import the original country shapefile
vector_file = "african_countries.shp"
gdf = gpd.read_file(vector_file)

In [None]:
gdf.head()

In [None]:
gdf_somalia = gdf.loc[gdf['name'] == 'Somalia'] 

In [None]:
gdf_somalia

In [None]:
# If you want to check the results of a buffer
# buffer value is defined in the same units as the CRS

# gdf_somalia['geometry'] = gdf_somalia.geometry.buffer(-0.005)
# gdf_somalia
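Because the buffer is expressed in CRS units, a value such as `0.005` means degrees if the shapefile is in a geographic CRS (the coordinates shown by `gdf.head()` suggest this, but it is an assumption here). The sketch below converts that value to an approximate ground distance using a spherical Earth; the latitude of 5 degrees is an illustrative choice.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def degrees_to_metres(buffer_deg, latitude_deg):
    """Approximate north-south and east-west ground distance of a buffer given in degrees."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = buffer_deg * metres_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_m, ew_m = degrees_to_metres(0.005, latitude_deg=5.0)
pixels_30m = ns_m / 30   # how many 30 m WOfS pixels the buffer spans north-south
```

Under these assumptions a `-0.005` degree buffer shrinks the boundary by a little over 550 m, or roughly 18 WOfS pixels, which is enough to exclude narrow coastal or border waterbodies.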

In [None]:
geom = geometry.Geometry(gdf_somalia.unary_union, gdf_somalia.crs)
geom

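# Reset the WOfS load parameters, since Section 3 overwrote them with CHIRPS settings
measurements = ['frequency']
resolution = (-30, 30)
output_crs = 'EPSG:6933'

query = {'measurements': measurements,
         'resolution': resolution,
         'output_crs': output_crs,
         }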

In [None]:
#load wofs alltime
ds_alltime = dc.load(product='wofs_ls_summary_alltime',
                     geopolygon = geom,
                     **query)

In [None]:
ds_alltime.frequency.plot()

In [None]:
ds = dc.load(product="wofs_ls_summary_annual",
                  time=('2020'),
                  resampling='nearest',
                  like=ds_alltime.geobox
                  ).frequency

In [None]:
ds.plot()

### Mask NaNs

In [None]:
mask = xr_rasterize(gdf_somalia, ds)

In [None]:
ds = ds.where(mask)

In [None]:
ds.plot()

In [None]:
ds.count()

In [None]:
mask_alltime = xr_rasterize(gdf_somalia, ds_alltime)

In [None]:
ds_alltime = ds_alltime.where(mask_alltime)

In [None]:
ds_alltime.frequency.plot()

In [None]:
ds_alltime.frequency.count()

### Check number of thresholded values and compare to the original (unbuffered) function

In [None]:
alltime_water = ds_alltime.frequency.where(ds_alltime.frequency > 0.6)
alltime_water.count()

In [None]:
ds_water = ds.where(ds > 0.6)
ds_water.count()
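The `where(...)` followed by `count()` pattern used above counts the pixels that pass the threshold, because `where` replaces everything else with NaN and `count` ignores NaN. A NumPy equivalent on a made-up grid is sketched below; the 30 m pixel size used for the area is an assumption that must match the resolution actually loaded.

```python
import numpy as np

# Hypothetical frequency grid; NaN marks pixels outside the country mask.
freq = np.array([
    [np.nan, 0.10, 0.70],
    [0.65, 0.60, np.nan],
    [0.95, 0.00, 0.20],
])

n_valid = int(np.count_nonzero(~np.isnan(freq)))     # analogous to ds.count()

# Analogous to ds.where(ds > 0.6): keep values above the threshold, NaN elsewhere.
water = np.where(freq > 0.6, freq, np.nan)
n_water = int(np.count_nonzero(~np.isnan(water)))    # analogous to ds_water.count()

water_fraction = n_water / n_valid
area_km2 = n_water * 30**2 / 1000**2   # assumes 30 m pixels
```

Comparing `n_water` for the annual and all-time layers, with and without a buffer, gives a quick check on how sensitive the result is to the boundary.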

In [None]:
ds_water.plot()

In [None]:
alltime_water.plot()

---

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).

**Compatible datacube version:**

In [None]:
print(datacube.__version__)

**Last Tested:**

In [None]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')