# Find lots of dams using WOFLs in a loop

**What does this notebook do?** This notebook uses the output netCDF files of WOFS summaries from `datacube-stats` to generate polygons of water bodies in the landscape. The code follows this workflow:
* Generate a list of netCDF files within a specified folder location
* Open each netCDF file and:
    * Keep only pixels identified as wet at least 10% of the time
    * Convert the raster data into polygons
    * Filter the polygons based on size and proximity to identified major rivers
    * Remove any named water bodies - e.g. lakes
* Append the final polygon set to a shapefile

**Required inputs:** 
This code requires that you have already run `datacube-stats` to produce WOFS summaries as netCDF files. It takes the folder location as input and generates the list of netCDF files to process from it.

**Date:** September 2018

**Author:** Claire Krause

In [1]:
%pylab notebook

import rasterio.features
from shapely.geometry import Polygon, shape, mapping
from shapely.ops import unary_union
import geopandas as gp
import fiona
from fiona.crs import from_epsg
import xarray as xr
import pandas as pd
import glob
import os.path

Populating the interactive namespace from numpy and matplotlib


## Set up all the parameters for the script

### How wet does a pixel need to be to be included?
The value set here will be the minimum amount of time (as a decimal between 0 and 1) that you want water to be detected before it is included in the analysis. 

E.g. If this was set to 0.25, any pixels that are wet *at least* 25% of the time will be included. If you are looking for persistent water bodies, you will want to set this threshold higher. If you don't want to use this filter, set this value to 0.

In [2]:
AtLeastThisWet = 0.10
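
As a rough illustration of how this threshold behaves, the sketch below applies it to a small made-up array. The values and the name `frequency` are assumptions for illustration only; the notebook itself reads the `frequency` layer from each netCDF file.

```python
import numpy as np

# Hypothetical wet-frequency values (fraction of observations that were wet)
frequency = np.array([[0.00, 0.05, 0.10],
                      [0.25, 0.60, 0.09]])

AtLeastThisWet = 0.10

# The processing cell further down uses a strict comparison (>), so a pixel
# sitting exactly on the threshold is not kept
wet_mask = frequency > AtLeastThisWet

kept_pixels = int(wet_mask.sum())
print(wet_mask)
print(kept_pixels)
```

Because the comparison is strict, setting the threshold to 0 still drops pixels that were never observed as wet.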

### How big/small should the polygons be?
This filtering step can remove very small and/or very large polygons. The size listed here is in m². A single pixel in Landsat data is 25 m × 25 m = 625 m².

**MinSize**

E.g. A minimum size of 6250 means that polygons need to be at least 10 pixels to be included. If you don't want to use this filter, set this value to 0.

**MaxSize**

E.g. A maximum size of 1 000 000 means that you only want to consider polygons no larger than 1 km². If you don't want to use this filter, set this number to something stupidly large.

In [3]:
MinSize = 3120
MaxSize = 10000000
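
To make the pixel arithmetic concrete, here is a small sketch that converts whole-pixel counts to areas and applies the same comparisons used in the processing cell (`> MinSize` and `<= MaxSize`). The names `PIXEL_AREA_M2` and `passes_size_filter` are hypothetical helpers, not part of the notebook.

```python
PIXEL_AREA_M2 = 25 * 25  # one Landsat pixel, 625 m2

MinSize = 3120
MaxSize = 10000000


def passes_size_filter(n_pixels):
    """Return True if a polygon made of n_pixels whole pixels is kept."""
    area = n_pixels * PIXEL_AREA_M2
    return MinSize < area <= MaxSize


# 4 pixels = 2500 m2 (too small), 5 pixels = 3125 m2 (kept)
print(passes_size_filter(4), passes_size_filter(5))
# 16000 pixels = 10 000 000 m2 (kept), 16001 pixels is too large
print(passes_size_filter(16000), passes_size_filter(16001))
```

With the values set above, the filter keeps polygons of between 5 and 16 000 pixels.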

### Do you want to filter out polygons that intersect with major rivers?

We use the [surface hydrology lines dataset](http://pid.geoscience.gov.au/dataset/ga/83107) to filter out polygons that intersect with major rivers. 
Note that we have filtered this dataset to only keep rivers tagged as `major`, and it is this filtered dataset that we use here.

If you don't want to filter out polygons that intersect with rivers, set this parameter to `False`.

In [4]:
FilterOutRivers = True

In [5]:
# Read in the major rivers dataset (if you are using it)
if FilterOutRivers == True:
    FilteredMajorRivers = '/g/data/r78/cek156/ShapeFiles/SurfaceHydrologyLinesRegionalFilteredMAJOR.shp'
    MajorRivers = gp.GeoDataFrame.from_file(FilteredMajorRivers) 
    MajorRivers = MajorRivers.to_crs({'init':'epsg:3577'})
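
The river filter comes down to an "intersects" test between each water polygon and the river lines. The notebook does this in bulk with `gp.sjoin`; the sketch below shows the same test with plain `shapely`, using made-up geometries as a stand-in for the real datasets.

```python
from shapely.geometry import LineString, box

# Hypothetical river centreline and two water body polygons (coordinates in metres)
river = LineString([(0, 0), (100, 100)])
on_river = box(40, 40, 60, 60)   # the line passes through this square
farm_dam = box(80, 0, 100, 20)   # well away from the line

water_bodies = [on_river, farm_dam]

# Keep only the polygons that do not intersect the river
kept = [poly for poly in water_bodies if not poly.intersects(river)]
print(len(kept))
```

Only the polygon away from the line survives, which is the behaviour we want for separating farm dams from river reaches.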

## Is this analysis for all of Australia, or just a subset?

In [6]:
SubRegion = True

If you would like to analyse only some part of Australia, you will need to provide a `list` of Albers tiles that cover that region. 

To generate the list of tiles automatically from the outputs of a previous analysis, set `SubRegionAuto = True` and update the location of the output file directory. For example, if a custom `datacube-stats` analysis has already been run over this region, its output files tell us which tiles cover the area.

To supply the list of Albers tiles manually, set `SubRegionAuto = False` and provide the tiles in the format:

```
SubRegionAlbersTiles = ['8_-32', '9_-32', '10_-32', '8_-33', '9_-33']
```

In [7]:
SubRegionAuto = True

In [8]:
if SubRegionAuto == True:
    # Where are the datacube-stats netCDF files located?
    WOFSMDBFolder = '/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/'

    MDBtiles = glob.glob(f'{WOFSMDBFolder}*.nc')

    SubRegionAlbersTiles = set()
    for filePath in MDBtiles:
        AlbersTiles = filePath.split('_')
        ThisTile = f'{AlbersTiles[3]}_{AlbersTiles[4]}'
        SubRegionAlbersTiles.add(ThisTile)
    SubRegionAlbersTiles = list(SubRegionAlbersTiles)
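
Note that the cell above splits the full path on underscores, so the indices 3 and 4 only line up because the folder part of the path contains exactly one underscore (`datacube_stats`). A sketch of a variant that does not depend on the folder name is shown below. The helper `tile_id_from_path` is hypothetical, and the file name layout `wofs_summary_<x>_<y>_<date>.nc` is assumed from the file names printed by the processing cell.

```python
import os.path


def tile_id_from_path(file_path):
    """Extract an 'x_y' Albers tile label from a path such as
    .../wofs_summary_20_-40_20000101.nc (file name layout assumed)."""
    file_name = os.path.basename(file_path)
    parts = file_name.split('_')
    # parts -> ['wofs', 'summary', '20', '-40', '20000101.nc']
    return f'{parts[2]}_{parts[3]}'


example = '/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_20_-40_20000101.nc'
print(tile_id_from_path(example))
```

Working from the base name keeps the parsing stable if the data are moved to a folder with a different number of underscores in its path.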

In [9]:
if SubRegionAuto == False:
    SubRegionAlbersTiles = ['8_-32', '9_-32', '10_-32', '8_-33', '9_-33','10_-33','11_-33',
                            '12_-33','13_-33','14_-33', '15_-33','16_-33','17_-33','18_-33',
                            '19_-33','20_-33','8_-34','9_-34','10_-34','11_-34','12_-34',
                            '13_-34','14_-34','15_-34','16_-34','17_-34','18_-34','19_-34',
                            '20_-34','8_-35','9_-35','10_-35','11_-35','12_-35','13_-35',
                            '14_-35','15_-35','16_-35','17_-35','18_-35','19_-35','20_-35',
                            '8_-36','9_-36','10_-36','11_-36','12_-36','13_-36','14_-36',
                            '15_-36','16_-36','17_-36','18_-36','19_-36','8_-37','9_-37',
                            '10_-37','11_-37','12_-37','13_-37','14_-37','15_-37','16_-37',
                            '17_-37','18_-37','19_-37','8_-38','9_-38','10_-38','11_-38',
                            '12_-38','13_-38','14_-38','15_-38','16_-38','17_-38','18_-38',
                            '9_-39','10_-39','11_-39','12_-39','13_-39','14_-39','15_-39',
                            '16_-39','17_-39','10_-40','11_-40','12_-40','13_-40','14_-40',
                            '15_-40','16_-40','17_-40','11_-41','12_-41','13_-41','14_-41',
                            '15_-41','16_-41','14_-42','15_-42','16_-42','15_-43']

## Loop through each tile and polygonise the annual WOfS data

Within this cell, you need to set up:
- years to analyse: `for year in range(2016,2018)` - note that the last year is NOT included in the analysis
- WOFSInputFolder: Where are the datacube-stats netCDF files located?
- WOFSshp: The name and filepath of the intermediate output polygon set
- WOFSshpMerged: The name and filepath of the final corrected output polygon set
- AlbersBuffer: The file location of a shapefile that is a 1 pixel buffer around the Albers tile boundaries
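
As a quick check of the year range and of the file pattern that the cell builds for each tile, consider the sketch below. The tile label is just an example value; the folder is the one used in this notebook.

```python
WOFSInputFolder = '/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/'
tile = '20_-40'  # example tile label

# range() stops before its second argument
years = list(range(2016, 2018))
print(years)

# One glob pattern per year for a single tile
patterns = [f'{WOFSInputFolder}*_{tile}_{year}0101.nc' for year in years]
for pattern in patterns:
    print(pattern)
```

Here `range(2016, 2018)` covers 2016 and 2017 only, so to include 2018 the second argument would need to be 2019.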

In [10]:
for year in range(2000,2001):

    ### Set up some file names for the inputs and outputs

    # Where are the datacube-stats netCDF files located?
    WOFSInputFolder = '/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/'

    # The name and filepath of the intermediate output polygon set
    WOFSshp = '/g/data/r78/cek156/dea-notebooks/Dams/Dams2000to2018/AllNSW2000to201810pcMinMaxRiverTemp.shp'

    # The name and filepath of the final corrected output polygon set
    WOFSshpMerged = '/g/data/r78/cek156/dea-notebooks/Dams/Dams2000to2018/AllNSW2000to201810pcMinMaxRiverCleaned.shp'

    ### Get the list of netcdf file names to loop through

    if SubRegion == True:
        Alltiles = []
        for tile in SubRegionAlbersTiles:
            Tiles = glob.glob(f'{WOFSInputFolder}*_{tile}_{year}0101.nc')
            Alltiles.append(Tiles[0]) # Assumes only one file will be returned
    else:
        Alltiles = glob.glob(f'{WOFSInputFolder}*_{year}_summary.nc')

    ## Read in and process the data

    for WOFSfile in Alltiles: 
        try:
            # Read in the data
            WOFSnetCDFData = xr.open_rasterio(f'NETCDF:{WOFSfile}:frequency')

            # Filter our classified data layer to remove noise
            # Keep only pixels that are wet for more than the AtLeastThisWet fraction of the time
            WOFSfiltered = WOFSnetCDFData > AtLeastThisWet
            # Remove the superfluous time dimension
            WOFSfiltered = WOFSfiltered.squeeze()
            # Change all zeros to NaN
            WOFSfiltered = WOFSfiltered.where(WOFSfiltered !=0)

            # Convert the raster to polygons
            WOFSpolygons = rasterio.features.shapes(WOFSfiltered.data.astype('float32'), 
                                                    transform = WOFSnetCDFData.transform[:-3])
            WOFSlist = list(WOFSpolygons)

            # Remove the nan polygons
            WOFSlist = [x for x in WOFSlist if x[1] == 1]

            # rasterio.features.shapes yields (geometry, value) tuples. We only want to keep the geometry portion
            WOFSOFSbreaktuple = [a for a, b in WOFSlist]
            # Grab the geometries and convert into a shapely geometry
            for poly in WOFSOFSbreaktuple:
                poly['geometry'] = shape(poly)

            # Calculate the area of each polygon
            WOFLshapes = []
            for i, WOFLshape in enumerate(WOFSOFSbreaktuple):
                polyArea = Polygon(WOFLshape['coordinates'][0]).area
                WOFLshape['properties'] = {'area': polyArea}
                WOFLshapes.append(WOFLshape)

            # Filter out any polygons smaller than MinSize, and greater than MaxSize
            AreasIndex = [i for i, x in enumerate(WOFLshapes) if
                          (x['properties']['area'] > MinSize) and
                          (x['properties']['area'] <= MaxSize)]
            WOFSOFSbig = [WOFLshapes[x] for x in AreasIndex]

            if FilterOutRivers:
                # Remove any polygons that intersect with a major river
                # We are only interested in farm dams, so do not need the WOFS polygons for the rivers. 
                WOFSOFSfiltered2p0 = gp.GeoDataFrame(WOFSOFSbig).set_geometry('geometry')
                Intersections= gp.sjoin(MajorRivers, WOFSOFSfiltered2p0, how="inner", op='intersects')
                IntersectIndex = sorted(list(set(Intersections['index_right'])))

                WOFSOFS = [WOFSOFSbig[x] for x in range(len(WOFSOFSbig)) 
                                                  if x not in IntersectIndex]
            else:
                WOFSOFS = WOFSOFSbig

            # Merge any overlapping polygons 
            MergedPolygonsGeoms = unary_union([feature['geometry'] for feature in WOFSOFS])
            MergedPolygonsGeomsList = list(MergedPolygonsGeoms)

            # Re-calculate the area of each polygon and add it as an attribute to our new merged polygons
            # (For some reason, doing this merge prior to filtering doesn't work properly, hence calculating
            # area twice...)
            MergedPolygons = []
            for i, WOFLshape in enumerate(MergedPolygonsGeomsList):
                PolygonBits = mapping(WOFLshape)
                polyArea = Polygon(WOFLshape).area
                PolygonBits['properties'] = {'area': polyArea}
                MergedPolygons.append(PolygonBits)

            # Save the polygons to a shapefile
            schema = {'geometry': 'Polygon','properties': {'area': 'str'}}

            # Append to the shapefile if it already exists, otherwise create it
            writeMode = 'a' if os.path.isfile(WOFSshp) else 'w'
            with fiona.open(WOFSshp, writeMode, crs = from_epsg(3577), driver = 'ESRI Shapefile', schema = schema) as output:
                for MergedPolygon in MergedPolygons:
                    output.write({'properties': {'area': MergedPolygon['properties']['area']},
                                  'geometry': {'type': MergedPolygon['type'], 'coordinates': MergedPolygon['coordinates']}})
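        except Exception:
            # Print the name of any tile that could not be processed, then carry on with the next tile
            print(WOFSfile)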

    ## Merge polygons that have an edge at a tile boundary

    # Now that we have all of the polygons across our whole region of interest, we need to check
    # for artifacts in the data caused by tile boundaries.

    # Load in a shapefile of the albers tile boundaries
    # We have created a shapefile that consists of the albers tile boundaries,
    # plus a 1 pixel (25 m) buffer. This shapefile will help us to find any polygons that have a
    # boundary at the edge of an albers tile. We can then find where polygons touch across this
    # boundary, and join them up.

    AlbersBuffer = gp.read_file(
        '/g/data/r78/cek156/ShapeFiles/AlbersBuffer25m.shp')

    DamPolygons = gp.read_file(WOFSshp)

    # Find where the albers polygon overlaps with our dam polygons
    # Perform a spatial join for polygons that intersect
    Intersects = gp.sjoin(AlbersBuffer, DamPolygons, how='inner', op='intersects')
    # Get the index of the dam polygons that intersect with the Albers buffer
    IntersectIndex = sorted(Intersects['index_right'])

    # Get just the dams that intersect with the tile boundaries
    BoundaryDams = DamPolygons.iloc[IntersectIndex]
    NotBoundaryDams = DamPolygons.loc[~DamPolygons.index.isin(IntersectIndex)]

    # Now combine overlapping polygons in `BoundaryDams`
    UnionBoundaryDams = BoundaryDams.unary_union

    # `Explode` the multipolygon back out into individual polygons
    UnionGDF = gp.GeoDataFrame(crs=DamPolygons.crs, geometry=[UnionBoundaryDams])
    MergedDams = UnionGDF.explode()

    # Then combine the newly merged polygons with `NotBoundaryDams`, i.e. the remaining
    # polygons that are not near a tile boundary
    AllTogether = gp.GeoDataFrame(pd.concat([NotBoundaryDams, MergedDams],
                                            ignore_index=True, sort=True)).set_geometry('geometry')

    # Calculate the area of each polygon
    AllTogether['area'] = AllTogether.area

    # Then write the lot out to shapefile
    AllTogether.crs = {'init': 'epsg:3577'}
    AllTogether.to_file(WOFSshpMerged, driver='ESRI Shapefile')

/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_20_-40_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_20_-38_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_20_-37_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_19_-38_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_21_-34_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_20_-36_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_21_-33_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_19_-39_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_19_-40_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_16_-43_20000101.nc
/g/data/r78/cek156/datacube_stats/WOFSDams2000to2018/wofs_summary_20_-39_20000101.nc



## Apply Land/Sea mask

To do.