# Rasterizing shapefiles and vectorizing rasters

**Notebook currently compatible with both the `NCI` and `DEA Sandbox` environments**

### Background

Sometimes we have a raster and want to turn it into a vector (e.g. a shapefile).

Sometimes we have a shapefile and want to turn it into a raster (e.g. for use in masking).

### Description
1. Using [Rasterio](https://rasterio.readthedocs.io/en/stable/index.html):
    - Rasterizing vector
    - Vectorizing raster
    
    
2. Using [GDAL](https://gdal.org/):
    - Rasterizing vector
    - Vectorizing raster


### Technical details
* **Products used:** `wofs_annual_summary`
* **Analyses used:** `rasterio.features.rasterize`, `rasterio.features.shapes`, `gdal.RasterizeLayer`, `gdal.Polygonize`
* **Special requirements:** 
    `!pip install --user descartes`

## Getting started
To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

### Load packages


In [3]:
%matplotlib inline

import datacube
import rasterio.features
import shapely.geometry
import geopandas as gpd
import numpy as np
import pprint

import sys
sys.path.append("../Scripts")
# from dea_datahandling import load_ard

### Connect to the datacube
Give your datacube app a unique name that is consistent with the purpose of the notebook.

In [4]:
dc = datacube.Datacube(app="Rasterize and Vectorize")

## Load WOfS data from the datacube

We will load in an annual summary from the [Water Observations from Space (WOfS)](https://www.ga.gov.au/scientific-topics/community-safety/flood/wofs/about-wofs) product to provide us with some data to work with.  The query below will load the 2016 annual summary of WOfS for the region around the Menindee Lakes.

In [None]:
# Create a query object
query = {
    'x': (142.1, 142.80),
    'y': (-32.1, -32.6),
    'time': ('2016')
}

ds = dc.load(product='wofs_annual_summary', **query)

print(ds)


### Plot the WOfS summary

Let's plot the WOfS data to get an idea of the objects we will be vectorizing. In the code below, we first select the pixels that were observed as water in more than 25% of clear observations during the year. This isolates the more persistent water bodies and reduces some of the noise before we vectorize the raster.

In [None]:
#select pixels that are classified as water > 25 % of the time
water_bodies = ds.frequency > 0.25

#plot
water_bodies.plot(size=5)
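To see exactly what the threshold keeps, here is a minimal sketch on a small hypothetical array standing in for `ds.frequency` (the values are made up for illustration). Note that `>` is strict, so a pixel that was wet in exactly 25% of clear observations is excluded.

```python
import numpy as np

# Hypothetical WOfS-style frequency values (fraction of clear observations that were wet)
frequency = np.array([
    [0.00, 0.10, 0.25],
    [0.30, 0.80, 1.00],
    [0.05, 0.26, 0.50],
])

# Same rule as the notebook: keep pixels classified as water in more than 25% of observations
water_bodies = frequency > 0.25

print(water_bodies.astype(int))
print("Persistent water pixels:", int(water_bodies.sum()))
```

The result is a boolean array, which is why it has to be cast to an integer type before being passed to `rasterio.features.shapes`.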

## Vectorizing rasters


Using `rasterio.features.shapes`, we will turn our raster of persistent water bodies into vector polygons.
- The `source` parameter requires a single-band (2D) NumPy array with one of the data types `shapes` supports (e.g. `int16`, `uint8` or `float32`), so we cast our boolean water mask with `astype`.
- The `mask` parameter sets which pixels are converted to vectors: pixels where the mask is `False` (or 0) are excluded. In this instance, we want to convert only the values of 1, which correspond to the persistent water bodies in the DataArray.
- The `transform` parameter is the affine transform that maps pixel (column, row) positions to coordinates in the data's coordinate reference system, so that the output polygons are geo-located. We can get this from the `geobox` of our original xarray dataset.

In [None]:
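# First grab the spatial information from our datacube xarray object
transform = ds.geobox.transform

# dc.load returns a time dimension; select the single 2016 time step
# so that the array passed to shapes() is 2D
water_array = water_bodies.isel(time=0).data

# Run the function: cast to int16 for `source` and keep only water pixels via `mask`
vectors = rasterio.features.shapes(source=water_array.astype('int16'),
                                   mask=(water_array == 1),
                                   transform=transform)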




The returned object is a `generator` that yields a `(geometry, value)` pair for each connected region of pixels sharing the same value (restricted to pixels where `mask` is `True`). Each geometry is a GeoJSON-like dictionary. Before we create a shapefile, we first need to unpack the generator into lists, which are easier to manipulate.

The last part of the code prints the first `(geometry, value)` pair to show what these objects look like.

In [None]:
# First convert the generator into a list, as a generator can only be consumed once
vectors=list(vectors)

#extract the polygons and values from the list using list comprehension
polygons = [shape for shape, value in vectors]
values = [value for shape, value in vectors]    

print("Image value: " + str(values[0]))
print("Geometry:")
pprint.pprint(polygons[0])
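The reason for calling `list()` first is that a generator is exhausted after one pass. This minimal sketch uses a hypothetical stand-in generator (not part of rasterio) to show that a second pass returns nothing, and shows `zip(*...)` as an alternative way to split the pairs in one step.

```python
def hypothetical_shapes():
    """Stand-in for rasterio.features.shapes: yields (GeoJSON-like geometry, value) pairs."""
    yield ({'type': 'Polygon', 'coordinates': [[(0, 0), (25, 0), (25, 25), (0, 25), (0, 0)]]}, 1.0)
    yield ({'type': 'Polygon', 'coordinates': [[(50, 0), (75, 0), (75, 25), (50, 25), (50, 0)]]}, 1.0)

vectors = hypothetical_shapes()

first_pass = list(vectors)   # consumes the generator
second_pass = list(vectors)  # nothing left: the generator is exhausted

# zip(*pairs) splits the (geometry, value) pairs into two tuples in a single pass
polygons, values = zip(*first_pass)

print(len(first_pass), len(second_pass))
print(values)
```

One caveat with `zip(*...)`: if no pixels pass the mask, there is nothing to unpack and the assignment raises `ValueError`, whereas the two list comprehensions in the notebook simply return empty lists.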

Now that we have a list of geometries and values, we can add these to a [geopandas.GeoDataFrame](http://geopandas.org/). Geopandas is a very useful library for handling vector data as it extends the datatypes used by pandas to allow spatial operations.

In [None]:
# Convert each GeoJSON-like geometry dict into a Shapely geometry
geometries = [shapely.geometry.shape(poly) for poly in polygons]

# Create a geopandas dataframe: the geometries go in the 'geometry' column
# (required for plotting) and our values go in an attribute column.
# The polygons share the coordinate reference system of the source data.
gdf = gpd.GeoDataFrame({'attrs': values},
                       geometry=geometries,
                       crs=str(ds.geobox.crs))
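One example of the spatial operations geopandas adds is area calculation. Here is a minimal sketch with two hypothetical rectangles, assuming the data are in a projected CRS measured in metres such as EPSG:3577 (GDA94 / Australian Albers); in that case `.area` returns square metres.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical polygons in a projected CRS with metre units
polygons = [box(0, 50, 50, 100), box(75, 0, 100, 50)]
values = [1.0, 1.0]

gdf = gpd.GeoDataFrame({'attrs': values}, geometry=polygons, crs='EPSG:3577')

# With a metre-based CRS, the area column is in square metres
gdf['area_m2'] = gdf.area
print(gdf)
```

In a geographic CRS (degrees) the same call would return areas in square degrees, so reproject with `to_crs` before measuring.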

### Plot our vectorized raster

In [None]:
gdf.plot(figsize=(6,6))

### Export as shapefile

Geopandas makes it easy to export the GeoDataFrame as a shapefile for use in other GIS applications.

In [None]:
gdf.to_file('water_bodies.shp')

## Rasterizing a shapefile

Using `rasterio` to turn a shapefile into a raster

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/GeoscienceAustralia/dea-notebooks).

**Last modified:** September 2019

**Compatible `datacube` version:** 

In [None]:
print(datacube.__version__)

## Tags
Browse all available tags on the DEA User Guide's [Tags Index](https://docs.dea.ga.gov.au/genindex.html)