This script takes a list of input shapefiles, one per land cover class, and
outputs a GeoTIFF training dataset on the same grid as the accompanying Landsat image.

To create these training shapefiles, first load your Landsat image into a GIS program such as ArcGIS or QGIS. I have stayed open source and am using QGIS. I do not currently have step-by-step instructions, but ultimately you just need to create a separate shapefile for each land cover class you want to classify the image into. Here I am using three simple classes: water, impervious, and vegetation. Creating polygon shapefiles is quite simple; just make sure every polygon you draw contains pixels in the original Landsat image that are _entirely of that specific class_, not mixed pixels that blend several classes.

In [1]:
import geopandas as gpd
from osgeo import gdal, ogr, osr  # current GDAL builds expose these under the osgeo package
import numpy as np
import rasterio
from rasterio.mask import mask
import matplotlib.pyplot as plt
from shapely.geometry import MultiPolygon, Point, Polygon
%matplotlib inline

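Set the path to each shapefile, read each one into a GeoDataFrame, and collect the GeoDataFrames in a list. The list order fixes the class IDs assigned later (water = 1, impervious = 2, vegetation = 3). The path to the Landsat image is set here as well.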

In [2]:
in_water = r"../../data/shapefiles/Class Shapefile Attempts/water.shp"
in_impervious = r"../../data/shapefiles/Class Shapefile Attempts/impervious.shp"
in_veg = r"../../data/shapefiles/Class Shapefile Attempts/dense_veg.shp"

water_gdf = gpd.read_file(in_water)
imp_gdf = gpd.read_file(in_impervious)
dv_gdf = gpd.read_file(in_veg)

gdf_list = [water_gdf, imp_gdf, dv_gdf]

in_tif = r"../../data/Austin_Landsat_Clip.tif"

This loops through the shapefiles, masks the Landsat image with each class's polygons,
and burns the class ID into an array with the same dimensions as the image.

In [3]:
out_image_list = []
with rasterio.open(in_tif) as src:
    prj = src.crs.to_wkt()
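    for i, gdf in enumerate(gdf_list):
        # Reproject the polygons to the raster's CRS so they line up with the pixels
        gdf_reproj = gdf.to_crs(prj)
        # Bounding box of this class's polygons (kept for reference; not used below)
        x1, y1, x2, y2 = gdf_reproj.total_bounds
        poly = Polygon([(x1, y1), (x2, y1), (x2, y2), (x1, y2)])
        shapes = [MultiPolygon(gdf_reproj['geometry'].values)]
        # crop=False keeps the full raster extent; nodata=0 fills pixels outside the polygons with 0
        out_image, out_transform = rasterio.mask.mask(src, shapes, crop=False, nodata=0)
        # Relabel every non-zero pixel with the class ID (1 = water, 2 = impervious, 3 = vegetation)
        out_image[out_image != 0] = i+1
        # After relabeling, a single band serves as the class mask (index 1 is the second band)
        out_image_list.append(out_image[1,:,:])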
    out_meta = src.meta
    
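# Start from an all-zero (unlabeled) array with the same shape and dtype as the class masks
arr = np.zeros(out_image_list[0].shape, dtype=out_image_list[0].dtype)
for out in out_image_list:
    class_id = int(out.max())
    # Skip a class whose polygons covered no pixels; otherwise out == 0 would match everything
    if class_id == 0:
        continue
    # Later classes overwrite earlier ones where polygons overlap
    arr[out == class_id] = class_id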
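
The merging rule in the cell above can be traced by hand on a toy example. This is only an illustrative sketch: plain Python lists stand in for the NumPy arrays, and `merge_class_masks` and the three small grids are hypothetical names and data, not part of the script. It assumes, as in the loop above, that 0 means "unlabeled", class IDs follow list order starting at 1, and a later class overwrites an earlier one where polygons overlap.

```python
def merge_class_masks(masks):
    """Combine per-class masks into one grid of class IDs.

    masks[k] is a 2-D grid (list of rows) that is non-zero wherever
    class k + 1 was digitized. Later classes overwrite earlier ones.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    labels = [[0] * cols for _ in range(rows)]  # 0 = unlabeled
    for class_id, class_mask in enumerate(masks, start=1):
        for r in range(rows):
            for c in range(cols):
                if class_mask[r][c] != 0:
                    labels[r][c] = class_id
    return labels


# Hypothetical 2 x 3 masks; water and impervious overlap at row 0, column 1
water = [[1, 1, 0],
         [0, 0, 0]]
impervious = [[0, 1, 0],
              [0, 1, 0]]
vegetation = [[0, 0, 0],
              [0, 0, 1]]

labels = merge_class_masks([water, impervious, vegetation])
print(labels)  # [[1, 2, 0], [0, 2, 3]]
```

The overlapping pixel ends up as impervious (2) because that class comes later in the list, so the order of `gdf_list` matters wherever training polygons overlap.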

Reshape the output array to (bands, rows, columns), the layout Rasterio expects when writing a GeoTIFF.

In [4]:
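# Add a leading band axis, (rows, cols) -> (1, rows, cols), and cast to the dtype recorded in the metadata
out_write_img = arr.reshape(1, arr.shape[0], arr.shape[1]).astype(out_meta["dtype"])

# Single-band GeoTIFF on the same grid as the Landsat image
out_meta.update({"driver": "GTiff",
                 "count": 1,
                 "height": arr.shape[0],
                 "width": arr.shape[1],
                 "transform": out_transform})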
with rasterio.open(r"../../data/Austin_Training_Classes.tif", "w", **out_meta) as dest:
    dest.write(out_write_img)
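
Once the training raster is written, a quick sanity check is to count how many pixels landed in each class. The sketch below is a hypothetical helper, not part of the script: `class_pixel_counts`, `class_names`, and the small `labels` grid are made up for illustration, and plain lists are used so the result can be checked by hand. On the real raster, one could read the band back with Rasterio and get the same tally from `np.unique(band, return_counts=True)`.

```python
from collections import Counter


def class_pixel_counts(labels, class_names):
    """Return {class name: pixel count} for a 2-D grid of class IDs."""
    counts = Counter(value for row in labels for value in row)
    return {name: counts.get(class_id, 0)
            for class_id, name in class_names.items()}


# Class IDs as assigned in this notebook; 0 marks pixels outside every polygon
class_names = {0: "unlabeled", 1: "water", 2: "impervious", 3: "vegetation"}

# Hypothetical 2 x 3 grid of class IDs
labels = [[1, 2, 0],
          [0, 2, 3]]

counts = class_pixel_counts(labels, class_names)
print(counts)  # {'unlabeled': 2, 'water': 1, 'impervious': 2, 'vegetation': 1}
```

A class with a count of zero usually means its polygons did not overlap the image, for example because of a CRS mismatch.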