## Creating study area array
The purpose of this notebook is to take in GeoTIFFs of surface water data and collate them into one large array on which the analysis can be run.

To do this you need to understand the Earth Engine GeoTIFF naming convention. When you export the surface water dataset to GeoTIFF(s) from Earth Engine, the image is split into tiles. The filename of each tile has the form baseFilename-yMin-xMin, where xMin and yMin are the coordinates of each tile within the overall bounding box of the exported image. These are the names of the images in the folder "EarthEngineAltiplano" of the Zenodo dataset.

Once you understand the naming convention of the tiles, you can create an ordering that lets you merge them in a geographically contiguous way. One such ordering was created for the current collating script, and the images were renamed according to it in the folder "AltiPlanoProcessOrder".
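The ordering step can be sketched in a few lines. This is a minimal illustration, not the script that produced the "AltiPlanoProcessOrder" folder: the tile filenames below are hypothetical, and it assumes that yMin and xMin are zero-padded pixel offsets with yMin increasing from north to south, so that sorting by (yMin, xMin) gives a row-major order starting at the top-left tile.

```python
import re

# Matches names of the form baseFilename-yMin-xMin.tif
TILE_NAME = re.compile(r"^(?P<base>.+)-(?P<ymin>\d+)-(?P<xmin>\d+)\.tif$")

def tile_offsets(filename):
    """Return (yMin, xMin) parsed from a tile filename."""
    match = TILE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a tile filename: {filename!r}")
    return int(match.group("ymin")), int(match.group("xmin"))

def process_order(filenames):
    """Map each tile filename to its 1-based position in row-major order."""
    ordered = sorted(filenames, key=tile_offsets)
    return {name: i for i, name in enumerate(ordered, start=1)}

# Hypothetical tile names for a 2 x 2 export (not taken from the dataset)
tiles = [
    "AltiPlanoYearly-0000011264-0000000000.tif",
    "AltiPlanoYearly-0000000000-0000011264.tif",
    "AltiPlanoYearly-0000011264-0000011264.tif",
    "AltiPlanoYearly-0000000000-0000000000.tif",
]
order = process_order(tiles)
for name, position in sorted(order.items(), key=lambda item: item[1]):
    print(position, name)
```

The resulting positions could then be used to rename or copy the tiles, for example to names such as AltiPlanoYearly1.tif, AltiPlanoYearly2.tif and so on.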

In [1]:
from dask import array
import rasterio
import numpy as np
import h5py
#need to import the warp module separately
import rasterio.warp
import rasterio.merge
import matplotlib.pyplot as plt
import avul

### Importing images in AltiPlanoProcessOrder folder
The cell below opens the GeoTIFFs in the AltiPlanoProcessOrder folder in the order implied by their names and appends each one to a list of datasets. While doing so it also tracks the bounds of the whole collated area.

In [37]:
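#import the surface water dataset
datasets = []
#starting values for the running bounds; they assume the study area lies
#between -100 and 0 degrees in both longitude and latitude, as it does here
left = 0
bottom = 0
right = -100
top = -100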
for x in range(1,13):
    
    dataset = rasterio.open('../Data/AltiPlano/AltiPlanoProccessOrder/AltiPlanoYearly'+str(x)+'.tif')
    print(dataset.bounds)
    left1 = dataset.bounds[0]
    bottom1 = dataset.bounds[1]
    right1 = dataset.bounds[2]
    top1 = dataset.bounds[3]
    #Get the bounds of the whole area being collated
    if left1 < left:
        left = left1
    if bottom1 < bottom:
        bottom = bottom1
    if right1 > right:
        right = right1
    if top1 > top:
        top = top1
        
    #append to dataset list
    datasets.append(dataset)

BoundingBox(left=-71.61791653184363, bottom=-16.196085583504487, right=-68.58232952374695, top=-13.1604985754078)
BoundingBox(left=-68.58232952374695, bottom=-16.196085583504487, right=-65.54674251565027, top=-13.1604985754078)
BoundingBox(left=-65.54674251565027, bottom=-16.196085583504487, right=-64.45475045627458, top=-13.1604985754078)
BoundingBox(left=-71.61791653184363, bottom=-19.231672591601175, right=-68.58232952374695, top=-16.196085583504487)
BoundingBox(left=-68.58232952374695, bottom=-19.231672591601175, right=-65.54674251565027, top=-16.196085583504487)
BoundingBox(left=-65.54674251565027, bottom=-19.231672591601175, right=-64.45475045627458, top=-16.196085583504487)
BoundingBox(left=-71.61791653184363, bottom=-22.267259599697862, right=-68.58232952374695, top=-19.231672591601175)
BoundingBox(left=-68.58232952374695, bottom=-22.267259599697862, right=-65.54674251565027, top=-19.231672591601175)
BoundingBox(left=-65.54674251565027, bottom=-22.267259599697862, right=-64.454

In [35]:
#double check that the bounds are the bounds of the area you exported from Earth Engine. For the current
#study area these are approx: -71.62, -23.23, -64.45,-13.16 
print(left,bottom,right,top)

-71.61791653184363 -23.22719931230798 -64.45475045627458 -13.1604985754078
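
The running comparison in the loop above depends on starting values that suit this particular study area. A more general way to get the same overall bounds is to take the minimum and maximum over all tiles. The sketch below uses a small stand-in `Bounds` tuple (a name introduced here for illustration) so that it runs without the data files, and it uses only three of the tile bounds printed above, so its result is not the full study area.

```python
from collections import namedtuple

# Stand-in for rasterio's BoundingBox so the sketch runs without any files
Bounds = namedtuple("Bounds", ["left", "bottom", "right", "top"])

def union_bounds(bounds_list):
    """Return the smallest box that contains every box in bounds_list."""
    return Bounds(
        left=min(b.left for b in bounds_list),
        bottom=min(b.bottom for b in bounds_list),
        right=max(b.right for b in bounds_list),
        top=max(b.top for b in bounds_list),
    )

# Three of the tile bounds printed above
tile_bounds = [
    Bounds(-71.61791653184363, -16.196085583504487, -68.58232952374695, -13.1604985754078),
    Bounds(-65.54674251565027, -16.196085583504487, -64.45475045627458, -13.1604985754078),
    Bounds(-71.61791653184363, -22.267259599697862, -68.58232952374695, -19.231672591601175),
]
total = union_bounds(tile_bounds)
print(total)
```

With the list built in the notebook, the equivalent call would be `union_bounds([d.bounds for d in datasets])`, since each `dataset.bounds` exposes the same four fields.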


In [None]:
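#use the bounds obtained during collating the data to create a transform that maps pixels to coordinates
#26580 and 37354 are the width and height of the collated array in pixels
Tran = rasterio.transform.from_bounds(left, bottom, right, top, 26580, 37354)
#coordinates of the center of the pixel at row 1, column 1
rasterio.transform.xy(Tran, 1, 1, offset='center')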

### Merge datasets together and write them to disk
Now that the GeoTIFF datasets are listed in the right order, we can read them into memory one band at a time, stitch the tiles of that band together, and write the result to disk. This is what the cell below does.
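
The stitching step for a single band can also be written compactly with `np.block`. The sketch below is an illustration on toy arrays, not the code used in this notebook: `mosaic` is a name introduced here, the tile sizes are made up, and it assumes the tiles are listed in row-major order with a fixed number of tiles per row.

```python
import numpy as np

def mosaic(tiles, tiles_per_row):
    """Stitch 2-D tiles listed in row-major order into one 2-D array."""
    rows = [tiles[i:i + tiles_per_row]
            for i in range(0, len(tiles), tiles_per_row)]
    # np.block joins the inner lists along columns and the outer list along rows
    return np.block(rows)

# Toy layout: 4 rows of 3 tiles, with a narrower last column and a shorter
# last row, similar to the edge tiles of an export
heights = [4, 4, 4, 2]
widths = [5, 5, 3]
shapes = [(h, w) for h in heights for w in widths]
tiles = [np.full(shape, k, dtype="u2") for k, shape in enumerate(shapes)]

one_band = mosaic(tiles, tiles_per_row=3)
print(one_band.shape)
```

Each toy tile is filled with its own index, which makes it easy to check by eye that every tile landed in the expected place.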

In [None]:
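#loop over the 35 bands of each tile (rasterio band indexes start at 1)
for x in range(1,36):
    #loop over the tiles in process order: three tiles per row, four rows
    for y in range(len(datasets)):
        print(y)
        dataset = datasets[y]
        #read band x of this tile into memory
        arr = dataset.read(x)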
        if y < 3:
            if y == 0:
                fRow = arr
            else:
                fRow = np.concatenate([fRow,arr],1)
        if y >= 3 and y < 6:
            if y == 3:
                sRow = arr
            else:
                sRow = np.concatenate([sRow,arr],1)
        if y >=6 and y < 9: 
            if y == 6:
                tRow = arr
            else:
                tRow = np.concatenate([tRow,arr],1)
        if y >= 9:
            if y == 9:
                foRow = arr
            else:
                foRow = np.concatenate([foRow,arr],1)
                
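    #add a leading axis so each row of tiles has shape (1, rows, columns)
    tallFRow = fRow.reshape((1,)+fRow.shape)
    tallSRow = sRow.reshape((1,)+sRow.shape)
    tallTRow = tRow.reshape((1,)+tRow.shape)
    tallFoRow = foRow.reshape((1,)+foRow.shape)

    #np.hstack joins along axis 1, stacking the four rows of tiles from top to bottom
    OneYrOcc = np.hstack([tallFRow,tallSRow,tallTRow,tallFoRow])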
        
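    if x == 1:
        #first band: create the file with a resizable dataset so later bands can be appended
        #note: the cells below reopen this file as '../Data/Altiplano.h5', so move it there or adjust the path
        hf = h5py.File('Altiplano.h5', 'w')
        hf.create_dataset('Altiplano',data=OneYrOcc, dtype='u2', compression="gzip", chunks=True, maxshape=(None,None,None))
    if x > 1:
        #grow the first axis by one and write the new band into the last slot
        hf["Altiplano"].resize((hf["Altiplano"].shape[0] + 1), axis = 0)
        hf["Altiplano"][-1:] = OneYrOcc
#close the file to make sure everything is written to disk
hf.close()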

### Selecting subset to be analyzed
The cells below take a subset of the larger area created in the cells above. This subset is what will be analyzed in the notebooks "RunTiles.ipynb" and "VizResults.ipynb"

In [2]:
#use line below once to reopen the dataset after writing
#need to use the 'r' option instead of 'a' for dask distributed to work
hf = h5py.File('../Data/Altiplano.h5','r')
d = hf['./Altiplano']          # pointer to the on-disk array

#convert the on-disk array to a dask array
Im = array.from_array(d,chunks='auto')

In [3]:
Im = Im.astype('uint8') #convert to unsigned int 8 to save space

#trim and rechunk matrix

#Study area 
#Im = Im[0:35,0:16384,10016:26400]

#small test area for computers with less than 32 GB of memory
Im = Im[0:35,0:5464,10016:15480]

#load trimmed study region into memory
Im = Im.compute()

In [4]:
#study area image shape should be (35,16384,16384)
#small test area shape should be (35,5464,5464)
Im.shape

(35, 5464, 5464)
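
To judge which window fits on a given machine, it helps to estimate the size of the array before loading it. The sketch below is a rough estimate of the raw array data only, assuming one byte per value as after the conversion to uint8. Intermediate copies made while converting and slicing need additional memory, so treat the numbers as a lower bound. `window_nbytes` is a name introduced here.

```python
import math

def window_nbytes(shape, bytes_per_value=1):
    """Bytes needed to hold an array of the given shape in memory."""
    return math.prod(shape) * bytes_per_value

full_area = window_nbytes((35, 16384, 16384))   # uint8: 1 byte per value
test_area = window_nbytes((35, 5464, 5464))

print(f"full study area: {full_area / 2**30:.2f} GiB")
print(f"small test area: {test_area / 2**30:.2f} GiB")
```

Under these assumptions the full study area needs 8.75 GiB for the array alone and the small test area just under 1 GiB.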

In [5]:
#now import the mask dataset. Only doing this here so it can be 
#cropped if necessary for testing on a lower memory machine. 
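mask = plt.imread('../Data/WholeTestAreaTempCenterlinesThreshold1.png')
#note: plt.imread returns PNG data as floats scaled to [0, 1], so the line below
#only changes anything if the mask values arrive on a 0-255 scale
mask[mask == 255] = 1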
#trim mask if running small test window
cutmask = mask[0:5464,0:5464]

In [6]:
#get lat-long locations of corners of cropped image of the final test region. This is helpful
#in a later part of the analysis when you need to get another transform
#to determine the geographical center of active region. See "VizResults.ipynb"
left = -71.61791653184363 
bottom = -23.22719931230798 
right = -64.45475045627458 
top = -13.1604985754078
Tran = rasterio.transform.from_bounds(left, bottom, right, top, 26580, 37354)
#get new corners for full study area
#rasterio.transform.xy(Tran, [0,0,16384,16384], [10016,26400,10016,26400], offset='center')
#get new corners for small test area
rasterio.transform.xy(Tran, [0,0,5464,5464], [10016,15480,10016,15480], offset='center')

([-68.91852401882868,
  -67.44600560509997,
  -68.91852401882868,
  -67.44600560509997],
 [-13.160633322700418,
  -13.160633322700418,
  -14.633151736429138,
  -14.633151736429138])
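
The corner coordinates above can be reproduced by hand, which is a useful check on the transform. The sketch below assumes a north-up grid with no rotation, which is what a transform built from bounds describes: each pixel spans a fixed number of degrees, columns count eastward from the left edge, and rows count downward from the top edge. `pixel_center_to_lonlat` is a name introduced here, not a rasterio function.

```python
def pixel_center_to_lonlat(row, col, left, bottom, right, top, width, height):
    """Longitude/latitude of the center of pixel (row, col) in a north-up grid."""
    x_res = (right - left) / width    # degrees of longitude per pixel
    y_res = (top - bottom) / height   # degrees of latitude per pixel
    lon = left + (col + 0.5) * x_res
    lat = top - (row + 0.5) * y_res   # rows count downward from the top edge
    return lon, lat

# Bounds and array size of the collated study area, as used in the cell above
grid = dict(left=-71.61791653184363, bottom=-23.22719931230798,
            right=-64.45475045627458, top=-13.1604985754078,
            width=26580, height=37354)

upper_left = pixel_center_to_lonlat(0, 10016, **grid)
lower_right = pixel_center_to_lonlat(5464, 15480, **grid)
print(upper_left, lower_right)
```

The two points should agree with the first and last corners returned by `rasterio.transform.xy` above, up to floating-point rounding.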

In [12]:
#save image and mask that will be analyzed in the "RunTiles" notebook
#mode 'w' overwrites an existing TestRegion.h5. If you get an error here, the file is probably still open
#elsewhere: close it, or delete or rename the old TestRegion.h5, and run the cell again
shf = h5py.File('../Data/TestRegion.h5','w')
shf.create_dataset('Altiplano',data=Im, dtype='u2', compression="gzip", chunks=True, maxshape=(None,None,None))
shf.close()
plt.imsave('../Data/TestRegionMask.png',cutmask,cmap='gray')