## Make mask files from the GeoJSON building footprints in the SN2 datasets -- Sept 14, 2021

This notebook:
- installs Miniconda
- installs geospatial Python libraries
- mounts my Google Drive, specifically the SN2-Khartoum directory containing 500 sample chips and their building footprints (GeoJSON)
- creates byte building masks from the GeoJSON file corresponding to each input image
- writes them to a subdirectory of SN2-Khartoum (bldgs-mask).


#### Miniconda installation.

This setup process follows instructions given in this very good and clear article: [Conda + Google Colab](https://towardsdatascience.com/conda-google-colab-75f7c867a522).

In [59]:
%%bash

MINICONDA_INSTALLER_SCRIPT=Miniconda3-py37_4.10.3-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX

PREFIX=/usr/local
Unpacking payload ...
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - _openmp_mutex==4.5=1_gnu
    - brotlipy==0.7.0=py37h27cfd23_1003
    - ca-certificates==2021.7.5=h06a4308_1
    - certifi==2021.5.30=py37h06a4308_0
    - cffi==1.14.6=py37h400218f_0
    - chardet==4.0.0=py37h06a4308_1003
    - conda-package-handling==1.7.3=py37h27cfd23_1
    - conda==4.10.3=py37h06a4308_0
    - cryptography==3.4.7=py37hd23ed53_0
    - idna==2.10=pyhd3eb1b0_0
    - ld_impl_linux-64==2.35.1=h7274673_9
    - libffi==3.3=he6710b0_2
    - libgcc-ng==9.3.0=h5101ec6_17
    - libgomp==9.3.0=h5101ec6_17
    - libstdcxx-ng==9.3.0=hd4cf53a_17
    - ncurses==6.2=he6710b0_1
    - openssl==1.1.1k=h27cfd23_0
    - pip==21.1.3=py37h06a4308_0
    - pycosat==0.6.3=py37h27cfd23_0
    - pycparser==2.20=py_2
    - pyopenssl=

--2021-09-14 20:04:16--  https://repo.continuum.io/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c84f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh [following]
--2021-09-14 20:04:16--  https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8303, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89026327 (85M) [application/x-sh]
Saving to: ‘Miniconda3-py37_4.10.3-Linux-x86_64.sh.2’


In [60]:
!which conda # should return /usr/local/bin/conda

/usr/local/bin/conda


In [61]:
!conda --version # should return 4.10.3

conda 4.10.3


In [62]:
!which python # still returns /usr/local/bin/python

/usr/local/bin/python


In [63]:
!python --version

Python 3.7.10


In [64]:
%%bash
# Now that you have installed Conda, update Conda and all its dependencies to their most recent versions without updating Python to 3.8+.
# This code updates everything while holding Python constant at 3.7.
# (%%bash has to be the first line of the cell, so these comments come after it.)

conda install --channel defaults conda python=3.7 --yes
conda update --channel defaults --all --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - conda
    - python=3.7


The following packages will be UPDATED:

  openssl                                 1.1.1k-h27cfd23_0 --> 1.1.1l-h7f8727e_0
  python                                  3.7.10-h12debd9_4 --> 3.7.11-h12debd9_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /usr/local


The following packages will be REMOVED:

  attrs-21.2.0-pyhd8ed1ab_0
  boost-cpp-1.70.0-ha2d47e9_1
  bzip2-1.0.8-h7f98852_4
  c-ares-1.17.1-h27cfd23_0
  cairo-1.16.0-hf32fb01_1
  cfitsio-3.470-hb418390_7
  chardet-4.0.0-py37h06a4308_1003
  click-7.1.2-pyh9f0ad1d_0
  click

You've updated conda, in theory. In practice conda was already at the latest version, so its version number didn't change (it's still 4.10.3). What did change is Python, from 3.7.10 to 3.7.11, along with openssl, from 1.1.1k to 1.1.1l.


In [65]:
!conda --version # still returns 4.10.3

conda 4.10.3


In [66]:
!python --version

Python 3.7.11


Now you need to modify your path settings so that packages installed with Conda can be imported. The initial sys.path looks like the one given in the writeup I'm following. (In the output below, /usr/local/lib/python3.7/site-packages already shows up twice at the end, most likely because the append cell further down had been run earlier in this session.)

Note that the preinstalled packages included with Google Colab are installed into the /usr/local/lib/python3.7/dist-packages directory. You can get an idea of what packages are available by simply listing the contents of this directory.

(The `ls` returns gobs of stuff, so I've commented it out.)

Any package that you install with Conda will be installed into the directory /usr/local/lib/python3.7/site-packages, so you need to add this directory to sys.path for those packages to be available for import.

Note that because the /usr/local/lib/python3.7/dist-packages directory containing the preinstalled Google Colab packages appears ahead of the /usr/local/lib/python3.7/site-packages directory where Conda installs packages, the Colab version of a package takes precedence over any version of the same package installed via Conda.
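To check which copy of a package actually gets imported, you can ask Python where it would load the module from. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `module_origin` is hypothetical. For a package that exists in both dist-packages and site-packages, the path it returns tells you which copy wins.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it can't be found.

    For ordinary installed packages, find_spec searches the sys.path entries in order
    and returns the first match, so a copy in an earlier directory shadows later ones.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # The search order find_spec follows for installed packages.
    for entry in sys.path:
        print(entry)
    # For a package installed in both places, this shows which copy is used
    # (prints None if the package isn't installed at all).
    print(module_origin("numpy"))
```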


In [67]:
import sys
sys.path

['',
 '/content',
 '/env/python',
 '/usr/lib/python37.zip',
 '/usr/lib/python3.7',
 '/usr/lib/python3.7/lib-dynload',
 '/usr/local/lib/python3.7/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/local/lib/python3.7/dist-packages/IPython/extensions',
 '/root/.ipython',
 '/usr/local/lib/python3.7/site-packages',
 '/usr/local/lib/python3.7/site-packages']

In [68]:
# !ls /usr/local/lib/python3.7/dist-packages

In [69]:
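import sys
conda_site_packages = "/usr/local/lib/python3.7/site-packages"
# Only append once, so re-running this cell doesn't pile up duplicate entries.
if conda_site_packages not in sys.path:
    sys.path.append(conda_site_packages)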

#### Installing the Python geospatial libraries you'll need.

First, mount your Google Drive in Colab. You'll have to respond to the authorization prompt every time you run this cell.


In [70]:
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [71]:
!ls /content/drive/MyDrive/SN2-Khartoum

bldgs-geojson  bldgs-mask  pansharp


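Install a minimal set of geospatial libraries into the base environment. What you need for burning raster masks are the `gdal` and `ogr` modules from the `osgeo` package (GDAL's Python bindings). geopandas pulls in GDAL as a dependency, and in this session `from osgeo import ogr, gdal` worked after this install; if it doesn't for you, add `gdal` to the install command to get the bindings explicitly.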

In [72]:
!conda install --channel conda-forge geopandas geojson --yes

Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - done
Solving environment: | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | /

In [73]:
from osgeo import ogr, gdal
import geojson

In [74]:
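# Support function for making a blank mask, given a raster to match (ds) and a path for the mask file.
def make_blank_mask_from_img(ds, mask_path):
    '''
    ds: GDAL raster dataset (the byte mask copies its size, geotransform and projection)
    mask_path: where to write the mask file.
    Returns the new single-band dataset; set it to None when done to flush it to disk.
    '''
    dr = ds.GetDriver()  # same format as the source raster
    # One band is enough: burn_bldgs_to_mask only burns into band 1,
    # so extra bands copied from an RGB source would just stay empty.
    ds_new = dr.Create(mask_path, ds.RasterXSize, ds.RasterYSize, 1, gdal.GDT_Byte)
    ds_new.SetGeoTransform(ds.GetGeoTransform())
    ds_new.SetProjection(ds.GetProjection())
    ds_new.GetRasterBand(1).Fill(0)  # start with "no building" everywhere
    return ds_new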
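Copying the geotransform and the raster size is what makes the mask line up pixel for pixel with the image. A GDAL geotransform is six numbers describing an affine mapping from (column, row) to georeferenced coordinates: X = GT[0] + col*GT[1] + row*GT[2] and Y = GT[3] + col*GT[4] + row*GT[5]. The sketch below applies that formula and its inverse in plain Python; the function names are hypothetical and the sample geotransform values are made up, not taken from a real SN2 chip.

```python
def pixel_to_geo(gt, col, row):
    """Map a pixel corner (col, row) to georeferenced (x, y) using a GDAL-style geotransform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def geo_to_pixel(gt, x, y):
    """Invert the affine transform (general case, including the rotation terms)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row


if __name__ == "__main__":
    # Hypothetical north-up geotransform: origin (100, 200), 0.5-unit pixels, y decreasing downward.
    gt = (100.0, 0.5, 0.0, 200.0, 0.0, -0.5)
    print(pixel_to_geo(gt, 10, 20))       # (105.0, 190.0)
    print(geo_to_pixel(gt, 105.0, 190.0))  # (10.0, 20.0)
```

Because the mask gets the same geotransform and the same RasterXSize/RasterYSize as the image, pixel (col, row) in both files maps to the same ground location, so a building polygon burns into exactly the pixels it covers in the image.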

This function burns the building footprints from a GeoJSON file into a byte mask that matches a source image.

In [75]:
# WHAT a hassle it was to get this to work. But this did finally work.
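def burn_bldgs_to_mask(src_raster_path, bldg_gjson_path, mask_path):
  '''Burn the building polygons in bldg_gjson_path into a byte mask matching src_raster_path.'''

  # Get the vector layer. ogr.Open accepts GeoJSON text directly as its data source.
  # vec_ds must stay referenced while lyr is in use, because the layer belongs to it.
  with open(bldg_gjson_path) as f:
    the_json = f.read()
  vec_ds = ogr.Open(the_json)
  assert vec_ds is not None, f"OGR could not open {bldg_gjson_path}"
  lyr = vec_ds.GetLayer()
  assert lyr is not None

  #todo: log number of bldgs in the layer
  # len(lyr)

  # Open raster source file, make a mask with equal pixel spacing and georeferencing,
  # and burn the vector layer into band 1 of the mask with the value 1.
  # Nullify the mask_ds variable at the end, to flush the image to disk.
  ras_ds = gdal.Open(src_raster_path)
  assert ras_ds is not None, f"GDAL could not open {src_raster_path}"
  mask_ds = make_blank_mask_from_img(ras_ds, mask_path)
  gdal.RasterizeLayer(mask_ds, [1], lyr, burn_values=[1])
  # mask_ds.GetRasterBand(1).SetNoDataValue(0.0)
  mask_ds = None
  ras_ds = None
  vec_ds = None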
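Conceptually, `gdal.RasterizeLayer(..., burn_values=[1])` writes 1 into every mask pixel the building polygons cover and leaves the rest at 0. By default GDAL only burns pixels whose centers fall inside a polygon; the `ALL_TOUCHED=TRUE` rasterization option burns every pixel a polygon touches instead. The toy sketch below mimics that default rule in pure Python with an even-odd point-in-polygon test, working directly in pixel coordinates. It illustrates the idea only, it is not GDAL's actual algorithm, and the function names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: count how many ring edges a ray from (x, y) toward +x crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def burn_polygons(width, height, rings, burn_value=1):
    """Return a height x width grid with burn_value wherever a pixel center is inside a ring.

    Each ring is treated as a separate, hole-free polygon given in pixel coordinates.
    """
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # pixel center
            if any(point_in_polygon(cx, cy, ring) for ring in rings):
                mask[row][col] = burn_value
    return mask


if __name__ == "__main__":
    # A square "building" covering pixel columns 1-2 and rows 1-2 of a 4x4 chip.
    building = [(1, 1), (3, 1), (3, 3), (1, 3)]
    for line in burn_polygons(4, 4, [building]):
        print(line)
    # [0, 0, 0, 0]
    # [0, 1, 1, 0]
    # [0, 1, 1, 0]
    # [0, 0, 0, 0]
```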

Now for processing:
- get a list of source raster files 
- loop over all source files to create mask chips. 
  - check for matching building files.
    - if there isn't one, should I assume there are no buildings in that chip? No: all images have associated GeoJSON files, even if they contain no buildings.
  - create matching mask and burn buildings to it
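Before burning, it's cheap to confirm that every image really has a GeoJSON file and to log how many buildings each one holds (the `#todo` in `burn_bldgs_to_mask`). This is a minimal sketch using only the standard library's `json` module. It assumes each file is a GeoJSON FeatureCollection and that names follow the `buildings_AOI_5_Khartoum_img{num}.geojson` pattern from the path examples; `audit_geojson` is a hypothetical helper.

```python
import json
import os


def audit_geojson(json_base, numbers):
    """Map each image number to its building count, or to None if its GeoJSON file is missing.

    An empty FeatureCollection yields 0, which is expected for chips with no buildings.
    """
    counts = {}
    for num in numbers:
        path = os.path.join(json_base, f"buildings_AOI_5_Khartoum_img{num}.geojson")
        if not os.path.exists(path):
            counts[num] = None
            continue
        with open(path) as f:
            collection = json.load(f)
        counts[num] = len(collection.get("features", []))
    return counts


if __name__ == "__main__":
    import tempfile

    # Throwaway directory with one populated collection and one empty collection.
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            1: {"type": "FeatureCollection",
                "features": [{"type": "Feature", "properties": {}, "geometry": None}]},
            2: {"type": "FeatureCollection", "features": []},
        }
        for num, collection in samples.items():
            with open(os.path.join(tmp, f"buildings_AOI_5_Khartoum_img{num}.geojson"), "w") as f:
                json.dump(collection, f)
        print(audit_geojson(tmp, [1, 2, 3]))  # {1: 1, 2: 0, 3: None}
```

Any `None` in the result points at an image whose GeoJSON is missing; zeros are chips with no buildings, which still get an all-zero mask.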


In [76]:
# define paths
chip_base = r'/content/drive/MyDrive/SN2-Khartoum/pansharp'       # example: RGB-PanSharpen_AOI_5_Khartoum_img1.tif
mask_base = r'/content/drive/MyDrive/SN2-Khartoum/bldgs-mask'     # example: RGB-PanSharpen_AOI_5_Khartoum_mask1.tif
json_base = r'/content/drive/MyDrive/SN2-Khartoum/bldgs-geojson'  # example: buildings_AOI_5_Khartoum_img1.geojson

import os
import re

# Pair each source raster with its mask and GeoJSON paths.
# Files that don't match the image pattern are skipped: for them re.search returns None,
# and calling .group on None is the most likely source of the AttributeError this cell raised.
ps_pattern = re.compile(r"img(?P<numbers>[0-9]+)\.tif$")

all_files_index = []  # list of (image number, raster path, mask path, geojson path)
for ps_file in os.listdir(chip_base):
  match = ps_pattern.search(ps_file)
  if match is None:
    continue
  num = match.group("numbers")
  all_files_index.append((int(num),
                          os.path.join(chip_base, ps_file),
                          os.path.join(mask_base, f"RGB-PanSharpen_AOI_5_Khartoum_mask{num}.tif"),
                          os.path.join(json_base, f"buildings_AOI_5_Khartoum_img{num}.geojson")))
all_files_index.sort()  # order by image number

# list 10 entries for a check (all_files_index is a list, so this doesn't use it up the way a zip iterator would)
# for afi in all_files_index[:10]:
#   print(f"{afi[0]}\n {afi[1]}\n {afi[2]}\n {afi[3]}\n\n")

In [None]:
# Each entry is (image number, raster path, mask path, geojson path).
for num, src_raster_path, mask_path, bldg_gjson_path in all_files_index:
  burn_bldgs_to_mask(src_raster_path, bldg_gjson_path, mask_path)
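This last cell hadn't been run yet when the notebook was saved, so a quick follow-up check is worth having: confirm that a non-empty mask file exists for every source image. A minimal stdlib-only sketch; `missing_masks` is a hypothetical helper that takes an iterable of expected mask paths.

```python
import os


def missing_masks(mask_paths):
    """Return the mask paths that don't exist or are zero bytes long."""
    missing = []
    for path in mask_paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(path)
    return missing
```

Passed the mask paths from the file index after the burning loop finishes, it should return an empty list; anything it returns is a chip to re-run.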