# Preprocessing land-use/land-cover (LULC) data: enriching and refining them with vector data

## Environment and dependencies

This preprocessing workflow requires several packages to run most of its processing commands. An Anaconda environment was used to keep library versions consistent and installation straightforward. GeoPandas and pandas should be installed together from conda-forge through the Anaconda Prompt, so that compatible versions are resolved:

$ conda install -c conda-forge geopandas pandas

Other libraries may be installed through simple commands in your Anaconda Prompt:

$ conda install fiona
$ conda install gdal
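
After installation, it can be useful to confirm that the libraries are visible to the active environment. The following is a minimal sketch using only the standard library; the function name check_dependencies and the mapping of import names to distribution names are assumptions made for illustration and are not part of the workflow itself.

```python
from importlib import metadata, util

# Import names used in this workflow, mapped to their (assumed) distribution names.
REQUIRED = {
    "numpy": "numpy",
    "fiona": "fiona",
    "geopandas": "geopandas",
    "osgeo": "GDAL",
}


def check_dependencies(modules=REQUIRED):
    """Return {import name: version string, 'unknown', or None if not importable}."""
    report = {}
    for module_name, dist_name in modules.items():
        # find_spec returns None when a top-level module cannot be located
        if util.find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # importable, but no distribution metadata under the assumed name
            report[module_name] = "unknown"
    return report


for name, version in check_dependencies().items():
    print(f"{name}: {version or 'MISSING'}")
```

A module reported as MISSING would need to be installed before the import cell is run.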

QGIS is not currently part of the preprocessing workflow, but it might be useful in the future:

$ conda install qgis --channel conda-forge

Next, import the required dependencies (the QGIS imports are commented out because they are not required at this stage):

In [None]:
import sys

import os
# let the GDAL Python bindings locate their DLLs via PATH (relevant on Windows)
os.environ['USE_PATH_FOR_GDAL_PYTHON'] = 'YES'

import numpy as np
import numpy.ma as ma
import warnings
import fiona
import geopandas as gpd

# import processing if needed
# from qgis.core import QgsVectorLayer
# from qgis.core import QgsProject
# from qgis.core import QgsProcessingUtils
# from qgis.core import QgsGeometryChecker

Because the GDAL installation can be problematic, it is imported in a separate, guarded statement that stops with a clear message if the bindings cannot be found:

In [None]:
# import GDAL/OGR and stop with a clear message if the bindings are missing
try:
    from osgeo import ogr, osr, gdal
except ImportError:
    import sys
    sys.exit('ERROR: cannot find GDAL/OGR modules')

It is recommended to define a GDAL error handler function and to enable GDAL/OGR exceptions:

In [None]:
# specify GDAL error handler function
def gdal_error_handler(err_class, err_num, err_msg):
    errtype = {
        gdal.CE_None: 'None',
        gdal.CE_Debug: 'Debug',
        gdal.CE_Warning: 'Warning',
        gdal.CE_Failure: 'Failure',
        gdal.CE_Fatal: 'Fatal'
    }
    err_msg = err_msg.replace('\n', ' ')
    err_class = errtype.get(err_class, 'None')
    print('Error Number: %s' % (err_num))
    print('Error Type: %s' % (err_class))
    print('Error Message: %s' % (err_msg))

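# register the handler; without this call the function above is never used
gdal.PushErrorHandler(gdal_error_handler)

# enable GDAL/OGR exceptions
gdal.UseExceptions()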
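
As a variation on the handler above, GDAL messages could be routed to Python's logging module instead of print, which makes it easier to filter them by severity or write them to a file. This is only a sketch: the names gdal_logging_handler and GDAL_TO_LOGGING are hypothetical, and the integer keys are assumed to mirror the constants gdal.CE_None (0) to gdal.CE_Fatal (4).

```python
import logging

logger = logging.getLogger("gdal")

# GDAL error classes (gdal.CE_None ... gdal.CE_Fatal) mapped to logging levels.
GDAL_TO_LOGGING = {
    0: logging.INFO,      # CE_None
    1: logging.DEBUG,     # CE_Debug
    2: logging.WARNING,   # CE_Warning
    3: logging.ERROR,     # CE_Failure
    4: logging.CRITICAL,  # CE_Fatal
}


def gdal_logging_handler(err_class, err_num, err_msg):
    """Forward a GDAL message to the 'gdal' logger at a matching level."""
    level = GDAL_TO_LOGGING.get(err_class, logging.ERROR)
    logger.log(level, "GDAL error %s: %s", err_num, err_msg.replace("\n", " "))


# To activate it, the handler would be registered with GDAL, for example:
# gdal.PushErrorHandler(gdal_logging_handler)
```

The function has the same three-argument signature as gdal_error_handler, so the two are interchangeable when registering.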

## Input data and paths

First, define the names of the input files and the paths to them. The current working directory is obtained automatically with os.getcwd() to avoid hard-coded paths.


In [None]:
# specify parent and child directory of code/data
parent_dir = os.getcwd()
print(parent_dir)
child_dir = 'data/input'

# SPECIFY INPUT RASTER AND VECTOR DATA
# specifying the file names
lulc = 'lulc_2022.gtif'
vector_refine = 'vector_refine.gpkg'
# specifying the path to these files through the path variables
lulc = os.path.join(parent_dir,child_dir,lulc)
vector_refine = os.path.join(parent_dir,child_dir,vector_refine)
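
Since a wrong working directory only shows up later as a failed file open, it may help to verify the inputs as soon as the paths are built. The sketch below uses pathlib for this; the helper resolve_inputs and its sub_dir parameter are hypothetical and only mirror the parent_dir/child_dir layout used above.

```python
from pathlib import Path


def resolve_inputs(base_dir, file_names, sub_dir="data/input"):
    """Return {file name: Path} and raise if any of the files is missing."""
    input_dir = Path(base_dir) / sub_dir
    paths = {name: input_dir / name for name in file_names}
    missing = [str(path) for path in paths.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    return paths


# Example call with the file names from this workflow:
# inputs = resolve_inputs(Path.cwd(), ["lulc_2022.gtif", "vector_refine.gpkg"])
```

Raising early keeps the error message close to its cause, before any GDAL call is made.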

## Validity check

Before refining the raster LULC data, check the validity of the vector geometries used for the refinement, because invalid geometries may affect further computations.

In [None]:
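# open geopackage file and stop if it cannot be opened
data_source = ogr.Open(vector_refine)
if data_source is None:
    sys.exit(f'ERROR: cannot open {vector_refine}')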

# get the number of layers in geopackage
num_layers = data_source.GetLayerCount()

# iterate through each layer
for i in range(num_layers):
    layer = data_source.GetLayerByIndex(i)

    # iterate through each feature in the layer
    feature = layer.GetNextFeature()
    while feature:
        geometry = feature.GetGeometryRef()

        # check the validity of each geometry (a feature may have no geometry at all)
        if geometry is None:
            print(f"Feature {feature.GetFID()} has no geometry.")
        elif not geometry.IsValid():
            print(f"Invalid geometry in feature {feature.GetFID()}: {geometry.ExportToWkt()}. Further computations may be affected by geometry invalidity.")

        feature = layer.GetNextFeature()

# close the geopackage file
data_source = None
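
The loop above only prints its findings. If the results are needed later, for example to skip or repair the affected features, the same check could collect feature IDs per layer instead. This is a minimal sketch under that assumption; the function name find_invalid_features is hypothetical, and it relies only on the OGR calls already used above plus layer.GetName() and iteration over a layer.

```python
def find_invalid_features(data_source):
    """Return {layer name: [FIDs of features with missing or invalid geometry]}."""
    invalid = {}
    for i in range(data_source.GetLayerCount()):
        layer = data_source.GetLayerByIndex(i)
        bad_fids = []
        # OGR layers can be iterated directly in the Python bindings
        for feature in layer:
            geometry = feature.GetGeometryRef()
            if geometry is None or not geometry.IsValid():
                bad_fids.append(feature.GetFID())
        if bad_fids:
            invalid[layer.GetName()] = bad_fids
    return invalid
```

It would be called on an open data source, for example find_invalid_features(ogr.Open(vector_refine)), and an empty dictionary would indicate that no problems were found.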