# Exploratory Data Analysis - Ethiopia LCLUC

Initial imagery analysis.

In [32]:
import os
import glob
import numpy as np
import xarray as xr
import rasterio as rio

In [33]:
data_path = '/Users/jacaraba/Desktop/development/ilab/ethiopia-lcluc/adapt-data'
!ls /Users/jacaraba/Desktop/development/ilab/ethiopia-lcluc/adapt-data

Gonji_Kolela_All_2_SymDiff_SortLat_01_prj_selectS_ras.tif
Gonji_Kolela_All_2_SymDiff_SortLat_01_prj_selectS_ras.tif.aux.xml
Gonji_Kolela_All_2_SymDiff_SortLat_01_prj_v3r1m.tif
Gonji_Kolela_All_2_SymDiff_SortLat_01_prj_v3r1m.tif.aux.xml
Gonji_Kolela_All_2_SymDiff_SortLat_01_prj_v3r20cm.tif
WV01_20090215_P1BS_1020010006856900-toa-clipped.tif
WV01_20090215_P1BS_1020010006856900-toa.tif
WV01_20100104_P1BS_102001000BE8BB00-toa-clipped.tif
WV01_20100104_P1BS_102001000BE8BB00-toa.tif
WV01_20141220_P1BS_1020010037B2E200-toa-clipped.tif
WV01_20141220_P1BS_1020010037B2E200-toa.tif
WV02_20100215_M1BS_10300100043A6100-toa-clipped.tif
WV02_20100215_M1BS_10300100043A6100-toa.tif
WV02_20100215_P1BS_10300100043A6100-toa-clipped.tif
WV02_20100215_P1BS_10300100043A6100-toa.tif
WV02_20101206_M1BS_10300100081AB200-toa-clipped.tif
WV02_20101206_M1BS_10300100081AB200-toa.tif
WV02_20101217_M1BS_1030010008D79900-toa-clipped.tif
WV02_20101217_M1BS_1030010008D79900-toa.tif
WV02_20101217_P1BS_

## Labels

Inspect the dimensions and unique values of the label raster.

In [34]:
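label_filename = os.path.join(data_path, 'Gonji_Kolela_All_2_SymDiff_SortLat_01_prj_v3r1m.tif')
# Note: xr.open_rasterio has been removed from recent xarray releases;
# rioxarray.open_rasterio is its replacement there.
label = xr.open_rasterio(label_filename).values
print(f'Label shape: {label.shape}, Unique Values: {np.unique(label)}')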

Label shape: (1, 6564, 10817), Unique Values: [ 1  2  3  4  5  6 15]


Based on this, class 15 will be excluded when generating the training dataset.
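One way the exclusion could look in practice is sketched below. It runs on a small synthetic array rather than the real label raster, and the names `EXCLUDED_CLASS`, `valid`, and `masked_label` are illustrative assumptions, not part of the project code.

```python
import numpy as np

# Hypothetical stand-in for the label array, shaped (band, rows, cols)
label = np.array([[[1, 2, 15],
                   [3, 15, 6]]])

EXCLUDED_CLASS = 15  # class to drop, per the note above

# Boolean mask of the pixels that remain usable for training
valid = label != EXCLUDED_CLASS

# Masked copy of the labels; excluded pixels are ignored by reductions
masked_label = np.ma.masked_equal(label, EXCLUDED_CLASS)

# Per-class pixel counts over the remaining pixels
classes, counts = np.unique(label[valid], return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```

Keeping a boolean mask alongside the labels means the same mask can later be applied to the imagery when sampling training pixels.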

## Imagery

We need to find the corresponding imagery to match the labels. From Woubet:

The land uses/cover are for the year 2016. Generally, there is not much change from year to year over a few years.
But with a longer time gap from 2016 (say 2020), croplands may encroach on shrublands, barelands, and forestlands.
In this regard, we may encounter an issue in the image classification, since our input WorldView images are mainly from 2010. Not sure if we should also consider other similar images.

The project people exclude road and river networks from the people’s landholdings data when they digitize the aerial photos. So, these networks are assigned noData in the raster. I hope that will not create an issue in the classification.

- Define the exact dates we can use for "similar area labels", i.e. how many years from 2016 the labels remain valid
- Define the names of classes 1-6
- Define how to use the P1BS (panchromatic) imagery, possibly with indices if matching bands are available
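On the question of indices: for the 8-band M1BS scenes, an index such as NDVI could be computed as in the sketch below. It assumes the standard WorldView-2 multispectral band order (coastal, blue, green, yellow, red, red edge, NIR1, NIR2) is preserved in the clipped TOA rasters, which has not been verified here. The `ndvi` function and the synthetic input are hypothetical.

```python
import numpy as np

# Assumed zero-based band positions for 8-band WorldView-2 M1BS:
# coastal, blue, green, yellow, red, red edge, NIR1, NIR2
RED_BAND = 4
NIR1_BAND = 6

def ndvi(image):
    """Return NDVI for an array shaped (bands, rows, cols)."""
    red = image[RED_BAND].astype('float32')
    nir = image[NIR1_BAND].astype('float32')
    denom = nir + red
    # Leave pixels with a zero denominator as NaN instead of dividing by zero
    out = np.full(denom.shape, np.nan, dtype='float32')
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Synthetic 8-band, 2x2 image, used only to exercise the function
image = np.zeros((8, 2, 2), dtype='uint16')
image[RED_BAND] = [[100, 200], [0, 50]]
image[NIR1_BAND] = [[300, 200], [0, 150]]
index = ndvi(image)
```

The single-band P1BS scenes cannot support a band-ratio index on their own, which is why this sketch is limited to M1BS.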

Clip the imagery to the labels extent:

```bash
rio clip raster.tif output-raster.tif --like raster_label.tif
```

In [35]:
raster_files = glob.glob(os.path.join(data_path, '*toa.tif'))

In [36]:
for f in raster_files:
    clipped = f'{f[:-4]}-clipped.tif'
    # Skip rasters that have already been clipped to the label extent
    if not os.path.isfile(clipped):
        cmd = f'rio clip {f} {clipped} --like {label_filename}'
        os.system(cmd)
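The loop above builds a shell string, which breaks if a path contains spaces. A possible alternative, sketched here under the assumption that the same `rio clip ... --like` invocation is wanted, is to build an argument list for `subprocess.run`. The helper name `build_clip_command` and the example paths are hypothetical.

```python
import subprocess
from pathlib import Path

def build_clip_command(raster, like):
    """Return (command, output path) for clipping `raster` to the extent of `like`."""
    raster = Path(raster)
    # scene-toa.tif -> scene-toa-clipped.tif, in the same directory
    output = raster.with_name(f'{raster.stem}-clipped.tif')
    command = ['rio', 'clip', str(raster), str(output), '--like', str(like)]
    return command, output

# Example paths are placeholders, not files from this project
command, output = build_clip_command('/data/example dir/scene-toa.tif',
                                     '/data/labels.tif')

# To actually run the clip (requires the rasterio CLI to be installed):
# if not output.exists():
#     subprocess.run(command, check=True)
```

Passing a list avoids shell quoting issues, and `check=True` raises an error if the clip fails instead of silently continuing.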

## Generate Configuration File

Generate configuration file for Random Forest and CNN training.

In [41]:
train_files = [
    'WV02_20101206_M1BS_10300100081AB200-toa-clipped.tif',
    'WV02_20110202_M1BS_1030010009B10200-toa-clipped.tif'
]

In [43]:
for f2 in train_files:
    x = xr.open_rasterio(os.path.join(data_path, f2))
    print(os.path.join(data_path, f2), x.shape)

/Users/jacaraba/Desktop/development/ilab/ethiopia-lcluc/adapt-data/WV02_20101206_M1BS_10300100081AB200-toa-clipped.tif (8, 3282, 2757)
/Users/jacaraba/Desktop/development/ilab/ethiopia-lcluc/adapt-data/WV02_20110202_M1BS_1030010009B10200-toa-clipped.tif (8, 1684, 5061)
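The notebook does not show the configuration format, so the following is only a guess at how the values gathered so far might be collected into a file. Every key name is hypothetical, JSON is an assumed format, and `data_path` is a placeholder; the Random Forest and CNN tooling may expect something different.

```python
import json

data_path = '/path/to/adapt-data'  # placeholder, not the notebook's real path

# All key names below are hypothetical; the training tools may expect others
config = {
    'data_path': data_path,
    'label_file': 'Gonji_Kolela_All_2_SymDiff_SortLat_01_prj_v3r1m.tif',
    'train_files': [
        'WV02_20101206_M1BS_10300100081AB200-toa-clipped.tif',
        'WV02_20110202_M1BS_1030010009B10200-toa-clipped.tif',
    ],
    'n_bands': 8,             # band count reported by the shapes above
    'exclude_classes': [15],  # class dropped in the Labels section
}

config_text = json.dumps(config, indent=2)

# To persist it:
# with open('config.json', 'w') as fh:
#     fh.write(config_text)
```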
