# Processing images into chips

Planet imagery was originally processed into larger tiles of 2368 x 2358 pixels at a resolution of 0.000025$^\circ$. Labelling was undertaken on only a subset of each tile, corresponding to a 0.005$^\circ$ target box (~550 m). For release, the imagery was cropped to the target box and resampled to make chips of 224 x 224 pixels, with labels rasterized to the same dimensions. The built-in `MakeLabels` class provides the functions used for this task.
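
The numbers above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes a spherical Earth with a mean radius of 6371 km to convert degrees to metres, which is an approximation and holds for the east-west direction only near the equator.

```python
import math

TILE_RES_DEG = 0.000025   # native tile resolution, from the text
TARGET_DEG = 0.005        # width of the labelling target, from the text
CHIP_PIXELS = 224         # output chip dimension, from the text

# Assumption: mean Earth radius of 6371 km, so one degree of latitude is ~111.2 km.
METRES_PER_DEGREE = 2 * math.pi * 6_371_000 / 360

native_pixels = TARGET_DEG / TILE_RES_DEG        # pixels across the target before resampling
chip_res_deg = TARGET_DEG / CHIP_PIXELS          # pixel size after resampling to 224 pixels
target_metres = TARGET_DEG * METRES_PER_DEGREE   # north-south extent of the target

print(f"native pixels across target: {native_pixels:.0f}")
print(f"chip resolution: {chip_res_deg:.7f} degrees")
print(f"target extent: {target_metres:.0f} m")
```

Under these assumptions the target spans about 200 native pixels, so producing 224 x 224 chips implies a slight upsampling rather than a straight crop.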

In [1]:
import os
import sys
from pathlib import Path
import pandas as pd
import rioxarray as rxr
from datetime import datetime as dt
from makelabels import MakeLabels

## Setup

### Paths

In [4]:
root_dir = os.environ["HOME"]
proj_dir = Path(root_dir) / "projects/lacunalabels"
data_dir = Path(root_dir) / "data"
image_dir = Path(os.path.dirname(root_dir)) / \
    "data/imagery/planet/tiles" # input directory
chip_dir = Path(data_dir) / "labels/lacunalabels/images"  # output
log_file = str(Path(root_dir) / "logs/image-chipping")

# Create the output directory if it does not already exist
os.makedirs(chip_dir, exist_ok=True)

### Catalogs

Read in the assignment catalog containing the image tile names, and drop duplicated site names (many sites have more than one assignment mapped to them).

In [5]:
catalog = pd.read_csv(Path(proj_dir) /\
                      "data/interim/assignments_full_wtiles.csv")
chip_catalog = (
    catalog[["name", "image_date", "x", "y", "destfile"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
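
To illustrate the deduplication step, here is a small sketch on invented data. The `assignment_id` column, the coordinates, and the tile file names are made up for the example; only the five selected column names come from the cell above.

```python
import pandas as pd

# Hypothetical rows: one site has two assignments mapped to the same tile.
catalog = pd.DataFrame({
    "assignment_id": [1, 2, 3],   # hypothetical column, for illustration only
    "name": ["ET0007182", "ET0007182", "NE3372442"],
    "image_date": ["2017-08-15", "2017-08-15", "2021-08-15"],
    "x": [38.5, 38.5, 8.1],       # invented coordinates
    "y": [9.2, 9.2, 13.6],
    "destfile": ["tile_a.tif", "tile_a.tif", "tile_b.tif"],  # invented file names
})

# Selecting the chip-related columns first makes the two assignment rows identical,
# so drop_duplicates() collapses them into a single chip record.
chip_catalog = (
    catalog[["name", "image_date", "x", "y", "destfile"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(len(catalog), "->", len(chip_catalog))
```

Note that `drop_duplicates()` here compares all five selected columns, so a site that appears with two different image dates or tiles would keep one row per distinct combination.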

## Run chipping

Chipping uses the `image_chipper` function provided by `MakeLabels`, run in parallel mode.

### Initialize the class

Initializing the class also starts a logger.

In [6]:
mkl = MakeLabels(logfile=log_file)

Started dataset creation


### Define arguments for chipping function

The arguments are collected in a dictionary of keyword arguments, which enables a parallelized implementation of the `image_chipper` function. They include the half-width of the chipping target in decimal degrees, the desired output dimensions (224 x 224), the input and output directories, etc.

See `help(mkl.image_chipper)` for more details on arguments.

In [7]:
kwargs = {
    "src_dir": image_dir, 
    "dst_dir": chip_dir, 
    "src_col": "destfile",
    "date_col": "image_date",
    "w": 0.0025, 
    "rows": 224,
    "cols": 224, 
    "crs": "epsg:4326",
    "verbose": False,
    "overwrite": False
}
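
As a rough sketch of how the half-width `w` and the output dimensions relate to a chip's footprint, the helpers below compute a bounding box and the resulting pixel size. `chip_bounds` and `chip_pixel_size` are hypothetical names, not part of `MakeLabels`, and the sketch assumes that the catalog's `x` and `y` give the centre of the target in the same CRS as `w`; the actual `image_chipper` may work differently.

```python
def chip_bounds(x, y, w):
    """Return (xmin, ymin, xmax, ymax) for a square box of half-width w centred on (x, y)."""
    return (x - w, y - w, x + w, y + w)


def chip_pixel_size(bounds, rows, cols):
    """Return (x_size, y_size) of one output pixel for a chip of rows x cols pixels."""
    xmin, ymin, xmax, ymax = bounds
    return ((xmax - xmin) / cols, (ymax - ymin) / rows)


# Invented centre coordinates; w, rows and cols match the kwargs above.
bounds = chip_bounds(38.5, 9.2, 0.0025)
x_size, y_size = chip_pixel_size(bounds, rows=224, cols=224)
print(bounds)
print(x_size, y_size)
```

With `w = 0.0025`, the box is 0.005 degrees wide, which matches the labelling target described at the top of the notebook.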

### Run chipping in parallel


In [8]:
%%time
catalogf = mkl.run_parallel_threads(
    chip_catalog, mkl.image_chipper, kwargs, 4
)
catalogf = pd.DataFrame(catalogf)

Completed run
CPU times: user 4h 37min 49s, sys: 19min 52s, total: 4h 57min 41s
Wall time: 4h 31min 58s
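
For readers unfamiliar with the pattern, a thread-based parallel map over catalog records with shared keyword arguments can be sketched with the standard library as follows. This is not the implementation of `run_parallel_threads`; `run_threads` and `fake_chipper` are hypothetical stand-ins that only show the general shape of such a call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_threads(records, func, kwargs, n_workers):
    """Apply func(record, **kwargs) to each record in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda rec: func(rec, **kwargs), records))


def fake_chipper(record, suffix=".tif", **_):
    """Hypothetical stand-in for a chipping function: it only builds an output name."""
    return {"name": record["name"], "image": record["name"] + suffix}


records = [{"name": "ET0007182"}, {"name": "NE3372442"}]
results = run_threads(records, fake_chipper, {"suffix": ".tif"}, 4)
print(results)
```

The wall time reported above is close to the total CPU time, which suggests that the four threads gave little speed-up in this run. One possible reason is that the work was CPU-bound and limited by Python's global interpreter lock, in which case a process pool might be worth testing.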


In [9]:
catalogf.reset_index(drop=True, inplace=True)
catalogf.drop(columns=["destfile", "x", "y"], inplace=True)

In [10]:
catalogf.to_csv(
    Path(proj_dir) / "data/interim/image_chip_catalog.csv", index=False
)
catalogf.head()

|   | name | image_date | image |
|---|---|---|---|
| 0 | ET0007182 | 2017-08-15 | ET0007182_2017-08.tif |
| 1 | NE3372442 | 2021-08-15 | NE3372442_2021-08.tif |
| 2 | SN0105655 | 2020-02-15 | SN0105655_2020-02.tif |
| 3 | SD4068077 | 2022-03-15 | SD4068077_2022-03.tif |
| 4 | ML2303293 | 2021-04-15 | ML2303293_2021-04.tif |


In [18]:
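cnames = os.listdir(chip_dir)
# catalogf.query("image in @cnames")

# Chips on disk that are missing from the catalog (an empty list means none)
catalog_images = set(catalogf.image)
[c for c in cnames if c not in catalog_images]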

[]
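
A two-way version of this check can be sketched as below, reporting both catalog entries with no file on disk and files on disk with no catalog entry. `compare_chips` is a hypothetical helper, and the example runs against a temporary directory with empty placeholder files rather than the real chip directory.

```python
import tempfile
from pathlib import Path


def compare_chips(chip_dir, expected_images):
    """Return (missing, extra): expected files not on disk, and files on disk not expected."""
    on_disk = {p.name for p in Path(chip_dir).iterdir() if p.is_file()}
    expected = set(expected_images)
    return sorted(expected - on_disk), sorted(on_disk - expected)


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files standing in for chips.
    for fname in ["ET0007182_2017-08.tif", "NE3372442_2021-08.tif"]:
        (Path(tmp) / fname).touch()
    missing, extra = compare_chips(
        tmp, ["ET0007182_2017-08.tif", "SN0105655_2020-02.tif"]
    )

print("missing:", missing)
print("extra:", extra)
```

In the notebook, the equivalent call would presumably be `compare_chips(chip_dir, catalogf.image)`, with two empty lists indicating that the catalog and the directory agree.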

In [38]:
!ls /home/airg/lestes/data/labels/lacunalabels/labels/ | wc -l

33746
