# Extracting the Tree Images
In this notebook we will extract the tree images from the raster files and store them together with the correct label in a `.npz` file.

**Note:** For demonstration purposes, only one ortho image is processed here. The `scripts` folder contains a file with all the functions needed to run the process on all available data.

### Importing needed libraries & packages

In [1]:
import pandas as pd
import numpy as np
import rasterio as rio

### Loading the tree data
We will now load the data with the tree coordinates and labels we need to extract the individual tree images.

In [2]:
path = "./data/trees.csv"  # CSV file with the tree coordinates and labels

# Load the labelled but uncorrected GPS coordinates of the trees
tc_df = pd.read_csv(path)

# Keep only the columns we need; the column names may differ depending on how the data is stored
tc_df = tc_df[['X', 'Y', 'desc']]

# Rename the columns
tc_df.columns = ['x_geo', 'y_geo', 'label']

# Convert the DataFrame into a tuple of (x_geo, y_geo, label) tuples so each label stays paired with its coordinates
coordinates = tuple(tc_df.itertuples(index=False, name=None))
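
To make the structure of `coordinates` concrete, here is a minimal sketch on a tiny made-up table. The rows, the label names and the extra `unused` column are invented for illustration and are not taken from `trees.csv`; only the column names `X`, `Y` and `desc` follow the cell above.

```python
import pandas as pd

# Hypothetical stand-in for trees.csv (values invented for illustration)
toy_df = pd.DataFrame({
    "X": [500010.0, 500042.5],
    "Y": [5400020.0, 5400077.5],
    "desc": ["oak", "beech"],
    "unused": [1, 2],  # extra column that gets dropped
})

toy_df = toy_df[["X", "Y", "desc"]]
toy_df.columns = ["x_geo", "y_geo", "label"]

# name=None yields plain tuples; index=False leaves out the row index
toy_coordinates = tuple(toy_df.itertuples(index=False, name=None))

for x_geo, y_geo, label in toy_coordinates:
    print(x_geo, y_geo, label)
```

Each inner tuple can be unpacked directly in a `for` loop, which is how the extraction loop later in the notebook consumes it.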

### Extracting the trees
Now we load the raster image (e.g. an ortho photo or a vegetation height raster) and extract a clip for every tree in the file we just loaded.

In [3]:
filepath = "./data/TDOP/TDOP_2022.tif"

In [4]:
save_list = []    # extracted image clips
labels_list = []  # matching labels

with rio.open(filepath) as data:
    for (lon, lat, label) in coordinates:

        # Convert map coordinates to pixel indices; index() returns (row, col)
        py, px = data.index(lon, lat)
        # 35x35 pixel window starting 14 px left of and above the tree pixel:
        # Window(col_off, row_off, width, height)
        window = rio.windows.Window(px - 14, py - 14, 35, 35)

        # Read all bands inside the window
        clip = data.read(window=window)
        # rasterio returns (channels, height, width); reorder to (height, width, channels)
        clip_T = np.transpose(clip, (1, 2, 0))

        # Skip clips that were not fully inside the image, detected by their shape
        if clip_T.shape != (35, 35, 4):
            continue
        save_list.append(clip_T)
        labels_list.append(label)

# Turn the lists into arrays for saving
save_array = np.array(save_list)
labels_array = np.array(labels_list)
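
The window arithmetic and the shape check can be tried without a raster file. The sketch below uses a synthetic NumPy array in place of the ortho image and plain slicing in place of `data.read(window=window)`. The helper `clip_at`, the constants and the array size are made up for illustration. Clamped slicing only approximates a windowed read that runs past the image edge; rasterio's exact behaviour for out-of-bounds windows may differ.

```python
import numpy as np

SIZE = 35    # clip width and height in pixels
OFFSET = 14  # distance from the tree pixel to the window's top-left corner

# Synthetic 4-channel "raster" in rasterio's (channels, height, width) layout
raster = np.arange(4 * 100 * 120, dtype=np.uint16).reshape(4, 100, 120)

def clip_at(raster, py, px):
    """Hypothetical helper: cut a SIZE x SIZE clip around pixel (row py, col px)."""
    row0, col0 = py - OFFSET, px - OFFSET
    # Clamp to the array bounds so a window near the edge yields a smaller clip
    rows = slice(max(row0, 0), max(row0 + SIZE, 0))
    cols = slice(max(col0, 0), max(col0 + SIZE, 0))
    clip = raster[:, rows, cols]
    # Reorder (channels, height, width) -> (height, width, channels)
    return np.transpose(clip, (1, 2, 0))

inside = clip_at(raster, py=50, px=60)  # fully inside the raster
edge = clip_at(raster, py=5, px=60)     # window starts above row 0

# Same filter as in the notebook: keep only full-sized clips
kept = [c for c in (inside, edge) if c.shape == (SIZE, SIZE, 4)]
print(inside.shape, edge.shape, len(kept))
```

With an offset of 14 and a size of 35, a clip covers columns `px - 14` to `px + 20`, so the tree pixel sits slightly left of and above the clip's centre.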

Let's look at what we just created:

In [5]:
save_array.shape

(1705, 35, 35, 4)

And what does an image look like?

In [6]:
save_array[0]

array([[[14111, 16306, 14080, 38100],
        [14471, 16715, 14440, 38238],
        [12788, 14853, 12668, 36180],
        ...,
        [10343, 11835, 11380, 35915],
        [11202, 12808, 12325, 37338],
        [11795, 13630, 12441, 38603]],

       [[17477, 20131, 17227, 42603],
        [16365, 18885, 15951, 40501],
        [14120, 16338, 13783, 37606],
        ...,
        [ 9978, 11459, 10787, 35332],
        [ 9088, 10444,  9866, 33907],
        [11883, 13742, 12552, 39153]],

       [[19693, 22673, 19336, 45706],
        [20033, 23080, 19113, 45341],
        [16394, 18940, 15745, 40370],
        ...,
        [10623, 12269, 11170, 36407],
        [10013, 11544, 10552, 35342],
        [13599, 15809, 13856, 41598]],

       ...,

       [[23609, 25672, 17864, 53189],
        [19985, 21806, 15440, 47413],
        [20612, 22533, 16244, 48534],
        ...,
        [13061, 13866, 12254, 36810],
        [14496, 15492, 13386, 39290],
        [15908, 17103, 14445, 41686]],

       [[23559,

It's an array of 16-bit values, one for each of the 4 channels of every pixel.
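
A quick way to confirm this is to look at the array's `dtype` and value range. The sketch below rebuilds the first three pixels from the output above and assumes they are stored as `uint16`; in the notebook you would inspect `save_array` directly.

```python
import numpy as np

# First three pixels of the output above; uint16 is an assumption here
pixels = np.array(
    [[14111, 16306, 14080, 38100],
     [14471, 16715, 14440, 38238],
     [12788, 14853, 12668, 36180]],
    dtype=np.uint16,
)

print(pixels.dtype)                # data type of the stored values
print(pixels.min(), pixels.max())  # observed value range
print(np.iinfo(pixels.dtype).max)  # largest value the type can hold
```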

### Saving the data
We now save the image and label arrays to a single `.npz` file, which stores both arrays under the names `img` and `labels`.

In [7]:
np.savez("./data/TDOP.npz", img=save_array, labels=labels_array)
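
To check the file, or to use it in a later step, the arrays can be read back with `np.load`. The sketch below writes small stand-in arrays to a temporary file rather than to `./data/TDOP.npz`. The shapes and label names are invented for illustration, while the keys `img` and `labels` match the `np.savez` call above.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays (invented): 3 clips of 35x35 pixels with 4 channels
demo_imgs = np.zeros((3, 35, 35, 4), dtype=np.uint16)
demo_labels = np.array(["oak", "beech", "oak"])

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.npz")
    np.savez(demo_path, img=demo_imgs, labels=demo_labels)

    # np.load returns a dict-like NpzFile; the context manager closes it
    with np.load(demo_path) as archive:
        loaded_imgs = archive["img"]
        loaded_labels = archive["labels"]

print(loaded_imgs.shape, loaded_labels.tolist())
```

Plain string labels load without extra arguments. If the labels ended up in an object array instead, `np.load` would need `allow_pickle=True`.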