# Data augmentation

### This notebook performs data augmentation: we generate new training entries from the data we already have.

### Our approach is to save each image in four orientations: the original plus rotations of 90, 180 and 270 degrees clockwise. The rotated copies should look like new images to our model, which should make it less prone to overfitting.


> We start with the basic imports.

In [8]:
import numpy as np
import pandas as pd
import rasterio
from rasterio.transform import from_origin
import warnings
warnings.filterwarnings("ignore", category=rasterio.errors.NotGeoreferencedWarning)

In [9]:
metadata = pd.read_csv("data/metadata.csv") # load the data

> Here we define a helper that rotates an image matrix 90 degrees clockwise, moving every pixel to its rotated position.

In [10]:
def rotate_matrix_90(mat: np.ndarray) -> np.ndarray:
    """
    Returns the input matrix rotated 90 degrees clockwise.

    input: np.ndarray mat (2-D)
    returns: np.ndarray
    """
    # Reverse the row order, then transpose (same result as np.rot90(mat, k=-1))
    return mat[::-1].T
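
> As a quick sanity check of the helper above, the sketch below rotates a small sample matrix, compares the result with NumPy's built-in `np.rot90`, and confirms that four quarter turns give back the original. The `sample` array is made up for illustration and is not part of the dataset.

```python
import numpy as np

def rotate_matrix_90(mat: np.ndarray) -> np.ndarray:
    """Rotate a 2-D array 90 degrees clockwise."""
    return mat[::-1].T

# Hypothetical 2x2 sample, small enough to check by eye
sample = np.array([[1, 2],
                   [3, 4]])

rotated = rotate_matrix_90(sample)
print(rotated)
# [[3 1]
#  [4 2]]

# np.rot90 rotates counterclockwise for positive k, so k=-1 is one clockwise turn
assert np.array_equal(rotated, np.rot90(sample, k=-1))

# Four quarter turns bring the matrix back to where it started
restored = sample
for _ in range(4):
    restored = rotate_matrix_90(restored)
assert np.array_equal(restored, sample)
```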

> We build a new metadata table with one row per saved orientation, so that the augmented images can later be loaded from that file.
>
> We also have to update the x and y coordinates of the plume for every rotation so that they still match the rotated image.
>
> Each orientation is also saved to disk as a new image.

In [14]:
rows = {'date': [],
        'id_coord': [],
        'plume': [],
        'set': [],
        'lat': [],
        'lon': [],
        'coord_x': [],
        'coord_y': [],
        'path': []
} # Prepare new data frame as dict to append one at a time

for _, row in metadata.iterrows():
    img_path = "data/" + row["path"] + ".tif"
    with rasterio.open(img_path) as src:
        data = src.read()  # all bands, shape (bands, height, width)
        x, y = row["coord_x"], row["coord_y"]  # coordinates stored for the original image

        img = data[0]  # keep only the first band

        for i in range(4):
            # Write the current orientation (i = 0 is the unrotated original)
            rotated_path = f"{row['path']}_rot{i}"

            # Placeholder georeferencing: 1x1 pixels with the origin at (0, height)
            width, height = 64, 64
            transform = from_origin(0, height, 1, 1)

            with rasterio.open(
                rotated_path + ".tif", "w", driver="GTiff",
                height=height, width=width, count=1,
                dtype=str(img.dtype), crs="EPSG:4326", transform=transform,
            ) as dst:
                dst.write(img, 1)

            rows["date"].append(row["date"])
            rows["id_coord"].append(f"{row['id_coord']}_rot{i}")
            rows["plume"].append(row["plume"])
            rows["set"].append(row["set"])
            rows["lat"].append(row["lat"])
            rows["lon"].append(row["lon"])
            rows["coord_x"].append(x)
            rows["coord_y"].append(y)
            rows["path"].append(rotated_path)

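            # Prepare the next orientation: rotate the image and its coordinates together
            img = rotate_matrix_90(img)
            x, y = y, 64 - x  # rotate the coordinates 90 degrees about the point (32, 32)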

new_metadata = pd.DataFrame(rows) # Make it a data frame to then save it
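
> The coordinate update can be checked by marking a single pixel and following it through the rotations. The sketch below assumes that `coord_x` and `coord_y` are zero-based integer (row, column) indices, which the notebook does not state. Under that assumption a clockwise quarter turn maps `(row, col)` to `(col, size - 1 - row)`, so the offset is 63 for a 64x64 image. The loop above uses `64 - x`, which corresponds to rotating about the point (32, 32); if the coordinates are pixel indices, the two differ by one pixel. `rotate_index_90` is a hypothetical helper introduced only for this check.

```python
import numpy as np

def rotate_matrix_90(mat: np.ndarray) -> np.ndarray:
    """Rotate a 2-D array 90 degrees clockwise."""
    return mat[::-1].T

def rotate_index_90(row: int, col: int, size: int) -> tuple:
    """Hypothetical helper: where a zero-based (row, col) index lands
    after one clockwise quarter turn of a size x size image."""
    return col, size - 1 - row

size = 64
img = np.zeros((size, size), dtype=np.uint8)
row, col = 24, 47          # made-up marker position
img[row, col] = 1

positions = []
for _ in range(4):
    # The tracked index must still point at the marked pixel
    assert img[row, col] == 1
    positions.append((row, col))
    img = rotate_matrix_90(img)
    row, col = rotate_index_90(row, col, size)

# After four quarter turns the marker is back where it started
assert (row, col) == (24, 47)
print(positions)
```

> If the coordinates are instead continuous positions measured from the image corner, rotating about (32, 32) as the loop does would be the matching choice.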

> Finally, we save the new metadata to a CSV file.

In [15]:
new_metadata.to_csv("new_metadata.csv")

Unnamed: 0,date,id_coord,plume,set,lat,lon,coord_x,coord_y,path
0,20230223,id_6675_rot0,yes,train,31.528750,74.330625,24,47,images/plume/20230223_methane_mixing_ratio_id_...
1,20230223,id_6675_rot1,yes,train,31.528750,74.330625,47,40,images/plume/20230223_methane_mixing_ratio_id_...
2,20230223,id_6675_rot2,yes,train,31.528750,74.330625,40,17,images/plume/20230223_methane_mixing_ratio_id_...
3,20230223,id_6675_rot3,yes,train,31.528750,74.330625,17,24,images/plume/20230223_methane_mixing_ratio_id_...
4,20230103,id_2542_rot0,yes,train,35.538000,112.524000,42,37,images/plume/20230103_methane_mixing_ratio_id_...
...,...,...,...,...,...,...,...,...,...
1715,20230213,id_5510_rot3,no,train,32.713854,44.609398,10,55,images/no_plume/20230213_methane_mixing_ratio_...
1716,20230330,id_6609_rot0,no,train,47.758979,27.801630,21,15,images/no_plume/20230330_methane_mixing_ratio_...
1717,20230330,id_6609_rot1,no,train,47.758979,27.801630,15,43,images/no_plume/20230330_methane_mixing_ratio_...
1718,20230330,id_6609_rot2,no,train,47.758979,27.801630,43,49,images/no_plume/20230330_methane_mixing_ratio_...
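
> The `Unnamed: 0` column in the output above is what pandas produces when a CSV written with the default index is read back. If that column is not wanted, passing `index=False` to `to_csv` is one option. The sketch below illustrates the difference on a tiny made-up frame (`demo` is not the real metadata) and uses in-memory buffers instead of files.

```python
import io
import pandas as pd

# Tiny illustrative frame, not the real metadata
demo = pd.DataFrame({"id_coord": ["a_rot0", "a_rot1"], "coord_x": [24, 47]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)
# ['Unnamed: 0', 'id_coord', 'coord_x']

# With index=False only the data columns are written
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)
# ['id_coord', 'coord_x']
```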
