## 1. Downloading the Dataset
The dataset can be downloaded from the following Kaggle link: [Deforestation in Ukraine](https://www.kaggle.com/datasets/isaienkov/deforestation-in-ukraine/data).

Follow these steps to download and place the dataset:
1. Download the dataset as a `.zip` file from Kaggle.
2. Extract the contents of the zip file.
3. Place the extracted folder into the `data/images` directory of this project.
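
If you prefer to script the extraction instead of doing it by hand, a minimal sketch using only the standard library might look like the following. The archive name `archive.zip` and the helper `extract_dataset` are assumptions for illustration; use whatever filename Kaggle actually gave you.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir="data/images"):
    """Extract every member of the archive into target_dir and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    # "archive.zip" is a hypothetical name for the file downloaded from Kaggle.
    archive_path = Path("archive.zip")
    if archive_path.exists():
        names = extract_dataset(archive_path)
        print(f"Extracted {len(names)} entries into data/images")
```

Run it from the project root so that the relative `data/images` path resolves correctly.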


## 2. Organizing the Dataset

The downloaded dataset contains satellite images in `.jp2` format. Their filenames encode the location (tile) and the acquisition time.

We will:
1. Use the `prepare_data.py` script to sort the images by location (tile).
2. Filter the dataset to include only True Color Images (TCI).

The `prepare_data.py` script:
- Reads all images in the `data/images` directory.
- Filters images to include only those with `_TCI` in their filenames.
- Sorts images into subfolders by tile name.
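
The three steps above can be sketched roughly as follows. This is not the actual `prepare_data.py`, only an illustration of the filter-and-sort idea. It assumes the tile identifier appears in the filename as `T` followed by two digits and three capital letters (for example `T36UYA`), and that files are copied rather than moved; the function name `sort_tci_by_tile` is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumed tile pattern: "T" + two digits + three capital letters, e.g. T36UYA.
TILE_PATTERN = re.compile(r"T\d{2}[A-Z]{3}")


def sort_tci_by_tile(src_dir="data/images", dst_dir="data/sorted_by_tile"):
    """Copy TCI .jp2 files from src_dir into dst_dir/<tile>/; return counts per tile."""
    counts = {}
    # sorted() materializes the file list before any copying starts.
    for path in sorted(Path(src_dir).rglob("*.jp2")):
        if "_TCI" not in path.name:
            continue  # skip files that are not True Color Images
        match = TILE_PATTERN.search(path.name)
        if match is None:
            continue  # no recognizable tile identifier in the filename
        tile = match.group(0)
        tile_dir = Path(dst_dir) / tile
        tile_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, tile_dir / path.name)
        counts[tile] = counts.get(tile, 0) + 1
    return counts
```

The returned dictionary maps each tile name to the number of images copied, which gives a quick sanity check on the result.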


## 3. Running the Data Preparation Script

The `prepare_data.py` script is located in the `src` directory. It organizes the dataset by:
- Filtering for True Color Images (TCI).
- Sorting images into folders by location (tile).

### Steps:
1. Ensure the dataset is in the `data/images` folder.
2. Run the following code to preprocess the data.


```python
!python ../src/prepare_data.py
```

## 4. Verifying the Processed Dataset

The processed dataset should now be available in the `data/sorted_by_tile` folder. Each subfolder corresponds to a tile (location) and contains the True Color Images (TCI) for that tile.
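
One way to check the result is to count the TCI images in each tile subfolder. The sketch below assumes the folder layout described above; the helper name `summarize_tiles` is hypothetical and not part of the project.

```python
from pathlib import Path


def summarize_tiles(sorted_dir="data/sorted_by_tile"):
    """Return {tile_folder_name: number_of_TCI_jp2_files} for each subfolder."""
    root = Path(sorted_dir)
    if not root.is_dir():
        return {}
    return {
        tile_dir.name: len(list(tile_dir.glob("*_TCI*.jp2")))
        for tile_dir in sorted(root.iterdir())
        if tile_dir.is_dir()
    }


if __name__ == "__main__":
    for tile, n_images in summarize_tiles().items():
        print(f"{tile}: {n_images} TCI image(s)")
```

An empty result means the folder is missing or contains no tile subfolders, which usually indicates that the preparation script did not run against the expected paths.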


## 5. Moving Data for Testing

After sorting the images into folders based on their tile identifiers, the next step is to prepare a set of images specifically for testing the model. To do this:

1. Navigate to the folder where the sorted images are located (`data/sorted_by_tile`).
2. Manually select a few tile folders (e.g., `T36UYA`, `T36UXA`) and copy them to the `data/test_images` directory.
3. Use the `data/test_images` directory as the dataset for testing your feature matching and inference scripts.

Copying whole tile folders keeps images of the same tile (location) together, so feature matching can be evaluated across different seasonal variations of the same region.
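
The manual copy can also be scripted. The following is a minimal sketch under the folder layout described above; the function `copy_tiles_for_testing` is a hypothetical helper, and the tile names in the usage example are the ones mentioned in the steps.

```python
import shutil
from pathlib import Path


def copy_tiles_for_testing(tiles, src_dir="data/sorted_by_tile", dst_dir="data/test_images"):
    """Copy each named tile folder from src_dir to dst_dir; return the tiles copied."""
    copied = []
    for tile in tiles:
        source = Path(src_dir) / tile
        if not source.is_dir():
            print(f"Skipping {tile}: {source} not found")
            continue
        # dirs_exist_ok (Python 3.8+) allows the copy to be re-run safely.
        shutil.copytree(source, Path(dst_dir) / tile, dirs_exist_ok=True)
        copied.append(tile)
    return copied


if __name__ == "__main__":
    print(copy_tiles_for_testing(["T36UYA", "T36UXA"]))
```

Because the folders are copied rather than moved, the originals in `data/sorted_by_tile` stay untouched.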

### Important Note:
- Ensure that the `test_images` directory contains at least two images from each tile folder to allow proper testing.
- You can keep the original sorted images in `data/sorted_by_tile` intact for training or further dataset preparation.
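
To confirm the two-image requirement before running any tests, a small check along these lines may help. The helper `tiles_with_too_few_images` is hypothetical, and it assumes every tile folder directly contains its `.jp2` files.

```python
from pathlib import Path


def tiles_with_too_few_images(test_dir="data/test_images", minimum=2):
    """Return names of tile subfolders holding fewer than `minimum` .jp2 files."""
    root = Path(test_dir)
    if not root.is_dir():
        return []
    return [
        tile_dir.name
        for tile_dir in sorted(root.iterdir())
        if tile_dir.is_dir() and len(list(tile_dir.glob("*.jp2"))) < minimum
    ]


if __name__ == "__main__":
    short = tiles_with_too_few_images()
    if short:
        print("Tiles needing more images:", ", ".join(short))
    else:
        print("Every tile folder found has at least two images.")
```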