# Dataset Preparation

In this notebook we download the dataset 'Deforestation in Ukraine from Sentinel2 data' from Kaggle using `kagglehub`. Then we copy every `*_TCI.jp2` (true color image) file into the `dataset` folder of our project.

### Importing libraries

In [5]:
import os
import kagglehub
import shutil
from pathlib import Path



### Downloading dataset from Kaggle

In [6]:
kaggle_dataset_path = kagglehub.dataset_download("isaienkov/deforestation-in-ukraine")
kaggle_dataset_path = Path(kaggle_dataset_path)
print("Path to dataset files:", kaggle_dataset_path)

Path to dataset files: C:\Users\Alex\.cache\kagglehub\datasets\isaienkov\deforestation-in-ukraine\versions\1
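
Before choosing a search pattern, it can help to see which file types the download actually contains. The following is a minimal sketch, not part of the original notebook: `count_extensions` is a hypothetical helper that walks a folder recursively and tallies file suffixes.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under `root` (recursively), grouped by lowercase suffix.

    Hypothetical helper for illustration; the notebook does not define it.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts
```

In this notebook it could be called as `count_extensions(kaggle_dataset_path)`; the resulting counts show whether `.jp2` files are present and how many other file types sit next to them.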


### Creating the dataset folder if it does not exist

In [7]:
dataset_dir = Path("./dataset")

In [8]:
dataset_dir.mkdir(exist_ok=True)

### Finding the TCI images

In [9]:
search_pattern = "*_TCI.jp2"  # rglob() already searches all subfolders, so no "**/" prefix is needed

In [10]:
image_paths_list = list(kaggle_dataset_path.rglob(search_pattern))

In [11]:
len(image_paths_list) # We have 50 images

50
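
The copy step in the next section flattens every match into one folder and decides whether to skip a file by its name alone. If two source files in different subfolders shared a name, the second one would be skipped silently. The sketch below is one way to check for that before copying; `find_duplicate_names` is a hypothetical helper, not something the notebook or any library provides.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(paths):
    """Return {file name: [paths]} for every name that occurs more than once.

    Hypothetical helper for illustration; only file names are compared,
    not file contents.
    """
    by_name = defaultdict(list)
    for path in paths:
        path = Path(path)
        by_name[path.name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}
```

Calling `find_duplicate_names(image_paths_list)` and getting an empty dictionary back would mean that copying by `src_path.name` cannot lose any image.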

### Copying all files to the dataset folder in our project

In [63]:
for src_path in image_paths_list:
    dst_path = dataset_dir / src_path.name
    
    if not dst_path.exists():
        print(f"Copying {src_path.name}...")
        shutil.copy(src_path, dst_path)
    else:
        print(f"Skipping {src_path.name}, already exists.")

print("Copying complete.")

Copying T36UYA_20160212T084052_TCI.jp2...
Copying T36UYA_20160330T082542_TCI.jp2...
Copying T36UYA_20160405T085012_TCI.jp2...
Copying T36UYA_20160502T083602_TCI.jp2...
Copying T36UYA_20160509T082612_TCI.jp2...
Copying T36UYA_20160618T082602_TCI.jp2...
Copying T36UYA_20160621T084012_TCI.jp2...
Copying T36UYA_20160830T083602_TCI.jp2...
Copying T36UYA_20161026T083032_TCI.jp2...
Copying T36UYA_20161121T085252_TCI.jp2...
Copying T36UYA_20161205T083332_TCI.jp2...
Copying T36UXA_20180731T083601_TCI.jp2...
Copying T36UXA_20180810T083601_TCI.jp2...
Copying T36UXA_20180820T083601_TCI.jp2...
Copying T36UXA_20180830T083601_TCI.jp2...
Copying T36UXA_20180919T083621_TCI.jp2...
Copying T36UYA_20190318T083701_TCI.jp2...
Copying T36UYA_20190328T084011_TCI.jp2...
Copying T36UYA_20190407T083601_TCI.jp2...
Copying T36UYA_20190417T083601_TCI.jp2...
Copying T36UXA_20190427T083601_TCI.jp2...
Copying T36UYA_20190427T083601_TCI.jp2...
Copying T36UYA_20190517T083601_TCI.jp2...
Copying T36UXA_20190606T083601_TCI