# Prepare Datasets and Create the Project Structure

This notebook preprocesses the dataset and creates the folder structure used to store the data. Run it before the `Data_Augmentation_Using_Generative_Adversarial_Networks.ipynb` notebook, which should be available in the same folder.

### Prerequisites

Ensure that the Cityscapes dataset is downloaded and placed in a `dataset` folder inside the current working directory. The dataset can be downloaded from [here](https://www.cityscapes-dataset.com/). You need to register to access the dataset. The following archives are required for data augmentation:
1. [gtFine_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=1) (241MB): Fine annotations for training and validation datasets (3475 annotated images) and dummy annotations (ignore regions) for the test set (1525 images).
2. [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB): Left 8-bit images - training, validation, and test datasets (5000 images).

### Check Python version

In [1]:
import sys

# Compare integers, not strings: as strings, ('3', '10') sorts before ('3', '7').
assert sys.version_info >= (3, 7), "The notebooks are tested on Python 3.7 and higher. Please update your Python to run the code."

### Check that the notebook server has access to all required resources

In [2]:
from pathlib import Path

dataset_folder = Path("dataset")
if not dataset_folder.exists():
    raise FileNotFoundError("Create a `{}` folder in the current directory (`{}`) and place the downloaded archives in it".format(dataset_folder.name, Path.cwd()))

In [3]:
expected_zipped_datasets = ["gtFine_trainvaltest.zip", "leftImg8bit_trainvaltest.zip"]
expected_zipped_datasets_path = []

for zipped_dataset_name in expected_zipped_datasets:
    # The archives are expected inside the `dataset` folder of the current directory.
    zipped_dataset_path = Path.cwd() / "dataset" / zipped_dataset_name
    expected_zipped_datasets_path.append(zipped_dataset_path)
    if not zipped_dataset_path.exists():
        raise FileNotFoundError("Download and place `{}` in the dataset folder (`{}`)".format(zipped_dataset_path.name, zipped_dataset_path.parent))
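
An interrupted download can leave a file that exists but cannot be extracted. The existence check above can be complemented by an integrity check. This is a minimal sketch using only the standard `zipfile` module. The helper name `check_zip_archive` is hypothetical and not part of the original notebook.

```python
import zipfile
from pathlib import Path


def check_zip_archive(archive_path):
    """Return None if the archive looks intact, else a short problem description."""
    archive_path = Path(archive_path)
    # is_zipfile() returns False for missing files and for files without zip structure.
    if not zipfile.is_zipfile(archive_path):
        return "not a valid zip file"
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        # testzip() reads every member and returns the first one with a bad CRC, or None.
        bad_member = zip_ref.testzip()
    if bad_member is not None:
        return "corrupt member: {}".format(bad_member)
    return None
```

It could be called on each entry of `expected_zipped_datasets_path` before unzipping. Note that `testzip()` reads the whole archive, so it may take a while on the 11GB image archive.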

### Unzip Datasets

In [4]:
import zipfile as zf

# Each archive is extracted into a folder named after the archive, without the `.zip` suffix.
unzipped_datasets_name = [Path(dataset_name).stem for dataset_name in expected_zipped_datasets]
unzipped_datasets_path = [Path.cwd() / "dataset" / dataset_name for dataset_name in unzipped_datasets_name]

for dataset_input_path, dataset_output_path in zip(expected_zipped_datasets_path, unzipped_datasets_path):
    with zf.ZipFile(dataset_input_path, 'r') as zip_ref:
        zip_ref.extractall(dataset_output_path)
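
Re-running the notebook extracts both archives again, which is slow for the image archive. One possible way to avoid this is to skip extraction when the output folder already has content. The sketch below assumes a non-empty output folder means a previous extraction completed. The helper name `extract_if_needed` is hypothetical and not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_if_needed(archive_path, output_dir):
    """Extract archive_path into output_dir unless output_dir already has content.

    Returns True if extraction ran and False if it was skipped.
    """
    output_dir = Path(output_dir)
    # Assumption: any existing content means the archive was extracted before.
    if output_dir.is_dir() and any(output_dir.iterdir()):
        return False
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(output_dir)
    return True
```

This check cannot tell a complete extraction from a partial one, so delete the output folder manually if an earlier run was interrupted.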

### Prepare Project Structure