# Manage Directories

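This notebook reorganizes the test, train, and valid splits for classification. In each split it creates a "classification" directory containing two subdirectories, "cancer" and "no_cancer", and copies every image from that split's "images" directory into the subdirectory that matches its label.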

The original directories, relative to `./dataset/`, are:

```cmd
.
├── test
│   ├── images
│   └── labels
├── train
│   ├── images
│   └── labels
└── valid
    ├── images
    └── labels
```

The new directories will be:

```cmd
.
├── test
│   ├── classification
│   │   ├── cancer
│   │   └── no_cancer
│   ├── images
│   └── labels
├── train
│   ├── classification
│   │   ├── cancer
│   │   └── no_cancer
│   ├── images
│   └── labels
└── valid
    ├── classification
    │   ├── cancer
    │   └── no_cancer
    ├── images
    └── labels
```

## Create the new directories


In [1]:
from os import mkdir


def create_dir(path: str, directory: str) -> None:
  new_path: str = path + directory

  try:
    mkdir(new_path)
  except FileExistsError:
    print(f"Directory '{new_path}' already exists.")
  except PermissionError:
    print(f"Permission denied: Unable to create '{new_path}'.")
  except Exception as e:
    print(f"An error occurred: {e}")

### Create classification directories


In [None]:
TEST = 0
TRAIN = 1
VALID = 2

root_paths: list[str] = ["./dataset/test/", "./dataset/train/", "./dataset/valid/"]

# Create classification directories
directory = "classification"

for origin in root_paths:
  create_dir(origin, directory)

### Create cancer and no_cancer directories


In [None]:
classification_paths: list[str] = ["./dataset/test/classification/", "./dataset/train/classification/", "./dataset/valid/classification/"]

for origin in classification_paths:
  create_dir(origin, "cancer")
  create_dir(origin, "no_cancer")
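
The directory-creation cells above could also be collapsed into a single step. The following is a minimal sketch rather than the notebook's method: it assumes that `pathlib.Path.mkdir` with `parents=True` and `exist_ok=True` is acceptable here, and the helper name `create_classification_dirs` is hypothetical. It runs against a temporary directory so it does not touch `./dataset/`.

```python
import tempfile
from pathlib import Path

SPLITS = ("test", "train", "valid")
CLASSES = ("cancer", "no_cancer")


def create_classification_dirs(dataset_root: Path) -> list[Path]:
    """Create <split>/classification/<class> under dataset_root (hypothetical helper)."""
    created: list[Path] = []
    for split in SPLITS:
        for class_name in CLASSES:
            target = dataset_root / split / "classification" / class_name
            # parents=True also creates "classification"; exist_ok=True makes reruns safe.
            target.mkdir(parents=True, exist_ok=True)
            created.append(target)
    return created


# Demo on a throwaway directory rather than the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    for directory in create_classification_dirs(Path(tmp)):
        print(directory.relative_to(tmp))
```

Unlike `create_dir`, this version stays silent when a directory already exists, so a rerun is not reported.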

## Get files


Each image and its label file share the same name and differ only in their extension. A label records whether the image shows cancer, along with the bounding box, so we read the label to decide which directory each image is copied to.  
We collect the file names without their extensions, so that one name can be used to locate both the image and its label.
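
Because the pairing relies entirely on matching names, it can be worth checking that every image really has a label before processing. The sketch below is one possible check, not part of the original notebook: the helper `paired_stems` and the file names `scan_a` and `scan_b` are hypothetical, and it assumes labels are `.txt` files.

```python
import tempfile
from pathlib import Path


def paired_stems(images_dir: Path, labels_dir: Path) -> list[str]:
    """Return the sorted names that have both an image and a .txt label."""
    image_stems = {p.stem for p in images_dir.iterdir() if p.is_file()}
    label_stems = {p.stem for p in labels_dir.glob("*.txt")}
    return sorted(image_stems & label_stems)


# Demo with made-up file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    labels = Path(tmp) / "labels"
    images.mkdir()
    labels.mkdir()
    (images / "scan_a.jpg").touch()
    (images / "scan_b.jpg").touch()  # no matching label
    (labels / "scan_a.txt").touch()
    print(paired_stems(images, labels))  # ['scan_a']
```

An image missing from the result has no label file, and opening its label later would raise `FileNotFoundError`.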


In [4]:
from os import walk


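def get_files(path: str) -> list[str]:
  # Return the names of the files directly inside `path`.
  # walk() yields the top-level directory first, so returning on the
  # first iteration means subdirectories are never searched.
  for (_dirpath, _dirnames, filenames) in walk(path):
    return list(filenames)

  # walk() yields nothing when `path` does not exist.
  return []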


def remove_file_extension(file: str) -> str:
  return file.rsplit(".", 1)[0]


def get_files_names(path: str) -> list[str]:
  return [remove_file_extension(file) for file in get_files(path)]

In [5]:
test_files: list[str] = get_files_names("./dataset/test/images/")
train_files: list[str] = get_files_names("./dataset/train/images/")
valid_files: list[str] = get_files_names("./dataset/valid/images/")

files = [test_files, train_files, valid_files]

## Process files

The first character of each label file is either 0 or 1: a 1 means the image shows cancer, and a 0 means it does not. We use this to copy each image into the correct directory.
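
To make the rule concrete, the sketch below applies it to label text held in a string. It is illustrative only: the sample label lines are made up, and the layout of the fields after the first character is an assumption. `first_char_is_cancer` mirrors the first-character rule described above; `any_line_is_cancer` is a hypothetical stricter variant for the case where a label file holds several annotations.

```python
def first_char_is_cancer(label_text: str) -> bool:
    # Only the very first character of the label text is inspected.
    return label_text[:1] == "1"


def any_line_is_cancer(label_text: str) -> bool:
    # Hypothetical variant: assumes one annotation per line, with the
    # class as the first whitespace-separated field on each line.
    return any(
        line.split()[0] == "1"
        for line in label_text.splitlines()
        if line.strip()
    )


# Made-up label content: the first annotation is class 0, the second is class 1.
sample = "0 0.5 0.5 0.2 0.2\n1 0.3 0.3 0.1 0.1\n"

print(first_char_is_cancer(sample))  # False
print(any_line_is_cancer(sample))    # True
print(first_char_is_cancer(""))      # False
```

Two consequences of the first-character rule follow from this: an empty label file is treated as "no cancer", and only the first annotation in a file decides the class.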


In [6]:
def cancer(filepath: str) -> bool:
  with open(filepath, "r") as file:
    return file.read(1) == "1"

In [None]:
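cancer_bools: list[list[bool]] = []

# One list of booleans per split, in the same order as root_paths and files.
for root, names in zip(root_paths, files):
  cancer_bools.append([cancer(f"{root}labels/{name}.txt") for name in names])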

## Copy files


In [9]:
from shutil import copy2


def copy_files(files: list[str], origin: str, destination: str, cancer_bools: list[bool]) -> None:
  # Every image is assumed to be a .jpg file named after its label.
  for name, has_cancer in zip(files, cancer_bools):
    target = "cancer/" if has_cancer else "no_cancer/"
    copy2(f"{origin}{name}.jpg", f"{destination}{target}")


for i in range(3):
  copy_files(
    files[i], 
    root_paths[i] + "images/", 
    classification_paths[i], 
    cancer_bools[i]
  )
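
After copying, a quick count per class can confirm that nothing was skipped. The following is a minimal sketch of such a check and is not part of the original notebook: the helper `count_by_class` is hypothetical, and the demo uses a temporary directory with a made-up file name instead of the real dataset.

```python
import tempfile
from pathlib import Path


def count_by_class(classification_dir: Path) -> dict[str, int]:
    """Count the files in the cancer and no_cancer subdirectories."""
    return {
        class_name: sum(
            1 for p in (classification_dir / class_name).iterdir() if p.is_file()
        )
        for class_name in ("cancer", "no_cancer")
    }


# Demo on a throwaway directory with one made-up image.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for class_name in ("cancer", "no_cancer"):
        (root / class_name).mkdir()
    (root / "cancer" / "scan_a.jpg").touch()
    print(count_by_class(root))  # {'cancer': 1, 'no_cancer': 0}
```

For each split, the two counts should add up to the length of the corresponding entry in `files`, assuming the classification directories were empty before the copy.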