<a href="https://colab.research.google.com/github/adspacheco/classificacao-fraturas/blob/main/pre_data_augmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
!wget https://github.com/adspacheco/classificacao-fraturas/raw/main/dataset/dataset.zip
!unzip /content/dataset.zip

In [2]:
%%capture
!wget https://raw.githubusercontent.com/adspacheco/classificacao-fraturas/main/utils.py

In [3]:
!pip install keras-visualizer # only needed because of utils :)

Collecting keras-visualizer
  Downloading keras_visualizer-3.2.0-py3-none-any.whl (7.1 kB)
Installing collected packages: keras-visualizer
Successfully installed keras-visualizer-3.2.0


In [4]:
import utils
import glob
import pandas as pd
import os
import random
import shutil

In [5]:
BASE_PATH = '/content/classificacao-imagem-tipos-fraturas-ossos-imagens-raio-x'

In [6]:
initial_counts = utils.count_files_and_calculate_percentages(BASE_PATH)

Diretório: Hairline Fracture - Total: 111
  Train files: 101 (90.99%)
  Test files: 10 (9.01%)
Diretório: Avulsion fracture - Total: 123
  Train files: 109 (88.62%)
  Test files: 14 (11.38%)
Diretório: Pathological fracture - Total: 134
  Train files: 116 (86.57%)
  Test files: 18 (13.43%)
Diretório: Impacted fracture - Total: 84
  Train files: 75 (89.29%)
  Test files: 9 (10.71%)
Diretório: Fracture Dislocation - Total: 156
  Train files: 137 (87.82%)
  Test files: 19 (12.18%)
Diretório: Greenstick fracture - Total: 122
  Train files: 106 (86.89%)
  Test files: 16 (13.11%)
Diretório: Longitudinal fracture - Total: 80
  Train files: 68 (85.00%)
  Test files: 12 (15.00%)
Diretório: Oblique fracture - Total: 85
  Train files: 69 (81.18%)
  Test files: 16 (18.82%)
Diretório: Spiral Fracture - Total: 86
  Train files: 74 (86.05%)
  Test files: 12 (13.95%)
Diretório: Comminuted fracture - Total: 148
  Train files: 134 (90.54%)
  Test files: 14 (9.46%)

Resumo:
  Total Train: 989 (87.60%)
  Total Test: 140 (12.40%)

In [7]:
old_to_new_dir_names = utils.rename_directories_and_files(BASE_PATH)

print("\nContagem de arquivos depois da padronização:")

new_counts = utils.count_files_and_calculate_percentages(BASE_PATH, train_name='train', test_name='test')

utils.validate_counts(initial_counts, new_counts, old_to_new_dir_names)


Contagem de arquivos depois da padronização:
Diretório: longitudinal - Total: 80
  Train files: 68 (85.00%)
  Test files: 12 (15.00%)
Diretório: fracture_dislocation - Total: 156
  Train files: 137 (87.82%)
  Test files: 19 (12.18%)
Diretório: comminuted - Total: 148
  Train files: 134 (90.54%)
  Test files: 14 (9.46%)
Diretório: avulsion - Total: 123
  Train files: 109 (88.62%)
  Test files: 14 (11.38%)
Diretório: oblique - Total: 85
  Train files: 69 (81.18%)
  Test files: 16 (18.82%)
Diretório: hairline - Total: 111
  Train files: 101 (90.99%)
  Test files: 10 (9.01%)
Diretório: pathological - Total: 134
  Train files: 116 (86.57%)
  Test files: 18 (13.43%)
Diretório: impacted - Total: 84
  Train files: 75 (89.29%)
  Test files: 9 (10.71%)
Diretório: greenstick - Total: 122
  Train files: 106 (86.89%)
  Test files: 16 (13.11%)
Diretório: spiral - Total: 86
  Train files: 74 (86.05%)
  Test files: 12 (13.95%)

Resumo:
  Total Train: 989 (87.60%)
  Total Test: 140 (12.40%)
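The check performed by `utils.validate_counts` is not shown in this notebook. A minimal sketch of what such a validation might look like follows. The function name `validate_counts_sketch` and the data shapes (a dict of `(train, test)` tuples per directory, plus an old-name to new-name mapping) are assumptions made for illustration, not the actual `utils` API.

```python
def validate_counts_sketch(initial, new, old_to_new):
    """Return a list of mismatch messages; an empty list means every count survived the rename.

    Assumed (hypothetical) shapes:
      initial, new: {directory_name: (train_count, test_count)}
      old_to_new:   {old_directory_name: new_directory_name}
    """
    problems = []
    for old_name, counts in initial.items():
        new_name = old_to_new.get(old_name)
        if new_name is None:
            problems.append(f"{old_name}: no renamed directory recorded")
        elif new.get(new_name) != counts:
            problems.append(f"{old_name} -> {new_name}: {counts} became {new.get(new_name)}")
    return problems


# Two classes taken from the output above.
initial = {"Hairline Fracture": (101, 10), "Impacted fracture": (75, 9)}
new = {"hairline": (101, 10), "impacted": (75, 9)}
mapping = {"Hairline Fracture": "hairline", "Impacted fracture": "impacted"}

print(validate_counts_sketch(initial, new, mapping))  # prints []
```

Returning the mismatches instead of raising makes it easy to see every affected class at once.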

Standardization

In [8]:
!ls classificacao-imagem-tipos-fraturas-ossos-imagens-raio-x

avulsion    fracture_dislocation  hairline  longitudinal  pathological
comminuted  greenstick		  impacted  oblique	  spiral


In [9]:
!ls "/content/classificacao-imagem-tipos-fraturas-ossos-imagens-raio-x/avulsion/test"

avulsion_110.jpg  avulsion_113.jpg  avulsion_116.jpg  avulsion_119.jpg	avulsion_122.jpg
avulsion_111.jpg  avulsion_114.jpg  avulsion_117.jpg  avulsion_120.jpg	avulsion_123.jpg
avulsion_112.jpg  avulsion_115.jpg  avulsion_118.jpg  avulsion_121.jpg


# Dataset Preparation

To keep the evaluation balanced, we selected 34 images per class for the test set, so every class has the same number of evaluation examples.

The smallest class (longitudinal) had 80 images, which leaves 46 for training once the 34 test images are set aside; the next smallest (impacted) had 84, leaving 50. The augmentation function applies three kinds of random transformation (rotation, horizontal flip and vertical flip) to the original training images of each class.

Each original image receives up to 3 transformed copies, giving at most 4 versions per image (the original plus 3 transformed), and augmentation stops as soon as the class reaches the target of 170 training images.
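The actual transformations live in `utils.save_augmented_images`, which is not shown here. As a rough, dependency-free illustration of the three transformation types, the sketch below works on a tiny 2D grid of pixel values instead of a real image. The function names are hypothetical, and rotation is restricted to a 90-degree clockwise turn, which is an assumption; the real helper may rotate by arbitrary angles.

```python
import random


def flip_horizontal(image):
    """Mirror left-right: reverse every row."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror top-bottom: reverse the order of the rows."""
    return [row[:] for row in image[::-1]]


def rotate_90(image):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*image[::-1])]


TRANSFORMS = [flip_horizontal, flip_vertical, rotate_90]


def augment(image, n_variants, rng):
    """Return n_variants copies, each produced by one randomly chosen transform."""
    return [rng.choice(TRANSFORMS)(image) for _ in range(n_variants)]


rng = random.Random(0)  # seeded so the example is repeatable
image = [[1, 2, 3],
         [4, 5, 6]]
variants = augment(image, 3, rng)  # the original plus these 3 gives 4 versions
for variant in variants:
    print(variant)
```

Every variant contains exactly the same pixel values as the original, only rearranged, which is why these transformations preserve the class label.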

As a result, each class has 34 test images with no augmentation and a balanced training set of 170 images.
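The arithmetic behind these numbers can be checked with a short sketch. The per-class totals are copied from the count output earlier in the notebook; the names `TEST_PER_CLASS`, `TARGET_TRAIN` and `plan` are introduced here only for illustration.

```python
TEST_PER_CLASS = 34
TARGET_TRAIN = 170

# Total images per class, copied from the count output earlier in the notebook.
totals = {
    "avulsion": 123, "comminuted": 148, "fracture_dislocation": 156,
    "greenstick": 122, "hairline": 111, "impacted": 84,
    "longitudinal": 80, "oblique": 85, "pathological": 134, "spiral": 86,
}

plan = {}
for cls, total in totals.items():
    originals = total - TEST_PER_CLASS      # images left for training
    synthetic = TARGET_TRAIN - originals    # augmented images still needed
    plan[cls] = (originals, synthetic)
    print(f"{cls}: {originals} originals + {synthetic} augmented = {TARGET_TRAIN}")

train_total = TARGET_TRAIN * len(totals)
test_total = TEST_PER_CLASS * len(totals)
print(f"train share: {train_total / (train_total + test_total):.2%}")
```

With 10 classes this gives 1700 training and 340 test images, a split of roughly 83% to 17%. The smallest classes end up with more augmented than original training images (longitudinal needs 124 augmented images for 46 originals), which is worth keeping in mind when interpreting training metrics.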

In [10]:
AUG_DIR_PATH = '/content/augmentation'
AUG_BASE_PATH = '/content/classificacao-imagem-tipos-fraturas-ossos-imagens-raio-x-aug'
NEW_BASE_PATH = '/content/classificacao-imagem-tipos-fraturas-ossos-imagens-raio-x-balanced'
FINAL_BASE_PATH = '/content/classificacao-imagem-tipos-fraturas-ossos-imagens-raio-x-final'

target_count = 170
classes = ['avulsion', 'comminuted', 'fracture_dislocation', 'greenstick', 'hairline', 'impacted', 'longitudinal', 'oblique', 'pathological', 'spiral']

In [11]:
os.makedirs(NEW_BASE_PATH, exist_ok=True)

for cls in classes:
    os.makedirs(os.path.join(NEW_BASE_PATH, 'train', cls), exist_ok=True)
    os.makedirs(os.path.join(NEW_BASE_PATH, 'test', cls), exist_ok=True)

In [12]:
TEST_COUNT = 34  # test images per class

random.seed(42)  # fixed seed (our addition) so the split is reproducible

for cls in classes:
    train_path = os.path.join(BASE_PATH, cls, 'train')
    test_path = os.path.join(BASE_PATH, cls, 'test')

    # Pool the original train and test images as full paths, then reshuffle.
    # sorted() makes the starting order independent of the filesystem listing.
    all_images = sorted(
        os.path.join(folder, name)
        for folder in (train_path, test_path)
        for name in os.listdir(folder)
    )
    random.shuffle(all_images)

    # The first TEST_COUNT images go to test, the rest to train.
    splits = {'test': all_images[:TEST_COUNT], 'train': all_images[TEST_COUNT:]}

    for split, images in splits.items():
        for src in images:
            dest = os.path.join(NEW_BASE_PATH, split, cls, os.path.basename(src))
            shutil.copyfile(src, dest)
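Because the original train and test images are pooled and split again, it is worth confirming that no file ends up in both splits. The helper below is a hypothetical sanity check, not part of `utils`; it compares file names only, so it would not detect the same image stored under two different names. The demo runs on a throwaway directory tree rather than the real dataset.

```python
import os
import tempfile


def split_overlap(train_dir, test_dir):
    """Return the sorted file names that appear in both directories."""
    return sorted(set(os.listdir(train_dir)) & set(os.listdir(test_dir)))


# Self-contained demo on a throwaway directory tree (not the real dataset).
with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, "train", "avulsion")
    test_dir = os.path.join(root, "test", "avulsion")
    os.makedirs(train_dir)
    os.makedirs(test_dir)
    for name in ("avulsion_1.jpg", "avulsion_2.jpg"):
        open(os.path.join(train_dir, name), "w").close()
    for name in ("avulsion_2.jpg", "avulsion_3.jpg"):
        open(os.path.join(test_dir, name), "w").close()

    overlap = split_overlap(train_dir, test_dir)
    print(overlap)  # prints ['avulsion_2.jpg']
```

In this notebook the check would be called once per class with `os.path.join(NEW_BASE_PATH, 'train', cls)` and `os.path.join(NEW_BASE_PATH, 'test', cls)`, and an empty list is the expected result.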

In [13]:
file_counts = utils.count_files_in_directory(NEW_BASE_PATH, classes)

for split in ['train', 'test']:
    print(f"\n{split.upper()}:")
    for cls in classes:
        print(f"{cls}: {file_counts[split][cls]} imagens")


TRAIN:
avulsion: 89 imagens
comminuted: 114 imagens
fracture_dislocation: 122 imagens
greenstick: 88 imagens
hairline: 77 imagens
impacted: 50 imagens
longitudinal: 46 imagens
oblique: 51 imagens
pathological: 100 imagens
spiral: 52 imagens

TEST:
avulsion: 34 imagens
comminuted: 34 imagens
fracture_dislocation: 34 imagens
greenstick: 34 imagens
hairline: 34 imagens
impacted: 34 imagens
longitudinal: 34 imagens
oblique: 34 imagens
pathological: 34 imagens
spiral: 34 imagens


In [15]:
for class_label in classes:
    train_path = os.path.join(NEW_BASE_PATH, 'train', class_label)
    aug_train_path = os.path.join(AUG_BASE_PATH, 'train', class_label)
    os.makedirs(aug_train_path, exist_ok=True)

    class_images = os.listdir(train_path)
    existing_images = len(class_images)

    # Copy the original training images unchanged.
    for img_name in class_images:
        shutil.copy(os.path.join(train_path, img_name),
                    os.path.join(aug_train_path, img_name))

    # Top up the class with augmented images until it reaches target_count.
    # The lower bound guards against division by zero for an empty class.
    if 0 < existing_images < target_count:
        augment_count = target_count - existing_images
        # Upper bound on augmented copies generated from a single original.
        per_image = augment_count // existing_images + 1
        counter = existing_images

        for img_name in class_images:
            if counter >= target_count:
                break
            img_path = os.path.join(train_path, img_name)
            for _ in range(per_image):
                if counter >= target_count:
                    break
                utils.save_augmented_images(img_path, class_label, 1, counter, AUG_BASE_PATH)
                counter += 1

print("Image augmentation completed!")

Image augmentation completed!
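The loop above hands out augmented copies greedily: each original, in listing order, receives up to `needed // existing + 1` copies until the class is full, so the last few originals may receive none. The sketch below reproduces that bookkeeping without touching any files and contrasts it with an even distribution. Both function names are hypothetical, and the even variant is an alternative for comparison, not what the notebook does.

```python
def copies_per_image(existing, target):
    """Mirror the greedy loop: each original gets up to (needed // existing + 1)
    augmented copies, in listing order, until the class reaches `target`.
    Assumes 0 < existing < target."""
    needed = target - existing
    per_image = needed // existing + 1
    plan = []
    for _ in range(existing):
        take = min(per_image, needed)
        plan.append(take)
        needed -= take
    return plan


def copies_per_image_even(existing, target):
    """Alternative (not what the notebook does): spread copies as evenly as possible."""
    base, extra = divmod(target - existing, existing)
    return [base + 1 if i < extra else base for i in range(existing)]


greedy = copies_per_image(46, 170)       # longitudinal: 46 originals
even = copies_per_image_even(46, 170)
print(sum(greedy), greedy.count(0))      # prints 124 4
print(sum(even), min(even), max(even))   # prints 124 2 3
```

For the longitudinal class, the greedy scheme leaves 4 of the 46 originals without any augmented copy, while the even scheme gives every original 2 or 3 copies.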


In [16]:
aug_file_counts = utils.count_files_in_directory(AUG_BASE_PATH, classes)

print("\nTRAIN (Augmented):")
for cls in classes:
    print(f"{cls}: {aug_file_counts['train'][cls]} imagens")


TRAIN (Augmented):
avulsion: 170 imagens
comminuted: 170 imagens
fracture_dislocation: 170 imagens
greenstick: 170 imagens
hairline: 170 imagens
impacted: 170 imagens
longitudinal: 170 imagens
oblique: 170 imagens
pathological: 170 imagens
spiral: 170 imagens


In [17]:
for cls in classes:
    os.makedirs(os.path.join(FINAL_BASE_PATH, cls, 'train'), exist_ok=True)
    os.makedirs(os.path.join(FINAL_BASE_PATH, cls, 'test'), exist_ok=True)


for cls in classes:
    aug_train_path = os.path.join(AUG_BASE_PATH, 'train', cls)
    # Test images come straight from the balanced split (no augmentation).
    orig_test_path = os.path.join(NEW_BASE_PATH, 'test', cls)

    final_train_path = os.path.join(FINAL_BASE_PATH, cls, 'train')
    final_test_path = os.path.join(FINAL_BASE_PATH, cls, 'test')

    utils.copy_files(aug_train_path, final_train_path)

    utils.copy_files(orig_test_path, final_test_path)

print("Arquivos copiados para a nova estrutura de diretórios.")


Arquivos copiados para a nova estrutura de diretórios.


In [18]:
final_file_counts = utils.count_files_in_final_directory(FINAL_BASE_PATH, classes)

In [19]:
for split in ['train', 'test']:
    print(f"\n{split.upper()}:")
    for cls in classes:
        print(f"{cls}: {final_file_counts[split][cls]} imagens")


TRAIN:
avulsion: 170 imagens
comminuted: 170 imagens
fracture_dislocation: 170 imagens
greenstick: 170 imagens
hairline: 170 imagens
impacted: 170 imagens
longitudinal: 170 imagens
oblique: 170 imagens
pathological: 170 imagens
spiral: 170 imagens

TEST:
avulsion: 34 imagens
comminuted: 34 imagens
fracture_dislocation: 34 imagens
greenstick: 34 imagens
hairline: 34 imagens
impacted: 34 imagens
longitudinal: 34 imagens
oblique: 34 imagens
pathological: 34 imagens
spiral: 34 imagens
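The final dataset uses a `<class>/<split>` layout, the reverse of the `<split>/<class>` layout used for the intermediate directories. `utils.count_files_in_final_directory` is not shown in this notebook; a minimal sketch of an equivalent counter, under the assumption that it returns `{split: {class: count}}` as the printing cell implies, could look like this. The function name `count_final_layout` is hypothetical, and the demo uses a temporary directory instead of the real dataset.

```python
import os
import tempfile


def count_final_layout(base_path, classes, splits=("train", "test")):
    """Count files under <base_path>/<class>/<split> and return {split: {class: n}}."""
    counts = {split: {} for split in splits}
    for cls in classes:
        for split in splits:
            folder = os.path.join(base_path, cls, split)
            # A missing folder counts as zero instead of raising.
            counts[split][cls] = len(os.listdir(folder)) if os.path.isdir(folder) else 0
    return counts


# Demo on a throwaway tree: 'spiral' has files, 'oblique' has no folder at all.
with tempfile.TemporaryDirectory() as root:
    for split, n in (("train", 3), ("test", 1)):
        folder = os.path.join(root, "spiral", split)
        os.makedirs(folder)
        for i in range(n):
            open(os.path.join(folder, f"spiral_{i}.jpg"), "w").close()
    demo_counts = count_final_layout(root, ["spiral", "oblique"])
    print(demo_counts)
```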


In [20]:
zip_path = '/content/classificacao-imagem-tipos-fraturas-ossos-imagens.zip'

# make_archive appends '.zip' itself, so pass the path without the extension.
shutil.make_archive(os.path.splitext(zip_path)[0], 'zip', FINAL_BASE_PATH)

'/content/classificacao-imagem-tipos-fraturas-ossos-imagens.zip'
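Before sharing the archive, it can be useful to confirm that it contains the expected number of files per class and split. The sketch below reads the archive listing with the standard `zipfile` module. The function name `archive_counts` is hypothetical, and it assumes entries follow the `<class>/<split>/<file>` layout of the final directory. The demo builds a small archive in a temporary directory.

```python
import os
import shutil
import tempfile
import zipfile
from collections import Counter


def archive_counts(zip_file):
    """Count archived files per (class, split), assuming entries look like
    '<class>/<split>/<file>'."""
    counts = Counter()
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            if len(parts) == 3 and parts[2]:  # skip directory entries
                counts[(parts[0], parts[1])] += 1
    return counts


with tempfile.TemporaryDirectory() as root:
    # Build a tiny stand-in for the final dataset directory.
    source = os.path.join(root, "final")
    for split, n in (("train", 2), ("test", 1)):
        folder = os.path.join(source, "hairline", split)
        os.makedirs(folder)
        for i in range(n):
            open(os.path.join(folder, f"hairline_{split}_{i}.jpg"), "w").close()

    archive_path = shutil.make_archive(os.path.join(root, "demo"), "zip", source)
    demo = archive_counts(archive_path)
    print(demo)
```

Applied to the archive created above, every class would be expected to report 170 files for `train` and 34 for `test`.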