# Classification of Natural Disasters
___
<br>

This project uses the [disaster dataset](https://www.kaggle.com/datasets/sarthaktandulje/disaster-damage-5class). The dataset contains images of natural disasters:
- fire
- smoke (possible fire)
- flood
- landslide

The dataset also contains images that show none of these disasters.

### 1. Importing the dataset

In [1]:
from glob import glob

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### 1.1 Dataset structure

In [None]:
from tabulate import tabulate

directories = glob('/content/drive/MyDrive/disaster_dataset/*')

# One [directory, image count] row per class directory
dataset_structure = [[x, len(glob(x + '/*'))] for x in directories]

print(tabulate(dataset_structure, headers=['Directory', 'Image count'], tablefmt="grid"))

print(f'\nTotal number of images: {sum(count for _, count in dataset_structure)}')

+---------------------------------------------------+---------------+
| Directory                                         |   Image count |
+===================================================+===============+
| /content/drive/MyDrive/disaster_dataset/smoke     |           302 |
+---------------------------------------------------+---------------+
| /content/drive/MyDrive/disaster_dataset/normal    |          2226 |
+---------------------------------------------------+---------------+
| /content/drive/MyDrive/disaster_dataset/fire      |          2537 |
+---------------------------------------------------+---------------+
| /content/drive/MyDrive/disaster_dataset/flood     |          2706 |
+---------------------------------------------------+---------------+
| /content/drive/MyDrive/disaster_dataset/landslide |           310 |
+---------------------------------------------------+---------------+

Total number of images: 8081


### 1.2 Sample selection

Because the dataset is heavily imbalanced, the model could become biased toward the `normal`, `fire`, and `flood` classes. For example, the `flood` class has about 9 times as many images as either `smoke` or `landslide`.

Methods commonly used to address this kind of problem are:

<br>

| Method | Description |
| :--- | :--- |
| **Class weighting** | Larger weights are assigned to the minority classes <br><br> (smoke: 302, landslide: 310) in the loss function. <br><br> This forces the model to give more importance to classifying those rarer examples **correctly**. <br><br> This is often the simplest and most effective approach. |
| **Oversampling** | **Increasing** the number of examples in the minority classes <br><br>(e.g. with techniques such as **SMOTE**, or simply by repeating existing images) <br><br>so that their count matches the dominant classes. |
| **Undersampling** | **Reducing** the number of examples in the dominant classes <br><br>(normal: 2226, fire: 2537, flood: 2706) by randomly removing images. <br><br>**Caution:** this risks losing potentially important information. |
| **Data augmentation** | Creating new examples from existing minority-class images by applying transformations <br><br>(e.g. rotation, flipping, cropping, brightness changes) <br><br>to artificially increase their number and variety. |

<br>
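
As a rough illustration of the class-weighting row in the table, the sketch below derives per-class weights from the image counts listed in section 1.1. It assumes the common "balanced" heuristic, weight = total / (number of classes × class count); the heuristic and the variable names are illustrative assumptions and are not used elsewhere in this notebook.

```python
# Image counts per class, copied from the directory listing in section 1.1.
counts = {"smoke": 302, "normal": 2226, "fire": 2537, "flood": 2706, "landslide": 310}

total = sum(counts.values())   # 8081 images overall
num_classes = len(counts)

# "Balanced" heuristic (an assumption): rarer classes get proportionally larger weights.
class_weights = {name: total / (num_classes * n) for name, n in counts.items()}

# Print from the largest weight to the smallest.
for name, weight in sorted(class_weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:<10} {weight:.2f}")
```

With these counts, `smoke` and `landslide` end up with weights above 5, while the three large classes fall below 1. Keras `model.fit` has a `class_weight` argument that takes a mapping of this kind keyed by integer class index.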

This project uses undersampling, in order to reduce the use of computing resources and shorten model training time.
This means that the `smoke` and `landslide` sets stay almost unchanged, while a random sample is drawn from each of the `fire`, `flood`, and `normal` classes, with a size close to that of the smallest set.

<br>

Since `smoke` is the class with the fewest images (302), we round that number down and draw a random sample of 300 images from every class.

In [None]:
import numpy as np

In [None]:
np.random.seed(9)
DATASET_DIR = '/content/drive/MyDrive/disaster_dataset'

def sample_class(name, n=300):
    # Pick the directory by name: glob() order is arbitrary (directories[0] above is 'smoke', not 'fire')
    return np.random.choice(sorted(glob(f'{DATASET_DIR}/{name}/*.jpg')), n, replace=False)

fire_sample = sample_class('fire')
flood_sample = sample_class('flood')
normal_sample = sample_class('normal')
smoke_sample = sample_class('smoke')
landslide_sample = sample_class('landslide')

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split

all_paths = np.concatenate([
    fire_sample,
    flood_sample,
    normal_sample,
    smoke_sample,
    landslide_sample
])
# Integer labels, in the same order as the concatenation above:
# 0=fire, 1=flood, 2=normal, 3=smoke, 4=landslide
labels = np.array(
    [0] * 300 +
    [1] * 300 +
    [2] * 300 +
    [3] * 300 +
    [4] * 300
)


In [None]:
X_train_paths, X_temp_paths, y_train, y_temp = train_test_split(
    all_paths, labels, test_size=0.3, random_state=9, stratify=labels
)

X_val_paths, X_test_paths, y_val, y_test = train_test_split(
    X_temp_paths, y_temp, test_size=0.5, random_state=9, stratify=y_temp
)

print(f"Training set (70%): {len(X_train_paths)} images") # 1050
print(f"Validation set (15%): {len(X_val_paths)} images") # 225
print(f"Test set (15%): {len(X_test_paths)} images") # 225

Training set (70%): 1050 images
Validation set (15%): 225 images
Test set (15%): 225 images
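
These sizes follow from the 1500 sampled images (5 classes × 300). Because both splits are stratified, each class should contribute equally to every subset. The arithmetic can be checked with a short sketch that does not call scikit-learn; the variable names are illustrative.

```python
num_classes = 5
per_class = 300
total = num_classes * per_class                    # 1500 sampled images

# First split: 70% training, 30% held out.
train_per_class = per_class * 70 // 100            # 210
temp_per_class = per_class - train_per_class       # 90

# Second split: the held-out part is halved into validation and test.
val_per_class = temp_per_class // 2                # 45
test_per_class = temp_per_class - val_per_class    # 45

print(train_per_class * num_classes,
      val_per_class * num_classes,
      test_per_class * num_classes)
# 1050 225 225
```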


In [None]:
!pip install opencv-python
import os
import shutil
import cv2
from tensorflow.keras.utils import Sequence
import tensorflow as tf

LOCAL_BASE_DIR = '/tmp/disaster_data_local/'
os.makedirs(LOCAL_BASE_DIR, exist_ok=True)
LOCAL_TRAIN_DIR = os.path.join(LOCAL_BASE_DIR, 'train')
LOCAL_VAL_DIR = os.path.join(LOCAL_BASE_DIR, 'val')
LOCAL_TEST_DIR = os.path.join(LOCAL_BASE_DIR, 'test')

class_names = ['fire', 'flood', 'normal', 'smoke', 'landslide']
for name in class_names:
    os.makedirs(os.path.join(LOCAL_TRAIN_DIR, name), exist_ok=True)
    os.makedirs(os.path.join(LOCAL_VAL_DIR, name), exist_ok=True)
    os.makedirs(os.path.join(LOCAL_TEST_DIR, name), exist_ok=True)

def copy_and_update_paths(old_paths, labels, local_base_dir):
    new_paths = []
    for old_path, label in zip(old_paths, labels):
        class_name = class_names[label]
        local_class_dir = os.path.join(local_base_dir, class_name)
        file_name = os.path.basename(old_path)
        new_path = os.path.join(local_class_dir, file_name)
        shutil.copy2(old_path, new_path)
        new_paths.append(new_path)
    return np.array(new_paths)

print("Copying files from Drive to Colab's local disk...")

X_train_local = copy_and_update_paths(X_train_paths, y_train, LOCAL_TRAIN_DIR)
X_val_local = copy_and_update_paths(X_val_paths, y_val, LOCAL_VAL_DIR)
X_test_local = copy_and_update_paths(X_test_paths, y_test, LOCAL_TEST_DIR)

print("\nCopying finished successfully. Initializing generators...")


class CustomDataGenerator(Sequence):
    """Loads images from disk batch by batch and returns (images, one-hot labels)."""

    def __init__(self, image_paths, labels, batch_size, target_size=(224, 224), num_classes=5, shuffle=True):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size
        self.target_size = target_size
        self.num_classes = num_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        # Number of full batches per epoch; a trailing partial batch is skipped
        return int(np.floor(len(self.image_paths) / self.batch_size))

    def __getitem__(self, index):
        indices = self.indices[index*self.batch_size:(index+1)*self.batch_size]
        batch_paths = [self.image_paths[k] for k in indices]
        batch_labels = [self.labels[k] for k in indices]

        # Zero-initialized so an unreadable image yields a black frame
        # rather than the uninitialized memory np.empty would leave behind
        X = np.zeros((self.batch_size, *self.target_size, 3), dtype=np.float32)

        for i, path in enumerate(batch_paths):

            img_cv2 = cv2.imread(path)

            if img_cv2 is None:
                print(f"Warning: OpenCV could not load the image from the local disk: {path}")
                continue

            # Convert from BGR (OpenCV default) to RGB (Keras/TensorFlow convention)
            img_rgb = cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB)

            img_resized = cv2.resize(img_rgb, self.target_size)

            # Scale pixel values to [0, 1]
            img_array = img_resized.astype('float32') / 255.0

            X[i,] = img_array

        # One-hot encoding
        y = tf.keras.utils.to_categorical(batch_labels, num_classes=self.num_classes)

        return X, y

    def on_epoch_end(self):
        # Reset the index order and reshuffle it if requested
        self.indices = np.arange(len(self.image_paths))
        if self.shuffle:
            np.random.shuffle(self.indices)


BATCH_SIZE = 32
TARGET_SIZE = (224, 224)

train_generator = CustomDataGenerator(X_train_local, y_train, BATCH_SIZE, TARGET_SIZE, shuffle=True)
validation_generator = CustomDataGenerator(X_val_local, y_val, BATCH_SIZE, TARGET_SIZE, shuffle=False)
test_generator = CustomDataGenerator(X_test_local, y_test, BATCH_SIZE, TARGET_SIZE, shuffle=False)

Copying files from Drive to Colab's local disk...

Copying finished successfully. Initializing generators...
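
Because `__len__` rounds down, each generator yields only full batches and skips any remainder. The sketch below checks what that means for the three subsets with a batch size of 32; it only repeats the arithmetic and does not touch the generators themselves.

```python
import math

BATCH_SIZE = 32
subset_sizes = {"train": 1050, "val": 225, "test": 225}

summary = {}
for name, n in subset_sizes.items():
    full_batches = math.floor(n / BATCH_SIZE)  # same rounding as CustomDataGenerator.__len__
    used = full_batches * BATCH_SIZE
    summary[name] = (full_batches, used, n - used)
    print(f"{name}: {full_batches} batches, {used} images used, {n - used} skipped")
```

Validation and test therefore each cover 224 of their 225 images, and training uses 1024 of 1050 images per epoch (a different 26 are skipped each time, since the training order is reshuffled). One way to include every image would be to round up in `__len__` and size the last batch by the number of remaining indices; that variant is a suggestion, not what this notebook does.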


In [None]:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

NUM_CLASSES = 5

base_model = VGG16(
    weights='imagenet',
    include_top=False,
    input_shape=(TARGET_SIZE[0], TARGET_SIZE[1], 3)
)

for layer in base_model.layers:
    layer.trainable = False

x = base_model.output
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(NUM_CLASSES, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
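
For orientation, the size of the trainable head can be worked out by hand. With a 224×224 input, VGG16's five 2×2 max-pooling stages reduce the feature map to 7×7×512, which `Flatten` turns into a vector of 25,088 values. The sketch below counts weights and biases for the two `Dense` layers. The frozen-base figure of 14,714,688 parameters is the commonly reported size of VGG16 without its top layers and is quoted here as an assumption, not read from this notebook's `model.summary()` output.

```python
def dense_params(inputs: int, units: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

flatten_size = 7 * 7 * 512                   # 25088 features after Flatten
hidden = dense_params(flatten_size, 512)     # Dense(512, relu)
output = dense_params(512, 5)                # Dense(NUM_CLASSES, softmax)
trainable = hidden + output                  # Dropout adds no parameters

frozen_base = 14_714_688                     # assumed size of VGG16 with include_top=False
print(f"trainable head: {trainable:,}")
print(f"whole model:    {trainable + frozen_base:,}")
```

Nearly all trainable parameters sit in the first `Dense` layer, which is a consequence of flattening the full 7×7×512 feature map.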


In [None]:

NUM_EPOCHS = 20

history = model.fit(
    train_generator,
    epochs=NUM_EPOCHS,
    validation_data=validation_generator
)

  self._warn_if_super_not_called()


Epoch 1/20
32/32 ━━━━━━━━━━━━━━━━━━━━ 842s 26s/step - accuracy: 0.3462 - loss: 1.6765 - val_accuracy: 0.7232 - val_loss: 0.8117
Epoch 2/20
32/32 ━━━━━━━━━━━━━━━━━━━━ 800s 25s/step - accuracy: 0.7650 - loss: 0.7289 - val_accuracy: 0.7812 - val_loss: 0.6175
Epoch 3/20
32/32 ━━━━━━━━━━━━━━━━━━━━ 792s 25s/step - accuracy: 0.8318 - loss: 0.4909 - val_accuracy: 0.8438 - val_loss: 0.4957
Epoch 4/20
32/32 ━━━━━━━━━━━━━━━━━━━━ 797s 25s/step - accuracy: 0.8710 - loss: 0.3793 - val_accuracy: 0.8170 - val_loss: 0.4705
Epoch 5/20
32/32 ━━━━━━━━━━━━━━━━━━━━ 859s 27s/step - accuracy: 0.9430 - loss: 0.2547 - val_accuracy: 0.8482 - val_loss: 0.4406
Epoch 6/20
32/32 ━━━━━━━━━━━━━━━━━━━━ 817s 26s/step - accuracy: 0.9457 - loss: 0.2106 - val_accuracy: 0.8482 - val_loss: 0.4175
Epoch 7/20
32/32

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

loss, accuracy = model.evaluate(test_generator)
print(f"Test accuracy: {accuracy*100:.2f}%")

# Predicted class = index of the highest softmax probability
y_pred_probs = model.predict(test_generator)
y_pred_classes = np.argmax(y_pred_probs, axis=1)

# Rebuild the ground truth from the generator so that it lines up
# with the batches that were actually predicted
y_true_one_hot = []
for i in range(len(test_generator)):
    _, batch_y = test_generator[i]
    y_true_one_hot.extend(batch_y)
y_true = np.argmax(y_true_one_hot, axis=1)


target_names = ['fire', 'flood', 'normal', 'smoke', 'landslide']
print("\n--- Classification Report ---")
print(classification_report(y_true, y_pred_classes, target_names=target_names))

cm = confusion_matrix(y_true, y_pred_classes)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=target_names, yticklabels=target_names)
plt.title('Confusion matrix')
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
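
The classification report and the confusion matrix carry the same information. As a sketch of how per-class precision and recall follow from the matrix (rows are true classes, columns are predictions, as in scikit-learn), here is a small example with a made-up 3-class matrix. The numbers and class names are hypothetical and are not results from this model.

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
example_cm = [
    [40,  3,  2],
    [ 4, 38,  3],
    [ 1,  5, 39],
]
example_names = ["a", "b", "c"]  # placeholder class names

metrics = {}
for k, name in enumerate(example_names):
    true_positives = example_cm[k][k]
    predicted_as_k = sum(row[k] for row in example_cm)  # column sum
    actually_k = sum(example_cm[k])                     # row sum
    precision = true_positives / predicted_as_k
    recall = true_positives / actually_k
    metrics[name] = (precision, recall)
    print(f"{name}: precision={precision:.3f} recall={recall:.3f}")
```

Reading the real matrix the same way shows which pairs of classes are confused with each other, which a single accuracy figure hides.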