<a href="https://colab.research.google.com/github/ANadalCardenas/attention/blob/main/wildfire_model_version_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **First version model**

This is a very basic first proposal. It assumes that we will use the Sentinel-2 training dataset with all seven bands, but that is only an assumption for now.

# Imports and environment setup:

In [2]:
# Numerical operations and array handling
import numpy as np

# TensorFlow and Keras for building and training the CNN
import tensorflow as tf
from tensorflow.keras import layers, models

# Plotting training curves
import matplotlib.pyplot as plt

# Monitoring
from sklearn.metrics import precision_score, recall_score, f1_score

# Load the dataset:

In the real project:
 - We load Sentinel-2 image patches and labels

 - Each sample is a 64×64 patch with 7 spectral bands

 - Labels are binary:

   - 0 → no wildfire

   - 1 → wildfire

For now, we simulate the data. In the real project, it will come from Copernicus Data Space (or whatever source we decide on).

In [3]:
# Number of samples (for example purposes only)
N = 1000

# X contains image patches:
# shape = (number_of_samples, height, width, bands)
# For now we take all 7 bands into account, but we could also use fewer
X = np.random.rand(N, 64, 64, 7).astype(np.float32)

# y contains labels:
# 0 = no fire, 1 = fire
y = np.random.randint(0, 2, size=(N,))
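
Before training on real patches, it may help to confirm the array layout and the class balance. The following is a minimal sketch on simulated data; the helper name `describe_dataset` is hypothetical and not part of the project.

```python
import numpy as np

def describe_dataset(X, y):
    """Return basic facts about a patch dataset: size, patch shape, dtype, class balance."""
    # Fraction of samples labeled 1 (wildfire); the rest are labeled 0
    fire_fraction = float(np.mean(y == 1))
    return {
        "n_samples": X.shape[0],
        "patch_shape": X.shape[1:],
        "dtype": str(X.dtype),
        "fire_fraction": fire_fraction,
    }

# Simulated data with the same layout as above (smaller N to keep it quick)
rng = np.random.default_rng(0)
X_demo = rng.random((100, 64, 64, 7), dtype=np.float32)
y_demo = rng.integers(0, 2, size=100)

info = describe_dataset(X_demo, y_demo)
print(info)
```

With real wildfire data the fire fraction will probably be far from 0.5, which is worth knowing before interpreting accuracy.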

# Train / validation split

Splits the dataset into:

80% training (*ask Amanda what the best split is*)

20% validation (*same question*)

Validation data is never used to update the model weights during training.

In [4]:
# Index where we split the dataset
split_index = int(0.8 * N)

# Training data
X_train = X[:split_index]
y_train = y[:split_index]

# Validation data
X_val = X[split_index:]
y_val = y[split_index:]
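
The slice above keeps the original sample order. If the real dataset turns out to be stored grouped by class or by region (an assumption, since the data source is not decided yet), a shuffled split may be safer. A minimal sketch; `shuffled_split` and the seed value are hypothetical choices.

```python
import numpy as np

def shuffled_split(X, y, train_fraction=0.8, seed=42):
    """Shuffle sample indices once, then split into training and validation sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))           # random order of sample indices
    split_index = int(train_fraction * len(X))  # same 80/20 rule as above
    train_idx, val_idx = indices[:split_index], indices[split_index:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Small simulated example: 10 samples with 2 features each
X_demo = np.arange(20).reshape(10, 2).astype(np.float32)
y_demo = np.arange(10) % 2

X_tr, y_tr, X_va, y_va = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_va.shape)
```

Fixing the seed keeps the split reproducible between runs.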

# Band-wise normalization
Sentinel-2 bands have very different value ranges because each band measures a different physical phenomenon, so the raw numerical values are not comparable. If one band has much larger values, it dominates the weighted sums computed inside the network. Normalization:

 - Improves convergence

 - Makes training more stable

 - Is essential for CNNs

We normalize each band independently by applying:
\[
x_{\text{norm}} = \frac{x - \mu}{\sigma}
\]
where \(\mu\) and \(\sigma\) are the mean and standard deviation of that band.

In [5]:
def normalize_per_band(X):
    """
    Normalize each spectral band independently.

    X shape: (N, H, W, C), e.g. (1000, 64, 64, 7)
    """
    # Create a copy to avoid modifying the original data
    X_normalized = X.copy()

    # Loop over each spectral band
    for band in range(X.shape[-1]):
        # Compute the mean of this band. With this indexing, X[..., band].shape = (N, H, W)
        mean = X[..., band].mean()

        # Compute standard deviation (small value added to avoid division by zero)
        std = X[..., band].std() + 1e-6

        # Normalize the band
        X_normalized[..., band] = (X[..., band] - mean) / std

    return X_normalized
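
The function above computes its statistics from whatever array it receives. One common way to keep validation data out of those statistics is to compute the mean and standard deviation on the training split only and reuse them for the validation split. A minimal sketch under that assumption; `fit_band_stats` and `apply_band_stats` are hypothetical names, and the demo band scales are made up.

```python
import numpy as np

def fit_band_stats(X_train, eps=1e-6):
    """Compute per-band mean and std over samples, height, and width."""
    mean = X_train.mean(axis=(0, 1, 2))      # shape: (bands,)
    std = X_train.std(axis=(0, 1, 2)) + eps  # eps avoids division by zero
    return mean, std

def apply_band_stats(X, mean, std):
    """Normalize X with precomputed statistics (broadcast over the last axis)."""
    return (X - mean) / std

# Simulated patches whose bands have very different ranges (illustrative values)
rng = np.random.default_rng(0)
scales = np.array([1, 10, 100, 1000, 1, 10, 100], dtype=np.float32)
X_train_demo = rng.random((80, 8, 8, 7), dtype=np.float32) * scales
X_val_demo = rng.random((20, 8, 8, 7), dtype=np.float32) * scales

mean, std = fit_band_stats(X_train_demo)
X_train_norm = apply_band_stats(X_train_demo, mean, std)
X_val_norm = apply_band_stats(X_val_demo, mean, std)  # reuses training statistics

print(X_train_norm.mean(axis=(0, 1, 2)).round(3))
print(X_train_norm.std(axis=(0, 1, 2)).round(3))
```

After this step every training band has a mean near 0 and a standard deviation near 1, whatever its original range.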


# CNN + MaxPooling model (from scratch)
This CNN takes an input patch (64, 64, 7) and outputs a probability between 0 and 1:

Close to 1 → likely fire

Close to 0 → likely no fire

*Maybe we can improve this part by using logits. We can ask Amanda about that, and also for a recommendation on the number of layers.*

In [6]:
# Create a CNN model as a stack of layers
model = models.Sequential([

    # Input shape: 64x64 patch with 7 spectral bands
    layers.Input(shape=(64, 64, 7)),

    # ---- Block 1 ----
    # Conv2D learns local patterns; 16 filters means 16 pattern detectors.
    # padding='same' means the output feature map has the same height and width as the input.
    layers.Conv2D(16, 3, padding='same', activation='relu'),

    # MaxPooling reduces resolution and computation (64x64 -> 32x32)
    layers.MaxPooling2D(2),

    # ---- Block 2 ----
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(2),  # 32x32 -> 16x16

    # ---- Block 3 ----
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(2),  # 16x16 -> 8x8

    # GlobalAveragePooling compresses (8, 8, 64) into (64,)
    # This is fast and avoids many parameters
    layers.GlobalAveragePooling2D(),

    # Final classifier:
    # Sigmoid converts the output into a probability in [0, 1]
    layers.Dense(1, activation='sigmoid')
])
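
The shapes and parameter counts of the model above can be checked by hand. With `'same'` padding a convolution keeps height and width, each 2x2 max-pooling halves them, and a Conv2D layer with a k x k kernel has k * k * input_channels * filters weights plus one bias per filter. A short sketch of that arithmetic in plain Python (no TensorFlow needed); the helper name is hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel*kernel*in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size, channels = 64, 7  # input patch: 64x64 with 7 bands
total_params = 0

for filters in (16, 32, 64):
    total_params += conv2d_params(3, channels, filters)
    channels = filters  # the next layer sees this many channels
    size //= 2          # MaxPooling2D(2) halves height and width

# GlobalAveragePooling2D has no parameters and leaves `channels` features.
# Dense(1): one weight per feature plus one bias.
total_params += channels * 1 + 1

print(size, channels, total_params)  # → 8 64 24225
```

The total should agree with the "Total params" line printed by `model.summary()`.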


# Compile the model

 - Adam optimizer updates the CNN weights

 - Binary cross-entropy is used for fire/no-fire classification

 - Accuracy is monitored


In [7]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Learning rate controls step size
    loss='binary_crossentropy',                              # Standard loss for sigmoid binary classifier
    metrics=['accuracy']                                     # Monitor accuracy during training.
                                                             # We can add more metrics to monitor
)

model.summary()
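
For one sample, binary cross-entropy is -[y log(p) + (1 - y) log(1 - p)]. A useful reference point: a model that always outputs p = 0.5 gets a loss of ln 2 ≈ 0.693 whatever the labels are, so a loss close to 0.693 means the model is doing no better than guessing. A minimal NumPy sketch; the function name is hypothetical, and Keras handles clipping in its own way internally.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_demo = np.array([0, 1, 1, 0])

# Always predicting 0.5 gives ln(2), independent of the labels
loss_guess = binary_cross_entropy(y_demo, np.full(4, 0.5))

# Confident and correct predictions give a much lower loss
loss_good = binary_cross_entropy(y_demo, np.array([0.1, 0.9, 0.8, 0.2]))

print(loss_guess, loss_good)
```

On random labels such as the simulated `y`, the training loss is expected to stay around this value.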

# Train the model

Fits the CNN to the training data and checks progress on validation data after each epoch.

In [None]:
history = model.fit(
    X_train, y_train,                    # Training patches and labels
    validation_data=(X_val, y_val),      # Validation patches and labels
    epochs=50,                           # Number of training epochs
    batch_size=32                        # Samples per batch
)


Epoch 1/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 4s 173ms/step - accuracy: 0.5027 - loss: 0.6942 - val_accuracy: 0.4350 - val_loss: 0.6981
Epoch 2/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 3s 125ms/step - accuracy: 0.5261 - loss: 0.6916 - val_accuracy: 0.4350 - val_loss: 0.6996
Epoch 3/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 123ms/step - accuracy: 0.5037 - loss: 0.6933 - val_accuracy: 0.4350 - val_loss: 0.7012
Epoch 4/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 4s 169ms/step - accuracy: 0.5388 - loss: 0.6897 - val_accuracy: 0.4350 - val_loss: 0.7001
Epoch 5/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 4s 128ms/step - accuracy: 0.4807 - loss: 0.6944 - val_accuracy: 0.4350 - val_loss: 0.7061
Epoch 6/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 122ms/step - accuracy: 0.5358 - loss: 0.6911 - val_accuracy: 0.4350 - val_loss: 0.7009
Epoch 7/50
25/25 ...

# Training curves (optional but useful)

Plots training and validation accuracy to detect overfitting.

In [None]:


plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()


# Inference (predictions)

In [None]:
# Select a single patch (batch size 1)
patch = X_val[0:1]  # Shape: (1, 64, 64, 7)

# predict() returns a probability directly because the last layer is a sigmoid
prob_fire = model.predict(patch)[0, 0]

print("Fire probability:", prob_fire)

# Convert the probability into a final decision using a threshold (we may need to try different thresholds)
if prob_fire > 0.5:
    print("Wildfire detected")
else:
    print("No wildfire detected")


Batch prediction on many patches:

In [None]:
# Predict probabilities for all validation patches
probs = model.predict(X_val).reshape(-1)     # Shape: (number of validation samples,)

# Convert probabilities to hard class predictions (0/1)
preds = (probs > 0.5).astype(int)

# Compute accuracy manually
acc = (preds == y_val).mean()

print("Manual validation accuracy:", acc)
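
Since 0.5 is only a default, the effect of other thresholds can be explored by sweeping a few values and counting true positives, false positives, and false negatives at each one. A minimal NumPy sketch on made-up probabilities; the `sweep_thresholds` helper and the demo values are illustrative, not project results.

```python
import numpy as np

def sweep_thresholds(y_true, probs, thresholds):
    """Return (threshold, precision, recall) for each candidate threshold."""
    rows = []
    for t in thresholds:
        preds = (probs > t).astype(int)
        tp = int(np.sum((preds == 1) & (y_true == 1)))  # fires correctly flagged
        fp = int(np.sum((preds == 1) & (y_true == 0)))  # false alarms
        fn = int(np.sum((preds == 0) & (y_true == 1)))  # missed fires
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        rows.append((t, precision, recall))
    return rows

# Made-up labels and predicted probabilities
y_demo = np.array([0, 0, 1, 1, 1, 0])
probs_demo = np.array([0.2, 0.6, 0.4, 0.7, 0.9, 0.1])

results = sweep_thresholds(y_demo, probs_demo, [0.3, 0.5, 0.8])
for t, p, r in results:
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

In this toy example a lower threshold catches more fires (higher recall) at the cost of more false alarms (lower precision). Which trade-off is right for the project is still an open question.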


# Evaluation
I haven't collected any test data yet, but once we have it, we can test the model as follows:

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)  # X_test and y_test are not created yet
print("Test loss:", test_loss)
print("Test accuracy:", test_acc)


# More monitoring metrics

In [None]:
# Predict probabilities on test set
test_probs = model.predict(X_test).reshape(-1)

# Convert probabilities to predictions
test_pred = (test_probs > 0.5).astype(int)

print("Precision:", precision_score(y_test, test_pred))
print("Recall:", recall_score(y_test, test_pred))
print("F1-score:", f1_score(y_test, test_pred))
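
For reference, these three metrics are defined from the counts of true positives (TP), false positives (FP), and false negatives (FN), taking wildfire (label 1) as the positive class:

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

Precision answers "of the patches flagged as fire, how many really were fire?", and recall answers "of the real fires, how many were flagged?".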