# Model Training Setup for Magnetotelluric Inversion

This script sets up a **deep learning model** for **magnetotelluric (MT) inversion**, using TensorFlow/Keras to build the network and Scikit-learn for preprocessing and data splitting.

## **Libraries Used**:
- **Numerical and Scientific**: `numpy`, `scipy` for data manipulation and smoothing.
- **Preprocessing**: `MinMaxScaler` for normalizing data, `train_test_split` for data splitting.
- **Modeling**: Keras layers like `Dense`, `Dropout`, `MultiHeadAttention` for building the neural network.
- **Utilities**: `joblib` for saving scalers, `os` for file handling.



In [None]:
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d
from sklearn.preprocessing import MinMaxScaler
import os
import joblib
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, LayerNormalization, MultiHeadAttention, Dropout, Layer, Flatten, GlobalAveragePooling1D
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split



# Data Loading for Model Training

This script loads pre-saved data for model training.

## **Data Files Loaded**:
- `X_combined2.npy`: Combined apparent resistivity and phase responses (Shape: `(100000, 90, 2)`).
- `X_model_scaled2.npy`: Normalized resistivity profiles (Shape: `(100000, 100)`).

The arrays are loaded from the directory given by `load_path`, and a confirmation message is printed once both files have been read.
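
A quick shape check right after loading can catch a mismatched file early. The sketch below is only an illustration: `check_shapes` and its parameter names are hypothetical, the arrays are small synthetic stand-ins for the real files, and the expected trailing dimensions are taken from the shapes listed above.

```python
import numpy as np

def check_shapes(X, y, n_points=90, n_channels=2, n_targets=100):
    """Hypothetical helper: validate the documented feature/target layout."""
    if X.ndim != 3 or X.shape[1:] != (n_points, n_channels):
        raise ValueError(f"unexpected feature shape {X.shape}")
    if y.ndim != 2 or y.shape[1] != n_targets:
        raise ValueError(f"unexpected target shape {y.shape}")
    if X.shape[0] != y.shape[0]:
        raise ValueError("features and targets have different sample counts")
    return X.shape[0]

# Synthetic stand-ins for X_combined2.npy and X_model_scaled2.npy
X_demo = np.zeros((8, 90, 2), dtype=np.float32)
y_demo = np.zeros((8, 100), dtype=np.float32)
n_demo = check_shapes(X_demo, y_demo)
print(f"{n_demo} samples with consistent shapes")
```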


In [None]:

load_path = ""  # Directory where the data files are saved
X_combined = np.load(os.path.join(load_path, 'X_combined2.npy'))  # Shape: (100000, 90, 2)
X_model_scaled = np.load(os.path.join(load_path, 'X_model_scaled2.npy'))  # Shape: (100000, 100)
print("Data loaded successfully.")

# Transformer Attention Block

This class defines a **Transformer-based attention block** for deep learning models, implementing self-attention and feedforward layers.

## **Components**:
- **Self-attention**: Uses `MultiHeadAttention` to capture dependencies in input data.
- **Feedforward Network (FFN)**: A two-layer fully connected network with ReLU activation.
- **Layer Normalization**: Normalizes outputs to stabilize training.
- **Dropout**: Applied after attention and FFN to prevent overfitting.

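## **Forward Pass**:
1. Computes self-attention over the input sequence and applies dropout.
2. Adds the result to the input (residual connection) and applies layer normalization.
3. Passes the normalized output through the feedforward network and applies dropout.
4. Adds a second residual connection and applies the final layer normalization.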
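
The order of operations in this block (residual add followed by normalization, twice) can be sketched in plain NumPy. This is a simplified illustration, not the Keras layer itself: it assumes a single attention head with no learned query/key/value projections, layer normalization without the learned scale and offset, and inference mode, so dropout is omitted. The function and variable names are made up for the example.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature axis (no learned gamma/beta in this sketch)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def post_norm_block(x, W1, b1, W2, b2):
    # Self-attention: scaled dot-product with identity projections
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    attn = softmax(scores) @ x
    out1 = layer_norm(x + attn)                       # residual + norm
    ffn = np.maximum(out1 @ W1 + b1, 0.0) @ W2 + b2   # Dense(relu) -> Dense
    return layer_norm(out1 + ffn)                     # residual + norm

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(3, 4, 16))                  # (batch, sequence, embed_dim)
W1, b1 = rng.normal(size=(16, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, 16)) * 0.1, np.zeros(16)
y_demo = post_norm_block(x_demo, W1, b1, W2, b2)
print(y_demo.shape)
```

The output keeps the input shape, which is why two such blocks can be stacked directly.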

In [None]:
class TransformerAttentionBlock(Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, dropout_rate=0.01):
        super(TransformerAttentionBlock, self).__init__()
        self.attention = MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential([
            Dense(ff_dim, activation="relu"),  # Feedforward network
            Dense(embed_dim)
        ])
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(dropout_rate)
        self.dropout2 = Dropout(dropout_rate)

    def call(self, inputs, training=False):
        # Self-attention
        attn_output = self.attention(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)

        # Feedforward network
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)




# Graph Attention Layer

This class implements a **Graph Attention Layer (GAT)**, which applies attention mechanisms to graph-structured data.

## **Components**:
- **Weights**: Learnable weight matrices for input features and attention scores.
- **Attention Mechanism**: Computes attention scores for each node pair in the graph using **LeakyReLU** activation.
- **Masking**: Masks out non-existent edges using the adjacency matrix.
- **Softmax**: Normalizes attention scores across neighboring nodes.
- **Dropout**: Applied to attention scores during training to prevent overfitting.

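## **Forward Pass**:
1. Computes the node feature transformation (`XW`) and splits it across heads.
2. Calculates per-node source and destination scores (`e_src` and `e_dst`), sums them for every node pair, and applies LeakyReLU.
3. Masks node pairs without an edge, then applies softmax over each node's neighbors and dropout.
4. Forms each node's output as the attention-weighted sum of its neighbors' features, then combines heads by concatenation or averaging.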

This layer is used to enhance the learning of node relationships in graph neural networks.
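
For a single head, the score, mask, and softmax steps can be written out in a few lines of NumPy. The sketch below is an illustration under simplifying assumptions (one head, no batch axis, no dropout, a small chain graph with self-loops); `gat_head` and the demo variables are hypothetical names, not part of the layer.

```python
import numpy as np

def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)

def gat_head(H, adj, a_src, a_dst):
    """H: (n, d) transformed features; adj: (n, n); a_src, a_dst: (d,)."""
    # e[i, j] = a_src . h_i + a_dst . h_j, via broadcasting
    e = (H @ a_src)[:, None] + (H @ a_dst)[None, :]
    e = leaky_relu(e)
    e = np.where(adj > 0, e, -1e9)            # mask out non-edges
    e = e - e.max(axis=1, keepdims=True)      # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)  # normalize over neighbors j
    return alpha, alpha @ H                   # weights and aggregated features

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 3))
# Chain graph with self-loops: each node sees itself and its direct neighbors
adj = np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)
alpha, out = gat_head(H, adj, rng.normal(size=3), rng.normal(size=3))
print(alpha.round(3))
```

Each row of `alpha` sums to 1 and is zero wherever the adjacency matrix has no edge, so a node's output depends only on its neighbors.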


In [None]:
# Graph Attention Layer
class GraphAttentionLayer(Layer):
    def __init__(self, output_dim, num_heads=4, concat=False, dropout_rate=0.01, activation=None, **kwargs):
        super(GraphAttentionLayer, self).__init__(**kwargs)
        self.output_dim = output_dim
        self.num_heads = num_heads
        self.concat = concat
        self.dropout_rate = dropout_rate
        self.activation = activation

    def build(self, input_shape):
        input_dim = input_shape[0][-1]
        self.W = self.add_weight(shape=(input_dim, self.num_heads * self.output_dim),
                                 initializer='glorot_uniform', name='W')
        self.a_src = self.add_weight(shape=(self.num_heads, self.output_dim, 1),
                                     initializer='glorot_uniform', name='a_src')
        self.a_dst = self.add_weight(shape=(self.num_heads, self.output_dim, 1),
                                     initializer='glorot_uniform', name='a_dst')
        super(GraphAttentionLayer, self).build(input_shape)

    def call(self, inputs, training=False):
        X, adjacency = inputs
        batch_size = tf.shape(X)[0]
        num_nodes = tf.shape(X)[1]

        XW = tf.matmul(X, self.W)
        XW = tf.reshape(XW, (batch_size, num_nodes, self.num_heads, self.output_dim))
        XW = tf.transpose(XW, perm=[0, 2, 1, 3])

        e_src = tf.einsum('hdf,bhnd->bhn', self.a_src, XW)
        e_dst = tf.einsum('hdf,bhnd->bhn', self.a_dst, XW)

        e_src_expanded = tf.expand_dims(e_src, axis=-1)
        e_dst_expanded = tf.expand_dims(e_dst, axis=2)
        e = e_src_expanded + e_dst_expanded

        e = tf.nn.leaky_relu(e, alpha=0.2)
        mask = adjacency > 0
        mask = tf.expand_dims(mask, axis=1)
        e = tf.where(mask, e, tf.fill(tf.shape(e), -1e9))

        alpha = tf.nn.softmax(e, axis=-1)
        alpha = tf.nn.dropout(alpha, rate=self.dropout_rate) if training else alpha
        out = tf.matmul(alpha, XW)

        if self.concat:
            out = tf.transpose(out, perm=[0, 2, 1, 3])
            out = tf.reshape(out, (batch_size, num_nodes, self.num_heads * self.output_dim))
        else:
            out = tf.reduce_mean(out, axis=1)

        if self.activation:
            out = self.activation(out)
        return out




# GAT Model with Attention Blocks and Dense Layers

This class defines a **Graph Attention Network (GAT)** model: two graph attention layers, followed by two transformer attention blocks and dense layers for prediction.

## **Components**:
1. **Graph Attention Layers**:
   - First GAT layer (`gat1`) with multi-head attention and concatenation.
   - Second GAT layer (`gat2`) with single-head attention and averaging.
2. **Transformer Attention Blocks**:
   - Two **TransformerAttentionBlock** layers for capturing sequential dependencies.
3. **Dense Layers**:
   - Two dense layers with 1024 units for final feature transformation (no activation is specified, so both are linear).
   - Final dense layer outputs the predictions with the specified `output_dim`.
4. **Global Pooling**:
   - `GlobalAveragePooling1D` used to aggregate graph-level features.

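## **Forward Pass**:
1. Passes input through the two GAT layers, with dropout applied before each during training.
2. Averages over nodes with `GlobalAveragePooling1D`, then reshapes the pooled vector to `(batch_size, 4, 16)`.
3. Applies the two transformer attention blocks and averages their output over the 4 positions.
4. Returns the final prediction after the dense transformations.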
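
The pooling and reshaping between the GAT layers and the dense layers are easiest to follow as a shape trace. The NumPy sketch below assumes `hidden_dim=64` (so the pooled vector splits into 4 positions of 16 features) and skips the transformer blocks, which preserve shape; the array contents are placeholders.

```python
import numpy as np

batch, nodes, hidden = 2, 90, 64

# Placeholder for the output of the second GAT layer: (batch, nodes, hidden_dim)
gat2_out = np.arange(batch * nodes * hidden, dtype=np.float32).reshape(batch, nodes, hidden)

pooled = gat2_out.mean(axis=1)        # average over nodes        -> (2, 64)
tokens = pooled.reshape(-1, 4, 16)    # split into 4 positions    -> (2, 4, 16)
# ... the two transformer attention blocks keep the shape (2, 4, 16) ...
aggregated = tokens.mean(axis=1)      # average over 4 positions  -> (2, 16)

for name, arr in [("pooled", pooled), ("tokens", tokens), ("aggregated", aggregated)]:
    print(name, arr.shape)
```

Because the reshape is hardcoded to `(4, 16)`, a different `hidden_dim` would need a matching change to these two factors.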

## **Adjacency Matrix**:
- `create_adjacency_matrix` generates a **local (banded) adjacency matrix**: each node is connected to the nodes within `k` positions on either side, plus a self-loop.
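
The band structure can be checked with a vectorized sketch. `band_adjacency` is a hypothetical name used only here; it is meant to produce the same matrix as a loop over each node's `k` nearest positions.

```python
import numpy as np

def band_adjacency(num_nodes=90, k=5):
    # Nodes i and j are connected when |i - j| <= k (this includes self-loops)
    idx = np.arange(num_nodes)
    return (np.abs(idx[:, None] - idx[None, :]) <= k).astype(np.float32)

adj = band_adjacency(90, 5)
degrees = adj.sum(axis=1)
print(degrees[0], degrees[45])  # edge node vs. interior node
```

Interior nodes have `2k + 1` connections (self-loop included), while nodes near either end have fewer because the band is cut off at the boundary.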


In [None]:

class GAT(Model):
    def __init__(self, num_heads=8, hidden_dim=64, output_dim=105, dropout_rate=0.1, **kwargs):
        super(GAT, self).__init__(**kwargs)
        self.gat_output_dim = output_dim
        self.dropout_rate = dropout_rate
        self.gat1 = GraphAttentionLayer(output_dim=hidden_dim, num_heads=num_heads, concat=True,
                                        dropout_rate=dropout_rate, activation=tf.nn.elu)
        self.gat2 = GraphAttentionLayer(output_dim=hidden_dim, num_heads=1, concat=False,
                                        dropout_rate=dropout_rate, activation=tf.nn.elu)
        self.global_pool = GlobalAveragePooling1D()

        self.transformer_attention1 = TransformerAttentionBlock(embed_dim=16, num_heads=4, ff_dim=64, dropout_rate=0.01)
        self.transformer_attention2 = TransformerAttentionBlock(embed_dim=16, num_heads=4, ff_dim=64, dropout_rate=0.01)

        self.dense1 = Dense(1024)
        self.dense2 = Dense(1024)
        self.final_dense = Dense(output_dim)

    def call(self, inputs, training=False):
        X, adjacency = inputs
        X = tf.nn.dropout(X, rate=self.dropout_rate) if training else X
        X = self.gat1((X, adjacency), training=training)
        X = tf.nn.dropout(X, rate=self.dropout_rate) if training else X
        X = self.gat2((X, adjacency), training=training)
        X = self.global_pool(X)

        # Reshape for attention blocks: (batch, hidden_dim) -> (batch, 4, 16); assumes hidden_dim == 64
        reshaped_output = tf.reshape(X, (-1, 4, 16))

        # first attention block
        attention_output = self.transformer_attention1(reshaped_output, training=training)

        # second attention block
        attention_output = self.transformer_attention2(attention_output, training=training)

        # Aggregate attention outputs
        aggregated_output = tf.reduce_mean(attention_output, axis=1)  # Shape: (batch_size, embed_dim)

        # dense layers
        X = self.dense1(aggregated_output)
        X = self.dense2(X)
        # Final output
        X = self.final_dense(X)
        return X

# Adjacency Matrix
def create_adjacency_matrix(num_nodes=90, k=5):
    """Band adjacency: node i is linked to nodes within k positions, plus a self-loop."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for i in range(num_nodes):
        # Window of up to k positions on either side; it contains i itself (the self-loop)
        adj[i, max(0, i - k):min(num_nodes, i + k + 1)] = 1.0
    return adj



# Preparing and Training GAT Model

## **Steps**:

### 1. **Adjacency Matrix**:
   - Created using the function `create_adjacency_matrix(num_nodes=90, k=5)` for node connectivity.

### 2. **Data Preparation**:
   - `X_combined` as node features and `X_model_scaled` as targets.
   - Replicated adjacency matrix for all samples.
   - Split dataset into training and validation sets (80%-20%, matching `test_size=0.2` in the code).

### 3. **TensorFlow Tensors**:
   - Converted data to **TensorFlow tensors** for model compatibility.

### 4. **TensorFlow Datasets**:
   - Created TensorFlow datasets for training and validation with a batch size of 64; only the training set is shuffled.

### 5. **GAT Model Setup**:
   - Defined GAT model with **Graph Attention Layers**, **Transformer Attention Blocks**, and **Dense Layers**.

### 6. **Compilation & Training**:
   - Compiled model with **Adam optimizer** (learning rate `1e-4`), **mean squared error loss**, and **MAE metric**.
   - Trained the model on the dataset for 100 epochs.
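
Replicating the adjacency matrix for every sample has a memory cost worth estimating before running the cell. The sketch below works it out for the shapes used here and, purely as an illustration, shows that `np.broadcast_to` can present the same matrix with a leading sample axis without copying it. Whether that view fits a given pipeline is an assumption to verify, since later indexing or tensor conversion may still materialize copies.

```python
import numpy as np

num_samples, num_nodes = 100_000, 90

# One float32 adjacency matrix: 90 * 90 * 4 bytes
bytes_per_copy = num_nodes * num_nodes * np.dtype(np.float32).itemsize
tiled_bytes = num_samples * bytes_per_copy   # 3.24e9 bytes for all samples
print(f"one copy: {bytes_per_copy} B, tiled: {tiled_bytes / 1e9:.2f} GB")

# Read-only broadcast view: same values along the first axis, no extra memory
adj = np.eye(num_nodes, dtype=np.float32)    # placeholder adjacency matrix
view = np.broadcast_to(adj, (1000, num_nodes, num_nodes))
print(view.shape, view.strides[0])           # stride 0 on the sample axis
```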

In [None]:
adjacency_matrix = create_adjacency_matrix(num_nodes=90, k=5)

X = X_combined
y = X_model_scaled

# Replicate adjacency matrix for all samples
num_samples = X.shape[0]
A = np.tile(adjacency_matrix, (num_samples, 1, 1))

# Split Dataset into Training and Validation Sets
X_train, X_val, A_train, A_val, y_train, y_val = train_test_split(
    X, A, y, test_size=0.2, random_state=42
)

# Convert to TensorFlow Tensors
X_train = tf.convert_to_tensor(X_train, dtype=tf.float32)
X_val = tf.convert_to_tensor(X_val, dtype=tf.float32)
A_train = tf.convert_to_tensor(A_train, dtype=tf.float32)
A_val = tf.convert_to_tensor(A_val, dtype=tf.float32)
y_train = tf.convert_to_tensor(y_train, dtype=tf.float32)
y_val = tf.convert_to_tensor(y_val, dtype=tf.float32)

buffer_size = X_train.shape[0]

# Create TensorFlow datasets
train_dataset = tf.data.Dataset.from_tensor_slices(((X_train, A_train), y_train)).shuffle(buffer_size).batch(64).prefetch(tf.data.AUTOTUNE)
val_dataset = tf.data.Dataset.from_tensor_slices(((X_val, A_val), y_val)).batch(64).prefetch(tf.data.AUTOTUNE)

print("Training and validation datasets prepared.")

# Define and Train the GAT Model
gat_model = GAT(num_heads=8, hidden_dim=64, output_dim=100, dropout_rate=0.01)

# Compile the Model
gat_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss='mean_squared_error',
    metrics=['mae']
)

# Build the Model
node_features_input = Input(shape=(90, 2), dtype=tf.float32)
adjacency_input = Input(shape=(90, 90), dtype=tf.float32)
gat_output = gat_model((node_features_input, adjacency_input))

# Train the Model
history = gat_model.fit(train_dataset, validation_data=val_dataset, epochs=100)
