## Data Generation

**Import the necessary libraries**

In [None]:
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d
from sklearn.preprocessing import MinMaxScaler
import joblib
import tensorflow as tf

# Magnetotelluric Model Parameters

This section initializes fundamental parameters required for simulating magnetotelluric (MT) responses.

## Constants
- **Permeability of Free Space (`μ₀`)**:
  Defined as \(4\pi \times 10^{-7}\) H/m, which is a fundamental physical constant used in electromagnetic field calculations.

## Frequency and Period Arrays
- **Frequencies (`frequencies`)**:
  A predefined array of 90 frequency values (in Hz), from 10 kHz down to about 3.4 × 10⁻⁴ Hz. Lower frequencies penetrate deeper, so this range samples the subsurface response from shallow to deep.
- **Periods (`periods`)**:
  Computed as the inverse of frequencies (\(T = \frac{1}{f}\)), representing the corresponding time periods in seconds.
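
The link between frequency and depth can be made concrete with the skin depth of a uniform half-space, \( \delta = \sqrt{2\rho / (\omega \mu_0)} \approx 503 \sqrt{\rho / f} \) m. The sketch below is illustrative only: `skin_depth` is a hypothetical helper that does not appear in the notebook, and the 50 Ohm·m half-space is an assumed example value.

```python
import numpy as np

MU_0 = 4 * np.pi * 1e-7  # Permeability of free space (H/m)


def skin_depth(resistivity, frequency):
    """Skin depth (m) of a uniform half-space; hypothetical helper for illustration."""
    omega = 2 * np.pi * np.asarray(frequency, dtype=float)
    return np.sqrt(2 * resistivity / (omega * MU_0))


# Highest and lowest frequencies of the array used in this notebook
demo_frequencies = np.array([1.0e4, 3.433228e-4])
print(skin_depth(50.0, demo_frequencies))
```

The highest frequency senses only the top tens of meters, while the lowest one reaches far below the deepest layer boundary, which is why a wide frequency band is needed to constrain the whole model.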

## Resistivity Parameters
- **`resistivity_range`**: Defines the minimum and maximum resistivities in Ohm·m for the subsurface layers.
- **`rho_ref`**: A reference resistivity value (50 Ohm·m), typically used for normalization or as a baseline model.

## Layered Earth Model
- **Number of Layers (`num_layers`)**:
  The model consists of **50 layers**, each with a specified resistivity and thickness.
- **Depth Calculation (`depths`)**:
  - Starts with an initial depth of **8 meters**.
  - Subsequent depths are computed by multiplying the last depth by a factor of **1.2078**, giving a geometric progression of 50 depth points. The last point lies at approximately **83,000 meters**; one further multiplication would give roughly 100,000 meters.
- **Layer Thickness (`layer_thicknesses`)**:
  - Computed as the difference between consecutive depths.
  - The first depth value is inserted as the initial layer thickness.
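
The depth grid can also be written in closed form, which makes it easy to check where the grid ends. This is a minimal sketch using the values stated above; the names `depths_check` and `thickness_check` are hypothetical and are not used elsewhere in the notebook.

```python
import numpy as np

num_layers = 50
first_depth = 8.0   # Meters, as stated in the text
growth = 1.2078     # Multiplicative factor, as stated in the text

# Closed-form equivalent of repeatedly multiplying the last depth by `growth`
depths_check = first_depth * growth ** np.arange(num_layers)

# Thicknesses: the first depth, followed by differences between consecutive depths
thickness_check = np.diff(depths_check, prepend=0.0)

print(len(depths_check), depths_check[-1])
print(np.isclose(thickness_check.sum(), depths_check[-1]))
```

Because the thicknesses sum to the last depth, each entry of the depth array marks the bottom of its layer.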



In [None]:

mu_0 = 4 * np.pi * 1e-7  # Permeability of free space (H/m)
frequencies = np.array([1.000000e+04, 8.799998e+03, 7.200000e+03, 6.000000e+03, 5.200001e+03,  4.400000e+03, 3.600000e+03, 3.000001e+03, 2.600000e+03, 2.200000e+03,  1.800000e+03, 1.500000e+03, 1.300000e+03, 1.058824e+03, 9.176473e+02,  7.764706e+02, 6.352943e+02, 5.294117e+02, 4.588235e+02, 3.882354e+02,  3.176470e+02, 2.647059e+02, 2.294118e+02, 1.941176e+02, 1.588235e+02,  1.323529e+02, 1.147059e+02, 9.705883e+01, 7.941176e+01, 6.499999e+01,  5.499999e+01, 3.750000e+01, 3.250000e+01, 2.750000e+01, 2.250000e+01,  1.875000e+01, 1.625000e+01, 1.375000e+01, 1.125000e+01, 9.375000e+00,  8.125000e+00, 6.875000e+00, 5.625000e+00, 4.687500e+00, 4.062500e+00,  3.437500e+00, 2.812500e+00, 2.343750e+00, 1.718750e+00, 1.406250e+00,  1.171875e+00, 1.015625e+00, 8.593750e-01, 7.031250e-01, 5.859375e-01,  5.078125e-01, 4.296875e-01, 3.515625e-01, 2.929688e-01, 2.539063e-01,  1.757813e-01, 8.789063e-02, 5.371094e-02, 4.394531e-02, 3.662109e-02,  3.173828e-02, 2.685547e-02, 2.197266e-02, 1.831055e-02, 1.586914e-02,  1.342773e-02, 1.098633e-02, 9.155274e-03, 7.934572e-03, 6.713867e-03,  5.493165e-03, 4.577638e-03, 3.967284e-03, 3.356934e-03, 2.746581e-03,  2.288818e-03, 1.983643e-03, 1.678467e-03, 1.373291e-03, 1.144409e-03,  9.918214e-04, 8.392333e-04, 6.866456e-04, 5.722047e-04, 3.433228e-04])
periods = 1 / frequencies  # Periods in seconds
resistivity_range = [1, 1000]  # Min and max resistivities in Ohm.m
rho_ref = 50  # Reference resistivity in Ohm.m
num_layers = 50

depths = [8]  # Starting depth
for i in range(num_layers - 1):
    depth = depths[-1] * 1.2078  # Geometric growth: 50 depth points, the last at about 83,000 m
    depths.append(depth)


# Calculate layer thicknesses from depths
layer_thicknesses = np.diff(depths)  # Difference between consecutive depths
layer_thicknesses = np.insert(layer_thicknesses, 0, depths[0])  # Insert first depth as the initial thickness




# Resistivity Profile Generation

This function generates a **smooth resistivity profile** corresponding to specified depth points. The resistivity values are interpolated using a cubic spline and optionally smoothed with a Gaussian filter.

## Function: `generate_smooth_resistivity_profile`

### **Purpose**
- Creates a continuous resistivity profile that varies smoothly with depth.
- Uses cubic spline interpolation to ensure a realistic transition between layers.
- Optionally applies **Gaussian smoothing** to remove sharp resistivity jumps.

### **Parameters**
- **`depths`** *(array)*:
  A list or array containing depth values for each layer.
- **`resistivity_range`** *(list of two values)*:
  Specifies the minimum and maximum resistivity values (in Ohm·m) to randomly generate control points.
- **`num_spline_points`** *(int)*:
  Number of control points used for generating the cubic spline.
- **`smoothing`** *(bool, default=True)*:
  If `True`, applies **Gaussian smoothing** to reduce abrupt variations in resistivity.

### **Process**
1. **Random Resistivity Points**:
   - Generates random resistivity values within the specified range at selected control depths.
2. **Cubic Spline Interpolation**:
   - Uses **cubic splines** to interpolate resistivity values at the exact depths of the model.
3. **Optional Smoothing**:
   - If enabled, applies a **Gaussian filter** to smooth the interpolated resistivity values.

### **Returns**
- **`resistivities`** *(array)*:
  A smooth resistivity profile that aligns with the depth points, useful for modeling subsurface electrical properties.
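
One property of cubic interpolation is worth checking when generating profiles this way: the spline passes exactly through the random control values, but between them it can overshoot the requested range. The sketch below assumes the same depth construction and SciPy's `make_interp_spline`; the names `depth_grid`, `ctrl_depth`, and `ctrl_rho` are hypothetical, and the seed is fixed only for reproducibility.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

rng = np.random.default_rng(0)                 # Seeded only for reproducibility
depth_grid = 8.0 * 1.2078 ** np.arange(50)     # Same construction as `depths`
ctrl_depth = np.linspace(depth_grid[0], depth_grid[-1], num=6)
ctrl_rho = rng.uniform(1, 1000, size=6)

spline = make_interp_spline(ctrl_depth, ctrl_rho, k=3)
profile = spline(depth_grid)

# The spline reproduces the control values at the control depths ...
print(np.allclose(spline(ctrl_depth), ctrl_rho))
# ... but in between it may leave [1, 1000] or even become non-positive
print(profile.min(), profile.max(), bool((profile <= 0).any()))
```

A non-positive resistivity has no physical meaning and would produce an invalid conductivity, so inspecting the minimum of the generated profiles is a cheap safeguard.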



In [None]:

def generate_smooth_resistivity_profile(depths, resistivity_range, num_spline_points, smoothing=True):
    """
    Generates a smooth resistivity profile where resistivities align with depth points.

    Parameters:
    - depths: Array of depth points for the layers
    - resistivity_range: List with min and max resistivity values
    - num_spline_points: Number of points to generate spline (int)
    - smoothing: Whether to apply Gaussian smoothing (bool)

    Returns:
    - resistivities: Array of resistivity values aligned with each layer's depth
    """
    # Place control points evenly between the first and last depth and draw a random resistivity for each
    spline_points_depth = np.linspace(depths[0], depths[-1], num=num_spline_points)
    resistivity_points = np.random.uniform(resistivity_range[0], resistivity_range[1], num_spline_points)

    # Create cubic spline for interpolation
    spline = make_interp_spline(spline_points_depth, resistivity_points, k=3)

    # Interpolate resistivities for the given depths
    resistivities = spline(depths)

    if smoothing:
        # Apply Gaussian smoothing to resistivities to reduce sharp transitions
        resistivities = gaussian_filter1d(resistivities, sigma=1)

    return resistivities




# Apparent Resistivity and Phase Computation

This function computes the **apparent resistivity** and **phase** from a layered Earth model using magnetotelluric (MT) theory. It employs recursive impedance calculations to determine the electromagnetic response for a given set of layer thicknesses, conductivities, and signal periods.

## Function: `compute_apparent_resistivity_and_phase`

### **Purpose**
- Computes the **apparent resistivity** and **phase** of electromagnetic waves as they propagate through subsurface layers.
- Uses a recursive approach to determine the impedance of each layer.
- Calculates the response for multiple periods (frequencies), essential for **1D magnetotelluric inversion**.

### **Parameters**
- **`thicknesses`** *(array)*:
  An array representing the thickness of each subsurface layer. The last layer is assumed to be infinite.
- **`conductivities`** *(array)*:
  An array containing the electrical conductivity (S/m) of each layer.
- **`periods`** *(array)*:
  Time periods (s) corresponding to the different frequencies used in the analysis.

### **Computational Process**
1. **Initialize the Transfer Function for the Deepest Layer**:
   - The recursion operates on the transfer function \( C \) (units of length, stored in `cns` in the code), which is related to the impedance by \( Z = i \omega \mu_0 C \). For the bottommost layer (assumed infinite):
     \[
     C_n = \frac{1}{\sqrt{i \omega \mu_0 \sigma_n}}
     \]
   where:
   - \( \mu_0 \) is the permeability of free space.
   - \( \omega \) is the angular frequency (\(2\pi/T\)).
   - \( \sigma_n \) is the conductivity of the last layer.

2. **Recursive Computation for Each Layer**:
   - The complex wavenumber of layer \( j \) is computed as:
     \[
     K_j = \sqrt{i \omega \mu_0 \sigma_j}
     \]
   - The transfer function at the top of layer \( j \) is updated from the one below it using:
     \[
     C_j = \frac{1}{K_j} \times \frac{K_j C_{j+1} + \tanh(K_j d_j)}{1 + K_j C_{j+1} \tanh(K_j d_j)}
     \]
   where \( d_j \) is the thickness of layer \( j \).

3. **Compute Apparent Resistivity**:
   - The **apparent resistivity** is derived from the surface impedance \( Z = i \omega \mu_0 C_1 \):
     \[
     \rho_a = \frac{|Z|^2}{\mu_0 \omega} = \mu_0 \omega |C_1|^2
     \]

4. **Compute Phase**:
   - The **phase** of the impedance is calculated as:
     \[
     \phi = \angle Z = \angle C_1 + 90^\circ
     \]

### **Returns**
- **`apparent_resistivity`** *(array)*:
  An array of computed apparent resistivity values corresponding to each period.
- **`phase`** *(array)*:
  An array of phase values (in degrees) associated with each period.
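
A uniform half-space is a convenient sanity check for any implementation of this recursion: the apparent resistivity should equal the true resistivity and the phase should be 45° at every period. The sketch below is a compact, assumed variant and not the notebook's function: `mt_forward_1d` is a hypothetical name, it takes resistivities rather than conductivities, it expects one thickness fewer than the number of layers, and it uses the standard relation \( Z = i \omega \mu_0 C \) between the recursed transfer function and the impedance.

```python
import numpy as np

MU_0 = 4 * np.pi * 1e-7  # Permeability of free space (H/m)


def mt_forward_1d(thicknesses, resistivities, periods):
    """Hypothetical helper: 1D MT response from the transfer-function recursion.

    `thicknesses` has one entry fewer than `resistivities`; the last
    resistivity belongs to the terminating half-space.
    """
    resistivities = np.asarray(resistivities, dtype=float)
    rho_a = np.empty(len(periods))
    phase = np.empty(len(periods))
    for i, period in enumerate(periods):
        omega = 2 * np.pi / period
        k = np.sqrt(1j * omega * MU_0 / resistivities)  # Wavenumber per layer
        c = 1 / k[-1]                                   # Half-space at the bottom
        for j in range(len(thicknesses) - 1, -1, -1):   # Recurse upward
            t = np.tanh(k[j] * thicknesses[j])
            c = (k[j] * c + t) / (k[j] * (1 + k[j] * c * t))
        z = 1j * omega * MU_0 * c                       # Surface impedance
        rho_a[i] = np.abs(z) ** 2 / (omega * MU_0)
        phase[i] = np.angle(z, deg=True)
    return rho_a, phase


periods_demo = np.logspace(-3, 3, 7)
rho_uniform, phase_uniform = mt_forward_1d(
    [100.0, 500.0], [100.0, 100.0, 100.0], periods_demo
)
print(rho_uniform)    # Should reproduce the true resistivity at every period
print(phase_uniform)  # Should be 45 degrees at every period
```

If a forward routine fails this check, the generated training data will not be in physical units, even though per-period scaling may hide the problem.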



In [None]:
# Compute apparent resistivity and phase
def compute_apparent_resistivity_and_phase(thicknesses, conductivities, periods):
    apparent_resistivity = []
    phase = []

    for T in periods:
        omega = 2 * np.pi / T  # Angular frequency (rad/s)
        cns = np.zeros(len(conductivities), dtype=complex)
        cns[-1] = 1 / np.sqrt(mu_0 * omega * conductivities[-1] * 1j)

        for j in reversed(range(len(thicknesses))):
            K = np.sqrt(conductivities[j] * mu_0 * omega * 1j)
            layer_thickness = thicknesses[j] if j < len(thicknesses) - 1 else np.inf
            if j + 1 < len(cns):
                cns[j] = (1 / K) * ((K * cns[j + 1] + np.tanh(K * layer_thickness)) /
                                    (1 + K * cns[j + 1] * np.tanh(K * layer_thickness)))

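        # cns[0] is the surface transfer function C; the impedance is Z = i * omega * mu_0 * C
        Z = 1j * omega * mu_0 * cns[0]
        rho_apparent = np.abs(Z) ** 2 / (mu_0 * omega)
        phi = np.angle(cns[0], deg=True) + 90  # Equals the phase of Z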

        apparent_resistivity.append(rho_apparent)
        phase.append(phi)

    return np.array(apparent_resistivity), np.array(phase)





# Data Generation and Preprocessing for Magnetotelluric Modeling

This script generates synthetic training data for **machine learning-based magnetotelluric (MT) inversion** by simulating smooth resistivity profiles and computing their corresponding **apparent resistivity** and **phase responses**. The generated data is then **normalized** and saved for future model training.

## **Step 1: Define Data Parameters**
- **`num_spline_points_list`**: Specifies different numbers of spline control points used to generate resistivity profiles.
- **`examples_per_spline`**: The number of synthetic resistivity profiles generated per spline point count.
- **`total_examples`**: The total number of training examples, computed as:
  \[
  \text{total\_examples} = \text{examples\_per\_spline} \times \text{len(num\_spline\_points\_list)}
  \]

## **Step 2: Initialize Data Storage**
- **`X_model`**: Stores the generated resistivity profiles (input features).
- **`y_rho`**: Stores the computed **apparent resistivity** values.
- **`y_phi`**: Stores the computed **phase response** values.

## **Step 3: Generate Synthetic Resistivity Data**
For each number of spline points in `num_spline_points_list`:
1. **Generate a smooth resistivity profile** using cubic spline interpolation.
2. **Convert resistivity to conductivity** (\(\sigma = 1/\rho\)).
3. **Compute the magnetotelluric response**:
   - Apparent resistivity (\(\rho_a\))
   - Phase (\(\phi\))
4. **Store the computed values** in `X_model`, `y_rho`, and `y_phi`.

Progress messages are printed before and after each batch of examples, with one batch per spline point count.

## **Step 4: Normalize Data**
To ensure efficient neural network training, all datasets are normalized using **Min-Max Scaling**, which rescales each column (each layer or each period) independently to the range [0, 1]:
- **`scaler_X_model`**: Normalizes resistivity profiles.
- **`scaler_y_rho`**: Normalizes apparent resistivity.
- **`scaler_y_phi`**: Normalizes phase responses.
- The transformed datasets (`X_model_scaled`, `y_rho_scaled`, `y_phi_scaled`) are then created.
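
With its default settings, `MinMaxScaler` maps each column to [0, 1] using that column's own minimum and maximum. The NumPy-only sketch below reproduces that arithmetic on a small toy array, together with the inverse mapping needed to bring predictions back to physical units. The array `toy` and the other names are hypothetical and stand in for the real datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
toy = rng.uniform(1.0, 1000.0, size=(5, 3))   # 5 examples, 3 columns

# Per-column minimum and maximum, as learned by fit()
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)

# Equivalent of transform(): every column is mapped to [0, 1]
toy_scaled = (toy - col_min) / (col_max - col_min)

# Equivalent of inverse_transform(): recover the original values
toy_restored = toy_scaled * (col_max - col_min) + col_min

print(toy_scaled.min(axis=0), toy_scaled.max(axis=0))
print(np.allclose(toy_restored, toy))
```

Because each column is scaled separately, the saved scaler objects are required to interpret or invert the scaled arrays later.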

## **Step 5: Prepare Data for Model Training**
The normalized outputs (`y_rho_scaled`, `y_phi_scaled`) are stacked together along the last axis to form `X_combined`, where:
\[
X_{\text{combined}} \in \mathbb{R}^{(\text{num\_examples}, \text{num\_periods}, 2)}
\]
Each sample contains **apparent resistivity and phase** as separate components.
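
The effect of stacking along the last axis can be seen on small placeholder arrays. The sizes below (4 examples, 7 periods) and the names `rho_part` and `phi_part` are assumed for illustration only.

```python
import numpy as np

rho_part = np.zeros((4, 7))   # Stand-in for y_rho_scaled
phi_part = np.ones((4, 7))    # Stand-in for y_phi_scaled

combined = np.stack((rho_part, phi_part), axis=-1)
print(combined.shape)  # (4, 7, 2)

# Channel 0 holds apparent resistivity, channel 1 holds phase
print(np.array_equal(combined[..., 0], rho_part))
print(np.array_equal(combined[..., 1], phi_part))
```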

## **Step 6: Save Processed Data**
The following files are saved:
- **Numpy arrays** (`.npy` format) for efficient loading:
  - `X_combined_u.npy`: Combined (scaled) apparent resistivity and phase responses.
  - `X_model_scaled_u.npy`: Normalized resistivity profiles.
- **MinMaxScaler objects** (`.pkl` format) for future use:
  - `scaler_X_model_u.pkl`
  - `scaler_y_rho_u.pkl`
  - `scaler_y_phi_u.pkl`

These files can be loaded later to **retrain models** without recalculating the data.
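
Reloading follows the same pattern as saving. The sketch below round-trips a small placeholder array through `np.save` and `np.load` in a temporary directory; the file name `X_combined_demo.npy` is hypothetical. The scaler objects can be restored analogously with `joblib.load`.

```python
import os
import tempfile

import numpy as np

demo = np.arange(12, dtype=float).reshape(2, 3, 2)   # Placeholder data

with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path.join avoids depending on a trailing separator in the directory name
    path = os.path.join(tmp_dir, "X_combined_demo.npy")
    np.save(path, demo)
    reloaded = np.load(path)

print(np.array_equal(reloaded, demo))
```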


In [None]:
num_spline_points_list = [6, 7, 8, 9, 10]  # Number of spline points
examples_per_spline = 20000  # Number of examples per spline point count
total_examples = examples_per_spline * len(num_spline_points_list)  # Total examples: 100,000

X_model = np.empty((total_examples, num_layers))
y_rho = np.empty((total_examples, len(periods)))
y_phi = np.empty((total_examples, len(periods)))

example_idx = 0  # Index to keep track of the current example

#Generate data
for num_spline_points in num_spline_points_list:
    print(f"Generating {examples_per_spline} examples with {num_spline_points} spline points...")
    for _ in range(examples_per_spline):
        # Generate smooth resistivity profile
        resistivities = generate_smooth_resistivity_profile(
            depths=depths,
            resistivity_range=resistivity_range,
            num_spline_points=num_spline_points,
            smoothing=True
        )
        conductivities = 1 / resistivities
        apparent_resistivity, phase = compute_apparent_resistivity_and_phase(
            layer_thicknesses,
            conductivities,
            periods
        )

        # Store the results
        X_model[example_idx] = resistivities
        y_rho[example_idx] = apparent_resistivity
        y_phi[example_idx] = phase
        example_idx += 1

    print(f"Completed {examples_per_spline} examples with {num_spline_points} spline points.")

print("Data generation completed.")

X_model = np.array(X_model)
y_rho = np.array(y_rho)
y_phi = np.array(y_phi)

# Normalize data
scaler_X_model = MinMaxScaler()
scaler_y_rho = MinMaxScaler()
scaler_y_phi = MinMaxScaler()

scaler_X_model.fit(X_model)
scaler_y_rho.fit(y_rho)
scaler_y_phi.fit(y_phi)
X_model_scaled = scaler_X_model.transform(X_model)
y_rho_scaled = scaler_y_rho.transform(y_rho)
y_phi_scaled = scaler_y_phi.transform(y_phi)

# Combine y_rho and y_phi for the input data (each frequency has rho and phi as components)
X_combined = np.stack((y_rho_scaled, y_phi_scaled), axis=-1)  # Shape: (num_examples, num_periods, 2)

save_path = ""  # Directory prefix for the output files; file names are appended directly, so end it with "/"

# Save the generated data under save_path
np.save(save_path + 'X_combined_u.npy', X_combined)
np.save(save_path + 'X_model_scaled_u.npy', X_model_scaled)

# Save the scalers
joblib.dump(scaler_X_model, save_path + 'scaler_X_model_u.pkl')
joblib.dump(scaler_y_rho, save_path + 'scaler_y_rho_u.pkl')
joblib.dump(scaler_y_phi, save_path + 'scaler_y_phi_u.pkl')

