# Envelope Memory Usage with Two Fixed Dimensions

This experiment aims to evaluate the memory usage of the envelope seismic attribute operator with two fixed dimensions.
In this notebook you will find:
- The problem statement
- The data collection for the experiment
- The evaluation of the experiment results

## Problem Statement

The envelope seismic attribute operator is a crucial process in seismic data analysis.
It extracts the envelope (instantaneous amplitude) of the seismic signal, computed as the magnitude of the analytic signal, whose real part is the seismic trace and whose imaginary part is the Hilbert transform of that trace.
This operator is widely used in seismic interpretation, inversion, and attribute analysis.
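
As a minimal sketch of the computation (not the DASF implementation), the envelope of a single trace can be obtained with NumPy by building the analytic signal in the frequency domain and taking its magnitude. The function name `envelope` and the test signal are illustrative assumptions; `scipy.signal.hilbert` returns the same analytic signal directly.

```python
import numpy as np


def envelope(trace: np.ndarray) -> np.ndarray:
    """Return the instantaneous amplitude (envelope) of a 1-D real trace."""
    n = trace.shape[0]
    spectrum = np.fft.fft(trace)

    # Analytic signal: keep DC (and Nyquist for even n), double the positive
    # frequencies, and zero the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0

    analytic = np.fft.ifft(spectrum * weights)
    # The envelope is the magnitude of the complex analytic signal.
    return np.abs(analytic)


# Illustrative input: a fast carrier whose amplitude is slowly modulated.
t = np.arange(512) / 512.0
modulation = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * t)
trace = modulation * np.cos(2 * np.pi * 64 * t)
recovered = envelope(trace)
```

For this input the recovered envelope follows the slow modulation rather than the oscillating carrier, which is the property interpreters rely on.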

In this experiment, we aim to evaluate the memory usage of the envelope seismic attribute operator when applied to synthetic seismic data. To achieve this, we will:

1. Generate synthetic seismic data.
2. Apply the envelope operator to the synthetic data using the [DASF](https://github.com/discovery-unicamp/dasf-core) framework.
3. Assess the memory usage during this process using [TraceQ](https://github.com/discovery-unicamp/traceq).

Our evaluation keeps two of the three dimensions (inline, crossline, and time samples) fixed.
This means that we group the results by the two fixed dimensions and measure the memory usage for each group while the third dimension varies. In this notebook, the number of crosslines and the number of samples are fixed, and the number of inlines varies.
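
Because two dimensions stay fixed, the raw size of each volume grows linearly with the varying one. The following back-of-the-envelope sketch makes that concrete, using the 200 x 200 fixed dimensions configured later in this notebook. The 32-bit float sample format and the helper name `raw_size_bytes` are assumptions, not something the notebook states.

```python
from math import prod

BYTES_PER_SAMPLE = 4  # assumption: 32-bit float samples


def raw_size_bytes(shape, bytes_per_sample=BYTES_PER_SAMPLE):
    """Size of a dense volume with the given (inlines, xlines, samples) shape."""
    return prod(shape) * bytes_per_sample


for inlines in (100, 800, 1500):
    shape = (inlines, 200, 200)
    size_mib = raw_size_bytes(shape) / 1024 ** 2
    print(f"{shape}: {size_mib:.1f} MiB")
```

The measured peak memory is expected to exceed these raw sizes, since the operator also holds intermediate results. The raw size is only a lower-bound reference for reading the results.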

## Data Collection

In this section, we will outline the steps needed to collect the necessary data for our experiment. The process is organized into the following steps:

1. **Setup Environment:**
  - Set up the environment with the required environment variables and global constants used during the experiment.

2. **Setup Dependencies:**
  - Set up the virtual environment running this notebook with the required dependencies.

3. **Set Up the Output Directory:**
  - Create the output directory where the experiment results are saved.

4. **Generate Synthetic Seismic Data:**
  - Generate synthetic seismic data within a specified range of dimensions.

5. **Execute the Envelope Operator:**
  - Apply the envelope operator to the synthetic data using the prepared environment and tools.

After completing these steps, we will have the data generated by TraceQ to evaluate the memory usage of the envelope operator.

### Setup Environment

During the environment setup, we need to:
- Properly configure `PYTHONPATH`
- Set up dependencies

Below, we extend the module search path (`sys.path`, the in-process equivalent of `PYTHONPATH`) so that the tools we've written for the experiments can be imported.

In [16]:
import sys
import os

helpers_path = os.path.abspath('../libs/helpers')
traceq_path = os.path.abspath('../libs/traceq')

# Append each library path only once so re-running the cell does not duplicate entries
for path in (helpers_path, traceq_path):
    if path not in sys.path:
        sys.path.append(path)

print(sys.path)

['/home/delucca/.pyenv/versions/3.10.14/lib/python310.zip', '/home/delucca/.pyenv/versions/3.10.14/lib/python3.10', '/home/delucca/.pyenv/versions/3.10.14/lib/python3.10/lib-dynload', '', '/home/delucca/.pyenv/versions/3.10.14/envs/dask-auto-chunking/lib/python3.10/site-packages', '/home/delucca/src/unicamp/msc/dask-auto-chunking/libs/helpers', '/home/delucca/src/unicamp/msc/dask-auto-chunking/libs/traceq']


Now, let's set up some relevant global variables.

In [17]:
from pprint import pprint

LOG_TRANSPORTS = ['CONSOLE','FILE']
LOG_LEVEL = 'DEBUG'

NUM_XLINES = 200
NUM_SAMPLES = 200
STEP_SIZE = 100
RANGE_SIZE = 15

print('Experiment config:')
pprint({
    'LOG_TRANSPORTS': LOG_TRANSPORTS,
    'LOG_LEVEL': LOG_LEVEL,
    'NUM_XLINES': NUM_XLINES,
    'NUM_SAMPLES': NUM_SAMPLES,
    'STEP_SIZE': STEP_SIZE,
    'RANGE_SIZE': RANGE_SIZE
}, indent=2, sort_dicts=True)

Experiment config:
{ 'LOG_LEVEL': 'DEBUG',
  'LOG_TRANSPORTS': ['CONSOLE', 'FILE'],
  'NUM_SAMPLES': 200,
  'NUM_XLINES': 200,
  'RANGE_SIZE': 15,
  'STEP_SIZE': 100}


### Setup Dependencies

Before running this step, make sure you are running this notebook in the environment defined by the `.python-version` file.
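
A quick way to verify this is to compare the running interpreter against the content of `.python-version`. The sketch below uses a hypothetical helper, `matches_python_version_file`, and makes two assumptions. First, the file holds either a version prefix (such as `3.10.14`) or, with pyenv-virtualenv, the name of a virtual environment. Second, the file is in the current working directory.

```python
import os
import sys


def matches_python_version_file(spec, version, prefix):
    """Check an interpreter against the content of a `.python-version` file.

    `spec` is the file content, `version` is the interpreter version as
    'major.minor.patch', and `prefix` is the interpreter's `sys.prefix`.
    """
    entries = spec.split()
    if not entries:
        return False
    expected = entries[0]  # only the first entry is considered in this sketch
    env_name = os.path.basename(os.path.normpath(prefix))
    return (
        version == expected
        or version.startswith(expected + ".")
        or env_name == expected
    )


current_version = ".".join(str(part) for part in sys.version_info[:3])
version_file = ".python-version"  # assumption: located in the working directory
if os.path.exists(version_file):
    with open(version_file) as handle:
        print(matches_python_version_file(handle.read(), current_version, sys.prefix))
```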

In [3]:
%pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


Now, we need to install the dependencies for the tools we use during the experiment.

In [4]:
%pip install -r ../libs/helpers/requirements.txt
%pip install -r ../libs/traceq/requirements.txt

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


### Setup Output Directory

In [18]:
import uuid
import os

from datetime import datetime

EXPERIMENT_ID = f'002-{datetime.now().strftime("%Y%m%d%H%M%S")}-{uuid.uuid4().hex[:6]}'
OUTPUT_DIR = f'./output/{EXPERIMENT_ID}'

os.makedirs(OUTPUT_DIR)

OUTPUT_DIR

'./output/002-20241002003025-37edc4'

### Generate Synthetic Data

In [19]:
from helpers.datasets import generate_seismic_data

DATA_OUTPUT_DIR = f'{OUTPUT_DIR}/data'

synthetic_data_paths = [
    generate_seismic_data(
        inlines=STEP_SIZE * step_num,
        xlines=NUM_XLINES,
        samples=NUM_SAMPLES,
        output_dir=DATA_OUTPUT_DIR,
    ) for step_num in range(1, RANGE_SIZE + 1)
] 

print(synthetic_data_paths)

2024-10-02 00:30:27 - generate-seismic-data - INFO - Generating synthetic data for shape (100, 200, 200)
2024-10-02 00:30:29 - generate-seismic-data - INFO - Generating synthetic data for shape (200, 200, 200)
2024-10-02 00:30:32 - generate-seismic-data - INFO - Generating synthetic data for shape (300, 200, 200)
2024-10-02 00:30:38 - generate-seismic-data - INFO - Generating synthetic data for shape (400, 200, 200)
2024-10-02 00:30:45 - generate-seismic-data - INFO - Generating synthetic data for shape (500, 200, 200)
2024-10-02 00:30:54 - generate-seismic-data - INFO - Generating synthetic data for shape (600, 200, 200)
2024-10-02 00:31:05 - generate-seismic-data - INFO - Generating synthetic data for shape (700, 200, 200)
2024-10-02 00:31:18 - generate-seismic-data - INFO - Generating synthetic data for shape (800, 200, 200)
2024-10-02 00:31:33 - generate-seismic-data - INFO - Generating synthetic data for shape (900, 200, 200)
2024-10-02 00:31:50 - generate-seismic-data - INFO - Ge

['./output/002-20241002003025-37edc4/data/100-200-200.segy', './output/002-20241002003025-37edc4/data/200-200-200.segy', './output/002-20241002003025-37edc4/data/300-200-200.segy', './output/002-20241002003025-37edc4/data/400-200-200.segy', './output/002-20241002003025-37edc4/data/500-200-200.segy', './output/002-20241002003025-37edc4/data/600-200-200.segy', './output/002-20241002003025-37edc4/data/700-200-200.segy', './output/002-20241002003025-37edc4/data/800-200-200.segy', './output/002-20241002003025-37edc4/data/900-200-200.segy', './output/002-20241002003025-37edc4/data/1000-200-200.segy', './output/002-20241002003025-37edc4/data/1100-200-200.segy', './output/002-20241002003025-37edc4/data/1200-200-200.segy', './output/002-20241002003025-37edc4/data/1300-200-200.segy', './output/002-20241002003025-37edc4/data/1400-200-200.segy', './output/002-20241002003025-37edc4/data/1500-200-200.segy']
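
The generated file names encode the volume shape as `<inlines>-<xlines>-<samples>.segy`, and the later steps rely on that convention to label each profile. As a small sketch, the shape can be recovered from a path like this. `parse_shape` is a hypothetical helper, not part of the `helpers` package, and the example path is illustrative.

```python
from pathlib import Path


def parse_shape(segy_path):
    """Return (inlines, xlines, samples) from a name like '100-200-200.segy'."""
    inlines, xlines, samples = (int(part) for part in Path(segy_path).stem.split("-"))
    return inlines, xlines, samples


print(parse_shape("./output/example/data/100-200-200.segy"))  # (100, 200, 200)
```

Parsing the fields into integers avoids sorting the sessions lexicographically, which would place `1000-200-200` before `200-200-200`.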


### Execute the Envelope Attribute

In this step, we execute the envelope attribute for each generated synthetic dataset.

In [28]:
import gc

for synthetic_data_path in synthetic_data_paths:
    # Cleanup
    for var_name in ['traceq', 'envelope_from_segy', 'dask', 'client', 'Client']:
        # Drop stale notebook-level bindings; pop() is a no-op for missing names
        globals().pop(var_name, None)

    gc.collect()
    
    # Restart variables
    import dask
    from dask.distributed import Client
    import traceq
    from helpers.dask_operators import envelope_from_segy

    dask.config.set({"distributed.diagnostics.nvml": False})
    client = Client(n_workers=1, threads_per_worker=1)
    
    # Execute
    shape = synthetic_data_path.split('/')[-1].split('.')[0]
    traceq.load_config(
        {
            "output_dir": f'{OUTPUT_DIR}/profile-{shape}',
            "logger": {
                "enabled_transports": LOG_TRANSPORTS,
                "level": LOG_LEVEL,
            },
            "profiler": {
                "session_id": shape,
                "memory_usage": {
                    "enabled_backends": ['kernel'],
                },
            },
        }
    )
    
    r = traceq.profile(envelope_from_segy, synthetic_data_path)
    print(r)
    client.close()

NameError: name 'synthetic_data_paths' is not defined
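
To sanity-check a profiler's numbers, it can help to measure a peak independently. The sketch below uses the standard library's `tracemalloc`, which tracks allocations made through Python's memory allocator. This is not how TraceQ's `kernel` backend works (its internals are not shown in this notebook), and `peak_python_allocation` and `allocate` are hypothetical names used only for illustration.

```python
import tracemalloc


def peak_python_allocation(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def allocate(n_bytes):
    # Stand-in workload; in the experiment this would be the envelope operator.
    return len(bytearray(n_bytes))


result, peak = peak_python_allocation(allocate, 10_000_000)
print(result, peak)
```

Since `tracemalloc` only sees allocations routed through Python's allocator, its peak can be lower than the process-level memory usage reported by an operating-system-level backend.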

## Evaluating Experiment Results

In [21]:
from helpers.result_handlers import load_profile_results
zipped_sessions = list(load_profile_results(OUTPUT_DIR))

print(zipped_sessions)

[('100-200-200', './output/002-20241002003025-37edc4/profile-100-200-200/100-200-200.prof'), ('1400-200-200', './output/002-20241002003025-37edc4/profile-1400-200-200/1400-200-200.prof'), ('1000-200-200', './output/002-20241002003025-37edc4/profile-1000-200-200/1000-200-200.prof'), ('1500-200-200', './output/002-20241002003025-37edc4/profile-1500-200-200/1500-200-200.prof'), ('800-200-200', './output/002-20241002003025-37edc4/profile-800-200-200/800-200-200.prof'), ('400-200-200', './output/002-20241002003025-37edc4/profile-400-200-200/400-200-200.prof'), ('1100-200-200', './output/002-20241002003025-37edc4/profile-1100-200-200/1100-200-200.prof'), ('1200-200-200', './output/002-20241002003025-37edc4/profile-1200-200-200/1200-200-200.prof'), ('300-200-200', './output/002-20241002003025-37edc4/profile-300-200-200/300-200-200.prof'), ('1300-200-200', './output/002-20241002003025-37edc4/profile-1300-200-200/1300-200-200.prof'), ('700-200-200', './output/002-20241002003025-37edc4/profile-7

With the metadata normalized and the data organized, we now need to get the peak memory usage of each profile.

In [22]:
from helpers.result_handlers import get_peak, get_unit

peaks = [(shape, get_peak(profile_path), get_unit(profile_path)) for shape, profile_path in zipped_sessions]
print(peaks)

KeyError: 'kernel_memory_usage'
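
A peak value is only meaningful together with its unit, so it is safer to normalize every peak to a single unit before plotting. The sketch below assumes unit strings such as `'b'`, `'kb'`, `'mb'`, and `'gb'` with binary (1024-based) multiples. The actual strings returned by `get_unit` are not shown in this notebook, and `to_gb` is a hypothetical helper.

```python
# Assumed unit names and binary multiples; adjust to whatever get_unit reports.
UNIT_FACTORS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_gb(value, unit):
    """Convert a memory value in the given unit to GB (1024-based)."""
    key = unit.strip().lower()
    if key not in UNIT_FACTORS:
        raise ValueError(f"Unknown memory unit: {unit!r}")
    return value * UNIT_FACTORS[key] / UNIT_FACTORS["gb"]


print(to_gb(1048576, "kb"))  # 1.0
```

Raising on an unknown unit makes a mismatch visible immediately instead of producing a silently mis-scaled plot.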

Now we can plot the peak memory usage against the number of inlines.

In [None]:
import plotly.graph_objects as go

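# Session names look like '<inlines>-<xlines>-<samples>', so the inline count is the first field
def inlines_of(shape):
    return int(shape.split('-')[0])

data_sorted = sorted(peaks, key=lambda item: inlines_of(item[0]))
x_values = [inlines_of(item[0]) for item in data_sorted]
# Dividing by 1024 ** 2 yields GB only if the peaks are reported in KB (check the unit from get_unit)
y_values_gb = [item[1] / 1048576 for item in data_sorted]  # Memory usage in GB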

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_values, y=y_values_gb, mode='lines+markers', name='Memory Usage'))

fig.update_layout(
    xaxis_title="Inlines",
    yaxis_title="Memory Usage (GB)",
    xaxis=dict(showgrid=True, zeroline=True),
    yaxis=dict(showgrid=True, zeroline=True, exponentformat="E"),
    font=dict(family="Courier New, monospace", size=18, color="Black"),
    template="plotly_white"
)

fig.write_image(f'{OUTPUT_DIR}/memory-usage-over-inlines.pdf', format="pdf", engine="kaleido")
fig.show()