# Envelope Memory Usage with One Fixed Dimension

This experiment aims to evaluate the memory usage of the envelope seismic attribute operator with one fixed dimension.
In this notebook you will find:
- The problem statement
- The data collection for the experiment
- The evaluation of the experiment results

## Problem Statement

Following [experiment 002](./002-envelope-memory-usage-with-two-fixed-dimensions.ipynb), this experiment evaluates how the envelope attribute behaves when only a single dimension is kept fixed.
During the experiment, we plan to:

1. Generate synthetic seismic data.
2. Apply the envelope operator to the synthetic data using the [DASF](https://github.com/discovery-unicamp/dasf-core) framework.
3. Assess the memory usage during this process using [TraceQ](https://github.com/discovery-unicamp/traceq).
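
For reference, the envelope (instantaneous amplitude) of a trace $x(t)$ is conventionally defined as the magnitude of its analytic signal, $e(t) = \sqrt{x(t)^2 + \mathcal{H}\{x\}(t)^2}$, where $\mathcal{H}$ is the Hilbert transform. The minimal sketch below computes it with NumPy's FFT along the last (sample) axis. It illustrates the technique only; it is not DASF's implementation, which may differ in detail. Note that even this naive version materializes complex-valued intermediate arrays (the spectrum and the analytic signal), each at least twice the size of a real input of the same precision, which hints at why peak memory can exceed the raw cube size.

```python
import numpy as np


def envelope(trace: np.ndarray) -> np.ndarray:
    """Envelope (instantaneous amplitude) along the last axis via the analytic signal."""
    n = trace.shape[-1]
    spectrum = np.fft.fft(trace, axis=-1)

    # Weights that keep DC (and Nyquist for even n), double the positive
    # frequencies and zero the negative ones: this yields the analytic signal.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0

    analytic = np.fft.ifft(spectrum * h, axis=-1)  # h broadcasts over leading axes
    return np.abs(analytic)


# A cosine sampled over exactly 5 periods has a constant envelope of 1
t = np.arange(1000) / 1000
print(np.allclose(envelope(np.cos(2 * np.pi * 5 * t)), 1.0))  # True
```

Because the weights broadcast over the leading axes, the same function applies to a full `(inlines, xlines, samples)` cube.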

## Data Collection

In this section, we will outline the steps needed to collect the necessary data for our experiment. The process is organized into the following steps:

1. **Setup Environment:**
  - Configure the Python import path and the global constants used during the experiment.

2. **Setup Dependencies:**
  - Install the required dependencies into the virtual environment running this notebook.

3. **Setup Output Directory:**
  - In this step, we set up the output directory where the experiment results will be saved.

4. **Generate Synthetic Seismic Data:**
  - Generate synthetic seismic data within a specified range of dimensions.

5. **Execute the Envelope Operator:**
  - Apply the envelope operator to the synthetic data using the prepared environment and tools.

After completing these steps, we will have the data generated by TraceQ to evaluate the memory usage of the envelope operator.

### Setup Environment

During the environment setup, we need to:
- Properly configure `PYTHONPATH`
- Set up the global constants used during the experiment

Below, we extend `sys.path`, which Python initializes from `PYTHONPATH`, so that the tools written for these experiments can be imported.

In [1]:
import os
import sys

seismic_path = os.path.abspath('../tools/seismic')
traceq_path = os.path.abspath('../tools/traceq')

if seismic_path not in sys.path:
    sys.path.append(seismic_path)

if traceq_path not in sys.path:
    sys.path.append(traceq_path)

print(sys.path)

['/home/delucca/.pyenv/versions/3.10.14/lib/python310.zip', '/home/delucca/.pyenv/versions/3.10.14/lib/python3.10', '/home/delucca/.pyenv/versions/3.10.14/lib/python3.10/lib-dynload', '', '/home/delucca/.pyenv/versions/3.10.14/envs/seismic-attributes-memory-profile/lib/python3.10/site-packages', '/home/delucca/src/unicamp/msc/seismic-attributes-memory-profile/tools/seismic', '/home/delucca/src/unicamp/msc/seismic-attributes-memory-profile/tools/traceq']


Now, let's set up the global variables used throughout the experiment.

In [2]:
from pprint import pprint

LOG_TRANSPORTS = ['CONSOLE', 'FILE']
LOG_LEVEL = 'DEBUG'

NUM_INLINES = 200
NUM_XLINES = 200
NUM_SAMPLES = 200
STEP_SIZE = 100
RANGE_SIZE = 15

print('Experiment config:')
pprint({
    'LOG_TRANSPORTS': LOG_TRANSPORTS,
    'LOG_LEVEL': LOG_LEVEL,
    'NUM_INLINES': NUM_INLINES,
    'NUM_XLINES': NUM_XLINES,
    'NUM_SAMPLES': NUM_SAMPLES,
    'STEP_SIZE': STEP_SIZE,
    'RANGE_SIZE': RANGE_SIZE
}, indent=2, sort_dicts=True)

Experiment config:
{ 'LOG_LEVEL': 'DEBUG',
  'LOG_TRANSPORTS': ['CONSOLE', 'FILE'],
  'NUM_INLINES': 200,
  'NUM_SAMPLES': 200,
  'NUM_XLINES': 200,
  'RANGE_SIZE': 15,
  'STEP_SIZE': 100}


### Setup Dependencies

Before running this step, make sure you are running this notebook in the environment defined by the `.python-version` file.

In [3]:
%pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


We need to install the DASF core framework to execute the experiment.

In [4]:
%pip install git+https://github.com/discovery-unicamp/dasf-core.git

Collecting git+https://github.com/discovery-unicamp/dasf-core.git
  Cloning https://github.com/discovery-unicamp/dasf-core.git to /tmp/pip-req-build-ilcdjksp
  Running command git clone --filter=blob:none --quiet https://github.com/discovery-unicamp/dasf-core.git /tmp/pip-req-build-ilcdjksp
  Resolved https://github.com/discovery-unicamp/dasf-core.git to commit c4841cdeb92f596b0c011fe0480897731556c2dd
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting xpysom-dask@ git+https://github.com/jcfaracco/xpysom-dask#egg=master (from dasf==1.0b6)
  Cloning https://github.com/jcfaracco/xpysom-dask to /tmp/pip-install-wmcgplnu/xpysom-dask_06244f90655d4ddf8f00eec7e5a36c75
  Running command git clone --filter=blob:none --quiet https://github.com/jcfaracco/xpysom-dask /tmp/pip-install-wmcgplnu/xpysom-dask_06244f90655d4ddf8f00eec7e5a36c75
  Resolved https://github.com/jcf

Next, we install `dasf-seismic`, which is a private package.
You can install it over either SSH or HTTPS.

In the next cell, keep only the install line for the protocol you prefer uncommented.

In [5]:
# Using SSH:
%pip install git+ssh://git@github.com/discovery-unicamp/dasf-seismic.git
# Using HTTPS (replace <token> with your own personal access token):
# %pip install git+https://<token>@github.com/discovery-unicamp/dasf-seismic.git

%pip install torch

Collecting git+ssh://****@github.com/discovery-unicamp/dasf-seismic.git
  Cloning ssh://****@github.com/discovery-unicamp/dasf-seismic.git to /tmp/pip-req-build-t20e2hoj
  Running command git clone --filter=blob:none --quiet 'ssh://****@github.com/discovery-unicamp/dasf-seismic.git' /tmp/pip-req-build-t20e2hoj
  Resolved ssh://****@github.com/discovery-unicamp/dasf-seismic.git to commit 8037f403f79dd2a628a6d0d14543ead670eb01d4
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting dasf@ git+https://github.com/discovery-unicamp/dasf-core.git@main (from dasf-seismic==1.0b5)
  Cloning https://github.com/discovery-unicamp/dasf-core.git (to revision main) to /tmp/pip-install-j93_2dp5/dasf_c98e6bbd88714f92aa10bb113e5fff1a
  Running command git clone --filter=blob:none --quiet https://github.com/discovery-unicamp/dasf-core.git /tmp/pip-install-j93_2dp5/dasf_c98e6bbd88

Now, we need to install the dependencies for the tools we use during the experiment.

In [6]:
%pip install -r ../tools/seismic/requirements.txt
%pip install -r ../tools/traceq/requirements.txt

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


### Setup Output Directory

In [7]:
import uuid
import os

from datetime import datetime

EXPERIMENT_ID = f'003-{datetime.now().strftime("%Y%m%d%H%M%S")}-{uuid.uuid4().hex[:6]}'
OUTPUT_DIR = f'../output/{EXPERIMENT_ID}'

os.makedirs(OUTPUT_DIR)

OUTPUT_DIR

'../output/003-20240910224723-191b1c'

### Generate Synthetic Data

In [8]:
from seismic.data.synthetic import generate_and_save_for_range

DATA_OUTPUT_DIR = f'{OUTPUT_DIR}/experiment'

synthetic_data_paths = generate_and_save_for_range(
    NUM_INLINES,
    NUM_XLINES,
    NUM_SAMPLES,
    STEP_SIZE,
    RANGE_SIZE,
    output_dir=DATA_OUTPUT_DIR,
)

print(synthetic_data_paths)

2024-09-10 22:47:23.985 | INFO     | seismic.data.synthetic:generate_and_save_for_range:60 - Generating synthetic data with the following parameters:
2024-09-10 22:47:23.986 | INFO     | seismic.data.synthetic:generate_and_save_for_range:61 - Number of inlines: 200
2024-09-10 22:47:23.987 | INFO     | seismic.data.synthetic:generate_and_save_for_range:62 - Number of crosslines: 200
2024-09-10 22:47:23.987 | INFO     | seismic.data.synthetic:generate_and_save_for_range:63 - Number of samples: 200
2024-09-10 22:47:23.988 | INFO     | seismic.data.synthetic:generate_and_save_for_range:64 - Step size: 100
2024-09-10 22:47:23.988 | INFO     | seismic.data.synthetic:generate_and_save_for_range:65
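
The log above is truncated, but the session names collected in the evaluation section show which shapes were produced: the sample count stays at `NUM_SAMPLES`, one horizontal dimension stays at 200, and the other is swept over `STEP_SIZE`, `2 * STEP_SIZE`, ..., `RANGE_SIZE * STEP_SIZE`. The hypothetical helper `expected_shapes` below reproduces that grid as inferred from those names; it is a sketch, not the implementation of `generate_and_save_for_range`, whose internals are not shown here.

```python
def expected_shapes(num_inlines, num_xlines, num_samples, step_size, range_size):
    """Hypothetical reconstruction of the swept (inlines, xlines, samples) shapes.

    Inferred from the profile names: one horizontal dimension is swept over
    step_size, 2 * step_size, ..., range_size * step_size while the other keeps
    its base value, and the sample count never changes.
    """
    sweep = [step_size * k for k in range(1, range_size + 1)]
    shapes = {(inline, num_xlines, num_samples) for inline in sweep}
    shapes |= {(num_inlines, xline, num_samples) for xline in sweep}
    return sorted(shapes)


shapes = expected_shapes(200, 200, 200, 100, 15)
print(len(shapes))  # 29: two sweeps of 15 that share the (200, 200, 200) cube
```

Using a set avoids counting the shared base cube twice, which matches the 29 sessions reported later.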

### Execute the Envelope Attribute

In this step, we run the envelope attribute on each generated synthetic dataset, profiling every run with TraceQ.

In [9]:
import traceq

from seismic.attributes import envelope

for synthetic_data_path in synthetic_data_paths:
    shape = synthetic_data_path.split('/')[-1].split('.')[0]
    traceq.load_config(
        {
            "output_dir": f'{OUTPUT_DIR}/profile-{shape}',
            "logger": {
                "enabled_transports": LOG_TRANSPORTS,
                "level": LOG_LEVEL,
            },
            "profiler": {
                "session_id": shape,
                "memory_usage": {
                    "enabled_backends": ['kernel'],
                },
            },
        }
    )

    traceq.profile(envelope.run, synthetic_data_path)

2024-09-10 22:54:07.681 | INFO     | traceq.profiler.main:run_profiler:15 - Starting profiler
2024-09-10 22:54:07.682 | DEBUG    | traceq.profiler.builders:build_trace_hooks:20 - Building trace hooks for enabled metrics: [<Metric.MEMORY_USAGE: 'MEMORY_USAGE'>, <Metric.TIME: 'TIME'>]
2024-09-10 22:54:07.683 | INFO     | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:12 - Enabled memory usage backends: "[<MemoryUsageBackend.KERNEL: 'KERNEL'>]"
2024-09-10 22:54:07.684 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:23 - Loading backend: "kernel"
2024-09-10 22:54:07.686 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:31 - Loaded backen

## Evaluating Experiment Results

In [10]:
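import os
import gzip
import msgpack

# normalize_metadata() below needs load_profile, so import it in this cell
from traceq.profiler.loaders import load_profile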


def list_directories(path):
    entries = os.listdir(path)

    directories = [
        os.path.join(path, entry)
        for entry in entries
        if os.path.isdir(os.path.join(path, entry))
    ]
    return directories


def find_profiles(directory):
    # Recursively collect every TraceQ profile (*.prof) under the directory
    profile_files = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".prof"):
                full_path = os.path.join(root, file)
                profile_files.append(full_path)
    return profile_files


def get_session_paths(directory_path):
    dirs = list_directories(directory_path)
    shapes = [shape for shape in dirs if 'experiment' not in shape and 'checkpoints' not in shape]

    return [find_profiles(shape)[0] for shape in shapes]


def get_session_names(sessions):
    return [
        f"{'-'.join(os.path.basename(session).split('/')[-1].split('.')[0].split('-')[:2])}" for session in sessions
    ]


def normalize_metadata(sessions):
    for session in sessions:
        profile = load_profile(session)

        metadata = profile["metadata"]
        metadata_dict = {k: v for k, v in metadata.items()}

        entrypoint_segy_filepath = metadata_dict.pop("entrypoint_segy_filepath", None)
        if entrypoint_segy_filepath:
            entrypoint_shape = os.path.basename(entrypoint_segy_filepath).split(".")[0]
            entrypoint_shape = f"({entrypoint_shape.replace('-', ',')})"
            metadata_dict["entrypoint_shape"] = entrypoint_shape

        new_metadata = {k: v for k, v in metadata_dict.items()}
        profile["metadata"] = new_metadata

        with gzip.open(session, "wb") as f:
            packed = msgpack.packb(profile)
            f.write(packed)


session_paths = get_session_paths(OUTPUT_DIR)
normalize_metadata(session_paths)

session_names = get_session_names(session_paths)
zipped_sessions = list(zip(session_names, session_paths))

print(zipped_sessions)
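
`os.listdir` returns entries in arbitrary order, which is why the sessions above (and the peaks below) come out unsorted. If a deterministic order is preferred, one option is to parse each session name into integers and sort on them. `parse_session_name` and `sort_sessions` are hypothetical helpers, assuming names of the form `"<inline>-<xline>"` as produced by `get_session_names`.

```python
def parse_session_name(name):
    """Split a session name such as "200-1100" into an (inline, xline) tuple of ints."""
    inline, xline = name.split("-")
    return int(inline), int(xline)


def sort_sessions(zipped_sessions):
    """Order (name, path) pairs by inline count, then by crossline count."""
    return sorted(zipped_sessions, key=lambda pair: parse_session_name(pair[0]))


example = [("200-1100", "a.prof"), ("1400-200", "b.prof"), ("200-100", "c.prof")]
print([name for name, _ in sort_sessions(example)])
# ['200-100', '200-1100', '1400-200']
```

Sorting the raw strings would not work here, since `"1400-200"` sorts before `"200-100"` lexicographically.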

[('200-1100', '../output/003-20240910224723-191b1c/profile-200-1100-200/200-1100-200.prof'), ('200-900', '../output/003-20240910224723-191b1c/profile-200-900-200/200-900-200.prof'), ('100-200', '../output/003-20240910224723-191b1c/profile-100-200-200/100-200-200.prof'), ('1400-200', '../output/003-20240910224723-191b1c/profile-1400-200-200/1400-200-200.prof'), ('200-100', '../output/003-20240910224723-191b1c/profile-200-100-200/200-100-200.prof'), ('200-500', '../output/003-20240910224723-191b1c/profile-200-500-200/200-500-200.prof'), ('200-800', '../output/003-20240910224723-191b1c/profile-200-800-200/200-800-200.prof'), ('1000-200', '../output/003-20240910224723-191b1c/profile-1000-200-200/1000-200-200.prof'), ('200-1000', '../output/003-20240910224723-191b1c/profile-200-1000-200/200-1000-200.prof'), ('1500-200', '../output/003-20240910224723-191b1c/profile-1500-200-200/1500-200-200.prof'), ('800-200', '../output/003-20240910224723-191b1c/profile-800-200-200/800-200-200.prof'), ('400

With the metadata normalized and the sessions organized, we can now extract the peak memory usage of each profile.

In [11]:
from traceq.profiler.loaders import load_profile


def get_peak(profile_path):
    profile = load_profile(profile_path)
    data = profile['experiment']

    return max(item['kernel_memory_usage'] for item in data)


def get_unit(profile_path):
    profile = load_profile(profile_path)
    return profile['metadata']['kernel_memory_usage_unit']


peaks = [(shape, get_peak(profile_path), get_unit(profile_path)) for shape, profile_path in zipped_sessions]
print(peaks)

[('200-1100', 1872856.0, 'kb'), ('200-900', 1643200.0, 'kb'), ('100-200', 780704.0, 'kb'), ('1400-200', 2193256.0, 'kb'), ('200-100', 776788.0, 'kb'), ('200-500', 1218564.0, 'kb'), ('200-800', 1541740.0, 'kb'), ('1000-200', 1758756.0, 'kb'), ('200-1000', 1763140.0, 'kb'), ('1500-200', 2311224.0, 'kb'), ('800-200', 1549876.0, 'kb'), ('400-200', 1102592.0, 'kb'), ('1100-200', 1872508.0, 'kb'), ('200-600', 1322116.0, 'kb'), ('200-1500', 2311252.0, 'kb'), ('1200-200', 1986288.0, 'kb'), ('200-1200', 1988528.0, 'kb'), ('300-200', 990908.0, 'kb'), ('1300-200', 2095864.0, 'kb'), ('700-200', 1440016.0, 'kb'), ('500-200', 1222540.0, 'kb'), ('200-200', 882772.0, 'kb'), ('200-1400', 2191332.0, 'kb'), ('200-300', 990976.0, 'kb'), ('200-700', 1425888.0, 'kb'), ('200-1300', 2098040.0, 'kb'), ('600-200', 1328236.0, 'kb'), ('200-400', 1098636.0, 'kb'), ('900-200', 1653184.0, 'kb')]
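
To put these peaks in context, it can help to compare them with the nominal size of each input cube. The sketch below assumes the synthetic data is stored as `float32` (4 bytes per sample), which this notebook does not state, and that the reported `'kb'` unit means kibibytes (1024 bytes). `nominal_kib` and `peak_to_input_ratio` are hypothetical helpers.

```python
def nominal_kib(shape, bytes_per_sample=4):
    """Size of a dense (inlines, xlines, samples) cube in KiB; 4 bytes assumes float32."""
    inlines, xlines, samples = shape
    return inlines * xlines * samples * bytes_per_sample / 1024


def peak_to_input_ratio(name, peak_kib, num_samples=200):
    """Measured peak divided by the nominal input size for a session such as "1500-200"."""
    inlines, xlines = (int(part) for part in name.split("-"))
    return peak_kib / nominal_kib((inlines, xlines, num_samples))


print(nominal_kib((1500, 200, 200)))  # 234375.0
```

Applied to the list above, `[(name, round(peak_to_input_ratio(name, peak), 1)) for name, peak, _ in peaks]` gives one ratio per session. The measured peak also includes fixed process overhead, so the ratio is only a rough indicator, especially for the smallest cubes.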


Now we can plot the peak memory usage of each session against its inline and crossline counts.

In [29]:
import plotly.graph_objects as go

# Split each session name ("<inline>-<xline>") into the plot coordinates
inlines = [int(item[0].split('-')[0]) for item in peaks]
xlines = [int(item[0].split('-')[1]) for item in peaks]
memory_gb = [item[1] / 1048576 for item in peaks]  # Convert kB to GB (1 GB = 1024 * 1024 kB)

fig = go.Figure(data=[go.Scatter3d(
    x=inlines,
    y=xlines,
    z=memory_gb,
    mode='markers',
    marker=dict(
        size=5,
        color=memory_gb,
        colorscale='Viridis',
    ),
    line=dict(color='blue'),
)])

fig.update_layout(
    scene=dict(
        xaxis_title='Inline',
        yaxis_title='Xline',
        zaxis_title='Memory Usage (GB)',
        aspectmode='cube',
        camera=dict(
            eye=dict(x=-2.5, y=-1.5, z=2),
            center=dict(x=0, y=0, z=-1)
        ),
    ),
    font=dict(family="Courier New, monospace", size=12, color="Black"),
    template="plotly_white"
)

fig.write_image(f'{OUTPUT_DIR}/memory-usage-3d.pdf', format="pdf", engine="kaleido", width=900, height=700)
fig.show()
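
The 3D scatter shows both sweeps at once. To quantify how peak memory grows along each swept dimension, one option is to split the peaks into the inline sweep (crossline fixed at 200) and the crossline sweep (inline fixed at 200) and fit a straight line to each with `numpy.polyfit`. This is a sketch that assumes a roughly linear trend and the `(name, peak, unit)` tuples produced by the peaks cell; `split_sweeps` and `fit_slope` are hypothetical helpers.

```python
import numpy as np


def split_sweeps(peaks, base=200):
    """Split (name, peak_kb, unit) tuples into inline and crossline sweeps, each sorted by size."""
    inline_sweep, xline_sweep = [], []
    for name, peak, _unit in peaks:
        inlines, xlines = (int(part) for part in name.split("-"))
        if xlines == base:  # crossline fixed, inline swept
            inline_sweep.append((inlines, peak))
        if inlines == base:  # inline fixed, crossline swept
            xline_sweep.append((xlines, peak))
    return sorted(inline_sweep), sorted(xline_sweep)


def fit_slope(sweep):
    """Least-squares line through (size, peak) points; returns (slope, intercept)."""
    sizes, values = zip(*sweep)
    slope, intercept = np.polyfit(sizes, values, deg=1)
    return slope, intercept


# Synthetic check: an exactly linear sweep recovers its slope and intercept
demo = [("100-200", 1100.0, "kb"), ("200-200", 1200.0, "kb"), ("300-200", 1300.0, "kb")]
inline_demo, _ = split_sweeps(demo)
print(fit_slope(inline_demo))
```

With the real data, `fit_slope` applied to each list returned by `split_sweeps(peaks)` gives the extra kB of peak memory per additional inline or crossline. The shared 200-200 session belongs to both sweeps, and fitting each sweep separately keeps the fixed dimension out of the regression.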