# Memory Usage Backend Comparison

This experiment evaluates how different measurement backends report the memory usage of the envelope seismic attribute operator.
In this notebook you will find:
- The problem statement
- The data collection for the experiment
- The evaluation of the experiment results

## Problem statement

To evaluate the memory usage of the envelope seismic attribute operator, we need to compare the measurements reported by each of the available backends.
The available backends are:
- [resource](https://docs.python.org/3/library/resource.html)
- [psutil](https://psutil.readthedocs.io/en/latest/)
- [tracemalloc](https://docs.python.org/3/library/tracemalloc.html)
- [Direct kernel /proc file system](https://man7.org/linux/man-pages/man5/proc.5.html)

After running the experiment, we expect the measurements to be consistent, with similar memory usage reported across the different backends.
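
To make the comparison concrete, the sketch below reads the memory of the current process through the sources listed above that need no third-party package (`resource`, `tracemalloc`, and `/proc`). It is an illustrative, Linux-only sketch and not how `traceq` implements its backends; the helper name `read_memory_backends` is hypothetical. `psutil` is left out only because it is a third-party dependency. Note that each source reports a slightly different quantity (peak resident set size, Python-level allocations, current resident set size), which is worth keeping in mind when reading the results.

```python
import os
import resource
import tracemalloc


def read_memory_backends():
    """Return memory readings for the current process, in bytes.

    Hypothetical helper for illustration only; assumes Linux.
    """
    readings = {}

    # resource: peak resident set size; Linux reports it in kilobytes.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    readings["resource_peak_rss"] = peak_kb * 1024

    # tracemalloc: only memory allocated through Python's allocators,
    # and only while tracing is active.
    if tracemalloc.is_tracing():
        current, peak = tracemalloc.get_traced_memory()
        readings["tracemalloc_current"] = current
        readings["tracemalloc_peak"] = peak

    # /proc: current resident set size, reported in kB in the VmRSS field.
    status_path = "/proc/self/status"
    if os.path.exists(status_path):
        with open(status_path) as status_file:
            for line in status_file:
                if line.startswith("VmRSS:"):
                    readings["proc_current_rss"] = int(line.split()[1]) * 1024
                    break

    return readings


tracemalloc.start()
payload = bytearray(50 * 1024 * 1024)  # allocate about 50 MiB to measure
readings = read_memory_backends()
tracemalloc.stop()

for name, value in sorted(readings.items()):
    print(f"{name}: {value / 1e9:.3f} GB")
```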

## Data Collection

The first step for data collection is generating the synthetic data.
For testing purposes, we use the same data for all backends.

### Setup Environment

In [1]:
import sys
import os

seismic_path = os.path.abspath('../tools/seismic')
traceq_path = os.path.abspath('../tools/traceq')

if seismic_path not in sys.path:
    sys.path.append(seismic_path)

if traceq_path not in sys.path:
    sys.path.append(traceq_path)

print(sys.path)

['/home/delucca/.pyenv/versions/3.10.14/lib/python310.zip', '/home/delucca/.pyenv/versions/3.10.14/lib/python3.10', '/home/delucca/.pyenv/versions/3.10.14/lib/python3.10/lib-dynload', '', '/home/delucca/.pyenv/versions/3.10.14/envs/seismic-attributes-memory-profile/lib/python3.10/site-packages', '/home/delucca/src/unicamp/msc/seismic-attributes-memory-profile/tools/seismic', '/home/delucca/src/unicamp/msc/seismic-attributes-memory-profile/tools/traceq']


Now, let's set up some relevant global variables.

In [49]:
from pprint import pprint

NUM_INLINES = 600
NUM_XLINES = 600
NUM_SAMPLES = 600

LOG_TRANSPORTS = ['CONSOLE','FILE']
LOG_LEVEL = 'DEBUG'

UNIT = 'gb'

print('Experiment config:')
pprint({
    'NUM_INLINES': NUM_INLINES,
    'NUM_XLINES': NUM_XLINES,
    'NUM_SAMPLES': NUM_SAMPLES,
    'LOG_TRANSPORTS': LOG_TRANSPORTS,
    'LOG_LEVEL': LOG_LEVEL,
    'UNIT': UNIT,
}, indent=2, sort_dicts=True)

Experiment config:
{ 'LOG_LEVEL': 'DEBUG',
  'LOG_TRANSPORTS': ['CONSOLE', 'FILE'],
  'NUM_INLINES': 600,
  'NUM_SAMPLES': 600,
  'NUM_XLINES': 600,
  'UNIT': 'gb'}
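
As a rough reference for the measurements that follow, the raw size of the synthetic volume can be estimated from the shape configured above. The sketch assumes 4-byte samples (for example, 32-bit floats) and decimal units (1 GB = 10^9 bytes); both are assumptions, not something stated in this notebook, and the helper names are hypothetical. The estimate ignores SEG-Y headers and any intermediate arrays the operator allocates.

```python
def estimate_volume_bytes(num_inlines, num_xlines, num_samples, bytes_per_sample=4):
    """Raw size of a 3D volume in bytes (bytes_per_sample=4 is an assumption)."""
    return num_inlines * num_xlines * num_samples * bytes_per_sample


def to_unit(num_bytes, unit):
    """Convert bytes to a decimal unit such as 'mb' or 'gb' (assumed convention)."""
    factors = {"b": 1, "kb": 1e3, "mb": 1e6, "gb": 1e9}
    return num_bytes / factors[unit.lower()]


raw_bytes = estimate_volume_bytes(600, 600, 600)
print(raw_bytes)                 # 864000000
print(to_unit(raw_bytes, "gb"))  # 0.864
```

Under these assumptions, a single copy of the 600 x 600 x 600 volume takes a little under 1 GB, which gives a lower bound to compare the profiled values against.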


### Setup Dependencies


In [3]:
%pip install --upgrade pip

Collecting pip
  Using cached pip-24.2-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-24.2
Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install git+https://github.com/discovery-unicamp/dasf-core.git

Collecting git+https://github.com/discovery-unicamp/dasf-core.git
  Cloning https://github.com/discovery-unicamp/dasf-core.git to /tmp/pip-req-build-a_991aaz
  Running command git clone --filter=blob:none --quiet https://github.com/discovery-unicamp/dasf-core.git /tmp/pip-req-build-a_991aaz
  Resolved https://github.com/discovery-unicamp/dasf-core.git to commit d18f00e8ab682d601ab076c2fbabaaca1338036e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting xpysom-dask@ git+https://github.com/jcfaracco/xpysom-dask#egg=master (from dasf==1.0b6)
  Cloning https://github.com/jcfaracco/xpysom-dask to /tmp/pip-install-n192ahto/xpysom-dask_7025aa985e414c2c8fbb2052f6fb8bf1
  Running command git clone --filter=blob:none --quiet https://github.com/jcfaracco/xpysom-dask /tmp/pip-install-n192ahto/xpysom-dask_7025aa985e414c2c8fbb2052f6fb8bf1
  Resolved https://github.com/jcf

In [12]:
%pip install git+ssh://git@github.com/discovery-unicamp/dasf-seismic.git ### Using SSH
# %pip install git+https://<token>@github.com/discovery-unicamp/dasf-seismic.git ### Using HTTPS -> Don't forget to replace <token> with your own PAT

%pip install torch

Collecting git+ssh://****@github.com/discovery-unicamp/dasf-seismic.git
  Cloning ssh://****@github.com/discovery-unicamp/dasf-seismic.git to /tmp/pip-req-build-d61s9dn9
  Running command git clone --filter=blob:none --quiet 'ssh://****@github.com/discovery-unicamp/dasf-seismic.git' /tmp/pip-req-build-d61s9dn9
  Resolved ssh://****@github.com/discovery-unicamp/dasf-seismic.git to commit 8037f403f79dd2a628a6d0d14543ead670eb01d4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting dasf@ git+https://github.com/discovery-unicamp/dasf-core.git@main (from dasf-seismic==1.0b5)
  Cloning https://github.com/discovery-unicamp/dasf-core.git (to revision main) to /tmp/pip-install-kwj34pgn/dasf_8c272bd307444f32b911a4cbcc7b5df1
  Running command git clone --filter=blob:none --quiet https://github.com/discovery-unicamp/dasf-core.git /tmp/pip-install-kwj34pgn/dasf_8c272bd307

In [8]:
%pip install -r ../tools/seismic/requirements.txt
%pip install -r ../tools/traceq/requirements.txt

Collecting scipy==1.10.1 (from -r ../tools/seismic/requirements.txt (line 2))
  Downloading scipy-1.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
Downloading scipy-1.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.4/34.4 MB 24.9 MB/s eta 0:00:00
Installing collected packages: scipy
  Attempting uninstall: scipy
    Found existing installation: scipy 1.14.1
    Uninstalling scipy-1.14.1:
      Successfully uninstalled scipy-1.14.1
Successfully installed scipy-1.10.1
Note: you may need to restart the kernel to use updated packages.
Collecting toml==0.10.2 (from -r ../tools/traceq/requirements.txt (line 1))
  Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
Collecting loguru==0.7.2 (from -r ../tools/traceq/requirements.txt (line 3))
  Downloading loguru-0.7.2-py3-none-any.whl.metadata (23 kB)
Collecting py

### Setup Output Directory

In [9]:
import uuid
import os

from datetime import datetime

EXPERIMENT_ID = f'001-{datetime.now().strftime("%Y%m%d%H%M%S")}-{uuid.uuid4().hex[:6]}'
OUTPUT_DIR = f'../output/{EXPERIMENT_ID}'

os.makedirs(OUTPUT_DIR)

OUTPUT_DIR

'../output/001-20240910000024-3747ca'

### Generate Synthetic Data

In [10]:
from seismic.data.synthetic import generate_and_save_synthetic_data

DATA_OUTPUT_DIR = f'{OUTPUT_DIR}/data'

synthetic_data_path = generate_and_save_synthetic_data(
    NUM_INLINES,
    NUM_XLINES,
    NUM_SAMPLES,
    output_dir=DATA_OUTPUT_DIR,
)
print(synthetic_data_path)

2024-09-10 00:00:25.147 | INFO     | seismic.data.synthetic:generate_and_save_synthetic_data:130 - Generating synthetic data for shape (600, 600, 600)
../output/001-20240910000024-3747ca/data/600-600-600.segy


### Collecting Data for Resource Backend

In [41]:
import traceq

from seismic.attributes import envelope

traceq.load_config(
    {
        "output_dir": f'{OUTPUT_DIR}/resource',
        "logger": {
            "enabled_transports": LOG_TRANSPORTS,
            "level": LOG_LEVEL,
        },
        "profiler": {
            "session_id": 'resource',
            "memory_usage": {
                "enabled_backends": ['resource'],
            },
        },
    }
)

traceq.profile(envelope.run, synthetic_data_path)

2024-09-10 00:17:17.571 | INFO     | traceq.profiler.main:run_profiler:15 - Starting profiler
2024-09-10 00:17:17.571 | DEBUG    | traceq.profiler.builders:build_trace_hooks:20 - Building trace hooks for enabled metrics: [<Metric.MEMORY_USAGE: 'MEMORY_USAGE'>, <Metric.TIME: 'TIME'>]
2024-09-10 00:17:17.573 | INFO     | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:12 - Enabled memory usage backends: "[<MemoryUsageBackend.RESOURCE: 'RESOURCE'>]"
2024-09-10 00:17:17.574 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:23 - Loading backend: "resource"
2024-09-10 00:17:17.575 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:31 - Loaded

### Collecting Data for Psutil Backend

In [42]:
import traceq

from seismic.attributes import envelope

traceq.load_config(
    {
        "output_dir": f'{OUTPUT_DIR}/psutil',
        "logger": {
            "enabled_transports": LOG_TRANSPORTS,
            "level": LOG_LEVEL,
        },
        "profiler": {
            "session_id": 'psutil',
            "memory_usage": {
                "enabled_backends": ['psutil'],
            },
        },
    }
)

traceq.profile(envelope.run, synthetic_data_path)

2024-09-10 00:17:25.119 | INFO     | traceq.profiler.main:run_profiler:15 - Starting profiler
2024-09-10 00:17:25.120 | DEBUG    | traceq.profiler.builders:build_trace_hooks:20 - Building trace hooks for enabled metrics: [<Metric.MEMORY_USAGE: 'MEMORY_USAGE'>, <Metric.TIME: 'TIME'>]
2024-09-10 00:17:25.121 | INFO     | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:12 - Enabled memory usage backends: "[<MemoryUsageBackend.PSUTIL: 'PSUTIL'>]"
2024-09-10 00:17:25.122 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:23 - Loading backend: "psutil"
2024-09-10 00:17:25.123 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:31 - Loaded backen

### Collecting Data for Tracemalloc Backend

In [43]:
import traceq

from seismic.attributes import envelope

traceq.load_config(
    {
        "output_dir": f'{OUTPUT_DIR}/tracemalloc',
        "logger": {
            "enabled_transports": LOG_TRANSPORTS,
            "level": LOG_LEVEL,
        },
        "profiler": {
            "session_id": 'tracemalloc',
            "memory_usage": {
                "enabled_backends": ['tracemalloc'],
            },
        },
    }
)

traceq.profile(envelope.run, synthetic_data_path)

2024-09-10 00:17:30.539 | INFO     | traceq.profiler.main:run_profiler:15 - Starting profiler
2024-09-10 00:17:30.540 | DEBUG    | traceq.profiler.builders:build_trace_hooks:20 - Building trace hooks for enabled metrics: [<Metric.MEMORY_USAGE: 'MEMORY_USAGE'>, <Metric.TIME: 'TIME'>]
2024-09-10 00:17:30.541 | INFO     | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:12 - Enabled memory usage backends: "[<MemoryUsageBackend.TRACEMALLOC: 'TRACEMALLOC'>]"
2024-09-10 00:17:30.542 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:23 - Loading backend: "tracemalloc"
2024-09-10 00:17:30.542 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:31 -

### Collecting Data for Kernel Backend

In [44]:
import traceq

from seismic.attributes import envelope

traceq.load_config(
    {
        "output_dir": f'{OUTPUT_DIR}/kernel',
        "logger": {
            "enabled_transports": LOG_TRANSPORTS,
            "level": LOG_LEVEL,
        },
        "profiler": {
            "session_id": 'kernel',
            "memory_usage": {
                "enabled_backends": ['kernel'],
            },
        },
    }
)

traceq.profile(envelope.run, synthetic_data_path)

2024-09-10 00:17:45.601 | INFO     | traceq.profiler.main:run_profiler:15 - Starting profiler
2024-09-10 00:17:45.602 | DEBUG    | traceq.profiler.builders:build_trace_hooks:20 - Building trace hooks for enabled metrics: [<Metric.MEMORY_USAGE: 'MEMORY_USAGE'>, <Metric.TIME: 'TIME'>]
2024-09-10 00:17:45.603 | INFO     | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:12 - Enabled memory usage backends: "[<MemoryUsageBackend.KERNEL: 'KERNEL'>]"
2024-09-10 00:17:45.603 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:23 - Loading backend: "kernel"
2024-09-10 00:17:45.604 | DEBUG    | traceq.profiler.metrics.memory_usage.builders:build_trace_hooks:31 - Loaded backen

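The four collection cells above differ only in the backend name. A minimal sketch of factoring out the shared configuration follows; `build_backend_config` is a hypothetical helper, `../output/example` is a placeholder path, and the dictionary layout simply mirrors the one passed to `traceq.load_config` in the cells above.

```python
def build_backend_config(output_dir, backend, log_transports, log_level):
    """Build the config dict for one memory usage backend (hypothetical helper)."""
    return {
        "output_dir": f"{output_dir}/{backend}",
        "logger": {
            "enabled_transports": log_transports,
            "level": log_level,
        },
        "profiler": {
            "session_id": backend,
            "memory_usage": {
                "enabled_backends": [backend],
            },
        },
    }


BACKENDS = ["resource", "psutil", "tracemalloc", "kernel"]

# "../output/example" is a placeholder; the notebook would pass OUTPUT_DIR.
configs = {
    backend: build_backend_config("../output/example", backend, ["CONSOLE", "FILE"], "DEBUG")
    for backend in BACKENDS
}

for backend, config in configs.items():
    print(backend, "->", config["output_dir"])
    # In the notebook, each config would then be used as in the cells above:
    # traceq.load_config(config)
    # traceq.profile(envelope.run, synthetic_data_path)
```

Keeping the configuration in one place makes it harder for the backends to drift apart accidentally, for example by changing the log level in only one cell.
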
## Evaluating Experiment Results

First, we need to load all profiles into variables.

In [48]:
import os
import gzip
import msgpack

from traceq.profiler.loaders import load_profile

def list_directories(path):
    entries = os.listdir(path)

    directories = [
        os.path.join(path, entry)
        for entry in entries
        if os.path.isdir(os.path.join(path, entry))
    ]
    return directories

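def find_profiles(directory):
    # Recursively collect the paths of all ".prof" files under `directory`.
    profile_paths = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".prof"):
                full_path = os.path.join(root, file)
                profile_paths.append(full_path)
    return profile_paths

def get_session_paths(directory_path):
    dirs = list_directories(directory_path)
    # Match on the directory name only, so a parent path that happens to
    # contain "data" or "checkpoints" does not filter out every backend.
    backends = [
        backend for backend in dirs
        if 'data' not in os.path.basename(backend)
        and 'checkpoints' not in os.path.basename(backend)
    ]

    # Each backend directory is expected to hold one profile; take the first.
    return [find_profiles(backend)[0] for backend in backends]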

def get_session_names(sessions):
    return [
        os.path.basename(session).split("-")[-1].split(".")[0] for session in sessions
    ]

def normalize_metadata(sessions):
    for session in sessions:
        profile = load_profile(session)

        # Work on a copy of the metadata and write it back once, below.
        metadata = dict(profile["metadata"])

        # Replace the SEG-Y file path with the data shape encoded in its name,
        # e.g. "600-600-600.segy" becomes "(600,600,600)".
        entrypoint_segy_filepath = metadata.pop("entrypoint_segy_filepath", None)
        if entrypoint_segy_filepath:
            entrypoint_shape = os.path.basename(entrypoint_segy_filepath).split(".")[0]
            entrypoint_shape = f"({entrypoint_shape.replace('-',',')})"
            metadata["entrypoint_shape"] = entrypoint_shape

        profile["metadata"] = metadata

        with gzip.open(session, "wb") as f:
            packed = msgpack.packb(profile)
            f.write(packed)

session_paths = get_session_paths(OUTPUT_DIR)
normalize_metadata(session_paths)

session_names = get_session_names(session_paths)
zipped_sessions = zip(session_names, session_paths)

traceq.compare_profiles(
    {session_name: session for session_name, session in zipped_sessions},
    UNIT,
    output_dir=OUTPUT_DIR,
)

2024-09-10 00:19:08.771 | INFO     | traceq.analyzer.main:compare_profiles:16 - Starting profiles comparison
2024-09-10 00:19:08.772 | DEBUG    | traceq.analyzer.main:compare_profiles:17 - Using config: output_dir=PosixPath('../output/001-20240910000024-3747ca') logger=LoggerConfig(enabled_transports=[<Transport.CONSOLE: 'CONSOLE'>, <Transport.FILE: 'FILE'>], level=<Level.DEBUG: 'DEBUG'>) profiler=ProfilerConfig(session_id='kernel', enabled_metrics=[<Metric.MEMORY_USAGE: 'MEMORY_USAGE'>, <Metric.TIME: 'TIME'>], memory_usage=MemoryUsageConfig(enabled_backends=[<MemoryUsageBackend.KERNEL: 'KERNEL'>]), filepath=PosixPath('/home/delucca/src/unicamp/msc/seismic-attributes-memory-profile/tools/seismic/seismic/attributes/envelope.py'), entrypoint='run', signature=[FunctionParameter(name='segy_filepath', type='str', position=0, default='None'), FunctionParameter(name='n_workers', type=