# Experiment Notebook

## Introduction

This notebook walks through the individual components of the Memory Consumption Measurement Experiment, the first experiment in the Memory-Aware Chunking thesis.

The goal of this experiment is to analyze how Python programs consume memory under different conditions and configurations.
Rather than running the full experiment inside the notebook, this document is designed to explain and validate the individual components used in the experiment, helping to ensure correctness before execution.

### Notebook Structure

The notebook is divided into multiple sections, each focusing on a specific aspect of the experiment:

- **Background and Motivation:** A brief explanation of why measuring memory consumption is important and how it fits into the broader context of memory-aware chunking.
- **Methodology:** A description of how memory is measured, what tools are used, and what metrics are collected.
- **Evaluation of Experiment Components:** Individual Jupyter Notebook cells will showcase and validate key components of the experiment.

The actual experiment is executed outside of this notebook, using a shell script that automates the memory measurements for different test cases.
**This notebook does not run the experiment itself,** but ensures that all the pieces function correctly.

### How to Use this Notebook

- Run the individual cells to inspect and verify the behavior of specific parts of the experiment.
- Review the expected memory usage patterns before running the full experiment via the shell script.
- Use this notebook as a debugging and documentation tool to support future iterations of the experiment.

By structuring the experiment this way, we ensure a clear separation between explanation, validation, and execution, making it easier to reason about the results while keeping the experiment reproducible and well-documented.

## Background and Motivation

Measuring the memory consumption of Python programs is a fundamental aspect of performance analysis, particularly in computationally intensive workflows such as scientific computing and data analysis.
Many scientific applications involve high-dimensional datasets, large-scale computations, and complex algorithms that require execution in high-performance computing (HPC) environments such as supercomputers and distributed systems.
In these settings, resource allocation is critical: inefficient memory management can lead to unnecessary computational costs, reduced system throughput, and even job failures due to memory exhaustion.

Beyond resource management, precise memory measurement plays a crucial role in algorithm evaluation and reproducibility.
Without accurate memory profiling, researchers and engineers risk drawing misleading conclusions about an algorithm’s efficiency and scalability.
This is particularly important in the context of memory-aware chunking, where the goal is to develop methods that dynamically adjust computational workloads based on available memory.
Understanding memory consumption patterns is essential for designing smarter memory-aware algorithms that can optimize resource usage without compromising performance.

However, accurately measuring memory usage in Python is not straightforward.
Several factors affect how memory consumption is reported, including:

- Python’s memory model (dynamic memory allocation, garbage collection)
- Memory fragmentation caused by Python’s internal allocator and C-based libraries
- Operating system optimizations (virtual memory, copy-on-write, memory compression)
- Caching mechanisms that persist across executions
- Third-party libraries (e.g., NumPy, TensorFlow) that handle memory allocation independently of Python’s memory management system

These factors introduce variability in memory measurements, making it challenging to isolate the true memory footprint of a given computation.
This experiment aims to address these challenges by systematically measuring Python’s memory consumption under controlled conditions, applying different measurement techniques, and identifying the most reliable approaches for scientific workloads.

## Methodology

The experiment is designed to systematically measure the memory consumption of Python programs using a controlled and reproducible approach.
Since Python’s memory management is influenced by multiple factors such as dynamic allocation, garbage collection, and OS-level optimizations, this experiment employs a combination of **internal and external memory measurement techniques** to obtain a comprehensive view of memory usage.

Instead of running the experiment inside the Jupyter Notebook, the actual execution will be performed via a **shell script**.
The notebook serves as a structured guide, validating individual components and explaining the methodology used to ensure correctness before execution.

### 1. Experiment Execution Workflow

![Experiment execution flowchart](../../thesis/assets/images/04-experiment-flowchart.png)

The experiment follows the structured workflow shown in the flowchart above to ensure consistency across runs and minimize external interference:

#### **Step 1: Data Generation**
- The experiment begins by generating **synthetic datasets** designed to simulate real-world computational workloads.
- These datasets vary in **size and structure** to test memory consumption across different scenarios.
- The datasets are **saved to disk** to ensure consistency between runs.
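
The data generation step can be sketched roughly as follows. This is a minimal illustration using only the standard library, assuming a dataset is a seeded block of float64 values written to a binary file. The function name `generate_dataset`, the file layout, the sizes, and the seed are all illustrative assumptions, not the experiment's actual generator.

```python
import array
import random
import tempfile
from pathlib import Path


def generate_dataset(path: Path, n_values: int, seed: int = 42) -> Path:
    """Write n_values pseudo-random float64 values to path (hypothetical layout)."""
    rng = random.Random(seed)  # fixed seed so every run regenerates identical data
    values = array.array("d", (rng.random() for _ in range(n_values)))
    with open(path, "wb") as fh:
        values.tofile(fh)
    return path


out_dir = Path(tempfile.mkdtemp())
# Illustrative sizes only; the real experiment defines its own dataset shapes.
dataset_paths = [
    generate_dataset(out_dir / f"dataset_{n}.bin", n) for n in (1_000, 10_000)
]
for p in dataset_paths:
    print(p.name, p.stat().st_size, "bytes")
```

Writing the data to disk once and reusing the files keeps the input identical across runs, so differences in measured memory cannot come from the data itself.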

#### **Step 2: Isolated Execution Environment**
- Each execution is performed in a **separate process** to prevent memory contamination from previous runs.
- For full-scale tests, the experiment runs inside a **dedicated shell script** that ensures **a clean execution environment**.
- The script starts the experiment, executes the Python program, and logs memory usage data.
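
As a rough sketch of process isolation, the snippet below starts a fresh interpreter for every run, so no allocation from one run can survive into the next. The workload string and the helper name `run_isolated` are hypothetical. The actual experiment uses a shell script for this step.

```python
import os
import subprocess
import sys

# Hypothetical workload: allocate a buffer and report the child's own PID.
WORKLOAD = "import os; buf = bytearray(10_000_000); print(os.getpid())"


def run_isolated(code: str) -> str:
    """Run code in a fresh interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


pids = [run_isolated(WORKLOAD) for _ in range(3)]
print("parent:", os.getpid(), "children:", pids)
```

Each child reports a PID different from the parent's, which confirms that the workload ran in its own process with its own heap.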

#### **Step 3: Memory Profiling & Data Collection**
- Memory consumption is measured using multiple techniques, capturing **peak memory usage** and **memory usage over time**.
- Both **internal and external profiling tools** are used to compare results:
  - **Internal Tools:** `tracemalloc`
  - **External Tools:** `psutil`, `resource`, `/proc` filesystem
- The experiment logs key memory statistics for later analysis.

#### **Step 4: Cleanup & Reproducibility**
- After execution, **all allocated memory is released**, and any temporary files are cleaned up.
- The experiment is repeated **under identical conditions** to verify result consistency.

### 2. Measurement Techniques
Since different tools provide different perspectives on memory consumption, the experiment leverages a mix of measurement approaches:

#### **External Measurement Techniques (System-Level)**
- **`psutil` Library:** Tracks process-level memory usage (Resident Set Size, Virtual Memory Size).
- **`/proc` Filesystem:** Extracts memory data directly from Linux kernel statistics.
- **`resource` Module:** Reports the peak resident set size of the process (`ru_maxrss` from `getrusage`) over its lifetime.
- **Docker API (if used):** Monitors memory consumption of containerized executions.
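
The system-level readings listed above can be illustrated with a short sketch that uses only the standard library. The helper names `peak_rss_bytes` and `proc_status_bytes` are assumptions made for this example, and the `/proc` reader only returns values on Linux.

```python
import resource
import sys
from typing import Optional


def peak_rss_bytes() -> int:
    """Peak resident set size of this process via resource.getrusage."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def proc_status_bytes(field: str) -> Optional[int]:
    """Read a field such as 'VmRSS' or 'VmHWM' from /proc/self/status (Linux only)."""
    try:
        with open("/proc/self/status") as fh:
            for line in fh:
                if line.startswith(field + ":"):
                    return int(line.split()[1]) * 1024  # value is given in kB
    except FileNotFoundError:
        pass
    return None


payload = bytearray(50 * 1024 * 1024)  # allocate ~50 MiB so the numbers move
print("peak RSS (resource):", peak_rss_bytes())
print("current RSS (/proc):", proc_status_bytes("VmRSS"))
print("peak RSS (/proc):   ", proc_status_bytes("VmHWM"))
```

If `psutil` is installed, `psutil.Process().memory_info()` exposes comparable figures through its `rss` and `vms` fields.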

#### **Internal Measurement Techniques (Python-Level)**
- **`tracemalloc` Module:** Traces memory blocks allocated by Python and attributes them to the source lines that allocated them, giving a fine-grained view of Python-level allocations.
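
A minimal `tracemalloc` sketch is shown below. The allocation pattern is a made-up example chosen so that the traced numbers are easy to interpret.

```python
import tracemalloc

tracemalloc.start()

data = [bytes(1024) for _ in range(10_000)]  # roughly 10 MB of small allocations
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()

tracemalloc.stop()

print(f"current={current} B, peak={peak} B")
# Group traced blocks by the source line that allocated them.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

Because `tracemalloc` only sees memory requested through Python's allocators, its figures can differ from the resident set size reported by system-level tools.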

Using multiple techniques ensures that **both high-level and granular memory usage details** are captured, allowing for cross-validation between different tools.

### 3. Key Metrics Collected
The experiment records the following key memory usage metrics:

| Metric | Symbol | Description |
|--------|--------|-------------|
| **Execution Time** | `T` | Total runtime of the program, measured in seconds. |
| **Peak Memory Usage** | `M_peak` | The highest memory usage observed during execution. |
| **Memory Usage Over Time** | `M_t` | Tracks memory consumption at different points during execution. |

These metrics provide insights into **memory efficiency, peak consumption points, and performance trends**.
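
To make the three metrics concrete, the sketch below collects `T`, `M_peak`, and `M_t` for a single workload. It uses `tracemalloc` as the memory source and a background thread as the sampler. Both choices, as well as the names `profile` and `example_workload` and the sampling interval, are assumptions for illustration only.

```python
import threading
import time
import tracemalloc


def profile(workload, interval_s: float = 0.01):
    """Return (T, M_peak, M_t) for workload, using tracemalloc as the memory source."""
    samples = []  # M_t as (seconds since start, traced bytes)
    stop = threading.Event()

    def sampler(start: float) -> None:
        while not stop.is_set():
            now = time.perf_counter() - start
            samples.append((now, tracemalloc.get_traced_memory()[0]))
            stop.wait(interval_s)

    tracemalloc.start()
    start = time.perf_counter()
    thread = threading.Thread(target=sampler, args=(start,), daemon=True)
    thread.start()
    try:
        workload()
    finally:
        elapsed = time.perf_counter() - start
        stop.set()
        thread.join()
        m_peak = tracemalloc.get_traced_memory()[1]
        tracemalloc.stop()
    return elapsed, m_peak, samples


def example_workload() -> None:
    # Hypothetical workload: grow a list of 1 MiB buffers, then release it.
    buffers = []
    for _ in range(20):
        buffers.append(bytearray(1024 * 1024))
        time.sleep(0.005)
    del buffers


T, M_peak, M_t = profile(example_workload)
print(f"T={T:.3f} s, M_peak={M_peak} B, samples={len(M_t)}")
```

The largest sampled value in `M_t` can be lower than `M_peak`, because periodic sampling may miss short-lived spikes that a peak counter still records.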

### 4. Experiment Automation & Logging
- The experiment runs **automatically** via a shell script that:
  - Initializes the execution environment.
  - Runs the Python program with logging enabled.
  - Collects memory statistics and execution logs.
- Output data is stored in structured logs for **post-experiment analysis**.
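
One possible shape for the structured logs is one JSON object per line. The sketch below assumes that format. The field names and the zero values are placeholders, not the experiment's actual schema or results.

```python
import json
import tempfile
import time
from pathlib import Path


def append_record(log_path: Path, record: dict) -> None:
    """Append one measurement as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_records(log_path: Path) -> list:
    """Load every non-empty JSON line from the log."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


log_path = Path(tempfile.mkdtemp()) / "measurements.jsonl"
# Field names and values are placeholders, not results from the experiment.
for run_id in range(2):
    append_record(
        log_path,
        {"run": run_id, "tool": "tracemalloc", "T": 0.0, "M_peak": 0, "logged_at": time.time()},
    )

records = read_records(log_path)
print(len(records), "records;", sorted(records[0]))
```

Appending line by line means a crashed run leaves earlier records intact, which helps when runs end through memory exhaustion.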

### 5. Validation & Reproducibility
To ensure accurate results:
- The experiment is **repeated multiple times** under the same conditions.
- Memory measurements from different tools are **compared** to detect inconsistencies.
- If applicable, the system is **restarted between runs** to remove lingering memory allocations.
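
The repetition and comparison steps can be sketched as follows. Each run happens in a fresh child process that reports two peak estimates, and the parent summarizes the spread per tool. The child workload, the key names, and the choice of three runs are assumptions made for this example.

```python
import json
import statistics
import subprocess
import sys

# Hypothetical child workload: allocate ~20 MB and report two peak estimates.
# ru_maxrss is left in its raw unit (kilobytes on Linux, bytes on macOS).
CHILD = """
import json, resource, tracemalloc
tracemalloc.start()
buf = bytearray(20_000_000)
print(json.dumps({
    "tracemalloc_peak": tracemalloc.get_traced_memory()[1],
    "ru_maxrss_raw": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
}))
"""


def run_once() -> dict:
    """Run the child workload in a fresh interpreter and parse its report."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True, check=True
    )
    return json.loads(out.stdout)


runs = [run_once() for _ in range(3)]
for key in runs[0]:
    values = [r[key] for r in runs]
    mean = statistics.mean(values)
    spread = (max(values) - min(values)) / mean if mean else 0.0
    print(f"{key}: mean={mean:.0f}, relative spread={spread:.2%}")
```

A large relative spread for one tool, or a persistent gap between tools, is the kind of inconsistency this step is meant to surface.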

## Evaluation of Experiment Components

### 1. Validation of Data Generation
The goals of this section are:

- Ensure the synthetic datasets are correctly created and saved.
- Verify the consistency of dataset sizes and structures across runs.
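
A simple way to check both goals is to compare file sizes and content hashes of datasets produced by separate runs. The sketch below assumes a stand-in generator, `write_dataset`, that emits seeded pseudo-random bytes. The real generator and file names will differ.

```python
import hashlib
import random
import tempfile
from pathlib import Path


def write_dataset(path: Path, n_bytes: int, seed: int) -> None:
    # Stand-in generator (assumption): seeded pseudo-random bytes.
    rng = random.Random(seed)
    path.write_bytes(bytes(rng.getrandbits(8) for _ in range(n_bytes)))


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


tmp = Path(tempfile.mkdtemp())
run_a, run_b = tmp / "run_a.bin", tmp / "run_b.bin"
write_dataset(run_a, 4096, seed=7)
write_dataset(run_b, 4096, seed=7)

same_size = run_a.stat().st_size == run_b.stat().st_size
same_content = sha256_of(run_a) == sha256_of(run_b)
print("identical size:", same_size, "| identical content:", same_content)
```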

### 2. Validation of Execution Environment
The goals of this section are:

- Check if the experiment runs in an isolated process or container.
- Confirm that no memory contamination occurs between runs.

### 3. Verification of Memory Profiling Techniques
The goals of this section are:

- Test and compare memory measurement tools (`psutil`, `resource`, `tracemalloc`, `/proc` filesystem).
- Validate consistency between internal and external profiling data.

### 4. Logging and Data Collection Evaluation
The goals of this section are:

- Ensure memory statistics and execution logs are properly recorded.
- Verify the correctness and completeness of logged data.

### 5. Reproducibility Tests
The goals of this section are:

- Run the same experiment multiple times to detect inconsistencies.
- Evaluate the impact of environment resets (e.g., restarting Python/kernel).

### 6. Peak Memory Usage Analysis
The goals of this section are:

- Validate that peak memory usage is correctly captured.
- Compare reported peak values across different measurement techniques.

### 7. Time-Series Memory Usage Evaluation
The goals of this section are:

- Examine memory usage trends over time (`M_t`).
- Identify memory spikes, leaks, and unexpected variations.

### 8. External Monitoring Cross-Validation
The goals of this section are:

- Compare internal memory profiling results with Docker API or `/proc` measurements.
- Detect discrepancies between system-level and Python-level memory reports.

### 9. Controlled Memory Constraint Tests
The goals of this section are:

- Run experiments with memory constraints to validate `M_peak` accuracy.
- Ensure that measured peak memory usage reflects the actual minimum required memory.
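
One way to approximate a memory constraint without containers is to cap the address space of a child process with `RLIMIT_AS`. This is a Linux-oriented sketch under stated assumptions: the child script, the helper name `fits`, and the sizes are hypothetical, and `RLIMIT_AS` limits virtual address space rather than resident memory, so it is only a proxy for validating `M_peak`.

```python
import subprocess
import sys

# Child: cap its own address space, then attempt a single allocation.
CHILD = """
import resource, sys
limit, n_bytes = int(sys.argv[1]), int(sys.argv[2])
_, hard = resource.getrlimit(resource.RLIMIT_AS)
soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
try:
    buf = bytearray(n_bytes)
    print("ok")
except MemoryError:
    print("oom")
"""


def fits(limit_bytes: int, n_bytes: int) -> bool:
    """Return True if allocating n_bytes succeeds under the given address-space cap."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(n_bytes)],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip() == "ok"


MIB = 1024 ** 2
print("32 MiB under a 512 MiB cap:", fits(512 * MIB, 32 * MIB))
print("1 GiB under a 512 MiB cap: ", fits(512 * MIB, 1024 * MIB))
```

Lowering the cap step by step until the workload fails gives an empirical lower bound that can be compared against the measured `M_peak`.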

## Summary of Findings

TODO