DataFrame memory management utility, del_inactive_dataframes #115

Merged
lshpaner merged 4 commits into main from del_inactive_dataframes on Dec 20, 2025

@Oscar-Gil-Data
Collaborator

DataFrame memory management

The del_inactive_dataframes function is a utility for managing pandas DataFrames in
interactive, cloud, and local Python environments. It helps reduce memory pressure by
identifying inactive DataFrames and optionally deleting them from a given namespace,
while preserving a specified set of active DataFrames.

This function was designed for exploratory and long-running workflows, such as Jupyter
or cloud notebook sessions, where intermediate DataFrames and hidden references
(for example, IPython output cache variables like _14, _15) can accumulate and
contribute to session instability or crashes.

At a high level, the function can:

  • List active pandas DataFrames in a namespace
  • Optionally delete inactive DataFrames
  • Optionally include IPython output-cache variables during cleanup
  • Optionally report memory usage before and after deletion
  • Provide both human-readable output and a structured return value for programmatic use
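
The capabilities above can be illustrated with a minimal sketch. This is not the implementation merged in this PR: the helper name `drop_inactive_frames` and its parameters (`keep`, `include_output_cache`, `dry_run`) are hypothetical and chosen only for illustration. It assumes IPython output-cache names look like `_14`, plus the `_`, `__`, and `___` shortcuts.

```python
import re

import pandas as pd

# IPython keeps cell outputs under names such as _14; _, __ and ___ hold the
# three most recent outputs.
_OUTPUT_CACHE_NAME = re.compile(r"_\d+|_{1,3}")


def drop_inactive_frames(namespace, keep, include_output_cache=False, dry_run=True):
    """Hypothetical helper (not the PR's API): find and optionally delete
    DataFrames in `namespace` whose names are not listed in `keep`."""
    keep = set(keep)
    inactive = []
    # Iterate over a snapshot so the namespace can be modified afterwards.
    for name, value in list(namespace.items()):
        if not isinstance(value, pd.DataFrame) or name in keep:
            continue
        if _OUTPUT_CACHE_NAME.fullmatch(name) and not include_output_cache:
            continue
        inactive.append(name)

    if not dry_run:
        for name in inactive:
            del namespace[name]

    # Structured result for programmatic use.
    return {
        "active": sorted(n for n in keep if isinstance(namespace.get(n), pd.DataFrame)),
        "inactive": sorted(inactive),
        "deleted": not dry_run,
    }
```

In a notebook, a helper like this would typically be called with `globals()` as the namespace. Deleting a name only releases memory once no other reference to the same DataFrame remains.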

Requirements

  • Core functionality supports Python 3.7+
  • Core dependency: pandas
  • Optional pretty console output uses Rich, which requires Python 3.8+
  • Optional process-level memory reporting (before and after deletion) uses psutil

Optional dependencies

This functionality is designed to run with minimal dependencies. Optional packages
enable enhanced output and diagnostics but are not required.


Pretty console output (optional)

  • Pretty tables use rich (requires Python 3.8+)
  • If rich is not installed, or if Python < 3.8 is used, output automatically falls back to plain text
  • No core functionality is lost when Rich is unavailable
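
One common way to get this kind of fallback is an import guard, sketched below. The names `HAS_RICH`, `format_plain`, and `show` are hypothetical and are not taken from this PR; only the `rich.table.Table` and `rich.console.Console` calls are real Rich APIs.

```python
import sys

try:
    # Treat old interpreters the same as a missing package.
    if sys.version_info < (3, 8):
        raise ImportError("rich requires Python 3.8+")
    from rich.console import Console
    from rich.table import Table

    HAS_RICH = True
except ImportError:
    HAS_RICH = False


def format_plain(rows, headers=("DataFrame", "Rows")):
    """Render rows as left-aligned, space-padded plain text."""
    table = [tuple(str(h) for h in headers)]
    table += [tuple(str(cell) for cell in row) for row in rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(headers))]
    lines = []
    for r in table:
        line = "  ".join(cell.ljust(w) for cell, w in zip(r, widths))
        lines.append(line.rstrip())
    return "\n".join(lines)


def show(rows, headers=("DataFrame", "Rows")):
    """Print a Rich table when available, otherwise plain text."""
    if HAS_RICH:
        table = Table(*headers)
        for row in rows:
            table.add_row(*(str(cell) for cell in row))
        Console().print(table)
    else:
        print(format_plain(rows, headers))
```

Because both paths receive the same rows, the information shown is identical; only the presentation differs.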

Process memory reporting (optional)

  • Process-level memory reporting uses psutil if installed
  • If psutil is not installed, the function still runs normally
  • DataFrame-level memory reporting remains available via pandas
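
A minimal sketch of how an optional psutil dependency can be handled follows. The function name `process_rss_bytes` is hypothetical and not part of this PR; `psutil.Process(...).memory_info().rss` is the standard psutil call for reading RSS.

```python
import os

try:
    import psutil
except ImportError:  # psutil is optional
    psutil = None


def process_rss_bytes():
    """Return the current process RSS in bytes, or None if psutil is missing."""
    if psutil is None:
        return None
    return psutil.Process(os.getpid()).memory_info().rss
```

Callers can then skip the process-level line in the report whenever the result is `None`.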

Memory reporting

The function can optionally report memory usage before and after DataFrame deletion.
This is especially useful in long-running or cloud notebook sessions where memory
pressure can build up over time.

Default behavior

  • Default (memory_mode="dataframes"): reports total pandas DataFrame memory using DataFrame.memory_usage(deep=True)
  • This reflects the memory footprint of DataFrames currently held in the namespace
  • This is the primary and most reliable indicator that DataFrames were successfully removed

Optional process-level reporting

  • Optional (memory_mode="all"): also reports process RSS if psutil is installed

What is process RSS?

Process RSS (Resident Set Size) is the portion of the running Python process's memory
that is currently held in physical RAM, as reported by the operating system. It includes:

  • memory used by pandas and NumPy arrays
  • Python interpreter overhead
  • loaded modules and libraries
  • temporary buffers created during execution, such as console output

RSS represents total process memory, not just DataFrame memory.

How to interpret process RSS

Process RSS is provided as an advisory metric and should be interpreted with care:

  • RSS may not decrease immediately after deleting DataFrames
  • Python often returns freed memory to its internal allocator rather than back to the OS
  • RSS can temporarily increase due to printing, garbage collection, or module loading
  • A decrease in DataFrame memory combined with stable or rising RSS is normal and does not indicate a failure to free DataFrames

For most users, DataFrame memory totals are the primary signal that cleanup succeeded.
Process RSS is best used to observe long-term memory trends rather than immediate
before-and-after changes.


@lshpaner lshpaner merged commit 9b2f137 into main Dec 20, 2025
@lshpaner
Collaborator

Great job; did some minor cleanup and added unit tests; looks great!

@lshpaner lshpaner deleted the del_inactive_dataframes branch December 20, 2025 02:03
lshpaner added a commit that referenced this pull request Dec 20, 2025
DataFrame memory management utility, del_inactive_dataframes

2 participants