WISH: Rprofmem() overhaul #25

Open
HenrikBengtsson opened this issue Jun 4, 2016 · 0 comments

Background

The Rprofmem() function of the utils package is built into the very core of R (*) and can be used to log every memory allocation requested in R (including those made by R itself, by any R package, and directly by the user). It provides information on the number of bytes allocated and the (reverse) call stack trace that led up to the request. For example:

> Rprofmem()
> x <- integer(1000)
> y <- double(1000)
> z <- complex(1000)
> a <- foo("integer", 1000)
> b <- foo("double", 1000)
> c <- foo("complex", 1000)
> Rprofmem(NULL)
> cat(readLines("Rprofmem.out", warn=FALSE), sep="\n")
4040 :"integer"
8040 :"double"
16040 :"complex"
4040 :"vector" "foo"
8040 :"vector" "foo"
16040 :"vector" "foo"
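Since each log entry has the form `<bytes> :<quoted call stack>`, the output above can be turned into a table with a few lines of R. The following is a minimal sketch; `parse_rprofmem()` is a hypothetical helper (not part of the utils package), and the log lines are copied from the example rather than produced by a live profiling session.

```r
# Log lines copied from the example output above (no profiling build needed).
log_lines <- c(
  '4040 :"integer"',
  '8040 :"double"',
  '16040 :"complex"',
  '4040 :"vector" "foo"',
  '8040 :"vector" "foo"',
  '16040 :"vector" "foo"'
)

# parse_rprofmem() is a hypothetical helper, not part of the utils package.
parse_rprofmem <- function(lines) {
  # Everything before " :" is the number of bytes requested
  bytes <- as.numeric(sub(" :.*$", "", lines))
  # Everything after it is the call stack, innermost call first
  stack <- sub("^[0-9]+ :", "", lines)
  calls <- regmatches(stack, gregexpr('"[^"]*"', stack))
  calls <- lapply(calls, function(x) gsub('"', "", x, fixed = TRUE))
  data.frame(
    bytes = bytes,
    innermost = vapply(calls, function(x) if (length(x)) x[1] else NA_character_,
                       character(1)),
    depth = lengths(calls),
    stringsAsFactors = FALSE
  )
}

profile <- parse_rprofmem(log_lines)
print(profile)
```

Entries with an empty call stack get `NA` in the `innermost` column and a `depth` of zero.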

Note that Rprof(..., interval=0.01, memory.profiling=TRUE) also profiles memory usage, but it is based on sampling (once every interval seconds). This means it only reports the overall memory usage at those snapshots, not the individual memory allocation requests. Because of this, Rprof() will not provide any information on what caused the memory footprint to increase.

(*) In order for Rprofmem() (and tracemem()) to work at all, the R executable must have been built with memory profiling enabled (see Section 'Wishes' below).
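Whether the running R executable was built this way can be checked from within R via `capabilities("profmem")`. The sketch below only starts profiling when support is present; the temporary log file is an arbitrary choice for illustration.

```r
# capabilities("profmem") reports whether this R build supports memory profiling
has_profmem <- unname(capabilities("profmem"))

if (isTRUE(has_profmem)) {
  log_file <- tempfile(fileext = ".out")
  Rprofmem(log_file)   # start logging to a temporary file
  x <- integer(1000)
  Rprofmem(NULL)       # stop logging
  print(readLines(log_file, warn = FALSE))
} else {
  message("This R build was not compiled with memory profiling enabled.")
}
```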

Details

Rprofmem() is built into the very core of R. More precisely, it logs every memory allocation done in the allocVector3() function (part of the R API):

#ifdef R_MEMORY_PROFILING
        R_ReportAllocation(hdrsize + size * sizeof(VECREC));
#endif

The more commonly used allocVector() function is just an inline function calling allocVector3(). Moreover, Rprofmem() also logs every "newpage" memory allocation done by the internal GetNewPage() function.

Note that Rprofmem() does not log low-level memory allocation done by Calloc() / Free().
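As a worked check of the `hdrsize + size * sizeof(VECREC)` expression against the Background example, one can subtract the payload (number of elements times bytes per element) from each reported size. This sketch assumes the leftover corresponds to the vector header; the value it finds holds for the session that produced that output and may differ across R versions and platforms.

```r
# Element sizes in bytes for the three atomic types used in the example
elem_size <- c(integer = 4, double = 8, complex = 16)
n <- 1000

# Byte counts reported by Rprofmem() in the Background example
reported <- c(integer = 4040, double = 8040, complex = 16040)

# Payload is the space taken by the elements themselves
payload <- n * elem_size

# Whatever remains is per-vector overhead (assumed here to be the header)
overhead <- reported - payload
print(overhead)
```

All three vectors show the same 40-byte overhead, consistent with a fixed header plus a payload that scales with the element type.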

Wishes

1. [FIXED] Fix the bug causing allocations without a call stack to clutter up output

When R does memory allocations internally, these are also logged. These entries have an empty call stack. Due to a bug in the code, such entries are written without a trailing newline, causing several entries to appear on the same line in the log file. For example,

> Rprofmem()
> x <- integer(1000)
> y <- double(1000)
> Rprofmem(NULL)
> cat(readLines("Rprofmem.out", warn=FALSE), sep="\n")
4040 :"integer"
200 :360 :360 :1064 :8040 :"double"

This makes it unnecessarily hard/tricky to parse the Rprofmem log file.
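For log files written by affected R versions, the run-together entries can still be recovered with a regular expression that matches one byte count followed by zero or more quoted function names. This is a minimal sketch; `split_entries()` is a hypothetical helper, and it assumes function names never contain a double quote.

```r
# split_entries() is a hypothetical helper, not part of R.
split_entries <- function(line) {
  # One entry: "<bytes> :" followed by zero or more quoted function names
  pattern <- '[0-9]+ :("[^"]*" ?)*'
  entries <- regmatches(line, gregexpr(pattern, line))[[1]]
  trimws(entries)
}

# The problematic line from the example above
buggy <- '200 :360 :360 :1064 :8040 :"double"'

entries <- split_entries(buggy)
bytes <- as.numeric(sub(" :.*$", "", entries))
print(entries)
print(bytes)
```

The first four entries come back with an empty call stack, and the last one keeps its `"double"` frame.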

Moved to Issue #42 (solved).

2. Enable memory profiling by default

In order for Rprofmem() (and tracemem()) to work at all, the R executable must have been built with memory profiling enabled, i.e. ./configure --enable-memory-profiling, cf. the 'R Installation and Administration' manual. Note that the Windows binaries provided via CRAN do have this enabled by default.

However, as Radford Neal suggests (R-devel thread below), "the overhead of having [Rprofmem()] enabled is negligible when profiling is not actually being done". Internally, the logging is done after testing if (R_IsMemReporting) { ... }, which is a very cheap logical test and could therefore be part of the default build (and not conditional on #ifdef R_MEMORY_PROFILING).

For more details on Radford Neal's suggestions and improvements, see:

Implementing this is very simple, e.g. Radford's patch.

3. Log more information

In the current implementation, which is more or less from 2006, the R memory profiling collects and reports:

  1. Number of bytes allocated (requested)
  2. The call stack trace as the name of the functions called

However, it should be possible to gather more information than this, e.g.

  1. Timestamp
  2. Number of bytes allocated
  3. Data type allocated
  4. The call stack:
    • name and namespace/environment of each function, e.g. base::vector()
    • the source code line (if available)
    • frame identifiers

Similar improvements have already been proposed by others:

4. Different output formats

One could imagine that Rprofmem() supports different flavors of logging. For backward compatibility, one could have a "legacy" mode. One could have options to control exactly what to log, and possibly also options for output format, e.g. tab- or comma-separated value files.
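To make the idea concrete, here is a rough sketch of how a format option could behave. Everything in it is hypothetical: `write_profmem()`, its `format` argument, the column names, and the `;` separator for call stacks are illustrative choices, not an existing or proposed R API.

```r
# Hypothetical records; the column names are illustrative only.
records <- data.frame(
  bytes = c(4040, 8040),
  calls = c("integer", "vector;foo"),
  stringsAsFactors = FALSE
)

# write_profmem() is a hypothetical helper sketching a 'format' option.
write_profmem <- function(records, file, format = c("legacy", "tsv", "csv")) {
  format <- match.arg(format)
  if (format == "legacy") {
    # Reproduce the current '<bytes> :"f1" "f2"' layout
    stacks <- vapply(strsplit(records$calls, ";", fixed = TRUE),
                     function(x) paste0('"', x, '"', collapse = " "),
                     character(1))
    writeLines(paste0(records$bytes, " :", stacks), file)
  } else {
    # Tab- or comma-separated values with a header row
    sep <- if (format == "tsv") "\t" else ","
    write.table(records, file, sep = sep, quote = FALSE, row.names = FALSE)
  }
  invisible(file)
}

legacy_file <- write_profmem(records, tempfile(), "legacy")
tsv_file <- write_profmem(records, tempfile(), "tsv")
print(readLines(legacy_file))
print(readLines(tsv_file))
```

Keeping "legacy" as the default would leave existing parsers of Rprofmem.out untouched.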

5. Log deallocations

AFAIU, all deallocations of memory allocated by allocVector3() are done by the R garbage collector. It would be useful if these deallocations were recorded in the Rprofmem output too. An immediate benefit would be that one could use the cumulative sum of logged allocations and deallocations to infer the amount of memory currently allocated by R.
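The cumulative-sum idea can be sketched as follows. The event log here is entirely made up for illustration: Rprofmem() does not record deallocations today, and the `event` and `bytes` columns are assumed names.

```r
# Hypothetical event log: Rprofmem() does not record deallocations today.
events <- data.frame(
  event = c("alloc", "alloc", "free", "alloc", "free"),
  bytes = c(4040, 8040, 4040, 16040, 8040),
  stringsAsFactors = FALSE
)

# Count allocations as positive and deallocations as negative
signed <- ifelse(events$event == "alloc", events$bytes, -events$bytes)

# The running total is the amount of memory in use after each event
events$in_use <- cumsum(signed)
print(events)
```

The last value of `in_use` is the memory still held at the end of the log, here the 16040-byte vector that was never freed.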

Updates
