QUESTION about memory management #1709

@vakopian

Description

The following loop should be able to succeed on my NVIDIA GPU with 4 GB of memory, because each step in the loop should use less than 800 MB. However, the loop aborts after a few iterations with CUDA Error (2): out of memory:

    array res = constant(0, 10000, 20000, f32);
    for (int i = 0; i < 20; ++i) {
        array R = randu(10000, 20000, f32);
        res += R;
    }

I added printMemInfo() at each step and this is what it shows:

starting step 0

---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
---------------------------------------------------------
starting step 1

---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|    0x1302f40000  |   762.9 MB |       Yes |        No |
---------------------------------------------------------
starting step 2

---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|    0x1332a40000  |   762.9 MB |       Yes |        No |
|    0x1302f40000  |   762.9 MB |       Yes |        No |
---------------------------------------------------------
starting step 3

---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|    0x1362540000  |   762.9 MB |       Yes |        No |
|    0x1332a40000  |   762.9 MB |       Yes |        No |
|    0x1302f40000  |   762.9 MB |       Yes |        No |
---------------------------------------------------------
starting step 4

---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|    0x1392040000  |   762.9 MB |       Yes |        No |
|    0x1362540000  |   762.9 MB |       Yes |        No |
|    0x1332a40000  |   762.9 MB |       Yes |        No |
|    0x1302f40000  |   762.9 MB |       Yes |        No |
---------------------------------------------------------
starting step 5

---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|    0x13c1b40000  |   762.9 MB |       Yes |        No |
|    0x1302f40000  |   762.9 MB |       Yes |        No |
|    0x1332a40000  |   762.9 MB |       Yes |        No |
|    0x1362540000  |   762.9 MB |       Yes |        No |
|    0x1392040000  |   762.9 MB |       Yes |        No |
---------------------------------------------------------
terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Device out of memory:101):
In function virtual void* cuda::MemoryManager::nativeAlloc(size_t)
In file src/backend/cuda/memory.cpp:97
CUDA Error (2): out of memory


In function af::array af::randu(const af::dim4&, af::dtype)
In file src/api/cpp/random.cpp:96

I have tried calling deviceGC() at each step of the loop, but the result is the same. I've also tried sync() (plus deviceGC()) after each step; still the same result.

If I understand this correctly, even though the R array goes out of scope, for some reason AF still holds a lock on it and does not release the memory back to the pool. These allocations keep accumulating until the device runs out of memory.

Is this a bug or expected behavior? If it's expected, what can I do to make such a loop work?
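One hypothesis (I may be misreading the internals): ArrayFire evaluates expressions lazily through its JIT, so the unevaluated res expression tree might be what keeps each R buffer locked. If that is the cause, is explicitly forcing evaluation each iteration, e.g. with array::eval(), the recommended pattern? A sketch of what I mean:

    array res = constant(0, 10000, 20000, f32);
    for (int i = 0; i < 20; ++i) {
        array R = randu(10000, 20000, f32);
        res += R;
        res.eval(); // hypothetical fix: materialize res now, so the pending
                    // JIT tree no longer holds a reference to R's buffer
    }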

This is of course an artificially constructed example to demonstrate the problem. In my real code, I have a function that uses a lot of GPU memory. I call this function many times; the first few calls finish properly, but eventually it fails with the same out of memory error as above. printMemInfo() shows the same pattern: temporary arrays that are allocated and used inside the function show up as locked memory even after the function has returned.
