QUESTION about memory management #1709
Description
The following loop should succeed on my NVIDIA GPU with 4 GB of memory, because each step of the loop uses less than 800 MB. However, the loop aborts after a few iterations with CUDA Error (2): out of memory:
array res = constant(0, 10000, 20000, f32);
for (int i = 0; i < 20; ++i) {
    array R = randu(10000, 20000, f32);
    res += R;
}
I added a printMemInfo() call at each step, and this is what it shows:
starting step 0
---------------------------------------------------------
| POINTER | SIZE | AF LOCK | USER LOCK |
---------------------------------------------------------
---------------------------------------------------------
starting step 1
---------------------------------------------------------
| POINTER | SIZE | AF LOCK | USER LOCK |
---------------------------------------------------------
| 0x1302f40000 | 762.9 MB | Yes | No |
---------------------------------------------------------
starting step 2
---------------------------------------------------------
| POINTER | SIZE | AF LOCK | USER LOCK |
---------------------------------------------------------
| 0x1332a40000 | 762.9 MB | Yes | No |
| 0x1302f40000 | 762.9 MB | Yes | No |
---------------------------------------------------------
starting step 3
---------------------------------------------------------
| POINTER | SIZE | AF LOCK | USER LOCK |
---------------------------------------------------------
| 0x1362540000 | 762.9 MB | Yes | No |
| 0x1332a40000 | 762.9 MB | Yes | No |
| 0x1302f40000 | 762.9 MB | Yes | No |
---------------------------------------------------------
starting step 4
---------------------------------------------------------
| POINTER | SIZE | AF LOCK | USER LOCK |
---------------------------------------------------------
| 0x1392040000 | 762.9 MB | Yes | No |
| 0x1362540000 | 762.9 MB | Yes | No |
| 0x1332a40000 | 762.9 MB | Yes | No |
| 0x1302f40000 | 762.9 MB | Yes | No |
---------------------------------------------------------
starting step 5
---------------------------------------------------------
| POINTER | SIZE | AF LOCK | USER LOCK |
---------------------------------------------------------
| 0x13c1b40000 | 762.9 MB | Yes | No |
| 0x1302f40000 | 762.9 MB | Yes | No |
| 0x1332a40000 | 762.9 MB | Yes | No |
| 0x1362540000 | 762.9 MB | Yes | No |
| 0x1392040000 | 762.9 MB | Yes | No |
---------------------------------------------------------
terminate called after throwing an instance of 'af::exception'
what(): ArrayFire Exception (Device out of memory:101):
In function virtual void* cuda::MemoryManager::nativeAlloc(size_t)
In file src/backend/cuda/memory.cpp:97
CUDA Error (2): out of memory
In function af::array af::randu(const af::dim4&, af::dtype)
In file src/api/cpp/random.cpp:96
I have tried inserting a deviceGC() call at each step of the loop, but the result is the same. I have also tried calling sync() (both with and without deviceGC()) after each step, still with the same result.
If I understand this correctly, even though the R array goes out of scope, for some reason ArrayFire still keeps its lock on the buffer and does not release the memory back to the pool. These locked buffers keep accumulating until the device runs out of memory.
Is this a bug or expected behavior? If it's expected, what can I do to make such a loop work?
This is of course an artificially constructed example to demonstrate the problem. In my real code, I have a function that uses a lot of GPU memory. I call this function many times; the first few calls finish properly, but eventually one fails with the same out-of-memory error as above. And printMemInfo() shows the same pattern: temporary arrays that are allocated and used within the function still show up as locked memory even after the function has returned.