
Weird behavior upon increasing input variable. (Illegal memory access) #1953

Closed
Ritobusk opened this issue May 28, 2023 · 5 comments · Fixed by #1954
Comments

@Ritobusk

Hello.

(I am assuming the people fixing issues have access to the futharkhpa03fl server, since the program uses a lot of memory!) I get the following error:

./knn-driver: knn-driver.c:10583: CUDA call
  cuLaunchKernel(ctx->program->gpu_map_transpose_f32, grid[0], grid[1], grid[2], 32, 8, 1, 0 + (shared_sizze_66634 + (8 - shared_sizze_66634 % 8) % 8), ((void *)0), kernel_args_66631, ((void *)0))
failed with error code 700 (an illegal memory access was encountered)

This happens when I increase either of the input variables k or d. E.g., with k = 60 it works as intended, but increasing k to 120 triggers the error.

The machine I used was futharkhpa03fl with Futhark version 0.22.4 (the default on the machine) and 0.25 with the following git ID: git: 761c9de (Wed May 24 21:20:42 2023 +0200).
Both produced the same error.
My program uses a lot of memory, so futharkhpa03fl, or another machine with a GPU with a lot of memory, should probably be used to run it!
I attached the code. There are two versions, since the way unflatten is called changed in the Futhark 0.25 version I used; otherwise the two zip files are identical.
It can be run without issue with the following commands:

unzip IssueCodeV0.22.4.zip

cd IssueCode/

futhark cuda knn-driver.fut

futhark dataset -b -g 1i32 -g 60i64 -g [500000][50]f32 -g [500000][50]f32 > tmpdata

./knn-driver -t /dev/stdout -n < tmpdata >> res.txt

This should work (and take a little over 10 seconds). To reproduce the bug, change the dataset to:

futhark dataset -b -g 1i32 -g 120i64 -g [500000][50]f32 -g [500000][50]f32 > tmpdata

and run the program as before:

./knn-driver -t /dev/stdout -n < tmpdata >> res.txt
Here k (the 2nd value of the dataset) was changed. If you instead change d (the dimensionality of the 3rd and 4th dataset values) but keep k at 60, you should get a similar error, e.g. with this dataset:

futhark dataset -b -g 1i32 -g 60i64 -g [500000][150]f32 -g [500000][150]f32 > tmpdata

The error occurs in this part of the code (knn-driver.fut, lines 68-82):

  let supercharging = 
    ----let ksqr = k**2
    let (super_knn_inds_seq) =
      loop (curr_knns_q) = (new_knns_q) for i < k do
        let k_inds = map (\knn_ind_q -> map (\q -> knn_inds_t[knn_ind_q[i],q]) (iota k)) knn_inds_q
        let k_points = map (\inds -> map (\ind -> 
                                        let test_point = map (\k_ind -> test_set[ind, k_ind]) (iota d)
                                        in (ind, test_point)
                                        ) inds) k_inds
                       --map (\inds -> map (\ind -> (ind, test_set[ind])) inds) k_inds
        in  map2 (\refs ind -> bruteForce queries[ind] curr_knns_q[ind] refs ) k_points (iota n)
    in super_knn_inds_seq

  let (super_knn_inds, _) =  unzip <| map (\i_knn -> unzip i_knn) supercharging
  in (super_knn_inds)

I tried running the program without this part, and it ran without errors.
Increasing either d or k increases the amount of memory needed. However, with the datasets above, the program does not fail because of allocating too much memory. That can happen if another process is run on futharkhpa03fl while the program is running, or if k, d, and the number of points in the 3rd and 4th dataset values are sufficiently large.

IssueCodeV0.25.zip
IssueCodeV0.22.4.zip

@athas
Member

athas commented May 28, 2023

Does it work with the non-GPU backends?

@Ritobusk
Author

Ritobusk commented May 28, 2023

I have not tried with the datasets above, sorry; I thought it would take too long. I have tried smaller datasets with the c backend, and those work as intended.

The dataset:
futhark dataset -b -g 1i32 -g 120i64 -g [100000][50]f32 -g [10000][50]f32 > tmpdata
takes around 142 seconds with the c backend. If the program scaled the way it does with the GPU backend, the dataset above (futhark dataset -b -g 1i32 -g 120i64 -g [500000][50]f32 -g [500000][50]f32 > tmpdata) would take around 10 hours.

I'll try to see if it scales in a similar fashion with the c backend!

To be clearer: the code works as intended up to a certain input size with the cuda backend. As the inputs grow, I would eventually expect it to fail by running out of memory, but before that happens, this 'weird' bug occurs.

Edit: it scales better! I think it will be finished within 1 hour.

@Ritobusk
Author

Ritobusk commented May 28, 2023

It runs without error with the c backend on the dataset
futhark dataset -b -g 1i32 -g 120i64 -g [500000][50]f32 -g [500000][50]f32 > tmpdata
This is the same dataset that produces the error with the cuda backend, as mentioned above!

@athas
Member

athas commented May 29, 2023

The issue arises during transposition of an array of shape [50][60000000]. Note that the total number of elements does not fit in a single 32-bit integer. (See @coancea, it's beginning to happen in real code!)

I have a pretty good guess at what is going wrong now. I just hope fixing it won't cause performance regressions for transposing smaller arrays. I'd really prefer not to have multi-versioned transpose kernels. (Or rather, even more versions, as we already have three for every primitive type.)

athas added a commit that referenced this issue May 29, 2023
Unfortunately, using 64 bit index arithmetic in all cases results in
unacceptable performance for small arrays, so we need this bit of
multi-versioning.  Fortunately the transpose kernels are not large.

Fixes #1953.
athas added a commit that referenced this issue May 30, 2023
@Ritobusk
Author

Thank you for solving the issue! It now works on my end.

razetime pushed a commit to razetime/futhark that referenced this issue Jul 16, 2023