-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
invalid code generated for kernels with optional arguments only #214
Comments
Thanks - I pushed a fix to fix/optional, #215 - please check if that fixes your issue.
Note that arg0.map_data is still valid even if arg0 is optional and not present, if the map it uses is present.
Cheers,
Istvan
… On 2021. Aug 30., at 17:01, mbuergle ***@***.***> wrote:
We found that code is not correctly generated if a kernel only requires optional arguments.
Part of the falsely generated kernel with 4 optional arguments. The generated code contains four times the checks if (arg0.opt) instead of also if (arg1.opt), etc.:
int set_size = op_mpi_halo_exchanges(set, nargs, args);
if (set_size > 0) {
for ( int n=0; n<set_size; n++ ){
if (n<set->core_size && n>0 && n % OP_mpi_test_frequency == 0)
op_mpi_test_all(nargs,args);
if (n==set->core_size) {
op_mpi_wait_all(nargs, args);
}
int map0idx;
if (arg0.opt) {
map0idx = arg0.map_data[n * arg0.map->dim + 0];
}
if (arg0.opt) {
map0idx = arg0.map_data[n * arg0.map->dim + 0];
}
if (arg0.opt) {
map0idx = arg0.map_data[n * arg0.map->dim + 0];
}
if (arg0.opt) {
map0idx = arg0.map_data[n * arg0.map->dim + 0];
}
kernel_MORPHOLOGY_WallFluxes(
&((double*)arg0.data)[1 * map0idx],
&((double*)arg1.data)[2 * map0idx],
&((double*)arg2.data)[3 * map0idx],
&((double*)arg3.data)[4 * map0idx]);
}
}
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub <#214>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJWVVPS5XLCJ5S632F7NY3T7OMMLANCNFSM5DCB7YKA>.
Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Thank you for the fix. It solve the original problem, but now leads to problems for another kernel. In the generated code below, checks are introduced for arguments that are not declared in the scope of the kernel (e.g. arg6, etc.). void op_par_loop_kernel_MORPHOLOGY_CalcBedGradient(char const *name, op_set set,
op_arg arg0,
op_arg arg1,
op_arg arg4,
op_arg arg5,
op_arg arg8,
op_arg arg11,
op_arg arg12,
op_arg arg15){
int nargs = 16;
op_arg args[16];
args[0] = arg0;
arg1.idx = 0;
args[1] = arg1;
for ( int v=1; v<3; v++ ){
args[1 + v] = op_opt_arg_dat(arg1.opt, arg1.dat, v, arg1.map, 2, "double", OP_READ);
}
args[4] = arg4;
arg5.idx = 0;
args[5] = arg5;
for ( int v=1; v<3; v++ ){
args[5 + v] = op_opt_arg_dat(arg5.opt, arg5.dat, v, arg5.map, 1, "double", OP_READ);
}
arg8.idx = 0;
args[8] = arg8;
for ( int v=1; v<3; v++ ){
args[8 + v] = op_opt_arg_dat(arg8.opt, arg8.dat, v, arg8.map, 2, "double", OP_READ);
}
args[11] = arg11;
arg12.idx = 0;
args[12] = arg12;
for ( int v=1; v<3; v++ ){
args[12 + v] = op_arg_dat(arg12.dat, v, arg12.map, 1, "double", OP_READ);
}
args[15] = arg15;
// initialise timers
double cpu_t1, cpu_t2, wall_t1, wall_t2;
op_timing_realloc(19);
op_timers_core(&cpu_t1, &wall_t1);
if (OP_diags>2) {
printf(" kernel routine with indirection: kernel_MORPHOLOGY_CalcBedGradient\n");
}
int set_size = op_mpi_halo_exchanges(set, nargs, args);
if (set_size > 0) {
for ( int n=0; n<set_size; n++ ){
if (n<set->core_size && n>0 && n % OP_mpi_test_frequency == 0)
op_mpi_test_all(nargs,args);
if (n==set->core_size) {
op_mpi_wait_all(nargs, args);
}
int map1idx;
int map2idx;
int map3idx;
int map5idx;
int map6idx;
int map7idx;
map1idx = arg1.map_data[n * arg1.map->dim + 0];
map2idx = arg1.map_data[n * arg1.map->dim + 1];
map3idx = arg1.map_data[n * arg1.map->dim + 2];
if (arg5.opt) {
map5idx = arg5.map_data[n * arg5.map->dim + 0];
}
if (arg6.opt) {
map6idx = arg5.map_data[n * arg5.map->dim + 1];
}
if (arg7.opt) {
map7idx = arg5.map_data[n * arg5.map->dim + 2];
}
if (arg8.opt) {
map5idx = arg5.map_data[n * arg5.map->dim + 0];
}
if (arg9.opt) {
map6idx = arg5.map_data[n * arg5.map->dim + 1];
}
if (arg10.opt) {
map7idx = arg5.map_data[n * arg5.map->dim + 2];
}
|
Apologies, I forgot about vectorised arguments... Should be fixed now. |
Thank you for the quick fix. It solved the latest problem. However, I still fail to link the mixed backend executable. Below the resulting output. We will check if the problem lies on our side. externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_halo_exchanges':
op_dummy_singlenode.c:(.text+0x0): multiple definition of `op_mpi_halo_exchanges'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x31b0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_halo_exchanges_grouped':
op_dummy_singlenode.c:(.text+0x10): multiple definition of `op_mpi_halo_exchanges_grouped'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3230): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_set_dirtybit':
op_dummy_singlenode.c:(.text+0x20): multiple definition of `op_mpi_set_dirtybit'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3320): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_wait_all':
op_dummy_singlenode.c:(.text+0x30): multiple definition of `op_mpi_wait_all'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3380): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_wait_all_grouped':
op_dummy_singlenode.c:(.text+0x40): multiple definition of `op_mpi_wait_all_grouped'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3390): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_halo_exchanges_cuda':
op_dummy_singlenode.c:(.text+0x70): multiple definition of `op_mpi_halo_exchanges_cuda'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x33a0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_set_dirtybit_cuda':
op_dummy_singlenode.c:(.text+0x80): multiple definition of `op_mpi_set_dirtybit_cuda'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3420): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_wait_all_cuda':
op_dummy_singlenode.c:(.text+0x90): multiple definition of `op_mpi_wait_all_cuda'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3480): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reset_halos':
op_dummy_singlenode.c:(.text+0xa0): multiple definition of `op_mpi_reset_halos'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3490): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_barrier':
op_dummy_singlenode.c:(.text+0xb0): multiple definition of `op_mpi_barrier'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34a0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_perf_time':
op_dummy_singlenode.c:(.text+0xc0): multiple definition of `op_mpi_perf_time'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34b0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_float':
op_dummy_singlenode.c:(.text+0xe0): multiple definition of `op_mpi_reduce_float'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34c0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_double':
op_dummy_singlenode.c:(.text+0xf0): multiple definition of `op_mpi_reduce_double'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34d0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_int':
op_dummy_singlenode.c:(.text+0x100): multiple definition of `op_mpi_reduce_int'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34e0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_bool':
op_dummy_singlenode.c:(.text+0x110): multiple definition of `op_mpi_reduce_bool'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34f0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_partition':
op_dummy_singlenode.c:(.text+0x120): multiple definition of `op_partition'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3500): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_partition_ptr(char const*, char const*, op_set_core*, int*, double*)':
op_dummy_singlenode.c:(.text+0x130): multiple definition of `op_partition_ptr(char const*, char const*, op_set_core*, int*, double*)'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3510): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_renumber':
op_dummy_singlenode.c:(.text+0x140): multiple definition of `op_renumber'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0x770): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_renumber_ptr':
op_dummy_singlenode.c:(.text+0x150): multiple definition of `op_renumber_ptr'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0x780): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_compute_moment':
op_dummy_singlenode.c:(.text+0x160): multiple definition of `op_compute_moment'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3530): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_compute_moment_across_times':
op_dummy_singlenode.c:(.text+0x170): multiple definition of `op_compute_moment_across_times'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3540): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_partition_reverse()':
op_dummy_singlenode.c:(.text+0x220): multiple definition of `op_partition_reverse()'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3520): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_is_root':
op_dummy_singlenode.c:(.text+0x250): multiple definition of `op_is_root'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x35f0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `getHybridGPU()':
op_dummy_singlenode.c:(.text+0x260): multiple definition of `getHybridGPU()'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0x790): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_theta_init(op_export_core*, int*, double*, double*, double*)':
op_dummy_singlenode.c:(.text+0x270): multiple definition of `op_theta_init(op_export_core*, int*, double*, double*, double*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcd0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_inc_theta(op_export_core*, int*, double*, double*)':
op_dummy_singlenode.c:(.text+0x280): multiple definition of `op_inc_theta(op_export_core*, int*, double*, double*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xce0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_import_init_size(int, int*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x290): multiple definition of `op_import_init_size(int, int*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xca0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_import_init(op_export_core*, op_dat_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2a0): multiple definition of `op_import_init(op_export_core*, op_dat_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcb0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_export_init(int, int*, op_map_core*, op_set_core*, op_dat_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2b0): multiple definition of `op_export_init(int, int*, op_map_core*, op_set_core*, op_dat_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcc0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_export_data(op_export_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2c0): multiple definition of `op_export_data(op_export_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcf0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_import_data(op_import_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2d0): multiple definition of `op_import_data(op_import_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xd00): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `deviceSync':
op_dummy_singlenode.c:(.text+0x2e0): multiple definition of `deviceSync'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x2ff0): first defined here
collect2: error: ld returned 1 exit status |
I think it's that libop2_core and libop2_cuda should not be both linked, just libop2_cuda |
This is now merged in |
Thank you for the hint with libop2_core and libop2_cuda. |
We found that code is not correctly generated if a kernel only requires optional arguments.
Below a part of the falsely generated kernel with four optional arguments. The generated code contains four times the checks
if (arg0.opt)
instead of alsoif (arg1.opt)
, etc.:The text was updated successfully, but these errors were encountered: