Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid code generated for kernels with optional arguments only #214

Closed
mbuergle opened this issue Aug 30, 2021 · 7 comments
Closed

invalid code generated for kernels with optional arguments only #214

mbuergle opened this issue Aug 30, 2021 · 7 comments

Comments

@mbuergle
Copy link

mbuergle commented Aug 30, 2021

We found that code is not correctly generated if a kernel only requires optional arguments.

Below a part of the falsely generated kernel with four optional arguments. The generated code contains four times the checks if (arg0.opt) instead of also if (arg1.opt), etc.:

  int set_size = op_mpi_halo_exchanges(set, nargs, args);

  if (set_size > 0) {

    for ( int n=0; n<set_size; n++ ){
      if (n<set->core_size && n>0 && n % OP_mpi_test_frequency == 0)
        op_mpi_test_all(nargs,args);
      if (n==set->core_size) {
        op_mpi_wait_all(nargs, args);
      }
      int map0idx;
      if (arg0.opt) {
        map0idx = arg0.map_data[n * arg0.map->dim + 0];
      }
      if (arg0.opt) {
        map0idx = arg0.map_data[n * arg0.map->dim + 0];
      }
      if (arg0.opt) {
        map0idx = arg0.map_data[n * arg0.map->dim + 0];
      }
      if (arg0.opt) {
        map0idx = arg0.map_data[n * arg0.map->dim + 0];
      }


      kernel_MORPHOLOGY_WallFluxes(
        &((double*)arg0.data)[1 * map0idx],
        &((double*)arg1.data)[2 * map0idx],
        &((double*)arg2.data)[3 * map0idx],
        &((double*)arg3.data)[4 * map0idx]);
    }
  }
@reguly
Copy link
Collaborator

reguly commented Aug 30, 2021 via email

@mbuergle
Copy link
Author

Thank you for the fix. It solve the original problem, but now leads to problems for another kernel. In the generated code below, checks are introduced for arguments that are not declared in the scope of the kernel (e.g. arg6, etc.).

void op_par_loop_kernel_MORPHOLOGY_CalcBedGradient(char const *name, op_set set,
  op_arg arg0,
  op_arg arg1,
  op_arg arg4,
  op_arg arg5,
  op_arg arg8,
  op_arg arg11,
  op_arg arg12,
  op_arg arg15){

  int nargs = 16;
  op_arg args[16];

  args[0] = arg0;
  arg1.idx = 0;
  args[1] = arg1;
  for ( int v=1; v<3; v++ ){
    args[1 + v] = op_opt_arg_dat(arg1.opt, arg1.dat, v, arg1.map, 2, "double", OP_READ);
  }

  args[4] = arg4;
  arg5.idx = 0;
  args[5] = arg5;
  for ( int v=1; v<3; v++ ){
    args[5 + v] = op_opt_arg_dat(arg5.opt, arg5.dat, v, arg5.map, 1, "double", OP_READ);
  }

  arg8.idx = 0;
  args[8] = arg8;
  for ( int v=1; v<3; v++ ){
    args[8 + v] = op_opt_arg_dat(arg8.opt, arg8.dat, v, arg8.map, 2, "double", OP_READ);
  }

  args[11] = arg11;
  arg12.idx = 0;
  args[12] = arg12;
  for ( int v=1; v<3; v++ ){
    args[12 + v] = op_arg_dat(arg12.dat, v, arg12.map, 1, "double", OP_READ);
  }

  args[15] = arg15;

  // initialise timers
  double cpu_t1, cpu_t2, wall_t1, wall_t2;
  op_timing_realloc(19);
  op_timers_core(&cpu_t1, &wall_t1);

  if (OP_diags>2) {
    printf(" kernel routine with indirection: kernel_MORPHOLOGY_CalcBedGradient\n");
  }

  int set_size = op_mpi_halo_exchanges(set, nargs, args);

  if (set_size > 0) {

    for ( int n=0; n<set_size; n++ ){
      if (n<set->core_size && n>0 && n % OP_mpi_test_frequency == 0)
        op_mpi_test_all(nargs,args);
      if (n==set->core_size) {
        op_mpi_wait_all(nargs, args);
      }
      int map1idx;
      int map2idx;
      int map3idx;
      int map5idx;
      int map6idx;
      int map7idx;
      map1idx = arg1.map_data[n * arg1.map->dim + 0];
      map2idx = arg1.map_data[n * arg1.map->dim + 1];
      map3idx = arg1.map_data[n * arg1.map->dim + 2];
      if (arg5.opt) {
        map5idx = arg5.map_data[n * arg5.map->dim + 0];
      }
      if (arg6.opt) {
        map6idx = arg5.map_data[n * arg5.map->dim + 1];
      }
      if (arg7.opt) {
        map7idx = arg5.map_data[n * arg5.map->dim + 2];
      }
      if (arg8.opt) {
        map5idx = arg5.map_data[n * arg5.map->dim + 0];
      }
      if (arg9.opt) {
        map6idx = arg5.map_data[n * arg5.map->dim + 1];
      }
      if (arg10.opt) {
        map7idx = arg5.map_data[n * arg5.map->dim + 2];
      }

@reguly
Copy link
Collaborator

reguly commented Sep 1, 2021

Apologies, I forgot about vectorised arguments... Should be fixed now.

@mbuergle
Copy link
Author

mbuergle commented Sep 1, 2021

Thank you for the quick fix. It solved the latest problem. However, I still fail to link the mixed backend executable. Below the resulting output. We will check if the problem lies on our side.

externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_halo_exchanges':
op_dummy_singlenode.c:(.text+0x0): multiple definition of `op_mpi_halo_exchanges'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x31b0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_halo_exchanges_grouped':
op_dummy_singlenode.c:(.text+0x10): multiple definition of `op_mpi_halo_exchanges_grouped'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3230): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_set_dirtybit':
op_dummy_singlenode.c:(.text+0x20): multiple definition of `op_mpi_set_dirtybit'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3320): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_wait_all':
op_dummy_singlenode.c:(.text+0x30): multiple definition of `op_mpi_wait_all'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3380): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_wait_all_grouped':
op_dummy_singlenode.c:(.text+0x40): multiple definition of `op_mpi_wait_all_grouped'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3390): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_halo_exchanges_cuda':
op_dummy_singlenode.c:(.text+0x70): multiple definition of `op_mpi_halo_exchanges_cuda'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x33a0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_set_dirtybit_cuda':
op_dummy_singlenode.c:(.text+0x80): multiple definition of `op_mpi_set_dirtybit_cuda'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3420): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_wait_all_cuda':
op_dummy_singlenode.c:(.text+0x90): multiple definition of `op_mpi_wait_all_cuda'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3480): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reset_halos':
op_dummy_singlenode.c:(.text+0xa0): multiple definition of `op_mpi_reset_halos'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3490): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_barrier':
op_dummy_singlenode.c:(.text+0xb0): multiple definition of `op_mpi_barrier'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34a0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_perf_time':
op_dummy_singlenode.c:(.text+0xc0): multiple definition of `op_mpi_perf_time'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34b0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_float':
op_dummy_singlenode.c:(.text+0xe0): multiple definition of `op_mpi_reduce_float'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34c0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_double':
op_dummy_singlenode.c:(.text+0xf0): multiple definition of `op_mpi_reduce_double'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34d0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_int':
op_dummy_singlenode.c:(.text+0x100): multiple definition of `op_mpi_reduce_int'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34e0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_mpi_reduce_bool':
op_dummy_singlenode.c:(.text+0x110): multiple definition of `op_mpi_reduce_bool'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x34f0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_partition':
op_dummy_singlenode.c:(.text+0x120): multiple definition of `op_partition'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3500): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_partition_ptr(char const*, char const*, op_set_core*, int*, double*)':
op_dummy_singlenode.c:(.text+0x130): multiple definition of `op_partition_ptr(char const*, char const*, op_set_core*, int*, double*)'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3510): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_renumber':
op_dummy_singlenode.c:(.text+0x140): multiple definition of `op_renumber'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0x770): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_renumber_ptr':
op_dummy_singlenode.c:(.text+0x150): multiple definition of `op_renumber_ptr'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0x780): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_compute_moment':
op_dummy_singlenode.c:(.text+0x160): multiple definition of `op_compute_moment'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3530): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_compute_moment_across_times':
op_dummy_singlenode.c:(.text+0x170): multiple definition of `op_compute_moment_across_times'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3540): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_partition_reverse()':
op_dummy_singlenode.c:(.text+0x220): multiple definition of `op_partition_reverse()'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x3520): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_is_root':
op_dummy_singlenode.c:(.text+0x250): multiple definition of `op_is_root'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x35f0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `getHybridGPU()':
op_dummy_singlenode.c:(.text+0x260): multiple definition of `getHybridGPU()'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0x790): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_theta_init(op_export_core*, int*, double*, double*, double*)':
op_dummy_singlenode.c:(.text+0x270): multiple definition of `op_theta_init(op_export_core*, int*, double*, double*, double*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcd0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_inc_theta(op_export_core*, int*, double*, double*)':
op_dummy_singlenode.c:(.text+0x280): multiple definition of `op_inc_theta(op_export_core*, int*, double*, double*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xce0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_import_init_size(int, int*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x290): multiple definition of `op_import_init_size(int, int*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xca0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_import_init(op_export_core*, op_dat_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2a0): multiple definition of `op_import_init(op_export_core*, op_dat_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcb0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_export_init(int, int*, op_map_core*, op_set_core*, op_dat_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2b0): multiple definition of `op_export_init(int, int*, op_map_core*, op_set_core*, op_dat_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcc0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_export_data(op_export_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2c0): multiple definition of `op_export_data(op_export_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xcf0): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `op_import_data(op_import_core*, op_dat_core*)':
op_dummy_singlenode.c:(.text+0x2d0): multiple definition of `op_import_data(op_import_core*, op_dat_core*)'
externalLibs/libop2_cuda.a(op_cuda_decl.c.o):tmpxft_00004df6_00000000-5_op_cuda_decl.cudafe1.cpp:(.text+0xd00): first defined here
externalLibs/libop2_core.a(op_dummy_singlenode.c.o): In function `deviceSync':
op_dummy_singlenode.c:(.text+0x2e0): multiple definition of `deviceSync'
externalLibs/libop2_cuda.a(op_cuda_rt_support.c.o):tmpxft_00004df4_00000000-5_op_cuda_rt_support.cudafe1.cpp:(.text+0x2ff0): first defined here
collect2: error: ld returned 1 exit status

@reguly
Copy link
Collaborator

reguly commented Sep 1, 2021

I think it's that libop2_core and libop2_cuda should not be both linked, just libop2_cuda

@reguly
Copy link
Collaborator

reguly commented Sep 9, 2021

This is now merged in

@reguly reguly closed this as completed Sep 9, 2021
@mbuergle
Copy link
Author

mbuergle commented Sep 9, 2021

Thank you for the hint with libop2_core and libop2_cuda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants