Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type conversion error r8 -> r4 #23

Open
jedwards4b opened this issue Oct 22, 2018 · 29 comments
Open

Type conversion error r8 -> r4 #23

jedwards4b opened this issue Oct 22, 2018 · 29 comments

Comments

@jedwards4b
Copy link

jedwards4b commented Oct 22, 2018

The following code, derived from put_var.f90, demonstrates a problem with R8->R4 conversion in pnetcdf.
IBM XL C/C++ for Linux, V16.1.1 (Beta 6)
Version: 16.01.0001.0000
ppc64le

!
!   Copyright (C) 2013, Northwestern University and Argonne National Laboratory
!   See COPYRIGHT notice in top-level directory.
!
! $Id$

!
! This example shows how to use nf90mpi_put_var_all() to write a 2D
! 4-byte integer array in parallel. It first defines a netCDF variable of
! size global_nx * global_ny where
!    global_nx == 5 and
!    global_ny == (4 * number of MPI processes).
! The data partitioning pattern is a column-wise partitioning across all
! processes. Each process writes a subarray of size nx * ny.
! Note the description above follows the Fortran array index order.
!
! Example commands for MPI run and outputs from running ncmpidump on the
! NC file produced by this example program:
!
!    % mpif90 -O2 -o put_var put_var.f90 -lpnetcdf
!    % mpiexec -n 4 ./put_var /pvfs2/wkliao/testfile.nc
!
!    % ncmpidump /pvfs2/wkliao/testfile.nc
!    netcdf testfile {
!    // file format: CDF-5 (big variables)
!    dimensions:
!            x = 5 ;
!            y = 16 ;
!    variables:
!            int var(y, x) ;
!    data:
!
!     var =
!      0, 0, 0, 0, 0,
!      0, 0, 0, 0, 0,
!      0, 0, 0, 0, 0,
!      0, 0, 0, 0, 0,
!      1, 1, 1, 1, 1,
!      1, 1, 1, 1, 1,
!      1, 1, 1, 1, 1,
!      1, 1, 1, 1, 1,
!      2, 2, 2, 2, 2,
!      2, 2, 2, 2, 2,
!      2, 2, 2, 2, 2,
!      2, 2, 2, 2, 2,
!      3, 3, 3, 3, 3,
!      3, 3, 3, 3, 3,
!      3, 3, 3, 3, 3,
!      3, 3, 3, 3, 3 ;
!    }
!
      subroutine check(err, message)
          use mpi
          use pnetcdf
          implicit none
          integer err
          character(len=*) message

          ! It is a good idea to check returned value for possible error
          if (err .NE. NF90_NOERR) then
              write(6,*) trim(message), trim(nf90mpi_strerror(err))
              call MPI_Abort(MPI_COMM_WORLD, -1, err)
          end if
      end subroutine check

      program main
          use mpi
          use pnetcdf
          implicit none
          integer,parameter :: R8 = selected_real_kind(12) ! 8 byte real
          integer,parameter :: R4 = selected_real_kind( 6) ! 4 byte real

          character(LEN=256) filename, cmd
          integer err, nprocs, rank, ierr, get_args, dummy
          integer cmode, ncid, varid, dimid(2), varid2
          integer(kind=MPI_OFFSET_KIND) nx, ny, global_nx, global_ny
          integer(kind=MPI_OFFSET_KIND) starts(2), counts(2)
          integer(kind=MPI_OFFSET_KIND) malloc_size, sum_size
          PARAMETER(nx=5, ny=4)
          real(kind=R8):: buf(nx,ny)
          logical verbose

          call MPI_Init(err)
          call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)
          call MPI_Comm_size(MPI_COMM_WORLD, nprocs, err)

          ! take filename from command-line argument if there is any
          if (rank .EQ. 0) then
              verbose = .TRUE.
              filename = "testfile.nc"
              ierr = get_args(2, cmd, filename, verbose, dummy)
          endif
          call MPI_Bcast(ierr, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, err)
          if (ierr .EQ. 0) goto 999

          call MPI_Bcast(verbose, 1, MPI_LOGICAL, 0, MPI_COMM_WORLD, err)
          call MPI_Bcast(filename, 256, MPI_CHARACTER, 0, MPI_COMM_WORLD, err)

          ! set parameters
          global_nx = nx
          global_ny = ny * nprocs

          buf = real(rank,kind=R8);

          ! create file, truncate it if exists
          cmode = IOR(NF90_CLOBBER, NF90_64BIT_DATA)
          err = nf90mpi_create(MPI_COMM_WORLD, filename, cmode, &
                               MPI_INFO_NULL, ncid)
          call check(err, 'In nf90mpi_create: ')

          ! define dimensions x and y
          err = nf90mpi_def_dim(ncid, "x", global_nx, dimid(1))
          call check(err, 'In nf90mpi_def_dim x: ')
          err = nf90mpi_def_dim(ncid, "y", global_ny, dimid(2))
          call check(err, 'In nf90mpi_def_dim y: ')

          ! define a 2D variable of double type
          err = nf90mpi_def_var(ncid, "var", NF90_DOUBLE, dimid, varid)
          call check(err, 'In nf90mpi_def_var: ')

          ! define a 2D variable of real type
          err = nf90mpi_def_var(ncid, "var2", NF90_REAL, dimid, varid2)
          call check(err, 'In nf90mpi_def_var: ')

          ! do not forget to exit define mode
          err = nf90mpi_enddef(ncid)
          call check(err, 'In nf90mpi_enddef: ')

          ! now we are in data mode

          ! Note that in Fortran, array indices start with 1
          starts(1) = 1
          starts(2) = ny * rank + 1
          counts(1) = nx
          counts(2) = ny

          err = nf90mpi_put_var_all(ncid, varid, buf, starts, counts)
          call check(err, 'In nf90mpi_put_var_all: ')

          err = nf90mpi_put_var_all(ncid, varid2, buf, starts, counts)
          call check(err, 'In nf90mpi_put_var_all: ')

          ! close the file
          err = nf90mpi_close(ncid)
          call check(err, 'In nf90mpi_close: ')

          ! check if there is any PnetCDF internal malloc residue
 998      format(A,I13,A)
          err = nf90mpi_inq_malloc_size(malloc_size)
          if (err == NF90_NOERR) then
              call MPI_Reduce(malloc_size, sum_size, 1, MPI_INTEGER8, &
                              MPI_SUM, 0, MPI_COMM_WORLD, err)
              if (rank .EQ. 0 .AND. sum_size .GT. 0_MPI_OFFSET_KIND) print 998, &
                  'heap memory allocated by PnetCDF internally has ',  &
                  sum_size/1048576, ' MiB yet to be freed'
          endif

 999      call MPI_Finalize(err)
          ! call EXIT(0) ! EXIT() is a GNU extension
      end program main

The resulting testfile.nc is

netcdf testfile {
dimensions:
	x = 5 ;
	y = 16 ;
variables:
	double var(y, x) ;
	float var2(y, x) ;
data:

 var =
  0, 0, 0, 0, 0,
  0, 0, 0, 0, 0,
  0, 0, 0, 0, 0,
  0, 0, 0, 0, 0,
  1, 1, 1, 1, 1,
  1, 1, 1, 1, 1,
  1, 1, 1, 1, 1,
  1, 1, 1, 1, 1,
  2, 2, 2, 2, 2,
  2, 2, 2, 2, 2,
  2, 2, 2, 2, 2,
  2, 2, 2, 2, 2,
  3, 3, 3, 3, 3,
  3, 3, 3, 3, 3,
  3, 3, 3, 3, 3,
  3, 3, 3, 3, 3 ;

 var2 =
  -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
  -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
  -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
  -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
  -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
  -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
  -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
  -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
  -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
  -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
  -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
  -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
  -28308.59, -28308.59, -28308.59, -28308.59, -28308.59,
  -28308.59, -28308.59, -28308.59, -28308.59, -28308.59,
  -28308.59, -28308.59, -28308.59, -28308.59, -28308.59,
  -28308.59, -28308.59, -28308.59, -28308.59, -28308.59 ;
}
@wkliao
Copy link
Member

wkliao commented Oct 22, 2018

Hi, @jedwards4b Does the same happen to C program?

@jedwards4b
Copy link
Author

Yes, it also appears to happen with a C program, here is a test:

#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strcpy(), strncpy() */
#include <unistd.h> /* getopt() */
#include <time.h>   /* time() localtime(), asctime() */
#include <mpi.h>
#include <pnetcdf.h>

#define NY 10
#define NX 4

static int verbose;

#define ERR {if(err!=NC_NOERR){printf("Error at %s:%d : %s\n", __FILE__,__LINE__, ncmpi_strerror(err));nerrs++;}}

static void
usage(char *argv0)
{
    char *help =
    "Usage: %s [-h] | [-q] [-k format] [file_name]\n"
    "       [-h] Print help\n"
    "       [-q] Quiet mode (reports when fail)\n"
    "       [-k format] file format: 1 for CDF-1, 2 for CDF-2, 3 for NetCDF4,\n"
    "                                4 for NetCDF4 classic model, 5 for CDF-5\n"
    "       [filename] output netCDF file name\n";
    fprintf(stderr, help, argv0);
}

/*----< pnetcdf_check_mem_usage() >------------------------------------------*/
/* check PnetCDF library internal memory usage */
static int
pnetcdf_check_mem_usage(MPI_Comm comm)
{
    int err, nerrs=0, rank;
    MPI_Offset malloc_size, sum_size;

    MPI_Comm_rank(comm, &rank);

    /* print info about PnetCDF internal malloc usage */
    err = ncmpi_inq_malloc_max_size(&malloc_size);
    if (err == NC_NOERR) {
        MPI_Reduce(&malloc_size, &sum_size, 1, MPI_OFFSET, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0 && verbose)
            printf("maximum heap memory allocated by PnetCDF internally is %lld bytes\n",
                   sum_size);

        /* check if there is any PnetCDF internal malloc residue */
        err = ncmpi_inq_malloc_size(&malloc_size);
        MPI_Reduce(&malloc_size, &sum_size, 1, MPI_OFFSET, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0 && sum_size > 0)
            printf("heap memory allocated by PnetCDF internally has %lld bytes yet to be freed\n",
                   sum_size);
    }
    else {
        printf("Error at %s:%d: %s\n", __FILE__,__LINE__,ncmpi_strerror(err));
        nerrs++;
    }
    return nerrs;
}

/*----< pnetcdf_io() >-------------------------------------------------------*/
static int
pnetcdf_io(MPI_Comm comm, char *filename, int cmode)
{
    int i, j, rank, nprocs, err, nerrs=0;
    int ncid, varid, dimid[2];
    double  buf[NY][NX];
    char str_att[128];
    float float_att[100];
    MPI_Offset  global_ny, global_nx;
    MPI_Offset start[2], count[2];

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    /* create a new file for writing ----------------------------------------*/
    cmode |= NC_CLOBBER;
    err = ncmpi_create(comm, filename, cmode, MPI_INFO_NULL, &ncid); ERR

    /* the global array is NY * (NX * nprocs) */
    global_ny = NY;
    global_nx = NX * nprocs;

    for (i=0; i<NY; i++)
        for (j=0; j<NX; j++)
	  buf[i][j] = (float) rank;

    /* add a global attribute: a time stamp at rank 0 */
    time_t ltime = time(NULL); /* get the current calendar time */
    asctime_r(localtime(&ltime), str_att);
    sprintf(str_att, "Mon Aug 13 21:27:48 2018");

    /* make sure the time string are consistent among all processes */
    MPI_Bcast(str_att, sizeof(str_att), MPI_CHAR, 0, comm);

    err = ncmpi_put_att_text(ncid, NC_GLOBAL, "history", strlen(str_att),
                             &str_att[0]); ERR

    /* define dimensions x and y */
    err = ncmpi_def_dim(ncid, "Y", global_ny, &dimid[0]); ERR
    err = ncmpi_def_dim(ncid, "X", global_nx, &dimid[1]); ERR

    /* define a 2D variable of integer type */
    err = ncmpi_def_var(ncid, "var", NC_FLOAT, 2, dimid, &varid); ERR

    /* add attributes to the variable */
    strcpy(str_att, "example attribute of type text.");
    err = ncmpi_put_att_text(ncid, varid, "str_att_name", strlen(str_att),
                             &str_att[0]); ERR

    for (i=0; i<8; i++) float_att[i] = i;
    err = ncmpi_put_att_float(ncid, varid, "float_att_name", NC_FLOAT, 8,
                              &float_att[0]); ERR
    short short_att=1000;
    err = ncmpi_put_att_short(ncid, varid, "short_att_name", NC_SHORT, 1,
                              &short_att); ERR

    /* do not forget to exit define mode */
    err = ncmpi_enddef(ncid); ERR

    /* now we are in data mode */
    start[0] = 0;
    start[1] = NX * rank;
    count[0] = NY;
    count[1] = NX;

    err = ncmpi_put_vara_all(ncid, varid, start, count, &buf[0][0], NY*NX, MPI_DOUBLE); ERR

    err = ncmpi_close(ncid); ERR

    return nerrs;
}

int main(int argc, char** argv)
{
    extern int optind;
    char filename[256];
    int i, rank, kind=0, cmode=0, nerrs=0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    verbose = 1;

    /* get command-line arguments */
    while ((i = getopt(argc, argv, "hqk:")) != EOF)
        switch(i) {
            case 'q': verbose = 0;
                      break;
            case 'k': kind = atoi(optarg);
                      break;
            case 'h':
            default:  if (rank==0) usage(argv[0]);
                      MPI_Finalize();
                      return 1;
        }
    if (argv[optind] == NULL) strcpy(filename, "testfile.nc");
    else                      snprintf(filename, 256, "%s", argv[optind]);

    MPI_Bcast(filename, 256, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (verbose && rank == 0) printf("%s: example of using put_vara APIs\n",__FILE__);

    switch (kind) {
        case(2): cmode = NC_64BIT_OFFSET;             break;
        case(3): cmode = NC_NETCDF4;                  break;
        case(4): cmode = NC_NETCDF4|NC_CLASSIC_MODEL; break;
        case(5): cmode = NC_64BIT_DATA;               break;
        default: cmode = 0;
    }

#ifndef PNETCDF_DRIVER_NETCDF4
    /* netcdf4 driver is not enabled, skip */
    if (kind == 3 || kind == 4) {
        MPI_Finalize();
        return 0;
    }
#endif
    nerrs += pnetcdf_io(MPI_COMM_WORLD, filename, cmode);

    nerrs += pnetcdf_check_mem_usage(MPI_COMM_WORLD); 

    MPI_Finalize();
    return (nerrs > 0);
}

@jedwards4b
Copy link
Author

jedwards4b commented Oct 22, 2018

It seems to be an xl compiler issue. The same test using pgi is okay. (on the same machine and using the same MPI library)

@rzurob
Copy link

rzurob commented Oct 23, 2018

I opened a defect against XL for this issue. For followup purposes with the XL beta liaison for your organization, this is defect 158051 in our defect tracking system.

@wkliao
Copy link
Member

wkliao commented Oct 24, 2018

Hi @jedwards4b
You mentioned earlier that the issue does not happen to serial netcdf.
I am wondering if this is still the case.

@jedwards4b
Copy link
Author

HI @wkliao the IBM folks tell me that a workaround is to compile without optimization. It could be that the netcdf build was without -O and the pnetcdf c build had -O. I will look into that tomorrow and verify.

@rzurob
Copy link

rzurob commented Oct 24, 2018

The file that's miscompiled is ncx.lo. So you can limit the workaround to this file by adding the following to your local version of the build tree:
In src/drivers/common/Makefile, add:

ncx.lo: LTCOMPILE += -qnoopt

before the ".c.lo:" recipe.

@rzurob
Copy link

rzurob commented Oct 24, 2018

Another alternative is to use -qnoinline instead of -qnoopt in your local copy of src/drivers/common/Makefile

ncx.lo: LTCOMPILE += -qnoinline

@rzurob
Copy link

rzurob commented Oct 24, 2018

We believe this is a source issue and not a compiler bug. It appears that ncx.c breaks strict ANSI C aliasing rules in regards to how put_ix_float / get_ix_float are used. That's why the file was miscompiled. The file should either be changed to follow strict ANSI C aliasing, or -fno-strict-aliasing should be used to compile it. e.g.
ncx.lo: LTCOMPILE += -fno-strict-aliasing

@wkliao
Copy link
Member

wkliao commented Oct 24, 2018

I recall this strict aliasing error happens before to some xlc compilers. At that time, I added "-fno-inline" to work around. Will this and removing all inline directive from the source codes resolve the problem?

@rzurob
Copy link

rzurob commented Oct 24, 2018

Our recommendation is to add -fno-strict-aliasing for this file. You can keep the inline directives. -fno-strict-aliasing is a big hammer, but it will do the right thing. Removing inlining might work with our current optimizer, but future ones might try more aggressive optimizations that will make the problem appear again.

@jedwards4b
Copy link
Author

jedwards4b commented Oct 24, 2018 via email

@wkliao
Copy link
Member

wkliao commented Oct 24, 2018

Hi @rzurob and @jedwards4b
May I assume you are testing this on a Big Endian machine?
The source codes of put_ix_float in question are shown below.
As you can see, memcpy() is called if the machine is big endian.
I can understand the strict aliasing error on little endian machine because of the byte swap operation in swap4b, but not memcpy. Can you verify?

inline static void
put_ix_float(void *xp, const float *ip)
{
#ifdef WORDS_BIGENDIAN
        (void) memcpy(xp, ip, X_SIZEOF_FLOAT);
#else
        swap4b(xp, ip);
#endif
}

@jedwards4b
Copy link
Author

jedwards4b commented Oct 24, 2018 via email

@wkliao
Copy link
Member

wkliao commented Oct 24, 2018

OK. That makes sense.
I committed a fix (#24) to the master branch using the below solution suggested by @rzurob (thanks!). In addition, I also removed all the inline directives. Let me know if this works for you.

ncx.lo: LTCOMPILE += -fno-strict-aliasing

@jedwards4b
Copy link
Author

It works for the ibm compiler, however now when I try to use the pgi compiler on the same system I am still getting that -fno-strict-aliasing flag.

@wkliao
Copy link
Member

wkliao commented Nov 12, 2018

Hi @jedwards4b
Please send me your config.log file.
FYI The latest configure.ac and Makefile.am checks whether PGI based compilers are used and selects the compile flag understood by PGI.

if MPICC_IS_PGCC
ncx.lo: LTCOMPILE += -alias=traditional
else

@jedwards4b
Copy link
Author

checking if C compiler is pgcc... no

mpicc --version

pgcc 18.7-0 linuxpower target on Linuxpower
PGI Compilers and Tools
Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.

@wkliao
Copy link
Member

wkliao commented Nov 12, 2018

In my script, I use "-V". Could you run "mpicc -V" and let me know?

@jedwards4b
Copy link
Author

I see the problem - in acinclude.m4:

@jedwards4b
Copy link
Author

AC_DEFUN([UD_CHECK_PGCC],[
    AC_CACHE_CHECK([if C compiler is pgcc], [ac_cv_cc_compiler_pgcc],
    [ac_cv_cc_compiler_pgcc=no
     _CC_VER=`$MPICC -V -c 2> /dev/null`
     _CC_VENDOR=`echo $_PGCC_VER | cut -d' ' -f1`
     if test "x${_CC_VENDOR}" = xpgcc ; then
        ac_cv_cc_compiler_pgcc=yes
     fi
     unset _CC_VER
     unset _CC_VENDOR
    ])
])

Shouldn't _PGCC_VER be _CC_VER?

@jedwards4b
Copy link
Author

Yes, changing _PGCC_VER to _CC_VER above solves the problem.

checking if C compiler is pgcc... yes

@wkliao
Copy link
Member

wkliao commented Nov 12, 2018

You forgot to change _PGCC_VER in setting _CC_VENDOR.

This is strange, because _PGCC_VER and _PGCC_VENDOR are used locally.
Maybe the names conflict with one used by PGI environment?

Can you reverse your change and run autoreconf and your configure command again?

@jedwards4b
Copy link
Author

I posted the original before - here is the corrected code:

AC_DEFUN([UD_CHECK_PGCC],[
    AC_CACHE_CHECK([if C compiler is pgcc], [ac_cv_cc_compiler_pgcc],
    [ac_cv_cc_compiler_pgcc=no
     _CC_VER=`$MPICC -V -c 2> /dev/null`
     _CC_VENDOR=`echo $_CC_VER | cut -d' ' -f1`
     if test "x${_CC_VENDOR}" = xpgcc ; then
        ac_cv_cc_compiler_pgcc=yes
     fi
     unset _CC_VER
     unset _CC_VENDOR
    ])
])

I don't see any reason to change it the other way but I have no objection.

@wkliao
Copy link
Member

wkliao commented Nov 12, 2018

I just realized that I have a local committed change that has not been pushed to the repo.
Let me fix this and let you know.

@wkliao
Copy link
Member

wkliao commented Nov 13, 2018

I just pushed a new commit to acinclude.m4 to the master branch that should fixes the problem.

@jedwards4b
Copy link
Author

jedwards4b commented Nov 13, 2018 via email

@wkliao
Copy link
Member

wkliao commented Dec 4, 2018

Hi, @rzurob
It seems -fno-strict-aliasing is only recognizable by the XL compiler V13, but not V12.
In V12, is the corresponding one -qalias=noansi?
(not -qalias=norestrict)

@rzurob
Copy link

rzurob commented Dec 4, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants