Nosetests fail kpoint test #27

Closed
rousseab opened this issue Apr 20, 2014 · 11 comments

Comments

@rousseab
Contributor

Hello,

I find that nosetests fail to execute a kpoint test, as you can see below.

  • Is this a known issue? Is there a known fix?
  • Are others experiencing this problem, or am I seeing some unlucky version+compilation+machine+OS interaction?
  • Is my analysis below correct?

Cheers!

Analysis:

My limited understanding of ABIPY leads me to think that the error is due to the fact that the decorator @returns_None_onfail on the reading routines in class KpointsReaderMixin isn't catching undefined variables in the netcdf file.

ABIPY tries to determine if a file contains a PATH or an IRR. BZ, but assumes an IRR. BZ by default. All variables are defined in the netcdf file, but when a PATH is present, the variables monkhorst_pack_folding, kpoint_grid_shift and kpoint_grid_vectors are not set.

ABIPY fails to catch that these variables are not set and reads whatever values are on disk, which I guess are some default bit pattern (for signed int32, -(2**32)/2 + 1 = -2147483647, a value that appears below). ABIPY then fails to realize it is dealing with a PATH, which makes the nosetest fail!
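The two numbers that show up in the output below match the default fill values defined in the netCDF C headers (NC_FILL_INT and NC_FILL_DOUBLE), which is consistent with variables that were defined but never written. The following is a minimal sketch of that arithmetic, not code from ABIPY; the helper name `looks_unset` is hypothetical, and it assumes the file was written with the default fill values. A maintainer notes later in this thread that relying on these numbers is not portable.

```python
import math

# Default fill values from the netCDF C headers (netcdf.h).
NC_FILL_INT = -2147483647
NC_FILL_DOUBLE = 9.9692099683868690e+36

# The int32 value quoted above: -(2**32)/2 + 1.
print(-(2 ** 32) // 2 + 1 == NC_FILL_INT)  # True

# Rounded to eight decimals, the double matches the captured stdout.
print("%.8e" % NC_FILL_DOUBLE)  # 9.96920997e+36


def looks_unset(values, rel_tol=1e-6):
    """Return True if every entry equals a default netCDF fill value.

    Hypothetical helper for illustration only: it assumes default fill
    values, which a given file or netCDF build is free to override.
    """
    def is_fill(v):
        if isinstance(v, int):
            return v == NC_FILL_INT
        return math.isclose(v, NC_FILL_DOUBLE, rel_tol=rel_tol)

    values = list(values)
    return len(values) > 0 and all(is_fill(v) for v in values)


print(looks_unset([-2147483647, -2147483647, -2147483647]))  # True
print(looks_unset([8, 8, 8]))  # False
```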

Failure output:

======================================================================
FAIL: Test the reading of Kpoints from netcdf files.
----------------------------------------------------------------------

Traceback (most recent call last):
File ".../abipy/abipy/core/tests/test_kpoints.py", line 167, in test_reading
self.assertTrue(kpoints.is_path)
AssertionError: False is not true
-------------------- >> begin captured stdout << ---------------------
About to read file: .../abipy/abipy/data/runs/data_si_ebands/outdata/si_scf_GSR.nc
ksampling {'kptopt': None, 'shifts': array([ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36]), 'kptrlatt': array([[ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36],
[ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36],
[ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36]]), 'mpdivs': array([8, 8, 8], dtype=int32)}
About to read file: .../abipy/abipy/data/runs/data_si_ebands/outdata/si_nscf_GSR.nc
ksampling {'kptopt': None, 'shifts': array([ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36]), 'kptrlatt': array([[ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36],
[ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36],
[ 9.96920997e+36, 9.96920997e+36, 9.96920997e+36]]), 'mpdivs': array([-2147483647, -2147483647, -2147483647], dtype=int32)}

@temok-cse

Dear All,

I also ran into some issues when running the tests (nosetests):

_warning_: UserWarning: Kpoint weights should sum up to one while sum_weights is 14.000

_warning_: .abinit/abipy/abipyrc does not exist!

_failure_: FAIL: Test the reading of Kpoints from netcdf files (si_scf_GSR.nc)
... pymatgen.util.io_utils.FileLockException

I wonder about the severity of these problems, i.e. whether I can safely use abipy.
I am attaching the full log file of nosetests.

With regards,
Temok

#######################
_LOG FILE FOR nosetests_
#######################

OSX
Installation from a source downloaded on 2014.04.14
/sw/bin/python2.7 setup.py install

Dependencies (automatically) installed using:
pip 1.5.4 from /sw/lib/python2.7/site-packages/pip-1.5.4-py2.7.egg (python 2.7)
and/or
/sw/bin/easy_install-2.7 --version
setuptools 3.3

cat nosetests.log

/Users/temok/ApplicationsMine/Installed/abipy-20140420/abipy-20140420/abipy/profile.py:192: UserWarning: Abipy configuration file /Users/temok/.abinit/abipy/abipyrc does not exist!
  warn("Abipy configuration file %s does not exist!" % filename)
.................................../Users/temok/ApplicationsMine/Installed/abipy-20140420/abipy-20140420/abipy/core/kpoints.py:740: UserWarning: Kpoint weights should sum up to one while sum_weights is 14.000
The list of kpoints does not represent a homogeneous sampling of the BZ
<class 'abipy.core.kpoints.IrredZone'>
0) [0.500, 0.000, 0.000], weight = 1.000000
1) [0.417, 0.000, 0.000], weight = 1.000000
2) [0.333, 0.000, 0.000], weight = 1.000000
3) [0.250, 0.000, 0.000], weight = 1.000000
4) [0.167, 0.000, 0.000], weight = 1.000000
5) [0.083, 0.000, 0.000], weight = 1.000000
6) [0.000, 0.000, 0.000], weight = 1.000000
7) [0.000, 0.071, 0.071], weight = 1.000000
8) [0.000, 0.143, 0.143], weight = 1.000000
9) [0.000, 0.214, 0.214], weight = 1.000000
10) [0.000, 0.286, 0.286], weight = 1.000000
11) [0.000, 0.357, 0.357], weight = 1.000000
12) [0.000, 0.429, 0.429], weight = 1.000000
13) [0.000, 0.500, 0.500], weight = 1.000000
  warnings.warn(err_msg)
F.......................................................................................................................
======================================================================
FAIL: Test the reading of Kpoints from netcdf files.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/temok/ApplicationsMine/Installed/abipy-20140420/abipy-20140420/abipy/core/tests/test_kpoints.py", line 167, in test_reading
    self.assertTrue(kpoints.is_path)
AssertionError: False is not true
-------------------- >> begin captured stdout << ---------------------
About to read file: /Users/temok/ApplicationsMine/Installed/abipy-20140420/abipy-20140420/abipy/data/runs/data_si_ebands/outdata/si_scf_GSR.nc
ksampling {'kptopt': None, 'shifts': array([  9.96920997e+36,   9.96920997e+36,   9.96920997e+36]), 'kptrlatt': None, 'mpdivs': array([8, 8, 8], dtype=int32)}
About to read file: /Users/temok/ApplicationsMine/Installed/abipy-20140420/abipy-20140420/abipy/data/runs/data_si_ebands/outdata/si_nscf_GSR.nc
ksampling {'kptopt': None, 'shifts': array([  9.96920997e+36,   9.96920997e+36,   9.96920997e+36]), 'kptrlatt': None, 'mpdivs': array([-2147483647, -2147483647, -2147483647], dtype=int32)}

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 155 tests in 71.960s

FAILED (failures=1)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/sw/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/sw/lib/python2.7/site-packages/pymatgen/io/abinitio/tasks.py", line 625, in save_lastnode_id
    with FileLock(_COUNTER_FILE) as lock:
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 205, in __enter__
    self.acquire()
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 184, in acquire
    self.lockfile)
FileLockException: /Users/temok/.abinit/abipy/nodecounter.lock: Timeout occured.
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/sw/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/sw/lib/python2.7/site-packages/pymatgen/io/abinitio/tasks.py", line 625, in save_lastnode_id
    with FileLock(_COUNTER_FILE) as lock:
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 205, in __enter__
    self.acquire()
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 184, in acquire
    self.lockfile)
pymatgen.util.io_utils.FileLockException: /Users/temok/.abinit/abipy/nodecounter.lock: Timeout occurred.

@temok-cse

A follow up:
the last line of my previous post reads:
pymatgen.util.io_utils.FileLockException: /Users/temok/.abinit/abipy/nodecounter.lock: Timeout occurred.
and I'd like to mention that such file
~/.abinit/abipy/nodecounter.lock
exists but it is empty. Is this ok?
Cheers,
Temok

@gmatteo
Member

gmatteo commented Apr 21, 2014

Hi Bruno,

Is my analysis below correct?

Yes, the problem is that many ETSF-IO files produced by abinit contain
variables that are defined but not initialized.
The decorator returns_None_onfail tries to detect this problem at run-time
so that abipy knows that the value is not initialized. It works on my Mac and
on other linux architectures but obviously this trick is not portable.

I guess that the problematic part is in the test:

# This trick is needed because in many cases we define the netcdf variable
# but we don't write its value. 
 return value if not isinstance(value, MaskedArray) else None

Could you add print(type(value)) before returning so that we know the
kind of object that is returned by netcdf4 on your machine?
If it's a numpy array (and I think it is), there's little to do at present.
I generated the netcdf files on my Mac (I don't remember the version of netcdf4
I used) but the error you are reporting clearly shows that there's a portability problem.

The best solution would be to change the code in Abinit so that the netcdf file
contains extra information that allows me to understand whether the set of k-points
represents a path or a homogeneous mesh.
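To make the failure mode concrete, here is a minimal, dependency-free sketch of the idea behind such a decorator. It is not abipy's actual implementation: `UnsetValue` is a hypothetical stand-in for numpy.ma.MaskedArray, and `FakeReader`, `read_var` and the `errors` argument are assumptions made for illustration. It shows why the check only helps when the backend hands back a masked object; plain fill values pass straight through.

```python
import functools


class UnsetValue(list):
    """Hypothetical stand-in for numpy.ma.MaskedArray (a masked, unset value)."""


def returns_none_onfail(unset_type, errors=(KeyError,)):
    """Build a decorator that maps read failures and unset values to None."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                value = func(*args, **kwargs)
            except errors:
                # The variable is missing from the file altogether.
                return None
            # The variable is defined but the backend flags it as unset.
            return None if isinstance(value, unset_type) else value
        return wrapper
    return decorator


class FakeReader(object):
    """Hypothetical reader backed by a dict instead of a netCDF file."""

    def __init__(self, variables):
        self.variables = variables

    @returns_none_onfail(UnsetValue)
    def read_var(self, name):
        return self.variables[name]


reader = FakeReader({
    # Backend that flags the unwritten variable as masked.
    "mpdivs_masked": UnsetValue([0, 0, 0]),
    # Backend that returns the raw fill values as a plain sequence.
    "mpdivs_plain": [-2147483647, -2147483647, -2147483647],
})

print(reader.read_var("mpdivs_masked"))  # None
print(reader.read_var("mpdivs_plain"))   # [-2147483647, -2147483647, -2147483647]
print(reader.read_var("not_in_file"))    # None
```

In the second call the type check cannot tell an unset variable from real data, which is the situation reported in this issue.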

@gmatteo
Member

gmatteo commented Apr 21, 2014

A follow up:
the last line of my previous post reads:
pymatgen.util.io_utils.FileLockException: /Users/temok/.abinit/abipy/nodecounter.lock: Timeout occurred.
and I'd like to mention that such file
~/.abinit/abipy/nodecounter.lock
exists but it is empty. Is this ok?

Yep, it's normal. nodecounter.lock is an empty file used to prevent other processes
from writing data to nodecounter in order to avoid race conditions.

pymatgen.util.io_utils.FileLockException: /Users/temok/.abinit/abipy/nodecounter.lock: Timeout occurred.

I will try to find another implementation of FileLock that is more portable
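For readers unfamiliar with this kind of lock, the sketch below shows one common way an empty lock file can guard a counter file. It is an illustration under stated assumptions, not pymatgen's FileLock: the names `SimpleFileLock` and `LockTimeout`, the timeout values and the temporary directory are all hypothetical. It also shows one plausible way to end up with a timeout: a lock file that already exists when a new process tries to acquire it.

```python
import os
import tempfile
import time


class LockTimeout(Exception):
    """Raised when the lock file cannot be created before the timeout."""


class SimpleFileLock(object):
    """Hypothetical lock: holding it means having created <path>.lock."""

    def __init__(self, path, timeout=1.0, delay=0.05):
        self.lockfile = path + ".lock"
        self.timeout = timeout
        self.delay = delay
        self.fd = None

    def acquire(self):
        start = time.time()
        while True:
            try:
                # O_EXCL makes creation fail if the lock file already exists.
                flags = os.O_CREAT | os.O_EXCL | os.O_RDWR
                self.fd = os.open(self.lockfile, flags)
                return
            except FileExistsError:
                if time.time() - start >= self.timeout:
                    raise LockTimeout("%s: timeout occurred" % self.lockfile)
                time.sleep(self.delay)

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.unlink(self.lockfile)
            self.fd = None

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()


workdir = tempfile.mkdtemp()
counter = os.path.join(workdir, "nodecounter")

# Normal case: the empty lock file exists only while we write the counter.
with SimpleFileLock(counter):
    with open(counter, "w") as fh:
        fh.write("%d" % 42)

# A lock file that is already present makes the next acquire time out.
open(counter + ".lock", "w").close()
stale_error = None
try:
    with SimpleFileLock(counter, timeout=0.2):
        pass
except LockTimeout as exc:
    stale_error = exc
print(type(stale_error).__name__)  # LockTimeout
```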

@gmatteo
Member

gmatteo commented Apr 21, 2014

Could you pip install lockfile?
Then add the following line at the end of ~pymatgen/util/io_utils.py

from lockfile import FileLock

in order to redefine the class.

Let me know if this solves the timeout issue.

@temok-cse

Thanks Matteo,

I installed lockfile
/sw/bin/pip install lockfile
blablabla
/sw/bin/pip search lockfile
lockfile - Platform-independent file locking module
INSTALLED: 0.9.1 (latest)
zc.lockfile - Basic inter-process locks
python-liblockfile - A wrapper around liblockfile, using ctypes.
yg.lockfile - Lockfile object with timeouts and context manager

Then ran the test example plot_bands.py
and got the same behaviour.

Then I added
from lockfile import FileLock
at the very end of
./lib/python2.7/site-packages/pymatgen/util/io_utils.py
and now the thing gets a bit worse:

Typing exit, quit, or Ctrl-D freezes the prompt. If I then press Ctrl-C, the prompt is released and ipython exits (a single Ctrl-C on its own neither exits nor freezes ipython).

I am appending the two outputs I got,
Thanks,
Temok

#######################
LOG FILE 1 of 2:
RUNNING plot_bands.py
and QUITTING IPYTHON
( lockfile has been installed to the latest version)
#######################

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/sw/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/sw/lib/python2.7/site-packages/pymatgen/io/abinitio/tasks.py", line 625, in save_lastnode_id
    with FileLock(_COUNTER_FILE) as lock:
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 205, in __enter__
    self.acquire()
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 184, in acquire
    self.lockfile)
FileLockException: /Users/temok/.abinit/abipy/nodecounter.lock: Timeout occured.
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/sw/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/sw/lib/python2.7/site-packages/pymatgen/io/abinitio/tasks.py", line 625, in save_lastnode_id
    with FileLock(_COUNTER_FILE) as lock:
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 205, in __enter__
    self.acquire()
  File "/sw/lib/python2.7/site-packages/pymatgen/util/io_utils.py", line 184, in acquire
    self.lockfile)
pymatgen.util.io_utils.FileLockException: /Users/temok/.abinit/abipy/nodecounter.lock: Timeout occurred.

#######################
LOG FILE 2 of 2:
AFTER ADDING
from lockfile import FileLock
TO THE END OF
pymatgen/util/io_utils.py
########################

In [2]: ^D   (GETS FROZEN) 
^C 
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/sw/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/sw/lib/python2.7/site-packages/pymatgen/io/abinitio/tasks.py", line 625, in save_lastnode_id
    with FileLock(_COUNTER_FILE) as lock:
  File "/sw/lib/python2.7/site-packages/lockfile/__init__.py", line 229, in __enter__
    self.acquire()
  File "/sw/lib/python2.7/site-packages/lockfile/linklockfile.py", line 49, in acquire
    time.sleep(timeout is not None and timeout/10 or 0.1)
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/sw/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/sw/lib/python2.7/site-packages/pymatgen/io/abinitio/tasks.py", line 625, in save_lastnode_id
    with FileLock(_COUNTER_FILE) as lock:
  File "/sw/lib/python2.7/site-packages/lockfile/__init__.py", line 229, in __enter__
    self.acquire()
  File "/sw/lib/python2.7/site-packages/lockfile/linklockfile.py", line 49, in acquire
    time.sleep(timeout is not None and timeout/10 or 0.1)
KeyboardInterrupt

@gmatteo
Member

gmatteo commented Apr 21, 2014

Remove the with FileLock statement in the function save_lastnode_id defined
in pymatgen/io/abinitio/tasks.py.
The new version reads:

def save_lastnode_id():
    """Save the id of the last node created."""
    # No lock here: the counter file is written directly.
    with open(_COUNTER_FILE, "w") as fh:
        fh.write("%d" % _COUNTER)

This change is OK if you are using abipy just for post-processing the results of the
calculation. Other parts of the code, in particular the API used to generate and
run calculations in a semi-automated way, rely on this lock to avoid race conditions.

@rousseab
Contributor Author

Hello Matteo,

could you add print(type(value)) before returning so that we know the
kind of object that is returned by netcdf4 on your machine?

I indeed get numpy.ndarrays, as you predicted.

The numbers I get in the undefined variables are

-2147483647 (int32)
9.96920997e+36 (float64)

Could we use these numbers as a signal of an unset variable, or will this also not be portable?

Cheers,
Bruno


@temok-cse

Thanks Matteo,
ipython now has a clean exit.
Cheers,
Temok

@gmatteo
Member

gmatteo commented Apr 21, 2014

Could we use these numbers as a signal of an unset variable, or will
this also not be portable?

It won't be portable. In the bug tracker of netcdf4 I found a post that may
be related to this problem

Unidata/netcdf4-python#168

Which version of netcdf4 do you have?

import netCDF4
netCDF4.__version__

mine is 1.0.6

@rousseab
Contributor Author

Hi,

I have version 1.0.8...

cheers,
Bruno


@gmatteo gmatteo self-assigned this Apr 22, 2014
@gmatteo gmatteo closed this as completed Mar 25, 2017