weird cdscan error #70

Closed
doutriaux1 opened this issue Dec 21, 2016 · 33 comments

@doutriaux1
Contributor

For @mcenerney1 and on Travis systems, cdscan chokes on the missing_value attribute. No idea why. The same cdscan version works on my Mac and Linux systems on the file that fails for @mcenerney1.

see: https://travis-ci.org/UV-CDAT/uvcmetrics/builds/185660014

it's got the log details.

cdscan itself is easy to fix, as shown in @dnadeau4's PR,

but we need to understand why this error is triggered only on SOME systems.

In case the Travis log disappears, here is the gist of the error:

2: XML NAME: /home/travis/test_data/cam_output/c_t_b30.009.cam2.h0_csad836dce7d9e4045bcb5c184366d8bc0.xml
2: RUNNNIG CDSCAN
2: CDSCAN RUN ERROR len() of unsized object
2: END ERROR LOG
2:   File "/home/travis/miniconda/envs/travis/lib/python2.7/site-packages/metrics/computation/reductions.py", line 2864, in run_cdscan
2:     cdscan.main(cdscan_line)
2:   File "/home/travis/miniconda/envs/travis/lib/python2.7/site-packages/cdms2/cdscan.py", line 1635, in main
2:     cleanupAttrs(attrs)
2:   File "/home/travis/miniconda/envs/travis/lib/python2.7/site-packages/cdms2/cdscan.py", line 464, in cleanupAttrs
2:     if len(attval)==1:
2: cdscan_line= ['cdscan', '-q', '-x', '/tmp/travis/uvcmetrics/c_t_b30.009.cam2.h0_csad836dce7d9e4045bcb5c184366d8bc0.xml', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-01.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-02.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-03.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-04.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-05.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-06.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-07.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-08.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-09.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-10.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-11.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0600-12.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-01.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-02.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-03.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-04.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-05.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-06.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-07.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-08.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-09.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-10.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-11.nc', '/home/travis/test_data/cam_output/c_t_b30.009.cam2.h0.0601-12.nc']
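The failing line is `if len(attval)==1:`. A minimal NumPy-only reproduction, assuming (as the later comment about 0-dim variables suggests) that the offending attribute arrived as a zero-dimensional array rather than a one-element array, gives exactly the same message. Why only some systems hand back the 0-d form is not established by this sketch.

```python
import numpy as np

# Two ways a "scalar" attribute such as missing_value can be represented:
zero_d = np.array(1.0e20, dtype=np.float32)      # shape (), a 0-d array
one_elem = np.array([1.0e20], dtype=np.float32)  # shape (1,)

print(one_elem.shape, len(one_elem))  # (1,) 1

try:
    len(zero_d)  # same test as `if len(attval)==1:` in cleanupAttrs
except TypeError as exc:
    print(zero_d.shape, exc)  # () len() of unsized object
```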
@mcenerney1
Contributor

It looks like your travis_fix almost worked. diags_test_06 and meta_diags failed; the rest passed.

@mcenerney1
Contributor

Is it possible that there is some cache file or build directory that needs to be deleted?

@doutriaux1
Copy link
Contributor Author

No, it's bad... really, really bad... scary bad...

@doutriaux1
Contributor Author

In short, the regridder and masking appear to produce different results on your system and on Travis systems than on any other system we have tested. I'm suspecting NC4. But why? I even used Travis's own VMs and couldn't reproduce your behavior. It has to be related to environment variables...

@mcenerney1
Contributor

YIKES!

@williams13

williams13 commented Dec 22, 2016 via email

@mcenerney1
Contributor

I've been plagued with a cdscan-type problem, and the Travis site has had problems running the metrics test. Charles has been tracking this one, and it "appears" they are the same problem. The crazy thing is that it works fine everywhere else. My system has been scrubbed and it still has issues.

@doutriaux1
Contributor Author

We have no idea; that's what's scary. Somehow, suddenly, on Travis (and on Jim's machine) cdscan is following a different code path. Something to do with missing_value. It was easy to fix, but now the regridder produces slightly different results, with more masking. I suspect this is because the missing value is handled a bit differently in the cdscan-generated file. I tried using Travis VMs via Docker, but I can't reproduce the error there either. I suspect an environment issue. We have to figure out why cdscan behaves differently on these two systems...
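One way to chase a suspected environment difference, sketched here as an assumption rather than what was actually done in this thread, is to snapshot the interpreter version, package versions and selected environment variables on a working machine and on a failing one, then diff the two. The function names, module list and variable prefixes below are illustrative guesses, not part of UV-CDAT.

```python
import importlib
import os
import platform

# Illustrative defaults (hypothetical); adjust to whatever you suspect differs.
DEFAULT_MODULES = ("numpy", "cdms2")
DEFAULT_ENV_PREFIXES = ("PATH", "PYTHON", "LD_", "DYLD_", "HDF5", "NETCDF", "CONDA", "UVCDAT")


def environment_snapshot(modules=DEFAULT_MODULES, env_prefixes=DEFAULT_ENV_PREFIXES):
    """Collect the Python version, module versions and selected env variables."""
    snap = {"python": platform.python_version()}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError as exc:
            # Keep the message: a dlopen failure is also reported as ImportError.
            snap["module:" + name] = "import failed: %s" % exc
            continue
        snap["module:" + name] = getattr(mod, "__version__", "unknown")
    for key, value in os.environ.items():
        if key.startswith(tuple(env_prefixes)):
            snap["env:" + key] = value
    return snap


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

In practice you would run `environment_snapshot()` on each machine, save the dicts with `json.dump`, and pass the two loaded dicts to `diff_snapshots`.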

@doutriaux1
Contributor Author

@mcenerney1 could you yank your Anaconda install altogether and try installing 2.6, not 2.8, to see if the problem persists?

@mcenerney1
Contributor

Reinstalled Anaconda and created a new env 2.6_1 with
conda create -n 2.6_1 -c conda-forge -c uvcdat uvcdat=2.6.1
and I'm getting
ImportError: No module named vtkGeovisCorePython
What's missing? vtk is
vtk 7.1.0.2.6 uvcdat_master uvcdat

@zshaheen
Contributor

@mcenerney1 Don't use the conda-forge channel when installing 2.6.1.

@mcenerney1
Contributor

With
conda create -n 2.6_1 -c uvcdat uvcdat=2.6.1
I get an import error:

>>> import cdms2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mcenerney1/anaconda/envs/2.6_1/lib/python2.7/site-packages/cdms2/__init__.py", line 17, in <module>
    from cdmsobj import CdArray, CdChar, CdByte, CdDouble, CdFloat, CdFromObject, CdInt, CdLong, CdScalar, CdShort, CdString
  File "/Users/mcenerney1/anaconda/envs/2.6_1/lib/python2.7/site-packages/cdms2/cdmsobj.py", line 5, in <module>
    import cdmsNode
  File "/Users/mcenerney1/anaconda/envs/2.6_1/lib/python2.7/site-packages/cdms2/cdmsNode.py", line 10, in <module>
    import cdtime
ImportError: dlopen(/Users/mcenerney1/anaconda/envs/2.6_1/lib/python2.7/site-packages/cdtime.so, 2): Library not loaded: @rpath/libjpeg.9.dylib
  Referenced from: /Users/mcenerney1/anaconda/envs/2.6_1/lib/libjasper.1.0.0.dylib
  Reason: image not found

@durack1
Member

durack1 commented Jan 3, 2017

@doutriaux1 @dnadeau4 it also seems a similar issue is now happening with the systematic XML generation on crunchy. It would appear to be a 2.8-specific issue, as 2.6 seems to work:

So 2.8:

(uvcdat) bash-4.1$ /usr/local/anaconda2/envs/2.8/bin/cdscan -x /export/durack1/Desktop/test2p8.xml /cmip5_css02/data/cmip5/output1/CSIRO-BOM/ACCESS1-0/1pctCO2/fx/ocean/fx/r0i0p0/sftof/1/*.nc
Finding common directory ...
Common directory: /cmip5_css02/data/cmip5/output1/CSIRO-BOM/ACCESS1-0/1pctCO2/fx/ocean/fx/r0i0p0/sftof/1/
Scanning files ...
/cmip5_css02/data/cmip5/output1/CSIRO-BOM/ACCESS1-0/1pctCO2/fx/ocean/fx/r0i0p0/sftof/1/sftof_fx_ACCESS1-0_1pctCO2_r0i0p0.nc
Traceback (most recent call last):
  File "/usr/local/anaconda2/envs/2.8/bin/cdscan", line 1681, in <module>
    main(sys.argv)
  File "/usr/local/anaconda2/envs/2.8/bin/cdscan", line 1635, in main
    cleanupAttrs(attrs)
  File "/usr/local/anaconda2/envs/2.8/bin/cdscan", line 464, in cleanupAttrs
    if len(attval)==1:
TypeError: len() of unsized object

And 2.6:

(uvcdat) bash-4.1$ /usr/local/anaconda2/envs/2.6/bin/cdscan -x /export/durack1/Desktop/test2p6.xml /cmip5_css02/data/cmip5/output1/CSIRO-BOM/ACCESS1-0/1pctCO2/fx/ocean/fx/r0i0p0/sftof/1/*.nc
Finding common directory ...
Common directory: /cmip5_css02/data/cmip5/output1/CSIRO-BOM/ACCESS1-0/1pctCO2/fx/ocean/fx/r0i0p0/sftof/1/
Scanning files ...
/cmip5_css02/data/cmip5/output1/CSIRO-BOM/ACCESS1-0/1pctCO2/fx/ocean/fx/r0i0p0/sftof/1/sftof_fx_ACCESS1-0_1pctCO2_r0i0p0.nc
/export/durack1/Desktop/test2p6.xml written

@painter1
Contributor

painter1 commented Jan 3, 2017

@durack1 , the cleanupAttrs() error is described, along with a fix, in issue CDAT/cdat#2145. I think that @dnadeau4 fixed the problem about two weeks ago. There are more problems with cdscan.

@durack1
Member

durack1 commented Jan 3, 2017

@painter1 thanks for the heads-up. The 2.8 version above is bombing on most of the CMIP5 data, so whatever the issue is, it should be solved pronto. I'd be happy to test this so I can get my own code back up and running. It'd also be good to test cdscan against a bunch of CMIP5 files to make sure a similar issue doesn't recur in the future.
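Testing cdscan against a batch of CMIP5 files could look roughly like the following. This is a hypothetical harness, not something that exists in uvcmetrics or cdms2: `scan_all` and its parameters are made up for illustration, and the only cdscan usage it relies on is the `cdscan -x <output.xml> <files...>` form shown in this thread. It is written for Python 3.

```python
import os
import subprocess


def scan_all(cdscan_cmd, file_groups, outdir):
    """Run cdscan once per group of files and report the groups that fail.

    cdscan_cmd  -- command prefix as a list, e.g. ["cdscan"] or an absolute path
    file_groups -- dict mapping a label (used to name the XML) to a list of .nc paths
    outdir      -- directory the XML files are written to
    Returns {label: stderr_text} for every run that exited with a non-zero status.
    """
    failures = {}
    for label, files in sorted(file_groups.items()):
        xml_path = os.path.join(outdir, label + ".xml")
        # Same invocation shape as in this thread: cdscan -x <out.xml> <files...>
        proc = subprocess.run(
            list(cdscan_cmd) + ["-x", xml_path] + list(files),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if proc.returncode != 0:
            failures[label] = proc.stderr
    return failures
```

For example, `scan_all(["/usr/local/anaconda2/envs/2.8/bin/cdscan"], {"sftof_ACCESS1-0": glob.glob(".../sftof/1/*.nc")}, "/tmp/xml")` would return the `TypeError` traceback above keyed by that label; a non-empty result would fail the regression check.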

@PeterCaldwell

Thanks for the effort on this. I'd also like to see a solution actually made operational ASAP because one of my projects is totally stalled out until cdscan works on CMIP5 data again.

@doutriaux1
Contributor Author

@durack1 @PeterCaldwell so this has been fixed since before Xmas. @durack1 I still need to update crunchy, though. So we're good. I have a branch with @painter1's additional fix. Once I have added the test to the branch, I will merge it into master and update crunchy, unless @durack1 needs this ASAP on crunchy.

@doutriaux1
Contributor Author

As far as the bad error (different numbers and all) goes, I wasted a few days tracking it down before I realized this branch had bad baselines in it... hence the different numbers when I was testing my branch on Travis... duh...

@durack1
Member

durack1 commented Jan 4, 2017

thanks @doutriaux1. This issue has been considerably complicated by changes to the permissions on the cron jobs, which I presume were implemented by a network bot. This will need tweaking in addition to the update of cdscan/UV-CDAT on the machine.

@durack1
Member

durack1 commented Jan 4, 2017

@PeterCaldwell @doutriaux1 I have just updated the cron job to run against the 2014-03-31 version of UV-CDAT/cdscan, so as long as there are no other system issues, it should lead to a successful XML run that completes late next week.

@PeterCaldwell

Thanks @durack1 ! So it sounds like you've just reverted back to an old version of cdat until @doutriaux1 can rebuild and verify the bugfixed version on crunchy? I'm cool with that.

@dnadeau4
Contributor

dnadeau4 commented Jan 4, 2017

@durack1 your problem has been solved and is in master.

#69

@durack1
Member

durack1 commented Jan 4, 2017

@PeterCaldwell yep, that's my approach to this stuff. If it works... And I have no intention of ever changing it again, unless of course some IT fiddling changes everything around again. I do hope it solves the issue; we'll find out next week.

@PeterCaldwell

Thanks guys! Fingers crossed.

@durack1
Member

durack1 commented Jan 9, 2017

@doutriaux1 it seems that spawned processes aren't inheriting the environment (and the cdscan path) from the parent. So if you can update the UV-CDAT install and the file at /usr/local/uvcdat/latest/bin/cdscan, that should solve the problem.
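If the cron job is driven from Python (the thread does not say how it is written), one way to stop depending on an inherited PATH is to resolve cdscan against an explicitly named directory, such as the /usr/local/uvcdat/latest/bin path mentioned above, and call it by absolute path. A minimal sketch; `resolve_executable` is a hypothetical helper.

```python
import os
import shutil


def resolve_executable(name, search_dirs):
    """Locate `name` using only `search_dirs`, ignoring the inherited PATH.

    Hypothetical helper: a job started by cron may get a minimal PATH, so the
    lookup is done only against directories named explicitly by the caller.
    """
    found = shutil.which(name, path=os.pathsep.join(search_dirs))
    if found is None:
        raise FileNotFoundError("%s not found in %s" % (name, list(search_dirs)))
    return os.path.abspath(found)
```

The job would then run `subprocess.run([cdscan_path, "-x", out_xml] + nc_files, check=True)` with the returned absolute path, so a missing install fails loudly instead of silently picking up whatever cdscan happens to be on PATH.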

@durack1
Member

durack1 commented Jan 9, 2017

@doutriaux1 did you update the crunchy UV-CDAT installation? An updated installation will hopefully solve my persistent problem

@doutriaux1
Contributor Author

@durack1 try again now.

@durack1
Member

durack1 commented Jan 9, 2017

@doutriaux1 thanks, I've kicked it off again. I'll check back in an hour and see if XMLs are being written with the update.

@durack1
Member

durack1 commented Jan 10, 2017

@doutriaux1 it still looks like there is a problem.

@durack1
Member

durack1 commented Jan 10, 2017

@doutriaux1 looks like your latest cdscan is pointing to a directory 2017-01-09-nox that is not accessible

$ ls -al /usr/local/anaconda2/envs
total 56
drwxrwxrwx 17 doutriaux1 climate 4096 Jan  9 13:24 2017-01-09
drwx------ 15 doutriaux1 climate 4096 Jan  9 13:32 2017-01-09-nox

lrwxrwxrwx  1 root       root      10 Jan  9 13:35 latest -> 2017-01-09
lrwxrwxrwx  1 root       root      14 Jan  9 13:35 latest-nox -> 2017-01-09-nox
[durack1@crunchy cmip5]$ head -n 10 /usr/local/anaconda2/envs/latest/bin/cdscan
#!/usr/local/anaconda2/envs/2017-01-09-nox/bin/python

import sys

@durack1 durack1 reopened this Jan 10, 2017
@doutriaux1
Contributor Author

What do you mean, not accessible? It's 777. I'll re-chmod the whole /usr/local/anaconda2 just to be safe.

@durack1
Member

durack1 commented Jan 10, 2017

@doutriaux1 I changed the perms on 2017-01-09-nox last night to get myself up and running. My point was that the executables with a shebang line (e.g. cdscan) in the 2017-01-09 install actually point to 2017-01-09-nox.

This is bad, and should be fixed
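A quick way to catch this class of problem before a cron job hits it, sketched as a hypothetical helper (`check_shebang` is not part of UV-CDAT or conda), is to read each script's `#!` line and verify that the interpreter lives under the expected environment prefix and is reachable by the current user. Note the trailing separator in the prefix test: `/usr/local/anaconda2/envs/2017-01-09-nox/bin/python` starts with the string `/usr/local/anaconda2/envs/2017-01-09`, so a naive `startswith` would miss exactly this case.

```python
import os


def check_shebang(script_path, expected_prefix=None):
    """Return (interpreter, problem) for a script's #! line; problem is None if OK."""
    with open(script_path, "rb") as fh:
        first = fh.readline().decode("utf-8", "replace").strip()
    if not first.startswith("#!"):
        return None, "no shebang line"
    parts = first[2:].split()
    if not parts:
        return None, "empty shebang line"
    interpreter = parts[0]
    if expected_prefix is not None:
        # Compare against "prefix/" so ".../2017-01-09-nox" does not match ".../2017-01-09".
        prefix = expected_prefix.rstrip(os.sep) + os.sep
        if not interpreter.startswith(prefix):
            return interpreter, "interpreter is outside " + expected_prefix
    # A parent directory without search permission (e.g. drwx------) makes exists() False.
    if not os.path.exists(interpreter):
        return interpreter, "interpreter does not exist or is not reachable"
    if not os.access(interpreter, os.X_OK):
        return interpreter, "interpreter is not executable by this user"
    return interpreter, None
```

Looping this over the scripts in `/usr/local/anaconda2/envs/latest/bin` with `expected_prefix=os.path.realpath("/usr/local/anaconda2/envs/latest")` would flag the cdscan shebang shown above as pointing outside the 2017-01-09 env.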

@durack1 durack1 reopened this Jan 10, 2017
@dnadeau4
Contributor

@durack1 the problem was solved with #69, which was merged into master. Look at line 464 and you will see that 0-dim variables are now read correctly. I think your "nox" problem should be a separate ticket.
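A guarded version of the `len(attval)==1` check might look like the sketch below. It is a hypothetical illustration, not the actual #69 patch: `collapse_singleton` and `cleanup_attrs` are made-up names, and it assumes the intent of the original test is to turn one-element array attributes into scalars. Testing `.size` instead of `len()` treats shape `()` and shape `(1,)` the same way.

```python
import numpy as np


def collapse_singleton(attval):
    """Return a NumPy scalar for 0-d or 1-element arrays; leave anything else alone."""
    if isinstance(attval, np.ndarray) and attval.size == 1:
        # reshape(-1)[0] works for shape () and (1,) and keeps the dtype
        # (e.g. float32), which matters for missing_value.
        return attval.reshape(-1)[0]
    return attval


def cleanup_attrs(attrs):
    """Hypothetical stand-in for cdscan's cleanupAttrs: collapse singleton attributes in place."""
    for key in list(attrs):
        attrs[key] = collapse_singleton(attrs[key])
```

Unlike `attval.item()`, which would convert to a Python `float`, this keeps the attribute's NumPy type.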

@doutriaux1 doutriaux1 modified the milestone: 2.10 May 5, 2017
@doutriaux1 doutriaux1 added the bug label May 8, 2017