Issue 1324 netcdf parallel #1388

Merged
aashish24 merged 49 commits into master from issue_1324_netcdf_parallel on Jun 25, 2015

4 participants
@doutriaux1
Member

doutriaux1 commented Jun 12, 2015

No description provided.

@doutriaux1

Member

doutriaux1 commented Jun 12, 2015

@dlonie @aashish24 this PR has evolved into something much different from what its original name suggests.

It does turn on PARALLEL capabilities for both hdf5/netcdf4 and also cdms2.

BUT cdms2 parallel (//) writing is still very much in development. Parallel writing must be done "the right way", so while the features are here for experts, a newbie will likely break their code. I will keep exploring the parallel aspect and improve both cdms2 and the documentation on how to use it in this context.

What this PR is really about is:
1- Many build fixes on various Linux systems, the most difficult one being rhea
2- A newer NetCDF that gave us a 2 orders of magnitude speed gain on rhea (along with some configuration tweaks within rhea)
3- A 0d write bug fix in cdms2, plus string array bug fixes.

All of this is VERY important and high priority for ACME; it also fixes the master build on Linux RH and Ubuntu for me.

Possible side effect: loss of the DAP client (bug in the newer netcdf). I'm investigating this.

@aashish24

Contributor

aashish24 commented Jun 12, 2015

@doutriaux1 one minor request. Can you squash some of the commits such as "getting there" so that the history is a bit cleaner and it's easy to understand your changes? Also, we may need to get rid of building ffi (if not in this branch then in another PR when we update the add_cdat_package and deps cmake macros).

@doutriaux1

Member

doutriaux1 commented Jun 12, 2015

@aashish24 found a bug (thank you, ctest) so I put the fix in now. Let's keep ffi in: when you turn on use_system_packages it will just skip it; otherwise people like susannah with no admin rights won't be able to build it.

@doutriaux1

Member

doutriaux1 commented Jun 12, 2015

@aashish24 there's only one "getting there" and it's pretty descriptive there. I promise to work even harder on better commit messages in the future. Feel free to squash commits to your liking.

@doutriaux1

Member

doutriaux1 commented Jun 12, 2015

@aashish24 do you know of a quick tool to squash commits?

@doutriaux1

Member

doutriaux1 commented Jun 13, 2015

looking into test1 failure

@doutriaux1

Member

doutriaux1 commented Jun 14, 2015

OK, the last non-DAP-related test passes; see Mac results at:
https://open.cdash.org/buildSummary.php?buildid=3859039

@allisonvacanti

Contributor

allisonvacanti commented Jun 15, 2015

Squashing commits: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History

git rebase -i [commitish] is the easiest way. Takes less than a minute for a simple squash.

@aashish24

Contributor

aashish24 commented Jun 15, 2015

Thanks @doutriaux1, you can use the command referred to by @dlonie. You pick a commit that you want to use as the root commit (the point from which you want to change history) and then it will show a list of commits, something like:

pick ff225e6 Fix #1029 - Update ffmpeg from 0.11.1 to 2.7
pick 1e21fec Add whitespace to trigger rebuild

You can change pick to s for squash. You cannot squash the top commit in that list, since it has no earlier commit to be folded into.
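The todo-list edit described above can be sketched in a few lines of Python. This is only an illustration of the rule "keep the first pick, squash the rest"; the function name `squash_todo` is hypothetical and is not part of git or of this repository.

```python
def squash_todo(todo_text, action="squash"):
    """Rewrite a `git rebase -i` todo list so every pick after the first is squashed.

    `action` may be "squash" (keep the commit messages) or "fixup" (drop them).
    """
    out = []
    seen_first_pick = False
    for line in todo_text.splitlines():
        stripped = line.strip()
        # Blank lines and the comment block git appends are left untouched.
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        parts = stripped.split(None, 1)
        if parts[0] in ("pick", "p") and len(parts) == 2:
            if seen_first_pick:
                # Every later commit is folded into the one above it.
                out.append(f"{action} {parts[1]}")
            else:
                # The top commit must stay a pick: there is nothing above it.
                out.append(stripped)
                seen_first_pick = True
        else:
            out.append(line)
    return "\n".join(out) + "\n"


todo = (
    "pick ff225e6 Fix #1029 - Update ffmpeg from 0.11.1 to 2.7\n"
    "pick 1e21fec Add whitespace to trigger rebuild\n"
)
print(squash_todo(todo), end="")
```

A helper like this could, in principle, be wrapped in a small script and used as the sequence editor for `git rebase -i`, but that wiring is left out here.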

@doutriaux1

Member

doutriaux1 commented Jun 15, 2015

@dlonie yep I know that but it requires some thinking 😝 I was hoping for a GUI-based one (I know I must be getting old when I ask for a GUI tool...) for less brain power drain.

@allisonvacanti

Contributor

allisonvacanti commented Jun 15, 2015

It's really quite simple -- run the command, change pick to f(ixup) (or s(quash) depending on whether you want to keep the second commit message), and it's good to go. Won't even have to worry about conflicts unless you rearrange the commits.

@doutriaux1

Member

doutriaux1 commented Jun 15, 2015

@dlonie it's no problem. Plus it's easier now that I'm at work: got the bigger screens, can see multiple windows at once, etc... Will squash in a minute.

@doutriaux1 doutriaux1 force-pushed the issue_1324_netcdf_parallel branch from 8b568fe to 9b22d85 Jun 15, 2015

@doutriaux1

Member

doutriaux1 commented Jun 15, 2015

@aashish24 is it better now?

@aashish24

Contributor

aashish24 commented Jun 15, 2015

It's looking much better I think. It could be improved a bit more but for now it will work.

doutriaux1 added some commits May 15, 2015

setting up build to enable parallel NetCDF
hdf5 needed to be built with --enable-parallel
hdf5 and pnetcdf needed to be built with -fPIC
netcdf and hdf5 needed to be configured with CC=mpicc hence new cdatmpi_configure_step.cmake file
had to upgrade our netcdf distribution
hdf5 parallel had changed/renamed/removed some functions and wasn't compatible anymore
pyopenssl issues
pyopenssl suddenly refused to build...
had to enable six/cffi/ffi and add pycparser
when file was already here but not readable (like a bad write before)
it would lead to an error, now we remove the file if it's here and 'w'
mode and then create the file.
can't use mpi if shuffle/deflate so turn them off
also turn off classic to ensure using netcdf4
we cannot write mpi if deflate on/off
so if deflate/shuffle are 0 AND classic is 0 we use netcdf4 mpiio

before shuffle and deflate 0 triggered netcdf3

we might want to be able to use NetCDF4 classic format, in which case we
will need to implement a cdms2.useNetCDF4(True) to force using netcdf4
no matter what
looks like sometimes processors would remove
the file opened by other process after the original (bad) one was removed
this seems to fix it
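The commit messages above describe removing a stale file before opening in 'w' mode, and a race where one process removed a file another process had already opened. The PR text does not show how the fix was implemented, so the following is only a sketch of one common pattern: a single process does the cleanup and everyone waits at a barrier before creating the file. `prepare_output_file`, `rank` and `barrier` are hypothetical names; in a real MPI program `rank` and `barrier` would come from the communicator.

```python
import os


def prepare_output_file(path, mode, rank=0, barrier=lambda: None):
    """Remove a stale file before a 'w' open, from one process only.

    `rank` and `barrier` are stand-ins (assumptions) for an MPI communicator's
    rank and barrier call; with the defaults this runs as a single process.
    Returns True if this process removed an existing file.
    """
    removed = False
    # Only rank 0 deletes, so no other process can remove a freshly created file.
    if mode == "w" and rank == 0 and os.path.exists(path):
        os.remove(path)
        removed = True
    # Every process waits here so nobody creates the file before cleanup is done.
    barrier()
    return removed
```

With the default arguments the function degrades to a plain "delete if present" step, which matches the serial behaviour described in the first of those commit messages.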

doutriaux1 added some commits Jun 16, 2015

Produces netcdf4 files by default
This may look innocent, but it is an important change in the way cdms2 works
Files will now be produced as NetCDF4 by default even when setting
deflate and shuffle to zero
In order to get NetCDF3 files you need shuffle/deflate/deflatelevel set to 0
you also need netcdf4 set to 0
and you need classic set to 1 (which is the default)
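The rule in this commit message, together with the earlier one about MPI-IO, can be written down as a small truth table. The sketch below is a plain-Python model of the rules as worded in the commit messages, not cdms2's actual code; the function names and the keyword names are hypothetical and simply mirror the settings mentioned above.

```python
def output_format(*, shuffle, deflate, deflate_level, netcdf4, classic):
    """Return 'NetCDF3' or 'NetCDF4' following the rule in the commit message."""
    compression_off = shuffle == 0 and deflate == 0 and deflate_level == 0
    # NetCDF3 is produced only when all three conditions hold at once.
    if compression_off and netcdf4 == 0 and classic == 1:
        return "NetCDF3"
    return "NetCDF4"


def can_use_mpiio(*, shuffle, deflate, classic):
    """Parallel (MPI-IO) writes need shuffle and deflate off and classic off."""
    return shuffle == 0 and deflate == 0 and classic == 0


# Compression off alone is no longer enough to get a NetCDF3 file.
print(output_format(shuffle=0, deflate=0, deflate_level=0, netcdf4=1, classic=1))
print(output_format(shuffle=0, deflate=0, deflate_level=0, netcdf4=0, classic=1))
```

All arguments are keyword-only and required on purpose, so the sketch makes no claim about what the library defaults are beyond what the commit message states.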

@doutriaux1 doutriaux1 referenced this pull request Jun 18, 2015

Merged

1358 remove auto api #1369

PATCH_COMMAND ${netcdf_PATCH_COMMAND}
CONFIGURE_COMMAND ${CMAKE_COMMAND} -DINSTALL_DIR=<INSTALL_DIR> -DWORKING_DIR=<SOURCE_DIR> -D CONFIGURE_ARGS=${netcdf_configure_args} -P ${cdat_CMAKE_BINARY_DIR}/cdat_configure_step.cmake

@aashish24

aashish24 Jun 18, 2015

Contributor

Why did you take this out?

set(ENV{CC} mpicc)
message("CONFIGURE_ARGS IS ${CONFIGURE_ARGS}")

@aashish24

aashish24 Jun 18, 2015

Contributor

Please take out these debug messages

@doutriaux1

doutriaux1 Jun 18, 2015

Member

👍 thanks for spotting and reviewing!

@aashish24

Contributor

aashish24 commented Jun 18, 2015

@doutriaux1 looks like things are failing in this branch. Do you know why?

@aashish24

Contributor

aashish24 commented Jun 18, 2015

Successfully updated your environment to use UVCDAT
(changes are valid for this session/terminal only)
Version: 2.2.0-99-gb7f1a56
Location: /home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/build/install
Traceback (most recent call last):
  File "/home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/source/testing/vcs/test_vcs_basic_vectors.py", line 25, in <module>
    import vcs
  File "/home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/build/install/lib/python2.7/site-packages/vcs/__init__.py", line 32, in <module>
    from utils import *
  File "/home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/build/install/lib/python2.7/site-packages/vcs/utils.py", line 3, in <module>
    import cdtime
ImportError: No module named cdtime

@doutriaux1

Member

doutriaux1 commented Jun 18, 2015

Not sure; it works for me on mac(s), rhea, ubuntu.

@doutriaux1

Member

doutriaux1 commented Jun 18, 2015

@aashish24 I will do a fresh build just to be sure.

@doutriaux1

Member

doutriaux1 commented Jun 18, 2015

@aashish24 I got it: the switch to the CMake build seems to have changed where libnetcdf goes... it's now (on my ubuntu at least) under:

./Externals/lib/x86_64-linux-gnu/libnetcdf.so.7
./Externals/lib/x86_64-linux-gnu/libnetcdf.settings
./Externals/lib/x86_64-linux-gnu/libnetcdf.so.7.2.1
./Externals/lib/x86_64-linux-gnu/libnetcdf.so

Will need to go back to mac to double check what happens there
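A quick way to check where the library landed on a given machine is to walk the install prefix and list matching files. The following is a minimal sketch; `find_shared_libs` is a hypothetical helper, and the default pattern assumes Linux-style `.so` names (on a Mac a pattern such as `libnetcdf*.dylib` would be needed instead).

```python
import fnmatch
import os


def find_shared_libs(prefix, pattern="libnetcdf.so*"):
    """Return sorted paths under `prefix` whose file name matches `pattern`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(prefix):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # "./Externals" is the directory from the listing above; adjust as needed.
    for path in find_shared_libs("./Externals"):
        print(path)
```

Comparing the output on Ubuntu and on a Mac would show directly whether the library sits in `Externals/lib` or in an architecture-specific subdirectory.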

@doutriaux1

Member

doutriaux1 commented Jun 18, 2015

ok on my mac it's at the "right" place Externals/lib... Will revert the change for CMake build

@aashish24

Contributor

aashish24 commented Jun 23, 2015

@doutriaux1 a bunch of tests are still waiting. Mind looking into it?

@doutriaux1

Member

doutriaux1 commented Jun 23, 2015

@aashish24 I know I fixed this somewhere... Probably didn't push the branch, will push soon.

@doutriaux1

Member

doutriaux1 commented Jun 24, 2015

@aashish24 seems to pass on garant, no idea what the test failure is about but I don't think it has anything to do with netcdf 😉

@doutriaux1

Member

doutriaux1 commented Jun 24, 2015

@dlonie I also found another bunch of objects that are preciously kept by vcs forever (see #1424); fixing this might help you too.

@durack1

Member

durack1 commented Jun 24, 2015

@doutriaux1 the issue with garant can be fixed with the sleep(1) added in #1393 - a rebase should solve that..

@aashish24

Contributor

aashish24 commented Jun 25, 2015

thanks @doutriaux1 dashboards are looking pretty good now.

@aashish24

Contributor

aashish24 commented Jun 25, 2015

👍 I didn't verify the parallel code but in general it looks good to me.

aashish24 added a commit that referenced this pull request Jun 25, 2015

@aashish24 aashish24 merged commit a8d30d9 into master Jun 25, 2015

2 of 3 checks passed

continuous-integration/kitware-buildbot/uvcdat-garant-linux-release/ Build done.
continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@aashish24 aashish24 deleted the issue_1324_netcdf_parallel branch Jun 25, 2015
