
Resampling a very big file with nbodykit #646

Open
boryanah opened this issue Nov 5, 2020 · 6 comments
boryanah commented Nov 5, 2020

Hi Yu Feng,

I am trying to resample a mesh stored in a bigfile from 2304^3 (40 GB) to 1152^3 and save it again to a bigfile mesh using a parallel script (on NERSC).

So far I have tried the following, granting the job 512 GB spread across 16 nodes, but got an out-of-memory error. Here is the script:

mesh = BigFileMesh(dens_dir+"density_%d.bigfile"%N_dim, mode='real', dataset='Field')
print("loaded bigfile")
mesh.save(dens_dir+"density_%d.bigfile"%N_dim_new, mode='real', dataset='Field', Nmesh=N_dim_new)

(The failure occurs after loading the bigfile. Also, I have very slightly modified the save function in the source code to let the user specify the new downsampled mesh size: I pass Nmesh through to self.compute(mode, Nmesh=Nmesh) in save, and I have verified on a simple example that this works.)

The failure of this line makes me think that the save step is perhaps not parallelized, or that when resampling (i.e. when Nmesh differs from the Nmesh of the original mesh) the self._paint_XXX function isn't.

A new approach that I intend to pursue is loading the bigfile mesh, converting it to a complex field through the paint command (as I believe that uses pfft), zeroing all modes higher than pi Nmesh_new/Lbox, and then reverting to real space through the paint command (and saving the downsampled field to a bigfile). Is that approach parallelized? If not, which functions can I apply to make it so?
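To make the idea concrete, here is a minimal serial NumPy sketch of the Fourier-truncation step in 1D (downsample_field is just an illustrative helper, not nbodykit API; the real run would go through pmesh/pfft in parallel, and 3D is the same logic per axis):

```python
import numpy as np

def downsample_field(field, n_new):
    """Downsample a periodic real field by Fourier truncation: keep only
    modes below the new Nyquist frequency (pi * n_new / Lbox) and inverse
    transform onto the smaller grid."""
    n_old = field.shape[0]
    fk = np.fft.rfft(field)
    # Keep the lowest n_new//2 + 1 modes (those that fit on the new grid).
    fk_new = fk[: n_new // 2 + 1].copy()
    # Rescale for the change in grid size so the mean is preserved
    # (numpy's irfft divides by the output length).
    fk_new *= n_new / n_old
    return np.fft.irfft(fk_new, n=n_new)

x = np.linspace(0, 2 * np.pi, 2304, endpoint=False)
field = 1.0 + 0.5 * np.cos(3 * x)               # band-limited test field
small = downsample_field(field, 1152)
print(small.shape)                              # (1152,)
print(np.isclose(small.mean(), field.mean()))   # True: mean preserved
```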

My nbodykit version is 0.3.15.

rainwoodman (Member) commented

Hi,

The current resampling algorithm in pmesh does more or less what you want to do. It calls into https://github.com/rainwoodman/pmesh/blob/f8f1d78eb22027fbcf71345d382477cea25ab9e3/pmesh/pm.py#L479

What you do appears to be correct, but the memory is actually a bit tight.

The resampling algorithm does use a lot of memory -- it wasn't very optimized. It was originally written to handle 128/256 meshes under extremely low memory load (in the over-decomposed regime), I think.

When I scanned the code in pmesh, I see scratch spaces for r2c (1x), for the indices (2 to 4 long integers, so 4x or 8x of floats), for global sorting (2x-ish), for c2r (1x), and for the output (1x), plus some masks that can probably be ignored.

So the 512 GB appears to be a bit tight (as 13 * 40 GB = 520 GB).
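Adding those multipliers up, as a back-of-the-envelope sketch relative to the 40 GB input (taking the pessimistic 8x for the index arrays):

```python
# Rough peak-memory estimate for the pmesh resampling step, summing the
# scratch-space multipliers listed above (relative to the 40 GB input mesh).
input_gb = 40
multipliers = {
    "r2c scratch": 1,
    "index arrays (4 long ints, 8x floats)": 8,
    "global sort": 2,
    "c2r scratch": 1,
    "output": 1,
}
factor = sum(multipliers.values())
print(factor, "x ->", factor * input_gb, "GB")   # 13 x -> 520 GB
```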

As more nodes usually means faster processing, give it a try with 32 nodes if you can get them. We can also try to curb the memory usage of resample; it is likely some of the objects can be eagerly freed with del, for example.

Another way to run into OOM is using so many ranks that 1152x1152 is not divisible by the process mesh; in that case some ranks will not receive any part of the mesh. This seems unlikely here, as you only have 16 nodes (and potentially 16*64 = 1024 = 32x32 ranks; how many ranks did you run the script with?), and 1152 = 128 * 9.
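To illustrate the divisibility point, here is a quick check (grid_shape is a hypothetical helper that assumes pfft picks the most-square 2D factorisation of the rank count, which is an assumption for illustration only):

```python
import math

def grid_shape(nranks):
    """Most-square 2D factorisation of nranks -- an illustrative stand-in
    for how a 2D pencil decomposition might split the ranks."""
    a = math.isqrt(nranks)
    while nranks % a:
        a -= 1
    return a, nranks // a

# Ranks get an empty slab when a mesh side does not cover its
# process-grid dimension evenly.
for nranks in (64, 1024):
    p1, p2 = grid_shape(nranks)
    ok = 1152 % p1 == 0 and 1152 % p2 == 0
    print(nranks, (p1, p2), "evenly divides 1152:", ok)  # True for both
```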

boryanah commented Nov 6, 2020

Thank you -- that's good to know!

I am currently using, I believe, 64 ranks (2 tasks per node and 32 nodes):

#!/bin/bash -l
#SBATCH --tasks-per-node=2
#SBATCH --nodes=32
#SBATCH --mem=16GB
#SBATCH -q debug
#SBATCH -t 00:05:00
#SBATCH -C haswell
#SBATCH -J test-fields
#SBATCH --mail-user=boryana.hadzhiyska@cfa.harvard.edu
#SBATCH --mail-type=ALL
conda activate desc

which mpirun

echo $SLURM_NTASKS

mpirun -n $SLURM_NTASKS python obtain_fields.py

rainwoodman (Member) commented

I am not super familiar with the desc environment.

Could you do a conda list? Is this the environment you built?
mpirun on NERSC may not actually go across multiple nodes.

Did you check whether the results of these commands make sense to you?

mpirun -n 64 hostname

and

mpirun -n 64 python -c 'from mpi4py import MPI; print(MPI.COMM_WORLD.rank)'

(A year ago they were giving unexpected results)

Running on NERSC computing nodes with MPI, I usually use the bcast based environment provided by m3035:

source /usr/common/software/m3035/conda-activate.sh 3.7
# optionally, add
# bcast-pip [path_to_sdist_zip_file_of_additional_package.zip ...]

srun -n 64 python obtain_fields.py

boryanah commented

Hi Yu Feng,

I will attempt to follow those instructions in more detail if I run into more issues in the future, but for now it seems like mpirun is working properly across nodes and outputs different host names!

I was more worried about whether the code is properly parallelized: e.g., when I do manipulations on a bigfile mesh such as mesh.paint('real'), mesh.apply(filter), ArrayMesh(numpy_array), FFTPower(mesh), and mesh.save(). I am slightly confused by the documentation about which functions are and are not parallelized, but I may also be misreading!

Cheers,
Boryana


rainwoodman commented Nov 12, 2020 via email

boryanah commented

Thanks a lot for your help! This is all very useful to know!
