Interpolating Issue #146

Open
ACDylan opened this issue Sep 27, 2021 · 14 comments

ACDylan commented Sep 27, 2021

Hi - I have my high-resolution "mother simulation"; however, when I ran a snapshot through powderday, it was still running after 3 days. After canceling it, the job script gives me:

[two screenshots of the job output; the second shows where the simulation stopped]

Is it because of a parameter?

@dnarayanan
Owner

can you run powderday on this snapshot interactively? does it hang at some point if you do?


ACDylan commented Sep 27, 2021

By 'interactively', do you mean running from the terminal console rather than as a job?
If so, it blocked my terminal after the first line,
Interpolating (scatter) SPH field PartType0: 0it [00:00, ?it/s], which then ran indefinitely.

@dnarayanan
Owner

hmm interesting. how many particles are in the snapshot? this seems to be hanging in yt (though I've never seen it take 3 days to deposit the octree before).

in a terminal, how long does this take to finish running (i.e., does it ever finish)?

import yt

# load the snapshot and list the derived fields; accessing
# derived_field_list forces yt to build the particle index
ds = yt.load(snapshotname)
ad = ds.derived_field_list
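
To put a number on "does it ever finish", a quick timing sketch (assuming snapshotname is a placeholder for the path to the HDF5 snapshot):

import time
import yt

start = time.time()
ds = yt.load(snapshotname)      # placeholder path to the snapshot
ad = ds.derived_field_list      # this is where the particle index gets built
print(f"derived_field_list took {time.time() - start:.1f} s")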


ACDylan commented Sep 27, 2021

PartType0: 13,870,234
PartType1: 10,000,000
PartType2: 10,000,000
PartType3: 1,250,000
PartType4: 1,584,425

>>> ad = ds.derived_field_list
yt : [INFO     ] 2021-09-27 22:18:46,988 Allocating for 3.670e+07 particles
yt : [INFO     ] 2021-09-27 22:18:46,988 Bounding box cannot be inferred from metadata, reading particle positions to infer bounding box
yt : [INFO     ] 2021-09-27 22:18:50,997 Load this dataset with bounding_box=[[-610.44433594 -612.21533203 -614.03771973], [616.07244873 612.08428955 614.15777588]] to avoid I/O overhead from inferring bounding_box.
Loading particle index: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 53/53 [00:00<00:00, 371.52it/s]

Around a second to load it.
I can try again to run a simulation in a terminal.

Edit: Maybe this is coming from
yt : [INFO ] 2021-09-20 22:32:40,241 Octree bound 31193650 particles

I don't know why there are so many particles. By comparison, the gizmo snapshot simulation has around 1 million octree particles, whereas here it is 31M.
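
As an aside, the yt INFO line above suggests passing the bounding box explicitly so the load skips the position scan. A minimal sketch, assuming the values from that log line (commas added) still match this snapshot:

import yt

# bounding box copied from the yt INFO message above, in the
# [[mins], [maxs]] form that the message itself suggests
bbox = [[-610.44433594, -612.21533203, -614.03771973],
        [616.07244873, 612.08428955, 614.15777588]]
ds = yt.load(snapshotname, bounding_box=bbox)  # snapshotname as in the earlier snippet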


ACDylan commented Sep 30, 2021

My lab gave me a zoom-in simulation (the previous simulation is still processing; I have increased the number of cores), and as you can see, the interpolation also takes a lot of time.

[screenshot of the interpolation progress output]

I'll keep you informed!

@dnarayanan
Owner

are there any updates for this, or shall I close the issue?


aussing commented Sep 5, 2022

Hi @ACDylan and @dnarayanan, I'm trying to run Powderday on Gadget-4 HDF5 snapshots and I've run into the same issue. Was there a solution for this?

@dnarayanan
Owner

Hi - hmmm no I never heard from @ACDylan again so I'm not sure what the issue is.

@aussing do you have a snapshot that you can easily share so that I can play with it and see if I can get to the bottom of this? Also, please let me know which powderday and yt hashes you're on.

thanks!

@dnarayanan self-assigned this Sep 6, 2022

aussing commented Sep 7, 2022

Here is a dropbox link to the snapshot file: https://www.dropbox.com/s/54d8hlu54ojf16d/snapshot_026.hdf5?dl=0
It's 5.7GB, but I can find a smaller snapshot file if need be.

The Powderday hash is 2395ae7.
I installed yt through conda; I'm using version 4.0.5 and the build number is py38h47df419_0. To get a hash I used conda list --explicit --md5, which returned df416a6d0cabb9cc483212f16467e516


aussing commented Sep 20, 2022

Hi @dnarayanan, I've discovered something that may or may not be related: running Powderday on our HPC system with Slurm only uses 1 CPU, even when I requested 16 and specified 16 in the parameters_master file.

@dnarayanan
Owner

Hi - I'm guessing that this actually has to do with how this is being called on your specific system.

are you setting 16 as n_processes or n_MPI_processes? it looks like it's getting stuck in a pool.map stage, which would correspond to the former.
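
For reference, a minimal sketch of the two settings in parameters_master being discussed here (only these two lines shown; the comments are my reading of the thread, not powderday documentation):

# parameters_master (sketch)
n_processes = 16        # size of the multiprocessing pool used in the pool.map stages
n_MPI_processes = 16    # number of MPI processes for the radiative transfer step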


aussing commented Sep 20, 2022

Both were set to 16

@dnarayanan
Owner

Hi,

I wonder if the issue is actually how you're calling the slurm job. Here's an example for a job where I'm calling 32 pool, 32 MPI:

#! /bin/bash
#SBATCH --account narayanan
#SBATCH --qos narayanan-b
#SBATCH --job-name=smc
#SBATCH --output=pd.o
#SBATCH --error=pd.e
#SBATCH --mail-type=ALL
#SBATCH --mail-user=desika.narayanan@gmail.com
#SBATCH --time=96:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=32
#SBATCH --mem-per-cpu=7500
#SBATCH --partition=hpg-default

you may want to contact your sysadmin to find out the best slurm configuration and see whether this can be resolved on your HPC's side.
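
If it helps to rule out the allocation itself, a small check of what Slurm actually handed the job (a sketch using standard Slurm environment variables; run it inside the job script before powderday starts):

import multiprocessing
import os

# CPUs visible to the Python process vs. what Slurm reports for the job
print("cpu_count:", multiprocessing.cpu_count())
print("SLURM_NTASKS:", os.environ.get("SLURM_NTASKS"))
print("SLURM_CPUS_ON_NODE:", os.environ.get("SLURM_CPUS_ON_NODE"))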


aussing commented Oct 10, 2022

Hi @dnarayanan, I'm still not sure why the code is only running on one CPU, but as for the original interpolating issue, I solved it by setting n_ref to 256 instead of the default 32.
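
For anyone landing here with the same hang, the change would look like this (a sketch, assuming n_ref lives in parameters_master as the other settings above do):

# parameters_master (sketch) - coarser octree refinement criterion
n_ref = 256   # default is 32; larger values mean fewer, larger octree cells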

I ran into a separate issue where I got 'WARNING: photon exceeded maximum number of interactions - killing [do_lucy]' in the pd.o file, but I'm able to get around that by setting SED = False.

Edit: the photon interaction warning seems to come up with several different parameters turned on while keeping SED = False; I'm trying to track that down at the moment. I'm also setting Imaging = False.
