Exception ("merge: can't read model file !") in mergemod.c #34
Comments
Hi Pavel, Assuming that you used 16 CPU cores for the domain decomposition, the remaining cores are used for shot parallelization. How many shots are you modelling in total? Is that number divisible by 24 without a remainder? Does the problem also occur when using fewer cores for the shot parallelization or, in the extreme case, when using only the domain decomposition? Best regards, Daniel
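The figure of 24 in the question above follows from the setup reported in the issue: 12 nodes with 32 CPUs each give 384 cores, and NPROCX=4, NPROCY=4 uses 16 of them per domain decomposition, leaving 384 / 16 = 24 groups for shot parallelization. A minimal sketch of that check, assuming the cores are split exactly this way; the function names are hypothetical and not taken from the DENISE source:

```c
/* Hypothetical helpers for illustration only; not part of DENISE. */

/* Number of shot-parallel groups: total cores divided by the cores used
   for one domain decomposition (nprocx * nprocy).
   Returns -1 if the cores cannot be split evenly. */
int shot_groups(int total_cores, int nprocx, int nprocy)
{
    int per_domain = nprocx * nprocy;
    if (per_domain <= 0 || total_cores % per_domain != 0)
        return -1;
    return total_cores / per_domain;
}

/* Returns 1 if the shots can be distributed over the groups
   without a remainder, 0 otherwise. */
int shots_divide_evenly(int nshots, int groups)
{
    return groups > 0 && nshots % groups == 0;
}
```

Calling `shot_groups(12 * 32, 4, 4)` gives the 24 groups mentioned above; `shots_divide_evenly` then answers whether a given shot count leaves any group with an uneven share.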
Hello Daniel, This exception is very rare; I don't get it for other model sizes or numbers of shots. Attached log: 20320209ws_fwi_3_strategy_51_Overthrust_true.err.txt
Hi Pavel, I suspect that one problem with shot parallelization might be that non-merged model files are removed in https://github.com/daniel-koehn/DENISE-Black-Edition/blob/master/src/PSV/model_it_out_PSV.c. Try commenting out or deleting all remove() calls in model_it_out_PSV.c and recompiling the source code before running the code again. If this is indeed the issue, similar problems will occur in gauss_filt.c and gauss_filt_var.c. Best regards, Daniel
Ok, thanks Daniel. I recompiled the code and the problem still occurs on the same velocity model, though it does not happen on other models. PE 0 is writing model to writing merged model file to ./fwi/ws_fwi_3_strategy_55/Overthrust_true/fld/model/modelTest_rho_stage_1_it_10.bin
Hello, in my experience, setting Nprocx and Nprocy helps to get rid of this error.
Increasing the stringsize variable in the fd.h file helped.
That makes sense. If the model name and directory together are longer than the pre-defined maximum stringsize in fd.h, the domain-decomposition numbering might be missing from the file name extension of the model files. The mergemod function will then fail to merge the model files from the different sub-domains correctly. Thank you for finding this bug, Pavel.
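The failure mode described here can be sketched in a few lines of C. This is an illustration under assumptions, not the actual DENISE code: the helper name and buffer handling are hypothetical, and the only point is that writing `<base>.<i>.<j>` into a fixed-size buffer silently drops the trailing sub-domain indices once the path gets too long.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from DENISE: builds the sub-domain file
   name "<base>.<i>.<j>" in a fixed-size buffer.
   Returns 1 if the complete name fit, 0 if it was truncated. */
int build_subdomain_name(char *buf, size_t bufsize,
                         const char *base, int i, int j)
{
    /* snprintf returns the length the full string would have had,
       so a value >= bufsize means the suffix was cut off. */
    int needed = snprintf(buf, bufsize, "%s.%d.%d", base, i, j);
    return needed >= 0 && (size_t)needed < bufsize;
}
```

With a 16-byte buffer, a short base such as `m.bin` yields the complete name `m.bin.0.0`, whereas `model_rho.bin` with indices 3 and 3 is cut to `model_rho.bin.3`, a file that does not exist. Checking the return value before opening the file would turn the confusing "can't read model file" into an explicit "file name too long" error.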
Yes, Daniel.
Sometimes, when using DENISE PSV, I get the following error ("merge: can't read model file !") in mergemod.c.
What could be the reasons for this?
I am using 12 nodes with 32 CPUs each, with NPROCX=4 and NPROCY=4.
**Message from mergemod (printed by PE 0):
PE 0 starts merge of 16 model files
writing merged model file to ./fwi/ws_fwi_3_strategy_51/Overthrust_true/fld/model/modelTest_vs_stage_1_it_10.bin
Opening model files: ./fwi/ws_fwi_3_strategy_51/Overthrust_true/fld/model/modelTest_vs_stage_1_it_10.bin.??? ... finished.
Copying... ... finished.
Use
ximage n1=384 < ./fwi/ws_fwi_3_strategy_51/Overthrust_true/fld/model/modelTest_vs_stage_1_it_10.bin label1=Y label2=X title=./fwi/ws_fwi_3_strategy_51/Overthrust_true/fld/model/modelTest_vs_stage_1_it_10.bin
to visualize model.
PE 0 is writing model to
./fwi/ws_fwi_3_strategy_51/Overthrust_true/fld/model/modelTest_rho_stage_1_it_10.bin.0.0
**Message from mergemod (printed by PE 0):
PE 0 starts merge of 16 model files
writing merged model file to ./fwi/ws_fwi_3_strategy_51/Overthrust_true/fld/model/modelTest_rho_stage_1_it_10.bin
Opening model files: ./fwi/ws_fwi_3_strategy_51/Overthrust_true/fld/model/modelTest_rho_stage_1_it_10.bin.??? Message from PE 0
R U N - T I M E E R R O R:
merge: can't read model file !
...now exiting to system.
-rw-r--r-- 1 plotnips k1404 0 May 19 22:17 modelTest_rho_stage_1_it_10.bin
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.0.0
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.0.1
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.0.2
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.0.3
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.1.0
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.1.1
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.1.2
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.1.3
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.2.0
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.2.1
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.2.2
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.2.3
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.3.0
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.3.1
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.3.2
-rw-r--r-- 1 plotnips k1404 90K May 19 22:17 modelTest_rho_stage_1_it_10.bin.3.3