Grib filtering really slow #640
Here's a log of an ece2cmor run.
Hi Jukka-Pekka, First some questions: So how long does the cmorisation of 1 year of IFS data take? Are you cmorising model level data? I guess no? If no, did you set The current last release of |
Hi @jpkeskinen, what kind of storage is your raw output on? It's not some network-mounted disc, right?
Hi Gijs, The raw EC-Earth output is on our scratch. I couldn't find any further specifics other than that it is a "Lustre parallel storage system".
Can you do a speed test and copy a year of data from your model output location to your temporary data folder (the one you specify in the ece2cmor3 invocation)?
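Such a copy test is easy to script; here is a minimal Python sketch (the paths are placeholders and the MB/s bookkeeping is purely illustrative, not part of ece2cmor3):

```python
import os
import shutil
import time

def timed_copy(src_dir: str, dst_dir: str):
    """Copy src_dir to dst_dir (which must not exist yet) and
    return (elapsed_seconds, throughput_in_MB_per_s)."""
    start = time.perf_counter()
    shutil.copytree(src_dir, dst_dir)
    elapsed = time.perf_counter() - start
    # Sum the sizes of everything that arrived at the destination.
    total_bytes = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, files in os.walk(dst_dir)
        for name in files
    )
    return elapsed, total_bytes / 2**20 / elapsed

# Hypothetical locations, e.g.:
# print(timed_copy("/scratch/model_output/1990", "/local_scratch/tmp/1990"))
```

Comparing the throughput for the regular scratch against the local scratch should show quickly whether storage bandwidth is the bottleneck.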
Hmm, strange; Lustre should work fine. The filtering is the slow part of the processing, but it shouldn't take more than 15 min for standard-resolution IFS on Lustre. But please do the copy of the grib files just to be sure it's not an issue with your storage.
I've tried two different locations for the temporary data: the scratch and the local scratch. The latter is supposed to be faster, and I currently have ece2cmor running (almost 4 days gone and still filtering the grib files) using that. Copying a year of raw IFS data takes 17 min 43.200 s to the regular scratch and 8 min 28.244 s to the local scratch.
From your log I see the post-processing and cmorization also take an abnormally long time. Which machine are you running on?
The computer is CSC's Puhti. It's an Atos cluster with Intel Xeon processors (Cascade Lake) at 2.1 GHz.
Hi @jpkeskinen, maybe it is something with the conda installation? Can you use the conda-installed cdo to do a simple task (select some grib code and do e.g. a spectral transform)? So in your submit script, first activate the ece2cmor conda environment and then run the test. It may also be that you are using too many threads, but that shouldn't affect the filtering too much, which uses only 2 threads.
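A check along these lines could look as follows; the input file name and GRIB code are hypothetical, and the actual invocation is left commented out so the sketch stays side-effect free:

```python
import subprocess

# Placeholder names: ICMSH* files hold the spectral part of IFS output.
infile = "ICMSHexpt+199001"   # hypothetical spectral grib file
outfile = "t_gridpoint.grb"   # hypothetical output name

# Select temperature (GRIB code 130) and transform from spectral
# space to a grid-point field; cdo chains operators right to left.
cmd = ["cdo", "sp2gpl", "-selcode,130", infile, outfile]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # run inside the activated conda env
```

If this command alone already takes minutes on the model-level file, the slowness is in the grib decoding stack rather than in ece2cmor3 itself.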
The cdo command gives me an error:
I managed to work around the problem with grib_filter (thanks to Tommi). I now have two grib files, one presumably for model levels and the other for pressure levels. The cdo command takes about 1 second for the pressure level file and about 2 minutes for the model level file.
In the end, what was the solution? It might be helpful for others in the same situation.
The original problem is still there. I was just referring to the cdo test.
I have the same problem as Jukka-Pekka, both on puhti and on cca at ECMWF. On cca, even when modifying only the paths in ece2cmor/scripts/submit_script_examples/submit-at-cca-ece2cmor-leg-job.sh, which Thomas says works for him, and specifying --skip-alevel-vars (for test purposes), I still get the same exponentially growing execution times in grib_filter.
@plesager you are cmorising similar ECE-AerChem results on cca with reasonable speed, right?
OK, it seems I broke the performance of the grib filter in the previous release... I will have a look at it tomorrow.
Cmorizing one year of IFS (including model levels) with 1.4, I have:
Because I cmorize leg N-2 when running leg N, I've not paid too much attention to the performance (as long as cmorization is faster than EC-Earth I'm happy), and I do not have the numbers from the previous set of runs I did with 1.3 (I vaguely remember it being closer to 3 hours for one leg of IFS).
Perhaps @Declan-FMI and @jpkeskinen you are using too few or too many threads? Can you post the submission script?
The 13-day run was done with a single process. This was because I was unable to get any clear advantage from using more. For this reason I suspect I am doing something wrong regarding the parallelisation of ece2cmor. I am not that familiar with threads, but I am trying different things. The submission script I used for the currently running jobs (5½ days gone and still filtering the grib files) is here:
Hi @jpkeskinen, what happens if you omit the '#SBATCH -n 1' line? This may confuse the workload manager; it can deduce from your -c option how many resources to allocate. You should give ece2cmor3 at least 2 processes because that allows the grib filter to process the gridpoint file and the spectral file in parallel. If the problem persists you will need to profile the application on your system; it will be an interesting exercise!
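The reason two processes matter can be sketched in a few lines: the gridpoint (ICMGG) and spectral (ICMSH) files are independent inputs, so they can be filtered concurrently. The filter function and file names below are stand-ins for illustration, not the actual ece2cmor3 code:

```python
from multiprocessing import Pool

def filter_grib(path: str) -> str:
    # Stand-in for the real grib filtering step, which reads one
    # input file and writes out only the requested grib codes.
    return f"filtered {path}"

# One gridpoint file and one spectral file per leg (hypothetical names).
files = ["ICMGGexpt+199001", "ICMSHexpt+199001"]

if __name__ == "__main__":
    # With 2 workers, both files are processed at the same time;
    # with 1 worker they would run back to back.
    with Pool(processes=2) as pool:
        print(pool.map(filter_grib, files))
```

With only one process, the second file always waits for the first, roughly doubling the filtering wall time.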
Here are the scripts I used on cca / puhti. I have tried cutting the number of cores, increasing the memory per core, using eccodes instead of grib-api, reverting to ece2cmor1.3.0 (yes I know it is not compatible with ec-earth 3.3.2.1), none of those helped.
I can try this next and I'll report back what happens. However, I'm starting my holidays today and will be back in August. Thank you for the help so far!
@plesager can you spot the difference with your setup? Are you sure you are actually running the 1.4 version? @jpkeskinen I too am leaving for holidays (right now, actually). When I get back on the 10th I will have a look at the issue at ECMWF. @Declan-FMI I propose you make your output files (or at least one leg) at ECMWF readable to all so I can run the cmorization myself and profile the application. You can also try to use the parallel queue and see if it is faster. I can start on this when I get back, i.e. Monday the 13th.
Output files under |
I confirm that I'm using 1.4 on cca. From the log:
That |
I have overwritten the generated script a few times today, and it is currently running in an effort to profile the grib_filter part of ece2cmor3. So the script below differs a little from what was generated by the script I posted earlier (I just added some stuff to set up Extrae: 'module load papi extrae' and the 6 lines below it).
OK, so another difference, but again I doubt it changes anything. I do not have:
No, that last one I have tried with and without; it makes no difference whatsoever. Unsurprisingly, since OpenMP does not enter into the equation at any point.
Still no progress. On cca, Philippe has kindly shared his post-processing script, but despite using it with minimal adaptation, and installing the same conda version as Philippe is using (4.7.12), I still get these huge increases in grib_filter times. At the last attempt, the second ICMSH file took 1 hour.
OMP_NUM_THREADS should not make any difference; ece2cmor3 uses subprocesses, not OpenMP threads. I will have a look this afternoon, Declan.
I see the same bottleneck, Declan. I believe it is the model levels in the raw output; Philippe, do you have any of those? I am working on a solution to 'fast-forward' through model level blocks if they aren't requested.
Yes, you can check the output in: |
The main issue here seemed to be a memory cleanup bug in the calls to the grib api. I have also included a 'fast-forward' feature to skip model level blocks in the input grib files if they are not requested. I am doing the final testing now in the filter-fast-forward branch.
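The shape of both fixes can be illustrated with a simplified, pure-Python model of the filter loop (the message dicts, field names, and codes are made up; the real code works on grib handles via the grib api bindings): every handle is released whether or not the message is kept, and model-level messages are skipped without any further decoding when they are not requested.

```python
def filter_messages(messages, want_model_levels=False):
    """Keep messages on surface/pressure levels; optionally keep
    model levels too. 'Releasing' a handle is modelled by a counter,
    mirroring the fix of always freeing grib handles."""
    kept, released = [], 0
    for msg in messages:
        try:
            if msg["levtype"] == "ml" and not want_model_levels:
                continue  # fast-forward: skip without decoding further
            kept.append(msg["code"])
        finally:
            released += 1  # always release the handle (the memory fix)
    return kept, released

# A toy input stream: one model-level, one surface, one pressure-level message.
msgs = [
    {"code": 130, "levtype": "ml"},
    {"code": 134, "levtype": "sfc"},
    {"code": 130, "levtype": "pl"},
]
print(filter_messages(msgs))  # ([134, 130], 3)
```

The key point is the finally block: before the fix, skipped or kept messages that were not freed accumulated in memory, which matches the steadily growing filter times reported above.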
My tests seem to be OK. @jpkeskinen and @Declan-FMI, can you test the filter-fast-forward branch?
Now running on cca and puhti (the Finnish supercomputer centre's Bull machine). Let's see... If I understand rightly, this should speed up cmorization where model level data is not requested, but if it is requested we should not expect to see any difference. Or did I get something wrong?
The memory issue I solved should make everything faster. The fast-forward feature should help when model-level data is not requested (--skip-alevel-vars) but is present in your raw grib files. You can monitor the filtering a bit by listing all file sizes in the temp directory, by the way.
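Listing the temp-directory file sizes can be done with a few lines of Python; the directory path in the commented example is a placeholder:

```python
import os
import time

def report_sizes(temp_dir: str) -> dict:
    """Return {filename: size_in_bytes} for the files the filter
    is currently writing into its temporary directory."""
    return {
        entry.name: entry.stat().st_size
        for entry in os.scandir(temp_dir)
        if entry.is_file()
    }

# Example: poll once a minute while the filter runs.
# while True:
#     print(report_sizes("/scratch/tmp/ece2cmor3"))
#     time.sleep(60)
```

If the filtered files keep growing at a steady rate, the filter is healthy; stalled sizes point back at an I/O or decoding problem.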
It is clear even from the tail of the log files that this is running much faster now, both on puhti and cca.
Cmorization appears to have successfully completed on both puhti and cca within a very acceptable time. I haven't made any detailed check of the results (which is why I wrote 'appears to'), but at least the present issue looks to be solved.
I got identical results with the |
Tests have been fixed and the branch has been merged (squashed).
Hi,
I'm trying to cmorise the AerChemMIP historical run, but the processing of the IFS output takes a lot of time. The grib filtering part seems to be the main problem, with the last task taking about 41 hours. The whole ece2cmor run for IFS takes about 13 days. Also, the time required by the grib filtering appears to increase during the process.
Our computing centre is guessing that the problem could be due to the parallelisation, as ece2cmor currently runs on only one core on our machine. Which parts of ece2cmor are supposed to work in parallel?
I am currently using version 1.5. I used version 1.4 earlier and it worked much faster with (almost) the same data.
Any ideas on how to get ece2cmor to work a bit faster?