Memory leak in History #1793
Comments
I'm assigning/pinging @bena-nasa, @atrayano, and @tclune, as they probably have insights into memory leaks in History. Perhaps I can work with @bena-nasa to use one of his tools to run History without needing to run full GEOS?
@lizziel I see what you mean. If it is as obvious as it looks in the logs you provided, I would think we would see this in GEOS runs. I'll plan on scheduling time to run some tests and see what I can find on our end.
Thanks @bena-nasa. For those tests I turned on many diagnostics. I also did a passive tracer run, which had far fewer diagnostic fields. I still saw the memory leak, but it was much smaller, so it is best to turn on many diagnostic fields in your test. If you test with gfortran you could also use the sanitize compiler flag option to home in on the issues. The PR I linked to above shows the changes @yantosca made to our CMake files to build with those flags (GC-Classic, not GCHP). If you can build MAPL with the flags on, then your run should point you to the problem. Ideally this would be an option that could be used regularly to detect leaks.
@mathomp4 @lizziel
and with History on:
So with History on or off, I saw the same small creep in memory used by GEOS. I can play around with my simplified History/ExtData driver and see if I find anything, but at least based on this, the creep seems to be independent of whether History is on or off.
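One way to put a number on a creep like this is to fit a straight line to memory samples over time and compare the slopes of the two runs. The following is a minimal Python sketch, assuming the memory readings have already been extracted from the model logs into plain lists. The function name `memory_trend` and all sample values are hypothetical and are not taken from the runs discussed here.

```python
def memory_trend(hours, mem_gb):
    """Return the least-squares slope (GB per hour) of memory use over time."""
    n = len(hours)
    if n < 2 or n != len(mem_gb):
        raise ValueError("need at least two paired samples")
    mean_t = sum(hours) / n
    mean_m = sum(mem_gb) / n
    # Spread of the time axis; zero means every sample has the same time.
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("time values must not all be identical")
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(hours, mem_gb))
    return sxy / sxx


# Made-up samples for illustration only (assumed values, not measurements).
hours = [0, 1, 2, 3]
history_off = [10.0, 10.1, 10.2, 10.3]
history_on = [10.5, 10.6, 10.7, 10.8]

slope_off = memory_trend(hours, history_off)
slope_on = memory_trend(hours, history_on)
print(f"History off: {slope_off:.3f} GB/h, History on: {slope_on:.3f} GB/h")
```

If the two slopes agree within noise, that would be consistent with a creep that does not depend on History; a clearly larger slope with History on would point the other way.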
Would you be able to add the GCC option -fsanitize=address to quickly home in on this? It would be helpful if that option were built into the build system so that it is easy to tap into when using GCHP as well.
@lizziel Well, I tried it out yesterday and the model just said "No". It wouldn't even start up GEOSgcm.x (or maybe it takes more than 10 minutes to start?). I might have to enlist @bena-nasa to see if he can try it out with his application that can just run History. Note that adding the flags wouldn't be hard, though. We just have to figure out the best way to get them in.
This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.
Has there been any development in diagnosing memory leaks in GEOS since I opened this issue? We still see memory creep in GCHP runs. I wonder if this might be addressed in MAPL3.
@lizziel Over the last few months I was preparing GEOS for a nature run, and in the process I found and fixed many memory-related issues, including memory leaks. Right now my 4-day run at 3 km horizontal resolution (c2880) with 137 vertical levels has a fairly flat memory profile. So far I have not found a "smoking gun" in History (for this run I am using a single 2-d collection, with hourly output on an 11520x5761 lat-lon grid). If you want to try my version, everything, including MAPL, has been tagged with "AIST-nr_v1.0". In addition, at some point I started to suspect that memory fragmentation plays a role, and I experimented with a different memory allocator (jemalloc). I was very impressed by it. At this resolution, it saved about 20% of the committed memory (this was on our new machine with 128-cores-per-node AMD Milan chips).
Just to be clear, we have no evidence that jemalloc has any impact on leaks.
Okay, thanks for the update. I will close this issue and create a new one if we see memory leaks following the update to MAPL3.
I am looking into memory leaks in GCHP and suspect there is a memory leak occurring in the MAPL History component. The memory jumps every time History output files are written. This can be seen in the log files below for 1-hour and 6-hour duration and frequency diagnostics. These were generated using MAPL 2.26.0.
gchp.1hr_diag.txt
gchp.6hr_diag.txt
@yantosca of the GEOS-Chem Support Team recently diagnosed several GEOS-Chem Classic and HEMCO memory leaks using the gfortran sanitize option (see geoschem/GCClassic#24). I wonder if you could do the same on MAPL to see if it quickly homes in on the issue. Tagging @mathomp4.
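To compare runs with different output frequencies, such as the 1-hour and 6-hour cases, the memory growth across each output interval can be tabulated. The following is a minimal Python sketch, assuming one memory sample per model step has already been extracted from a log. The function name `growth_per_write`, its parameters, and the sample values are hypothetical and do not reflect the attached log files.

```python
def growth_per_write(mem_gb, steps_per_write):
    """Return the memory change (GB) across each output interval.

    mem_gb: one memory sample per model step (assumed input format).
    steps_per_write: number of model steps between output writes (assumed).
    """
    if steps_per_write < 1:
        raise ValueError("steps_per_write must be at least 1")
    # Keep only the samples that line up with the start of each interval.
    marks = mem_gb[::steps_per_write]
    return [later - earlier for earlier, later in zip(marks, marks[1:])]


# Made-up samples for illustration only: memory steps up at every second step.
samples = [5.0, 5.0, 5.2, 5.2, 5.4, 5.4, 5.6]
deltas = growth_per_write(samples, steps_per_write=2)
print([round(d, 3) for d in deltas])
```

A roughly constant increase per interval would suggest that growth scales with the number of writes rather than with elapsed model time, which is one way to check whether output is the trigger.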