Memory leak in History #1793

Closed
lizziel opened this issue Nov 8, 2022 · 11 comments

@lizziel
Contributor

lizziel commented Nov 8, 2022

I am looking into memory leaks in GCHP and suspect there is a memory leak occurring in the MAPL History component. The memory jumps every time History output files are written. This can be seen in the log files below for 1-hour and 6-hour duration and frequency diagnostics. These were generated using MAPL 2.26.0.
gchp.1hr_diag.txt
gchp.6hr_diag.txt

@yantosca of the GEOS-Chem Support Team recently diagnosed several GEOS-Chem Classic and HEMCO memory leaks using the gfortran sanitize option (see geoschem/GCClassic#24). I wonder if you could do the same on MAPL to see if it quickly hones in on the issue. Tagging @mathomp4.

@mathomp4
Member

mathomp4 commented Nov 8, 2022

I'm assigning/pinging @bena-nasa, @atrayano, and @tclune as they probably have insights to memory leaks in History.

Perhaps I can work with @bena-nasa to use one of his tools to do history without needing to run full GEOS?

@bena-nasa
Collaborator

bena-nasa commented Nov 9, 2022

@lizziel I see what you mean. If it is as obvious as it looks in those logs you provided, I would think we would see this in GEOS runs. I'll plan on scheduling time to do some tests and see what I can find on our end.

@lizziel
Contributor Author

lizziel commented Nov 9, 2022

Thanks @bena-nasa. For those tests I turned on many diagnostics. I also did a passive tracer run, which had far fewer diagnostic fields. I still saw the memory leak, but it was much smaller. It is best to turn on many diagnostic fields in your test.

If you test with gfortran, you could also use the sanitize compiler flag option to hone in on the issues. The PR I linked to above shows the changes @yantosca made to our CMake files to build with those flags (for GC-Classic, not GCHP). If you can build MAPL with the flags on, then your run should point you to the problem. Ideally this would be an option that could be run regularly to detect leaks.

@bena-nasa
Collaborator

@mathomp4 @lizziel
Well, I did a 10-day run at c48 with History off. Here is what I saw from the beginning to the end:

25.8% Mem Comm:Used
.
.
.
27.9% Mem Comm:Used

and with History on:

27.4% Mem Comm:Used
.
.
.
29.3% Mem Comm:Used

So with History on or off, I saw the same small creep in memory used by GEOS. I can play around with my simplified History/ExtData driver and see if I find anything, but at least based on this, the creep seems to be independent of History being on or off.
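
To put numbers on the comparison above, here is a minimal Python sketch that pulls the percentage printed just before `Mem Comm:Used` out of a list of log lines and reports the change from the first sample to the last. The function names are hypothetical, and the only thing assumed about the log format is the `NN.N% Mem Comm:Used` pattern shown in the excerpts; real log lines may carry other fields that this sketch ignores.

```python
import re

# Matches the percentage immediately before "Mem Comm:Used", as in the
# excerpts above. Anything else on the line is ignored (assumption: the
# rest of the log line layout is unknown here).
MEM_USED = re.compile(r"(\d+(?:\.\d+)?)%\s+Mem Comm:Used")


def used_memory_percentages(log_lines):
    """Return every 'Mem Comm:Used' percentage found, in log order."""
    values = []
    for line in log_lines:
        match = MEM_USED.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def memory_creep(log_lines):
    """Return last minus first used-memory percentage, or None if < 2 samples."""
    values = used_memory_percentages(log_lines)
    if len(values) < 2:
        return None
    return values[-1] - values[0]


# First and last samples from the two 10-day c48 runs quoted above.
history_off = ["25.8% Mem Comm:Used", ".", ".", ".", "27.9% Mem Comm:Used"]
history_on = ["27.4% Mem Comm:Used", ".", ".", ".", "29.3% Mem Comm:Used"]

creep_off = memory_creep(history_off)
creep_on = memory_creep(history_on)
print(f"History off: {creep_off:+.1f} percentage points")
print(f"History on:  {creep_on:+.1f} percentage points")
```

With the samples quoted above, the two runs differ by about 0.2 percentage points in total creep, which is consistent with the observation that the growth looks the same with History on or off.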

@lizziel
Contributor Author

lizziel commented Nov 15, 2022

Would you be able to add GCC option -fsanitize=address to quickly hone in on this? It would be helpful if that option was built into the build system to easily tap into when using GCHP as well.

@mathomp4
Member

> Would you be able to add GCC option -fsanitize=address to quickly hone in on this? It would be helpful if that option was built into the build system to easily tap into when using GCHP as well.

@lizziel Well, I tried it out yesterday and the model just said "No". It wouldn't even start up GEOSgcm.x (or maybe it takes more than 10 minutes to start?)

I might have to enlist @bena-nasa to see if he can try it out with his application that can just run history.

Note that adding the flags wouldn't be hard, though. Just have to figure out the best way to get them in.

@stale

stale bot commented Jan 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

stale bot added the ❄️ Stale label Jan 15, 2023
mathomp4 added the ⌛ Long Term label Jan 15, 2023
stale bot removed the ❄️ Stale label Jan 15, 2023
@lizziel
Contributor Author

lizziel commented Apr 2, 2024

Has there been any development in diagnosing memory leaks in GEOS since I opened this issue? We still see memory creep in GCHP runs. I wonder if this might be addressed in MAPL3.

@atrayano
Contributor

atrayano commented Apr 3, 2024

@lizziel Over the last few months I have been preparing GEOS for a nature run, and in the process I found and fixed many memory-related issues, including memory leaks. Right now my 4-day run at 3 km horizontal resolution (c2880) with 137 vertical levels has a fairly flat memory profile. So far I have not found a "smoking gun" in History (for this run I am using a single 2-D collection with hourly output on an 11520x5761 lat-lon grid). If you want to try my version, everything, including MAPL, has been tagged with "AIST-nr_v1.0". In addition, at some point I started to suspect that memory fragmentation plays a role, and I experimented with a different memory allocator (jemalloc). I was very impressed by it. At this resolution, it saved about 20% of the committed memory (this was on our new AMD Milan machine with 128 cores per node).

@tclune
Collaborator

tclune commented Apr 3, 2024

Just to be clear, we have no evidence that jemalloc has any impact on leaks.

@lizziel
Contributor Author

lizziel commented Apr 3, 2024

Okay, thanks for the update. I will close this issue and create a new one if we see memory leaks following the update to MAPL3.

@lizziel lizziel closed this as completed Apr 3, 2024