-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get rid of ga_malloc #330
Comments
Yes. It should go away. I recall I had a PR for this as part of thread safety 6 years ago. |
Unfortunately, ga_malloc() is used in a NWChem routine Should I simply replace |
You could probably still use it, if you want. I'd just propose getting rid of it inside GA. But same comment applies: does it look like it is doing anything for you inside NWChem? |
Yes, we should replace with malloc. |
One reason to stop using ga_malloc is that it creates a less-than-obvious dependency on MA, as noted in #211. We either want to use regular |
I created a branch, feature/malloc, that replaces ga_malloc with malloc. The test suite mostly runs but there may be a few hangs. However, there also appear to be a few hangs in develop, so I need to track those down. I put the changes inside an ifdef preprocessor block, so it should be easy to switch back and forth from ga_malloc to malloc by uncommenting the #define USE_GA_MALLOC line in base.h. If anyone gets a chance, can you try this branch out and see if there are any performance differences? |
I have just run the following tests on deception with OpenMPI 4.1.4 on the feature/malloc branch without any issue using MPI-PR. srun --mpi=pmi2 -N 2 --tasks-per-node=3 global/testing/mir_perf2.x |
@edoapra, what compiler are you using? I tried running the mir_perf2.x test standalone and it still looks like it is hanging. |
gcc |
Which version? I'm using 8.1.0 |
the default 4.8.5 |
The OpenMPI installation for the the combination gcc 8.1 is broken on deception |
|
I thought you were running on constance. |
I didn't know constance is still around |
Yep, still is. I tried running on deception with gcc/9.4.0 and openmpi/4.1.4 and got the same hangs. I'll try gcc/4.8.5 and see what happens. |
|
Try gcc/9.4.0 |
gcc/4.4.7 is busted on constance |
On constance or deception? |
deception |
Works for me.
|
I am using the |
|
Okay, you must be setting something that I am not. I've loaded gcc/4.8.5 (I've also tried gcc/9.4.0) and openmpi/4.1.4, configured using your configure line and run using the submit script #!/bin/csh source /etc/profile.d/modules.csh #source ~/set_deception2 module load gcc/4.8.5 setenv MCA_btl_openib_allow_ib true and the code still hangs. |
I just built the develop branch and am running the test suite. That looks like it works. There appear to have been a lot of changes since I spun off the ga_malloc branch. |
This script works for me and I am using the feature/malloc branch. Do you happen to have some extra OpenMPI configuration under $HOME/.openmpi ? |
Could you try to run the binary I have copied as |
Okay, I'm an idiot. I didn't pick up changes to feature/malloc that were made recently. It looks like everything is passing for me on deception. I'll see if it works on constance. |
The full test suite passes for me on deception. It looks like the two-sided runtime passes on constance. |
I have also been testing this branch with the full QA test suite in NWChem. |
The RMA runtime mostly passes on deception. The only failure is nbtestc.x, which has been consistently failing even with ga_malloc. I'm not sure if that is a problem with the MPI implementation of a problem on our side with how we handle non-blocking handles. At this point, @edoapra, @ajaypanyala, @jeffhammond do you have an issue with merging this into develop? |
I have no issue against the merge. |
There is already a pull request for this (#348). @jeffhammond, if you get a chance can you approve this and we will merge it into develop? |
I had nontrivial feedback comments on that PR. I'm supportive but I think it can be improved. I'll make the changes if Edo is too busy. |
Could you describe the feedback your are talking about? I am not able to find any comment under your name in #348 |
I don't know why they aren't visible. When I scroll down on #348, I see |
I thought about using MA_Sizeof but didn't so that we wouldn't be adding any additional MA dependencies in case we want to completely divorce GA from MA at some point in the future. I don't know if that is a priority for anyone. It's not a make or break item for me, so if anyone else feels strongly about it, let me know. |
In that case, perhaps we can hide the case-switch inside of a GA_SIZEOF macro. I'm not wedded to MA, merely against code duplication. |
I modified the code and defined a new function pnga_malloc that has the same signature as ga_malloc but can be switched under the hood to use either ga_malloc or malloc. This function is used everywhere that ga_malloc used to be used and that gets rid of the USE_GA_MALLOC symbol (except to convert between the two implementations of pnga_malloc). Somebody already created a GAsizeof function inside GA, so I just used that to handle GA types. |
@ajaypanyala, @edoapra, @jeffhammond, any final comments on this branch? If not, then we can approve the pull request and merge this into develop. |
Where is this commit? |
Oh duh, I forgot to check it in. Try it now. |
Works for me |
Looks good to me |
* Modified code to replace all instances of ga_malloc with regular malloc. * minor fix * another fix * Modified code to replace all instances of ga_malloc with regular malloc. * minor fix * another fix * stop to avoid division by zero * fix for brew install warning * Modified code to replace all instances of ga_malloc with regular malloc. * minor fix * another fix * stop to avoid division by zero * fix for brew install warning * added #if HAVE_STDLIB_H #330 (comment) * fix: replaced MA with malloc/free and fix race condition in LJ_Initialize * Modified handling of ga_malloc by defining a new function pnga_malloc that can be converted to use either ga_malloc or regular malloc. This is the only function that needs to be changed. --------- Co-authored-by: Bruce J Palmer <d3g293@deception01.pnl.gov> Co-authored-by: Ajay Panyala <ajay.panyala@gmail.com> Co-authored-by: Bruce J Palmer <d3g293@deception03.pnl.gov> Co-authored-by: Bruce J Palmer <d3g293@constance03.pnl.gov>
Should we get rid of ga_malloc? It is used in a few places inside the GA code but not extensively. The main reason for getting rid of it would be to get rid of obscure errors that occur if you have not allocated a lot of memory using MA_Init. The original justification for using it was to improve performance by using pinned memory, but on the few occasions when I've swapped regular malloc for ga_malloc, I haven't seen any difference. @edoapra, @jeffhammond, what do you think?
The text was updated successfully, but these errors were encountered: