Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[WIP][Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438 #13472

Closed
wants to merge 1 commit into from

Conversation

larroy
Copy link
Contributor

@larroy larroy commented Nov 30, 2018

Description

Fixes #13438

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • [] Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@vandanavk
Copy link
Contributor

@mxnet-label-bot add [pr-work-in-progress]

@larroy Please add "Fixes #13438" in the PR description to automatically close the issue when the PR is merged

@larroy
Copy link
Contributor Author

larroy commented Nov 30, 2018

@samskalicky what do you think of this? I think we should run a few examples to see if the number of threads remains constant when using CPU threaded engine.

@larroy
Copy link
Contributor Author

larroy commented Nov 30, 2018

@vandanavk done. Thanks for the suggestion.

@samskalicky
Copy link
Contributor

Pedro, it would be great if the fix is as simple as commenting out the SetEnv. But what about the corresponding GetEnv in:
https://github.com/apache/incubator-mxnet/blob/266de6bef4da5769431557288d41fab2a02e52ca/src/engine/threaded_engine_perdevice.cc#L79
Shouldnt we need a mechanism to set the number of threads here to 1?

@larroy
Copy link
Contributor Author

larroy commented Nov 30, 2018

Yes most likely the fix is too simplistic. Wanted to have a fist test run to see how it looks.

@larroy larroy changed the title [Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438 [WIP][Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438 Dec 4, 2018
@roywei
Copy link
Member

roywei commented Dec 14, 2018

@mxnet-label-bot add[par-awaiting-testing]

dmlc::SetEnv("MXNET_CPU_WORKER_NTHREADS", 1);
dmlc::SetEnv("OMP_NUM_THREADS", 1);
// dmlc::SetEnv("MXNET_CPU_WORKER_NTHREADS", 1);
// dmlc::SetEnv("OMP_NUM_THREADS", 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting to 1 effectively disabled OMP (assuming OMP looks at the environment variable after startup). What happens if the forked process tries to use OMP? There should be a test for this because I know libgomp doesn't play well with forking sometimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review Chris. Can you elaborate and provide more detail on what do you suggest to do?

@Roshrini
Copy link
Member

Roshrini commented Jan 2, 2019

@larroy Can you address comments and rebase this PR? Thanks

@stu1130
Copy link
Contributor

stu1130 commented Jan 16, 2019

@mxnet-label-bot update [pr-awaiting-response]

@marcoabreu marcoabreu added pr-awaiting-response PR is reviewed and waiting for contributor to respond and removed pr-work-in-progress PR is still work in progress labels Jan 16, 2019
@sandeep-krishnamurthy sandeep-krishnamurthy added the pr-work-in-progress PR is still work in progress label Jan 28, 2019
@pinaraws
Copy link

@larroy Can this be merged now? Or should we close this PR?

@piyushghai
Copy link
Contributor

@larroy Gentle ping.

@larroy
Copy link
Contributor Author

larroy commented Apr 4, 2019

I think we need to do more testing to be confident that the solution is solid. I can do it during my next oncall. The change might be trivial but checking for correctness in the multi-threaded scenario and with interactions from other libraries is costly in terms of developer time.

@pinaraws
Copy link

@larroy Gentle ping.

@larroy larroy closed this Aug 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-response PR is reviewed and waiting for contributor to respond pr-work-in-progress PR is still work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libc getenv is not threadsafe