
[AOTInductor] Add config to allow buffer mutation #126584

Closed
wants to merge 1 commit

Conversation

@muchulee8 (Contributor) commented May 17, 2024

Summary:
Add an additional config to allow buffer mutation.
For data greater than 2 GB, we need to set it as read-only; otherwise, a relocation overflow occurs during linking.
This is a temporary solution, since it does not handle cases that require mutable data greater than 2 GB.

Test Plan: Included in commit.

Differential Revision: D57514729

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @ColinPeppler @amjames @desertfire @chauhang
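
A minimal usage sketch of the new flag (hedged: the config path follows the diff further down; the toy module and the compile entry point are illustrative assumptions, not part of this PR):

```python
import torch
import torch._inductor.config as inductor_config

# Assumption: the new flag lives under aot_inductor, as in the diff in this PR.
# False (the default) keeps constants read-only, which the >2 GB case requires;
# True removes the read-only property so buffers can be mutated at runtime.
inductor_config.aot_inductor.allow_buffer_mutation = True


class Counter(torch.nn.Module):
    """Toy module whose forward() mutates a registered buffer in place."""

    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("step", torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.step.add_(1)  # in-place buffer mutation
        return x + self.step


# The model would then be compiled through the usual AOTInductor flow
# (e.g. torch._inductor.aot_compile); the exact entry point is an assumption
# and varies across versions.
```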

pytorch-bot (bot) commented May 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126584

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit ec70e96 with merge base 5fb11cd:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D57514729

muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 17, 2024
muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 18, 2024
muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 19, 2024
muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 19, 2024
muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 19, 2024
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 19, 2024
@@ -1924,12 +1924,13 @@ def _compile_consts_linux(consts: bytes) -> str:
run_command_and_check(cmd)
log.debug("aot constant binary command: %s", cmd)

# .data section is between .text and .bss. When the size of .data is large,
# during the linking, the relocation of .text against .bss may overflow.
# Rename it to .ldata so that it won't be in between the .text and .bss section
A reviewer (Contributor) commented on this hunk:
nit - we may still need this comment?

muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 19, 2024
muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 19, 2024
muchulee8 added a commit to muchulee8/pytorch that referenced this pull request May 20, 2024
@@ -744,6 +744,9 @@ class aot_inductor:
# rather than embedded into the data section. Needed to support 1B+ parameter models
force_mmap_weights: bool = False

# flag to allow buffer mutation. This would remove the read-only property from buffers.
allow_buffer_mutation: bool = False
A reviewer (Contributor) commented on this hunk:
Can we do like "0" if is_fbcode() else "1"? (search for other examples in this file)
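
(A hedged sketch of the pattern the reviewer is pointing to; the environment-variable name is hypothetical, and is_fbcode stands in for the helper already used elsewhere in config.py:)

```python
import os


def is_fbcode() -> bool:
    # Stand-in for the existing helper in torch/_inductor/config.py.
    return False


# Hypothetical env-var name, purely to illustrate the '"0" if is_fbcode() else "1"'
# default pattern used by other flags in that file.
allow_buffer_mutation: bool = (
    os.environ.get("AOTINDUCTOR_ALLOW_BUFFER_MUTATION", "0" if is_fbcode() else "1")
    == "1"
)
```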

@muchulee8 (Contributor, Author) replied on May 20, 2024:
I don't think this is the right approach; it's not only about fbcode.
If you look at the log and the earlier comment from the Intel folks, they moved from .data to .ldata because it overflows into .bss. The overflow is already happening in OSS cases; the solution works for them only because their constants fit within .ldata. If even larger constants come in, we will see more failures.
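
(For context, a hedged sketch of the .data → .ldata rename being discussed; the exact objcopy invocation in _compile_consts_linux is an assumption here, not quoted from codecache.py:)

```python
import subprocess


def rename_consts_data_section(consts_o: str) -> None:
    # Move the constants out of .data and into .ldata so they no longer sit
    # between .text and .bss, avoiding the relocation overflow during linking.
    # Flags are an assumption; the real command may differ.
    subprocess.run(
        ["objcopy", "--rename-section", ".data=.ldata", consts_o],
        check=True,
    )
```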

The reviewer (Contributor) responded:
I agree that diverging the behavior is not a good idea. How about this: instead of adding a config here, Inductor can detect whether the model performs buffer mutation (https://github.com/pytorch/pytorch/blob/655038687afd19a4a4c9371b77ff046fd6c84be1/torch/_inductor/lowering.py#L5078C25-L5078C41), then decide which section to use, and add a const_size check that fails explicitly when the size is over the limit. We will work on more comprehensive support in the future, but this will at least give the user a clear error message.
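
(A hedged sketch of that suggestion; mutated_buffers, choose_consts_section, and the 2 GB constant are illustrative assumptions, not the actual Inductor internals:)

```python
MAX_MUTABLE_CONST_SIZE = 2 * 1024**3  # the 2 GB relocation limit discussed above


def choose_consts_section(graph, consts_size: int) -> str:
    # Hypothetical detection of buffer mutation on the lowered graph.
    has_buffer_mutation = bool(getattr(graph, "mutated_buffers", ()))
    if not has_buffer_mutation:
        # Read-only constants can live in a large section outside the
        # .text/.bss relocation range.
        return ".ldata"
    if consts_size > MAX_MUTABLE_CONST_SIZE:
        raise RuntimeError(
            "AOTInductor: buffer mutation currently requires constants "
            f"<= {MAX_MUTABLE_CONST_SIZE} bytes, got {consts_size}."
        )
    # Mutable constants must stay in a writable .data section.
    return ".data"
```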

muchulee8 added a commit that referenced this pull request May 20, 2024
muchulee8 added a commit that referenced this pull request May 20, 2024
ghstack-source-id: d311ef3d306a0d1fb912f5c67bcbfdd0ea5aef4b
Pull Request resolved: #126667
@facebook-github-bot (Contributor) commented:
@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort; consider -i/--ignore-current instead to continue the merge while ignoring current failures, which allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
