Properly handle filesizes larger than 8 Gb #76707

Merged
merged 4 commits into from
Oct 7, 2022

Member

@jozkee jozkee commented Oct 6, 2022

Fixes #76563

  • Right now, with any format, an entry greater than 2GB will cause an overflow exception to be thrown
    • The fix addresses this and we can now handle sizes up to what the spec allows, which is 11 octal digits
  • Right now, with V7 and Ustar, an entry greater than 8GB will be silently truncated
    • The fix will make these scenarios throw exceptions since those formats don't support >8GB entries
  • Right now, with other formats, an entry greater than 8GB will also be silently truncated
    • The fix will make these entries get written correctly with "unlimited" sizes
  • PAX is the only entry format that allows entry sizes greater than 8GB (working around the metadata limit)
    • PAX is our default entry format
    • The fix ensures PAX entries can exceed that 8GB limit
  • Impact to dotnet/sdk-container-builds
    • These builds are currently using the GNU format (Microsoft.NET.Build.Containers/Layer.cs) which is subject to the 8GB limit (and the current 2GB size parsing bug)
    • With this fix, dotnet/sdk-container-builds would only be subject to the 8GB limit
    • With this fix, dotnet/sdk-container-builds could switch to PAX and support entries greater than 8GB
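
To make the 8GB figure above concrete: a size field of 11 octal digits tops out at 8^11 - 1 bytes, which is 2^33 - 1. The sketch below only illustrates that arithmetic. `TarSizeField`, `FitsInOctalField`, and `ToOctalField` are hypothetical names, not the System.Formats.Tar implementation, and `LegacyMaxFileSize` merely borrows the name of the constant used in the tests, with its value assumed from the 11-digit limit.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Formats.Tar.
public static class TarSizeField
{
    // The description above states that the spec allows 11 octal digits for the size.
    public const int OctalDigits = 11;

    // Each octal digit carries 3 bits, so the largest value is 2^33 - 1 = 8^11 - 1.
    public const long LegacyMaxFileSize = (1L << (3 * OctalDigits)) - 1;

    // True when the size can be stored in the 11-digit octal field.
    public static bool FitsInOctalField(long size) =>
        size >= 0 && size <= LegacyMaxFileSize;

    // Formats the size as a zero-padded, 11-digit octal string.
    public static string ToOctalField(long size)
    {
        if (!FitsInOctalField(size))
        {
            throw new ArgumentOutOfRangeException(
                nameof(size), "Size does not fit in 11 octal digits.");
        }
        return Convert.ToString(size, 8).PadLeft(OctalDigits, '0');
    }
}
```

Under these assumptions, `LegacyMaxFileSize` is 8,589,934,591 bytes and formats as eleven `7` digits. Anything larger has to be rejected (V7, Ustar) or carried some other way, which is what the PAX bullets above describe. Note also that the limit is well above `int.MaxValue`, which is why parsing the field as a 32-bit value overflows at 2GB.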

@jozkee jozkee added this to the 8.0.0 milestone Oct 6, 2022
@jozkee jozkee self-assigned this Oct 6, 2022
@ghost

ghost commented Oct 6, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #76563

Author: Jozkee
Assignees: Jozkee
Labels:

area-System.IO

Milestone: 8.0.0

[InlineData(TarEntryFormat.Pax, LegacyMaxFileSize)]
// Pax supports unlimited size files.
[InlineData(TarEntryFormat.Pax, LegacyMaxFileSize + 1)]
public void WriteEntry_LongFileSize(TarEntryFormat entryFormat, long size)
Member Author

The new tests create ~8 GB files; locally it now takes me 153 s to run all tests, where it previously took ~5 s. Any suggestions on how this can be improved?

Of course, I will add [OuterLoop].

Member

And I think there is also another attribute that ensures the test does not run in parallel with others, which would help avoid running out of memory.

Member Author

avoid running out of memory.

disk.

Member

I meant memory. I don't think we should be using disk at all.

Member

If memory, then disable for 32-bit?

Member

I do think in memory is the way to go.

@MattGal remind us: in Helix, what's the minimum $TEMP disk space you can be sure is available? I seem to recall that on some queues it can be only a few GB.

Member

Helix machines will keep taking work as long as the drive their "work" directory is on has > 3172 MB of free space. Looking at this PR it sounds like you want to create 8+ GB files so there's no guarantee this test can work there; perhaps if you must do this you could check the free space first and nerf the test out.

Folks haven't directly linked the ones they're looking at, so I grabbed one from RedHat 7: it hits this problem even when running on machines that start with ~48 GB free space.

My guess here is that there are multiple tests running in parallel, that create > 8 GB files simultaneously and eat the entire disk quickly, given that the link above ran on a machine that definitely had ~46 GB free when the work item started.
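
The "check the free space first" suggestion could look roughly like the sketch below. `DiskSpaceGuard` and `HasFreeSpace` are hypothetical names, and the check uses the root of the given directory, which is an assumption: on Unix the temp directory may live on a different mount than `/`, so treat the result as an approximation.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not taken from the PR.
public static class DiskSpaceGuard
{
    // Returns true when the drive that roots 'directory' reports at least
    // 'requiredBytes' of space available to the current user.
    public static bool HasFreeSpace(string directory, long requiredBytes)
    {
        string fullPath = Path.GetFullPath(directory);

        // Assumption: the path root is a good enough stand-in for the volume.
        string root = Path.GetPathRoot(fullPath) ?? fullPath;

        var drive = new DriveInfo(root);
        return drive.AvailableFreeSpace >= requiredBytes;
    }
}
```

A test could call `DiskSpaceGuard.HasFreeSpace(Path.GetTempPath(), requiredBytes)` with the expected archive size plus some margin and return early when it is false. This does not help if several large-file tests run in parallel, since each one can pass the check before any of them has written its data.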

Member

@danmoseley danmoseley Oct 6, 2022

Yes, that's another thing to do if you're making huge files -- run them alone, serialized and not in parallel. We have prior art for using Xunit collections to do this in the runtime repo for certain tests. (example)
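
For readers unfamiliar with the xunit mechanism mentioned here, a minimal sketch follows. The collection and test class names are hypothetical, and the runtime repo's own helper types may differ. `DisableParallelization` is a property of xunit v2's `CollectionDefinitionAttribute`.

```csharp
using Xunit;

// Hypothetical collection: tests placed in it do not run in parallel
// with tests from other collections.
[CollectionDefinition(nameof(LargeFileTests), DisableParallelization = true)]
public class LargeFileTests { }

// Hypothetical test class opting in to the collection above.
[Collection(nameof(LargeFileTests))]
public class TarWriter_LongFile_Tests
{
    [Fact]
    public void WritesLargeEntry()
    {
        // Placeholder: the real test body would write and read back a large entry.
    }
}
```

Tests inside a single collection already run sequentially; the extra flag is what keeps other collections from running at the same time.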

Member Author

I still lean toward using a file for the 8 GB of tar data, but just curious: @MattGal, could the Helix machines handle a test with 8 GB of memory consumption?

Member

@MattGal MattGal Oct 6, 2022

Most Helix test VMs are of the Standard D2* variety (various Intel versions and AMD EPYC, used in different places for reasons). These machines have 8 GB total RAM, so it's probably a no, but more likely a "varies by OS" problem, since virtualization should allow this in some environments.

@carlossanlop
Member

carlossanlop commented Oct 6, 2022

I think we should stick to MemoryStreams to avoid running out of disk space.
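
One caveat with in-memory data at this scale is that a `MemoryStream` is backed by a single byte array, so it cannot hold an ~8 GB payload. A possible alternative that uses neither disk nor large buffers is a stream that synthesizes its content on demand. The sketch below is a hypothetical helper for illustration, not the approach confirmed by this thread, and it makes no claim about how `TarWriter` consumes a non-seekable stream.

```csharp
using System;
using System.IO;

// Hypothetical helper: a forward-only, read-only stream that yields
// 'length' zero bytes without ever allocating them.
public sealed class ZeroStream : Stream
{
    private readonly long _length;
    private long _position;

    public ZeroStream(long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        _length = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get => _position;
        set => throw new NotSupportedException();
    }

    // Fills the requested slice with zeros until the simulated length is reached.
    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _length - _position;
        int toRead = (int)Math.Min(count, remaining);
        Array.Clear(buffer, offset, toRead);
        _position += toRead;
        return toRead;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

A stream such as `new ZeroStream(8589934592L)` reports a length just past the 11-octal-digit limit while keeping the memory cost at the size of the caller's read buffer. Whether the tar output can also avoid being stored depends on the destination stream chosen by the test.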

@jozkee
Member Author

jozkee commented Oct 7, 2022

/azp list

@jozkee
Member Author

jozkee commented Oct 7, 2022

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jozkee
Member Author

jozkee commented Oct 7, 2022

@dotnet/area-infrastructure-libraries all Unix outer loop legs exited with code 127 when running python (I think); some Windows legs exited with code 9009 for corerun.exe. Any ideas?

Unix log sample:

Console log: 'System.Formats.Tar.Tests' from job ba2b60b0-9198-40c8-9869-b06a5034fb17 workitem 15dbf331-6bb8-4644-983d-02dcf7898948 (osx.1200.amd64.open) executed on machine dci-mac-build-370.local running macOS-12.4
+ ./RunTests.sh --runtime-path /tmp/helix/working/A6B808E3/p
----- start Fri Oct 7 00:52:29 PDT 2022 =============== To repro directly: =====================================================
pushd .
/tmp/helix/working/A6B808E3/p/dotnet exec --runtimeconfig System.Formats.Tar.Tests.runtimeconfig.json --depsfile System.Formats.Tar.Tests.deps.json xunit.console.dll System.Formats.Tar.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing 
popd
===========================================================================================================
/private/tmp/helix/working/A6B808E3/w/A92908FE/e /private/tmp/helix/working/A6B808E3/w/A92908FE/e
./RunTests.sh: line 168: /tmp/helix/working/A6B808E3/p/dotnet: No such file or directory
/private/tmp/helix/working/A6B808E3/w/A92908FE/e
----- end Fri Oct 7 00:52:29 PDT 2022 ----- exit code 127 ----------------------------------------------------------
ulimit -c value: 0
+ export _commandExitCode=127
+ _commandExitCode=127
+ /usr/local/bin/python3 /tmp/helix/working/A6B808E3/p/reporter/run.py https://dev.azure.com/dnceng-public/ public 892084 eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Im9PdmN6NU1fN3AtSGpJS2xGWHo5M3VfVjBabyJ9.eyJuYW1laWQiOiJjNzczZjJjMi01MTIwLTQyMDctYWZlMi1hZmFmMzVhOGJjMGEiLCJzY3AiOiJhcHBfdG9rZW4iLCJhdWkiOiI5ZWYxMTNkNi0xY2RiLTRjZTktYTU1ZS1hODdhNDY0ZjY4ODAiLCJzaWQiOiJlMzg0YTQ4OC05MWNjLTQ3ZWMtODVkZC05NzE5ODkzZTVmZTIiLCJCdWlsZElkIjoiY2JiMTgyNjEtYzQ4Zi00YWJiLTg2NTEtOGNkY2I1NDc0NjQ5OzQzODc4IiwicHBpZCI6InZzdGZzOi8vL0J1aWxkL0J1aWxkLzQzODc4Iiwib3JjaGlkIjoiZjYwOGQwNTEtZTI1Ny00YWM3LWI0M2QtYmQ2NzY3OWVlZDBlLmxpYnJhcmllc19idWlsZF9vc3hfeDY0X2RlYnVnLl9fZGVmYXVsdCIsInJlcG9JZHMiOiIiLCJpc3MiOiJhcHAudnN0b2tlbi52aXN1YWxzdHVkaW8uY29tIiwiYXVkIjoiYXBwLnZzdG9rZW4udmlzdWFsc3R1ZGlvLmNvbXx2c286NmZjYzkyZTUtNzNhNy00Zjg4LThkMTMtZDkwNDViNDVmYjI3IiwibmJmIjoxNjY1MTI1ODQ1LCJleHAiOjE2NjUxMzYwNDV9.tHAZWLxYXf5MFsP381flJarvxRyDdJS9z0J6HjQAzjoKoo1mGNLG-plGPrqDJpe4pbAHZ-_JucBMLAqNhG3BIpFnq650B4Sefg3Bd4LB6vk7IEVJ_tXO-0mz2xkoXLtJInqVxq_SCNFdus1VbqBXWBbNzYSNhg78QlVxcTY9C7vutndpG4pWTzfoX4CdOTEOS-OA9xjFM1w5OQKHBQ9QinyZry6eW3fIltzeWm-eW9NlA7lXpcbZK17x7dq90_lO1WoU62ZS8Bn6yOZAldWN3AcRnX5agjNjVfFT3QZonU10Rj41NJDY-QD6Tw-IlQv4iUkGp9PbWODODvoUhXQszQ
2022-10-07T07:52:30.119Z	INFO   	run.py	run(48)	main	Beginning reading of test results.
2022-10-07T07:52:30.119Z	INFO   	run.py	__init__(42)	read_results	Searching '/private/tmp/helix/working/A6B808E3/w/A92908FE/e' for test results files
2022-10-07T07:52:30.124Z	INFO   	run.py	__init__(42)	read_results	Searching '/tmp/helix/working/A6B808E3/w/A92908FE/uploads' for test results files
2022-10-07T07:52:30.125Z	WARNING	run.py	__init__(55)	read_results	No results file found in any of the following formats: xunit, junit, trx
2022-10-07T07:52:30.125Z	INFO   	run.py	packing_test_reporter(30)	report_results	Packing 0 test reports to '/tmp/helix/working/A6B808E3/w/A92908FE/e/__test_report.json'
2022-10-07T07:52:30.126Z	INFO   	run.py	packing_test_reporter(33)	report_results	Packed 1400 bytes
+ exit 127
['System.Formats.Tar.Tests' END OF WORK ITEM LOG: Command exited with 127]

Windows log sample:

C:\h\w\A623091C\w\97190865\e>call RunTests.cmd --runtime-path C:\h\w\A623091C\p 
----- start 07/10/2022  8:31:56,34 ===============  To repro directly: ===================================================== 
pushd C:\h\w\A623091C\w\97190865\e\
"C:\h\w\A623091C\p\dotnet.exe" exec --runtimeconfig System.Formats.Tar.Tests.runtimeconfig.json --depsfile System.Formats.Tar.Tests.deps.json xunit.console.dll System.Formats.Tar.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing 
popd
===========================================================================================================

C:\h\w\A623091C\w\97190865\e>"C:\h\w\A623091C\p\dotnet.exe" exec --runtimeconfig System.Formats.Tar.Tests.runtimeconfig.json --depsfile System.Formats.Tar.Tests.deps.json xunit.console.dll System.Formats.Tar.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing  
""C:\h\w\A623091C\p\dotnet.exe"" no se reconoce como un comando interno o externo,
programa o archivo por lotes ejecutable.
----- end 07/10/2022  8:31:56,37 ----- exit code 9009 ----------------------------------------------------------
2022-10-07T08:31:56.985Z	INFO   	run.py	run(48)	main	Beginning reading of test results.
2022-10-07T08:31:56.985Z	INFO   	run.py	__init__(42)	read_results	Searching 'C:\h\w\A623091C\w\97190865\e' for test results files
2022-10-07T08:31:57.000Z	INFO   	run.py	__init__(42)	read_results	Searching 'C:\h\w\A623091C\w\97190865\uploads' for test results files
2022-10-07T08:31:57.000Z	WARNING	run.py	__init__(55)	read_results	No results file found in any of the following formats: xunit, junit, trx
2022-10-07T08:31:57.000Z	INFO   	run.py	packing_test_reporter(30)	report_results	Packing 0 test reports to 'C:\h\w\A623091C\w\97190865\e\__test_report.json'
2022-10-07T08:31:57.000Z	INFO   	run.py	packing_test_reporter(33)	report_results	Packed 1408 bytes
ERROR: no se encontró el proceso "corerun.exe".
['System.Formats.Tar.Tests' END OF WORK ITEM LOG: Command exited with 9009]

Member

@carlossanlop carlossanlop left a comment

For the outerloop tests that fail to find dotnet, I created this arcade issue: dotnet/arcade#11185

From our chat conversation, you verified that some outerloop tests ran your newly added tar tests and they passed.

All other test failures are unrelated. So this LGTM now.

@jozkee
Member Author

jozkee commented Oct 7, 2022

Will trigger outerloop once more just to be safe.

@jozkee
Member Author

jozkee commented Oct 7, 2022

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jozkee
Member Author

jozkee commented Oct 7, 2022

As discussed in dotnet/arcade#11185, outerloop issues are unrelated.

@jozkee jozkee merged commit 434e7f6 into dotnet:main Oct 7, 2022
@jozkee jozkee deleted the tar_longfiles branch October 7, 2022 19:50
@jozkee
Member Author

jozkee commented Oct 7, 2022

/backport to release/7.0

@github-actions
Contributor

github-actions bot commented Oct 7, 2022

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3207286202

@carlossanlop
Member

FYI the arcade issue was closed. It was determined this was a runtime problem, so I opened this new issue in our repo: #76755

@carlossanlop
Member

@rainersigwald @baronfel FYI we are backporting this to GA.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2022
Successfully merging this pull request may close these issues.

Tar: filesizes >= 8Gb are truncated on extraction due to incorrect size being written
5 participants