Break up large read requests into smaller, pipelined requests. #20125
Conversation
This breaks up any read request over 8MB into a series of reads that are 8MB or smaller. To avoid network-latency-induced stalls, we pipeline two requests at a time. The intent is to prevent large read requests (such as the 128MB ones used by lazy-download) from hitting the per-operation timeout.
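For concreteness, here is a minimal standalone sketch of the strategy, not the actual patch: issueRead(), the typedefs, and the stubbed request manager are illustrative stand-ins (the real code goes through XrdAdaptor's request manager and uses a constant named XRD_CL_MAX_READ_SIZE, per the diff excerpts below).

#include <algorithm>
#include <cstdint>
#include <future>
#include <utility>

using IOSize = uint64_t;   // illustrative stand-ins for the framework typedefs
using IOOffset = int64_t;

// Stubbed stand-in for the asynchronous request manager: pretend every
// read succeeds in full. A real implementation hands the request to XrdCl.
std::future<IOSize> issueRead(char *into, IOSize n, IOOffset pos) {
  return std::async(std::launch::async, [=] { (void)into; (void)pos; return n; });
}

// Split a large read into chunks of at most 8MB and keep two requests in
// flight, so each chunk's network latency overlaps the next request.
IOSize pipelinedRead(char *into, IOSize n, IOOffset pos) {
  constexpr IOSize kMaxReadSize = 8 * 1024 * 1024;  // stand-in for XRD_CL_MAX_READ_SIZE
  IOSize bytesRead = 0;
  bool readReturnedShort = false;
  std::future<IOSize> prev;  // the request issued one iteration ago
  IOSize prevExpected = 0;
  while (n > 0 || prev.valid()) {
    std::future<IOSize> cur;
    IOSize curExpected = 0;
    if (n > 0 && !readReturnedShort) {  // issue the next chunk before waiting
      curExpected = std::min(n, kMaxReadSize);
      cur = issueRead(into, curExpected, pos);
      into += curExpected;
      pos += curExpected;
      n -= curExpected;
    } else {
      n = 0;  // stop issuing; drain whatever is already in flight
    }
    if (prev.valid()) {  // no-op on the first pass, before anything is issued
      IOSize result = prev.get();  // propagates read failures as exceptions
      if (!readReturnedShort) {
        bytesRead += result;
        if (result != prevExpected)
          readReturnedShort = true;  // short read: later chunks are not contiguous
      }
    }
    prev = std::move(cur);
    prevExpected = curExpected;
  }
  return bytesRead;
}

Compared with issuing all chunks at once, capping the pipeline depth at two bounds client-side memory and outstanding-request pressure while still hiding one round trip of latency per chunk.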
A new Pull Request was created by @bbockelm (Brian Bockelman) for CMSSW_9_2_X.

It involves the following packages:

Utilities/XrdAdaptor

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
cms-bot commands are listed here
hi @bbockelm - please make a master branch request too.
please test
The tests are being triggered in jenkins.
+1
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
Comparison is ready
Comparison Summary:
urgent
T0 wanted this in the release, as mentioned in the OPS meeting today
Is there any way to test that this change actually helps with the problem in the T0?
Utilities/XrdAdaptor/src/XrdFile.cc
Outdated
// In some cases, the IO layers above us (particularly, if lazy-download is
// enabled) will emit very large reads. We break this up into multiple
// reads in order to avoid hitting timeouts.
std::vector<IOPosBuffer> requests;
You can also do a reserve call here by using the integer division of n and XRD_CL_MAX_READ_SIZE.
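For illustration, the suggested reserve with a ceiling division might look like this (assuming n and XRD_CL_MAX_READ_SIZE are unsigned, as in the surrounding diff); rounding up rather than truncating accounts for the final, smaller chunk:

requests.reserve((n + XRD_CL_MAX_READ_SIZE - 1) / XRD_CL_MAX_READ_SIZE);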
Utilities/XrdAdaptor/src/XrdFile.cc
Outdated
uint32_t bytesRead = m_requestmanager->handle(into, n, pos).get();
std::vector<std::pair<std::future<IOSize>, IOSize>> futures; futures.reserve(requests.size());
Please use two lines
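That is, splitting the declaration from the reserve:

std::vector<std::pair<std::future<IOSize>, IOSize>> futures;
futures.reserve(requests.size());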
Utilities/XrdAdaptor/src/XrdFile.cc
Outdated
bool readReturnedShort = false;
for (auto &future : futures) {
  // Future throws an exception on failure.
  IOSize result = future.first.get();
Why is this a second loop and not just done at line 281? In fact, I don't think there is a need for the futures container at all. Looks like you just need a handle on two of them, a present and a next.
I thought this would help the readability of the code -- indeed, it can be collapsed together (and only two futures are necessary).
Ok, I think I came up with a clean / readable way to do this without maintaining a list. Will update in a moment.
@Dr15Jones the testing will have to be done in a replay of the tier0 with this release. Dirk and Brian have been debating the strategy of this in the ticket, and the need to reduce the read size was also requested by a CERN IT developer. I appreciate your C++ review and comments, but once Brian addresses them I think we have to move forward. The tier0 is paused right now awaiting the new release.
Pull request #20125 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.
please test
The tests are being triggered in jenkins.
cur_future_expected = chunk;

// Wait for the prior read; update bytesRead.
check_read(prev_future, prev_future_expected);
So this works the first time through the loop because prev_future.valid() is false and check_read drops out immediately?
Correct.
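For reference, a hypothetical shape of check_read consistent with this exchange (the real version presumably also updates bytesRead, as the diff's comment says; this sketch returns the count instead): a default-constructed std::future reports valid() == false, so the first call falls straight through.

static IOSize check_read(std::future<IOSize> &future, IOSize expected) {
  if (!future.valid())
    return 0;  // first pass: no request outstanding yet, drop out immediately
  IOSize result = future.get();  // throws if the underlying read failed
  if (result != expected) {
    // Short read: the caller should stop issuing further chunks.
  }
  return result;
}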
+1
This pull request is fully signed and it will be integrated in one of the next CMSSW_9_2_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_9_3_X is complete. This pull request will now be reviewed by the release team before it's merged. @davidlange6, @smuzaffar (and backports should be raised in the release meeting by the corresponding L2)
+1
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
Comparison is ready
Comparison Summary:
I want to share with this thread the notes from the meeting with the storage operations team that occurred last Mon., and of which there will be a repeat next Mon. Issue no. 2 is probably the scariest, as there is no way to recover RAW data that gets lost, and P5 has already cleaned up. This is why I'm willing to push for a test-in-a-replay-only shortcut. The source of large read requests is lazy downloading, and only the tier0 should still be using that (RAL was recently convinced to stop doing that).

Hi all, here are the minutes from yesterday's meeting, thanks to Jan for putting [...]

Cheers,

============================================================
EOSCMS crisis meeting 2017-08-09, triggered by GGUS#129607 (alarm)
present: Christoph (remote), Hervé, Elvin, Dima, Zeynep, John, Luca, Jan

CMS sees way too many issues with EOSCMS, manpower-intensive to the point [...]

Main issues, in decreasing priority:

[1] CMS has implemented various workarounds (disable client-side write [...])
    Bug in client timeout+retry logic - a first attempt got stalled, a retry [...]
    EOS ops/devs believe this should no longer occur after the EOSCMS MGM [...]
    CMS T0 jobs should "forget" about these files after 12h, so once fixed [...]

[2] Also assumed to be due to a client-side Xrootd internal retry (where a [...])
    EOS ops/devs believe(d) this should no longer occur when [...]
    CMS has seen this still afterwards - will give fresh examples.
    These files need to be "recovered" as much as possible (raw data, no [...])

[3] At least one known bug that gets triggered during namespace "compaction" [...]

[4] Ongoing investigation, possibly linked to use of 128MB prefetch but seen [...]

[5] Possible connection to a known xrootd client bug that gets triggered under load.
    Any pattern? Seems to only affect transfers from "glidein"?
merge following the request/confirmation from @sextonkennedy
I guess my magic power is gone
@slava77, I have added you as a special release manager. cms-bot should recognize your merge request.
@sextonkennedy - to be clear, I believe this will significantly address many known causes of [4]. [5] may be a bug in the upstream xrootd client (although it's hard to determine if the server is getting confused in responding or if the client is getting confused in parsing the response). [1-3] are likely EOS-specific issues.
So the reason to push this in at high priority won't be fixed by the PR... interesting.
Well, it will fix the [...]
Hi Brian,
The meeting notes explicitly said, under issue [2], "Also assumed to be due to a client-side Xrootd internal retry". This is one of the most damaging issues. Twice since I took the ORM shift this week, we lost PCL files in this manner. The tier0 had to be paused and the AlCa team had to regenerate the files by hand. This is not sustainable.
Are you disputing the claim made by the EOS developers? Is this a case of them blaming Xrootd developers and the Xrootd developers blaming EOS developers?
Cheers, Liz
Hi Liz,

I didn't do too deep an analysis here; [2] could certainly be in the same category as [5]. The EOS folks have probably investigated deeper on this issue than I have. Regardless, it wouldn't be affected by this PR.

Is there a GGUS ticket for this? If we want to dig deeper from the client side, it'd probably be good to catch me up on the issue (and, additionally, not analyze a separate bug on a closed PR...).

Brian