Break up large read requests into smaller, pipelined requests. #20125

Merged
merged 3 commits into cms-sw:CMSSW_9_2_X on Aug 12, 2017

Conversation

bbockelm
Contributor

This breaks up any read request over 8MB into a series of reads that are 8MB or smaller. To avoid network-latency-induced stalls, we pipeline two requests at a time.

The intent is to prevent large read requests (such as the 128MB ones used by lazy-download) from hitting the per-operation timeout.
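For illustration, here is a minimal sketch of the approach in isolation. It is not the actual Utilities/XrdAdaptor code; the 8MB constant, the issueRead primitive, and all names are placeholder assumptions. The request is split into chunks of at most 8MB, and two chunk reads are kept outstanding so the latency of one overlaps the transfer of the next.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <future>

// Placeholder for the 8MB cap on a single sub-read.
constexpr std::size_t kMaxChunk = 8 * 1024 * 1024;

// Stand-in for an asynchronous read primitive: start reading `size` bytes at
// file offset `off` into `buf`, return a future holding the bytes actually read.
std::future<std::size_t> issueRead(char *buf, std::size_t size, std::int64_t /*off*/) {
  return std::async(std::launch::async, [=] {
    std::memset(buf, 0, size);  // pretend the network read filled the buffer
    return size;
  });
}

// Split a read of `n` bytes at offset `pos` into chunks of at most 8MB,
// keeping two chunk requests in flight at a time.
std::size_t chunkedRead(char *into, std::size_t n, std::int64_t pos) {
  std::future<std::size_t> prev;  // chunk issued on the previous iteration, if any
  std::size_t total = 0;
  for (std::size_t off = 0; off < n;) {
    std::size_t chunk = std::min(kMaxChunk, n - off);
    auto cur = issueRead(into + off, chunk, pos + off);  // issue the next chunk first
    if (prev.valid())
      total += prev.get();  // then wait for the prior chunk; get() rethrows on failure
    prev = std::move(cur);
    off += chunk;
  }
  if (prev.valid())
    total += prev.get();  // drain the last outstanding chunk
  return total;
}

In the PR itself the chunks are built as IOPosBuffer requests and handed to the XrdAdaptor request manager (see the review snippets below); the sketch above only illustrates the control flow.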

@cmsbuild
Contributor

A new Pull Request was created by @bbockelm (Brian Bockelman) for CMSSW_9_2_X.

It involves the following packages:

Utilities/XrdAdaptor

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit this is something you requested to watch as well.
@davidlange6 you are the release manager for this.

cms-bot commands are listed here

@davidlange6
Contributor

hi @bbockelm - please make a master branch request too.

@davidlange6
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Aug 11, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/22210/console Started: 2017/08/11 08:02

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20125/22210/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 22
  • DQMHistoTests: Total histograms compared: 1791740
  • DQMHistoTests: Total failures: 44267
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 1747307
  • DQMHistoTests: Total skipped: 166
  • DQMHistoTests: Total Missing objects: 0
  • Checked 90 log files, 14 edm output root files, 22 DQM output files

@slava77
Contributor

slava77 commented Aug 11, 2017

urgent

T0 wanted this in the release, as mentioned in the OPS meeting today
@drkovalskyi

@Dr15Jones
Contributor

Is there any way to test that this change actually helps with the problem in the T0?

// In some cases, the IO layers above us (particularly, if lazy-download is
// enabled) will emit very large reads. We break this up into multiple
// reads in order to avoid hitting timeouts.
std::vector<IOPosBuffer> requests;
Contributor

You can also do a reserve call here by using the integer division of n and XRD_CL_MAX_READ_SIZE
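A rough sketch of the suggestion, assuming n is the total request size in bytes and XRD_CL_MAX_READ_SIZE is the per-chunk limit used for the split (the +1 covers a trailing partial chunk):

std::vector<IOPosBuffer> requests;
requests.reserve(n / XRD_CL_MAX_READ_SIZE + 1);  // upper bound on the number of chunks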


uint32_t bytesRead = m_requestmanager->handle(into, n, pos).get();
std::vector<std::pair<std::future<IOSize>, IOSize>> futures; futures.reserve(requests.size());
Contributor

Please use two lines
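That is, with the declaration and the reserve call each on their own line:

std::vector<std::pair<std::future<IOSize>, IOSize>> futures;
futures.reserve(requests.size());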

bool readReturnedShort = false;
for (auto &future : futures) {
// Future throws an exception on failure.
IOSize result = future.first.get();
Contributor

Why is this a second loop and not just done at line 281? In fact, I don't think there is a need for the futures container at all. Looks like you just need a handle on two of them, a present and a next.

Contributor Author

bbockelm commented Aug 11, 2017


I thought this would help the readability of the code -- indeed, it can be collapsed together (and only two futures are necessary).

Contributor Author

Ok, I think I came up with a clean / readable way to do this without maintaining a list. Will update in a moment.

@sextonkennedy
Member

@Dr15Jones the testing will have to be done in a replay of the tier0 with this release. Dirk and Brian have been debating the strategy of this in the ticket and the need to reduce the read size was also requested by a CERN IT developer. I appreciate your C++ review and comments, but once Brian addresses them I think we have to move forward. The tier0 is paused right now awaiting the new release.

@cmsbuild
Contributor

Pull request #20125 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Aug 11, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/22226/console Started: 2017/08/11 16:41

cur_future_expected = chunk;

// Wait for the prior read; update bytesRead.
check_read(prev_future, prev_future_expected);
Contributor

So this works the first time through the loop because prev_future.valid() is false and check_read drops out immediately?

Contributor Author

Correct.
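For context, a minimal sketch of a helper with that behaviour; IOSize, bytesRead, and readReturnedShort are taken from the earlier snippets, and the body is assumed rather than copied from the PR:

// A default-constructed std::future has valid() == false, so the first call,
// before any chunk has been issued, returns immediately.
auto check_read = [&](std::future<IOSize> &future, IOSize expected) {
  if (!future.valid())
    return;                      // nothing outstanding yet
  IOSize result = future.get();  // rethrows if that sub-read failed
  if (result != expected)
    readReturnedShort = true;    // short read; do not expect the remaining data
  bytesRead += result;
};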

@Dr15Jones
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_9_2_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_9_3_X is complete. This pull request will now be reviewed by the release team before it's merged. @davidlange6, @smuzaffar (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20125/22226/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 22
  • DQMHistoTests: Total histograms compared: 1792872
  • DQMHistoTests: Total failures: 29342
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 1763364
  • DQMHistoTests: Total skipped: 166
  • DQMHistoTests: Total Missing objects: 0
  • Checked 90 log files, 14 edm output root files, 22 DQM output files

@sextonkennedy
Member

I want to share with this thread the notes from the meeting with the storage operations team that occurred last Mon., which will be repeated next Mon. Issue no. 2 is probably the scariest, as there is no way to recover RAW data that gets lost once P5 has already cleaned up. This is why I'm willing to push for the shortcut of testing only in a replay. The source of the large read requests is lazy-download, and only the tier0 should still be using that (RAL was recently convinced to stop doing so).

Hi all,

here are the minutes from yesterday's meeting; thanks to Jan for putting down the notes!
Please let us know if we missed/forgot something

Cheers,
Luca

============================================================

EOSCMS crisis meeting 2017-08-09, triggered by GGUS#129607 (alarm)

present: Christoph (remote), Hervé, Elvin, Dima, Zeynep, John, Luca, Jan

CMS sees way too many issues with EOSCMS, manpower-intensive to the point that they are considering alternatives (run outside of CERN, stream initially to CASTOR, ..).
CMS load should be comparable to last year (but with bigger files).
P5 is severely limited on storage and needs to delete files soon after EOS has acknowledged successful transfer.

Main issues, in decreasing priority:

  1. "disappearing files": files get written (OK), checked (OK), then go
    away. affect raw data files.
  2. "0-size files": files get written(OK), checked (OK), then namespace
    shows them to be truncated
  3. "disappearing directories": (recent, discovered by accident -
    potentially huge impact?)
  4. "Machine not on the network": annoying background error rate

CMS has implemented various workarounds (disable client-side write recovery, "eoscp -x"), but still sees errors that were supposed to be fixed.


  1. "disappearing files"

Bug in client timeout+retry logic - a first attempt got stalled, a retry
went OK, eventually the stalled attempt got treated, caused an error,
and cleanup-on-error removed the file.

EOS ops/devs believe this should no longer occur after the EOSCMS MGM update of 2017-08-07 19:00 (0.3.265: a workaround is in place that no longer "cleans up" in case replicas exist - this is a server-side workaround).
Anything after that time needs investigation - please report.

CMS T0 jobs should "forget" about these files after 12h, so once this is fixed we would expect a quick drop in error messages. However, external transfers might still want to access these files much later (and get errors).

  1. "0-size files"

Also assumed to be due to a client-side Xrootd internal retry (where a second attempt just "truncates" the file). The actual file content is still on EOS, but the size mismatch causes the files to no longer be readable (which partially contributes to the "Machine not on the network" errors).

EOS ops/devs believe(d) this should no longer occur when
"XRD_WRITERECOVERY=0" is set. This has been done for T0 "agent" (date?),
and for StorageManager (2017-08-08)

CMS has still seen this afterwards - will give fresh examples.

These files need to be "recovered" as much as possible (raw data, no
longer at P5) by EOS ops.

  1. "disappearing directories"

At least one known bug that gets triggered during namespace "compaction"
(which happened 2017-08-07 ~18:00). Directories can be manually
recovered if missing, and a cold restart of the namespace should bring
all of them back.
-> Next update should include such a cold restart. CMS OK with the
associated downtime (<1h).

  1. "Machine not on the network"

Ongoing investigation, possibly linked to the use of 128MB prefetch, but seen by 3 different job classes. Lower priority. No recent CMS-side changes.

  1. "response mixup" (servers answers for a different file than request)

Possible connection to a known xrootd client bug that gets triggered under load; the recommendation was to update to 4.6.
CMS challenges this (the old client was in use for a year, but the errors are recent; client-side load (streams, jobs, CPU) is unchanged) but will update tomorrow.

Any pattern? Seems to only affect transfers from "glidein"?
CMS explains that 90% of transfers would have this (based on CPU allocations); the rest is PhEDEx (which uses GridFTP, which internally speaks Xrootd).

@slava77
Contributor

slava77 commented Aug 12, 2017

merge

following the request/confirmation from @sextonkennedy

@slava77
Contributor

slava77 commented Aug 12, 2017

I guess my magic power is gone

@smuzaffar
Contributor

@slava77, I have added you as a special release manager. cms-bot should recognize your merge request.
cms-sw/cms-bot@51357c8#diff-61ad223fc9fb3b45ce3e5cdac2916918

@cmsbuild merged commit 69bf76d into cms-sw:CMSSW_9_2_X on Aug 12, 2017
@bbockelm
Contributor Author

@sextonkennedy - to be clear, I believe this will significantly address many known causes of [4].

[5] may be a bug in the upstream xrootd client (although it's hard to determine if the server is getting confused in responding or if the client is getting confused in parsing the response). [1 - 3] are likely EOS-specific issues.

@davidlange6
Contributor

So the reason to push this in at high priority won't be fixed by the PR... interesting.

@bbockelm
Contributor Author

Well, it will fix the ALARM ticket (https://ggus.eu/index.php?mode=ticket_info&ticket_id=129607). I just don't see how it would fix the server-side issues.

@sextonkennedy
Member

sextonkennedy commented Aug 12, 2017 via email

@bbockelm
Contributor Author

Hi Liz,

I didn't do too deep an analysis here; [2] could certainly be in the same category as [5]. The EOS folks have probably investigated deeper on this issue than I have. Regardless, it wouldn't be affected by this PR.

Is there a GGUS ticket for this? If we want to dig deeper from the client side, it'd probably be good to catch me up on the issue (and, additionally, not to analyze a separate bug on a closed PR...).

Brian
