-
Notifications
You must be signed in to change notification settings - Fork 4.2k
/
IndexIntoFile.h
1119 lines (928 loc) · 49.2 KB
/
IndexIntoFile.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
#ifndef DataFormats_Provenance_IndexIntoFile_h
#define DataFormats_Provenance_IndexIntoFile_h
/** \class edm::IndexIntoFile
Used to quickly find the Events, Lumis, and Runs in a single
ROOT format data file and step through them in the desired
order.
A list of the most important functions that a client would
use directly follows. There are detailed comments below with
the declaration of each function.
The begin and end functions are used to start and stop
an iteration loop. An argument to the iterator constructor
determines the order of iteration.
The functions findPosition, findEventPosition, findRunPosition,
and findLumiPosition are used to navigate directly to specific
runs, lumis, and events.
The functions mentioned above return an object of type
IndexIntoFileItr. The IndexIntoFileItr class has member
functions which allow one to navigate forward and backward
through the runs, lumis, and events in alternative ways.
See more comments with the declaration of each public member
function in IndexIntoFileItr.
The iterator will know what the current item is (as one
would expect). This could be a run, lumi, or event. It
knows more than that though, it knows all three as is
explained below.
In the run state, IndexIntoFileItr knows which lumi will
be processed next after the run and also which event will
be processed after the lumi. These may not be the first
ones in the run if the skip functions were used.
In the lumi state, the IndexIntoFileItr will always point
at the last associated run and the next event to be processed
after the lumi. This may not be the first event if the skip
function was used.
In the event state, the IndexIntoFileItr will always point
at the last corresponding run and also the last corresponding
lumi.
There can be multiple run entries in a TTree associated
with the same run number and ProcessHistoryID in a file.
There can also be multiple lumi entries associated with
the same lumi number, run number, and ProcessHistoryID.
Both sorting orders will make these subgroups contiguous,
but beyond that is up to the client (normally PoolSource,
which passes them up to the EventProcessor)
to deal with merging the multiple run (or lumi) entries
together.
One final comment with regards to IndexIntoFileItr. This
is not an STL iterator and it cannot be used with std::
algorithms. The data structures are complex and designed
to optimize memory usage. It would be difficult or impossible
implement an iterator that is STL compliant.
Here is a summary of the data structures in IndexIntoFile.
The persistent data consists of two vectors.
processHistoryIDs_ is a std::vector<ProcessHistoryID> that
contains the ProcessHistoryIDs with one element in the
vector for each unique ProcessHistoryID. On output they
are ordered as they first written out for each output
file. On input they are ordered as they are first seen
in each process. Note that each ProcessHistoryID is stored
once in this separate vector. Everywhere else it is needed
it stored as an index into this vector because the
ProcessHistoryID itself is large and it would take more
memory to store them repeatedly in the other vectors.
Note that the ProcessHistoryID's referenced in this
class are always the "reduced" ProcessHistoryID's,
not the ProcessHistoryID of the full ProcessHistory.
You cannot use them to directly access the ProcessHistory
from the ProcessHistoryRegistry.
runOrLumiEntries_ is a std::vector<RunOrLumiEntry>.
This vector holds one element per entry in the run
TTree and one element per entry in the lumi TTree.
When sorted, everything associated with a given run and
ProcessHistoryID will be contiguous in the vector.
These groups of elements will be sorted in the order they
first appear in the input file. Within each of
these groups the run entries come first in order,
followed by the elements associated with the lumis.
The lumis are also contiguous and sorted by first
appearance in the input file. Within a lumi they
are sorted by entry order. And each luminosity
element corresponds to one contiguous sequence
of Events in the Events TTree. Before we implemented
concurrent processing of luminosity blocks that
was the full story. After that it became more complicated
because the sequence of events written to an output
file that were associated with a particular luminosity
block TTree entry might no longer be contiguous. Events
from another concurrent luminosity block could get
mixed in because with multiple events processing simultaneously,
there is no longer any guarantee they will finish in the
same order that they were read. This is handled by writing
additional elements to the runOrLumiEntries_ vector, one for
each contiguous block of events associated with a particular
luminosity block. Only one of these vector elements will
be associated with each entry in the luminosity block TTree
and all the others will contain a TTree entry number that
is invalid (note the one element with a valid entry might
or might not be associated with a contiguous block of
events). In the sort of elements related to a particular
luminosity block, the entries with invalid entry numbers
will come before the valid ones. (It has not happened
yet as I am writing this, but a similar thing will
likely occur for runs if/when we ever implement concurrent
runs)
There are a number of transient data members also.
The 3 most important of these are vectors. To
save memory, these are only filled when needed.
runOrLumiIndexes_ is a std::vector<RunOrLumiIndexes>.
There is a one to one correspondence between the
elements of this vector and the elements of runOrLumiEntries_.
The elements of this vector are sorted in numerical
order using the ProcessHistoryID index, the run number,
and the lumi number. This ordering allows iteration
in numerical order and also fast lookup based on run
number and lumi number. Each element also has indexes
into the eventNumbers_ and eventEntries_ vectors which
hold the information giving the event numbers and
event entry numbers. Note this vector is initially
formed in the identical order as runOrLumiEntries_
and then sorted with a stable sort. The fact that it
is a stable sort means that within the subrange associated
with a particular luminosity block the elements with
an invalid TTree entry number come first and the later
come the elements with a valid TTree entry number
which are in order of TTree entry number.
eventNumbers_ is a std::vector containing EventNumber_t's.
eventEntries_ is a std::vector containing EventEntry's.
If filled, both vectors contain the same number of
entries with identical event numbers sorted in the
same order. The only difference is that one includes
the entry numbers and thus takes more memory.
Each element of runOrLumiIndexes_ associated with a luminosity
block has the indexes necessary to find the range inside
eventNumbers_ or eventEntries_ corresponding to its lumi.
Within that range the elements are sorted by event number,
which is used for the numerical order iteration and to
find an event by the event number.
The details of the data structure are a little different
when reading files written before release 3_8_0
(backward compatibility, see RootFile::fillIndexIntoFile
for the details).
This data structure is optimized for low memory usage when
there are large numbers of events. The optimal case would
occur when there was was one run in a file, one luminosity block
in that run and everything had the same ProcessHistoryID.
If duplicate checking were off and the process simply iterated
through the file in the default order, then only the persistent
vectors would be filled. One vector would contain 2 elements,
one for the run and the other for the lumi. The other
vector would contain one element, the ProcessHistoryID.
Even if there were a billion events, that would be all that
would exist and take up memory. The data structure is not the
optimal structure for a very sparse skim, but the overheads
should be tolerable given the number of runs and luminosity
blocks that should occur in CMS data.
Normally only the persistent parts of the data structure are
filled in the output module. The output module fills them using
two functions designed specifically for that purpose. The
functions are addEntry and sortVector_Run_Or_Lumi_Entries.
There are some complexities associated with filling the data structure,
mostly revolving around optimizations to minimize the per event memory
usage. The client needs to know which parts of the data structure to
fill. See the functions below named fixIndexes, setNumberOfEvents,
setEventFinder, fillEventNumbers, fillEventEntries, and inputFileClosed.
Note that this class is not intended to be used directly by the average
CMS user. PoolSource and PoolOutputModule are the main clients. Other
executables that read ROOT format data files, but do not use PoolSource
may also need to use it directly (FWLite, Fireworks, edmFileUtil ...).
The interface is too complex for general use.
\author W. David Dagenhart, created 19 May, 2010
*/
#include "DataFormats/Provenance/interface/EventID.h"
#include "DataFormats/Provenance/interface/ProcessHistoryID.h"
#include "DataFormats/Provenance/interface/RunID.h"
#include "DataFormats/Provenance/interface/RunLumiEventNumber.h"
#include "FWCore/Utilities/interface/propagate_const.h"
#include "FWCore/Utilities/interface/value_ptr.h"
#include "FWCore/Utilities/interface/thread_safety_macros.h"
#include <memory>
#include <cassert>
#include <iosfwd>
#include <map>
#include <set>
#include <vector>
class TestIndexIntoFile;
class TestIndexIntoFile1;
class TestIndexIntoFile2;
class TestIndexIntoFile3;
class TestIndexIntoFile4;
class TestIndexIntoFile5;
namespace edm {
class ProcessHistoryRegistry;
class RootFile;
class IndexIntoFile {
public:
class IndexIntoFileItr;
class SortedRunOrLumiItr;
class IndexRunLumiEventKey;
using EntryNumber_t = long long;
static constexpr int invalidIndex = -1;
static constexpr RunNumber_t invalidRun = 0U;
static constexpr LuminosityBlockNumber_t invalidLumi = 0U;
static constexpr EventNumber_t invalidEvent = 0U;
static constexpr EntryNumber_t invalidEntry = -1LL;
enum EntryType { kRun, kLumi, kEvent, kEnd };
IndexIntoFile();
~IndexIntoFile();
ProcessHistoryID const& processHistoryID(int i) const;
std::vector<ProcessHistoryID> const& processHistoryIDs() const;
/// This enum is used to specify the order of iteration.
/// In firstAppearanceOrder there are 3 sort criteria, in order of precedence these are:
///
/// 1. firstAppearance of the ProcessHistoryID and run number in the file
///
/// 2. firstAppearance of the ProcessHistoryID, run number and lumi number in the file
///
/// 3. entry number
///
/// In numerical order the criteria are in order of precedence are:
///
/// 1. processHistoryID index (which are normally in order of appearance in the process)
///
/// 2. run number
///
/// 3. lumi number
///
/// 4. event number
///
/// 5. entry number
enum SortOrder { numericalOrder, firstAppearanceOrder };
/// Used to start an iteration over the Runs, Lumis, and Events in a file.
/// Note the argument specifies the order
IndexIntoFileItr begin(SortOrder sortOrder) const;
/// Used to end an iteration over the Runs, Lumis, and Events in a file.
IndexIntoFileItr end(SortOrder sortOrder) const;
/// Used to determine whether or not to disable fast cloning.
bool iterationWillBeInEntryOrder(SortOrder sortOrder) const;
/// True if no runs, lumis, or events are in the file.
bool empty() const;
/// Find a run, lumi, or event.
/// Returns an iterator pointing at it. The iterator will always
/// be in numericalOrder mode.
/// If it is not found the entry type of the iterator will be kEnd.
/// If it is found the entry type of the iterator will always be
/// kRun so the next thing to be processed is the run containing
/// the desired lumi or event or if looking for a run, the run itself.
/// If the lumi and event arguments are 0 (invalid), then it will
/// find a run. If only the event argument is 0 (invalid), then
/// it will find a lumi. If will look for an event if all three
/// arguments are nonzero or if only the lumi argument is 0 (invalid).
/// Note that it will find the first match only so if there is more
/// than one match then the others cannot be found with this method.
/// The order of the search is by processHistoryID index, then run
/// number, then lumi number, then event entry.
/// If searching for a lumi the iterator will advance directly
/// to the desired lumi after the run even if it is not the
/// first lumi in the run. If searching for an event, the
/// iterator will advance to the lumi containing the run and
/// then the requested event after run even if there are other
/// lumis earlier in that run and other events earlier in that lumi.
IndexIntoFileItr findPosition(RunNumber_t run, LuminosityBlockNumber_t lumi = 0U, EventNumber_t event = 0U) const;
IndexIntoFileItr findPosition(SortOrder sortOrder,
RunNumber_t run,
LuminosityBlockNumber_t lumi = 0U,
EventNumber_t event = 0U) const;
/// Same as findPosition,except the entry type of the returned iterator will be kEvent or kEnd and the event argument must be nonzero.
/// This means the next thing to be processed will be the event if it is found.
IndexIntoFileItr findEventPosition(RunNumber_t run, LuminosityBlockNumber_t lumi, EventNumber_t event) const;
/// Same as findPosition,except the entry type of the returned iterator will be kLumi or kEnd and the lumi argument must be nonzero.
/// This means the next thing to be processed will be the lumi if it is found.
IndexIntoFileItr findLumiPosition(RunNumber_t run, LuminosityBlockNumber_t lumi) const;
/// Same as findPosition.
IndexIntoFileItr findRunPosition(RunNumber_t run) const;
bool containsItem(RunNumber_t run, LuminosityBlockNumber_t lumi, EventNumber_t event) const;
bool containsEvent(RunNumber_t run, LuminosityBlockNumber_t lumi, EventNumber_t event) const;
bool containsLumi(RunNumber_t run, LuminosityBlockNumber_t lumi) const;
bool containsRun(RunNumber_t run) const;
SortedRunOrLumiItr beginRunOrLumi() const;
SortedRunOrLumiItr endRunOrLumi() const;
/// The intersection argument will be filled with an entry for each event in both IndexIntoFile objects.
/// To be added the event must have the same ProcessHistoryID index, run number, lumi number and event number.
void set_intersection(IndexIntoFile const& indexIntoFile, std::set<IndexRunLumiEventKey>& intersection) const;
/// Returns true if the IndexIntoFile contains 2 events with the same ProcessHistoryID index, run number, lumi number and event number.
bool containsDuplicateEvents() const;
//*****************************************************************************
//*****************************************************************************
class RunOrLumiEntry {
public:
RunOrLumiEntry();
RunOrLumiEntry(EntryNumber_t orderPHIDRun,
EntryNumber_t orderPHIDRunLumi,
EntryNumber_t entry,
int processHistoryIDIndex,
RunNumber_t run,
LuminosityBlockNumber_t lumi,
EntryNumber_t beginEvents,
EntryNumber_t Event);
EntryNumber_t orderPHIDRun() const { return orderPHIDRun_; }
EntryNumber_t orderPHIDRunLumi() const { return orderPHIDRunLumi_; }
EntryNumber_t entry() const { return entry_; }
int processHistoryIDIndex() const { return processHistoryIDIndex_; }
RunNumber_t run() const { return run_; }
LuminosityBlockNumber_t lumi() const { return lumi_; }
EntryNumber_t beginEvents() const { return beginEvents_; }
EntryNumber_t endEvents() const { return endEvents_; }
bool isRun() const { return lumi() == invalidLumi; }
void setOrderPHIDRun(EntryNumber_t v) { orderPHIDRun_ = v; }
void setOrderPHIDRunLumi(EntryNumber_t v) { orderPHIDRunLumi_ = v; }
void setProcessHistoryIDIndex(int v) { processHistoryIDIndex_ = v; }
bool operator<(RunOrLumiEntry const& right) const {
if (orderPHIDRun_ == right.orderPHIDRun()) {
if (orderPHIDRunLumi_ == right.orderPHIDRunLumi()) {
return entry_ < right.entry();
}
return orderPHIDRunLumi_ < right.orderPHIDRunLumi();
}
return orderPHIDRun_ < right.orderPHIDRun();
}
private:
// All Runs, Lumis, and Events associated with the same
// ProcessHistory and Run in the same input file are processed
// contiguously. This parameter establishes the default order
// of processing of these contiguous subsets of data.
EntryNumber_t orderPHIDRun_;
// All Lumis and Events associated with the same
// ProcessHistory, Run, and Lumi in the same input file are
// processed contiguously. This parameter establishes the
// default order of processing of these contiguous subsets
// of data.
EntryNumber_t orderPHIDRunLumi_; // -1 if a run
// TTree entry number of Run or Lumi
// Always will be valid except when the IndexIntoFile was
// created while processing more than 1 luminosity block
// at a time (multiple concurrent lumis). In that case
// there can be multiple contiguous event ranges associated
// with the same lumi TTree entry. Exactly one of those will
// have a valid entry_ number and the rest will be set
// to the invalid value (-1). For a particular lumi, the
// invalid ones sort before the valid ones.
EntryNumber_t entry_;
int processHistoryIDIndex_;
RunNumber_t run_;
LuminosityBlockNumber_t lumi_; // 0 indicates this is a run entry
// These are entry numbers in the Events TTree
// Each RunOrLumiEntry is associated with one contiguous range of events.
// This is disjoint from the ranges associated with all other RunOrLumiEntry's
EntryNumber_t beginEvents_; // -1 if a run or a lumi with no events
EntryNumber_t endEvents_; // -1 if a run or a lumi with no events
};
//*****************************************************************************
//*****************************************************************************
class RunOrLumiIndexes {
public:
RunOrLumiIndexes(int processHistoryIDIndex, RunNumber_t run, LuminosityBlockNumber_t lumi, int indexToGetEntry);
int processHistoryIDIndex() const { return processHistoryIDIndex_; }
RunNumber_t run() const { return run_; }
LuminosityBlockNumber_t lumi() const { return lumi_; }
int indexToGetEntry() const { return indexToGetEntry_; }
long long beginEventNumbers() const { return beginEventNumbers_; }
long long endEventNumbers() const { return endEventNumbers_; }
bool isRun() const { return lumi() == invalidLumi; }
void setBeginEventNumbers(long long v) { beginEventNumbers_ = v; }
void setEndEventNumbers(long long v) { endEventNumbers_ = v; }
bool operator<(RunOrLumiIndexes const& right) const {
if (processHistoryIDIndex_ == right.processHistoryIDIndex()) {
if (run_ == right.run()) {
return lumi_ < right.lumi();
}
return run_ < right.run();
}
return processHistoryIDIndex_ < right.processHistoryIDIndex();
}
private:
int processHistoryIDIndex_;
RunNumber_t run_;
LuminosityBlockNumber_t lumi_; // 0 indicates this is a run entry
int indexToGetEntry_;
// The next two data members are indexes into the vectors eventNumbers_ and
// eventEntries_ (which both have the same number of entries in the same order,
// the only difference being that one contains only events numbers and is
// smaller in memory).
// If there are no events, then the next two are equal (and the value is the
// index where the first event would have gone if there had been one)
// Note that there can be many RunOrLumiIndexes objects where these two values are
// the same if there are many noncontiguous ranges of events associated with the same
// PHID-Run-Lumi (this one range in eventNumbers_ corresponds to the union of
// all the noncontiguous ranges in the Events TTree).
long long beginEventNumbers_; // first event this PHID-Run-Lumi (-1 if a run or not set)
long long endEventNumbers_; // one past last event this PHID-Run-Lumi (-1 if a run or not set)
};
//*****************************************************************************
//*****************************************************************************
class EventEntry {
public:
EventEntry() : event_(invalidEvent), entry_(invalidEntry) {}
EventEntry(EventNumber_t event, EntryNumber_t entry) : event_(event), entry_(entry) {}
EventNumber_t event() const { return event_; }
EntryNumber_t entry() const { return entry_; }
bool operator<(EventEntry const& right) const { return event() < right.event(); }
bool operator==(EventEntry const& right) const { return event() == right.event(); }
private:
EventNumber_t event_;
EntryNumber_t entry_;
};
//*****************************************************************************
//*****************************************************************************
class SortedRunOrLumiItr {
public:
SortedRunOrLumiItr(IndexIntoFile const* indexIntoFile, unsigned runOrLumi);
IndexIntoFile const* indexIntoFile() const { return indexIntoFile_; }
unsigned runOrLumi() const { return runOrLumi_; }
bool operator==(SortedRunOrLumiItr const& right) const;
bool operator!=(SortedRunOrLumiItr const& right) const;
SortedRunOrLumiItr& operator++();
bool isRun();
void getRange(long long& beginEventNumbers,
long long& endEventNumbers,
EntryNumber_t& beginEventEntry,
EntryNumber_t& endEventEntry);
RunOrLumiIndexes const& runOrLumiIndexes() const;
private:
IndexIntoFile const* indexIntoFile_;
// This is an index into runOrLumiIndexes_
// which gives the current position of the iteration
unsigned runOrLumi_;
};
//*****************************************************************************
//*****************************************************************************
class IndexIntoFileItrImpl {
public:
IndexIntoFileItrImpl(IndexIntoFile const* indexIntoFile,
EntryType entryType,
int indexToRun,
int indexToLumi,
int indexToEventRange,
long long indexToEvent,
long long nEvents);
virtual ~IndexIntoFileItrImpl();
virtual IndexIntoFileItrImpl* clone() const = 0;
EntryType getEntryType() const { return type_; }
void next();
void skipEventForward(int& phIndexOfSkippedEvent,
RunNumber_t& runOfSkippedEvent,
LuminosityBlockNumber_t& lumiOfSkippedEvent,
EntryNumber_t& skippedEventEntry);
void skipEventBackward(int& phIndexOfEvent,
RunNumber_t& runOfEvent,
LuminosityBlockNumber_t& lumiOfEvent,
EntryNumber_t& eventEntry);
virtual int processHistoryIDIndex() const = 0;
virtual RunNumber_t run() const = 0;
virtual LuminosityBlockNumber_t lumi() const = 0;
virtual EntryNumber_t entry() const = 0;
virtual LuminosityBlockNumber_t peekAheadAtLumi() const = 0;
virtual EntryNumber_t peekAheadAtEventEntry() const = 0;
EntryNumber_t firstEventEntryThisRun();
EntryNumber_t firstEventEntryThisLumi();
virtual bool skipLumiInRun() = 0;
virtual bool lumiEntryValid(int index) const = 0;
void advanceToNextRun();
void advanceToNextLumiOrRun();
bool skipToNextEventInLumi();
void initializeRun();
void initializeLumi();
bool operator==(IndexIntoFileItrImpl const& right) const;
IndexIntoFile const* indexIntoFile() const { return indexIntoFile_; }
int size() const { return size_; }
EntryType type() const { return type_; }
int indexToRun() const { return indexToRun_; }
int indexToLumi() const { return indexToLumi_; }
int indexToEventRange() const { return indexToEventRange_; }
long long indexToEvent() const { return indexToEvent_; }
long long nEvents() const { return nEvents_; }
void copyPosition(IndexIntoFileItrImpl const& position);
void getLumisInRun(std::vector<LuminosityBlockNumber_t>& lumis) const;
protected:
void setInvalid();
void setIndexToLumi(int value) { indexToLumi_ = value; }
void setIndexToEventRange(int value) { indexToEventRange_ = value; }
void setIndexToEvent(long long value) { indexToEvent_ = value; }
void setNEvents(long long value) { nEvents_ = value; }
private:
virtual void initializeLumi_() = 0;
virtual bool nextEventRange() = 0;
virtual bool previousEventRange() = 0;
bool previousLumiWithEvents();
virtual bool setToLastEventInRange(int index) = 0;
virtual EntryType getRunOrLumiEntryType(int index) const = 0;
virtual bool isSameLumi(int index1, int index2) const = 0;
virtual bool isSameRun(int index1, int index2) const = 0;
virtual LuminosityBlockNumber_t lumi(int index) const = 0;
IndexIntoFile const* indexIntoFile_;
int size_;
EntryType type_;
int indexToRun_;
int indexToLumi_;
int indexToEventRange_;
long long indexToEvent_;
long long nEvents_;
};
//*****************************************************************************
//*****************************************************************************
class IndexIntoFileItrNoSort : public IndexIntoFileItrImpl {
public:
IndexIntoFileItrNoSort(IndexIntoFile const* indexIntoFile,
EntryType entryType,
int indexToRun,
int indexToLumi,
int indexToEventRange,
long long indexToEvent,
long long nEvents);
IndexIntoFileItrImpl* clone() const override;
int processHistoryIDIndex() const override;
RunNumber_t run() const override;
LuminosityBlockNumber_t lumi() const override;
EntryNumber_t entry() const override;
LuminosityBlockNumber_t peekAheadAtLumi() const override;
EntryNumber_t peekAheadAtEventEntry() const override;
bool skipLumiInRun() override;
bool lumiEntryValid(int index) const override;
private:
void initializeLumi_() override;
bool nextEventRange() override;
bool previousEventRange() override;
bool setToLastEventInRange(int index) override;
EntryType getRunOrLumiEntryType(int index) const override;
bool isSameLumi(int index1, int index2) const override;
bool isSameRun(int index1, int index2) const override;
LuminosityBlockNumber_t lumi(int index) const override;
};
//*****************************************************************************
//*****************************************************************************
class IndexIntoFileItrSorted : public IndexIntoFileItrImpl {
public:
IndexIntoFileItrSorted(IndexIntoFile const* indexIntoFile,
EntryType entryType,
int indexToRun,
int indexToLumi,
int indexToEventRange,
long long indexToEvent,
long long nEvents);
IndexIntoFileItrImpl* clone() const override;
int processHistoryIDIndex() const override;
RunNumber_t run() const override;
LuminosityBlockNumber_t lumi() const override;
EntryNumber_t entry() const override;
LuminosityBlockNumber_t peekAheadAtLumi() const override;
EntryNumber_t peekAheadAtEventEntry() const override;
bool skipLumiInRun() override;
bool lumiEntryValid(int index) const override;
private:
void initializeLumi_() override;
bool nextEventRange() override;
bool previousEventRange() override;
bool setToLastEventInRange(int index) override;
EntryType getRunOrLumiEntryType(int index) const override;
bool isSameLumi(int index1, int index2) const override;
bool isSameRun(int index1, int index2) const override;
LuminosityBlockNumber_t lumi(int index) const override;
};
//*****************************************************************************
//*****************************************************************************
class IndexIntoFileItr {
public:
/// This itended to be used only internally and by IndexIntoFile.
/// One thing that is needed for the future, is to add some checks
/// to make sure the iterator is in a valid state inside this constructor.
/// It is currently possible to create an iterator with this constructor
/// in an invalid state and the behavior would then be undefined. In the
/// existing internal usages the iterator will always be valid. (for
/// example IndexIntoFile::begin and IndexIntoFile::findPosition will
/// always return a valid iterator).
IndexIntoFileItr(IndexIntoFile const* indexIntoFile,
SortOrder sortOrder,
EntryType entryType,
int indexToRun,
int indexToLumi,
int indexToEventRange,
long long indexToEvent,
long long nEvents);
EntryType getEntryType() const { return impl_->getEntryType(); }
int processHistoryIDIndex() const { return impl_->processHistoryIDIndex(); }
RunNumber_t run() const { return impl_->run(); }
LuminosityBlockNumber_t lumi() const { return impl_->lumi(); }
EntryNumber_t entry() const { return impl_->entry(); }
/// Same as lumi() except when the the current type is kRun.
/// In that case instead of always returning 0 (invalid), it will return the lumi that will be processed next
LuminosityBlockNumber_t peekAheadAtLumi() const { return impl_->peekAheadAtLumi(); }
/// Same as entry() except when the the current type is kRun or kLumi.
/// In that case instead of always returning -1 (invalid), it will return
/// the event entry that will be processed next and which is in the current
/// run and lumi. If there is none it still returns -1 (invalid).
EntryNumber_t peekAheadAtEventEntry() const { return impl_->peekAheadAtEventEntry(); }
/// Returns the TTree entry of the first event which would be processed in the
/// current run/lumi if all the events in the run/lumi were processed in the
/// current processing order. If there are none it returns -1 (invalid).
EntryNumber_t firstEventEntryThisRun() { return impl_->firstEventEntryThisRun(); }
EntryNumber_t firstEventEntryThisLumi() { return impl_->firstEventEntryThisLumi(); }
// This is intentionally not implemented.
// It would be difficult to implement for the no sort mode,
// either slow or using extra memory.
// It would be easy to implement for the sorted iteration,
// but I did not implement it so both cases would offer a
// consistent interface.
// It looks like in all cases where this would be needed
// it would not be difficult to get the event number
// directly from the event auxiliary.
// We may need to revisit this decision in the future.
// EventNumber_t event() const;
/// Move to next event to be processed
IndexIntoFileItr& operator++() {
impl_->next();
return *this;
}
/// Move to whatever is immediately after the current event
/// or after the next event if there is not a current event,
/// but do not modify the type or run/lumi
/// indexes unless it is necessary because there
/// are no more events in the current run or lumi.
void skipEventForward(int& phIndexOfSkippedEvent,
RunNumber_t& runOfSkippedEvent,
LuminosityBlockNumber_t& lumiOfSkippedEvent,
EntryNumber_t& skippedEventEntry) {
impl_->skipEventForward(phIndexOfSkippedEvent, runOfSkippedEvent, lumiOfSkippedEvent, skippedEventEntry);
}
/// Move so that the event immediately preceding the
/// the current position is the next event processed.
/// If the type is kEvent or kLumi, then change the type to kRun
/// if and only if the preceding event is in a different
/// run. If the type is kEvent, change the type to kLumi if
/// the lumi is different but the run is the same. Otherwise
/// leave the type unchanged.
void skipEventBackward(int& phIndexOfEvent,
RunNumber_t& runOfEvent,
LuminosityBlockNumber_t& lumiOfEvent,
EntryNumber_t& eventEntry) {
impl_->skipEventBackward(phIndexOfEvent, runOfEvent, lumiOfEvent, eventEntry);
}
/// Move to the next lumi in the current run.
/// Returns false if there is not one.
bool skipLumiInRun() { return impl_->skipLumiInRun(); }
/// Move to the next event in the current lumi.
/// Returns false if there is not one.
bool skipToNextEventInLumi() { return impl_->skipToNextEventInLumi(); }
void advanceToNextRun() { impl_->advanceToNextRun(); }
void advanceToNextLumiOrRun() { impl_->advanceToNextLumiOrRun(); }
void advanceToEvent();
void advanceToLumi();
bool operator==(IndexIntoFileItr const& right) const { return *impl_ == *right.impl_; }
bool operator!=(IndexIntoFileItr const& right) const { return !(*this == right); }
/// Should only be used internally and for tests
void initializeRun() { impl_->initializeRun(); }
/// Should only be used internally and for tests
void initializeLumi() { impl_->initializeLumi(); }
/// Copy the position without modifying the pointer to the IndexIntoFile or size
void copyPosition(IndexIntoFileItr const& position);
void getLumisInRun(std::vector<LuminosityBlockNumber_t>& lumis) const { impl_->getLumisInRun(lumis); }
private:
//for testing
friend class ::TestIndexIntoFile;
friend class ::TestIndexIntoFile3;
friend class ::TestIndexIntoFile4;
friend class ::TestIndexIntoFile5;
// The rest of these are intended to be used only by code which tests
// this class.
IndexIntoFile const* indexIntoFile() const { return impl_->indexIntoFile(); }
int size() const { return impl_->size(); }
EntryType type() const { return impl_->type(); }
int indexToRun() const { return impl_->indexToRun(); }
int indexToLumi() const { return impl_->indexToLumi(); }
int indexToEventRange() const { return impl_->indexToEventRange(); }
long long indexToEvent() const { return impl_->indexToEvent(); }
long long nEvents() const { return impl_->nEvents(); }
value_ptr<IndexIntoFileItrImpl> impl_;
};
//*****************************************************************************
//*****************************************************************************
class IndexRunKey {
public:
IndexRunKey(int index, RunNumber_t run) : processHistoryIDIndex_(index), run_(run) {}
int processHistoryIDIndex() const { return processHistoryIDIndex_; }
RunNumber_t run() const { return run_; }
bool operator<(IndexRunKey const& right) const {
if (processHistoryIDIndex_ == right.processHistoryIDIndex()) {
return run_ < right.run();
}
return processHistoryIDIndex_ < right.processHistoryIDIndex();
}
private:
int processHistoryIDIndex_;
RunNumber_t run_;
};
//*****************************************************************************
//*****************************************************************************
class IndexRunLumiKey {
public:
IndexRunLumiKey(int index, RunNumber_t run, LuminosityBlockNumber_t lumi)
: processHistoryIDIndex_(index), run_(run), lumi_(lumi) {}
int processHistoryIDIndex() const { return processHistoryIDIndex_; }
RunNumber_t run() const { return run_; }
LuminosityBlockNumber_t lumi() const { return lumi_; }
bool operator<(IndexRunLumiKey const& right) const {
if (processHistoryIDIndex_ == right.processHistoryIDIndex()) {
if (run_ == right.run()) {
return lumi_ < right.lumi();
}
return run_ < right.run();
}
return processHistoryIDIndex_ < right.processHistoryIDIndex();
}
private:
int processHistoryIDIndex_;
RunNumber_t run_;
LuminosityBlockNumber_t lumi_;
};
//*****************************************************************************
//*****************************************************************************
class IndexRunLumiEventKey {
public:
IndexRunLumiEventKey(int index, RunNumber_t run, LuminosityBlockNumber_t lumi, EventNumber_t event)
: processHistoryIDIndex_(index), run_(run), lumi_(lumi), event_(event) {}
int processHistoryIDIndex() const { return processHistoryIDIndex_; }
RunNumber_t run() const { return run_; }
LuminosityBlockNumber_t lumi() const { return lumi_; }
EventNumber_t event() const { return event_; }
bool operator<(IndexRunLumiEventKey const& right) const {
if (processHistoryIDIndex_ == right.processHistoryIDIndex()) {
if (run_ == right.run()) {
if (lumi_ == right.lumi()) {
return event_ < right.event();
}
return lumi_ < right.lumi();
}
return run_ < right.run();
}
return processHistoryIDIndex_ < right.processHistoryIDIndex();
}
private:
int processHistoryIDIndex_;
RunNumber_t run_;
LuminosityBlockNumber_t lumi_;
EventNumber_t event_;
};
//*****************************************************************************
//*****************************************************************************
class EventFinder {
public:
virtual ~EventFinder() {}
virtual EventNumber_t getEventNumberOfEntry(EntryNumber_t entry) const = 0;
};
//*****************************************************************************
//*****************************************************************************
// The next two functions are used by the output module to fill the
// persistent data members.
/// Used by RootOutputModule to fill the persistent data.
/// This will not work properly if entries are not added in the same order as in RootOutputModule
void addEntry(ProcessHistoryID const& processHistoryID,
RunNumber_t run,
LuminosityBlockNumber_t lumi,
EventNumber_t event,
EntryNumber_t entry);
/// Used by RootOutputModule after all entries have been added.
/// This only works after the correct sequence of addEntry calls,
/// because it makes some corrections before sorting. A std::stable_sort
/// works in cases where those corrections are not needed.
void sortVector_Run_Or_Lumi_Entries();
//used internally by addEntry
void addLumi(int index, RunNumber_t run, LuminosityBlockNumber_t lumi, EntryNumber_t entry);
//*****************************************************************************
//*****************************************************************************
// The next group of functions is used by the PoolSource (or other
// input related code) to fill the IndexIntoFile.
/// Used by PoolSource to force the ProcessHistoryID indexes to be consistent across all input files.
/// Currently this consistency is important when duplicate checking across all input files.
/// It may be important for other reasons in the future.
/// It is important this be called immediately after reading in the object from the input file,
/// before filling the transient data members or using the indexes in any way.
void fixIndexes(std::vector<ProcessHistoryID>& processHistoryIDs);
/// The number of events needs to be set before filling the transient event vectors.
/// It is used to resize them.
void setNumberOfEvents(EntryNumber_t nevents) { transient_.numberOfEvents_ = nevents; }
/// Calling this enables the functions that fill the event vectors to get the event numbers.
/// It needs to be called before filling the events vectors
/// This implies the client needs to define a class that inherits from
/// EventFinder and then create one. This function is used to pass in a
/// pointer to its base class.
void setEventFinder(std::shared_ptr<EventFinder> ptr) { transient_.eventFinder_ = ptr; }
/// Fills a vector of event numbers.
/// Not filling it reduces the memory used by IndexIntoFile.
/// As long as the event finder is still pointing at an open file
/// this will automatically be called on demand (when the event
/// numbers are are needed). In cases, where the input file may be
/// closed when the need arises, the client code must call this
/// explicitly and fill the vector before the file is closed.
/// In PoolSource, this is necessary when duplicate checking across
/// all files and when doing lookups to see if an event is in a
/// previously opened file. Either this vector or the one that
/// also contains event entry numbers can be used when looking for
/// duplicate events within the same file or looking up events in
/// in the current file without reading them.
void fillEventNumbers() const;
/// Fills a vector of objects that contain an event number and
/// the corresponding TTree entry number for the event.
/// Not filling it reduces the memory used by IndexIntoFile.
/// As long as the event finder is still pointing at an open file
/// this will automatically be called on demand (when the event
/// numbers and entries are are needed). It makes sense for the
/// client to fill this explicitly in advance if it is known that
/// it will be needed, because in some cases this will prevent the
/// event numbers vector from being unnecessarily filled (wasting
/// memory). This vector will be needed when iterating over events
/// in numerical order or looking up specific events. The entry
/// numbers are needed if the events are actually read from the
/// input file.
void fillEventEntries() const;
/// If needEventNumbers is true then this function does the same thing
/// as fillEventNumbers. If NeedEventEntries is true, then this function
/// does the same thing as fillEventEntries. If both are true, it fills
/// both within the same loop and it uses less CPU than calling those
/// two functions separately.
void fillEventNumbersOrEntries(bool needEventNumbers, bool needEventEntries) const;
/// If something external to IndexIntoFile is reading through the EventAuxiliary
/// then it could use this to fill in the event numbers so that IndexIntoFile
/// will not read through it again.
std::vector<EventNumber_t>& unsortedEventNumbers() { return transient_.unsortedEventNumbers_; }
std::vector<EventNumber_t> const& unsortedEventNumbers() const { return transient_.unsortedEventNumbers_; }
/// Clear some vectors and eventFinder when an input file is closed.
/// This reduces the memory used by IndexIntoFile
void inputFileClosed();