Add support for recursive graph bisection. #12489

Merged
jpountz merged 21 commits into apache:main from the recursive_graph_bisection branch on Sep 14, 2023

Conversation

jpountz
Contributor

@jpountz jpountz commented Aug 4, 2023

Recursive graph bisection is an extremely effective algorithm to reorder doc IDs in a way that improves both storage and query efficiency by clustering similar documents together. It usually performs better than other techniques that try to achieve a similar goal, such as sorting the index in natural order (e.g. by URL) or by a min-hash, though it comes at a higher index-time cost.

The [original paper](https://arxiv.org/pdf/1602.08820.pdf) is good, but I found this [follow-up reproducibility study](http://engineering.nyu.edu/~suel/papers/bp-ecir19.pdf) to describe the algorithm in more practical ways.

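For illustration, here is a minimal, single-threaded sketch of the idea, assuming each document is represented as an `int[]` of term ids. All names are made up; this is not the code added by this PR, which streams postings, parallelizes the work, and uses cheaper gain estimates.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BpSketch {

  /** Approximate bits needed to encode the doc-id gaps of a term that matches
   *  {@code degree} docs within a partition of {@code size} docs. */
  static double cost(int degree, int size) {
    return degree == 0 ? 0 : degree * (1 + Math.log((double) size / degree) / Math.log(2));
  }

  /** Cost reduction obtained by moving {@code doc} from "here" to "there". */
  static double moveGain(int[] doc, int[] hereDeg, int[] thereDeg, int hereSize, int thereSize) {
    double gain = 0;
    for (int t : doc) {
      gain += cost(hereDeg[t], hereSize) + cost(thereDeg[t], thereSize)
          - cost(hereDeg[t] - 1, hereSize) - cost(thereDeg[t] + 1, thereSize);
    }
    return gain;
  }

  /** Recursively reorders docs[from, to) in place; the resulting array order is the new doc-id order. */
  static void reorder(int[][] docs, int from, int to, int numTerms, int minPartitionSize, int maxIters) {
    if (to - from <= minPartitionSize) {
      return;
    }
    final int mid = (from + to) >>> 1;
    final int leftSize = mid - from, rightSize = to - mid;

    for (int iter = 0; iter < maxIters; iter++) {
      final int[] leftDeg = degrees(docs, from, mid, numTerms);
      final int[] rightDeg = degrees(docs, mid, to, numTerms);

      // Sort each half by descending gain of sending the doc to the other half.
      Integer[] left = range(from, mid), right = range(mid, to);
      Comparator<Integer> byLeftGain =
          Comparator.comparingDouble(i -> -moveGain(docs[i], leftDeg, rightDeg, leftSize, rightSize));
      Comparator<Integer> byRightGain =
          Comparator.comparingDouble(i -> -moveGain(docs[i], rightDeg, leftDeg, rightSize, leftSize));
      Arrays.sort(left, byLeftGain);
      Arrays.sort(right, byRightGain);

      // Swap the best candidates pairwise while the combined gain is positive. Swapping
      // (rather than moving) keeps both partitions the same size. A real implementation
      // would update degrees after each swap; this sketch keeps them fixed per iteration.
      int swaps = 0;
      for (int i = 0; i < Math.min(left.length, right.length); i++) {
        double gain = moveGain(docs[left[i]], leftDeg, rightDeg, leftSize, rightSize)
            + moveGain(docs[right[i]], rightDeg, leftDeg, rightSize, leftSize);
        if (gain <= 0) {
          break;
        }
        int[] tmp = docs[left[i]];
        docs[left[i]] = docs[right[i]];
        docs[right[i]] = tmp;
        swaps++;
      }
      if (swaps == 0) {
        break; // converged at this level
      }
    }

    // Recurse: similar docs keep ending up together in smaller and smaller partitions.
    reorder(docs, from, mid, numTerms, minPartitionSize, maxIters);
    reorder(docs, mid, to, numTerms, minPartitionSize, maxIters);
  }

  private static int[] degrees(int[][] docs, int from, int to, int numTerms) {
    int[] deg = new int[numTerms];
    for (int i = from; i < to; i++) {
      for (int t : docs[i]) {
        deg[t]++;
      }
    }
    return deg;
  }

  private static Integer[] range(int from, int to) {
    Integer[] r = new Integer[to - from];
    for (int i = 0; i < r.length; i++) {
      r[i] = from + i;
    }
    return r;
  }
}
```

A call like `reorder(docs, 0, docs.length, numTerms, 32, 20)` leaves `docs` in the new order: similar documents end up in the same half at every level of the recursion, which is what makes doc-ID gaps smaller and postings more compressible.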
@jpountz
Contributor Author

jpountz commented Aug 4, 2023

I'm opening this draft in case someone wants to take a look. I only checked the output on very small indices for now. I also ran it on larger indexes, such as a 1.8M-doc wikimedium10m segment, to see how long it takes (4 minutes on my 24-core machine), but I haven't checked whether the result makes sense yet. It's probably full of bugs!

@jpountz
Contributor Author

jpountz commented Aug 20, 2023

I think it's starting to look better now. I addressed some inefficiencies and applied some of the optimizations suggested by Mackenzie et al. in "Tradeoff Options for Bipartite Graph Partitioning":

  • Use a simplified estimator that only requires two log computations.
  • Simulated annealing: use the iteration number as a threshold and stop iterating once the remaining gains fall below it (a rough illustration of both tweaks follows this list).
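As a rough illustration only (an assumption on my part, not the exact formulas or thresholds from the paper or from this PR), these two tweaks could look like this on top of the gain sketch earlier in this thread:

```java
// Illustrative only: an approximate move gain that needs two logarithms per term
// instead of four exact cost() evaluations.
static double approxMoveGain(int[] doc, int[] hereDeg, int[] thereDeg, int hereSize, int thereSize) {
  double gain = 0;
  for (int t : doc) {
    gain += Math.log((double) hereSize / hereDeg[t])          // how spread out the term is here
        - Math.log((double) thereSize / (thereDeg[t] + 1));   // vs. how it would look after the move
  }
  return gain;
}

// Annealing-style exit from the per-level loop, using the iteration number as the threshold:
//
//   for (int iter = 0; iter < maxIters; iter++) {
//     double bestGain = ...;      // best combined swap gain seen in this iteration
//     if (bestGain < iter) break; // stop iterating once the remaining gains get small
//   }
```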

With the suggested defaults of minDocFreq=4096 and minPartitionSize=32, I'm getting the following performance numbers on wikimedium10m (10M docs); a sketch of the reorder-and-serialize step follows the list:

  • indexing (24 threads): 6.5 minutes
  • force-merging (single thread): 4.2 minutes
  • reordering doc IDs, including building a forward index by uninverting the inverted index (24 threads): 5.6 minutes
  • serializing the reordered view via addIndexes (single thread): 7.4 minutes
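As referenced above, a hypothetical sketch of the reorder-and-serialize step with stock Lucene APIs. The `reorderDocIds()` call is only a placeholder for the reorderer added in this PR (the real API may differ); `IndexWriter.addIndexes(CodecReader...)` is the existing API used to serialize the reordered view into a new index.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ReorderPipelineSketch {

  public static void main(String[] args) throws IOException {
    try (Directory src = FSDirectory.open(Path.of("index"));
        Directory dst = FSDirectory.open(Path.of("index-reordered"));
        DirectoryReader reader = DirectoryReader.open(src);
        IndexWriter writer = new IndexWriter(dst, new IndexWriterConfig())) {
      // The source index has been force-merged, so there is exactly one leaf.
      CodecReader leaf = SlowCodecReaderWrapper.wrap(reader.leaves().get(0).reader());
      CodecReader reordered = reorderDocIds(leaf); // placeholder for the BP reorderer
      writer.addIndexes(reordered);                // serialize the reordered view
      writer.commit();
    }
  }

  // Placeholder: would compute a BP doc-id permutation and return a view of the
  // reader with docs renumbered accordingly. See this PR for the real API.
  private static CodecReader reorderDocIds(CodecReader reader) {
    throw new UnsupportedOperationException("placeholder");
  }
}
```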

Then, comparing query performance, I'm getting interesting results. I had to disable verification of scores and counts because of the reordering, but a quick manual check suggests that results are valid. I can guess why some queries like conjunctions are faster, but I'm not sure about OrHighLow or HighPhrase. The performance of sorting tasks is highly dependent on index order, so I'm treating them as noise.

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                       OrHighLow      583.26      (5.9%)      400.94      (5.9%)  -31.3% ( -40% -  -20%) 0.000
                      HighPhrase       28.64      (7.6%)       20.78      (5.0%)  -27.4% ( -37% -  -16%) 0.000
                      TermDTSort      113.57      (2.5%)       90.23      (1.2%)  -20.5% ( -23% -  -17%) 0.000
               HighTermTitleSort       72.11      (1.5%)       63.72      (1.4%)  -11.6% ( -14% -   -8%) 0.000
                        PKLookup      290.91      (3.5%)      264.17      (3.0%)   -9.2% ( -15% -   -2%) 0.000
                        HighTerm      634.93      (6.4%)      584.08      (5.9%)   -8.0% ( -19% -    4%) 0.000
                          IntNRQ       54.77     (16.8%)       50.44     (13.2%)   -7.9% ( -32% -   26%) 0.098
               HighTermMonthSort     7652.28      (2.8%)     7294.07      (3.2%)   -4.7% ( -10% -    1%) 0.000
                    OrHighNotLow      600.38      (5.5%)      600.98      (5.3%)    0.1% ( -10% -   11%) 0.953
                         Respell      272.80      (2.2%)      274.82      (2.3%)    0.7% (  -3% -    5%) 0.301
                        Wildcard      157.34      (4.4%)      160.16      (3.9%)    1.8% (  -6% -   10%) 0.172
                HighSloppyPhrase       19.99      (5.5%)       20.74      (4.3%)    3.8% (  -5% -   14%) 0.016
                         Prefix3      840.82      (4.7%)      882.90      (5.8%)    5.0% (  -5% -   16%) 0.002
                          Fuzzy1      361.19      (2.8%)      383.68      (3.7%)    6.2% (   0% -   13%) 0.000
            HighIntervalsOrdered        8.51      (4.9%)        9.10      (4.3%)    6.9% (  -2% -   16%) 0.000
                   OrHighNotHigh      408.11      (4.8%)      440.27      (5.1%)    7.9% (  -1% -   18%) 0.000
                    HighSpanNear       23.57      (3.2%)       25.52      (3.7%)    8.3% (   1% -   15%) 0.000
                   OrNotHighHigh      367.13      (4.2%)      397.89      (4.2%)    8.4% (   0% -   17%) 0.000
                          Fuzzy2      188.81      (2.1%)      204.96      (2.6%)    8.6% (   3% -   13%) 0.000
             MedIntervalsOrdered       28.61      (4.8%)       31.32      (4.3%)    9.5% (   0% -   19%) 0.000
                     LowSpanNear       51.15      (3.3%)       56.30      (2.6%)   10.1% (   4% -   16%) 0.000
                     MedSpanNear       46.95      (3.1%)       51.69      (2.9%)   10.1% (   3% -   16%) 0.000
                 MedSloppyPhrase       59.70      (5.0%)       66.11      (4.1%)   10.7% (   1% -   20%) 0.000
                    OrHighNotMed      514.44      (5.3%)      577.94      (5.7%)   12.3% (   1% -   24%) 0.000
             LowIntervalsOrdered       78.83      (3.8%)       88.74      (3.7%)   12.6% (   4% -   20%) 0.000
                 LowSloppyPhrase       54.64      (4.4%)       62.23      (3.6%)   13.9% (   5% -   22%) 0.000
                      OrHighHigh       60.79      (8.6%)       69.68      (9.4%)   14.6% (  -3% -   35%) 0.000
                         MedTerm      864.76      (5.3%)     1024.07      (7.0%)   18.4% (   5% -   32%) 0.000
                       LowPhrase       80.72      (4.5%)       97.66      (5.1%)   21.0% (  10% -   32%) 0.000
                       MedPhrase       49.16      (4.5%)       60.54      (5.1%)   23.1% (  12% -   34%) 0.000
                      AndHighMed      183.79      (4.8%)      238.88      (6.2%)   30.0% (  18% -   42%) 0.000
                     AndHighHigh       88.11      (5.8%)      114.97      (7.0%)   30.5% (  16% -   46%) 0.000
                    OrNotHighMed      462.38      (3.5%)      606.15      (4.7%)   31.1% (  22% -   40%) 0.000
            HighTermTitleBDVSort       27.04      (1.7%)       35.47      (5.8%)   31.2% (  23% -   39%) 0.000
                    OrNotHighLow     1666.47      (3.9%)     2265.52      (3.6%)   35.9% (  27% -   45%) 0.000
                       OrHighMed      170.06      (4.6%)      242.18      (8.7%)   42.4% (  27% -   58%) 0.000
                         LowTerm     1032.01      (4.4%)     1520.01      (7.2%)   47.3% (  34% -   61%) 0.000
                      AndHighLow     1577.56      (2.9%)     2337.64      (6.8%)   48.2% (  37% -   59%) 0.000
           HighTermDayOfYearSort      199.31      (2.3%)      357.90      (2.9%)   79.6% (  72% -   86%) 0.000

@jpountz
Contributor Author

jpountz commented Aug 20, 2023

I ran the benchmark multiple times to see whether the slowdown on OrHighLow reproduces, and it does. I took the first OrHighLow query in the tasks file (`OrHighLow: 2005 valois # freq=835460 freq=2277`), and it reproduces the slowdown too. I printed doc freqs of both 2005 and valois for each 1% of the doc ID space (buckets of 100k docs, since the index has 10M docs), which gives the following distributions:

Original index:
2005: [6363, 6296, 6187, 6448, 5812, 5304, 5394, 5340, 4968, 4322, 3041, 2989, 2367, 3991, 5087, 5401, 5561, 5328, 5482, 5235, 5287, 5513, 5817, 5940, 5707, 6057, 6642, 6252, 5963, 5698, 5652, 5630, 5675, 5736, 6189, 5679, 5935, 5868, 5965, 6014, 5698, 5746, 6173, 5843, 6035, 6097, 6004, 6341, 7390, 9190, 10011, 10986, 12463, 12324, 12079, 12109, 12274, 12338, 12676, 13237, 13494, 13261, 11942, 12720, 13443, 13589, 13497, 14363, 14285, 14433, 15217, 14572, 14124, 15481, 14246, 14612, 14002, 16313, 13869, 15555, 17412, 14246, 11731, 6999, 6612, 5965, 6392, 6200, 6142, 6222, 6301, 6340, 6415, 6369, 6262, 6202, 5945, 5807, 5861, 5870]
valois: [15, 22, 24, 31, 45, 62, 53, 96, 89, 87, 20, 14, 3, 16, 32, 27, 35, 28, 27, 18, 25, 37, 19, 19, 42, 26, 29, 14, 11, 10, 15, 10, 24, 54, 34, 43, 12, 18, 18, 27, 16, 68, 8, 34, 56, 43, 38, 20, 25, 15, 15, 21, 17, 23, 25, 43, 19, 17, 14, 11, 5, 4, 7, 17, 19, 23, 15, 10, 9, 11, 26, 25, 15, 20, 12, 22, 18, 19, 8, 23, 10, 18, 20, 6, 13, 15, 9, 9, 5, 10, 5, 15, 9, 12, 8, 5, 6, 10, 10, 15]

Reordered index:
2005: [1270, 597, 4767, 4579, 5490, 5282, 6493, 8367, 6432, 8939, 10370, 5048, 5958, 2415, 3788, 3184, 3256, 3643, 4017, 5183, 5249, 5104, 4424, 4997, 4750, 4276, 4960, 3428, 6715, 10277, 3500, 9427, 7701, 11009, 12684, 11684, 10947, 7721, 1463, 3840, 2213, 5607, 5538, 4133, 4750, 3557, 1977, 9233, 11173, 12639, 12849, 11259, 9666, 13103, 13936, 13909, 2192, 331, 1741, 2321, 3081, 4867, 4991, 3727, 5269, 5890, 1854, 4784, 8763, 7446, 2818, 4713, 13496, 17533, 15171, 5990, 8934, 10878, 14437, 12181, 12459, 7063, 5931, 5114, 5762, 11964, 10558, 8220, 2396, 353, 1003, 4298, 1751, 4883, 26546, 49839, 37667, 41060, 51507, 14902]
valois: [0, 0, 3, 1, 3, 6, 4, 2, 9, 2, 0, 8, 1, 0, 0, 2, 1, 0, 0, 0, 0, 1, 1, 2, 1, 63, 335, 72, 2, 7, 17, 17, 1, 12, 5, 6, 1, 19, 27, 2, 10, 3, 2, 42, 30, 84, 64, 6, 4, 1, 14, 28, 7, 28, 5, 8, 8, 14, 9, 5, 15, 3, 48, 400, 162, 47, 86, 93, 5, 14, 22, 2, 3, 1, 0, 6, 4, 1, 5, 4, 1, 4, 135, 4, 107, 11, 12, 4, 15, 4, 14, 22, 3, 1, 8, 1, 0, 1, 4, 0]

First, the reordering works pretty well, as there are 11 contiguous ranges of 100k doc IDs that don't have a single occurrence of valois in the reordered index, while there were none in the original index. And this helps some queries, e.g. counting documents that contain both 2005 and valois runs more than 2x faster with the reordered index as Lucene needs to decompress fewer blocks.

But I suspect that it is also the source of the slowdown with the disjunction: valois not only has a lower term freq, it also has a higher score contribution, so dynamic pruning starts working better once it has seen k(=100) hits for the higher scoring clause. This is when the minimum competitive score gets close to the actual score of the k-th top hit. In the original index, this happens after evaluating only 5% of the doc ID space given how matches are uniformly spread across the doc ID space. In the reordered index, this happens after evaluating 26% of the doc ID space. So it takes much longer for dynamic pruning to start helping significantly. I suspect we have room for improvement to better deal with this sort of scenario.
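For reference, the per-percentile distributions above can be computed along these lines with stock Lucene APIs (this is an assumption about the method, not the exact script that was used):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;

final class TermDistribution {

  /** Counts matches of field:text in each of {@code buckets} equal slices of the doc-ID space. */
  static int[] histogram(IndexReader reader, String field, String text, int buckets) throws IOException {
    int[] counts = new int[buckets];
    long maxDoc = reader.maxDoc();
    for (LeafReaderContext ctx : reader.leaves()) {
      PostingsEnum postings = ctx.reader().postings(new Term(field, text));
      if (postings == null) {
        continue; // term absent from this segment
      }
      for (int doc = postings.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = postings.nextDoc()) {
        int globalDoc = ctx.docBase + doc;
        counts[(int) (globalDoc * (long) buckets / maxDoc)]++;
      }
    }
    return counts;
  }
}
```

A call like `histogram(reader, "body", "valois", 100)` (field name assumed) produces 100 bucket counts in the same shape as the arrays above.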

@jpountz
Contributor Author

jpountz commented Aug 29, 2023

> So it takes much longer for dynamic pruning to start helping significantly. I suspect we have room for improvement to better deal with this sort of scenario.

I opened #12526 for a potential solution to this problem.

@mikemccand
Member

@jpountz did you measure any change to index size with the reordered docids?

@jpountz
Contributor Author

jpountz commented Sep 8, 2023

I did. My wikimedium file is sorted by title, which already gives some compression compared to a random ordering. Disappointingly, recursive graph bisection only improved compression of postings (doc) by 1.5%. It significantly hurts stored fields though; I suspect this is because the title field is stored, and stored fields compression takes advantage of splits of the same article being next to one another.

| File                 | before (MB) | after (MB) |
|----------------------|------------:|-----------:|
| terms (tim)          | 307         | 315        |
| postings (doc)       | 1706        | 1685       |
| positions (pos)      | 2563        | 2540       |
| points (kdd)         | 122         | 126        |
| doc values (dvd)     | 686         | 693        |
| stored fields (fdt)  | 255         | 364        |
| norms (nvd)          | 20          | 20         |
| total                | 5664        | 5747       |

This initially made me doubt whether the algorithm was implemented correctly, but the query speedups and postings distributions suggest it is not completely wrong.

I should run on wikibigall too.

@jpountz
Contributor Author

jpountz commented Sep 10, 2023

Wikibigall. Less space is spent on doc values this time since I did not enable indexing of facets. There is a more significant size reduction of postings this time (-10.5%). This is consistent with the reproducibility paper, which observed size reductions of 18% with partitioned Elias-Fano and 5% with SVByte on the Wikipedia dataset. I would expect PFor to be somewhere in between, as it is better able to take advantage of small gaps between docs than SVByte, but less so than partitioned Elias-Fano.

| File                 | before (MB) | after (MB) |
|----------------------|------------:|-----------:|
| terms (tim)          | 767         | 766        |
| postings (doc)       | 2779        | 2489       |
| positions (pos)      | 11356       | 10569      |
| points (kdd)         | 100         | 99         |
| doc values (dvd)     | 456         | 461        |
| stored fields (fdt)  | 249         | 257        |
| norms (nvd)          | 13          | 13         |
| total                | 15734       | 14669      |

Benchmarks still show slowdowns on phrase queries and speedups on conjunctions, though it's less spectacular than on wikimedium10m.

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                         MedTerm      652.41      (7.5%)      493.97      (2.6%)  -24.3% ( -31% -  -15%) 0.000
                      HighPhrase       30.86      (3.5%)       23.85      (2.6%)  -22.7% ( -27% -  -17%) 0.000
                       LowPhrase       51.09      (3.1%)       42.38      (2.2%)  -17.1% ( -21% -  -12%) 0.000
                         LowTerm     1057.76      (5.4%)      881.22      (2.5%)  -16.7% ( -23% -   -9%) 0.000
                       MedPhrase       82.18      (3.0%)       71.88      (1.7%)  -12.5% ( -16% -   -8%) 0.000
               HighTermMonthSort     6482.52      (4.5%)     5739.50      (3.5%)  -11.5% ( -18% -   -3%) 0.000
                        PKLookup      293.95      (3.2%)      276.15      (3.7%)   -6.1% ( -12% -    0%) 0.000
                 MedSloppyPhrase        8.68      (2.7%)        8.20      (2.9%)   -5.5% ( -10% -    0%) 0.000
                       OrHighLow      578.06      (4.4%)      550.49      (4.0%)   -4.8% ( -12% -    3%) 0.016
                HighSloppyPhrase        7.43      (2.2%)        7.10      (4.0%)   -4.4% ( -10% -    1%) 0.003
                          Fuzzy1      244.70      (2.9%)      238.49      (3.3%)   -2.5% (  -8% -    3%) 0.080
                      OrHighHigh       39.76      (9.5%)       39.21      (6.1%)   -1.4% ( -15% -   15%) 0.717
                        HighTerm      370.57      (8.5%)      367.09      (4.4%)   -0.9% ( -12% -   13%) 0.768
                 LowSloppyPhrase       13.68      (2.3%)       13.71      (3.3%)    0.2% (  -5% -    5%) 0.868
                         Respell      204.23      (1.8%)      204.98      (2.0%)    0.4% (  -3% -    4%) 0.679
                         Prefix3      225.23      (5.1%)      226.74      (5.5%)    0.7% (  -9% -   11%) 0.786
                        Wildcard      170.34      (4.0%)      171.63      (3.4%)    0.8% (  -6% -    8%) 0.665
                          IntNRQ       92.30     (11.9%)       95.15     (10.2%)    3.1% ( -17% -   28%) 0.555
                     MedSpanNear        5.79      (6.8%)        5.99      (9.3%)    3.4% ( -11% -   20%) 0.378
                       OrHighMed      104.41      (7.3%)      107.99      (5.3%)    3.4% (  -8% -   17%) 0.253
                    HighSpanNear        2.47      (4.2%)        2.56      (4.1%)    3.7% (  -4% -   12%) 0.059
                          Fuzzy2      139.96      (2.8%)      146.77      (2.6%)    4.9% (   0% -   10%) 0.000
                     LowSpanNear       42.96      (3.6%)       45.21      (2.5%)    5.2% (   0% -   11%) 0.000
                     AndHighHigh       33.24      (6.2%)       36.20      (4.3%)    8.9% (  -1% -   20%) 0.000
                      AndHighMed      131.84      (5.2%)      144.31      (3.2%)    9.5% (   0% -   18%) 0.000
           HighTermDayOfYearSort      186.67      (2.9%)      208.78      (3.2%)   11.8% (   5% -   18%) 0.000
                      AndHighLow      590.69      (3.2%)      677.22      (2.2%)   14.6% (   9% -   20%) 0.000

@jpountz jpountz marked this pull request as ready for review September 10, 2023 10:42
@mikemccand
Member

Thanks @jpountz -- these are fascinating results! I wonder why stored fields index size wasn't really hurt nearly as much for wikibigall but was for wikimediumall?

It's interesting that .pos was helped some on wikibigall versus wikimediumall.

@jpountz
Contributor Author

jpountz commented Sep 10, 2023

> I wonder why stored fields index size wasn't really hurt nearly as much for wikibigall but was for wikimediumall?

This is because wikimedium uses chunks of articles as documents, and every chunk has the title of the Wikipedia article, so there are often ten or more adjacent docs that have the same title. This is a best case for stored fields compression, as only the first title is actually stored and other occurrences of the same title are replaced with a reference to the first occurrence. With reordering, these duplicate titles are no longer in the same block, so compression goes back to deduplicating bits of title strings instead of entire titles. wikibig doesn't have this best-case scenario for stored fields compression; ordering only helps a bit because articles are in title order, so there are more duplicate strings (shared prefixes) in a block of stored fields compared to the reordered index.

@jpountz
Contributor Author

jpountz commented Sep 10, 2023

Regarding positions, the reproducibility paper noted that the algorithm helped term frequencies a bit, though not as much as docs. It doesn't say anything about positions, but I suspect that if it tends to group together docs that have the same freq for the same term, then gaps between positions also tend to be more regular.

@jpountz
Contributor Author

jpountz commented Sep 13, 2023

I just found a bug that in practice made BP run only one iteration per level; fixing it makes performance better (wikibigall):

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ      122.77     (15.4%)      114.15      (0.7%)   -7.0% ( -20% -   10%) 0.363
                        PKLookup      294.84      (2.9%)      282.06      (2.7%)   -4.3% (  -9% -    1%) 0.030
                       OrHighLow      713.73      (3.5%)      688.95      (3.7%)   -3.5% ( -10% -    3%) 0.170
                        Wildcard       78.71      (4.2%)       78.01      (1.1%)   -0.9% (  -6% -    4%) 0.682
                         Prefix3      131.65      (9.1%)      132.63      (7.3%)    0.7% ( -14% -   18%) 0.898
                         Respell      203.56      (0.3%)      205.74      (1.1%)    1.1% (   0% -    2%) 0.051
               HighTermMonthSort     6065.88      (2.1%)     6162.98      (1.5%)    1.6% (  -1% -    5%) 0.208
                    HighSpanNear        5.21      (1.7%)        5.40      (2.6%)    3.6% (   0% -    7%) 0.021
                 MedSloppyPhrase        5.78      (3.5%)        6.15      (5.3%)    6.3% (  -2% -   15%) 0.047
                     MedSpanNear        9.40      (0.8%)       10.05      (1.1%)    6.9% (   4% -    8%) 0.000
                     LowSpanNear       13.99      (1.0%)       15.28      (1.2%)    9.2% (   6% -   11%) 0.000
                HighSloppyPhrase        1.26      (4.9%)        1.38      (8.3%)    9.9% (  -3% -   24%) 0.039
                      OrHighHigh       46.12      (8.9%)       55.13      (6.8%)   19.5% (   3% -   38%) 0.001
                          Fuzzy2      163.38      (0.8%)      199.07      (0.7%)   21.8% (  20% -   23%) 0.000
                 LowSloppyPhrase       28.75      (2.2%)       35.28      (3.1%)   22.7% (  17% -   28%) 0.000
                      HighPhrase        7.58      (2.1%)        9.35      (1.7%)   23.4% (  19% -   27%) 0.000
                       OrHighMed      146.19      (6.5%)      183.57      (5.2%)   25.6% (  12% -   39%) 0.000
           HighTermDayOfYearSort      153.45      (2.5%)      194.38      (1.9%)   26.7% (  21% -   31%) 0.000
                          Fuzzy1      259.92      (2.4%)      345.09      (2.5%)   32.8% (  27% -   38%) 0.000
                        HighTerm      478.18      (9.8%)      670.01      (9.2%)   40.1% (  19% -   65%) 0.000
                         MedTerm      577.98      (9.0%)      845.32     (10.0%)   46.3% (  25% -   71%) 0.000
                      AndHighMed      157.39      (4.5%)      243.75      (7.3%)   54.9% (  41% -   69%) 0.000
                         LowTerm     1016.15      (7.6%)     1671.11      (9.8%)   64.5% (  43% -   88%) 0.000
                      AndHighLow      746.14      (1.7%)     1227.66      (4.2%)   64.5% (  57% -   71%) 0.000
                       MedPhrase       41.72      (2.0%)       71.95      (3.4%)   72.4% (  65% -   79%) 0.000
                     AndHighHigh       31.03      (7.0%)       56.59     (13.4%)   82.4% (  57% -  110%) 0.000
                       LowPhrase       69.04      (1.5%)      126.15      (3.4%)   82.7% (  76% -   88%) 0.000

Space savings are also bigger on postings:

| File                 | before (MB) | after (MB) |
|----------------------|------------:|-----------:|
| terms (tim)          | 767         | 763        |
| postings (doc)       | 2779        | 2260       |
| positions (pos)      | 11356       | 10522      |
| points (kdd)         | 100         | 99         |
| doc values (dvd)     | 456         | 462        |
| stored fields (fdt)  | 249         | 226        |
| norms (nvd)          | 13          | 13         |
| total                | 15734       | 14360      |

@jpountz jpountz added this to the 9.8.0 milestone Sep 14, 2023
@jpountz jpountz merged commit 39f3777 into apache:main Sep 14, 2023
4 checks passed
@jpountz
Contributor Author

jpountz commented Sep 14, 2023

Since it's fairly unintrusive to other functionality, I felt free to merge.

@jpountz jpountz deleted the recursive_graph_bisection branch September 14, 2023 16:21
jpountz added a commit that referenced this pull request Sep 14, 2023
jpountz added a commit that referenced this pull request Sep 14, 2023
jpountz added a commit that referenced this pull request Sep 15, 2023
Contributor

@uschindler uschindler left a comment


Using the default ThreadFactory for ForkJoinPools runs the test without permissions when the security manager is enabled.

See test failures here: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Windows/794/


public void testSingleTermWithForkJoinPool() throws IOException {
  int concurrency = TestUtil.nextInt(random(), 1, 8);
  ForkJoinPool pool = new ForkJoinPool(concurrency);

The default implementation of ForkJoinPool executes tasks without any permissions. This causes the test to fail if an FS-based directory implementation is used.

To fix this, use a thread factory that does not remove all permissions.
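One possible fix along those lines (an assumption, not necessarily the exact change that was applied): construct the pool with an explicit thread factory whose workers are plain ForkJoinWorkerThread subclasses, so they keep normal permissions under a security manager.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;

final class PermissionsFriendlyPools {
  // Equivalent to new ForkJoinPool(concurrency), except that worker threads come
  // from an explicit factory rather than the default one, which strips permissions
  // when a SecurityManager is installed.
  static ForkJoinPool newPool(int concurrency) {
    return new ForkJoinPool(
        concurrency,
        p -> new ForkJoinWorkerThread(p) {}, // anonymous subclass of ForkJoinWorkerThread
        null,   // default uncaught-exception handling
        false); // asyncMode=false, same as the simple constructor
  }
}
```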
