Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate parallelizing TransactionImpl.readUnread() #948

Closed
keith-turner opened this issue Oct 20, 2017 · 5 comments · Fixed by #1080
Closed

Investigate parallelizing TransactionImpl.readUnread() #948

keith-turner opened this issue Oct 20, 2017 · 5 comments · Fixed by #1080
Milestone

Comments

@keith-turner
Copy link
Contributor

When a transaction only writes to a row+col and then has a collision Fluo will read the row+col after the collision to look for orphaned locks. If there are multiple row+cols then the code reads them one at time. The could possibly be parallelized by calling some of the existing methods in Fluo that read multiple row+cols at once.

The code in question is at TransactionImpl line 519

@keith-turner
Copy link
Contributor Author

Actually it does an entire row and all of it cols at once. Could still do all row cols at once.

@jkosh44
Copy link
Contributor

jkosh44 commented Oct 22, 2019

Is this still valid? I was looking at Map<RowColumn, Bytes> get(Collection<RowColumn> rowColumns) and get(Collection<Bytes> rows, Set<Column> columns) as options but they don't have the Consumer<Entry<Key, Value>> locksSeen parameter.

Another option would be something like the following, which just runs all the getImpl call concurrently. It's a little messy but I'd think of someway to clean it up.

List<CompletableFuture> reads = columnsToRead.entrySet()
            .stream()
            .map(entry -> CompletableFuture.runAsync(()->getImpl(entry.getKey(), entry.getValue(), locksSeen)))
            .collect(Collectors.toList());
    return CompletableFuture.allOf(reads.toArray(new CompletableFuture[]{})).get();

@jkosh44
Copy link
Contributor

jkosh44 commented Oct 22, 2019

I was also thinking, though I haven't fully thought it through yet, that we could change readUnread(CommitData cd, Consumer<Entry<Key, Value>> locksSeen) and some of the other helper methods to return a CompletableFutures and chain them together with thenComposeAsync. That way if we go with the completable future method above we wouldn't have to block with .get().

@keith-turner
Copy link
Contributor Author

Is this still valid?

Yes.

I was looking at Map<RowColumn, Bytes> get(Collection<RowColumn> rowColumns) and get(Collection<Bytes> rows, Set<Column> columns) as options but they don't have the Consumer<Entry<Key, Value>> locksSeen parameter.

I feel like this is the right path. Looking inside get(Collection<RowColumn> rowColumns) there is a ParallelSnapshotScanner, I wonder if this could be used. Create a parallel snapshot scanner and then read all the data from it. Can pass locksSeen to the constructor of ParallelSnapshotScanner.

Another option would be something like the following, which just runs all the getImpl call concurrently.

I am wary of using CompletableFuture.runAsync() for operations that do I/O because it runs in a JVM wide shared thread pool. If the Fluo client library does this it could impact user code that is also using runAsync()

@jkosh44
Copy link
Contributor

jkosh44 commented Oct 23, 2019

Gotcha, we could go a similar route of Map<Column, Bytes> getImpl(Bytes row, Set<Column> columns, Consumer<Entry<Key, Value>> locksSeen) and create a public Map<RowColumn, Bytes> getImpl(Collection<RowColumn> rowColumns, Map<Bytes, Set<Column>> locksSeen) method which allows you to provide the locksSeen for the ParallelSnapshotScanner constructor.

jkosh44 added a commit to jkosh44/fluo that referenced this issue Oct 28, 2019
When a transaction only writes to a row+col and then has a collision
Fluo will read the row+col after the collision to look for orphaned
locks. This commit parallelizes this behavior by reading all row+cols
at once.

This commit accomplishes this by using a ParallelSnapshotScanner
instead of a SnapshotScanner. Since ParallelSnapshotScanner resolves
write locks in the implementation, the code to resolve write locks was
removed from reachUnread() and checkForOrphanedLocks().

Fixes apache#948
@ctubbsii ctubbsii added this to the 2.0.0 milestone Oct 28, 2019
jkosh44 added a commit to jkosh44/fluo that referenced this issue Oct 28, 2019
When a transaction only writes to a row+col and then has a collision
Fluo will read the row+col after the collision to look for orphaned
locks. This commit parallelizes this behavior by reading all row+cols
at once.

This commit accomplishes this by using a ParallelSnapshotScanner
instead of a SnapshotScanner. Since ParallelSnapshotScanner resolves
write locks in the implementation, the code to resolve write locks was
removed from reachUnread() and checkForOrphanedLocks().

Fixes apache#948
jkosh44 added a commit to jkosh44/fluo that referenced this issue Oct 30, 2019
When a transaction only writes to a row+col and then has a collision
Fluo will read the row+col after the collision to look for orphaned
locks. This commit parallelizes this behavior by reading all row+cols
at once.

This commit accomplishes this by using a ParallelSnapshotScanner
instead of a SnapshotScanner. Since ParallelSnapshotScanner resolves
write locks in the implementation, the code to resolve write locks was
removed from reachUnread() and checkForOrphanedLocks().

Fixes apache#948
jkosh44 added a commit to jkosh44/fluo that referenced this issue Oct 30, 2019
When a transaction only writes to a row+col and then has a collision
Fluo will read the row+col after the collision to look for orphaned
locks. This commit parallelizes this behavior by reading all row+cols
at once.

This commit accomplishes this by using a ParallelSnapshotScanner
instead of a SnapshotScanner.

Fixes apache#948
keith-turner pushed a commit that referenced this issue Oct 30, 2019
When a transaction only writes to a row+col and then has a collision
Fluo will read the row+col after the collision to look for orphaned
locks. This commit parallelizes this behavior by reading all row+cols
at once.

This commit accomplishes this by using a ParallelSnapshotScanner
instead of a SnapshotScanner.

Fixes #948
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants