Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encrypt Sorted Recovery files #2076

Closed
milleruntime opened this issue May 4, 2021 · 10 comments
Closed

Encrypt Sorted Recovery files #2076

milleruntime opened this issue May 4, 2021 · 10 comments
Assignees
Labels
enhancement This issue describes a new feature, improvement, or optimization.
Milestone

Comments

@milleruntime
Copy link
Contributor

During recovery, the write ahead logs are read, sorted and written back to disk using org.apache.hadoop.io.MapFile.Writer(). If the On Disk Encryption feature is used, the sorted Map files will not be encrypted during this time. In order for the On Disk Encryption to be complete, these files need to be encrypted on disk. The best solution would probably be to use RFile, like is used with the rest of Accumulo.

Code dealing with sorting:

private void writeBuffer(String destPath, List<Pair<LogFileKey,LogFileValue>> buffer, int part)

public boolean recoverLogs(KeyExtent extent, Collection<Collection<String>> walogs)

Then tserver will recover the sorted logs:
tabletServer.recover(this.getTabletServer().getVolumeManager(), extent, logEntries,
absPaths, m -> {

public void recover(KeyExtent extent, List<Path> recoveryLogs, Set<String> tabletFiles,
MutationReceiver mr) throws IOException {

@milleruntime milleruntime added the enhancement This issue describes a new feature, improvement, or optimization. label May 4, 2021
@milleruntime milleruntime self-assigned this May 4, 2021
@milleruntime
Copy link
Contributor Author

The first hurdle is changing SortedLogRecovery to use RFile instead of MapFile. That will probably be one pull request. Once that is done, then we will be able to use them for encryption.

@ctubbsii my initial work translating the WAL event data to RFiles implements a schema that stores LogEvents in the column family. I think this will allow using locality groups based on the different values of the enum. See https://github.com/apache/accumulo/blob/3fd5cad92f9b63ac19e4466f3f2d5237b905262c/server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogEvents.java

@ctubbsii
Copy link
Member

I was thinking locality groups could be used to efficiently segregate information from different tablets. So, to recover a single tablet, you don't need to scan through any of the other tablets' data. Using locality groups for the log event types is a more traditional way of using locality groups where the locality group is configured in the Accumulo configuration properties. However, we don't really have a use case to only read a single event type at a time... we do have a use case to read all event types for a single tablet at a time. At least, I think that would be useful.

@milleruntime
Copy link
Contributor Author

I can see where that would speed up recovery. I haven't done much with locality groups so not sure how that would work with the sorting of the log events. I was going with a schema that preserves the current way the WAL events sort, by Event + tabletId + sequence number:

        Row = EventTypeInteger_tabletID_seq
        Family = event
        Qualifier = tserverSession OR filename OR KeyExtent

I created the Event type integer based on what is returned by the compareTo() of LogFileKey. Then appended the tabletId integer (which is written from commitSession.getLogId()) and finally the sequence number. The event type determines what will be stored in the column qualifier.

public int compareTo(LogFileKey o) {
if (eventType(this.event) != eventType(o.event)) {
return eventType(this.event) - eventType(o.event);
}
if (this.event == OPEN)
return 0;
if (this.tabletId != o.tabletId) {
return this.tabletId - o.tabletId;
}
return sign(this.seq - o.seq);

I was using scanner.fetchColumnFamily(cf); where the cf could be 0 or many Text... colFamToFetch and passing in the different events to fetch. Maybe I can put up a draft PR of just the LogFileKey.toKey() method so you can provide some feedback. There are already a good number of changes but its mostly changing all the iterators to work with Key/Value.

@milleruntime
Copy link
Contributor Author

If I gave each enum in LogEvents a unique number for sorting, then I wouldn't have to store the string in the column family. The new numbering could look like this:

    // OPEN = 0
    // DEFINE_TABLET = 1
    // COMPACTION_START = 2
    // COMPACTION_FINISH = 3
    // MUTATION = 4
    // MANY_MUTATIONS = 5

It would be extra convenient if the ordinal() of the enum corresponded to these numbers but currently that is not the case. The sorted WALs between previous versions and this one will already be incompatible so it may not be a big deal to change the order of the LogEvents enum. But I wonder if it could lead to some crazy bugs if a user were to recover an old WAL.

@keith-turner
Copy link
Contributor

keith-turner commented May 12, 2021

I was thinking locality groups could be used to efficiently segregate information from different tablets. So, to recover a single tablet, you don't need to scan through any of the other tablets' data.

I think the sorted WALs are structured such that needed information sorts together and can be efficiently found via seek. I have not looked at the code in a while, but I am not sure sorted WALs are ever scanned entirely. I think they may only be seeked to specific places. If this is true then there may not be a benefit to LGs. I think the sort segregates information as needed.

@keith-turner
Copy link
Contributor

But I wonder if it could lead to some crazy bugs if a user were to recover an old WAL.

Would need to maintain code to recover old WALs, or refuse to upgrade if WALs are present forcing user to run old version and resolve WALs. Refusing to upgrade could be annoying for the user as its difficult to know if there are WALs. They could end up in a loop trying to run new version and having to fall back to old version.

@ctubbsii
Copy link
Member

I wouldn't use an integer in place of the enum. RFile is already probably fine at compressing, and there's value in being able to actually read the file contents while troubleshooting. I don't think it would save much.

As Keith says, the current key scheme may be adequate... it's certainly the simplest migration. In that case, you don't need to use the CF, CQ, or CV at all. Just Row/Value.

Where I was going with the LG conversation was that I was just thinking that we could eliminate the tablet ID in the key structure, and make it efficient to recover an individual tablet if each tablet got its own locality group. We might even be able to eliminate the tabletID mapping and just use the extent directly, since it wouldn't be duplicated in the key. I don't see any advantage to using the event type for locality groups, though. We don't need efficient reads of all events of a specific type (I don't think so, anyway).

The wrench in my thinking is that I'm not that familiar right now with the recovery process, and I'm not actually sure why event type sorts before tablet ID.

@milleruntime
Copy link
Contributor Author

As Keith says, the current key scheme may be adequate... it's certainly the simplest migration. In that case, you don't need to use the CF, CQ, or CV at all. Just Row/Value.

The current scheme in LogFileKey is just an implementation for WritableComparable which includes serialization/deserialization and the compareTo method. Part of this task is translating LogFileKey into an Accumulo Key so we can store it in an RFile. I am also translating LogFileValue into an Accumulo Value (but this is just the mutations serialized). We need somewhere to put all the information serialized in LogFileKey into Key, otherwise it will be lost when sorted.

We don't need efficient reads of all events of a specific type (I don't think so, anyway).

There are a few places during recovery where we iterate over the logs to get different types of information. The first time we only look for DEFINE_TABLET events in the log. This is where I got the idea to use event for CF. Check it out here:

new RecoveryLogsIterator(fs, recoveryLogs, minKey(DEFINE_TABLET), maxKey(DEFINE_TABLET))) {

The wrench in my thinking is that I'm not that familiar right now with the recovery process, and I'm not actually sure why event type sorts before tablet ID.

Join the club! I still don't understand tabletId and seq in LogFileKey. This comment here makes me think its tracking in each WAL:

// A tablet may leave a tserver and then come back, in which case it would have a different and
// higher tablet id. Only want to consider events in the log related to the last time the tablet
// was loaded.

@milleruntime
Copy link
Contributor Author

Need to finish #2197 before completing this task.

@milleruntime
Copy link
Contributor Author

This is done so can be closed. Any work done for #2197 would be a separate task.

@ctubbsii ctubbsii added this to the 2.1.0 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This issue describes a new feature, improvement, or optimization.
Projects
None yet
Development

No branches or pull requests

3 participants