Encrypt Sorted Recovery files #2076

milleruntime · 2021-05-04T13:00:49Z

During recovery, the write ahead logs are read, sorted and written back to disk using org.apache.hadoop.io.MapFile.Writer(). If the On Disk Encryption feature is used, the sorted Map files will not be encrypted during this time. In order for the On Disk Encryption to be complete, these files need to be encrypted on disk. The best solution would probably be to use RFile, like is used with the rest of Accumulo.

Code dealing with sorting:

accumulo/server/tserver/src/main/java/org/apache/accumulo/tserver/log/LogSorter.java

Line 177 in bbd87a6

    
           private void writeBuffer(String destPath, List<Pair<LogFileKey,LogFileValue>> buffer, int part)

accumulo/server/manager/src/main/java/org/apache/accumulo/manager/recovery/RecoveryManager.java

Line 151 in 7e3c67d

    
           public boolean recoverLogs(KeyExtent extent, Collection<Collection<String>> walogs)

Then tserver will recover the sorted logs:

accumulo/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java

Lines 357 to 358 in 30ce59f

    
           tabletServer.recover(this.getTabletServer().getVolumeManager(), extent, logEntries, 
        
               absPaths, m -> {

accumulo/server/tserver/src/main/java/org/apache/accumulo/tserver/log/SortedLogRecovery.java

Lines 285 to 286 in 6a74b46

    
           public void recover(KeyExtent extent, List<Path> recoveryLogs, Set<String> tabletFiles, 
        
               MutationReceiver mr) throws IOException {

The text was updated successfully, but these errors were encountered:

milleruntime · 2021-05-11T17:02:27Z

The first hurdle is changing SortedLogRecovery to use RFile instead of MapFile. That will probably be one pull request. Once that is done, then we will be able to use them for encryption.

@ctubbsii my initial work translating the WAL event data to RFiles implements a schema that stores LogEvents in the column family. I think this will allow using locality groups based on the different values of the enum. See https://github.com/apache/accumulo/blob/3fd5cad92f9b63ac19e4466f3f2d5237b905262c/server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogEvents.java

ctubbsii · 2021-05-11T22:02:37Z

I was thinking locality groups could be used to efficiently segregate information from different tablets. So, to recover a single tablet, you don't need to scan through any of the other tablets' data. Using locality groups for the log event types is a more traditional way of using locality groups where the locality group is configured in the Accumulo configuration properties. However, we don't really have a use case to only read a single event type at a time... we do have a use case to read all event types for a single tablet at a time. At least, I think that would be useful.

milleruntime · 2021-05-12T10:42:38Z

I can see where that would speed up recovery. I haven't done much with locality groups so not sure how that would work with the sorting of the log events. I was going with a schema that preserves the current way the WAL events sort, by Event + tabletId + sequence number:

        Row = EventTypeInteger_tabletID_seq
        Family = event
        Qualifier = tserverSession OR filename OR KeyExtent

I created the Event type integer based on what is returned by the compareTo() of LogFileKey. Then appended the tabletId integer (which is written from commitSession.getLogId()) and finally the sequence number. The event type determines what will be stored in the column qualifier.

accumulo/server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogFileKey.java

Lines 155 to 164 in 583ca55

    
           public int compareTo(LogFileKey o) { 
        
             if (eventType(this.event) != eventType(o.event)) { 
        
               return eventType(this.event) - eventType(o.event); 
        
             } 
        
             if (this.event == OPEN) 
        
               return 0; 
        
             if (this.tabletId != o.tabletId) { 
        
               return this.tabletId - o.tabletId; 
        
             } 
        
             return sign(this.seq - o.seq);

I was using scanner.fetchColumnFamily(cf); where the cf could be 0 or many Text... colFamToFetch and passing in the different events to fetch. Maybe I can put up a draft PR of just the LogFileKey.toKey() method so you can provide some feedback. There are already a good number of changes but its mostly changing all the iterators to work with Key/Value.

milleruntime · 2021-05-12T11:50:44Z

If I gave each enum in LogEvents a unique number for sorting, then I wouldn't have to store the string in the column family. The new numbering could look like this:

    // OPEN = 0
    // DEFINE_TABLET = 1
    // COMPACTION_START = 2
    // COMPACTION_FINISH = 3
    // MUTATION = 4
    // MANY_MUTATIONS = 5

It would be extra convenient if the ordinal() of the enum corresponded to these numbers but currently that is not the case. The sorted WALs between previous versions and this one will already be incompatible so it may not be a big deal to change the order of the LogEvents enum. But I wonder if it could lead to some crazy bugs if a user were to recover an old WAL.

keith-turner · 2021-05-12T14:05:10Z

I was thinking locality groups could be used to efficiently segregate information from different tablets. So, to recover a single tablet, you don't need to scan through any of the other tablets' data.

I think the sorted WALs are structured such that needed information sorts together and can be efficiently found via seek. I have not looked at the code in a while, but I am not sure sorted WALs are ever scanned entirely. I think they may only be seeked to specific places. If this is true then there may not be a benefit to LGs. I think the sort segregates information as needed.

keith-turner · 2021-05-12T14:15:27Z

But I wonder if it could lead to some crazy bugs if a user were to recover an old WAL.

Would need to maintain code to recover old WALs, or refuse to upgrade if WALs are present forcing user to run old version and resolve WALs. Refusing to upgrade could be annoying for the user as its difficult to know if there are WALs. They could end up in a loop trying to run new version and having to fall back to old version.

ctubbsii · 2021-05-12T16:58:55Z

I wouldn't use an integer in place of the enum. RFile is already probably fine at compressing, and there's value in being able to actually read the file contents while troubleshooting. I don't think it would save much.

As Keith says, the current key scheme may be adequate... it's certainly the simplest migration. In that case, you don't need to use the CF, CQ, or CV at all. Just Row/Value.

Where I was going with the LG conversation was that I was just thinking that we could eliminate the tablet ID in the key structure, and make it efficient to recover an individual tablet if each tablet got its own locality group. We might even be able to eliminate the tabletID mapping and just use the extent directly, since it wouldn't be duplicated in the key. I don't see any advantage to using the event type for locality groups, though. We don't need efficient reads of all events of a specific type (I don't think so, anyway).

The wrench in my thinking is that I'm not that familiar right now with the recovery process, and I'm not actually sure why event type sorts before tablet ID.

milleruntime · 2021-05-12T18:02:21Z

As Keith says, the current key scheme may be adequate... it's certainly the simplest migration. In that case, you don't need to use the CF, CQ, or CV at all. Just Row/Value.

The current scheme in LogFileKey is just an implementation for WritableComparable which includes serialization/deserialization and the compareTo method. Part of this task is translating LogFileKey into an Accumulo Key so we can store it in an RFile. I am also translating LogFileValue into an Accumulo Value (but this is just the mutations serialized). We need somewhere to put all the information serialized in LogFileKey into Key, otherwise it will be lost when sorted.

We don't need efficient reads of all events of a specific type (I don't think so, anyway).

There are a few places during recovery where we iterate over the logs to get different types of information. The first time we only look for DEFINE_TABLET events in the log. This is where I got the idea to use event for CF. Check it out here:

accumulo/server/tserver/src/main/java/org/apache/accumulo/tserver/log/SortedLogRecovery.java

Line 105 in 6a74b46

    
           new RecoveryLogsIterator(fs, recoveryLogs, minKey(DEFINE_TABLET), maxKey(DEFINE_TABLET))) {

The wrench in my thinking is that I'm not that familiar right now with the recovery process, and I'm not actually sure why event type sorts before tablet ID.

Join the club! I still don't understand tabletId and seq in LogFileKey. This comment here makes me think its tracking in each WAL:

accumulo/server/tserver/src/main/java/org/apache/accumulo/tserver/log/SortedLogRecovery.java

Lines 290 to 292 in 6a74b46

    
           // A tablet may leave a tserver and then come back, in which case it would have a different and 
        
           // higher tablet id. Only want to consider events in the log related to the last time the tablet 
        
           // was loaded.

milleruntime · 2021-09-07T15:38:48Z

Need to finish #2197 before completing this task.

milleruntime · 2022-01-25T13:48:13Z

This is done so can be closed. Any work done for #2197 would be a separate task.

milleruntime added the enhancement This issue describes a new feature, improvement, or optimization. label May 4, 2021

milleruntime self-assigned this May 4, 2021

milleruntime mentioned this issue Jun 2, 2021

Make sorted recovery write to RFiles #2117

Merged

milleruntime closed this as completed Jan 25, 2022

ctubbsii added this to the 2.1.0 milestone Jul 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Encrypt Sorted Recovery files #2076

Encrypt Sorted Recovery files #2076

milleruntime commented May 4, 2021

milleruntime commented May 11, 2021

ctubbsii commented May 11, 2021

milleruntime commented May 12, 2021

milleruntime commented May 12, 2021

keith-turner commented May 12, 2021 •

edited

Loading

keith-turner commented May 12, 2021

ctubbsii commented May 12, 2021

milleruntime commented May 12, 2021

milleruntime commented Sep 7, 2021

milleruntime commented Jan 25, 2022

Encrypt Sorted Recovery files #2076

Encrypt Sorted Recovery files #2076

Comments

milleruntime commented May 4, 2021

milleruntime commented May 11, 2021

ctubbsii commented May 11, 2021

milleruntime commented May 12, 2021

milleruntime commented May 12, 2021

keith-turner commented May 12, 2021 • edited Loading

keith-turner commented May 12, 2021

ctubbsii commented May 12, 2021

milleruntime commented May 12, 2021

milleruntime commented Sep 7, 2021

milleruntime commented Jan 25, 2022

keith-turner commented May 12, 2021 •

edited

Loading