Commit cc277a7

MDEV-36024: Redesign innodb_encrypt_log=ON

The innodb_encrypt_log=ON subformat of FORMAT_10_8 is inefficient, because a new encryption or decryption context is being set up for every log record payload snippet.

An in-place conversion between the old and new innodb_encrypt_log=ON format is technically possible. No such conversion has been implemented, though.

There is some overhead with respect to the unencrypted format (innodb_encrypt_log=OFF): At the end of each mini-transaction, right before the CRC-32C, additional 8 bytes will be reserved for a nonce (really, log_sys.get_flushed_lsn()), which forms a part of an initialization vector.

log_t::FORMAT_ENC_11: The new format identifier, a UTF-8 encoding of 🗝 U+1F5DD OLD KEY (encryption). In this format, everything except the types and lengths of log records will be encrypted. Thus, unlike in FORMAT_10_8, also page identifiers and FILE_ records will be encrypted. The initialization vector (IV) consists of the 8-byte nonce as well as the type and length byte(s) of the first record of the mini-transaction. Page identifiers will no longer form any part of the IV.

The old log_t::FORMAT_ENC_10_8 (innodb_encrypt_log=ON) will be supported both by mariadb-backup and by crash recovery. Downgrade from the new format will only be possible if the new server has been running or restarted with innodb_encrypt_log=OFF. If innodb_encrypt_log=ON, only the new log_t::FORMAT_ENC_11 will be written.

log_t::is_recoverable(): A new predicate, which holds for all 3 formats.

recv_sys_t::tmp_buf: A heap-allocated buffer for decrypting a mini-transaction, or for making the wrap-around of a memory-mapped log file contiguous.

recv_sys_t::start_lsn: The start of the mini-transaction. Updated at the start of parse_tail().

log_decrypt_mtr(): Decrypt a mini-transaction in recv_sys.tmp_buf. Theoretically, when reading the log via pread() rather than a read-only memory mapping, we could modify the contents of log_sys.buf in place. If we did that, we would have to re-read the last log block into log_sys.buf before resuming writes, because otherwise that block could be re-written as a mix of old decrypted data and new encrypted data, which would cause a subsequent recovery failure unless the log checkpoint had been advanced beyond this point.

log_decrypt_legacy(): Decrypt a log_t::FORMAT_ENC_10_8 record snippet on stack. Replaces recv_buf::copy_if_needed().

recv_sys_t::get_backup_parser(): Return a recv_sys_t::parser, that is, a pointer to an instantiation of parse_mmap or parse_mtr for the current log format.

recv_sys_t::parse_mtr(), recv_sys_t::parse_mmap(): Add a parameter template<uint32_t> for the current log_sys.format.

log_parse_start(): Validate the CRC-32C of a mini-transaction. This has been split from the recv_sys_t::parse() template to reduce code duplication. These two are the lowest-level functions that will be instantiated for both recv_buf and recv_ring.

recv_sys_t::parse(): Split into ::log_parse_start() and parse_tail(). Add a parameter template<uint32_t format> to specialize for log_sys.format at compilation time.

recv_sys_t::parse_tail(): Operate on pointers to contiguous mini-transaction data. Use a parameter template<bool ENC_10_8> for special handling of the old innodb_encrypt_log=ON format. The former recv_buf::get_buf() is being inlined here. Much of the logic is split into non-inline functions, to avoid duplicating a lot of code for every template expansion.

log_crypt: Encrypt or decrypt a mini-transaction in place in the new innodb_encrypt_log=ON format. We will use temporary buffers so that encryption_ctx_update() can be invoked on integer multiples of MY_AES_BLOCK_SIZE, except for the last bytes of the encrypted payload, which will be encrypted or decrypted in place thanks to ENCRYPTION_FLAG_NOPAD.

log_crypt::append(): Invoke encryption_ctx_update() in MY_AES_BLOCK_SIZE (16-byte) blocks and scatter/gather shorter data blocks as needed.

log_crypt::finish(): Handle the last (possibly incomplete) block as a special case, with ENCRYPTION_FLAG_NOPAD.

mtr_t::parse_length(): Parse the length of a log record.

mtr_t::encrypt(): Use log_crypt instead of the old log_encrypt_buf().

recv_buf::crc32c(): Add a parameter for the initial CRC-32C value.

recv_sys_t::rewind(): Operate on pointers to the start of the mini-transaction and to the first skipped record.

recv_sys_t::trim(): Declare as ATTRIBUTE_COLD so that this rarely invoked function will not be expanded inline in parse_tail().

recv_sys_t::parse_init(): Handle INIT_PAGE or FREE_PAGE while scanning to the end of the log.

recv_sys_t::parse_page0(): Handle WRITE to FSP_SPACE_SIZE and FSP_SPACE_FLAGS.

recv_sys_t::parse_store_if_exists(), recv_sys_t::parse_store(), recv_sys_t::parse_oom(): Handle page-level log records.

mlog_decode_varint_length(): Make use of __builtin_clz() to avoid a loop when possible.

mlog_decode_varint(): Define only on const byte*, as ATTRIBUTE_NOINLINE static because it is a rather large function.

recv_buf::decode_varint(): Trivial wrapper for mlog_decode_varint().

recv_ring::decode_varint(): Special implementation.

log_page_modify(): Note that a page will be modified in recovery. Split from recv_sys_t::parse_tail().

log_parse_file(): Handle non-page log records.

log_record_corrupted(), log_unknown(), log_page_id_corrupted(): Common error reporting functions.
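
The note on mlog_decode_varint_length() can be illustrated with a small sketch. It assumes a variable-length encoding in which the number of leading 1 bits in the first byte equals the number of bytes that follow; that encoding, and the function names below, are assumptions made for illustration and are not taken from this commit. __builtin_clz() is a GCC/Clang builtin.

```cpp
#include <cstdint>

// Reference version: one loop iteration per leading 1 bit of the first byte.
constexpr unsigned varint_length_loop(uint8_t first)
{
  unsigned len= 1;
  for (; first & 0x80; first= static_cast<uint8_t>(first << 1))
    len++;
  return len;
}

// Loop-free version: move the byte to the top of a 32-bit word and invert it,
// so that leading 1 bits become leading 0 bits. The low 24 bits of the
// inverted word are all 1, hence the argument of __builtin_clz() is never 0
// (a zero argument would be undefined).
constexpr unsigned varint_length_clz(uint8_t first)
{
  static_assert(sizeof(unsigned) == 4, "sketch assumes 32-bit unsigned");
  return 1 + static_cast<unsigned>(__builtin_clz(~(uint32_t{first} << 24)));
}

// Compare both versions for every possible first byte.
constexpr bool varint_lengths_agree()
{
  for (unsigned b= 0; b < 256; b++)
    if (varint_length_loop(static_cast<uint8_t>(b)) !=
        varint_length_clz(static_cast<uint8_t>(b)))
      return false;
  return true;
}

static_assert(varint_lengths_agree(), "loop and clz versions must agree");
static_assert(varint_length_clz(0x7f) == 1, "0xxxxxxx is a 1-byte encoding");
static_assert(varint_length_clz(0xc0) == 3, "110xxxxx starts 3 bytes");
```

Both functions are constexpr, so the equivalence is checked at compile time and the sketch needs no main().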
1 parent 1afc682 commit cc277a7

13 files changed: 1115 additions & 885 deletions

extra/mariabackup/xtrabackup.cc

Lines changed: 11 additions & 9 deletions
@@ -203,6 +203,8 @@ struct xb_filter_entry_t{
 
 /** whether log_copying_thread() is active; protected by recv_sys.mutex */
 static bool log_copying_running;
+/** the log parsing function for --backup */
+static recv_sys_t::parser backup_log_parse;
 /** for --backup, target LSN to copy the log to; protected by recv_sys.mutex */
 lsn_t metadata_to_lsn;
 
@@ -2640,7 +2642,10 @@ static byte log_hdr_buf[log_t::START_OFFSET + SIZE_OF_FILE_CHECKPOINT];
 static void log_hdr_init()
 {
   memset(log_hdr_buf, 0, sizeof log_hdr_buf);
-  mach_write_to_4(LOG_HEADER_FORMAT + log_hdr_buf, log_t::FORMAT_10_8);
+  /* log_t::FORMAT_ENC_10_8 is written to the file as FORMAT_10_8 */
+  mach_write_to_4(LOG_HEADER_FORMAT + log_hdr_buf,
+                  log_sys.format == log_t::FORMAT_ENC_11
+                  ? log_t::FORMAT_ENC_11 : log_t::FORMAT_10_8);
   mach_write_to_8(LOG_HEADER_START_LSN + log_hdr_buf,
                   log_sys.next_checkpoint_lsn);
   snprintf(reinterpret_cast<char*>(LOG_HEADER_CREATOR + log_hdr_buf),
@@ -3441,8 +3446,7 @@ static bool xtrabackup_copy_mmap_logfile()
     const byte *start= &log_sys.buf[recv_sys.offset];
     ut_d(recv_sys_t::parse_mtr_result r);
 
-    if ((ut_d(r=) recv_sys.parse_mmap<recv_sys_t::store::BACKUP>(false)) ==
-        recv_sys_t::OK)
+    if ((ut_d(r=) backup_log_parse(false)) == recv_sys_t::OK)
     {
       do
       {
@@ -3460,8 +3464,7 @@ static bool xtrabackup_copy_mmap_logfile()
           start = seq + 1;
         }
       }
-      while ((ut_d(r=) recv_sys.parse_mmap<recv_sys_t::store::BACKUP>(false)) ==
-             recv_sys_t::OK);
+      while ((ut_d(r=) backup_log_parse(false)) == recv_sys_t::OK);
 
       if (xtrabackup_copy_mmap_snippet(dst_log_file, start,
                                        &log_sys.buf[recv_sys.offset]))
@@ -3534,8 +3537,7 @@ static bool xtrabackup_copy_logfile(bool early_exit)
       if (log_sys.buf[recv_sys.offset] <= 1)
         break;
 
-      if (recv_sys.parse_mtr<recv_sys_t::store::BACKUP>(false) ==
-          recv_sys_t::OK)
+      if (backup_log_parse(false) == recv_sys_t::OK)
       {
         do
         {
@@ -3545,8 +3547,7 @@ static bool xtrabackup_copy_logfile(bool early_exit)
                                      sequence_offset));
           *seq= 1;
         }
-        while ((r= recv_sys.parse_mtr<recv_sys_t::store::BACKUP>(false)) ==
-               recv_sys_t::OK);
+        while ((r= backup_log_parse(false)) == recv_sys_t::OK);
 
         if (ds_write(dst_log_file, log_sys.buf + start_offset,
                      recv_sys.offset - start_offset))
@@ -5585,6 +5586,7 @@ static bool xtrabackup_backup_func()
   /* copy log file by current position */
 
   mysql_mutex_lock(&recv_sys.mutex);
+  backup_log_parse = recv_sys.get_backup_parser();
   recv_sys.lsn = log_sys.next_checkpoint_lsn;
 
   const bool log_copy_failed = xtrabackup_copy_logfile(true);
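
The hunks above replace direct calls to a template instantiation with a function pointer that is chosen once, before the copy loop starts. A minimal sketch of that pattern follows. The names parse_demo and get_demo_parser are hypothetical, the value of FORMAT_ENCRYPTED is an assumption (it is not shown in this diff), and returning a null pointer for an unknown format is a choice made for the sketch, not necessarily what recv_sys_t::get_backup_parser() does.

```cpp
#include <cstdint>

enum parse_result { OK, GOT_EOF };

constexpr uint32_t FORMAT_ENCRYPTED= 1U << 31; // assumption, not in this diff
constexpr uint32_t FORMAT_10_8= 0x50687973;
constexpr uint32_t FORMAT_ENC_10_8= FORMAT_10_8 | FORMAT_ENCRYPTED;
constexpr uint32_t FORMAT_ENC_11= 0xf09f979d;

// Hypothetical stand-in for a parser that is specialized for one log format
// at compilation time. A real parser would consume log records here.
template<uint32_t format>
constexpr parse_result parse_demo(bool /* if_exists */)
{
  return OK;
}

// Same shape as recv_sys_t::parser in the diff.
using parser= parse_result(*)(bool if_exists);

// Map the run-time format to the matching instantiation, once.
constexpr parser get_demo_parser(uint32_t format)
{
  switch (format) {
  case FORMAT_10_8:
    return parse_demo<FORMAT_10_8>;
  case FORMAT_ENC_10_8:
    return parse_demo<FORMAT_ENC_10_8>;
  case FORMAT_ENC_11:
    return parse_demo<FORMAT_ENC_11>;
  }
  return nullptr; // sketch only: unknown format
}

static_assert(get_demo_parser(FORMAT_ENC_11)(false) == OK,
              "the selected instantiation is callable through the pointer");
```

The point of the design is that the per-format branch is taken once when the pointer is fetched, while each loop iteration pays only for an indirect call.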

mysql-test/suite/encryption/t/recovery_memory.test

Lines changed: 2 additions & 0 deletions
@@ -43,3 +43,5 @@ let $restart_parameters=;
 SELECT COUNT(*) FROM t1;
 ALTER TABLE t1 FORCE;
 DROP TABLE t1;
+
+--rmdir $basedir

storage/innobase/handler/ha_innodb.cc

Lines changed: 2 additions & 0 deletions
@@ -3681,6 +3681,8 @@ static int innodb_init_abort()
 {
   DBUG_ENTER("innodb_init_abort");
 
+  recv_sys.tmp_free();
+
   if (fil_system.temp_space) {
     fil_system.temp_space->close();
   }

storage/innobase/include/log0crypt.h

Lines changed: 4 additions & 5 deletions
@@ -79,11 +79,10 @@ ATTRIBUTE_COLD bool log_decrypt(byte* buf, lsn_t lsn, ulint size);
 @return buf */
 byte *log_decrypt_buf(const byte *iv, byte *buf, const byte *data, uint len);
 
-/** Decrypt a log snippet.
-@param iv initialization vector
-@param buf buffer to be replaced with encrypted contents
-@param end pointer past the end of buf */
-void log_decrypt_buf(const byte *iv, byte *buf, const byte *const end);
+/** Decrypt a mini-transaction in place.
+@param buf start of the mini-transaction
+@param end end of data (followed by sequence byte and the 8-byte nonce) */
+void log_decrypt_mtr(byte *buf, const byte *end) noexcept;
 
 /** Encrypt or decrypt a temporary file block.
 @param[in] src block to encrypt or decrypt
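
The commit message describes log_crypt::append() as feeding the cipher in 16-byte blocks while gathering shorter fragments. The sketch below models only that buffering, under loud assumptions: demo_ctx is a hypothetical stand-in that records sizes and copies bytes instead of encrypting, and gatherer is not the log_crypt class from this commit. In particular, the real code handles the tail in place with ENCRYPTION_FLAG_NOPAD, which is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr size_t BLOCK= 16; // same value as MY_AES_BLOCK_SIZE

// Hypothetical stand-in for a cipher context: no encryption, only bookkeeping.
struct demo_ctx
{
  std::vector<size_t> update_sizes; // size of every update() call
  size_t tail_size= 0;              // size passed to finish()
  std::vector<uint8_t> out;         // bytes in the order they were consumed

  void update(const uint8_t *p, size_t n)
  {
    assert(n && n % BLOCK == 0);    // only whole blocks are accepted
    update_sizes.push_back(n);
    out.insert(out.end(), p, p + n);
  }
  void finish(const uint8_t *p, size_t n)
  {
    assert(n < BLOCK);              // the tail is always shorter than a block
    tail_size= n;
    out.insert(out.end(), p, p + n);
  }
};

class gatherer
{
  demo_ctx &ctx;
  uint8_t pending[BLOCK];           // bytes that do not yet fill a block
  size_t used= 0;
public:
  explicit gatherer(demo_ctx &c) : ctx(c) {}

  void append(const uint8_t *p, size_t n)
  {
    if (used)
    {
      // First top up the partially filled block.
      const size_t take= std::min(BLOCK - used, n);
      std::memcpy(pending + used, p, take);
      used+= take; p+= take; n-= take;
      if (used < BLOCK)
        return;
      ctx.update(pending, BLOCK);
      used= 0;
    }
    // Pass whole blocks straight through and keep the remainder.
    const size_t whole= n - n % BLOCK;
    if (whole)
      ctx.update(p, whole);
    std::memcpy(pending, p + whole, n - whole);
    used= n - whole;
  }

  void finish() { ctx.finish(pending, used); used= 0; }
};

// Feed 5 + 20 + 7 + 3 = 35 bytes with the values 0..34.
demo_ctx run_demo()
{
  uint8_t input[35];
  for (size_t i= 0; i < sizeof input; i++)
    input[i]= static_cast<uint8_t>(i);
  demo_ctx ctx;
  gatherer g(ctx);
  const uint8_t *p= input;
  for (size_t n : {5, 20, 7, 3})
  {
    g.append(p, n);
    p+= n;
  }
  g.finish();
  return ctx;
}
```

With these fragment sizes, run_demo() produces two 16-byte update() calls followed by a 3-byte tail, and the output bytes arrive in their original order.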

storage/innobase/include/log0log.h

Lines changed: 10 additions & 2 deletions
@@ -153,6 +153,8 @@ struct log_t
   static constexpr uint32_t FORMAT_10_8= 0x50687973;
   /** The MariaDB 10.8.0 format with innodb_encrypt_log=ON */
   static constexpr uint32_t FORMAT_ENC_10_8= FORMAT_10_8 | FORMAT_ENCRYPTED;
+  /** The MariaDB 10.11 format with innodb_encrypt_log=ON */
+  static constexpr uint32_t FORMAT_ENC_11= 0xf09f979d;
 
   /** Location of the first checkpoint block */
   static constexpr size_t CHECKPOINT_1= 4096;
@@ -499,12 +501,18 @@ struct log_t
 
   /** Set the log file format. */
   void set_latest_format(bool encrypted) noexcept
-  { format= encrypted ? FORMAT_ENC_10_8 : FORMAT_10_8; }
+  { format= encrypted ? FORMAT_ENC_11 : FORMAT_10_8; }
   /** @return whether the redo log is encrypted */
   bool is_encrypted() const noexcept { return format & FORMAT_ENCRYPTED; }
   /** @return whether the redo log is in the latest format */
   bool is_latest() const noexcept
-  { return (~FORMAT_ENCRYPTED & format) == FORMAT_10_8; }
+  { return format == FORMAT_10_8 || format == FORMAT_ENC_11; }
+  /** @return whether the redo log is in a format that can be recovered */
+  bool is_recoverable() const noexcept
+  {
+    return (format | FORMAT_ENCRYPTED) == FORMAT_ENC_10_8 ||
+      format == FORMAT_ENC_11;
+  }
 
   /** @return capacity in bytes */
   lsn_t capacity() const noexcept { return file_size - START_OFFSET; }
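
The constant 0xf09f979d and the three predicates can be checked with a short compile-time sketch. The free functions below mirror the member functions in the hunk above but are written for illustration only; utf8_4byte() is a hypothetical helper, and FORMAT_ENCRYPTED is assumed to be the most significant bit because its definition is not part of this diff.

```cpp
#include <cstdint>

// Pack the 4-byte UTF-8 encoding of a code point in U+10000..U+10FFFF into a
// 32-bit integer, first byte in the most significant position.
constexpr uint32_t utf8_4byte(uint32_t cp)
{
  return (0xF0u | (cp >> 18)) << 24 |
    (0x80u | ((cp >> 12) & 0x3F)) << 16 |
    (0x80u | ((cp >> 6) & 0x3F)) << 8 |
    (0x80u | (cp & 0x3F));
}

constexpr uint32_t FORMAT_ENCRYPTED= 1U << 31; // assumption, not in this diff
constexpr uint32_t FORMAT_10_8= 0x50687973;
constexpr uint32_t FORMAT_ENC_10_8= FORMAT_10_8 | FORMAT_ENCRYPTED;
constexpr uint32_t FORMAT_ENC_11= 0xf09f979d;

constexpr bool is_encrypted(uint32_t format)
{ return format & FORMAT_ENCRYPTED; }

constexpr bool is_latest(uint32_t format)
{ return format == FORMAT_10_8 || format == FORMAT_ENC_11; }

constexpr bool is_recoverable(uint32_t format)
{
  return (format | FORMAT_ENCRYPTED) == FORMAT_ENC_10_8 ||
    format == FORMAT_ENC_11;
}

static_assert(utf8_4byte(0x1F5DD) == FORMAT_ENC_11,
              "U+1F5DD OLD KEY encodes as F0 9F 97 9D");
static_assert(is_recoverable(FORMAT_10_8) && is_recoverable(FORMAT_ENC_10_8) &&
              is_recoverable(FORMAT_ENC_11), "all 3 formats are recoverable");
static_assert(is_latest(FORMAT_10_8) && is_latest(FORMAT_ENC_11) &&
              !is_latest(FORMAT_ENC_10_8), "the old encrypted format is legacy");
```

Under the stated assumption about FORMAT_ENCRYPTED, the leading byte 0xf0 also means that is_encrypted() holds for FORMAT_ENC_11 without any change to that predicate.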

storage/innobase/include/log0recv.h

Lines changed: 87 additions & 27 deletions
@@ -252,8 +252,12 @@ struct recv_sys_t
   size_t len;
   /** start offset of non-parsed log records in log_sys.buf */
   size_t offset;
+  /** start offset of the currently parsed mini-transaction */
+  size_t start_offset;
   /** log sequence number of the first non-parsed record */
   lsn_t lsn;
+  /** log sequence number at the start of parse_tail() */
+  lsn_t start_lsn;
   /** log sequence number of the last parsed mini-transaction */
   lsn_t scanned_lsn;
   /** log sequence number at the end of the FILE_CHECKPOINT record, or 0 */
@@ -271,10 +275,17 @@ struct recv_sys_t
   /** iterator to pages, used by parse() */
   map::iterator pages_it;
 
+  /** The allocated size of tmp_buf. The 1+8 extra bytes are
+  needed for FORMAT_ENC_11 in parse(). */
+  static constexpr size_t tmp_buf_size{MTR_SIZE_MAX + 9};
+  /** buffer for decrypting mini-transactions or handling non-contiguous
+  mini-transactions */
+  byte *tmp_buf;
+
   /** Process a record that indicates that a tablespace size is being shrunk.
   @param page_id first page that is not in the file
   @param lsn log sequence number of the shrink operation */
-  inline void trim(const page_id_t page_id, lsn_t lsn);
+  ATTRIBUTE_COLD void trim(const page_id_t page_id, lsn_t lsn);
 
   /** Undo tablespaces for which truncate has been logged
   (indexed by page_id_t::space() - srv_undo_space_id_start) */
@@ -290,27 +301,53 @@ struct recv_sys_t
   /** The contents of the doublewrite buffer */
   recv_dblwr_t dblwr;
 
-  __attribute__((warn_unused_result))
+  /** Free tmp_buf after the log will no longer be parsed. */
+  void tmp_free() noexcept;
+
+  __attribute__((warn_unused_result))
   inline dberr_t read(os_offset_t offset, span<byte> buf);
   inline size_t files_size();
   void close_files();
 
   /** Advance pages_it if it matches the iterator */
-  void pages_it_invalidate(const map::iterator &p)
+  void pages_it_invalidate(const map::iterator &p) noexcept
   {
     mysql_mutex_assert_owner(&mutex);
     if (pages_it == p)
       pages_it++;
   }
   /** Invalidate pages_it if it points to the given tablespace */
-  void pages_it_invalidate(uint32_t space_id)
+  void pages_it_invalidate(uint32_t space_id) noexcept
   {
     mysql_mutex_assert_owner(&mutex);
     if (pages_it != pages.end() && pages_it->first.space() == space_id)
       pages_it= pages.end();
   }
 
 private:
+  /** In parse_tail<storing=NO>(), handle INIT_PAGE or FREE_PAGE
+  @param id page that is being initialized or freed */
+  void parse_init(const page_id_t id) noexcept;
+
+  /** Handle WRITE to FSP_SPACE_SIZE and FSP_SPACE_FLAGS.
+  @param id tablespace header page
+  @param b log record snippet
+  @param size whether FSP_SPACE_SIZE is being changed
+  @param flags whether FSP_SPACE_FLAGS is being changed */
+  void parse_page0(const page_id_t id, const byte *b, bool size, bool flags)
+    noexcept;
+
+  /** @return whether parse_store() needs to be invoked
+  @param space_id tablespace identifier */
+  bool parse_store_if_exists(uint32_t space_id) const noexcept;
+
+  /** Store a parsed log record.
+  @param id page identifier
+  @param l log record
+  @param size size of the log record
+  @return whether we ran out of memory */
+  bool parse_store(const page_id_t id, const byte *l, size_t size) noexcept;
+
   /** Attempt to initialize a page based on redo log records.
   @param p iterator
   @param mtr mini-transaction
@@ -381,13 +418,10 @@ struct recv_sys_t
 
   /** Register a redo log snippet for a page.
   @param it page iterator
-  @param start_lsn start LSN of the mini-transaction
-  @param lsn @see mtr_t::commit_lsn()
   @param l redo log snippet
   @param len length of l, in bytes
   @return whether we ran out of memory */
-  bool add(map::iterator it, lsn_t start_lsn, lsn_t lsn,
-           const byte *l, size_t len);
+  bool add(map::iterator it, const byte *l, size_t len);
 
   /** Parsing result */
   enum parse_mtr_result {
@@ -397,42 +431,67 @@ struct recv_sys_t
     PREMATURE_EOF,
     /** the end of the log was reached */
     GOT_EOF,
-    /** parse<true>(l, false) ran out of memory */
+    /** parse<YES>(l, false) ran out of memory */
     GOT_OOM
   };
 
   /** Whether to store parsed log records */
   enum store{NO,BACKUP,YES};
 
 private:
-  /** Parse and register one log_t::FORMAT_10_8 mini-transaction.
+  /** Parse and register one mini-transaction.
+  @tparam source type of log data source
   @tparam storing whether to store the records
+  @tparam format log record format (log_sys.format)
   @param l log data source
   @param if_exists if store: whether to check if the tablespace exists */
-  template<typename source,store storing>
-  inline parse_mtr_result parse(source &l, bool if_exists) noexcept;
-
-  /** Rewind a mini-transaction when parse() runs out of memory.
-  @param l log data source
-  @param begin start of the mini-transaction */
-  template<typename source>
-  ATTRIBUTE_COLD void rewind(source &l, source &begin) noexcept;
+  template<typename source,store storing,uint32_t format>
+  inline __attribute__((always_inline))
+  parse_mtr_result parse(source l, bool if_exists) noexcept;
+
+  /** Report that multi-batch recovery is needed.
+  @retval GOT_OOM always */
+  parse_mtr_result parse_oom() noexcept;
+
+  /** Parse and register one mini-transaction.
+  @tparam ENC_10_8 whether this in log_t::FORMAT_ENC_10_8
+  @tparam storing whether to store the records
+  @param begin start of the mini-transaction
+  @param if_exists if store: whether to check if the tablespace exists
+  @param size size of the mini-transaction
+  @retval OK on success
+  @retval GOT_EOF on corruption
+  @retval GOT_OOM if we ran out of memory for recv_sys.pages */
+  template<bool ENC_10_8,recv_sys_t::store storing>
+  parse_mtr_result parse_tail(const byte *begin, bool if_exists, size_t size)
+    noexcept;
+
+  /** Rewind a mini-transaction when parse_tail() runs out of memory.
+  @param begin start of the mini-transaction
+  @param end start of the first unprocessed record */
+  ATTRIBUTE_COLD void rewind(const byte *begin, const byte *end) noexcept;
 
   /** Report progress in terms of LSN or pages remaining */
   ATTRIBUTE_COLD void report_progress() const;
-public:
-  /** Parse and register one log_t::FORMAT_10_8 mini-transaction,
+  /** Parse and register a mini-transaction,
   without handling any log_sys.is_mmap() buffer wrap-around.
   @tparam storing whether to store the records
+  @tparam format log_sys.format
   @param if_exists storing=YES: whether to check if the tablespace exists */
-  template<store storing>
-  static parse_mtr_result parse_mtr(bool if_exists) noexcept;
-  /** Parse and register one log_t::FORMAT_10_8 mini-transaction,
+  template<store storing,uint32_t format>
+  static parse_mtr_result parse_mtr(bool if_exists);
+public:
+  /** Parse and register a mini-transaction,
   handling log_sys.is_mmap() buffer wrap-around.
   @tparam storing whether to store the records
+  @tparam format log_sys.format
   @param if_exists storing=YES: whether to check if the tablespace exists */
-  template<store storing>
-  static parse_mtr_result parse_mmap(bool if_exists) noexcept;
+  template<store storing,uint32_t format>
+  static parse_mtr_result parse_mmap(bool if_exists);
+  /** mini-transaction parser */
+  using parser= parse_mtr_result(*)(bool if_exists);
+  /** @return the parsing function for mariadb-backup --backup */
+  static parser get_backup_parser() noexcept;
 
   /** Erase log records for a page. */
   void erase(map::iterator p);
@@ -462,8 +521,9 @@ struct recv_sys_t
 
   /** Flag data file corruption during recovery. */
   ATTRIBUTE_COLD void set_corrupt_fs() noexcept;
-  /** Flag log file corruption during recovery. */
-  ATTRIBUTE_COLD void set_corrupt_log() noexcept;
+  /** Flag log file corruption during recovery.
+  @retval GOT_EOF always */
+  ATTRIBUTE_COLD parse_mtr_result set_corrupt_log() noexcept;
 
   /** @return whether data file corruption was found */
   bool is_corrupt_fs() const { return UNIV_UNLIKELY(found_corrupt_fs); }
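
One stated use of recv_sys_t::tmp_buf is to make a mini-transaction contiguous when it wraps around the end of a memory-mapped log. The sketch below shows the general technique on a plain circular byte array. The function copy_from_ring() and its parameters are hypothetical, and the sketch ignores that the circular area of the real log starts after a file header (capacity() is file_size - START_OFFSET).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy len bytes that start at offset in a circular buffer of cap bytes into
// the contiguous buffer dst, following the wrap-around at most once.
void copy_from_ring(uint8_t *dst, const uint8_t *ring, size_t cap,
                    size_t offset, size_t len)
{
  assert(offset < cap && len <= cap);
  // Bytes available before the end of the circular buffer.
  const size_t first= std::min(len, cap - offset);
  std::memcpy(dst, ring + offset, first);
  // Whatever remains continues at the start of the circular buffer.
  std::memcpy(dst + first, ring, len - first);
}
```

For an 8-byte ring holding "ABCDEFGH", reading 5 bytes from offset 6 yields "GHABC". A parser that works on plain pointers, as parse_tail() is described to do, can then treat the copy like any non-wrapping input.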
