Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using ctid as IV base instead of offset calculation #107

Merged
merged 3 commits into from
Feb 7, 2024

Conversation

dutow
Copy link
Collaborator

@dutow dutow commented Jan 25, 2024

This commit modifies the ID calculation of normal tuples to just use the alrady exisitng ItemPointer to offset the IV instead of the actual offset addresses, as the ItemPointer doesn't change during moves and also easier to use for replication.

As part of this, the structure of the IV is also changed: instead of using the offset as the base number, and incrementing it sequentially, we now insert the "base" ItemPointer at the high part of the IV, and start the counter at the other end, at the low part of the IV.

This means that we are no longer using AES-CTR, but instead rely on a custom AES based encryption, but this is also required for toast, as with that, we can't rely on the uniqueness of the address in the entire data range.

Old encryption tests are also deleted, as they no longer work with these changes.

This commit modifies the ID calculation of normal tuples to just
use the alrady exisitng ItemPointer to offset the IV instead of
the actual offset addresses, as the ItemPointer doesn't change during
moves and also easier to use for replication.

As part of this, the structure of the IV is also changed: instead of
using the offset as the base number, and incrementing it sequentially,
we now insert the "base" ItemPointer at the high part of the IV, and
start the counter at the other end, at the low part of the IV.

This means that we are no longer using AES-CTR, but instead rely on a
custom AES based encryption, but this is also required for toast, as
with that, we can't rely on the uniqueness of the address in the entire
data range.

Old encryption tests are also deleted, as they no longer work with
these changes.
@@ -809,7 +810,9 @@ pg_tde_fetch_toast_slice(Relation toastrel, Oid valueid, int32 attrsize,
}
}
/* Decrypt the data chunk by chunk here */
PG_TDE_DECRYPT_DATA((curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + encrypt_offset + valueid,

memcpy(iv_prefix, &valueid, sizeof(Oid));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the valueid remains the same throughout the function call, so no need to call memcpy in the loop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -844,6 +847,7 @@ pg_tde_toast_encrypt(Pointer dval, Oid valueid, RelKeysData *keys)
int32 data_size =0;
char* data_p;
char* encrypted_data;
char iv_prefix[16] = {0,};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the correct indentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

iv_prefix[2] = ip.ip_blkid.bi_lo / 256;
iv_prefix[3] = ip.ip_blkid.bi_lo % 256;
iv_prefix[4] = ip.ip_posid / 256;
iv_prefix[5] = ip.ip_posid % 256;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either create a function or a Macro for calculating the IV prefix. The same calculation is happening in two places. Secondly, Also please add comments explaining the calculation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

uint32 data_len = size - header_size;

// ctid stored in item is incorrect (not set) at this point
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL discourages C++ (single-line //) code comments. Please change to the multiline comment format.
Ref: https://www.postgresql.org/docs/current/source-format.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} while(0)

#define PG_TDE_RE_ENCRYPT_TUPLE_DATA(_read_bn, _read_offset_in_page, _read_data, \
_write_bn, _write_offset_in_page, _write_data, _data_len, _keys) \
do { \
uint64 read_offset_in_file = (_read_bn * BLCKSZ) + _read_offset_in_page; \
uint64 write_offset_in_file = (_write_bn * BLCKSZ) + _write_offset_in_page; \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the PG_TDE_RE_ENCRYPT_TUPLE_DATA Macro is not required anymore, it needs to be removed along with its invocations from 'src/access/pg_tde_prune.c' file. Currently, this is producing many 'unused variable' warnings on the variables, that were declared just to pass to this macro.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@codeforall
Copy link
Collaborator

There is also a tiny typo in the commit message. "alrady"

Copy link
Collaborator

@codeforall codeforall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Copy link
Contributor

@hqakhtar hqakhtar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this PR, but the code requires refactoring and addition of a lot of comments. The reason for approval is that refactoring should be dealt with separately.

A couple of issues to mention are:
(1) IV should be typedef
(2) 16 should be replaced with a #define (or use an existing define)
(3) 15 should be replaced with the value defined in (2) - 1.
(4) iv prefix initialisation is inconsistent; {0} and {0, }. To avoid such inconsistencies, we should really have a macro or a function instead.
(5) 256 should instead be defined as well.
(6) IMHO, bitwise mask/ops will be more performant.

@hqakhtar hqakhtar merged commit b6ccd6c into Percona-Lab:main Feb 7, 2024
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants