Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 7 additions & 10 deletions src/pcre2_compile_class.c
Original file line number Diff line number Diff line change
Expand Up @@ -1802,17 +1802,14 @@ if ((xclass_props & XCLASS_REQUIRED) != 0)
PUT(code, 0, (uint32_t)(char_lists_size >> 1));
code += LINK_SIZE;

#if defined PCRE2_DEBUG || defined SUPPORT_VALGRIND
/* If we added padding to align the list, initialize the bytes to
defined values, so the library is valgrind-clean. It could also
be a security concern for clients calling into PCRE2 via bindings
from a memory-safe language, if pcre2_serialize_encode() exposes
uninitialized memory that may contain sensitive information. */

if ((char_lists_size & 0x2) != 0)
{
/* In debug the unused 16 bit value is set
to a fixed value and marked unused. */
((uint16_t*)data)[-1] = 0x5555;
#ifdef SUPPORT_VALGRIND
VALGRIND_MAKE_MEM_NOACCESS(data - 2, 2);
#endif
Comment on lines -1811 to -1813
Copy link
Contributor

@carenas carenas Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if the data is unconditionally initialized it should be made inaccessible for reads, since it is not valid, so this part of the code is still needed IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, the test then fails. The data can't be no-access, because the serialisation code reads it. But if it is uninitialised (for valgrind) then we can't use valgrind to assert that all the memory has been set.

That's a nasty trade off. I wonder if there's a way to benefit both ways...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically; the fact that the serialization code is reading it is a bug that was introduced when this feature was added in 10.45, and therefore we should fix that as well by making sure that when the code is serialized all holes are skipped instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you could just define the holes as something and specify that they always contain zeros or some other value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we overcomplicate this. Just write any value there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Philip and Zoltan. My PR was just writing a value in. That's all we need.

}
#endif
((uint16_t*)data)[-1] = 0xdead;

cb->char_lists_size =
CLIST_ALIGN_TO(char_lists_size, sizeof(uint32_t));
Expand Down
25 changes: 25 additions & 0 deletions src/pcre2test_inc.h
Original file line number Diff line number Diff line change
Expand Up @@ -2019,6 +2019,9 @@ uint32_t use_forbid_utf = forbid_utf;
PCRE2_SIZE patlen, full_patlen;
PCRE2_SIZE valgrind_access_length;
PCRE2_SIZE erroroffset;
int32_t serialize_rc;
uint8_t *serialized_bytes;
PCRE2_SIZE serialized_size;

/* The perltest.sh script supports only / as a delimiter. */

Expand Down Expand Up @@ -2966,6 +2969,28 @@ if ((pat_patctl.control2 & CTL2_NL_SET) != 0)
rc = show_pattern_info();
if (rc != PR_OK) return rc;

/* Verify that the compiled structure can be serialized without generating
memory errors. */

serialize_rc = pcre2_serialize_encode((const pcre2_code **)&compiled_code, 1,
&serialized_bytes, &serialized_size, general_context);
if (serialize_rc != 1)
{
cfprintf(clr_test_error, outfile, "** pcre2_serialize_encode() returned %d instead of 1\n",
serialize_rc);
return PR_ABEND;
}

#if defined SUPPORT_VALGRIND
if (VALGRIND_CHECK_MEM_IS_DEFINED(serialized_bytes, serialized_size) != 0)
{
cfprintf(clr_test_error, outfile, "** pcre2_serialize_encode() returned undefined data\n");
return PR_ABEND;
}
#endif

pcre2_serialize_free(serialized_bytes);

/* The "push" control requests that the compiled pattern be remembered on a
stack. This is mainly for testing the serialization functionality. */

Expand Down