Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot deserialize nested serialization with associative arrays #10

Closed
andreaskienast opened this issue Jul 13, 2020 · 7 comments
Closed

Comments

@andreaskienast
Copy link

Hello,

we encounter an issue where deserialization of a string fails that has nested serialized objects and the most inner one is an associative array. This issue occurs in PHP 5.6, unfortunately, we cannot test older releases.

Example serialization:

$serialized= 'a:3:{i:0;O:12:"Test\Subject":1:{s:8:"\000*\000value";s:5:"outer";}i:1;s:51:"O:12:"Test\Subject":1:{s:8:"\000*\000value";s:5:"inner";}";i:2;s:155:"O:12:"Test\Subject":1:{s:8:"\000*\000value";s:107:"a:1:{s:36:"arrays-and-stones-may-break-my-bones";O:12:"Test\Subject":1:{s:8:"\000*\000value";s:10:"very-inner";}}";}";}';

Deserialization fails with the following messages:

PHP Notice:  unserialize(): Error at offset 353 of 421 bytes in vendor/brumann/polyfill-unserialize/src/Unserialize.php on line 57
PHP Stack trace:
PHP   1. {main}() poc.php:0
PHP   2. Brumann\Polyfill\Unserialize::unserialize() poc.php:23
PHP   3. unserialize() vendor/brumann/polyfill-unserialize/src/Unserialize.php:57

Notice: unserialize(): Error at offset 353 of 421 bytes in vendor/brumann/polyfill-unserialize/src/Unserialize.php on line 57

I've added the poc.php and composer.json files as archive to this ticket: broken-deserialize-poc.zip

Kind regards
Andreas

@ohader
Copy link
Contributor

ohader commented Jul 13, 2020

Nice one... the problem with serialize is that it needs to be processed sequentially and therefore does not have internal escape sequences.

$inner = new \stdClass();
$outer = new \stdClass();
$inner->value = serialize('inner');
$outer->value = serialize(['item', $inner]);
var_dump(serialize($outer));

results in

O:8:"stdClass":1:{s:5:"value";s:76:"a:2:{i:0;s:4:"item";i:1;O:8:"stdClass":1:{s:5:"value";s:12:"s:5:"inner";";}}";}
                                ^^ "............ just a 76 chars long string for the outer instance ............"

Current matched pattern groups looks like this → https://regex101.com/r/3Bbktq/1

Thus, the inner result of serialize just can be determined by analyzing the whole stream since "nested" parts don't have escaped literals like \" instead of plain "...

@dbrumann
Copy link
Owner

Thanks for the detailed description and the follow up. 👍
I'll have a look at it and will prepare a release as soon as possible.

@chr-hertel
Copy link

i think this requires a way more complex solution that is not only able to recognize serialized objects but scalar values and arrays as well. to tackle that a recursive solution would be my first choice.

in every iteration we need to recognize the next level properties with their types (s, b, a ... and oc O). in case of an object we extract the next level of properties and parse them again...

this approach would eliminate wrongly parsed objects if they are part of a string property, but like i said: way more complex

maybe there's a better way 🤷‍♂️

@dbrumann
Copy link
Owner

Thanks @ohader for your PR. I'll have a look on Friday. It looks really promising. 👍 Thanks for putting so much work into it.

I am thinking about making this a 2.0, rather than a 1.1 or 1.0.5, due to the major change it introduces in the code and to make sure it will not have a side effect on people who don't run into this issue right now. I will add a note to the known bugs in the 1.x-branch and people can decide whether this is an issue for them or not.

Let me know if this sounds like a good idea for you or if you would rather have this as a 1.x release to make sure people will get the bugfix with a regular composer update (assuming they rely on the semantic versioning). I think both approaches are valid. Since I do not actively use this library right now, I would like to pick what is best for the users.

@ohader
Copy link
Contributor

ohader commented Jul 22, 2020

@dbrumann Lots of people and projects are taking benefits from this unserialize polyfill (so does typo3/phar-stream-wrapper) - so "sure thing" on trying to fix this issue 👍

My assumptions for this change were:

  • false-positives can only occur in serialized string items (s:123:"...") but not in any other type
  • when processing the data stream from left to right those string items can be identified by collecting the start offset and their end offset positions
  • object items are not allowed to occur in those collected string items (those are the current false-positives)

Usually I would have used more objects like some Offset and OffsetCollection (OffsetCollection::contains(Offset)). But for this change I tried to keep new structures to a minimum and used good old array for it... 😉

@ohader
Copy link
Contributor

ohader commented Jul 22, 2020

Let me know if this sounds like a good idea for you or if you would rather have this as a 1.x release to make sure people will get the bugfix with a regular composer update (assuming they rely on the semantic versioning). I think both approaches are valid. Since I do not actively use this library right now, I would like to pick what is best for the users.

From my point of view a 2.0 release makes sense. Nested serialization does not seem to be a problem for current consumers of this library. My changes also might introduce some implications on performance and memory consumption. Thus, 👍 on a 2.0 version.

@dbrumann
Copy link
Owner

I have just released v2.0.0 and it's already available via packagist. Thanks again for you contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants