Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use serialize() instead of json_encode() in MemoizingParser - faster/simpler, but more memory usage #690

Merged
merged 1 commit into from
Oct 1, 2020

Conversation

Nyholm
Copy link
Contributor

@Nyholm Nyholm commented Oct 1, 2020

This will fix Roave/BackwardCompatibilityCheck#267 and probably aslo nikic/PHP-Parser#710

When I ran BackwardCompatibilityCheck on Nyholm/psr7 MessageTrait.php I got a TypeError.

addcslashes() expects parameter 1 to be string, null given
vendor/nikic/php-parser/lib/PhpParser/PrettyPrinter/Standard.php:984

Somehow, a String_ get a null value.

After some more debugging I found that it was only true for "cached" ATS's that came from the MemoizingParser. Replacing json_encode() with serialize() fixed the issue.

@Nyholm
Copy link
Contributor Author

Nyholm commented Oct 1, 2020

Strange.. I cannot reproduce the error with a test..

When Im running the BC check tool, I get the hash A. I get one AST when I parse it and a different when I use the json decoded one from cache.

When Im running the test, I get the same hash A, but it works..

@Ocramius
Copy link
Member

Ocramius commented Oct 1, 2020

When Im running the BC check tool, I get the hash A. I get one AST when I parse it and a different when I use the json decoded one from cache.

Probably a deeply nested bug in the serialization process? Can you check if using serialize()/unserialize() fixes the immediate pain?

@Nyholm
Copy link
Contributor Author

Nyholm commented Oct 1, 2020

Using serialize fixed my problem. Thank you

@localheinz will this patch also fix your issue?

@Nyholm Nyholm changed the title Dont json decode AST Use serialize instead of json_encode in MemoizingParser Oct 1, 2020
@Nyholm Nyholm marked this pull request as ready for review October 1, 2020 10:15
@Ocramius
Copy link
Member

Ocramius commented Oct 1, 2020

Before moving on with this, I'd first like to get @kukulich's input on #627, and why it was done with a JSON encode/decode tool instead.

I can imagine the strings to be longer with serialize(), but still: #627 (comment)

@Ocramius
Copy link
Member

Ocramius commented Oct 1, 2020

@Nyholm before you close your editor on your end, can you squash the commits and factor in some more text about what has been done here/why?

@Nyholm
Copy link
Contributor Author

Nyholm commented Oct 1, 2020

Im sorry that I fail to add a test for this. I tried to add my full class but it works with json_decode when I run PHPUnit but not when I run Roave-bc-checker. (I do get the same hash for the file)

See patch

diff --git a/test/unit/SourceLocator/Ast/Parser/MemoizingParserTest.php b/test/unit/SourceLocator/Ast/Parser/MemoizingParserTest.php
index 134e4f0b..9e8f1a05 100644
--- a/test/unit/SourceLocator/Ast/Parser/MemoizingParserTest.php
+++ b/test/unit/SourceLocator/Ast/Parser/MemoizingParserTest.php
@@ -4,6 +4,7 @@ declare(strict_types=1);
 
 namespace Roave\BetterReflectionTest\Reflector;
 
+use PhpParser\Lexer\Emulative;
 use PhpParser\Node;
 use PhpParser\Node\Name;
 use PhpParser\Parser;
@@ -87,4 +88,243 @@ class MemoizingParserTest extends TestCase
             'Each time a tree is requested, a new copy is provided',
         );
     }
+    
+    public function testSerialization(): void
+    {
+        
+$code = <<<'CODE'
+<?php
+
+declare(strict_types=1);
+
+namespace Nyholm\Psr7;
+
+use Psr\Http\Message\StreamInterface;
+
+/**
+ * Trait implementing functionality common to requests and responses.
+ *
+ * @author Michael Dowling and contributors to guzzlehttp/psr7
+ * @author Tobias Nyholm <tobias.nyholm@gmail.com>
+ * @author Martijn van der Ven <martijn@vanderven.se>
+ *
+ * @internal should not be used outside of Nyholm/Psr7 as it does not fall under our BC promise
+ */
+trait MessageTrait
+{
+    use LowercaseTrait;
+
+    /** @var array Map of all registered headers, as original name => array of values */
+    private $headers = [];
+
+    /** @var array Map of lowercase header name => original name at registration */
+    private $headerNames = [];
+
+    /** @var string */
+    private $protocol = '1.1';
+
+    /** @var StreamInterface|null */
+    private $stream;
+
+    public function getProtocolVersion(): string
+    {
+        return $this->protocol;
+    }
+
+    public function withProtocolVersion($version): self
+    {
+        if ($this->protocol === $version) {
+            return $this;
+        }
+
+        $new = clone $this;
+        $new->protocol = $version;
+
+        return $new;
+    }
+
+    public function getHeaders(): array
+    {
+        return $this->headers;
+    }
+
+    public function hasHeader($header): bool
+    {
+        return isset($this->headerNames[self::lowercase($header)]);
+    }
+
+    public function getHeader($header): array
+    {
+        $header = self::lowercase($header);
+        if (!isset($this->headerNames[$header])) {
+            return [];
+        }
+
+        $header = $this->headerNames[$header];
+
+        return $this->headers[$header];
+    }
+
+    public function getHeaderLine($header): string
+    {
+        return \implode(', ', $this->getHeader($header));
+    }
+
+    public function withHeader($header, $value): self
+    {
+        $value = $this->validateAndTrimHeader($header, $value);
+        $normalized = self::lowercase($header);
+
+        $new = clone $this;
+        if (isset($new->headerNames[$normalized])) {
+            unset($new->headers[$new->headerNames[$normalized]]);
+        }
+        $new->headerNames[$normalized] = $header;
+        $new->headers[$header] = $value;
+
+        return $new;
+    }
+
+    public function withAddedHeader($header, $value): self
+    {
+        if (!\is_string($header) || '' === $header) {
+            throw new \InvalidArgumentException('Header name must be an RFC 7230 compatible string.');
+        }
+
+        $new = clone $this;
+        $new->setHeaders([$header => $value]);
+
+        return $new;
+    }
+
+    public function withoutHeader($header): self
+    {
+        $normalized = self::lowercase($header);
+        if (!isset($this->headerNames[$normalized])) {
+            return $this;
+        }
+
+        $header = $this->headerNames[$normalized];
+        $new = clone $this;
+        unset($new->headers[$header], $new->headerNames[$normalized]);
+
+        return $new;
+    }
+
+    public function getBody(): StreamInterface
+    {
+        if (null === $this->stream) {
+            $this->stream = Stream::create('');
+        }
+
+        return $this->stream;
+    }
+
+    public function withBody(StreamInterface $body): self
+    {
+        if ($body === $this->stream) {
+            return $this;
+        }
+
+        $new = clone $this;
+        $new->stream = $body;
+
+        return $new;
+    }
+
+    private function setHeaders(array $headers): void
+    {
+        foreach ($headers as $header => $value) {
+            if (\is_int($header)) {
+                // If a header name was set to a numeric string, PHP will cast the key to an int.
+                // We must cast it back to a string in order to comply with validation.
+                $header = (string) $header;
+            }
+            $value = $this->validateAndTrimHeader($header, $value);
+            $normalized = self::lowercase($header);
+            if (isset($this->headerNames[$normalized])) {
+                $header = $this->headerNames[$normalized];
+                $this->headers[$header] = \array_merge($this->headers[$header], $value);
+            } else {
+                $this->headerNames[$normalized] = $header;
+                $this->headers[$header] = $value;
+            }
+        }
+    }
+
+    /**
+     * Make sure the header complies with RFC 7230.
+     *
+     * Header names must be a non-empty string consisting of token characters.
+     *
+     * Header values must be strings consisting of visible characters with all optional
+     * leading and trailing whitespace stripped. This method will always strip such
+     * optional whitespace. Note that the method does not allow folding whitespace within
+     * the values as this was deprecated for almost all instances by the RFC.
+     *
+     * header-field = field-name ":" OWS field-value OWS
+     * field-name   = 1*( "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^"
+     *              / "_" / "`" / "|" / "~" / %x30-39 / ( %x41-5A / %x61-7A ) )
+     * OWS          = *( SP / HTAB )
+     * field-value  = *( ( %x21-7E / %x80-FF ) [ 1*( SP / HTAB ) ( %x21-7E / %x80-FF ) ] )
+     *
+     * @see https://tools.ietf.org/html/rfc7230#section-3.2.4
+     */
+    private function validateAndTrimHeader($header, $values): array
+    {
+        if (!\is_string($header) || 1 !== \preg_match("@^[!#$%&'*+.^_`|~0-9A-Za-z-]+$@", $header)) {
+            throw new \InvalidArgumentException('Header name must be an RFC 7230 compatible string.');
+        }
+
+        if (!\is_array($values)) {
+            // This is simple, just one value.
+            if ((!\is_numeric($values) && !\is_string($values)) || 1 !== \preg_match("@^[ \t\x21-\x7E\x80-\xFF]*$@", (string) $values)) {
+                throw new \InvalidArgumentException('Header values must be RFC 7230 compatible strings.');
+            }
+
+            return [\trim((string) $values, " \t")];
+        }
+
+        if (empty($values)) {
+            throw new \InvalidArgumentException('Header values must be a string or an array of strings, empty array given.');
+        }
+
+        // Assert Non empty array
+        $returnValues = [];
+        foreach ($values as $v) {
+            if ((!\is_numeric($v) && !\is_string($v)) || 1 !== \preg_match("@^[ \t\x21-\x7E\x80-\xFF]*$@", (string) $v)) {
+                throw new \InvalidArgumentException('Header values must be RFC 7230 compatible strings.');
+            }
+
+            $returnValues[] = \trim((string) $v, " \t");
+        }
+
+        return $returnValues;
+    }
+}
+
+CODE;
+
+
+        $wrappedParser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7, new Emulative([
+                'usedAttributes' => ['comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos'],
+            ]));
+
+        $parser = new MemoizingParser($wrappedParser);
+
+        self::assertEquals(
+            $wrappedParser->parse($code),
+            $parser->parse($code),
+        );
+        self::assertEquals(
+            $parser->parse($code),
+            $parser->parse($code),
+            'Equal tree is produced at each iteration',
+        );
+        self::assertNotSame(
+            $parser->parse($code),
+            $parser->parse($code),
+            'Each time a tree is requested, a new copy is provided',
+        );
+    }
 }

@Ocramius
Copy link
Member

Ocramius commented Oct 1, 2020

Im sorry that I fail to add a test for this

I think there's a deeper bug in the JsonSerializer of nikic/PHP-Parser anyway :D

@kukulich
Copy link
Collaborator

kukulich commented Oct 1, 2020

The memory usage with json_encode is better ¯\_(ツ)_/¯

@Nyholm
Copy link
Contributor Author

Nyholm commented Oct 1, 2020

The commits are squashed and the PR description is updated.

@Ocramius
Copy link
Member

Ocramius commented Oct 1, 2020

Switching to serialize() has a positive impact on CPU at the cost of Memory usage, from what I can see.

master:

Time: 01:24.069, Memory: 1.06 GB

OK, but incomplete, skipped, or risky tests!
Tests: 7738, Assertions: 81032, Skipped: 3.

This branch:

Time: 01:00.135, Memory: 1.34 GB

OK, but incomplete, skipped, or risky tests!
Tests: 7738, Assertions: 81032, Skipped: 3.

@kukulich
Copy link
Collaborator

kukulich commented Oct 1, 2020

@Ocramius I see higher memory usage on this branch (based on your comment)

@Ocramius
Copy link
Member

Ocramius commented Oct 1, 2020

@kukulich yes, I edited my comment, sorry :D

@Ocramius
Copy link
Member

Ocramius commented Oct 1, 2020

Right, shipping this then - I think it both simplifies things a bit, and brings some perf downstream, at the cost of more memory (of which we pretty much all have more than enough :-P )

@Ocramius Ocramius changed the title Use serialize instead of json_encode in MemoizingParser Use serialize() instead of json_encode() in MemoizingParser - faster, but more memory usage Oct 1, 2020
@Ocramius Ocramius changed the title Use serialize() instead of json_encode() in MemoizingParser - faster, but more memory usage Use serialize() instead of json_encode() in MemoizingParser - faster/simpler, but more memory usage Oct 1, 2020
@Ocramius Ocramius merged commit 553c5e6 into Roave:master Oct 1, 2020
@Ocramius Ocramius added this to the 4.10.0 milestone Oct 1, 2020
@Ocramius Ocramius self-assigned this Oct 1, 2020
@Nyholm
Copy link
Contributor Author

Nyholm commented Oct 1, 2020

Thank you for merging

@localheinz
Copy link
Contributor

Thank you, @Nyholm and @Ocramius!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parse error on 5.x
4 participants