Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a normalize whitespace error in the DomCrawler #2894

Merged
merged 1 commit into from
May 7, 2021

Conversation

fritzmg
Copy link
Contributor

@fritzmg fritzmg commented Mar 19, 2021

Currently the following error will occur when running the Crawler:

"exception" => Exception {
    #message: ""Symfony\Component\DomCrawler\Crawler::text()" will normalize whitespaces by default in Symfony 5.0, set the second "$normalizeWhitespace" argument se to retrieve the non-normalized version of the text."
    #code: 16384
    #file: "vendor\terminal42\escargot\src\Subscriber\RobotsSubscriber.php"
    #line: 244
    trace: {
        vendor\terminal42\escargot\src\Subscriber\RobotsSubscriber.php:244 { …}
        Terminal42\Escargot\Subscriber\RobotsSubscriber->Terminal42\Escargot\Subscriber\{closure}() {}
        vendor\symfony\dom-crawler\Crawler.php:622 { …}
        vendor\contao\contao\core-bundle\src\Search\Document.php:113 { …}
        vendor\symfony\dom-crawler\Crawler.php:355 { …}
        vendor\contao\contao\core-bundle\src\Search\Document.php:120 { …}
        vendor\contao\contao\core-bundle\src\Search\Indexer\DefaultIndexer.php:146 { …}
        vendor\contao\contao\core-bundle\src\Search\Indexer\DefaultIndexer.php:76 { …}
        vendor\contao\contao\core-bundle\src\Search\Indexer\DelegatingIndexer.php:34 { …}
        vendor\contao\contao\core-bundle\src\Crawl\Escargot\Subscriber\SearchIndexSubscriber.php:165 { …}
        vendor\terminal42\escargot\src\Escargot.php:515 { …}
        vendor\terminal42\escargot\src\Escargot.php:457 { …}
        vendor\terminal42\escargot\src\Escargot.php:330 { …}
        vendor\contao\contao\core-bundle\src\Command\CrawlCommand.php:135 { …}
        vendor\symfony\console\Command\Command.php:255 { …}
        vendor\symfony\console\Application.php:1027 { …}
        vendor\symfony\framework-bundle\Console\Application.php:97 { …}
        vendor\symfony\console\Application.php:273 { …}
        vendor\symfony\framework-bundle\Console\Application.php:83 { …}
        vendor\symfony\console\Application.php:149 { …}
        vendor\bin\contao-console:21 { …}
        vendor\bin\contao-console:21 { …}
    }
}

This PR fixes this by specifically setting the second parameter of Symfony\Component\DomCrawler\Crawler::text to true (which was previously the default).

@fritzmg fritzmg added the bug label Mar 19, 2021
@fritzmg fritzmg added this to the 4.9 milestone Mar 19, 2021
@fritzmg fritzmg self-assigned this Mar 19, 2021
@Toflar Toflar changed the base branch from 4.x to 4.9 March 19, 2021 16:04
@Toflar
Copy link
Member

Toflar commented Mar 19, 2021

We should have noticed this with tests. Apparently that was not covered, can you add a test case for that?

@fritzmg
Copy link
Contributor Author

fritzmg commented Mar 19, 2021

Hm, the tests would already cover this - but no error occurs there. Presumably because trigger_error is already silenced in Symfony. However, I suppose that may be the custom error handler in Escargot makes the error then actually visible. May be Escargot should restore the error handler after this block?

@fritzmg
Copy link
Contributor Author

fritzmg commented Mar 19, 2021

That would explain why the error only now surfaces. The error handler in Escargot was only added 10 days ago (in terminal42/escargot@ef7d131#diff-e66b75aabb721cde7baf72cca5b86fcc33eb03e24ed5b5d550d0a0b2c476972c).

@Toflar do you want me to create an issue there?

@@ -110,7 +110,7 @@ public function extractJsonLdScripts(string $context = '', string $type = ''): a
->filterXPath('descendant-or-self::script[@type = "application/ld+json"]')
->each(
static function (Crawler $node) {
$data = json_decode($node->text(), true);
$data = json_decode($node->text(null, true), true);
Copy link
Member

@leofeyer leofeyer Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. If true was the default and will still be the default in the future, why do we have to set it explicitly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not technically necessary with the release of terminal42/escargot 1.0.5. However, not setting the parameters will still result in

@trigger_error(sprintf('"%s()" will normalize whitespaces by default in Symfony 5.0, set the second "$normalizeWhitespace" argument to false to retrieve the non-normalized version of the text.', __METHOD__), \E_USER_DEPRECATED);

being triggered under certain circumstances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should ignore the deprecation in these places as we do not want the non-normalized version of the text.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not want the non-normalized version we have to use ->text(null, true) in Symfony 4.

@leofeyer leofeyer changed the title Fix normalize whitespace error Fix a normalize whitespace error in the DomCrawler May 7, 2021
@leofeyer leofeyer merged commit 57f9d58 into contao:4.9 May 7, 2021
@leofeyer
Copy link
Member

leofeyer commented May 7, 2021

Thank you @fritzmg.

@fritzmg fritzmg deleted the fix-crawler-text-retrieval branch May 7, 2021 14:00
bezin pushed a commit to bezin/contao that referenced this pull request Nov 26, 2021
Description
-----------

Currently the following error will occur when running the Crawler:

```
"exception" => Exception {
    #message: ""Symfony\Component\DomCrawler\Crawler::text()" will normalize whitespaces by default in Symfony 5.0, set the second "$normalizeWhitespace" argument se to retrieve the non-normalized version of the text."
    #code: 16384
    #file: "vendor\terminal42\escargot\src\Subscriber\RobotsSubscriber.php"
    #line: 244
    trace: {
        vendor\terminal42\escargot\src\Subscriber\RobotsSubscriber.php:244 { …}
        Terminal42\Escargot\Subscriber\RobotsSubscriber->Terminal42\Escargot\Subscriber\{closure}() {}
        vendor\symfony\dom-crawler\Crawler.php:622 { …}
        vendor\contao\contao\core-bundle\src\Search\Document.php:113 { …}
        vendor\symfony\dom-crawler\Crawler.php:355 { …}
        vendor\contao\contao\core-bundle\src\Search\Document.php:120 { …}
        vendor\contao\contao\core-bundle\src\Search\Indexer\DefaultIndexer.php:146 { …}
        vendor\contao\contao\core-bundle\src\Search\Indexer\DefaultIndexer.php:76 { …}
        vendor\contao\contao\core-bundle\src\Search\Indexer\DelegatingIndexer.php:34 { …}
        vendor\contao\contao\core-bundle\src\Crawl\Escargot\Subscriber\SearchIndexSubscriber.php:165 { …}
        vendor\terminal42\escargot\src\Escargot.php:515 { …}
        vendor\terminal42\escargot\src\Escargot.php:457 { …}
        vendor\terminal42\escargot\src\Escargot.php:330 { …}
        vendor\contao\contao\core-bundle\src\Command\CrawlCommand.php:135 { …}
        vendor\symfony\console\Command\Command.php:255 { …}
        vendor\symfony\console\Application.php:1027 { …}
        vendor\symfony\framework-bundle\Console\Application.php:97 { …}
        vendor\symfony\console\Application.php:273 { …}
        vendor\symfony\framework-bundle\Console\Application.php:83 { …}
        vendor\symfony\console\Application.php:149 { …}
        vendor\bin\contao-console:21 { …}
        vendor\bin\contao-console:21 { …}
    }
}
```

This PR fixes this by specifically setting the second parameter of `Symfony\Component\DomCrawler\Crawler::text` to `true` (which was previously the default).

Commits
-------

5bff597 fix normalize whitespace error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants