The utility classes RefreshTag and RobotsTags both use XPath to retrieve META tags. They currently do so with the expression //META, which is inefficient because it searches the entire document. These two methods can take up to 18% of the processing time for JSoupParserBolt and 16% of the overall CPU.
Instead, we can use a more constrained XPath that looks only in /HTML/HEAD or /HTML/BODY. The latter is not the recommended placement but can be found in the wild.
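To illustrate the difference, here is a minimal sketch using only the JDK's javax.xml.xpath API. It is not the actual RefreshTag or RobotsTags code: the class name MetaXPathSketch, the helper methods, and the sample page are all hypothetical, and the sample is well-formed XML with upper-case element names so that the expressions from this issue match as written.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not part of the project.
public class MetaXPathSketch {

    // Current approach: descendant search over the whole document.
    static final String BROAD = "//META";

    // Proposed approach: only META elements that are direct children
    // of /HTML/HEAD or /HTML/BODY.
    static final String NARROW = "/HTML/HEAD/META | /HTML/BODY/META";

    // Parses a well-formed XML string into a DOM document (demo helper).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Evaluates the expression and collects the "content" attribute
    // of every matched META element.
    static List<String> metaContents(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression compiled = xpath.compile(expression);
        NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
        List<String> contents = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            contents.add(((Element) nodes.item(i)).getAttribute("content"));
        }
        return contents;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample page: one META in HEAD, one directly under BODY.
        String page =
                "<HTML><HEAD><META name=\"robots\" content=\"noindex\"/></HEAD>"
                        + "<BODY><META http-equiv=\"refresh\" content=\"5\"/>"
                        + "<DIV><P>text</P></DIV></BODY></HTML>";
        Document doc = parse(page);
        System.out.println("broad:  " + metaContents(doc, BROAD));
        System.out.println("narrow: " + metaContents(doc, NARROW));
    }
}
```

On this sample both expressions match the same two elements, but the constrained one does not need to descend into the DIV subtree. The trade-off is that a META nested deeper than a direct child of HEAD or BODY would no longer be matched.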
Profiling after the change doesn't show a significant impact on RobotsTags.extractMetaTags, but RefreshTag takes only half the time it used to. These methods now account for 14% of the processing time for JSoupParserBolt and 11% of the overall CPU.