Optimisation: faster extraction of META tags #553
The utility classes RefreshTag and RobotsTags both use XPath to retrieve META tags. They currently do so with the expression //META, which is inefficient because it searches the entire document. These two lookups can account for up to 18% of the processing time of JSoupParserBolt and 16% of the overall CPU.
Instead, we can use a more constrained XPath expression that looks only under /HTML/HEAD or /HTML/BODY. META tags in the body are not the recommended placement, but they do occur in the wild.
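The following is a minimal sketch of the difference. It uses only the JDK's javax.xml.xpath API, not the actual RefreshTag/RobotsTags code. The class name, the helper methods, and the exact constrained expression are illustrative assumptions. That expression, `/HTML/HEAD/META | /HTML/BODY/META`, matches only direct children of HEAD or BODY. The sketch also assumes the page has already been converted to a W3C DOM with upper-case element names, as the expressions in this issue suggest.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class MetaXPathSketch {

    // Unconstrained: the descendant axis visits every node in the document.
    static final String BROAD = "//META";

    // Constrained (hypothetical expression): only META elements that are direct
    // children of /HTML/HEAD or, for non-standard pages, of /HTML/BODY.
    static final String CONSTRAINED = "/HTML/HEAD/META | /HTML/BODY/META";

    // Parses a well-formed markup string into a W3C DOM (stand-in for the
    // DOM that the parser bolt would build from the fetched page).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Compiles an expression once so it can be reused across documents.
    static XPathExpression compile(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.compile(expr);
    }

    // Returns how many nodes the compiled expression selects in the document.
    static int countMatches(XPathExpression expr, Document doc) throws Exception {
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<HTML><HEAD><META name=\"robots\" content=\"noindex\"/></HEAD>"
              + "<BODY><P>text</P><META http-equiv=\"refresh\" content=\"5; url=/next\"/></BODY></HTML>");
        XPathExpression broad = compile(BROAD);
        XPathExpression constrained = compile(CONSTRAINED);
        System.out.println("broad: " + countMatches(broad, doc));
        System.out.println("constrained: " + countMatches(constrained, doc));
    }
}
```

Both expressions find the two META tags in this sample. The constrained one no longer matches a META nested deeper, for example inside a DIV in the body. That is the trade-off for not walking the whole tree. Compiling each expression once and reusing it also avoids re-parsing the XPath string for every page.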