replace fromstring(html) by fromstring(html, parser=newparser) #1575
This is a possible speed improvement: lxml.html.fromstring without a specific parser doesn't benefit from multithreading. See:
I can confirm the benchmark, but the result may be different in searx, since fromstring is called at different times there.
So in duckduckgo.py for example :
    def response(resp):
        results = []
        doc = fromstring(resp.text)

should be replaced by

    def response(resp):
        results = []
        parser = HTMLParser()
        doc = fromstring(resp.text, parser=parser)
It would probably be convenient to add a utility function doing that to the searx.utils module:

    from lxml.html import fromstring, HTMLParser

    def htmlfromstring(text, **kwargs):
        # use a fresh parser per call instead of the module-level default parser
        parser = HTMLParser()
        return fromstring(text, parser=parser, **kwargs)
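To check whether this actually helps in a threaded setting, something like the following rough sketch could be used. It is not the benchmark referenced above: the helper name `html_fromstring`, the `timed` function, the synthetic `SAMPLE` document and the worker count are all made up for illustration, and the timings will depend on the machine and the lxml version.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from lxml.html import HTMLParser, fromstring


def html_fromstring(text, **kwargs):
    # Hypothetical helper: one fresh parser per call, as proposed in this issue.
    return fromstring(text, parser=HTMLParser(), **kwargs)


def timed(parse, docs, workers=4):
    # Parse all documents on a thread pool; return (elapsed seconds, titles).
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        titles = list(pool.map(lambda d: parse(d).findtext('.//title'), docs))
    return time.perf_counter() - start, titles


# Synthetic input, invented for this sketch (not real search engine output).
SAMPLE = ('<html><head><title>t</title></head><body>'
          + '<p>x</p>' * 2000 + '</body></html>')
DOCS = [SAMPLE] * 50

if __name__ == '__main__':
    for name, parse in (('default parser', fromstring),
                        ('fresh parser', html_fromstring)):
        elapsed, titles = timed(parse, DOCS)
        print(f'{name}: {elapsed:.3f}s for {len(titles)} documents')
```

Both variants should return the same parsed content; only the elapsed time is expected to differ, and whether it does in searx would still need to be measured there.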
Note : selectolax is faster than lxml.