
Document is empty #43

Closed
fheilz opened this issue Aug 6, 2018 · 5 comments

Comments

fheilz commented Aug 6, 2018

I'm getting this error when I try to run:

```
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\pyquery\pyquery.py", line 95, in fromstring
    result = getattr(etree, meth)(context)
  File "src\lxml\etree.pyx", line 3213, in lxml.etree.fromstring
  File "src\lxml\parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src\lxml\parser.pxi", line 1764, in lxml.etree._parseDoc
  File "src\lxml\parser.pxi", line 1126, in lxml.etree._BaseParser._parseDoc
  File "src\lxml\parser.pxi", line 600, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 710, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 639, in lxml.etree._raiseParseError
  File "", line 17
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 17, column 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\USER\Desktop\Twitter Stock Market\current.py", line 16, in <module>
    for tweet in get_tweets('trump', pages=3):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\twitter_scraper.py", line 78, in get_tweets
    yield from gen_tweets(pages)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\twitter_scraper.py", line 26, in gen_tweets
    url='bunk', default_encoding='utf-8')
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\requests_html.py", line 419, in __init__
    element=PyQuery(html)('html') or PyQuery(f'{html}')('html'),
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\pyquery\pyquery.py", line 255, in __init__
    elements = fromstring(context, self.parser)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\pyquery\pyquery.py", line 99, in fromstring
    result = getattr(lxml.html, meth)(context)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\lxml\html\__init__.py", line 876, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\lxml\html\__init__.py", line 765, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
```
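
The final `ParserError` is easy to reproduce in isolation: `lxml.html.document_fromstring` refuses input that contains no markup at all, which is exactly what an empty `items_html` field produces. A minimal sketch (not part of the original report):

```python
import lxml.html
from lxml import etree

# Simulate parsing the whitespace-only HTML that the endpoint
# returns once there are no more tweets to page through.
empty_html = '\n\n\n \n'

try:
    lxml.html.document_fromstring(empty_html)
except etree.ParserError as exc:
    print(exc)  # Document is empty
```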

@thorsummoner

seconded

@ParthS007

Closing due to inactivity 👍

@james-see

I am getting this same issue

adamhrv commented Apr 9, 2019

I had the same error as @LiquidPrototype and @thorsummoner.
It seems to happen because `r.json()` at twitter_scraper.py line 26 returns a response with an empty `items_html` value:

```python
{'has_more_items': False,
 'items_html': '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n',
 'min_position': None,
 'new_latent_count': 0}
```

Breaking out of the `while` loop when the empty HTML is encountered seems to avoid the error:

```python
while pages > 0:
    try:
        items_html = r.json()['items_html'].strip()
        if not items_html:
            break  # exit the loop if items_html is empty
```
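
The same check can be factored into a small standalone helper, so the end-of-results condition is testable without hitting the network (the helper name is illustrative, not part of twitter_scraper):

```python
def extract_items_html(payload):
    """Return the stripped items_html from a timeline JSON payload,
    or None when the response is effectively empty (end of results)."""
    items_html = payload.get('items_html', '').strip()
    return items_html or None

# The empty payload shown above yields None, so the scraping loop
# knows to stop instead of handing lxml a blank document.
assert extract_items_html({'has_more_items': False,
                           'items_html': '\n\n\n \n',
                           'min_position': None,
                           'new_latent_count': 0}) is None
assert extract_items_html({'items_html': '<li>tweet</li>'}) == '<li>tweet</li>'
```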

@james-see

Worth checking out the repo I made to get Twitter data for bot-identification purposes. I was using this repo until it broke for me too: https://github.com/jamesacampbell/botrnot (install with `pip3 install botrnot`).
