
Document is empty #43

Closed
fheilz opened this issue Aug 6, 2018 · 5 comments

Comments

fheilz commented Aug 6, 2018

I'm getting this error when I try to run:

```
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\pyquery\pyquery.py", line 95, in fromstring
    result = getattr(etree, meth)(context)
  File "src\lxml\etree.pyx", line 3213, in lxml.etree.fromstring
  File "src\lxml\parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src\lxml\parser.pxi", line 1764, in lxml.etree._parseDoc
  File "src\lxml\parser.pxi", line 1126, in lxml.etree._BaseParser._parseDoc
  File "src\lxml\parser.pxi", line 600, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 710, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 639, in lxml.etree._raiseParseError
  File "", line 17
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 17, column 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\USER\Desktop\Twitter Stock Market\current.py", line 16, in <module>
    for tweet in get_tweets('trump', pages=3):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\twitter_scraper.py", line 78, in get_tweets
    yield from gen_tweets(pages)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\twitter_scraper.py", line 26, in gen_tweets
    url='bunk', default_encoding='utf-8')
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\requests_html.py", line 419, in __init__
    element=PyQuery(html)('html') or PyQuery(f'{html}')('html'),
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\pyquery\pyquery.py", line 255, in __init__
    elements = fromstring(context, self.parser)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\pyquery\pyquery.py", line 99, in fromstring
    result = getattr(lxml.html, meth)(context)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\lxml\html\__init__.py", line 876, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\site-packages\lxml\html\__init__.py", line 765, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
```
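
The final `ParserError` is easy to reproduce in isolation: `lxml.html.document_fromstring` refuses input that contains no markup at all, which is exactly what an empty `items_html` field produces. A minimal sketch (not part of the original report):

```python
import lxml.html
from lxml import etree

# Simulate parsing the whitespace-only HTML that the endpoint
# returns once there are no more tweets to page through.
empty_html = '\n\n\n \n'

try:
    lxml.html.document_fromstring(empty_html)
except etree.ParserError as exc:
    print(exc)  # Document is empty
```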

@thorsummoner

seconded

@ParthS007

Closing due to inactivity 👍

@james-see

I am getting this same issue

adamhrv commented Apr 9, 2019

I had the same error as @LiquidPrototype and @thorsummoner.
It seems to happen because `r.json()` at twitter_scraper.py line 26 returns a response with an empty `items_html` value:

```python
{'has_more_items': False,
 'items_html': '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n',
 'min_position': None,
 'new_latent_count': 0}
```

Breaking out of the `while` loop when the empty HTML is encountered seems to avoid the error:

```python
while pages > 0:
    try:
        items_html = r.json()['items_html'].strip()
        if not items_html:
            break  # exit the loop if items_html is empty
```
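
The same check can be factored into a small standalone helper, so the end-of-results condition is testable without hitting the network (the helper name is illustrative, not part of twitter_scraper):

```python
def extract_items_html(payload):
    """Return the stripped items_html from a timeline JSON payload,
    or None when the response is effectively empty (end of results)."""
    items_html = payload.get('items_html', '').strip()
    return items_html or None

# The empty payload shown above yields None, so the scraping loop
# knows to stop instead of handing lxml a blank document.
assert extract_items_html({'has_more_items': False,
                           'items_html': '\n\n\n \n',
                           'min_position': None,
                           'new_latent_count': 0}) is None
assert extract_items_html({'items_html': '<li>tweet</li>'}) == '<li>tweet</li>'
```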

@james-see

Worth checking out the repo I made to get Twitter data for bot-identification purposes. I was using this repo until it broke for me too: https://github.com/jamesacampbell/botrnot (install with `pip3 install botrnot`).
