Error if <!DOCTYPE html> is present in HTML #223
@rock321987 I've tried to replicate the error using Docker:
root@a3506a595f72:~# uname -ar
Linux a3506a595f72 4.14.33+ #1 SMP Sat Aug 11 08:05:16 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
One difference is that the import entrypoint is the pattern package:
root@a3506a595f72:~# pip3 install pattern
Requirement already satisfied: pattern in /usr/local/lib/python3.6/dist-packages
Requirement already satisfied: backports.csv in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: feedparser in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: lxml in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: pdfminer.six in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: python-docx in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: mysqlclient in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: nltk in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: cherrypy in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->pattern)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->pattern)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->pattern)
Requirement already satisfied: idna<2.8,>=2.5 in /usr/lib/python3/dist-packages (from requests->pattern)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from pdfminer.six->pattern)
Requirement already satisfied: pycryptodome in /usr/local/lib/python3.6/dist-packages (from pdfminer.six->pattern)
Requirement already satisfied: sortedcontainers in /usr/local/lib/python3.6/dist-packages (from pdfminer.six->pattern)
Requirement already satisfied: singledispatch in /usr/local/lib/python3.6/dist-packages (from nltk->pattern)
Requirement already satisfied: cheroot>=6.2.4 in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: zc.lockfile in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: more-itertools in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: portend>=2.1.1 in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: backports.functools-lru-cache in /usr/local/lib/python3.6/dist-packages (from cheroot>=6.2.4->cherrypy->pattern)
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from zc.lockfile->cherrypy->pattern)
Requirement already satisfied: tempora>=1.8 in /usr/local/lib/python3.6/dist-packages (from portend>=2.1.1->cherrypy->pattern)
Requirement already satisfied: jaraco.functools>=1.20 in /usr/local/lib/python3.6/dist-packages (from tempora>=1.8->portend>=2.1.1->cherrypy->pattern)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from tempora>=1.8->portend>=2.1.1->cherrypy->pattern)
root@a3506a595f72:~# python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pattern.web import Document
>>> ss='''<!DOCTYPE html><a></a>'''
>>> ss
'<!DOCTYPE html><a></a>'
>>> aaz=Document(ss)
>>> aaz.children
[Text('html'), Element(tag='html')]
>>>
Other than that, the parse completed without error. How did you build/install it?
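For comparison, Python's standard-library html.parser (not part of pattern; used here only as an illustration) classifies <!DOCTYPE html> as a declaration, separate from elements and text. In the session above, pattern.web surfaces it as Text('html') next to the html element, so how a parser treats this declaration is a plausible place to look when the doctype triggers an error. A minimal sketch that records the events the stdlib parser emits; EventRecorder and parse_events are hypothetical names:

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Records, in order, the parse events emitted by the stdlib HTML parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        # <!DOCTYPE ...> is delivered here, not as a start tag or as text.
        self.events.append(("decl", decl))

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(markup):
    """Feed markup to a fresh recorder and return the list of events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


print(parse_events("<!DOCTYPE html><a></a>"))
# [('decl', 'DOCTYPE html'), ('start', 'a'), ('end', 'a')]
```

The doctype text ("DOCTYPE html") arrives through handle_decl, so a parser built on these events can keep it out of the element tree entirely.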
Yes, you are right. The pattern3 library I was using is different. I switched to the dev branch of the pattern library and it worked for me. At the time I was using it, that version wasn't available on pip.
This problem can be reproduced by parsing an HTML string that starts with <!DOCTYPE html>, which gives an error.
Updating the string to
ss='''<a></a>'''
does not give an error.
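Because the same input parses cleanly once the doctype is gone, one possible stopgap for anyone who cannot switch to the pattern dev branch (the fix the reporter settled on) is to strip a leading doctype before parsing. A minimal sketch, assuming the declaration only ever appears at the start of the string; strip_doctype is a hypothetical helper, not part of pattern or pattern3:

```python
import re

# Matches optional leading whitespace followed by one <!DOCTYPE ...>
# declaration at the very start of the string, case-insensitively.
_DOCTYPE_RE = re.compile(r"^\s*<!doctype[^>]*>", re.IGNORECASE)


def strip_doctype(markup: str) -> str:
    """Return markup with a leading <!DOCTYPE ...> declaration removed.

    Hypothetical workaround helper; markup without a doctype is returned unchanged.
    """
    return _DOCTYPE_RE.sub("", markup, count=1)


ss = '''<!DOCTYPE html><a></a>'''
print(strip_doctype(ss))  # <a></a>
```

The cleaned string can then be handed to the parser, e.g. Document(strip_doctype(ss)). Anchoring the pattern with ^ and count=1 means only the leading declaration is removed, so doctype-like text elsewhere in the page is left alone.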