Error if <!DOCTYPE html> is present in HTML #223

Closed
rock321987 opened this issue Mar 22, 2018 · 2 comments

@rock321987

This problem can be reproduced as follows:

from pattern3.web import Document
ss = '''<!DOCTYPE html><a></a>'''
aaz = Document(ss)
aaz.children

This gives an error:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/user/anaconda3/lib/python3.6/site-packages/pattern3/web/__init__.py", line 3580, in __getattr__
    raise AttributeError("'Element' object has no attribute '%s'" % k)
AttributeError: 'Element' object has no attribute 'children'

Changing the string to ss='''<a></a>''' does not give an error.

@initbar

initbar commented Nov 27, 2018

@rock321987 I've tried to replicate the error using the Docker ubuntu:18.04 image:

root@a3506a595f72:~# uname -ar 
Linux a3506a595f72 4.14.33+ #1 SMP Sat Aug 11 08:05:16 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux

One difference is that the import entrypoint is pattern and not pattern3:

root@a3506a595f72:~# pip3 install pattern 
Requirement already satisfied: pattern in /usr/local/lib/python3.6/dist-packages
Requirement already satisfied: backports.csv in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: feedparser in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: lxml in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: pdfminer.six in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: python-docx in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: mysqlclient in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: nltk in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: cherrypy in /usr/local/lib/python3.6/dist-packages (from pattern)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->pattern)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->pattern)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->pattern)
Requirement already satisfied: idna<2.8,>=2.5 in /usr/lib/python3/dist-packages (from requests->pattern)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from pdfminer.six->pattern)
Requirement already satisfied: pycryptodome in /usr/local/lib/python3.6/dist-packages (from pdfminer.six->pattern)
Requirement already satisfied: sortedcontainers in /usr/local/lib/python3.6/dist-packages (from pdfminer.six->pattern)
Requirement already satisfied: singledispatch in /usr/local/lib/python3.6/dist-packages (from nltk->pattern)
Requirement already satisfied: cheroot>=6.2.4 in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: zc.lockfile in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: more-itertools in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: portend>=2.1.1 in /usr/local/lib/python3.6/dist-packages (from cherrypy->pattern)
Requirement already satisfied: backports.functools-lru-cache in /usr/local/lib/python3.6/dist-packages (from cheroot>=6.2.4->cherrypy->pattern)
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from zc.lockfile->cherrypy->pattern)
Requirement already satisfied: tempora>=1.8 in /usr/local/lib/python3.6/dist-packages (from portend>=2.1.1->cherrypy->pattern)
Requirement already satisfied: jaraco.functools>=1.20 in /usr/local/lib/python3.6/dist-packages (from tempora>=1.8->portend>=2.1.1->cherrypy->pattern)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from tempora>=1.8->portend>=2.1.1->cherrypy->pattern)
root@a3506a595f72:~# python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pattern.web import Document
>>> ss='''<!DOCTYPE html><a></a>'''
>>> ss
'<!DOCTYPE html><a></a>'
>>> aaz=Document(ss)
>>> aaz.children
[Text('html'), Element(tag='html')]
>>> 

Otherwise, the parse completed without error. How did you build or install the pattern module locally?

@rock321987
Author

Yeah, you are right. The pattern3 library I used was a different package. I switched to the dev branch of the pattern library and it worked for me. At the time I was using it, that branch wasn't available on pip.
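
For anyone who hits the same confusion between the two packages, here is a minimal sketch for checking which one is installed before importing from it. The helper name first_importable is hypothetical and not part of either library; it relies only on the standard library's importlib.util.find_spec, and it assumes the top-level package names pattern and pattern3 mentioned in this thread.

```python
import importlib.util


def first_importable(candidates):
    """Return the first top-level module name that can be located, else None.

    Hypothetical helper for illustration; not part of pattern or pattern3.
    """
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules.
            continue
    return None


# Prefer the pattern package, fall back to pattern3 if only that is installed.
package = first_importable(["pattern", "pattern3"])
if package is None:
    print("neither pattern nor pattern3 is installed")
else:
    print("found package:", package)
```

If this reports pattern3, the Document class being imported may come from a different code base than the one in the pattern dev branch, which would be consistent with the differing behaviour described above.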
