Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatibility with Python 3.7 #208

Closed
Arfrever opened this issue Oct 30, 2018 · 2 comments
Closed

Incompatibility with Python 3.7 #208

Arfrever opened this issue Oct 30, 2018 · 2 comments

Comments

@Arfrever
Copy link

Arfrever commented Oct 30, 2018

$ python3.7 -c 'import mwparserfromhell'
python3.7: mwparserfromhell/parser/ctokenizer/tokenizer.c:210: load_entities: Assertion `PyList_Check(defmap)' failed.
Aborted (core dumped)

The code is actually broken with all versions of Python. Older versions just silently tolerate it.

In mwparserfromhell/parser/ctokenizer/tokenizer.c, load_entities function contains:

    tempmod = PyImport_ImportModule(ENTITYDEFS_MODULE);
    if (!tempmod)
        return -1;
    defmap = PyObject_GetAttrString(tempmod, "entitydefs");
    if (!defmap)
        return -1;
    Py_DECREF(tempmod);
    deflist = PyDict_Keys(defmap);
    if (!deflist)
        return -1;
    Py_DECREF(defmap);
numdefs = (unsigned) PyList_GET_SIZE(defmap);

So defmap is actually entitydefs attribute in ENTITYDEFS_MODULE.
mwparserfromhell/parser/ctokenizer/tokenizer.h has:

#ifdef IS_PY3K
    ...
    #define ENTITYDEFS_MODULE "html.entities"
    ...
#else
    ...
    #define ENTITYDEFS_MODULE "htmlentitydefs"
    ...
#endif

So with Python 2, defmap is htmlentitydefs.entitydefs, while with Python 3, it is html.entities.entitydefs.

As you can see in:
https://github.com/python/cpython/blob/2.7/Lib/htmlentitydefs.py
https://github.com/python/cpython/blob/3.6/Lib/html/entities.py
https://github.com/python/cpython/blob/3.7/Lib/html/entities.py

# maps the HTML entity name to the character
# (or a character reference if the character is outside the Latin-1 range)
entitydefs = {}

entitydefs is a dict, not a list, so PyList_GET_SIZE should not be used with it.

In Python 3.6, PyList_GET_SIZE is defined:
https://github.com/python/cpython/blob/3.6/Include/listobject.h

#define PyList_GET_SIZE(op) Py_SIZE(op)

In Python 3.7, PyList_GET_SIZE is defined:
https://github.com/python/cpython/blob/3.7/Include/listobject.h

#define PyList_GET_SIZE(op) (assert(PyList_Check(op)),Py_SIZE(op))

See also:
https://bugs.python.org/issue29867
python/cpython@1a5856b

@Arfrever
Copy link
Author

Fix for this error:

--- a/mwparserfromhell/parser/ctokenizer/tokenizer.c
+++ b/mwparserfromhell/parser/ctokenizer/tokenizer.c
@@ -207,7 +207,7 @@ static int load_entities(void)
     if (!deflist)
         return -1;
     Py_DECREF(defmap);
-    numdefs = (unsigned) PyList_GET_SIZE(defmap);
+    numdefs = (unsigned) PyDict_Size(defmap);
     entitydefs = calloc(numdefs + 1, sizeof(char*));
     if (!entitydefs)
         return -1;

@earwig
Copy link
Owner

earwig commented Nov 1, 2018

Thanks for catching this. Apparently Python compiles extensions (in my environment) with -DNDEBUG, so the assert wasn't triggering, and apparently PyList_GET_SIZE() works on dicts for some reason.

It should have been PyList_GET_SIZE(deflist), because the reference to defmap is dropped by that point.

@earwig earwig closed this as completed in 2c206bc Nov 1, 2018
@earwig earwig added this to the version 0.5.2 milestone Nov 1, 2018
@earwig earwig self-assigned this Nov 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants