Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
992 lines (813 sloc) 29.5 KB
Death by a thousand leaks: What statically-analysing 370 Python extensions looks like
=====================================================================================
David Malcolm <dmalcolm@redhat.com>
:author: David Malcolm
:copyright: David Malcolm (CC BY-SA-3.0)
////
.https://us.pycon.org/2013/schedule/presentation/95/
What happens when you run a custom C static analysis tool ("cpychecker") on hundreds of Python extensions? I'll talk about the kinds of errors that my tool found, how to run it on your own code, and how to prevent memory leaks and crasher bugs in the C code of your Python extension modules.
gcc-with-cpychecker is a static analysis tool I've written that can automatically detect reference- counting bugs in the C code of Python extension modules (and various other mistakes).
I've run the tool on hundreds of Python extensions, and it has found hundreds of real bugs.
////
What is static analysis?
------------------------
* Discovering properties of a program *without running it*
* Programs that analyze other programs
* Treating programs as data, rather than code
* In particular, automatically finding bugs in code
What kind of code will be analyzed?
-----------------------------------
For this talk:
The C code of Python extension modules
Prerequisites
-------------
* I'm going to assume basic familiarity with Python, and with either C or C++
* Hopefully you've used, debugged, or written a Python extension module in C (perhaps via SWIG or Cython)
Outline
-------
* Intro to "cpychecker"
* How to run the tool on your own code
* How I ran the tool on *lots* of code
* What bugs came up frequently
* Recommendations on dealing with C and C++ from Python
* Q & A
cpychecker
----------
* Part of my Python plugin for GCC
* 6500 lines of Python code implementing a static checker for C extension modules
* It finds bugs in the C code of Python extension modules
* https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
See my PyCon US 2012 talk:
Static analysis of Python extension modules using GCC
https://us.pycon.org/2012/schedule/presentation/78/
What it checks for (1): Reference counting
------------------------------------------
For every object:
* "what is my reference count?" aka "ob_refcnt" (the object's view of how many pointers point to it)
* the reality of how many pointers point to it
As a C extension module author you must manually keep these in sync using Py_INCREF and Py_DECREF.
The two kinds of bugs:
* ob_refcnt too high: (memory leaks - hence the title of this talk)
* ob_refcnt too low: (crashes)
cpychecker warns about these issues
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
Checking reference counts
-------------------------
* For each path through the function and PyObject*, it determines:
- what the reference count *ought to be* at the end of the function (based on how many pointers point to the object)
- what the reference count *is*
It will issues warnings for any that are incorrect.
Limitations of the refcount checking
------------------------------------
* purely intraprocedural
* assumes every function returning a PyObject* returns a *new* reference, rather than a *borrowed* reference
- it knows about most of the CPython API and its rules
- you can manually mark functions with non-standard behavior
* only tracks 0 and 1 times through any loop, to ensure that the analysis doesn't go on forever
* can be defeated by relatively simple code (turn up --maxtrans argument)
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
What it checks for (2)
----------------------
It checks for the following along all of those code paths:
* Dereferencing a NULL pointer (e.g. using result of an allocator without checking the result is non-NULL)
* Passing NULL to CPython APIs that will crash on NULL
* Usage of uninitialized local variables
* Dereferencing a pointer to freed memory
* Returning a pointer to freed memory
* Returning NULL without setting an exception
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
What it checks for (3)
----------------------
It also does some simpler checking:
* type in calls to PyArg_ParseTuple et al
* types and NULL termination of PyMethodDef tables
* types and NULL termination of PyObject_Call{Function|Method}ObjArgs
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
What it doesn't check for (patches welcome!)
--------------------------------------------
* tp_traverse errors (which can mess up the garbage collector); missing it altogether, or omitting fields
* errors in GIL handling
* lock/release mismatches
* missed opportunities to release the GIL (e.g. compute-intensive functions; functions that wait on IO/syscalls)
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
What it can't check for
------------------------
* Does the code "do the right thing"?
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
Building cpychecker
-------------------
.Dependencies (on Fedora)
[source,bash]
----
sudo yum install \
gcc-plugin-devel \
python-devel \
python-six \
python-pygments \
graphviz
----
.Downloading the code
[source,bash]
----
git clone git://git.fedorahosted.org/gcc-python-plugin.git
----
Building the checker
--------------------
.Building the checker
[source,bash]
----
make plugin
----
.Checking that it's working
[source,bash]
----
make demo
----
Textual output from "make demo"
-------------------------------
[source,bash]
----
demo.c: In function ‘make_a_list_of_random_ints_badly’:
demo.c:90:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "i" [enabled by default]
argument 3 ("&count") had type
"long int *" (pointing to 64 bits)
but was expecting
"int *" (pointing to 32 bits)
for format code "i"
demo.c:102:1: warning: ob_refcnt of '*item' is 1 too high [enabled by default]
demo.c:102:1: note: was expecting final ob_refcnt to be N + 1 (for some unknown N)
demo.c:102:1: note: due to object being referenced by: PyListObject.ob_item[0]
demo.c:102:1: note: but final ob_refcnt is N + 2
demo.c:97:14: note: PyLongObject allocated at: item = PyLong_FromLong(random());
demo.c:90:26: note: when PyArg_ParseTuple() succeeds at: if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:90:8: note: taking False path at: if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:94:10: note: reaching: list = PyList_New(0);
demo.c:94:10: note: when PyList_New() succeeds at: list = PyList_New(0);
demo.c:96:5: note: when considering range: 1 <= count.0 <= 0x7fffffff at: for (i = 0; i < count; i++) {
demo.c:96:5: note: taking True path at: for (i = 0; i < count; i++) {
demo.c:97:31: note: reaching: item = PyLong_FromLong(random());
demo.c:97:14: note: when PyLong_FromLong() succeeds at: item = PyLong_FromLong(random());
demo.c:97:14: note: ob_refcnt is now refs: 1 + N where N >= 0
demo.c:98:22: note: when PyList_Append() succeeds at: PyList_Append(list, item);
demo.c:98:22: note: ob_refcnt is now refs: 2 + N where N >= 0
demo.c:98:22: note: '*item' is now referenced by 1 non-stack value(s): PyListObject.ob_item[0]
demo.c:96:5: note: when considering count.0 == (int)1 from demo.c:90 at: for (i = 0; i < count; i++) {
demo.c:96:5: note: taking False path at: for (i = 0; i < count; i++) {
demo.c:101:5: note: reaching: return list;
demo.c:102:1: note: returning
demo.c:102:1: note: found 1 similar trace(s) to this
demo.c:98:22: warning: calling PyList_Append with NULL as argument 1 (list) at demo.c:98 [enabled by default]
demo.c:90:26: note: when PyArg_ParseTuple() succeeds at: if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:90:8: note: taking False path at: if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:94:10: note: reaching: list = PyList_New(0);
demo.c:94:10: note: when PyList_New() fails at: list = PyList_New(0);
demo.c:96:5: note: when considering range: 1 <= count.0 <= 0x7fffffff at: for (i = 0; i < count; i++) {
demo.c:96:5: note: taking True path at: for (i = 0; i < count; i++) {
demo.c:97:31: note: reaching: item = PyLong_FromLong(random());
demo.c:97:14: note: when PyLong_FromLong() succeeds at: item = PyLong_FromLong(random());
demo.c:98:22: note: PyList_Append() invokes Py_TYPE() on the pointer via the PyList_Check() macro, thus accessing (NULL)->ob_type
demo.c:98:22: note: found 1 similar trace(s) to this
demo.c:86:1: note: graphical error report for function 'make_a_list_of_random_ints_badly' written out to 'demo.c.make_a_list_of_random_ints_badly-refcount-errors.html'
demo.c: In function ‘buggy_converter’:
demo.c:76:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "O&" [enabled by default]
argument 4 ("&i") had type
"int *" (pointing to 32 bits)
but was expecting
"Py_ssize_t *" (pointing to 64 bits) (from second argument of "int (*fn) (struct PyObject *, Py_ssize_t *)")
for format code "O&"
demo.c: In function ‘kwargs_example’:
demo.c:62:37: warning: Mismatching type in call to PyArg_ParseTupleAndKeywords with format code "(ff):kwargs_example" [enabled by default]
argument 5 ("&x") had type
"double *" (pointing to 64 bits)
but was expecting
"float *" (pointing to 32 bits)
for format code "f"
demo.c:62:37: warning: Mismatching type in call to PyArg_ParseTupleAndKeywords with format code "(ff):kwargs_example" [enabled by default]
argument 6 ("&y") had type
"double *" (pointing to 64 bits)
but was expecting
"float *" (pointing to 32 bits)
for format code "f"
demo.c: In function ‘too_many_varargs’:
demo.c:50:26: warning: Too many arguments in call to PyArg_ParseTuple with format string "i"
expected 1 extra arguments:
"int *" (pointing to 32 bits)
but got 2:
"int *" (pointing to 32 bits)
"int *" (pointing to 32 bits)
[enabled by default]
demo.c: In function ‘not_enough_varargs’:
demo.c:40:25: warning: Not enough arguments in call to PyArg_ParseTuple with format string "i"
expected 1 extra arguments:
"int *" (pointing to 32 bits)
but got none
[enabled by default]
demo.c: In function ‘socket_htons’:
demo.c:30:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "i:htons" [enabled by default]
argument 3 ("&x1") had type
"long unsigned int *" (pointing to 64 bits)
but was expecting
"int *" (pointing to 32 bits)
for format code "i"
----
Graphical output from "cpychecker"
----------------------------------
image:images/sample-html-error-report.png[]
Running it on your own code
---------------------------
.Distutils
[source,bash]
----
CC=/path/to/built/plugin/gcc-with-cpychecker \
python setup.py build
----
to set the *environment variable*
.Makefiles
[source,bash]
----
make CC=/path/to/built/plugin/gcc-with-cpychecker
----
to override the *Makefile variable* CC.
Let us know how you get on!
---------------------------
Mailing list:
* gcc-python-plugin@lists.fedorahosted.org
** See: https://fedorahosted.org/mailman/listinfo/gcc-python-plugin
Analyze all the things!
-----------------------
* The goal: analyze all of the C Python extensions in a recent Linux distribution
** Specifically: all of the Python 2 C code in Fedora 17
*** Every source rpm that builds something that links against libpython2.7
*** 370(ish) packages
* The reality:
** Some unevenness in the data coverage, so take my numbers with a pinch of salt
////
FIXME:
* didn't include python
* coverage seems uneven
For the Boston rehearsal:
- a subset: 113 packages: everything named "python-*"
////
Running cpychecker a *lot*
--------------------------
Scaling up to hundreds of projects:
* building via RPM
- hides the distutils vs Makefile vs CMake etc
* "mock" builds
- every build gets its own freshly-provisioned chroot
* "mock-with-analysis": running checkers:
- cpychecker
- cppcheck
- clang-analyzer
- gcc warnings
mock-with-analysis: https://github.com/fedora-static-analysis/mock-with-analysis
Scaling up (continued)
----------------------
* separation of model from presentation
- "Firehose" XML format: https://github.com/fedora-static-analysis/firehose
* detect analyzers that fail or exceed 1 minute to run
* store the result in a database
* capture any sources mentioned in a report
* can also capture arbitrary data e.g. code metrics
Graphical report
----------------
image:images/viewing-an-issue.png[]
Comparing builds
----------------
image:images/comparison-view.png[]
Managing failures
-----------------
image:images/viewing-a-failure.png[]
What are the 20 *most* commonly used Py/_Py entrypoints?
---------------------------------------------------------
[source,python]
----
[(u'PyArg_ParseTuple', 3034),
(u'PyErr_SetString', 2954),
(u'Py_BuildValue', 1429),
(u'PyErr_Occurred', 904),
(u'PyString_FromString', 814),
(u'PyInt_FromLong', 764),
(u'PyErr_Format', 651),
(u'PyEval_RestoreThread', 615),
(u'PyEval_SaveThread', 615),
(u'PyArg_ParseTupleAndKeywords', 470),
(u'PyErr_NoMemory', 444),
(u'_PyArg_ParseTuple_SizeT', 441),
(u'PyErr_Clear', 427),
(u'PyList_New', 404),
(u'PyObject_GetAttrString', 394),
(u'PyType_IsSubtype', 371),
(u'PyString_AsString', 367),
(u'Py_InitModule4_64', 347),
(u'PyDict_SetItemString', 336)]
----
What are the *least* commonly used Py/_Py entrypoints?
-------------------------------------------------------
There are many with just 1 user, but most of these are false positives
from projects using the Py/_Py prefix.
* about 50 actual CPython API entrypoints with just one user
* about 100 "entrypoints" due to other projects reusing the prefix
* see source code of this talk if you're interested in the data:
https://github.com/davidmalcolm/PyCon-US-2013-Talk
////
Apparent true CPython API entrypoints with one user in 2013-03-14 dataset:
(u'PyBuffer_FromMemory', 1),
(u'PyBuffer_FromReadWriteObject', 1),
(u'PyCapsule_GetContext', 1),
(u'PyCapsule_GetName', 1),
(u'PyCapsule_IsValid', 1),
(u'PyClass_New', 1),
(u'PyCodec_Encoder', 1),
(u'PyDict_Items', 1),
(u'PyDict_MergeFromSeq2', 1),
(u'PyDict_Update', 1),
(u'PyErr_NewExceptionWithDoc', 1),
(u'PyFile_SetBufSize', 1),
(u'PyFile_WriteObject', 1),
(u'PyInt_AsUnsignedLongLongMask', 1),
(u'PyInt_Check', 1),
(u'PyList_AsTuple', 1),
(u'PyList_Reverse', 1),
(u'PyLong_AsUnsignedLongMask', 1),
(u'PyLong_AsVoidPtr', 1),
(u'PyMarshal_ReadObjectFromString', 1),
(u'PyMarshal_WriteObjectToString', 1),
(u'PyMethod_New', 1),
(u'PyNumber_CoerceEx', 1),
(u'PyNumber_InPlacePower', 1),
(u'PyNumber_Power', 1),
(u'PyOS_mystrnicmp', 1),
(u'PyRun_SimpleFileExFlags', 1),
(u'PySeqIter_New', 1),
(u'PySequence_Repeat', 1),
(u'PySequence_SetSlice', 1),
(u'PySet_Add', 1),
(u'PySet_Contains', 1),
(u'PySet_New', 1),
(u'PyString_DecodeEscape', 1),
(u'PyString_FromFormatV', 1),
(u'PyStructSequence_InitType', 1),
(u'PyStructSequence_New', 1),
(u'PyThreadState_GetDict', 1),
(u'PyUnicodeUCS4_AsRawUnicodeEscapeString', 1),
(u'PyUnicodeUCS4_DecodeUnicodeEscape', 1),
(u'PyUnicodeUCS4_Encode', 1),
(u'PyUnicodeUCS4_Replace', 1),
(u'PyUnicodeUCS4_Resize', 1),
(u'Py_GetArgcArgv', 1),
(u'Py_Initialiaze', 1),
(u'Py_InitializeEx', 1),
(u'Py_Main', 1),
(u'_PyArg_Parse_SizeT', 1),
(u'_PyUnicode_AsString', 1),
Apparent "invaders" of the namespace, with 2013-03-14 dataset:
(u'PyArray_As1D', 1),
(u'PyArray_CanCastSafely', 1),
(u'PyArray_ContiguousFromObject', 1),
(u'PyArray_Free', 1),
(u'PyArray_FromDimsAndData', 1),
(u'PyCD_New', 1),
(u'PyEvent_FillUserEvent', 1),
(u'PyEvent_New2', 1),
(u'PyGSL_PyArray_prepare_gsl_matrix_view', 1),
(u'PyGSL_PyArray_prepare_gsl_vector_view', 1),
(u'PyGSL_check_python_return', 1),
(u'PyGSL_convert_to_gsl_function', 1),
(u'PyGSL_error_flag', 1),
(u'PyGSL_get_error_object', 1),
(u'PyGSL_init_api', 1),
(u'PyGSL_init_errno', 1),
(u'PyGSL_matrix_check', 1),
(u'PyGSL_multimin_f_init', 1),
(u'PyGSL_pdf_dd_to_ui', 1),
(u'PyGSL_pdf_ddd_to_dd', 1),
(u'PyGSL_pdf_uiuiui_to_ui', 1),
(u'PyGSL_pylong_to_ulong', 1),
(u'PyGSL_register_accel_err_object', 1),
(u'PyGSL_rng_dA_to_dA', 1),
(u'PyGSL_rng_dd_to_ui', 1),
(u'PyGSL_rng_ddd_to_dd', 1),
(u'PyGSL_rng_ddd_to_double', 1),
(u'PyGSL_rng_to_ddd', 1),
(u'PyGSL_rng_to_nd', 1),
(u'PyGSL_rng_to_ulong', 1),
(u'PyGSL_rng_ui_to_double', 1),
(u'PyGSL_rng_uidA_to_uiA', 1),
(u'PyGSL_rng_uiuiui_to_ui', 1),
(u'PyGSL_rng_ul_to_ulong', 1),
(u'PyGSL_siman_destroy', 1),
(u'PyGSL_siman_release_x', 1),
(u'PyGame_Video_AutoInit', 1),
(u'PyGame_Video_AutoQuit', 1),
(u'PyGreenlet_GetCurrent', 1),
(u'PyGreenlet_SetParent', 1),
(u'PyImaging_MapperNew', 1),
(u'PyJoystick_New', 1),
(u'PyMOLOptions_Free', 1),
(u'PyMOLOptions_New', 1),
(u'PyMOL_CmdLoad', 1),
(u'PyMOL_ConfigureShadersGL', 1),
(u'PyMOL_Draw', 1),
(u'PyMOL_Free', 1),
(u'PyMOL_GetProgress', 1),
(u'PyMOL_GetProgressChanged', 1),
(u'PyMOL_Idle', 1),
(u'PyMOL_Key', 1),
(u'PyMOL_NewWithOptions', 1),
(u'PyMOL_SetDefaultMouse', 1),
(u'PyMOL_SetInterrupt', 1),
(u'PyMOL_SetSwapBuffersFn', 1),
(u'PyMOL_Special', 1),
(u'PyMOL_Start', 1),
(u'PyMOL_Stop', 1),
(u'PyNamemapper_valueForKey', 1),
(u'PyNetCDFFileObject_dealloc', 1)]
(u'PyNetCDFFileObject_new', 1),
(u'PyNetCDFFile_CreateDimension', 1),
(u'PyNetCDFFile_CreateVariable', 1),
(u'PyNetCDFFile_GetAttribute', 1),
(u'PyNetCDFFile_Sync', 1),
(u'PyNetCDFVariableObject_ass_item', 1),
(u'PyNetCDFVariableObject_item', 1),
(u'PyNetCDFVariable_GetShape', 1),
(u'PyNetCDFVariable_SetAttribute', 1),
(u'PyOpenSSL_LongToHex', 1),
(u'PyOutline_AsOutline', 1),
(u'PyShm_memory', 1),
(u'PyShm_semaphore', 1),
(u'PySip_new', 1),
(u'PySurface_Prep', 1),
(u'PySurface_Unprep', 1),
(u'PyTabprm_clear', 1),
(u'PyUFunc_GenericFunction', 1),
(u'PyUFunc_GenericReduceAt', 1),
(u'PyUnits_clear', 1),
(u'PyWcs_clear', 1),
(u'PyWcs_set_cpdis1', 1),
(u'PyWcs_set_cpdis2', 1),
(u'PyWcs_set_det2im1', 1),
(u'PyWcs_set_det2im2', 1),
(u'PyWcs_set_sip', 1),
(u'PyWcs_set_wcs', 1),
(u'PyWtbarr_clear', 1),
(u'PythonCmd_Error', 1),
(u'PythonData_free', 1),
(u'_PyGSL_register_err_object', 1),
(u'_PyGSL_register_error', 1),
(u'_PyLong_Frexp', 1),
(u'_PylibMC_Deflate', 1),
(u'_PylibMC_DoMulti', 1),
(u'_PylibMC_IncrMulti', 1),
(u'_PylibMC_Inflate', 1),
(u'_PylibMC_Pickle', 1),
(u'_PylibMC_RunCasCommand', 1),
(u'_PylibMC_Unpickle', 1),
////
What are the 10 most complicated functions?
-------------------------------------------
i.e. query by cyclomatic complexity
[source,python]
----
python-4Suite-XML-1.0.2:Ft/Xml/XPath/XPathParser.c:parser_parse: 1601
Cython-0.15.1:Cython/Compiler/Lexicon.c:__pyx_pf_6Cython_8Compiler_7Lexicon_make_lexicon: 627
ntop-4.1.0:http.c:returnHTTPPage: 567
scipy-0.10.0:scipy/spatial/qhull/src/global.c:qh_initflags: 471
scipy-0.10.0:scipy/spatial/qhull/src/global.c:qh_initflags: 471
python-4Suite-XML-1.0.2:Ft/Xml/XPointer/XPointerParser.c:parser_parse: 455
python-zmq-2.1.9:zmq/core/constants.c:PyInit_constants: 428
pymol-1.4.1:layer2/Sculpt.c:SculptMeasureObject: 387
pymol-1.4.1:layer2/RepCylBond.c:RepCylBondNew: 386
pymol-1.4.1:layer2/RepWireBond.c:RepWireBondNew: 325
----
To be fair, many of the above are autogenerated (and going through C
preprocessor)
What did the analyzers complain about?
--------------------------------------
Based on a rebuild of all Fedora 17 source RPMs named "python-*" (113 packages, including python itself):
TODO: redo this for *all* of the Python packages (370)
[source,python]
----
[(('cpychecker', 'refcount-too-high'), 851),
(('cpychecker', 'mismatching-type-in-format-string'), 783),
(('cpychecker', 'returns-NULL-without-setting-exception'), 610),
(('clang-analyzer', None), 540),
(('cppcheck', 'nullPointer'), 427),
(('cppcheck', 'syntaxError'), 382),
(('cpychecker', 'null-ptr-dereference'), 295),
(('gcc', 'strict-aliasing'), 221),
(('cpychecker', 'null-ptr-argument'), 194),
(('cpychecker', 'refcount-too-low'), 193),
(('cpychecker', 'flags-within-PyMethodDef'), 160),
(('gcc', 'unused-but-set-variable'), 153),
(('gcc', None), 85),
(('cpychecker', 'usage-of-uninitialized-data'), 62),
(('gcc', 'unused-variable'), 60),
(('gcc', 'parentheses'), 55),
(('gcc', 'deprecated-declarations'), 42),
(('gcc', 'pointer-sign'), 40),
(('gcc', 'maybe-uninitialized'), 37),
(('cppcheck', 'uninitvar'), 26),
(('gcc', 'format'), 26),
(('cppcheck', 'memleakOnRealloc'), 23),
(('cppcheck', 'memleak'), 19),
(('gcc', 'unused-result'), 16),
(('gcc', 'implicit-function-declaration'), 15),
(('gcc', 'unused-label'), 12),
(('gcc', 'int-to-pointer-cast'), 11),
(('cppcheck', 'resourceLeak'), 8),
(('gcc', 'switch'), 6),
(('cpychecker', 'missing-keyword-argument'), 6),
(('gcc', 'return-type'), 5),
(('gcc', 'address'), 4),
(('cpychecker', 'arg-was-not-PyObject-ptr'), 4),
(('cppcheck', 'uninitstring'), 4),
(('gcc', 'uninitialized'), 4),
(('cpychecker', 'not-enough-vars-in-format-string'), 4),
(('cppcheck', 'useClosedFile'), 4),
(('cpychecker', 'read-from-deallocated-memory'), 3),
(('gcc', 'pointer-to-int-cast'), 3),
(('cppcheck', 'invalidScanfFormatWidth'), 2),
(('cpychecker', 'too-many-vars-in-format-string'), 2),
(('gcc', 'sequence-point'), 2),
(('gcc', 'nonnull'), 2),
(('cpychecker', 'returns-pointer-to-deallocated-memory'), 1),
(('cppcheck', 'deallocDealloc'), 1),
(('cppcheck', 'IOWithoutPositioning'), 1),
(('cpychecker', 'call-of-null-function-ptr'), 1),
(('gcc', 'enum-compare'), 1),
(('gcc', 'unused-value'), 1),
(('cppcheck', 'doubleFree'), 1)]
----
What did cpychecker complain about?
-----------------------------------
[source,python]
----
[('refcount-too-high', 851),
('mismatching-type-in-format-string', 783),
('returns-NULL-without-setting-exception', 610),
('null-ptr-dereference', 295),
('null-ptr-argument', 194),
('refcount-too-low', 193),
('flags-within-PyMethodDef', 160),
('usage-of-uninitialized-data', 62),
('missing-keyword-argument', 6),
('arg-was-not-PyObject-ptr', 4),
('not-enough-vars-in-format-string', 4),
('read-from-deallocated-memory', 3),
('too-many-vars-in-format-string', 2),
('returns-pointer-to-deallocated-memory', 1),
('call-of-null-function-ptr', 1)]
----
Reference-counting bugs
-----------------------
[source,python]
----
[('refcount-too-high', 851),
('refcount-too-low', 193)]
----
Leaving an ob_refcnt too high is thus the most common true bug the tool is
reporting
TODO: where are the leaks happening?
Missing Py_INCREF() on Py_None
------------------------------
(Saw this specific instance of 'refcount-too-low' 14 times on the 113 package run)
////
$ for f in 2013-02-24-fedora-17-mass-run/*/*/*/x86_64/static-analysis/reports/*.xml; do grep 'consider using "Py_RETURN_NONE;"' $f ; done
Ft/Xml/src/domlette/document.c:91:1: note: consider using "Py_RETURN_NONE;"
cupsppd.c:671:1: note: consider using "Py_RETURN_NONE;"
cupsppd.c:649:1: note: consider using "Py_RETURN_NONE;"
cupsppd.c:624:1: note: consider using "Py_RETURN_NONE;"
cupsppd.c:606:1: note: consider using "Py_RETURN_NONE;"
cupsppd.c:581:1: note: consider using "Py_RETURN_NONE;"
cupsppd.c:260:1: note: consider using "Py_RETURN_NONE;"
src/xmlpythonizer.c:674:1: note: consider using "Py_RETURN_NONE;"
src/dmidecodemodule.c:672:9: note: consider using "Py_RETURN_NONE;"
psycopg/connection_type.c:562:1: note: consider using "Py_RETURN_NONE;"
psycopg/connection_type.c:562:1: note: consider using "Py_RETURN_NONE;"
psycopg/connection_type.c:562:1: note: consider using "Py_RETURN_NONE;"
psycopg/connection_type.c:562:1: note: consider using "Py_RETURN_NONE;"
ptl/cimport.c:152:1: note: consider using "Py_RETURN_NONE;"
////
[source,c]
----
PyObject*
some_method(PyObject *self, PyObject *args)
{
[...snip...]
/* BUG: loses a reference to Py_None */
return Py_None;
}
----
[source, bash]
----
$ python script.py
Fatal error: deallocating None
----
Fixing Py_INCREF on Py_None
---------------------------
[source,c]
----
PyObject*
some_method(PyObject *self, PyObject *args)
{
[...snip...]
/* Fixed version of the above: */
Py_RETURN_NONE;
}
----
Reference leak in Py_BuildValue with "O"
----------------------------------------
[source,c]
----
/* BUG: reference leak: */
return Py_BuildValue("Oi", some_object_we_own_a_ref_on, 42);
----
[source,c]
----
/* Fixed version of the above: */
return Py_BuildValue("Ni", some_object_we_own_a_ref_on, 42);
----
[source,c]
----
/* If it's just one object, why use Py_BuildValue? */
return some_object_we_own_a_ref_on;
----
What were those format string errors?
-------------------------------------
a.k.a. the 783 'mismatching-type-in-format-string' errors
By function:
[source,python]
----
[(u'PyArg_ParseTuple', 487),
(u'PyArg_ParseTupleAndKeywords', 170),
(u'Py_BuildValue', 135),
(u'PyArg_Parse', 2)]
----
(in a way this is merely expressing what I'm checking for)
650 errors using PyArg_ParseTuple[AndKeywords]
----------------------------------------------
* 9 errors where "i" was used to unpack a 32-bit int value,
but was passed a 64-bit destination (partial initialization, leading
to possible data corruption)
* 282 "errors" where "s", "s#", "z", "z#" was used to unpack a string object
to a "char \*", rather than a "const char *"
* numerous false alarms for the "O" and "O!" code due to the checker not
knowing about the relevant PyTypeObject
So *most* of these are false alarms
489 places lacking error-checking
---------------------------------
[source,python]
----
[('null-ptr-dereference', 295),
('null-ptr-argument', 194)]
----
Do more error-checking, please!
"goto" considered wonderful
---------------------------
[source,c]
----
{
PyObject *local0 = NULL;
PyObject *local1 = NULL;
PyObject *local2 = NULL;
/* etc */
local0 = PyFoo_DoBar();
if (!local0) goto error;
/* etc */
return result;
error:
Py_XDECREF(local2);
Py_XDECREF(local1);
Py_XDECREF(local0);
return NULL;
}
----
DO NOT DO THIS: Py_XDECREF(PyObject_CallObject()) (1)
------------------------------------------------------
[source,c]
----
/*
This is bogus code: Py_XDECREF expands its argument multiple times,
so the function is actually called up to 4 times
(assuming a non pydebug build of CPython).
*/
Py_XDECREF(PyObject_CallObject(callable, args));
/* Seen in the wild in:
python-alsa-1.0.25-1.fc17:
pyalsa/alsamixer.c:179
pyalsa/alsahcontrol.c:190
pyalsa/alsaseq.c:3277
python-4Suite-XML-1.0.2-14.fc17:
Ft/Xml/src/domlette/refcounts.c:80
*/
----
DO NOT DO THIS: Py_XDECREF(PyObject_CallObject()) (2)
------------------------------------------------------
This expands to:
[source, c]
----
do {
if ((PyObject_CallObject(callable, args)) == ((void *)0))
;
else
do {
if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0)
;
else
(*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
PyObject_CallObject(callable, args);
} while (0);
} while (0);
----
Filed as http://bugs.python.org/issue17206
DO NOT DO THIS: Py_XDECREF(PyObject_CallObject()) (3)
------------------------------------------------------
...which is effectively:
[source, c]
----
/* Call it once */
if ((PyObject_CallObject(callable, args)) != NULL) {
/*
If it doesn't raise an exception, leak the reference (BUG 1),
and call it again (BUG 2).
Assume that the second call doesn't raise an exception,
otherwise segfault the interpreter (BUG 3),
and DECREF the result, but don't deallocate if the refcount
is zero (BUGS 4 and 5)
*/
if (--(PyObject_CallObject(callable, args)->ob_refcnt) == 0) {
/*
If the refcount is zero, call it again! (BUG 6)
Assume the result is non-NULL (otherwise segfaulting, BUG 7)
and deallocate whatever you got back (even if the refcount
is non-zero, BUG 8)
*/
(*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
/* and for good measure, call it agains (BUG 9)
and leak a reference to the result (BUG 10) */
PyObject_CallObject(callable, args);
}
}
----
Filed as http://bugs.python.org/issue17206
The correct way to discard the result
-------------------------------------
[source,c]
----
PyObject *result;
result = PyObject_CallObject(callable, args);
Py_XDECREF(result);
/* Presumably the caller will do something about any exception: */
return (result != NULL) ? 0 : -1;
----
Dealing with C and C++ from Python
----------------------------------
* Do you *really* need C?
- Can you get away with pure Python code?
* Consider using Cython
* ctypes is good, but has its own issues
* cffi?
* If you must use C, run cpychecker on your code
In conclusion
-------------
* Intro to "cpychecker"
* How to run the tool on your own code
* How I ran the tool on *lots* of code
* What bugs came up frequently
* Recommendations on dealing with C and C++ from Python
* Q & A
Q & A
-----
[big]#Thanks for listening!#
image::http://i.creativecommons.org/l/by-sa/3.0/88x31.png[Creative Commons License, link="http://creativecommons.org/licenses/by-sa/3.0/deed.en_US"]
This work is licensed under a http://creativecommons.org/licenses/by-sa/3.0/deed.en_US[Creative Commons Attribution-ShareAlike 3.0 Unported License]
cpychecker's mailing list: https://fedorahosted.org/mailman/listinfo/gcc-python-plugin
Source code for talk: https://github.com/davidmalcolm/PyCon-US-2013-Talk
* Slides built using asciidoc's slidy backend
////
See http://kaczanowscy.pl/tomek/2011-09/nice-presentations-in-no-time-with-asciidoc-and-slidy for help with asciidoc and slidy
asciidoc cheatsheet: http://powerman.name/doc/asciidoc
////