This page lists suggestions for smaller projects involving Cython, suitable for university assignments or similar. If you are interested, mail the Cython developer mailing list or Dag Seljebotn: email@example.com
All tasks will provide experience with Python, test-driven development, and working within and interacting with an open source community.
1. Test framework for directory tree compilation
- Difficulty: Normal
- Type of task: Utility scripting, testing
- Skills required: Basic Python, carefulness
- Cython has lots of tests, but they focus on the Cython language and compiler themselves. Additions to the test framework must be made to regression test things like:
- Cython being able to find include files in their proper location (with directories searched in the appropriate order according to different sets of include paths).
- That different compilation methods (single file vs. multiple file) all give correct results.
- That one module written in Cython is callable from another one.
The target objective is a test framework, meaning convenience utilities for writing the actual tests. The problem with the current test framework in this area is that for each single test a full directory tree would have to be constructed manually, and a small variation would then need a duplicate of that tree. This is too much of a hassle, so such tests just aren't written. What is needed is a simple file format for specifying all the necessary directories and file variations for a test suite in a single file (Dag Seljebotn has some ideas for the contents if anybody is interested).
- Suggested plan:
  1. Think about and describe (in cooperation with Cython developers) how tests should be written using the framework.
  2. Write such a test, testing something simple.
  3. Write a script which generates an appropriate temporary directory structure from the test definition file.
  4. Integrate this with the existing test runner (simply feed the directory structure to it, and it takes care of compilation, running any doctests, etc.).
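Step 3 of the plan above could start from something like the following minimal sketch. The single-file format used here (a `######## path ########` line opening each file) and the names `parse_tree_spec`, `write_tree` and `SPEC` are assumptions made purely for illustration; the actual format is still to be agreed with the Cython developers.

```python
import os
import tempfile

MARKER = "########"  # assumed delimiter; the real file format is still to be designed


def parse_tree_spec(text):
    """Split a test definition into a {relative_path: file_contents} dict."""
    files = {}
    current = None
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith(MARKER) and stripped.endswith(MARKER):
            name = stripped[len(MARKER):-len(MARKER)].strip()
            if len(stripped) > 2 * len(MARKER) and name:
                # A header line starts a new file.
                current = name
                files[current] = ""
                continue
        if current is not None:
            files[current] += line
    return files


def write_tree(files, root):
    """Create every file (and its parent directories) below root."""
    for relpath, contents in files.items():
        target = os.path.join(root, *relpath.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            f.write(contents)


# Hypothetical test definition describing three files in two directories.
SPEC = """\
######## pkg/a.pyx ########
cimport b
######## pkg/b.pxd ########
cdef int helper()
######## include/common.pxi ########
DEF LIMIT = 10
"""

root = tempfile.mkdtemp(prefix="cython_tree_test_")
write_tree(parse_tree_spec(SPEC), root)
for dirpath, _, filenames in os.walk(root):
    for filename in sorted(filenames):
        print(os.path.relpath(os.path.join(dirpath, filename), root))
```

The generated directory would then be handed to the existing test runner (step 4), and a small variation of a test only needs a changed copy of the definition text rather than a duplicated tree.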
2. Support for complex floating point datatype
- Difficulty: Normal
- Type of task: Integrating a new feature in an existing program (50 000 lines)
- Skills required: Good Python and C knowledge, ability to understand a large program structure without taking in all the details
Cython recently got more features for numerical computations. Unfortunately, there's a big hole in Cython's abilities: There's no convenient builtin support for complex datatypes (as in complex numbers). One can use the Python complex objects, but they are way too slow for numerical purposes.
The end-goal is that code like this:
    #!python
    cdef complex double x = 3.0 + 4j, y = 1 - 1j, z
    z = x * y
results in efficient C code.
- Suggested plan:
  1. Write some very simple C code defining complex datatype structs (containing two floats) and some C macros for doing the arithmetic operations with these. Note that we want to support non-C99 compilers, so you cannot rely on the C complex type (though you are welcome to add this in addition).
  2. Write a test case (like the above code), using perhaps only + at first. It will of course fail to compile.
  3. Add a complex datatype to PyrexTypes.py.
  4. Add support for the new datatype in Parsing.py.
  5. Have a look at ExprNodes.py. It should no longer construct a Python object directly; instead it should construct an object of the type added in step 3. This should follow the pattern in ExprNodes.py to add complex types as a case and call the appropriate arithmetic macros.
     - To do this, one must add "type coercion" to correctly coerce the new complex type to Python complex float objects.
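Before writing the C macros of step 1, it may help to have a small reference model to check test results against. The sketch below is plain Python, not the C code itself; the names `Complex`, `c_add`, `c_mul` and `to_python` are hypothetical and only mirror what a two-float struct, its arithmetic macros and the coercion to a Python object would compute.

```python
from collections import namedtuple

# Hypothetical stand-in for the C struct: two floating point members.
Complex = namedtuple("Complex", ["real", "imag"])


def c_add(a, b):
    """Component-wise addition, as an addition macro would do it."""
    return Complex(a.real + b.real, a.imag + b.imag)


def c_mul(a, b):
    """(a.real + a.imag*i) * (b.real + b.imag*i), expanded using i*i == -1."""
    return Complex(a.real * b.real - a.imag * b.imag,
                   a.real * b.imag + a.imag * b.real)


def to_python(a):
    """Models the coercion from the struct to a Python complex object."""
    return complex(a.real, a.imag)


# The values from the example above: x = 3.0 + 4j, y = 1 - 1j, z = x * y
x = Complex(3.0, 4.0)
y = Complex(1.0, -1.0)
z = c_mul(x, y)

# Python's builtin complex type serves as the oracle for the expected result.
assert to_python(z) == (3.0 + 4j) * (1 - 1j)
assert to_python(c_add(x, y)) == (3.0 + 4j) + (1 - 1j)
```

Using Python's own complex arithmetic as the oracle keeps the expected values out of the test file, which is convenient when more operations are added later.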
3. Possible variable value analysis
- Difficulty: Ambitious (but fun!)
- Type of task: Create elegant and isolated algorithm and code
- Skills required: Good Python knowledge, good problem solving skills
(Also known as flow control analysis.)
Several optimizations can be done automatically if one infers certain things from the code. Below we will focus on whether a variable can be known not to be set to None; if so, the check for None can be dropped in the C code on attribute access.
The only kind of statement made by such analysis is "it is known that this variable is not None at this location". If nothing is known at a certain location, no harm is done (the generated code just runs a little bit slower). Therefore all such analysis is on a best-effort basis.
Here's a code example, and the comments indicate what could be inferred by your code:
    #!python
    def f(arg):
        if arg is not None:
            print arg.x  # arg cannot be None
        arg = some()
        try:
            print arg.x  # here arg can be None again as it was assigned to
            print arg.y  # arg cannot be None (as that would raise an exception on previous line)
        except:
            print arg.x  # arg can be None
        print arg.x  # arg cannot be None, whether or not an exception was raised
The kind of code you need to write is a recursive algorithm working on a tree representing the code. It should work "from the top and downwards", in some ways simulating an interpreter, and record what it knows along the way (e.g. "what does this statement mean for the next line to be executed", "what does this statement mean for the next except block" and so on).
You should probably ask for more details and have an idea of how to proceed before picking this task! Also, there are existing algorithms for this kind of thing which can be adapted (though we would rather have something simple that covers 70% of the cases now than something perfect that covers everything in a year!)
- Suggested plan:
  1. Write a very simple unit test. You write some code like the above, and give the existing test framework the test code and your algorithm as input. Then you write code to validate that the right things were detected by your (not yet existing) algorithm about variables (by inspecting the returned tree).
  2. Write an algorithm which makes the unit test case succeed.
  3. Try to break it with a new difficult test case. It is OK for the algorithm to gracefully give up in a lot of cases and say "we don't know", but it should never guarantee things which cannot be guaranteed!
  4. Repeat with more sophisticated cases (with try-except statements, more complex if-expressions, more than one block inside an if-test and merging what is known when completing the blocks, and so on).
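To give a feel for the shape of such an algorithm, here is a rough sketch that runs on Python's standard `ast` module rather than on Cython's own parse tree, which is an assumption made only to keep the example self-contained. All function names are hypothetical. It handles plain statements, `if` and simple `try`/`except`, gives up (forgets everything) on any other statement, and ignores rebinding through `global` or `nonlocal`.

```python
import ast

# Expressions whose parts may be skipped or run later; nothing is learned from them.
LAZY = (ast.BoolOp, ast.IfExp, ast.Lambda, ast.ListComp, ast.SetComp,
        ast.DictComp, ast.GeneratorExp)


def stored_names(nodes):
    """Names that are assigned to or deleted anywhere below the given nodes."""
    return {n.id for node in nodes for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, (ast.Store, ast.Del))}


def not_none_test(test):
    """Return the name if test is exactly `name is not None`, else None."""
    if (isinstance(test, ast.Compare) and isinstance(test.left, ast.Name)
            and len(test.ops) == 1 and isinstance(test.ops[0], ast.IsNot)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value is None):
        return test.left.id
    return None


def simple(node, known, report):
    """Record each `name.attr` access in node and return what is known afterwards."""
    lazy = any(isinstance(n, LAZY) for n in ast.walk(node))
    stored = stored_names([node])
    learned = set()
    for n in ast.walk(node):
        if isinstance(n, ast.Attribute) and isinstance(n.value, ast.Name):
            name = n.value.id
            safe = not lazy and name in known and name not in stored
            report.append((n.lineno, name, safe))
            if not lazy:
                learned.add(name)  # the access would have raised on None
    return (known | learned) - stored


def analyze(stmts, known, report):
    """Walk statements top-down and return the names known not to be None at the end."""
    known = set(known)
    for stmt in stmts:
        if isinstance(stmt, ast.If):
            known = simple(stmt.test, known, report)
            name = not_none_test(stmt.test)
            then_known = analyze(stmt.body, known | ({name} if name else set()), report)
            else_known = analyze(stmt.orelse, known, report)
            known = then_known & else_known  # merge: keep what both branches agree on
        elif isinstance(stmt, ast.Try) and not stmt.orelse and not stmt.finalbody:
            # An exception may occur anywhere in the body, so a handler only
            # knows what held before the try and was not reassigned in it.
            entry = known - stored_names(stmt.body)
            ends = [analyze(stmt.body, known, report)]
            ends += [analyze(h.body, entry - {h.name}, report) for h in stmt.handlers]
            known = set.intersection(*ends)
        elif isinstance(stmt, (ast.Expr, ast.Assign, ast.Return)):
            known = simple(stmt, known, report)
        else:
            known = set()  # give up: "we don't know"
    return known


SOURCE = """\
def f(arg):
    if arg is not None:
        print(arg.x)
    arg = some()
    try:
        print(arg.x)
        print(arg.y)
    except Exception:
        print(arg.x)
    print(arg.x)
"""

report = []
analyze(ast.parse(SOURCE).body[0].body, set(), report)
for lineno, name, safe in report:
    print(lineno, name, "cannot be None" if safe else "may be None")
```

Each `report` entry is `(line, name, known_not_none)`. The sample is the example from above rewritten with `print()` calls so that the standard parser accepts it. A unit test as in step 1 would compare `report` against the expectations written in the comments of that example.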
4. Unit test and replace the command line parser
- Difficulty: Easy
- Type of task: Code cleanup and testing
- Skills required: Python, perhaps some API design skills
- There are two things that could use some cleaning up:
  - Main.py under some circumstances calls CmdLine.py to parse a command line, and under other circumstances (when used as a library) does not. This API is a bit unclean. Instead one should improve the API for using Cython as a library (i.e. on a "build this file and this file with these options" level) and then write CmdLine.py as a standalone, isolated client of this library.
  - CmdLine.py does not use the Python optparse module for parsing command line arguments, while it probably should.
In order to do this safely (and not have hundreds of users suddenly discover that their favorite command line no longer works) one should first write unit tests which test all of the existing command line parsing code thoroughly.
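As a rough illustration of where this could end up, the sketch below puts an optparse-based parser behind a single function that takes an argument list and returns plain values, so that it can be unit tested without running the compiler. The option set and the names `build_parser` and `parse_command_line` are assumptions for illustration only and do not describe the current contents of CmdLine.py.

```python
from optparse import OptionParser


def build_parser():
    # Illustrative options only; the real list must be taken from CmdLine.py.
    parser = OptionParser(usage="%prog [options] sourcefile...")
    parser.add_option("-I", "--include-dir", dest="include_dirs",
                      action="append", default=[],
                      help="directory to search for include files")
    parser.add_option("-o", "--output-file", dest="output_file",
                      help="name of the generated C file")
    parser.add_option("-v", "--verbose", dest="verbose",
                      action="store_true", default=False,
                      help="be verbose")
    return parser


def parse_command_line(argv):
    """Return (options, sources) without reading sys.argv or compiling anything."""
    # A fresh parser per call, so the default list above is never shared.
    return build_parser().parse_args(argv)


def test_include_dirs_keep_their_order():
    options, sources = parse_command_line(["-I", "a", "-Ib", "mod.pyx"])
    assert options.include_dirs == ["a", "b"]
    assert sources == ["mod.pyx"]


def test_defaults():
    options, sources = parse_command_line(["mod.pyx", "other.pyx"])
    assert options.include_dirs == []
    assert options.output_file is None
    assert options.verbose is False
    assert sources == ["mod.pyx", "other.pyx"]


test_include_dirs_keep_their_order()
test_defaults()
```

Tests of this kind would first be written against the existing parsing code, so that the same assertions can confirm that the replacement behaves identically.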