arthursribeiro edited this page Apr 8, 2011 · 17 revisions


Arthur de Souza Ribeiro's Page


Cython's Google Summer of Code Proposal

Project Title: Reimplement C modules in CPython's standard library in Cython.

Student: Arthur de Souza Ribeiro

Organization: Python Software Foundation

Abstract

Cython is a language that makes writing C extensions for the Python language as easy as Python itself. The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code. This makes Cython the ideal language for wrapping external C libraries, and for fast C modules that speed up the execution of Python code.

CPython's C code is hard to read and hard to maintain. Given this, and with the full power of Cython available, the core idea of the project is to rewrite in Cython those modules of CPython's standard library that are currently written in C. This should simplify the implementation, making the code base easier for CPython developers to maintain, and the rewritten modules should aim to be even faster than they are now, to illustrate Cython's optimisation capabilities.

Project Goals

The main project goals are to rewrite C modules in CPython's standard library in Cython and to try to make them even faster than they are now, showing off Cython's optimisation capabilities.

To rewrite a CPython module, you have to see which headers the C files use and remove all the configuration code inside them. Cython makes things much simpler: because it is very similar to Python, the resulting libraries are short and powerful.

To implement the modules in Cython and check that they are correct, I intend to write the Cython code, generate the .so file, and run Python's regression tests against it.
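The "generate the .so" step could be scripted roughly as follows. This is only a sketch, assuming a present-day setuptools and Cython installation; the helper name build_inplace and the file name in the usage note are hypothetical and not part of the proposal.

```python
def build_inplace(pyx_path):
    """Compile one .pyx file into an extension module next to its source."""
    # Imported lazily so the helper can be defined even where Cython is absent.
    from setuptools import setup
    from Cython.Build import cythonize

    # Equivalent to running "python setup.py build_ext --inplace"
    # with a setup script that lists this single .pyx file.
    setup(
        script_args=["build_ext", "--inplace"],
        ext_modules=cythonize(pyx_path),
    )
```

Calling build_inplace("_math_cy.pyx") (a hypothetical file name) would leave the compiled extension next to the source file, ready to be imported by the regression tests.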

For example, CPython has a C module called _math.c. One function in this module is:

double
_Py_asinh(double x)
{
    double w;
    double absx = fabs(x);

    if (Py_IS_NAN(x) || Py_IS_INFINITY(x)) {
        return x+x;
    }
    if (absx < two_pow_m28) {           /* |x| < 2**-28 */
        return x;                       /* return x inexact except 0 */
    }
    if (absx > two_pow_p28) {           /* |x| > 2**28 */
        w = log(absx)+ln2;
    }
    else if (absx > 2.0) {              /* 2 < |x| < 2**28 */
        w = log(2.0*absx + 1.0 / (sqrt(x*x + 1.0) + absx));
    }
    else {                              /* 2**-28 <= |x| <= 2 */
        double t = x*x;
        w = m_log1p(absx + t / (1.0 + sqrt(1.0 + t)));
    }
    return copysign(w, x);

}

After being reimplemented in Cython, this function becomes:

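# Assumes two_pow_m28, two_pow_p28 and ln2 are module-level constants, as in _math.c.
from libc.math cimport fabs, log, log1p, sqrt, copysign

cdef extern from "Python.h":
    bint Py_IS_NAN(double x)
    bint Py_IS_INFINITY(double x)

cpdef double asinh(double x):
    cdef double w, t, absx
    absx = fabs(x)

    # NaN and infinities propagate unchanged
    if Py_IS_NAN(x) or Py_IS_INFINITY(x):
        return x+x
    if absx < two_pow_m28:          # |x| < 2**-28
        return x
    if absx > two_pow_p28:          # |x| > 2**28
        w = log(absx)+ln2
    elif absx > 2.0:                # 2 < |x| <= 2**28
        w = log(2.0*absx + 1.0 / (sqrt(x*x + 1.0) + absx))
    else:                           # 2**-28 <= |x| <= 2
        t = x*x
        w = log1p(absx + t / (1.0 + sqrt(1.0 + t)))
    return copysign(w, x)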

I intend to generate the .so file for this function, use it from Python, and run its tests. If they pass, I assume the module is correct and can replace the old one. For example, in the Python 3.2 source tree the tests live in Lib/test; for the math module there is a file called test_math.py that contains its tests. To run them:

$ ./python -m test -v test_math

If the tests pass against the module I created, I can assume it was produced correctly.
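Before running the full regression suite, a quick sanity check against the existing math module can catch transcription mistakes early. The sketch below is a pure-Python transcription of the asinh algorithm shown earlier, compared with math.asinh on a few samples. The names asinh_reference and agrees_with_stdlib are hypothetical, and the constants are computed here rather than copied from _math.c.

```python
import math

# Constants computed here for illustration; _math.c defines its own literals.
LN2 = math.log(2.0)
TWO_POW_M28 = 2.0 ** -28
TWO_POW_P28 = 2.0 ** 28


def asinh_reference(x):
    """Pure-Python transcription of the asinh algorithm shown above."""
    if math.isnan(x) or math.isinf(x):
        return x + x
    absx = abs(x)
    if absx < TWO_POW_M28:
        return x
    if absx > TWO_POW_P28:
        w = math.log(absx) + LN2
    elif absx > 2.0:
        w = math.log(2.0 * absx + 1.0 / (math.sqrt(x * x + 1.0) + absx))
    else:
        t = x * x
        w = math.log1p(absx + t / (1.0 + math.sqrt(1.0 + t)))
    return math.copysign(w, x)


def agrees_with_stdlib(candidate, samples, rel_tol=1e-9):
    """True if candidate(x) is close to math.asinh(x) for every sample."""
    return all(
        math.isclose(candidate(x), math.asinh(x), rel_tol=rel_tol, abs_tol=0.0)
        for x in samples
    )
```

The same agrees_with_stdlib helper could be pointed at the compiled Cython function once the extension is built, since it only needs a callable that takes a float.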

Another activity I intend to carry out is profiling the code I create. This will be very useful because it measures how efficient the code is and whether it meets expectations. I want to do this in a way similar to how the Cython community compares Cython code with its Python equivalent, so I will discuss with the community how it should be done.
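One simple way to make such a comparison, pending that discussion, is a micro-benchmark with the standard timeit module. The following is a minimal sketch and not the Cython community's own benchmarking method; the helper names best_time_per_call and speedup are hypothetical.

```python
import timeit


def best_time_per_call(func, arg, number=100_000, repeat=5):
    """Return the best observed time, in seconds, for one call of func(arg)."""
    timer = timeit.Timer(lambda: func(arg))
    # Taking the minimum over several runs reduces noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup(baseline, candidate, arg, **kwargs):
    """Return how many times faster candidate is than baseline for func(arg)."""
    return (best_time_per_call(baseline, arg, **kwargs)
            / best_time_per_call(candidate, arg, **kwargs))
```

For example, speedup(math.asinh, cy_asinh, 0.5), where cy_asinh stands for the compiled Cython function, would return a value above 1.0 if the Cython version is faster for that input.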

The process described above is the one I will use to reimplement all the C modules I suggested, as well as those the community chooses from CPython's library.

Roadmap

I'm already programming in Cython, so I won't need to spend much time learning the language specification, and I hope to start the project before the program officially begins. I decided to implement a certain number of modules in each milestone, to make the best use of GSoC time.

  • 1st Milestone (April 26th to May 23rd)

In the first week of this milestone I want to stay in contact with my mentor to define the strategies that will be adopted in the project. During this time, I also intend to study CPython's API in more depth.

After that, since I don't need a long study period to get familiar with Cython, I decided to implement three modules: math, binascii and bisect.

  • 2nd Milestone (May 24th to July 11th)

In this milestone, I expect to make the best possible use of my time (it is the longest milestone) and implement six Python modules: dis, functools, itertools, collections, time and random.

  • Final Milestone (July 12th to Aug 15th)

In the final milestone I am going to implement at least three modules, to be chosen by the Python community to provide maximum value. I will then integrate all the modules into a stable Python version and produce a release that includes the Cython modules.

Progress

The project's progress will be reported weekly on my blog [1]. If allowed, I also intend to post Cython tips (implementation details) drawn from my GSoC work for other users.

Why me?

I am a very dedicated and motivated developer with good knowledge of both Python and C. I really like challenges and I'm looking forward to working hard on this project. I have already started rewriting some code to understand how the work will be done, and in doing so got in contact with the Cython community. I'm very excited to contribute to the Cython community, not only during Google Summer of Code but afterwards too.

Who am I?

My name is Arthur de Souza Ribeiro and I'm a fourth-year Computer Science student at the Federal University of Campina Grande, Brazil. I'm a Python programmer and also have good knowledge of other technologies, such as Java, C, C++, Qt, Grails and ActionScript.

I've already participated in other open source projects, such as the BRisa UPnP framework [2], which has one version written in Python and another in Qt. Beyond that, I've worked on projects involving Python for S60 (Nokia Symbian OS cellphones) and on porting Python applications to Maemo devices such as the Nokia N800. I have also done web programming and built applications that interact with the Twitter platform. I'm experienced with SVN, Mercurial and Git, and I have good knowledge of database systems.

Contact Info

Email: <<MailTo(arthurdesribeiro AT SPAMFREE gmail.com DOT com)>>

Blog: http://arthursribeiro.blogspot.com

Freenode: arthursribeiro

References

[1] - http://arthursribeiro.blogspot.com/

[2] - https://garage.maemo.org/projects/brisa/



