Clone this wiki locally
This is in response to this statement by Robert: -- I think this is good to talk about now. Basically, you want to change Cython to use the "visitor pattern" rather than the recursive pattern that it currently uses. This has pros and cons, but I think this could be a good thing, and you are certainly convinced that its needed to get started on the GSoC stuff. -- So, leaving the patch and implementation strategies aside for now I'm going to attempt discussing this from the mountain top. (If a lot of this seems trivial, all the better -- it can then start to serve as documentation for a direction, rather than my "private" efforts :-). If not, it will be good to discuss it.) I honestly believe that any time spent discussing and implementing this now can repay itself many times later on in saved developer hours and a lower learning curve for new developers. If people agree with me in how important this is, I think it would be good to perhaps spend some time at the beginning of dev1 talking and discussing this (with some slides etc.). Here we go: Assertion 1: Development of Cython has a potential for becoming stagnant (if it hasn't already) quickly. My (subjective) felling is that the codebase has a somewhat high learning curve. If one wants to add a fundamentally new features (as opposed to quick fixes, which do currently happen very quickly -- but think type inference!) it looks like one has to be very careful to keep many different effects of one's changes in the head at the same time. Assertion 2: The reason for this state can at least in some part be attributed to some confusion as to what the current code tree *is*. I believe that the reason that the codebase can be practical in the current state is tightly connected with how similar C and CPython is with Python as a language. The design, so to speak, arose in an historical happy accident, because the input and output are almost 1:1 related. A Python function and a C function loosely correspond, and so FuncDefNode can serve the role as both the parsed element and the element due for C serialization. However there is no reason to assume that the 1:1 correspondance holds everywhere; and I think there is a tendency that "ugly hacks" already tend to arise in situations where the correspondance in program structure diverges more. (To consider how this might become worse, think about a "method template node".) If this was only a theoretically founded critisism one might simply dismiss it, but I believe the problems in assertion 1 comes from this very fact, and that they can be solved by dealing with this conceptual problem. The proposed solution: Work towards "The ideal solution" below in tiny, incremental steps. All *new* major features are implemented with the ideal solution in mind, and old code refactored as far as is needed for that to happen, but one doesn't start rewriting working and bugfixed code just for the sake of abstract purity. The ideal solution: A strict pipeline-based approach where the "data" (one or more trees, graphs or similar) is in different stages, with different phases transforming one into the other. I.e, always try to fit what one does into the following example scheme: data stage:string of code -> phase:parsing -> data stage:syntax tree -> phase:expand with statements -> data stage:tree without with statements -> phase:create scopes -> data stage:tree with scopes -> phase:analyse types -> data stage:tree with ".type" attributes -> ... ... data stage:tree closely related to *structure* of C source -> phase:output Whenever something doesn't quit fit in the scheme of naive input/output, split it up into more phases until it does :-) (Digression: Closely related to C structure above means, for instance, no inner functions. No with statements. No untyped variables. That kind of thing. Classes may be included though, since one can create a 1:1 correspondance between C code using the CPython API and a class; so it is not about language "features" as such but some deeper structure.) Note that all phases does not need to be visitors! For instance, the output-to-C phase will stay a recursive process for the foreseeable future. This is a way of thinking, not a recipe for implementation. (And indeed, this way of thinking is already followed to a degree -- but not everywhere, and we get problems at the spots where it is not.) A very important side-effect of this is that it lends itself to "implementing complex Cython statements as more simple Cython statements". (Though ideally "more simple Cython statements" can also be "intermediate simpler instructions" that bridges the gap between Cython and C in a more fine-grained way in order to give more code reuse than there is today. But I digress.) (Implementation notes: This does not necesarrily mean that the classes of the nodes changes between phases; one does whatever is most convenient. For instance, if one has T2 = expand_with_nodes(T1), then T1 is allowed to keep WithStatementNode, T2 is however not, but otherwise the trees are the same. In the same way, if T1 = type_analysis(T0), then ExprNodes in T0 is not allowed to have the type attribute, while ExprNodes in T1 are required to have them. This can be a documentation matter or enforced (in a number of ways), that issue is better left for later.). Final words: This might seem trivial. It is not. The reason is that currently, almost wherever you try to implement two simple phases f and g one ends up having the result of f depend on what is done in g for some nodes, and the result of g depend on f for other nodes. And so neither f(g(T)) nor g(f(T)) can be made to work. The point is, this way of changing how one thinks mostly affects the compiler design as such, more than it is a specific style of coding. Visitors happen to be the most natural way to implement this, but are not the main point. I'd be happy (or, happier than I am today) with a cleanly defined phase-based recursive process. -- Dag Sverre
Comments in retrospect
The point about stagnation was a bit harsh and probably wrong.