Have a clear interface to and from sympy, deal with function calls in a language-independent way #51

Closed
mstimberg opened this issue Jun 11, 2013 · 10 comments

@mstimberg
Member

We need to make clear which strings are interpreted by sympy and which are not. Most of the time, using sympy and executing the string in Python/C give the same result, but there are important differences, e.g. integer fractions (see #38).
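As a rough illustration of the integer-fraction difference, here is a minimal standard-library sketch. It is only an analogy: `fractions.Fraction` stands in for the exact rationals that sympy produces, and `//` stands in for C-style integer division (the two agree for positive operands).

```python
from fractions import Fraction

# Fraction is a stand-in for sympy's exact rationals (assumption for illustration)
exact = Fraction(1, 3) * 3      # exact arithmetic keeps the fraction: 1

# // mimics what C does with integer operands (for positive numbers)
truncated = (1 // 3) * 3        # 1 // 3 is 0, so the product is 0

print(exact, truncated)
```

The same string, `1/3*3`, therefore means different things depending on who interprets it.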

In principle we want to use sympy only for "mathematics" (differential equations, state updater descriptions), not for "programming" (reset statements, pre/post codes). On the other hand, using a sympy expression allows for using the CCode printer, which replaces some function calls and ** by pow. But we also already have a mechanism in place that deals with functions directly.

Proposed procedure:

  • Have functions that convert between "mathematical strings" and sympy objects. These functions should always be used; never use "sympify" or "str(sympy_object)" directly. This is mostly done already, see parse_to_sympy and sympy_to_str in brian2.codegen.parsing.
  • We don't use the Python/C code printer from sympy, instead functions are translated via the Function system in code generation (e.g. in C we would rather use #define abs fabs instead of using sympy to replace abs calls by fabs)
  • We need a way to add support for new languages to this system, i.e. the system should be mostly language-agnostic. Python has a special role: we use the Python function for unit checking (although we could avoid this by having an explicit mechanism for unit checking instead).

The only thing I'm not clear about is the translation of ** -- maybe we simply require the use of a power function? Note that this is not as bad as it sounds: it would not affect equations (there we could even support ^) but only abstract code, where it will not be used often anyway.
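The language-agnostic function translation from the list above could look roughly like the following sketch. The registry and the helper name are hypothetical and not taken from brian2; the point is only that supporting a new language means adding entries rather than code.

```python
# Hypothetical registry: abstract function name -> name in each target language.
# None of these identifiers come from brian2 itself.
FUNCTION_NAMES = {
    'abs': {'python': 'abs', 'c': 'fabs'},
    'exp': {'python': 'exp', 'c': 'exp'},
}

def translate_function(name, language):
    """Return the target-language name, falling back to the abstract name."""
    return FUNCTION_NAMES.get(name, {}).get(language, name)

print(translate_function('abs', 'c'))       # fabs
print(translate_function('abs', 'python'))  # abs
```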

@ghost ghost assigned mstimberg Jun 11, 2013
@thesamovar
Member

I don't think we should require use of a power function - we have to find a way to make ** work even if it means getting dirty and using AST etc. Also, translation to C is a bit more work than just changing function names. There's also things like logical operators: a and b would become a&&b as would a&b, logical_and(a, b), etc. I think sympy worked hard to include all of these. I wonder if there is a way to make sympy not do any rearrangements or simplifications? If so, we could put abstract code through with no problems I guess?

Incidentally, I'm beginning to have doubts about the use of #define in our C code. I had kind of forgotten that when you #define it changes that symbol everywhere, not just in the current source file. I'm not sure how important this is, but we might want to bear it in mind. I came across the problem in the standalone stuff because second is used a lot in the STL, and when I did #define second 1.0 it messed everything up.

@mstimberg
Member Author

I don't think we should require use of a power function - we have to find a way to make ** work even if it means getting dirty and using AST etc.

Well, I don't see a strong need for it, but probably you are right.

Also, translation to C is a bit more work than just changing function names. There's also things like logical operators: a and b would become a&&b as would a&b, logical_and(a, b), etc. I think sympy worked hard to include all of these.

Um, I don't think sympy has support for all that, actually. AFAICT you can only use a & b or And(a, b) in sympy expressions. And in general we currently always work with multiplication instead of an and operator -- we don't have a bool type for state variables, anyway. I therefore don't think we can use sympy for these kinds of operations. Say we have an expression a & b: this is parseable by sympy and we can use the C code printer to convert it into a && b. But what about Python code? Using the string representation here will lead to And(a, b). We cannot use the Python code printer, as it is meant to produce Python code that evaluates to the same sympy expression. In this case it would print:

a = Symbol('a')
b = Symbol('b')
e = And(a, b)

So I think there are three ways to deal with this issue:

  • The current approach: Try to use a subset of Python/C syntax that the two languages have in common, e.g. use multiplication instead of an and operation, and specify a set of functions that can be used.
  • A new approach: Use sympy for everything and define our own Python/C printer.
  • Another new approach: Explicitly define our own mini language (basically using Python syntax of course, but a bit more restricted) and parse this ourselves.

The second option is considerably more work than the current approach (and potentially ties our implementation strongly to sympy, e.g. stuff might break when sympy updates) but would allow us to directly translate functions and operators without any #define hackery. A small detail: If we want abstract code (or rather the right-hand-side of abstract code statements) to be parseable with sympy, we'll always have to deal with the rand() function before that.

The third approach is probably the most work (if we want to get it right), but maybe feasible building on the existing python AST parsing. Creating C or Python code from a parse tree representation would then be quite straightforward, even when it involves replacing ** by power, etc. and rand() wouldn't be a problem. The advantage would be that we have complete control and do not rely on sympy not changing (on the other hand, we'll continue to use sympy for mathematical statements, anyway...).
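To make the third approach more concrete, here is a minimal sketch built on Python's standard `ast` module. The class and function names (`CRenderer`, `to_c`) are hypothetical, and only a handful of node types are covered; everything else is rejected explicitly.

```python
import ast

class CRenderer(ast.NodeVisitor):
    """Hypothetical renderer: Python expression AST -> C-like string."""
    binops = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        return repr(node.value)

    def visit_BinOp(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        if isinstance(node.op, ast.Pow):
            # ** has no C equivalent, so emit a function call instead
            return 'pow(%s, %s)' % (left, right)
        if type(node.op) not in self.binops:
            raise SyntaxError('Unsupported operator: %s' % type(node.op).__name__)
        return '(%s %s %s)' % (left, self.binops[type(node.op)], right)

    def visit_BoolOp(self, node):
        op = ' && ' if isinstance(node.op, ast.And) else ' || '
        return '(%s)' % op.join(self.visit(value) for value in node.values)

    def visit_Call(self, node):
        args = ', '.join(self.visit(arg) for arg in node.args)
        return '%s(%s)' % (self.visit(node.func), args)

    def generic_visit(self, node):
        # Anything not handled above is outside the supported mini language
        raise SyntaxError('Unsupported syntax: %s' % type(node).__name__)

def to_c(expr):
    return CRenderer().visit(ast.parse(expr, mode='eval'))

print(to_c('a + b*c + d(e) + f**g'))  # (((a + (b * c)) + d(e)) + pow(f, g))
print(to_c('rand() + x'))             # (rand() + x)
```

Because the expression is only parsed and never evaluated, calls such as rand() need no special treatment here.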

@thesamovar
Member

I'm beginning to think that it might not be so difficult to implement our own mini language actually. Here is a sample implementation that converts ** to pow. It doesn't do everything, but already quite a bit. Note that it introduces lots of unnecessary parentheses, but they are also harmless. This could also be prettified.

import re

def get_identifiers(expr):
    return set(re.findall(r'\b[A-Za-z_][A-Za-z0-9_]*\b', expr))

class Var(object):
    def __init__(self, name):
        self.name = name
        self.items = []
    def __mul__(self, other):
        return MulVar(self, other)
    def __add__(self, other):
        return AddVar(self, other)
    def __pow__(self, other):
        return PowVar(self, other)
    def __call__(self, *args):
        return FuncVar(self, *args)
    def __str__(self):
        return self.name

class OperatorVar(Var):
    def __init__(self, left, right):
        self.items = [left, right]
    left = property(fget=lambda self: self.items[0])
    right = property(fget=lambda self: self.items[1])
    def __str__(self):
        return '(%s)%s(%s)'%(str(self.left), self.op, str(self.right))

class AddVar(OperatorVar):
    op = '+'

class MulVar(OperatorVar):
    op = '*'

class PowVar(OperatorVar):
    op = '**'

class FuncVar(Var):
    def __init__(self, func, *args):
        self.items = [func]+list(args)
    func = property(fget=lambda self: self.items[0])
    args = property(fget=lambda self: self.items[1:])
    def __str__(self):
        argslist = ', '.join(str(arg) for arg in self.args)
        return '%s(%s)'%(str(self.func), argslist)

def parse(expr):
    varnames = get_identifiers(expr)
    ns = dict((varname, Var(varname)) for varname in varnames)
    return eval(expr, ns)

def replace_pow(var):
    # Recurse first so that nested powers such as a**(b**c) are rewritten too
    var.items = [replace_pow(item) for item in var.items]
    if isinstance(var, PowVar):
        var = FuncVar(Var('pow'), var.left, var.right)
    return var

x = parse('a+b*c+d(e)+f**g')
print(x)
print(replace_pow(x))

The result is:

(((a)+((b)*(c)))+(d(e)))+((f)**(g))
(((a)+((b)*(c)))+(d(e)))+(pow(f, g))

This is based on how sympy works internally, but as you say if we have our own system we're not dependent on sympy version changes and we have more control to do what we want. What do you think?

@mstimberg
Member Author

I'm tending towards this approach as well. I'm a bit hesitant regarding the eval approach for parsing (sympy does not do this); it feels as if this could go very wrong for unsupported syntax constructs. I did not find any concrete example where it could fail, though ;-) At first I thought of issues like using a string such as a or b, which would be directly evaluated by Python to True or False, but you can handle this case by overwriting __or__. So I guess by overwriting all the special functions this could actually work and we could give meaningful error messages (e.g. with your example, writing a[:5] + b[:5] would already fail because Var does not have a __getitem__ method, though the error message wouldn't be very helpful).
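As a sketch of what "meaningful error messages" could mean for the eval approach: reading a[:5] goes through __getitem__, so a stand-in Var class can define that method purely to reject the syntax. The class below is a simplified, hypothetical version of Var, and the message text is made up.

```python
class Var(object):
    """Simplified stand-in for the Var class from the earlier snippet."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Var('(%s)+(%s)' % (self.name, other.name))

    def __getitem__(self, index):
        # Indexing is not part of the supported syntax: fail with a clear message
        raise SyntaxError('Indexing is not supported in expressions '
                          '(used on variable "%s")' % self.name)

def parse(expr, varnames):
    ns = dict((name, Var(name)) for name in varnames)
    return eval(expr, ns)

message = None
try:
    parse('a[:5] + b[:5]', ['a', 'b'])
except SyntaxError as ex:
    message = str(ex)
print(message)
```

Without the __getitem__ override, the same input fails with a generic TypeError about the object not being subscriptable.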

The good thing is that we would be very explicit about what operations we support (instead of saying something like: "everything that sympy understands + rand() and randn()") and this would also be tied to the function mechanism in code generation, so we can do the function name translations for C without using #define. The function mechanism still needs a bit of thought, though. Probably the function that goes from the parse tree representation to programming language code needs also access to the namespace/specifiers to get the information about the functions? And the ** operator would be treated like a function?

@mstimberg
Member Author

Um, actually and and or don't go through __and__ and __or__ (& and | do), so we can't really catch this issue directly. But I guess implementing __nonzero__ to catch this situation and raise an error still allows us to handle this situation gracefully.
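A minimal sketch of this idea, again with a simplified, hypothetical Var class. Note that the hook is called __bool__ in Python 3; __nonzero__ is its Python 2 name, so the sketch defines both.

```python
class Var(object):
    def __init__(self, name):
        self.name = name

    def __and__(self, other):
        return Var('(%s) & (%s)' % (self.name, other.name))

    def __or__(self, other):
        return Var('(%s) | (%s)' % (self.name, other.name))

    def __bool__(self):
        # "a and b" asks for the truth value of a, which ends up here
        # (so do "not a" and "if a:")
        raise SyntaxError('"and" and "or" are not supported, use & and | instead')

    __nonzero__ = __bool__  # Python 2 name for the same hook

a, b = Var('a'), Var('b')
combined = (a & b).name

error = None
try:
    a and b
except SyntaxError as ex:
    error = str(ex)

print(combined)
print(error)
```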

@thesamovar
Member

It's a shame we can't use and and or, but as you say, we can raise an error and tell people to use & and | so it's not too bad.

I also like that it's really explicit and lets us control it. Another benefit is that we can do Java output which isn't supported by sympy I think.

I could expand the snippet above into a little module, probably to be included in codegen somewhere. I'll start an issue for it and we can write a list of requirements for the system before I go ahead. I'll write more there.

@mstimberg
Member Author

It's a shame we can't use and and or, but as you say, we can raise an error and tell people to use & and | so it's not too bad.

Actually, preventing users from using and and or is more a feature than a bug: we then don't have to deal with Python's peculiar interpretation of non-boolean values... Stuff like 2 < 3 and 4 returning 4, and 2 < 3 or 4 returning True, can be quite confusing.
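The two cases can be checked directly in plain Python:

```python
# "and"/"or" return one of their operands rather than a strict boolean
result_and = 2 < 3 and 4   # (2 < 3) is True, so "and" yields its right operand
result_or = 2 < 3 or 4     # (2 < 3) is True, so "or" stops and yields it

print(result_and)  # 4
print(result_or)   # True
```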

@thesamovar
Member

Agreed!

@mstimberg
Member Author

Ok, I added a SympyNodeRenderer to the AST parser (in the syntax_translation branch), we now use a common syntax for equations and abstract code statements -- nice! The way code is generated might seem to be a bit complicated, but I think it is a fairly robust and nicely testable way. It now goes something like:

  1. Equation string is parsed via the ast_parser, resulting in a new string ('v / tau' --> 'Symbol("v") / Symbol("tau")')
  2. This string is evaluated in a sympy namespace, leading to a sympy object
  3. State update does its magic, juggling around with the sympy object
  4. The sympy object is translated back to a string to generate the abstract code
  5. Code generation uses the ast_parser to translate the abstract code string into programming language code

Steps 1 + 2 are handled by str_to_sympy, step 4 is sympy_to_str. In the long term, all of this needs a lot more testing; your syntax translation tests already revealed some problems (that are now fixed):

  • Equalities and inequalities are tricky in sympy, e.g. sympify('v != 0') will simply be True, since "v and 0 are not the same thing". The AST parsing now handles this correctly, i.e. as Ne(v, 0) in sympy terms.
  • The standard StrPrinter (that is also invoked when doing str(sympy_expression)) does not handle and/or/not very nicely: str(str_to_sympy('a & b ')) --> And(a, b), this is taken care of via a new printer class derived from StrPrinter that is used in sympy_to_str: sympy_to_str(str_to_sympy('a & b')) --> (a) and (b)

The parsing currently treats all numbers in equations as floats, I think that is the safest option for now.
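Step 1 of the pipeline, including the != handling and the float treatment of numbers, can be sketched with the standard `ast` module. This is a hypothetical illustration and not the actual SympyNodeRenderer: it only produces the string, adds extra parentheses, and leaves out the evaluation in a sympy namespace (step 2).

```python
import ast

class SympyStringRenderer(ast.NodeVisitor):
    """Hypothetical sketch: expression string -> string of sympy constructor calls."""
    binops = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/', ast.Pow: '**'}
    comparisons = {ast.Eq: 'Eq', ast.NotEq: 'Ne', ast.Lt: 'Lt',
                   ast.LtE: 'Le', ast.Gt: 'Gt', ast.GtE: 'Ge'}

    def _lookup(self, table, op):
        if type(op) not in table:
            raise SyntaxError('Unsupported operator: %s' % type(op).__name__)
        return table[type(op)]

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Name(self, node):
        return 'Symbol("%s")' % node.id

    def visit_Constant(self, node):
        # Every number is rendered as a float
        return 'Float(%r)' % float(node.value)

    def visit_BinOp(self, node):
        op = self._lookup(self.binops, node.op)
        return '(%s %s %s)' % (self.visit(node.left), op, self.visit(node.right))

    def visit_Compare(self, node):
        if len(node.ops) != 1:
            raise SyntaxError('Chained comparisons are not supported')
        func = self._lookup(self.comparisons, node.ops[0])
        # Emit an explicit relational object instead of Python's != / == result
        return '%s(%s, %s)' % (func, self.visit(node.left),
                               self.visit(node.comparators[0]))

    def generic_visit(self, node):
        raise SyntaxError('Unsupported syntax: %s' % type(node).__name__)

def to_sympy_string(expr):
    return SympyStringRenderer().visit(ast.parse(expr, mode='eval'))

print(to_sympy_string('v / tau'))  # (Symbol("v") / Symbol("tau"))
print(to_sympy_string('v != 0'))   # Ne(Symbol("v"), Float(0.0))
```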

@thesamovar
Member

Nice! Agreed on numbers in equations being floats.
