New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython compilation error for floating point arguments to range (in synapse generation) #781

Closed
mstimberg opened this Issue Nov 18, 2016 · 15 comments

Comments

Projects
None yet
3 participants
@mstimberg
Member

mstimberg commented Nov 18, 2016

Code like the following will lead to a compilation error in Cython:

S.connect(j='k for k in range(i-3., i+4.) if i!=k', skip_if_invalid=True)
Error compiling Cython file:
------------------------------------------------------------
...
    for _i in range(N_pre):
        _raw_pre_idx = _i + _source_offset

                
        i = _array_neurongroup_i[_raw_pre_idx]
        _iter_high = 4.0 + i
                        ^
------------------------------------------------------------

/home/marcel/.cython/brian_extensions/_cython_magic_fa12725a6193fc9a6e17c2d0896ab48f.pyx:170:25: Cannot assign type 'double' to 'int32_t'

Error compiling Cython file:
------------------------------------------------------------
...

                
        i = _array_neurongroup_i[_raw_pre_idx]
        _iter_high = 4.0 + i
        _iter_step = 1
        _iter_low = (-3.0) + i
                          ^
------------------------------------------------------------

/home/marcel/.cython/brian_extensions/_cython_magic_fa12725a6193fc9a6e17c2d0896ab48f.pyx:172:27: Cannot assign type 'double' to 'int32_t'

We could either use our AST parsing system to raise an argument if the arguments are not integers, or simply force a conversion to integer (which is what e.g. the C++ code is implicitly doing now).

@thesamovar

This comment has been minimized.

Show comment
Hide comment
@thesamovar

thesamovar Nov 18, 2016

Member

I would tend to say raise an error: it's easy to wrap with an int if you want to.

Member

thesamovar commented Nov 18, 2016

I would tend to say raise an error: it's easy to wrap with an int if you want to.

@dsprenkels

This comment has been minimized.

Show comment
Hide comment
@dsprenkels

dsprenkels Feb 8, 2017

Contributor

I would like to try and fix this bug. I think it is important that all runtimes behave in the same way, so there are two obvious fixes:

  1. Implicitly cast floating point types to integers in the cython backend.
  2. Throw an error and also change the other backends to throw an error.

Option 2 will change the API and could maybe break existing production code. So my preference would be to implement 1. (In this case, printing a warning is an additional option.)

Contributor

dsprenkels commented Feb 8, 2017

I would like to try and fix this bug. I think it is important that all runtimes behave in the same way, so there are two obvious fixes:

  1. Implicitly cast floating point types to integers in the cython backend.
  2. Throw an error and also change the other backends to throw an error.

Option 2 will change the API and could maybe break existing production code. So my preference would be to implement 1. (In this case, printing a warning is an additional option.)

@mstimberg

This comment has been minimized.

Show comment
Hide comment
@mstimberg

mstimberg Feb 8, 2017

Member

Re-thinking about this, I tend to agree with @thesamovar and rather raise an error. The reason is that:

  1. in our "abstract code" we mostly try to follow Python semantics, and Python's range function will raise an error for float arguments
  2. casting to float might hide an user error (e.g. if the user wrote range(0.5, 1.5) then they probably did not understand what range is supposed to do here)

Option 2 will change the API and could maybe break existing production code.

This is certainly an important point to consider. However, in the past we have taken the stance that breaking user code is ok if:

  1. the user code was wrong and only working "by accident" (I'd argue that this is the case here, range with float arguments is not correct), and
  2. the error message clearly tells the user how to correct the problem (in this case, the user should use the int function or rewrite it to use integers)

For raising the exception, I wouldn't do it at the numpy/Cython/C++ level, but rather already one step before. The code should be analyzed on the abstract code level, and raise an error if an argument is not an integer. We already have code around that does this kind of analysis (e.g. we use it when we determine the type for a newly introduced variable), but this part of our code base is a bit messy and waiting for some overdue refactoring, so I am not sure whether you want to dive into this...
If you do (which obviously would be much appreciated), I can have a closer look at the code myself and give you some pointers.

Member

mstimberg commented Feb 8, 2017

Re-thinking about this, I tend to agree with @thesamovar and rather raise an error. The reason is that:

  1. in our "abstract code" we mostly try to follow Python semantics, and Python's range function will raise an error for float arguments
  2. casting to float might hide an user error (e.g. if the user wrote range(0.5, 1.5) then they probably did not understand what range is supposed to do here)

Option 2 will change the API and could maybe break existing production code.

This is certainly an important point to consider. However, in the past we have taken the stance that breaking user code is ok if:

  1. the user code was wrong and only working "by accident" (I'd argue that this is the case here, range with float arguments is not correct), and
  2. the error message clearly tells the user how to correct the problem (in this case, the user should use the int function or rewrite it to use integers)

For raising the exception, I wouldn't do it at the numpy/Cython/C++ level, but rather already one step before. The code should be analyzed on the abstract code level, and raise an error if an argument is not an integer. We already have code around that does this kind of analysis (e.g. we use it when we determine the type for a newly introduced variable), but this part of our code base is a bit messy and waiting for some overdue refactoring, so I am not sure whether you want to dive into this...
If you do (which obviously would be much appreciated), I can have a closer look at the code myself and give you some pointers.

@dsprenkels

This comment has been minimized.

Show comment
Hide comment
@dsprenkels

dsprenkels Feb 17, 2017

Contributor

There is a small challenge with fixing this issue. That is that the brian2 compiler is not type-aware. Because of this, we do not know that the invalid float literal passed to range is not an integer, until the very end of the compilation, when Cython tries to add it to an int.

The fix that I would propose is walking the AST of the range arguments to see if there are any float literals that are not (indirectly) passed as an argument to int. This is a bit ugly, but I do not see another straightforward method for checking the type of the range argument.

Contributor

dsprenkels commented Feb 17, 2017

There is a small challenge with fixing this issue. That is that the brian2 compiler is not type-aware. Because of this, we do not know that the invalid float literal passed to range is not an integer, until the very end of the compilation, when Cython tries to add it to an int.

The fix that I would propose is walking the AST of the range arguments to see if there are any float literals that are not (indirectly) passed as an argument to int. This is a bit ugly, but I do not see another straightforward method for checking the type of the range argument.

@thesamovar

This comment has been minimized.

Show comment
Hide comment
@thesamovar

thesamovar Feb 17, 2017

Member

Actually we do have the types (although we only distinguish between float, int and boolean), and we do use a walk of the AST to get them. The problem was that we were just not checking the types. There should be examples of type checking in the source code if you search for ast.

Member

thesamovar commented Feb 17, 2017

Actually we do have the types (although we only distinguish between float, int and boolean), and we do use a walk of the AST to get them. The problem was that we were just not checking the types. There should be examples of type checking in the source code if you search for ast.

@dsprenkels

This comment has been minimized.

Show comment
Hide comment
@dsprenkels

dsprenkels Feb 19, 2017

Contributor

Thanks for the hint. I'll go look into it.

Contributor

dsprenkels commented Feb 19, 2017

Thanks for the hint. I'll go look into it.

@mstimberg

This comment has been minimized.

Show comment
Hide comment
@mstimberg

mstimberg Feb 21, 2017

Member

Unfortunately the issue is slightly more tricky here. We do parse the AST to get the component of the generator expression, but we do not use our BrianASTRender which annotates the nodes of the AST with additional type information. In principle we could of course do it, but the devil is in the details: for example, we normally do not allow keyword arguments in functions calls (e.g. in equations), so BrianASTRenderer raises an error if it encounters a function call with a keyword argument. In the synapse generator syntax, we do however allow specific keyword arguments and even enforce them to make things clear and explicit: if you use the sample function, you are supposed to write something like sample(5, p=0.5). So we need to figure stuff like this out (e.g. adding an option to the BrianASTRenderer initializer). We shouldn't spend too much time in making this very elegant and general, though, as our AST-parsing/rendering needs a bit of a complete overhaul anyway (we are parsing/rendering everything more often than we should).

Member

mstimberg commented Feb 21, 2017

Unfortunately the issue is slightly more tricky here. We do parse the AST to get the component of the generator expression, but we do not use our BrianASTRender which annotates the nodes of the AST with additional type information. In principle we could of course do it, but the devil is in the details: for example, we normally do not allow keyword arguments in functions calls (e.g. in equations), so BrianASTRenderer raises an error if it encounters a function call with a keyword argument. In the synapse generator syntax, we do however allow specific keyword arguments and even enforce them to make things clear and explicit: if you use the sample function, you are supposed to write something like sample(5, p=0.5). So we need to figure stuff like this out (e.g. adding an option to the BrianASTRenderer initializer). We shouldn't spend too much time in making this very elegant and general, though, as our AST-parsing/rendering needs a bit of a complete overhaul anyway (we are parsing/rendering everything more often than we should).

@thesamovar

This comment has been minimized.

Show comment
Hide comment
@thesamovar

thesamovar Feb 21, 2017

Member

Well a quick hack would be to parse the strings for the keyword arguments not using BrianASTRenderer and then pass just those through it. That would work wouldn't it?

Member

thesamovar commented Feb 21, 2017

Well a quick hack would be to parse the strings for the keyword arguments not using BrianASTRenderer and then pass just those through it. That would work wouldn't it?

@mstimberg

This comment has been minimized.

Show comment
Hide comment
@mstimberg

mstimberg Feb 22, 2017

Member

Well a quick hack would be to parse the strings for the keyword arguments not using BrianASTRenderer and then pass just those through it. That would work wouldn't it?

Yes, I think something along these lines would be the best approach: leave parse_synapse_generator as it is (i.e. it just checks for the general syntax), and then hand over the individual elements returned by it (iterator_kwds etc.) to BrianASTRenderer and check that the types are correct. I realized that in addition to floats in the range call, floats for the element will also make Cython fail:

S.connect(j='1.0*k for k in range(i-3, i+4) if i!=k')

And for completeness, we probably don't want to accept non-boolean conditions, either, i.e. raise an error for

S.connect(j='k for k in range(i-3, i+4) if 42')
Member

mstimberg commented Feb 22, 2017

Well a quick hack would be to parse the strings for the keyword arguments not using BrianASTRenderer and then pass just those through it. That would work wouldn't it?

Yes, I think something along these lines would be the best approach: leave parse_synapse_generator as it is (i.e. it just checks for the general syntax), and then hand over the individual elements returned by it (iterator_kwds etc.) to BrianASTRenderer and check that the types are correct. I realized that in addition to floats in the range call, floats for the element will also make Cython fail:

S.connect(j='1.0*k for k in range(i-3, i+4) if i!=k')

And for completeness, we probably don't want to accept non-boolean conditions, either, i.e. raise an error for

S.connect(j='k for k in range(i-3, i+4) if 42')
@dsprenkels

This comment has been minimized.

Show comment
Hide comment
@dsprenkels

dsprenkels Feb 26, 2017

Contributor

I have looked a lot at the code and this approach seems good to me. I have only one obstacle: In the case of k for k in range(i-3.0, i+4.0) if i!=k I would like to build up the type of the expressions in the range arguments. I can't seem to find any function that allows me to compute the type of the expression. This is needed because stuff like k for k in range(i-int(3.0), i) if i!=k should actually be valid.

Contributor

dsprenkels commented Feb 26, 2017

I have looked a lot at the code and this approach seems good to me. I have only one obstacle: In the case of k for k in range(i-3.0, i+4.0) if i!=k I would like to build up the type of the expressions in the range arguments. I can't seem to find any function that allows me to compute the type of the expression. This is needed because stuff like k for k in range(i-int(3.0), i) if i!=k should actually be valid.

@mstimberg

This comment has been minimized.

Show comment
Hide comment
@mstimberg

mstimberg Feb 27, 2017

Member

Not sure whether I understand your question correctly, but the BrianASTRenderer will give you the type of any expression. So, I was thinking of adding something like this in the _add_synapses_generator function of brian2.synapses.synapses (after importing brian_ast from brian2.parsing.bast):

        parsed = parse_synapse_generator(j)
        if parsed['iterator_func'] == 'range':
            # We expect all arguments of the range function to be integers
            for argname, arg in parsed['iterator_kwds'].items():
                annotated = brian_ast(arg, dict(self.variables))
                if annotated.dtype != 'integer':
                    raise SyntaxError('The "%s" argument of the range function was '
                                      '"%s", but it needs to be an '
                                      'integer.' % (argname, arg))

But I tested it with the int function and it does not work correctly (maybe that's the problem you struggled with?) -- it seems we have not resolved external variables/functions at this point, which means that the variables dictionary only contains the variables of the synapses group and therefore the BrianASTRenderer does not know that the int function returns an integer. I'll have a look at what's going wrong there.

Member

mstimberg commented Feb 27, 2017

Not sure whether I understand your question correctly, but the BrianASTRenderer will give you the type of any expression. So, I was thinking of adding something like this in the _add_synapses_generator function of brian2.synapses.synapses (after importing brian_ast from brian2.parsing.bast):

        parsed = parse_synapse_generator(j)
        if parsed['iterator_func'] == 'range':
            # We expect all arguments of the range function to be integers
            for argname, arg in parsed['iterator_kwds'].items():
                annotated = brian_ast(arg, dict(self.variables))
                if annotated.dtype != 'integer':
                    raise SyntaxError('The "%s" argument of the range function was '
                                      '"%s", but it needs to be an '
                                      'integer.' % (argname, arg))

But I tested it with the int function and it does not work correctly (maybe that's the problem you struggled with?) -- it seems we have not resolved external variables/functions at this point, which means that the variables dictionary only contains the variables of the synapses group and therefore the BrianASTRenderer does not know that the int function returns an integer. I'll have a look at what's going wrong there.

@mstimberg

This comment has been minimized.

Show comment
Hide comment
@mstimberg

mstimberg Feb 27, 2017

Member

Hmm, actually this is not much of a mystery. We only finally resolve all the identifiers in the code in create_runner_codeobj (which is called at the very end of _add_synapses_generator). As I said earlier, we want to do some quite fundamental refactoring of the code generation pipeline at some point, this would fix issues like this.
For now, a quick-and-dirty solution would be to extend the code I posted earlier as follows (this is not optimal as it does duplicate a bit of work, but it should be ok I think):

        parsed = parse_synapse_generator(j)
        if namespace is None:
            namespace = get_local_namespace(level=level + 1)
        if parsed['iterator_func'] == 'range':
            # We expect all arguments of the range function to be integers
            for argname, arg in parsed['iterator_kwds'].items():
                identifiers = get_identifiers(arg)
                variables = self.resolve_all(identifiers, run_namespace=namespace,
                                             user_identifiers=identifiers)
                annotated = brian_ast(arg, variables)
                if annotated.dtype != 'integer':
                    raise SyntaxError('The "%s" argument of the range function was '
                                      '"%s", but it needs to be an '
                                      'integer.' % (argname, arg))
Member

mstimberg commented Feb 27, 2017

Hmm, actually this is not much of a mystery. We only finally resolve all the identifiers in the code in create_runner_codeobj (which is called at the very end of _add_synapses_generator). As I said earlier, we want to do some quite fundamental refactoring of the code generation pipeline at some point, this would fix issues like this.
For now, a quick-and-dirty solution would be to extend the code I posted earlier as follows (this is not optimal as it does duplicate a bit of work, but it should be ok I think):

        parsed = parse_synapse_generator(j)
        if namespace is None:
            namespace = get_local_namespace(level=level + 1)
        if parsed['iterator_func'] == 'range':
            # We expect all arguments of the range function to be integers
            for argname, arg in parsed['iterator_kwds'].items():
                identifiers = get_identifiers(arg)
                variables = self.resolve_all(identifiers, run_namespace=namespace,
                                             user_identifiers=identifiers)
                annotated = brian_ast(arg, variables)
                if annotated.dtype != 'integer':
                    raise SyntaxError('The "%s" argument of the range function was '
                                      '"%s", but it needs to be an '
                                      'integer.' % (argname, arg))

dsprenkels added a commit to dsprenkels/brian2 that referenced this issue Mar 25, 2017

Add test case for #781
This was fixed in 1c044c9

dsprenkels added a commit to dsprenkels/brian2 that referenced this issue Mar 25, 2017

dsprenkels added a commit to dsprenkels/brian2 that referenced this issue Mar 25, 2017

dsprenkels added a commit to dsprenkels/brian2 that referenced this issue Mar 25, 2017

@dsprenkels

This comment has been minimized.

Show comment
Hide comment
@dsprenkels

dsprenkels Mar 25, 2017

Contributor

I opened a PR in #827. I basically ended up using your solution and adding tests. It is a little bit more clear to me now how your parsing and identifier resolving works.

Contributor

dsprenkels commented Mar 25, 2017

I opened a PR in #827. I basically ended up using your solution and adding tests. It is a little bit more clear to me now how your parsing and identifier resolving works.

dsprenkels added a commit to dsprenkels/brian2 that referenced this issue Mar 27, 2017

dsprenkels added a commit to dsprenkels/brian2 that referenced this issue Mar 28, 2017

dsprenkels added a commit to dsprenkels/brian2 that referenced this issue Mar 28, 2017

@dsprenkels

This comment has been minimized.

Show comment
Hide comment
@dsprenkels

dsprenkels Mar 28, 2017

Contributor

This issue can now be closed.

Contributor

dsprenkels commented Mar 28, 2017

This issue can now be closed.

@mstimberg

This comment has been minimized.

Show comment
Hide comment
@mstimberg

mstimberg Mar 29, 2017

Member

This issue can now be closed.

Yes, indeed! (You probably know this already, but if you include "Fixes #781" or "Closes #781" in a commit message or the PR description, then the bug will be automatically closed on merge).

Member

mstimberg commented Mar 29, 2017

This issue can now be closed.

Yes, indeed! (You probably know this already, but if you include "Fixes #781" or "Closes #781" in a commit message or the PR description, then the bug will be automatically closed on merge).

@mstimberg mstimberg closed this Mar 29, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment