Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for "any" built-in #246

Merged
merged 55 commits into from Jun 24, 2019

Conversation

@bksahu
Copy link
Member

commented Feb 6, 2019

What does this PR do?

Enhancements

  • Any builtin Node added
  • Specialized optimizing algorithm
  • Tests for Any built-in

Fixes

Why was it initiated? Any relevant Issues?

  • optimizes the any built-in from #232
  • fixes #349

PR Checklist

  • Correct base branch selected? develop for new features and bug fixes too. If it's
    part of a hotfix, it will be moved to master during it.
  • All tests still pass. Check the developer manual about Running the Tests. There
    are Travis tests that cover the most important things however, and you are
    welcome to rely on those, but they might not cover enough.
  • Ideally new features or fixed regressions ought to be covered via new tests.
  • Ideally new or changed features have documentation updates.
return None

def computeExpression(self, trace_collection):
return self.getValue().computeExpressionAny(

This comment has been minimized.

Copy link
@kayhayen

kayhayen Feb 6, 2019

Member

The computeExpressionAny would be what is missing. You can see something similar at:

def computeExpressionLen(self, len_node, trace_collection):

This comment has been minimized.

Copy link
@kayhayen

kayhayen Feb 8, 2019

Member

Actually I was sort of misleading you here. The slot concept in Nuitka is the one of Python. There are slots for __float__, __int__, and there is __len__ for types to overload. For any, I do not think that is the case necessarily, although I would have to check.

My suspect is that is uses mere iteration. We would have to check the implementation in Python. I am going to send you a link for what code to look at, once I made a move of a repository not currently under Nuitka organisation, although it should be. So allow a few hours for that.

This comment has been minimized.

Copy link
@bksahu

bksahu Feb 11, 2019

Author Member

@kayhayen You are right. Both len and any use iteration as input. Still, I have created the slots for any.

This comment has been minimized.

Copy link
@kayhayen

kayhayen Mar 6, 2019

Member

You shouldn't do that though, it's not a slot. There are methods like canPredictIterationValues has hasTypeSlotLen that are intended to be used.

For cases where the argument can predict the iteration values, then we should try and look at them, and see if any of them is true. We would have to discuss however, how you avoid infinite checks. Since range(40000) can predict the values, do we really want to consider all of its predicted values? We need to limit ourselves here somewhat, or make it possible to query generic traits of an iterable.

That however is not there yet, so probably checking for iteration size < 256 or so, is what we will do right now.

@kayhayen kayhayen changed the title WIP: Added any buildin Adding support for "any" built-in Feb 6, 2019

@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

Thanks for starting off right away, I will follow up on your question regarding how one can see the effect, later today, also with hints about testing specifically the built-in.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

The API documentation, which is not current automatically updated (but then again, it's not like it's changing every minute) is here:

http://nuitka.net/apidoc/html/classnuitka_1_1nodes_1_1ExpressionBases_1_1ExpressionBase.html

There is a couple of flaws in there, mostly because it becomes visible for the first time. For instance, top level "namespaces" contains only nuitka.containers.odict which is non-sense. But other things are good. The page I linked above is actually pretty insightful.

I will be aiming at making answers into API documentation improvements as well, but for now we only got what is there. If you find an undocumented virtual method, it is extremely like to be documented in the base class what it is supposed to do. The doc is not duplicated into every derived class. Not sure if it should be too, but I totally see how that is less friendly to you as a reader of the len built-in node.

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Feb 6, 2019

I would also like to contribute to the docs. How can I do so?

@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

@bksahu so we are using doxygen for the API docs, and that is easy to do yourself. Obviously you need to install doxygen, then top level in the Nuitka repository, you execute:

doxygen doc/Doxyfile

And there is a html folder that you can browse and open the index.html in your web browser.

The doc is influenced by the function Python docstrings mainly of course. I am currently unsure how C++ documentation should look like, what it expects there. But the Python side is generally the one that needs it most.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 8, 2019

I am currently considering to go black formatted code, checkout #138 where I have been discussing this. It touches a lot of code, and will cause merge conflicts with existing PRs.

Therefore, I would offer you to merge this one, disabling the extractor link much like with I did with the len example I gave in the GSoC issue. And only then would I apply black, and you would resume.

For that it probably would be a good idea for you to push all you have so far to this branch, should you have been working some more. What you got so far is perfect, apart of the missing BUILTIN_ANY C code, and the computeExpression not being all that correct yet. Let me know @bksahu when you are ready for the black re-format.

Goal is to eliminate manual formatting of code entirely, and with many PRs from many people going to come, I think that is necessary and must be done ASAP indeed. I definitely want you to not suffer anything at all from that though.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 8, 2019

Ah, and one more thing, in tests/basics/BuiltinsTest.py you find examples of optimizable and non-optimizable uses of Python built-ins. For any you would add tests. The reason CI is saying it's OK, is because we have no use of any in the test codes.

You can execute that test by running tests/basics/run_all.py search BuiltinsTest which will compare CPython and Nuitka outputs. Therefore a lot of print is going on in these tests.

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Feb 9, 2019

@kayhayen Please give me some time. I will push the remaining code ASAP.

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Feb 9, 2019

@kayhayen I have pushed the C code. And as per my understanding, any should be doing what len is doing i.e checking the input value.

@kayhayen kayhayen added the gsoc2019 label Feb 9, 2019

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Feb 10, 2019

@kayhayen What do you think about rather than slots for every method (eg. hasShapeSlotAny and hasShapeSlotLen) we can have slots for input types (eg. hasShapeIterable)? This way we can reduce the manual work.

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Feb 11, 2019

Compiling with Nuitka on this script with verbose flag :

def f():
    x = something()
    return any(x)

Gives the following output :

Nuitka:DEBUG:any_test.py:1 : new_expression : Using original '__file__' value.
Nuitka:DEBUG:any_test.py:3 : new_builtin_ref : Module variable 'any' found to be built-in reference.
Nuitka:DEBUG:any_test.py:3 : new_builtin : Replaced call to built-in 'any' with built-in call 'EXPRESSION_BUILTIN_ANY'.
@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 11, 2019

Very well, I was updating the Developer_Manual with an explanation of what slots are in Python and how they are represented in Nuitka. It's currently on factory branch, but I will soon make a new pre-release with blacked source code and enhanced manual.

Getting black and pylint to play along is sort of a struggle. Please read what I wrote there and let me know if you think it helps. I was offline for a couple of hours yesterday and used the time to write up what I thought is missing:

https://github.com/Nuitka/Nuitka/blob/factory/Developer_Manual.rst#python-slots-in-optimization

I sort of try to add what we do. Also added instructions for the API doc how to create that. The next chapter is also new, and then this one too:

https://github.com/Nuitka/Nuitka/blob/factory/Developer_Manual.rst#built-in-call-optimization

Although I stopped short there. But this is going to be information that you mostly have. What is missing there is how you deduce what a built-in does. I will later send you links like these ones:

https://github.com/Nuitka/Nuitka-references/blob/471b125f6ab586e91d2e200667a62345ac0eb58a/final/Python-3.7.2/Python/bltinmodule.c#L1580

and

https://github.com/Nuitka/Nuitka-references/blob/471b125f6ab586e91d2e200667a62345ac0eb58a/final/Python-3.7.2/Python/bltinmodule.c#L411
https://github.com/Nuitka/Nuitka-references/blob/471b125f6ab586e91d2e200667a62345ac0eb58a/final/python2.7-2.7.6/Python/bltinmodule.c#L127

These are CPython API uses, that you will have to understand. You can read about those here:

https://docs.python.org/2/c-api/index.html or https://docs.python.org/3/c-api/index.html

I would give you an introduction into the critical part for us in a tele session. More about that in another note.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 11, 2019

I meant to say, I recommend you have your own checkout of that references repo too. It is missing a README and contains very obsolete PyPy (in fact I never looked at it after I added it, but that it's probably wrong) and has a few inconsistencies.

Also I would like to describe how it's meant to be used, but yeah, until the other day, it was still private, but I think it's very useful to not have to search for pointers to make all over the place, and makes it easy to diff things, git grep on multiple versions, etc.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Feb 13, 2019

@bksahu You can try and resolve the conflicts from blackened code change, but I can do it for you too, as this is fairly unusual and I have done this before. It should be relatively easy.

I need and want to make pre-commit hooks that execute automatic Python and C code formatting at the moment of commit, but they are not there yet. Basically black nuitka is what should be done. I have excluded tests for now.

bksahu added 2 commits Feb 13, 2019

@kayhayen kayhayen force-pushed the Nuitka:develop branch from d7af90e to 7a1269a Feb 20, 2019

@bksahu

This comment has been minimized.

Copy link
Member Author

commented May 21, 2019

@kayhayen If I fix this pylint error for py27 then I'm going to get many errors for py3. What should I do?

@bksahu

This comment has been minimized.

Copy link
Member Author

commented May 21, 2019

Also, I think there is a few places to add iteration handles for, e.g. (x)range and that would lead to another change, in what makes sense. Basically all that does canPredictIterationValues() should receive a handle that is not None.

As every node is a child node of ExpressionBase, isn't it already has access to getIterationHandle? Do you want me to replace every getIterationValue with our handle?

@kayhayen

This comment has been minimized.

Copy link
Member

commented May 23, 2019

Regarding getIterationHandle vs getIterationValue, this is what I want you do do:

a) Look at every existing getIterationValue and consider adding an getIterationHandle that could later be used as a replacement. Or like you did, if we can find a way to really have neither, then add a TODO item for that.

b) By doing so, you will make sure that the new style any will be roughly as optimal as the older approach, where we used getIterationValue but had to change to the new one.

c) Later, we can make the switch, when we think we are there. For new stuff, definitely we will prefer the handle, as cleanups, gradually switch over all methods related to iteration right now.

Yours,
Kay

@kayhayen

This comment has been minimized.

Copy link
Member

commented May 23, 2019

The PyLint for Python2 on Travis is old and sometimes may have bugs. The promise to also lint Python2 code using their new Python3 code base, has not yet been fulfilled by the PyLint project yet.

@kayhayen

This comment has been minimized.

Copy link
Member

commented on nuitka/nodes/ContainerMakingNodes.py in 69f42aa May 24, 2019

These are not constants, I think you need a dedicated implementation here, that works on getElements() of these container making nodes instead.

This comment has been minimized.

Copy link
Member Author

replied May 24, 2019

Yeah. I thought so. The thing is I'm unable to invoke the debugger in that script. I have tried print((1,2)) and print([1, 2]) but no success. Could you give me some example of how I can invoke the debugger on this module?

This comment has been minimized.

Copy link
Member

replied May 24, 2019

You question is, because you have been unable to trigger the iteration handle use?

Well, first of all, [1,2] is directly made a EXPRESSION_CONSTANT_REF_LIST and not an EXPRESSION_MAKE_LIST.

I think you ought to aim at testing this: any([some_var, 5]), as that will be that node type, and your any built-in code will then trigger its use. For random access, that is basically not used yet, we just added it with a plan to use it, but any does not, but maybe some other optimization will later.

This comment has been minimized.

Copy link
Member

replied May 24, 2019

I edited the example, so it's nice for any and actually ought to be optimized, preserving the side effect of accessing some_var if it needs to be.

print(any([None, None, None]))
print(any([None, 4, None]))
print(any([]))
print(any(range(20)))

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 26, 2019

Member

Because it is so small, 20 elements only, it is optimized into a list soon. Only ranges larger than 256 elements are not optimized, so I suggest you go with 260 elements, and try all kinds of ranges, with one, two and 3 arguments, each time making sure it's more than 256 elements.

This comment has been minimized.

Copy link
@bksahu

bksahu May 27, 2019

Author Member

Okay. So, I have added tests with >256 elements. But I'm failing those tests because of this assert statement https://github.com/Nuitka/Nuitka/pull/246/files/ba4ad49960f2fcac76973cf6c503210e3bc7a6af#diff-bcd50821425789a3c9bfd17b9c3d16f1R229

I tried to move them into a try-catch block but that didn't help either.

This comment has been minimized.

Copy link
@bksahu

bksahu May 27, 2019

Author Member

So, Instead of asserting should I return None? Is that a good idea?

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 27, 2019

Member

Your assert looks suspicious, can it be that you are comparing a function and an integer? That is going to give pretty random results.

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 27, 2019

Member

Also, why would be size matter. We took care in the any node to not try too many things, by aborting after 256 checks, that could be good enough, but it means the user needs to take care, but the producer of the handle, it cannot tell if that is an issue to you or not, so I think it should just represent the best possible knowledge.

This comment has been minimized.

Copy link
@bksahu

bksahu May 27, 2019

Author Member

And another thing I didn't check immediately, but are you making sure that e.g. getLow().getIntegerValue() is actually working and not returning None. I didn't yet see that, but it would definitely be needed to have None tests.

As the get methods are in the Base class, I have put them in try catch block which will catch if getHigh().getIntergerValue() gives an attribute error which it will when getHigh() returns a None. https://github.com/Nuitka/Nuitka/pull/246/files#diff-bcd50821425789a3c9bfd17b9c3d16f1R223

So, as far as None tests are concerned every time I am testing for any(range(260)) it is also taking care of the None test case.

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 27, 2019

Member

I really, really dislike that kind of error handling. I prefer them to be checks, please change these things to explicit checks.

E.g. will getHigh().getIntergerValue() hopefully always raise an AttributeError and that is very hard to spot (misspelling), and a failed re-factoring, etc. could also cause this. Code should never be that fragile. I could well imagine something deep behind that function to raise an AttributeError due to other programming mistakes.

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 27, 2019

Member

And this is definitely wrong in one way or the other: assert constant_value.getIterationLength < 10000, either it is a coding rule violation, namely an integer attribute named like a function, or a function not being called.

I think you do this here:

>>> class C(object):
...    def f():
...       pass
... 
>>> C().f < 10000
False

The value is pretty random. I believe, it is comparing the pointer to the object that False is to the bound function object, and that can go either way.

Since this is a common mistake, in Python3, it gives: TypeError: unorderable types: method() < int() but not so in Python2.

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 27, 2019

Member

Just to be clear, avoid using exception handling for control flow at all costs. When an exception hits, your first instinct seems to be to catch it. Instead, you need to make sure it doesn't raise.

The first approach leads to very fragile and potentially slow code that makes it hard to understand the context you are in. If things can happen in an exception handler, and mean AttributeError is supposed to me e.g. something is None that is very un-obvious.

This comment has been minimized.

Copy link
@bksahu

bksahu May 28, 2019

Author Member

And this is definitely wrong in one way or the other: assert constant_value.getIterationLength < 10000, either it is a coding rule violation, namely an integer attribute named like a function, or a function not being called.

I think you do this here:

>>> class C(object):
...    def f():
...       pass
... 
>>> C().f < 10000
False

The value is pretty random. I believe, it is comparing the pointer to the object that False is to the bound function object, and that can go either way.

Since this is a common mistake, in Python3, it gives: TypeError: unorderable types: method() < int() but not so in Python2.

Yes, I know that I was missing () after getIterationLength and I have already updated it. You are probably looking at an older commit.

nuitka/nodes/ContainerMakingNodes.py Show resolved Hide resolved
@abstractmethod
def getNextValueExpression(self):
"""Abstract method to get next iteration value."""
return

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 26, 2019

Member

Doesn't pylint hate those useless returns? Puting the docstring is enough, can leave out the return statement, unless this would be a default implementation actually called by base class manually.

This comment has been minimized.

Copy link
@bksahu

bksahu May 27, 2019

Author Member

Nope. Initially, I used pass in place of return, pylint2 was not happy with it.

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 27, 2019

Member

Yes, pass is also not needed.

You see, doc strings in Python are statements. A string as a statement has not much effect, and therefore, Python allows its parser to special case those strings as first statements.

But that is why def f(): "" is a complete function definition. Python injects a return at the end of each function body automatically. And the pass is not needed, because, well there is already a statement, that is why it complained. The fact that PyLint does not complain about the return doesn't convince me of it being good. Just remove them, and PyLint and I will both be happy.

As far as blocks go, you can do these:

while 1:
   ""

Not that are any good, just to illustrate what is happening. PyLint is probably going to hate that one though, because it's not a doc string. Statement without effect is what I would expect to be warned about.

So documenting your functions has the extra bonus of allowing to remove the pass that you initially likely had there.

return "<%s of %r>" % (self.__class__.__name__, self.constant_node)

def getNextValueExpression(self):
"""Return the next iteration value or StopIteration exception

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 26, 2019

Member

This one has a better doc string explaining the contract than the base class. It should say that None return thing too.

Sequential access of the expression
"""

def __init__(self, constant_node):

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 26, 2019

Member

So this one is getting a list with the element values, instead of a node, right? So why is it named constant_node, that seems misleading.

def __init__(self, constant_value):
ConstantRangeIterationHandleBase.__init__(self, constant_value)
assert self.constant_value.isExpressionBuiltinRange3()
self.constant = range(self.low, self.high, self.step)

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 26, 2019

Member

This is calling range which creates an arbitrarily large list. I think we said, you can use a constant, but it would have to be an xrange (the one from nuitka.__past__).

iteration_value = next(self.iter)
except StopIteration:
return StopIteration
return bool(iteration_value)

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 26, 2019

Member

The type of iteration_value is not what you treat it as here. It is a node, and should be asked its truth value. This is going to be true in this form always.

bksahu added 2 commits May 27, 2019
print(any(range(260)))
print(any(range(1, 270)))
print(any(range(2, 1024, 5)))
except Exception as e:

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 27, 2019

Member

Why catch an exception, there is none that should be raised?

Removed
try catch block from IterationHandles and BuiltinsTest
@kayhayen

This comment has been minimized.

Copy link
Member

commented on nuitka/nodes/IterationHandles.py in d9cfcf9 May 28, 2019

Two things to learn here.

They say "DRY" rule which means, "don't repeat yourself". So do not make repeated function calls, who knows how expensive it will be to get that integer value, so use it, it will always be better to assign such call result to a variable, and give it a name. That name should also work like a comment, does it not, explaining what you think this is.

Second, never ever optionally assign an attribute in a __init__, that is also very bad, esp. as the next line is not conditional on it, it will get an AttributeError.

So why not just assign self.low even if it is None, and then treat is None for your constant. I also doubt that self.constant has a value that is useful, there is nothing an xrange really offers, except maybe the start, stop, step attributes, so why not replace it with that?

This comment has been minimized.

Copy link
Member Author

replied May 28, 2019

You are right about DRY, that's why only I use try catch block before. But it turns about using a try catch block or something that assigns optionally is not a good idea. Although I am afk now, but I'm thinking something like this:

def __init__(self, constant):
   self.constant = constant
   self.low = constant.getLow().getIntegerValue()
   self.high = constant.getHigh().getIntegerValue()
   self.step = constant.getStep().getIntegerValue()
   self.iter = iter(xrange(self.low, self.high, self.step))

This comment has been minimized.

Copy link
Member

replied May 29, 2019

That's much better indeed, although I think not every of these values needed to be attached to self necessarily

@kayhayen
Copy link
Member

left a comment

Cosmetic stuff, but needed

@@ -354,6 +395,8 @@ def getIterationValue(self, element_index):

result = low + step * element_index

result = low + step * element_index

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 31, 2019

Member

That seems like a mistaken change to getIterationValue

This comment has been minimized.

Copy link
@kayhayen

kayhayen Jun 19, 2019

Member

Still present?

@@ -828,7 +844,6 @@ def getDetails(self):
def makeConstantRefNode(constant, source_ref, user_provided=False):
# This is dispatching per constant value and types, every case
# to be a return statement, pylint: disable=too-many-branches,too-many-return-statements

This comment has been minimized.

Copy link
@kayhayen

kayhayen May 31, 2019

Member

These are two separate comments, should not remove the new line between them.

bksahu added 2 commits May 31, 2019

@kayhayen kayhayen force-pushed the Nuitka:develop branch from 3580003 to 38893e6 Jun 13, 2019

@kayhayen kayhayen merged commit 45d791f into Nuitka:develop Jun 24, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
kayhayen added a commit that referenced this pull request Jul 8, 2019
Adding support for "any" built-in (#246)
* Added support for "any" builtin

* This introduces iteration handles, which should be used to get an handle for a node
  that describes its iteration state.

* Also meta class trick generalized for re-use with iteration handles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.