Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to predict for dict and set while optimizing "any" built-in #349

Closed
bksahu opened this issue Apr 11, 2019 · 10 comments · Fixed by #246

Comments

@bksahu
Copy link
Member

commented Apr 11, 2019

As discussed in #246 : Unable to predict dict and set datatypes while optimizing "any" built-in because those datatypes are not indexable.
I suspect this is primarly because this line while optimization

truth_value = value.getIterationValue(i).getTruthValue()

Reproducer

1. Script: set.py

print(any(set([0, 1, 2, 3, 3])))

Executed:
$ ~/Nuitka/bin/nuitka3-run --verbose set.py

Output:

Nuitka:DEBUG:set.py:1 : new_expression : Using original '__file__' value.
Nuitka:DEBUG:set.py:1 : new_builtin_ref : Module variable 'print' found to be built-in reference.
Nuitka:DEBUG:set.py:1 : new_builtin_ref : Module variable 'any' found to be built-in reference.
Nuitka:DEBUG:set.py:1 : new_builtin_ref : Module variable 'set' found to be built-in reference.
Nuitka:DEBUG:set.py:1 : new_builtin : Replaced call to built-in 'set' with built-in call 'EXPRESSION_BUILTIN_SET'.
Nuitka:DEBUG:set.py:1 : new_builtin : Replaced call to built-in 'any' with built-in call 'EXPRESSION_BUILTIN_ANY'.
Nuitka:DEBUG:set.py:1 : new_constant : Built-in call to 'set' pre-computed. Predicted constant result.
Problem with statement at set.py:1:
-> print(any(set([0, 1, 2, 3, 3])))

Nuitka:INFO:Interrupted while working on '<Node 'PYTHON_MAIN_MODULE' with {'filename': 'set.py', 'main_added': False, 'mode': 'compiled'}>'.
Traceback (most recent call last):
  File "/home/bksahu/Nuitka/nuitka/__main__.py", line 184, in <module>
    main()
  File "/home/bksahu/Nuitka/nuitka/__main__.py", line 177, in main
    MainControl.main()
  File "/home/bksahu/Nuitka/nuitka/MainControl.py", line 745, in main
    main_module = createNodeTree(filename=filename)
  File "/home/bksahu/Nuitka/nuitka/MainControl.py", line 139, in createNodeTree
    Optimization.optimize(main_module.getOutputFilename())
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 545, in optimize
    finished = makeOptimizationPass(initial_pass=False)
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 456, in makeOptimizationPass
    changed = optimizeModule(current_module)
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 167, in optimizeModule
    changed = optimizeCompiledPythonModule(module)
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 100, in optimizeCompiledPythonModule
    module.computeModule()
  File "/home/bksahu/Nuitka/nuitka/nodes/ModuleNodes.py", line 461, in computeModule
    trace_collection=self.trace_collection
  File "/home/bksahu/Nuitka/nuitka/nodes/StatementNodes.py", line 165, in computeStatementsSequence
    new_statement = statement.computeStatementsSequence(trace_collection)
  File "/home/bksahu/Nuitka/nuitka/nodes/FrameNodes.py", line 182, in computeStatementsSequence
    new_statement = trace_collection.onStatement(statement=statement)
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 550, in onStatement
    new_statement, change_tags, change_desc = statement.computeStatement(self)
  File "/home/bksahu/Nuitka/nuitka/nodes/StatementNodes.py", line 229, in computeStatement
    trace_collection.onExpression(expression=self.getExpression())
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 531, in onExpression
    r = expression.computeExpressionRaw(trace_collection=self)
  File "/home/bksahu/Nuitka/nuitka/nodes/ExpressionBases.py", line 1038, in computeExpressionRaw
    expression = trace_collection.onExpression(sub_expression)
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 531, in onExpression
    r = expression.computeExpressionRaw(trace_collection=self)
  File "/home/bksahu/Nuitka/nuitka/nodes/ExpressionBases.py", line 1038, in computeExpressionRaw
    expression = trace_collection.onExpression(sub_expression)
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 531, in onExpression
    r = expression.computeExpressionRaw(trace_collection=self)
  File "/home/bksahu/Nuitka/nuitka/nodes/ExpressionBases.py", line 1057, in computeExpressionRaw
    return self.computeExpression(trace_collection=trace_collection)
  File "/home/bksahu/Nuitka/nuitka/nodes/BuiltinAnyNodes.py", line 46, in computeExpression
    any_node=self, trace_collection=trace_collection
  File "/home/bksahu/Nuitka/nuitka/nodes/ConstantRefNodes.py", line 383, in computeExpressionAny
    truth_value = value.getIterationValue(i).getTruthValue()
  File "/home/bksahu/Nuitka/nuitka/nodes/ConstantRefNodes.py", line 208, in getIterationValue
    constant=self.constant[count], source_ref=self.source_ref
TypeError: 'set' object does not support indexing

2. Script: dict.py

print(any({1:"One", 2:"Two"}))

Executed:
$ ~/Nuitka/bin/nuitka3-run --verbose dict.py

Output:

Nuitka:DEBUG:dict.py:1 : new_expression : Using original '__file__' value.
Nuitka:DEBUG:dict.py:1 : new_builtin_ref : Module variable 'print' found to be built-in reference.
Nuitka:DEBUG:dict.py:1 : new_builtin_ref : Module variable 'any' found to be built-in reference.
Nuitka:DEBUG:dict.py:1 : new_builtin : Replaced call to built-in 'any' with built-in call 'EXPRESSION_BUILTIN_ANY'.
Problem with statement at dict.py:1:
-> print(any({1:"One", 2:"Two"}))

Nuitka:INFO:Interrupted while working on '<Node 'PYTHON_MAIN_MODULE' with {'filename': 'dict.py', 'main_added': False, 'mode': 'compiled'}>'.
Traceback (most recent call last):
  File "/home/bksahu/Nuitka/nuitka/__main__.py", line 184, in <module>
    main()
  File "/home/bksahu/Nuitka/nuitka/__main__.py", line 177, in main
    MainControl.main()
  File "/home/bksahu/Nuitka/nuitka/MainControl.py", line 745, in main
    main_module = createNodeTree(filename=filename)
  File "/home/bksahu/Nuitka/nuitka/MainControl.py", line 139, in createNodeTree
    Optimization.optimize(main_module.getOutputFilename())
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 545, in optimize
    finished = makeOptimizationPass(initial_pass=False)
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 456, in makeOptimizationPass
    changed = optimizeModule(current_module)
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 167, in optimizeModule
    changed = optimizeCompiledPythonModule(module)
  File "/home/bksahu/Nuitka/nuitka/optimizations/Optimization.py", line 100, in optimizeCompiledPythonModule
    module.computeModule()
  File "/home/bksahu/Nuitka/nuitka/nodes/ModuleNodes.py", line 461, in computeModule
    trace_collection=self.trace_collection
  File "/home/bksahu/Nuitka/nuitka/nodes/StatementNodes.py", line 165, in computeStatementsSequence
    new_statement = statement.computeStatementsSequence(trace_collection)
  File "/home/bksahu/Nuitka/nuitka/nodes/FrameNodes.py", line 182, in computeStatementsSequence
    new_statement = trace_collection.onStatement(statement=statement)
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 550, in onStatement
    new_statement, change_tags, change_desc = statement.computeStatement(self)
  File "/home/bksahu/Nuitka/nuitka/nodes/StatementNodes.py", line 229, in computeStatement
    trace_collection.onExpression(expression=self.getExpression())
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 531, in onExpression
    r = expression.computeExpressionRaw(trace_collection=self)
  File "/home/bksahu/Nuitka/nuitka/nodes/ExpressionBases.py", line 1038, in computeExpressionRaw
    expression = trace_collection.onExpression(sub_expression)
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 531, in onExpression
    r = expression.computeExpressionRaw(trace_collection=self)
  File "/home/bksahu/Nuitka/nuitka/nodes/ExpressionBases.py", line 1038, in computeExpressionRaw
    expression = trace_collection.onExpression(sub_expression)
  File "/home/bksahu/Nuitka/nuitka/optimizations/TraceCollections.py", line 531, in onExpression
    r = expression.computeExpressionRaw(trace_collection=self)
  File "/home/bksahu/Nuitka/nuitka/nodes/ExpressionBases.py", line 1057, in computeExpressionRaw
    return self.computeExpression(trace_collection=trace_collection)
  File "/home/bksahu/Nuitka/nuitka/nodes/BuiltinAnyNodes.py", line 46, in computeExpression
    any_node=self, trace_collection=trace_collection
  File "/home/bksahu/Nuitka/nuitka/nodes/ConstantRefNodes.py", line 383, in computeExpressionAny
    truth_value = value.getIterationValue(i).getTruthValue()
  File "/home/bksahu/Nuitka/nuitka/nodes/ConstantRefNodes.py", line 208, in getIterationValue
    constant=self.constant[count], source_ref=self.source_ref
KeyError: 0

python -m nuitka --version
0.6.3rc3 (Branch: optimize_buildin_any #246)
Python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
Executable: ~/anaconda3/envs/nuitka/bin/python3
OS: Linux
Arch: x86_64

@kayhayen

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

Thanks for reporting this, great find.

A few things I want to highlight @bksahu about optimization in Nuitka. First of all, I have adopted a method, where I give each level of optimization its own spin, and your finding is evidence of it being useful. So, what I would do, is first:

a) Run all world test without the optimization (done already)
b) Run all world test with only the C code replaced, basically using your C code only, and see what this finds. This is the point in time, where this has the best exposure. The node would return self, None, None and be good, i.e. definitely not do anything wrong.
c) Run all world test with the generic implementation (the one you got right now in your PR), and see what it does (any it finds bugs).
d) Look at cases to specialize and do that, again run all world test with that.

On each stage, there is a tendency to find bugs. Right now, you can't really run the "all world test", but it has to be done on my infrastructure, as it needs really a lot of CPU and time. If you have a machine that might heat up and become not as responsive, you can try running the CPython test suite for one version at least. But I think I have to make an arrangement for you. It's a shame we don't have public CPU in large quantity yet. I will try and push for donated resources soon though. Nothing like Travis will allow us to use much CPU by default.

So, this is the general approach, and as you can see, had you headed my additional request to specialize any for constant nodes (computing any(self.constant) would be much faster than manual iteration, however at least for any not by much, due to most of the constants containing only stuff with positive truth value). That would would break going in baby steps. Ideally it's also why the built-in task is done in parallel, i.e. start other built-ins and e.g. bring the to the C only level, etc.

This testing approach deserves to be formalized with a table I guess. I am really happy about the opportunity to remember this, and why I have done it like above normally.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

Also, another experience with optimization is, that when we add new one, we increase the burden on it being correct. Nuitka has historically been incorrect on many things with these kind of things, and then when new optimization gets added, old errors get exposed, because now we depend on it being correct more deeply.

The compile whole world test is going to crash in places you don't expect, and this could well have happened to us, had you not sought good enough test coverage yourself. For example, it could have gone unnoticed, and a for loop over constant values in the CPython test suite, would not hit the code now. But the day we start to unroll such loops, then it would crash, unrelated to the optimization being done there.

So there is a tendency to uncover existing bugs that is far more widespread in Nuitka than you would normally encounter, because easily everything can affect everything. Every new optimization adds new requirements, or makes new test cases. For example, previously nothing was known about any expressions, but with your addition, more constants, more bool shapes enter the world, and that can affect code generation, and expose bugs there.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

As for your bug, I would suggest you would question this code in ConstantRefNodes.py

    def canPredictIterationValues(self):
        return self.isKnownToBeIterable(None)

Being iterable at all, doesn't mean you can predict the values, and definitely not this way:

def getIterationValue(self, count):
    assert count < len(self.constant)
        return makeConstantRefNode(
            constant=self.constant[count], source_ref=self.source_ref
        )

So, this could be done with next() calls on a iter(self.constant), but do we really want to go there? I doubt, we should do this. Maybe instead a node, needs to provide something that is a like a handle on iteration, so that we don't get called by the index, but rather with an expression that we want to go next. Can you propose an API from the perspective of your any node? One where you could ask to get a iteration handle, and then only use that, to predict the values. I know I am asking much here, but I think you ought to try thinking of that.

Then I think, random access, and sequential access, could be differentiated. Right now your any code is using an interface that is for random access in a sequential fashion. And even then, emulating the random interface on 3.6 dicts would be extremely costly.

If you search for .getIterationValue you barely find uses. So replacing it with your API will be relative easily possible.

@kayhayen kayhayen self-assigned this Apr 11, 2019

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Apr 13, 2019

@kayhayen I have a doubt. While debugging I have noticed that for example print(any([0] * 20000)) the control flows through ExpressionBases.py (I mean it is computed there) whereas for print(any({1:"One", 2:"Two"})) the control flows through ConstantRefNodes.py. So could you please explain me this behaviour? When should I expect that control will flow though ConstantRefNodes.py and when from ExpressionBases.py ?

@kayhayen

This comment has been minimized.

Copy link
Member

commented Apr 13, 2019

@bksahu The first expression of yours is going to raise an exception, because any is not indexable. Is it a typo, or what you actually looked at?

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Apr 13, 2019

Sorry, that was a typo. I am trying to figure out that in which cases the any expression will be computed in the ConstantRefNodes and ExpressionBases.

@kayhayen

This comment has been minimized.

Copy link
Member

commented Apr 26, 2019

Where do we stand with this one?

@bksahu

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2019

Sorry, I haven't been able to work on it due to my exams which will end in next two weeks. @kayhayen If it's urgent I can try to find some time and work on it.

@bksahu bksahu referenced this issue May 13, 2019
8 of 8 tasks complete
@bksahu

This comment has been minimized.

Copy link
Member Author

commented May 13, 2019

Maybe instead of a node needs to provide something that is a like a handle on iteration, so that we don't get called by the index, but rather with an expression that we want to go next. Can you propose an API from the perspective of your any node? One where you could ask to get an iteration handle, and then only use that, to predict the values.

I don't what exactly understand what do you mean by the handle and how it is going to return an expression instead of index. Could you give me some pointers?

Do you mean something like this:

truth_value = value.getIterationValueExpression(i).getTruthValue() 

def getIterationValueExpression(self, index):
    return makeConstantRefNode(
            constant=next(iter(self.constant)), source_ref=self.source_ref
     )
@kayhayen

This comment has been minimized.

Copy link
Member

commented May 13, 2019

@bksahu I was thinking that a node, like range, constant value, etc. should be able to return a new kind of object, one with methods, that will be used instead of itself. So instead of calling getIterationValue(3) as in the old API (spelling might be off), we first call a method that can be called getIterationHandle() which then has methods and internally obviously a reference to the node that produced it, but will then have an interface that is most suitable to us.

This would then offer ways do get the next value and/or its value shape, the length and so, and these would be used in generic algorithms like your any default implementation.

The problem of your method is that the iterator gets lost. That is something I would expect the handle to store, and re-use.

This is a rought sketch up:

class ConstantInterationHandle:
   def __init__(self, constant):
       self.constant = cons
       self.iter = iter
   def getNextValueExpression(self):
       return makeConstantRefNode(constant=next(self.iter), ...)

Obviously, this needs to have its way of saying StopIteration as well.

For some constants, or ranges, or list multiplications, we will have better knowledge, and random access will be possible.

There are remnants of very weak iterator tracing in Nuitka. These handles, I would assume, could become value traces of the iter() and later become mergeable between branches.

Basically you need to think of these objects as intermediates. They address the need of generic algorithms to have more or less random access, as well as the difficulty of the producing side, to have its own state. We cannot well track a self.iter inside the nodes, imagine the memory usage of every constant in the node tree had that. Plus a node shouldn't contain state related to the abstract execution.

From this, pretty immediately the need for an object between the node and the using algorithm comes into play, and that is pretty much immediately the only way to really do it.

Yours,
Kay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.