Skip to content
This repository has been archived by the owner on Aug 13, 2021. It is now read-only.

Introduce first dependencies? #96

Closed
aroberge opened this issue Aug 26, 2020 · 34 comments
Closed

Introduce first dependencies? #96

aroberge opened this issue Aug 26, 2020 · 34 comments
Labels
Discussion: future work Used to discuss items for future consideration.

Comments

@aroberge
Copy link
Owner

Look at using https://github.com/alexmojaki/executing and possibly https://github.com/alexmojaki/stack_data which uses executing.

Check to ensure that doing something like from Tkinter import * does not result in a huge amount of variable names.

@aroberge
Copy link
Owner Author

Possible simpler alternative: use Python's ast to figure out if the line pointed at is part of a multiline statement, and leave it at that.

@alexmojaki
Copy link

Hi! I'm currently working on https://github.com/alexmojaki/futurecoder to teach Python to beginners. I came here to look into using friendly-traceback to enhance the tracebacks shown to students. On a whim I opened the issues and then the first issue, and got a pleasant surprise! I'm glad you like my projects!

I would be happy to help you integrate in executing or stack_data to add any features you're interested in. Or maybe I could extract any specific source code you need without adding a dependency, like whatever it is you want to do with multiline statements.

Just playing around, here's a rough PoC of the kind of thing that could be done if you add the dependencies:

import ast
import sys

import executing
import pure_eval


def explain_index_error():
    ex = executing.Source.executing(sys.exc_info()[2])
    if not (ex.node and isinstance(ex.node, ast.Subscript)):
        return

    evaluator = pure_eval.Evaluator.from_frame(ex.frame)
    atok = ex.source.asttokens()

    seq = evaluator[ex.node.value]
    index = evaluator[ex.node.slice.value]
    seq_source = atok.get_text(ex.node.value)
    index_source = atok.get_text(ex.node.slice.value)

    print(f"IndexError in expression {ex.text()}:")
    print(f"{seq_source} has length {len(seq)} so the largest valid index is {len(seq) - 1}, "
          f"you tried to access index {index_source} = {index}.")


a = [[[]], 3]

try:
    print(a[a[1]] + 4)
except IndexError:
    explain_index_error()

Output:

IndexError in expression a[a[1]]:
a has length 2 so the largest valid index is 1, you tried to access index a[1] = 3.

On the flip side, would you be interested in contributing to futurecoder? This could be to integrate friendly-traceback, or anything else you can think of to ease learning for beginners. It looks like it aligns very well with your interests.

@aroberge
Copy link
Owner Author

@alexmojaki I might be interested in either integrating parts of your code, or contributing to it. Right now, I'm on a bit of a hiatus when it comes to coding as I work at a university and it is the beginning of an extremely busy term. I'm hoping that things will calm down in a few weeks and that I will be able to go back to programming.

I'm a tiny bit hesitant in having any formal dependencies. One of the reasons is that beginners using Mu, that wish to use friendly-tracebacks would be surprised to see other packages appearing "magically" in their environment. (See https://aroberge.github.io/friendly-traceback-docs/docs/html/mu.html for a screenshot). Deleting a package in MU is done by deleting an entry in a "text window" ... which may leave things in an inconsistent state.

So, I do like your idea of extracting some code from your project and incorporating it in Friendly-tracebacks.

Let's keep this discussion open. When I resume coding, I'll try to focus on this issue first.

@aroberge aroberge added the Discussion: future work Used to discuss items for future consideration. label Sep 24, 2020
@aroberge
Copy link
Owner Author

aroberge commented Oct 3, 2020

@alexmojaki I have looked at futurecoder and given some more thoughts about the ideas discussed in this issue/thread.

  • Friendly-traceback could definitely be improved by using some information that could be obtained from executing. I do have a preference for not having required dependencies so I would either adapt/modify some code from executing and integrate it in Friendly-traceback, or have executing as an optional dependency like I currently do with https://github.com/willmcgugan/rich. I am currently thinking that the "optional dependency" approach is the one that makes the most sense, at least initially to get things working while minimizing the amount of work that needs to be done.
  • I believe that Friendly-traceback could definitely be a nice addition to futurecoder. However, I think it should wait until any information form executing could be used by friendly-traceback, so that friendly-traceback could be used as a "drop-in" replacement for whatever error analysis you would have in place at the time.
  • Before I work on integrating ideas from executing, I prefer to first continue working towards being able to add code and ideas from https://github.com/SylvainDe/DidYouMean-Python. Looking at the code of that particular project, I do find it "cleverly condensed/efficient", but likely difficult to extend and incorporate additions to deal with particular cases like the example you wrote above using executing to add information about a potential IndexError. So, I plan to add ideas from "DidYouMean" while trying to ensure that I could easily extend individual cases by using information of the kind that could be obtained from executing.

If and when I use executing for the first time in friendly-traceback, I might reach out to you to get your thoughts. Please, feel free to ignore any such request and/or treat them as a very low priority.

@alexmojaki
Copy link

alexmojaki commented Oct 3, 2020

Cool! I've opened alexmojaki/futurecoder#75 to talk about integration into futurecoder.

However, I think it should wait until any information form executing could be used by friendly-traceback, so that friendly-traceback could be used as a "drop-in" replacement for whatever error analysis you would have in place at the time.

I've mentioned this at the end of the issue. I think there's no need for it to be drop-in, I'd rather have some crude working code sooner than perfect code later.

@aroberge
Copy link
Owner Author

aroberge commented Oct 3, 2020 via email

@alexmojaki
Copy link

That's fine, I understand that 😄

@aroberge
Copy link
Owner Author

I am going to add one of my own project (https://github.com/aroberge/token-utils) as a first dependency. My objection to not adding dependencies based on avoiding confusing beginners using the editor Mu is thus moot, and I will likely look more closely at what other dependencies from @alexmojaki projects (such as https://github.com/alexmojaki/pure_eval, https://github.com/alexmojaki/executing, https://github.com/alexmojaki/stack_data) to add.

@alexmojaki
Copy link

Awesome!

In the meantime, I have overhauled tracebacks in futurecoder and introduced friendly-traceback. Take a look at alexmojaki/futurecoder#77. You can try it out by clicking View Deployment and logging in with admin@example.com:admin.

@aroberge
Copy link
Owner Author

Very nice! The amount of work you have done on this project is really impressive.

@aroberge
Copy link
Owner Author

aroberge commented Dec 2, 2020

@alexmojaki Could executing and pure_eval be used to identify exactly which token(s) cause an error in something like

b = 2
c = "3"
d = 4
a = b + c + d

In this case, knowing that c is a string causing the problem, one could look at something like c.isdigit() and make a suggestion for fixing the problem in this specific case.

@alexmojaki
Copy link

If you run that exact script in the editor in futurecoder (you need to open a later chapter) you can see that executing highlights b + c:

Screenshot from 2020-12-02 17-14-44

futurecoder doesn't show a traceback in the shell because it's largely redundant, but this whole thing has made me realise that this decision means students can't see the highlighted executed node when they cause an exception in the shell. Thanks for bringing that to my attention.

Anyway yes, executing identifies the AST node b + c (an ast.BinOp I think) and from there it's easy to get the children and feed them to pure_eval to get the values. Of course in this case with just variables it's also easy to manage without pure_eval.

@aroberge
Copy link
Owner Author

aroberge commented Dec 2, 2020

Ok, while I had reverted back to having no dependencies, the ability to do this has convinced me that I really need to plan to use executing - and possibly asttokens. The only potential drawback is that I try to be compatible with the latest Python version (I currently test with 3.10a1) and asttokens only support up to 3.8.

However, I have other issues to take core of before looking at using this.

@aroberge
Copy link
Owner Author

aroberge commented Dec 7, 2020

@alexmojaki I've decided to take the plunge and try to use executing/stack_data/pure_eval/asttokens. When you have the time, I would appreciate if you could give me pointers.

For the example given above, namely:

b = 2
c = "3"
d = 4
a = b + c + d

how would I use these packages to obtain the two objects causing the problem as you identified them.

When I tried to obtain the node using executing, I found None ... which does not allow me to proceed further. Here's what I did:

records = inspect.getinnerframes(tb)

# Note, the above records have then been cleaned up to remove my own code which sandwiches the code from the user.

exception_frame, *rest = records[-1]

# etype, value, tb = sys.exc_info()

while True: 
    if tb.tb_frame == exception_frame:
        break
    tb = tb.tb_next

ex = executing.Source.executing(tb)

# To explore the available values, I did the following:

for key in dir(ex):
    if key.startswith("__"):
        continue
    obj = getattr(ex, key)
    if callable(obj):
        print(key, obj.__call__())
    else:
        print(key, obj)

# Among others, idenfied the node as "None".

Ideally, I'd like to extract all the relevant information about objects causing the exception (b and c for this example according to yours screenshot) in the exception frame and possibly in the calling frame in a single method and have them available when I do the specific analysis to determine the cause of the exception, instead of duplicating the code to extract this information in various places.

@alexmojaki
Copy link

I think it comes down to how you're getting the frame. Are you 100% sure it's the right frame?

I don't know what I can do to help without a complete reproducible script to debug. Here's an example of it working:

import executing
import sys

def foo():
    bar()

def bar():
    b = 2
    c = "3"
    d = 4
    return b + c + d

try:
    foo()
except:
    etype, value, tb = sys.exc_info()
    while tb:
        ex = executing.Source.executing(tb)
        print(ex.text(), ex.node)
        tb = tb.tb_next

In general I think it would be ideal if you dropped inspect.getinnerframes entirely in favour of stack_data. But that's a big leap and I don't know if that will solve your current problem.

@aroberge
Copy link
Owner Author

aroberge commented Dec 7, 2020 via email

@aroberge
Copy link
Owner Author

aroberge commented Dec 7, 2020

@alexmojaki I figured out how to make executing work with my console. I don't know if there is a better way, or if you'd want to implement a customizable option to do something like what I did.

As you know, executing uses Python's undocumented function linecache.getlines to retrieve source code. However, the code entered in a repl is not strictly speaking a file. I use my own caching method to store this, using filenames that begin with "<" and end with ">", similar to what Python does for fake files. Python's linecache never attempts to retrieve such code (https://github.com/python/cpython/blob/3.9/Lib/linecache.py#L88) as it is normally not accessible. However, I was able to monkeypatch your code as follows:

from executing import executing
old_getlines = executing.linecache.getlines

def getlines(*args):
    lines = old_getlines(*args)
    if lines:
        return lines
    else:
        return cache.getlines(*args)
executing.linecache.getlines = getlines

(I had also to define my own getlines method which reused an existing one.)

Now that I know I can use executing in all possible scenarios, I will definitely explore further and see how I can best make use of it.

@alexmojaki
Copy link

What exactly is stored in your cache? Is there metadata in addition to the lines?

I suggest that:

  1. You store the lines directly to linecache.cache. This is what IPython does. linecache is the de facto standard place to get this information. It's what the inspect and traceback modules use, and they don't always go through getlines. If anyone tries to use anything that depends on source code, this will increase the chances of that code working in the console. I've seen too many issues from people wondering why a certain tool doesn't work in the standard REPL. Even fairly standard things like logging.exception will be affected.
  2. You store any additional metadata per file in your own subclass of stack_data.Source and monkeypatch it back into stack_data. There's already a lot of cached metadata there that you will probably use, such as the AST and tokens.

Side notes:

  1. I really want to post something to python-ideas one day suggesting that exec and the standard python REPL store source code in linecache. But that's a long email for another day.
  2. You can monkeypatch the linecache module directly, no reason to go through executing. If executing contained from linecache import getlines it'd be a different matter.

@aroberge
Copy link
Owner Author

aroberge commented Dec 7, 2020

I only store the lines of code. I see that linecache store the module globals as well. I'll have a look at what IPython does, and try to do something similar to what it does.

@alexmojaki
Copy link

Sounds good. Here's the code:

https://github.com/ipython/ipython/blob/3587f5bb6c8570e7bbb06cf5f7e3bc9b9467355a/IPython/core/compilerop.py#L115-L136

linecache doesn't store the module globals, it just looks for __name__ and __loader__ and possible stores those.

@aroberge
Copy link
Owner Author

aroberge commented Dec 9, 2020

Thanks @alexmojaki , I am now using

  • linecache (with monkeypatching to work in the friendly-console)

I've also started to use

  • evaluating
  • pure_eval and ASTTokens

I have a question for you, but first some information about what's new for context.

I use evaluating, pure_eval and ASTTokens to highlight variables of interest and, when to flag parts of a line (i.e. a node) if it can be identified as causing an error. Prior to using these, I was simply tokenizing a line and trying to figure out what objects were present. This would miss dotted names (e.g. something like a.x = 1), and would not identifiy sequence elements.
As an example, here what I had before:

image

and here's the new version:

image

I was wondering if it was possible to use pure_eval to evaluate function calls. For example, suppose I have

a = "2"
b = int(a) + a

It would be really useful if I could extract int(a) = 2 and a="2", and use this to determine the cause of the error. I have no idea if it could be possible to do this with pure_eval. Other than a Fortran course taken some 40 years ago, I have never learned formally any CS and my understanding of working with AST is abysmal.

Perhaps this question could be redirected to the Discussion (https://github.com/aroberge/friendly-traceback/discussions) which I have just enabled for this project. If so, I would then close this issue as I have proceeded with using dependencies which was the whole point of creating this issue.

@alexmojaki
Copy link

This would be best handled as a new feature of pure_eval.

Basically every kind of expression needs to be specially handled in pure_eval. We'd need to add handling for function calls which would in turn need to handle each supported function specially. For int this would mainly mean checking that the argument is one of a few safe types like int, float, bool, or str, to not accidentally trigger a user defined __int__, then actually calling int.

@aroberge
Copy link
Owner Author

aroberge commented Dec 9, 2020

Ok, I will try to experiment with it and will likely have to fork it. For my purpose, I know (in theory) that the argument is safe. For example, suppose that I have something like:

def to_str(arg):
   return str(arg)

x = 42
y = x + to_str(x)

This will generate an exception, and I can get something like

Traceback ...
  File "<friendly-console:3>", line 1, in <module>
    y = x + to_str(x)
TypeError: unsupported operand type(s) for +: 'int' and 'str'
    -->1: y = x + to_str(x)
              ^^^^^^^^^^^^^
    x: 42
    to_str: <function to_str>

Because the exception was raised on that line, I know that the call to to_str() was "safe" since it did not generate an exception.
knowing that the expression to_str(x) = "42" would make it possible for me to provide a more precise explanation as to the exact cause of the error. For instance, in this case, I could analyze to_str(x) and confirm that it could be converted into an integer, and I could offer this as a suggestion. However, if the result would have been 42 + "a" then it would not make sense to mention "Did you forget to convert a string to an integer?" as a suggestion.

@alexmojaki
Copy link

The problem is not raising an exception, but causing side effects. You don't want generating a traceback to print things, write to files, or wait for user input.

If you want to bypass such concerns, you can just use regular eval.

@aroberge
Copy link
Owner Author

aroberge commented Dec 9, 2020

Good point ...

Before I dive in and try something: do you think it might be possible to have implement a custom object in a fork of pure_eval so that the previous example would yield something like:

x: 42
to_str: <function to_str>
to_str(x): <unevaluated function call>

@alexmojaki
Copy link

Well in the example x + to_str(x) you would try evaluating the two operands and for the RHS you'd get a CannotEval exception when trying, from there it would be easy to substitute your own object. I don't think a fork or API change would give much benefit. It's also easy to see that it's a function call with something like isinstance(node.right, ast.Call).

@aroberge
Copy link
Owner Author

aroberge commented Dec 9, 2020

I was essentially using the code from your readme:

from asttokens import ASTTokens
from pure_eval import Evaluator

source = """
x = 1
d = {x: 2}
y = d[x]
"""

names = {}
exec(source, names)
atok = ASTTokens(source, parse=True)
for nodes, value in Evaluator(names).interesting_expressions_grouped(atok.tree):
    print(atok.get_text(nodes[0]), "=", value)

I guess I'll need to essentially start from a modified version of find_expressions and go from there.

@alexmojaki
Copy link

Yes but the idea in those methods is to list a bunch of possibly unrelated expressions. Which is great for beefing up your current list of local variables, but you don't want a bunch of pointless noise like <unevaluated function call> in that list. I thought you only want that in a few targeted cases like "I know the problem is in x+y but I don't know what y is so I can't help".

@aroberge
Copy link
Owner Author

aroberge commented Dec 9, 2020

I agree that I would not want too many <unevaluated function call>. Using evaluating (which works really well, btw!) I now restrict looking for variables only for the node identified; this will likely cut down significantly on the number of objects looked at. In most cases, I do not thing that there would be that many <unevaluated function call>. These would be useful mostly for TypeError situations, and possibly some ValueError. If that is the case, I could always filter them out based on the actual exception class, or if the number of evaluated objects is greater than a given threshold.

@alexmojaki
Copy link

I think the number of such unevaluated nodes could very easily grow quite large, and even just one seems unhelpful to me. The only reason I can see is if you use it to say "if you extract this expression into a variable, then I'll know what it is which might let me give you more advice" or maybe "Can I eval this node to find out what it is? It might have side effects".

@alexmojaki
Copy link

You could also order expressions by their closeness to the executing node/statement, as in https://github.com/getsentry/sentry-python/blob/c277ed5d1170a7d58fe3482173d391ae799fdc0a/sentry_sdk/integrations/pure_eval.py#L84

Then the most relevant values, especially ones inside the failed expression, will be at the top.

@alexmojaki
Copy link

From #126 (comment) (seems a better fit here)

I could not get [stack_data] to work reliably both in the console and when running an external file

It should be fine as long as the lines are in linecache, each with a unique filename (e.g. <friendly-console:1>). If not I'd like to hear what the problem is, maybe I need to fix something.

it was also giving some problems when dealing with SyntaxErrors.

stack_data just doesn't handle those at all, they are a different breed.

One problem I have is that I remove from the stack frames that belong to friendly_traceback modules and some Python modules. I also do this now for embedding with IPython; other users (such as hackinscience) do that as well.

That's easy to deal with, just don't render those frames. What went wrong?

@aroberge
Copy link
Owner Author

aroberge commented Dec 9, 2020

Except for programs running in the console, where I execute code using exec (and compile first to capture syntax errors), I could get stackdata to most often point at the correct line if I just passed the traceback object, without trying to remove frames.

However, in other situations where the exceptions are caught by a normal excepthook, such as the one below, the frame where they are captured was incorrect, and I had to make sure that I went back in the stack to get the right frame.

I run my tests with pytest, capturing the result, like so:

def test_name_error():
    try:
        this = something
    except Exception as e:
        message = str(e)
        friendly_traceback.explain_traceback(redirect="capture")  # <--
    result = friendly_traceback.get_output()
    ...

When I used stack_data on these files and try to remove extra frame until I find the right one, the line that was flagged was the one indicated above with the comment, instead of the line

this = something

I imagine there must be a way to do it so that it works reliably, but I have not been able to get it working. This does not mean that I will not go back and try to do it again - I just moved on to something else to feel productive.

@alexmojaki
Copy link

alexmojaki commented Dec 9, 2020

Except for programs running in the console, where I execute code using exec (and compile first to capture syntax errors)

Make sure that the filename given to compile which becomes code.co_filename is the same as the filename key used for linecache. This is necessary for linecache to be useful beyond stack_data anyway.

When I used stack_data on these files and try to remove extra frame until I find the right one, the line that was flagged was the one indicated above with the comment, instead of the line

This suggests that you passed the frame object instead of the traceback object. Make sure you walk the traceback when choosing the correct frames rather than walking through the frames themselves. See https://docs.python.org/3/reference/datamodel.html?highlight=tb_lineno :

Special read-only attributes: tb_frame points to the execution frame of the current level; tb_lineno gives the line number where the exception occurred; tb_lasti indicates the precise instruction. The line number and last instruction in the traceback may differ from the line number of its frame object if the exception occurred in a try statement with no matching except clause or with a finally clause.

Demo script:

import sys

import stack_data

formatter = stack_data.Formatter()


def print_frame(f):
    print("".join(formatter.format_frame(f)))


def explain_traceback():
    tb = sys.exc_info()[2]
    print_frame(tb)  # correct
    print_frame(tb.tb_frame)  # wrong


def test_name_error():
    try:
        this = something
    except Exception:
        explain_traceback()


test_name_error()

Output:

 File "/home/alex/.config/JetBrains/PyCharm2020.3/scratches/scratch_1016.py", line 20, in test_name_error
      18 | def test_name_error():
      19 |     try:
-->   20 |         this = something
      21 |     except Exception:

 File "/home/alex/.config/JetBrains/PyCharm2020.3/scratches/scratch_1016.py", line 22, in test_name_error
      19 | try:
      20 |     this = something
      21 | except Exception:
-->   22 |     explain_traceback()
               ^^^^^^^^^^^^^^^^^^^

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Discussion: future work Used to discuss items for future consideration.
Projects
None yet
Development

No branches or pull requests

2 participants