
[FFI][Windows] Fix hasattr by extracting Python error type from Windows error message #4780

Merged
merged 1 commit into from Jan 27, 2020

Conversation

soiferj
Contributor

@soiferj soiferj commented Jan 27, 2020

As discussed here, hasattr is currently broken on Windows. When an Expr does not have an attribute, a TVMError is raised rather than an AttributeError. This is caused by how we extract the error type from the logged message.

On Unix, DMLC logs the stack trace when LOG(FATAL) is called; on Windows it does not. This causes the errors to look different.

Unix:

AttributeError: relay.Call object has no attributed name_hint
Stack trace:
  File "/home/jonso/dev/TVM/src/node/reflection.cc", line 109
  [bt] (0) /home/jonso/dev/TVM/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f0970787a62]
  [bt] (1) /home/jonso/dev/TVM/build/libtvm.so(tvm::ReflectionVTable::GetAttr(tvm::runtime::Object*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x2c5) [0x7f0970af27f5]
  [bt] (2) /home/jonso/dev/TVM/build/libtvm.so(tvm::NodeGetAttr(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x15c) [0x7f0970af2c4c]
  [bt] (3) /home/jonso/dev/TVM/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), void (*)(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x14) [0x7f0970af6f14]
  [bt] (4) /home/jonso/dev/TVM/build/libtvm.so(TVMFuncCall+0x61) [0x7f0970fb68a1]

Windows:

[11:19:42] D:\_work\1\s\TVM\src\node\reflection.cc:109: AttributeError: relay.Call object has no attributed name_hint

This change updates the error-string parsing logic in base.py for Windows, using the generic format of this message to extract the error type.
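For illustration, the Windows-side extraction can be sketched as below. This is a simplified stand-alone version, not the actual base.py code, and the identifier check is only a stand-in for TVM's real error-name validation.

```python
# Sketch: pull the Python error type out of a Windows-format DMLC message,
# e.g. "[11:19:42] D:\path\reflection.cc:109: AttributeError: <message>".
def extract_error_type(line):
    # The error type sits between the last two ":" separators:
    # "...reflection.cc:109: AttributeError: <message>"
    end_pos = line.rfind(":")
    if end_pos == -1:
        return None
    start_pos = line.rfind(":", 0, end_pos)
    err_name = line[start_pos + 1:end_pos].strip()
    # Stand-in validity check; the real helper compares against known
    # Python error-type names.
    return err_name if err_name.isidentifier() else None
```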

@jmorrill @tqchen would you be able to take a look?

@soiferj soiferj changed the title [FFI][Windows] Extract Python error type from Windows error log [FFI][Windows] Fix hasattr by extracting Python error type from Windows error message Jan 27, 2020
@tqchen tqchen merged commit f71a10c into apache:master Jan 27, 2020
@tqchen
Member

tqchen commented Jan 27, 2020

Thanks @jonso4 !

@soiferj soiferj deleted the windows_error branch January 27, 2020 22:59
@jmorrill
Contributor

@soiferj sorry, I'm late to the party. Getting over the flu.

Anyways, I think there might be a problem.

In my test, the line variable is "AttributeError: relay.Op object has no attributed attrs".
On line 206, start_pos comes back as -1, which causes a return of None.

My issue is fixed if the body of _find_error_type is replaced with:

        end_pos = line.find(":")
        if end_pos == -1:
            return None
        err_name = line[:end_pos]
        if _valid_error_name(err_name):
            return err_name
        return None
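For what it's worth, the failure is easy to reproduce in isolation (a minimal sketch of just the colon lookup, not the base.py code):

```python
# With only a single ":" in the line, the search for an earlier colon
# before end_pos finds nothing and returns -1, so the current code
# bails out with None.
line = "AttributeError: relay.Op object has no attributed attrs"
end_pos = line.rfind(":")                # index of the only colon
start_pos = line.rfind(":", 0, end_pos)  # no earlier colon -> -1
```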

I'd attempt to fix, but I don't want to break the case you are dealing with.

@soiferj
Contributor Author

soiferj commented Jan 29, 2020

@jmorrill, I haven't seen an error formatted like that. Can you print out the full error string in the function that calls into this one? In all of my tests, the message begins with a timestamp, which contains a few ":" characters.

If this does turn out to be an issue, I think we should handle both cases. If start_pos is -1, we should grab line[:end_pos]; otherwise, grab line[start_pos + 1:end_pos]. What do you think?

@jmorrill
Contributor

I added: print("line var is: ", line)
Output is: line var is: AttributeError: relay.Op object has no attributed attrs

This happens when I run autotvm, coming down the line from extract_from_program I believe.

Here is the stack

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\jeremiah.morrill\AppData\Local\Programs\Python\Python37\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\jeremiah.morrill\AppData\Local\Programs\Python\Python37\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\autotvm\task\relay_integration.py", line 54, in _lower
    compiler.lower(mod, target=target)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\backend\vm.py", line 455, in lower
    self._lower(mod, target, target_host)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\_ffi\_ctypes\function.py", line 207, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\_ffi\_ctypes\function.py", line 72, in cfun
    rv = local_pyfunc(*pyargs)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\transform.py", line 899, in _pass_func
    return inst.transform_function(func, mod, ctx)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\memory_alloc.py", line 284, in transform_function
    func = ea.visit(func)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\expr_functor.py", line 43, in visit
    res = self.visit_function(expr)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\expr_functor.py", line 199, in visit_function
    new_body = self.visit(fn.body)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\expr_functor.py", line 47, in visit
    res = self.visit_let(expr)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\memory_alloc.py", line 159, in visit_let
    new_val = self.visit(let.value)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\expr_functor.py", line 45, in visit
    res = self.visit_call(expr)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\memory_alloc.py", line 170, in visit_call
    if is_primitive(call):
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\relay\memory_alloc.py", line 31, in is_primitive
    return hasattr(call.op, 'attrs') and hasattr(call.op.attrs, 'Primitive') and \
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\_ffi\object.py", line 65, in __getattr__
    return _api_internal._NodeGetAttr(self, name)
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\.site-packages\python\tvm\_ffi\_ctypes\function.py", line 207, in __call__
    raise get_last_ffi_error()
  File "C:\src\mediatubes\ml\tensortubes\src\third_party\tvm\src\node\reflection.cc", line 113
AttributeError: relay.Op object has no attributed attrs

Based on your suggestion, I threw this together, which seems to work with my use.

def _find_error_type(line):
    """Find the error name given the first line of the error message.

    Parameters
    ----------
    line : str
        The first line of error message.

    Returns
    -------
    name : str
        The error name
    """
    if sys.platform == "win32":
        # Stack traces aren't logged on Windows due to a DMLC limitation,
        # so we should try to get the underlying error another way.
        # DMLC formats errors "[timestamp] file:line: ErrorMessage"
        # ErrorMessage is usually formatted "ErrorType: message"
        # We can try to extract the error type using the final ":"
        end_pos = line.rfind(":")
        if end_pos == -1:
            return None
        start_pos = line.rfind(":", 0, end_pos)
        if start_pos == -1:
            err_name = line[:end_pos]
            if _valid_error_name(err_name):
                return err_name
            else:
                return None
        err_name = line[start_pos + 1 : end_pos].strip()
        if _valid_error_name(err_name):
            return err_name
        return None
    else:
        end_pos = line.find(":")
        if end_pos == -1:
            return None
        err_name = line[:end_pos]
        if _valid_error_name(err_name):
            return err_name
        return None

@soiferj
Contributor Author

soiferj commented Jan 29, 2020

Oh interesting - your branch is throwing, while mine is just calling LOG(FATAL) in reflection.cc. Do you know why yours is throwing?

Update: you're going through a different codepath (AutoTVM). Let's update the logic as below.

In any case, I think we can condense this:

if start_pos == -1:
    err_name = line[:end_pos]
    if _valid_error_name(err_name):
        return err_name
    else:
        return None
err_name = line[start_pos + 1 : end_pos].strip()
if _valid_error_name(err_name):
    return err_name
return None

into this:

if start_pos == -1:
    err_name = line[:end_pos].strip()
else:
    err_name = line[start_pos + 1 : end_pos].strip()
if _valid_error_name(err_name):
    return err_name
return None

What do you think? Would you be able to send a PR?

@jmorrill
Contributor

Yes, much cleaner, and it works on my PC :)
Submitted PR
#4785

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
Co-authored-by: Jon Soifer <jonso@microsoft.com>
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020
Co-authored-by: Jon Soifer <jonso@microsoft.com>