Skip to content

Conversation

@mchestnut91
Copy link
Contributor

@mchestnut91 mchestnut91 commented Feb 23, 2021

This PR is a follow up from this ticket and is meant to verify that btrdb-server error codes are properly sent through the python bindings and displayed to the user. In addition, this PR attempts to catch any grpc errors and present them in a way that more user friendly.

I found that the bindings correctly display the errors defined in btrdb-server here. Here are a few examples:

I tried to create a stream with a uuid that already exists:

btrdb.exceptions.BTrDBError: [417] a stream already exists with that uuid

I tried to update a stream’s metadata with an invalid tag:

btrdb.exceptions.BTrDBError: [408] (408: tag key "chickens" is invalid)

I tested a few more of these, in addition to the few that David tested, and they all seemed to work as expected, and IMO these errors are nice looking and easy to understand. All of these errors are generated by checking the result status after a grpc request is completed with a line that looks like this: BTrDBError.checkProtoStat(result.stat)

This all works as expected, but there are some cases where it encounters an error during the actual grpc request. When this happens, the program panics and returns an ugly grpc error like this:

    status = StatusCode.UNKNOWN
    details = "[404] stream does not exist"
    debug_error_string = "{"created":"@1612962972.818625965","description":"Error received from peer ipv4:10.30.10.12:4411","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"[404] stream does not exist","grpc_status":2}"

To fix this I put a try/except around each grpc call in endpoint.py that checks for any grpc.RpcError. I also wrote an error handling function that checks the details field for any known errors and returns a prettier error. Currently the two errors that I’ve identified are:

  1. the stream does not exist
  2. there is an issue with the user’s internet/VPN connection.

Here is what the previous error looks like with these changes:

btrdb.exceptions.StreamNotFoundError: Stream not found with provided uuid

I imagine that there are other things that would cause grpc errors, but I cannot think of them right now. I’ve enlisted the help of the DS team to create a ticket whenever they encounter a grpc error so I can go back and trap it and return something prettier.

@looselycoupled
Copy link
Member

Great work, but the overall goal is to make new errors like btrdb.exceptions.StreamNotFoundError so that the users can easily capture such errors. We do not want to just reraise the GRPC error - we want to hide the GRPCness of the bindings.

@mchestnut91
Copy link
Contributor Author

@looselycoupled maybe we can chat about what you have in mind, because that's what I thought I was doing in my handle_grpc_errors() function. It seems to me like we are properly handling errors as long as they are able to complete the grpc call. I have identified two cases where the code fails during the actual grpc request, and we can trap those errors and return either btrdb.exceptions.StreamNotFoundError or btrdb.exceptions.ConnectionError. In these cases the user would not see the grpc errors

@looselycoupled
Copy link
Member

@mchestnut91 cool - let's hop on a quick call Wednesday!

@mchestnut91
Copy link
Contributor Author

mchestnut91 commented Mar 4, 2021

Updates to this PR include:

  • Created custom BTrDB exceptions for the errors that I deemed most likely to occur.
  • Created a list of error codes that will raise a generic BTrDBServerError. Note that there are a few of the btrdb-server error codes that I did not include because the server did not seem to use them. I created a Notion page that lays out how I classified each error code. Definitely let me know if you think something is not categorized correctly
  • Created decorators for handling grpc errors for the functions in endpoint.py per @oridb's suggestion
  • Updated tests to work with new exceptions
  • Replaced instances of general errors in stream.py with their custom BTrDB errors. e.g. replaced TypeError with BTRDBTypeError

As a possible next step, I've made myself a ticket to dig into the btrdb-server code to see if we can figure out why it sends some errors with StatusCode.UNKNOWN. We are currently handling these errors by string matching the details field of the error, but this is obviously not ideal and would be better to fix on the server side

@looselycoupled
Copy link
Member

Just leaving a comment that I still need to review this.

btrdb/errors.py Outdated
# putting yield directly in this function turns it into a generator,
# so keeping it separate
consume_generator(fn, *args, **kwargs)
return fn(*args, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused at the return here as well as where consume_generator yields to... Presumably consume_generator yields to wrap... but wrap isnt iterating over the consume_generator.

Can you help me understand what's going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the super long comment and this will likely include a lot of stuff you already know, but I wanted to outline my thought process as I was trying to make the error handling decorator work with the generator functions in endpoint.py.

I originally started with Ori's suggestion, which was something like this:

def error_handler(fn):
    def wrap(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except grpc.RpcError as e:
            handle_grpc_error(e)
    return wrap

This works well for non-generator functions, but when fn is a generator it gets returned
to the calling function, and when that function calls next and hits an error, there is nothing in place within the generator to catch the error, and the result is an unformatted grpc error. Not good.

So then I tried something like this:

def error_handler(fn):
    @wraps(fn)
    def wrap(*args, **kwargs):
        if inspect.isgeneratorfunction(fn):
            try:
                yield from fn(*args, **kwargs)
            except grpc.RpcError as e:
                handle_grpc_error(e)
        else:
            try:
                return fn(*args, **kwargs)
            except grpc.RpcError as e:
                handle_grpc_error(e)
    return wrap

This worked when fn is a generator, but even when it isn't the presence of yield automatically made this function a generator, so the calling function would receive a generator instead of the returned value from fn. Also not good.

To get around this I moved the yield from part into a separate function. How this works is
the generator gets returned to wrap and then gets returned to the calling function. Note that
I was originally missing a return statement, so thank you for pointing that out. Here is what it looks like now:

def consume_generator(fn, *args, **kwargs):
    # when a generator is passed back to the calling function, it may encounter an error
    # when trying to call next(), in that case we want to yield an Exception
    try:
        yield from fn(*args, **kwargs)
    except grpc.RpcError as e:
        handle_grpc_error(e)

def error_handler(fn):
    @wraps(fn)
    def wrap(*args, **kwargs):
        if inspect.isgeneratorfunction(fn):
            return consume_generator(fn, *args, **kwargs)
        try:
            return fn(*args, **kwargs)
        except grpc.RpcError as e:
            handle_grpc_error(e)
    return wrap

So now if I am doing an aligned_windows() query, the object that is returned in the call

windows = self._btrdb.ep.alignedWindows(self._uuid, start, end, pointwidth, version)

is <generator object consume_generator at 0x7fcb506e54d0>. The calling function is then yielding from consume_generator() which is yielding from fn, and whenever it encounters an error it is able to handle with with handle_grpc_error()

tl;dr I had a lot of trouble getting the error handling decorator to work with the generator functions. I've done some testing and this seems to work, but it is admittedly a bit complicated. If you feel like this is too confusing, we could always go back to putting try/excepts back in the endpoint.py generator functions like I had before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I'm tempted to leave as you have it. We would like the user called functions to get generators because we'd eventually like to have the bindings not fully materialize datasets before handing them on. So provide a generator now and user code will not be impacted in a future enhancement.

@looselycoupled
Copy link
Member

Gonna ask David and Ori to take another look but looking good.

Copy link
Collaborator

@davidkonigsberg davidkonigsberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors are much more clear. I think this will really help with customers' understanding.

@mchestnut91 mchestnut91 merged commit 7edee2b into master Mar 23, 2021
@mchestnut91 mchestnut91 deleted the error_codes branch March 23, 2021 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants