This repository has been archived by the owner on Apr 27, 2022. It is now read-only.

Improve JSON-less 404 error readability and prevention with client library #926

Merged
merged 21 commits into from
Feb 7, 2018

Conversation

ianballou
Contributor

This code aims to stop the "Unexpected error: No JSON object could be decoded" error mentioned in #910:

  • The error is caused by 404 pages without JSON hitting json.loads().
  • If json.loads() fails, print out the response content rather than throwing the error above.
  • Since this isn't pretty, the following changes target the 404s themselves:
    • Characters that get passed to the API will be limited to numbers, letters, and $ - _ . + ! * ' ( ) ,
    • The client library checks object names per-call for characters that don't match this.
  • If the client library detects bad characters, it will print something like: Error: Projects may not contain: ['/', '&'] to alert the user of the illegal characters that they entered.
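A minimal sketch of the fallback described in the list above, assuming the caller has the HTTP status code and the raw response body at hand. The name decode_body and its arguments are hypothetical and are not taken from the client library.

```python
import json


def decode_body(status_code, content):
    """Return parsed JSON when possible, else a readable message.

    Hypothetical helper: the name and signature are illustrative only.
    """
    try:
        return json.loads(content)
    except ValueError:
        # json.JSONDecodeError subclasses ValueError on Python 3, and
        # Python 2 raises ValueError directly, so this catches both.
        return "HTTP %d with non-JSON body: %s" % (status_code, content)
```

With this shape, a 404 page served as HTML produces a message that quotes the page instead of surfacing the decoder's own exception.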

I anticipate that the tests will take a while to write since I am introducing changes to the majority of the client library calls. Thus, I'm introducing this code now so I can get feedback before writing the tests.

Note: some commands, such as switch_register, have not been implemented in the client library yet. These will get checks for reserved characters when they are implemented.

Contributor

@zenhack zenhack left a comment

A few comments.

bad_chars = self.find_reserved(owner)
if bool(bad_chars):
    raise BadArgumentError("Owner may not contain: %s"
                           % bad_chars)
Contributor

We could probably do a better job of factoring this out. Maybe:

check_reserved('Owner', owner)

with a suitable definition of check_reserved.
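One possible shape for such a helper, as a sketch only. BadArgumentError is redefined here as a stand-in so the example runs on its own, the character class is the unescaped one that appears later in this thread, and the module-level (rather than method) placement is an assumption.

```python
import re


class BadArgumentError(Exception):
    """Stand-in for the client library's exception of the same name."""


def find_reserved(string):
    """Return every character in `string` outside the allowed set."""
    return re.findall(r"[^A-Za-z0-9 $_.+!*'(),-]", string)


def check_reserved(obj_type, obj_string):
    """Raise BadArgumentError if `obj_string` has reserved characters."""
    bad_chars = find_reserved(obj_string)
    if bad_chars:
        raise BadArgumentError(
            "%s may not contain: %s" % (obj_type, bad_chars))
```

Each call site then shrinks to a single line such as check_reserved('Owner', owner).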

We could go a bit further and implement a decorator so we could do something like:

@check_reserved_chars('network', 'owner', 'access', slashes_ok=['net_id'])
def create(self, network, owner, access, net_id):
    # Implementation unchanged from master.

It might also make sense to survey the arguments and see what proportion need to be checked vs. not checked; depending on the result, we may want to specify the arguments not to check, rather than the other way around.

Contributor Author

I like the decorator solution, I'm going to work on that.

Contributor

@Izhmash let me know when you do the decorator solution, I'll review this after that.

Contributor Author

@ianballou ianballou Jan 23, 2018

@zenhack how would you relate the arguments in create() to the strings in the decorator? Since the arguments are passed as *args the only way I can see to relate them would be by their order which might not be robust enough. It would work out if the arguments were changed to be keyworded but that would mean editing the original function.

Edit: In case the decorator solution does start getting too complicated, I think it might be better to clean up the calls with a new definition of check_reserved() like you mentioned above.

Contributor

I am not sure whether this is right, but what you need is a function that checks for the error, for example:

def decorator(argument):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            funny_stuff()
            something_with_argument(argument)
            # Keep the wrapped function's return value so callers still get it.
            result = function(*args, **kwargs)
            more_funny_stuff()
            return result
        return wrapper
    return real_decorator

https://stackoverflow.com/questions/5929107/decorators-with-parameters

Contributor

And

@check_reserved_chars(arguments)
class Network(ClientBase):

should work so you don't have to put the decorator in every function.
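A rough sketch of how a class-level decorator along these lines could behave, under the assumption that it simply wraps every public method and runs one check on each string argument. The names check_all_methods, _wrap, and no_slash are hypothetical, and the sketch only handles plain instance methods.

```python
import functools


def _wrap(method, check):
    """Wrap one method so `check` runs on each string argument first."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        for value in list(args) + list(kwargs.values()):
            if isinstance(value, str):
                check(value)
        return method(self, *args, **kwargs)
    return wrapper


def check_all_methods(check):
    """Class decorator: apply `check` to every public method's arguments."""
    def decorate(cls):
        for name, attr in list(vars(cls).items()):
            # Skip private names and anything that is not callable.
            if callable(attr) and not name.startswith("_"):
                setattr(cls, name, _wrap(attr, check))
        return cls
    return decorate


def no_slash(value):
    """Toy check used only for this example."""
    if "/" in value:
        raise ValueError("may not contain '/': %r" % value)


@check_all_methods(no_slash)
class Network(object):
    def create(self, network):
        return network

    def delete(self, network):
        return network
```

The trade-off is that a class-wide check treats every argument the same way, so per-argument exceptions such as slashes_ok would still need some per-method configuration.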

Contributor Author

@xuhang57 thanks for the input. The code you presented above definitely fits what I had in mind. The issue lies with relating the non-keyworded arguments in the decorator to the arguments in the methods such as create() so we can choose what gets checked.

The only way that I could see to relate (for example) the string 'network' in the decorator to the network passed to the method would be by location since network names have no standard in HIL. That, then, becomes complicated when considering that some of the objects will allow slashes with the slashes_ok flag above.

In the meantime I'm going to work on factoring out my original code and present a solution without decorators.

Contributor

The inspect module may be of interest:

https://docs.python.org/2/library/inspect.html

from there, you can do something like:

argspec = inspect.getargspec(f)
def wrapper(*args):
    for argname, argval in zip(argspec.args, args):
        if argname not in slashes_ok:
            check(argval)

Maybe adding some similar logic for kwargs.
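A fuller sketch of the same idea, written for Python 3. It swaps inspect.getargspec for inspect.signature, since getargspec was removed in Python 3.11, and uses Signature.bind so positional and keyword arguments are handled together. The decorator name and its dont_check and slashes_ok options follow suggestions made in this thread; the bodies, and the stand-in BadArgumentError and check_reserved, are assumptions made so the example runs on its own.

```python
import functools
import inspect
import re


class BadArgumentError(Exception):
    """Stand-in for the client library's exception of the same name."""


def check_reserved(argname, argval, slashes_ok=False):
    """Raise if `argval` holds characters outside the allowed set."""
    allowed = r"A-Za-z0-9 $_.+!*'(),-"
    if slashes_ok:
        allowed = "/" + allowed
    bad_chars = sorted(set(re.findall("[^" + allowed + "]", argval)))
    if bad_chars:
        raise BadArgumentError(
            "%s may not contain: %s" % (argname, bad_chars))


def check_reserved_chars(dont_check=(), slashes_ok=()):
    """Check every string argument of the wrapped method by name."""
    def decorator(f):
        sig = inspect.signature(f)

        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # bind() maps positional and keyword values to parameter names.
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                if name == "self" or name in dont_check:
                    continue
                if isinstance(value, str):
                    check_reserved(name, value,
                                   slashes_ok=name in slashes_ok)
            return f(*args, **kwargs)
        return wrapper
    return decorator


class Network(object):
    @check_reserved_chars(slashes_ok=["net_id"])
    def create(self, network, owner, access, net_id):
        return (network, owner, access, net_id)
```

Because the names come from the function's own signature, the error message reports the real parameter name, and the decorator arguments only need to list the exceptions.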


def find_reserved(self, string):
    """Returns a list of illegal characters in a string"""
    p = '[^A-Za-z0-9 \$\-\_\.\+\!\*\'\(\)\,]+'
Contributor

  1. I don't think most of these need escaping when inside a character class. The dash can go at the end of the list to avoid it being interpreted as special. Escaping the single quote can be avoided by using double quotes to delimit the string as a whole.
  2. Generally speaking, use r"raw strings" for regexes, so you don't have to worry about what needs to be escaped in a python string as well.
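A quick way to see both points in practice: the sketch below compares the fully escaped character class with an unescaped one written as a double-quoted raw string. The sample strings are made up for illustration.

```python
import re

# Every punctuation character escaped, as in the pattern under review
# (written here as a raw string so Python itself leaves it alone).
escaped = r"[^A-Za-z0-9 \$\-\_\.\+\!\*\'\(\)\,]+"
# Same character class with no escapes: the dash sits last so it is
# literal, and double quotes avoid escaping the single quote.
unescaped = r"[^A-Za-z0-9 $_.+!*'(),-]+"

samples = ["proj-1", "a/b&c", "node_01.rack(2)", "x y+z!", "it's,ok*$"]
for s in samples:
    # Both patterns flag exactly the same runs of characters.
    assert re.findall(escaped, s) == re.findall(unescaped, s)
```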

Contributor Author

Ok, I'll work on that.

Contributor Author

Thanks for the tip, c1fea67 addresses this.

@ianballou
Contributor Author

Just wanted to note that I'm back working for the new semester so I'll have activity on here soon.

@ianballou
Contributor Author

My latest commit cleans up the methods for checking for reserved characters.

    return list(x for l in re.findall(p, string) for x in l)
else:
    p = r"[^A-Za-z0-9 $_.+!*'(),-]+"
    return list(x for l in re.findall(p, string) for x in l)
Contributor

You could just do a single return after the if-else block.

Contributor Author

Fixed

return list(x for l in re.findall(p, string) for x in l)

def check_reserved(self, obj_type, obj_string, slashes_ok=False):
"""Check for illegal characters and report of their existance"""
Contributor

existence

Contributor Author

Fixed

@ianballou ianballou force-pushed the fix-json-errors branch 2 times, most recently from 7bbadd9 to b1e7377 Compare January 25, 2018 19:44
@ianballou
Contributor Author

The tests for this are in progress now.

@ianballou
Contributor Author

I've set up decorators/wrappers to check for illegal characters now, so the client functions should otherwise remain unedited.

@ianballou ianballou force-pushed the fix-json-errors branch 2 times, most recently from 651bad6 to 55ed20a Compare February 1, 2018 18:19
@ianballou
Contributor Author

I've addressed the requests and added a first pass at the tests. Did I address your requests correctly @zenhack and @naved001 ?

Contributor

@xuhang57 xuhang57 left a comment

LGTM

Contributor

@zenhack zenhack left a comment

It seems like the vast majority of our parameters need checking; maybe instead of listing the ones to check in each call to the decorator, we should list the ones not to check?

so:

@check_reserved_chars(dont_check=['foo'], slashes_ok=['bar'])
...

if 'slashes_ok' in outer_kwargs:
    slashes_ok = outer_kwargs.get('slashes_ok')
else:
    slashes_ok = []
Contributor

This if/else can just be outer_kwargs.get('slashes_ok', [])

    slashes_ok = outer_kwargs.get('slashes_ok')
else:
    slashes_ok = []
for argname, argval in zip(outer_args, args[1:]):
Contributor

Took me a second to work out why we were chopping off the first element; probably should add a comment about that.

if argname not in slashes_ok:
    check_reserved(argname, argval)
else:
    check_reserved(argname, argval, slashes_ok=True)
Contributor

Could simplify this to:

check_reserved(argname, argval, slashes_ok=argname in slashes_ok)

@@ -14,6 +15,7 @@ def list(self, is_free):
        url = self.object_url('nodes', is_free)
        return self.check_response(self.httpClient.request('GET', url))

    @check_reserved_chars('node')
    def show(self, node_name):
Contributor

You've got a mismatch between the name you're passing to the decorator and the actual name of the parameter. Similar thing in a bunch of places below.

Contributor Author

I figured there were some places where we wouldn't want the variable names to leak out to the user. I put 'node' there because otherwise the error report would say 'node_name' to the user. Is that preferable?

Contributor Author

To settle this I'm just going to use argspec and match the variable names directly.

@naved001 naved001 removed the request for review from mosayyebzadeh February 5, 2018 20:53
@naved001
Contributor

naved001 commented Feb 6, 2018

I don't have any other comments to make on this at this point; @zenhack has already made some good points.

    p = r"[^A-Za-z0-9 /$_.+!*'(),-]+"
else:
    p = r"[^A-Za-z0-9 $_.+!*'(),-]+"
return list(x for l in re.findall(p, string) for x in l)
Contributor

Well, except for this line. There's a lot going on here; it would be nice if we split it up to make it more readable. But I won't block the PR on this.

Contributor Author

I might be wrong, but I think this is one of those cases where the list comprehension is actually less complex than if I didn't use a one-liner. I can add a comment for clarification if that helps.

@naved001
Contributor

naved001 commented Feb 6, 2018 via email

@ianballou
Contributor Author

@zenhack I believe I've addressed all of your requests in these two commits

Contributor

@zenhack zenhack left a comment

Gripes about the one function, but this is almost ready I think.

@@ -79,7 +80,8 @@ def _find_reserved(string, slashes_ok=False):
         p = r"[^A-Za-z0-9 /$_.+!*'(),-]+"
     else:
         p = r"[^A-Za-z0-9 $_.+!*'(),-]+"
-    return list(x for l in re.findall(p, string) for x in l)
+    # return all unique chars that exist neither in p nor string
+    return list(set(x for l in re.findall(p, string) for x in l))
Contributor

It took me >1 min to figure out how this worked (I think partially just because there was a lot of seeking involved in parsing the comprehension). What about:

  1. Change the patterns to remove the +
  2. list(set(re.findall(p, string)))

Also, I don't like the fact that we have two nearly identical regexes. One alternative:

p = # regex without the /
result = # expression with re.findall
if not slashes_ok and '/' in string:
    result.append('/')

Contributor Author

@zenhack my code now should reflect your comments. My logic is slightly different from what you recommended, but works just as it did before.

p = r"[^A-Za-z0-9 $_.+!*'(),-]+"
# return all unique chars that exist neither in p nor string
return list(set(x for l in re.findall(p, string) for x in l))
p = r"[^A-Za-z0-9 /$_.+!*'(),-]"
Contributor

This should probably not include the slash I think? Otherwise the slashes will end up in the result regardless of the value of slashes_ok. Would you also add a test that catches this?

Contributor Author

I thought that result shouldn't check slashes by default so I put one in my "allowed characters". Then, if slashes aren't allowed but there is one in the string, I need to add a slash to my list of unique illegal characters, result. My goal is for result to have all the characters that are in string but not in p.

My tests all failed when I did remove the slash from p, but when I put it back everything worked as it should. Tomorrow I'll double check that and write a test to check for a proper response.

Contributor

Ah, no you're totally right; I had things backwards in my head; forgot that the regex was being inverted, so would match allowed characters rather than banned characters.
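The logic this exchange settles on can be sketched roughly as below. This is a reconstruction from the comments, not the merged code: the function name drops the leading underscore, and sorting the result is an addition made here so the output order is predictable.

```python
import re


def find_reserved(string, slashes_ok=False):
    """Return the unique characters in `string` that are not allowed."""
    # Negated class: it matches anything NOT listed. '/' is listed, so
    # the regex on its own never reports a slash.
    p = r"[^A-Za-z0-9 /$_.+!*'(),-]"
    result = sorted(set(re.findall(p, string)))
    # Report the slash separately when the caller forbids it.
    if not slashes_ok and "/" in string:
        result.append("/")
    return result
```

A test for the case raised above would assert that a slash shows up in the result only when slashes_ok is false.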

@zenhack
Contributor

zenhack commented Feb 7, 2018

LGTM

@zenhack zenhack merged commit 6f97fe4 into CCI-MOC:master Feb 7, 2018
@ianballou ianballou deleted the fix-json-errors branch February 13, 2018 18:51