Skip to content
This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

synchronized_request._Request is_media sniff bug #187

Open
Druid-of-Luhn opened this issue Jan 31, 2020 · 1 comment
Open

synchronized_request._Request is_media sniff bug #187

Druid-of-Luhn opened this issue Jan 31, 2020 · 1 comment

Comments

@Druid-of-Luhn
Copy link

Druid-of-Luhn commented Jan 31, 2020

(see Conclusion for TL;DR)

Package version (pip): azure_cosmos-3.1.2-py3-none-any.whl

Observed error

When trying to upsert a document to a collection:

document = { 'id': str(datetime.datetime.utcnow()), 'key': 'example' }
collection_link = 'dbs/campaigns/colls/digitalmedia' # spoiler: note this collection link
options = { 'partitionKey': document['key'] }
client.UpsertItem(collection_link, document, options)

I get an error thrown from inside azure.cosmos.cosmos_client at line 2880:

File "...\env-3.5.4\lib\site-packages\azure\cosmos\cosmos_client.py", line 2880, in _AddPartitionKey
    partitionKeyDefinition = collection.get('partitionKey')
AttributeError: 'str' object has no attribute 'get'

This error had not been encountered in previous uses of the library.

Investigation

Upon printing the value of collection just above that line, I saw that it was a JSON string, instead of a dict, as the code must have expected.

So I followed the calls backwards: each time, the raw return value of the callee was returned:

At last, in the final synchronized_request._Request function, I see at the end:

if is_media:
    result = data
else:
    if len(data) > 0:
        try:
            result = json.loads(data)
        except:
            raise errors.JSONParseFailure(data)

return (result, headers)

So this tells me that if is_media is set, the response is not parsed from JSON, but instead returned as-is. Injecting a little print(is_media) in there told me that the flag was indeed set, so I went up to see how it was set:

def _Request(global_endpoint_manager, request, connection_policy, requests_session, path, request_options, request_body):
    """Makes one http request using the requests module.
    ...
    :return:
        tuple of (result, headers)
    :rtype:
        tuple of (dict, dict)

    """
    is_media = request_options['path'].find('media') > -1

(note that the return type is said to be (dict, dict), but we now know that to not always be the case)

So is_media is sniffed from request_options['path']. Intrigued, I print(request_options['path']) only to be greeted by a familiar sight:

/dbs/campaigns/colls/digitalmedia/

Conclusion

If the database id, collection id or document id (I assume) contains the substring media, then the result of the API GET request in synchronized_request._Request will not be parsed as JSON, and the azure-cosmos library will fail further down the line when expecting the returned value to be a dict rather than a str.

The obvious workaround at this point would be to just use a container with a different name. I can do this during the testing phase, but we already expect it to be called what it is, so just changing the name is not really viable. Is there a safer way of determining whether a request is_media?

The second workaround is for us to fork this library, remove the is_media check and use our fork instead (whether by putting it locally in the project or hosting it somewhere for pip to pick up). This is more effort than is desirable, but may be necessary to not be affected by this bug.

@Druid-of-Luhn
Copy link
Author

It has been almost 2 weeks since I raised this issue. A response, even if it is just an acknowledgement, would be appreciated.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant