Make it easier to pass return_raw=true to service endpoints #41

Open
iamed2 opened this issue Jul 12, 2018 · 4 comments

Comments

@iamed2
Member

iamed2 commented Jul 12, 2018

Currently to get raw output you need to call service_query/service_json/etc. directly and pass an additional keyword argument. It would be nice to have this available from the service endpoint, if possible.

I needed this to avoid using XMLDict, which returns inconsistently-structured results when a list of possibly-many items contains 1 item vs many items.

@samoconnor
Contributor

Hi @iamed2,

As you've noticed, AWSCore uses XMLDict to parse XML API results by default.
e.g.

julia> aws = aws_config()
Dict{Symbol,Any} with 2 entries:
...
julia> AWSCore.Services.s3(aws, "GET", "/octech.com.au.ap-southeast-2.awslambda.jl.deploy")
XMLDict.XMLDictElement with 6 entries:
  "Name"        => "octech.com.au.ap-southeast-2.awslambda.jl.deploy"
  "Prefix"      => ""
  "Marker"      => ""
  "MaxKeys"     => "1000"
  "IsTruncated" => "false"
  "Contents"    => XMLDict.XMLDictElement[...

... and the AWSS3.jl package uses a return_raw option to the low-level API-call functions when it wants to disable this behaviour.

I agree that it would be good to make this option generally available through the high-level API functions.

In the meantime, one option is to set return_raw in the aws config dict.

julia> aws_raw_config = aws_config(return_raw=true)
Dict{Symbol,Any} with 3 entries:
  :creds      => (XXX, XXX...)
  :region     => "ap-southeast-2"
  :return_raw => true

julia> String(AWSCore.Services.s3(aws_raw_config, "GET", "/octech.com.au.ap-southeast-2.awslambda.jl.deploy"))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<ListBucketResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">
...

This works because the low level API-call functions all merge the aws config Dict with the request Dict before calling the core do_request function. i.e. you can set default values for whatever do_request options you like in the aws config Dict.
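The merge step described above can be sketched roughly as follows. This is only an illustration: build_request_sketch is a hypothetical stand-in, not AWSCore's actual code, and it assumes that per-request values take precedence over the config Dict (which is what treating the config entries as "defaults" suggests).

```julia
# Hypothetical stand-in for a low-level API-call function; NOT AWSCore's real implementation.
function build_request_sketch(aws::Dict{Symbol,Any}; args...)
    # Collect the per-request keyword arguments into a Dict.
    request = Dict{Symbol,Any}(args)
    # Later arguments to `merge` win, so request values override config defaults.
    return merge(aws, request)
end

aws_raw = Dict{Symbol,Any}(:region => "ap-southeast-2", :return_raw => true)

# No return_raw in the request: the value comes from the config Dict.
r1 = build_request_sketch(aws_raw; verb = "GET", resource = "/bucket")

# return_raw given in the request: it overrides the config default.
r2 = build_request_sketch(aws_raw; verb = "GET", return_raw = false)

println(r1[:return_raw])
println(r2[:return_raw])
```

Under that assumption, a config created with return_raw=true affects every call made with it, while an individual call could still opt out.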

@samoconnor
Contributor

> I needed this to avoid using XMLDict, which returns inconsistently-structured results when a list of possibly-many items contains 1 item vs many items

Yes, this is sometimes a pain.

In the absence of a reliable schema, there is no way to know if a tag in an XML document is supposed to be a singleton, or one item in a list. XMLDict implements an interface where single nodes are not wrapped in a vector; but peer nodes with the same name are wrapped in a vector.

julia> parse_xml("<A><B><C>foo</C></B></A>")["B"]["C"]
"foo"

julia> parse_xml("<A><B><C>foo</C><C>bar</C></B></A>")["B"]["C"]
2-element Array{String,1}:
 "foo"
 "bar"

It is designed this way to support terse access to simple API result structures like this:

url = xml["CreateQueueResult"]["QueueUrl"]

instead of having to write this:

url = xml["CreateQueueResult"][1]["QueueUrl"][1]

It would probably be better to use something like XPath for this sort of thing, but at the time of writing that wasn't readily available.

For now, if you are dealing with XML that sometimes lists one item and sometimes lists many, you can do something like this:

x = parse_xml("<A><B><C>foo</C></B></A>")
c = x["B"]["C"]
for i in (c isa Vector ? c : [c])
    println(i)
end
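If this pattern comes up often, the same check can be factored into a small helper. The sketch below uses a hypothetical function name, aslist, which is not part of XMLDict, and plain Julia values in place of parsed XML so that it runs without any packages.

```julia
# Hypothetical helper (not part of XMLDict): always hand back a Vector.
aslist(x::AbstractVector) = x   # already a list: return it unchanged
aslist(x) = [x]                 # lone value: wrap it in a one-element Vector

single   = "foo"            # shape returned for a single <C> node
multiple = ["foo", "bar"]   # shape returned for repeated <C> nodes

for c in aslist(single)
    println(c)
end

for c in aslist(multiple)
    println(c)
end
```

The wrapping matters for string values in particular: iterating a bare String in Julia walks its characters, so looping over an unwrapped single result would not raise an error but would yield 'f', 'o', 'o' instead of "foo".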

I agree, this is sometimes a pain

I would be open to changing XMLDict to present all nodes as being wrapped in a Vector (e.g. url = xml["CreateQueueResult"][1]["QueueUrl"][1]). This would break some existing code, but maybe it's best to change it and deal with the flow-on effects in the AWSCore.jl version that drops Julia 0.6 support.

(Aside: XMLDict is not intended to be a general-purpose XML interface. It isn't intended for SGML/HTML-ish mark-up style XML. It is only intended to be useful for simple XML documents that are more JSON-ish, like web services API results. More and more APIs are using JSON now anyway, so hopefully this issue will get less important over time).

@iamed2
Member Author

iamed2 commented Jul 17, 2018

Boto3 actually has resource description files that define what to expect, and they parse everything basically by schema.

AWS also tends to put list elements in <member> tags. That could be a viable heuristic.

It could be possible to define getindex so that each successive index would index into each element of the vector, sort of like xpath. Then you would only need to get the first item once, with url = xml["CreateQueueResult"]["QueueUrl"][1]. Something like getindex(vec::XMLVec, key) = xml_vec(getindex.(vec::XMLVec, key)).
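A rough, self-contained sketch of that idea, using nested Dicts in place of parsed XML. XMLVec and xml_vec are hypothetical names taken from the one-liner above, and XMLDict does not define them. This version also flattens the results of each step rather than broadcasting getindex directly, which is a design assumption on my part.

```julia
# Hypothetical wrapper type; XMLDict does not define XMLVec.
struct XMLVec
    items::Vector{Any}
end

# Normalise any value to an XMLVec: a lone node becomes a one-element vector.
xml_vec(x::XMLVec) = x
xml_vec(x::AbstractVector) = XMLVec(collect(Any, x))
xml_vec(x) = XMLVec(Any[x])

# A string key descends into every element and flattens the matches.
function Base.getindex(v::XMLVec, key::AbstractString)
    out = Any[]
    for item in v.items
        append!(out, xml_vec(item[key]).items)
    end
    return XMLVec(out)
end

# An integer index selects one concrete element.
Base.getindex(v::XMLVec, i::Integer) = v.items[i]
Base.length(v::XMLVec) = length(v.items)

# Singleton case: index by key all the way down, then take [1] once at the end.
doc = xml_vec(Dict("CreateQueueResult" => Dict("QueueUrl" => "https://example.invalid/queue")))
url = doc["CreateQueueResult"]["QueueUrl"][1]

# Repeated case: the same access pattern yields every match.
multi = xml_vec(Dict("B" => Dict("C" => ["foo", "bar"])))
cs = multi["B"]["C"]
println(url)
println(length(cs))
```

In this sketch a key that is missing from any element throws a KeyError. A real implementation would have to decide whether to skip such elements instead.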

That config method is pretty handy, I think I'll use that!

@samoconnor
Contributor

> Boto3 actually has resource description files that define what to expect, and they parse everything basically by schema.

Yes, that's what I'm using to generate AWSSDK.jl: https://github.com/JuliaCloud/AWSCore.jl/blob/master/src/AWSMetadata.jl
Right now, AWSSDK.jl does nothing special with processing results. It just tries to document what the results will be. It would absolutely be possible to generate result processing code from the service description JSON. However, I've found that in practice, it's usually pretty low effort to just write code that deals with whatever XMLDict you end up getting back as the result.


2 participants