Make it easier to pass return_raw=true to service endpoints #41

iamed2 · 2018-07-12T21:42:24Z

Currently to get raw output you need to call service_query/service_json/etc. directly and pass an additional keyword argument. It would be nice to have this available from the service endpoint, if possible.

I needed this to avoid using XMLDict, which returns inconsistently-structured results when a list of possibly-many items contains 1 item vs many items.

samoconnor · 2018-07-17T01:57:41Z

Hi @iamed2,

As you've noticed, AWSCore uses XMLDict to parse XML API results by default.
e.g.

julia> aws = aws_config()
Dict{Symbol,Any} with 2 entries:
...
julia> AWSCore.Services.s3(aws, "GET", "/octech.com.au.ap-southeast-2.awslambda.jl.deploy")
XMLDict.XMLDictElement with 6 entries:
  "Name"        => "octech.com.au.ap-southeast-2.awslambda.jl.deploy"
  "Prefix"      => ""
  "Marker"      => ""
  "MaxKeys"     => "1000"
  "IsTruncated" => "false"
  "Contents"    => XMLDict.XMLDictElement[...

... and the AWSS3.jl package uses a return_raw option to the low-level API-call functions when it wants to disable this behaviour.

I agree that it would be good to make this option generally available through the high-level API functions.

In the meantime, one option is to set return_raw in the aws config dict.

julia> aws_raw_config = aws_config(return_raw=true)
Dict{Symbol,Any} with 3 entries:
  :creds      => (XXX, XXX...)…
  :region     => "ap-southeast-2"
  :return_raw => true

julia> String(AWSCore.Services.s3(aws_raw_config, "GET", "/octech.com.au.ap-southeast-2.awslambda.jl.deploy"))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<ListBucketResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">
...

This works because the low level API-call functions all merge the aws config Dict with the request Dict before calling the core do_request function. i.e. you can set default values for whatever do_request options you like in the aws config Dict.
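To illustrate the merge idea in isolation, here is a minimal sketch that does not use AWSCore at all. The names fake_do_request and fake_service_call are hypothetical stand-ins, and the merge order shown (request fields overriding config defaults) is an assumption, not something confirmed in this thread.

```julia
# Hypothetical stand-in for the core request function: it only inspects the merged Dict.
function fake_do_request(request::Dict{Symbol,Any})
    body = "<Result><Name>example</Name></Result>"   # pretend response body
    # With :return_raw set, hand back the bytes untouched; otherwise "parse" (stubbed here).
    if get(request, :return_raw, false)
        return Vector{UInt8}(body)
    else
        return Dict("Name" => "example")
    end
end

# Hypothetical low-level call: config entries act as defaults for the request Dict.
function fake_service_call(aws::Dict{Symbol,Any}, verb, resource)
    request = Dict{Symbol,Any}(:verb => verb, :resource => resource)
    # Assumed order: later arguments to merge win, so request fields override config.
    return fake_do_request(merge(aws, request))
end

aws     = Dict{Symbol,Any}(:region => "ap-southeast-2")
aws_raw = merge(aws, Dict{Symbol,Any}(:return_raw => true))

parsed = fake_service_call(aws, "GET", "/bucket")               # stubbed "parsed" result
raw    = String(fake_service_call(aws_raw, "GET", "/bucket"))   # raw XML text
```

Because the option travels inside the config Dict, no call site has to change; only the config passed in differs.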

samoconnor · 2018-07-17T02:31:06Z

> I needed this to avoid using XMLDict, which returns inconsistently-structured results when a list of possibly-many items contains 1 item vs many items

Yes, this is sometimes a pain.

In the absence of a reliable schema, there is no way to know if a tag in an XML document is supposed to be a singleton, or one item in a list. XMLDict implements an interface where single nodes are not wrapped in a vector; but peer nodes with the same name are wrapped in a vector.

julia> parse_xml("<A><B><C>foo</C></B></A>")["B"]["C"]
"foo"

julia> parse_xml("<A><B><C>foo</C><C>bar</C></B></A>")["B"]["C"]
2-element Array{String,1}:
 "foo"
 "bar"

It is designed this way to support terse access to simple API result structures like this:

url = xml["CreateQueueResult"]["QueueUrl"]

instead of having to write this:

url = xml["CreateQueueResult"][1]["QueueUrl"][1]

It would probably be better to use something like XPath for this sort of thing, but at the time of writing that wasn't readily available.

For now, if you are dealing with XML that sometimes lists one item and sometimes lists many, you can do something like this:

x = parse_xml("<A><B><C>foo</C></B></A>")
c = x["B"]["C"]
for i in (c isa Vector ? c : [c])
    println(i)
end
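If this pattern comes up a lot, the check could be pulled out into a small helper. This is only a sketch: as_vector is a hypothetical name, not something XMLDict or AWSCore provides, and it uses plain Julia values in place of parsed XML.

```julia
# Hypothetical helper: always return a Vector, wrapping a lone item if needed.
as_vector(x::AbstractVector) = x
as_vector(x) = [x]

# Plain values standing in for the two shapes a lookup might return.
one_item   = "foo"
many_items = ["foo", "bar"]

# The same loop now works for both shapes.
collected = String[]
for c in (one_item, many_items)
    for i in as_vector(c)
        push!(collected, i)
    end
end
```

Dispatching on AbstractVector rather than testing with isa keeps the call site short and also covers vector types other than Vector.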

I agree, this is sometimes a pain

I would be open to changing XMLDict to present all nodes as being wrapped in a Vector (e.g. url = xml["CreateQueueResult"][1]["QueueUrl"][1]). This would break some existing code, but maybe it's best to change it and deal with the flow-on effects in the AWSCore.jl version that drops Julia 0.6 support.

(Aside: XMLDict is not intended to be a general-purpose XML interface. It isn't intended for SGML/HTML-ish mark-up style XML. It is only intended to be useful for simple XML documents that are more JSON-ish, like web services API results. More and more APIs are using JSON now anyway, so hopefully this issue will get less important over time.)

iamed2 · 2018-07-17T04:08:57Z

Boto3 actually has resource description files define what to expect and they parse everything basically by schema.

AWS also tends to put list elements in <member> tags. That could be a viable heuristic.

It could be possible to define getindex so that each successive index would index into each element of the vector, sort of like xpath. Then you would only need to get the first item once, with url = xml["CreateQueueResult"]["QueueUrl"][1]. Something like getindex(vec::XMLVec, key) = xml_vec(getindex.(vec::XMLVec, key)).
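A rough sketch of that getindex idea, using nested Dicts in place of real parsed XML. XMLVec and xml_vec here are hypothetical, written only for this example; nothing like them exists in XMLDict as far as this thread shows.

```julia
# Hypothetical wrapper: a list of nodes that forwards key lookups to every element.
struct XMLVec
    items::Vector{Any}
end

# Wrap anything as an XMLVec so single nodes and lists look the same.
xml_vec(x::XMLVec) = x
xml_vec(x::AbstractVector) = XMLVec(collect(Any, x))
xml_vec(x) = XMLVec(Any[x])

# A string key is looked up in each element and the results are flattened.
function Base.getindex(v::XMLVec, key::AbstractString)
    out = Any[]
    for item in v.items
        append!(out, xml_vec(item[key]).items)
    end
    return XMLVec(out)
end

# An integer index picks one element out of the list.
Base.getindex(v::XMLVec, i::Integer) = v.items[i]
Base.length(v::XMLVec) = length(v.items)

# Nested Dicts standing in for a parsed document; the URL is a placeholder.
doc = xml_vec(Dict("CreateQueueResult" => Dict("QueueUrl" => "https://example.invalid/queue")))
url = doc["CreateQueueResult"]["QueueUrl"][1]

# The same access pattern works when a node repeats.
multi = xml_vec(Dict("B" => Dict("C" => ["foo", "bar"])))
cs = multi["B"]["C"]
```

With this shape, the [1] is needed only once at the end, and code that iterates over cs does not care whether there was one item or many.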

That config method is pretty handy, I think I'll use that!

samoconnor · 2018-07-17T04:14:24Z

> Boto3 actually has resource description files define what to expect and they parse everything basically by schema.

Yes, that's what I'm using to generate AWSSDK.jl: https://github.com/JuliaCloud/AWSCore.jl/blob/master/src/AWSMetadata.jl
Right now, AWSSDK.jl does nothing special with processing results. It just tries to document what the results will be. It would absolutely be possible to generate result processing code from the service description JSON. However, I've found that in practice, it's usually pretty low effort to just write code that deals with whatever XMLDict you end up getting back as the result.
