Added proposed attachment feature DISCUSS BEFORE MERGE #57

bscSCORM · 2013-02-09T23:25:59Z

This is a proposal for how to handle binary data in statements.
As I wrote up the details, I change the approach from what we discussed in Alexandria, and in the email dicussion with Nik, for reasons I will describe below.

GitHub issues related to the PR:
#38
#23

Note that this proposal specifies the use of "multipart/mixed" as a transmission format to handle attachments, rather than using base64 within the json as we had previously discussed. This is a break from the model of "The transmission format of a statement matches the logical model of a statement". Breaking from the model used elsewhere in the spec will add some confusion, so we should not do it lightly. However, we do gain the following advantages:

Systems receiving statement streams with attachments will be able to inspect the statement(s) before receiving the attachment, and may therefore validate the rest of the statement(s) before storing the attachment(s).
Attachments may be sent without applying any encoding (such as base64)
Avoids having a sub-property of the statement JSON which is sometimes included, and sometimes not (the attachment binary is now always outside the JSON)
Maps very well to the mental model of "email attachments", which the binary feature was emulating in some regards anyway

Additionally, please note that I had discussed in an email with Nik pulling in some concepts from Activity Streams attachments. I wound up not including most of them for the following reasons:

md5: this is insecure and shouldn't be used. Still, it might have been worth including for compatibility with activity streams, if the SHA-2 wasn't required. However I wanted to require SHA-2 to ensure the integrity of statements where the headers may have been retrieved earlier and then the client goes back to retrieve the data. If one accepts that SHA-2 should be required, also allowing md5 to be included adds nothing.
compression: This seemed like a nice idea when I first saw it (see Nik's email), but most formats we'd want to include are compressed to some extent anyway, and HTTP itself already supports compression, and servers are free to compress the attachments once they receive them. Additionally, since I wanted the LRS to verify the SHA-2 to ensure statement integrity, what is more useful is the SHA-2 of the uncompressed data... this stopped sounding like a good idea.
data: since the actual attachment is now being transmitted outside the statement, and TCAPI only defines transmission, not storage, there is no use for this property.

Note, that even without the specified fields, it is possible to convert from an activity streams binary to a Tin Can binary (provided the actual data of the attachment is available). We will not be able to use this attachment feature to represent an activity streams binary object that points to a dead URL, but that could always be done in an extension. And a reference to a dead URL is of limited use.

bscSCORM · 2013-02-09T23:30:28Z

Also note: while I was writing up this feature, it struck me that particularly with the ability to POST a set of statements that all have large attachments, and the LRS potentially finding a hash mismatch on the last statement... we should provide a way for the LRS to accept some statements in a batch, and reject others, and inform the client.

bscSCORM · 2013-02-10T00:42:23Z

Not so related to this particular solution, but note that having an attachment mechanism, including the ability to retrieve attachment contents or not, will be of help for statement signing. A signature can be just another attachment "usageType".

garemoko · 2013-02-10T08:30:11Z

This looks good to me.

A few questions:

What's the purpose of the SHA field? Is this to protect against corrupted files?
If a file url is provided, does the attachment also need to be sent with the statement? Is the request still mixed/multipart in this case?
Am I right to assume that an example statement with attachment would be too complex? If not, can we add an example maybe in or as an appendix?

Andrew

bscSCORM notifications@github.com wrote:

This is a proposal for how to handle binary data in statements.
As I wrote up the details, I change the approach from what we discussed in Alexandria, and in the email dicussion with Nik, for reasons I will describe below.

GitHub issues related to the PR:
#38
#23

Note that this proposal specifies the use of "multipart/mixed" as a transmission format to handle attachments, rather than using base64 within the json as we had previously discussed. This is a break from the model of "The transmission format of a statement matches the logical model of a statement". Breaking from the model used elsewhere in the spec will add some confusion, so we should not do it lightly. However, we do gain the following advantages:

Systems receiving statement streams with attachments will be able to inspect the statement(s) before receiving the attachment, and may therefore validate the rest of the statement(s) before storing the attachment(s).Attachments may be sent without applying any encoding (such as base64)Avoids having a sub-property of the statement JSON which is sometimes included, and sometimes not (the attachment binary is now always outside the JSON)Maps very well to the mental model of "email attachments", which the binary feature was emulating in some regards anyway

Additionally, please note that I had discussed in an email with Nik pulling in some concepts from Activity Streams attachments. I wound up not including most of them for the following reasons:

md5: this is insecure and shouldn't be used. Still, it might have been worth including for compatibility with activity streams, if the SHA-2 wasn't required. However I wanted to require SHA-2 to ensure the integrity of statements where the headers may have been retrieved earlier and then the client goes back to retrieve the data. If one accepts that SHA-2 should be required, also allowing md5 to be included adds nothing.compression: This seemed like a nice idea when I first saw it (see Nik's email), but most formats we'd want to include are compressed to some extent anyway, and HTTP itself already supports compression, and servers are free to compress the attachments once they receive them. Additionally, since I wanted the LRS to verify the SHA-2 to ensure statement integrity, what is more useful is the SHA-2 of the uncompressed data... this stopped sounding like a good idea.data: since the actual attachment is now being transmitted outside the statement, and TCAPI only defines transmission, not storage, there is no use for this property.

Note, that even without the specified fields, it is possible to convert from an activity streams binary to a Tin Can binary (provided the actual data of the attachment is available). We will not be able to use this attachment feature to represent an activity streams binary object that points to a dead URL, but that could always be done in an extension. And a reference to a dead URL is of limited use.

You can merge this Pull Request by running

git pull https://github.com/bscSCORM/xAPI-Spec binary

Or view, comment on, or merge it at:

#57

Commit Summary

Added proposed attachment feature

File Changes

M xAPI.md (96)

Patch Links:

https://github.com/adlnet/xAPI-Spec/pull/57.patchhttps://github.com/adlnet/xAPI-Spec/pull/57.diff

bscSCORM · 2013-02-10T22:51:22Z

@garemoko,

The SHA-2 field is there because statements can be loaded without their attachments, and then attachments may be later fetched, so this ensures that the attachment didn't get switched in the meantime. It helps ensure the immutability of the attachments. This is particularly crucial if the attachment exists only at the URL. Additionally it serves as a handy natural ID, as opposed to specifying an ID field or leaving the attachment record without any thing that could be used as an ID.
No, the attachment data need not be sent if URLs are provided. The transmission of statements with attachments need only be done using multipart/mixed if attachment data is actually being sent. I'd say it would be a best practice in most circumstances to include the attachment data when sending the statement as opposed to relying on a URL.
I can add an example. It would have to be a text attachment to render properly in the specification document though, so we'll just have to note clearly with the example that these can be, and frequently will be binary attachments.

nhruska · 2013-02-12T13:32:39Z

Hi Ben:

Thanks for the lengthy explanation. I agree with the model and see the advantages. I have a question though about this statement:
"we should provide a way for the LRS to accept some statements in a batch, and reject others, and inform the client."

Haven't we decided that batch statements would be treated as a single transaction?

bscSCORM · 2013-02-12T15:57:16Z

@nhruska We have decided that, I should have been clearer I was suggesting re-opening that discussion.

I'm not entirely sure myself if that's the way we should go, but the attachment feature makes a very large stream of statement data more likely. The attachment feature, combined with the requirement for atomic statement batches pretty much forces LRSs to write statement data out to disk that they may eventually discard. This is probably a good idea anyway, but before the attachment requirement an LRS might have gotten away with holding statement batches in memory while validating them.

So, since that requirement is being made more onerous, I thought we should consider it again.

nhruska · 2013-02-12T17:42:18Z

@bscSCORM Indeed. I'm going to post a link to this PR in the group and try to solicit some feedback.

garemoko · 2013-02-12T18:37:01Z

Only solution I can think of for this is that the LRS returns an array if failed statement ids. I guess you could do it the other way round and return succeeded statement ids.

Are there any other options on the table?

Nikolaus Hruska notifications@github.com wrote:

@bscSCORM Indeed. I'm going to post a link to this PR in the group and try to solicit some feedback.

—
Reply to this email directly or view it on GitHub.

bscSCORM · 2013-02-12T19:34:21Z

I think we'd want an array of objects with:

statementId
status (HTTP status code, or rather the code that would have been used for this success/failure condition if this was the only statement)
reason (string, explain the problem if not obvious from code (and not successful))

Including both good & bad statements, in the order they were sent.

That way clients that don't set statement IDs can still figure out what the statement IDs for the good statements are (I guess the IDs would be blank for statements that don't get created).

Maybe the "post multiple statements" mode should require statement IDs? Though I'm afraid that would break some clients.

I think the HTTP status code for the overall request should be something non-200 if there were any failures, to make sure that clients which weren't checking for it considered the overall request a failure, seems better than the reverse.

garemoko · 2013-03-03T20:58:42Z

I think that makes sense. We could always recommend that statement IDs are included but not require it.

As for the HTTP status code for the overall request, I'd love to suggest 418 but 207 seems more appropriate. Interestingly this issue seems to have already been considered, see:

http://en.wikipedia.org/wiki/List_of_HTTP_status_codes

https://tools.ietf.org/html/rfc4918

I didn't yet have a chance to read the rfc.

Andrew

garemoko · 2013-03-03T21:08:27Z

Some links to relevant sections of the RFC:

https://tools.ietf.org/html/rfc4918#section-13

https://tools.ietf.org/html/rfc4918#section-9.6.2

Does this standard meet our needs?

bscSCORM · 2013-03-03T23:49:38Z

I've created this issue: #70 to continue the "atomicy" discussion, since it doesn't actually impact the proposed text in this PR, but is an additional proposed change to deal with some side effects of large statements of which attachments are one example.

I suggest we merge this PR, and continue the discussion on atomicity there.

I will also add an attachment example as a new PR after we merge, unless folks feel it is necessary for discussion -- if not I'd prefer to do that work against a feature we've agreed to include.

garemoko · 2013-03-04T06:07:35Z

Agreed. More frequent smaller merges FTW.

bscSCORM notifications@github.com wrote:

I've created this issue: #70 to continue the "atomicy" discussion, since it doesn't actually impact the proposed text in this PR, but is an additional proposed change to deal with some side effects of large statements of which attachments are one example.

I suggest we merge this PR, and continue the discussion on atomicity there.

I will also add an attachment example as a new PR after we merge, unless folks feel it is necessary for discussion -- if not I'd prefer to do that work against a feature we've agreed to include.

—
Reply to this email directly or view it on GitHub.

bscSCORM · 2013-03-04T15:08:36Z

@andyjohnson looks like we should be able to merge this now

Added proposed attachment feature DISCUSS BEFORE MERGE

Added proposed attachment feature

11a3a5d

bscSCORM mentioned this pull request Feb 15, 2013

Statement Signing #60

Closed

garemoko mentioned this pull request Mar 3, 2013

Binary (potentially large) data in statements #23

Closed

bscSCORM mentioned this pull request Mar 3, 2013

Should multi-statement posts be atomic? #70

Closed

andyjohnson added a commit that referenced this pull request Mar 4, 2013

Merge pull request #57 from bscSCORM/binary

cfec7c2

Added proposed attachment feature DISCUSS BEFORE MERGE

andyjohnson merged commit cfec7c2 into adlnet:master Mar 4, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added proposed attachment feature DISCUSS BEFORE MERGE #57

Added proposed attachment feature DISCUSS BEFORE MERGE #57

bscSCORM commented Feb 9, 2013

bscSCORM commented Feb 9, 2013

bscSCORM commented Feb 10, 2013

garemoko commented Feb 10, 2013

bscSCORM commented Feb 10, 2013

nhruska commented Feb 12, 2013

bscSCORM commented Feb 12, 2013

nhruska commented Feb 12, 2013

garemoko commented Feb 12, 2013

bscSCORM commented Feb 12, 2013

garemoko commented Mar 3, 2013

garemoko commented Mar 3, 2013

bscSCORM commented Mar 3, 2013

garemoko commented Mar 4, 2013

bscSCORM commented Mar 4, 2013

Added proposed attachment feature DISCUSS BEFORE MERGE #57

Added proposed attachment feature DISCUSS BEFORE MERGE #57

Conversation

bscSCORM commented Feb 9, 2013

bscSCORM commented Feb 9, 2013

bscSCORM commented Feb 10, 2013

garemoko commented Feb 10, 2013

bscSCORM commented Feb 10, 2013

nhruska commented Feb 12, 2013

bscSCORM commented Feb 12, 2013

nhruska commented Feb 12, 2013

garemoko commented Feb 12, 2013

bscSCORM commented Feb 12, 2013

garemoko commented Mar 3, 2013

garemoko commented Mar 3, 2013

bscSCORM commented Mar 3, 2013

garemoko commented Mar 4, 2013

bscSCORM commented Mar 4, 2013