Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added proposed attachment feature DISCUSS BEFORE MERGE #57

Merged
merged 1 commit into from
Mar 4, 2013

Conversation

bscSCORM
Copy link
Contributor

@bscSCORM bscSCORM commented Feb 9, 2013

This is a proposal for how to handle binary data in statements.
As I wrote up the details, I change the approach from what we discussed in Alexandria, and in the email dicussion with Nik, for reasons I will describe below.

GitHub issues related to the PR:
#38
#23

Note that this proposal specifies the use of "multipart/mixed" as a transmission format to handle attachments, rather than using base64 within the json as we had previously discussed. This is a break from the model of "The transmission format of a statement matches the logical model of a statement". Breaking from the model used elsewhere in the spec will add some confusion, so we should not do it lightly. However, we do gain the following advantages:

  • Systems receiving statement streams with attachments will be able to inspect the statement(s) before receiving the attachment, and may therefore validate the rest of the statement(s) before storing the attachment(s).
  • Attachments may be sent without applying any encoding (such as base64)
  • Avoids having a sub-property of the statement JSON which is sometimes included, and sometimes not (the attachment binary is now always outside the JSON)
  • Maps very well to the mental model of "email attachments", which the binary feature was emulating in some regards anyway

Additionally, please note that I had discussed in an email with Nik pulling in some concepts from Activity Streams attachments. I wound up not including most of them for the following reasons:

  • md5: this is insecure and shouldn't be used. Still, it might have been worth including for compatibility with activity streams, if the SHA-2 wasn't required. However I wanted to require SHA-2 to ensure the integrity of statements where the headers may have been retrieved earlier and then the client goes back to retrieve the data. If one accepts that SHA-2 should be required, also allowing md5 to be included adds nothing.
  • compression: This seemed like a nice idea when I first saw it (see Nik's email), but most formats we'd want to include are compressed to some extent anyway, and HTTP itself already supports compression, and servers are free to compress the attachments once they receive them. Additionally, since I wanted the LRS to verify the SHA-2 to ensure statement integrity, what is more useful is the SHA-2 of the uncompressed data... this stopped sounding like a good idea.
  • data: since the actual attachment is now being transmitted outside the statement, and TCAPI only defines transmission, not storage, there is no use for this property.

Note, that even without the specified fields, it is possible to convert from an activity streams binary to a Tin Can binary (provided the actual data of the attachment is available). We will not be able to use this attachment feature to represent an activity streams binary object that points to a dead URL, but that could always be done in an extension. And a reference to a dead URL is of limited use.

@bscSCORM
Copy link
Contributor Author

bscSCORM commented Feb 9, 2013

Also note: while I was writing up this feature, it struck me that particularly with the ability to POST a set of statements that all have large attachments, and the LRS potentially finding a hash mismatch on the last statement... we should provide a way for the LRS to accept some statements in a batch, and reject others, and inform the client.

@bscSCORM
Copy link
Contributor Author

Not so related to this particular solution, but note that having an attachment mechanism, including the ability to retrieve attachment contents or not, will be of help for statement signing. A signature can be just another attachment "usageType".

@garemoko
Copy link
Contributor

This looks good to me.

A few questions:

  1. What's the purpose of the SHA field? Is this to protect against corrupted files?
  2. If a file url is provided, does the attachment also need to be sent with the statement? Is the request still mixed/multipart in this case?
  3. Am I right to assume that an example statement with attachment would be too complex? If not, can we add an example maybe in or as an appendix?

Andrew

bscSCORM notifications@github.com wrote:

This is a proposal for how to handle binary data in statements.
As I wrote up the details, I change the approach from what we discussed in Alexandria, and in the email dicussion with Nik, for reasons I will describe below.

GitHub issues related to the PR:
#38
#23

Note that this proposal specifies the use of "multipart/mixed" as a transmission format to handle attachments, rather than using base64 within the json as we had previously discussed. This is a break from the model of "The transmission format of a statement matches the logical model of a statement". Breaking from the model used elsewhere in the spec will add some confusion, so we should not do it lightly. However, we do gain the following advantages:

Systems receiving statement streams with attachments will be able to inspect the statement(s) before receiving the attachment, and may therefore validate the rest of the statement(s) before storing the attachment(s).Attachments may be sent without applying any encoding (such as base64)Avoids having a sub-property of the statement JSON which is sometimes included, and sometimes not (the attachment binary is now always outside the JSON)Maps very well to the mental model of "email attachments", which the binary feature was emulating in some regards anyway

Additionally, please note that I had discussed in an email with Nik pulling in some concepts from Activity Streams attachments. I wound up not including most of them for the following reasons:

md5: this is insecure and shouldn't be used. Still, it might have been worth including for compatibility with activity streams, if the SHA-2 wasn't required. However I wanted to require SHA-2 to ensure the integrity of statements where the headers may have been retrieved earlier and then the client goes back to retrieve the data. If one accepts that SHA-2 should be required, also allowing md5 to be included adds nothing.compression: This seemed like a nice idea when I first saw it (see Nik's email), but most formats we'd want to include are compressed to some extent anyway, and HTTP itself already supports compression, and servers are free to compress the attachments once they receive them. Additionally, since I wanted the LRS to verify the SHA-2 to ensure statement integrity, what is more useful is the SHA-2 of the uncompressed data... this stopped sounding like a good idea.data: since the actual attachment is now being transmitted outside the statement, and TCAPI only defines transmission, not storage, there is no use for this property.

Note, that even without the specified fields, it is possible to convert from an activity streams binary to a Tin Can binary (provided the actual data of the attachment is available). We will not be able to use this attachment feature to represent an activity streams binary object that points to a dead URL, but that could always be done in an extension. And a reference to a dead URL is of limited use.

You can merge this Pull Request by running

git pull https://github.com/bscSCORM/xAPI-Spec binary

Or view, comment on, or merge it at:

  #57

Commit Summary

Added proposed attachment feature

File Changes

M xAPI.md (96)

Patch Links:

https://github.com/adlnet/xAPI-Spec/pull/57.patchhttps://github.com/adlnet/xAPI-Spec/pull/57.diff

@bscSCORM
Copy link
Contributor Author

@garemoko,

  1. The SHA-2 field is there because statements can be loaded without their attachments, and then attachments may be later fetched, so this ensures that the attachment didn't get switched in the meantime. It helps ensure the immutability of the attachments. This is particularly crucial if the attachment exists only at the URL. Additionally it serves as a handy natural ID, as opposed to specifying an ID field or leaving the attachment record without any thing that could be used as an ID.

  2. No, the attachment data need not be sent if URLs are provided. The transmission of statements with attachments need only be done using multipart/mixed if attachment data is actually being sent. I'd say it would be a best practice in most circumstances to include the attachment data when sending the statement as opposed to relying on a URL.

  3. I can add an example. It would have to be a text attachment to render properly in the specification document though, so we'll just have to note clearly with the example that these can be, and frequently will be binary attachments.

@nhruska
Copy link
Contributor

nhruska commented Feb 12, 2013

Hi Ben:

Thanks for the lengthy explanation. I agree with the model and see the advantages. I have a question though about this statement:
"we should provide a way for the LRS to accept some statements in a batch, and reject others, and inform the client."

Haven't we decided that batch statements would be treated as a single transaction?

@bscSCORM
Copy link
Contributor Author

@nhruska We have decided that, I should have been clearer I was suggesting re-opening that discussion.

I'm not entirely sure myself if that's the way we should go, but the attachment feature makes a very large stream of statement data more likely. The attachment feature, combined with the requirement for atomic statement batches pretty much forces LRSs to write statement data out to disk that they may eventually discard. This is probably a good idea anyway, but before the attachment requirement an LRS might have gotten away with holding statement batches in memory while validating them.

So, since that requirement is being made more onerous, I thought we should consider it again.

@nhruska
Copy link
Contributor

nhruska commented Feb 12, 2013

@bscSCORM Indeed. I'm going to post a link to this PR in the group and try to solicit some feedback.

@garemoko
Copy link
Contributor

Only solution I can think of for this is that the LRS returns an array if failed statement ids. I guess you could do it the other way round and return succeeded statement ids.

Are there any other options on the table?

Nikolaus Hruska notifications@github.com wrote:

@bscSCORM Indeed. I'm going to post a link to this PR in the group and try to solicit some feedback.


Reply to this email directly or view it on GitHub.

@bscSCORM
Copy link
Contributor Author

I think we'd want an array of objects with:

statementId
status (HTTP status code, or rather the code that would have been used for this success/failure condition if this was the only statement)
reason (string, explain the problem if not obvious from code (and not successful))

Including both good & bad statements, in the order they were sent.

That way clients that don't set statement IDs can still figure out what the statement IDs for the good statements are (I guess the IDs would be blank for statements that don't get created).

Maybe the "post multiple statements" mode should require statement IDs? Though I'm afraid that would break some clients.

I think the HTTP status code for the overall request should be something non-200 if there were any failures, to make sure that clients which weren't checking for it considered the overall request a failure, seems better than the reverse.

@garemoko
Copy link
Contributor

garemoko commented Mar 3, 2013

I think that makes sense. We could always recommend that statement IDs are included but not require it.

As for the HTTP status code for the overall request, I'd love to suggest 418 but 207 seems more appropriate. Interestingly this issue seems to have already been considered, see:

http://en.wikipedia.org/wiki/List_of_HTTP_status_codes

https://tools.ietf.org/html/rfc4918

I didn't yet have a chance to read the rfc.

Andrew

@garemoko
Copy link
Contributor

garemoko commented Mar 3, 2013

Some links to relevant sections of the RFC:

https://tools.ietf.org/html/rfc4918#section-13

https://tools.ietf.org/html/rfc4918#section-9.6.2

Does this standard meet our needs?

@bscSCORM
Copy link
Contributor Author

bscSCORM commented Mar 3, 2013

I've created this issue: #70 to continue the "atomicy" discussion, since it doesn't actually impact the proposed text in this PR, but is an additional proposed change to deal with some side effects of large statements of which attachments are one example.

I suggest we merge this PR, and continue the discussion on atomicity there.

I will also add an attachment example as a new PR after we merge, unless folks feel it is necessary for discussion -- if not I'd prefer to do that work against a feature we've agreed to include.

@garemoko
Copy link
Contributor

garemoko commented Mar 4, 2013

Agreed. More frequent smaller merges FTW.

bscSCORM notifications@github.com wrote:

I've created this issue: #70 to continue the "atomicy" discussion, since it doesn't actually impact the proposed text in this PR, but is an additional proposed change to deal with some side effects of large statements of which attachments are one example.

I suggest we merge this PR, and continue the discussion on atomicity there.

I will also add an attachment example as a new PR after we merge, unless folks feel it is necessary for discussion -- if not I'd prefer to do that work against a feature we've agreed to include.


Reply to this email directly or view it on GitHub.

@bscSCORM
Copy link
Contributor Author

bscSCORM commented Mar 4, 2013

@andyjohnson looks like we should be able to merge this now

andyjohnson added a commit that referenced this pull request Mar 4, 2013
Added proposed attachment feature DISCUSS BEFORE MERGE
@andyjohnson andyjohnson merged commit cfec7c2 into adlnet:master Mar 4, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants