Implications of trimming whitespace from course structure #735
(For future readers: read the other comments too. This one addresses only one narrow part of the spec, not the other statements in the spec that modify the final behavior.) Your example is a bit misleading, as it conflates XML parsing with browser rendering. This demonstrates that the browser's JavaScript-based XML parser preserves all whitespace when reading XML (see demo). And indeed, in a vacuum, this is correct. The XML specification is very clear about the default parsing behavior, which is to preserve whitespace unless you explicitly provide an xml:space attribute. But we're also defining the Course Structure with an XSD, specifically https://github.com/AICC/CMI-5_Spec_Current/blob/quartz/v1/CourseStructure.xsd. If we refer to the W3C XML Schema spec (https://www.w3.org/TR/xmlschema11-2/#rf-whiteSpace), you'll see the following:
If we follow xs:anyURI's definition: https://www.oreilly.com/library/view/xml-schema/0596002521/re56.html
You can see it has an xs:whiteSpace value of collapse, which is:
So for this one element, it's clear that it's okay to strip whitespace from the beginning and end of the element and to collapse any internal runs of whitespace to a single space. Therefore, if your implementation uses the XSD, it should do this by default for this element. That doesn't necessarily apply to all of them: you have to check each one against the XSD, which is why it's there! If you're not using the XSD, it's up to you to take the output of your XML parser and apply what the XSD declares to each field.
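To make the three whiteSpace values concrete, here is a minimal Python sketch of the normalization each one implies, following the whiteSpace facet section of XML Schema 1.1 Part 2 linked above. The function name apply_whitespace_facet is hypothetical; this illustrates post-parse normalization and is not a schema validator.

```python
import re


def apply_whitespace_facet(value: str, facet: str) -> str:
    """Normalize a string according to an XML Schema whiteSpace facet value.

    preserve: leave the value unchanged.
    replace:  turn each tab, line feed and carriage return into a space.
    collapse: replace, then squeeze runs of spaces to one and trim both ends.
    """
    if facet == "preserve":
        return value
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        return re.sub(r" {2,}", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace facet: {facet!r}")


# An anyURI value as it might appear between indented tags in an XML file.
value = "\n      https://example.com/au/launch.html\n    "
print(repr(apply_whitespace_facet(value, "collapse")))
```

Because xs:anyURI is fixed to collapse, a schema-aware processor would hand the application the trimmed URL without any extra work.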
@gavbaa Apologies, I substantially changed my comment before seeing your reply.
This was very useful to me. Thank you for taking the time to put it together.
To analyze other elements in the XSD, we need to know the following (XML Schema 2.4.1):
anyURI, for example, is one of the atomic datatypes. That leaves us with:
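For reference, XML Schema 1.1 Part 2 fixes whiteSpace to collapse for every built-in atomic datatype except xs:string (preserve) and the types derived from it, where xs:normalizedString uses replace and xs:token and its descendants use collapse. A rough lookup sketch, assuming the input is a built-in type name; builtin_whitespace is a hypothetical helper, and user-defined restrictions of xs:string still have to be checked against the schema itself:

```python
def builtin_whitespace(type_name: str) -> str:
    """Return the whiteSpace facet value of a built-in XSD datatype.

    Accepts prefixed ("xs:anyURI") or bare ("anyURI") names. Assumes a
    built-in datatype other than anySimpleType/anyAtomicType; user-defined
    restrictions of xs:string may declare their own whiteSpace value.
    """
    local_name = type_name.split(":")[-1]
    if local_name == "string":
        return "preserve"
    if local_name == "normalizedString":
        return "replace"
    # Every other built-in (anyURI, token, boolean, decimal, dateTime, ...) collapses.
    return "collapse"


for name in ("xs:string", "xs:normalizedString", "xs:token", "xs:anyURI"):
    print(name, builtin_whitespace(name))
```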
So would you take issue with the following in the spec:
I don't take issue with it; it's a transformation step that occurs after the XML document is parsed per the XSD rules. The XML Course Structure doesn't have to be consumed exactly as the XML parser outputs it, and it's reasonable for additional transformations to occur after that point. Everything I wrote above is strictly about the types chosen and the behavior of XML parsers. The specification goes beyond that.
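The split between what the parser reports and what a later transformation step does can be sketched with Python's standard xml.etree.ElementTree module. The XML fragment below is hypothetical and omits the cmi5 namespace, and trimming only the four XML whitespace characters is an assumption about how the spec's trimming rule would be implemented:

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment in the style of a course structure <au>.
DOC = """<au id="urn:example:au1">
  <url>
    launch.html?mode=demo
  </url>
</au>"""

XML_WHITESPACE = " \t\r\n"  # the four whitespace characters defined by XML

root = ET.fromstring(DOC)

# Step 1: the parser keeps the element's text as-is, including newlines and indentation.
raw_url = root.find("url").text

# Step 2: a separate, post-parse transformation trims leading/trailing whitespace.
trimmed_url = raw_url.strip(XML_WHITESPACE)

print(repr(raw_url))
print(repr(trimmed_url))
```

ElementTree does not read the XSD, so the trim in step 2 is the application's responsibility, which is the situation described above.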
OK, that makes sense. I am concerned that, because all leading and trailing whitespace is removed on import, url, launchParameters, entitlementKey and Vendor Specific Metadata cannot begin or end with a space, which is an implied requirement that isn't as clear as it could be. I am imagining someone generating an entitlementKey with leading whitespace, or an AU authoring tool that creates filenames with leading spaces. Do you have any thoughts on that?
@thomasturrell, when you mention filenames with spaces, I guess you mean that the filename would be used in the AU's url element. Maybe this is not specified in the cmi5 spec (sorry, I haven't checked deeply), but I think it is at least reasonable that the URL values used in the course structure are valid, properly encoded URLs. I found some description of the anyURI type here:
And a more detailed definition of valid values here:
My two cents are that leading or trailing spaces in the url values are not significant and are trimmed. So if the spaces are intended, the values must be percent-encoded, with each space written as %20. Should the spec say more about the values in the XML format, especially about the expected encoding of the elements that have to do with URLs and URL query string parameters?
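One way to apply the encoding suggestion is to percent-encode intended spaces before writing them into the url element, so the encoded value has no leading or trailing whitespace left to trim. A minimal sketch using Python's urllib.parse.quote; the helper names and the example path and query value are hypothetical:

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a relative path, keeping '/' as the segment separator."""
    return quote(path, safe="/")


def encode_query_value(value: str) -> str:
    """Percent-encode one query-string value, leaving no reserved characters as-is."""
    return quote(value, safe="")


url = encode_path(" launch.html") + "?mode=" + encode_query_value(" demo ")
print(url)                 # %20launch.html?mode=%20demo%20
print(url.strip() == url)  # True: trimming no longer changes the value
print(unquote(url))        # the spaces reappear only when the URL is decoded
```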
@geirfp, thank you for your reply; it's very useful to get a broad perspective.
Yes, exactly that. When a course is packaged in ZIP format, the URL is meant to be relative (presumably relative to cmi5.xml, which is always in the root directory of the ZIP file). As well as spaces, a filename in a ZIP can contain #, which has special meaning in a URL. The following is a valid filename:
The resulting URL might be something similar to:
But launch.html wouldn't exist on the server because the filename would be
One problem with that approach is that a filename in a ZIP can begin with either %20 or a space, and %20 does not mean a space in a ZIP filename. This is quite possibly an extreme edge case, and I am taking up too much time on this issue. However, I usually find that if I can think of an edge case, someone will eventually implement it.
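The # and %20 cases can be checked with Python's urllib.parse. Assuming a hypothetical ZIP entry named launch#1.html, an unencoded # splits the reference into a path and a fragment, while percent-encoding keeps the whole name in the path. Likewise, a filename that literally begins with the characters %20 has to be written as %2520 so that decoding yields the original name rather than a space:

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical ZIP entry names.
hash_name = "launch#1.html"
percent_name = "%20launch.html"  # literally starts with the three characters "%20"

# Unencoded, "#" starts a fragment: the path a browser would request is just "launch".
naive = urlsplit(hash_name)
print(naive.path, naive.fragment)  # launch 1.html

# Percent-encoded, the whole filename stays in the path.
encoded_hash = quote(hash_name, safe="/")
print(encoded_hash)  # launch%231.html

# "%" itself must be encoded, so a literal "%20" becomes "%2520" ...
encoded_percent = quote(percent_name, safe="/")
print(encoded_percent)  # %2520launch.html
# ... and decoding restores the literal name, not a leading space.
print(unquote(encoded_percent) == percent_name)  # True
```

How a particular server maps the decoded path back to a ZIP entry is outside the scope of this sketch.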
IMO, it's invalid to expect the rules for what's contained in a ZIP package to overlap with what's put in a URL/URI/IRI storage field. They are completely separate notions (storage versus routable location), and any overlap between them is coincidental. Since we have the basis of "the attribute must be launchable in a web browser" (this isn't spec wording, I'm summarizing), it doesn't matter what's in the ZIP file. If someone names something unusual in the ZIP, it's their responsibility to form a URI that can reference that asset, just as they would have to when referencing the asset from any other HTML file. Browser rules apply at that level, not ZIP rules. And since that's the case, I don't see any point in spelling out those rules; they're defined more than adequately by various RFCs.
Per the July 16th meeting, the group agreed that leading/trailing whitespace is addressed in section 13.1 of the specification and that XML processing rules can handle file-naming and query-string issues.
The following XML is taken from the complex cmi5 course structure in the spec.
If I read this verbatim, it is not a valid URL. To extract a valid URL, the whitespace must be trimmed (as the specification requires).
Trimming the whitespace from the url element means that relative URLs cannot begin with a space.
So if the url was:
If the whitespace is removed, the file will not be found in the course structure.
The following might be affected by whitespace trimming:
I understand that this is a corner case, but I think it is noteworthy.
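An authoring tool could guard against this corner case by checking, before it writes the course structure, whether any value would change under trimming. A minimal sketch: altered_by_trim and the sample values are hypothetical, the field names are the ones raised earlier in this thread, and trimming the four XML whitespace characters is an assumption about how an importer implements the rule:

```python
XML_WHITESPACE = " \t\r\n"  # assumed set of characters an importer trims


def altered_by_trim(fields: dict[str, str]) -> list[str]:
    """Return the names of fields whose value would change if trimmed on import."""
    return [name for name, value in fields.items()
            if value != value.strip(XML_WHITESPACE)]


# Hypothetical values an authoring tool is about to write into cmi5.xml.
au_fields = {
    "url": " launch.html",            # leading space: would no longer match the file
    "launchParameters": "mode=demo",
    "entitlementKey": "abc123 ",      # trailing space: would be silently dropped
}

for name in altered_by_trim(au_fields):
    print(f"warning: {name} has leading/trailing whitespace that an importer may trim")
```

Flagged values could then be rejected or, for url, percent-encoded.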