Some parts of the data are currently slated to be converted to attributes, meaning they'll be taken out of the text representation of the XML, though the data is preserved. Can we decide on a principle to help guide when that occurs? To take the copies element as an example:
It could (1) just be text
<copies>2c 3May51</copies>
The current proposal (2) from DCL is to regularize the date (see #8)
<copies date="1951-05-03">2c</copies>
But we could go further (3) and parse out the number of copies as well, so that it's an empty tag
Or combine the first and third (4)
I think either the first or the last (and really, I think the last is the best option). They both preserve the original information. The second (currently proposed) version does some of the processing up front and makes later processing easier, but it leaves out an important piece. The last option will be the easiest to deal with for both human and machine.
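As a rough illustration of the regularization being discussed, here is a minimal Python sketch that pulls the number of copies and an ISO date out of an entry like the one above. The entry shape (count, `c`, then day, month abbreviation, two-digit year), the function name `parse_copies`, and the default century of 1900 are all assumptions made for this example, not something the project has settled.

```python
import re
from datetime import date

# Month abbreviations as they appear in the example entry ("3May51").
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Assumed entry shape: "<count>c <day><Mon><yy>", e.g. "2c 3May51".
ENTRY = re.compile(r"(\d+)c\s+(\d{1,2})([A-Z][a-z]{2})(\d{2})")


def parse_copies(text, century=1900):
    """Return (number_of_copies, iso_date), or None if the text does not match.

    The century default is an assumption: two-digit years are read as 19yy.
    """
    m = ENTRY.fullmatch(text.strip())
    if m is None or m.group(3) not in MONTHS:
        return None
    count, day, mon, yy = m.groups()
    d = date(century + int(yy), MONTHS[mon], int(day))
    return int(count), d.isoformat()
```

Entries that don't fit the assumed shape return `None` rather than a guess, so they can be left as plain text and reviewed by hand.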
After some offline discussion we're going to handle this according to a few principles:
Try to capture everything: Don't assume any detail will be uninteresting
Don't add or remove any text: If you strip the XML tags, you should end up with the original text of the entry
Add data and interpretation as attributes: Following the previous principle, anything we add (for convenience, regularization, etc.) should be added as attributes.
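The second principle can be checked mechanically. The sketch below, using Python's standard `xml.etree.ElementTree`, strips the tags from a fragment and compares what is left with the original entry text. The helper names and the `number` attribute are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def stripped_text(xml_fragment):
    """Return the text that remains once all tags are removed."""
    return "".join(ET.fromstring(xml_fragment).itertext())


def preserves_text(xml_fragment, original):
    """True if stripping the tags gives back the original entry text."""
    return stripped_text(xml_fragment) == original


ORIGINAL = "2c 3May51"

# (1) plain text, taken from the thread
text_only = "<copies>2c 3May51</copies>"
# (2) date moved into an attribute, taken from the thread
date_attr = '<copies date="1951-05-03">2c</copies>'
# Text kept, data added as attributes; "number" is a hypothetical attribute name
combined = '<copies date="1951-05-03" number="2">2c 3May51</copies>'
```

Under this check, `text_only` and `combined` pass, while `date_attr` fails because the date has been moved out of the element text.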
For this particular issue, then, we will go with the last option (4), which combines the first and third: the original text stays in the element, and the regularized date and the number of copies are added as attributes.