Balance between text and data? #9

seanredmond · 2018-03-06T18:40:46Z

So far, some bits of the data are going to be converted to attributes, meaning they'll be taken out of the text representation of the XML, though the data is preserved. Can we decide on a principle to help guide when that occurs? To take the copies element as an example:

It could (1) just be text

<copies>2c 3May51</copies>

The current proposal (2) from DCL is to regularize the date (see #8)

<copies date="1951-05-03">2c</copies>

But we could go further (3) and just parse out the number of copies, too, so that it's an empty tag

<copies date="1951-05-03" num="2"/>

Or combine the first and third (4)

<copies date="1951-05-03" num="2">2c 3May51</copies>

I think we should go with either the first or the last (and really, I think the last is the best option). They both preserve the original information. The second (currently proposed) version does some of the processing up front and makes later processing easier, but leaves out an important piece. The last option will be the easiest to deal with for both human and machine.


seanredmond · 2018-03-19T18:14:42Z

After some offline discussion, we're going to handle this according to a few principles:

Try to capture everything: Don't assume any detail will be uninteresting
Don't add or remove any text: If you strip the XML tags, you should end up with the original text of the entry
Add data and interpretation as attributes: Following the previous principle, anything we add (for convenience, regularization, etc.) should be added as attributes.
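The second principle can be checked mechanically. As a rough sketch (not part of the project's tooling; `strip_tags` is a hypothetical helper name, and Python's standard `xml.etree.ElementTree` is assumed), this strips the tags from each of the four options above and compares the result to the original entry text:

```python
import xml.etree.ElementTree as ET


def strip_tags(xml_fragment: str) -> str:
    """Return the text content of an XML fragment with all tags removed."""
    root = ET.fromstring(xml_fragment)
    # itertext() yields every text node in document order; attributes are ignored.
    return "".join(root.itertext())


original = "2c 3May51"

# The four options from the first comment, keyed by their number.
options = {
    1: "<copies>2c 3May51</copies>",
    2: '<copies date="1951-05-03">2c</copies>',
    3: '<copies date="1951-05-03" num="2"/>',
    4: '<copies date="1951-05-03" num="2">2c 3May51</copies>',
}

for number, markup in options.items():
    preserved = strip_tags(markup) == original
    print(number, "preserves text" if preserved else "loses text")
```

Only options 1 and 4 pass this check; option 4 is the one that also carries the derived attributes.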

For this particular issue then, we will go with the last option:

<copies date="1951-05-03" num="2">2c 3May51</copies>

which both preserves the original text and adds some derived attributes that will make the data easier to work with.
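For illustration only, here is one way the derived attributes could be computed from the original text. Everything in it is an assumption rather than the project's actual code: the entry pattern (`<count>c <day><Mon><yy>`), the function name `mark_up_copies`, and especially the default century of 1900 for two-digit years. The month table is explicit because `strptime`'s `%y` would read `51` as 2051.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Three-letter English month abbreviations, spelled out to avoid locale issues.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

# Assumed shape of a copies entry, e.g. "2c 3May51".
COPIES_RE = re.compile(r"^(\d+)c\s+(\d{1,2})([A-Za-z]{3})(\d{2})$")


def mark_up_copies(text: str, century: int = 1900) -> str:
    """Wrap a copies entry in <copies>, adding date and num attributes.

    The element text is the untouched original, so stripping the tags
    gives back exactly what was passed in.
    """
    match = COPIES_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized copies entry: {text!r}")
    num, day, month, year = match.groups()
    # Assumption: two-digit years belong to the given century.
    parsed = date(century + int(year), MONTHS[month.capitalize()], int(day))
    elem = ET.Element("copies", {"date": parsed.isoformat(), "num": str(int(num))})
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


print(mark_up_copies("2c 3May51"))
```

Entries that don't match the assumed pattern raise an error instead of being guessed at, which fits the "don't add or remove any text" principle: nothing is rewritten unless it is understood.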

seanredmond closed this as completed Mar 19, 2018
