Evaluating <seg> and/or <w>: Methodological Questions #50

Closed
ebeshero opened this issue Jan 14, 2016 · 1 comment
@HelenaSabel @setriplette Stacey and I are meeting and looking at the <seg> markup, and it has raised some very significant methodological questions for us. First, we see that the <seg> tags are not getting in the way of the "clause-chunk" markup, and the @xml:ids that Helena has introduced are a logical way to indicate sub-segments of a clause-like unit. However, we're weighing the extra coding in the balance and wondering how significant the issue of word count per clause really is. Stacey expects this <seg> markup to take up a lot of time. On the other hand, since it isn't actually interfering with other markup, we could proceed with it if @HelenaSabel wants to experiment with it: What could we do with this <seg> markup that might be useful? Are the word-count "rates" per clause in our tables actually going to be affected much?
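
For reference, here's a minimal sketch of what we mean by a <seg> nesting cleanly inside a clause-like chunk. The element name for the chunk (<cl>) and all the @xml:id values are invented for illustration; our real scheme may differ:

```xml
<!-- Hypothetical sketch: <seg> sub-segments nested inside one clause-like
     chunk. Element names and @xml:id values are invented examples. -->
<cl xml:id="mont_ch1_cl3">
  <seg xml:id="mont_ch1_cl3_s1">En el tiempo</seg>
  <seg xml:id="mont_ch1_cl3_s2">que este rey reinaba</seg>
</cl>
```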

Methodological Considerations:

We don't really need the alignment tables to give us a sequential "reading view" of the texts, since translation scholars can readily pick up the pieces that Southey is reformulating and see where they fit in Montalvo. (In some ways, the reading view is the least necessary part of this project, since both texts are easily available to people.) What's useful to scholars is simply to be able to see quickly the major changes that Southey makes (which we're already highlighting in colors on the HTML tables).

Word-count issue: Comparing the word-count of a Southey clause to that of a Montalvo clause is not an absolute measure of compression because the clauses are constructed based on different logics, which makes their start- and end-points fail to match. (Aside: we wonder if a Renaissance English translation like Anthony Munday's might look more like Montalvo's! Stacey is about to go read more of Munday soon-ish, er, one of these days!) More to the point, here's a very serious question we need to consider:

Are there other things besides word counts that we can compare between Southey's clauses vs. Montalvo's clause-like chunks?

We carved up Montalvo into those clause-like chunks because we thought we needed a fairly large portion of text to be able to locate matches between disparate languages. (At the level of the chapter, the alignments would be ridiculously simple, and at the level of the word, virtually impossible to get right. So we're hoping for something of a happy medium with the clause-like chunks.) Stacey is telling me that Southey's sentences and his own clause units do not respect Montalvo's clauses--that Southey's sentence structure is not really dependent on Montalvo's. He's just not translating closely enough for there to be a strong correlation. The smallest unit that Southey's working with, says Stacey, is Montalvo's chapter! The word-for-word translations are of small pieces, indeed segments--just strings of words, but they don't necessarily follow the clause divisions. (I (Elisa) wonder if we'll start seeing segments that overlap the Montalvo clause hierarchies--segments that Southey has carved out from a couple of Montalvo clauses. Stacey says she thinks we've already seen this.) We've seen this overlap for ourselves, but our current markup scheme doesn't let us render it. Hmmm.
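
If we do run into segments that straddle two Montalvo clauses, one possibility is TEI's linking attributes (@prev/@next on the fragments), which let us chain the pieces back together without breaking the clause hierarchy. A sketch, with invented element names and ids:

```xml
<!-- Sketch: one Southey-derived segment spanning two Montalvo clauses,
     split into fragments chained by @next/@prev. Ids are invented. -->
<cl xml:id="cl4">... <seg xml:id="seg9a" next="#seg9b">where the word-for-word run begins</seg></cl>
<cl xml:id="cl5"><seg xml:id="seg9b" prev="#seg9a">and where it ends</seg> ...</cl>
```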

So, that's why we're not sure if the <seg> markup is all that valuable! Eventually, if we keep on at this rate of reducing the units of text to which we apply @xml:ids, how soon will it be before we are putting an @xml:id on every word? Should we consider doing a word-by-word study? Stacey is saying, if we decide to try this, perhaps we'd best do it now while we only have a few chapters coded so we can experiment a bit with our methods before we go much further. (It's easy enough to create word-by-word markup, but we'll have a LOT of code to read and process!)

Advantages of word-by-word markup:

  • We'll definitely fine-tune our word counts!
  • If we can automate alignment, we don't have to do manual "stitchery"!
  • Word-by-word markup might cause us to question whether we need to be studying clauses and clause-like units at all, since Southey's sentences don't match Montalvo's clauses.
  • For example, the HyperMachiavel project (the Machiavelli translation-study project we heard about on our panel at the TEI conference) did this kind of word-by-word markup and automated its correlation. Here's their presentation: https://prezi.com/qrvc_uoil-ta/hypermachiavel-a-translation-comparison-tool/
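
For scale, word-by-word markup might look like this (using TEI's <w> element; the ids here are invented). It's easy to generate mechanically, but verbose to read and process, which is the trade-off we're weighing:

```xml
<!-- Sketch: TEI <w> markup with one @xml:id per word. Element names
     for the chunk and all ids are invented examples. -->
<cl xml:id="cl1">
  <w xml:id="w1">En</w>
  <w xml:id="w2">el</w>
  <w xml:id="w3">tiempo</w>
</cl>
```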

Disadvantages of/Concerns about word-by-word markup:

  • If we only work with word-by-word markup, we don't know how we would go about aligning passages of the translation: that becomes a different kind of challenge. We could try it on one chapter to see what it's like, but we won't know until we try.
  • Automation might produce many, many errors: would it save us time? Perhaps if the alignment is consistent and reproducible in a way that human alignments simply aren't. (For example, we humans probably wouldn't be consistent with ourselves, if we went back and applied our stitchery to the same chapter. But a computer would behave predictably and we could program our alignments...)
  • If we could automate the markup of passages in Montalvo that are omitted in Southey through word-by-word comparisons, there might be benefits. BUT we would need to build our own dictionary of Montalvo's Spanish to Southey's English (or take an existing dictionary and add to it).
  • Such a dictionary might be very, very useful and interesting to work on (and perhaps could be built up to work with other texts from that century), but it would be a huge project in and of itself.
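
If we did build such a dictionary, TEI's dictionaries module would be one natural home for it. A sketch of a single entry, with an invented headword and translation pairing just for illustration:

```xml
<!-- Sketch: one entry of a hypothetical Montalvo-Spanish to Southey-English
     glossary, using TEI dictionary elements. The words are invented examples. -->
<entry xml:id="d_donzella">
  <form><orth xml:lang="es">donzella</orth></form>
  <sense>
    <cit type="translation" xml:lang="en"><quote>damsel</quote></cit>
  </sense>
</entry>
```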

SO: Do we want to try to create a dictionary for this project? Or is it better to proceed with the clause units and "synch-stitchery" as we've started?
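
Whichever unit we settle on, the "synch-stitchery" itself could be expressed in stand-off fashion with TEI's <linkGrp> and <link>, pointing at ids in both files. A sketch, with invented filenames and ids (note that two Montalvo chunks can point at one Southey clause, which is the overlap problem above):

```xml
<!-- Sketch: stand-off alignment of Montalvo clause-chunks to Southey
     clauses. Filenames and ids are invented examples. -->
<linkGrp type="alignment">
  <link target="montalvo.xml#mont_cl12 southey.xml#sou_cl7"/>
  <link target="montalvo.xml#mont_cl13 southey.xml#sou_cl7"/>
</linkGrp>
```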

ebeshero commented Feb 6, 2016

Discussed and decided in person in the week of 2016-02-02 during @HelenaSabel's visit to Greensburg. We continue with <seg> markup. @setriplette

@ebeshero ebeshero closed this as completed Feb 6, 2016