This is probably low priority, but currently in docxtotei we take comments and embed them as notes at the point where there is a w:commentReference. Basically, we treat them much as we treat footnotes. However, this loses one important piece of information: the range of text to which the comment applies. Say we have a docx where I've highlighted the text "This is some test text." and attached the comment "This is boring." to it. In the TEI we would get something like:
<p>Here is some earlier text. This is some test text.
<note place="comment" resp="James_Cummings">
<date when="2017-06-28T13:30:00Z"/>
This is boring.
</note>
</p>
(ignoring that the @resp should be a URI fragment but isn't). What this doesn't tell me is the start and end points of the comment. These are stored in the Word document as:
<w:commentRangeStart w:id="0"/>
<w:r>
<w:rPr>
<w:rFonts w:cs="Times New Roman" w:ascii="Times New Roman"
w:hAnsi="Times New Roman"/>
<w:sz w:val="24"/>
<w:szCs w:val="24"/>
</w:rPr>
<w:t>This is some test text.</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:cs="Times New Roman" w:ascii="Times New Roman"
w:hAnsi="Times New Roman"/>
<w:sz w:val="24"/>
<w:szCs w:val="24"/>
</w:rPr>
</w:r>
<w:commentRangeEnd w:id="0"/>
<w:r>
<w:commentReference w:id="0"/>
</w:r>
You'll notice that the w:commentRangeStart and w:commentRangeEnd have the same @w:id as the w:commentReference that we currently use. The comment itself is stored in a separate file in the docx zip, comments.xml, which would have an entry like:
One of the reasons this is hard to deal with is that the w:commentRangeStart and w:commentRangeEnd might interrupt some other feature that we are creating. For example, they might create overlapping hierarchies, since comments may stretch over paragraph boundaries and so on. Word's insistence on turning everything into a w:r run of text might actually be helpful here. I suggest that we turn w:commentReference into a note as we are doing now, but put the w:id into the @n attribute. (While this should be unique within the document, I don't trust Word.) Similarly, I suggest that we turn w:commentRangeStart and w:commentRangeEnd into milestones like <milestone type="commentRefStart" n="0"/>. The generated TEI would then look something like:
<p>Here is some earlier text. <milestone type="commentRefStart" n="0"/>
This is some test text.<milestone type="commentRefEnd" n="0"/>
<note place="comment" resp="James_Cummings" n="0">
<date when="2017-06-28T13:30:00Z"/>
This is boring.
</note>
</p>
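To make the suggested mapping a bit more concrete, here is a rough Python sketch of it. This is only an illustration, not the XSLT that would actually go into the stylesheet. The function names, the `COMMENTS` lookup (standing in for whatever is read from comments.xml) and the simplified `SAMPLE` paragraph are all assumptions made up for the example. It also only handles flat content inside a single paragraph, so it sidesteps the overlapping-hierarchy problem entirely.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Hypothetical stand-in for comments.xml: comment id -> (author, date, text).
COMMENTS = {"0": ("James_Cummings", "2017-06-28T13:30:00Z", "This is boring.")}

# Simplified version of the Word markup above (run properties omitted).
SAMPLE = f"""<w:p xmlns:w="{W_NS}">
  <w:r><w:t xml:space="preserve">Here is some earlier text. </w:t></w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r><w:t>This is some test text.</w:t></w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r><w:commentReference w:id="0"/></w:r>
</w:p>"""


def paragraph_to_tei(w_p, comments):
    """Flatten one w:p into a TEI-like <p>, keeping comment ranges as milestones."""
    p = ET.Element("p")
    p.text = ""

    def append_text(text):
        # ElementTree keeps text that follows a child element in that child's tail.
        if len(p):
            p[-1].tail = (p[-1].tail or "") + text
        else:
            p.text += text

    for node in w_p.iter():  # document order
        if node.tag == W + "t":
            append_text(node.text or "")
        elif node.tag == W + "commentRangeStart":
            ET.SubElement(p, "milestone", type="commentRefStart", n=node.get(W + "id"))
        elif node.tag == W + "commentRangeEnd":
            ET.SubElement(p, "milestone", type="commentRefEnd", n=node.get(W + "id"))
        elif node.tag == W + "commentReference":
            cid = node.get(W + "id")
            author, when, text = comments[cid]
            note = ET.SubElement(p, "note", place="comment", resp=author, n=cid)
            date = ET.SubElement(note, "date", when=when)
            date.tail = text  # the comment text follows the <date/> inside the note
    return p


def commented_text(p, cid):
    """Recover the text between the start and end milestones for comment cid.

    Only looks at direct children of <p>, so nested markup is not handled.
    """
    inside, parts = False, []
    for child in p:
        if child.tag == "milestone" and child.get("n") == cid:
            inside = child.get("type") == "commentRefStart"
        if inside:
            parts.append(child.tail or "")
    return "".join(parts)


tei_p = paragraph_to_tei(ET.fromstring(SAMPLE), COMMENTS)
print(ET.tostring(tei_p, encoding="unicode"))
print(commented_text(tei_p, "0"))
```

The second function is the point of the exercise: once the milestones and the note share the same @n, anything processing the TEI can work back from a note to the text it was attached to, which is not possible with the current output.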
I believe the place this is currently handled is at:
Stylesheets/docx/from/textruns.xsl
Lines 79 to 88 in 7a3677b
Since I don't think anyone will object I might have a go at doing it.