Improve handling of word comments in docxtotei #275

Closed
jamescummings opened this issue Jun 28, 2017 · 1 comment
Labels
status: inProgress Ticket has been assigned and someone is working on it.

Comments

@jamescummings
Member

jamescummings commented Jun 28, 2017

This is probably low priority, but currently docxtotei takes Word comments and embeds each one as a note at the point where the w:commentReference occurs. In effect, we treat them much like footnotes. However, this loses one important piece of information: the range of text to which the comment applies. Say I have a docx in which I have highlighted the text "This is some test text." and attached the comment "This is boring." to it. In the TEI we would get something like:

<p>Here is some earlier text. This is some test text. 
<note place="comment" resp="James_Cummings">
<date when="2017-06-28T13:30:00Z"/>
This is boring.
</note>
</p>

(Ignore for now that the @resp should be a URI fragment but isn't.) What this doesn't tell me is the start and end points of the commented text. These are stored in the Word document as:

<w:commentRangeStart w:id="0"/>
      <w:r>
        <w:rPr>
          <w:rFonts w:cs="Times New Roman" w:ascii="Times New Roman" 
             w:hAnsi="Times New Roman"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
        </w:rPr>
        <w:t>This is some test text.</w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:rFonts w:cs="Times New Roman" w:ascii="Times New Roman" 
             w:hAnsi="Times New Roman"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
        </w:rPr>
      </w:r>
      <w:commentRangeEnd w:id="0"/>
      <w:r>
        <w:commentReference w:id="0"/>
      </w:r> 

You'll notice that w:commentRangeStart and w:commentRangeEnd have the same @w:id as the w:commentReference that we currently use. The comment itself is stored in a separate file inside the docx zip, word/comments.xml, which would have an entry like:

<w:comment w:id="0" w:author="James_Cummings" w:date="2017-06-28T13:30:00Z" w:initials="JC">
    <w:p>
      <w:r>
        <w:rPr>
          <w:rFonts w:ascii="Liberation Serif" w:hAnsi="Liberation Serif" 
              w:eastAsia="DejaVu Sans" w:cs="DejaVu Sans"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
          <w:lang w:val="en-US" w:eastAsia="en-US" w:bidi="en-US"/>
        </w:rPr>
        <w:t>This is boring.</w:t>
      </w:r>
    </w:p>
  </w:comment> 

One of the reasons this is hard to deal with is that w:commentRangeStart and w:commentRangeEnd might interrupt some other feature that we are creating. For example, they might create overlapping hierarchies, since comments may stretch over paragraph boundaries. Word's insistence on turning everything into a w:r run of text might actually be helpful here.

I suggest that we turn w:commentReference into a note as we do now, but put the w:id into the @n attribute. (While this should be unique within the document, I don't trust Word.) Similarly, I suggest that we turn w:commentRangeStart and w:commentRangeEnd into milestones like <milestone type="commentRefStart" n="0"/>. This would leave the generated TEI text as something like:

<p>Here is some earlier text. <milestone type="commentRefStart" n="0"/>
This is some test text.<milestone type="commentRefEnd" n="0"/>
<note place="comment" resp="James_Cummings" n="0">
<date when="2017-06-28T13:30:00Z"/>
This is boring.
</note>
</p>

I believe the place this is currently handled is at:

<xsl:template match="w:commentReference">
  <xsl:variable name="commentN" select="@w:id"/>
  <xsl:for-each
      select="document(concat($wordDirectory,'/word/comments.xml'))/w:comments/w:comment[@w:id=$commentN]">
    <note place="comment" resp="{translate(@w:author,' ','_')}">
      <date when="{@w:date}"/>
      <xsl:apply-templates/>
    </note>
  </xsl:for-each>
</xsl:template>
Similar templates for w:commentRangeStart and w:commentRangeEnd could be added adjacent to this one.

Since I don't think anyone will object I might have a go at doing it.

@jamescummings jamescummings self-assigned this Jun 28, 2017
jamescummings added a commit that referenced this issue Jun 28, 2017
@jamescummings
Member Author

jamescummings commented Jun 28, 2017

Ah, I forgot that milestone has a required @unit, so I am changing this to <anchor>, which probably makes more sense anyway. Change at e9b9238.
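
For anyone following along, here is a rough sketch of what anchor-based templates for the two range markers might look like. It is not copied from e9b9238 and may differ from what was committed. The type values are carried over from the milestone proposal above, the use of @n for the Word comment id is likewise an assumption, and the xsl:stylesheet wrapper is only there so the fragment stands on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: attribute choices are assumptions, not the committed code. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="w">

  <!-- Start of the commented range: an empty anchor carrying the comment id -->
  <xsl:template match="w:commentRangeStart">
    <anchor type="commentRefStart" n="{@w:id}"/>
  </xsl:template>

  <!-- End of the commented range: same id, so start and end can be paired -->
  <xsl:template match="w:commentRangeEnd">
    <anchor type="commentRefEnd" n="{@w:id}"/>
  </xsl:template>

</xsl:stylesheet>
```

Because both anchors are empty elements, they can sit anywhere in the output without creating overlapping hierarchies, and a later process can pair them with the note that has the same @n value.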

@jamescummings jamescummings added the status: inProgress Ticket has been assigned and someone is working on it. label Jun 29, 2017