fix for information loss on footnotes/endnotes within XWPFRun.toString #3

@akhikhl

Dear Apache POI Team,

Please consider the following problem: whenever an MS Word document with footnotes or endnotes is parsed with XWPFWordExtractor, the positions of the footnote/endnote references are lost. The loss is clearly visible in, for example, Apache Tika output.

To reproduce the problem, insert the following code into TestXWPFWordExtractor.testFootnotes:

    java.io.FileWriter w = new java.io.FileWriter(new java.io.File(System.getProperty("user.home"), "footnotes.output.txt"));
    try {
      w.write(extractor.getText());
    } finally {
      w.close();
    }

Then run the tests and inspect "footnotes.output.txt". It contains "Eto ochen prostoy text so snoskoy"; there should be a footnote reference between "prostoy" and "text", but it is lost.

SOLUTION:
I suggest introducing additional markup of the form [footnoteRef:num] and [endnoteRef:num], which allows applications to render footnote and endnote references at the correct positions.

Please see the commit details.
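
As a rough illustration of how a consuming application might use the proposed markup, here is a minimal sketch that finds or strips the [footnoteRef:num] / [endnoteRef:num] markers in extracted text. The class NoteRefScanner and its methods are hypothetical and not part of Apache POI; the sketch uses only java.util.regex and assumes the markers appear exactly in the form described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Apache POI.
final class NoteRefScanner {
    // Matches [footnoteRef:N] or [endnoteRef:N]. A leading minus sign is
    // tolerated as a precaution, on the assumption that ids are plain integers.
    private static final Pattern MARKER =
            Pattern.compile("\\[(footnote|endnote)Ref:(-?\\d+)\\]");

    /** Returns each marker as "kind:id" (e.g. "footnote:1"), in order of appearance. */
    static List<String> findRefs(String text) {
        List<String> refs = new ArrayList<>();
        Matcher m = MARKER.matcher(text);
        while (m.find()) {
            refs.add(m.group(1) + ":" + m.group(2));
        }
        return refs;
    }

    /** Removes all markers, yielding the text without reference positions. */
    static String stripRefs(String text) {
        return MARKER.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "Eto ochen prostoy[footnoteRef:1] text so snoskoy";
        System.out.println(findRefs(sample));
        System.out.println(stripRefs(sample));
    }
}
```

An application that prefers the previous output can call stripRefs; one that renders references can replace each match with its own notation instead.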

@Gagravarr

Thanks, committed in r1492308. (That should mirror through to git shortly)

@Gagravarr referenced this pull request from a commit:
Patch from akhikhl from github pull #3 - Extract references from XWPF footnotes

git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1492308 13f79535-47bb-0310-9956-ffa450edef68
8185178

@ischindl referenced this pull request from a commit in ischindl/poi:
@Gagravarr Patch from akhikhl from github pull #3 - Extract references from XWPF footnotes

git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1492308 13f79535-47bb-0310-9956-ffa450edef68
3c62ed6

7 src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFRun.java
@@ -52,6 +52,7 @@ Licensed to the Apache Software Foundation (ASF) under one or more
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDrawing;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTEmpty;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTFonts;
+import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTFtnEdnRef;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHpsMeasure;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTOnOff;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPTab;
@@ -817,6 +818,12 @@ public String toString() {
                 text.append("\n");
             }
         }
+        if (o instanceof CTFtnEdnRef) {
+            CTFtnEdnRef ftn = (CTFtnEdnRef)o;
+            String footnoteRef = ftn.getDomNode().getLocalName().equals("footnoteReference") ?
+                "[footnoteRef:" + ftn.getId().intValue() + "]" : "[endnoteRef:" + ftn.getId().intValue() + "]";
+            text.append(footnoteRef);
+        }
     }
     c.dispose();
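
The branch added to XWPFRun.toString in the hunk above picks the marker from the local name of the XML element and the reference id. A stand-alone sketch of that decision, without any POI types, might look like the following. NoteRefMarker and format are hypothetical names used only for illustration; the real patch works on a CTFtnEdnRef and reads the name via getDomNode().getLocalName().

```java
import java.math.BigInteger;

// Hypothetical stand-alone helper; mirrors the branch in the patch
// but takes the element's local name and id as plain arguments.
final class NoteRefMarker {
    /**
     * Builds the marker text. As in the patch, any local name other than
     * "footnoteReference" is treated as an endnote reference.
     */
    static String format(String localName, BigInteger id) {
        String kind = "footnoteReference".equals(localName) ? "footnoteRef" : "endnoteRef";
        return "[" + kind + ":" + id.intValue() + "]";
    }

    public static void main(String[] args) {
        System.out.println(format("footnoteReference", BigInteger.ONE));
        System.out.println(format("endnoteReference", BigInteger.valueOf(2)));
    }
}
```

Building the shared "[kind:id]" shape once avoids repeating the id formatting in both arms of the conditional; this is a stylistic alternative, not what was committed.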
10 src/ooxml/testcases/org/apache/poi/xwpf/extractor/TestXWPFWordExtractor.java
@@ -166,8 +166,9 @@ public void testHeadersFooters() throws IOException {
     public void testFootnotes() throws IOException {
         XWPFDocument doc = XWPFTestDataSamples.openSampleDocument("footnotes.docx");
         XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
-
-        assertTrue(extractor.getText().contains("snoska"));
+        String text = extractor.getText();
+        assertTrue(text.contains("snoska"));
+        assertTrue(text.contains("Eto ochen prostoy[footnoteRef:1] text so snoskoy"));
     }
@@ -190,8 +191,9 @@ public void testFormFootnotes() throws IOException {
     public void testEndnotes() throws IOException {
         XWPFDocument doc = XWPFTestDataSamples.openSampleDocument("endnotes.docx");
         XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
-
-        assertTrue(extractor.getText().contains("XXX"));
+        String text = extractor.getText();
+        assertTrue(text.contains("XXX"));
+        assertTrue(text.contains("tilaka [endnoteRef:2]or 'tika'"));
     }
     public void testInsertedDeletedText() throws IOException {