Support for parsing Comments #43

kinwahlai · 2014-01-15T07:38:30Z

Thanks for a great library!
Here is my best attempt at contributing support to parse comments.
I used an existing file already in the project for the regressions tests.
I realize I coded the changes directly in the built xlsx.
Let me know if you want me to change something.

Thanks!

Comments parts listed in the [Content Types] are parsed.
Sheets's relationships are parsed.
Comments parts are correlated to their corresponding sheets parts.
Comments's contents are added to the ref'ed cells.
Rich text styling properties are currently ignored.

For example:

{
  "!ref": "A1:B3",
  "A1": {
    "v": 1,
    "t": "n"
  },
  "B1": {
    "v": "one",
    "t": "s",
    "r": "one",
    "c": [
      { "a": "Yegor Kozlov",
       "t": [ "Yegor Kozlov:",
              "\r\nfirst cell" ]
      }
    ]
  }
}

SheetJSDev · 2014-01-15T14:02:42Z

The test failed because I moved the POI test cases to a separate file. I pushed changes to the test_files repo and this repo's tests.lst -- can you pull the changes, make init then make test?

Comments parts listed in the [Content Types] are parsed. Sheets's relationships are parsed. Comments parts are correlated to their corresponding sheets parts. Comments's contents are added to the ref'ed cells. Rich text styling properties are currently ignored. For example: { "!ref": "A1:B3", "A1": { "v": 1, "t": "n" }, "B1": { "v": "one", "t": "s", "r": "one", "c": [ { "a": "Yegor Kozlov", "t": [ "Yegor Kozlov:", "\r\nfirst cell" ] } ] } }

kinwahlai · 2014-01-16T02:35:32Z

OK thanks for your attention. We rebased our branch on the top of your latest master.
We are working on putting the changes into the bits files to support the tests as expected by Travis.

SheetJSDev · 2014-01-16T02:55:42Z

@kinwahlai I can reconcile that later. According to https://travis-ci.org/SheetJS/js-xlsx/jobs/17037467, Travis is complaining because the file SimpleWithComments.xlsx was renamed to apachepoi_SimpleWithComments.xlsx. The fix is in the test script:

var wb = XLSX.readFile('./test_files/apachepoi_SimpleWithComments.xlsx');

Update test xlsx file to apachepoi_SimpleWithComments.xlsx

SheetJSDev · 2014-01-16T15:20:30Z

Hmm @KinWah @kinwahlai I thought I made a comment yesterday but that seemed to have disappeared ...

Can you take a look at https://github.com/SheetJS/test_files/blob/master/comments_stress_test.xlsx ? In the first sheet, there is a rich-text comment and the raw text is broken up. It may make more sense to concatenate rather than push the raw comments.

Also, do you need the rich text as well? There is some basic HTML conversion (search for 18.4.7 in the code) which really should be moved out of the shared string table function

    <comment ref="C1" authorId="0">
      <text>
        <r>
          <rPr>
            <b/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
            <family val="2"/>
          </rPr>
          <t xml:space="preserve">I </t>
        </r>
        <r>
          <rPr>
            <b/>
            <i/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
          </rPr>
          <t>really</t>
        </r>
        <r>
          <rPr>
            <b/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
            <family val="2"/>
          </rPr>
          <t xml:space="preserve"> </t>
        </r>
        <r>
          <rPr>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
            <family val="2"/>
          </rPr>
          <t xml:space="preserve">hope that </t>
        </r>
        <r>
          <rPr>
            <i/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
          </rPr>
          <t>xlsx</t>
        </r>
        <r>
          <rPr>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
            <family val="2"/>
          </rPr>
          <t xml:space="preserve"> </t>
        </r>
        <r>
          <rPr>
            <u/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
          </rPr>
          <t>decides not</t>
        </r>
        <r>
          <rPr>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
            <family val="2"/>
          </rPr>
          <t xml:space="preserve"> </t>
        </r>
        <r>
          <rPr>
            <i/>
            <u/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
          </rPr>
          <t xml:space="preserve">to use </t>
        </r>
        <r>
          <rPr>
            <b/>
            <i/>
            <u/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
          </rPr>
          <t>magic</t>
        </r>
        <r>
          <rPr>
            <b/>
            <u/>
            <sz val="9"/>
            <color indexed="81"/>
            <rFont val="Calibri"/>
          </rPr>
          <t xml:space="preserve"> like rPr</t>
        </r>
      </text>
    </comment>

kinwahlai · 2014-01-17T02:27:13Z

Hi @SheetJSDev; well in our usecase we need to know the internal structure of the comments. And we are not interested by the styling as html.
For example: to find out that a google-spreadsheet's note matches a comment we make sure we see single rich text in the comment. We put some JSON in some of those notes and we dont want the html styling to appear in there.
When we see multiple rich texts we discard them as they are not notes.

In fact we have the same requirement with number formats: we need to know about the number-formats as defined in the spreadsheet and we are not interested in the formatting of the raw values for them.

For the sake of consistency with the rest of the code we can certainly help with doing to the comments the same type of things than what happens for the other rich texts.

What do you think?

SheetJSDev · 2014-01-17T03:20:03Z

@kinwahlai my point was according to the spec, the text tags don't have author information. It's at the outer comment tag level (see my annotations below):

<comment ref="C1" authorId="0">   <-- AUTHOR IS IDENTIFIED HERE
      <text>  <--- this element does not have the author id
        <r> <-- this element does not have the author id
          <rPr>
             ...
          </rPr>
          <t xml:space="preserve">I </t> <-- this element does not have the author id
        </r>

Hence I would think it makes sense to combine the t tags. I have not played with google docs much, so I don't know how they do it (and my comment may not make sense for gdocs). Can you share an example file with comments from multiple users?

Also, Excel generates an empty tag for comments with no text. See the comments_stress_test.xlsx file (https://github.com/SheetJS/test_files/blob/master/comments_stress_test.xlsx):

<comment ref="B2" authorId="0"><text/>

You need to guard for the possibility that x.match(/<text>([^\u2603]*)<\/text>/m) is null

hmalphettes · 2014-01-18T02:12:33Z

Hi @SheetJSDev, @kinwahlai and I are pairing on this code.
The example sheet we were looking at in the openxml spec had 2 rich texts; I mistook it for the author and body of the comment.
Thanks for clarifying this. In fact we are parsing the author as expected.

Thanks for noting the bug regarding the potentially null text element that would crash the parser.

OK to concatenate the comments as a first step and to pull out the code that applies the comments styles as html styles. It does not really match our usecase and I think it would make our life harder if we were trying to generate xlsx files from the generated model.
But we don't know better than that for now at the moment ;-)

We will add a few more commits to that branch for your kind review then.

SheetJSDev · 2014-01-18T03:48:41Z

@kinwahlai @hmalphettes Keep in mind that the goal here is to find a solution that makes "sense". There is no standard solution here, which is why discussion is good :)

If you use the Excel UI "Insert Comment" option, it inserts the author name followed by a colon (in bold). However, since that is actual content in the comment, you can change it or remove it (and that's how you generate the empty comment in the sample sheet).

I think it would make our life harder if we were trying to generate xlsx files from the generated model.

Google docs does some wild stuff with the comments. A friend and I tried to make individual comments as well as comment "conversation", and apparently gdocs is not writing author information using the author tag (authors are identified using a dash followed by the name):

    <comment ref="A1" authorId="0">
      <text>
        <t xml:space="preserve">Foo bar

baz qux
    -Sheet JS
This is a conversation
    -Dev SheetJS</t>
      </text>
    </comment>

complete comments.xml from exported XLSX:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<comments xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <authors>
    <author/>
  </authors>
  <commentList>
    <comment ref="A2" authorId="0">
      <text>
        <t xml:space="preserve">5 6 7 8
    -Sheet JS</t>
      </text>
    </comment>
    <comment ref="A2" authorId="0">
      <text>
        <t xml:space="preserve">1 2 3 4
    -Dev SheetJS</t>
      </text>
    </comment>
    <comment ref="A1" authorId="0">
      <text>
        <t xml:space="preserve">Foo bar

baz qux
    -Sheet JS
This is a conversation
    -Dev SheetJS</t>
      </text>
    </comment>
  </commentList>
</comments>

hmalphettes · 2014-01-18T08:02:34Z

@SheetJSDev thanks for trying all these examples.
And thanks for reminding us of the bigger cause.

In our narrow usecase what we wanted to access were google's 'notes'
Once saved as an xlsx, they show up like this:

<?xml version="1.0" encoding="UTF-8"?>
<comments xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <authors>
    <author />
  </authors>
  <commentList>
    <comment ref="A1" authorId="0">
      <text>
        <t xml:space="preserve">{stc_datasource:'c3b00170-642b-11e3-949a-0800200c9a66',stc_object:'stc_application'}</t>
      </text>
    </comment>
  </commentList>
</comments>

The simplest case of them all so far.

SheetJSDev · 2014-01-18T10:00:31Z

@hmalphettes @kinwahlai Playing with gdocs a bit more:

google docs comments and notes are plaintext. The UI is a text field and the generated xml only uses t tags
each comment has only one text tag. I can't seem to generate a case where gdocs generates a comment with multiple text tag children.
Google treats conversations and comments differently (notes and comments are separated with ---- when a note is present, but do not appear in a cell with only comments):

<!-- with note -->
<t xml:space="preserve">note with comment test

----
5 6 7 8
    -Sheet JS
----
1 2 3 4
    -Dev SheetJS</t>

<!-- without note -->
<t xml:space="preserve">Foo bar

baz qux
    -Sheet JS
This is a conversation
    -Dev SheetJS</t>

As a result, you probably will need to do post-processing in your app, and I think the concatenation comment still applies for the individual t tags within a given text tag.

…ents.

hmalphettes · 2014-01-18T13:49:48Z

OK @SheetJSDev thanks again for looking into more spreadsheets. I am convinced.

I have pulled up the code that parses the rich text for sst and reused it to parse the comments.
It does the concatenation and passes the raw values along.
That would work fine for our use case.

Here is what the comment of the test xlsx looks like now:

{ a: 'Yegor Kozlov',
  t: 'Yegor Kozlov:\r\nfirst cell',
  raw: '<r><rPr><b/><sz val="8"/><color indexed="81"/><rFont val="Tahoma"/></rPr><t>Yegor Kozlov:</t></r><r><rPr><sz val="8"/><color indexed="81"/><rFont val="Tahoma"/></rPr><t xml:space="preserve">\r\nfirst cell</t></r>',
  r: '<span style="font-weight: bold;">Yegor Kozlov:</span><span style=""><br/>first cell</span>' }

Let us know!

… its comments

Support for parsing Comments

SheetJSDev · 2014-01-22T04:09:08Z

@hmalphettes @kinwahlai Thanks for the patches! Now for the difficult part (figuring out how to do the same thing for https://github.com/SheetJS/js-xls)

hmalphettes · 2014-01-22T05:55:45Z

Oh boy xls! Sorry we are putting this new feature on your plate.
Cheers!
Hugues

Add parsing comment codes into bits/70_xlsx.js

80a843c

Update test xlsx file to apachepoi_SimpleWithComments.xlsx

Comments maybe contain an empty text tag

36f7080

Share the code for parsing the rich text and use it to parse the comm…

5d43dff

…ents.

Update the range of the sheet when creating a cell for the content of…

9efa4e2

… its comments

SheetJSDev added a commit that referenced this pull request Jan 22, 2014

Merge pull request #43 from nerds-odd-e/comment_support

e8d14dd

Support for parsing Comments

SheetJSDev merged commit e8d14dd into SheetJS:master Jan 22, 2014

SheetJS locked and limited conversation to collaborators Feb 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for parsing Comments #43

Support for parsing Comments #43

kinwahlai commented Jan 15, 2014

SheetJSDev commented Jan 15, 2014

kinwahlai commented Jan 16, 2014

SheetJSDev commented Jan 16, 2014

SheetJSDev commented Jan 16, 2014

kinwahlai commented Jan 17, 2014

SheetJSDev commented Jan 17, 2014

hmalphettes commented Jan 18, 2014

SheetJSDev commented Jan 18, 2014

hmalphettes commented Jan 18, 2014

SheetJSDev commented Jan 18, 2014

hmalphettes commented Jan 18, 2014

SheetJSDev commented Jan 22, 2014

hmalphettes commented Jan 22, 2014

Support for parsing Comments #43

Support for parsing Comments #43

Conversation

kinwahlai commented Jan 15, 2014

SheetJSDev commented Jan 15, 2014

kinwahlai commented Jan 16, 2014

SheetJSDev commented Jan 16, 2014

SheetJSDev commented Jan 16, 2014

kinwahlai commented Jan 17, 2014

SheetJSDev commented Jan 17, 2014

hmalphettes commented Jan 18, 2014

SheetJSDev commented Jan 18, 2014

hmalphettes commented Jan 18, 2014

SheetJSDev commented Jan 18, 2014

hmalphettes commented Jan 18, 2014

SheetJSDev commented Jan 22, 2014

hmalphettes commented Jan 22, 2014