-
-
Notifications
You must be signed in to change notification settings - Fork 8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for parsing Comments #43
Conversation
The test failed because I moved the POI test cases to a separate file. I pushed changes to the test_files repo and this repo's tests.lst -- can you pull the changes, |
Comments parts listed in the [Content Types] are parsed. Sheets's relationships are parsed. Comments parts are correlated to their corresponding sheets parts. Comments's contents are added to the ref'ed cells. Rich text styling properties are currently ignored. For example: { "!ref": "A1:B3", "A1": { "v": 1, "t": "n" }, "B1": { "v": "one", "t": "s", "r": "one", "c": [ { "a": "Yegor Kozlov", "t": [ "Yegor Kozlov:", "\r\nfirst cell" ] } ] } }
OK thanks for your attention. We rebased our branch on the top of your latest master. |
@kinwahlai I can reconcile that later. According to https://travis-ci.org/SheetJS/js-xlsx/jobs/17037467, Travis is complaining because the file
|
Update test xlsx file to apachepoi_SimpleWithComments.xlsx
Hmm @KinWah @kinwahlai I thought I made a comment yesterday but that seemed to have disappeared ... Can you take a look at https://github.com/SheetJS/test_files/blob/master/comments_stress_test.xlsx ? In the first sheet, there is a rich-text comment and the raw text is broken up. It may make more sense to concatenate rather than push the raw comments. Also, do you need the rich text as well? There is some basic HTML conversion (search for 18.4.7 in the code) which really should be moved out of the shared string table function
|
Hi @SheetJSDev; well in our usecase we need to know the internal structure of the comments. And we are not interested by the styling as html. In fact we have the same requirement with number formats: we need to know about the number-formats as defined in the spreadsheet and we are not interested in the formatting of the raw values for them. For the sake of consistency with the rest of the code we can certainly help with doing to the comments the same type of things than what happens for the other rich texts. What do you think? |
@kinwahlai my point was according to the spec, the text tags don't have author information. It's at the outer comment tag level (see my annotations below):
Hence I would think it makes sense to combine the t tags. I have not played with google docs much, so I don't know how they do it (and my comment may not make sense for gdocs). Can you share an example file with comments from multiple users? Also, Excel generates an empty tag for comments with no text. See the comments_stress_test.xlsx file (https://github.com/SheetJS/test_files/blob/master/comments_stress_test.xlsx):
You need to guard for the possibility that |
Hi @SheetJSDev, @kinwahlai and I are pairing on this code. Thanks for noting the bug regarding the potentially null text element that would crash the parser. OK to concatenate the comments as a first step and to pull out the code that applies the comments styles as html styles. It does not really match our usecase and I think it would make our life harder if we were trying to generate xlsx files from the generated model. We will add a few more commits to that branch for your kind review then. |
@kinwahlai @hmalphettes Keep in mind that the goal here is to find a solution that makes "sense". There is no standard solution here, which is why discussion is good :) If you use the Excel UI "Insert Comment" option, it inserts the author name followed by a colon (in bold). However, since that is actual content in the comment, you can change it or remove it (and that's how you generate the empty comment in the sample sheet).
Google docs does some wild stuff with the comments. A friend and I tried to make individual comments as well as comment "conversation", and apparently gdocs is not writing author information using the author tag (authors are identified using a dash followed by the name):
complete comments.xml from exported XLSX:
|
@SheetJSDev thanks for trying all these examples. In our narrow usecase what we wanted to access were google's 'notes'
The simplest case of them all so far. |
@hmalphettes @kinwahlai Playing with gdocs a bit more:
|
OK @SheetJSDev thanks again for looking into more spreadsheets. I am convinced. I have pulled up the code that parses the rich text for sst and reused it to parse the comments. Here is what the comment of the test xlsx looks like now:
Let us know! |
Support for parsing Comments
@hmalphettes @kinwahlai Thanks for the patches! Now for the difficult part (figuring out how to do the same thing for https://github.com/SheetJS/js-xls) |
Oh boy xls! Sorry we are putting this new feature on your plate. |
Thanks for a great library!
Here is my best attempt at contributing support to parse comments.
I used an existing file already in the project for the regressions tests.
I realize I coded the changes directly in the built xlsx.
Let me know if you want me to change something.
Thanks!
Comments parts listed in the [Content Types] are parsed.
Sheets's relationships are parsed.
Comments parts are correlated to their corresponding sheets parts.
Comments's contents are added to the ref'ed cells.
Rich text styling properties are currently ignored.
For example: