-
-
Notifications
You must be signed in to change notification settings - Fork 8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writing Style information into Excel #128
Comments
Huge thanks to @SheetJS for guidance on how to do all of this! Experimental, guarded by the cellStyles option (default: false)
@nukulb @elad @hubba After exploring this a bit more, I have a better sense for the roadmap. This is going to be a much larger discussion, but one that we've been putting off for far too long: Even though we all know this, it should be reiterated: our ultimate goal is to devise a universal representation (referred to as the "Common Spreadsheet Format" in the READMEs) that works for all spreadsheet formats, not just XLSX. If you limit yourself to XLSX support, it's really easy to just persist everything, but there are really 4 different systems you must consider:
Currently the code is spread across two libraries: js-xls and js-xlsx (which were developed separately due to licensing concerns that were eventually resolved) The last N times this conversation came up, the discussion fizzled because the style information is highly linked and requires careful manipulation to preserve integrity across formats. Pretty much every imaginable feature is persisted in at least 4 different ways (XLSX, XLSB, XLS, XLML). All formats use references to minimize data size. What is the best way to expose these features to the end users? APIsOriginally, SheetJS was developed to solve a simple problem: "Can we reliably extract the data from an Excel file?" It's a relatively straightforward problem, and none of the hidden style complexities were relevant. Since that point, the scope expanded quite a bit as it became apparent that we could do much more. The underlying approach allowed for all kinds of cool developments (since we had a unified format, we could build one xlsx writer that "just worked" with XLS and other inputs). But now the downsides are becoming clear. Should we have an API? If so, what does it look like? If we continue with the direct object manipulation approach, there are a few ways to go:
Which is preferable? @nathanathan @Amoki @mchapman @corsaronero @artemryzhov @nvcken @m1sta @KingJT @johnyesberg @ulknight @mgcrea @bolemeus @diginfo @sivyr @bmavity @djMax @christocracy @gcoonrod @clayzermk1 @jokerslab : Since you were involved in a discussion related to style, I'd like to hear your opinions on the matter (kinda sad to see so many names on this list). To focus the discussion: A) should we build an easy-to-manipulate structure or introduce a series of functions to manipulate the object? B) Should we be working at the attribute level (e.g. set_cell_bold or cell.bold) or use a bitfield (set_cell_style or cell.style)? |
has anyone who know Aspose.Cells? I think we can reference from it about api structure style |
In the long run I like the easy-to-manipulate structure approach. Create a very flexible abstract representation of a sheet and have readers/writers contain an approach which maps file formats to this abstract representation. When it comes to the abstract representation I definitley prefer cell.style. Seems more easily serializable. Everyone loves to be able to JSON.stringify. From a practical perspective I'd most like to be able to take a template based approach. Maintaining a separate template for xlsx, xls, xlsb, and csv isn't a huge issue from a dev perspective. Having a simple consistent api to read, clone, and subtly modify worksheet, range, row, column, and cell objects within a loaded template, regardless of format, is what I was really hoping to see. This means functions that in the short term I'd prefer to see effort put into modifying existing objects. |
I pretty much agree with @m1sta:
I also like the template approach, but I'm a little concerned about binary change when reading and writing the same file. Excel stores some style data as themes and references internal indices of colors and "taint," and I'm not entirely sure other (non-XLS) formats do the same. If we don't care about binary change - that is, we don't mind that the original file says "the color is aqua with a 40% taint" and the written file says "the color is some RGB value" - then I see no issue with going the template route. |
What if all of the template related data was placed in an 'extended' property on each key item?
... or something similar, such that the original "the color is aqua with a 40% taint" data is re-used if the extended data exists for the file format being written? |
Just to make sure I understand, do you mean keep the portable (for lack of better word) representation in |
Yep. |
Sounds good to me, especially if we add a toggle to disable it and rely strictly on the portable representation since I recall @SheetJSDev had concerns about introducing cell-level properties due to object size issues (but I might be remembering it wrong). |
A 'write priority' key might be handy so that you can dictate whether to use props or extended (if available), and some kind of 'extended data index' to minimise object verbosity might be good too. |
I agree. |
Another interesting question for me is whether to separate format information (and other properties) from value information. This might allow for a more efficient data structure. Store the formatting for the column once, indicate which cells it relates to, store the values as a dense array. Might need an additional, optional, 'compress' step? That'd mean something more akin to...
... in memory, with some smarter support functions. |
I think this is or similar to what @SheetJSDev referred to as the C-like "pointer" approach. I'm not yet sure I prefer it over data duplication, which is a lot more natural pick for me if we go the easy-to-manipulate structure route. It might be though that I don't understand what you mean by a "compress" step. If we go the separate value/style storage way, what would adding a cell with style look like? what would modifying a cell's style look like? |
Regarding the size issue: In the web browser, you can either do all of the heavy lifting in the main execution thread or with a Web Worker (http://dev.w3.org/html5/workers/). Objects cannot be shared between workers and the main thread, so the worker stringifies the intermediate object and the main thread parses it:
Reducing the stringified object size allows web workers to handle larger files. Regarding values/metadata: As discussed in #126, it's not always possible to map between Excel and JS data types easily. For rich text formatting see #74 -- we still need to find an acceptable form. @elad FYI The theme tint does not exist in the XLS format |
As I see it, API definitely would be easier to document than the JS object representation |
In #75 I proposed a style object:
And you (@SheetJSDev) pointed out to XLSB's "text run" concept. Thinking out loud... Let's also introduce a text run array:
Then Without taking optimization into account just yet, this so far seems reasonable to me. So, two questions:
|
Also agree with @m1sta regarding the API. For cell.style, instead of inventing a new object, we should probably directly go for the official CSS spec, with a jQuery-like API (obviously with a strict subset support). Something like:
For the future, I'd love to be able to easily export the worksheet structure to HTML/PDF along XLS*. I can imagine having some pre-defined (a bit like markdown styles) for our worksheets (GitHub spreadsheet, etc.). |
+1 for CSS as the basis for the abstract styling model |
Agreed. +1 for CSS here as well. If you're going to abstract away the On Mon, Oct 6, 2014 at 9:52 AM, m1sta notifications@github.com wrote:
|
I agree with using CSS as the model for the API as well. I'm not sure How should we handle text runs? Styling can be applied to parts of a cell's text. For the setter extra parameters might work ( |
Keep in mind Excel has support for double underline, and AFAICT there is no direct CSS equivalent. For reference, the rich text types are enumerated in the test file https://github.com/SheetJS/test_files/blob/master/rich_text_stress.xlsx |
I opened rich_text_stress.xlsx and noticed that it doesn't show any style for outline and shadow at least in Excel 2010. I also tried saving it as a web page and Excel notified me some features aren't compatible with the format, and sure enough there was no outline, shadow, double, or accounting underline styling. By the way, Preview on Mac OS X doesn't show subscript, superscript, outline, shadow, double, or accounting underline styling. Maybe we should enumerate all of the style properties a cell can have before deciding how to represent and access them. Here's a quick table mapping some Excel styles to CSS, I will update it as necessary.
|
I made that file in Excel 2011 and its possible that the windows versions do not support the missing forms. Oddly, iOS numbers actually renders them: http://i.imgur.com/YsCJNgb.jpg IIRC excel actually uses a CSS attribute like text-decoration for the features not supported in the browser (with nonstandard values). We could just replicate that As far as text runs are concerned, we could mirror the VBA interface as I noted in #75 (comment)
(Requesting a set of characters would return an object whose getters and setters work with the original text) |
For the few styles that would not directly match the css spec we could use custom vendor prefixes:
Like what we have to use today for flexbox:
|
I really don't like the idea of using custom vendor prefixes for internal style representation. :/ The goal as I understand it is to create a common format, that is, an internal format that can be externalized to XLSX, HTML, PDF, etc. If CSS isn't a (near-)perfect answer to the question of internal representation, then we're abusing it. By introducing vendor-specific prefixes into the internal representation I would argue we go against the "common format" concept. Consider the object pollution you might get. How many vendors are there? Do we want five or six lines just to be able to add accounting underline? What if there are no equivalents for one or more vendors? It also doesn't solve the problem of an actual API. Instead of
So I'm again not entirely convinced CSS is what we want here for either API or internal representation. |
@elad, vendor prefixes would only be used for xls(*) styles not easily matched by pure css props. I don't see where we would end up with multiple lines, there would only be one specific custom vendor Anyway, that was just a quick idea, so it might not be the best thing to do. |
I see what you mean about multiple lines - if the internal representation agrees on a single vendor prefix then sure, we use that. A vendor-specific style ("accounting underline") is mapped to a vendor-specific CSS property ("-ms-xlsx-accounting-underline") invented to support it. Still, I don't think using a CSS vendor prefix for internal common representation is a good design choice. I think the VBA API mentioned by @SheetJSDev is much cleaner and easier to use. Doesn't it also make more sense to mimic an API that already exists to work with Excel style properties rather than jQuery? |
@elad being written in JavaScript, I'd say that this project's main audience is clearly web developers, that for the most part have some jQuery-like experience. VBA-fluent developers are getting very rare (at least in my Web/NodeJS area). |
That's a fair point, but I don't think jQuery's popularity should influence all APIs designed for the web and/or node.js. jQuery is a moving target and there's a constant back-and-forth of ideas and concepts, for example Ajax promises. In any case, considering CSS doesn't offer a direct mapping to cell/text styles, requires vendor-specific properties and values, and in my opinion looks a lot clunkier than an existing API that does the same, I'm not convinced it's a good choice here for either internal representation or function interface. Now letting others chime in and @SheetJSDev can decide. :) |
@mgcrea There are strategic reasons unrelated to the audience. Writing code to solve individual features is relatively straightforward. The hard part is finding real-world files and strange corner cases. The neat thing about a JavaScript and HTML5 solution is that pretty much anyone can try it on their files (there are no security issues because the files and data are never sent to a server, and there are no installation issues since no external plugins are required). If we started this in C or python or Java, people would either have to install something or send files to a remote server (and we would have far fewer testers). And of course, thanks to node, we can also write server processes and neat tools like the command-line "j".
There is no real culture of JS in areas like scientific computing or data analysis, so we are starting from a blank slate. Since future projects may turn to our example, it's better to discuss now. |
@SheetJSDev +1 |
+1 |
2 similar comments
+1 |
+1 |
No news and no feedback, so, here, we are moins to xls-populate, an incredible Project WITH styles support :) |
@lizjulien, with |
+1 for merging. |
i will try |
What do you think about exceljs? |
Where does this stand today as far as merging xlsx-style into this main project? |
@thearabbit interesting |
Hello, @pietersv the idea is to use "https://github.com/protobi/js-xlsx" now to use the styles? |
https://github.com/protobi/js-xlsx this one is updated 3 years ago .. ANy update on style in js-xlsx ? |
+1 |
1 similar comment
+1 |
+1. The support for style setting is really important |
This is really a deal breaker for this amazing library.. 😭 |
+1 |
Is it possible to provide this for xlsx at least? A partial solution is better than nothing. @SheetJSDev |
+1 |
2 similar comments
+1 |
+1 |
I do not think that they include this feature any soon, because styles are part of pro edition. |
+1 |
We offer this in the Pro compendium. Since companies have paid for the features already, it would be unfair to them if we turned around and made it available as open source. We have a longer comment in a gist. |
It doesn't support LibreOffice/LibreCalc ODS files, or older Excel Neither does xlsx-populate. |
@SheetJSDev , I dont follow this statement, " Since companies have paid for the features already, it would be unfair to them if we turned around and made it available as open source." How come? |
@rsgilbert Because they paid money for those features, but now they can get them for free. Should they get a refund? The argument is a bit similar to this one: "In the past, immigrants into the country had to go through this complicated process and pay this high fee. Now if we make immigration easier and everyone could just come in, it would be unfair to those early immigrants." One response is that the companies that paid money, got early access to those features, while the rest of us had to wait 4+ years until they got open sourced. Time is money, and the companies did get an advantage by paying for the features. This argument is similar to how early adopters always pay a much higher price (e.g. $10,000 for a cell phone the size of a brick in the early '80s) than laggards. |
I am trying to preserve the style information while I read and then write an excel File.
Seems like the style information is read when I read the file, but unfortunately when I write the file its not preserved.
Thoughts on how I can read and write the style information?
@SheetJS After a conversation with @elad (hubba@5e9bca7#commitcomment-8039333) I was informed that the styling information is only read, and not written back to excel.
The text was updated successfully, but these errors were encountered: