Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing Style information into Excel #128

Closed
nukulb opened this issue Oct 5, 2014 · 169 comments
Closed

Writing Style information into Excel #128

nukulb opened this issue Oct 5, 2014 · 169 comments
Labels

Comments

@nukulb
Copy link

nukulb commented Oct 5, 2014

I am trying to preserve the style information while I read and then write an excel File.
Seems like the style information is read when I read the file, but unfortunately when I write the file its not preserved.

Thoughts on how I can read and write the style information?

            excelFile = XLSX.readFile(file.path, {
                cellStyles: true
            });

            // some processing here!

            XLSX.writeFile(excelFile, fileName, {
                cellStyles: true
            });

@SheetJS After a conversation with @elad (hubba@5e9bca7#commitcomment-8039333) I was informed that the styling information is only read, and not written back to excel.

nukulb referenced this issue in hubba/js-xlsx Oct 5, 2014
Huge thanks to @SheetJS for guidance on how to do all of this!

Experimental, guarded by the cellStyles option (default: false)
@SheetJSDev
Copy link
Contributor

@nukulb @elad @hubba After exploring this a bit more, I have a better sense for the roadmap. This is going to be a much larger discussion, but one that we've been putting off for far too long:

Even though we all know this, it should be reiterated: our ultimate goal is to devise a universal representation (referred to as the "Common Spreadsheet Format" in the READMEs) that works for all spreadsheet formats, not just XLSX. If you limit yourself to XLSX support, it's really easy to just persist everything, but there are really 4 different systems you must consider:

  • XLSX / XLSM (XML-based style representation): ECMA-376 Part 1 Section 18.8
  • XLSB (binary representation): [MS-XLSB] 2.1.7.50 Styles
  • XLS (binary): Follow the yellow brick road, starting from [MS-XLS] 2.4.353 XF
  • SpreadsheetML 2003 (XML): The style metadata are stored in the sub-document from the Styles elements. Since there is no OSP-blessed spec, files from the test suite should be used to understand the scope

Currently the code is spread across two libraries: js-xls and js-xlsx (which were developed separately due to licensing concerns that were eventually resolved)

The last N times this conversation came up, the discussion fizzled because the style information is highly linked and requires careful manipulation to preserve integrity across formats.

Pretty much every imaginable feature is persisted in at least 4 different ways (XLSX, XLSB, XLS, XLML). All formats use references to minimize data size. What is the best way to expose these features to the end users?

APIs

Originally, SheetJS was developed to solve a simple problem: "Can we reliably extract the data from an Excel file?" It's a relatively straightforward problem, and none of the hidden style complexities were relevant.

Since that point, the scope expanded quite a bit as it became apparent that we could do much more. The underlying approach allowed for all kinds of cool developments (since we had a unified format, we could build one xlsx writer that "just worked" with XLS and other inputs). But now the downsides are becoming clear.

Should we have an API? If so, what does it look like?

If we continue with the direct object manipulation approach, there are a few ways to go:

  1. C-like "pointer" manipulation: storing numbers to be used as references in other structures.

  2. Data duplication: store styles in many places, push the complexity to the writing functions

Which is preferable?

@nathanathan @Amoki @mchapman @corsaronero @artemryzhov @nvcken @m1sta @KingJT @johnyesberg @ulknight @mgcrea @bolemeus @diginfo @sivyr @bmavity @djMax @christocracy @gcoonrod @clayzermk1 @jokerslab : Since you were involved in a discussion related to style, I'd like to hear your opinions on the matter (kinda sad to see so many names on this list). To focus the discussion:

A) should we build an easy-to-manipulate structure or introduce a series of functions to manipulate the object?

B) Should we be working at the attribute level (e.g. set_cell_bold or cell.bold) or use a bitfield (set_cell_style or cell.style)?

@nvcken
Copy link

nvcken commented Oct 6, 2014

has anyone who know Aspose.Cells? I think we can reference from it about api structure style
Just a idea

@m1sta
Copy link

m1sta commented Oct 6, 2014

In the long run I like the easy-to-manipulate structure approach. Create a very flexible abstract representation of a sheet and have readers/writers contain an approach which maps file formats to this abstract representation. When it comes to the abstract representation I definitley prefer cell.style. Seems more easily serializable. Everyone loves to be able to JSON.stringify.

From a practical perspective I'd most like to be able to take a template based approach. Maintaining a separate template for xlsx, xls, xlsb, and csv isn't a huge issue from a dev perspective. Having a simple consistent api to read, clone, and subtly modify worksheet, range, row, column, and cell objects within a loaded template, regardless of format, is what I was really hoping to see. This means functions that in the short term I'd prefer to see effort put into modifying existing objects.

@elad
Copy link

elad commented Oct 6, 2014

I pretty much agree with @m1sta:

  • I prefer an easy-to-manipulate structure because I don't see any advantages in the opacity offered by functions; we still have to figure out a representation
  • I prefer cell.style because it isolates style-related properties making it easy to attach them to things (like cells or character ranges)

I also like the template approach, but I'm a little concerned about binary change when reading and writing the same file. Excel stores some style data as themes and references internal indices of colors and "taint," and I'm not entirely sure other (non-XLS) formats do the same. If we don't care about binary change - that is, we don't mind that the original file says "the color is aqua with a 40% taint" and the written file says "the color is some RGB value" - then I see no issue with going the template route.

@m1sta
Copy link

m1sta commented Oct 6, 2014

What if all of the template related data was placed in an 'extended' property on each key item?

workbook: {props: {}, references: {}, extended: {}, 
   sheets: [0: {props: {}, ranges: {}, extended: {}, 
      rows: [0: {props: {}, extended: {}, 
         cells: [0: {value: 123, formula: null, 
            props: {style: {foreground: [{color: "#0055dd"}]}}, 
            extended: {xlsx: "the color is aqua with a 40% taint", xlsb: {}}

... or something similar, such that the original "the color is aqua with a 40% taint" data is re-used if the extended data exists for the file format being written?

@elad
Copy link

elad commented Oct 6, 2014

Just to make sure I understand, do you mean keep the portable (for lack of better word) representation in props and retain original format-specific representation in extended so that when writing (back?) to that format, the original values can be used?

@m1sta
Copy link

m1sta commented Oct 6, 2014

Yep.

@elad
Copy link

elad commented Oct 6, 2014

Sounds good to me, especially if we add a toggle to disable it and rely strictly on the portable representation since I recall @SheetJSDev had concerns about introducing cell-level properties due to object size issues (but I might be remembering it wrong).

@m1sta
Copy link

m1sta commented Oct 6, 2014

A 'write priority' key might be handy so that you can dictate whether to use props or extended (if available), and some kind of 'extended data index' to minimise object verbosity might be good too.

@elad
Copy link

elad commented Oct 6, 2014

I agree.

@m1sta
Copy link

m1sta commented Oct 6, 2014

Another interesting question for me is whether to separate format information (and other properties) from value information. This might allow for a more efficient data structure.

Store the formatting for the column once, indicate which cells it relates to, store the values as a dense array. Might need an additional, optional, 'compress' step?

That'd mean something more akin to...

values: [["First header", "Second header"], [123, 456]]
meta: {generic:{}, xlsx:{}, xlsb:{}}

... in memory, with some smarter support functions.

@elad
Copy link

elad commented Oct 6, 2014

I think this is or similar to what @SheetJSDev referred to as the C-like "pointer" approach.

I'm not yet sure I prefer it over data duplication, which is a lot more natural pick for me if we go the easy-to-manipulate structure route. It might be though that I don't understand what you mean by a "compress" step. If we go the separate value/style storage way, what would adding a cell with style look like? what would modifying a cell's style look like?

@SheetJSDev
Copy link
Contributor

Regarding the size issue: In the web browser, you can either do all of the heavy lifting in the main execution thread or with a Web Worker (http://dev.w3.org/html5/workers/). Objects cannot be shared between workers and the main thread, so the worker stringifies the intermediate object and the main thread parses it:

Reducing the stringified object size allows web workers to handle larger files.

Regarding values/metadata: As discussed in #126, it's not always possible to map between Excel and JS data types easily. For rich text formatting see #74 -- we still need to find an acceptable form.

@elad FYI The theme tint does not exist in the XLS format

@KingJT
Copy link

KingJT commented Oct 6, 2014

As I see it, API definitely would be easier to document than the JS object representation

@elad
Copy link

elad commented Oct 6, 2014

In #75 I proposed a style object:

{
    bold: true,
    font: 'Arial',
    size: 16,
    fg_color: '#000000',
    bg_color: '#ffffff',
    ...
}

And you (@SheetJSDev) pointed out to XLSB's "text run" concept.

Thinking out loud...

Let's also introduce a text run array:

[
    { f: from_offset, t: to_offset, s: style_object },
    ...
]

Then cell.s stays the style object and cell.x will be the text run array.

Without taking optimization into account just yet, this so far seems reasonable to me.

So, two questions:

  • Do you find any issues with the above?
  • Where does HTML enter the picture? (you mentioned it'll be the hard part)

@mgcrea
Copy link

mgcrea commented Oct 6, 2014

Also agree with @m1sta regarding the API.

For cell.style, instead of inventing a new object, we should probably directly go for the official CSS spec, with a jQuery-like API (obviously with a strict subset support).

Something like:

myCell.css('background-color', 'red').css('color', 'black').css('font-size', '12px');
myCellB.style = myCell.style;
myCellC.css('color', myCell.css('color'));

For the future, I'd love to be able to easily export the worksheet structure to HTML/PDF along XLS*. I can imagine having some pre-defined (a bit like markdown styles) for our worksheets (GitHub spreadsheet, etc.).

@m1sta
Copy link

m1sta commented Oct 6, 2014

+1 for CSS as the basis for the abstract styling model

@DigitalMachinist
Copy link

Agreed. +1 for CSS here as well. If you're going to abstract away the
details of particular file formats and go with something uniform, CSS
really should be your model for this. There's hardly anything else that
universal to draw from.

On Mon, Oct 6, 2014 at 9:52 AM, m1sta notifications@github.com wrote:

+1 for CSS as the basis for the abstract styling model


Reply to this email directly or view it on GitHub
#128 (comment).

@elad
Copy link

elad commented Oct 6, 2014

I agree with using CSS as the model for the API as well. I'm not sure .css is a good name for the function though (is it misleading?), maybe .style is a better name if we store the actual object in .s. No strong preference either way.

How should we handle text runs? Styling can be applied to parts of a cell's text. For the setter extra parameters might work (cell.css(attr, value, [start_offset], [end_offset])), but the getter expects only one parameter. Should there instead be a .css_run function? Any other ideas?

@SheetJSDev
Copy link
Contributor

Keep in mind Excel has support for double underline, and AFAICT there is no direct CSS equivalent.

For reference, the rich text types are enumerated in the test file https://github.com/SheetJS/test_files/blob/master/rich_text_stress.xlsx

@elad
Copy link

elad commented Oct 6, 2014

I opened rich_text_stress.xlsx and noticed that it doesn't show any style for outline and shadow at least in Excel 2010. I also tried saving it as a web page and Excel notified me some features aren't compatible with the format, and sure enough there was no outline, shadow, double, or accounting underline styling. By the way, Preview on Mac OS X doesn't show subscript, superscript, outline, shadow, double, or accounting underline styling.

Maybe we should enumerate all of the style properties a cell can have before deciding how to represent and access them. Here's a quick table mapping some Excel styles to CSS, I will update it as necessary.

Style CSS
Normal - / font-weight: normal;
Bold font-weight: bold;
Italic font-style: italic;
Underline text-decoration: underline; text-underline-style: single; (?)
Size font-size: <size>;
Strike text-decoration: line-through;
Subscript vertical-align: sub; font-size: smaller;
Superscript vertical-align: super; font-size: smaller;
Outline text-effect: outline; (?)
Shadow text-shadow: auto; (?)
Double underling text-decoration: underline; text-underline-style: double; (?)
Accounting underline text-decoration: underline; text-underline-style: single-accounting; (?)
Doubt accounting underline text-decoration: underline; text-underline-style: double-accounting; (?)
Background color background-color: <color>;
Foreground color color: <color>;
Pattern ?

@SheetJSDev
Copy link
Contributor

I made that file in Excel 2011 and its possible that the windows versions do not support the missing forms.

Oddly, iOS numbers actually renders them: http://i.imgur.com/YsCJNgb.jpg

IIRC excel actually uses a CSS attribute like text-decoration for the features not supported in the browser (with nonstandard values). We could just replicate that

As far as text runs are concerned, we could mirror the VBA interface as I noted in #75 (comment)

Sheets("Sheet1").Range("B3").Characters(16,9).Font.Italic = True

(Requesting a set of characters would return an object whose getters and setters work with the original text)

@mgcrea
Copy link

mgcrea commented Oct 6, 2014

For the few styles that would not directly match the css spec we could use custom vendor prefixes:

text-decoration: -ms-xlsx-accounting-underline;

Like what we have to use today for flexbox:

display: -webkit-flex;

@elad
Copy link

elad commented Oct 6, 2014

I really don't like the idea of using custom vendor prefixes for internal style representation. :/

The goal as I understand it is to create a common format, that is, an internal format that can be externalized to XLSX, HTML, PDF, etc. If CSS isn't a (near-)perfect answer to the question of internal representation, then we're abusing it. By introducing vendor-specific prefixes into the internal representation I would argue we go against the "common format" concept.

Consider the object pollution you might get. How many vendors are there? Do we want five or six lines just to be able to add accounting underline? What if there are no equivalents for one or more vendors? It also doesn't solve the problem of an actual API. Instead of cell.s.underline = true; or cell.style('underline', true); you get:

cell.css({
    'text-decoration': '-ms-xlsx-accounting-underline'
    ...
});

So I'm again not entirely convinced CSS is what we want here for either API or internal representation.

@mgcrea
Copy link

mgcrea commented Oct 6, 2014

@elad, vendor prefixes would only be used for xls(*) styles not easily matched by pure css props. I don't see where we would end up with multiple lines, there would only be one specific custom vendor -ms-xls. For the HTML export we could automatically convert theses "missing" CSS prefixed values to classes, that could be handled on the client/theme side (with CSS).

Anyway, that was just a quick idea, so it might not be the best thing to do.

@elad
Copy link

elad commented Oct 6, 2014

I see what you mean about multiple lines - if the internal representation agrees on a single vendor prefix then sure, we use that. A vendor-specific style ("accounting underline") is mapped to a vendor-specific CSS property ("-ms-xlsx-accounting-underline") invented to support it.

Still, I don't think using a CSS vendor prefix for internal common representation is a good design choice. I think the VBA API mentioned by @SheetJSDev is much cleaner and easier to use. Doesn't it also make more sense to mimic an API that already exists to work with Excel style properties rather than jQuery?

@mgcrea
Copy link

mgcrea commented Oct 6, 2014

@elad being written in JavaScript, I'd say that this project's main audience is clearly web developers, that for the most part have some jQuery-like experience. VBA-fluent developers are getting very rare (at least in my Web/NodeJS area).

@elad
Copy link

elad commented Oct 6, 2014

That's a fair point, but I don't think jQuery's popularity should influence all APIs designed for the web and/or node.js. jQuery is a moving target and there's a constant back-and-forth of ideas and concepts, for example Ajax promises.

In any case, considering CSS doesn't offer a direct mapping to cell/text styles, requires vendor-specific properties and values, and in my opinion looks a lot clunkier than an existing API that does the same, I'm not convinced it's a good choice here for either internal representation or function interface.

Now letting others chime in and @SheetJSDev can decide. :)

@SheetJSDev
Copy link
Contributor

@elad being written in JavaScript, I'd say that this project's main audience is clearly web developers

@mgcrea There are strategic reasons unrelated to the audience. Writing code to solve individual features is relatively straightforward. The hard part is finding real-world files and strange corner cases. The neat thing about a JavaScript and HTML5 solution is that pretty much anyone can try it on their files (there are no security issues because the files and data are never sent to a server, and there are no installation issues since no external plugins are required). If we started this in C or python or Java, people would either have to install something or send files to a remote server (and we would have far fewer testers). And of course, thanks to node, we can also write server processes and neat tools like the command-line "j".

I don't think jQuery's popularity should influence all APIs designed for the web and/or node.js

There is no real culture of JS in areas like scientific computing or data analysis, so we are starting from a blank slate. Since future projects may turn to our example, it's better to discuss now.

@HannaBabrouskaya
Copy link

@SheetJSDev +1

@targonsr
Copy link

targonsr commented Apr 5, 2017

+1

2 similar comments
@calbertts
Copy link

+1

@tanxiaoning007
Copy link

+1

@materazu
Copy link

materazu commented May 4, 2017

No news and no feedback, so, here, we are moins to xls-populate, an incredible Project WITH styles support :)

@thearabbit
Copy link

@lizjulien, with xls-populate we can generate excel file from existing template + style???

@chuaweijie
Copy link

+1 for merging.
Kneeling for mercy

@LennyYi
Copy link

LennyYi commented Jun 9, 2017

i will try

@thearabbit
Copy link

What do you think about exceljs?

@samuelneff
Copy link

Where does this stand today as far as merging xlsx-style into this main project?

@ingcrengifo
Copy link

@thearabbit interesting

@ingcrengifo
Copy link

ingcrengifo commented Dec 14, 2017

Hello, @pietersv the idea is to use "https://github.com/protobi/js-xlsx" now to use the styles?

@chinthu
Copy link

chinthu commented Dec 19, 2017

https://github.com/protobi/js-xlsx this one is updated 3 years ago .. ANy update on style in js-xlsx ?

@louiepiol
Copy link

+1

1 similar comment
@125037225
Copy link

+1

@Matthew1994
Copy link

+1. The support for style setting is really important

@cemremengu
Copy link

This is really a deal breaker for this amazing library.. 😭

@Paul-Vandell
Copy link

+1

@cemremengu
Copy link

cemremengu commented Mar 14, 2018

Is it possible to provide this for xlsx at least? A partial solution is better than nothing. @SheetJSDev

@rahulraghavankklm
Copy link

+1

2 similar comments
@hegoku
Copy link

hegoku commented Apr 3, 2018

+1

@danielkerwin
Copy link

+1

@imdoroshenko
Copy link

I do not think that they include this feature any soon, because styles are part of pro edition.
https://sheetjs.com/pro

@rsalunga29
Copy link

+1

@SheetJSDev
Copy link
Contributor

We offer this in the Pro compendium. Since companies have paid for the features already, it would be unfair to them if we turned around and made it available as open source. We have a longer comment in a gist.

@dandv
Copy link
Contributor

dandv commented Oct 29, 2018

What do you think about exceljs?

It doesn't support LibreOffice/LibreCalc ODS files, or older Excel .xls files.

Neither does xlsx-populate.

@rsgilbert
Copy link

@SheetJSDev , I dont follow this statement, " Since companies have paid for the features already, it would be unfair to them if we turned around and made it available as open source." How come?

@dandv
Copy link
Contributor

dandv commented Feb 15, 2023

@rsgilbert Because they paid money for those features, but now they can get them for free. Should they get a refund?

The argument is a bit similar to this one: "In the past, immigrants into the country had to go through this complicated process and pay this high fee. Now if we make immigration easier and everyone could just come in, it would be unfair to those early immigrants."

One response is that the companies that paid money, got early access to those features, while the rest of us had to wait 4+ years until they got open sourced. Time is money, and the companies did get an advantage by paying for the features. This argument is similar to how early adopters always pay a much higher price (e.g. $10,000 for a cell phone the size of a brick in the early '80s) than laggards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests