New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Template Filling, Styles & Page Setups #75
Comments
js-xlsx currently rebuilds styles and other metadata (unfortunately, XLSB and XLS and XML2003 have different ways of storing the style information, so the only uniform way to handle this is to parse and write out the data). The style metadata is woefully incomplete (contributed by @elad, who specifically needed certain cell background information) and should improve over time. The biggest challenge is finding a way that works across the various formats. If you know the input templates will be XLSX, it's pretty simple to hack the parser and writer to use the original stylesheet:
(after making the changes, run The reason this hack and the bold workaround aren't in the main code is because I don't think it's the right approach going forward. Try this and let me know how it goes |
Just to simplify a bit, because I had to dive into this code without knowing it myself :) What js-xlsx does is transform a file to an internal representation, and all operations are done on that internal representation. When you want to save the changes, it needs to export the data back. This means that js-xlsx doesn't "go to cell A1 and change value and saves," but rather parses the entire file, modifies the data, and rewrites the entire file. Files have several "subfiles" that store the various sheets, styles, and so on. Some of those aren't parsed, and others are parsed only partially. When I added minimal style parsing, for example, the styles weren't being parsed at all. Because I only needed a specific subset, styles are partially parsed just to give the background color information. What's becoming obvious though is that there has to be some format to hold this data that properly reflects what Excel is capable of. Otherwise we'll just pollute the data structures and they'll become a maintenance headache - if I understand what @SheetJSDev is getting at. (This is partly my fault, because I lack the time to give this proper thought, but I have a feeling that what we'll end up doing is populate the |
Thanks for helping, I modified the 3 files and recompiled, this resulted in a partial success. Cells with data in them are formatted, but formatting (such as background shading) is removed from cells without data (text) in them. Merged cells were not merged any more. The print margins etc did not get copied across. Maybe I am expecting too much, but this would be a really fabulous solution if I was able to do all of my formatting in the original file and then just update the cells with data. I just tried the python xlrd/xlwt library and that did not copy any styles either. |
@elad I didn't mean to suggest you were sloppy or cut corners! I was contextualizing where things stand and giving credit where it is due. You had a need for extracting the cell background and scratched the itch :) @diginfo to be clear, though, you can literally modify the parsed cell structure by changing the cell's
Besides the maintenance headache, a uniform representation lets you convert between formats seamlessly. There are some cool tests that read XLS and write XLSX ...
Set
They are being parsed but that data is not being written. That's a pretty straightforward fix in the writer. The
These are worksheet-level properties, which are not parsed at the moment.
You aren't expecting too much :) The challenge is finding the best approach that will extend to the other formats, and it's better to spend a few minutes or hours or days thinking before adding something that we'll regret later. As the old adage goes:
|
Thanks for the explanation, I understand the need to come up with a solution that can handle all scenarios & stylesheets. Personally, my only requirement is for a excel-based reporting tool and probably the only thing I will ever do is to create a template and style & format it so that data can be later added to the relevant cells. This will allow me to easily create pretty reports that can be either downloaded or rendered to pdf. I understand your best approach, and that this is not something that may happen imminently, but Is this something I can further hack in the next few days to achieve what I need before the official method and enhancement is released ? PS: I have discovered that the 'formatting_info=True' option in xlrd for python does not appear to work for xlsx files, so it looks like that is no longer an option. |
Fortunately, this shouldn't take more than an hour :) The missing features that you identified are worksheet-level properties, which should be found in the individual sheetN.xml files within the xlsx. If you have access to vim, you can open the file using vim and it will show you a zip.vim view. Alternatively, rename the xlsx file to .zip and unzip. Let's take https://github.com/SheetJS/test_files/blob/master/merge_cells.xlsx?raw=true as an example.
The page margins is in the worksheet xml, so let's take a peek there: It's easier to understand if we clean it up a bit. In vim, the mantra is The relevant block is at the bottom:
We want to take this raw text and add it to the output. To do that, the steps are straightforward:
For example:
Hopefully this is helpful and hopefully you finish within an hour :) I'd like to think the code organization is sufficiently straightforward that hacking should not require too much trickery. |
Thanks, but where do I add this:
|
@diginfo near the end of |
OK, I think I have figured it out:
|
Unfortunately, this killed the output file which is corrupted and cannot be read:
|
This is somewhat informative (certainly better than the cannot parse errors in XLSB with no indication). It's possible that pageSetup or pageMargins has children. Can you put the xml somewhere (e.g. http://hastebin.com/) so I can take a look? If it is confidential, send me an email |
OK, but which XML ?- do you mean the output xlsx file or the sheet1.xml file that was from the source ? |
XLSX is fine, but the error is reported to be in /xl/worksheets/sheet1.xml (so that should be sufficient) |
This is the file extracted from the source xlsx :http://hastebin.com/fexawijizu.xml |
I wanted to see the file being generated by js-xlsx -- the load error |
OK, this is the output: http://hastebin.com/umuwadukeb.xml |
This is the output without the additional pageMargins & pageSetup which loads successfully: http://hastebin.com/riqaxuwaqe.xml |
Hi; Thought I would send you the source file. Kind Regards PeterC On 28/6/14 8:46 am, SheetJSDev wrote:
|
Would it not be possible to clone the original file, then change the worksheet values and save the file ? I dont need to change or edit anything else |
It's not that simple because the you have to rebuild the shared string table. I'm surprised that js-xlsx generated that xml because it always generates with Can you share the broken xlsx that was generated? |
Attached are the source and output files, not sure if you can receive PeterC On 28/6/14 10:34 am, SheetJSDev wrote:
|
Github doesn't support attachments. Send to sheetjs@gmail.com Sent from my iPad
|
Just to plug this issue, I have need of this very same feature in the near future and it would be nice to know when this is implemented and/or if there is now an established workaround for this issue as a result of @diginfo's correspondence with you. In my case, having this feature is non-essential, but I'm sure it would please my clients to be getting their Excel reports output from the server with the same appearance that they're used to in the past. Thanks for investing the time in helping diginfo through his hack. I may be in a similar position in the future. |
@sivyr This is still in the pipeline. No one has forgotten :) The current hangup actually has nothing to do with XLSX. There is a parallel project for XLS and other pre-2007 formats: https://github.com/SheetJS/js-xls . Right now, you can do some really cool things like read XLS files and generate XLSX files. In order to keep the representations aligned, we have to make sure that a unified style representation can be generated from the different file types. I haven't settled on that format yet. I think @elad said it best:
|
I was talking to friend today and they brought up the idea of looking at how Excel macros work to access this type of data. For example, take a look at http://msdn.microsoft.com/en-us/library/cc296089(v=office.12).aspx for colors. Maybe we should use the same variables and structure exposed for macro development? I was also thinking we could simplify by restricting styling to an entire cell. (I'm not sure but Excel macros might be doing that already.) |
VBA macros actually lets you set styling on a character-by-character basis. For example:
If we wanted to make a function library, this may look like:
Any thoughts for what this function interface should look like? |
You're right, per-character styling is possible. The only thing holding me back is real world use cases. If I had to guess, I would bet entire-cell styling is much more common than per-character styling. You'd see bold headlines, colored rows, and so on. Unlikely that styling on a per-character basis would have any meaning other than simple styling. Considering Excel is used to carry data, not text, the meaning is much more important to me than style, at least in the sense of how easy it is to extract programmatically. That said, you might want to have SheetJS support everything Excel can - this would make it a much stronger tool. Focusing on cell contents (that is, ignoring GetSheet and GetRange), I think something that maps style to character range is sensible. So a style definition would look something like:
The way this would be mapped to character ranges by way of beginning index. There can be no overlaps I imagine, so there would be an offset-to-style map. If the entire cell has a single style, there would be only one mapping, from 0. I don't know if I'd break the style to multiple functions... I'd probably treat it the same way as jQuery treats CSS - simple get/set. Makes sense...? |
If I understood correctly, what you are describing is similar to text run concept that XLSB uses: http://msdn.microsoft.com/en-us/library/dd952238.aspx. Each "run" is defined implicitly by specifying the start index and the style info. Fortunately, translating from the XLSX rich text format to XLSB style is easy. The hard part, of course, is going to/from HTML (in particular, converting potentially messy HTML back to the rich text)
That is the goal :) It's clear that completely different groups of people designed the various formats (as an example, some formats use "Creator" to refer to the author while others use "Author"), so part of the challenge is trying to make sense of inconsistent standards and approaches. |
Have the line numbers changed for the above workaround? It may be the coffee not kicking in yet, but I am having trouble finding a function that has both a p variable and cell variable in the 67_wsxml.js file. I fully understand the desire to avoid hacks and have a common format, but maybe you could have a temporary hack branch with all these one liners in place that people could clone and use while you work it out. I can send a pull request if you walk me through the setup today. |
@bmavity the line numbers and variable names changed. The new location was https://github.com/SheetJS/js-xlsx/blob/master/bits/67_wsxml.js#L231-L232 and the new name for the |
Great, thanks. I'll see what I can do. :) |
Is there any new development for keeping raw template styles once file is written? |
We offer this in the Pro compendium. Since companies have paid for the features already, it would be unfair to them if we turned around and made it available as open source. We have a longer comment in a gist. |
I am trying to replace an implementation of PHPExcel with js-xlsx under node js as I am no longer using php server-side.
I have a series of excel templates that are pre-formatted with colours and styling, and also page size, margins, print areas etc etc, and then I simply add text to the relevant cells, under PHPexcel it works very well.
I discovered js-xlsx a few hours ago and thought I had found the solution, but it appears that I am unable to duplicate the original file, only with the cell values changed, whilst maintaining everything else in the original file.
I have used the simple code above to read in a formatted file and then save it again, but all of the styling is lost.
Is it possible to achieve this kind of functionality with js-xlsx and if so what am I doing wrong ?
The text was updated successfully, but these errors were encountered: