Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to get actual cell values? #21

Closed
maxvol opened this issue Nov 29, 2018 · 18 comments
Closed

how to get actual cell values? #21

maxvol opened this issue Nov 29, 2018 · 18 comments
Assignees

Comments

@maxvol
Copy link

maxvol commented Nov 29, 2018

i see that cells contain some kind of indices instead of actual values (i.e. some kind of integer number instead of string)
how can i get actual value of a cell? probably there is some kind of reference table for it?

@FleetPhil
Copy link

The integer is an index into the list of strings in the ‘shared strings’ file.

@Cassers
Copy link

Cassers commented Nov 29, 2018

Where do I find the file ‘shared strings’? I could not see the real value of the cell and I do not see examples anywhere D:

Can you create an example of how it would be used?

@FleetPhil
Copy link

This document gives an overview of the XLSX file content: http://download.microsoft.com/download/3/E/3/3E3435BD-AA68-4B32-B84D-B633F0D0F90D/SpreadsheetMLBasics.ppt

@Cassers
Copy link

Cassers commented Nov 29, 2018

Thanks for your help, but the link you passed does not show me anything D:

captura de pantalla 2018-11-29 a la s 5 42 53 p m

@Cassers
Copy link

Cassers commented Nov 29, 2018

I found the document on the microsoft page, and I already understand why the values are in that document, how do I access that file?

In the parseWorksheetPaths property, only worksheets are listed.

@MaxDesiatov
Copy link
Collaborator

Hi all, as was noted here, if you're looking for string values stored in cells, those are quite frequently (but apparently not always) are stored in sharedStrings.xml. There's work in progress in PR #8 (shared-strings branch) that exposes a public type SharedStrings with parseSharedStrings() function on CoreXLSXFile. I'm not sure yet if that's the best possible API though, because it requires library users to manually match cell values to items stored in SharedStrings. On the other hand we could store a cached SharedStrings instance on CoreXLSX as soon as at least one worksheet is parsed and match those for you automatically and store results on Worksheet before you get it. I look forward to hearing your feedback on this.

@FleetPhil
Copy link

Hi Max - I think there's a wider decision for you to make: do you want this library to simply extract the data in XLSX files keeping the structure unchanged and leave it to users to do the matching and cross-referencing? Shared strings is not the only place where this is required - linking worksheets using rId is another, and formats is another. Or do you want to do these joins in the library and cache 'higher level' objects as you are suggesting above.
In my opinion you need to go one way or the other - if you keep it to the XLSX structure it's simple for you, conceptually clean and will give users flexibility on how they use the data, however they will need to understand the underlying structure and do whatever work is necessary (like shared strings) for their application. If you go the other way and start creating higher level objects you need to think how far you want to go - a clean logical object structure for Excel would have objects like Workbook, Worksheet, Column, Row, Cell, Format, Style etc that contain logical level attributes that don't all map directly to the underlying XLSX structure. If you go this way the library will be MUCH bigger and need a lot of thought on the design at the outset - as well as more maintenance. Or it could end up as a bit of a half way house with some cached items and some not, which seems a bit messy to me...

It's up to you - if it was me I would keep this library low level, clean, and aligned to the XLSX structure - you or someone else can always build a 'logical level' library on top of it in future!

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Nov 30, 2018

Hey @FleetPhil, thanks for the suggestion. I agree, I'd prefer to maintain a library with narrow focus, so probably will leave the API as it is in PR #8 right now: SharedStrings reflect the XML directly and it's up to a user to reconcile those values with what's available in Cell model. I will do a few more checks on PR #8, update README.md clarifying the intended use and merge it after that.

I would still appreciate more feedback from everyone else reading this. Thanks for your input!

@FleetPhil
Copy link

Still needs a Relationships object as well....! :)

@Cassers
Copy link

Cassers commented Nov 30, 2018

I would believe that the important thing is to define what kind of situations the API will be used for.

Sincerely there is a very important lack when working with XLSX files in iOS, in my case for example I'm just looking to consult a file and convert it into arrays to send them to a server.

There are several old libraries in Objective C, but their use is quite confusing and they do not even have support for CocoaPods.

I would recommend initially giving support for a basic handling of these files, such as consulting and modifying cells in a basic way.

MaxDesiatov added a commit that referenced this issue Dec 3, 2018
As reported in #21, there's no obvious way to get values for some values. That's caused by the fact that some of the values are stored in a separate file, which is usually located at the path `sharedStrings.xml`. This can be parsed to a corresponding model type and then processed by users of `CoreXLSX` to get the string values.

* Add SharedStrings model, general cleanup
* Add link to Open XML SDK docs for SharedStrings
* Fix rebase conflicts
* Add public `parseSharedStrings` API with tests
@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Dec 3, 2018

#8 and #22 PRs has been merged today into master, hope this resolves the reported issues for you. Beware that the API in master might still change in a backwards-incompatible way, but overall I'm trying to stabilise what we have now in preparation for an imminent 0.4 release. I'm only waiting for CoreOffice/XMLCoder#13 to be implemented, CoreOffice/XMLCoder#11 to be merged, audit existing types to match Microsoft's spec, fix/merge any other issues/PRs that appear along the way and hope to have 0.4 ready at that point. Looking forward to your feedback, all! Thank you for your help.

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Feb 7, 2019

0.4.0 was released today, hope you can try it and I look forward to your feedback. I'm leaving this issue open until we have some more detailed documentation and guides describing what's the best way to get cell values in all cases.

@MaxDesiatov MaxDesiatov self-assigned this Feb 7, 2019
@MaxDesiatov
Copy link
Collaborator

Closing this as parseSharedStrings is documented in README now. As usual, please feel free to reopen or create a new one if you find any issues or need help.

@duodo2412
Copy link

duodo2412 commented Apr 10, 2020

I have a problem, I hope you can help me
My file 'xlsx' is
Screen Shot 2020-04-10 at 20 36 50

  • When i write
let sharedStrings = try file.parseSharedStrings()
let columnCStringsA = worksheet.cells(atColumns: [ColumnReference("A")!])
  .compactMap { $0.stringValue(sharedStrings) }
let columnCStringB = worksheet.cells(atColumns: [ColumnReference("B")!])
  .compactMap { $0.stringValue(sharedStrings) }

and print it with value
columnCStringsA = ["ID", "1", "2", "3", "4"]
columnCStringsB = ["VALUE", "VALUE1", "VALUE2", "VALUE"]

  • And i dont know value of B14 is "", i wanna to result
    columnCStringsA = ["ID", "1", "2", "3", "4"]
    columnCStringsB = ["VALUE", "VALUE1", "VALUE2", "", "VALUE"]
    Can you help, how can i get value all cell
    Tks

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Apr 10, 2020

Hi @duodo2412, the value of the cell B4 in this case is not "", it's essentially nil, because the cell itself is absent in the .xlsx file in this case. If you'd like to verify that, you can specify the exact cell:

let cell = worksheet.cells(atColumns: [ColumnReference("B")!], rows: [4])

My undertstanding is that this will return an empty array [] in your case. Does that resolve your issue?

@duodo2412
Copy link

Hi @duodo2412, the value of the cell B14 in this case is not "", it's essentially nil, because the cell itself is absent in the .xlsx file in this case. If you'd like to verify that, you can specify the exact cell:

let cell = worksheet.cells(atColumns: [ColumnReference("B")!], rows: [4])

My undertstanding is that this will return an empty array [] in your case. Does that resolve your issue?

Tks for the your answer .
Can i ask you another question.
Screen Shot 2020-04-11 at 14 32 09
I have a lot of sheets in xlsx file.
My code:
let sheets: [String] = try file.parseWorksheetPaths() and print result

  • After that i change the order of sheet in file and execute my code again. Result is the same before i change the order of sheet .
    How can i know exactly the sheet i want to get ( may be i can query with name of sheet ) . Can you help me?
    Tks

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Apr 11, 2020

Hey @duodo2412, you should check this function on the XLSXFile type:

  /// Parse and return an array of workbooks in this file.
  /// Worksheet names can be read as properties on the `Workbook` model type.
  public func parseWorkbooks() throws -> [Workbook]

This returns an array of workbooks (usually there's just one), each of which contains an array of sheets with their ids and names. Please check the Workbook model type for more details, there's also some example code in WorkbookTests.

You could then use your own function instead of parseWorksheetPaths, that would give you worksheet names:

extension XLSXFile {
  func parseWorksheetPathsAndNames(workbook: Workbook) throws -> [(name: String?, path: String)] {
    return try parseDocumentPaths().map {
      try parseDocumentRelationships(path: $0)
    }.flatMap { (path, relationships) -> [(name: String?, path: String)] in
      let worksheets = relationships.items.filter { $0.type == .worksheet }

      guard !path.isRoot else { return worksheets.map { (name: nil, path: $0.target) } }

      // .rels file has paths relative to its directory,
      // storing that path in `pathPrefix`
      let pathPrefix = path.components.dropLast().joined(separator: "/")

      let sheetIDs = Dictionary(uniqueKeysWithValues: workbook.sheets.items.compactMap { sheet in
        sheet.name.flatMap { (sheet.relationship, $0) }
      })

      return worksheets.map { (name: sheetIDs[$0.id], path: "\(pathPrefix)/\($0.target)") }
    }
  }
}

I know this is a bit convoluted, but the CoreXLSX API represents the low-level internals of the XLSX format directly. I have plans to add more high-level functions to the library in the future, but I hope that parseWorksheetPathsAndNames would work for you in the meantime.

@duodo2412
Copy link

Hey @duodo2412, you should check this function on the XLSXFile type:

  /// Parse and return an array of workbooks in this file.
  /// Worksheet names can be read as properties on the `Workbook` model type.
  public func parseWorkbooks() throws -> [Workbook]

This returns an array of workbooks (usually there's just one), each of which contains an array of sheets with their ids and names. Please check the Workbook model type for more details, there's also some example code in WorkbookTests.

You could then use your own function instead of parseWorksheetPaths, that would give you worksheet names:

extension XLSXFile {
  func parseWorksheetPathsAndNames(workbook: Workbook) throws -> [(name: String?, path: String)] {
    return try parseDocumentPaths().map {
      try parseDocumentRelationships(path: $0)
    }.flatMap { (path, relationships) -> [(name: String?, path: String)] in
      let worksheets = relationships.items.filter { $0.type == .worksheet }

      guard !path.isRoot else { return worksheets.map { (name: nil, path: $0.target) } }

      // .rels file has paths relative to its directory,
      // storing that path in `pathPrefix`
      let pathPrefix = path.components.dropLast().joined(separator: "/")

      let sheetIDs = Dictionary(uniqueKeysWithValues: workbook.sheets.items.compactMap { sheet in
        sheet.name.flatMap { (sheet.relationship, $0) }
      })

      return worksheets.map { (name: sheetIDs[$0.id], path: "\(pathPrefix)/\($0.target)") }
    }
  }
}

I know this is a bit convoluted, but the CoreXLSX API represents the low-level internals of the XLSX format directly. I have plans to add more high-level functions to the library in the future, but I hope that parseWorksheetPathsAndNames would work for you in the meantime.

Tks for your support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants