Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidated and refactored pivot sources #1238

Merged
merged 2 commits into from Jun 15, 2023

Conversation

igitur
Copy link
Member

@igitur igitur commented Jun 26, 2019

Depends on #1237
Depends on #1233
Fixes #1039

WIP

@igitur igitur added this to the v0.96 milestone Jun 26, 2019
@igitur igitur added the enhancement Feature already exists, but should be enahanced. label Jun 26, 2019
@igitur igitur changed the title Pivot sources Consolidated and refactored pivot sources Jun 26, 2019
@igitur igitur modified the milestones: v0.96, v0.96.1 Jun 29, 2022
@jahav jahav modified the milestones: v0.97, v0.97.1 Oct 21, 2022
@jahav jahav modified the milestones: v0.100, v0.101 Dec 22, 2022
@jahav jahav modified the milestones: v0.101, v0.102 Apr 1, 2023
@jahav
Copy link
Member

jahav commented Apr 10, 2023

I rebased the pivot source to the current develop branch. Because the branch is 4 years old, rebase was complicated and pretty much every pivot test fails (some on exceptions), but that is fine.

The premise of PR is sound, source should be separate from the table, so we can have multiple tables from same source. I want to do this, before I separate out the pivot writer class (saving pivot is 1000+LOC and should be more structured).

A long way for merge:

  • Several rounds of fixing, refactoring, so test pass and I have a better understanding of changes.
  • Review API so it is good and we won't have to break it due to other changes in near future (e.g. don't expose internal stuff public modification IDictionary<String, IList<XLCellValue>>IXLPivotSource.CachedFields)
  • Add documentation, both articles and xml doc.
  • Add tests for the feature.

@jahav
Copy link
Member

jahav commented Apr 12, 2023

Done:

  • Streamlined setting of sharedItems element attributes. Specification is pretty vague in this point and MS-OI29500 has a massive errata and is still missing some info (e.g. containsDate="1" containsNumber="1" => excel tries to repair).
  • Separated the saving of a pivot cache into the 2-phase save (first phase ensures parts are in sync, second phase fills parts)
  • Fixed tests (they all load and look same in Excel). In one case, the blank value position has been moved to a different in Excel, but that is "more correct" behavior.

Explicitly scoping out of this PR:

  • Loading and saving the pivot cache records. AS-IS, pivotCacheRecords.xml is not really saved. That alone will be 1000+LOC PR.
  • detached sources (e.g. external sources or sources from deleted workbooks). Current behavior is to ignore them and that is good enough for now.

@jahav jahav changed the title Consolidated and refactored pivot sources WIP: Consolidated and refactored pivot sources Apr 12, 2023
@jahav
Copy link
Member

jahav commented Apr 14, 2023

I have made a review of public API and slimmed it down. It boils down to "does it have a use case" and "is it API I am confident can grow"

  • IXLPivotSourceReference public interface was removed along with a property on IXLPivotSource. There are several currently unsupported options for data (e.g. external workbook or OLAP cube) and it's too premature to do API.
  • Removed IXLPivotSource.SourceRangeFields - too much information (i.e. it reads from actual source range, not from cache), semantic unclear for external pivot tables (=would likely be a breaking change). At this time, I want IXLPivotSource to mostly hide the actual source of data. User should just call Refresh() if necessary.
  • Added IXLPivotSource.FieldNames - Names of the fields of the source. I am not using Cached prefix, pretty much everything in IXLPivotSource should be taken from cached data. It is inherent to the type. Use case: Field names are used to build pivot tables.
  • Removed IXLPivotSource.CacheFields - too much information, plus unclear one. At this time, it basically only contained shared items, reading of cache records is not done... at this stage, it is too premature to expose anything about field data.

I will also merge IXLTable methods at IXLPivotSources to the IXLRange. It's not too future proof - when we support external workbooks, should a new set of methods (Add/Contains/TryGet) be added? Adding support for OLAP cube/different source would introducec similar dilemma.

@jahav
Copy link
Member

jahav commented Apr 16, 2023

I renamed XLPivotSource to XLPivotCache. I think it is more fitting name that describes the purpose of the interface/class better (VBA also names it that way).

I kept only one method at this time: XLPivotCache.Add that adds a new pivot cache (always). It was originally a dictionary of a range - cache, but OOXML allows for multiple caches from same range (that was the way before this PR). The TryGet is not sensible after removal of a single range = single cache constraint.

The methods for adding a new pivot tables, where a range is passed as an argument (IXLPivotTables.Add), always check for existing caches as not to duplicate the caches (the key of this PR).

I need to do several rounds of reviews+cleanup+fixes (I haven't proofread anything yet), but API is now fine, there is a documentation and tests. Tests are limited to the current API scope (will be improved in subsequent PRs).

@GitAleB
Copy link

GitAleB commented Apr 26, 2023

Hi @jahav
Since this is ongoing work, i do not want to create an Issues now.
However, I would like to provide you with two excel files that demonstrate two problems (and might help):

  • Pivot Filter with custom name results in: Error 'The column 'Chose an Owner' does not appear in the source range. Arg_ParamName_Name' : Excel: SampleCustomNameOfFilter.xlsx
    "Chose an Owner" is the name of the custom filter in the attached excel. If the actual sourceName is used, it works, if the custom name is used, the exception above is thrown.

  • Calculated Column with CustomName: Actually same error here. Custom field is looked up in XLPivotValues.Add(...) _pivotTable.PivotCache.SourceRangeFields but can not ne found. SampleCustomNameOfCalculatedField.xlsx

Both errors seem to be caused by the same basic problem. Custom fields are not handled correctly (in my opinion).
The exceptions are thrown on XLWorkbook book = new XLWorkbook(stream).

Both Excel documents were correctly processed with Version 0.76.

@igitur
Copy link
Member Author

igitur commented Apr 26, 2023

Just to clarify, in v0.76 pivot tables were not supported. And ClosedXML read and wrote the files totally ignoring them. Whatever pivot tables were in the file were left untouched. Then we implemented support for adding/modifying pivot tables and obviously couldn't support every single feature from the beginning. This is the reason for the apparent regressions.

@GitAleB
Copy link

GitAleB commented Apr 26, 2023

Just to clarify, in v0.76 pivot tables were not supported. And ClosedXML read and wrote the files totally ignoring them. Whatever pivot tables were in the file were left untouched. Then we implemented support for adding/modifying pivot tables and obviously couldn't support every single feature from the beginning. This is the reason for the apparent regressions.

Ok, got it. With 0.76 we were using Excel "Templates" which had Pivot Tables already defined. The only thing we were using ClosedXML for was, to fill a separate "data" Sheet that was then used as Source for the already existing Pivot Table.
With the new Design (i.e. Version >0.76), ClosedXML now analyzes (an modifies) Pivot Tables as well - so, this causes the Problems for us. I think we can achieve the same thing with a pure OpenXml Solution (even though, this is more complicated for us). Any way - thank you for the support!

@jahav jahav changed the title WIP: Consolidated and refactored pivot sources Consolidated and refactored pivot sources Jun 4, 2023
@jahav jahav marked this pull request as ready for review June 4, 2023 23:09
@jahav
Copy link
Member

jahav commented Jun 4, 2023

It's not perfect, but it's an improvement and good enough. A representation of a pivot cache is a required first step toward making pivot tables actually work. I also need to this to be merged, so I can separate out individual saving writers.

@igitur Could you take a quick look? Saving part would be enough. I plan RC on Friday or Saturday. I know it's on a short notice, sorry about that. Work on phase 1 (long overdue architecture changes + dependencies changes) took precedence over phase 2 (deal with PR).

@jahav jahav merged commit f95d4bf into ClosedXML:develop Jun 15, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature already exists, but should be enahanced.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidated pivot table sources
3 participants