Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Location of pivot cache parts inside xlsx file #612

Open
igitur opened this issue Jul 8, 2019 · 5 comments

Comments

Projects
None yet
3 participants
@igitur
Copy link

commented Jul 8, 2019

Description

When creating a new .xlsx file with a pivot table in Excel and saving it, the pivot cache parts (specificially the PivotCacheDefinitionPart and PivotCacheRecords parts) are saved to the ./xl/pivotCache/ location inside package.

When one generates pivot tables using the OpenXML SDK, these new parts are added to the ./pivotCache/ location. Notice that it does not fall under ./xl/.

While the generated file isn't corrupt and works with all spreadsheet clients, the discrepancy about the location of the parts does feel strange. It's especially confusing when one opens an Excel-generated file using the OpenXML SDK where the file already contains pivot table parts and one adds new pivot table parts. The resulting file will have the original pivot table cache parts in the ./xl/pivotCache location, but the newly added parts will be in the ./pivotCache/ location. Additionally, this is the only spreadsheet-related part (that I've noticed) that does not save to some location under ./xl/.

Information

  • .NET Target: .NET 4.7.2
  • DocumentFormat.OpenXml Version: Commit 9d8629a

Repro

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

namespace ConsoleApp1
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            using (var package = SpreadsheetDocument.Create("outputfile.xlsx", SpreadsheetDocumentType.Workbook))
            {
                var workbookPart = package.AddWorkbookPart();

                workbookPart.AddNewPart<PivotTableCacheDefinitionPart>("rId2");
            }
        }
    }
}

Observed

Notice location of result pivot cache parts:
pivotcache2

Expected

When one generates a file in Excel, pivot cache parts are in this location:
pivotcache

@igitur

This comment has been minimized.

Copy link
Author

commented Jul 8, 2019

I notice that in https://github.com/OfficeDev/Open-XML-SDK/blob/master/src/DocumentFormat.OpenXml/GeneratedCode/Packaging/PivotTableCacheDefinitionPart.g.cs#L85 the relative target path is defined as ../pivotCache. But this is relative to the workbook part.

I suspect someone treated this target path as if it's relative one of the pivot tables that uses the pivot table cache definition.

@twsouthwick

This comment has been minimized.

Copy link
Collaborator

commented Jul 11, 2019

@tomjebo Looks like the schema files has the path defined as ../pivotCache. Which behavior do we want in the SDK?

@tomjebo

This comment has been minimized.

Copy link
Collaborator

commented Jul 14, 2019

@igitur you are correct. The explicit relationship from the workbook part is supposed to be just: Target="pivotCache/pivotCacheDefinition1.xml"/>
Where adding a relationship (if needed; the standard says it's "implicit") from the pivotTable part is supposed to be:
Target="../pivotCache/pivotCacheDefinition1.xml"/>

This is per the standard ISO 29500-1:2016 12.3.12 "Pivot Table Cache Definition Part". What I think is happening is that the PivotTableCacheDefinition class was intended to be added to (or by) the PivotTableDefinition (root of the PivotTablePart class). When added in that context the "../pivotCache" location makes sense.
So I think what should happen is you should add the PivotTablePart part to the Workbook (actually Worksheet) part and it will create (it has a constraint rule in the generated code) the PivotCacheDefinition part relationship.

Having said all this, I see after searching for examples of this, it's not that straight forward. One way to see the correct steps would be to use the Productivity Tool to reflect the OpenXML SDK code based on an existing file with pivot table inserted.
...(some time later)...
Ok, I actually did the above. Used the productivity tool to add reflect the code on a very simple workbook with a pivot table. I ran the reflected code and sure enough it created target paths just like Excel.

Then I simplified this to use the ordering that the reflected code uses. Here's the final product like yours:

            using (var package = SpreadsheetDocument.Create("outputfile.xlsx", SpreadsheetDocumentType.Workbook))
            {
                var workbookPart = package.AddWorkbookPart();

                var cachedefinitionpart = workbookPart.AddNewPart<PivotTableCacheDefinitionPart>("rId3");
                var cacherecordspart =  cachedefinitionpart.AddNewPart<PivotTableCacheRecordsPart>("rId1");
                var worksheet = workbookPart.AddNewPart<WorksheetPart>();

                var pivotTablePart = worksheet.AddNewPart<PivotTablePart>("rId2");
                pivotTablePart.AddPart(cachedefinitionpart);

            }

Running this simpler example I see that it does create the correct relationships like Excel. I think it's because you didn't create the sheet and then add the PivotTablePart to it. When that happens, the constraint (I think) creates the correct relationship.
@igitur Can you also check to see that this works for you?

Tom

@igitur

This comment has been minimized.

Copy link
Author

commented Jul 15, 2019

Thanks Tom.

I had a quick look at the 12.3.12 section.

A bit of a side-note, but I notice this (my emphasis):

A package shall contain exactly one Pivot Table Cache Definition part per pivot table, and each such part shall be the target of an implicit relationship from a Pivot Table (§12.3.11) part as well as an explicit relationship from a Workbook (§12.3.23) part.

Surely this cannot be true. I see many examples of multiple pivot tables sharing a single pivot table cache definition.

For now, I'm going to assume that multiple pivot tables can share a single pivot table cache definition. So, for me, it makes more sense to enumerate and add all my cache definitions first (adding them to the workbook), and only after that add the pivot tables and add their appropriate cache definition parts.

I was interested in how the reflected code would look for 2 pivot tables referencing the same underlying range. Here it is:

// Adds child parts and generates content of the specified part.
private void CreateParts(SpreadsheetDocument document)
{
    ExtendedFilePropertiesPart extendedFilePropertiesPart1 = document.AddNewPart<ExtendedFilePropertiesPart>("rId3");
    GenerateExtendedFilePropertiesPart1Content(extendedFilePropertiesPart1);

    WorkbookPart workbookPart1 = document.AddWorkbookPart();
    GenerateWorkbookPart1Content(workbookPart1);

    WorksheetPart worksheetPart1 = workbookPart1.AddNewPart<WorksheetPart>("rId3");
    GenerateWorksheetPart1Content(worksheetPart1);

    SharedStringTablePart sharedStringTablePart1 = workbookPart1.AddNewPart<SharedStringTablePart>("rId7");
    GenerateSharedStringTablePart1Content(sharedStringTablePart1);

    WorksheetPart worksheetPart2 = workbookPart1.AddNewPart<WorksheetPart>("rId2");
    GenerateWorksheetPart2Content(worksheetPart2);

    PivotTablePart pivotTablePart1 = worksheetPart2.AddNewPart<PivotTablePart>("rId1");
    GeneratePivotTablePart1Content(pivotTablePart1);

    PivotTableCacheDefinitionPart pivotTableCacheDefinitionPart1 = pivotTablePart1.AddNewPart<PivotTableCacheDefinitionPart>("rId1");
    GeneratePivotTableCacheDefinitionPart1Content(pivotTableCacheDefinitionPart1);

    PivotTableCacheRecordsPart pivotTableCacheRecordsPart1 = pivotTableCacheDefinitionPart1.AddNewPart<PivotTableCacheRecordsPart>("rId1");
    GeneratePivotTableCacheRecordsPart1Content(pivotTableCacheRecordsPart1);

    WorksheetPart worksheetPart3 = workbookPart1.AddNewPart<WorksheetPart>("rId1");
    GenerateWorksheetPart3Content(worksheetPart3);

    PivotTablePart pivotTablePart2 = worksheetPart3.AddNewPart<PivotTablePart>("rId1");
    GeneratePivotTablePart2Content(pivotTablePart2);

    pivotTablePart2.AddPart(pivotTableCacheDefinitionPart1, "rId1");

    WorkbookStylesPart workbookStylesPart1 = workbookPart1.AddNewPart<WorkbookStylesPart>("rId6");
    GenerateWorkbookStylesPart1Content(workbookStylesPart1);

    ThemePart themePart1 = workbookPart1.AddNewPart<ThemePart>("rId5");
    GenerateThemePart1Content(themePart1);

    workbookPart1.AddPart(pivotTableCacheDefinitionPart1, "rId4");

    CustomFilePropertiesPart customFilePropertiesPart1 = document.AddNewPart<CustomFilePropertiesPart>("rId4");
    GenerateCustomFilePropertiesPart1Content(customFilePropertiesPart1);

    SetPackageProperties(document);
}

Sure enough, that will work, but it does feel like a clumsy code. Note how pivotTableCacheDefinitionPart1 is created and added to pivotTablePart1 and later also added to pivotTablePart2. It's not really the kind of approach that lends itself towards enumeration over pivot tables and their cache definitions.

You say:

What I think is happening is that the PivotTableCacheDefinition class was intended to be added to (or by) the PivotTableDefinition (root of the PivotTablePart class).

Sorry for asking, but how sure are you that this was the intended approach?

@tomjebo

This comment has been minimized.

Copy link
Collaborator

commented Jul 15, 2019

@igitur I had said that before doing the remaining investigation and now don't think that. More likely the relationship is getting overwritten after adding in the pivot table and table definition parts.

After investigation, my point was more that just adding the pivot cache definition to the workbook relationships does not represent a completed package. Table and sheet are needed. After adding those, the relationships seem to be correct. Do you see that as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.