
Workbook with custom properties becomes corrupt #506

Closed
igitur opened this issue Jun 7, 2018 · 18 comments

Comments

@igitur commented Jun 7, 2018

Description

Using OpenXML to simply open and save a file (without any manipulation) produces a corrupt file. Excel 2013 (64-bit) complains when opening it:
[screenshot of the Excel error dialog]

Note that the file contains custom property parts, some of which are zero-length. I don't have control over those parts.

This problem occurs only when using .NET Core, not with .NET Framework.

Information

  • .NET Target: .NET Core 2.0
  • DocumentFormat.OpenXml Version: 2.8.1
  • System.IO.Packaging Version: v.4.5.0 (explicitly added)

Repro

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;
using System;
using System.IO;

namespace OpenXMLCustomPropertyPackagingProblemCore
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            File.Copy("test.xlsx", "file_to_open.xlsx", true);
            LoadAndValidate("file_to_open.xlsx");

            Console.WriteLine("Done.");
            Console.ReadKey(false);
        }

        public static void LoadAndValidate(string sourcePath)
        {
            using (SpreadsheetDocument package = SpreadsheetDocument.Open(sourcePath, true))
            {
                var validator = new OpenXmlValidator();
                var errors = validator.Validate(package);

                foreach (var e in errors)
                {
                    Console.WriteLine($"{e.Part} {e.Path.PartUri}");
                    Console.WriteLine($"\t{e.Description}");
                }
            }
        }
    }
}

Input file:

MD5 hashes:

815363ba1700d2f2aae99e677475f564 *file_to_open.xlsx
ac39d841fc15384c252c4870063f9b54 *test.xlsx
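The hashes above are in `md5sum` output format (`<hash> *<file>`). As a minimal sketch, the same lines can be produced from .NET using only the base class library, which gives a quick way to confirm whether opening and disposing the package rewrote `file_to_open.xlsx`. The `HashCheck` class and its method names are illustrative only; they are not part of the repro or of the Open XML SDK.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper (not part of the original repro) for comparing files.
internal static class HashCheck
{
    // Returns the MD5 of a file as lowercase hex, matching md5sum's output.
    public static string Md5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            // BitConverter yields "90-01-50-..."; strip dashes and lowercase it.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Prints "<hash> *<file>" (md5sum binary-mode format) for each path.
    public static void PrintHashes(params string[] paths)
    {
        foreach (var path in paths)
            Console.WriteLine($"{Md5Hex(path)} *{Path.GetFileName(path)}");
    }
}
```

For example, calling `HashCheck.PrintHashes("file_to_open.xlsx", "test.xlsx");` right after `LoadAndValidate(...)` in `Main` would print lines in the same shape as those above. A differing hash only shows that the file was rewritten, not that it is corrupt; a correct re-save may also change the bytes, which is why the 7-Zip CRC test below is the more telling check.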

Observed

file_to_open.xlsx becomes corrupt.

Expected

file_to_open.xlsx remains readable by Excel.

@igitur (Author) commented Jun 7, 2018

I strongly suspect that this is a bug in System.IO.Packaging, but let's wait for you guys to confirm.

@tomjebo (Collaborator) commented Jun 8, 2018

@igitur Thanks for the report. It doesn't seem to bother Excel 2016, so I'll have to test with 2013 when I get a chance. I notice that there are some validation errors, but I'm not sure whether those would affect this.

@twsouthwick (Collaborator) commented Jun 8, 2018

Do you see the same repro with System.IO.Packaging 4.4.0? Also, do you see it with any previous version of the SDK? We're getting a machine with Excel 2013 to test this, but your help narrowing this down would be appreciated.

@twsouthwick (Collaborator) commented Jun 8, 2018

Also, can you give it a try with 2.9.0? It's available in the CI feed and I've fixed a number of weird design issues within the validator in 2.9.0, so the architecture of the validator is substantially different now.

@twsouthwick (Collaborator) commented Jun 8, 2018

Not sure if this is just a repro and you have a larger scenario, but you should be able to open it read-only (i.e., pass false instead of true to .Open(...)) and have it remain as-is.

@igitur (Author) commented Jun 10, 2018

Not sure if this is just a repro and you have a larger scenario, but you should be able to open it in read only (ie pass false instead of true to .Open(...) and have it remain as-is.

Thanks, but unfortunately it's not an option. The code sample is just a reduced test case.

I'll try the other suggestions soon. Will also try to remove the validation errors. In the original, non-minimal use case there are no validation errors. Something must have happened when I trimmed down the Excel file.

@igitur (Author) commented Jun 11, 2018

I updated the test case. The validation errors are removed (they were related to comments). I have added both the input file and the file produced (i.e. after modification), and their hashes. This is still with System.IO.Packaging v4.5.0.

@igitur (Author) commented Jun 11, 2018

Did a 7-Zip CRC check on the file:
[screenshot of the 7-Zip test results reporting CRC errors]

Running the code under net461 produces a file without any CRC errors.

@igitur (Author) commented Jun 11, 2018

Downgrading System.IO.Packaging to 4.4.0 does not solve the problem.
Upgrading DocumentFormat.OpenXml to 2.9.0-office2016-0107 does not solve the problem.

@igitur (Author) commented Jun 11, 2018

Did a 7-Zip CRC check on the file:

Hunting down the cause of the CRC error is probably easier than trying to set up a machine with Excel 2013 ;-)

@twsouthwick (Collaborator) commented Jun 11, 2018

I found the file is changed even without any validation checks. Can you check whether it occurs if you just open and close a package with System.IO.Packaging.Package.Open?

@igitur (Author) commented Jun 12, 2018

Yes, the error occurs with this code too:

using (var package = System.IO.Packaging.Package.Open(sourcePath, FileMode.Open))
{
    foreach (var p in package.GetParts())
        Console.WriteLine(p.Uri);
}

Should I report this issue at https://github.com/dotnet/core/issues ? Having the OpenXml team's weight behind it will surely help :-)

@twsouthwick (Collaborator) commented Jun 12, 2018

This should probably be reported to https://github.com/dotnet/corefx/issues. Does it repro without doing the writeline?

@twsouthwick (Collaborator) commented Jun 12, 2018

The zip implementation is completely different between .NET Framework and .NET Core, so it seems there may be an issue in that implementation.

@igitur (Author) commented Jun 12, 2018

Yes, it repros without the writelines too. Ok, I'll report it. Thanks.

@orobert91 commented Dec 25, 2018

Any progress on this? After six months, the .NET Core team hasn't even started looking at it. This should be top priority. It is a blocking issue with no workaround and prevents modifying Excel files in .NET Core.

@twsouthwick (Collaborator) commented Jan 3, 2019

@orobert91 The issue is not in this SDK but in the .NET Core implementation. We're blocked here until there is progress there.

@twsouthwick (Collaborator) commented Mar 24, 2020

Looks like this has been fixed in System.IO.Packaging (see dotnet/corefx#37079). We've updated to the latest version, so it should be fixed. I'll close the issue here, but please reopen if the issue persists.
