Malformed mailto Hyperlink causes Exception on .NET 4.5+ #38

Closed
koyote opened this issue May 1, 2015 · 24 comments

@koyote

koyote commented May 1, 2015

Hi,

This issue is similar to #7 but for recent versions of the .NET framework.

If you create a blank document and set some text to the following field code (note the extra space):
{ HYPERLINK "mailto:email@address%20.com" }

Running the following on .NET 4.0 will work fine:
var wpdoc = WordprocessingDocument.Open(@"TestDoc.docx", false);
Running the same line on .NET 4.5+ will throw the following exception:

DocumentFormat.OpenXml.Packaging.OpenXmlPackageException: Invalid Hyperlink: Malformed URI is embedded as a hyperlink in the document.
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.Load() in OpenXmlPackage.cs: line 490
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(String path, Boolean readWriteMode) in OpenXmlPackage.cs: line 402
   at DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open(String path, Boolean isEditable, OpenSettings openSettings) in PackageDocument.cs: line 297
   at DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open(String path, Boolean isEditable) in PackageDocument.cs: line 256

The inner exception is:

   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
   at System.Uri..ctor(String uriString, UriKind uriKind)
   at MS.Internal.IO.Packaging.InternalRelationshipCollection.ProcessRelationshipAttributes(XmlCompatibilityReader reader)
   at MS.Internal.IO.Packaging.InternalRelationshipCollection.ParseRelationshipPart(PackagePart part)
   at MS.Internal.IO.Packaging.InternalRelationshipCollection..ctor(Package package, PackagePart part)
   at System.IO.Packaging.PackagePart.EnsureRelationships()
   at System.IO.Packaging.PackagePart.GetRelationshipsHelper(String filterString)
   at System.IO.Packaging.PackagePart.GetRelationships()
   at DocumentFormat.OpenXml.Packaging.PackagePartRelationshipPropertyCollection..ctor(PackagePart packagePart)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPart.Load(OpenXmlPackage openXmlPackage, OpenXmlPart parent, Uri uriTarget, String id, Dictionary`2 loadedParts)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.LoadReferencedPartsAndRelationships(OpenXmlPackage openXmlPackage, OpenXmlPart sourcePart, RelationshipCollection relationshipCollection, Dictionary`2 loadedParts)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.Load()

I assume this is due to the following change introduced in .NET 4.5 (https://msdn.microsoft.com/en-us/library/hh367887%28v=vs.110%29.aspx):
An invalid mailto: URL throws an exception in the Uri class constructor.
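
To see how a particular runtime treats a given relationship target, the string can be probed with `Uri.TryCreate`, which reports failure instead of throwing. This is only a minimal sketch: `UriProbe` and `IsParsable` are illustrative names, and the use of `UriKind.RelativeOrAbsolute` is an assumption about how System.IO.Packaging parses targets (the stack trace shows only that a `UriKind` is passed). The result for the address above depends on the framework version, so no output is shown.

```csharp
using System;

static class UriProbe
{
    // Returns true if this runtime's Uri parser accepts the string.
    // UriKind.RelativeOrAbsolute is an assumption, not taken from the trace.
    public static bool IsParsable(string target) =>
        Uri.TryCreate(target, UriKind.RelativeOrAbsolute, out _);
}
```

For example, `UriProbe.IsParsable("mailto:email@address%20.com")` can be run under both .NET 4.0 and .NET 4.5+ to compare the behavior.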

Word seems to be perfectly happy with the malformed mailto hyperlink so I assume this would count as a valid document.
I am therefore not sure how this could be fixed elegantly.

Thanks,

@EricWhiteDev
Contributor

Hi,

As you determined, this is an issue in the underlying System.IO.Packaging.

Currently, we are assessing whether we can patch System.IO.Packaging to fix this. One issue is that System.IO.Packaging returns a Uri object when you request a list of external relationships, and if the Uri stored in the document is invalid, we can't return one.

Perhaps good semantics would be:

  • Don't throw an exception upon opening.
  • If the program using the SDK (and therefore System.IO.Packaging) requests an invalid Uri, then throw an exception at that point.
  • If the program using the SDK never requests the invalid link, then upon serializing, the Open XML SDK and System.IO.Packaging would serialize the bad Uri as is.
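
The semantics listed above could be sketched roughly as follows. `DeferredUri` is a hypothetical type used only for illustration; it is not part of System.IO.Packaging or the Open XML SDK. It keeps the raw target text, parses it only on request, and exposes the original string so it can be written back unchanged.

```csharp
using System;

// Hypothetical type, not part of System.IO.Packaging or the Open XML SDK.
sealed class DeferredUri
{
    private readonly string _raw;

    // No parsing happens here, so loading a package never throws on a bad target.
    public DeferredUri(string raw) => _raw = raw;

    // The raw text, serialized as is when the package is saved.
    public string OriginalString => _raw;

    // Parsing is deferred to this point: a malformed target throws
    // UriFormatException only when the caller actually asks for the Uri.
    public Uri Value => new Uri(_raw, UriKind.RelativeOrAbsolute);
}
```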

Currently, the recommended workaround is:

http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2014/08/19/handling-invalid-hyperlinks-openxmlpackageexception-in-the-open-xml-sdk.aspx
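
A minimal sketch in the spirit of that workaround (not a copy of the linked article): treat the package as a zip archive, scan every `.rels` part, and replace external targets that the `Uri` parser rejects before opening the file with the SDK. `RelsFixer`, `FixInvalidUris`, and the `fix` callback are illustrative names, and probing with `UriKind.RelativeOrAbsolute` is an assumption. Note that this modifies the package, so it needs a writable, seekable stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class RelsFixer
{
    static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Rewrites unparsable external targets in every .rels part of the package.
    // The stream must be readable, writable, and seekable.
    public static void FixInvalidUris(Stream package, Func<string, string> fix)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Update, leaveOpen: true))
        {
            var relsEntries = zip.Entries
                .Where(e => e.Name.EndsWith(".rels", StringComparison.Ordinal))
                .ToList();

            foreach (var entry in relsEntries)
            {
                XDocument doc;
                using (var s = entry.Open())
                    doc = XDocument.Load(s);

                bool changed = false;
                foreach (var r in doc.Descendants(Rel + "Relationship"))
                {
                    var target = (string)r.Attribute("Target");
                    if (target == null || (string)r.Attribute("TargetMode") != "External")
                        continue;

                    // Assumption: probe the target the way the packaging layer would.
                    if (!Uri.TryCreate(target, UriKind.RelativeOrAbsolute, out _))
                    {
                        r.SetAttributeValue("Target", fix(target));
                        changed = true;
                    }
                }

                if (!changed) continue;
                using (var s = entry.Open())
                {
                    s.SetLength(0); // truncate the old content before rewriting
                    doc.Save(s);
                }
            }
        }
    }
}
```

A caller would typically catch the OpenXmlPackageException from `Open`, run this over a writable copy of the file with something like `broken => "http://invalid.example/"` as the callback, and then open the copy.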

@EricWhiteDev
Contributor

I'm assigning this to myself, and to version 4.0. This is an issue intrinsic to the design of System.IO.Packaging, and we need to be careful about how we fix this. In the meantime, the best approach is the workaround in the above comment.

@EricWhiteDev EricWhiteDev added this to the V4.0 milestone Jul 24, 2015
@EricWhiteDev EricWhiteDev self-assigned this Jul 24, 2015
@EricWhiteDev
Contributor

Closing this suggestion. This is most likely not going to be implemented, so we should close this issue to reflect that.

@houssemzaier

Hello
I have the same problem when I try to read an Excel file containing an invalid URI. I don't have any solution to this problem yet. Any help, please?

@davideloper

I guess I have the same problem with xlsx: "PackageRelationship target must be relative URI if TargetMode is Internal"
This error only occurs on Mono; on Windows everything looks fine.
I am using OpenXmlSdk 2.6 and System.IO.Packaging from .NETStandard.
The error fires on SpreadsheetDocument.Open(filePath ...).
Any possible solutions?

@igitur
Contributor

igitur commented Mar 30, 2017

For those still struggling with this issue, please see @EricWhiteDev's article on how to handle this: http://ericwhite.com/blog/handling-invalid-hyperlinks-openxmlpackageexception-in-the-open-xml-sdk/

The link Eric posted above seems obsolete.

@EricWhiteDev: Is there absolutely no way that this workaround will be implemented going forward?

@silviomarquesferreira

Hi guys, I'm facing the same problem with a docx document.
Is there a new implementation in System.IO.Packaging?

@AmeetShinde

Hello guys, we are facing a similar issue and the workaround is not acceptable in our case (we have no write access to the document, and making an in-memory copy of it is too costly). Can we get this fixed in the System.IO.Packaging library itself?

@robertmuehsig
Contributor

Hi @EricWhiteDev - is this problem solved in a newer release of the Open XML SDK, or is it broken "by design"? @twsouthwick

@twsouthwick
Member

This is a "by design" issue in that System.IO.Packaging.Package throws the exception and there's no way to catch it. That said, I have a design for a fix I'm going to propose on CoreFx that would allow hooking into the package reading to handle it manually. It hasn't been high on my priority list, but if there's interest I'll get that proposal submitted so there'll be a viable workaround.

@robertmuehsig
Contributor

Uh - this is bad, because we process Office documents and our main message is "use whatever feature you like in Word and we can handle it" (at least we shouldn't fail on it), and we have a customer issue because she or he inserted a link with a custom URL scheme, like "myapp:foobar&param=123".

Word itself is doing fine, but our application needs to insert some CustomXML into the document, but because of this issue it will fail.

Now our application looks quite bad, because "Word can handle it" and we fail pretty early (and the workaround seems quite perf heavy :-/)

@igitur
Contributor

igitur commented May 4, 2018

So change your main message. Clearly it's not true.

@robertmuehsig
Contributor

Just to be clear: the main message thing was just an overstatement. I know that there are a lot of functions inside Word that are not easily resolvable via pure Open XML (e.g. everything that renders the actual content), but it is quite bad that a typical user can create a somewhat invalid Open XML file (at least from the view of the Open XML SDK) via a pretty basic function.

@twsouthwick
Member

I've opened a PR to get a potential way of handling this into the Package object: https://github.com/dotnet/corefx/issues/29531

@robertmuehsig
Contributor

@twsouthwick Thanks! Would the suggested PR (if accepted) only be available on .NET Core, or would we be able to use this in the full framework as well?

@twsouthwick
Member

That would be up to CoreFx, but it would first go into .NET Core, and then we can push to port it to framework.

@ThomasBarnekow
Collaborator

@twsouthwick, I run into this issue very frequently and have implemented a workaround based on Eric White's article. However, rather than having everybody implement that workaround for himself or herself, wouldn't it be better to offer that as part of the Open XML SDK? For example, we could either provide a separate utility class or add a static utility method to OpenXmlPackage.

@vogla

vogla commented Sep 30, 2020

I just ran into this issue with an application that processes Excel documents. How can such a fundamental and important issue be ignored for years? Invalid hyperlinks occur in Excel documents all the time and everywhere. The argument that this is not a concern for an Open XML document parsing library is ridiculous.

@igitur
Contributor

igitur commented Sep 30, 2020

@vogla This was fixed in #793. Even if it weren't, the cool thing with open source software is that you can fix it yourself. And if you're really nice, you can contribute it back to the original project.

@abelykh0

@igitur Thank you. But sorry, I do not think it is a good idea. On the one hand, this code adds a traversal, which will reduce performance. On the other hand, this code seems to modify the original document.

@vogla

vogla commented Sep 30, 2020

> @vogla This was fixed in #793. Even if it weren't, the cool thing with open source software is that you can fix it yourself. And if you're really nice, you can contribute it back to the original project.

Thank you @igitur for pointing this out and thank you @twsouthwick for providing this wonderful improvement. Fix #793 does indeed seem to address this problem. If I understand correctly, the fix was merged to master quite recently but has not yet been released. So hopefully we'll get a new release soon that includes it.

@Nils-Berghs

Fix #793 is a workaround, not a true fix. It trips over read-only documents.

@twsouthwick
Member

@Nils-Berghs you're correct. I've got a better workaround that does a shadow copy if needed. Still not great, but give v3.0 a try (there's a beta on NuGet that has this). A true fix would be at lower levels, but we're doing what we can to support it better.
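
The "shadow copy if needed" idea can be illustrated roughly like this. It is only a sketch of the general technique and makes no claim about how the SDK implements it: `ShadowCopy` and `EnsureWritable` are hypothetical names, and the in-memory buffer is an assumption (a temporary file would work the same way for large documents).

```csharp
using System.IO;

// Hypothetical helper, not the Open XML SDK's implementation.
static class ShadowCopy
{
    // Returns the original stream when it can be edited in place; otherwise
    // copies it into a writable in-memory stream so the source stays untouched.
    public static Stream EnsureWritable(Stream source)
    {
        if (source.CanWrite && source.CanSeek)
            return source;

        var copy = new MemoryStream();
        source.CopyTo(copy);
        copy.Position = 0; // rewind so the caller reads from the start
        return copy;
    }
}
```

Any repair of invalid relationship targets would then be applied to the returned stream, leaving a read-only document unmodified.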
