Unable to Load Facts for Danish Taxonomy #3

Closed
JeffFerguson opened this issue Sep 20, 2017 · 27 comments

5 participants
@JeffFerguson
Owner

commented Sep 20, 2017

From a user:

I am working with a large customer, trying to parse the Danish taxonomy using the Gepsio framework for C#.

However, I am having difficulty parsing XBRL documents, more specifically getting facts out of them.
I found a post on Stack Overflow where someone experienced the same problem parsing the Dutch taxonomy. I tried to follow his suggestions for solving it, without any luck.

I am trying to parse this document:
https://datacvr.virk.dk/data/offentliggorelse?dl_ref=ZG9rdW1lbnRsYWdlcjovLzAzL2U1LzI0L2U3LzYwLzZmN2YtNGVkOC05MmMwLWU4YjIxOTkxYTFjNQ

And the Danish taxonomy can be downloaded here:
https://erhvervsstyrelsen.dk/sites/default/files/media/xbrl20161001_20161123_1408.zip

@JeffFerguson JeffFerguson self-assigned this Sep 20, 2017

@JeffFerguson JeffFerguson added the new label Sep 20, 2017

@JeffFerguson JeffFerguson added investigating and removed new labels Oct 5, 2017

@JeffFerguson

Owner Author

commented Oct 5, 2017

The XBRL instance referenced above contains a schema reference:

<lnk:schemaRef xlink:href="http://archprod.service.eogs.dk/taxonomy/20161001/entryDanishGAAPBalanceSheetAccountFormIncomeStatementByNatureIncludingManagementsReviewStatisticsAndTax20161001.xsd" xlink:type="simple"/>

However, there is no schema at http://archprod.service.eogs.dk/taxonomy/20161001/entryDanishGAAPBalanceSheetAccountFormIncomeStatementByNatureIncludingManagementsReviewStatisticsAndTax20161001.xsd. An HTTP request for this address returns a 404.

Since the schema cannot be found, Gepsio fails schema validation and reports a validation error. This is by design, as far as Gepsio is concerned. The question becomes: where is the taxonomy if the taxonomy is not at the address specified in the schema reference?
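To check which schema an instance points at before handing it to Gepsio, the schemaRef addresses can be read directly from the XML. The following is a minimal sketch, not part of Gepsio: the class name SchemaRefs and method FromInstance are made up for illustration, and it assumes the instance uses the standard XBRL linkbase and XLink namespaces.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Gepsio.
public static class SchemaRefs
{
    // Standard namespaces for XBRL linkbase elements and XLink attributes.
    private static readonly XNamespace Link = "http://www.xbrl.org/2003/linkbase";
    private static readonly XNamespace XLink = "http://www.w3.org/1999/xlink";

    // Returns the xlink:href value of every schemaRef element in the instance.
    // Matching is by namespace, so the prefix (lnk:, link:, ...) does not matter.
    public static List<string> FromInstance(string instanceXml)
    {
        return XDocument.Parse(instanceXml)
            .Descendants(Link + "schemaRef")
            .Select(e => (string)e.Attribute(XLink + "href"))
            .Where(href => !string.IsNullOrEmpty(href))
            .ToList();
    }
}
```

Each returned address can then be requested by hand to see whether it really serves a schema or returns a 404.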

@ValeriyTitov


commented Dec 4, 2017

I have the same problem.
The taxonomy xlink:href is just a fake link to an HTTP site.
The real taxonomy is located on drive c:\Taxonomy...\File.xsd

But if I change the xlink:href path in the XML document, another error is thrown: XbrlSchema.GetFullSchemaPath() code path not implemented.

@ValeriyTitov


commented Dec 4, 2017

I just created a local HTTP server with HFS.exe and copied the taxonomy files into it (keeping the directory structure; about 9000 files generated by Fujitsu Interstage XWand).
After about 30 seconds of loading, I get another error in xml.Load: Object reference not set to an instance of an object.

@gustavkn


commented Apr 30, 2018

Any updates on this? - I am still trying to get this working :/

@JeffFerguson

Owner Author

commented May 11, 2018

I am able to replicate this issue. After copying taxonomy files locally, I can load an XBRL instance that uses a Dutch taxonomy. Gepsio reports that the document is valid, that there is one fragment, and the fragment contains one unit, 13 contexts – and no facts.

Now that I have a working sample, let me see if I can figure out what is going on. More soon!

@JeffFerguson JeffFerguson added bug and removed investigating labels May 11, 2018

@JeffFerguson

Owner Author

commented May 11, 2018

I have more information on this issue, and I am not sure that it is a bug. I will illustrate the issue using the files in this attached sample XBRL source. Using the sample instance I have attached to this comment, I can write code to ask Gepsio to load the XBRL instance document offentliggorelse.xml. The first node in the XBRL instance that Gepsio attempts to load as a fact has a local name of InformationOnTypeOfSubmittedReport. However, this -- and all of the other nodes -- fail to load as facts because Gepsio cannot find the element definition in the loaded schema that contains the definition of that node. Because it cannot find a schema definition for the node, Gepsio does not recognize the node as a fact.

The element InformationOnTypeOfSubmittedReport is defined in the schema gsd.xsd. However, that schema is not referenced in the schema definitions for the document, so Gepsio does not load the schema. Here is the full instance and schema loading process, according to Gepsio:

  • Load XBRL instance offentliggorelse.xml.
  • Find schema reference entryDanishGAAPBalanceSheetAccountByCurrentAndLongTermFormIncomeStatementByNatureIncludingManagementsReviewStatisticsAndTax20171001.xsd in the loaded XBRL instance offentliggorelse.xml.
  • Load schema entryDanishGAAPBalanceSheetAccountByCurrentAndLongTermFormIncomeStatementByNatureIncludingManagementsReviewStatisticsAndTax20171001.xsd.
  • Examine the loaded schema and load all of the linkbase references. The schema that contains the fact definitions (gsd.xsd, in the first case) is not referenced anywhere in the loaded XBRL source, so it is not loaded.
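To check by hand whether a schema such as gsd.xsd is reachable from the entry schema, the chain can be walked outside of Gepsio. The following is a minimal sketch under stated assumptions: the class SchemaChain and its Reachable method are hypothetical, schemas are matched by file name only, and only xs:import and xs:include are followed (linkbase locators are not), so it may under-report what a full XBRL processor would discover.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Gepsio.
public static class SchemaChain
{
    private static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // 'schemas' maps a schema file name to its XML text; in real use the
    // text would be read from the local taxonomy folder.
    // Returns every file name referenced in the chain, starting with 'entry'.
    public static ISet<string> Reachable(string entry, IDictionary<string, string> schemas)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var pending = new Stack<string>();
        pending.Push(entry);
        while (pending.Count > 0)
        {
            var name = pending.Pop();
            // Skip names already visited and names whose text is not available.
            if (!seen.Add(name) || !schemas.TryGetValue(name, out var text))
                continue;
            var locations = XDocument.Parse(text).Descendants()
                .Where(e => e.Name == Xs + "import" || e.Name == Xs + "include")
                .Select(e => (string)e.Attribute("schemaLocation"))
                .Where(loc => !string.IsNullOrEmpty(loc));
            foreach (var loc in locations)
                pending.Push(Path.GetFileName(loc)); // compare by file name only
        }
        return seen;
    }
}
```

If gsd.xsd is missing from the returned set, the instance's schema reference chain never imports or includes it, which matches the behavior described above.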

Issue3XbrlInstance.zip

@JeffFerguson JeffFerguson added investigating and removed bug labels May 11, 2018

@JeffFerguson

Owner Author

commented May 11, 2018

I'll move this issue back to "investigating". The moral of the story is that the schema that contains the definitions of facts such as InformationOnTypeOfSubmittedReport (in this case, gsd.xsd) needs to be somewhere in the XBRL instance document's schema reference chain. Since it is not, no matching facts can be found.

@ostat


commented May 15, 2018

Having a similar issue. We are attempting to load an XBRL document that contains a schemaRef where the URL is not a valid link and returns a 404. I had planned to store the xsd locally and modify the xlink:href to a local path. Ideally I would have liked to provide Gepsio with a list of schema path replacements, i.e. for http://example.com/something replace with c:\localcache.

However I was not able to get the document to load the facts.
If I load the document as is I get an error indicating the schema can't be accessed.
If I remove the schema link the document loads but there are no facts.
If I try to change the schema path to c:.....xsd I get a NotImplemented exception 'XbrlSchema.GetFullSchemaPath() code path not implemented.'.

This is the link to the taxonomy pack http://www.sbr.gov.au/__data/assets/file/0008/46907/ATO-SPRMBR-2018-Package-v0.9.zip,
Which can be found on this public page, http://www.sbr.gov.au/software-developers/developer-tools/ato/ato-superannuation-data-and-payment-standards/ato-superannuation-spr#sprmbr

I can probably provide a sample document; however, it is not publicly accessible, so I would rather confirm first and then email it.

Am I doing something wrong here? Is it possible to have Gepsio load these documents?

@ValeriyTitov


commented May 15, 2018

Ostat,

To solve this, I rewrote code in 3 places to make sure HTTP links are replaced with local links.
I think the best practice for future use is to rewrite the code to use one single XmlUrlResolver class that resolves all possible URIs, and to expose it as a custom property on each Gepsio instance...

xbrlschema.cs:

private string GetFullSchemaPath(string SchemaFilename, string BaseDirectory)
{
    // Hard-coded redirect: map a known schema file name to its local copy.
    string RealFileName = System.IO.Path.GetFileName(SchemaFilename);
    if (RealFileName.Equals("ep_nso_aif_uk_m_q_20d.xsd", StringComparison.OrdinalIgnoreCase))
    {
        return @"c:\Tax\www.cbr.ru\xbrl\nso\uk\rep\2018-01-31\ep\ep_nso_aif_uk_m_q_20d.xsd";
    }
    // ... rest of the original method unchanged
}

XbrlSchemaCollection.cs //well-known files ?

private void BuildStandardNamespaceSchemaLocationDictionary()
{
 StandardNamespaceSchemaLocationDictionary = new Dictionary<string, string>
  {
    {
       "http://www.cbr.ru/xbrl/udr/dim/dim-int", 
       @"c:\Tax\www.cbr.ru\xbrl\udr\dim\dim-int.xsd"
     }, //and so on
   };
}

SchemaSet.cs // write a custom class to resolve URIs for XmlSchemaSet

public class MyXmlResolver : XmlUrlResolver
    {
        public override Uri ResolveUri(Uri baseUri, string relativeUri)
        {
            string RealPath;
            if (relativeUri.StartsWith("http://"))
                RealPath = @"C:/tax/" + relativeUri.Substring(7);
            else
                RealPath = relativeUri;
            //Console.WriteLine(RealPath);
            return base.ResolveUri(baseUri, RealPath);
        }
    }
...
public SchemaSet()
        {
            MyXmlResolver resolver = new MyXmlResolver();
            thisSchemaSet.XmlResolver = resolver;
        }

@ValeriyTitov


commented May 15, 2018

However, after the document loads, I couldn't find a way to access fact dimensions...

@JeffFerguson

Owner Author

commented May 15, 2018

I like the idea @ostat has about schema path replacements. I will definitely look into that. It still seems odd to me that these URLs aren't actually valid, but I will work with what we have. Thanks for the suggestion!

@JeffFerguson

Owner Author

commented May 15, 2018

The first question becomes "How will callers inform Gepsio about schema path replacements"? My initial thought is to create some sort of "runtime configuration" class, add the schema path replacement information into that configuration class, and then pass that configuration instance in with the call to Load(). Thoughts? Something like this … ?

var config = new RuntimeConfiguration();
config.AddSchemaReplacement(oldPath, newPath);
var doc = new XbrlDocument();
doc.Load("XBRL.xml", config);

The advantage with something like this is that, if more configuration items come up in the future, I can stick them in the config class without changing the signature to Load() every time. If no config is passed in, then defaults will be used.
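One way such a configuration class might look is sketched below. Only the names RuntimeConfiguration and AddSchemaReplacement come from the proposal above; the ApplySchemaReplacements method, the prefix-based matching, and the longest-prefix rule are assumptions made for illustration, not Gepsio's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the proposed configuration class.
public class RuntimeConfiguration
{
    private readonly List<KeyValuePair<string, string>> replacements =
        new List<KeyValuePair<string, string>>();

    // Registers a prefix to be swapped out of schema references.
    public void AddSchemaReplacement(string oldPath, string newPath)
    {
        if (string.IsNullOrEmpty(oldPath))
            throw new ArgumentException("oldPath is required", nameof(oldPath));
        replacements.Add(new KeyValuePair<string, string>(oldPath, newPath ?? string.Empty));
    }

    // Assumed helper: returns the reference with its longest matching prefix
    // replaced, or the reference unchanged when no replacement applies.
    public string ApplySchemaReplacements(string schemaReference)
    {
        string bestOld = null, bestNew = null;
        foreach (var pair in replacements)
        {
            if (schemaReference.StartsWith(pair.Key, StringComparison.OrdinalIgnoreCase)
                && (bestOld == null || pair.Key.Length > bestOld.Length))
            {
                bestOld = pair.Key;
                bestNew = pair.Value;
            }
        }
        return bestOld == null
            ? schemaReference
            : bestNew + schemaReference.Substring(bestOld.Length);
    }
}
```

Because references with no registered prefix pass through unchanged, a caller that passes no replacements gets the default behavior.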

@ValeriyTitov


commented May 15, 2018

Hm... I think this one is better.
Use the same resolver in the 3 places I specified earlier.

public class CustomXmlResolver : XmlUrlResolver // class derived from the built-in XmlUrlResolver
    {
        public override Uri ResolveUri(Uri baseUri, string relativeUri)
        {
            // User-defined rules for locating the schema go here.
            // Fall back to the default behavior when no rule applies.
            return base.ResolveUri(baseUri, relativeUri);
        }
    }
...
var resolver = new CustomXmlResolver();
var doc = new XbrlDocument();
doc.CustomResolver = resolver; // proposed property
doc.Load("XBRL.xml");

@ValeriyTitov


commented May 15, 2018

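private string GetFullSchemaPath(string SchemaFilename, string BaseDirectory)
{
    // Assumes BaseDirectory is an absolute path or URL.
    if (CustomResolver != null)
        return CustomResolver.ResolveUri(new Uri(BaseDirectory), SchemaFilename).AbsoluteUri;
    else
        return OldLogic(); // placeholder for the existing path logic
}

@JeffFerguson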

Owner Author

commented May 15, 2018

Thanks for the suggestion. It exposes some XML implementation details that Gepsio tries to hide from the user. I think the other nice thing about the RuntimeConfiguration class is that I can add to it in the future if more configuration items are necessary.

Internally, of course, it will be implemented using some sort of XmlUrlResolver, as you suggest. I'm just leery of exposing that level of XML implementation detail to the callers, since Gepsio tries to encapsulate as much implementation detail as possible.

I suppose I could simplify it one step further, and just support some sort of Boolean called something like ForceLocalSchemas that, when set to true, would look for the schemas with the XBRL instance, regardless of the URL in schemaRef. In that case, you wouldn't have to specify any paths at all. Simple, from the caller's perspective!

@ostat


commented May 15, 2018

Thanks for the input @JeffFerguson and @ValeriyTitov. @ValeriyTitov, I will give your code a try and see if I can get it to work. However, you said you were still not able to load the facts?

I prefer the idea of passing in a list of schema replacements and having the CustomXmlResolver implemented internally; if desired, this could be overridden when more flexibility is needed. Being able to provide the local paths would allow the schemas to be in any location, whereas a ForceLocalSchemas option would not be as flexible in this regard.
As an aside, supporting local copies of the schemas might also be a performance improvement. Some of the taxonomies we deal with are quite large, and pulling them over the internet would not be ideal.

Replacing the schema path is a good improvement, however correct me if I am mistaken (perhaps this is a separate issue), this would be the same as manually editing the schemaRef to be a local path before loading the xbrl. I mention this because when I tried that I received a NotImplemented exception 'XbrlSchema.GetFullSchemaPath() code path not implemented.'.

Thanks for the fast feedback :)

@JeffFerguson

Owner Author

commented May 16, 2018

I'll meet you halfway. :)

I agree that the ForceLocalSchemas idea is a bit too restrictive, so let's drop that.

What about my other idea regarding a RuntimeConfiguration class with a method like AddSchemaReplacementPath(), or something similarly named? I like that a bit better than a custom XmlUrlResolver, since it encapsulates, rather than exposes, XML details, while still being more flexible than a ForceLocalSchemas Boolean.

Thoughts?

@ValeriyTitov


commented May 16, 2018

Lists are not good, for security reasons.
You never know what the schemaRefs link to. They can link to a malware site or a huge XML document.
A schemaRef that is not in your list can lead Gepsio to load XML from Mars :)

@JeffFerguson

Owner Author

commented May 26, 2018

I just pushed a fix for this to the develop branch. HTTP-based schema references that fail to load due to an HTTP 404 response are searched locally.
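The general shape of such a fallback can be sketched as follows. This is not the code pushed to the develop branch: the class SchemaFallback and its Locate method are hypothetical, the remoteAvailable delegate stands in for the HTTP request, and the sketch assumes that the local search means looking for a file with the same name in the XBRL instance's directory.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not Gepsio's actual code.
public static class SchemaFallback
{
    // 'schemaHref' is assumed to be an absolute URL taken from schemaRef.
    // 'remoteAvailable' should return false when the server answers 404.
    // Returns the location to load from, or null when nothing was found.
    public static string Locate(string schemaHref, string instanceDirectory,
                                Func<string, bool> remoteAvailable)
    {
        if (remoteAvailable(schemaHref))
            return schemaHref;

        // Fall back to a same-named file next to the XBRL instance.
        string fileName = Path.GetFileName(new Uri(schemaHref).LocalPath);
        string candidate = Path.Combine(instanceDirectory, fileName);
        return File.Exists(candidate) ? candidate : null;
    }
}
```

Passing the availability check in as a delegate keeps the sketch testable without network access; a real implementation would issue the request itself.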

@Wimsteppe


commented Nov 2, 2018

@JeffFerguson Hi, we've used Gepsio successfully to load the Belgian Corporate Tax taxonomies. They are similar to the Danish and Dutch taxonomies, in that the schema names are formed like an HTTP address, but are actually just a denominator. Your fix works, but it only searches in the base directory, requiring us to copy and paste the XBRL into the Taxonomy directory.

We currently solved this issue by providing an optional schemaLocation parameter to the XbrlDocument class. Is this something you would consider adding to the repository? Basically only the GetSchemaReferencePath method would have to be adjusted.

A runtimeconfig class or similar would also be a solution.

@ValeriyTitov


commented Nov 2, 2018

> Basically only the GetSchemaReferencePath method would have to be adjusted.

Not really.
xbrlschema.cs: GetFullSchemaPath(string SchemaFilename, string BaseDirectory)

XbrlSchemaCollection.cs: add a reference to the dictionary in BuildStandardNamespaceSchemaLocationDictionary() or override GetSchemaFromTargetNamespace

SchemaSet.cs: build a custom URI resolver class derived from the standard XmlUrlResolver, and add it after

The runtime config class should be able to resolve conflicts among these 3 entries.

@Wimsteppe


commented Nov 5, 2018

> > Basically only the GetSchemaReferencePath method would have to be adjusted.
>
> Not really.
> xbrlschema.cs: GetFullSchemaPath(string SchemaFilename, string BaseDirectory)
>
> XbrlSchemaCollection.cs: add a reference to the dictionary in BuildStandardNamespaceSchemaLocationDictionary() or override GetSchemaFromTargetNamespace
>
> SchemaSet.cs: build a custom URI resolver class derived from the standard XmlUrlResolver, and add it after
>
> The runtime config class should be able to resolve conflicts among these 3 entries.

Just to be sure I understand this correctly, are you saying that this is already possible? Or is this an additional (perhaps similar to the original) argument to implement a custom runtime config class? If so, maybe I'll open a new issue / feature request. I feel that a good temporary solution has been offered by Jeff, but the structural solution seems to be missing at the moment.

@ValeriyTitov


commented Nov 6, 2018

This is an additional argument for implementing a custom runtime config class.
For now, Gepsio waits for an HTTP 404 error before loading files from the local path.

I was just telling you to check the points I mentioned above.
For me, Gepsio failed to load the XBRL instance until I fixed the path in all 3 methods.
If you fixed only one method, your XBRL instance is probably not fully loaded/working.

@ValeriyTitov


commented Nov 6, 2018

You can add the following code in each method just to check which files are being loaded and whether they really exist:

Console.WriteLine(Path);
Console.ReadKey();
...
public class MyXmlUrlResolver : XmlUrlResolver
 {
   public override Uri ResolveUri(Uri baseUri, string relativeUri)
     {
       // Print each URI as it is resolved, then continue with the default behavior.
       Console.WriteLine(baseUri + " " + relativeUri);
       Console.ReadKey();
       return base.ResolveUri(baseUri, relativeUri);
     }
  }
...
thisSchemaSet.XmlResolver = new MyXmlUrlResolver();

@JeffFerguson

Owner Author

commented Nov 6, 2018

> We currently solved this issue by providing an optional schemaLocation parameter to the XbrlDocument class. Is this something you would consider adding to the repository?

Yes, I believe that that is something that I should be able to do pretty easily. I might open up a separate issue for it, but, in either case, I should have some time to look into this over the weekend.

Thanks for the suggestion!

@Wimsteppe


commented Nov 16, 2018

> > We currently solved this issue by providing an optional schemaLocation parameter to the XbrlDocument class. Is this something you would consider adding to the repository?
>
> Yes, I believe that that is something that I should be able to do pretty easily. I might open up a separate issue for it, but, in either case, I should have some time to look into this over the weekend.
>
> Thanks for the suggestion!

That's great, looking forward to seeing the implementation! :)

@ValeriyTitov Yeah, the solution "kinda" works currently using our local code, but it would still be awesome (and much cleaner) to have the functionality available in the online repository, so we can use the NuGet package straight from the store and not be excluded from future functionality and bug fixes.

@Wimsteppe


commented Dec 3, 2018

@JeffFerguson Hi Jeff, any luck with the suggestion? Need any help with the implementation?
