Metadata-free conformance and additional columns #50

Closed
4 tasks
Anaphory opened this issue Nov 13, 2017 · 2 comments

Anaphory commented Nov 13, 2017

The standard says

A dataset can be CLDF conformant without providing a separate metadata description file. To do so, the dataset must follow the default specification for the appropriate module regarding

I had assumed I could add additional columns, which would simply have no well-defined semantics and string as their only possible datatype. But when I tried to load

ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source,Cognate_Set
0,abai1240,feature1,form,,,,0
1,afad1236,feature1,form,,,,1
2,ambu1247,feature1,form,,,,0

I can get the Cognate_Set column by using iterdicts, but I cannot ask the Dataset object whether that column exists.

  • Clarify in the CLDF specs whether additional columns are permitted under metadata-free conformance (I'll raise a separate issue there)
  • If they are permitted, sniff the table header to add them to the table spec
  • In any case, unify column existence between tableSpec and iterdicts
  • Fix cldf validate to enforce the specs

(Or convince me that the current state is as it should be, as has happened often enough, and add a bit of documentation about it somewhere.)
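To make the header-sniffing task concrete, here is a minimal sketch using only the standard library. `DEFAULT_COLUMNS` and `sniff_extra_columns` are hypothetical names for illustration; the column list is simply the example header minus `Cognate_Set`, not necessarily the library's actual default.

```python
import csv
import io

# Assumption: stand-in for the default column names of the module;
# taken from the example header above, not from the library itself.
DEFAULT_COLUMNS = [
    "ID", "Language_ID", "Parameter_ID", "Form",
    "Segments", "Comment", "Source",
]


def sniff_extra_columns(csv_file, known_columns):
    """Return header names that are not in known_columns, in file order."""
    # Only the first row is read; an empty file yields no extra columns.
    header = next(csv.reader(csv_file), [])
    known = set(known_columns)
    return [name for name in header if name not in known]


data = io.StringIO(
    "ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source,Cognate_Set\n"
    "0,abai1240,feature1,form,,,,0\n"
)
extra = sniff_extra_columns(data, DEFAULT_COLUMNS)
print(extra)  # ['Cognate_Set']
```

Only the header row is read here, so the cost of sniffing would be small, but it still means touching the data file when the dataset is loaded.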

@xrotwang
Contributor

The scenario you describe isn't a problem of "metadata-free conformance". Even with a description file, iterdicts may return dictionaries with more keys than listed in the tableSchema. So I guess there are two levels of support for additional columns:

  • Implicit: You'd have to inspect the first dictionary returned by iterdicts
  • Explicit: Whatever is listed as a non-virtual column in the description
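The two levels can be compared roughly as follows. This is a sketch under assumptions: `csv.DictReader` stands in for iterdicts, `table_schema` is a hand-written fragment shaped like a CSVW table schema, and the helper names are made up for illustration.

```python
import csv
import io

# Assumption: hand-written schema fragment, including one virtual column.
table_schema = {
    "columns": [
        {"name": "ID"},
        {"name": "Form"},
        {"name": "Form_Length", "virtual": True},
    ]
}


def explicit_columns(schema):
    """Names of the non-virtual columns listed in the description."""
    return [c["name"] for c in schema["columns"] if not c.get("virtual", False)]


def implicit_columns(rows):
    """Keys of the first row dictionary, or [] if there are no rows."""
    first = next(iter(rows), None)
    return list(first) if first is not None else []


# csv.DictReader plays the role of iterdicts in this sketch.
rows = csv.DictReader(io.StringIO("ID,Form,Cognate_Set\n0,form,0\n"))
explicit = explicit_columns(table_schema)
implicit = implicit_columns(rows)
undeclared = [name for name in implicit if name not in explicit]
print(undeclared)  # ['Cognate_Set']
```

Note that the implicit check consumes the first row of the iterator, so code relying on it would need to re-open or re-iterate the data afterwards.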

I wouldn't want to force-add all columns to the description, because this would require reading (parts of) the data file right away and because it would interfere with the explicitness of the description. So, in terms of the Zen of CLDF, I'd say:

  • Whatever isn't listed in the description shouldn't be accessed by CLDF-aware software.
  • If the default descriptions for metadata-free conformance don't list what you want to access, create an explicit, more inclusive description first, which can subsequently also serve as documentation for your code.
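A more inclusive description for the example above might be generated along these lines. This is only a sketch: the file name `forms.csv`, the helper `build_description`, and the pared-down column entries are assumptions, and a real CLDF description would typically carry more detail (such as property URLs) than is shown here.

```python
import json


def build_description(url, column_names):
    """Build a minimal CSVW-style description listing every column explicitly."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "tables": [
            {
                "url": url,
                "tableSchema": {
                    # Every column gets the plain string datatype here.
                    "columns": [
                        {"name": name, "datatype": "string"}
                        for name in column_names
                    ]
                },
            }
        ],
    }


columns = [
    "ID", "Language_ID", "Parameter_ID", "Form",
    "Segments", "Comment", "Source", "Cognate_Set",
]
# Assumption: "forms.csv" is just an illustrative file name.
description = build_description("forms.csv", columns)
listed = [c["name"] for c in description["tables"][0]["tableSchema"]["columns"]]
print("Cognate_Set" in listed)  # True
print(json.dumps(description, indent=2))
```

Once `Cognate_Set` is listed in the description, code can check for the column there instead of inspecting the first row.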

@Anaphory
Author

Anaphory commented Nov 13, 2017

In that case, I think it's good enough to document this behaviour in the iterdicts docstring (which is completely missing at the moment: clld/clldutils#60) and in the CLDF specs.
