URI vs Slug vs String #18

Robsteranium · 2018-02-28T08:01:51Z

The new loading architecture suggests to me a possible solution to the stringy-identifiers problem:

if the column configuration identifies the datatype as string:
- if the column configuration includes a value-template, then the cell value should be treated as ready for that template (i.e. already slugged or a code without spaces)
- else if the column configuration points at a component property that specifies a qb:codeList, then use the value in that cell to look up a code by its label
- else treat the value as a string literal (e.g. observation label)
else if the datatype is number:
- parse the string as a number (as per csvw)
else if the datatype is xsd:anyURI:
- parse the string as a URI (as per csvw). I think this would also allow CURIEs if the csv2rdf process already recognises the prefix
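To make the branching above concrete, here is a rough Python sketch of the dispatch. It is not table2qb code: the `column` dict and its keys (`name`, `datatype`, `value_template`, `codelist`), the `CellError` class and the shape of `codelists` and `prefixes` are all assumptions made for illustration.

```python
from decimal import Decimal
from urllib.parse import urlparse


class CellError(ValueError):
    """Raised when a cell cannot be interpreted under its column configuration."""


def interpret_cell(column, value, codelists=None, prefixes=None):
    """Return a (kind, value) pair for one CSV cell.

    `column` is a hypothetical dict with "name" and "datatype" keys and,
    optionally, "value_template" or "codelist".
    """
    codelists = codelists or {}  # assumed shape: codelist name -> {label: code URI}
    prefixes = prefixes or {}    # assumed shape: prefix -> namespace URI
    datatype = column.get("datatype", "string")

    if datatype == "string":
        if "value_template" in column:
            # The value is assumed to be template-ready (already slugged).
            placeholder = "{" + column["name"] + "}"
            return ("uri", column["value_template"].replace(placeholder, value))
        if "codelist" in column:
            labels = codelists.get(column["codelist"], {})
            if value not in labels:
                raise CellError(f"no code labelled {value!r} in {column['codelist']}")
            return ("uri", labels[value])
        return ("literal", value)

    if datatype == "number":
        return ("number", Decimal(value))

    if datatype == "anyURI":
        prefix, separator, local = value.partition(":")
        if separator and prefix in prefixes:
            # Expand a CURIE whose prefix is already known.
            return ("uri", prefixes[prefix] + local)
        if urlparse(value).scheme:
            return ("uri", value)
        raise CellError(f"{value!r} is neither a URI nor a known CURIE")

    raise CellError(f"unsupported datatype {datatype!r}")
```

For example, a column configured with `name` set to `year` and the year template would turn the cell "2018" into `http://reference.data.gov.uk/id/year/2018` without any slugging step.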

Given that multiple column names could theoretically be used to populate the same component, this gives us quite a bit of flexibility. For example, for specifying a reference period you might provide three columns all having sdmx-dim:refPeriod in their property_template configuration, but each with a different value_template:

"Year": "http://reference.data.gov.uk/id/year/{year}" accepting values like "2018"
"Government Year":"http://reference.data.gov.uk/id/government-year/{government-year}" accepting values like "2017-2018"
"Reference Period": "http://reference.data.gov.uk/id/{reference_period}" (a generic fallback) accepting values like "government-quarter/2009-2010/Q1"

Thus the uploader would indicate which kind of date they'd provided by using the appropriate column heading. They could then provide more than one kind of interval by either splitting the upload or by providing multiple (non-overlapping) columns in the table (i.e. some rows having Years, others Government Years but none both).
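A minimal sketch of how the three period columns might be resolved per row, assuming each row arrives as a dict of heading to cell value. `PERIOD_COLUMNS` and `reference_period` are hypothetical names, and the "exactly one column filled" rule is my reading of "non-overlapping".

```python
# Hypothetical mapping from column heading to (template variable, value template).
PERIOD_COLUMNS = {
    "Year": ("year", "http://reference.data.gov.uk/id/year/{year}"),
    "Government Year": (
        "government-year",
        "http://reference.data.gov.uk/id/government-year/{government-year}",
    ),
    "Reference Period": (
        "reference_period",
        "http://reference.data.gov.uk/id/{reference_period}",
    ),
}


def reference_period(row):
    """Return the reference period URI for a row (dict of heading -> cell)."""
    filled = []
    for heading in PERIOD_COLUMNS:
        value = (row.get(heading) or "").strip()
        if value:
            filled.append((heading, value))
    if len(filled) != 1:
        # Columns must not overlap: each row supplies exactly one kind of period.
        headings = [heading for heading, _ in filled]
        raise ValueError(f"expected exactly one period column, got {headings}")
    heading, value = filled[0]
    variable, template = PERIOD_COLUMNS[heading]
    return template.replace("{" + variable + "}", value)
```

A row with only "Government Year" set to "2017-2018" would then yield `http://reference.data.gov.uk/id/government-year/2017-2018`, while a row filling both "Year" and "Government Year" would be rejected.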

One implication of this is that we wouldn't be slugging anything in the observations input csv! We may still need to do so as part of the code-list pipeline (i.e. to create a skos:notation where none was provided).


Robsteranium · 2018-02-28T08:35:46Z

One possible problem with (blindly) passing values into templates is that we don't have context-sensitive validation of the inputs - i.e. it would be possible to provide invalid values like "20180" to the "Year" column (technically http://reference.data.gov.uk/id/year/20180 is valid and does resolve, but we know from the context that it's likely a mistake) or "2015-2018" to "Government Year".

We may be able to solve this using datatypes, although if we need to coin our own (xsd:gYear wouldn't catch either example), this may make the serialisation less portable.
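As a sketch of what a stricter, context-aware check could look like (whether it is expressed as a custom datatype or as plain validation), here are two hypothetical validators. The four-digit restriction and the "second year is first year plus one" rule are my assumptions about what counts as a plausible value; neither comes from an existing datatype.

```python
import re

# Assumed formats: a four-digit year, and two four-digit years joined by a hyphen.
YEAR = re.compile(r"[0-9]{4}")
GOVERNMENT_YEAR = re.compile(r"([0-9]{4})-([0-9]{4})")


def is_year(value):
    """Accept exactly four digits, so "20180" is rejected."""
    return YEAR.fullmatch(value) is not None


def is_government_year(value):
    """Accept "YYYY-YYYY" only when the second year directly follows the first."""
    match = GOVERNMENT_YEAR.fullmatch(value)
    return match is not None and int(match.group(2)) == int(match.group(1)) + 1
```

With these, "2018" and "2017-2018" pass, while "20180" and "2015-2018" are rejected.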

Alternatively we could ignore the problem at this level and instead pick it up with a later validation. We will in any case validate that dimension-values are recognised (i.e. ASK that ?obs ?dim/?p ?o) which would highlight these cases. We haven't actually designed a table2qb version of the intervals pipeline (it wouldn't fit the codelists pipeline as we want to build start/end instants etc.) - perhaps this will just need to be bespoke and include its own validation.
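The later check could be approximated in memory as follows. This is only an illustration of the idea behind the ASK pattern, not a SPARQL implementation: it assumes the observations and the reference data are available together as plain (subject, predicate, object) tuples, and `unrecognised_dimension_values` is a hypothetical name.

```python
def unrecognised_dimension_values(triples, dimensions):
    """Return dimension values that no triple describes.

    `triples` is an iterable of (subject, predicate, object) tuples and
    `dimensions` is the set of dimension property URIs.
    """
    triples = list(triples)
    # A value counts as recognised if it appears as the subject of any triple.
    described = {subject for subject, _, _ in triples}
    return sorted(
        {obj for _, predicate, obj in triples
         if predicate in dimensions and obj not in described}
    )
```

An observation pointing at `http://reference.data.gov.uk/id/year/20180` would be reported here as long as the loaded reference data contains no statements about that URI.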

Note that cases with qb:codeLists could raise a validation error if no code could be found bearing a skos:prefLabel or rdfs:label that matches the string provided. We wouldn't need to wait until a later (rdf-based) validation and could reject the uploaded csv immediately.
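A small sketch of that immediate rejection, assuming the code list's labels have already been collected into a set and that matching is exact. `codelist_errors` is a hypothetical helper, not part of table2qb.

```python
import csv
import io


def codelist_errors(csv_text, column, labels):
    """Return (row number, value) pairs whose value matches no code label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    errors = []
    # Row 1 is the header, so the first data row is row 2.
    for row_number, row in enumerate(reader, start=2):
        value = row[column]
        if value not in labels:
            errors.append((row_number, value))
    return errors
```

An upload would be rejected whenever the returned list is non-empty, and the row numbers give the uploader something to act on straight away.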

Adds HMRC Overseas Trade example and starts removing hard-coded config specific to the regional-trade example as per #22. The `is-dimension?` and `is-attribute?` sets are now drawn from the conventions - i.e. those columns with the respective component attachment property. The `values` seq now uses an equivalent `is-value?` set which includes those columns for which the conventions don't specify a component attachment (i.e. if the column is not a dimension, measure, or attribute then it must be a value). `standardise-measure` is now just `title->name`. `slugize-columns` is replaced by `transform-columns`, which is configured by a new convention, `value_transformation`, which also allows us to specify e.g. a `replace-symbols` transform (#18 may later allow us to derive these instead of setting them explicitly).

Robsteranium mentioned this issue Feb 28, 2018

Remove example-specific incidental config (derive from columns.csv) #22

Closed

Robsteranium mentioned this issue Mar 26, 2018

Wiring-up table2qb, csv2rdf.clj and grafter #20

Open

Robsteranium mentioned this issue Apr 3, 2018

Resolve approach to URI slugging #9

Closed

BillSwirrl closed this as completed Jul 11, 2018
