urnresolver: Uniform Resource Names - URN Resolver #13

Open
fititnt opened this issue Mar 5, 2021 · 9 comments
Labels
epic proof-of-concept-already-exist Do exist proof of concept (or better) for this issue

Comments

@fititnt
Member

fititnt commented Mar 5, 2021

Quick links


[Screenshot: 2021-03-05 09-36-02]


"A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the urn scheme. URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable.[1] URNs cannot be used to directly locate an item and need not be resolvable, as they are simply templates that another parser may use to find an item." -- Wikipedia

To reference datasets (temporary internal name: hdataset) from different groups (temporary internal name: hsilo), it makes sense to have some way to standardize naming. URNs, even if complicated to implement in practice, could at least serve as a hint so that humans avoid using whatever creative idea they have at the moment. (This is actually more important if we implement localized translations as part of [meta issue] hxlm #11, with exact equivalence between translations.)

fititnt added a commit that referenced this issue Mar 5, 2021
…tead of script (more strict control, force to be python script, not generic system scrypt)
fititnt added a commit that referenced this issue Mar 5, 2021
fititnt added a commit that referenced this issue Mar 5, 2021
@fititnt
Member Author

fititnt commented Mar 5, 2021

Because of this topic, we will need to create some sort of local vault for permanent storage.

[Screenshot: 2021-03-05 15-58-25]

One idea for the urn:data: namespace: while some more complex namespaces may do whatever they want (including using full Unicode), we could have some base functionality so that a query like urnresolver urn:data:un:locode, if the files already exist on the local computer, returns the exact URI "$HOME/.config/hxlm/urn/data/un/locode/locode.csv" instead of returning an error and suggesting the documentation at https://unece.org/trade/uncefact/unlocode.

In fact, I think that instead of returning an error (unless the user forces one), the urnresolver could return ANOTHER URN (like urn:data-i:un:locode), and that URN would in turn return information such as https://unece.org/trade/uncefact/unlocode. This could help with direct usage via the command line.

Note: I know that urn:data:un:locode could "ideally" be something like urn:data:un:unece:locode, but "UN/LOCODE" is so well known that it could be worth allowing some types of aliases.
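
A minimal sketch of the lookup described above, not the actual urnresolver code. It assumes the directory layout implied by the single example path (the namespace identifier, then each remaining URN segment as a directory, then the last segment plus `.csv` as the file name), and it spells the fallback URN with the `-i` suffix as written in this comment. The function names `local_path_for` and `resolve` are hypothetical.

```python
from pathlib import Path


def local_path_for(urn: str, base_dir: Path) -> Path:
    """Map 'urn:data:un:locode' to <base_dir>/data/un/locode/locode.csv.

    The layout is an assumption generalized from one example path.
    """
    parts = urn.split(":")
    if len(parts) < 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    nid, segments = parts[1], parts[2:]
    return base_dir.joinpath(nid, *segments, segments[-1] + ".csv")


def resolve(urn: str, base_dir: Path) -> str:
    """Return the local file path if it exists, else an 'info' URN."""
    path = local_path_for(urn, base_dir)
    if path.is_file():
        return str(path)
    # No local copy: hand back another URN instead of raising an error.
    _, nid, rest = urn.split(":", 2)
    return f"urn:{nid}-i:{rest}"
```

With `base_dir` pointing at `$HOME/.config/hxlm/urn`, `resolve("urn:data:un:locode", base_dir)` would return the local CSV path when the file is present and `urn:data-i:un:locode` otherwise.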

fititnt added a commit that referenced this issue Mar 5, 2021
…te a formal ABNF (like the ISO URN, RFC5141); but ANTLR seems more friendly
fititnt added a commit that referenced this issue Mar 6, 2021
… buld full parser to avoid regex hell; I think maybe the early versions could be a bunch of effective if-elses
fititnt added a commit that referenced this issue Mar 6, 2021
…r before need to resort to full grammar checking; this at least could help locallized namespaces get what matter for then, while keeping the start of the 'urn:data' predictable
fititnt added a commit that referenced this issue Mar 6, 2021
…domain names as quick namespace; full unicode support need more testing (fallbacking to GenericUrnHtype instead of DataUrnHtype)
@fititnt
Member Author

fititnt commented Mar 6, 2021

This is the current result. In the baseline URN processing strategy, the component after the country/territory (likely the "organization" inside an already namespaced country/territory) could be either a single identifier or, since I'm not sure most people in the middle of an emergency would agree on a name, the domain name itself.

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ ./tests/test_core_urn.py
DataUrnHtype(value='urn:data--i:un:locode') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'un', 'bpln': 'locode', 'nss': 'un:locode'}
DataUrnHtype(value='URN:DATA--I:UN:LOCODE') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'URN', 'bpln': 'DATA--I', 'nss': 'URN:DATA--I:UN:LOCODE'}
DataUrnHtype(value='urn:data:un:locode') {'nid': 'data', 'nid_attr': 'd', 'bpgp': 'un', 'bpln': 'locode', 'nss': 'un:locode'}
DataUrnHtype(value='urn:data--i:xz:hxlcplp:fod:bool') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'xz', 'bpln': 'hxlcplp', 'nss': 'xz:hxlcplp:fod:bool'}
DataUrnHtype(value='urn:data:br:__saude.gov.br__:covid-19-vacinacao') {'nid': 'data', 'nid_attr': 'd', 'bpgp': 'br', 'bpln': 'saude.gov.br', 'bpln_isdn': True, 'nss': 'br:__saude.gov.br__:covid-19-vacinacao'}
DataUrnHtype(value='urn:data--i:cn:__中国.icom.museum__:test') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'cn', 'bpln': '中国.icom.museum', 'bpln_isdn': True, 'nss': 'cn:__中国.icom.museum__:test'}
DataUrnHtype(value='urn:data--i:ru:__россия.иком.museum__:test') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'ru', 'bpln': 'россия.иком.museum', 'bpln_isdn': True, 'nss': 'ru:__россия.иком.museum__:test'}
DataUrnHtype(value='urn:data--i:eg:__مصر.icom.museum__:test') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'eg', 'bpln': 'مصر.icom.museum', 'bpln_isdn': True, 'nss': 'eg:__مصر.icom.museum__:test'}
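
The field names in the output above suggest a fairly small parser. The following is a sketch only, not the project's `DataUrnHtype` implementation: the function name `parse_data_urn` is hypothetical, and the rules (a `--x` suffix on the namespace identifier becomes `nid_attr`, defaulting to `d`; a second component wrapped in `__...__` is treated as a domain name) are inferred from the sample output. Unlike the second line of that output, this sketch treats the `urn` prefix and the namespace identifier case-insensitively, as RFC 8141 specifies.

```python
def parse_data_urn(value: str) -> dict:
    """Split a 'urn:data[--attr]:<group>:<name>[:...]' string into parts."""
    scheme, nid_raw, nss = value.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {value!r}")

    # 'data--i' -> nid 'data', attribute 'i'; plain 'data' defaults to 'd'.
    nid, _, attr = nid_raw.lower().partition("--")

    parts = nss.split(":")
    if len(parts) < 2:
        raise ValueError(f"expected at least two components in {nss!r}")

    result = {
        "nid": nid,
        "nid_attr": attr or "d",
        "bpgp": parts[0],
        "bpln": parts[1],
        "nss": nss,
    }

    # A component wrapped in double underscores is read as a domain name.
    name = parts[1]
    if len(name) > 4 and name.startswith("__") and name.endswith("__"):
        result["bpln"] = name[2:-2]
        result["bpln_isdn"] = True
    return result
```

Because only the prefix and the namespace identifier are lowercased, the namespace-specific part keeps whatever case and script the caller supplied, which matters for the internationalized domain names shown above.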

The idea is for the urnresolver to resolve the most common URNs whenever an already prepared dataset exists at a path on the local filesystem. Even if implementers do not create something specific for the country or the organization, the default strategy would at least give people working with datasets a predictable place to put the files.

If the default is good enough, then even though documentation could always require humans to translate it manually, the default resolver could at least make the documentation directly usable!

fititnt added a commit that referenced this issue Mar 6, 2021
fititnt added a commit that referenced this issue Mar 7, 2021
…aming files to be just urn.csv,urn.json,urn.yml (this is more an suffix, since users could in fact search for entire paths)
fititnt added a commit that referenced this issue Mar 7, 2021
…raft of get_urn_resolver_remote() & get_urn_resolver_remote_authenticated()
fititnt added a commit that referenced this issue Mar 7, 2021
fititnt added a commit that referenced this issue Mar 7, 2021
…isCI, removed hardcoded path; renamed urn:data:xz:hxl:std:core:hashtag (std inspired on ISO RFC) to urn:data:xz:hxl:standard:core:hashtag (maybe core is not need?)
fititnt added a commit that referenced this issue Mar 7, 2021
… like when have several sources or URNs, allow urnresolver filter sources (at first just use file names)
fititnt added a commit that referenced this issue Mar 7, 2021
fititnt added a commit that referenced this issue Mar 8, 2021
… did not customized yet), file urnresolver-default.urn.yml
@fititnt
Member Author

fititnt commented Mar 8, 2021

The current version of HXL-Data-Science-file-formats is v0.7.3.

[Screenshot: 2021-03-08 10-12-13]

I think the tools for URN resolving deserve a separate group from the HXL2 topic. In fact, URN resolving would often be applied to content that is not HXLated yet, or would deal with issues that also relate to HXL, like very sensitive content (for example, how to name URLs that may be protected only by randomness, not by access control?).

fititnt added a commit that referenced this issue Mar 8, 2021
fititnt added a commit that referenced this issue Mar 8, 2021
fititnt added a commit that referenced this issue Apr 25, 2021
fititnt added a commit that referenced this issue Apr 26, 2021
fititnt added a commit that referenced this issue Apr 28, 2021
…cem:sexum:binarium & urn:data:xz:eticaai:ontologia:codicem:sexum:non-binarium
fititnt added a commit that referenced this issue Apr 28, 2021
fititnt added a commit that referenced this issue Apr 28, 2021
…:xz:hxl:standard:core:attribute, urn:data:xz:hxl:standard:master-vocabulary, urn:data:xz:hxl:cplp:hxl2tab
@fititnt
Member Author

fititnt commented Apr 28, 2021

I believe we should add a few more parameters to the urn.yml files. All the other options are new (most of them are not yet exposed as parameters in the current urnresolver CLI tool, but this could be done soon).

One change is that source is now fontem.

The urn.yml format

New format

Example 1

# Trivia:
#  - "fontem"
#    - https://en.wiktionary.org/wiki/fons#Latin
#  - "auxilium"
#    - https://en.wiktionary.org/wiki/auxilium#Latin
#  - "dēscrīptiōnem"
#    - https://en.wiktionary.org/wiki/descriptio#Latin
#  - "explānandum"
#    - https://en.wiktionary.org/wiki/explano#Latin



- urn: "urn:data:xz:hxl:standard:core:hashtag"
  descriptionem:
    eng-Latn: HXL/CSV version of the HXL Standard core hashtags.
  auxilium:
    - https://data.humdata.org/dataset/hxl-core-schemas
  fontem:
    - ontologia/codicem/hxl/standard/core/hashtag.hxl.csv
    - https://proxy.hxlstandard.org/data.csv?dest=data_edit&strip-headers=on&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI%2Fpub%3Fgid%3D319251406%26single%3Dtrue%26output%3Dcsv
    - https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/pub?gid=319251406&single=true&output=csv

Example 2

- urn: "urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica"
  descriptionem:
    eng-Latn: >
      Table with code references for body parts, in special
      Terminologia Anatomica (TA). Can be used with other ontologies and
      to transform for a few natural languages descriptions.
  explanandum:
    # Good references:
    - +v_fipat_ta2
    - +v_fipat_ta98_id
    - +v_fipat_ta98_latin
    # Generic references:
    - +v_wikidata
    - +v_fi_yso
    - +v_fr_universalis
    - +v_it_bncf
    - +v_jp_ndl
    - +v_uberon
    - +v_uk_britannica
    - +v_us_jstor
    - +v_us_mag
    - +v_us_mesh
    - +v_us_umls_cui
  auxilium:
    - https://github.com/HXL-CPLP/forum/issues/44
    - https://www4.unifr.ch/ifaa/Public/EntryPage/TA98%20Tree/HelpPage/TA98%20Latin%20Page%20Help.pdf
  exemplum:
    # Since terminologia-anatomica.hxl.csv is 1.8 MB, we only deploy a sample
    - ontologia/codicem/anatomiam/terminologia-anatomica-EXEMPLUM.hxl.csv
  fontem:
    # run ontologia/codicem/anatomiam/make.sh to get terminologia-anatomica.hxl.csv
    # or let the urnresolver download from live URNs
    - ontologia/codicem/anatomiam/terminologia-anatomica.hxl.csv
    - https://proxy.hxlstandard.org/data/b02a5f/download/HXL_CPLP-FOD_medicinae-legalis_humana-corpus.csv
    - https://docs.google.com/spreadsheets/d/10axnLpDNtAc8Bh921dz5XPXCwo0FUXRcKS6-ermiu5w/edit#gid=1622293684

Old format

# URNResolver v1.2.1
# hdp-toolchain v0.8.7.2

# @see https://data.humdata.org/dataset/hxl-core-schemas
- urn: "urn:data:xz:hxl:standard:core:hashtag"
  source:
    - ontologia/codicem/hxl/standard/core/hashtag.hxl.csv
    - https://proxy.hxlstandard.org/data.csv?dest=data_edit&strip-headers=on&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI%2Fpub%3Fgid%3D319251406%26single%3Dtrue%26output%3Dcsv
    - https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/pub?gid=319251406&single=true&output=csv
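
A small sketch of how a reader of these files might accept both formats, assuming each entry has already been loaded into a Python dict. The helper names (`upgrade_entry`, `pick_fontem`, `is_remote`) are hypothetical, and the "local file first, then remote URL" order is an assumption based on the comment in Example 2 about letting the urnresolver download the data.

```python
from typing import Callable, Optional


def is_remote(candidate: str) -> bool:
    """Treat http(s) entries as remote; everything else as a local path."""
    return candidate.startswith(("http://", "https://"))


def upgrade_entry(entry: dict) -> dict:
    """Return a copy of the entry with the old 'source' key renamed to 'fontem'."""
    upgraded = dict(entry)
    if "source" in upgraded and "fontem" not in upgraded:
        upgraded["fontem"] = upgraded.pop("source")
    return upgraded


def pick_fontem(entry: dict, exists: Callable[[str], bool]) -> Optional[str]:
    """Prefer the first local path that exists, else the first remote URL."""
    candidates = upgrade_entry(entry).get("fontem", [])
    for candidate in candidates:
        if not is_remote(candidate) and exists(candidate):
            return candidate
    for candidate in candidates:
        if is_remote(candidate):
            return candidate
    return None
```

Passing `os.path.exists` as `exists` gives the filesystem behavior; a stub function makes the selection logic easy to test without touching disk.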

fititnt added a commit that referenced this issue Apr 28, 2021
…ions descriptionem, auxilium, explanandum, exemplum
@fititnt
Member Author

fititnt commented Apr 28, 2021

Added reverse search, like urnresolver -?? +v_iso15924 or urnresolver -?? country+code+v_iso2

I believe we will need to build a table that hints that some codes, like country+code+v_iso2, could also mean other variants of ISO 3166-1. This could help automate the search for what something means.

# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver -?? +v_iso15924
urn:data:xz:eticaai:ontologia:codicem:linguam
# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver -?? country+code+v_iso2
urn:data:xz:eticaai:ontologia:codicem:locum
# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver --urn-explanandum-list
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_fipat_ta2
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_fipat_ta98_id
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_fipat_ta98_latin
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_wikidata
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_fi_yso
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_fr_universalis
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_it_bncf
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_jp_ndl
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_uberon
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_uk_britannica
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_us_jstor
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_us_mag
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_us_mesh
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica	+v_us_umls_cui
urn:data:xz:eticaai:ontologia:codicem:sexum:binarium	+v_iso5218
urn:data:xz:eticaai:ontologia:codicem:sexum:binarium	+v_iso5218_extended
urn:data:xz:eticaai:ontologia:codicem:sexum:binarium	+v_fipat_ta98_latin
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7	+v_iso5218
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7	+v_iso5218_extended
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7	+v_us_cdc_sex
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7	+v_un_icao_sex
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7	+v_us_NAACCR
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7	+v_us_census_sex
urn:data:xz:eticaai:ontologia:codicem:sexum:non-binarium	+lat_codices_anonyma
urn:data:xz:eticaai:ontologia:codicem:sexum:non-binarium	+v_iso5218_extended
urn:data:xz:eticaai:ontologia:codicem:linguam	+v_iso15924
urn:data:xz:eticaai:ontologia:codicem:locum	country+code+v_iso2
urn:data:xz:eticaai:ontologia:codicem:locum	country+code+v_iso3
urn:data:xz:eticaai:ontologia:codicem:locum	+v_hrinfo_country
urn:data:xz:eticaai:ontologia:codicem:locum	+v_reliefweb
urn:data:xz:eticaai:ontologia:codicem:locum	country+code+v_reliefweb
# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver -? urn:data:xz:hxl:standard:core:attribute
[
    {
        "urn": "urn:data:xz:hxl:standard:core:attribute",
        "descriptionem": {
            "eng-Latn": "HXL/CSV version of the HXL Standard core attributes."
        },
        "auxilium": [
            "https://data.humdata.org/dataset/hxl-core-schemas"
        ],
        "fontem": [
            "ontologia/codicem/hxl/standard/core/hashtag.hxl.csv",
            "https://proxy.hxlstandard.org/data.csv?dest=data_view&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI%2Fpub%3Fgid%3D1810309357%26single%3Dtrue%26output%3Dcsv&strip-headers=on",
            "https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/pub?gid=1810309357&single=true&output=csv"
        ],
        "urnref": "urnresolver-default.urn.yml"
    }
]
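
The reverse search shown above can be thought of as an inverted index from each explanandum value to the URNs that list it. This is a sketch under that assumption, not the urnresolver implementation: the function names are hypothetical, entries are plain dicts shaped like the urn.yml examples, and matching is exact string equality.

```python
from collections import defaultdict


def build_reverse_index(entries: list) -> dict:
    """Map each explanandum value to the URNs that declare it, in file order."""
    index = defaultdict(list)
    for entry in entries:
        for tag in entry.get("explanandum", []):
            if entry["urn"] not in index[tag]:
                index[tag].append(entry["urn"])
    return index


def reverse_search(index: dict, query: str) -> list:
    """Return every URN whose explanandum list contains the query."""
    return list(index.get(query, []))


# Entries abbreviated from the --urn-explanandum-list output above.
entries = [
    {"urn": "urn:data:xz:eticaai:ontologia:codicem:sexum:binarium",
     "explanandum": ["+v_iso5218", "+v_iso5218_extended"]},
    {"urn": "urn:data:xz:eticaai:ontologia:codicem:sexum:hl7",
     "explanandum": ["+v_iso5218", "+v_us_cdc_sex"]},
    {"urn": "urn:data:xz:eticaai:ontologia:codicem:linguam",
     "explanandum": ["+v_iso15924"]},
]
index = build_reverse_index(entries)
```

A lookup table for equivalent codes, such as the ISO 3166-1 variants mentioned above, could be applied to the query before calling `reverse_search`.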

@sabas

sabas commented Apr 29, 2021

👋 Thank you!

fititnt added a commit that referenced this issue Apr 30, 2021
fititnt added a commit that referenced this issue Apr 30, 2021
fititnt added a commit that referenced this issue Apr 30, 2021