Manifest payload naming standard #39

Open
Sveino opened this issue May 12, 2023 · 3 comments

Comments

@Sveino
Collaborator

Sveino commented May 12, 2023

CIM in general shall not include any naming standard. The main reason for including the manifest and DCAT is so that implementations do not have to rely on a naming standard. However, for testing, and wherever users need to interact at the technical level, a naming recommendation is needed.
This shall follow the same logic as cim:IdentifiedObject.mRID and cim:IdentifiedObject.name, where mRID is the machine-interpreted identification and name is the user "identification".
A naming standard can also be useful for simple file-based archiving tools that rely on the file name.

The updated naming standard needs to cover the needs of the CGMES profiles in the CGM and TYNDP processes, in addition to the CSA, CCC, OPC and STA processes.

The current CGM naming standard is documented in: Quality of CGMES datasets and calculations v3.3
3.2 FILE NAME AND FILE HEADER
The following mask is to be used to have a valid file name:
(snip)
[image: file name mask from QoCDC section 3.2]

Example from QoCDC:

  • 20180118T0930Z_1D_APG_SSH_001.xml
  • 20180117T2230Z_1D_APG_EQ_001.xml
  • 20180117T2230Z__APG_EQ_001.xml
  • 20180118T1130Z_1D_TSCNET-EU_SV_001.xml
  • 20180118T1130Z_1D_TSCNET-EU-APG_SSH_001.xml

The items in the naming standard need to be found in the header, so that tools can generate the name from an information model and the name is consistent with the content of the payload.

<dcat:startTime>_<dcterms:publisher>_<prov:wasGeneratedBy>_[dcat:version]

dcat:startTime:
Taken from the dcat:Dataset. If there are multiple datasets with different startTime values, the prov:generatedAtTime of the manifest (collection) is used.

dcterms:publisher:
Taken from the dcat:Dataset. If there are multiple datasets with different publishers, the publisher of the manifest (collection) is used.

prov:wasGeneratedBy:
Taken from the dcat:Dataset. If there are multiple datasets with different wasGeneratedBy values, the wasGeneratedBy of the manifest (collection) is used.
prov:wasGeneratedBy is an association to the abstract prov:Activity that produced the prov:Entity.
The name includes:

  • Process Type: CGM, TYNDP etc
  • Time Horizon: Year-ahead, Month-ahead etc
  • Run
  • Iteration
  • Profile
    E.g. for the following instance files, these activities are relevant:
    EQ/RA -> CGM, CGM1Y, TYNDP
    SSH/TP/SV -> IN, TYNDP, 1Y, 1M, 1W, 6...1D, ID
    RAS -> IN, TYNDP, 1Y, 1M, 1W, 6...1D, ID
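As a rough illustration of how the activity name could be assembled from the components listed above, here is a minimal sketch. The `Activity` class and its field names are hypothetical. The hyphen separator and the component order (process type, time horizon, run, iteration, profile) are only inferred from names such as `CGM-1D-SSH` and `CGM-1D-r1-RAS` in this proposal and are not a confirmed rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activity:
    """Hypothetical container for the parts of a prov:Activity name."""

    process_type: Optional[str] = None   # e.g. "CGM", "TYNDP"
    time_horizon: Optional[str] = None   # e.g. "1D", "1Y"
    run: Optional[str] = None            # e.g. "r1"
    iteration: Optional[str] = None
    profile: Optional[str] = None        # e.g. "EQ", "SSH", "RAS"

    def name(self) -> str:
        # Assumption: components are joined with "-" and empty ones are skipped.
        parts = [
            self.process_type,
            self.time_horizon,
            self.run,
            self.iteration,
            self.profile,
        ]
        return "-".join(p for p in parts if p)


print(Activity(process_type="CGM", time_horizon="1D", profile="SSH").name())
print(Activity(process_type="TYNDP", profile="EQ").name())
```

Skipping empty components is what lets an EQ without a time horizon come out as `CGM-EQ` or `TYNDP-EQ`.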

_[dcat:version]:
This refers to the dcat:Dataset, where a new dcat:Dataset with the same validity period replaces the previous version and makes it no longer valid. The naming should follow semantic versioning, e.g. https://semver.org/, where _[1.0.0] is the default and is optional to use. Any version other than the default must be included in the name.
E.g. the same EQ is exchanged for the TYNDP:

  • 20230101_APG_TYNDP-EQ.xml
  • 20230101_APG_TYNDP-EQ_[1.0.0].xml

Example for CGM:

  • 20180118T0930Z_1D_APG_SSH_001.xml -> 20180118T0930Z_APG_CGM-1D-SSH.xml
  • 20180117T2230Z_1D_APG_EQ_001.xml -> 20180117T2230Z_APG_CGM-1D-EQ.xml
  • 20180117T2230Z__APG_EQ_001.xml -> 20180117T2230Z_APG_CGM-EQ.xml
  • 20180118T1130Z_1D_TSCNET-EU_SV_001.xml -> 20180118T1130Z_TSCNET-EU_CGM-1D-SV.xml
  • 20180118T1130Z_1D_TSCNET-EU-APG_SSH_001.xml -> 20180118T1130Z_TSCNET-EU-APG_CGM-1D-SSH.xml
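The mapping in the CGM examples above could be sketched as a small converter. The regular expression is only inferred from the QoCDC example names in this issue (the actual mask is not reproduced here), and the function name `convert_legacy_name` is hypothetical. As in the examples, the legacy three-digit version is dropped and the process type is assumed to be `CGM`.

```python
import re

# Inferred from the examples: <time>_<horizon>_<party>_<profile>_<version>.xml
# The horizon may be empty, and the party may contain hyphens but no underscores.
_LEGACY_NAME = re.compile(
    r"^(?P<time>\d{8}T\d{4}Z)"
    r"_(?P<horizon>[^_]*)"
    r"_(?P<party>[^_]+)"
    r"_(?P<profile>[^_]+)"
    r"_(?P<version>\d{3})\.xml$"
)


def convert_legacy_name(name, process_type="CGM"):
    """Rewrite a QoCDC-style file name into the proposed pattern."""
    match = _LEGACY_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised legacy name: {name!r}")
    # Build the activity part, skipping an empty time horizon.
    parts = [process_type, match["horizon"], match["profile"]]
    activity = "-".join(p for p in parts if p)
    return f"{match['time']}_{match['party']}_{activity}.xml"


print(convert_legacy_name("20180118T0930Z_1D_APG_SSH_001.xml"))
print(convert_legacy_name("20180117T2230Z__APG_EQ_001.xml"))
```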

Example for TYNDP:

  • 20230101_APG_TYNDP-EQ.xml

Example for CSA:

  • 20230512T2230Z_APG_CGM-RA.xml
  • 20230512T2230Z_APG_CGM-1D-r1-RAS.xml
@Haigutus
Owner

I would propose a rule that the filename can contain only data that can be extracted from the file header.

Reasoning:

  1. Currently some metadata that is not present inside the file is added to the filename, which makes filename parsing a mandatory step. To avoid this in future we should enforce the rule, and if additional metadata is needed, the file header/manifest needs to be extended first.

  2. The filename can be created automatically at the moment of storage by extracting the relevant metadata from the file header.
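To illustrate point 2 together with the fallback described in the opening comment (use the dataset value when all datasets agree, otherwise the value from the manifest), here is a minimal sketch. It assumes the header metadata has already been parsed into plain dictionaries keyed by property name. That representation and the names `resolve_name_fields` and `_MANIFEST_FALLBACK` are hypothetical.

```python
# Which manifest property to fall back to for each name field.
# For the start time the opening comment names prov:generatedAtTime.
_MANIFEST_FALLBACK = {
    "dcat:startTime": "prov:generatedAtTime",
    "dcterms:publisher": "dcterms:publisher",
    "prov:wasGeneratedBy": "prov:wasGeneratedBy",
}


def resolve_name_fields(datasets, manifest):
    """Pick one value per name field from the datasets or the manifest."""
    resolved = {}
    for field, fallback in _MANIFEST_FALLBACK.items():
        values = {d[field] for d in datasets if field in d}
        if len(values) == 1:
            # All datasets agree, so the dataset value is used.
            resolved[field] = values.pop()
        else:
            # Datasets differ (or none carries the field): use the manifest.
            resolved[field] = manifest[fallback]
    return resolved


datasets = [
    {"dcat:startTime": "20180118T1130Z", "dcterms:publisher": "APG",
     "prov:wasGeneratedBy": "CGM-1D-SSH"},
    {"dcat:startTime": "20180118T1130Z", "dcterms:publisher": "TSCNET-EU",
     "prov:wasGeneratedBy": "CGM-1D-SSH"},
]
manifest = {"prov:generatedAtTime": "20180118T1200Z",
            "dcterms:publisher": "TSCNET-EU",
            "prov:wasGeneratedBy": "CGM-1D"}
print(resolve_name_fields(datasets, manifest))
```

Because every value comes from the header or manifest, the name can be regenerated at any time and never has to be parsed to recover metadata.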

@Sveino
Collaborator Author

Sveino commented May 12, 2023

@Haigutus Yes, definitely - I was hoping this would come out clearly from the text above. In the discussion with CSA it became clear that we need to have a name, and my proposal above is based on this, making sure that we can cover the current requirements. The next step would be to come up with a proposal that uses our current header data.

@Sveino
Collaborator Author

Sveino commented May 19, 2023

Updated the above so that _[dcat:version] refers to dcat:Dataset and not dcat:Distribution. It now refers to the case where a dataset is replaced by a new version with the same metadata, e.g. start and end of the validity period. dcat:version will follow semantic versioning, e.g. https://semver.org/.
