CDC 2.5 and 3.2 profiles: access CV information #104

Closed
cessda-bitbucket-importer opened this issue Dec 16, 2021 · 35 comments · Fixed by #151

@cessda-bitbucket-importer

Original report on BitBucket by Taina Jääskeläinen.


CESSDA will have an access vocabulary to which SPs will map their access categories. This will allow CDC users to narrow down their searches to open access data.

So far neither DDI-C nor DDI-L has an element for this, but Darren had an idea of how this information could be incorporated into the profiles. Important to have.

  • Making data truly open (just download) doubled data downloads at FSD, so users really are looking for open access data.

A CV element is promised for DDI 2.6.

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


Could dc: Access rights or dc: Rights elements be used?

@cessda-bitbucket-importer

Original comment by Darren Bell (GitHub: darrenbell2).


From: Wendy Thomas [email address removed]
Sent: 27 January 2022 17:52
To: Bell, Darren S <email address removed>
Subject: Re: Use of DC in Codebook

Ok first there is a place for data access and metadata access in 2.6

codeBook/stdyDscr/dataAccs/typeOfAccess

codebook/stdyDscr/metadataAcces/typeOfAccess

 

as for Dublin Core it is the last contained element in "citation"

 

On Wed, Jan 26, 2022 at 4:38 AM Bell, Darren S <email address removed> wrote:

Hi Wendy – sorry if this is a dumb question but Taina is talking about using dc:rights in a codebook 2.5 document.  I can see that codebook.xsd imports the dcterms schema:

<xs:import namespace="http://purl.org/dc/terms/" schemaLocation="dcterms.xsd"/>

which in turn imports the dc schema:

<xs:import namespace="http://purl.org/dc/elements/1.1/" schemaLocation="dc.xsd"/>

 

But how (and where) can I use “dc:rights” in a codebook XML document so that it can be validated properly?  Should I do a namespace and schema declaration for dc.xsd in the root element of the codebook XML document?

 

Thanks, Darren

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


Wendy said:

Lifecycle:

TypeOfAccess is an element of AccessType and is expressed in Access, DefaultAccess, and SampleFrameAccess; Access is available in Item. DDILIFE-3699 addresses this. (Taina: So DDI 3.3 has an element TypeOfAccess but DDI 3.2 does not seem to have it.)

Codebook:

The ability to describe a typeOfAccess for data and metadata has been added to Version 2.6

dcterms:accessRights is available, and I have talked with Darren about how this could be used and how to associate a controlled vocabulary with this item.

__________________

Another suggestion from a CESSDA deliverable D3 Specification for interoperable access conditions in CDC:

The technical capacities of OAI-PMH endpoints must also be examined; configuring OAI-PMH headers to set up a True or False filter seems promising.

http://www.openarchives.org/OAI/openarchivesprotocol.html#Record

@cessda-bitbucket-importer

Original comment by Former user.


I am not convinced that OAI setSpec is a good approach to solving this: it shifts metadata information into the logical organisation of records and is not interoperable beyond CESSDA.

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


The CDC Upgrade project thought that, while waiting for DDI 2.6, which solves this, DDI 2.5 could use

  • codeBook/stdyDscr/dataAccs/useStmt/conditions
  • codeBook/stdyDscr/dataAccs/useStmt/conditions/@ID

I don’t know if archives already have some other content in the element.

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


Any ideas about this one?

@cessda-bitbucket-importer

Original comment by Morten.Jakobsen (GitHub: MortenSikt).


We have had some talks on this, and the proposed position in codebook seems like an acceptable solution and is agreed.

But there are still some difficulties in order to achieve the desired result in CDC.

Not sure they are relevant to this Bitbucket issue itself, but I’ll note them down for anyone interested.

  1. As pointed out by Katja, @ID attributes need to be unique within the metadata record. This will cause an issue for multilingual records. How the @ID attributes should be defined depends on two questions:

    1. Is it only bilingual, with the SP language and English being transferred?

      1. If yes, we can use CESSDA-en and CESSDA-sp as the @ID values.
    2. Is it more than two languages?

      1. If yes, we need language-specific IDs for all languages being sent.
  2. There is a risk that the element is already used by SPs for different information. The element itself can be repeated, so there is no risk that we force SPs to delete or move metadata away from the element. As a solution, they can introduce a repetition of the element with the desired information for CDC. But the question is how we can handle this in CMV validation. The fixed value node config is currently being examined as a way to validate based on information in the @ID attribute, but the unanswered question is:

    1. For repeatable elements, will the test succeed if only one of the elements adheres to the required structure in the DDI profile?

      1. If yes, there is no need to change what is being sent from the SP apart from adding the metadata element.
      2. If not, we have to ask SPs who use this element for other information to exclude it in some way before sending their records to CESSDA.
  3. For 1a and 1b, the question remains whether the Fixed Value Node config can support matches against a list instead of one exact term.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


It would also be possible to do it like this:

Avoin

Öppen

Open

OR

use only a non-language-specific code:

O

@cessda-bitbucket-importer

Original comment by Morten.Jakobsen (GitHub: MortenSikt).


Oh, then I have misunderstood it. I thought that each instance of the language would require its own @‌ID. This makes it easier, thank you for this.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


If you have the same ID for two or more elements, the XML document is not valid.

This is not valid:

….

Avoin

Open

Öppen

….

Attaching a language code to the string CESSDA_Data_Access (if and only if one term from the CESSDA Data Access CV is used per study) makes the IDs valid.

This is valid:

….

Avoin

Open

Öppen

….
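
The element markup for the two cases above did not survive in this thread, so the following is only a minimal sketch of the uniqueness rule being described. The `conditions` markup, the `xml:lang` attributes and the `_fi`/`_en`/`_sv` suffix format are assumptions made for illustration, and `duplicate_ids` is a hypothetical helper that mimics the uniqueness constraint on ID values rather than running real schema validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text: str) -> list[str]:
    """Return the ID attribute values that occur more than once in the document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("ID") for el in root.iter() if el.get("ID") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Assumed markup: three language variants sharing one ID (the "not valid" case).
SHARED_ID = """
<useStmt>
  <conditions ID="CESSDA_Data_Access" xml:lang="fi">Avoin</conditions>
  <conditions ID="CESSDA_Data_Access" xml:lang="en">Open</conditions>
  <conditions ID="CESSDA_Data_Access" xml:lang="sv">Öppen</conditions>
</useStmt>
"""

# Assumed markup: a language code attached to each ID (the "valid" case).
LANG_SUFFIXED_IDS = """
<useStmt>
  <conditions ID="CESSDA_Data_Access_fi" xml:lang="fi">Avoin</conditions>
  <conditions ID="CESSDA_Data_Access_en" xml:lang="en">Open</conditions>
  <conditions ID="CESSDA_Data_Access_sv" xml:lang="sv">Öppen</conditions>
</useStmt>
"""

if __name__ == "__main__":
    print(duplicate_ids(SHARED_ID))
    print(duplicate_ids(LANG_SUFFIXED_IDS))
```

The same check applies to a whole XML document, whatever it contains, so any ID reused across several records in one document would be reported in the same way.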

@cessda-bitbucket-importer

Original comment by Morten.Jakobsen (GitHub: MortenSikt).


My concern with the languages was based on this being a free-text field, so I assume they have to be specified no matter what, but I might be wrong. My codebook knowledge is limited.

But in trying to imitate a CV here, it would make the most sense for the content of the conditions element to be a code, not a descriptive term. Then it would be easier for CDC to handle, since it is one value instead of many different ones that they need to resolve?

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


Yes, I agree, it might be better to use the code value only. I’m attaching the CESSDA Access Interoperability Vocabulary to show the code values. It is not published yet, first a review round is to be made.

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.



@cessda-bitbucket-importer

Original comment by Katja Moilanen.


In DDI-C, it is not mandatory to use language tags for free-text fields (if I checked the DDI-L specification correctly, it is not mandatory there either). If the DDI-C file is monolingual, the language is usually specified in the codeBook root element and all elements/attributes inherit the language of the root element. In multilingual files, elements have their own language tags, and in most cases all content has a language, either through its own language tag or by inheritance. So Morten, you are correct that the language is defined somewhere. Kuha2 seems to add a language tag to all elements, so having one element without a language might cause a problem.

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


Hmmm… that complicates things. Katja, would you say it would be best to use the descriptive term with language tags, or the code value with language tags?

@cessda-bitbucket-importer

Original comment by Morten.Jakobsen (GitHub: MortenSikt).


Can’t we just exclude language tags from the DDI Profile then for this element? Then CMV/CDC just ignores the information in the cases where it is present, or am I misunderstanding this?

edit: I mean that it would ignore the language tag, not the conditions element or @‌ID attribute

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


I asked Toni, who built the CESSDA metadata aggregator and Kuha2, about this. Information on whether the data is openly available is useful for other catalogues as well, not only for CDC. E.g. the BY-COVID portal (https://www.covid19dataportal.org/) will in future include metadata of the COVID-19-related datasets appearing in CDC. Toni said that using ID in this way does not work in the CESSDA metadata aggregator (or in Kuha2): there, one OAI-PMH XML document contains many entries and an ID must be unique within one XML document, so misusing ID to separate the CESSDA field from the others does not work.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


How about misusing //conditions/@elementVersion? It seems to be xs:string, so it could always get the value “CESSDA”, and it does not need to be unique.

@cessda-bitbucket-importer

Original comment by Morten.Jakobsen (GitHub: MortenSikt).


Well that’s not good news. I don’t really mind what we use as long as it is something that can work at this point. But is there not a higher risk that this attribute is already being used for versioning?

I was looking at //conditions/@ddiCodebookUrn, but I don’t know if this includes information from the @ID attribute the same way Lifecycle URNs do.

But there are not many options left here, so @elementVersion works for me.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


I think that few Codebook users use @elementVersion for versioning (it has only been available since DDI 2.5). The contents of @ddiCodebookUrn must be of type xs:anyURI. In any case, some attribute other than ID needs to be misused.

@cessda-bitbucket-importer

Original comment by Former user.


Sorry to add to the complications, but OpenAIRE requires ‘Rights’, see https://guidelines.openaire.eu/en/latest/data/field_rights.html#rightsuri-ma

This has the following allowed values:

  • info:eu-repo/semantics/closedAccess
  • info:eu-repo/semantics/embargoedAccess
  • info:eu-repo/semantics/restrictedAccess
  • info:eu-repo/semantics/openAccess

According to MDO’s current schema mapping (https://doi.org/10.5281/zenodo.5614658), the information should come from /codeBook/stdyDscr/dataAccs/useStmt/restrctn

So the solution of this issue should include the specification for that mapping, i.e. how the OpenAIRE CV is to be used in the transformation.
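
A minimal sketch of what such a mapping step could look like, assuming the DDI record ends up carrying a bare code such as `openAccess` and the transformation has to emit the full OpenAIRE value. The four allowed values are the ones listed above; the function name `to_rights_uri` is hypothetical and not part of any existing MDO or OpenAIRE tooling.

```python
# Prefix shared by the four allowed OpenAIRE values listed above.
RIGHTS_URI_PREFIX = "info:eu-repo/semantics/"

# Bare codes, i.e. the allowed values with the prefix stripped.
OPENAIRE_ACCESS_CODES = (
    "closedAccess",
    "embargoedAccess",
    "restrictedAccess",
    "openAccess",
)


def to_rights_uri(code: str) -> str:
    """Turn a bare access code into the full OpenAIRE value (hypothetical helper)."""
    cleaned = code.strip()
    if cleaned not in OPENAIRE_ACCESS_CODES:
        raise ValueError(f"not an OpenAIRE access code: {code!r}")
    return RIGHTS_URI_PREFIX + cleaned


if __name__ == "__main__":
    print(to_rights_uri("openAccess"))
```

Rejecting unknown codes, rather than passing them through, keeps free-text content from the source element out of the controlled field.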

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


I think that the current MDO - OpenAIRE mapping does not include OpenAIRE 16.1 RightsURI at all. I suppose that OpenAIRE 16. Rights is mapped to CDC codeBook/stdyDscr/dataAccs/useStmt/restrctn because both of these are free-text fields.

In OpenAIRE, 16. Rights and 16.1 RightsURI are not mandatory/required; 16. Rights may occur 0-n times and 16.1 RightsURI 0-1 times. (OpenAIRE 2. Creator, which is mandatory/required, may occur 1-n times.)

Do we need CESSDA’s own data access CV at all, if there is a need to use OpenAIRE CV?

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


@sharonbolton @MortenSikt What do you think about replacing the CESSDA CV with OpenAire vocabulary, would this work? It would of course mean asking SPs to do the exercise of mapping their access categories to it.

It solves the issue of embargoed access but the definition given for restricted access is rather problematic for CESSDA since it simplifies all types of restrictions to getting an email address in the last sentence. The definitions seem to be focused on articles and other publications.

Is someone from MDO in contact with OpenAire? It would be good to discuss possible amendments to the term definitions.

Or adapt the CESSDA CV to be more compatible with OpenAire by using closedAccess, embargoedAccess, restrictedAccess and openAccess as code values but adapt the definitions to fit the CESSDA case better?

OpenAire Access Rights vocabulary

SAME AS the e-prints accessRights property in the Scholarly Works Application Profile

see: http://purl.org/eprint/accessRights/

@cessda-bitbucket-importer

Original comment by Morten.Jakobsen (GitHub: MortenSikt).


If replacing the CESSDA CV with the OpenAire vocabulary solves other issues it is probably better. To me they look as general as the ones we intended, and there should not be a problem for SPs mapping to these.

On another note, we will misuse //conditions/@elementVersion, as Katja proposed, to specify that the content of the element is a code from the CV for 2.5.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


Especially if the decision will be to use two classes

  1. open (openAccess)
  2. restricted (restrictedAccess)

it does not really matter which vocabulary is used.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


Taken from Morten’s email:

As for the vagueness of the OpenAire CV definitions: the report will include guidelines for SPs on mapping to the CV, and these will use the definitions stated in the CESSDA Data Access Interoperability Vocabulary. In short, all data whose access falls within “Free download. May require registration to the system and/or accepting the terms of use online to gain access to the data. No restriction on the type of use.” will be mapped to openAccess. Data with any other restrictions outside that scope should be mapped by SPs to restrictedAccess.
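
One possible reading of this mapping rule, written out as a small sketch. The `AccessTerms` fields are hypothetical: they are one way an SP might describe its own access category, not something defined by CESSDA or OpenAIRE, and a real archive would need to map its actual categories by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessTerms:
    """Hypothetical description of how an SP makes a dataset available."""

    free_download: bool
    restricts_type_of_use: bool
    requires_registration: bool = False      # allowed within openAccess
    requires_terms_acceptance: bool = False  # allowed within openAccess


def map_to_access_code(terms: AccessTerms) -> str:
    """Map an SP access description to one of the two codes used by CDC."""
    # Registration and online acceptance of terms do not affect the result:
    # the quoted definition explicitly allows both for open access.
    if terms.free_download and not terms.restricts_type_of_use:
        return "openAccess"
    # Anything outside that scope is treated as restricted.
    return "restrictedAccess"


if __name__ == "__main__":
    registered_download = AccessTerms(
        free_download=True,
        restricts_type_of_use=False,
        requires_registration=True,
    )
    print(map_to_access_code(registered_download))
```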

@cessda-bitbucket-importer

Original comment by Taina Jääskeläinen.


@MortenSikt OK, with those mapping instructions I guess it would work and CESSDA could then use the OpenAire vocabulary.

The misuse of the DDI 2.5 element would be only temporary. The draft DDI 2.6 specification will be under review from next week. As Darren said, it contains ‘TypeOfDataAccess’. In the best case, the end-of-the-year DDI2 profiles could already contain that element. However, this depends on the review cycle.

But is there a suitable element in DDI 3.2? There is one in DDI 3.3.

@cessda-bitbucket-importer

Original comment by Morten.Jakobsen (GitHub: MortenSikt).


It should not be too hard to add it for 2.6 when that arrives.

There is no suitable element under Archive/Access in DDI 3.2, but it is possible to specify the CV using UserAttributePair (https://ddialliance.org/Specification/DDI-Lifecycle/3.2/XMLSchema/FieldLevelDocumentation/schemas/reusable_xsd/elements/UserAttributePair.html), which can basically be used to specify whatever you want.

But since a proper element exists in DDI 3.3 I am writing the specification for that standard.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


Esra and I checked DDI 3.2. We suggest using ddi:DDIInstance/s:StudyUnit/a:Archive/a:ArchiveSpecific/a:Item/a:Access/a:AccessTypeName/r:String in DDI 3.2 to say whether access is “openAccess” or “restrictedAccess”, with the attribute //a:AccessTypeName/@context used for the vocabulary name. We know that it is not a perfect solution, but we did not find a better one.

@cessda-bitbucket-importer

Original comment by Katja Moilanen.


Conclusion:

CDC will use the same Access Rights vocabulary as OpenAIRE. The name of the vocabulary is “info:eu-repo-Access-Terms vocabulary”. Only the codes “restrictedAccess” and “openAccess” will be used.

DDI2.5:
codeBook/stdyDscr/dataAccs/useStmt/conditions for the code and codeBook/stdyDscr/dataAccs/useStmt/conditions/@elementVersion for the vocabulary name
e.g.
<conditions elementVersion="info:eu-repo-Access-Terms vocabulary">openAccess</conditions>

DDI2.6:
codeBook/stdyDscr/dataAccs/typeOfAccess for the code and codeBook/stdyDscr/dataAccs/typeOfAccess/@vocabURI for the vocabulary name
e.g.
<typeOfAccess vocabURI="info:eu-repo-Access-Terms vocabulary">openAccess</typeOfAccess>

DDI3.2:
ddi:DDIInstance/s:StudyUnit/a:Archive/a:ArchiveSpecific/a:Item/a:Access/a:AccessTypeName/r:String for the code and ddi:DDIInstance/s:StudyUnit/a:Archive/a:ArchiveSpecific/a:Item/a:Access/a:AccessTypeName/@context for the vocabulary name
e.g.
<a:AccessTypeName context="info:eu-repo-Access-Terms vocabulary">
<r:String>openAccess</r:String>
</a:AccessTypeName>

DDI3.3:
ddi:DDIInstance/s:StudyUnit/a:Archive/a:ArchiveSpecific/a:Item/a:Access/a:TypeOfAccess for the code and ddi:DDIInstance/s:StudyUnit/a:Archive/a:ArchiveSpecific/a:Item/a:Access/a:TypeOfAccess/@controlledVocabularyName for the vocabulary name
e.g.
<a:TypeOfAccess controlledVocabularyName="info:eu-repo-Access-Terms vocabulary">openAccess</a:TypeOfAccess>

Guidance for SPs on how to use the info:eu-repo-Access-Terms vocabulary:
All data with access that fall within “Free download. May require registration to the system and/or accepting the terms of use online to gain access to the data. No restriction on the type of use.” will be mapped to openAccess. Any other restrictions that are not within that scope, the SPs should map to restrictedAccess.
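
As an illustration of the DDI 2.5 convention, here is a minimal sketch of how a harvester might pick the access code out of a record. It assumes the standard DDI Codebook 2.5 namespace (`ddi:codebook:2_5`); the function name `access_codes` and the trimmed sample record are hypothetical, and the sample is not meant to be schema-valid.

```python
import xml.etree.ElementTree as ET

DDI25_NS = {"ddi": "ddi:codebook:2_5"}
VOCAB_NAME = "info:eu-repo-Access-Terms vocabulary"
ALLOWED_CODES = {"openAccess", "restrictedAccess"}


def access_codes(codebook_xml: str) -> list[str]:
    """Return the access codes carried by marked 'conditions' elements."""
    root = ET.fromstring(codebook_xml)  # root is expected to be codeBook
    path = "ddi:stdyDscr/ddi:dataAccs/ddi:useStmt/ddi:conditions"
    codes = []
    for element in root.findall(path, DDI25_NS):
        # Skip repetitions that an SP uses for other (free-text) content.
        if element.get("elementVersion") != VOCAB_NAME:
            continue
        code = (element.text or "").strip()
        if code in ALLOWED_CODES:
            codes.append(code)
    return codes


# Trimmed, hypothetical record: one free-text repetition, one marked with
# the vocabulary name in elementVersion.
SAMPLE = """
<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <dataAccs>
      <useStmt>
        <conditions>Free-text conditions kept by the SP.</conditions>
        <conditions elementVersion="info:eu-repo-Access-Terms vocabulary">openAccess</conditions>
      </useStmt>
    </dataAccs>
  </stdyDscr>
</codeBook>
"""

if __name__ == "__main__":
    print(access_codes(SAMPLE))
```

Filtering on `elementVersion` is what lets a repeated `conditions` element with other content coexist with the one intended for CDC.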

@cessda-bitbucket-importer

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


@kpapag

@darrenbell2

Implemented the following in the 2.5, 2.5_mono, 1.2.2 and 1.2.2_mono profiles:



<pr:Used xpath="/codeBook/stdyDscr/dataAccs/useStmt/conditions" isRequired="true">
<r:Description>
<r:Content>Required: Mandatory</r:Content>
<r:Content>ElementType: Content element</r:Content>
<r:Content>ElementRepeatable: Yes</r:Content>
<r:Content>Usage: Controlled description of the data access (open access vs. restricted access).
CDC will use the same Access Rights vocabulary as OpenAIRE.
The name of the vocabulary is “info:eu-repo-Access-Terms vocabulary” - see
http://purl.org/eu-repo/semantics/#info-eu-repo-AccessRights
Only the codes “restrictedAccess” and “openAccess” will be used.</r:Content>
<r:Content>CDC_UI_Label: Open/Closed</r:Content>
<r:Content>CMM_Mapping: 1.4.1</r:Content>
</r:Description>
</pr:Used>
<pr:Used xpath="/codeBook/stdyDscr/dataAccs/useStmt/conditions/@elementVersion"
defaultValue="info:eu-repo-Access-Terms vocabulary" fixedValue="true" isRequired="false">
<r:Description>
<r:Content>Required: Mandatory if 'conditions' element is present</r:Content>
<r:Content>ElementType: Attribute</r:Content>
<r:Content>Usage: Always 'info:eu-repo-Access-Terms vocabulary'.</r:Content>
<r:Content>CMM_Mapping: 1.4.1</r:Content>
</r:Description>
<pr:Instructions>
<r:Content>
]]>
</r:Content>
</pr:Instructions>
</pr:Used>

@darrenbell2

Implemented the following in the CDC 3.2 profile:



<pr:Used xpath="/ddi:DDIInstance/s:StudyUnit/a:Archive/a:ArchiveSpecific/a:Item/a:Access/a:AccessTypeName/r:String" isRequired="true">
<r:Description>
<r:Content>Required: Mandatory</r:Content>
<r:Content>ElementType: Content element</r:Content>
<r:Content>ElementRepeatable: Yes</r:Content>
<r:Content>Usage: Controlled description of the data access (open access vs. restricted access).
CDC will use the same Access Rights vocabulary as OpenAIRE.
The name of the vocabulary is “info:eu-repo-Access-Terms vocabulary” - see
http://purl.org/eu-repo/semantics/#info-eu-repo-AccessRights
Only the codes “restrictedAccess” and “openAccess” will be used.</r:Content>
<r:Content>CDC_UI_Label: Open/Closed</r:Content>
<r:Content>CMM_Mapping: 1.4.1</r:Content>
</r:Description>
</pr:Used>
<pr:Used xpath="/ddi:DDIInstance/s:StudyUnit/a:Archive/a:ArchiveSpecific/a:Item/a:Access/a:AccessTypeName/@context"
defaultValue="info:eu-repo-Access-Terms vocabulary" fixedValue="true" isRequired="false">
<r:Description>
<r:Content>Required: Mandatory if '../a:AccessTypeName' element is present</r:Content>
<r:Content>ElementType: Attribute</r:Content>
<r:Content>Usage: Always 'info:eu-repo-Access-Terms vocabulary'.</r:Content>
<r:Content>CMM_Mapping: 1.4.1</r:Content>
</r:Description>
<pr:Instructions>
<r:Content>
]]>
</r:Content>
</pr:Instructions>
</pr:Used>
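
For comparison, a minimal sketch of reading the same information from a DDI 3.2 record along the XPath used in this profile. It assumes the usual DDI Lifecycle 3.2 namespace URIs for the `ddi`, `s`, `a` and `r` prefixes; the function name `access_codes_ddi32` and the trimmed sample record are hypothetical and not schema-valid.

```python
import xml.etree.ElementTree as ET

DDI32_NS = {
    "ddi": "ddi:instance:3_2",
    "s": "ddi:studyunit:3_2",
    "a": "ddi:archive:3_2",
    "r": "ddi:reusable:3_2",
}
VOCAB_NAME = "info:eu-repo-Access-Terms vocabulary"


def access_codes_ddi32(instance_xml: str) -> list[str]:
    """Return codes from AccessTypeName elements marked with the vocabulary name."""
    root = ET.fromstring(instance_xml)  # root is expected to be ddi:DDIInstance
    path = (
        "s:StudyUnit/a:Archive/a:ArchiveSpecific/"
        "a:Item/a:Access/a:AccessTypeName"
    )
    codes = []
    for name in root.findall(path, DDI32_NS):
        # Only names whose context attribute carries the vocabulary name count.
        if name.get("context") != VOCAB_NAME:
            continue
        string = name.find("r:String", DDI32_NS)
        if string is not None and string.text:
            codes.append(string.text.strip())
    return codes


# Trimmed, hypothetical record following the structure in the profile.
SAMPLE_DDI32 = """
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:s="ddi:studyunit:3_2"
                 xmlns:a="ddi:archive:3_2" xmlns:r="ddi:reusable:3_2">
  <s:StudyUnit>
    <a:Archive>
      <a:ArchiveSpecific>
        <a:Item>
          <a:Access>
            <a:AccessTypeName context="info:eu-repo-Access-Terms vocabulary">
              <r:String>restrictedAccess</r:String>
            </a:AccessTypeName>
          </a:Access>
        </a:Item>
      </a:ArchiveSpecific>
    </a:Archive>
  </s:StudyUnit>
</ddi:DDIInstance>
"""

if __name__ == "__main__":
    print(access_codes_ddi32(SAMPLE_DDI32))
```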

@darrenbell2

Will close, as the new 2.6 and 3.3 profiles will be based on 2.5/3.2 and hence will incorporate this information.

@matthew-morris-cessda

Reopening, as this hasn't been merged into main (or mdo-d6)

darrenbell2 added a commit that referenced this issue Sep 4, 2023
… Issue #64, Issue #65, Issue #66, Issue #67, Issue #68, Issue #69, Issue #70, Issue #71, Issue #103, Issue #104, Issue #110,

Issue #149. Added CMM Mapping annotation to all elements in all profiles.