Handling of DOIs #16

Closed
stansmith907 opened this issue Aug 13, 2014 · 16 comments

Comments

@stansmith907
Contributor

Places where doi occurs:

  • onlineResource{}
  • additionalIdentifier{}
  • and may be placed in "identifier" of resourceIdentifier{}
onlineResource{} occurs in
  • contact[]
  • resourceInfo>citation (but not displayed in 19115-2)
  • resourceInfo>resourceIdentifier (MD_Identifier under citation in ISO)
  • resourceInfo>extent>geographicElement>properties>assignedId[]
  • distribution>online
  • associatedResource>resourceCitation
  • associatedResource>resourceIdentifier (MD_Identifier under citation in ISO)
  • associatedResource>metadataCitation
  • additionalDocumentation
additionalIdentifier{} occurs in
  • resourceInfo>citation
  • resourceInfo>keywords (as citation)
  • resourceInfo>taxonomy (as citation)
  • resourceInfo>dataQualityInfo>lineage>source>citation
  • associatedResource>resourceCitation
  • associatedResource>metadataCitation
  • additionalDocumentation (as citation)
identifier of MD_Identifier occurs in
  • resourceInfo>resourceIdentifier
  • resourceInfo>extent>geographicElement>properties>assignedId[]
  • associatedResource>resourceIdentifier

Rules:

  • onlineResource in 19115-1 and -2 is a URL
  • onlineResource describes 'other' resources
  • MD_Identifier specifies identifiers by which the resource is known
  • additionalIdentifiers are URNs (e.g., urn:isbn:123654 or urn:doi:10/1256.321475266)

Recommendations:

for onlineResource ...

[ ] change uri to url to match ISO requirement
[ ] drop doi - people can make the url a resolvable doi if that is the preferred method of accessing the resource

for additionalIdentifier ...

[ ] no change - specify doi as URN; ISO writer will publish doi as MD_Identifier

for resourceIdentifier ...

[ ] no change
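
For the onlineResource recommendation, here is a minimal Python sketch of turning a DOI into a resolvable URL. The function names `doi_to_url` and `online_resource_for_doi` are hypothetical and not part of adiwgJSON; the only outside fact used is that https://doi.org/ is the public DOI resolver.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return a resolvable URL for a DOI such as '10.1000/182'.

    Also accepts the 'doi:' and 'urn:doi:' prefixed spellings.
    """
    doi = doi.strip()
    for prefix in ("urn:doi:", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the prefix/suffix slash, percent-encode anything unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


def online_resource_for_doi(doi: str) -> dict:
    """Hypothetical helper: an onlineResource that carries only a url."""
    return {"url": doi_to_url(doi)}


print(online_resource_for_doi("urn:doi:10.1000/182"))
```

With this approach the doi property is not needed in onlineResource; the DOI survives inside the url itself.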

@jlblcc
Member

jlblcc commented Aug 13, 2014

So we're only allowing DOI, ISBN, and ISSN identifiers in citation? This means that "custom" identifiers are only supported where identifier of MD_Identifier occurs - see above. Is everyone OK with that restriction?

FYI, from doi.org:

DOI is not registered as a URN namespace, despite fulfilling all the functional requirements, since URN registration appears to offer no advantage to the DOI System.

@stansmith907
Contributor Author

True, a "custom" identifier related to the resource would only go in MD_Identifier (resourceIdentifier in adiwgJSON). The additionalIdentifier{} block just has identifiers that have common, well-recognized names. We could add other identifiers later if we like. That saves space in the JSON, and if they are common we can always create the MD_Identifier citation for them to meet ISO standards. But I don't think we are missing any opportunity to specify all the identifiers we need.

@dwalt
Collaborator

dwalt commented Aug 14, 2014

To note a conversation Stan and I had, I had expressed concern about
additional identifiers needing to resolve to a URL. As an example, if we
have an agency id for a feature and want to add other agencies' identifiers
aliased to that same feature, would someone want a link and other citation
info for the aliased ids, or is providing the id good enough? I think
Stan's idea is that it could be handled in that case by creating an
additional resource for the aliased id. That would work fine for me, what
do other people think?


@jlblcc
Member

jlblcc commented Aug 14, 2014

I think space saved in the JSON is negligible in the larger scheme of things. For me, the issue is whether we want to maintain a "codelist" of identifier types, encode the types directly in the JSON (done currently via additionalIdentifier), or allow any identifier via an identifier[ ] array (like ISO). Personally, I find the "codelist" more appealing than using the identifier type as a property name. It's much easier to extend since we don't have to change the JSON to account for a new type of identifier.

In fact, this seems more appealing than the way we're currently handling resourceIdentifier, since, like MD_Identifier, there's no direct way to determine the type of identifier (I suppose you could try to infer it from the authority).

{
  "additionalIdentifier": [
    {
      "code": "10.1000/182", <= any string, required
      "type": "doi", <= from codelist, recommended but not required, if present could auto-generate authority in translator
      "authority": { <= could be citation block or just a contactId, not required
        "contactId": "1"
      }
    } 
  ]
}
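
To illustrate the "auto-generate authority in translator" idea, here is a rough Python sketch. The `KNOWN_AUTHORITIES` table and the `resolve_authority` function are assumptions made for illustration; they are not existing translator code, and the real codelist contents were still undecided at this point in the thread.

```python
# Hypothetical codelist: identifier type code -> authority title.
KNOWN_AUTHORITIES = {
    "doi": "Digital Object Identifier",
    "isbn": "International Standard Book Number",
    "issn": "International Standard Serial Number",
}


def resolve_authority(entry: dict):
    """Return the authority to publish for one additionalIdentifier entry.

    An explicit authority always wins. Otherwise a recognized type yields
    a generated authority, and an unknown or missing type yields None.
    """
    if not entry.get("code"):
        raise ValueError("code is required")
    if entry.get("authority") is not None:
        return entry["authority"]
    title = KNOWN_AUTHORITIES.get((entry.get("type") or "").lower())
    return {"title": title} if title else None


print(resolve_authority({"code": "10.1000/182", "type": "doi"}))
print(resolve_authority({"code": "abc-123"}))
```

Adding a new recognized type would then mean adding one row to the table rather than changing the JSON schema.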

@dwalt
Collaborator

dwalt commented Aug 14, 2014

I kind of like the codelist idea. This strikes me as similar to a contact's
role.


@stansmith907
Contributor Author

Josh -
changing additionalIdentifier to be an array of objects could work. However, authority would need to be a citation rather than a contact to fit into ISO MD_Identifier. We could strip out ISBN and ISSN entries to their proper homes; construct a known citation for DOI; ask for minimal citation information in authority {name, date (optional), contact (optional)}; and then not create an MD_Identifier ISO record if authority were missing. I would change "code": to "identifier": to match additionalIdentifier.

If authority were a citation, do you think we could drop the resourceIdentifier section? "type": would need to be required but NOT restricted. The advantages are that it all fits in one place by combining additionalIdentifier and resourceIdentifier, and it becomes clearer that these identifiers all describe the main resource.

{
  "additionalIdentifier": [
    {
      "identifier": "10.1000/182", <= any string, required
      "type": "doi", <= required, not restricted
      "authority": { <= not required
        "name": "Digital Object Identifier", <= required
        "date": "2014-08-14", <= not required
        "contact": "1" <= not required
      }
    } 
  ]
}

Note: additionalResources describe OTHER resources.

@jlblcc
Member

jlblcc commented Aug 14, 2014

If you want to use a citation, it should follow the base citation schema. I would not make title required in the JSON for identifiers; it generally contains redundant data in the context of an identifier. Just auto-fill it with the type, or with "assigned identifier", to satisfy ISO if it's not present in the JSON. I also wouldn't require type; making it required doesn't seem to serve a purpose if it's not restricted. You either provide a value from the codelist and get the benefit (e.g. doi, isbn), or you don't, and get a generic identifier.

Why would you not create a MD_Identifier ISO record if authority were missing? The only required element for MD_Identifier is the code.

We could drop resourceIdentifier.

{
  "identifier": [
    {
      "identifier": "10.1000/182", <= any string, required
      "type": "doi", <= not required, not restricted
      "authority": { <= not required
        "title": "Digital Object Identifier", <= not required
        "date": [ <= not required
          {
            "date": "2013-03-13",
            "dateType": "creation"
          }
        ],
        "responsibleParty": [ <= not required
          {
              "contactId": "1",
              "role": "originator"
          }
        ]
      }
    } 
  ]
}

@stansmith907
Contributor Author

I definitely agree that authority should follow the standard citation format, I was just posting too fast before my 11am conference call. The point I was trying to make was to include the ISO required fields plus contact but keep citation fields to a minimum. This field set makes it compatible with what we have for resourceIdentifier.

I don't think we need to auto-fill citation:title with additionalIdentifier:type. If we don't want to say who the authority is, why have an authority at all? It's not required. Just don't include it for that identifier. Of course title was only required if we do specify an authority.

Type not required, that's ok with me; it fits with the MD_Identifier standards. Type is really only used to extract the ISSN and ISBN for ISO, and otherwise to enhance readability in the adiwgJSON record. It does not transfer to ISO. But I think a pick list is too restrictive.
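
As a sketch of what "type is only used to extract the ISSN and ISBN for ISO" could look like in a writer, assuming the identifier[] entry shape proposed above. The function name `partition_identifiers` and the shape of its return value are hypothetical; the one outside fact relied on is that an ISO CI_Citation carries at most one ISBN and one ISSN.

```python
def partition_identifiers(identifiers: list) -> dict:
    """Split identifier[] entries into citation ISBN, ISSN, and generic ones.

    The first entry typed 'isbn' or 'issn' fills the matching citation
    slot. Everything else, including any repeat ISBN/ISSN, becomes a
    generic MD_Identifier-style record. The type itself is not copied.
    """
    result = {"ISBN": None, "ISSN": None, "identifier": []}
    for item in identifiers:
        kind = (item.get("type") or "").lower()
        if kind == "isbn" and result["ISBN"] is None:
            result["ISBN"] = item["identifier"]
        elif kind == "issn" and result["ISSN"] is None:
            result["ISSN"] = item["identifier"]
        else:
            record = {"code": item["identifier"]}
            if item.get("authority"):
                record["authority"] = item["authority"]
            result["identifier"].append(record)
    return result


example = [
    {"identifier": "123654", "type": "isbn"},
    {"identifier": "10.1000/182", "type": "doi"},
    {"identifier": "proj-42"},  # hypothetical custom identifier, no type
]
print(partition_identifiers(example))
```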

If we go this direction is everyone comfortable with dropping the resourceIdentifier section?

@jlblcc
Member

jlblcc commented Aug 14, 2014

I think having the codelist identifies those types that have special significance and may receive special treatment by the translator. Otherwise, I'm good with dropping resourceIdentifier.

@stansmith907
Contributor Author

Okay. Looking back at your example. I also like changing additionalIdentifier{} to identifier{}, since there is really no primary identifier anyway. The original meaning sort of got lost once the block expanded to handle all MD_Identifiers and we eliminated resourceIdentifier{}.

This means we should also drop resourceIdentifier{} from associatedResource[], and change the old additionalIdentifier{} block to identifier{} in all 7 of its locations.

I'm still not clear on what you are suggesting we do with type:? On one hand it would be nice to control it with a codelist for known types (isbn, issn, doi). On the other it would be necessary to have it open for organization specific types (ascProject, nsfAward, nccwscProject, lccProject, "gntpId", etc.). And I don't think we can anticipate all these types.

Another possible way to approach it would be to set "type": to "string" or "other" for these types and add details to the authority. This would be less useful in searching adiwgJSON metadata. But unrestricted types make searching problematic as well.

I think I lean toward just having type NOT restricted for now.

@stansmith907
Contributor Author

If we want to codelist the type; maybe broad categories would work like [ projectId, awardNum, grant, ...]

@jlblcc
Member

jlblcc commented Aug 14, 2014

I'm not saying that we restrict entries to the codelist using an enum constraint. Like our other codelist supported properties, it's more of a recommendation, e.g. you should use the code doi, isbn, etc. and the translator or other ADIwg entities recognize them. Users would still be able to enter custom types for their own purposes.

@stansmith907
Contributor Author

That sounds okay to me. But I was kind of warming to the idea of a broad category code list ...
[isbn, issn, doi, projectId, awardNum, grantNum, etc.]. That might be flexible enough when the authority fills in the details of "who" and "when". That way searching the JSON would still be viable.

I'm open to either a controlled category list for type or a non-restricted type.

Everything else good to go? Anything else before we change the schema and code? From anyone?

@jlblcc
Member

jlblcc commented Aug 14, 2014

The broad category code list sounds good, for our use. I just don't think we should enforce it in the schema.
We could potentially add a "codelist" or "cvl" JSON-schema keyword that's a URL pointing to the ADIwg code list definitions.
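
A minimal sketch of a codelist that is advisory rather than enforced, in Python. The list contents come from the categories floated in this thread; the function name `lint_identifier` and the errors/hints split are assumptions for illustration, not a description of any existing ADIwg tool.

```python
# Broad categories suggested in this thread. Advisory only.
SUGGESTED_TYPES = ["isbn", "issn", "doi", "projectId", "awardNum", "grantNum"]


def lint_identifier(entry: dict) -> dict:
    """Check one identifier entry.

    'errors' lists real problems (the identifier value is missing).
    'hints' lists soft advice, such as a type outside the codelist.
    A custom type never produces an error.
    """
    report = {"errors": [], "hints": []}
    if not entry.get("identifier"):
        report["errors"].append("identifier is required")
    id_type = entry.get("type")
    if id_type and id_type not in SUGGESTED_TYPES:
        report["hints"].append(
            f"type {id_type!r} is not in the suggested codelist; "
            "tools may treat it as a generic identifier"
        )
    return report


print(lint_identifier({"identifier": "10.1000/182", "type": "doi"}))
print(lint_identifier({"identifier": "site-7", "type": "lccProject"}))
```

An editor could surface the hints as prompts while still accepting the record.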

@stansmith907
Contributor Author

Agreed. Don't enforce it in the schema, but we could prompt it in the on-line editor. The writer would just look for items of interest specifically to it.

Are we ready to code?

@stansmith907
Contributor Author

After starting through implementation, one more thing:

I think we need to include an onlineResource block for identifier>authority. If we do, we will be able to provide a link to an authority page for lesser-known authorities such as GTN-P http://www.gtnp.org/index_e.html, and have an identifier block that will also function for our "assignedId" block in extent>geographicElement>properties>assignedId.

    "identifier": [
        {
            "identifier": "",
            "type": "",
            "authority": {
                "title": "",
                "date": [
                    {
                        "date": "0000-00-00",
                        "dateType": ""
                    }
                ],
                "responsibleParty": [
                    {
                        "contactId": "",
                        "role": ""
                    }
                ],
                "onlineResource": [
                    {
                        "url": "http://thisisanexample.com",
                        "protocol": "",
                        "name": "",
                        "description": "",
                        "function": ""
                    }
                ]
            }
        }
    ],

We suggest using this same block in both citation and geographicElement>properties.
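
Since only the identifier value itself is required, a small builder can leave out the empty optional fields instead of emitting blank strings. This is a sketch in Python; `make_identifier` and its parameters are hypothetical names, and the identifier value "site-001" is made up for the example.

```python
import json


def make_identifier(identifier, id_type=None, authority_title=None,
                    authority_url=None):
    """Build one identifier[] entry, omitting optional fields left empty."""
    if not identifier:
        raise ValueError("identifier is required")
    entry = {"identifier": identifier}
    if id_type:
        entry["type"] = id_type
    authority = {}
    if authority_title:
        authority["title"] = authority_title
    if authority_url:
        # Link to the authority's own page, as proposed for GTN-P.
        authority["onlineResource"] = [{"url": authority_url}]
    if authority:
        entry["authority"] = authority
    return entry


# Hypothetical assigned id for a geographic feature.
gtnp = make_identifier(
    "site-001",
    id_type="gtnpId",
    authority_title="GTN-P",
    authority_url="http://www.gtnp.org/index_e.html",
)
print(json.dumps({"identifier": [gtnp]}, indent=2))
```

The same entry could be placed under a citation or under geographicElement>properties without changes.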

@jlblcc changed the title from "Handling of doi s" to "Handling of DOIs" on Aug 19, 2014
jlblcc added a commit that referenced this issue Aug 26, 2014
onlineResource{} => removed doi
additionalIdentifiers{} => changed name to identifier[]:
 -objects have identifier, type, authority{};
 -authority has title, date[], responsibleParty[], and onlineResource[]
resourceIdentifier[] => removed sections
assignedId[] => replaced with identifier[]
@jlblcc jlblcc closed this as completed Aug 29, 2014
@jlblcc jlblcc modified the milestone: pre-2.1.0 May 17, 2017