Create ADR to propose new field for host institution #133

Merged: jonavellecuerdo merged 1 commit into main from GDT-205-adr-institution-information on Feb 26, 2024

Conversation

jonavellecuerdo
Contributor

Purpose and background context

This PR introduces a new Architecture Decision Record (ADR) that proposes to add a new institution field to the TIMDEX data model.

This discussion was kicked off when seeking to understand what TIMDEX fields to use for displaying the host institution on GeoData's "result" and "full record" pages for non-MIT GIS resources.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

YES

What are the relevant tickets?

@jonavellecuerdo jonavellecuerdo self-assigned this Feb 23, 2024
@jonavellecuerdo jonavellecuerdo changed the title Create ADR to propose new field for host instution Create ADR to propose new field for host institution Feb 23, 2024
Contributor

@ghukill ghukill left a comment


With this flurry of ADRs and data model discussions, I found @JPrevost's comments helpful in thinking about data model solutions and change over time. Specifically, how we might have an institution field at the GraphQL layer, and while it may point to TIMDEX.institution for now, if we choose to move where that data is stored, that's just a mapping change at the GraphQL layer. It gets more complicated when you introduce objects and filtering by sub-fields.

For this reason, I approve this ADR. Creating a top level field called institution is simple and meets the current need. Thanks for the write-up @jonavellecuerdo!
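
The "just a mapping change" idea above could be sketched roughly as follows. This is an illustration only, written in Python for brevity: the actual TIMDEX GraphQL layer is not shown in this thread, and `FIELD_MAP`, `resolve_field`, and the sample record are hypothetical names.

```python
from typing import Any

# Hypothetical mapping from API-facing field names to dotted paths in the
# stored record. If the data moves in storage, only this mapping changes;
# API consumers keep asking for "institution".
FIELD_MAP: dict[str, str] = {
    "institution": "institution",
}


def resolve_field(record: dict[str, Any], api_field: str) -> Any:
    """Return the stored value for an API field, or None if it is absent."""
    value: Any = record
    for key in FIELD_MAP[api_field].split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Made-up record for illustration.
record = {"title": "Example GIS layer", "institution": "Example University"}
print(resolve_field(record, "institution"))
```

If the value later moved into a nested object, changing the mapped path (for example to a dotted path such as `"provider.name"`) would leave callers untouched. Filtering by sub-fields is the part this sketch does not address.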


@jazairi jazairi left a comment


This feels like the more straightforward solution and the easiest to work with on the UI side. Thanks for the write-up, Jonavelle!

Contributor

@ehanson8 ehanson8 left a comment


I agree with Graham and Adam, good writeup!

docs/adrs/0004-field-for-institution-information.md (outdated)

For GIS records, the UI can directly reference the values from `TimdexRecord.institution` to display the host institution for a resource on [Geodata](https://geodata.libraries.mit.edu/).

For other TIMDEX sources, given the decision outlined above, there should be little to no effect. It could be beneficial to revisit these sources and see if there is a field we can map to that was previously ignored.
Member


I'm slightly nervous that we don't yet know if other sources will use this field. I'm okay if this is only a useful field for GIS at this time, but the nervousness comes from not knowing whether we want to move other things here that would make the label less fitting (i.e. institution is perfect for GIS, but metadata_provider would work for both GIS and additional sources if we considered mappings more broadly before adding this field. Or not!).

I'd really prefer if we could do the investigation to be able to take a stronger stance here. Either "we don't think this field will be used by any of our current sources so there is no risk" or "we think we'd map fields X from MARC and Y from DSpace and they work perfectly with this plan".

Contributor


Thanks for articulating this @JPrevost, I agree.

Reading this ADR, it felt like a top-level, single string field was flexible enough that we could modify both how it's stored in the data model and how it's retrieved in the API layer, as needed. But if its immediate goal is to support the GIS data, then as a thought experiment, why don't we call it gis_data_resource_provider? Obviously, that's too specific; so what do we mean by institution? Are we expecting other sources to potentially use this?

Though I already approved, I think the ADR would benefit from discussion about what institution means, and how other sources may, or may not, use it. Going to leave some inline comments, now that I reread it through this lens.


## Decision

**A new field called "institution" is added to the TIMDEX data model** (i.e., `TimdexRecord.institution`). This field will denote--as its name implies--the institution or organization that provides access to the resource described in the TIMDEX record. This field exists at the top level of the TIMDEX data model, making it easily accessible for referencing in the UI or querying. This solution also avoids using existing fields (discussed below) in ways that obscure the TIMDEX data model.
Contributor


On a closer reread, I do think this does a good job at proposing what this field "means":

"This field will denote--as its name implies--the institution or organization that provides access to the resource described in the TIMDEX record"

But, if we are thinking of the reusability or naming of this new field, perhaps that "access" word is important. Is it possible this is really about access primarily? If so, what if the field were something along the lines of access_provider? Or even just provider?

Member


I think provider is an interesting concept to consider here. For GIS it seems to map clearly because the provider of the metadata and provider of the resource are generally the same from the TIMDEX perspective.

It seems to get messier for subscription resources that might appear in Alma (which might help us refine the intent of this field, even if our finding is that this field is not to be used for vended resources :) ). Would the provider be the vendor that supplies the content or the subscriber to the content on behalf of the user? i.e. MIT subscribes to an Ebsco database so our users can access the content. Is MIT the provider or is Ebsco the provider? As a consumer service, TIMDEX API should likely take the user perspective?

Contributor


As a consumer service, and a discovery one at that, I suppose I'd expect provider to indicate what institution/organization/company is providing physical or digital access to the resource.

...i.e. MIT subscribes to an Ebsco database so our users can access the content. Is MIT the provider or is Ebsco the provider?

I like this example because I think a) provider="Ebsco" would be ideal, but b) I think that level of provider granularity may vary considerably from source-to-source. Examples:

  • Libguides / Research Database
    • I've confirmed that OAI-PMH records offer no indication of the "leaf" node of access
    • Research Databases: might consider a default of provider="MIT"
    • LibGuides: we could say provider="Springshare"
  • DSpace
    • feels like we could default to provider="MIT" for all records
  • Alma
    • probably the most complex, where records may have varying degrees of granularity here
  • ArchivesSpace
    • perhaps another good example where provider="MIT" default makes sense

Don't want to dig too far into the logic, as provider is just a consideration at the moment. But unless we explicitly know -- from a consumer / discovery POV -- maybe a default of provider="MIT" wouldn't be inaccurate.

Member


I may lean more towards provider=unknown for things like Alma/LibGuides that we aren't sure on... and then take that unknown data to the metadata team and ask if they could make that clearer in the records or how we could better detect it/supplement records from non-alma sources, etc. The reason I'd suggest not defaulting to "MIT" is that would make it very difficult to understand which are actually MIT and which are 🤷

Contributor


Good point, makes sense. And maybe to get a bit technical, we could do provider=NULL? Understanding that the absence of a field value means we don't know for certain?

And clarification: when you say "LibGuides" above, do you mean "ResearchDatabases" (AZ list)? As for "LibGuides" (they are distinct sources in TIMDEX), it feels like we do know that the provider is always Springshare.

Member


Oh sorry, I misread and thus misrepresented. LibGuides is probably either MITL or Springshare. The content is curated/created by MITL but using the Springshare platform.

Research Databases is what I meant to be replying to with the "unknown" or your follow on idea of just not setting it (i.e. null).
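
The options floated in this thread can be sketched as a per-source lookup in which an absent value means "not known for certain". This is a sketch of the discussion, not a decided mapping: `SOURCE_PROVIDER_DEFAULTS`, `provider_for`, and all source keys other than `aspace` are hypothetical, and the defaults are only the suggestions made in the comments above. LibGuides is left out because the thread does not settle between MITL and Springshare.

```python
from typing import Optional

# Hypothetical per-source defaults, taken from suggestions in this thread.
# None means "unknown": the field is simply not set.
SOURCE_PROVIDER_DEFAULTS: dict[str, Optional[str]] = {
    "dspace": "MIT",            # suggested default for all records
    "aspace": "MIT",            # suggested default
    "researchdatabases": None,  # unknown, so leave unset
    "alma": None,               # granularity varies from record to record
}


def provider_for(source: str, record_value: Optional[str] = None) -> Optional[str]:
    """Prefer a value stated in the record, then the source default.

    Returns None when the provider is not known, rather than guessing "MIT".
    """
    if record_value and record_value.strip():
        return record_value.strip()
    return SOURCE_PROVIDER_DEFAULTS.get(source)
```

Returning `None` instead of a blanket "MIT" keeps records whose provider is genuinely MIT distinguishable from records where it is simply unknown, which is the concern raised above.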

**Transformer:** [Ead](https://www.loc.gov/ead/tglib/appendix_d.html)
**Source(s):** MIT ArchivesSpace (aspace)
* [`<publicationstmt><publisher>`|`<bibliography><bibref><imprint><publisher>`](https://www.loc.gov/ead/tglib/elements/publisher.html): When used in the publication statement, the name of the party responsible for issuing or distributing the encoded finding aid. Often this party is the same corporate body identified in the `<repository>` element in the finding aid. When used in a Bibliographic Reference <bibref>, the name of the party issuing a monograph or other bibliographic work cited in the finding aid.
* [`repository`](https://www.loc.gov/ead/tglib/elements/repository.html): The institution or agency responsible for providing intellectual access to the materials being described. Although the repository providing intellectual access usually also has physical custody over the materials, this is not always the case. When it is clear that the physical custodian does not provide intellectual access, use <physloc> to identify the custodian and <repository> to designate the intellectual caretaker. When a distinction cannot be made, assume that the custodian of the physical objects also provides intellectual access to them and should be recognized as the <repository>.
Contributor


Wonder if mapping EAD repository to this new institution/provider field might make sense?

Contributor Author


I think EAD's definition for repository lines up with what we've been defining as a provider, so I'm okay with this as well!
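
As a rough illustration of that mapping, the sketch below pulls the text of the first `<repository>` element out of an EAD document using Python's standard library. It is not the Transmogrifier implementation: `repository_as_provider` and the sample document are hypothetical, and the sample omits the XML namespace that real EAD files may declare, which would require a namespace-qualified lookup.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up, namespace-free EAD fragment for illustration only.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <repository>
        <corpname>Example University Archives</corpname>
      </repository>
    </did>
  </archdesc>
</ead>
"""


def repository_as_provider(ead_xml: str) -> Optional[str]:
    """Return the whitespace-normalized text of the first <repository>, if any."""
    root = ET.fromstring(ead_xml)
    repository = root.find(".//repository")
    if repository is None:
        return None
    # Collect text from child elements such as <corpname> and collapse whitespace.
    text = " ".join("".join(repository.itertext()).split())
    return text or None


print(repository_as_provider(SAMPLE_EAD))
```

When no `<repository>` is present the function returns `None`, in line with leaving the field unset when the provider is not stated in the record.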

@jonavellecuerdo
Contributor Author

Hi folks! The last three commits are in response to @JPrevost's comments above. In short, updates are made to the following sections:

  1. Decision: Revise the name of the proposed field to provider instead, where provider refers to "the institution or organization that provides access to the resource described in the TIMDEX record".
  2. Context: Exploration of existing fields in non-GIS TIMDEX sources that convey "host institution", in response to @JPrevost's comment. The main takeaway is that the majority of these sources view the "publisher" as an entity that can be the provider of the data as well.
  3. Considerations: Suggested some additional mappings for non-GIS TIMDEX sources.

Let me know what you think!


@jazairi jazairi left a comment


See my comment about what I think is an outdated instance of TimdexRecord.institution. That small change aside, I'm still 👍 on this. Thanks for adding this context about non-geo sources!

docs/adrs/0004-field-for-institution-information.md (outdated)
Contributor

@ghukill ghukill left a comment


In the spirit of "hopefully this is a widely applicable and long-term solution, but acknowledging future information may require reworking it", I approve of this approach to add a new top-level provider field.

I think that it will be important to think deeply in Transmogrifier source transformations about the difference between a resource's "publisher" and this proposed "provider", but they do strike me as meaningfully distinct. And as touched on in comments, err on the side of setting this field only when confidently known or explicitly stated in the record.

In support of this decision, I would also note the field name parity with the Aardvark field schema_provider_s, which is defined as:

"To clarify which organization holds the resource or acts as the custodian for the metadata record and to help users understand which resources they can access."

They acknowledge this subtle but important dual purpose of the field where it may denote the institution that holds the actual resource, or just the custodian of the metadata record. Projecting this "provider" mentality to TIMDEX, I would imagine we utilize either where appropriate for this field, but could lean into the "holds the resource" side of things. Also noting that their definition of a "provider" very clearly does not imply who created the resource; it's very much access/discovery oriented.

docs/adrs/0004-field-for-institution-information.md (outdated)
@jonavellecuerdo jonavellecuerdo force-pushed the GDT-205-adr-institution-information branch from 8e8b497 to 669e7b4 Compare February 26, 2024 21:59
@jonavellecuerdo jonavellecuerdo force-pushed the GDT-205-adr-institution-information branch from 669e7b4 to 6d31ee0 Compare February 26, 2024 22:01
@jonavellecuerdo jonavellecuerdo merged commit 285a02d into main Feb 26, 2024
5 checks passed
@jonavellecuerdo jonavellecuerdo deleted the GDT-205-adr-institution-information branch February 26, 2024 22:02

5 participants