Why does normalized_date exist? #1336

Open
jcoyne opened this issue Nov 18, 2022 · 6 comments

Comments

@jcoyne
Member

jcoyne commented Nov 18, 2022

I see we copy normalized_date_ssm to the date_sort field, but aside from that it doesn't appear to be used. Do we need it?

jcoyne changed the title from "Why does normalied_date exist?" to "Why does normalized_date exist?" on Nov 18, 2022
@marlo-longley
Contributor

marlo-longley commented Dec 1, 2022

@jcoyne Recently we've seen that normalized_date_ssm is appended to the normalized_title and guessed that this was desired by archivists/people working with historical content. We allow it to be customized now via #1344. Not sure if that answers!

@jcoyne
Member Author

jcoyne commented Dec 2, 2022

I see that, but does it need to be stored in its own solr field?

@seanaery
Contributor

seanaery commented Dec 2, 2022

I think the normalized date is useful to store in its own Solr field for downstream applications to use if desired even if it is smushed into the normalized title by default in core. A couple examples I know of...

UAlbany has a nice collection list on a "repository" show page that splits the date out from the title: e.g., https://archives.albany.edu/description/repositories/apap and https://github.com/UAlbanyArchives/arclight-UAlbany/blob/main/app/views/arclight/repositories/show.html.erb#L45

Duke has a CSV exporter from bookmarks that we use to make digitization work orders and isolating the date helps us map date metadata from archival components to digitized object metadata for our digital repository.

@jcoyne
Member Author

jcoyne commented Dec 2, 2022

@seanaery I think it would be better if we could either put those full features in Arclight or completely remove them from Arclight. Currently these features are sort of hanging between two worlds.

@seanaery
Contributor

seanaery commented Dec 2, 2022

@jcoyne That's a fair assessment and I appreciate that you're thinking about only storing in Solr the data that the core application expects to use. We are flexible at Duke and would just extend the traject rules locally to capture the normalized date atomically/additionally in its own Solr field if that disappears from core.

Caveat: I'm not an archivist. But the way I understand it is, this was developed as it is because proper titles of archival components (collections, too, but especially components) are often generic and repeated within a collection ("Letters," "Correspondence," "Newspaper Clippings"). Appending the dates in the places where the title typically appears gives the described entity some valuable distinction and context, e.g., in the html page <title>, in a list of sibling components, etc.

@gwiedeman
Contributor

There is duplication here, and I think normalized_title_ssm is actually the field that is unnecessary. Yeah, there'll be lots of components like "Minutes, 1990" and "Minutes, 1991" that are distinguished by date, and Arclight currently uses normalized_title_ssm to display this, but I feel like the title and date could just be combined in the template, no? That way, both the title and date are still stored in a structured way in Solr. This would more easily facilitate #292, which is the dream.

I believe the reason for normalized_title_ssm is to match a search like "minutes 1991" in this case, but I'm not sure that including it in the index as a distinct field actually aids relevancy. If it does, then it could be indexed for search but probably shouldn't have to be stored.

The downside to this is that more logic would have to be in the template to handle date types like inclusive and bulk dates. But to me it makes sense to have well-structured data in Solr and have that logic be in the template rather than the data harvesting pipeline.
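A rough sketch of what that template-side logic could look like, in plain Ruby. Everything here is hypothetical: `display_title` and its keyword arguments are made-up names, not anything in Arclight, and the "bulk" label format is an assumption about how the combined string should read.

```ruby
# Hypothetical helper; none of these names come from Arclight.
# Builds a display title from a structured title plus optional
# inclusive and bulk dates, skipping whatever is missing.
def display_title(title:, inclusive: nil, bulk: nil)
  # Assumed format: bulk dates follow inclusive dates, labeled "bulk".
  dates = [inclusive, (bulk && "bulk #{bulk}")].compact.join(', ')
  parts = [title, dates].map { |s| s.to_s.strip }.reject(&:empty?)
  parts.join(', ')
end

puts display_title(title: 'Minutes', inclusive: '1990')
puts display_title(title: 'Correspondence', inclusive: '1950-1990', bulk: '1970-1980')
puts display_title(title: nil, inclusive: '1991')
```

The point is only that title and dates stay separate in Solr and get joined at render time, so a local app can still use either piece on its own.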

I'm also not sure that "normalized" is the best descriptor here for what Arclight is doing. Typically archivists use two date fields: a required well-structured date and an optional one that ASpace calls a date "expression". That way you can have a publication with a displayed date expression of "Fall 2002" that also has a date like "2002-09" for sorting. Prior to Arclight 1.0 at least, Arclight used unitdate_ssm as a list of well-structured dates and normalized_date_ssm for the date expression.
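To illustrate the two-field idea, here is a small plain-Ruby sketch. The `Issue` struct and the sample records are invented for illustration (only the "Fall 2002" / "2002-09" pair comes from the example above); the structured date is assumed to be an ISO-style string, which sorts chronologically as plain text.

```ruby
# Illustrative records only: each pairs a free-text date expression
# (for display) with a structured date (for sorting).
Issue = Struct.new(:title, :date_expression, :structured_date)

issues = [
  Issue.new('Newsletter', 'Fall 2002', '2002-09'),
  Issue.new('Newsletter', 'Spring 2002', '2002-03'),
  Issue.new('Newsletter', 'Winter 2001', '2001-12')
]

# Sort on the structured date, display the expression.
sorted = issues.sort_by(&:structured_date)
labels = sorted.map { |i| "#{i.title}, #{i.date_expression}" }
puts labels
```

Sorting on the expression itself would put "Fall 2002" first alphabetically, which is why the structured date needs to be kept alongside it.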
