Finding aids with nested collections index 'successfully', but then cause page crashes #1533

Open
archivalGrysbok opened this issue May 8, 2024 · 2 comments

@archivalGrysbok

It is possible to index an EAD file that contains nested collections. Any page that then tries to display that finding aid crashes.
See collectionInCollection.txt example file.

Expected behavior

I expect either the page to load or the file to not index.

Actual behavior

The indexer will index this file without error. If you navigate to the finding aid or it shows up in search results, the page will not load. The log file says:

I, [2024-05-08T09:51:40.077867 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b] Started GET "/caoSearch/catalog?f%5Blevel%5D%5B%5D=Collection&f%5Brepository%5D%5B%5D=Unicorn+Test+Repository%3A+where+weird+data+comes+to+life%21" for 10.135.171.48 at 2024-05-08 09:51:40 -0400
I, [2024-05-08T09:51:40.078836 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b] Processing by CatalogController#index as HTML
I, [2024-05-08T09:51:40.078904 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   Parameters: {"f"=>{"level"=>["Collection"], "repository"=>["Unicorn Test Repository: where weird data comes to life!"]}}
I, [2024-05-08T09:51:40.209573 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   Rendered vendor/bundle/ruby/3.1.0/bundler/gems/arclight-8852569afb4c/app/views/catalog/index.html.erb within layouts/blacklight (Duration: 64.5ms | Allocations: 48783)
I, [2024-05-08T09:51:40.209702 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   Rendered layout vendor/bundle/ruby/3.1.0/gems/blacklight-8.1.0/app/views/layouts/blacklight.html.erb (Duration: 64.7ms | Allocations: 48812)
I, [2024-05-08T09:51:40.209902 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b] Completed 500 Internal Server Error in 131ms (ActiveRecord: 1.5ms | Allocations: 90449)
F, [2024-05-08T09:51:40.212721 #1584162] FATAL -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] ActionView::Template::Error (id must be present for all documents and components):
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     4: <% view_config = local_assigns[:view_config] || blacklight_config&.view_config(document_index_view_type) %>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     5: <div <%= 'id="documents"'.html_safe unless grouped? %> class="al-document-listings documents-<%= view_config&.key || document_index_view_type %>">
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     6:   <% document_presenters = documents.map { |doc| document_presenter(doc) } -%>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     7:   <%= render view_config.document_component.with_collection(document_presenters, partials: view_config.partials, counter_offset: @response&.start || 0) %>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     8: </div>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) lib/arclight/normalized_id.rb:23:in `normalize'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) lib/arclight/normalized_id.rb:15:in `to_s'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) app/models/arclight/parents.rb:19:in `eadid'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) app/models/arclight/parents.rb:25:in `block in as_parents'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) app/models/arclight/parents.rb:25:in `map'

Steps to reproduce

  1. Index attached file.
  2. Navigate to the repository page for that finding aid, or search for it.

archivalGrysbok changed the title from "Collections within a collection index, but then cause page crashes" to "Finding aids with nested collections index 'successfully', but then cause page crashes" on May 8, 2024

@marlo-longley
Contributor

Hi @archivalGrysbok -- we encountered this at Stanford too. You can see our workaround here: https://github.com/sul-dlss/stanford-arclight/blob/7245b905c50f5165cd46a4686bce9e24a3830493/app/models/solr_document.rb#L22 - we secretly interpret any lower-level "collections" as "series".
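
As a rough illustration of that relabeling idea (a sketch only, not the code at the link above): the helper name `display_level` and the field names `level` and `component_level` are assumptions made up for this example, and a plain Hash stands in for the Solr document.

```ruby
# Hypothetical helper: choose the level label to show for an indexed record.
# `doc` is a plain Hash standing in for a Solr document. The keys 'level' and
# 'component_level' are assumed names for this sketch, not confirmed Arclight
# or Stanford field names.
def display_level(doc)
  level = doc['level'].to_s
  depth = doc['component_level'].to_i # assumed: 0 means the top-level finding aid

  # Only records below the top level are relabeled.
  if depth.positive? && level.casecmp?('collection')
    'Series'
  else
    level
  end
end

top    = { 'level' => 'Collection', 'component_level' => 0 }
nested = { 'level' => 'Collection', 'component_level' => 2 }

puts display_level(top)
puts display_level(nested)
```

Because the relabeling happens at display time in this sketch, the indexed `level` value itself is left unchanged.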

Is it invalid EAD to have two level="collection" components in the same file? Is this a common thing in real archival data?

It is possible we could apply our Stanford solution to Core. Or, in my opinion having the indexer fail would be preferable to "successful" indexing with broken pages. I think this needs more discussion!
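
If failing early turns out to be the preferred route, a pre-index check could look something like the sketch below. It is not Arclight code: `nested_collection_components` is a hypothetical helper, it uses REXML rather than whatever the indexer uses, and it assumes components are `<c>` or `<c01>` through `<c12>` elements carrying a `level` attribute.

```ruby
require 'rexml/document'

# Hypothetical pre-index check: list every <c> or <c01>..<c12> component that
# is marked level="collection". <archdesc> is not a component, so the
# top-level collection itself is never reported.
def nested_collection_components(ead_xml)
  doc = REXML::Document.new(ead_xml)
  return [] unless doc.root

  found = []
  doc.root.each_recursive do |el|
    # el.name is the local element name, so a default EAD namespace is fine.
    next unless el.name.match?(/\Ac(0[1-9]|1[0-2])?\z/)

    found << el if el.attributes['level'].to_s.strip.casecmp?('collection')
  end
  found
end

sample = <<~XML
  <ead>
    <archdesc level="collection">
      <dsc>
        <c01 level="collection" id="inner">
          <c02 level="file" id="folder1"/>
        </c01>
      </dsc>
    </archdesc>
  </ead>
XML

offenders = nested_collection_components(sample)
warn "Would refuse to index: #{offenders.size} nested collection(s)" if offenders.any?
```

A wrapper script could run a check like this over each file and skip or reject the ones that report offenders before handing them to the indexer.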

@archivalGrysbok
Author

I don't think it's invalid, just not a best practice. ArchivesSpace allows collections within a collection.

I've only seen two repositories in the CAO do it and both were small. We updated their finding aids so they'd index and let the repositories know about the problem, so they could fix it on their end going forward. I think they were trying to describe all their collections in a single finding aid.

My concern with secretly calling lower-level collections "series" is that then they wouldn't be findable as collections. Then again, I don't know that anyone is trying to do that. To me, having something findable (even if not entirely as described in the .xml) is preferable to dropping it on the floor.

I'll look into applying the Stanford fix to my test server.
