Use JSON string array of ids instead of ordered_aggregation linked list #5778

Open
wants to merge 11 commits into
base: develop

cjcolvar
Member

@cjcolvar cjcolvar commented Apr 12, 2024

Create a new property on media objects, section_list, which stores a JSON-serialized array of ids. This property and its related methods, sections and section_ids, replace nearly all calls to the master_files, ordered_master_files, and indexed_master_files methods along with their *_ids counterparts. The only remaining calls to these methods are in the migration to the new property and in one instance in MediaObjectsController that I wasn't able to figure out how to remove.
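As a rough illustration of the idea (not the code in this PR), a JSON-serialized id list with derived section_ids and sections readers might look like the sketch below. The class name SketchMediaObject and the lookup hash, which stands in for fetching section objects from the repository, are assumptions made for the example.

```ruby
require 'json'

# Hypothetical stand-in for MediaObject; not the Avalon implementation.
class SketchMediaObject
  attr_reader :section_list # JSON string such as '["id1","id2"]', or nil

  def initialize(section_list = nil, lookup: {})
    @section_list = section_list
    @lookup = lookup # id => section object; stands in for a repository fetch
  end

  # Decode the serialized array; an unset property reads as an empty list.
  def section_ids
    @section_list.nil? ? [] : JSON.parse(@section_list)
  end

  # Replace the ordering by re-serializing the whole array.
  def section_ids=(ids)
    @section_list = JSON.generate(ids.map(&:to_s))
  end

  # Resolve ids to objects, preserving the stored order.
  def sections
    section_ids.map { |id| @lookup.fetch(id) }
  end
end

mo = SketchMediaObject.new(lookup: { 'a1' => :first, 'b2' => :second })
mo.section_ids = %w[b2 a1]
```

Reordering is a single rewrite of one string property, where a linked list would need several proxy nodes updated.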

Migrating a media object requires saving it, so a full migration of all media objects might take too long to complete during a maintenance window. For this reason, migration is done in place and lazily. When a media object is loaded from Fedora it is migrated in memory, and the migration is made final on save. Subsequent loads from Fedora use the already migrated section_list and should not attempt to read from list_source. After some time a portion of the repository should have been migrated naturally, and a follow-up background job can be set up to force migration of the remaining media objects. A future version of Avalon that requires the migration to be complete might remove the ordered_aggregation and master_files association entirely.
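The lazy, in-place migration described above can be sketched in plain Ruby as follows. This is a simplified model under stated assumptions: STORE is a hypothetical in-memory hash standing in for Fedora, and LazyMigratingObject, legacy_ordered_ids, and migrated_on_load? are illustrative names, not Avalon or ActiveFedora APIs.

```ruby
require 'json'

# Hypothetical in-memory store standing in for Fedora.
STORE = {
  'mo1' => { section_list: nil, legacy_ordered_ids: %w[s1 s2] }
}

class LazyMigratingObject
  attr_reader :id, :section_list

  def self.find(id)
    record = STORE.fetch(id)
    new(id, record[:section_list], record[:legacy_ordered_ids])
  end

  def initialize(id, section_list, legacy_ordered_ids)
    @id = id
    # Fall back to the legacy ordering only when the new property is unset.
    @migrated_on_load = section_list.nil?
    @section_list = section_list || JSON.generate(legacy_ordered_ids)
  end

  # True when this instance had to read the legacy ordering.
  def migrated_on_load?
    @migrated_on_load
  end

  def section_ids
    JSON.parse(@section_list)
  end

  # Saving is what makes the in-memory migration permanent.
  def save
    STORE[id][:section_list] = @section_list
    true
  end
end

first = LazyMigratingObject.find('mo1')  # migrates in memory
first.save                               # persists section_list
second = LazyMigratingObject.find('mo1') # reads section_list directly
```

Objects that are never saved keep reading from the legacy ordering, which is why a follow-up job is still needed to finish the migration.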

While implementing this change I wanted to ensure that edits to section_list/sections/section_ids stay in memory until saving. Previously, changes to master_files and ordered_master_files were persisted immediately. To avoid that, I put the synchronization of section_list and master_files into a before_save callback. This did cause some complications, because that synchronization can trigger saves when objects are in odd states in the test suite. This was one reason for switching all references to master_files over to sections throughout the code base. The only place I wasn't able to resolve this was MediaObjectsController#update_media_object. One thing to be aware of: because of this synchronization, you may need to reload the media object after saving to get the updated master_files loaded. (See MasterFile#media_object= for an example of this.)
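A minimal model of the deferred synchronization and the reload caveat, assuming a simplified object with no Rails or ActiveFedora involved. DeferredSyncObject, sync_master_files, and the two id arrays are hypothetical names chosen for the sketch; the real callback in the PR may behave differently.

```ruby
# Hypothetical model: edits stay in memory and are synchronized on save.
class DeferredSyncObject
  attr_reader :section_ids, :master_file_ids

  def initialize
    @section_ids = []
    @master_file_ids = []        # association as currently loaded in memory
    @stored_master_file_ids = [] # what the repository would hold
  end

  # In-memory edit only; nothing is persisted here.
  def section_ids=(ids)
    @section_ids = ids.dup
  end

  def save
    sync_master_files # plays the role of a before_save callback
    true
  end

  # Refresh the loaded association from the stored state.
  def reload
    @master_file_ids = @stored_master_file_ids.dup
    self
  end

  private

  def sync_master_files
    @stored_master_file_ids = @section_ids.dup
  end
end

mo = DeferredSyncObject.new
mo.section_ids = %w[s1 s2]
loaded_before_save = mo.master_file_ids.dup
mo.save
loaded_after_save = mo.master_file_ids.dup
mo.reload
loaded_after_reload = mo.master_file_ids.dup
```

In this model the loaded association stays stale after save and only reflects the new sections once reload is called, which mirrors the caveat above.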

Review notes:

  • The main changes occur in app/models/media_object.rb and app/models/master_file.rb
  • I tried to leave in-line comments in important places to explain why certain things were necessary. Please add questions or suggestions if anything is unclear or could be improved.

It probably makes sense to squash this PR when merging.

Resolves #5749

Future work:

  • Test performance on avalon-dev on loading/editing many-sectioned items
  • Test migration on existing data

@cjcolvar cjcolvar force-pushed the ordered_aggregation_fire branch 2 times, most recently from f71b199 to 5828937 Compare May 22, 2024 19:33
…ids and rename variables where appropriate as well
@cjcolvar cjcolvar marked this pull request as ready for review May 23, 2024 13:02
@cjcolvar cjcolvar changed the title from "[Proof of concept] Use JSON string array of ids instead of ordered_aggregation linked list" to "Use JSON string array of ids instead of ordered_aggregation linked list" May 23, 2024
Contributor

@masaball masaball left a comment

As far as I can tell this looks good. I do have a couple clarifying questions and some small suggestions.

@@ -49,6 +49,7 @@
  rights_statement { ['http://rightsstatements.org/vocab/InC-EDU/1.0/'] }
  terms_of_use { [ 'Terms of Use: Be kind. Rewind.' ] }
  series { [Faker::Lorem.word] }
+ master_files { [] }
Contributor

Should this be sections { [] }?

Member Author

Good question.

I don't think so? I think it is needed to initialize that association for tests that were depending on it. But I could try switching it and see if the tests still pass.

app/models/master_file.rb (resolved)
Comment on lines 29 to +30
@attrs[:section_id] = solr_document["section_id_ssim"]
@attrs[:section_ids] = solr_document["section_id_ssim"]
Contributor

Do these need to be separate fields? Seems weird to have a field with a singular name that contains the same data as a field with the pluralized name.

Member Author

I think they do, because adding @attrs[:section_ids] allows this presenter to proxy the new MediaObject#section_ids method. Alternatively, this presenter could delegate section_ids to section_id or implement a new method that returns section_id. Do you think either of those would make more sense? I didn't want to remove section_id from the attributes because we had already added it, so I know it is used elsewhere, although I can't remember exactly where.
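For reference, the delegation alternative mentioned above could look roughly like this. SketchPresenter is a hypothetical, stripped-down class; how the real presenter exposes @attrs is not shown in this thread, so the explicit reader method is an assumption.

```ruby
# Hypothetical presenter: one stored attribute, two reader names.
class SketchPresenter
  def initialize(solr_document)
    # Array() normalizes a missing Solr field to an empty list.
    @attrs = { section_id: Array(solr_document['section_id_ssim']) }
  end

  def section_id
    @attrs[:section_id]
  end

  # Expose the plural name without storing the data twice.
  alias section_ids section_id
end

presenter = SketchPresenter.new('section_id_ssim' => %w[s1 s2])
```

Both readers return the same array, so existing callers of section_id keep working while the presenter also answers section_ids.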

spec/models/media_object_spec.rb (resolved)
Comment on lines 1264 to 1272
it 'migrates to section_list without interaction' do
  expect(media_object.section_list).to eq nil
  mo = MediaObject.find(media_object.id)
  mo.save
  mo = MediaObject.find(media_object.id)
  expect(mo.section_list).not_to eq nil
  expect(mo.sections).to eq [section, section2]
  expect(mo.section_ids).to eq [section.id, section2.id]
end
Contributor

I'm not sure I understand exactly what this test case is for. Is it just checking that section_list fields are persisted?

Member Author

I think you're right that this test only shows that section_list persists. I added a test to show that when section_list is set, a MediaObject won't use ordered_master_files anymore. This new test, plus the one above which shows that MediaObject will read from ordered_master_files to populate section_list when it isn't set, should probably cover all cases, so I removed this test.

Comment on lines 1258 to 1261
expect(mo.ordered_master_files.to_a).to eq [section, section2]
expect(mo.ordered_master_file_ids).to eq [section.id, section2.id]
expect(mo.sections).to eq [section, section2]
expect(mo.section_ids).to eq [section.id, section2.id]
Contributor

Instead of explicitly testing for all those specific arrays, would it make sense to condense this to equality checks?

Suggested change
- expect(mo.ordered_master_files.to_a).to eq [section, section2]
- expect(mo.ordered_master_file_ids).to eq [section.id, section2.id]
- expect(mo.sections).to eq [section, section2]
- expect(mo.section_ids).to eq [section.id, section2.id]
+ expect(mo.sections).to eq mo.ordered_master_files.to_a
+ expect(mo.section_ids).to eq mo.ordered_master_file_ids

Member Author

I like this change, but I think I'd also like to keep one of the expectations to ensure that mo.ordered_master_files is actually populated the way I'm expecting.

app/models/media_object.rb (outdated, resolved)