Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging two documents with digital content footnotes for the same source results in unique constraint violation #1394

Closed
6 tasks done
blms opened this issue Jun 15, 2023 · 4 comments
Assignees
Labels
🐛 bug Something isn't working performant Tasks for or taken on by Performant

Comments

@blms
Copy link
Contributor

blms commented Jun 15, 2023

testing notes (QA)

On the QA site admin:

  • Check PGPID 29344 and PGPID 4474. Note their digital edition footnotes' differences ("location" and "notes" fields) and review their transcriptions.
    • After reviewing their transcriptions, try merging PGPID 29344 into PGPID 4474. There should not be an error.
    • The two transcriptions should now be combined on the merged document's Khan transcription, with separate annotations (blocks of transcription) for each.
    • In the Footnotes section, the "location" and "notes" fields on the single merged digital edition footnote should capture the data from both of the footnotes pre-merge.
  • If possible, try to merge two other documents that both have transcription content on the same source—but this time, with no data in the "location" or "notes" fields in the digital edition footnotes on either document. You may need to create such transcriptions if you can't find them. (This is the situation with Ksenia's merge on PGPID 5200, but that's already been completed.)
    • All transcription content should be present, none deleted or missing.

dev notes

There is an interesting edge case where one document, PGPID 4474, has a transcription from Geoffrey Khan. Another PGPID, 29344, was created and refers to the same fragment/images. This second PGPID also has the transcription sourced from Khan, but with emendations by Yusuf.

An attempt to merge these two documents into 4474 results in a unique constraint violation:

psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "one_digital_edition_per_document_and_source"
DETAIL:  Key (source_id, object_id, content_type_id)=(135, 4474, 8) already exists.

The solution here is either to discard Khan's original transcription and replace it with the emended one, or to take Yusuf's emendations and consider them a separate source, reassign the source, then perform the merge.

But it is possible this error could happen again, so it would be helpful to show a more explanatory error message and give the user those two options to consider.


  • also check that "digital content" footnotes don't get merged with non-digital ones during document merge. always create a new footnote!
@blms blms added the 🐛 bug Something isn't working label Jun 15, 2023
@kseniaryzhova
Copy link

@blms I merged PGPIDs 5200 and 5201 - they were duplicates, with the same transcription (from the same source) split verso and recto (verso on one PGPID, recto on the other). So the text was different on each one, but the source was the exact same. I was able to merge but one of the transcriptions disappeared.

@blms
Copy link
Contributor Author

blms commented Jun 23, 2023

Confirmed locally that the footnote and annotation from the merged document are both deleted, so it would seem in some cases the footnote is deleted instead of an attempted reassignment. Thanks Ksenia for documenting! Working on a fix now.

@blms blms self-assigned this Jun 23, 2023
@blms
Copy link
Contributor Author

blms commented Jun 23, 2023

Looks like both bugs are a result of this logic:

geniza/geniza/corpus/models.py

Lines 1362 to 1368 in 65660de

def _merge_footnotes(self, doc):
# combine footnotes; footnote logic for merge_with
for footnote in doc.footnotes.all():
# check for match; add new footnote if not a match
equiv_fn = self.footnotes.includes_footnote(footnote)
if not equiv_fn:
self.footnotes.add(footnote)

def includes_footnote(self, other):
"""Check if the current queryset includes a match for the
specified footnotes. Matches are made by comparing content source,
location, document relation type, and notes.
Returns the matching object if there was one, or False if not."""
compare_fields = ["source", "location", "notes"]
for fn in self.all():
if (
all(getattr(fn, val) == getattr(other, val) for val in compare_fields)
# NOTE: fn.doc_relation seems to be different on queryset footnote
# than newly created object; check by display value to avoid problems
and fn.get_doc_relation_display() == other.get_doc_relation_display()
):
return fn
return False

It looks like Marina's failed because the notes field didn't match. When adding notes to one of those two footnotes for 5200 and 5201, I see that my merge also fails locally with the same error. Whereas when the notes are identical, they are treated as "the same footnote" and one is discarded.

@blms blms added the performant Tasks for or taken on by Performant label Jul 6, 2023
blms added a commit that referenced this issue Jul 20, 2023
Also copy any missing attributes (notes, location, url)
ref #1394
blms added a commit that referenced this issue Jul 20, 2023
blms added a commit that referenced this issue Jul 21, 2023
@blms blms added the 🗜️ awaiting testing Implemented and ready to be tested label Aug 9, 2023
@kseniaryzhova
Copy link

@blms the doubled transcription content (with the slight differences) is confusing, but I'm not sure there's a way around it (perhaps creating a different entry for the source and thus a different transcription dropdown entirely)? Closing, as this works, just need to bring it up at the next PM meeting.

@blms blms removed the 🗜️ awaiting testing Implemented and ready to be tested label Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working performant Tasks for or taken on by Performant
Projects
None yet
Development

No branches or pull requests

2 participants