Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duplicates when mixing explicit and autogenerated ids #70

Merged
merged 1 commit into from
Apr 17, 2019

Conversation

cole
Copy link
Contributor

@cole cole commented Apr 15, 2019

When some metadata section ids, but not all, are autogenerated, duplicates can easily occur. This PR changes metsrw to track ids used, and if a conflicting id is generated, retry.

I'm not totally happy with this implementation, I think if we had a more consistent metadata class structure, we could easily do this in a base class, but it's an improvement on what was there.

I've also made id_string a property (vs a method) in
bdf84b6, which is a small API change that will require updates to a few lines in Archivematica (all in archivematicaCreateMETSReingest.py). That change can be dropped if it is problematic, it just seemed like an easy improvement.

Connects to archivematica/Issues#442.

@cole cole requested a review from sevein April 15, 2019 21:55
When some ids, but not all, are autogenerated, duplicates can easily
occur. Changes metsrw to take in to account when manual ids of the same
format are specified.

Also changes id_string into a regular attribute and increments version.
@cole
Copy link
Contributor Author

cole commented Apr 16, 2019

Works in my testing when combined with the changes in artefactual/archivematica#1407

Copy link
Member

@sevein sevein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this and it works. Thanks @cole!

Just for the record, it's still true (as it was in v0.3.3 when we removed random IDs) that the ID generators are going to be shared across multiple documents. E.g. the following test passes

def test_idgenerator_instance_attribute(self):
    doc_foo = metsrw.METSDocument()
    fsentry = metsrw.FSEntry("foo.txt", file_uuid=str(uuid.uuid4()))
    fsentry.add_premis_object("<premis>object uno</premis>")
    fsentry.add_premis_event("<premis>event uno</premis>")
    fsentry.add_premis_event("<premis>event dos</premis>")
    doc_foo.append_file(fsentry)

    doc_foo = doc_foo.serialize()
    assert doc_foo.find(".//mets:amdSec", namespaces=metsrw.utils.NAMESPACES).get("ID") == "amdSec_1"
    assert doc_foo.find(".//mets:techMD", namespaces=metsrw.utils.NAMESPACES).get("ID") == "techMD_1"
    assert doc_foo.findall(".//mets:digiprovMD", namespaces=metsrw.utils.NAMESPACES)[0].get("ID") == "digiprovMD_1"
    assert doc_foo.findall(".//mets:digiprovMD", namespaces=metsrw.utils.NAMESPACES)[1].get("ID") == "digiprovMD_2"

    doc_bar = metsrw.METSDocument()
    fsentry = metsrw.FSEntry("bar.txt", file_uuid=str(uuid.uuid4()))
    fsentry.add_premis_object("<premis>object uno</premis>")
    fsentry.add_premis_event("<premis>event uno</premis>")
    fsentry.add_premis_event("<premis>event dos</premis>")
    doc_bar.append_file(fsentry)
    print(doc_bar.tostring(encoding="unicode"))

    doc_bar = doc_bar.serialize()
    assert doc_bar.find(".//mets:amdSec", namespaces=metsrw.utils.NAMESPACES).get("ID") == "amdSec_2"
    assert doc_bar.find(".//mets:techMD", namespaces=metsrw.utils.NAMESPACES).get("ID") == "techMD_2"
    assert doc_bar.findall(".//mets:digiprovMD", namespaces=metsrw.utils.NAMESPACES)[0].get("ID") == "digiprovMD_3"
    assert doc_bar.findall(".//mets:digiprovMD", namespaces=metsrw.utils.NAMESPACES)[1].get("ID") == "digiprovMD_4"

That's not an impediment for Archivematica in the way that it's currently used though. Also users hitting this limitation can clear the generators like @cole is doing in test_metadata.py.

@sevein sevein merged commit d95939c into master Apr 17, 2019
@sevein sevein deleted the dev/id-generation-fixes branch April 17, 2019 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants