Figure out how to do merge importing... #10

Closed
mikesname opened this issue Nov 28, 2013 · 7 comments

@mikesname
Contributor

At present, Description nodes have a dependent relationship to their parent item. This means that if you createOrUpdate an item bundle (as we do when importing from EAD), all descriptions that are not present in the input bundle will be deleted. This is an unavoidable consequence of having to update a full subtree (or at least I can't figure out how to do it otherwise). The corollary is that descriptions are deleted when an item is deleted, like an SQL CASCADE.

This will obviously cause problems in the case where additional descriptions are created manually via the web interface, on items that were harvested. Those manually created descriptions will be zapped if/when we re-harvest, because they don't belong in the source data.

The way I think we should handle this (without putting egregious hacks in the core persistence code) is as follows:

  • agree on a property convention to distinguish manually-created descriptions from automatically harvested ones, e.g. "manual=true"
  • check if an item already exists on import
  • if it is already there, serialize the existing node as a bundle and, using the pre-agreed discriminator for manually-created descriptions, copy them to the new import bundle
  • update the bundle

Because the existing descriptions won't have changed they won't actually be "updated", since the persistence code checks if any changes have actually occurred. However they also won't be deleted.
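
A minimal sketch of the carry-over step described above, in Python for illustration only (it is not the project's persistence code). Bundles are modelled as plain dicts; the `manual` flag, the `id` and `descriptions` keys and the `carry_over_manual` helper are all assumptions made for this example.

```python
from copy import deepcopy


def carry_over_manual(existing, incoming, flag="manual"):
    """Return a copy of `incoming` that also holds the manually created
    descriptions found in `existing` (hypothetical dict-based bundles)."""
    merged = deepcopy(incoming)
    incoming_ids = {d["id"] for d in merged["descriptions"]}
    for desc in existing["descriptions"]:
        # Only descriptions carrying the agreed discriminator are copied over.
        if desc.get(flag) and desc["id"] not in incoming_ids:
            merged["descriptions"].append(deepcopy(desc))
    return merged


# The item as it currently exists: one harvested and one manual description.
existing = {
    "id": "item-1",
    "descriptions": [
        {"id": "desc-eng", "name": "Old title"},
        {"id": "desc-fra", "name": "Titre", "manual": True},
    ],
}
# The bundle built from a re-harvest: only the harvested description.
incoming = {
    "id": "item-1",
    "descriptions": [{"id": "desc-eng", "name": "New title"}],
}

merged = carry_over_manual(existing, incoming)
```

Updating with `merged` instead of `incoming` would then refresh the harvested description while the manual one stays part of the subtree, so the dependent-delete step has nothing to remove.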

What do you think @bencomp and @lindareijnhoudt ?

@bencomp
Contributor

bencomp commented Jan 30, 2014

I don't know of a better way to do this. We have discussed discriminating by language tags in the description identifier (i.e. #desc-[unitid]-[language tag]), which could be extended to include -manual-[ordinal number], but in the end encoding meanings explicitly using properties works better in the long run.

Checking for the 'manual designator' on replace would take hacking the persistence code, wouldn't it?

@mikesname
Contributor Author

I would rather put this create/merge/replace logic in the importers. I still can't think of a general way to merge the trees without appropriate knowledge of what it is we're merging. Can you write an XSLT to merge the Cegesoma multilingual EADs prior to import?

I'm going to do some experimenting to try and figure this out.

@bencomp
Contributor

bencomp commented Jan 30, 2014

I've been looking around the handlers to understand how multilingual stuff is currently imported. To me it seems that one Map per language is created, although it is unclear how the language of the description is determined. Merging is one option; 'caching' the language of description is what I was thinking of. I just started 'caching' the <eadid> to help create unique IDs per file - uh oh, that should be the <archdesc>'s unitid.

@mikesname
Contributor Author

I don't understand it either - out of interest I tried putting langcode attributes in those Cegesoma files (based on F or N) and they still all came out as nld.

I'm working on something where two bundles (such as the existing one and the one we're importing) can be merged via a strategy, provided as a function object, that determines whether nodes at the same level of the tree should be preserved or replaced. Then we should be able to do something like:

  • node is a description and has the same language and is not manually created = replace

Then we could do:

build import bundle from XML
if item exists:
    get existing item bundle
    merge via above strategy
    update
else:
    create
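
The pseudocode above could be fleshed out roughly as follows. This is a sketch in Python under stated assumptions, not the actual importer: bundles are plain dicts, the store is an in-memory dict, and `languageCode`, `manual`, `merge_bundles` and `import_item` are hypothetical names chosen for the example.

```python
from copy import deepcopy


def replace_imported_same_language(old, new):
    """Strategy: an existing description is replaced when the incoming one
    has the same language and the existing one was not created manually."""
    return old["languageCode"] == new["languageCode"] and not old.get("manual")


def merge_bundles(existing, incoming, should_replace):
    """Start from the incoming bundle and keep every existing description
    that the strategy does not mark as replaced."""
    merged = deepcopy(incoming)
    for old in existing["descriptions"]:
        if not any(should_replace(old, new) for new in incoming["descriptions"]):
            merged["descriptions"].append(deepcopy(old))
    return merged


def import_item(store, incoming, should_replace):
    """Create the item if it is new, otherwise merge and update."""
    existing = store.get(incoming["id"])
    if existing is None:
        store[incoming["id"]] = deepcopy(incoming)
        return "created"
    store[incoming["id"]] = merge_bundles(existing, incoming, should_replace)
    return "updated"


store = {}
first = import_item(
    store,
    {"id": "item-1", "descriptions": [{"languageCode": "nld", "name": "Titel"}]},
    replace_imported_same_language,
)
# Simulate a description added by hand through the web interface.
store["item-1"]["descriptions"].append(
    {"languageCode": "nld", "name": "Handmatig", "manual": True}
)
second = import_item(
    store,
    {
        "id": "item-1",
        "descriptions": [
            {"languageCode": "nld", "name": "Nieuwe titel"},
            {"languageCode": "fra", "name": "Titre"},
        ],
    },
    replace_imported_same_language,
)
names = [d["name"] for d in store["item-1"]["descriptions"]]
```

Because the strategy is passed in as a function, the merge itself needs no knowledge of descriptions or languages. Note that under this particular rule an imported description in a language absent from the incoming bundle is preserved as well.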

@mikesname
Contributor Author

Now added a creationProcess field on Description class. Can have values IMPORT or MANUAL at present.

@bencomp
Contributor

bencomp commented Jan 27, 2015

We've made progress on this issue. Instead of (?) the creationProcess, the importer looks at the sourceFileId to discover whether a description has been imported. This includes the language.
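
One way to picture the sourceFileId check, as a Python sketch only. The thread does not spell out the format of the property, so the `"<file id>#<language>"` layout used here is an assumption, as are the dict-based descriptions and the `split_for_reimport` helper.

```python
def split_for_reimport(descriptions, incoming_source_file_id):
    """Partition existing descriptions into those a re-import of the given
    source file would replace and those it should leave alone.

    Assumption: imported descriptions carry a `sourceFileId` property and
    manually created ones do not.
    """
    replace, keep = [], []
    for desc in descriptions:
        if desc.get("sourceFileId") == incoming_source_file_id:
            replace.append(desc)
        else:
            keep.append(desc)
    return replace, keep


descriptions = [
    {"name": "Titel", "sourceFileId": "ead-001#nld"},   # imported, Dutch
    {"name": "Titre", "sourceFileId": "ead-001#fra"},   # imported, French
    {"name": "Hand-written note"},                      # no sourceFileId: manual
]

replace, keep = split_for_reimport(descriptions, "ead-001#nld")
```

With the language folded into the identifier, re-importing the Dutch file touches only the Dutch description; the French import and the manual description are both kept.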

Will look again soon.

@mikesname
Contributor Author

This is in theory working
