feat(id2iri): add flag to remove created resources from XML (DEV-2571) #491
```diff
@@ -75,7 +75,8 @@ def _replace_resptrs(
     Returns:
         a tuple of the modified XML tree and the set of the IDs that have been replaced
     """
-    resptr_elems = tree.xpath("/knora/resource/resptr-prop/resptr")
+    resptr_xpath = "|".join([f"/knora/{x}/resptr-prop/resptr" for x in ["resource", "annotation", "link", "region"]])
+    resptr_elems = tree.xpath(resptr_xpath)
     resptr_elems_replaced = 0
     for resptr_elem in resptr_elems:
         value_before = resptr_elem.text
```
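The hunk above widens the `<resptr>` lookup from `<resource>` to all four top-level element types by joining four XPath expressions with the XPath union operator `|`. Below is a minimal stdlib-only sketch of the same idea on a toy `<knora>` document. It is an illustration under assumptions, not the DSP-TOOLS code: `xml.etree.ElementTree` supports only a subset of XPath and has no `|` operator, so the sketch queries each tag separately instead of calling lxml's `tree.xpath()`; the function name `replace_resptrs` and the IRIs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level element types searched after the change (hypothetical constant name)
TOP_LEVEL_TAGS = ["resource", "annotation", "link", "region"]


def replace_resptrs(root: ET.Element, mapping: dict[str, str]) -> tuple[ET.Element, set[str]]:
    """Replace internal IDs in <resptr> elements by IRIs (hypothetical stdlib helper)."""
    replaced_ids: set[str] = set()
    for tag in TOP_LEVEL_TAGS:
        # ElementTree's XPath subset has no "|" union, so query each tag separately
        for resptr in root.findall(f"{tag}/resptr-prop/resptr"):
            value_before = resptr.text
            if value_before in mapping:
                resptr.text = mapping[value_before]
                replaced_ids.add(value_before)
    return root, replaced_ids


# Toy XML: the <resptr> inside <annotation> would have been missed by the old, resource-only XPath
xml = """<knora>
  <resource id="r1"><resptr-prop name=":hasLinkTo"><resptr>r2</resptr></resptr-prop></resource>
  <annotation id="a1"><resptr-prop name=":hasLinkTo"><resptr>r1</resptr></resptr-prop></annotation>
</knora>"""
mapping = {"r1": "http://rdfh.ch/0001/r1", "r2": "http://rdfh.ch/0001/r2"}  # hypothetical IRIs
root, replaced = replace_resptrs(ET.fromstring(xml), mapping)
print(sorted(replaced))  # ['r1', 'r2']
```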
```diff
@@ -106,9 +107,8 @@ def _replace_salsah_links(
     Returns:
         a tuple of the modified XML tree and the set of the IDs that have been replaced
     """
-    salsah_links = [
-        x for x in tree.xpath("/knora/resource/text-prop/text//a") if x.attrib.get("class") == "salsah-link"
-    ]
+    salsah_xpath = "|".join([f"/knora/{x}/text-prop/text//a" for x in ["resource", "annotation", "link", "region"]])
+    salsah_links = [x for x in tree.xpath(salsah_xpath) if x.attrib.get("class") == "salsah-link"]
     salsah_links_replaced = 0
     for salsah_link in salsah_links:
         value_before = regex.sub("IRI:|:IRI", "", salsah_link.attrib.get("href", ""))
```
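The same widening applies to salsah links: `<a class="salsah-link">` elements inside `<text>` values whose `href` presumably has the form `IRI:<id>:IRI`, given the markers stripped by the regex above. A stdlib-only sketch, assuming (hypothetically) that the replaced `href` should simply hold the IRI. It uses the standard `re` module where the diff uses the third-party `regex` package, and the name `replace_salsah_links` is illustrative only.

```python
import re
import xml.etree.ElementTree as ET

TOP_LEVEL_TAGS = ["resource", "annotation", "link", "region"]


def replace_salsah_links(root: ET.Element, mapping: dict[str, str]) -> tuple[ET.Element, set[str]]:
    """Replace internal IDs in salsah links by IRIs (hypothetical stdlib helper)."""
    replaced_ids: set[str] = set()
    for tag in TOP_LEVEL_TAGS:
        # "//a" selects <a> elements at any depth below <text>
        for link in root.findall(f"{tag}/text-prop/text//a"):
            # Only <a class="salsah-link"> elements point to other resources
            if link.attrib.get("class") != "salsah-link":
                continue
            # Strip the "IRI:" and ":IRI" markers around the internal ID
            value_before = re.sub("IRI:|:IRI", "", link.attrib.get("href", ""))
            if value_before in mapping:
                # Assumption: the new href is just the IRI
                link.set("href", mapping[value_before])
                replaced_ids.add(value_before)
    return root, replaced_ids


xml = """<knora>
  <region id="reg1"><text-prop name=":hasComment">
    <text encoding="xml">See <a class="salsah-link" href="IRI:r2:IRI">r2</a>
    and <a href="https://example.org">this page</a></text>
  </text-prop></region>
</knora>"""
root, replaced = replace_salsah_links(ET.fromstring(xml), {"r2": "http://rdfh.ch/0001/r2"})
print(replaced)  # {'r2'}
```

The external link keeps its `href` because the class filter runs before any ID extraction.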
```diff
@@ -137,7 +137,7 @@ def _replace_ids_by_iris(
         mapping: mapping of internal IDs to IRIs
 
     Returns:
-        modified XML tree
+        a tuple of the modified XML tree and the success status
     """
     success = True
     used_mapping_entries: set[str] = set()
```
```diff
@@ -160,6 +160,36 @@ def _replace_ids_by_iris(
     return tree, success
 
 
+def _remove_resources_if_id_in_mapping(
+    tree: etree._Element,
+    mapping: dict[str, str],
+) -> tuple[etree._Element, bool]:
+    """
+    Remove all resources from the XML file if their ID is in the mapping.
+
+    Args:
+        tree: parsed XML file
+        mapping: mapping of internal IDs to IRIs
+
+    Returns:
+        a tuple of the modified XML tree and the success status
+    """
+    success = True
+    resources = tree.xpath("|".join([f"/knora/{x}" for x in ["resource", "annotation", "link", "region"]]))
+    resources_to_remove = [x for x in resources if x.attrib.get("id") in mapping]
+    for resource in resources_to_remove:
+        resource.getparent().remove(resource)
+
+    msg = (
+        f"Removed {len(resources_to_remove)}/{len(resources)} resources from the XML file, "
+        "because their ID was in the mapping"
+    )
+    logger.info(msg)
+    print(msg)
+
+    return tree, success
+
+
 def _write_output_file(
     orig_xml_file: Path,
     tree: etree._Element,
```

jnussbaum marked two review conversations on this function as resolved.
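The new `_remove_resources_if_id_in_mapping()` collects the top-level elements whose `id` attribute is a key of the mapping and detaches them from the tree. Here is a stdlib sketch of that step, assuming that all four element types are direct children of `<knora>`, which the absolute XPath `/knora/{x}` implies. ElementTree elements have no `getparent()` (that is an lxml method), so the sketch removes them from the root directly. The function name and the toy XML are hypothetical.

```python
import xml.etree.ElementTree as ET

TOP_LEVEL_TAGS = ["resource", "annotation", "link", "region"]


def remove_resources_if_id_in_mapping(root: ET.Element, mapping: dict[str, str]) -> ET.Element:
    """Remove every top-level resource whose ID is a key of the mapping (hypothetical helper)."""
    # Collect into lists first, so the tree is not mutated while iterating over it
    resources = [child for child in root if child.tag in TOP_LEVEL_TAGS]
    resources_to_remove = [r for r in resources if r.attrib.get("id") in mapping]
    for resource in resources_to_remove:
        # ElementTree has no getparent(); the resources are direct children of <knora>,
        # so they can be removed from the root itself
        root.remove(resource)
    print(
        f"Removed {len(resources_to_remove)}/{len(resources)} resources from the XML file, "
        "because their ID was in the mapping"
    )
    return root


xml = """<knora>
  <permissions id="res-default"/>
  <resource id="r1"/>
  <resource id="r2"/>
  <region id="reg1"/>
</knora>"""
mapping = {"r1": "http://rdfh.ch/0001/r1", "reg1": "http://rdfh.ch/0001/reg1"}  # hypothetical IRIs
root = remove_resources_if_id_in_mapping(ET.fromstring(xml), mapping)
# prints: Removed 2/3 resources from the XML file, because their ID was in the mapping
print([child.get("id") for child in root])  # ['res-default', 'r2']
```

Non-resource children such as `<permissions>` are not counted and not removed, because only the four resource-like tags are considered.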
```diff
@@ -182,6 +212,7 @@ def _write_output_file(
 def id_to_iri(
     xml_file: str,
     json_file: str,
+    remove_resource_if_id_in_mapping: bool = False,
 ) -> bool:
     """
     Replace internal IDs of an XML file
```
```diff
@@ -193,6 +224,7 @@ def id_to_iri(
     Args:
         xml_file: the XML file with the data to be replaced
         json_file: the JSON file with the mapping (dict) of internal IDs to IRIs
+        remove_resource_if_id_in_mapping: if True, remove all resources from the XML file if their ID is in the mapping
 
     Raises:
         BaseError: if one of the two input files is not a valid file
```
```diff
@@ -207,5 +239,10 @@ def id_to_iri(
         tree=tree,
         mapping=mapping,
     )
+    if remove_resource_if_id_in_mapping:
+        tree, success = _remove_resources_if_id_in_mapping(
+            tree=tree,
+            mapping=mapping,
+        )
     _write_output_file(orig_xml_file=xml_file_as_path, tree=tree)
     return success
```
The description of the method `id_to_iri` is misleading in my opinion. If I see correctly, there is no way for `success` to be `False` 🤔

Good point. This is part of an overarching design pattern that I often use in DSP-TOOLS: every top-level function (like xmlupload, create, id2iri, ...) always returns True if successful, and False if not. The file … Now the question is: Does it make sense to define … Pros and cons of the current code: …

@BalduinLandolt Have you got an opinion on this?

To me, too, it is unclear. If a function returns a variable holding a bool, I expect there to be a possibility for it to change. In this case I would just return True. If we need to implement a change later, creating a new variable is not too difficult.

This is an issue with exceptions in general: they behave like an additional, implicit and untyped return channel, something we despise in the world of functional programming 🙂 The clean way to solve this is to either return something useful (some aggregation of error messages?) that is consumed by … In any case, this is probably out of scope for this PR and should be addressed separately.

Or maybe, to avoid a possible bug in the future, just remove `success` from the methods that get called from within `id_to_iri`? Otherwise, if the first method that gets called were changed later, the second one could overwrite its success status.

To clarify:

At least for the … This means for id2iri: Currently, it has to return a success status to …

And thank you @gNahcab, you noticed that in my previous code, the 2nd function overwrote the success status from the 1st function, which was a bug, of course.
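The bug mentioned in this thread, where the second call overwrote the success status of the first, can be avoided by combining the statuses instead of reassigning them. A minimal sketch of that pattern, with hypothetical names throughout (`run_steps`, `failing_replace` and `succeeding_remove` are not DSP-TOOLS functions) and a plain string standing in for the XML tree:

```python
from typing import Callable

# Each step takes (tree, mapping) and returns (modified tree, success status).
# The tree type is kept generic here; in DSP-TOOLS it would be an lxml element.
Step = Callable[[object, dict[str, str]], tuple[object, bool]]


def run_steps(tree: object, mapping: dict[str, str], steps: list[Step]) -> tuple[object, bool]:
    """Run all steps and AND their statuses, so a later step cannot mask an earlier failure."""
    overall_success = True
    for step in steps:
        tree, step_success = step(tree, mapping)
        overall_success = overall_success and step_success
    return tree, overall_success


def failing_replace(tree: object, mapping: dict[str, str]) -> tuple[object, bool]:
    # Hypothetical first step that reports a failure
    return tree, False


def succeeding_remove(tree: object, mapping: dict[str, str]) -> tuple[object, bool]:
    # Hypothetical second step that succeeds
    return tree, True


remove_resource_if_id_in_mapping = True
steps: list[Step] = [failing_replace]
if remove_resource_if_id_in_mapping:
    steps.append(succeeding_remove)

_, success = run_steps("<knora/>", {"r1": "http://rdfh.ch/0001/r1"}, steps)
print(success)  # False
```

The other option raised in the review, dropping `success` from the helpers and letting `id_to_iri` return True, avoids the problem as well; the sketch only illustrates the non-overwriting variant.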
Maybe add here that there can't be further statements with the resource in subject position, because to me that would not be obvious.
I think we are talking about two different things: you are talking about triples (subject, predicate, object) in the database, and I am talking about resources that can be uploaded (= `<resource>` tags in the XML file, i.e. a resource like "Iliad Prooem" that can be linked to with a link like https://ark.dasch.swiss/ark:/72163/1/082E/40kW9f9=SzOnQiyvhBNSqw=.20220414T072754555597Z). In the context of an xmlupload, we don't care about what kind of triples can be in the database. We only care about the `<resource>`s that are uploaded. And if "Iliad Prooem" has been uploaded already, I don't want to upload it a second time.