Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace resource instance factory (DEV-2876) #594

Merged

Conversation

BalduinLandolt
Copy link
Collaborator

No description provided.

@BalduinLandolt BalduinLandolt self-assigned this Oct 25, 2023
@linear
Copy link

linear bot commented Oct 25, 2023

@BalduinLandolt
Copy link
Collaborator Author

This PR is by no means perfect; especially one should add more tests, which now should already be considerably easier. But I would like to keep the PR size manageable, so I would prefer not tho have to do this in this one.

@BalduinLandolt BalduinLandolt marked this pull request as ready for review November 2, 2023 10:53
Copy link
Collaborator

@jnussbaum jnussbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a very cool PR, congrats, and thanks for tidying up this mess!
There are a lot of minor comments.
The only major concern are the errors raised in resource_create_client.py. As far as I see, they would be triggered during the upload. But in xmlupload.py, line 354, only BaseErrors are catched. So during a very long upload, it could happen that the program crashes because of bad input data. This could easily be prevented if we make sure that the only error that is raised in resource_create_client.py is BaseError. (Or, if xmlupload.py catches Exception.)

src/dsp_tools/connection/connection_live.py Outdated Show resolved Hide resolved
src/dsp_tools/connection/connection_live.py Show resolved Hide resolved
src/dsp_tools/models/value.py Show resolved Hide resolved
src/dsp_tools/utils/xmlupload/list_client.py Show resolved Hide resolved
src/dsp_tools/utils/xmlupload/list_client.py Outdated Show resolved Hide resolved
src/dsp_tools/utils/xmlupload/xmlupload.py Outdated Show resolved Hide resolved
src/dsp_tools/utils/xmlupload/xmlupload.py Show resolved Hide resolved
src/dsp_tools/utils/xmlupload/xmlupload.py Show resolved Hide resolved
src/dsp_tools/utils/xmlupload/xmlupload.py Outdated Show resolved Hide resolved
src/dsp_tools/utils/xmlupload/xmlupload.py Outdated Show resolved Hide resolved
@jnussbaum
Copy link
Collaborator

Another observation: There seem to be several id2iri_mappings that aren't synchronized. E.g. in xmlupload.py, line 338, you update one of them, but the one inside resource_create_client is still outdated. This will probably cause a problem in resource_create_client.py, line 315 (see my comment here: #594 (comment))

Yet another id2iri_mapping is created on line 156 of xmlupload.py (but that doesn't seem to be problematic)

@BalduinLandolt
Copy link
Collaborator Author

Another observation: There seem to be several id2iri_mappings that aren't synchronized. E.g. in xmlupload.py, line 338, you update one of them, but the one inside resource_create_client is still outdated. This will probably cause a problem in resource_create_client.py, line 315 (see my comment here: #594 (comment))

Yet another id2iri_mapping is created on line 156 of xmlupload.py (but that doesn't seem to be problematic)

I can't see any code path where this would be an issue - from what I can tell, there is really only one instance around (apart from the one on line 156, which as you mention, isn't relevant, because it is just there to ensure initialization of the variable in an error case.
But it's very well possible that I'm missing something. (And I strongly dislike this mutable state that's potentially modified all over the place. So if you can think of a nicer way, I'm happy... writing this, I think we could turn this IRI-registry into a little service so that it's at least compartmentalised properly, what do you think?)

@jnussbaum
Copy link
Collaborator

Another observation: There seem to be several id2iri_mappings that aren't synchronized. E.g. in xmlupload.py, line 338, you update one of them, but the one inside resource_create_client is still outdated. This will probably cause a problem in resource_create_client.py, line 315 (see my comment here: #594 (comment))
Yet another id2iri_mapping is created on line 156 of xmlupload.py (but that doesn't seem to be problematic)

I can't see any code path where this would be an issue - from what I can tell, there is really only one instance around (apart from the one on line 156, which as you mention, isn't relevant, because it is just there to ensure initialization of the variable in an error case. But it's very well possible that I'm missing something. (And I strongly dislike this mutable state that's potentially modified all over the place. So if you can think of a nicer way, I'm happy... writing this, I think we could turn this IRI-registry into a little service so that it's at least compartmentalised properly, what do you think?)

  • I don't understand how xmlupload.py, line 338 manages to update the mapping inside the resource_create_client. As far as I can see, resource_create_client has an empty mapping, which is never updated?
  • A service sounds exciting. Do you mean a listener, i.e. a class instance that holds the IRI-registry as a field, and that has an update() method that is called from the various places?

@BalduinLandolt
Copy link
Collaborator Author

BalduinLandolt commented Nov 3, 2023

Another observation: There seem to be several id2iri_mappings that aren't synchronized. E.g. in xmlupload.py, line 338, you update one of them, but the one inside resource_create_client is still outdated. This will probably cause a problem in resource_create_client.py, line 315 (see my comment here: #594 (comment))
Yet another id2iri_mapping is created on line 156 of xmlupload.py (but that doesn't seem to be problematic)

I can't see any code path where this would be an issue - from what I can tell, there is really only one instance around (apart from the one on line 156, which as you mention, isn't relevant, because it is just there to ensure initialization of the variable in an error case. But it's very well possible that I'm missing something. (And I strongly dislike this mutable state that's potentially modified all over the place. So if you can think of a nicer way, I'm happy... writing this, I think we could turn this IRI-registry into a little service so that it's at least compartmentalised properly, what do you think?)

  • I don't understand how xmlupload.py, line 338 manages to update the mapping inside the resource_create_client. As far as I can see, resource_create_client has an empty mapping, which is never updated?
  • A service sounds exciting. Do you mean a listener, i.e. a class instance that holds the IRI-registry as a field, and that has an update() method that is called from the various places?

If you have a look at this bit of code:

def _upload_resources(
resources: list[XMLResource],
imgdir: str,
sipi_server: Sipi,
permissions_lookup: dict[str, Permissions],
con: Connection,
config: UploadConfig,
project_client: ProjectClient,
list_client: ListClient,
) -> tuple[dict[str, str], list[str]]:
"""
Iterates through all resources and tries to upload them to DSP.
If a temporary exception occurs, the action is repeated until success,
and if a permanent exception occurs, the resource is skipped.
Args:
resources: list of XMLResources to upload to DSP
imgdir: folder containing the multimedia files
sipi_server: Sipi instance
permissions_lookup: maps permission strings to Permission objects
con: connection to DSP
config: the upload configuration
project_client: a client for HTTP communication with the DSP-API
list_client: a client for HTTP communication with the DSP-API
Returns:
id2iri_mapping, failed_uploads
"""
id2iri_mapping: dict[str, str] = {}
failed_uploads: list[str] = []
project_iri = project_client.get_project_iri()
json_ld_context = get_json_ld_context_for_project(project_client.get_ontology_name_dict())
listnode_lookup = list_client.get_list_node_id_to_iri_lookup()
resource_create_client = ResourceCreateClient(
con=con,
project_iri=project_iri,
json_ld_context=json_ld_context,
id2iri_mapping=id2iri_mapping,
permissions_lookup=permissions_lookup,
listnode_lookup=listnode_lookup,
)
for i, resource in enumerate(resources):
bitstream_information = None
if bitstream := resource.bitstream:
bitstream_information = handle_bitstream(
resource=resource,
bitstream=bitstream,
preprocessing_done=config.preprocessing_done,
permissions_lookup=permissions_lookup,
sipi_server=sipi_server,
imgdir=imgdir,
)
if not bitstream_information:
failed_uploads.append(resource.id)
continue
res = _create_resource(resource, bitstream_information, resource_create_client)
if not res:
failed_uploads.append(resource.id)
continue
iri, label = res
id2iri_mapping[resource.id] = iri
resource_designation = f"'{label}' (ID: '{resource.id}', IRI: '{iri}')"
print(f"Created resource {i+1}/{len(resources)}: {resource_designation}")
logger.info(f"Created resource {i+1}/{len(resources)}: {resource_designation}")
return id2iri_mapping, failed_uploads

  • L 307 creates an empty dict.
  • L 318 passes the dict to the ResourceCreateClient -> the client holds on to a reference of that dict
  • L 343 updates the dict repeatedly -> inside the client, the dict will update

Does this make sense?

But I will try to implement an IRI resolver to hide this ugly behaviour behind a nice facade :)

@BalduinLandolt BalduinLandolt merged commit 906a211 into main Nov 7, 2023
8 checks passed
@BalduinLandolt BalduinLandolt deleted the feature/dev-2876-remove-use-of-resource-instance-factory branch November 7, 2023 11:16
@daschbot daschbot mentioned this pull request Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants