New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: xmlupload crashes without writing id2iri mapping (DEV-813) #194
fix: xmlupload crashes without writing id2iri mapping (DEV-813) #194
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
…-when-xmlupload-crashes' into wip/dev-813-write-id2iri-mapping-when-xmlupload-crashes
…this resource can be skipped instead of making the xmlupload crash
…ipped instead of making the xmlupload crash
reduce number of retries to 5
rearrange code lines improve comments add docstrings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loos like a big improvement on the stability of dsp-tools! I added some comments, most of them are minor. I would appreciate if you could check if the one case where you call a method as a return value could be improved.
Apart from that some general remarks/possible improvements:
- please don't use "knora" anywhere in new code. I know that we have it a lot everywhere. But we want to get rid of this name and use DSP instead. "knora" is deprecated.
- the whole file is pretty long now. I would consider splitting it up into smaller pieces if that makes sense.
- I haven't checked properly but it seems that there are still some methods that aren't tested. Also, I would recommend to test negative cases where you actually have an exception or an error. IMO, also the log files that are created should be checked (the content not just their existence).
delta = datetime.datetime.now() - datetime.datetime.fromtimestamp(mapping.stat().st_mtime_ns / 1000000000) | ||
if delta.seconds < 15: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this delta needed?
delta = datetime.datetime.now() - datetime.datetime.fromtimestamp(mapping.stat().st_mtime_ns / 1000000000) | ||
if delta.seconds < 15: | ||
mapping_file = mapping.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would you have a file that starts with "id2iri_test-data_mapping_" which is older than 15 seconds?
|
||
id2iri_replaced_xml_filename = 'testdata/tmp/_test-id2iri-replaced.xml' | ||
id_to_iri(xml_file='testdata/test-id2iri-data.xml', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this test to a separate test? Ideally, they shouldn't depend on each other (unless you want to test this dependency).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not happy with the error handling part:
except BaseException
(or as I prefer except Exception
) should ever only be used when you want to do something before you kill the program. And then you should normally raise the same error again, so that the stack trace doesn't get lost.
You should only handle exceptions when you know how to handle them.
Also, remember that it shoud be possible at any time, to do e.g. a keyboard interrupt.
Additionally, I think you could make more liberal use of the finally
block: It wouldn't hurt to just write the IRI mapping in a finally at the end of every attempt to upload something.
Some more thoughts, more in the direction of refactoring the code to make it more maintainable:
- I agree that the file should be split up in multiple files (ideally I'd put it in a submodule
xmlupload
so that everything is together that belongs together). - I think you could go even further in splitting those long methods up into shorter methods (also Sonar gives you codesmells where some of those functions have a complexity score of 55 whereas no more than 15 is recommended; and some linting tools don't allow functions that have more than 15 lines - I wouldn't be so strict, but I still think that methods should be kept as simple as possible).
- relating to that: in python code, by convention, protected methods have a name starting with an underscore and private methods start with double-underscore. It's generally good practice to do that for every function that is specific for the logic of the present module, so that you don't expect it to be called from a different file than the one you're in. So ideally
xml_upload()
which is the public method that you expose, should be short and just call bunch of protected/private methods that again delegate the work, until you get a nice granularity.
rename answer to response improve comments/docstrings
…g was successful - refactor e2e test accordingly
improve reconnecting-print
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Args: | ||
terminal_output_on_failure: message to be printed if action cannot be executed | ||
method: either a callable to be called on its own, or a method name (as string) to be called on object | ||
object: if provided, it must be a python variable/object, accompanied by a method name (as string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by accompanied? Provided with the method
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall a great improvement. Also good job on the strategy pattern for the retry-upload method!
There is still a lot of room for refactoring/codestyle. But this PR has been going on for long enough, so feel free to leave it as it is for now.
Regarding the strategy pattern, I have added a bunch of comments which would probably simplify it a bit.
If it's not possible to pass an object instance method as a function, you might use a lambda function or a functools.partial
, so there would be ways for sure, to streamline it.
I still think it will be good to split up things to make them simpler, but again: this can/should be done in a separate PR.
And, honestly, I don't really understand the stashing and hashing that is done in some places. My gut tells me that there should be a simpler way for some things - but I don't know, so I decided to not comment on it.
img = sipi_server.upload_bitstream(os.path.join(imgdir, resource.bitstream.value)) | ||
img: dict[Any, Any] = try_network_action( | ||
object=sipi_server, | ||
method='upload_bitstream', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you pass the function rather than a string here, you can call it directly, instead of having to find out, what function the string points to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now, why you used the string. Doesn't it work, to just pass sipi_server.upload_bitstream
as a callable?
Kudos, SonarCloud Quality Gate passed!
|
resolves DEV-813