-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 462 #463
Conversation
Main reason to have a single method to do both is against race condition: we are relying on postgres for the lock, ensuring that we cannot update twice this value to true.
- not_started when we haven't started uploading the clks - in_progress when we start uploading them (lock the resource) - done when completed - error when an error occured.
One checking that re-using a token fails. One checking that uploading failing data does not block the project (can re-upload).
…t them or lock the resource. If an error occurs, free back the resource for later upload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like that we have UPLOADSTATE and UPLOADEDSTATE in the db now.
It is really hard to understand what they are for without spending quite some time reading through code. I made a suggestion how we can resolve that in the comments.
|
||
def get_and_set_dataprovider_upload_state_in_progress(db, dp_id): | ||
""" | ||
This method returns true if it was able to update the uploaded status of this dataprovider from false to true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the status of the dataproviders is not true/false any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well seen. I started by using the boolean value, but added more states to have an in_progress
one. But I forgot the comments...
This method returns true if it was able to update the uploaded status of this dataprovider from false to true. | ||
It return false otherwise (i.e. the state was already set to true). | ||
""" | ||
logger.debug("Setting dataprovider {} upload state to True".format(dp_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-phrased the whole docstring.
@@ -121,57 +121,69 @@ def project_clks_post(project_id): | |||
with DBConn() as conn: | |||
dp_id = db.get_dataprovider_id(conn, token) | |||
project_encoding_size = db.get_project_schema_encoding_size(conn, project_id) | |||
upload_state_updated = db.get_and_set_dataprovider_upload_state_in_progress(conn, dp_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit cryptic. The function get_and_set_dataprovider_upload_state_in_progress
mixes two tasks.
How about if we split this to make it clearer:
if not db.dataprovider_allowed_to_upload(conn, project_id, dp_id, ...):
return safe_fail_request(403, "This token has already been used to upload clks.")
db.set_dp_upload_state_in_progress(...)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main point is to do both actions at once not to have race conditions. Otherwise we would first assess it, and then setting it which should return an error if already set. I thus rename the method is_dataprovider_allowed_to_upload_and_lock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see...
@@ -85,6 +85,12 @@ RETURNS bool AS $$ | |||
SELECT $1.state = 'completed' | |||
$$ STABLE LANGUAGE SQL; | |||
|
|||
CREATE TYPE UPLOADEDSTATE AS ENUM ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, now we have UPLOADEDSTATE
and UPLOADSTATE
. This is very confusing.
After looking at those for a bit, I think we should rename UPLOADSTATE
to PROCESSEDSTATE
.
It would also be nice to add a bit of an explanation what all those states are for.
In my view,
UPLOADSTATE describes the state of the processing of the uploaded clks. This is only set after CLKs have been uploaded. 'pending' means we are waiting for some processing to be done, 'ready' means done and 'error' if something fails.
UPLOADEDSTATE describes the state of the CLK upload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was indeed a pickle. I will add comments on the UPLOADEDSTATE
.
I didn't want to touch the UPLOADSTATE
as I wasn't sure what it was made for (as you mentioned, this is not well documented). But I'll also add some documentation around it and rename it. In which case I will also rename UPLOADEDSTATE
to UPLOADSTATE
which sounds better :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll finally keep UPLOADEDSTATE
instead of renaming it because the value in the table is uploaded
. But renamed the UPLOADSTATE
to PROCESSEDSTATE
, adding some documentation.
@@ -235,3 +235,62 @@ def test_project_single_party_empty_data_upload( | |||
) | |||
assert r.status_code == 400 | |||
|
|||
|
|||
def test_project_upload_wrong_authentication(requests, valid_project_params): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can be more specific here. You are testing that you cannot upload twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
small_file_path, new_project_data['project_id'], token_to_reuse, 1000, expected_status_code=403) | ||
|
||
|
||
def test_project_upload_fail_then_works(requests, valid_project_params): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a great name for this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was inspired :)
I was not happy with the previous name...
def is_dataprovider_allowed_to_upload_and_lock(db, dp_id): | ||
""" | ||
This method returns true if the dataprovider is allowed to upload her clks. | ||
A data provider is allowed to upload clks if they have not yet been uploaded or if the upload is already in progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean something different with the second if.
A DP is not allowed to upload if data has been uploaded, or if an upload is currently underway.
if length < 1: | ||
return False | ||
elif length > 1: | ||
raise ValueError("Houston, we have a problem!!! This dataprovider can upload multiple times her clks.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to write a message to the logger at high severity level as well here.
'in_progress', -- the upload is in progress, so no-one else should be able to upload at the same time | ||
'done', -- the clks have been uploaded, it should stay this way | ||
'error' -- the dataprovider has tried to upload the clks but an error occurred, having this state allows a dataprovider to try again. | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -121,57 +121,69 @@ def project_clks_post(project_id): | |||
with DBConn() as conn: | |||
dp_id = db.get_dataprovider_id(conn, token) | |||
project_encoding_size = db.get_project_schema_encoding_size(conn, project_id) | |||
upload_state_updated = db.get_and_set_dataprovider_upload_state_in_progress(conn, dp_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see...
Closes #462 which describes the fact that a dataprovider was able to upload clks multiple time to the same project using the same token.
In this PR, we solve this issue by modifying the
uploaded
value in the tabledataproviders
in the database from a boolean to an enum which can have the following values:When posting clks to a project, we check that the token is not already in use (i.e.
uploaded
set toin_progress
ordone
), and post if possible. In an error case,uploaded
will be set toerror
, which will enable the owner of the token to try uploading their data again, not to block a project if an upload failed.