
Stable IDs #341

Merged Sep 11, 2018 (11 commits)

Conversation

silverdaz (Contributor)

According to Oscar, stable ID generation for CentralEGA is handled by a process at EBI, and it depends on the content of the original file.

We use the generated stable ID after a successful ingestion, which allows us:

  • not to waste any stable ID in case of errors (and no need to recycle or clean them up);
  • to mark the ingested file as completed, but not yet ready for data-out. Once the stable ID comes in, we flag the file as ready for download.
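A sketch of the resulting two-step flow (illustrative only: `db.mark_completed` appears in this PR's diff, while the import path and `set_stable_id` are assumptions):

```python
from lega.utils import db  # assumed import path

def on_verified(file_id):
    """Mark the file as ingested, but not yet visible for data-out."""
    db.mark_completed(file_id)

def on_stable_id(file_id, stable_id):
    """Flag the file as ready for download once CentralEGA's stable ID arrives."""
    db.set_stable_id(file_id, stable_id)
```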

@silverdaz silverdaz added this to the Sprint 35 milestone Sep 3, 2018
@silverdaz silverdaz self-assigned this Sep 3, 2018
@silverdaz (Contributor, Author)

@blankdots: When you are back from holidays and read this, could you please give me a hand with the tests? (My local tests worked fine.)

The tests fail on the Eureka part, but I have touched neither the Eureka code nor its tests. So either it was failing before, or there is some magic happening in the mocks.

@dtitov commented Sep 4, 2018

@silverdaz Could you please elaborate on why we don't request the stable ID in verify.py, right after a successful checksum validation? What is the necessity of introducing a new microservice?

In my mind, the logic of verify.py could be:

  1. Validate the checksum:
    a. If validation failed, do what we currently do in that case;
    b. If validation succeeded, do what we currently do in that case, plus query CentralEGA for the stable ID and update it in our database.

Will it work like that or am I missing something?
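A hypothetical sketch of that synchronous flow (the `requests` call, the CentralEGA endpoint, and the `db` helpers are illustrative assumptions, not an actual API):

```python
import requests

from lega.utils import db  # assumed import path

def verify(file_id, computed_checksum, expected_checksum):
    if computed_checksum != expected_checksum:
        db.set_error(file_id, 'Checksum mismatch')  # 1a: current error path
        return
    db.mark_completed(file_id)                      # 1b: current success path
    # ...plus query CentralEGA synchronously for the stable ID:
    resp = requests.get(f'https://cega.example.org/stable-ids/{file_id}')
    resp.raise_for_status()
    db.set_stable_id(file_id, resp.json()['stable_id'])
```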

@silverdaz (Contributor, Author)

Ah, yes, sure, I had not thought about putting that code inside verify.py.

But this is asynchronous: we do not "query" CentralEGA and wait for the response synchronously.
We instead send them a message, and they respond when they find the time.
That's why there is a separate service that listens for their answer, and why verify does not handle the stable ID.
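A minimal sketch of the publishing side, assuming the pika client and hypothetical exchange, routing-key, and payload names:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Publish and return immediately; CentralEGA's answer arrives later, out of
# band, and is picked up by the dedicated listener service.
message = {'file_id': 42}  # hypothetical payload
channel.basic_publish(exchange='cega',          # hypothetical exchange
                      routing_key='stableID',   # hypothetical routing key
                      body=json.dumps(message),
                      properties=pika.BasicProperties(delivery_mode=2))  # persistent
connection.close()
```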

@dtitov commented Sep 4, 2018

Are we making our system more complicated just to make CEGA's life easier? :)

To be honest, I would even include stable ID querying in ingest.py, because I don't care much about wasting some IDs: a UUID is a 128-bit identifier, so you can imagine what a nearly endless space of IDs it gives us.

Also, I don't think that ID generation is such a heavy, CPU-consuming operation, or that it's hard to implement: it's basically a call to UUID.randomUUID().
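In Python terms (LocalEGA's language), the equivalent one-liner would be:

```python
import uuid

# A random 128-bit (version 4) UUID: cheap to generate, no coordination needed.
print(uuid.uuid4())
```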

So, concluding, I would definitely not overload LocalEGA's setup with yet another microservice just because CentralEGA cannot or doesn't want to generate UUIDs for us upon request: neither they nor we should be concerned about running out of IDs or about the heaviness of a synchronous approach.

Can you discuss this with Oscar and Sabela?

@jrambla commented Sep 5, 2018

@dtitov we are talking about stable IDs, not UUIDs. Stable IDs have a different life cycle, which includes tracking them through creation, updates (with versions), and eventual deprecation or obsolescence. Thus, we prefer not to "waste" them, if possible.

@sdelatorrep commented Sep 5, 2018

I'm not sure what we are discussing here: async vs. sync communication, or CEGA IDs vs. LEGA IDs (i.e. UUIDs)?
In my opinion, there is no benefit to synchronous communication here. If CEGA is down, what would you do? Retry until it is up again? What if there is a communication error? Retry again? How many times? And what happens if CEGA never answers? Retry? Or, the other way around, what if LEGA requests an ID but something fails when answering back, what should CEGA do? Retry? We already have RabbitMQ in place, so let's use it :) It guarantees messages are not lost, and neither CEGA nor LEGA needs this information immediately.
About using UUIDs instead of CEGA IDs: I guess it's another solution, but probably with lots of consequences that must be considered.

@dtitov commented Sep 5, 2018

@jrambla Thank you for the clarification, I was not aware of the stable ID lifecycle; that, of course, makes things a bit more difficult.

At the moment, after your comment, I see two possible ways:

  1. CentralEGA creates and provides us with a "stable IDs API" (probably a REST one) so that we can perform all the operations you've specified: issue new IDs, version them, deprecate them, and so on.
  2. Or LocalEGA creates yet another microservice to handle that (basically, this PR).

In other words, one of the sides has to become more complicated. And, obviously, as a representative of the LocalEGA team, I would prefer not to add any extra complexity to LocalEGA's architecture if it can be avoided.

@sdelatorrep I see your point and it makes perfect sense to me, but any communication channel has its own benefits and downsides. What if we send you a message, you receive it, and after that, during ID generation, CentralEGA goes down for some reason (e.g. a power outage)? Then we will never receive the message back, because after the restart your side will simply "forget" that it actually needs to respond (no messages queued, no state preserved). In this rare but still possible case, the archived file will never get its ID. With an HTTP request, we can at least be aware that ID retrieval failed, so that we can fall back to some backup mechanism.

Look, I propose another variant not only because of my personal preferences, but more for practical reasons that relate to our local infrastructure limitations. But I guess we'd better discuss it tomorrow in a meeting. Anyway, thanks for the explanations.

@silverdaz (Contributor, Author)

> What if we send you a message, you receive it, and after that, during ID generation, CentralEGA goes down for some reason (e.g. a power outage)? Then we will never receive the message back, because after the restart your side will simply "forget" that it actually needs to respond (no messages queued, no state preserved).

This is what RabbitMQ handles: the message is not forgotten, and it will be (re)processed.
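A minimal consumer sketch of that acknowledgment pattern (pika 1.x style, hypothetical queue name): the message is acked only after successful processing, so if the consumer dies before the ack, RabbitMQ redelivers it.

```python
import pika

def handle(body):
    """Placeholder for the real work, e.g. storing the stable ID in the DB."""

def on_message(channel, method, properties, body):
    handle(body)
    channel.basic_ack(delivery_tag=method.delivery_tag)  # ack only on success

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='stableIDs', durable=True)  # survives broker restarts
channel.basic_consume(queue='stableIDs', on_message_callback=on_message)
channel.start_consuming()
```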

This PR is a solution to issue #260 .
There is always the possibility to make another PR/issue if you, @dtitov, feel that this solution is not suitable.
In the Agile spirit.

@dtitov commented Sep 5, 2018

@silverdaz I have read more about acknowledgments in RabbitMQ, and yes, the case above can be handled.

```yaml
container_name: id-mapper
volumes:
  - ./lega/conf.ini:/etc/ega/conf.ini:ro
  - ~/_ega/lega:/home/lega/.local/lib/python3.6/site-packages/lega
```


This should only be for local development (even more so since this is specific to @silverdaz's env :) ); I'm not particularly fond of having these sorts of paths in the public repo (the same goes for other services).

@silverdaz (Contributor, Author)

Ah... oops... I thought I had removed all of them...

```diff
@@ -25,12 +25,11 @@

 def retry_loop(func):
-    """Retry connection for ``try`` times every ``try_interval`` seconds."""
+    """Decorator retry something ``try`` times every ``try_interval`` seconds."""
```
@blankdots commented Sep 6, 2018

I am following https://www.python.org/dev/peps/pep-0257/: this line should be imperative (the same goes for line 32 below).
See D401: http://www.pydocstyle.org/en/2.1.1/error_codes.html

I know we are not following PEP 8 and PEP 257, and it will take quite an effort to go over the previous code and format it according to the rules; however, I would like the "newer" code (what "new" means is a bit relative here) to follow these guides.
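For instance, a D401-compliant (imperative-mood) wording could be:

```python
def retry_loop(func):
    """Retry the decorated function ``try`` times every ``try_interval`` seconds."""
```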

@blankdots commented Sep 6, 2018

@silverdaz regarding this #341 (comment): you did kind of touch Eureka in the config (`try` and `try_interval`).
You can set them as env variables in the tests using https://docs.python.org/3.6/library/test.html#test.support.EnvironmentVarGuard
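For example (the variable names here are assumptions, not the actual config keys):

```python
import unittest
from test.support import EnvironmentVarGuard  # location in Python 3.6

class RetryConfigTestCase(unittest.TestCase):
    def setUp(self):
        self.env = EnvironmentVarGuard()
        self.env.set('RETRY_TRY', '1')           # hypothetical variable name
        self.env.set('RETRY_TRY_INTERVAL', '0')  # hypothetical variable name

    def test_retry_loop(self):
        with self.env:
            pass  # exercise the code that reads ``try``/``try_interval`` here
```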

setup.py Outdated
```diff
@@ -3,7 +3,7 @@

 setup(name='lega',
       version=__version__,
-      url='http://lega.nbis.se',
+      url='http://LocalEGA.nbis.se',
```


The URL is not valid; why not put http://localega.readthedocs.io/ there, or the GitHub repo URL?

lega/verify.py Outdated
```diff
 LOG.info('Verification completed. Updating database.')
-db.set_status(file_id, db.Status.Completed)
+# Updating the database
+db.mark_completed(file_id)

 # Send to QC
 data.pop('status', None)
 data['key_id'] = key_id
 LOG.debug(f'Sending message to QC: {data}')
 publish(data, channel, 'lega', 'qc') # We keep the org msg in there
```


Isn't the qc queue deleted from defs.json? Maybe an oversight, or does it serve some purpose?

@viklund (Member) left a comment

I looked at the logic of the Python script and I think it is OK.

I do think that the name mapper is a bit confusing. Maybe just stable-id-receiver?

@blankdots blankdots removed this from the Sprint 35 milestone Sep 7, 2018
@blankdots blankdots added this to the Sprint 36 milestone Sep 7, 2018
@silverdaz silverdaz removed the request for review from dtitov September 10, 2018 13:34
@blankdots left a comment

ID Mapper is indeed not the best name.

Code has been reviewed. Changes to the code base imply changes to the deployment specified in these issues:
