Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get_unit_status method #11

Merged
merged 18 commits into from Jan 7, 2020
Merged

Conversation

helrond
Copy link
Contributor

@helrond helrond commented Nov 8, 2019

This hasn't been tested and is probably full of bugs but I wanted to see if you think I'm on the right track here. If so, my next moves will be to add logging, tests, and work on exception handling.

Connects to archivematica/Issues#972.

@sevein
Copy link
Member

sevein commented Nov 10, 2019

This is looking good, @helrond. I have a few suggestions for consideration.

With a few tweaks, I think that _call_url_json could become a thinner function that wraps _call_url. That'd save us from some code duplication.

Regarding the error classes. It'd be interesting to adopt a proper/idiomatic class hierarchy of exceptions, e.g. AMClientException > TransportError etc, but I'd definitely leave that for v2. For v1, since we've already reached v1 release candidate, I'm wondering if we should actually make an effort to hide these classes from developers, e.g. amclient.utils._AMClientConnectionError as opposed to amclient.utils.AMClientConnectionError to hint them that they're for internal use only.

Regarding the helper itself, should we try to leave AMClient as the low-level client interface focused on API communication? A helper like yours, that encapsulates some logic beyond simple API calls, could be part of the amclient.utils module.

An additional idea which may not fit at all with your use case so please let me know, would provide a better experience for the user to focus on transfers only and hide the knowledge of how transfers transition into SIPs? The workflow could look as follows:

  1. User submits a transfer with amclient.AMClient.create_package which returns transfer_id. Or the old two-step way, start_transfer + approve_transfer.
  2. User calls amclient.utils.get_transfer_status(transfer_id) repeatedly to look up status and wait for completion.

get_transfer_status would always return the underlying payload generated by api/transfer/status or api/ingest/status since they share the same properties:

{
    "uuid": "...",
    "status": "...",
    "name": "...",
    "sip_uuid": "...",
    "microservice": "...",
    "directory": "...",
    "path": "...",
    "message": "...",
    "type": "..."
}

The helper knows how the transition from transfer to ingest happens, so it would hit api/ingest/status only when api/transfer/status returns:

  • status_code == 200, and
  • status == "COMPLETE", and
  • sip_uuid and sip_uuid != "BACKLOG"

Calling the helper when the package made it into Ingest would result into two API calls, but I think that's acceptable. Also in AM 1.11 the cost associated to the database queries is lower because we added some indexes.

Ultimately, the helper could take a wait_for_completion=True keyword to expose a blocking form of the whole process that knows how to retry until completion (AIP stored). That could make things even simpler for users.

@helrond
Copy link
Contributor Author

helrond commented Nov 17, 2019

Thanks @sevein - I've just pushed two commits which address your first two points (using _call_url_json as a wrapper and hiding the internal exceptions).

I'm wondering if you have a reference you can point me to for the hierarchy of exceptions - I'm not totally following you there, but that's likely my lack of knowledge!

I'm fine with moving the get_status_helper function to utils, but I haven't done that yet because there are (at least to my eye) several other functions in amclient which also encapsulate some logic beyond a simple API call. Does it make sense to move those as well?

Regarding your last point: what you describe is basically how I had envisioned the function working, however my approach was to try the ingests endpoint and then handle the exception from there. I think it basically amounts to the same thing in the end. I was going for the EAFP approach but I'd defer to your preference!

@sevein
Copy link
Member

sevein commented Nov 18, 2019

Thanks @sevein - I've just pushed two commits which address your first two points (using _call_url_json as a wrapper and hiding the internal exceptions).

Looking good!

I'm wondering if you have a reference you can point me to for the hierarchy of exceptions - I'm not totally following you there, but that's likely my lack of knowledge!

elasticsearch-py could be a good example because it's also based on urllib3/requests. Here's two examples outside the HTTP context: aiosmtplib.errors, celery.exceptions.

I'm fine with moving the get_status_helper function to utils, but I haven't done that yet because there are (at least to my eye) several other functions in amclient which also encapsulate some logic beyond a simple API call. Does it make sense to move those as well?

You're right. Let's leave it as part of AMClient, it would be an inconsistency otherwise.

Regarding your last point: what you describe is basically how I had envisioned the function working, however my approach was to try the ingests endpoint and then handle the exception from there. I think it basically amounts to the same thing in the end. I was going for the EAFP approach but I'd defer to your preference!

I realize that you're perfectly solving the problem that you described. It seems useful if you don't know the type of your objects, but how often does that happen?

I find myself frequently just waiting until a transfer becomes an AIP and reaches my AIP storage location. In that case, it's nice that something like the following:

  1. API user submits a transfer, Archivematica returns transfer_id.
  2. API user checks status with helper(transfer_id) until status=COMPLETE. sip_id is retrieved.
  3. API user checks status with helper(sip_id) until status=COMPLETE. AIP stored!

Becomes:

  1. API user submits a transfer, Archivematica returns transfer_id.
  2. API user checks status: helper(transfer_id) until status=COMPLETE. AIP stored!

@helrond
Copy link
Contributor Author

helrond commented Nov 19, 2019

OK, I've added some Exception which have a bit of a hierarchy. Definitely a rough draft, so open to suggestions:

_AMClientError
    _TransportError
        _ConnectionError
        _Request Error
            _Decode Error

Now that I've done this, I'm idly wondering, if we're going to the trouble of creating all these exceptions, why wouldn't we want to expose those to developers?

I'm also still a little lost with your last suggestion. If I'm understanding correctly, your proposal would change get_unit_status() to something like:

def get_unit_status(self, uuid):
        self.transfer_id = uuid
        transfer = self.get_transfer_status()
        if (
            transfer.get('status') == 'COMPLETE' and 
            transfer.get('sip_uuid') and 
            transfer.get('sip_uuid') != 'BACKLOG'
        ):
            self.sip_uuid = uuid
            return self.get_ingest_status()

Is that what you're saying, or am I missing something here? If we're going this route then I think I can (and probably should) dispense with the exceptions and _call_url because this doesn't make use of them at all.

Last, I was hoping to write some tests for this. Can you give me any pointer for building the vcr.py cassettes? I see there are some credentials hardcoded into test/test_amclient.py which may point to a local AM instance?

@sevein
Copy link
Member

sevein commented Nov 19, 2019

Now that I've done this, I'm idly wondering, if we're going to the trouble of creating all these exceptions, why wouldn't we want to expose those to developers?

I was suggesting the hierarchy only for v2. It's a lot of work. I don't think we have to do that now. If we end up using _call_url in the new method that you're suggesting, I think it'd be just fine to leak python-request exceptions.

Is that what you're saying, or am I missing something here? If we're going this route then I think I can (and probably should) dispense with the exceptions and _call_url because this doesn't make use of them at all.

The code looks good. What's missing in _call_url_json is the ability to communicate errors precisely. What you're saying could be a good compromise though. But if I was using that helper I'd very much prefer to have access to the actual exceptions because they'll let me look up the response statuses. I've found that to be important, having implemented retries recently. E.g. I don't want my app to spend 10 minutes retrying on a 401 Unauthorized, but a 500 Internal Server Error is definitely worth retrying. I'd take that over consistency with other methods of AMClient.

Last, I was hoping to write some tests for this. Can you give me any pointer for building the vcr.py cassettes? I see there are some credentials hardcoded into test/test_amclient.py which may point to a local AM instance?

Exactly. But you could just use mock, it'd make things simpler. You only have to patch a couple of methods after all.

@helrond
Copy link
Contributor Author

helrond commented Nov 22, 2019

OK, I think we're getting closer. I've tweaked _call_url to raise_for_status() and passed that exception along to _call_url_json. Since we don't need them right now, I've also removed the custom exceptions.

Last I've added some tests. I fully realize these tests are dumb, but I've never worked with mock before for this kind of thing so I'm still trying to figure out what I should be testing and how I should be testing it.

Feedback/next steps welcome!

Copy link
Member

@sevein sevein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Thanks.

amclient/amclient.py Show resolved Hide resolved
amclient/amclient.py Outdated Show resolved Hide resolved
amclient/amclient.py Outdated Show resolved Hide resolved
tests/test_amclient.py Show resolved Hide resolved
tests/test_amclient.py Show resolved Hide resolved
amclient/utils.py Outdated Show resolved Hide resolved
@sevein sevein changed the title WIP get_status helper Add get_unit_status method Nov 26, 2019
@sevein
Copy link
Member

sevein commented Dec 5, 2019

@helrond, I had a chance today to give this a try and it wasn't working for a couple of reasons. We forgot to pass the auth header and the base URL. Also when transitioning from transfer to ingest, we were not pulling the identifier of the SIP. I've pushed 389074e to your branch with some suggestions. Let me know what you think!

@helrond
Copy link
Contributor Author

helrond commented Dec 9, 2019

Hi @sevein these changes look great, thank you! I also appreciate the improvements to the tests. I had not had a chance to actually test; was planning on doing that this week, so you beat me to it.

@sevein
Copy link
Member

sevein commented Dec 10, 2019

Thanks @helrond. No rush, I'd still like to hear if this solution is working for you, since it's slightly different form what you were originally suggesting.

@helrond
Copy link
Contributor Author

helrond commented Jan 2, 2020

Hi @sevein - I have been working on incorporating amclient into one of our existing microservice apps, and early indications are that this is working really well.

I still have some work to do, but as the diff shows, I was able to get rid of a pretty big chunk of code by using the get_unit_status method along with some of the other helpers that amclient already offers, which feels pretty good!

@sevein
Copy link
Member

sevein commented Jan 7, 2020

That's awesome, thanks!

@sevein sevein merged commit 8dd3d9f into artefactual-labs:master Jan 7, 2020
@helrond helrond deleted the status branch January 7, 2020 22:43
@helrond
Copy link
Contributor Author

helrond commented Jan 7, 2020

@sevein at the risk of wearing out my welcome, any chance of getting this pushed up to PyPI?

@sevein
Copy link
Member

sevein commented Jan 7, 2020

Totally! Ross plans to release the final v1 release soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants