Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collections API: Update Celery to be Synchronous #2167

Merged
merged 6 commits into from
Aug 23, 2017

Conversation

rajadain
Copy link
Member

@rajadain rajadain commented Aug 18, 2017

Overview

Update Celery to use the new synchronous geoprocessing service, instead of the older Spark JobServer setup. See original issue and commit messages for details.

Connects #2102
Connects #2104

Testing Instructions

  • Ensure your Worker VM has been provisioned at least once on this feature/collections-api branch
  • Get the latest version of mmw-geoprocessing and run cibuild to generate api-assembly-3.0.0-alpha.jar
  • Check out this branch
  • Copy that new JAR into /opt/geoprocessing/mmw-geoprocessing-3.0.0-alpha.jar in the Worker VM, and then reload the Worker VM
  • Provision the Worker
  • Open the app :8000/
  • Draw any shape, and proceed to Analyze. Ensure that all analyses complete successfully.
  • Proceed to TR-55. Draw modifications, move the precipitation slider. Ensure that all model runs complete successfully.

Since the new geoprocessing service is run as the `mmw` user in
the Worker VM, that user must have access to AWS credentials.
Instead of mounting the developer's credentials into `/aws`, they
are now mounted into the `mmw` user's home folder.

Both `~/.aws/credentials` and `~/.aws/config` must be 644.
@kellyi
Copy link
Contributor

kellyi commented Aug 21, 2017

Just made a new mmw-geoprocessing release -- https://github.com/WikiWatershed/mmw-geoprocessing/releases/tag/3.0.0-alpha-2 -- so we can add a commit here to pull the latest geoprocessing service when provisioning the worker on this branch.

combines it with input data, and submits it to Spark JobServer.

This task must always be succeeded by `finish` below.
combines it with input data, and submits it to Geoprocessing Service.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: to Geoprocessing Service -> to the Geoprocessing Service.

"""
Start a geoproessing operation.
Run a geoproessing operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: geoproessing -> geoprocessing

@kellyi
Copy link
Contributor

kellyi commented Aug 21, 2017

Not sure why but I keep getting this error from the geoprocessing logs when trying to analyze:

Error during processing of request: 'Unable to load AWS credentials from any provider in the chain'. Completing
 with 500 Internal Server Error response. To change default exception handling behavior, provide a custom Exception
Handler.

@rajadain rajadain force-pushed the tt/collections-api-sync-celery branch from e3a94fa to edb8d8b Compare August 21, 2017 20:18
@@ -1,4 +1,8 @@
---
- name: Update CA Certificates
shell: "update-ca-certificates -f"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to do this for AWS credentials to work with OpenJDK. Otherwise we get this error:

Unable to execute HTTP request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

Does this make sense, or is it something completely out of left field? @hectcastro @tnation14

I got this from https://stackoverflow.com/a/29313285/6995854

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is unfamiliar to me; we use OpenJDK with the AWS SDK on a few other projects and I don't think I've seen it before. Before I give the OK on this solution I'd like to take a deeper look at the error locally. I think I remember how to reproduce this from our conversation on Friday.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let me know if you'd like to pair on this, I'm available for that this week. Alternatively, just check out this branch, remove this commit, and follow the testing instructions to see the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that this is an instance of ca-certificates-java bug 1396760. This is actually an issue that should be fixed in azavea.java; I opened azavea/ansible-java#26 to track the issue, but I'm going to fix it right now.

@rajadain
Copy link
Member Author

keep getting this error from the geoprocessing logs

paired with @kellyi yesterday, and got that error to abate with edb8d8b. Still waiting on confirmation from ops folks if that is the right thing to do.

@tnation14
Copy link
Contributor

tnation14 commented Aug 22, 2017

Once azavea/ansible-java#27 is closed, this PR should use azavea.java version 6.0.1 0.6.1

@rajadain
Copy link
Member Author

@tnation14 We're currently at 0.5.0. Would we now go to 6.0.1?

@tnation14
Copy link
Contributor

I meant 0.6.1, sorry.

We add a task `run` and a helper method `geoprocess`. The `run`
task converts the input into the desired format, and `geoprocess`
communicates with the geoprocessing service and returns results.

`run` is a combination of `start` and `finish`: it checks whether
a result is cacheable and cached or not, and if so returns that.
Otherwise it runs `geoprocess`.

`geoprocess` is similar to `sjs_submit` in the sense that it is
POSTing to an endpoint. Unlike `sjs_submit`, which gets back a job
id, `geoprocess` receives the actual results and returns them.

`run` is designed to replace `start` and `finish` tasks in Celery
chains. So if a previous celery chain was:

    chain(geoprocessing.start.s(data),
          geoprocessing.finish.s(),
          mytasks.process_results.s())

It will now be:

    chain(geoprocessing.run.s(data),
          mytasks.process_results.s())
We likely do not need to use `choose_worker` anymore, since each
request is independent and can be run on any worker (in the right
colored stack). However, this probably needs some more thought,
and thus will be addressed in the separate issue #2117.
These old async operations are no longer used.
This version includes RasterGroupedCount and RasterGroupedAverage
operations, which make it sufficient for Analyze and TR-55 tasks.
We need azavea/ansible-java#27 to solve
AWS access issues with OpenJDK.
@rajadain rajadain force-pushed the tt/collections-api-sync-celery branch from edb8d8b to 4e77070 Compare August 22, 2017 22:23
@rajadain
Copy link
Member Author

Alright, this should be ready for another look.

finish.s().set(exchange=exchange,
routing_key=worker) |
run.s(opname, data, wkaoi).set(exchange=exchange,
routing_key=worker) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we need any of the exchange and routing_key stuff now that SJS is out of the picture. The main reason for it was to pin tasks to the worker where things started so that they'd finish there too. It may be another task, but probably worth doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hectcastro We do have #2117 to address that, but it'd be helpful if you could comment on the statement I made in there about dark stack routing - that was my recollection, but I wasn't confident it was accurate.

@kellyi
Copy link
Contributor

kellyi commented Aug 23, 2017

I destroyed then rebuilt my worker VM on this branch and everything seems to be working well!

It feels pretty quick for small AOIs; going to try out a few larger AOIs and some simultaneously requests just to see how that works.

Copy link
Contributor

@kellyi kellyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. This is working well! I destroyed then rebuilt my worker vm on this branch and everything worked correctly.

The responses feel pretty quick now, and we'll be tweaking the geoprocessing API performance in future work, too.

@kellyi kellyi assigned rajadain and unassigned kellyi Aug 23, 2017
@rajadain
Copy link
Member Author

Thanks for the reviews, everyone!

@rajadain rajadain merged commit b06a9e9 into feature/collections-api Aug 23, 2017
@rajadain rajadain deleted the tt/collections-api-sync-celery branch August 23, 2017 19:32
@rajadain rajadain mentioned this pull request Oct 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants