Collections API: Update Celery to be Synchronous #2167
Conversation
Since the new geoprocessing service is run as the `mmw` user in the Worker VM, that user must have access to AWS credentials. Instead of mounting the developer's credentials into `/aws`, they are now mounted into the `mmw` user's home folder. Both `~/.aws/credentials` and `~/.aws/config` must be 644.
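As a minimal sketch of that setup (a scratch directory stands in for the `mmw` user's home folder, and the file contents are placeholders, so this is illustrative rather than the project's actual provisioning code):

```shell
# Sketch: stage AWS credentials with the required 644 permissions.
# MMW_HOME is a temp-dir stand-in for /home/mmw; contents are placeholders.
MMW_HOME="$(mktemp -d)"
mkdir -p "$MMW_HOME/.aws"

printf '[default]\naws_access_key_id = PLACEHOLDER\n' > "$MMW_HOME/.aws/credentials"
printf '[default]\nregion = us-east-1\n' > "$MMW_HOME/.aws/config"

# Both files must be world-readable (644) so the mmw user's SDK can load them.
chmod 644 "$MMW_HOME/.aws/credentials" "$MMW_HOME/.aws/config"
```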
Just made a new mmw-geoprocessing release -- https://github.com/WikiWatershed/mmw-geoprocessing/releases/tag/3.0.0-alpha-2 -- so we can add a commit here to pull the latest geoprocessing service when provisioning the worker on this branch.
```diff
-combines it with input data, and submits it to Spark JobServer.
-
-This task must always be succeeded by `finish` below.
+combines it with input data, and submits it to Geoprocessing Service.
```
Typo: "to Geoprocessing Service" -> "to the Geoprocessing Service".
""" | ||
Start a geoproessing operation. | ||
Run a geoproessing operation. |
Typo: geoproessing
-> geoprocessing
Not sure why but I keep getting this error from the geoprocessing logs when trying to analyze: Error during processing of request: 'Unable to load AWS credentials from any provider in the chain'. Completing with 500 Internal Server Error response. To change default exception handling behavior, provide a custom ExceptionHandler.
Force-pushed from e3a94fa to edb8d8b
```diff
@@ -1,4 +1,8 @@
 ---
+- name: Update CA Certificates
+  shell: "update-ca-certificates -f"
```
We have to do this for AWS credentials to work with OpenJDK. Otherwise we get this error:
Unable to execute HTTP request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
Does this make sense, or is it something completely out of left field? @hectcastro @tnation14
I got this from https://stackoverflow.com/a/29313285/6995854
This issue is unfamiliar to me; we use OpenJDK with the AWS SDK on a few other projects and I don't think I've seen it before. Before I give the OK on this solution I'd like to take a deeper look at the error locally. I think I remember how to reproduce this from our conversation on Friday.
Sounds good. Let me know if you'd like to pair on this, I'm available for that this week. Alternatively, just check out this branch, remove this commit, and follow the testing instructions to see the issue.
Confirmed that this is an instance of ca-certificates-java bug 1396760. This is actually an issue that should be fixed in azavea.java; I opened azavea/ansible-java#26 to track the issue, but I'm going to fix it right now.
Once azavea/ansible-java#27 is closed, this PR should use
@tnation14 We're currently at
I meant
We add a task `run` and a helper method `geoprocess`. The `run` task converts the input into the desired format, and `geoprocess` communicates with the geoprocessing service and returns results. `run` is a combination of `start` and `finish`: it checks whether a result is cacheable and already cached, and if so returns it; otherwise it runs `geoprocess`. `geoprocess` is similar to `sjs_submit` in that it POSTs to an endpoint, but unlike `sjs_submit`, which gets back a job id, `geoprocess` receives the actual results and returns them. `run` is designed to replace the `start` and `finish` tasks in Celery chains. So a chain that was previously:

```python
chain(geoprocessing.start.s(data), geoprocessing.finish.s(), mytasks.process_results.s())
```

will now be:

```python
chain(geoprocessing.run.s(data), mytasks.process_results.s())
```
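The control flow described above can be sketched in plain Python. All names, the dict cache, and the stubbed `geoprocess` body are illustrative assumptions, not the project's actual implementation:

```python
# Hypothetical sketch of the synchronous run/geoprocess flow.

CACHE = {}  # stand-in for the real result cache


def geoprocess(opname, data):
    """POST an operation to the geoprocessing service and return results.

    Stubbed here: the real helper would POST to the service endpoint and,
    unlike sjs_submit, receive the actual results rather than a job id.
    """
    return {'op': opname, 'result': sum(data.get('values', []))}


def run(opname, data, cache_key=None):
    """Replacement for the start/finish pair: return a cached result when
    one exists, otherwise call geoprocess synchronously and cache it."""
    if cache_key is not None and cache_key in CACHE:
        return CACHE[cache_key]

    result = geoprocess(opname, data)

    if cache_key is not None:
        CACHE[cache_key] = result
    return result
```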
We likely do not need to use `choose_worker` anymore, since each request is independent and can be run on any worker (in the right colored stack). However, this probably needs some more thought, and thus will be addressed in the separate issue #2117.
These old async operations are no longer used.
This version includes RasterGroupedCount and RasterGroupedAverage operations, which make it sufficient for Analyze and TR-55 tasks.
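For a rough sense of what such an operation request might look like, here is a purely hypothetical payload; every field name and value below is an assumption for illustration, not taken from the mmw-geoprocessing API:

```json
{
  "operationType": "RasterGroupedCount",
  "rasters": ["nlcd-2011-30m-epsg5070"],
  "zoom": 0,
  "polygon": ["{\"type\": \"MultiPolygon\", \"coordinates\": [[[[-75.2, 39.9], [-75.1, 39.9], [-75.1, 40.0], [-75.2, 39.9]]]]}"]
}
```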
We need azavea/ansible-java#27 to solve AWS access issues with OpenJDK.
Force-pushed from edb8d8b to 4e77070
Alright, this should be ready for another look.
```diff
-    finish.s().set(exchange=exchange,
-                   routing_key=worker) |
+    run.s(opname, data, wkaoi).set(exchange=exchange,
+                                   routing_key=worker) |
```
I don't think that we need any of the `exchange` and `routing_key` stuff now that SJS is out of the picture. The main reason for it was to pin tasks to the worker where things started so that they'd finish there too. It may be another task, but probably worth doing.
@hectcastro We do have #2117 to address that, but it'd be helpful if you could comment on the statement I made in there about dark stack routing - that was my recollection, but I wasn't confident it was accurate.
I destroyed then rebuilt my Worker VM on this branch and everything seems to be working well! It feels pretty quick for small AOIs; going to try out a few larger AOIs and some simultaneous requests just to see how that works.
+1. This is working well! I destroyed then rebuilt my Worker VM on this branch and everything worked correctly.
The responses feel pretty quick now, and we'll be tweaking the geoprocessing API performance in future work, too.
Thanks for the reviews, everyone!
Overview
Update Celery to use the new synchronous geoprocessing service, instead of the older Spark JobServer setup. See original issue and commit messages for details.
Connects #2102
Connects #2104
Testing Instructions
- Check out the `feature/collections-api` branch
- Get the latest version of mmw-geoprocessing and run `cibuild` to generate `api-assembly-3.0.0-alpha.jar`
- Copy that new JAR into `/opt/geoprocessing/mmw-geoprocessing-3.0.0-alpha.jar` in the Worker VM, and then reload the Worker VM