Collections API #2212

rajadain · 2017-08-30T19:29:52Z

Overview

Formally includes the Collections API work by merging the feature branch. Includes #2100, #2101, #2102, #2103, #2104, WikiWatershed/mmw-geoprocessing#48, WikiWatershed/mmw-geoprocessing#49, WikiWatershed/mmw-geoprocessing#50, WikiWatershed/mmw-geoprocessing#51, WikiWatershed/mmw-geoprocessing#52, WikiWatershed/mmw-geoprocessing#53.

All the code in this PR has already been reviewed, so this only requires a quick run-through before merging.

Connects #2105

Notes

Some areas of interest will not work for MapShed because of the alignment mismatch issue described in #2153. To test values unaffected by that issue, try to choose a horizontal area of interest rather than a diagonal one. These HUC-12s should work:

Testing Instructions

Check out this branch. Destroy your worker, and reprovision it.
Run through the app. Ensure you can still analyze and model.
Ensure TR-55 works. Compare results against staging and ensure they are identical.
Ensure MapShed works. Compare results against staging and ensure they are identical.

This is required to run Akka HTTP services natively. Previously we were running Spark JobServer within a Docker container, so did not need to install this. Now, for performance reasons, we will run the service natively, thus necessitating this install.

…l-dependencies Upgrade to Java 8 Connects #2100

- adjust geoprocessing jar version and name
- remove Spark Job Server from Ansible config
- rename SJS port & host -> geop_port & geop_host
- configure geoprocessing role
- add upstart geoprocessing job
- declare an explicit dependency on the model-my-watershed.base role in the geoprocessing role to ensure the mmw user is created before the service starts

Replace SJS with Akka HTTP server Connects #2101 Connects #2103

Since the new geoprocessing service is run as the `mmw` user in the Worker VM, that user must have access to AWS credentials. Instead of mounting the developer's credentials into `/aws`, they are now mounted into the `mmw` user's home folder. Both `~/.aws/credentials` and `~/.aws/config` must be 644.
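A quick way to confirm the permission requirement above is a small check like the following. This is only a sketch: the helper names `has_mode` and `check_aws_files` are hypothetical and not part of this PR, and the home directory passed in is whatever the `mmw` user's home is on your Worker VM.

```python
import os
import stat


def has_mode(path, expected=0o644):
    """Return True if the file's permission bits equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


def check_aws_files(home):
    """Map each expected AWS file under `home` to True if it exists with mode 644."""
    results = {}
    for name in ("credentials", "config"):
        path = os.path.join(home, ".aws", name)
        # A missing file counts as a failed check rather than raising.
        results[path] = os.path.exists(path) and has_mode(path)
    return results
```

Calling `check_aws_files(os.path.expanduser("~"))` as the `mmw` user would report which of the two files, if any, still needs a `chmod 644`.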

We add a task `run` and a helper method `geoprocess`. The `run` task converts the input into the desired format, and `geoprocess` communicates with the geoprocessing service and returns results.

`run` is a combination of `start` and `finish`: it checks whether a result is cacheable and already cached, and if so returns the cached value. Otherwise it runs `geoprocess`. `geoprocess` is similar to `sjs_submit` in that it POSTs to an endpoint. Unlike `sjs_submit`, which gets back a job id, `geoprocess` receives the actual results and returns them.

`run` is designed to replace the `start` and `finish` tasks in Celery chains. So if a previous Celery chain was:

    chain(geoprocessing.start.s(data), geoprocessing.finish.s(), mytasks.process_results.s())

it will now be:

    chain(geoprocessing.run.s(data), mytasks.process_results.s())
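The division of labour between `run` and `geoprocess` can be sketched roughly as below. This is not the code in this PR: the real `run` is a Celery task, and the signatures, the in-memory `_CACHE`, the `cache_key` parameter, and the injectable `post` callable are all assumptions made so the sketch stays self-contained.

```python
import json
from urllib import request

# Hypothetical in-memory cache; the app's real cache backend is not shown here.
_CACHE = {}


def _http_post(endpoint, data):
    """POST `data` as JSON to `endpoint` and return the decoded JSON response."""
    req = request.Request(
        endpoint,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def geoprocess(endpoint, data, post=None):
    """Send the request and return the actual results, not a job id."""
    post = post or _http_post
    return post(endpoint, data)


def run(endpoint, data, cache_key=None, post=None):
    """Return a cached result if one exists; otherwise call geoprocess."""
    if cache_key is not None and cache_key in _CACHE:
        return _CACHE[cache_key]
    result = geoprocess(endpoint, data, post=post)
    if cache_key is not None:
        # Only cacheable requests (those given a key) are stored.
        _CACHE[cache_key] = result
    return result
```

Because the call is synchronous, the result is available as soon as `run` returns, which is why a single task can stand in for the old `start`/`finish` pair in a chain.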

We likely do not need to use `choose_worker` anymore, since each request is independent and can be run on any worker (in the right colored stack). However, this probably needs some more thought, and thus will be addressed in the separate issue #2117.

These old async operations are no longer used.

This version includes RasterGroupedCount and RasterGroupedAverage operations, which make it sufficient for Analyze and TR-55 tasks.

We need azavea/ansible-java#27 to solve AWS access issues with OpenJDK.

…elery Collections API: Update Celery to be Synchronous Connects #2102

Because SJS requests were asynchronous, we established a mechanism for creating routes that kept a job on a single worker. This led to complex code and high latency, since we pinged each worker to determine whether it was available. With SJS now replaced by a synchronous geoprocessing call, we can cull these custom routes and exchanges, both simplifying the code and speeding up job submission.
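To illustrate where the latency came from, here is a plain-Python contrast between the two submission styles. It deliberately avoids the Celery API: `submit_pinned`, `submit_shared`, and the `ping` and `send` callables are hypothetical stand-ins, and `"celery"` is used only because it is Celery's default queue name.

```python
def submit_pinned(workers, ping, send):
    """Old style (sketch): ping every worker, then pin the job to one that answered."""
    # One round trip per worker happens before the job is even submitted.
    available = [w for w in workers if ping(w)]
    if not available:
        raise RuntimeError("no workers available")
    return send(queue=available[0])


def submit_shared(send, queue="celery"):
    """New style (sketch): submit to a shared queue; any worker may pick the job up."""
    return send(queue=queue)
```

With synchronous geoprocessing there is no job id that has to be collected from the same worker later, so the pinning step and its pings can be dropped.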

Remove custom celery routing and exchange

Upgrade to 3.0.0-beta-1, which supports RasterLinesJoin, thus unlocking MapShed tasks and making the service feature complete.

…feature/collections-api Connects #2015 Connects WikiWatershed/mmw-geoprocessing#53

mmcfarland

Ran through everything and compared to production: identical results. The Celery improvements make a big impact, but on larger watersheds it takes slightly more polls, which is probably to be expected due to network latency with S3. Looking forward to getting this up on staging and having a better apples-to-apples comparison.

rajadain and others added 15 commits August 2, 2017 11:42

Upgrade to Java 8

a1b9ded

Merge pull request #2107 from WikiWatershed/tt/collections-api-instal…

d92a909

…l-dependencies Upgrade to Java 8 Connects #2100

Merge pull request #2118 from WikiWatershed/ki/use-akka-http-service

f292f5b

Replace SJS with Akka HTTP server Connects #2101 Connects #2103

Replace start and finish with run

30327aa

Remove unused tasks and methods

88cf528

Use latest alpha of geoprocessing service

df38e84

Update Ansible Java to latest with bugfix

4e77070

Merge pull request #2167 from WikiWatershed/tt/collections-api-sync-c…

b06a9e9

…elery Collections API: Update Celery to be Synchronous Connects #2102

Merge pull request #2193 from WikiWatershed/mjm/celery-crunch

8577f11

Remove custom celery routing and exchange

Upgrade Geoprocessing Service

4a73218

Merge branch 'tt/collections-api-upgrade-geoprocessing-service' into …

557d0fb

…feature/collections-api Connects #2015 Connects WikiWatershed/mmw-geoprocessing#53

rajadain assigned mmcfarland Aug 30, 2017

rajadain requested a review from mmcfarland August 30, 2017 19:29

mmcfarland approved these changes Aug 31, 2017

mmcfarland assigned rajadain and unassigned mmcfarland Aug 31, 2017

rajadain merged commit 3d7edbd into develop Aug 31, 2017

rajadain deleted the feature/collections-api branch August 31, 2017 18:06

rajadain mentioned this pull request Oct 16, 2017

Release 1.20.0 #2304

Closed
