Scaffold Excel Generation Endpoint #3191

Merged
merged 5 commits into from Oct 31, 2019

Member

rajadain commented Oct 30, 2019

Overview

This scaffolds two new endpoints:

  • /api/modeling/worksheet/: This endpoint takes a shape, pairs it with the HUC-12s it intersects, and in the future will start a Celery job that calculates a number of values for each pair, resulting in a JSON blob
  • /export/worksheet/: This endpoint takes the JSON blob generated above and returns a ZIP file containing Excel worksheets corresponding to each pair

Currently, the first endpoint returns an empty job and prints the pairs to the debug log. The second endpoint takes an array of objects and returns a ZIP of Excel worksheets, one per object in the input array, all hardcoded to have the same value. This scaffolding will be expanded upon in future work.

Next steps:

  • Create a Celery queue that generates the JSON blob
  • Update the export endpoint to generate files corresponding to the HUC-12 names, rather than 0.xlsx like it currently does
  • Add input validation
  • Add tests?

See commit messages for details.

Connects #3185

Demo

Submitting a shape to the modeling endpoint returns a response shaped like a Celery job:

$ http :8000/api/modeling/worksheet/ Authorization:"Token $MMW_API_TOKEN" < triple-hucs.geojson

HTTP/1.1 200 OK
Allow: POST, OPTIONS
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Tue, 29 Oct 2019 20:50:32 GMT
Server: nginx
Transfer-Encoding: chunked
Vary: Accept-Encoding
Vary: Accept, Cookie, Origin

{
    "job": "00000000-0000-0000-0000-000000000000",
    "status": "started"
}
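For scripting the same call without HTTPie, a minimal Python sketch could look like the following. It only builds the request and does not send it. The helper name `build_worksheet_request`, the localhost base URL, the placeholder token, and the placeholder shape are assumptions for illustration, not part of this PR.

```python
import json
import urllib.request


def build_worksheet_request(geojson, token, base_url="http://localhost:8000"):
    """Build (but do not send) a POST to the worksheet modeling endpoint."""
    body = json.dumps(geojson).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/api/modeling/worksheet/",
        data=body,
        method="POST",
        headers={
            "Authorization": "Token " + token,
            "Content-Type": "application/json",
        },
    )


# Placeholder shape; in practice this is the GeoJSON downloaded from the app.
shape = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
request = build_worksheet_request(shape, token="<your MMW API token>")

# Sending it requires a running debugserver:
# response = json.load(urllib.request.urlopen(request))
```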

The clipped shapes are currently printed in the debugserver output. Porting them to GeoJSON.io, we can visualize the results like this:

image

image

Getting a ZIP file containing Excel Spreadsheets back from the export endpoint:

$ http --download :8000/export/worksheet/ < test.json

HTTP/1.1 200 OK
Allow: POST, OPTIONS
Connection: keep-alive
Content-Disposition: attachment; filename="MMW_BMP_Spreadsheets.zip"
Content-Length: 244138
Content-Type: application/zip
Date: Tue, 29 Oct 2019 20:31:05 GMT
Server: nginx
Vary: Accept, Cookie

Downloading 238.42 kB to "MMW_BMP_Spreadsheets.zip"
Done. 238.42 kB in 0.00400s (58.15 MB/s)

Excel file with Stream Lengths filled out:

image (2)

Testing Instructions

  • Check out this branch and run the server: $ ./scripts/debugserver.sh
  • Go to :8000/ and turn on the HUC-12 boundary layer in the layer selector
  • Draw a shape that intersects 2 or more HUC-12s
  • Download a GeoJSON of the shape
  • Using your user's MMW API Token, fire off a request to the /api/modeling/worksheet/ endpoint with the GeoJSON of the shape as the body
    • Ensure you see a Celery-like output
    • Ensure you see the cut shapes paired with their HUC-12s in the debugserver output
  • Fire off a request to the /export/worksheet/ endpoint, with an array of objects like [{},{}] as the body
    • Ensure you get a ZIP file in response
    • Ensure it has as many Excel worksheets as objects in the input array
    • Ensure the Excel worksheets have values in the stream length section of the MMW Output tab
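
The worksheet-count check above can be scripted. This is a rough sketch, assuming the response body is available as bytes; `count_worksheets` is a hypothetical helper, and the archive built here is only a stand-in that mimics the scaffold's 0.xlsx, 1.xlsx naming.

```python
import io
import zipfile


def count_worksheets(zip_bytes):
    """Return the number of .xlsx entries in a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sum(1 for name in zf.namelist() if name.lower().endswith(".xlsx"))


# Stand-in archive shaped like the scaffold's output (0.xlsx, 1.xlsx, ...).
payload = [{}, {}]
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i, _ in enumerate(payload):
        zf.writestr("%d.xlsx" % i, b"placeholder")

# One worksheet is expected per object in the input array.
assert count_worksheets(buf.getvalue()) == len(payload)
```

For a real run, read the bytes of the downloaded MMW_BMP_Spreadsheets.zip and compare the count against the length of the array that was posted.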
This library will be used to write an Excel file
with BMP values. Developed by the Academy of
Natural Sciences. Source:

https://github.com/TheAcademyofNaturalSciences/BMPxlsx
@rajadain rajadain added the PA DEP label Oct 30, 2019
@rajadain rajadain requested a review from mmcfarland Oct 30, 2019

mmcfarland left a comment

Working well. The worksheet endpoint takes ~20s on my local machine, and the download endpoint ~12s for 5 workbooks. I'm a little concerned about fitting that into a request/response from the UI. If we expect it to increase when we're fetching real values, we may have to be prepared for alternative workflows. The zip file is 400kb on disk, so it's probably not a memory concern at the expected level of usage.

➜  mmw git:(tt/excel-generation-endpoint) ✗ time http :8000/api/modeling/worksheet/ Authorization:"Token $MMW_API_TOKEN" < ~/Downloads/Selected\ Area.geojson
HTTP/1.1 200 OK
Allow: POST, OPTIONS
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Wed, 30 Oct 2019 15:40:32 GMT
Server: nginx
Transfer-Encoding: chunked
Vary: Accept-Encoding
Vary: Accept, Cookie, Origin

{
    "job": "00000000-0000-0000-0000-000000000000",
    "status": "started"
}

http :8000/api/modeling/worksheet/ Authorization:"Token $MMW_API_TOKEN" <   0.40s user 0.11s system 2% cpu 19.213 total
➜  mmw git:(tt/excel-generation-endpoint) ✗ time http --download :8000/export/worksheet/ < test.json
HTTP/1.1 200 OK
Allow: POST, OPTIONS
Connection: keep-alive
Content-Disposition: attachment; filename="MMW_BMP_Spreadsheets.zip"
Content-Length: 406882
Content-Type: application/zip
Date: Wed, 30 Oct 2019 15:46:19 GMT
Server: nginx
Vary: Accept, Cookie

Downloading 397.35 kB to "MMW_BMP_Spreadsheets.zip-2"
Done. 397.35 kB in 0.00506s (76.64 MB/s)
http --download :8000/export/worksheet/ < test.json  0.40s user 0.07s system 3% cpu 11.977 total
@@ -33,6 +36,7 @@
SHAPEFILE_EXTENSIONS = ['cpg', 'dbf', 'prj', 'shp', 'shx']
DEFAULT_KEYWORDS = {'mmw', 'model-my-watershed'}
MMW_APP_KEY_FLAG = '{"appkey": "model-my-watershed"}'
EXCEL_TEMPLATE = '/opt/app/worksheet/MMW_BMP_Spreadsheet_Tool.xlsx'

mmcfarland Oct 30, 2019

Member

Is this path portable between environments (dev and aws)? My initial instinct would be to make this resource in a directory relative to the views.py file. Is there precedent for either approach in the codebase?

rajadain Oct 31, 2019

Author Member

This path is portable, but doing something relative is probably better, and has precedent:

# Full perimeter of the Delaware River Basin (DRB).
drb_perimeter_path = join(dirname(abspath(__file__)), 'data/drb_perimeter.json')
drb_perimeter_file = open(drb_perimeter_path)
drb_perimeter = json.load(drb_perimeter_file)

Will change.

# Create a zip file in memory for all the worksheets
stream = StringIO.StringIO()
with zipfile.ZipFile(stream, 'w') as zf:
    for fpath in worksheets:
        _, fname = os.path.split(fpath)
        zf.write(fpath, fname)
        os.remove(fpath)
Comment on lines +197 to +204

mmcfarland Oct 30, 2019

Member

Do you have a sense of the memory consumption of these files in a worst case scenario? Curious to see what kind of load we could expect that would put memory pressure on the app servers. I expect the worksheet is relatively small, and that the likely number of intersected HUCs wouldn't exceed single digits.

If we have access to a logger, it would be interesting to log the number of files created in the zip at DEBUG level so we can monitor usage metrics.

rajadain Oct 31, 2019

Author Member

I'll make a card to add some monitoring, and also to see if we can set a smaller size limit on the area of interest, which should keep it from being large enough to intersect too many HUC-12s.

@mmcfarland mmcfarland assigned rajadain and unassigned mmcfarland Oct 30, 2019
find which HUC-12s intersect with it. For every HUC-12 that intersects
with the area of interest, we make pairs of the HUC-12 and the clipped
area which is contained within it. For every pair, we make a BMP Worksheet.
Finally, all the worksheets are collected in a ZIP file and downloaded.

rajadain Oct 30, 2019

Author Member

Should update this comment to say that we return a JSON dictionary of values, not the ZIP file from this endpoint.

Member Author

rajadain commented Oct 31, 2019

Posted fixups for:

  1. Moving the Excel template from /src/mmw/worksheet/ to /src/mmw/apps/export/templates/ (c676bec)
  2. Updating export/views.py to use a relative path instead (b29f340)
  3. Updating geoprocessing_api/views.py to have a more accurate comment description (012cd4e)
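
As a rough illustration of the relative-path fixup, the template location can be resolved from the module's own directory, following the drb_perimeter precedent quoted earlier. The helper name `template_path` is hypothetical, and the `templates` directory and file name are assumed from the move described in the first fixup and the original `EXCEL_TEMPLATE` setting; the actual change in b29f340 may differ.

```python
from os.path import abspath, dirname, join


def template_path(module_file):
    """Resolve the Excel template relative to the given module's directory."""
    return join(dirname(abspath(module_file)),
                "templates", "MMW_BMP_Spreadsheet_Tool.xlsx")


# In export/views.py this would be called with the module's own __file__:
# EXCEL_TEMPLATE = template_path(__file__)
```

Resolving from `__file__` keeps the path correct wherever the app directory is mounted, which is the portability concern raised in the review.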
rajadain added 4 commits Oct 25, 2019
This will be available at /api/modeling/worksheet/,
the first modeling method available in the public API.
This is done for future compatibility, since we expect
all modeling methods to migrate there eventually. Also
because the input this takes (just a shape) is closer
to an Analyze method than the existing Modeling methods.
The input shape is intersected with HUC-12s, and for each one
it intersects, we make a pair of the HUC-12 with the clipping
of the shape to that HUC-12.

For every such pair, we will calculate a dictionary of values
used to generate an Excel sheet.
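
The pairing step can be sketched in isolation. This is a simplified illustration only: axis-aligned rectangles stand in for real polygons, and the `Box` type, the function names, and the HUC-12 ids are all made up here. The endpoint itself works on real geometries, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a real polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def clip(shape: Box, boundary: Box) -> Optional[Box]:
    """Return the part of `shape` inside `boundary`, or None if there is no overlap."""
    xmin, ymin = max(shape.xmin, boundary.xmin), max(shape.ymin, boundary.ymin)
    xmax, ymax = min(shape.xmax, boundary.xmax), min(shape.ymax, boundary.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None  # disjoint, or touching only along an edge
    return Box(xmin, ymin, xmax, ymax)


def pair_with_hucs(aoi: Box, hucs: Dict[str, Box]) -> List[Tuple[str, Box]]:
    """Pair each intersecting HUC-12 id with the AOI clipped to that HUC-12."""
    pairs = []
    for huc_id, boundary in hucs.items():
        clipped = clip(aoi, boundary)
        if clipped is not None:
            pairs.append((huc_id, clipped))
    return pairs


# Made-up HUC-12 ids and extents, for illustration only.
hucs = {
    "huc-a": Box(0, 0, 10, 10),
    "huc-b": Box(10, 0, 20, 10),
    "huc-c": Box(20, 0, 30, 10),
}
pairs = pair_with_hucs(Box(5, 2, 15, 8), hucs)
# The AOI spans huc-a and huc-b, so two pairs result; huc-c is skipped.
```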
This is also available in https://github.com/WikiWatershed/MMW-BMP-spreadsheet-tool/,
although the version there is older than the one we are
supposed to use, so the newer version is added directly
to the repo.

From a runtime perspective, it is better to have the template
available immediately on disk for dynamic generation, rather
than downloading it from GitHub every time. This protects us
from GitHub outages.

This does cause some redundancy, and the need to keep the two
versions in sync. An alternative to this direct inclusion is
to include the source repo as a pinned submodule, and update
the submodule SHA reference when that repo is updated.

However, given that the original repo has an older version
of the file, I'm including the new version here for now. The
strategy for maintenance going forward is a policy decision
that should be made by the team.
The /api/modeling/worksheet endpoint will be used to
generate a JSON blob of values, which will then be
submitted to this endpoint to generate the actual
spreadsheets.

The expected input is still to be clarified, but will
be something along the lines of this:

[
    {
        "huc12": { ... },
        "aoi": { ... }
    },
    {
        "huc12": { ... },
        "aoi": { ... }
    }
]

For every pair of values, we copy the worksheet template
and fill in the values, save the new files in a temporary
directory, zip them all up in memory, delete the directory,
and return the zip file.
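
The copy, zip, and clean-up flow described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the code in this PR: `build_worksheet_zip` and the `fill_values` hook are hypothetical names, filling in the worksheet values is left to that hook, and `io.BytesIO` is used as the Python 3 counterpart of the `StringIO.StringIO` seen in the diff.

```python
import io
import os
import shutil
import tempfile
import zipfile


def build_worksheet_zip(template_path, items, fill_values=None):
    """Copy the template once per item, zip the copies in memory, clean up."""
    tmpdir = tempfile.mkdtemp()
    try:
        paths = []
        for i, item in enumerate(items):
            path = os.path.join(tmpdir, "%d.xlsx" % i)
            shutil.copyfile(template_path, path)
            if fill_values is not None:
                # Hypothetical hook: write this item's values into the copy.
                fill_values(path, item)
            paths.append(path)

        # Build the archive in memory so nothing else is written to disk.
        stream = io.BytesIO()
        with zipfile.ZipFile(stream, "w") as zf:
            for path in paths:
                zf.write(path, os.path.basename(path))
        return stream.getvalue()
    finally:
        # Remove the temporary directory even if a step above fails.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Putting the clean-up in a `finally` block is a design choice of this sketch, so that a failure while copying or zipping does not leave worksheet copies behind.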
@rajadain rajadain force-pushed the tt/excel-generation-endpoint branch from 012cd4e to 699cbc9 Oct 31, 2019
Member Author

rajadain commented Oct 31, 2019

Squashed fixups. Merging now.

@rajadain rajadain merged commit 047f666 into develop Oct 31, 2019
@rajadain rajadain deleted the tt/excel-generation-endpoint branch Oct 31, 2019