This repository has been archived by the owner on Oct 21, 2019. It is now read-only.

Correct paths resulting package.zip #17

Open
meyer1994 opened this issue Jul 12, 2019 · 12 comments

Comments

@meyer1994

meyer1994 commented Jul 12, 2019

I had some trouble creating a layer using the script at this repo.

According to AWS documentation, contents of package.zip will be extracted to /opt. So the correct environment variables, GDAL_DATA, and my inclusion, PROJ_LIB, should point to /opt/gdal and /opt/proj. I have included the PROJ_LIB variable to avoid the proj.db not found error.

Besides, for the python packages, instead of zipping them to the root address of the zip file, we should zip them to a python directory at root level.

In the end, it should look like this:

package.zip
├── bin     # executables
├── lib     # libraries
├── lib64   # libraries 64-bit
├── python  # python packages
└── share   # shared libraries
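A quick way to confirm an archive follows this layout is to look at the first path component of every member. The sketch below assumes the five directories listed above are the only ones expected; `check_layer_layout` is a hypothetical helper name, not something this repo provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads archive member paths on stdin (one per line)
# and fails if any top-level entry falls outside the layout listed above.
check_layer_layout() {
    local allowed=" bin lib lib64 python share "
    local path top status=0
    while IFS= read -r path; do
        [ -z "$path" ] && continue
        top="${path%%/*}"              # first path component
        case "$allowed" in
            *" $top "*) ;;             # expected directory
            *) echo "unexpected top-level entry: $top" >&2; status=1 ;;
        esac
    done
    return "$status"
}
```

With Info-ZIP installed, it could be fed with `unzip -Z1 package.zip | check_layer_layout`; a non-zero exit status would indicate, for example, Python packages zipped at the archive root instead of under `python`.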

Lastly, installing rasterio with --no-binary fails with an error saying the rasterio._shim module does not exist. Installing with a plain pip install rasterio fixes it. Also, I do not think it is necessary to install rasterio[s3], because boto3 is already included in Python Lambda runtimes by default.

After doing all the changes I have mentioned above, I've successfully created a layer that I am currently using.

@meyer1994 meyer1994 changed the title from "Correct paths for package.zip" to "Correct paths resulting package.zip" on Jul 12, 2019
@vincentsarago
Member

Hi @meyer1994, thanks for the issue. I was looking at lambda layer this week so this is a good reminder that I have to do it for real.

maybe you could submit a PR documenting how to create a layer in https://github.com/RemotePixel/amazonlinux-gdal#create-a-lambda-layer

Lastly, installing rasterio with --no-binary fails with an error saying the rasterio._shim module does not exist. Installing with a plain pip install rasterio fixes it. Also, I do not think it is necessary to install rasterio[s3], because boto3 is already included in Python Lambda runtimes by default.

well if you don't use --no-binary you'll end up downloading rasterio wheels (with internal GDAL) thus 2x the size of the package. To fix it you need to have GDAL_DATA env set to the correct path

@meyer1994
Author

meyer1994 commented Jul 12, 2019

Thanks for replying @vincentsarago

well if you don't use --no-binary you'll end up downloading rasterio wheels (with internal GDAL) thus 2x the size of the package. To fix it you need to have GDAL_DATA env set to the correct path

I've noticed that. The folder /rasterio/.libs takes a lot of space. I've installed rasterio and the GDAL_DATA variable is the same as the original container.

See:

$ docker run -it remotepixel/amazonlinux-gdal:3.0.1 /bin/bash
bash-4.2# echo $GDAL_DATA 
/var/task/share/gdal

Am I doing something wrong? I could not fix this issue :(
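For reference, the GDAL_DATA value printed above is rooted at /var/task, whereas layer contents are extracted to /opt. A minimal sketch of rebasing such a path, assuming the image's share directory is zipped at the root of package.zip; `rebase_to_layer` is a hypothetical helper, and both default prefixes are assumptions taken from this thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a path from the image's build prefix
# (assumed /var/task) to the layer mount point (assumed /opt).
rebase_to_layer() {
    local path="$1" build_prefix="${2:-/var/task}" layer_prefix="${3:-/opt}"
    case "$path" in
        "$build_prefix"/*)
            # Strip the build prefix and prepend the layer prefix.
            printf '%s\n' "$layer_prefix${path#"$build_prefix"}" ;;
        *)
            # Not under the build prefix: leave the path unchanged.
            printf '%s\n' "$path" ;;
    esac
}
```

Inside the container, `rebase_to_layer "$GDAL_DATA"` would print the value to configure as GDAL_DATA on a function that uses the layer.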

About the pull request: I can do it, sure, but I think we should resolve the problem above first. The package.zip that I create is almost 50 MB, and most of that comes from the /rasterio/.libs directory.
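To put numbers on that, the size of the install directory can be compared with the size of the libraries bundled by the wheel. This is only a sketch: `report_bundled_libs` is a hypothetical helper, and it assumes the packages were installed with `-t python` as in the script below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size in KiB of the install directory
# (default: python) and of the rasterio/.libs directory inside it.
report_bundled_libs() {
    local root="${1:-python}" libs total bundled=0
    libs="$root/rasterio/.libs"
    total=$(du -sk "$root" | cut -f1)
    # Source builds have no .libs directory; report 0 in that case.
    [ -d "$libs" ] && bundled=$(du -sk "$libs" | cut -f1)
    printf 'total_kib=%s bundled_libs_kib=%s\n' "$total" "$bundled"
}
```

Running `report_bundled_libs python` before zipping shows how much of the install comes from the GDAL copy shipped inside the rasterio wheel.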


Here is the script that I have used to generate the package.zip file. It is a modified version of yours.

docker run --name lambda -itd remotepixel/amazonlinux-gdal:3.0.1 /bin/bash

docker exec -it lambda bash -c 'mkdir python'
docker exec -it lambda bash -c 'pip install numpy gdal==3.0.1 rasterio --no-binary numpy,gdal -t python -U'

docker exec -it lambda bash -c 'zip -r9 /tmp/package.zip python'

docker exec -it lambda bash -c 'zip -r9 --symlinks /tmp/package.zip lib/*.so*'
docker exec -it lambda bash -c 'zip -r9 --symlinks /tmp/package.zip lib64/*.so*'
docker exec -it lambda bash -c 'zip -r9 --symlinks /tmp/package.zip bin'
docker exec -it lambda bash -c 'zip -r9 /tmp/package.zip share'

docker cp lambda:/tmp/package.zip package.zip

docker stop lambda
docker rm lambda

@vincentsarago
Member

Oh, I see: you are trying to use GDAL > 2.4, which is why rasterio is not happy.

rasterio doesn't support GDAL 3 yet.
From the rasterio docs:

Rasterio 1.0.x works with Python versions 2.7.x and 3.5.0 through 3.7.x, and GDAL versions 1.11.x through 2.4.x. Official binary packages for Linux and Mac OS X are available on PyPI. Unofficial binary packages for Windows are available through other channels.

Rasterio 1.0.x is not compatible with GDAL versions 3.0.0 or greater.
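The range quoted above (GDAL 1.11.x through 2.4.x) can be checked before attempting a source build. A minimal sketch, where `gdal_ok_for_rasterio_1_0` is a hypothetical helper that only looks at the major and minor version numbers:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given GDAL version lies in the
# range quoted above for Rasterio 1.0.x (1.11.x through 2.4.x).
gdal_ok_for_rasterio_1_0() {
    local version="$1" major rest minor
    major="${version%%.*}"     # text before the first dot
    rest="${version#*.}"       # text after the first dot
    minor="${rest%%.*}"        # second version component
    case "$major" in
        1) [ "$minor" -ge 11 ] ;;
        2) [ "$minor" -le 4 ] ;;
        *) return 1 ;;
    esac
}
```

Inside the container this could be called as `gdal_ok_for_rasterio_1_0 "$(gdal-config --version)"`; for the 3.0.1 image used in this thread it returns a non-zero status.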

@meyer1994
Author

It seems that rasterio is being updated to support GDAL 3. See this PR.

Do you think it is worth waiting for that before writing the proper documentation?

@vincentsarago
Member

I wouldn't be too optimistic about a date. Sean is pretty busy, so I'm not sure when this PR will move forward.

Do we have to ship the layer with rasterio? @meyer1994

@vincentsarago
Member

vincentsarago commented Jul 12, 2019

Oh, before I forget: when building GDAL we are using some custom settings, see #7 (comment).

This might break in the Lambda layer.

https://github.com/RemotePixel/amazonlinux-gdal#package-architecture-and-aws-lambda-config

@meyer1994
Author

After doing some tests:

Note, I am using chalice for deployment.

GDAL python bindings

To use it, we must ship the layer with the bindings already installed in it. Inside the container:

$ mkdir python
$ pip install -U numpy gdal --no-binary :all: -t python/
$ zip -r9 --symlinks /tmp/package.zip python
$ # ...zip libs and cp package.zip

If we do not install the bindings inside the container, we get the following error when importing from osgeo during Lambda execution:

[ERROR] Runtime.ImportModuleError: Unable to import module 'app': libgdal.so.26: cannot open shared object file: No such file or directory

Even setting configs in requirements.txt does not do the trick:

numpy
gdal==3.0.0
--no-binary=:all:

It only works when the bindings are installed directly inside the container and shipped in the layer.
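The ImportModuleError above means the dynamic loader could not find libgdal.so.26 on its search path. A rough way to see where, if anywhere, the library is visible is to walk the directories in LD_LIBRARY_PATH. This is a sketch only: `find_shared_lib` is a hypothetical helper, and it ignores the other places the loader consults, such as RPATH entries and the ld.so cache.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory entry on a colon-separated
# search path (default: $LD_LIBRARY_PATH) that contains the named library.
find_shared_lib() {
    local name="$1" search_path="${2:-${LD_LIBRARY_PATH:-}}" dir
    local IFS=':'                      # split the search path on colons
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1                           # not found on the given path
}
```

For example, `find_shared_lib libgdal.so.26` run in the target environment would show whether the layer's `lib` directory is actually on the search path.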


Rasterio

It is not necessary to include the rasterio package in the layer. I tested this by installing it locally with --no-binary :all: and shipping it with chalice. It worked nicely.

@meyer1994
Author

meyer1994 commented Jul 12, 2019

Maybe it is possible to include the Python bindings when building GDAL?

According to the README of the GDAL package on PyPI:

You can also have the GDAL Python bindings built as part of a source build by specifying --with-python as part of your configure line:
$ ./configure --with-python

I have not tested it.

@vincentsarago
Member

@meyer1994

Maybe it is possible to include the Python bindings when building GDAL?

Yes, it's possible, but to be honest the goal of this repo is to provide a minimal-configuration Docker container for creating a Lambda package (either with rasterio or gdal), so I'm not sure I'm going to add any default here.

Let me see if I have time to do something this weekend.

@meyer1994
Author

Fair enough.

I only mentioned it because I only use gdal with its Python bindings. hehe.

@dazza-codes

dazza-codes commented Aug 6, 2019

RE lambda layer:

  • is there a PR to fix this issue? Is it only documentation or code fixes too?
  • should the Dockerfile use ENV PREFIX=/opt for a lambda layer (vs runtime task)?
    • all the shared libs (*.so) will have links to the PREFIX path, right?
    • if so, the script to package the layer can't rely on the runtime PREFIX
  • the AWS base lambda layer already provides numpy and scipy in /opt/python, right?
  • does this project publish lambda layer ARNs?
    • might need "runtime" vs. "layer" builds (images)
  • I'd like to adapt the following medium blog post to use a docker image from this project to add pandas and rasterio, but not sure it's going to work without using a PREFIX of /opt for the base image and related ENV settings to get all the links right

@dazza-codes dazza-codes mentioned this issue Aug 7, 2019
2 tasks
@meyer1994
Author

I've just created a PR #24 explaining how to create a layer.


3 participants