
Getting a 500 when using the newest version of Rasterio (1.0.22) #139

Closed

vlro opened this issue Apr 10, 2019 · 24 comments

Labels: bug (Something isn't working)

@vlro (Contributor) commented Apr 10, 2019

When deploying a Lambda function through Zappa, using the newest version of Rasterio (1.0.22), we get a 500 for some tiles, while others load fine.

When reverting to Rasterio 1.0.13, everything works fine.

zappa tail gives the following:

[1554888957364] [ERROR] 2019-04-10T09:35:57.326Z b24c3c65-85ed-4390-a18b-c0cf347feec0 Exception on /rgb/italy/2018/7/70/47.png [GET]
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 213, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsis3/italy-composite/rasters/italy_2018_red.tif' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/terracotta/drivers/raster_base.py", line 464, in _get_raster_tile
    src = es.enter_context(rasterio.open(path))
  File "/var/task/rasterio/env.py", line 423, in wrapper
    return f(*args, **kwds)
  File "/var/task/rasterio/__init__.py", line 216, in open
    s = DatasetReader(path, driver=driver, **kwargs)
  File "rasterio/_base.pyx", line 215, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: '/vsis3/italy-composite/rasters/italy_2018_red.tif' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/var/task/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/var/task/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/var/task/flask/_compat.py", line 35, in reraise
    raise value
  File "/var/task/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/var/task/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/var/task/terracotta/server/flask_api.py", line 50, in inner
    return fun(*args, **kwargs)
  File "/var/task/terracotta/server/rgb.py", line 85, in get_rgb
    return _get_rgb_image(keys, tile_xyz=tile_xyz)
  File "/var/task/terracotta/server/rgb.py", line 135, in _get_rgb_image
    some_keys, rgb_values, stretch_ranges=stretch_ranges, tile_xyz=tile_xyz, **options
  File "/var/lang/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/var/task/terracotta/handlers/rgb.py", line 85, in rgb
    band_data = band_data_future.result()
  File "/var/lang/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/var/lang/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/var/lang/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/task/cachetools/__init__.py", line 87, in wrapper
    v = method(self, *args, **kwargs)
  File "/var/lang/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/var/task/terracotta/drivers/raster_base.py", line 466, in _get_raster_tile
    raise IOError('error while reading file {}'.format(path))
OSError: error while reading file s3://italy-composite/rasters/italy_2018_red.tif
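
A minimal sketch to check whether the open itself fails outside of Lambda (assuming AWS credentials are configured in the environment; the /vsis3/ path is the one from the traceback):

```python
import rasterio

# Try to open the same /vsis3/ path that fails inside the Lambda function.
# If GDAL's S3 handler is the culprit, this should raise the same
# "not recognized as a supported file format" error.
with rasterio.open("/vsis3/italy-composite/rasters/italy_2018_red.tif") as src:
    print(src.driver, src.crs, src.width, src.height)
```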
@mrpgraae added the bug label Apr 10, 2019
@dionhaefner (Collaborator):

To make a proper upstream bug report, I think we will need the offending raster, preferably in a public S3 bucket.

@mrpgraae (Collaborator):

Yes. I think @vlro is trying to find out the exact Rasterio version that introduced this error, so we can make a nicer upstream report.

@dionhaefner (Collaborator):

Tests are still passing, so this is probably going to be something weird...

@mrpgraae (Collaborator):

> Tests are still passing, so this is probably going to be something weird...

Uh oh... 😄

@vlro (Contributor, Author) commented Apr 10, 2019

@mrpgraae Looks like the error was introduced in 1.0.15 and has been broken since. It works fine with 1.0.13 and 1.0.14.

@vlro (Contributor, Author) commented Apr 10, 2019

The rest of the environment looks like this:

affine==2.2.2
apispec==0.39.0
argcomplete==1.9.3
attrs==18.2.0
aws-xray-sdk==2.2.0
awscli==1.16.85
backcall==0.1.0
boto3==1.9.75
botocore==1.12.75
cachetools==3.0.0
certifi==2018.11.29
cfn-flip==1.1.0.post1
chardet==3.0.4
Click==7.0
click-plugins==1.0.4
cligj==0.5.0
colorama==0.3.9
decorator==4.3.0
docutils==0.14
durationpy==0.5
Flask==1.0.2
Flask-Cors==3.0.7
future==0.17.1
hjson==3.0.1
idna==2.8
ipython==7.2.0
ipython-genutils==0.2.0
itsdangerous==1.1.0
jedi==0.13.2
Jinja2==2.10
jmespath==0.9.3
jsonpickle==1.0
kappa==0.6.0
lambda-packages==0.20.0
MarkupSafe==1.1.0
marshmallow==3.0.0rc2
mercantile==1.0.4
numpy==1.15.4
parso==0.3.1
pexpect==4.6.0
pickleshare==0.7.5
Pillow==5.4.1
pkg-resources==0.0.0
placebo==0.8.2
prompt-toolkit==2.0.7
ptyprocess==0.6.0
pyasn1==0.4.5
Pygments==2.3.1
pyparsing==2.3.0
python-dateutil==2.7.5
python-slugify==1.2.4
PyYAML==3.13
rasterio==1.0.14
requests==2.21.0
rsa==3.4.2
s3transfer==0.1.13
Shapely==1.6.4.post2
six==1.12.0
snuggs==1.4.2
terracotta==0.5.1
toml==0.10.0
tqdm==4.19.1
traitlets==4.3.2
troposphere==2.4.0
Unidecode==1.0.23
urllib3==1.24.1
wcwidth==0.1.7
Werkzeug==0.14.1
wrapt==1.10.11
wsgi-request-logger==0.4.6
zappa==0.47.1

Python 3.6.7

@dionhaefner (Collaborator):

Almost nothing changed between 1.0.14 and 1.0.15:

https://github.com/mapbox/rasterio/compare/1.0.14..1.0.15

I think we'll definitely need access to the offending raster to investigate what's going on.

@j08lue (Collaborator) commented Apr 10, 2019

Maybe we can get rasterio in Terracotta to log the window it is requesting for the respective tile, so we can reproduce the exact read call for a conflicting XYZ tile. Then we could also try this offline, on the GeoTIFF on disk rather than on S3, and see whether it also occurs there.
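
Something like this could reproduce the read offline (a sketch only; the tile coordinates come from the failing URL, while the local filename and the CRS handling are assumptions rather than Terracotta's actual code):

```python
import mercantile
import rasterio
from rasterio.warp import transform_bounds
from rasterio.windows import from_bounds

# Failing tile from the logs: /rgb/italy/2018/7/70/47.png -> z=7, x=70, y=47
tile = mercantile.Tile(x=70, y=47, z=7)
left, bottom, right, top = mercantile.xy_bounds(tile)  # Web Mercator bounds

# Replay the read against a local copy of the raster instead of /vsis3/
with rasterio.open("italy_2018_red.tif") as src:
    # Convert tile bounds from EPSG:3857 to the raster's CRS, then to a window
    bounds = transform_bounds("epsg:3857", src.crs, left, bottom, right, top)
    window = from_bounds(*bounds, transform=src.transform)
    print("requested window:", window)
    data = src.read(1, window=window, boundless=True)
    print("read shape:", data.shape)
```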

@dionhaefner (Collaborator):

We already know which tile is failing:

/rgb/italy/2018/7/70/47.png

You could try to serve it locally, but it seems to me that it is related to GDAL's S3 driver.

@j08lue (Collaborator) commented Apr 10, 2019

Speaking of GDAL: could the different versions of rasterio be bundled with different versions of GDAL (rasterio.__gdal_version__)?
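
For reference, both versions can be checked in one line:

```python
import rasterio

# rasterio.__gdal_version__ reports the GDAL version the wheel was built with
print(rasterio.__version__, rasterio.__gdal_version__)
```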

@dionhaefner (Collaborator):

They definitely could. It's also my best guess for where this comes from all of a sudden.

@vincentsarago:

@j08lue, do you install rasterio from wheels?

If yes: wheels for rasterio >= 1 are built with https://github.com/sgillies/rasterio-wheels, and the GDAL shipped with them is 2.4.0 (since January 22nd, i.e. rasterio ~1.0.14).

@mrpgraae (Collaborator):

> I think we'll definitely need access to the offending raster to investigate what's going on.

@vlro could you put the raster in a public S3 bucket and paste the url here?

@mrpgraae (Collaborator):

@vincentsarago We do install from wheels. From @vlro's findings, it seems to work with rasterio 1.0.14 but not 1.0.15, so it would have made more sense if the GDAL version had changed in 1.0.15.

@vlro (Contributor, Author) commented Apr 11, 2019

1.0.14 and later ship with GDAL 2.4.0; 1.0.13 ships with GDAL 2.3.2.
I'll do a bit more testing today and upload the raster to a public bucket.

@vlro (Contributor, Author) commented Apr 11, 2019

@dionhaefner (Collaborator):

Could not reproduce this on my machine with rasterio 1.0.22 and GDAL 2.4.0.

Here's a database file with the test raster ingested, in case someone else wants to give it a shot:

issue139.zip

@mrpgraae (Collaborator) commented Jul 2, 2019

Small update on this. I'm seeing quite a few changes to /vsicurl/ and /vsis3/ in the GDAL 2.4.0 changelog. I'm currently trying to build the Rasterio wheels against an older GDAL version to see if that fixes it.

@mrpgraae (Collaborator) commented Jul 2, 2019

@dionhaefner Did you ever manage to reproduce this?

@dionhaefner (Collaborator):

No. It could be anything at this point: network issues on the workers, a full disk, full memory. To properly investigate this, we would need a hacked version of Terracotta that dumps some system diagnostics upon failure.
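
Something along these lines could work as a starting point (a hypothetical helper, not actual Terracotta code; it uses only the standard library and Linux's /proc/meminfo, which is available on the Lambda runtime):

```python
import shutil
import traceback

def dump_diagnostics(exc: BaseException) -> None:
    """Log basic system state when a tile read fails (hypothetical helper)."""
    # Disk usage of the Lambda scratch space
    total, used, free = shutil.disk_usage("/tmp")
    print(f"/tmp disk: total={total}, used={used}, free={free}")
    # First two lines of /proc/meminfo: MemTotal and MemFree
    with open("/proc/meminfo") as f:
        print(f.readline().strip(), "|", f.readline().strip())
    # Full traceback of the failure
    traceback.print_exception(type(exc), exc, exc.__traceback__)
```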

@mrpgraae (Collaborator) commented Jul 2, 2019

Okay. The reason I'm asking is that I wasn't able to reproduce it using the preview app (with the same Lambda deployment as our front-end) an hour ago, despite trying for a long time. So I was about to say that I also couldn't reproduce it. But now it's bugging out like crazy again 🤷‍♂

@mrpgraae (Collaborator) commented Jul 2, 2019

> Small update on this. I'm seeing quite a few changes to /vsicurl/ and /vsis3/ in the GDAL 2.4.0 changelog. I'm currently trying to build the Rasterio wheels against an older GDAL version to see if that fixes it.

This didn't work; zappa tail tells me that rasterio fails on import because it cannot find libgdal.so.20, even though I built the wheel to include GDAL.

Usually Zappa downloads a manylinux wheel rather than using the rasterio installed in the venv, and I had to use some trickery to get around that, but it seems like it actually has to be a manylinux wheel to work.

EDIT: never mind, the problem was that the GDAL libs weren't packaged into the wheel.

@mrpgraae (Collaborator) commented Jul 3, 2019

Just did a quick check with rasterio==1.0, which uses GDAL 2.2.4, and the issue is still there.

@j08lue (Collaborator) commented Oct 11, 2019

New insights: we can get rid of the rasterio.errors.RasterioIOError: '/vsis3/italy-composite/rasters/italy_2018_red.tif' not recognized as a supported file format error by setting GDAL_DISABLE_READDIR_ON_OPEN=FALSE. We were using EMPTY_DIR before.

We tried FALSE to see whether we would get a different error message than "not recognized as a supported file format", following rasterio/rasterio#1706.

EMPTY_DIR was recommended, among other places, in rasterio/rasterio#987.

Whether the change from EMPTY_DIR to FALSE actually removes the root cause of the S3 read failure or merely prevents some race condition by changing the timing, we do not know. But it seems to work consistently.
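
For reference, a minimal sketch of scoping that option in code rather than through an environment variable (rasterio.Env passes keyword arguments to GDAL as configuration options; the path is the one from the traceback):

```python
import rasterio

# GDAL configuration options set via rasterio.Env apply only inside the
# with-block, mirroring GDAL_DISABLE_READDIR_ON_OPEN=FALSE in the environment
with rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN="FALSE"):
    with rasterio.open("/vsis3/italy-composite/rasters/italy_2018_red.tif") as src:
        print(src.profile)
```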
