GDAL does not refresh IAMRole creds on EC2 or ECS after 6 hours #1593
GDAL version 2.4.0
After 6 hours, GDAL fails to talk to S3 and the process ends after repeated failures.
Haven't been able to generate useful logs at this stage but running GDAL in debug mode now.
Confirmed the issue is not present when using environment variables with AWS keys and no IAM role attached.
Expected behavior - refresh AWS temp credentials token correctly when required.
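To make the expected behavior concrete, here is a minimal sketch of expiry-aware credential caching. It is not GDAL's implementation: `TempCredentials`, `RefreshingCredentials`, the `fetch` callback, and the 60-second safety margin are all hypothetical names and values chosen for illustration. The idea is only that temporary credentials are fetched again shortly before their expiry time instead of being reused indefinitely.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TempCredentials:
    """Hypothetical container for temporary AWS credentials."""
    access_key: str
    secret_key: str
    token: str
    expires_at: float  # Unix timestamp at which the credentials expire


class RefreshingCredentials:
    """Caches credentials and re-fetches them shortly before they expire."""

    def __init__(self, fetch: Callable[[], TempCredentials],
                 margin_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch            # assumed to query the role's credential source
        self._margin = margin_seconds  # refresh this long before expiry
        self._clock = clock            # injectable so the logic can be tested
        self._cached: Optional[TempCredentials] = None

    def get(self) -> TempCredentials:
        now = self._clock()
        expired = (self._cached is None
                   or now >= self._cached.expires_at - self._margin)
        if expired:
            # First use, or the cached credentials are about to expire.
            self._cached = self._fetch()
        return self._cached
```

With credentials that last 6 hours (21600 seconds), a caller that asks for credentials through `get()` before every request would trigger a re-fetch once the clock passes 21540 seconds, rather than continuing with an expired token.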
The process that uncovered this behaviour uses GDAL's vsis3 driver to open warped virtual mosaics held on S3, which in turn reference imagery held on S3, and performs the delayed-compute warping and clipping specified by the VRTs. The process can take a while and, as Chris mentioned, the credentials time out.
Update: I now have reason to believe this fix is sound and the problem I am experiencing lies elsewhere. See the following update for more info and also https://lists.osgeo.org/pipermail/gdal-dev/2020-February/051719.html
My experience with this fix is as follows:
Now, I can't completely rule out that I have made a dumb error somewhere along the line, but I am reasonably sure I am running code derived from the gdal-2.4.2 source. If I can produce a standalone test-case for the problem I will raise a separate issue documenting that.
I am noting these issues here, for consideration of others who have this fix and are still experiencing similar issues.
A further update to the above. I have now experienced the same symptoms ("ERROR 4:") even when using the explicit AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables in my container (i.e. not using IAM roles). The error did seem to happen after a long pause in system usage, so it still seems to be related to some kind of timeout, but it cannot be explained by IAM role credentials timing out, since in theory I am not using them currently.
I am able to exercise the gdal.VSICurlClearCache() call inside my container and when I exercised that call, the symptom disappeared and I was able to access the previously failing file successfully.
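For anyone who wants to try the same workaround programmatically, the following is a rough sketch of a retry wrapper, not a confirmed fix. `open_with_cache_reset` is a hypothetical helper; it takes the open and cache-clearing functions as arguments so that the logic does not depend on GDAL being importable. It assumes a failed open either returns `None` (GDAL's Python default) or raises `RuntimeError` (when `gdal.UseExceptions()` is enabled).

```python
def open_with_cache_reset(open_fn, clear_cache_fn, path, retries=1):
    """Try to open `path`; on failure, clear the cache and try again.

    open_fn        -- callable such as gdal.Open
    clear_cache_fn -- callable such as gdal.VSICurlClearCache
    retries        -- number of extra attempts after the first failure
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            dataset = open_fn(path)
        except RuntimeError as exc:
            dataset = None
            last_error = exc
        if dataset is not None:
            return dataset
        if attempt < retries:
            # Drop cached /vsicurl/-family state before the next attempt.
            clear_cache_fn()
    raise RuntimeError(f"could not open {path!r}") from last_error


# Possible usage with GDAL (the bucket and key below are placeholders):
#
#   from osgeo import gdal
#   ds = open_with_cache_reset(gdal.Open, gdal.VSICurlClearCache,
#                              "/vsis3/my-bucket/mosaic.vrt")
```

Passing the functions in keeps the wrapper easy to test with stand-ins, and it clears the cache only after a failure, so successful opens keep the benefit of caching.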
So, the summary is, it could well be that the fix is sound, but that there is a second issue which causes similar symptoms even if IAM role credentials are not in use.