Source buckets with nested folder directories and large zip files fail to scan #54

Closed
mascarim9 opened this issue Jun 16, 2021 · 2 comments

mascarim9 commented Jun 16, 2021

When testing a deployment of this construct, I tried three different source bucket configurations: one with a relatively large zip file (7.5 GB), one with a smaller zip file (1.5 GB), and one with a nested folder structure of images (each about 17 KB).

The smaller (1.5 GB) zip file and the individual images, when placed at the root of the bucket, passed the scan.

The larger zip file and the nested structure returned the following errors, respectively:

[ERROR] FileNotFoundError: [Errno 2] No such file or directory: '7za'
Traceback (most recent call last):
  File "/var/lang/lib/python3.8/site-packages/aws_lambda_powertools/metrics/metrics.py", line 184, in decorate
    response = lambda_handler(event, context)
  File "/var/lang/lib/python3.8/site-packages/aws_lambda_powertools/logging/logger.py", line 347, in decorate
    return lambda_handler(event, context)
  File "/var/task/lambda.py", line 76, in lambda_handler
    expand_if_large_archive(
  File "/var/task/lambda.py", line 142, in expand_if_large_archive
    archive_summary = subprocess.run(
  File "/var/lang/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/var/lang/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/var/lang/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
[ERROR] FileNotFoundError: [Errno 2] No such file or directory: '/mnt/lambda/b0289808-bb0c-4d94-934f-d08e8c27ad2a/nestedfoldercheck/0000001.jpg.262a6F0b'
Traceback (most recent call last):
  File "/var/lang/lib/python3.8/site-packages/aws_lambda_powertools/metrics/metrics.py", line 184, in decorate
    response = lambda_handler(event, context)
  File "/var/lang/lib/python3.8/site-packages/aws_lambda_powertools/logging/logger.py", line 347, in decorate
    return lambda_handler(event, context)
  File "/var/task/lambda.py", line 75, in lambda_handler
    download_object(input_bucket, input_key, payload_path)
  File "/var/task/lambda.py", line 119, in download_object
    s3_resource.Bucket(input_bucket).download_file(
  File "/var/lang/lib/python3.8/site-packages/boto3/s3/inject.py", line 244, in bucket_download_file
    return self.meta.client.download_file(
  File "/var/lang/lib/python3.8/site-packages/boto3/s3/inject.py", line 170, in download_file
    return transfer.download_file(
  File "/var/lang/lib/python3.8/site-packages/boto3/s3/transfer.py", line 307, in download_file
    future.result()
  File "/var/lang/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/var/lang/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/var/lang/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/var/lang/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/var/lang/lib/python3.8/site-packages/s3transfer/download.py", line 571, in _main
    fileobj.seek(offset)
  File "/var/lang/lib/python3.8/site-packages/s3transfer/utils.py", line 367, in seek
    self._open_if_needed()
  File "/var/lang/lib/python3.8/site-packages/s3transfer/utils.py", line 350, in _open_if_needed
    self._fileobj = self._open_function(self._filename, self._mode)
  File "/var/lang/lib/python3.8/site-packages/s3transfer/utils.py", line 261, in open
    return open(filename, mode)
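
The second traceback ends in a plain `open()` call on a path under `nestedfoldercheck/`, which appears consistent with the parent directory not existing on the mounted file system when the download starts. A minimal sketch of one way to avoid that, assuming the local path is built by joining a base directory and the object key; `prepare_download_path` is a hypothetical helper, not a function from the construct's `lambda.py`:

```python
import os
import tempfile


def prepare_download_path(base_dir: str, key: str) -> str:
    """Return the local path for an S3 key, creating parent directories first.

    Hypothetical helper for illustration; not taken from the construct's code.
    """
    # Strip a leading slash so os.path.join does not discard base_dir.
    local_path = os.path.join(base_dir, key.lstrip("/"))
    # Keys such as "nestedfoldercheck/0000001.jpg" carry a prefix, so the
    # matching directory must exist before the file is opened for writing.
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    return local_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        path = prepare_download_path(base, "nestedfoldercheck/0000001.jpg")
        # A temporary file next to the target can now be opened, which is
        # the step that fails in the traceback above.
        with open(path + ".tmp", "wb") as handle:
            handle.write(b"payload")
        print(os.path.exists(path + ".tmp"))
```

The returned path could then be passed as the local filename to the existing `download_file` call. Keys at the root of the bucket are unaffected, because their parent directory is the base directory itself.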
@mascarim9 mascarim9 changed the title Source buckets with nested folder directories fail to scan Source buckets with nested folder directories large zip files fail to scan Jun 16, 2021
@mascarim9 mascarim9 changed the title Source buckets with nested folder directories large zip files fail to scan Source buckets with nested folder directories and large zip files fail to scan Jun 16, 2021
@dontirun
Contributor

It looks like there are 2 separate issues occurring.

  1. Prefixes in S3 are not properly processed
  2. Large files are not properly processed

I'm going to close this issue in favor of creating 2 separate issues.
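
For the large-file case, the first traceback shows `subprocess.run` raising `FileNotFoundError` for `'7za'`, which is what Python raises when the executable itself cannot be found. That suggests the binary may be missing from the function's image or `PATH`, although the thread does not confirm it. A minimal sketch of a check that fails early with a clearer message; `require_executable` is a hypothetical helper, not part of the construct:

```python
import shutil


def require_executable(name: str) -> str:
    """Return the full path of an executable, or raise a descriptive error.

    Hypothetical helper for illustration; not taken from the construct's code.
    """
    # shutil.which searches PATH and returns None when nothing matches.
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"required executable {name!r} was not found on PATH")
    return path


if __name__ == "__main__":
    try:
        print(require_executable("7za"))
    except RuntimeError as error:
        # Reached when 7za is not installed in the current environment.
        print(error)
```

Calling such a check before the archive is expanded would turn the low-level `FileNotFoundError` into a message that names the missing dependency.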

@dontirun
Contributor

Closing in favor of #55 and #56.
