Clarify documentation to note that ingesting S3 access logs from a sub-prefix inside a prefix does not work

I ran into an issue in a setup where we aggregate S3 distribution logs from a variety of sources into one account, and the logs are broken down into sub-prefixes:

```
s3://<log-bucket>/s3_distribution_logs/<deployment-name-sub-prefix>/<log-file-name>
```

I was trying to run a single Glue job for `s3://<log-bucket>/s3_distribution_logs/` to populate all `<deployment-name-sub-prefix>` logs into the same `CONVERTED_TABLE_NAME`. In this case, the `RAW_TABLE_NAME` Athena table was populated, but the job would fail on its first run with the error below and then run "successfully" on subsequent runs. Unfortunately, no logs ever reached my `CONVERTED_TABLE_NAME` Athena table.

With continuous logging enabled, and a little tinkering, I tracked the issue down to [`_get_first_key_in_prefix()`](https://github.com/awslabs/athena-glue-service-logs/blob/v6.0.0/athena_glue_service_logs/utils.py#L113):
```
line 128, in _get_first_key_in_prefix
    first_object = response.get('Contents')[0].get('Key')
    TypeError: 'NoneType' object is not subscriptable
```
The values going into `self.s3_client.list_objects_v2(**query_params)` were:
```
{'Bucket': 'reformated-log-bucket', 'Prefix': 's3_access/', 'MaxKeys': 10}
```
from the `glue_jobs.json`:
```
"S3_CONVERTED_TARGET":"s3://reformated-log-bucket/s3_access/"
```

It's entirely unclear to me why, since I'm _VERY_ new to both this project and Glue in general, but if I supply
```
"S3_SOURCE_LOCATION":"s3://<log-bucket>/s3_distribution_logs/<deployment-name-sub-prefix>/"
```
instead of:
```
"S3_SOURCE_LOCATION":"s3://<log-bucket>/s3_distribution_logs/"
```
...it just works, albeit with a smaller subset of data than I wanted. This may also be related to #30.
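
For anyone else hitting this, here is a rough sketch of how the sub-prefixes could be enumerated so that each one can be supplied as its own `S3_SOURCE_LOCATION`. It relies on the `Delimiter` parameter of `list_objects_v2`, which groups keys into `CommonPrefixes`. The helper name `list_sub_prefixes`, the `FakeClient` stub, and the `deployment-a`/`deployment-b` names are all made up for illustration, and I have not verified how the project behaves when several jobs share one converted table.

```python
def list_sub_prefixes(s3_client, bucket, prefix):
    """Return s3:// URLs for each immediate sub-prefix under `prefix`."""
    sub_prefixes = []
    params = {'Bucket': bucket, 'Prefix': prefix, 'Delimiter': '/'}
    while True:
        response = s3_client.list_objects_v2(**params)
        # With a Delimiter set, "directories" are reported as CommonPrefixes.
        for entry in response.get('CommonPrefixes', []):
            sub_prefixes.append(f"s3://{bucket}/{entry['Prefix']}")
        if not response.get('IsTruncated'):
            break
        # Follow pagination when the listing spans multiple pages.
        params['ContinuationToken'] = response['NextContinuationToken']
    return sub_prefixes


class FakeClient:
    """Hypothetical stand-in for a boto3 S3 client with two sub-prefixes."""

    def list_objects_v2(self, **query_params):
        return {
            'IsTruncated': False,
            'CommonPrefixes': [
                {'Prefix': 's3_distribution_logs/deployment-a/'},
                {'Prefix': 's3_distribution_logs/deployment-b/'},
            ],
        }


locations = list_sub_prefixes(FakeClient(), 'log-bucket', 's3_distribution_logs/')
for location in locations:
    print(location)
```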
