Script to create a metadata analytics table and write it to a Redshift table.
This script parses a list of S3 buckets and documents whether each data asset record in those buckets does or does not contain a metadata.nd.json file.
- Define the environment variables in the .env.template:
  - REDSHIFT_SECRETS_NAME: secrets name for Amazon Redshift
  - BUCKETS: list of buckets in comma-separated format (ex: "bucket_name1, bucket_name2")
  - TABLE_NAME: name of the table in Redshift
  - FOLDERS_FILEPATH: intended filepath for the txt file listing all records
  - METADATA_DIRECTORY: intended path for the directory containing copies of metadata records
  - AWS_DEFAULT_REGION: default AWS region
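As a rough sketch of how the script might consume these variables (the variable names come from the list above; the parsing helper and example value are illustrative assumptions, not the script's actual code):

```python
import os

# Example value for illustration only; in practice this comes from a .env
# file created from .env.template.
os.environ["BUCKETS"] = "bucket_name1, bucket_name2"

def parse_buckets(raw: str) -> list[str]:
    """Split the comma-separated BUCKETS value into clean bucket names."""
    return [b.strip() for b in raw.split(",") if b.strip()]

buckets = parse_buckets(os.environ["BUCKETS"])
print(buckets)  # ['bucket_name1', 'bucket_name2']
```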
- Records containing a metadata.nd.json file will be copied to METADATA_DIRECTORY and compared against the list of all records in FOLDERS_FILEPATH
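The presence check above could be sketched as follows. This is a hypothetical helper, not the script's actual code: it takes the object keys already listed from one bucket (e.g. via boto3) and marks each top-level prefix according to whether a metadata.nd.json file appears under it:

```python
# Hedged sketch: given object keys from one bucket, record for each
# top-level prefix whether it contains a metadata.nd.json file.
def metadata_presence(keys: list[str]) -> dict[str, bool]:
    presence: dict[str, bool] = {}
    for key in keys:
        prefix = key.split("/", 1)[0]
        has_meta = key.endswith("metadata.nd.json")
        # A prefix counts as having metadata if any key under it matches.
        presence[prefix] = presence.get(prefix, False) or has_meta
    return presence

keys = [
    "record-a/data.csv",
    "record-a/metadata.nd.json",
    "record-b/data.csv",
]
print(metadata_presence(keys))  # {'record-a': True, 'record-b': False}
```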
- An analytics table containing the columns s3_prefix, bucket_name, and metadata_bool will be written to TABLE_NAME in Redshift
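A minimal sketch of the write step, under the assumption that rows are inserted with plain SQL; the table name, helper, and sample row below are illustrative, and the real script would execute the statement through a Redshift connection using the credentials behind REDSHIFT_SECRETS_NAME:

```python
# Hedged sketch: build an INSERT for the three-column analytics table
# described above. Execution against Redshift is omitted here.
def build_insert(table_name: str, rows: list[tuple[str, str, bool]]) -> str:
    values = ", ".join(
        f"('{prefix}', '{bucket}', {meta})" for prefix, bucket, meta in rows
    )
    return (
        f"INSERT INTO {table_name} (s3_prefix, bucket_name, metadata_bool) "
        f"VALUES {values}"
    )

sql = build_insert("metadata_analytics", [("record-a", "bucket_name1", True)])
print(sql)
```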
- It's a bit tedious, but the dependencies listed in the pyproject.toml file need to be manually updated