Script to create a metadata analytics table and write it to a Redshift table.
This script parses a list of S3 buckets and documents whether each data asset record in those buckets does or does not contain a metadata.nd.json file.
- Define the environment variables in the .env.template:
  - REDSHIFT_SECRETS_NAME: secrets name for Amazon Redshift
  - BUCKETS: list of buckets in comma-separated format (ex: "bucket_name1, bucket_name2")
  - TABLE_NAME: name of the table in Redshift
  - FOLDERS_FILEPATH: intended filepath for the txt file listing all records
  - METADATA_DIRECTORY: intended path for the directory containing copies of metadata records
  - AWS_DEFAULT_REGION: default AWS region
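As a rough sketch of how the script might consume these variables (the variable names come from the list above; the parsing helper and example value are illustrative assumptions, not the script's actual code):

```python
import os

# Example value for illustration only; in practice this comes from a .env
# file created from .env.template.
os.environ["BUCKETS"] = "bucket_name1, bucket_name2"

def parse_buckets(raw: str) -> list[str]:
    """Split the comma-separated BUCKETS value into clean bucket names."""
    return [b.strip() for b in raw.split(",") if b.strip()]

buckets = parse_buckets(os.environ["BUCKETS"])
print(buckets)  # ['bucket_name1', 'bucket_name2']
```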
- Records containing a metadata.nd.json file will be copied to METADATA_DIRECTORY and compared against the list of all records in FOLDERS_FILEPATH
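The presence check above could be sketched as follows. This is a hypothetical helper, not the script's actual code: it takes the object keys already listed from one bucket (e.g. via boto3) and marks each top-level prefix according to whether a metadata.nd.json file appears under it:

```python
# Hedged sketch: given object keys from one bucket, record for each
# top-level prefix whether it contains a metadata.nd.json file.
def metadata_presence(keys: list[str]) -> dict[str, bool]:
    presence: dict[str, bool] = {}
    for key in keys:
        prefix = key.split("/", 1)[0]
        has_meta = key.endswith("metadata.nd.json")
        # A prefix counts as having metadata if any key under it matches.
        presence[prefix] = presence.get(prefix, False) or has_meta
    return presence

keys = [
    "record-a/data.csv",
    "record-a/metadata.nd.json",
    "record-b/data.csv",
]
print(metadata_presence(keys))  # {'record-a': True, 'record-b': False}
```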
- An analytics table containing the columns s3_prefix, bucket_name, and metadata_bool will be written to TABLE_NAME in Redshift
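A minimal sketch of the write step, under the assumption that rows are inserted with plain SQL; the table name, helper, and sample row below are illustrative, and the real script would execute the statement through a Redshift connection using the credentials behind REDSHIFT_SECRETS_NAME:

```python
# Hedged sketch: build an INSERT for the three-column analytics table
# described above. Execution against Redshift is omitted here.
def build_insert(table_name: str, rows: list[tuple[str, str, bool]]) -> str:
    values = ", ".join(
        f"('{prefix}', '{bucket}', {meta})" for prefix, bucket, meta in rows
    )
    return (
        f"INSERT INTO {table_name} (s3_prefix, bucket_name, metadata_bool) "
        f"VALUES {values}"
    )

sql = build_insert("metadata_analytics", [("record-a", "bucket_name1", True)])
print(sql)
```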
- It's a bit tedious, but the dependencies listed in the pyproject.toml file need to be manually updated