
Metadata Service

The Metadata Service is the central store for Metaflow metadata: it holds information about past runs and pointers to the data artifacts they produced. The Metaflow client talks to the Metadata Service over an HTTP API endpoint. The Metadata Service is not strictly required to use Metaflow (you can run Metaflow in "local" mode without it), but it enables a lot of useful functionality, especially if more than one person on your team uses Metaflow.

This Terraform module provisions the infrastructure needed to run the Metadata Service on AWS Fargate.

To read more, see the Metaflow docs

Access control

If the access_list_cidr_blocks variable is set, only traffic originating from the specified CIDR blocks is accepted. Services internal to AWS can access the load balancer used by the API directly.

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| access_list_cidr_blocks | List of CIDRs we want to grant access to our Metaflow Metadata Service. Usually this is our VPN's CIDR blocks. | list(string) | n/a | yes |
| database_name | The database name | string | "metaflow" | no |
| database_password | The database password | string | n/a | yes |
| database_username | The database username | string | n/a | yes |
| datastore_s3_bucket_kms_key_arn | The ARN of the KMS key used to encrypt the Metaflow datastore S3 bucket | string | n/a | yes |
| enable_api_basic_auth | Enable basic auth for API Gateway? (requires key export) | bool | true | no |
| enable_api_gateway | Enable API Gateway for public metadata service endpoint | bool | true | no |
| fargate_execution_role_arn | The IAM role that grants access to ECS and Batch services, used as the execution_role of the Metadata Service API's Fargate instance | string | n/a | yes |
| iam_partition | IAM partition (select aws-us-gov for AWS GovCloud, otherwise leave as is) | string | "aws" | no |
| is_gov | Set to true if IAM partition is 'aws-us-gov' | bool | false | no |
| metadata_service_container_image | Container image for metadata service | string | "" | no |
| metadata_service_cpu | ECS task CPU units for metadata service | number | 512 | no |
| metadata_service_memory | ECS task memory in MiB for metadata service | number | 1024 | no |
| metaflow_vpc_id | ID of the Metaflow VPC the Metadata Service is to be deployed in | string | n/a | yes |
| rds_master_instance_endpoint | The database connection endpoint in address:port format | string | n/a | yes |
| resource_prefix | Prefix given to all AWS resources to differentiate between applications | string | n/a | yes |
| resource_suffix | Suffix given to all AWS resources to differentiate between environment and workspace | string | n/a | yes |
| s3_bucket_arn | The ARN of the bucket we'll be using as blob storage | string | n/a | yes |
| standard_tags | The standard tags to apply to every AWS resource. | map(string) | n/a | yes |
| subnet1_id | First private subnet used for availability zone redundancy | string | n/a | yes |
| subnet2_id | Second private subnet used for availability zone redundancy | string | n/a | yes |
| vpc_cidr_blocks | The VPC CIDR blocks to add to the Metadata Service API access list so that all internal communications are allowed | list(string) | n/a | yes |
| with_public_ip | Enable public IP assignment for the Metadata Service. Typically you want this to be set to true if using public subnets as subnet1_id and subnet2_id, and false otherwise | bool | n/a | yes |
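
As a rough illustration of how the required inputs fit together, a module block might look like the sketch below. Every value is a placeholder (IDs, ARNs, CIDR ranges, and the database endpoint are made up), the module name metadata_service and the variable metaflow_db_password are hypothetical, and the source address is an assumption that should be replaced with wherever the module lives in your setup.

```hcl
# Hypothetical variable: supply the password from a secret store,
# not from source control.
variable "metaflow_db_password" {
  type      = string
  sensitive = true
}

module "metadata_service" {
  # Assumed source address; adjust to your registry path or local checkout.
  source = "outerbounds/metaflow/aws//modules/metadata-service"

  # Naming and tagging (placeholder values)
  resource_prefix = "metaflow"
  resource_suffix = "dev"
  standard_tags   = { project = "metaflow-example" }

  # Networking (placeholder IDs and ranges)
  metaflow_vpc_id         = "vpc-0123456789abcdef0"
  subnet1_id              = "subnet-0123456789abcdef0"
  subnet2_id              = "subnet-0fedcba9876543210"
  vpc_cidr_blocks         = ["10.0.0.0/16"]
  access_list_cidr_blocks = ["203.0.113.0/24"] # e.g. a VPN egress range
  with_public_ip          = false              # private subnets in this sketch

  # Database (endpoint uses the address:port format)
  database_username            = "metaflow"
  database_password            = var.metaflow_db_password
  rds_master_instance_endpoint = "example-db.abc123.us-east-1.rds.amazonaws.com:5432"

  # Datastore and IAM (placeholder ARNs)
  s3_bucket_arn                   = "arn:aws:s3:::example-metaflow-bucket"
  datastore_s3_bucket_kms_key_arn = "arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000"
  fargate_execution_role_arn      = "arn:aws:iam::123456789012:role/example-ecs-execution-role"
}
```

Inputs with defaults (database_name, enable_api_gateway, metadata_service_cpu, and so on) are omitted here and only need to be set when the default does not fit.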

Outputs

| Name | Description |
|------|-------------|
| METAFLOW_SERVICE_INTERNAL_URL | URL for Metadata Service (accessible in VPC) |
| METAFLOW_SERVICE_URL | URL for Metadata Service (open to public access) |
| api_gateway_rest_api_id | The ID of the API Gateway REST API that accepts Metadata Service requests and forwards them to the Fargate API instance |
| api_gateway_rest_api_id_key_id | API Gateway Key ID for Metadata Service. Fetch the key from the AWS Console [METAFLOW_SERVICE_AUTH_KEY] |
| metadata_service_security_group_id | The security group ID used by the Metadata Service. We'll grant this access to our DB. |
| metadata_svc_ecs_task_role_arn | This role is passed to the AWS ECS task definition as the task_role. It gives the running Metaflow Metadata Service the permissions it needs to talk to other AWS resources. |
| migration_function_arn | ARN of DB Migration Function |
| network_load_balancer_dns_name | The DNS addressable name for the Network Load Balancer that accepts requests and forwards them to our Fargate Metadata Service instance(s) |
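
The outputs can be consumed from the calling configuration like any other module outputs. The sketch below assumes the module was instantiated under the hypothetical name metadata_service, that the database sits behind a security group whose ID is passed in through the hypothetical variable db_security_group_id, and that the database listens on port 5432; adjust all three to your setup.

```hcl
# Hypothetical variable: ID of the security group that protects the database.
variable "db_security_group_id" {
  type = string
}

# Re-export the service URLs so they can be used in the Metaflow client configuration.
output "metaflow_service_url" {
  description = "Public URL of the Metadata Service"
  value       = module.metadata_service.METAFLOW_SERVICE_URL
}

output "metaflow_service_internal_url" {
  description = "URL of the Metadata Service reachable from inside the VPC"
  value       = module.metadata_service.METAFLOW_SERVICE_INTERNAL_URL
}

# Allow the Metadata Service's security group to reach the database.
resource "aws_security_group_rule" "metadata_service_to_db" {
  type                     = "ingress"
  from_port                = 5432 # assumed database port
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = var.db_security_group_id
  source_security_group_id = module.metadata_service.metadata_service_security_group_id
}
```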