For more information please refer to the main Apiary project page.

Environment Variables

Environment Variable Required Description
APIARY_S3_INVENTORY_PREFIX No Prefix used by S3 Inventory when creating data in the inventory bucket. Default is EntireBucketDaily.
APIARY_S3_INVENTORY_TABLE_FORMAT No Format of S3 inventory data. Valid options are ORC, Parquet, or CSV. Default is ORC.
APIARY_SYSTEM_SCHEMA No Name for internal system database. Default is apiary_system.
AWS_REGION Yes AWS region to configure various AWS clients.
AWS_WEB_IDENTITY_TOKEN_FILE No Path of the AWS Web Identity Token File for IRSA/OIDC AWS authentication.
DATANUCLEUS_CONNECTION_POOLING_TYPE No Type of connection pooling. Valid options are BoneCP, DBCP, DBCP2, C3P0, HikariCP.
DATANUCLEUS_CONNECTION_POOL_MAX_POOLSIZE No Maximum pool size for the connection pool.
DATANUCLEUS_CONNECTION_POOL_MIN_POOLSIZE No Minimum pool size for the connection pool.
DATANUCLEUS_CONNECTION_POOL_INITIAL_POOLSIZE No Initial pool size for the connection pool (C3P0 only).
DATANUCLEUS_CONNECTION_POOL_MAX_IDLE No Maximum idle connections for the connection pool.
DATANUCLEUS_CONNECTION_POOL_MIN_IDLE No Minimum idle connections for the connection pool.
DATANUCLEUS_CONNECTION_POOL_MAX_ACTIVE No Maximum active connections for the connection pool (DBCP/DBCP2 only).
DATANUCLEUS_CONNECTION_POOL_MAX_WAIT No Maximum wait time for the connection pool (DBCP/DBCP2 only).
DATANUCLEUS_CONNECTION_POOL_VALIDATION_TIMEOUT No Validation timeout for the connection pool (DBCP/DBCP2/HikariCP only).
DATANUCLEUS_CONNECTION_POOL_LEAK_DETECTION_THRESHOLD No Leak detection threshold for the connection pool (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_LEAK_MAX_LIFETIME No Maximum lifetime for the connection pool (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_AUTO_COMMIT No Auto commit for the connection pool (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT No Idle timeout for the connection pool (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_CONNECTION_WAIT_TIMEOUT No Connection wait timeout for the connection pool (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_READ_ONLY No Read only mode for the connection pool (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_NAME No Connection pool name (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_CATALOG No Connection pool catalog (HikariCP only).
DATANUCLEUS_CONNECTION_POOL_REGISTER_MBEANS No Register MBeans for the connection pool (HikariCP only).
DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES No Value (true or false) for hive.metastore.disallow.incompatible.col.type.changes. Default is true.
ENABLE_GLUESYNC No Option to turn on GlueSync Hive Metastore listener.
ENABLE_HIVE_LOCK_HOUSE_KEEPER No Option to turn on Hive Metastore Hive Lock House Keeper.
ENABLE_METRICS No Option to enable sending Hive Metastore and JMX metrics to Prometheus.
ENABLE_S3_INVENTORY No Option to create Hive tables on top of S3 inventory data if enabled in apiary-data-lake. Enabled if value is not null/empty.
ENABLE_S3_LOGS No Option to create Hive tables on top of S3 access logs data if enabled in apiary-data-lake. Enabled if value is not null/empty.
EXTERNAL_DATABASE No Option to enable external database mode. When specified, management of the Hive Metastore MySQL database schema is disabled.
GLUE_PREFIX No Prefix added to Glue databases to handle database name collisions when synchronizing multiple Hive Metastores to the Glue catalog.
HADOOP_HEAPSIZE No Hive Metastore Java process heapsize. Default is 1024.
HMS_AUTOGATHER_STATS No Whether or not to create basic statistics on table/partition creation. Valid values are true or false. Default is true.
LIMIT_PARTITION_REQUEST_NUMBER No To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value -1 means no limit. The limit on partitions does not affect metadata-only queries.
HIVE_METASTORE_ACCESS_MODE No Hive Metastore access mode. Valid values are readwrite or readonly.
HIVE_DB_NAMES No Comma-separated list of Hive database names. When specified, the Hive databases are created and mapped to the corresponding S3 buckets.
HIVE_METASTORE_LOG_LEVEL No Hive Metastore service Log4j log level. Default is INFO.
HMS_MIN_THREADS No Minimum size of the Hive metastore thread pool. Default is 200.
HMS_MAX_THREADS No Maximum size of the Hive metastore thread pool. Default is 1000.
INSTANCE_NAME Yes Apiary instance name. Used as a prefix on most AWS resources to allow multiple Apiary instance deployments.
KAFKA_BOOTSTRAP_SERVERS No Kafka Bootstrap Servers to enable Kafka Metastore listener and send Metastore events to Kafka.
KAFKA_CLIENT_ID No Kafka label you define that names the Kafka producer.
KAFKA_COMPRESSION_TYPE No Kafka compression type. Valid options are gzip, lz4, and snappy. If none is specified, compression is not enabled.
KAFKA_MAX_REQUEST_SIZE No The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum uncompressed record batch size. Default is 1048576.
LDAP_BASE No LDAP base DN used to search for user groups.
LDAP_CA_CERT No Base64 encoded Certificate Authority Bundle to validate LDAP SSL connection.
LDAP_SECRET_ARN No LDAP bind DN SecretsManager secret ARN.
LDAP_URL No Active Directory URL to enable group mapping in metastore.
MYSQL_CONNECTION_DRIVER_NAME No Hive Metastore MySQL database JDBC connection Driver Name. Default is com.mysql.jdbc.Driver.
MYSQL_CONNECTION_POOL_SIZE No MySQL connection pool size for Hive Metastore. Default is 10.
MYSQL_DB_HOST Yes Hive Metastore MySQL database hostname.
MYSQL_DB_NAME Yes Hive Metastore MySQL database name.
MYSQL_SECRET_ARN Yes Hive Metastore MySQL SecretsManager secret ARN.
MYSQL_SECRET_USERNAME_KEY No Hive Metastore MySQL SecretsManager secret username key. Default is username.
MYSQL_TYPE No Hive Metastore MySQL database Type (mariadb, mysql). Default is mysql.
MYSQL_DRIVER_JAR No Hive Metastore MySQL connector JAR location. Default is /usr/share/java/mysql-connector-java.jar.
RANGER_AUDIT_DB_URL No Ranger audit database JDBC URL.
RANGER_AUDIT_SECRET_ARN No Ranger audit database secret ARN.
RANGER_POLICY_MANAGER_URL No Ranger admin URL from where policies will be downloaded.
RANGER_SERVICE_NAME No Ranger service name used to configure RangerAuth plugin.
SNS_ARN No The SNS topic ARN to which metadata updates will be sent.
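
As a rough illustration of how the table above can be used, the following Python sketch checks an environment before the container is started. It is not part of Apiary: the function name check_env and the idea of a pre-flight check are assumptions made for this example. The required variable names and the valid values are taken from the table; treating the values as case-sensitive is also an assumption.

```python
import os

# Variables marked "Yes" in the Required column of the table above.
REQUIRED = (
    "AWS_REGION",
    "INSTANCE_NAME",
    "MYSQL_DB_HOST",
    "MYSQL_DB_NAME",
    "MYSQL_SECRET_ARN",
)

# Optional variables whose valid values are listed in the table above.
# Assumption: values are compared case-sensitively.
ALLOWED = {
    "APIARY_S3_INVENTORY_TABLE_FORMAT": {"ORC", "Parquet", "CSV"},
    "DATANUCLEUS_CONNECTION_POOLING_TYPE": {"BoneCP", "DBCP", "DBCP2", "C3P0", "HikariCP"},
    "HIVE_METASTORE_ACCESS_MODE": {"readwrite", "readonly"},
    "KAFKA_COMPRESSION_TYPE": {"gzip", "lz4", "snappy"},
    "MYSQL_TYPE": {"mariadb", "mysql"},
}


def check_env(env):
    """Return a list of problems found in `env`, a mapping of names to values.

    Hypothetical helper for illustration only; the image itself does not
    ship this check.
    """
    problems = []
    # A required variable counts as missing if it is unset or empty.
    for name in REQUIRED:
        if not env.get(name):
            problems.append(f"{name} is required but not set")
    # Optional variables are only checked when they have a non-empty value.
    for name, allowed in ALLOWED.items():
        value = env.get(name)
        if value and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print(problem)
```

Running the script with no arguments prints one line per problem found in the current environment and prints nothing when the required variables are set and the listed options hold valid values.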


Mailing List

If you would like to ask questions about Apiary or discuss it, please join the apiary-user mailing list.


This project is available under the Apache 2.0 License.

Copyright 2018-2019 Expedia, Inc.