ExpediaGroup/apiary-metastore-docker

Overview

For more information please refer to the main Apiary project page.

Environment Variables

| Environment Variable | Required | Description |
|---|---|---|
| APIARY_S3_INVENTORY_PREFIX | No | Prefix used by S3 Inventory when creating data in the inventory bucket. Default is EntireBucketDaily. |
| APIARY_S3_INVENTORY_TABLE_FORMAT | No | Format of S3 inventory data. Valid options are ORC, Parquet, or CSV. Default is ORC. |
| APIARY_SYSTEM_SCHEMA | No | Name for internal system database. Default is apiary_system. |
| AWS_REGION | Yes | AWS region to configure various AWS clients. |
| AWS_WEB_IDENTITY_TOKEN_FILE | No | Path of the AWS Web Identity Token File for IRSA/OIDC AWS authentication. |
| DATANUCLEUS_CONNECTION_POOLING_TYPE | No | Type of connection pooling. Valid options are BoneCP, DBCP, DBCP2, C3P0, HikariCP. |
| DATANUCLEUS_CONNECTION_POOL_MAX_POOLSIZE | No | Maximum pool size for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_MIN_POOLSIZE | No | Minimum pool size for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_INITIAL_POOLSIZE | No | Initial pool size for the connection pool (C3P0 only). |
| DATANUCLEUS_CONNECTION_POOL_MAX_IDLE | No | Maximum idle connections for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_MIN_IDLE | No | Minimum idle connections for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_MAX_ACTIVE | No | Maximum active connections for the connection pool (DBCP/DBCP2 only). |
| DATANUCLEUS_CONNECTION_POOL_MAX_WAIT | No | Maximum wait time for the connection pool (DBCP/DBCP2 only). |
| DATANUCLEUS_CONNECTION_POOL_VALIDATION_TIMEOUT | No | Validation timeout for the connection pool (DBCP/DBCP2/HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_LEAK_DETECTION_THRESHOLD | No | Leak detection threshold for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_LEAK_MAX_LIFETIME | No | Maximum lifetime for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_AUTO_COMMIT | No | Auto commit for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT | No | Idle timeout for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_CONNECTION_WAIT_TIMEOUT | No | Connection wait timeout for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_READ_ONLY | No | Read only mode for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_NAME | No | Connection pool name (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_CATALOG | No | Connection pool catalog (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_REGISTER_MBEANS | No | Register MBeans for the connection pool (HikariCP only). |
| DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES | No | true/false value for hive.metastore.disallow.incompatible.col.type.changes. Default is true. |
| ENABLE_GLUESYNC | No | Option to turn on GlueSync Hive Metastore listener. |
| ENABLE_HIVE_LOCK_HOUSE_KEEPER | No | Option to turn on Hive Metastore Hive Lock House Keeper. |
| ENABLE_METRICS | No | Option to enable sending Hive Metastore and JMX metrics to Prometheus. |
| ENABLE_S3_INVENTORY | No | Option to create Hive tables on top of S3 inventory data if enabled in apiary-data-lake. Enabled if value is not null/empty. |
| ENABLE_S3_LOGS | No | Option to create Hive tables on top of S3 access logs data if enabled in apiary-data-lake. Enabled if value is not null/empty. |
| EXTERNAL_DATABASE | No | Option to enable external database mode. When specified, it disables managing the Hive Metastore MySQL database schema. |
| GLUE_PREFIX | No | Prefix added to Glue databases to handle database name collisions when synchronizing multiple Hive Metastores to the Glue catalog. |
| HADOOP_HEAPSIZE | No | Hive Metastore Java process heapsize. Default is 1024. |
| HMS_AUTOGATHER_STATS | No | Whether or not to create basic statistics on table/partition creation. Valid values are true or false. Default is true. |
| LIMIT_PARTITION_REQUEST_NUMBER | No | To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value -1 means no limit. The limit on partitions does not affect metadata-only queries. |
| HIVE_METASTORE_ACCESS_MODE | No | Hive Metastore access mode. Valid values are readwrite and readonly. |
| HIVE_DB_NAMES | No | Comma-separated list of Hive database names. When specified, Hive databases will be created and mapped to corresponding S3 buckets. |
| HIVE_METASTORE_LOG_LEVEL | No | Hive Metastore service Log4j log level. Default is INFO. |
| HMS_MIN_THREADS | No | Minimum size of the Hive metastore thread pool. Default is 200. |
| HMS_MAX_THREADS | No | Maximum size of the Hive metastore thread pool. Default is 1000. |
| INSTANCE_NAME | Yes | Apiary instance name. Used as a prefix on most AWS resources to allow multiple Apiary instance deployments. |
| KAFKA_BOOTSTRAP_SERVERS | No | Kafka Bootstrap Servers to enable Kafka Metastore listener and send Metastore events to Kafka. |
| KAFKA_CLIENT_ID | No | Label you define that names the Kafka producer. |
| KAFKA_COMPRESSION_TYPE | No | Kafka compression type. If none is specified, no compression is enabled. Values available are gzip, lz4 and snappy. |
| KAFKA_MAX_REQUEST_SIZE | No | The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum uncompressed record batch size. Default is 1048576. |
| LDAP_BASE | No | LDAP base DN used to search for user groups. |
| LDAP_CA_CERT | No | Base64 encoded Certificate Authority Bundle to validate LDAP SSL connection. |
| LDAP_SECRET_ARN | No | LDAP bind DN SecretsManager secret ARN. |
| LDAP_URL | No | Active Directory URL to enable group mapping in metastore. |
| MYSQL_CONNECTION_DRIVER_NAME | No | Hive Metastore MySQL database JDBC connection Driver Name. Default is com.mysql.jdbc.Driver. |
| MYSQL_CONNECTION_POOL_SIZE | No | MySQL Connection pool size for Hive Metastore. Default is 10. See here for more info. |
| MYSQL_DB_HOST | Yes | Hive Metastore MySQL database hostname. |
| MYSQL_DB_NAME | Yes | Hive Metastore MySQL database name. |
| MYSQL_SECRET_ARN | Yes | Hive Metastore MySQL SecretsManager secret ARN. |
| MYSQL_SECRET_USERNAME_KEY | No | Hive Metastore MySQL SecretsManager secret username key. Default is username. |
| MYSQL_TYPE | No | Hive Metastore MySQL database Type (mariadb, mysql). Default is mysql. |
| MYSQL_DRIVER_JAR | No | Hive Metastore MySQL connector JAR location. Default is /usr/share/java/mysql-connector-java.jar. |
| RANGER_AUDIT_DB_URL | No | Ranger audit database JDBC URL. |
| RANGER_AUDIT_SECRET_ARN | No | Ranger audit database secret ARN. |
| RANGER_AUDIT_SOLR_URL | No | Ranger Solr audit URL. |
| RANGER_POLICY_MANAGER_URL | No | Ranger admin URL from where policies will be downloaded. |
| RANGER_SERVICE_NAME | No | Ranger service name used to configure RangerAuth plugin. |
| SNS_ARN | No | The SNS topic ARN to which metadata updates will be sent. |
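
Before starting a container, it can help to check a planned configuration against the table above. The following is a minimal sketch, not part of this project: the helper name `check_environment` is hypothetical, and it assumes that enumerated values are matched case-sensitively, which the container itself may not require.

```python
import os

# Variables marked "Yes" in the Required column of the table above.
REQUIRED_VARS = (
    "AWS_REGION",
    "INSTANCE_NAME",
    "MYSQL_DB_HOST",
    "MYSQL_DB_NAME",
    "MYSQL_SECRET_ARN",
)

# Variables whose valid values are enumerated in the table above.
# Assumption: values are compared case-sensitively.
ALLOWED_VALUES = {
    "APIARY_S3_INVENTORY_TABLE_FORMAT": {"ORC", "Parquet", "CSV"},
    "DATANUCLEUS_CONNECTION_POOLING_TYPE": {"BoneCP", "DBCP", "DBCP2", "C3P0", "HikariCP"},
    "HIVE_METASTORE_ACCESS_MODE": {"readwrite", "readonly"},
    "HMS_AUTOGATHER_STATS": {"true", "false"},
    "KAFKA_COMPRESSION_TYPE": {"gzip", "lz4", "snappy"},
    "MYSQL_TYPE": {"mariadb", "mysql"},
}


def check_environment(env):
    """Return a list of human-readable problems found in `env` (a mapping)."""
    problems = []
    # Required variables must be present and non-empty.
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is required but not set")
    # Optional enumerated variables are only checked when they are set.
    for name, allowed in ALLOWED_VALUES.items():
        value = env.get(name)
        if value and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print(problem)
```

Run as a script, this prints one line per problem found in the current process environment and prints nothing when the required variables are set and the enumerated ones hold listed values.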

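Several of the DATANUCLEUS_CONNECTION_POOL_* variables in the table above are annotated as applying only to particular pooling types. As a rough sketch, the check below flags settings that do not match the selected DATANUCLEUS_CONNECTION_POOLING_TYPE. The function name `inapplicable_pool_settings` is hypothetical, the mapping covers only a subset of the annotated variables, and how the container treats an inapplicable setting (ignoring it or failing) is not stated here, so treat the result as a hint only.

```python
PREFIX = "DATANUCLEUS_CONNECTION_POOL_"

# Subset of the pool-specific variables, with the pooling types the
# table lists for each one.
POOL_SPECIFIC = {
    PREFIX + "INITIAL_POOLSIZE": {"C3P0"},
    PREFIX + "MAX_WAIT": {"DBCP", "DBCP2"},
    PREFIX + "VALIDATION_TIMEOUT": {"DBCP", "DBCP2", "HikariCP"},
    PREFIX + "LEAK_DETECTION_THRESHOLD": {"HikariCP"},
    PREFIX + "IDLE_TIMEOUT": {"HikariCP"},
    PREFIX + "READ_ONLY": {"HikariCP"},
    PREFIX + "REGISTER_MBEANS": {"HikariCP"},
}


def inapplicable_pool_settings(env):
    """Return the sorted names of pool settings in `env` that the table
    does not list for the selected pooling type."""
    pool_type = env.get("DATANUCLEUS_CONNECTION_POOLING_TYPE")
    if not pool_type:
        # No pooling type selected, so there is nothing to compare against.
        return []
    return sorted(
        name
        for name, pools in POOL_SPECIFIC.items()
        if name in env and pool_type not in pools
    )
```

For example, with the pooling type set to HikariCP and both DATANUCLEUS_CONNECTION_POOL_MAX_WAIT and DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT present, only the former is reported, since the table lists it for DBCP/DBCP2 only.
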
Contact

Mailing List

If you would like to ask questions about Apiary or discuss it, please join our mailing list at

https://groups.google.com/forum/#!forum/apiary-user

Legal

This project is available under the Apache 2.0 License.

Copyright 2018-2019 Expedia, Inc.