HMS (Hive Metastore Service) Helm Chart

NOTE: Kubernetes version 1.23 or later is required.
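To check this requirement before installing, the following bash sketch compares a cluster's major and minor version numbers against 1.23. The function name k8s_version_ok is hypothetical, not part of the chart. It assumes the numbers come from kubectl, and it strips non-digit characters because EKS typically reports minor versions such as "23+".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when MAJOR.MINOR is 1.23 or later.
# Non-digit characters are stripped first, so EKS-style minors like "23+" work.
k8s_version_ok() {
  local major="${1//[^0-9]/}" minor="${2//[^0-9]/}"
  # Treat missing or unparsable values as 0 so the check fails cleanly.
  major="${major:-0}"
  minor="${minor:-0}"
  if [ "$major" -gt 1 ]; then
    return 0
  fi
  [ "$major" -eq 1 ] && [ "$minor" -ge 23 ]
}
```

For example, with jq installed:

k8s_version_ok "$(kubectl version -o json | jq -r .serverVersion.major)" "$(kubectl version -o json | jq -r .serverVersion.minor)" && echo "Kubernetes version OK"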

Beginning in Hive 3.0, the Hive Metastore can run without the rest of Hive being installed. This decoupling lets us implement the Metastore service as a stateless microservice on Kubernetes. This Helm chart (package) encapsulates all the configuration and components needed to deploy an HMS, and makes it easy to install the HMS as a scalable and secure Kubernetes application.

Install the Helm command-line tool:

sudo yum install openssl && curl -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash

helm version --short

How to use it

Install with Helm (required)

Replace the placeholders in values.yaml as shown below. Alternatively, store your Hive metastore credentials in AWS Secrets Manager by using this CDK values file.

echo -e "\n Default HDFS: $S3BUCKET\n Service Account IAM role: $EMR_ROLE_ARN\n host: $HOST_NAME\n DB: $DB_NAME\n password: $PASSWORD\n username: $USER_NAME\n"
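The echo above only prints the values, so an empty variable is easy to miss and would silently produce a broken values.yaml. The bash sketch below fails fast instead. The helper name require_vars is hypothetical; it relies on bash indirect expansion (${!name}).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every named variable that is unset or empty,
# and return non-zero if at least one is missing.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} expands the variable whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "ERROR: \$$name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, with the variables this README relies on (a non-zero exit status means at least one is missing):

require_vars S3BUCKET EMR_ROLE_ARN HOST_NAME DB_NAME PASSWORD USER_NAME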

cd hive-metastore-chart

# -i.bak works with both GNU sed (Linux) and BSD sed (macOS); it keeps the original as values.yaml.bak.
sed -i.bak \
  -e 's|{RDS_JDBC_URL}|"jdbc:mysql://'$HOST_NAME':3306/'$DB_NAME'?createDatabaseIfNotExist=true"|g' \
  -e 's|{RDS_USERNAME}|"'$USER_NAME'"|g' \
  -e 's|{RDS_PASSWORD}|"'$PASSWORD'"|g' \
  -e 's|{S3BUCKET}|"s3://'$S3BUCKET'"|g' \
  -e 's|{EMRExecRole}|{"eks.amazonaws.com/role-arn": "'$EMR_ROLE_ARN'"}|g' \
  values.yaml
helm repo add hive-metastore https://melodyyangaws.github.io/hive-metastore-chart
helm repo update hive-metastore
helm install hms hive-metastore/hive-metastore -f values.yaml --namespace=emr --debug
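Before running helm install, it can help to confirm that the sed substitutions above replaced every placeholder. The bash sketch below (hypothetical helper name check_placeholders) searches a values file for the five placeholder tokens used in those commands and fails if any remain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any leftover {PLACEHOLDER} tokens from the sed
# step and return non-zero if the file is missing or still contains one.
check_placeholders() {
  local file="${1:-values.yaml}"
  if [ ! -f "$file" ]; then
    echo "ERROR: $file not found" >&2
    return 1
  fi
  # \{ and \} match literal braces in an extended regular expression.
  if grep -nE '\{(RDS_JDBC_URL|RDS_USERNAME|RDS_PASSWORD|S3BUCKET|EMRExecRole)\}' "$file"; then
    echo "ERROR: unreplaced placeholders in $file" >&2
    return 1
  fi
  return 0
}
```

Run check_placeholders values.yaml from the hive-metastore-chart directory; any matching lines are printed with their line numbers.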

NOTE: We assume the EKS namespace emr already exists and is registered to an EMR on EKS virtual cluster. In this setup, the HMS must be deployed in the same namespace as the registered EMR on EKS namespace, because the HMS shares the IAM execution role $EMR_ROLE_ARN with EMR on EKS through the same service account in that namespace. We recommend creating a separate IAM role for the HMS service account, an approach known as IAM roles for service accounts (IRSA). Ensure the new IAM role's trust relationship allows the HMS service account to assume the role. See the example IAM trust policy trust-relationship.json.
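A trust relationship for IRSA follows a standard shape: the cluster's OIDC provider is the federated principal, the action is sts:AssumeRoleWithWebIdentity, and a condition pins the token's subject to one namespace and service account. The bash sketch below renders such a policy. The helper name irsa_trust_policy is hypothetical, it assumes the standard "aws" partition, and it is not necessarily identical to the repository's trust-relationship.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an IRSA trust policy that lets one Kubernetes
# service account assume an IAM role through the cluster's OIDC provider.
#   $1 AWS account ID
#   $2 OIDC issuer URL (a leading "https://" is stripped)
#   $3 Kubernetes namespace
#   $4 service account name
irsa_trust_policy() {
  local account_id="$1" oidc="${2#https://}" namespace="$3" sa="$4"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${oidc}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc}:sub": "system:serviceaccount:${namespace}:${sa}",
          "${oidc}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}
```

A possible workflow, where CLUSTER_NAME, ACCOUNT_ID, the service account name hive-metastore-sa, and the role name hms-irsa-role are all placeholders (check the service account name the chart actually creates):

OIDC=$(aws eks describe-cluster --name "$CLUSTER_NAME" --query "cluster.identity.oidc.issuer" --output text)
irsa_trust_policy "$ACCOUNT_ID" "$OIDC" emr hive-metastore-sa > trust-relationship.json
aws iam create-role --role-name hms-irsa-role --assume-role-policy-document file://trust-relationship.json

The new role's ARN can then be used in place of $EMR_ROLE_ARN in the {EMRExecRole} substitution.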

Security consideration

By using the Kubernetes External Secrets Operator (ESO) or the Kubernetes External Secrets tool, we can automate retrieval of the password used to connect to the Hive metastore database.

Check out the example values.yaml file deployed by the solution's CFN/CDK templates, and a sidecar pod template example.
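With ESO, the usual building block is an ExternalSecret resource that copies fields from an external store into a Kubernetes Secret. The bash sketch below renders one that pulls a username and password from a remote secret through an existing SecretStore. The helper name hms_external_secret, the store and secret names in the usage line, and the assumption that the remote secret is JSON with username and password keys (as RDS-managed secrets in AWS Secrets Manager are) are all illustrative. It uses apiVersion external-secrets.io/v1beta1; check which API version your ESO release serves. How the HMS pods consume the resulting Secret is covered by the example values.yaml and sidecar template above, not by this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ESO ExternalSecret that syncs the "username"
# and "password" properties of a remote secret into a Kubernetes Secret.
#   $1 namespace
#   $2 SecretStore name (must already exist in the namespace)
#   $3 remote secret key, e.g. a Secrets Manager secret name
#   $4 name of the Kubernetes Secret (and of the ExternalSecret itself)
hms_external_secret() {
  local namespace="$1" store="$2" remote_key="$3" target="$4"
  cat <<EOF
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ${target}
  namespace: ${namespace}
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: ${store}
    kind: SecretStore
  target:
    name: ${target}
  data:
    - secretKey: username
      remoteRef:
        key: ${remote_key}
        property: username
    - secretKey: password
      remoteRef:
        key: ${remote_key}
        property: password
EOF
}
```

For example (all names are placeholders):

hms_external_secret emr aws-secrets-manager hive-metastore/rds hms-db-credentials | kubectl apply -f -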

EKS resources used in this Helm chart

The resources used in this chart are defined in YAML files inside the templates/ directory. The chart uses the following resources:

  • ConfigMap: stores configuration data that is mounted into the containers as a volume. Here, we mount a volume at the HMS configsets directory, which the HMS Docker image uses to render the metastore-site.yaml and Hadoop core-site.yaml templates.
  • Service: exposes the HMS service as a ClusterIP type of service.
  • Horizontal Pod Autoscaler (HPA): to keep the HMS service available under load, the HPA automatically increases or decreases the number of HMS pods in the Deployment based on CPU and memory utilization thresholds.
  • Deployment: the main resource. It specifies the pod configuration and ties all the other resources to the HMS pods.
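The HPA's core scaling rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum; when both CPU and memory are configured, the HPA computes a proposal per metric and uses the largest. The bash sketch below (hypothetical helper hpa_desired_replicas) reproduces that arithmetic for one metric to show how a threshold translates into a pod count. It omits the controller's default 10% tolerance and its stabilization windows, and the example numbers are not this chart's actual settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the HPA scaling rule for a single utilization metric.
#   desired = ceil(current_replicas * current_util / target_util)
# clamped to [min_replicas, max_replicas]. Utilizations are integer percentages.
# The real controller's tolerance and stabilization windows are not modeled.
hpa_desired_replicas() {
  local current="$1" current_util="$2" target_util="$3" min="$4" max="$5"
  # Integer ceiling division: (a + b - 1) / b
  local desired=$(( (current * current_util + target_util - 1) / target_util ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}
```

For instance, 2 HMS pods averaging 90% CPU against a 60% target give ceil(2 × 90 / 60) = 3 pods, so hpa_desired_replicas 2 90 60 1 10 prints 3.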