This is the Kubernetes operator for DataHub.
DataHub is an extensible data catalog that enables data discovery, data observability and federated data governance.
This operator provides DataHub services on Kubernetes and consists of Python scripts which wrap the distributions from DataHub.
Note: This operator requires the use of juju>=3.3.
The DataHub charm relies on a number of other charms for core functionality:
- PostgreSQL, for storing metadata.
- Kafka; for ingestion, message passing, and audit logging.
- Opensearch, for search and graph indexing.
The dependencies in-turn have their own dependencies:
- Zookeeper, required by Kafka.
- Self Signed Certificates, required by Opensearch.
The DataHub charm requires a multi-cloud setup:
- DataHub requires a Kubernetes cloud.
- Kafka and PostgreSQL can be deployed on either a Kubernetes or a machine cloud.
- Opensearch requires a machine cloud.
A multi-cloud setup can be achieved with a single controller with multiple clouds registered or two controllers with one cloud registered each. Former will work with cross-model relations which are expected to be more stable than the latter's cross-controller relations.
In this guide, we assume that you have a datahub-vm machine model on a lxd controller and a datahub-k8s Kubernetes model on a k8s controller. We also assume that you have sufficient expertise on how to create models on different clouds and how to work with offers to adapt the instructions to your situation. For detailed instructions please refer to the Contributor's Guide.
# Switch to the machine model
juju switch lxd:datahub-vm
# Deploy applications
juju deploy postgresql
juju deploy kafka
juju deploy zookeeper
juju deploy opensearch --channel 2/edge -n 2
juju deploy self-signed-certificates-operator
# Wait for the units to settle
juju status --watch 3s --color
# Relate
juju relate kafka zookeeper
juju relate opensearch self-signed-certificates-operator
# Create named offers
juju offer postgresql:database pg-client
juju offer kafka:kafka-client kafka-client
juju offer opensearch:opensearch-client os-client
# Consume offers from the Kubernetes model
# These will create named saas
juju switch k8s:datahub-k8s
juju consume lxd:admin/datahub-vm.pg-client pg-client
juju consume lxd:admin/datahub-vm.kafka-client kafka-client
juju consume lxd:admin/datahub-vm.os-client os-client# Switch to the Kubernetes model
juju switch k8s:datahub-k8s
# Create a secret for encryption keys
echo "gms-key: $GMS_SECRET\nfrontend-key: $FE_SECRET\n" > /path/to/secret.yaml
juju add-secret <secret-name> --file=/path/to/secret.yaml # Copy the ID from the output
# Deploy
juju deploy datahub-k8s --config encryption-key-secret-id=<secret-id>
# Wait for the unit to settle
juju status --watch 3s --color
# Relate
# Wait between commands for the status to settle into 'active-idle'
juju relate datahub-k8s pg-client
juju relate datahub-k8s kafka-client
juju relate datahub-k8s os-client
# Get the address of DataHub
juju status | grep "^datahub-k8s\s" | grep -P "10\.\d+\.\d+\.\d+" | awk '{print $6}'After relations are set and settled, you can access DataHub at its address with port 9002 from your browser.
DataHub supports authentication via SSO. In order to enable it in the charm follow the steps:
- Issue credentials from Google Cloud via its dashboard.
1.1. On the linked page, click
Create Credentialsand chooseOAuth client ID. 1.2. In the next page, chooseWeb applicationas the type. 1.3. Give it a name. 1.4. UnderAuthorized redirect URIs, add a URI that ends with/callback/oidcand begins with the domain used to access the DataHub frontend. For local deployment the complete URI would look likehttp://localhost:9002/callback/oidc. 1.5. Get theClient IDand theClient secret. 1.6. Create a YAML file of the following format:
client-id: <client-id-value>
client-secret: <client-secret-value>- Create a Juju secret.
2.1. Go into the Juju model where DataHub is (to be) deployed.
2.2. Run
juju add-secret <secret-name> --file=/path/to/yamland copy the secret ID. 2.3. Deploy DataHub with the added config variable--config oidc-secret-id=<secret-id>. 2.4. Runjuju grant-secret <secret-name> datahub-k8sto set permissions. 2.5. Proceed with the relations.
This charm is still in active development. Please see the Juju SDK docs for guidelines on enhancements to this charm following best practice guidelines, and CONTRIBUTING.md for developer guidance.
The Charmed DataHub K8s Operator is free software, distributed under the Apache Software License, version 2.0. See License for more details.