kubernetes v2 docs (#4234)
* some logs

* Add cloud logging section.

* Add instructions on upgrading Kube K8s.

* minor edit

* minor clarification

Co-authored-by: Davin Chia <davinchia@gmail.com>
jrhizor and davinchia committed Jun 21, 2021
1 parent 156629c commit 51bb5cc
Showing 6 changed files with 185 additions and 111 deletions.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Expand Up @@ -107,6 +107,7 @@
* [Contributing to Airbyte](contributing-to-airbyte/README.md)
* [Code of Conduct](contributing-to-airbyte/code-of-conduct.md)
* [Developing Locally](contributing-to-airbyte/developing-locally.md)
* [Developing on Kubernetes](contributing-to-airbyte/developing-on-kubernetes.md)
* [Connector Development Kit \(Python\)](contributing-to-airbyte/python/README.md)
* [Concepts](contributing-to-airbyte/python/concepts/README.md)
* [Basic Concepts](contributing-to-airbyte/python/concepts/basic-concepts.md)
Expand Down
35 changes: 35 additions & 0 deletions docs/contributing-to-airbyte/developing-on-kubernetes.md
@@ -0,0 +1,35 @@
# Developing On Kubernetes

Make sure to read [our docs for developing locally](./developing-locally.md) first.

## Architecture

TODO

## Iteration Cycle (Locally)

If you're developing locally using Minikube/Docker Desktop/Kind, you can iterate with the following series of commands:
```bash
./gradlew composeBuild # build dev images
kubectl delete -k kube/overlays/dev # optional (allows you to recreate resources from scratch)
kubectl apply -k kube/overlays/dev # applies manifests
kubectl port-forward svc/airbyte-webapp-svc 8000:80 # port forward the api/ui
```

## Iteration Cycle \(on GKE\)

The process is similar to developing on a local cluster, except you will need to build the local version and push it to your own container
registry with names such as `your-registry/scheduler`. Then configure an overlay to override the image names, and apply
it with `kubectl apply -k <path to your overlay>`.
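
As a sketch, such an overlay could look like the following `kustomization.yaml`. The relative base path and the image names are assumptions; adjust them to match the repository layout and your registry:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
  - ../../kube/overlays/dev    # assumed relative path to the dev overlay

images:
  - name: airbyte/scheduler          # assumed original image name
    newName: your-registry/scheduler # the image you built and pushed
    newTag: dev
```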

We are [working to improve this process](https://github.com/airbytehq/airbyte/issues/4225).

## Completely resetting a local cluster

In most cases, running `kubectl delete -k kube/overlays/dev` is sufficient to remove the core Airbyte-related components. However, if you are in a dev environment on a local cluster only running Airbyte and want to start **completely from scratch** (removing all PVCs, pods, completed pods, etc.), you can use the following command
to destroy everything on the cluster:

```bash
# BE CAREFUL, THIS COMMAND DELETES ALL RESOURCES, EVEN NON-AIRBYTE ONES!
kubectl delete "$(kubectl api-resources --namespaced=true --verbs=delete -o name | tr "\n" "," | sed -e 's/,$//')" --all
```
223 changes: 122 additions & 101 deletions docs/deploying-airbyte/on-kubernetes.md
@@ -1,155 +1,176 @@
# On Kubernetes \(Alpha\)
# On Kubernetes

{% hint style="danger" %}
## Overview

This version of Kubernetes support is not production-ready. We are actively working on stabilizing our Kubernetes implementation.
We recommend waiting until issue [#3839](https://github.com/airbytehq/airbyte/issues/3839) is resolved before trying to run production workflows on Kubernetes.
This new version should be released around June 18.
Airbyte allows scaling sync workloads horizontally using Kubernetes. The core components (API server, scheduler, etc.) run as deployments, while the scheduler launches connector-related pods on different nodes.

{% endhint %}
## Getting Started

### Cluster Setup
For local testing we recommend one of the following setup guides:
* [Docker Desktop (Mac)](https://docs.docker.com/desktop/kubernetes/)
* [Minikube](https://minikube.sigs.k8s.io/docs/start/)
* NOTE: Start Minikube with at least 4 GB of RAM using `minikube start --memory=4000`
* [Kind](https://kind.sigs.k8s.io/docs/user/quick-start/)

## Support
For testing on GKE you can [create a cluster with the command line or the Cloud Console UI](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster).

This is an early preview of Kubernetes support. It has been tested on:
For testing on EKS you can [install eksctl](https://eksctl.io/introduction/) and run `eksctl create cluster` to create an EKS cluster/VPC/subnets/etc. This process should take 10-15 minutes.

* Local single-node Kube clusters \(docker-desktop for Mac\)
* Google Kubernetes Engine \(GKE\)
* Amazon Elastic Kubernetes Service \(EKS\)
For production, Airbyte should function on most clusters running Kubernetes v1.19 and above. We have tested support on GKE and EKS. If you run into a problem starting
Airbyte, please reach out on the `#issues` channel on our [Slack](https://slack.airbyte.io/) or [create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=).

Please let us know on [Slack](https://slack.airbyte.io) or with a [Github Issue](https://github.com/airbytehq/airbyte/issues/new/choose) if you're having trouble running it on these or other platforms - we'll be glad to help you get it running.
### Install `kubectl`

## Launching
If you do not already have the `kubectl` CLI tool installed, please follow [these instructions](https://kubernetes.io/docs/tasks/tools/) to install it.

All commands should be run from the root Airbyte source directory.
### Configure `kubectl`

1. Make sure you are using the correct Kubernetes context with `kubectl config current-context`
2. Apply the manifests for one of:
* Latest stable version
1. Apply with `kubectl apply -k kube/overlays/stable`
3. Wait for pods to be "Running" on `kubectl get pods | grep airbyte`
4. Run `kubectl port-forward svc/airbyte-server-svc 8000:8000` in a new terminal window.
* This exposes `airbyte-server`, the Airbyte api server.
* If you redeploy `airbyte-server`, you will need to re-run this process.
5. Run `kubectl port-forward svc/airbyte-webapp-svc 8000:80` in a new terminal window.
* This exposes `airbyte-webapp`, the server for the static web app.
* These static assets will make calls to the Airbyte api server, which is why both services needed to be port forwarded.
* If you redeploy `airbyte-webapp`, you will need to re-run this process.
6. Go to [http://localhost:8000/](http://localhost:8000/) and use Airbyte!
Configure `kubectl` to connect to your cluster with `kubectl config use-context my-cluster-name`.

If you face issues launching Airbyte on Kubernetes, check the troubleshooting section below.

## Current Limitations

* The server, scheduler, and workers must all run on the same node in the Kubernetes cluster.
* Airbyte passes records inefficiently between pods by attaching to pods created with `kubectl run` via `kubectl attach`.
* The provided manifests do not easily allow configuring a non-default namespace.
* Latency for UI operations is high.
* We don't clean up completed worker job and pod histories.
* All records replicated are also logged to the Kubernetes logging service.
* Logs, events, and job/pod histories require manual deletion.
* Please let us know on [Slack](https://slack.airbyte.io):
* if those issues are blocking your adoption of Airbyte.
* if you encounter any other issues or limitations of our Kube implementation.
* if you'd like to make contributions to fix some of these issues!

## Creating Testing Clusters

* Local \(Mac\)
* Install [Docker for Mac](https://docs.docker.com/docker-for-mac/install/)
* Under `Preferences` enable Kubernetes.
* Use `kubectl config get-contexts` to show the contexts available.
* Use the Docker UI or `kubectl config use-context <docker desktop context>` to access the cluster with `kubectl`.
* Local \(Linux\)
* Consider using a tool like [Minikube](https://minikube.sigs.k8s.io/docs/start/) to start a local cluster.
* GKE
* For GKE
* Configure `gcloud` with `gcloud auth login`.
* [Create a cluster with the command line or the Cloud Console UI](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster)
* If you created the cluster on the command line, the context will be written automatically.
* If you used the UI, you can copy and paste the command used to connect from the cluster page.
* On the Google Cloud Console, the cluster page will have a `Connect` button, which will give a command to run locally that looks like
`gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE_NAME --project PROJECT_NAME`.
* Use `kubectl config get-contexts` to show the contexts available.
* Run `kubectl config use-context <gke context>` to access the cluster with `kubectl`.
* EKS
* Run `kubectl config use-context <eks context>` to access the cluster from `kubectl`.
* For EKS
* [Configure your AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) to connect to your project.
* Install [eksctl](https://eksctl.io/introduction/)
* Run `eksctl create cluster` to create an EKS cluster/VPC/subnets/etc.
* This should take 10-15 minutes.
* The default settings should be able to support running Airbyte.
* Run `eksctl utils write-kubeconfig --cluster=<CLUSTER NAME>` to make the context available to `kubectl`
* Use `kubectl config get-contexts` to show the contexts available.
* Run `kubectl config use-context <eks context>` to access the cluster with `kubectl`.

## Kustomize
### Configure Logs

We use [Kustomize](https://kustomize.io/), which is built into `kubectl` to allow overrides for different environments.
Airbyte requires an S3 bucket for logs. Configure this by filling in the following variables in the `.env` file in the `kube/overlays/stable`
directory:
```text
S3_LOG_BUCKET=
S3_LOG_BUCKET_REGION=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
```

Our shared resources are in the `kube/resources` directory, and we define overlays for each environment. We recommend creating your own overlay if you want to customize your deployments.
The provided credentials require S3 read and write permissions. The logger attempts to create the bucket if it does not exist. See [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html)
for instructions on creating an S3 bucket and [here](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys)
for instructions on creating AWS credentials.
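
A filled-in `.env` might look like the following. All values here are placeholders; substitute your own bucket name, region, and credentials:

```text
S3_LOG_BUCKET=my-airbyte-logs
S3_LOG_BUCKET_REGION=us-west-2
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```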

Example `kustomization.yaml` file:
### Launch Airbyte

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
- https://github.com/airbytehq/airbyte.git/kube/overlays/stable?ref=master
Run the following commands to launch Airbyte:
```bash
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
kubectl apply -k kube/overlays/stable
```

This would allow you to define custom resources or extend existing resources, even within your own VCS.
After 2-5 minutes, `kubectl get pods | grep airbyte` should show `Running` as the status for all the core Airbyte pods. This may take longer
on Kubernetes clusters with slow internet connections.
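
If you prefer to block until the pods are up rather than polling `kubectl get pods`, something like the following should work, assuming the core Airbyte pods run in your current namespace:

```bash
# Wait up to 5 minutes for every pod in the current namespace to become Ready.
kubectl wait --for=condition=Ready pod --all --timeout=300s
```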

## View Raw Manifests
Run `kubectl port-forward svc/airbyte-webapp-svc 8000:80` to allow access to the UI/API.

For a specific overlay, you can run `kubectl kustomize kube/overlays/stable` to view the manifests that Kustomize will apply to your Kubernetes cluster. This is useful for debugging because it will show the exact resources you are defining.
Now visit [http://localhost:8000](http://localhost:8000) in your browser and start moving some data!

## Resizing Volumes
## Production Airbyte on Kubernetes

To resize a volume, change the `.spec.resources.requests.storage` value. After re-applying, the mount should be extended if that operation is supported for your type of mount. For a production instance, it's useful to track the usage of volumes to ensure they don't run out of space.
### Cloud logging

## Copy Files To/From Volumes
Airbyte writes logs to two directories. App logs, including server and scheduler logs, are written to the `app-logging` directory.
Job logs are written to the `job-logging` directory. Both directories live at the top level of the bucket, e.g. the `app-logging` directory lives at
`s3://log-bucket/app-logging`. Because these paths can change, we recommend using a dedicated log bucket and not using it for any other
purpose.

See the documentation for [`kubectl cp`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#cp).
Airbyte publishes logs every minute, so minute-long log delays are normal. Each publish creates its own log file, since cloud storage
services do not support append operations. This also means it is normal to see hundreds of files in your log bucket.

## Dev Iteration \(on local Kubernetes clusters\)
Each log file is named `{yyyyMMddHH24mmss}_{podname}_{UUID}` and is not compressed. To view logs, navigate to the relevant folder and
download the file for the time period in question.
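
As a sketch of that naming scheme, the snippet below assembles a name of the same shape. The pod name is illustrative, and `uuidgen` merely stands in for whatever UUID source Airbyte actually uses:

```bash
# Build a log file name of the form {yyyyMMddHH24mmss}_{podname}_{UUID}.
ts="$(date -u +%Y%m%d%H%M%S)"               # 14-digit UTC timestamp
pod="airbyte-scheduler-6b5747df5c-bj4fx"    # illustrative pod name
uuid="$( (uuidgen 2>/dev/null || echo 00000000-0000-0000-0000-000000000000) | tr 'A-Z' 'a-z')"
log_name="${ts}_${pod}_${uuid}"
echo "$log_name"
```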

If you're developing using a local context and are not using your local Kubernetes instance for anything else, you can iterate with the following series of commands.
See the [Known Issues](#known-issues) section for planned logging improvements.

```bash
./gradlew composeBuild # build dev images
kubectl delete -k kube/overlays/dev # optional, if you want to try recreating resources
kubectl apply -k kube/overlays/dev # applies manifests
```
### Using an external DB

Then restart the port-forwarding commands.
After [Issue #3605](https://github.com/airbytehq/airbyte/issues/3605) is completed, users will be able to configure custom databases instead of the simple
`postgres` container running directly in Kubernetes. A separate instance (preferably on a managed service such as AWS RDS or Google Cloud SQL) is easier
and safer to maintain than Postgres running on your cluster.

Note: this does not remove jobs and pods created for Airbyte workers.
## Known Issues

If you are in a dev environment on a local cluster only running Airbyte and want to start completely from scratch, you can use the following command to destroy everything on the cluster:
As we improve our Kubernetes offering, we would like to point out some common pain points. We are working on improving these. Please let us know if
there are any other issues blocking your adoption of Airbyte or if you would like to contribute fixes to address any of these issues.

```bash
# BE CAREFUL, THIS COMMAND DELETES ALL RESOURCES, EVEN NON-AIRBYTE ONES!
kubectl delete "$(kubectl api-resources --namespaced=true --verbs=delete -o name | tr "\n" "," | sed -e 's/,$//')" --all
```
* The server and scheduler deployments must run on the same node. ([#4232](https://github.com/airbytehq/airbyte/issues/4232))
* Some UI operations have higher latency on Kubernetes than Docker-Compose. ([#4233](https://github.com/airbytehq/airbyte/issues/4233))
* Pod histories must be cleaned up manually. ([#3634](https://github.com/airbytehq/airbyte/issues/3634))
* Specifying resource limits for pods is not supported yet. ([#3638](https://github.com/airbytehq/airbyte/issues/3638))
* Pods Airbyte launches to run connector jobs are always launched in the `default` namespace. ([#3636](https://github.com/airbytehq/airbyte/issues/3636))
* S3 is the only Cloud Storage currently supported. ([#4200](https://github.com/airbytehq/airbyte/issues/4200))
* Large log files might take a while to load. ([#4201](https://github.com/airbytehq/airbyte/issues/4201))
* UI does not include configured buckets in the displayed log path. ([#4204](https://github.com/airbytehq/airbyte/issues/4204))
* Logs are not reset when Airbyte is re-deployed. ([#4235](https://github.com/airbytehq/airbyte/issues/4235))
* File sources reading from and file destinations writing to local mounts are not supported on Kubernetes.

## Customizing Airbyte Manifests

We use [Kustomize](https://kustomize.io/) to allow overrides for different environments. Our shared resources are in the `kube/resources` directory,
and we define overlays for each environment. We recommend creating your own overlay if you want to customize your deployments.
This overlay can live in your own VCS.

Example `kustomization.yaml` file:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
- https://github.com/airbytehq/airbyte.git/kube/overlays/stable?ref=master
```

## Dev Iteration \(on GKE\)
### View Raw Manifests

For a specific overlay, you can run `kubectl kustomize kube/overlays/stable` to view the manifests that Kustomize will apply to your Kubernetes cluster.
This is useful for debugging because it will show the exact resources you are defining.

The process is similar to developing on a local cluster, except you will need to build the local version and push it to your own container registry with names such as `your-registry/scheduler`. Then configure an overlay to override the image names, and apply it with `kubectl apply -k <path to your overlay>`.
### Helm Charts
We do not currently offer Helm charts. If you are interested in this functionality please vote on the [related issue](https://github.com/airbytehq/airbyte/issues/1868).

## Listing Files
## Operator Guide

### View API Server Logs
Run `kubectl logs deployments/airbyte-server` to view real-time logs. Logs can also be downloaded as a text file via the Admin tab in the UI.

### View Scheduler or Job Logs
Run `kubectl logs deployments/airbyte-scheduler` to view real-time logs. Logs can also be downloaded as a text file via the Admin tab in the UI.

### Connector Container Logs
Although all logs can be accessed by viewing the scheduler logs, connector container logs may be easier to understand in isolation; access them via
the Airbyte UI or the [Airbyte API](../api-documentation.md) for a specific job attempt. Connector pods launched by Airbyte do not relay logs directly
to Kubernetes logging, so you must access these logs through Airbyte.

### Upgrading Airbyte Kube
See [Upgrading K8s](../operator-guides/upgrading-airbyte.md).

### Resizing Volumes
To resize a volume, change the `.spec.resources.requests.storage` value. After re-applying, the mount should be extended if that operation is supported
for your type of mount. For a production instance, it's useful to track the usage of volumes to ensure they don't run out of space.
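
For example, a claim grown to 1Gi might look like the following. The claim name and size are illustrative, and online expansion also requires a StorageClass with `allowVolumeExpansion: true`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airbyte-volume-workspace   # illustrative claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi   # bump this value, then re-apply the manifest
```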

### Copy Files To/From Volumes
See the documentation for [`kubectl cp`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#cp).

### Listing Files
```bash
kubectl exec -it airbyte-scheduler-6b5747df5c-bj4fx -- ls /tmp/workspace/8
```

## Reading Files

### Reading Files
```bash
kubectl exec -it airbyte-scheduler-6b5747df5c-bj4fx -- cat /tmp/workspace/8/0/logs.log
```

## Troubleshooting
If you run into any problems operating Airbyte on Kubernetes, please reach out on the `#issues` channel on our [Slack](https://slack.airbyte.io/) or
[create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=).

**Getting error: json: unknown field "envs" after running `kubectl apply -k kube/overlays/stable`**

This is a version mismatch between `kubectl` and `kustomize`, which can happen when a cloud provider ships an older version of `kubectl`. To fix this, update your `kubectl` version.
The deployment in this guide was tested with `kubectl` 1.21+.
You can read more about the issue [here](https://github.com/kubernetes-sigs/kustomize/issues/1069).
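
To see which client version you have locally without contacting a cluster:

```bash
# Print only the local kubectl client version.
kubectl version --client
```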
## Developing Airbyte on Kubernetes
[Read about the Kubernetes dev cycle!](https://docs.airbyte.io/contributing-to-airbyte/developing-on-kubernetes)
