[GOBBLIN-932] Create deployment for Azure, clean up existing deployments
Closes #2799 from Will-Lo/azure-deploy
Will-Lo authored and suvasude committed Dec 6, 2019
1 parent bf8f7b1 commit 570f3d7129e708fd9583ef47676333cc486a97a5
Showing 14 changed files with 214 additions and 71 deletions.
@@ -0,0 +1,83 @@
# GaaS on Azure Deployment Steps

## Create Azure Container Registry [Optional]

1\) Log into Azure Container Registry

```bash
az acr login --name gobblintest
```

2\) Tag docker images to container registry

```bash
docker tag <gaas_image_id> gobblintest.azurecr.io/gobblin-service
docker tag <standalone_image_id> gobblintest.azurecr.io/gobblin-standalone
```

3\) Push the images

```bash
docker push gobblintest.azurecr.io/gobblin-service
docker push gobblintest.azurecr.io/gobblin-standalone
```

The images should now be hosted on Azure with the tag `latest`.

## Deploy the base K8s cluster

1\) Create a resource group on Azure
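
A resource group can be created from the Azure CLI. The sketch below wraps `az group create` in a small helper; the function name and the example region are illustrative assumptions, not part of the Gobblin repository.

```bash
# Hypothetical helper: create the resource group that will hold the AKS cluster.
# $1 = resource group name, $2 = Azure region (for example, eastus)
create_resource_group() {
  az group create --name "$1" --location "$2"
}

# Usage (placeholder values):
#   create_resource_group <resource_group_name> eastus
```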

2\) Create an AKS cluster in the resource group

```bash
az aks create --resource-group <resource_group_name> --name GaaS-cluster-test --node-count 1 --enable-addons monitoring --generate-ssh-keys
```

3\) Switch kubectl's context to the Azure cluster
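
One way to do this, assuming the cluster was created with `az aks create` as in step 2, is to merge the cluster credentials into your kubeconfig with `az aks get-credentials`. The helper name below is hypothetical.

```bash
# Hypothetical helper: fetch AKS credentials and confirm which context kubectl uses.
# $1 = resource group name, $2 = cluster name (GaaS-cluster-test in step 2)
use_aks_cluster() {
  az aks get-credentials --resource-group "$1" --name "$2"
  kubectl config current-context   # prints the context kubectl now points at
}

# Usage (placeholder values):
#   use_aks_cluster <resource_group_name> GaaS-cluster-test
```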

4\) Check status of cluster

```bash
kubectl get pods
```

## Install the nginx ingress to connect to the Azure Cluster

1\) Install Helm if you don't currently have it \(these steps assume Helm v2, where `helm init` installs Tiller into the cluster\)

```bash
brew install helm
helm init
```

2\) Deploy the nginx helm chart to create the ingress

```bash
helm install stable/nginx-ingress
```

If this is the first time deploying with Helm v2, you will need to set up Tiller, Helm's in-cluster server component, and give it a service account with cluster-admin permissions. Otherwise you'll run into this [issue](https://github.com/helm/helm/issues/2224):

> Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list configmaps in the namespace "kube-system"

To set up Tiller \(the steps are also in the linked issue\):

```bash
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl edit deploy --namespace kube-system tiller-deploy # then add the line "serviceAccount: tiller" under spec.template.spec
```
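
As a non-interactive alternative to `kubectl edit`, the same field can be set with `kubectl patch`. This is a sketch of the workaround discussed in the linked issue; the helper name is an assumption.

```bash
# Hypothetical helper: set serviceAccount on the tiller-deploy pod template
# without opening an editor.
patch_tiller_service_account() {
  kubectl patch deploy --namespace kube-system tiller-deploy \
    -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
}
```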

3\) Deploy the ingress controller in `gobblin-kubernetes/gobblin-service/azure-cluster`
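
Assuming the azure-cluster directory follows the same Kustomize layout as the other environments, it can be applied with `kubectl apply -k` from the repository root. The helper name is hypothetical; the path comes from the step above.

```bash
# Hypothetical helper: apply one of the Kustomize environments from the repository root.
# $1 = environment directory name, e.g. azure-cluster
deploy_gaas_env() {
  kubectl apply -k "gobblin-kubernetes/gobblin-service/$1/"
}

# Usage:
#   deploy_gaas_env azure-cluster
```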

4\) Run `kubectl get services`, and the output should look something like this:

```text
NAME                                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                      AGE
gaas-svc                                        ClusterIP      10.0.176.58    <none>          6956/TCP                     16h
honorary-possum-nginx-ingress-controller        LoadBalancer   10.0.182.255   <EXTERNAL_IP>   80:30488/TCP,443:31835/TCP   6m13s
honorary-possum-nginx-ingress-default-backend   ClusterIP      10.0.236.153   <none>          80/TCP                       6m13s
kubernetes                                      ClusterIP      10.0.0.1       <none>          443/TCP                      10d
```

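5\) Send a request to the external IP of the `honorary-possum-nginx-ingress-controller` service \(the `honorary-possum` prefix is a Helm-generated release name and will differ in your cluster\)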
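
A quick reachability check can be made with `curl`. The sketch below only requests the root path over HTTP; which GaaS endpoint to call is not specified here, and the helper name is an assumption.

```bash
# Hypothetical helper: send a plain HTTP request to the ingress controller.
# $1 = EXTERNAL-IP of the nginx ingress controller service
probe_ingress() {
  curl -i "http://$1/"   # -i prints the response status line and headers
}

# Usage:
#   probe_ingress <EXTERNAL_IP>
```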
@@ -31,4 +31,24 @@ To run the full docker compose:
4. `docker compose -f gobblin-docker/gobblin-service/alpine-gaas-latest/docker-compose.yml build`
5. `docker compose -f gobblin-docker/gobblin-service/alpine-gaas-latest/docker-compose.yml up`

The docker container exposes the endpoints from Gobblin as a Service which can be accessed on `localhost:6956`

# Running Gobblin as a Service with Kubernetes
Gobblin as a Service can also be deployed as a Kubernetes cluster in any K8s environment.

Currently, the yamls use [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) for configuration management. In the future, we may utilise Helm instead.

The cluster is split into 3 environments:
1) base-cluster (deploys one pod of GaaS and Gobblin standalone, where GaaS writes jobSpecs to a folder tracked by the standalone instance)
2) mysql-cluster (utilises MySQL for storing specStores instead of FS, future work may involve writing to a job queue to be picked by gobblin standalone)
3) azure-cluster (deploys Dev on Microsoft Azure), more docs [here](./Azure-Kubernetes-Deployment.md)

To add any flow config template for GaaS to use, add the `.template` file to `gobblin-kubernetes/gobblin-service/base-cluster/` and add the file to the configmap.
For production purposes, flow config templates should be stored in a proper file system or a database instead of being added to the configmap.

To deploy any of these clusters, run the following command from the repository root.
```
kubectl apply -k gobblin-kubernetes/gobblin-service/<ENV>/
```

Then find the external IP of the cluster and start sending requests.
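
To pull the external IP out of `kubectl get services`, a small filter can help. This sketch assumes the default column layout (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, ...) and that the cluster is exposed through a `LoadBalancer` service such as an nginx ingress controller; the function name is hypothetical.

```bash
# Hypothetical helper: read `kubectl get services` output on stdin and print
# the EXTERNAL-IP column (4th field) of every LoadBalancer service.
lb_external_ips() {
  awk '$2 == "LoadBalancer" { print $4 }'
}

# Usage:
#   kubectl get services --no-headers | lb_external_ips
```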
@@ -0,0 +1,13 @@
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gaas-ingress
  annotations:
    # uses an nginx ingress by default; for setup instructions, see incubator-gobblin/gobblin-docs/user-guide/Azure-Kubernetes-Deployment.md
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  backend:
    serviceName: gaas-svc
    servicePort: 6956
@@ -0,0 +1,4 @@
bases:
- ../mysql-cluster
patchesStrategicMerge:
- ingress.yaml
@@ -22,18 +22,17 @@ spec:
- name: 'shared-jobs'
persistentVolumeClaim:
claimName: shared-jobs-claim
- name: 'shared-template-catalogs'
persistentVolumeClaim:
claimName: shared-template-catalogs-claim
- name: flowconfig-templates
configMap:
name: flowconfig-templates
containers:
- name: gobblin-service
image: will97/gobblin-as-a-service:latest
volumeMounts:
- name: shared-jobs
mountPath: /tmp/gobblin-as-service/jobs
- name: shared-template-catalogs
- name: flowconfig-templates
mountPath: /tmp/templateCatalog

---
apiVersion: apps/v1
kind: Deployment
@@ -62,18 +61,3 @@ spec:
volumeMounts:
- name: shared-jobs
mountPath: /tmp/gobblin-standalone/jobs
---
apiVersion: v1
kind: Service
metadata:
name: gaas-svc
labels:
app: gobblin-service
spec:
type: ClusterIP
ports:
- protocol: TCP
port: 6956
targetPort: 6956
selector:
app: gaas
@@ -0,0 +1,51 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# ====================================================================
# Job configurations
# ====================================================================

gobblin.template.required_attributes="from,to"

job.name=Distcp
job.description="Distributed copy"

# target location for copy
data.publisher.final.dir=${gobblin.flow.output.dataset.descriptor.path}
gobblin.dataset.pattern=${gobblin.flow.input.dataset.descriptor.path}

gobblin.dataset.profile.class=org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder

# ====================================================================
# Distcp configurations
# ====================================================================

extract.namespace=org.apache.gobblin.copy
data.publisher.type=org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher
source.class=org.apache.gobblin.data.management.copy.CopySource
writer.builder.class=org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
converter.classes=org.apache.gobblin.converter.IdentityConverter

task.maxretries=0
workunit.retry.enabled=false

distcp.persist.dir=/tmp/distcp-persist-dir

cleanup.staging.data.per.task=false
gobblin.trash.skip.trash=true
state.store.enabled=false
job.commit.parallelize=true
@@ -1,8 +1,8 @@
apiVersion: networking.k8s.io/v1beta1
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: gaas-ingress
spec:
backend:
serviceName: gaas-svc
servicePort: 6956
servicePort: 6956
@@ -0,0 +1,11 @@
resources:
- deployment.yaml
- storage.yaml
- service.yaml
- ingress.yaml
configMapGenerator:
  # only used for development purposes to allow an easy way to expose template files to GaaS
  # add flow templates here
  - name: flowconfig-templates
    files:
    - flowconfig-templates/distcp.template
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
  name: gaas-svc
  labels:
    app: gobblin-service
spec:
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 6956
    targetPort: 6956
  selector:
    app: gaas
@@ -24,31 +24,3 @@ spec:
resources:
requests:
storage: 100Mi
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: shared-template-catalogs-volume
spec:
capacity:
storage: 50Mi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
storageClassName: manual
hostPath:
path: "/tmp/templateCatalog"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: shared-template-catalogs-claim
spec:
accessModes:
- ReadWriteOnce
storageClassName: manual
resources:
requests:
storage: 50Mi

@@ -22,6 +22,9 @@ spec:
- name: shared-jobs
persistentVolumeClaim:
claimName: shared-jobs-claim
- name: flowconfig-templates
configMap:
name: flowconfig-templates
- name: gaas-config
configMap:
name: gaas-config
@@ -44,14 +47,16 @@ spec:
volumeMounts:
- name: shared-jobs
mountPath: /tmp/gobblin-as-service/jobs
- name: flowconfig-templates
mountPath: /tmp/templateCatalog
- name: gaas-config
mountPath: /home/gobblin/conf/gobblin-as-service/application.conf
subPath: gaas-application.conf
# dependency on mysql to be initialized before gaas can be initialized
initContainers:
- name: init-mysql
image: busybox:1.28
command: ["sh", "-c", "until nslookup mysql; do echo waiting for mysql; sleep 2; done;"]
command: ['sh', '-c', 'until nslookup mysql; do echo waiting for mysql; sleep 2; done;']


---
@@ -88,18 +93,3 @@ spec:
- name: standalone-config
mountPath: /home/gobblin/conf/standalone/application.conf
subPath: standalone-application.conf
---
apiVersion: v1
kind: Service
metadata:
name: gaas-svc
labels:
app: gobblin-service
spec:
type: NodePort
ports:
- port: 6956
protocol: TCP
targetPort: 6956
selector:
app: gaas
@@ -1,7 +1,10 @@
bases:
- ../base-cluster
resources:
- application.yaml
- mysql-deployment.yaml
- mysql-pv.yaml
patchesStrategicMerge:
- deployment.yaml
configMapGenerator:
- name: gaas-config
files:
@@ -30,7 +30,7 @@ spec:
persistentVolumeClaim:
claimName: mysql-pv-claim
containers:
- image: mysql:5.6
- image: mysql:5.6.45
name: mysql
env:
- name: MYSQL_RANDOM_ROOT_PASSWORD
@@ -5,7 +5,6 @@ metadata:
labels:
type: local
spec:
storageClassName: manual
capacity:
storage: 1Gi
accessModes:
@@ -18,7 +17,6 @@ kind: PersistentVolumeClaim
metadata:
name: mysql-pv-claim
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
