How to Install SCF

satadruroy edited this page Dec 3, 2018 · 143 revisions

Version

Note: these instructions apply to SCF 2.14.5. For install instructions for previous releases, see the SCF install pages in the sidebar.

Requirements for Kubernetes

The various machines (api, kube, and node) of the kubernetes cluster must be configured in a particular way to support the execution of SCF. These requirements are, in general:

  • Kubernetes API versions 1.8+ (tested on 1.9)
  • Kernel parameter swapaccount=1
  • docker info must not show aufs as the storage driver.
  • kube-dns must be running and be fully ready. See section Kube DNS.
  • Either ntp or systemd-timesyncd must be installed and active.
  • The kubernetes cluster must have a storage class SCF can refer to. See section Storage Classes.
  • Docker must be configured to allow privileged containers.
  • Privileged containers must be enabled in kube-apiserver. See https://kubernetes.io/docs/admin/kube-apiserver
  • Privileged containers must be enabled in kubelet.
  • The TasksMax property of the containerd service definition must be set to infinity.
  • Helm's Tiller has to be installed and active.
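
Two of these requirements can be spot-checked by hand. The sketch below is not part of the SCF distribution; the function names are made up for illustration, and each helper takes the relevant text as an argument so it can be tried without a cluster.

```shell
# Hypothetical helpers (not part of SCF) for spot-checking two requirements.

# Succeeds if the kernel command line in $1 contains swapaccount=1.
has_swapaccount() {
  case " $1 " in
    *" swapaccount=1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the `docker info` text in $1 does NOT report aufs
# as the storage driver.
storage_driver_ok() {
  ! printf '%s\n' "$1" | grep -qi '^ *Storage Driver: *aufs'
}
```

On a node, these might be called as has_swapaccount "$(cat /proc/cmdline)" and storage_driver_ok "$(docker info 2>/dev/null)". The kube-ready-state-check.sh script described below remains the authoritative check.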

An easy way of setting up a small single-machine kubernetes cluster with all the necessary properties is to use the Vagrant definition in the SCF repository. The details of this approach are explained in https://github.com/SUSE/scf/blob/develop/README.md#deploying-scf-on-vagrant

Verifying Kubernetes

To make it easier to verify the above requirements, the distribution includes a script (kube-ready-state-check.sh) containing the necessary checks.

The script must be run on several different machines. A machine "category" argument tells it which checks to run. The categories are described below:

Category   Explanation
api        Run this on the machine running the apiserver container
node       Run this on the machine running the kubelet container
kube       Run this on a machine that has access to the Kubernetes cluster via kubectl

Note: In CaaSP run api on the kube-master, node on the kube-workers, and kube on your desktop/laptop that has kubectl installed and connected to CaaSP. However, on EKS/AKS/GKE the readiness script can only be run on the worker nodes as EKS, AKS or GKE do not expose the master node.

For example, on the machine running the apiserver container you would run:

./kube-ready-state-check.sh api

The script will run the tests applicable to the named category. Positive results are prefixed with Verified: , whereas failed requirements are prefixed with Configuration problem detected:.
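
Because failures carry a fixed prefix, the output can be filtered down to just the problems. A minimal sketch, assuming the script writes its results to standard output (the helper name is illustrative):

```shell
# Hypothetical helper: read kube-ready-state-check.sh output on stdin and
# print only the failed requirements. Returns 1 if any were found, else 0.
report_problems() {
  if grep 'Configuration problem detected:'; then
    return 1
  fi
  return 0
}
```

It could be used as ./kube-ready-state-check.sh kube | report_problems, which prints nothing and succeeds when every requirement is verified.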

Kube DNS

The cluster must have an active kube-dns. If you are running CaaSP you can simply use the following command to install it:

kubectl apply \
  -f https://raw.githubusercontent.com/SUSE/caasp-services/b0cf20ca424c41fa8eaef6d84bc5b5147e6f8b70/contrib/addons/kubedns/dns.yaml

Storage Classes

The kubernetes cluster must have a storage class SCF can refer to so that its database components have a place for their persistent data.

This class may have any name; the vagrant setup uses persistent.

Important information on storage classes and how to create and configure them can be found here:

Note: while the distribution comes with an example storage-class persistent of type hostpath, for use with the vagrant box, this is a toy option and should not be used with anything but the vagrant box. It is actually quite likely that whatever kube setup is used will not even support the type hostpath for storage classes, automatically preventing its use.

To enable hostpath support for testing, the kube-controller-manager must be run with the --enable-hostpath-provisioner command line option.

Cloud Foundry Console UI (Stratos UI)

See https://github.com/SUSE/stratos-ui/releases for distributions of Stratos UI, the Cloud Foundry Console UI. It is also deployed using Helm. The steps below show when to install it.

Helm installation

SCF uses Helm charts to deploy on kubernetes clusters. To install Helm see

SCF Installation

Known Issues

Mixed case DOMAINs in values.yaml

Using something like this in the scf-config-values.yaml will result in an error:

UAA_HOST: uaa.751LSjkQ.mydomain.com

error (when visiting https://uaa.751lsjkq.mydomain.com:2793/login):

The subdomain does not map to a valid identity zone.

The reason is the way UAA matches hostnames internally, which should probably be considered a bug in UAA (https://github.com/cloudfoundry/uaa/issues/797). This issue has been resolved and merged upstream.
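
For releases that still contain the affected UAA, one way to sidestep the problem is to lowercase host names before writing them into scf-config-values.yaml. A minimal sketch (the function name is illustrative, not part of SCF):

```shell
# Print $1 converted to lower case (illustrative helper, not part of SCF).
lowercase_host() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For instance, lowercase_host uaa.751LSjkQ.mydomain.com yields the all-lowercase form of the host name used in the example above.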

Recovering a failed installation/upgrade

If an installation or upgrade results in a failed installation with StatefulSet roles not coming online, subsequent upgrades must be followed by manually restarting the pods in the offline StatefulSet roles.

This happens because Kubernetes won't replace a previous generation pod of a StatefulSet unless it's alive and ready. To recover, you must manually delete the pods of the failing StatefulSets:

# Look and see which StatefulSets are not ok (*desired* count is more than *current* count)
kubectl get sts --namespace NAMESPACE

# Delete the offending pods
kubectl delete pods -l skiff-role-name=STATEFULSET_NAME --namespace NAMESPACE
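
Spotting the affected StatefulSets can be scripted. The sketch below assumes kubectl get sts prints the columns NAME, DESIRED, CURRENT, AGE (newer kubectl versions may print a combined READY column instead, in which case it does not apply); the helper name is hypothetical.

```shell
# Hypothetical helper: read `kubectl get sts` output on stdin and print the
# names of StatefulSets whose DESIRED count exceeds the CURRENT count.
# Assumes the columns are: NAME DESIRED CURRENT AGE (header on line 1).
lagging_statefulsets() {
  awk 'NR > 1 && $2 + 0 > $3 + 0 { print $1 }'
}
```

Usage would look like kubectl get sts --namespace NAMESPACE | lagging_statefulsets, with each printed name then passed to the kubectl delete pods command above.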

Downloading the archive

Get the distribution archive from https://github.com/SUSE/scf/releases (the first link under Assets, not the Source code). Create a directory and extract the archive into it.

wget  https://github.com/SUSE/scf/releases/download/scf-X.Y.Z.linux-amd64.zip  # example url
mkdir deploy
unzip scf-X.Y.Z.linux-amd64.zip -d deploy                                      # example zipfile
cd    deploy
> ls
helm/
kube/
kube-ready-state-check.sh*
scripts/

We now have the helm charts for SCF and UAA in a subdirectory helm. Additional configuration files are found under kube. The scripts directory contains helpers for cert generation.

Choosing a Storage Class

Choose the name of the kube storage class to use, and create the class if it doesn't exist. See section Storage Classes for important notes. To see if you have a storage class you can use for scf run the command: kubectl get storageclasses.

Note: The persistent class created below is of type hostpath, which is only meant for toy examples and is not to be used in production deployments (its use is disabled in Kubernetes by default).

Here we use the hostpath storage class for simplicity of setup. Note that the storageclass apiVersion used in the manifest should be storage.k8s.io/v1beta1 for kubernetes 1.5.x, and storage.k8s.io/v1 for kubernetes 1.6.x and above.

Use kubectl to check your kubernetes server version:

kubectl version --short | grep "Server Version"

For kubernetes 1.5.x:

echo '{"kind":"StorageClass","apiVersion":"storage.k8s.io/v1beta1","metadata":{"name":"persistent"},"provisioner":"kubernetes.io/host-path"}' | kubectl create -f -

For kubernetes 1.6.x and above:

echo '{"kind":"StorageClass","apiVersion":"storage.k8s.io/v1","metadata":{"name":"persistent"},"provisioner":"kubernetes.io/host-path"}' | kubectl create -f -

Configuring the deployment

Next create a values.yaml file (the rest of the docs assume filename: scf-config-values.yaml) with the settings required for the install. Copy the below as a template for this file and modify the values to suit your installation.

env:
    # Domain for SCF. DNS for *.DOMAIN must point to a kube node's (not master)
    # external ip address.
    DOMAIN: cf-dev.io

    # UAA host/port that SCF will talk to. If you have a custom UAA
    # provide its host and port here. If you are using the UAA that comes
    # with the SCF distribution, simply use the two values below and
    # substitute the cf-dev.io for your DOMAIN used above.
    # UAA_HOST: uaa.cf-dev.io
    # UAA_PORT: 2793

kube:
    # The IP address assigned to the kube node pointed to by the domain.
    # Note: the external_ip setting was renamed to external_ips and now
    # accepts a list of IPs.
    external_ips:
    - 192.168.77.77
    storage_class:
        # Make sure to change the value in here to whatever storage class you use
        persistent: "persistent"
        shared: "shared"
    auth: rbac

secrets:
    # Password for user 'admin' in the cluster
    CLUSTER_ADMIN_PASSWORD: changeme

    # Password for SCF to authenticate with UAA
    UAA_ADMIN_CLIENT_SECRET: uaa-admin-client-secret
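
Several later steps need the same DOMAIN value again. As a convenience, it can be read back out of the values file; this is a minimal sketch that only understands the simple unquoted "DOMAIN: value" layout of the template above, and the function name is illustrative.

```shell
# Illustrative helper: print the DOMAIN value from the values file given as $1.
# Only handles an unquoted "DOMAIN: value" line; comment lines are ignored
# because they start with '#' rather than 'DOMAIN:'.
values_domain() {
  sed -n 's/^[[:space:]]*DOMAIN:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | head -n 1
}
```

For example, CF_DOMAIN=$(values_domain scf-config-values.yaml) would set the variable used in the Universal Service Broker instructions further down.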

Deploy Using Helm

The previous section gave a reference to the Helm documentation explaining how to install Helm itself. Remember also that in the Vagrant-based setup helm is already installed and ready.

  • Deploy UAA

    helm install helm/uaa \
        --namespace uaa \
        --values scf-config-values.yaml \
        --name uaa
    
  • With UAA deployed and running, get the internal-ca-cert for talking to the UAA

    SECRET=$(kubectl get pods --namespace uaa -o jsonpath='{.items[?(.metadata.name=="uaa-0")].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}')
    CA_CERT="$(kubectl get secret $SECRET --namespace uaa -o jsonpath="{.data['internal-ca-cert']}" | base64 --decode -)"
    
    

    Note that secrets are versioned and the numerical suffix on the secret name will change if you upgrade the chart; please check helm list or kubectl get secrets --namespace uaa for the correct number.

  • With UAA deployed, use Helm to deploy SCF. This step uses the cert determined by the previous step.

    helm install helm/cf \
        --namespace scf \
        --name scf \
        --values scf-config-values.yaml \
        --set "secrets.UAA_CA_CERT=${CA_CERT}"
    
  • Wait for everything to be ready:

    watch -c 'kubectl get pods --all-namespaces'
    

    Stop watching when all pods show state Running and Ready is n/n (instead of k/n, k < n).
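
Instead of watching by eye, the pod list can be filtered for pods that are not ready yet. The sketch below assumes the usual column order of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, ...) and, as a further assumption, skips pods in state Completed, since one-shot job pods finish rather than stay Running. The helper name is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print "namespace/name" for every pod that is not
# Running with READY n/n. Pods in state Completed are skipped (assumption).
unready_pods() {
  awk '$4 == "Completed" { next }
       { split($3, ready, "/") }
       $4 != "Running" || ready[1] != ready[2] { print $1 "/" $2 }'
}
```

Running kubectl get pods --all-namespaces --no-headers | unready_pods prints nothing once the deployment has settled.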

Installing the Cloud Foundry UI (Stratos UI)

Stratos UI is also deployed using Helm.

Add the Stratos UI Helm Repository with the command:

helm repo add stratos-ui https://cloudfoundry-incubator.github.io/stratos

Deploy Stratos UI: (do this from the folder where you created the scf-config-values.yaml configuration file)

helm install stratos-ui/console \
    --namespace stratos \
    --values scf-config-values.yaml

This will install Stratos UI using the configuration that you created in the scf-config-values.yaml previously.

Please see here - Accessing the Console - for details on how to determine the URL of your Stratos Console UI.

When deploying with the SCF config values, you should be able to login with your Cloud Foundry credentials. If you see an upgrade message, please wait up to a minute for the installation to complete.

If you do not wish to use the SCF configuration values, then more information is available on deploying the UI in Kubernetes here - https://github.com/SUSE/stratos-ui/tree/master/deploy/kubernetes.

Note: If you deploy without the SCF configuration you will need to use the Setup UI to provide the UAA configuration. Typical values are:

  • UAA URL: This is composed of https://NAMESPACE.uaa.DOMAIN:2793 (e.g. https://scf.uaa.10.10.10.10.nip.io:2793)
  • Client ID: cf
  • Client Secret: EMPTY (do not fill in this box)
  • Admin Username: User provided value
  • Admin Password: User provided value
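
The UAA URL pattern above can be expressed as a one-line helper. This is only an illustration of the pattern; the function name is made up.

```shell
# Illustrative helper: compose the UAA URL from a namespace ($1) and a
# domain ($2), following the pattern https://NAMESPACE.uaa.DOMAIN:2793.
uaa_url() {
  printf 'https://%s.uaa.%s:2793\n' "$1" "$2"
}
```

Calling uaa_url scf 10.10.10.10.nip.io reproduces the example URL from the list.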

Using the Universal Service Broker

These example instructions deploy a MySQL server and a corresponding sidecar as Cloud Foundry docker apps and expose the service via USB.

CF_DOMAIN=cf-dev.io # Set to match the DOMAIN value of your config
CF_MYSQL_DOMAIN="mysql.${CF_DOMAIN}"
SERVER_APP=mysql
MYSQL_USER=root
MYSQL_PASS=testpass
SIDECAR_API_KEY=secret-key
SIDECAR_APP=msc

# Create a shared domain
cf create-shared-domain "${CF_MYSQL_DOMAIN}" --router-group default-tcp
cf update-quota default --reserved-route-ports -1

# Create a security group
echo > "internal-services.json" '[{ "destination": "0.0.0.0/0", "protocol": "all" }]'
cf create-security-group       internal-services-workaround internal-services.json
cf bind-running-security-group internal-services-workaround
cf bind-staging-security-group internal-services-workaround

# Enable docker support in diego
cf enable-feature-flag diego_docker

# Deploy mysql server
cf push --no-start --no-route --health-check-type none "${SERVER_APP}" -o mysql/mysql-server
cf map-route "${SERVER_APP}" "${CF_MYSQL_DOMAIN}" --random-port
cf set-env   "${SERVER_APP}" MYSQL_ROOT_PASSWORD "${MYSQL_PASS}"
cf set-env   "${SERVER_APP}" MYSQL_ROOT_HOST '%'
cf start     "${SERVER_APP}"
MYSQL_PORT=$(cf routes | grep "${CF_MYSQL_DOMAIN}" | awk '{print $3}')

# Wait for MySQL to be ready
function wait_on_port
{
  endpoint="${CF_MYSQL_DOMAIN}:${1}"
  for (( i = 0; i < 12 ; i++ )) ; do
    if curl --fail -s -o /dev/null "${endpoint}" ; then
      break
    fi
    sleep 5
  done
  # Last try, any error will abort the test
  curl -s "${endpoint}" > /dev/null
}
wait_on_port "${MYSQL_PORT}"

# Push the sidecar app

cf push "${SIDECAR_APP}" --no-start -o splatform/cf-usb-sidecar-dev-mysql
cf set-env "${SIDECAR_APP}" SIDECAR_API_KEY    "${SIDECAR_API_KEY}"
cf set-env "${SIDECAR_APP}" SERVICE_MYSQL_HOST "${CF_MYSQL_DOMAIN}"
cf set-env "${SIDECAR_APP}" SERVICE_MYSQL_PORT "${MYSQL_PORT}"
cf set-env "${SIDECAR_APP}" SERVICE_MYSQL_USER "${MYSQL_USER}"
cf set-env "${SIDECAR_APP}" SERVICE_MYSQL_PASS "${MYSQL_PASS}"
cf start   "${SIDECAR_APP}"

# Install cf-usb-plugin from https://github.com/SUSE/cf-usb-plugin/releases
# Download the zip archive you need, unpack it, then
cf install-plugin ./cf-plugin-usb

# Verify that USB is OK
cf usb-info

# Create a driver endpoint to the mysql sidecar
# Note that the -c ":" is required as a workaround to a known issue
cf usb-create-driver-endpoint my-service "https://${SIDECAR_APP}.${CF_DOMAIN}" "${SIDECAR_API_KEY}" -c ":"

# Check the service is available in the marketplace and use it
cf marketplace
cf create-service my-service default mydb
cf services
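
The wait_on_port function in the script above is one instance of a retry loop. A more general sketch, using hypothetical names, retries any command a fixed number of times with a pause between attempts:

```shell
# Hypothetical helper: run a command up to N times, sleeping DELAY seconds
# between attempts. Usage: retry N DELAY command [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt will follow.
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}
```

With it, the wait for MySQL could be written as retry 12 5 curl --fail -s -o /dev/null "${CF_MYSQL_DOMAIN}:${MYSQL_PORT}", which also makes the final failure visible through the exit status.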

High Availability

To deploy an HA version of SCF, amend the values.yaml file you're using with helm install with the following - note that some of the role names have changed from the previous release.

sizing:
  api_group:
    count: 2
  cc_clock:
    count: 2
  cc_uploader:
    count: 2
  cc_worker:
    count: 2
  cf_usb:
    count: 2
  diego_api:
    count: 3
  diego_brain:
    count: 2
  diego_cell:
    count: 3
  diego_ssh:
    count: 2
  doppler:
    count: 2
  log_api:
    count: 2
  mysql:
    count: 2
  nats:
    count: 2
  nfs_broker:
    count: 2
  router:
    count: 2
  routing_api:
    count: 2
  syslog_scheduler:
    count: 2
  tcp_router:
    count: 2
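
A sizing block like the one above is repetitive, and a missing colon or wrong indent makes the YAML invalid. One way to avoid hand-editing is to generate it; the sketch below is purely illustrative (the function name and the role=count argument format are assumptions, not an SCF tool).

```shell
# Illustrative generator: print a "sizing:" YAML block from role=count pairs.
# Example: sizing_yaml api_group=2 diego_api=3
sizing_yaml() {
  printf 'sizing:\n'
  for pair in "$@"; do
    # ${pair%%=*} is the role name, ${pair#*=} is the count.
    printf '  %s:\n    count: %s\n' "${pair%%=*}" "${pair#*=}"
  done
}
```

The output could be redirected to a separate file and passed to helm with an additional --values option.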

The HA pods of the following roles will enter a passive state and won't show as ready:

  • diego-api
  • diego-brain
  • routing-api

You can confirm this by looking at the logs inside the container. The logs will state .consul-lock.acquiring-lock.

You can also optionally enable the Application Autoscaler and Credhub features which are turned off by default. To do so amend the values in the values.yaml to include the following:

sizing:
  ...
  autoscaler_api:
    count: 2
  autoscaler_metrics:
    count: 2
  autoscaler_postgres:
    count: 1
  credhub_user:
    count: 1
  ...

Note that credhub is considered an experimental feature on Azure AKS.

Known Issues

  • roles that cannot be scaled:
    • tcp-router (no strategy for exposing ports correctly)
    • blobstore (needs shared volume support and an active/passive configuration)
  • some roles follow an active/passive scaling model, meaning all pods except one (the active) will be shown as NOT READY by kubernetes; this is appropriate and expected behavior
  • the resources required to run an HA deployment are considerably higher; for example, running HA in the vagrant box requires at least 24GB memory, 8 VCPUs and fast storage
  • when moving from a basic deployment to an HA one, the platform will be unavailable while the upgrade is happening
  • upgrading from a basic deployment to an HA one is not currently possible, because secrets get rotated even though reuse-values is specified when doing helm upgrade (jandubois: secrets should not get rotated when doing a helm upgrade ever since we switched to using the scf-secrets-generator mechanism)

Testing the Deployment

  • Basic operation of the deployed SCF can be verified by running the CF smoke tests.

    To invoke the tests, you must first modify the kube/cf/bosh-task/smoke-tests.yaml's DOMAIN parameter to match your config.

    Then run the command

    kubectl create \
       --namespace=scf \
       --filename="kube/cf/bosh-task/smoke-tests.yaml"
    
    # Wait for completion
    kubectl logs --follow --namespace=scf smoke-tests
    
  • If the deployed SCF is not intended as a production system then its operation can be verified further by running the CF acceptance tests.

    CAUTION: tests are only meant for acceptance environments, and while they attempt to clean up after themselves, no guarantees are made that they won't change the state of the system in an undesirable way. -- https://github.com/cloudfoundry/cf-acceptance-tests/

    To invoke the tests, you must first modify the kube/cf/bosh-task/acceptance-tests.yaml's DOMAIN parameter to match your config.

    Then run the command

    kubectl create \
       --namespace=scf \
       --filename="kube/cf/bosh-task/acceptance-tests.yaml"
    
    # Wait for completion
    kubectl logs --follow --namespace=scf acceptance-tests
    

Notes on CaaSP

There are some slight changes when running SCF on CaaSP. The main differences in the configuration are the domain, IP address, and storageclass. Related to that, there are additional commands to generate and feed Ceph secrets into the kube, for use by the storageclass:

cat > scf-config-values.yaml <<END
env:
    # Domain for SCF. DNS for *.DOMAIN must point to the kube node's
    # external ip. This must match the value passed to the
    # cert-generator.sh script.
    DOMAIN: 10.0.0.154.nip.io
kube:
    # The IP address assigned to the kube node. The example value here
    # is what the vagrant setup assigns
    external_ips: 
    - 10.0.0.154
    storage_class:
        persistent: persistent
secrets:
    # Password for the cluster
    CLUSTER_ADMIN_PASSWORD: changeme

    # Password for SCF to authenticate with UAA
    UAA_ADMIN_CLIENT_SECRET: uaa-admin-client-secret
END

kubectl create namespace uaa

# Use Ceph admin secret for now, until we determine how to grant appropriate permissions for non-admin client.
kubectl get secret ceph-secret-admin -o json --namespace default | jq ".metadata.namespace = \"uaa\"" | kubectl create -f -

helm install helm/uaa \
    --namespace uaa \
    --values scf-config-values.yaml

kubectl create namespace scf
kubectl get secret ceph-secret-admin -o json --namespace default | sed 's/"namespace": "default"/"namespace": "scf"/' | kubectl create -f -

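SECRET=$(kubectl get pods --namespace uaa -o jsonpath='{.items[?(.metadata.name=="uaa-0")].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}')
CA_CERT="$(kubectl get secret "$SECRET" --namespace uaa -o jsonpath="{.data['internal-ca-cert']}" | base64 --decode -)"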

helm install helm/cf \
    --namespace scf \
    --values scf-config-values.yaml \
    --set "secrets.UAA_CA_CERT=${CA_CERT}"
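
Since the helm install above passes ${CA_CERT} into the release, it can help to confirm beforehand that the variable actually holds a certificate. A minimal sketch, assuming the decoded secret is PEM-encoded (the function name is illustrative):

```shell
# Illustrative helper: succeed if $1 starts with a PEM certificate header.
looks_like_pem_cert() {
  case "$1" in
    "-----BEGIN CERTIFICATE-----"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, looks_like_pem_cert "${CA_CERT}" || echo "CA_CERT is empty or not a certificate" gives an early warning when the secret lookup returned nothing.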

Non-permissive RBAC on CaaSP 2

If the error message: "" appears when attempting to run helm install, then the RBAC permissions on your Kubernetes installation are too restrictive.
Run

  kubectl create clusterrolebinding permissive-binding \
  --clusterrole=cluster-admin \
  --user=admin \
  --user=kubelet \
  --group=system:serviceaccounts

This makes the underlying Kubernetes less restrictive, so that the installation can continue. Reference: https://kubernetes.io/docs/admin/authorization/rbac/#permissive-rbac-permissions

Removal and Cleanup via helm

First delete the running system at the kube level

    kubectl delete namespace uaa
    kubectl delete namespace scf

In particular, this removes all the associated volumes as well.

After that use helm list to locate the releases for the SCF and UAA charts and helm delete to remove them at helm's level as well.

CF documentation
