MOC Integration

Registering to Mass Open Cloud

  • In order to create an account, submit a request to MOC. It usually takes 2-3 business days to get approved.
    Follow this link → Request Account on MOC

  • Learn more about adding users to your Openstack project. Follow this link → Add Users to Openstack Project

  • Once your access is approved, follow these links to log in to the Openshift and Openstack platforms → Openshift | Openstack

You should be able to log in to these platforms using your SSO credentials.

Adding Projects to Openshift and Openstack

In order to create new projects on Openshift and Openstack, you can create a ticket here → Create a Ticket

Once the projects are created, you should be able to log in and view them.

Use your SSO credentials to log in to https://kaizen.massopen.cloud to view your project in Openstack. (See the image below)

Openstack Project

Use your SSO credentials to log in to https://k-openshift.osh.massopen.cloud:8443 to view your project in Openshift. (See the image below)

Openshift Project

You can create a project manually in the Openshift UI or by using a terminal command:

$ oc new-project cs6620-sp2021-integrated-med-ai --display-name="CS6620 Spring 2021 \
Integrating Medical AI Compute CPU/GPU worklows on the MOC -- PowerPC and x86-64 -- Using OpenShift"

Note: Make sure that you have the Openshift CLI (oc) installed on your system. (See Installing Openshift CLI Tool)

Installing Openshift CLI Tool

It is recommended to install the Openshift CLI Tool on your system; the remaining steps on the MOC platform rely on it.

In order to download the Openshift CLI Tool, you will have to create or log in to a RedHat account. This method is recommended because it provides the most up-to-date oc client.

Follow this link to get information on how to download and install the Openshift CLI Tool → Download Openshift CLI Tool

Alternative Method (Deprecated)

If you do not wish to create a RedHat account, you can download the packages from this GitHub repo: https://github.com/openshift/origin/releases

Creating Secrets on Openshift Using CLI Tool

After successfully installing the Openshift CLI Tool, you can log in to Openshift from the command line.

First, go to Openshift and log in using your SSO credentials. Then click Copy Login Command as shown in the image below.

Openshift Login

Open a terminal and paste the login command. (Your token will differ from the example)

$ oc login https://k-openshift.osh.massopen.cloud:8443 --token=MQCsXU6Gs1DWNE0zk67eYA0A7eCmH0-cM576qZooRFY
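
To confirm that the login worked, you can check which user and project the client is bound to (both are standard oc subcommands):

$ oc whoami
$ oc project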

Creating Secrets

Create kubecfg

oc project myproject
oc create secret generic kubecfg --from-file=$HOME/.kube/config -n myproject

Create pman_config

  1. Create a file example-config.cfg and add the following:

[AUTH TOKENS]
token = password

  2. Convert the configuration to base64 and copy the encoded result with the following command:

cat example-config.cfg | base64

  3. Create a file example-secret.yml and add the encoded result:

apiVersion: v1
kind: Secret
metadata:
  name: pman-config
type: Opaque
data:
  pman_config.cfg: <base64 encoded configuration>

  4. Create the secret for pman:

oc create -f example-secret.yml
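
Alternatively, steps 2-4 can be collapsed into a single command; oc base64-encodes the file for you, and the key pman_config.cfg below matches the one used in example-secret.yml:

oc create secret generic pman-config --from-file=pman_config.cfg=example-config.cfg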

Create swift-credentials

  1. Create a file swift-credentials.cfg and add the following:

[AUTHORIZATION]
osAuthUrl          = https://kaizen.massopen.cloud:13000/v3

[SECRET]
applicationId      = <Follow the below steps to generate applicationId>
applicationSecret  = <Follow the below steps to generate applicationSecret>

Follow these steps to create an applicationId and an applicationSecret for the Openstack project:

    1) Visit the identity panel at https://onboarding.massopen.cloud/identity/
    2) Click the "+ Create Application Credential" button
    3) In the dialog that follows, give your credential a name. You can leave the other fields blank.
    4) Click "Create Application Credential"
    5) This will present a window with an ID and secret. Record these values because you won't be able to retrieve them after closing the window.

  2. Create the secret swift-credentials:

oc create secret generic swift-credentials --from-file=<path-to-file>/swift-credentials.cfg
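
If you prefer the command line, application credentials can also be created with the Openstack CLI. This is a sketch that assumes you have python-openstackclient installed and are authenticated against kaizen; the credential name is a placeholder:

openstack application credential create chris-swift-cred

The command prints a table that includes the id and secret; record them immediately, just as with the web UI.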

If all the steps above went well, you should be able to see the secrets that were created successfully:

(chris_env) [cyoruk@localhost ChRISWORK]$ oc get secrets
NAME                       TYPE                                  DATA   AGE
builder-dockercfg-s4shq    kubernetes.io/dockercfg               1      155d
builder-token-5p9nl        kubernetes.io/service-account-token   4      155d
builder-token-xqpz2        kubernetes.io/service-account-token   4      155d
default-dockercfg-nh5s5    kubernetes.io/dockercfg               1      155d
default-token-n9lx8        kubernetes.io/service-account-token   4      155d
default-token-xb6x7        kubernetes.io/service-account-token   4      155d
deployer-dockercfg-hszz4   kubernetes.io/dockercfg               1      155d
deployer-token-fqvc5       kubernetes.io/service-account-token   4      155d
deployer-token-vcf2f       kubernetes.io/service-account-token   4      155d
kubecfg                    Opaque                                1      4d
pfioh-config               Opaque                                1      4d
pman-config                Opaque                                1      4d
swift-credentials          Opaque                                1      4d

Deploying pman on Openshift

Follow this link to download pman → https://github.com/Sandip117/pman-1

After downloading it, enter the subdirectory openshift:

cd pman/openshift

Note: The current version that supports flask is fnndsc/pman:flask. There is one place in the template where you need to change your project name: look for the field named OPENSHIFTMGR_PROJECT.

Now edit pman-openshift-template.json with your Openshift project name and the updated pman docker image. (See image below)

Pman Template

To deploy pman on Openshift, we need a file that contains all the information about the service we’re going to deploy: pman-openshift-template.json.

To deploy pman to Openshift:

oc new-app pman-openshift-template.json

After deploying pman, you can see it deployed and running on Openshift. (See image below)

Pman Overview
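
You can also confirm the deployment from the terminal. The app=pman label below is the same one used by the delete commands in this section; replace <pman-pod-name> with the pod name that the first command prints:

oc get pods -l app=pman    # the pman pod should reach the Running state
oc logs <pman-pod-name>    # inspect that pod's startup logs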

To delete pman

oc delete all -l app=pman
oc delete route pman

Deploying pfcon on Openshift

Follow this link to download pfcon → https://github.com/Sandip117/pfcon

After downloading it, enter the subdirectory openshift:

cd pfcon/openshift

Note: The current version that supports flask is fnndsc/pfcon:pfiohless

To deploy pfcon on Openshift, we need a file that contains all the information about the service we’re going to deploy: pfcon-openshift-template.json.

Now update the COMPUTE_SERVICE_URL in pfcon-openshift-template.json with the pman route that you deployed in the previous section. You can find your route with this command:

oc get route

Pfcon Template
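
If you only need the bare hostname (for example, to paste into the template), a jsonpath query returns just that. This assumes the route is named pman, as in the oc get routes output shown later:

oc get route pman -o jsonpath='{.spec.host}'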

To deploy pfcon to Openshift:

oc new-app pfcon-openshift-template.json

After deploying pfcon, you can see it deployed and running on Openshift. (See image below)

Pfcon Overview

To delete pfcon

oc delete all -l app=pfcon
oc delete route pfcon

Running Test Scripts on Openshift

There are a couple of prerequisites that we have to satisfy before running any plugins on Openshift.

Create a Python Virtual Environment

  1. Install the Python virtual environment creator

    • For Fedora → sudo dnf install python3-virtualenv

    • For Ubuntu → sudo apt install virtualenv virtualenvwrapper python3-tk

  2. Create a directory for your virtual environments:

mkdir ~/python-envs

  3. Add these two lines to your .bashrc file:

export WORKON_HOME=~/python-envs
source /usr/local/bin/virtualenvwrapper.sh

  4. Source your .bashrc and create a new Python3 virtual env:

source .bashrc
mkvirtualenv --python=python3 chris_env

  5. Activate your virtual environment:

workon chris_env

Note: To deactivate the virtual environment, you can use the deactivate command in the terminal.

Install pfconclient

If you created the Python virtual environment successfully, you can install pfconclient:

pip install -U python-pfconclient

You can learn more about pfconclient: https://github.com/FNNDSC/python-pfconclient

Install httpie

For some of the scripts, you might need to install httpie:

pip install httpie

Download Test Scripts

You can download the test scripts from https://github.com/FNNDSC/ChRIS-E2E

Note: Sometimes you can get an invalid response such as a 502 or 401 error when you execute the scripts. You have to recreate the secret kubecfg every time you log in. For more information, see Troubleshoot Errors.
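
Recreating the secret amounts to deleting the old one and repeating the creation command from earlier; myproject is a placeholder for your project name, as in the kubecfg example above:

oc delete secret kubecfg -n myproject
oc create secret generic kubecfg --from-file=$HOME/.kube/config -n myproject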

Running the Scripts

If you’ve successfully completed all the prerequisites, you can start running the test scripts. First, you need the routes of the services you deployed to run the scripts.

(chris_env) [cyoruk@localhost scripts]$ oc get routes
NAME    HOST/PORT                                     PATH   SERVICES   PORT       TERMINATION   WILDCARD
pfcon   pfcon-flask-chris.k-apps.osh.massopen.cloud          pfcon      5005-tcp                 None
pman    pman-flask-chris.k-apps.osh.massopen.cloud           pman       5010-tcp                 None

  1. Test pman

# $ http <pman-route>/api/v1/hello/


(chris_env) [cyoruk@localhost scripts]$ http pman-flask-chris.k-apps.osh.massopen.cloud/api/v1/hello/
HTTP/1.0 200 OK
Cache-control: private
Connection: keep-alive
Content-Length: 1171
Content-Type: application/json
Date: Mon, 19 Apr 2021 17:52:14 GMT
Server: Werkzeug/1.0.1 Python/3.8.5
Set-Cookie: 8f72863408ccaf75ef5904d263aa663f=6b2c25e4b707fd5a818643eecefe12d7; path=/; HttpOnly

{
    "d_ret": {
        "message": "pman says hello from openshift 😃",
        "sysinfo": {
            "cpu_percent": 1.2,
            "cpucount": 56,
            "hostname": "pman-1-45hv5",
            "inet": "10.128.9.19",
            "loadavg": [
                0.39,
                0.67,
                0.51
            ],
            "machine": "x86_64",
            "memory": [
                115996803072,
                105224880128,
                9.3,
                10000596992,
                63990882304,
                28992512000,
                17136709632,
                2138112,
                42003185664,
                14237696,
                4056023040
            ],
            "platform": "Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.29",
            "system": "Linux",
            "uname": [
                "Linux",
                "pman-1-45hv5",
                "3.10.0-1127.el7.x86_64",
                "#1 SMP Tue Feb 18 16:39:12 EST 2020",
                "x86_64",
                "x86_64"
            ],
            "version": "#1 SMP Tue Feb 18 16:39:12 EST 2020"
        }
    },
    "status": true
}

  2. Test pfcon

First create a folder /tmp/small and add some files larger than 100KB to it. Then run the script below to submit a job. A simple dataset of .mgz files can be found here → mgz_converter_dataset.
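
If you just need placeholder input data, a minimal sketch like this creates the folder with a single ~200KB file of random bytes (any files over 100KB will do):

mkdir -p /tmp/small
dd if=/dev/urandom of=/tmp/small/sample.dat bs=1K count=200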

# $ ./post_pfcon_ds <pfcon-route> <job-id>


(chris_env) [cyoruk@localhost scripts]$ ./post_pfcon_ds pfcon-flask-chris.k-apps.osh.massopen.cloud jid04201513

Submitting job jid04201513 to pfcon service at -->http://pfcon-flask-chris.k-apps.osh.massopen.cloud/api/v1/<--...
Waiting for 2s before next polling for job status ...

Polling job jid04201513 status, poll number: 1
Job jid04201513 status: ['started']
Waiting for 4s before next polling for job status ...

Polling job jid04201513 status, poll number: 2
Job jid04201513 status: ['started']
Waiting for 8s before next polling for job status ...

Polling job jid04201513 status, poll number: 3
Job jid04201513 status: ['started']
Waiting for 16s before next polling for job status ...

Polling job jid04201513 status, poll number: 4
Job jid04201513 status: finishedSuccessfully

Downloading and unpacking job jid04201513 files...
Number of files to decompress at /tmp/jid04201513: 29
Done

Deleting job jid04201513 data from the remote...
Done

We can see that the containers are created in the Openstack environment.

Pfcon Output

(Additional Step) Building a Local Openshift Cluster via CRC

Note: This step is focused on bringing a minimal OpenShift 4.x cluster to your local laptop or desktop computer. If you are looking for a solution for running OpenShift 3.x, you will need tools such as OpenShift Origin, Minishift or CDK. The section below provides an example for running OpenShift 3.x.

This additional step is helpful for people who build ChRIS plugins/services and want to test/debug applications locally before testing them in the cloud environment.

There are a couple of steps involved in building a local Openshift 4.x cluster.

Download CodeReady Containers

Select your OS and Download CodeReady Containers binaries with an embedded OpenShift disk image from CodeReady Containers (See Image Below)

CodeReady Containers

After downloading CodeReady Containers, extract the archive and place the crc executable in a directory on your $PATH (you can check your $PATH with $ echo $PATH):

$ tar -xf crc-linux-amd64.tar.xz                 # extract CodeReady Containers
$ sudo cp crc-linux-*-amd64/crc /usr/local/bin/  # copy the crc binary into a directory on your $PATH

You need to download or copy your pull secret; the install program will prompt you for it during installation.

Note: In order to download the CodeReady Containers, you will have to create/open a RedHat account.

Install CodeReady Containers

CodeReady Containers requires the libvirt and NetworkManager packages to run on Linux. Consult the following list to find the command used to install these packages for your Linux distribution:

  • Fedora → sudo dnf install NetworkManager

  • Red Hat Enterprise Linux/CentOS → su -c 'yum install NetworkManager'

  • Debian/Ubuntu → sudo apt install qemu-kvm libvirt-daemon libvirt-daemon-system network-manager

Set up CodeReady Containers. We’re going to use the pull secret that we copied from the CodeReady Containers page.

$ crc setup

Start the CodeReady Containers virtual machine:

$ crc start

Log in to the Openshift Cluster as a developer:

$ oc login -u developer https://api.crc.testing:6443
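
To sanity-check the local cluster, crc provides a status subcommand, and crc console --credentials reprints the login details if you need them again:

$ crc status                 # shows whether the VM and the OpenShift cluster are running
$ crc console --credentials  # reprints the developer and kubeadmin login commands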

Deploying pfcon and pman

Deploying pfcon and pman to the local Openshift cluster is the same as deploying them on the MOC. You can follow the referenced sections above to deploy them.

  • Create a new project in the local Openshift cluster

oc new-project local-chris

(Alternative Way) Building a Local Openshift Cluster via OpenShift Origin

This additional step is helpful for people who build ChRIS plugins/services and want to test/debug applications locally before testing them in the cloud environment, especially those who require a local OpenShift cluster with the same version as the MOC (3.11 at the time of writing). This step uses Ubuntu 20.04 LTS.

Prerequisite

You need to have Docker CE installed. See Docker’s documentation for detailed steps. Make sure you add your user to the docker group after installation.

Download OpenShift Origin

At the time of writing, the newest version available is 3.11. Find all releases here.

Download OpenShift Origin, extract it and place the executables in a directory on your $PATH (you can check your $PATH with $ echo $PATH):

$ wget https://github.com/openshift/origin/releases/download/v3.11.0/openshift-origin-client-tools-v3.11.0-0cbc58b-linux-64bit.tar.gz
$ tar xvzf openshift*.tar.gz
$ cd openshift-origin-client-tools*/
$ sudo mv oc kubectl /usr/local/bin/  # place the executables in a directory on your $PATH

At this point you should be able to call oc to check version:

$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
kubernetes v1.11.0+d4cacc0

Run OpenShift Origin

Before bringing up the cluster, configure the Docker daemon so it can use an insecure registry:

$ cat << EOF | sudo tee /etc/docker/daemon.json
{
    "insecure-registries" : [ "172.30.0.0/16" ]
}
EOF

Then restart the docker service:

$ sudo systemctl restart docker

If you have a public hostname or IP address, you can specify it so OpenShift Origin will use that address. If not, use your local network IP address (find it with $ ifconfig), or just use localhost by not specifying --public-hostname (discouraged, as it may cause other issues):

$ oc cluster up --public-hostname=<your hostname or IP>

Access the web portal at:

https://<server>:8443/console

Log in as a user:

$ oc login -u developer <hostname>:8443

Log in as kube admin (for debugging):

$ oc login -u system:admin <hostname>:8443

Issue Resolution

With OpenShift Origin, there can be some obscure issues with IP config and name resolution. Here are some tips.

Restarting helps

Sometimes the configuration needs to be overwritten and the updates are not in place until it runs again. Run oc cluster down then oc cluster up <args> to apply these changes.

Specify what IP the server uses

If you don’t, it sometimes defaults to 127.0.0.1, which can cause other problems. You can use your local network IP (find it with $ ifconfig).

Redirection problem

If the cluster constantly redirects from the server address you specified to 127.0.0.1, there are two solutions:

1) Check this file under the directory where you ran oc cluster up:

$ sudo vim ./openshift.local.clusterup/openshift-controller-manager/openshift-master.kubeconfig

Search for this line:

server: https://127.0.0.1:8443

Replace with:

server: https://<host_ip>:8443

Then run oc cluster up --public-hostname=<host_ip>.

2) A workaround is to set up a tunnel:

$ sudo ssh -L 8443:localhost:8443 -f -N <username>@<host_ip>

Name resolution problem

If docker pull works fine (check this in the pod’s events section) but processes in pods cannot resolve names:

Try restarting first (oc cluster down, then oc cluster up <args>).

If the problem persists, bring down the cluster and find the name resolution file inside the directory created by running oc cluster up:

$ sudo vim ./openshift.local.clusterup/kubedns/resolv.conf

Find the nameserver line, and change it to 8.8.8.8 (Google’s DNS server). Start the cluster.
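
If you prefer a one-liner over editing the file by hand, a sed replacement run from the same directory does the same thing:

$ sudo sed -i 's/^nameserver.*/nameserver 8.8.8.8/' ./openshift.local.clusterup/kubedns/resolv.conf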

Power9 Cluster (Ongoing Progress)

Power9 is an architecture that is heavy on compute. You might want to use this cluster if you are planning to run heavy applications such as machine learning models. The deployment of pfcon and pman is similar to x86; however, you need images that are built for linux/ppc64le, which means you need to build images specifically for this platform. You can learn more about building multi-arch images here → Multi-Arch builds. The current images that I’m using, which are still under development, are cagriyoruk/pfcon:flaskP9 and cagriyoruk/pman:flaskP9.
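
As a sketch of what such a multi-arch build can look like with Docker’s buildx plugin (the image name is a placeholder, and buildx must be installed and configured on your machine):

# one-time setup: create and select a builder capable of cross-building
docker buildx create --use

# build for x86-64 and ppc64le in one go and push the resulting manifest list
docker buildx build --platform linux/amd64,linux/ppc64le -t <registry>/<image>:<tag> --push .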

Troubleshoot Errors

HTTP 400 Bad Request

This indicates that the server couldn’t understand the request due to invalid syntax. Check Openshift logs to find out the exact issue.

HTTP 401 Unauthorized

If you’re getting an HTTP 401 error, there are a couple of things you can do:

  1. Double-check your swift-credentials secret to see if it’s missing anything.

  2. Add --authToken password at the end of the script that you’re trying to run.

  3. Double-check that the auid in the script is correct.

  4. Recreate the secret kubecfg. (Every time you log in, you need to recreate the kubecfg.)

HTTP 409 Conflict

If you’re getting an HTTP 409 error, it’s likely that a job with the same jid (job ID) already exists. Check Openshift storage to see if there is existing persistent storage. If so, you can delete it and run the application again.

Feedback and Future Work

This is my feedback from working on both the x86 and Power9 Openshift clusters over a certain period of time.

X86 Progress and Updates

  1. The MOC integration document provided a quicker onboarding experience for new student teams.

  2. We were able to test/deploy ChRIS services and plugins as long as the cluster was available.

  3. The only issue that I and the student teams encountered is the Docker pull rate limit. This still needs attention, considering more and more people are joining the project. Creating a pull secret alone doesn’t work, since the builder and deployer service accounts use the default kubernetes/dockercfg secret. There should be a way to attach Docker credentials to the builder and the deployer so that we stop hitting the pull limit at some point; a sketch of one candidate approach follows below.
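
One candidate approach, untested on the MOC cluster: create a docker-registry secret from your Docker Hub credentials and link it to the service accounts that pull images. These are standard oc commands, but whether MOC permissions allow overriding the builder/deployer defaults still needs to be verified; the secret name and credentials are placeholders:

# create a pull secret from Docker Hub credentials
oc create secret docker-registry dockerhub-pull \
    --docker-server=docker.io \
    --docker-username=<user> \
    --docker-password=<password>

# link it to the service accounts that pull images during builds and deploys
oc secrets link builder dockerhub-pull --for=pull
oc secrets link deployer dockerhub-pull --for=pull
oc secrets link default dockerhub-pull --for=pull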

P9 Openshift Progress

  1. Most of the time it’s unavailable. We have to create a ticket every time we can’t access it.

  2. When I had access to it, I deployed ppc64le versions of pfcon and pman. The deployment logs looked fine.

  3. When I ran a ChRIS plugin on P9 Openshift, the plugin wasn’t executed because the pod wasn’t getting bound to a shared persistent volume.

  4. The next step is to see whether the containers are being created in the Openstack environment after executing the plugin.