Nodir edited this page Mar 2, 2016 · 7 revisions

Welcome to the infrastructure wiki!

Cancer Genome Collaboratory user guide

Collaboratory is built on OpenStack, an open-source platform with publicly available documentation at http://docs.openstack.org/user-guide/. This guide therefore gives only minimal instructions and covers the following:

  • How to log in to the Collaboratory environment
  • How to create an SSH key
  • How to create a new security group and customize it with security rules
  • How to start a new instance
  • How to use the ICGC-DCC client in order to download protected data
  • How to obtain your API settings from Dashboard
  • Cloud security best practices

###How to log in to the Collaboratory environment

To log in to the Dashboard interface of the "Cancer Genome Collaboratory", open https://console.cancercollaboratory.org in your browser and sign in using the credentials provided to you.

NOTE: Access to the Dashboard and API endpoints is allowed only from the IP addresses listed in the enrollment form submitted by users.

OpenStack provides end users with the ability to:

  • Define the parameters of a virtual computer (number of CPUs, amount of memory and storage) through the use of predefined "flavors" (or pre-set configurations)
  • Install their choice of operating system and layered software on virtual machines
  • Create their own virtual networks to attach instances as needed
  • Manage their own user accounts and security on these installations

Virtual resources consume real computing resources and are therefore controlled through resource quotas. Each Tenant/Project is assigned a limited amount of CPU and memory, so that one user or a small group of users cannot exhaust the physical resources shared by everyone.
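The quota check described above can be illustrated with a short sketch. The quota and flavor values below are hypothetical (they are not Collaboratory's actual limits), and `fits_quota` is an illustrative helper rather than part of any OpenStack client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """A pre-set instance size (values here are made up for illustration)."""
    name: str
    vcpus: int
    ram_mb: int


@dataclass(frozen=True)
class Quota:
    """Per-project limits and current usage."""
    max_vcpus: int
    max_ram_mb: int
    used_vcpus: int = 0
    used_ram_mb: int = 0


def fits_quota(quota: Quota, flavor: Flavor, count: int = 1) -> bool:
    """Return True if `count` instances of `flavor` fit in the remaining quota."""
    need_vcpus = flavor.vcpus * count
    need_ram_mb = flavor.ram_mb * count
    return (quota.used_vcpus + need_vcpus <= quota.max_vcpus
            and quota.used_ram_mb + need_ram_mb <= quota.max_ram_mb)


# Hypothetical project: 16 vCPUs and 64 GB RAM in total, half of it in use.
project = Quota(max_vcpus=16, max_ram_mb=65536, used_vcpus=8, used_ram_mb=32768)
medium = Flavor(name="example.medium", vcpus=4, ram_mb=16384)

print(fits_quota(project, medium, count=2))  # True: exactly fills the quota
print(fits_quota(project, medium, count=3))  # False: would need 20 vCPUs
```

The real limits and current usage for your project are shown on the Dashboard overview page.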

The operating systems of the virtual machines that can be started in the cloud use SSH key-based authentication, so it is recommended to create an SSH key before starting a new instance. SSH keys have two parts: a public key that is saved in the cloud controller and later injected into the new instances you start, and a private key that you store securely on your own machine and use to log in to the cloud instances.


###How to create an SSH key

To create an SSH key pair, go to Compute → Access & Security → Key Pairs, click Create Key Pair, and save the generated private key on your workstation/laptop.

When you launch a new instance, you can choose this public key to be injected into it. You can then authenticate using the private key downloaded when the key pair was created (e.g. "ssh -i private_key.pem ubuntu@floating_ip"). If you want to give other users SSH access to your instance without sharing your private SSH key, create new Linux users and add their public SSH keys to "/home/USER/.ssh/authorized_keys".
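Adding another user's public key can be scripted. The sketch below assumes it runs on the instance as the target Linux user; `add_authorized_key` is a hypothetical helper name, and the key shown in the usage note is a placeholder.

```python
import os
from pathlib import Path


def add_authorized_key(home_dir: str, public_key: str) -> Path:
    """Append `public_key` to <home_dir>/.ssh/authorized_keys if not present.

    With its default StrictModes setting, sshd ignores key files that other
    users can write to, so the directory is set to 0700 and the file to 0600.
    """
    key = public_key.strip()
    if len(key.split()) < 2:
        raise ValueError("expected a public key such as 'ssh-rsa AAAA... comment'")

    ssh_dir = Path(home_dir) / ".ssh"
    ssh_dir.mkdir(exist_ok=True)
    os.chmod(ssh_dir, 0o700)

    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    if key not in text.splitlines():
        with auth_file.open("a") as handle:
            # Avoid gluing the new key onto an unterminated last line.
            if text and not text.endswith("\n"):
                handle.write("\n")
            handle.write(key + "\n")
    os.chmod(auth_file, 0o600)
    return auth_file
```

For example, `add_authorized_key("/home/alice", "ssh-ed25519 AAAA... alice@laptop")` would register a key for a hypothetical user "alice". If the script is run as root instead, the ".ssh" directory and the file must also be owned by that user.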


###How to create a new security group and customize it with security rules

Security groups act as a firewall, restricting access to the floating (or public) IP addresses associated with instances. The default rules block all access, so users should open access only to the protocols and ports required, and only from the source addresses they want to allow. Users who do not know at the outset what type of access is needed can create or change security groups and rules later and apply them to running instances. By default, instances can connect to each other without restriction over their private IP addresses.
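The default-deny behaviour of security group rules can be modelled in a few lines. This is a simplified sketch and not how OpenStack implements it; the rule set and the addresses (taken from the ranges reserved for documentation) are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """One allow rule: protocol, inclusive port range, and source CIDR."""
    protocol: str
    port_min: int
    port_max: int
    source_cidr: str


def is_allowed(rules, protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: traffic passes only if at least one rule matches it."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.port_min <= port <= rule.port_max
                and address in ipaddress.ip_network(rule.source_cidr)):
            return True
    return False


# Hypothetical group: SSH and a small port range, both only from one network.
example_rules = [
    IngressRule("tcp", 22, 22, "203.0.113.0/24"),
    IngressRule("tcp", 8000, 8010, "203.0.113.0/24"),
]

print(is_allowed(example_rules, "tcp", 22, "203.0.113.10"))    # True
print(is_allowed(example_rules, "tcp", 22, "198.51.100.7"))    # False: untrusted source
print(is_allowed(example_rules, "tcp", 8011, "203.0.113.10"))  # False: port not opened
print(is_allowed([], "tcp", 22, "203.0.113.10"))               # False: no rules, no access
```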

Because public IP addresses are limited, most tenants will be allocated only one floating/public IP address, which should be associated with a secure instance used as a "jump server".
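One way to reach instances that have only a private IP is to hop through the jump server with OpenSSH's `-J` (ProxyJump) option, available in OpenSSH 7.3 and later. The sketch below only builds the command line. It assumes the private key has been loaded into an SSH agent (for example with `ssh-add private_key.pem`), because command-line options such as `-i` apply to the destination host and not to the jump host. The addresses and the `ssh_via_jump` helper are hypothetical.

```python
import shlex


def ssh_via_jump(jump_host: str, target_ip: str, user: str = "ubuntu") -> list:
    """Build an ssh argv that reaches `target_ip` through `jump_host`."""
    return ["ssh", "-J", f"{user}@{jump_host}", f"{user}@{target_ip}"]


# Hypothetical addresses: the jump server's floating IP and a private IP.
argv = ssh_via_jump("203.0.113.20", "10.0.0.5")
print(" ".join(shlex.quote(arg) for arg in argv))
# ssh -J ubuntu@203.0.113.20 ubuntu@10.0.0.5
```

The list form can be passed directly to `subprocess.run(argv)` without going through a shell.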


###How to start a new instance

To start a new instance, go to Project → Compute → Instances and click Launch Instance. Fill in the instance name, flavor and instance count, and select “Boot from image” as the “Instance Boot Source”. On the “Access & Security” tab, select the SSH key pair you want injected into the new instance, as well as the security group to apply to it. If your project has only a single network defined, you can skip the “Networking”, “Post-Creation” and “Advanced Options” tabs and just click the “Launch” button.
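The same launch can be done from the command line. The sketch below builds an `openstack server create` command; it assumes the python-openstackclient CLI is installed and that your API settings are already loaded into the environment. All image, flavor, key and group names are placeholders, and `server_create_command` is a hypothetical helper.

```python
import shlex


def server_create_command(name, image, flavor, key_name, security_group,
                          network_id=None, count=1):
    """Build the argv for launching `count` instances with the openstack CLI."""
    argv = [
        "openstack", "server", "create",
        "--image", image,
        "--flavor", flavor,
        "--key-name", key_name,
        "--security-group", security_group,
    ]
    if network_id is not None:
        # Only needed when the project has more than one network.
        argv += ["--nic", f"net-id={network_id}"]
    if count > 1:
        argv += ["--min", str(count), "--max", str(count)]
    argv.append(name)
    return argv


# Placeholder values; substitute the names that exist in your project.
command = server_create_command(
    name="worker", image="example-image", flavor="example.medium",
    key_name="my-key", security_group="my-group", count=2,
)
print(" ".join(shlex.quote(arg) for arg in command))
```

Running the printed command, or passing the list to `subprocess.run`, corresponds to clicking the “Launch” button in the Dashboard.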

The instance will be created shortly and will be allocated a private IP address that is unreachable from the Internet. To access the instance from the Internet, click the drop-down list in the last column and select the “Associate Floating IP” option.

After gaining access to the instance and installing additional software, you can snapshot the instance and use the snapshot as a template to get new servers up and running quickly and consistently. Before taking a snapshot, delete credentials and unneeded data to keep the new image secure and small, which also shortens the provisioning time of future instances based on this image.
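As a pre-snapshot check, a small script can list files that commonly hold credentials so they can be reviewed and removed. This is a minimal sketch: the pattern list is illustrative and far from complete, and `find_sensitive_files` is a hypothetical helper that only reports files and never deletes anything.

```python
import fnmatch
import os

# Illustrative patterns only; extend them for the software you installed.
SENSITIVE_PATTERNS = [
    "authorized_keys", "id_rsa", "id_ed25519", "*.pem",
    ".bash_history", "*openrc*",
]


def find_sensitive_files(root: str) -> list:
    """Return sorted paths under `root` whose file name matches a pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if any(fnmatch.fnmatch(filename, pattern)
                   for pattern in SENSITIVE_PATTERNS):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

For example, `find_sensitive_files("/home")` would report private keys, shell histories and downloaded RC files left in user home directories.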


###How to use the ICGC-DCC client in order to download protected data

To download ICGC data from the Collaboratory object storage, you have to use the OICR-developed client. Detailed instructions for installing and using the client are available at: https://dcc.icgc.org/icgc-in-the-cloud/guide

OICR also maintains a Docker container with the latest version of this client, available at: https://hub.docker.com/r/icgc/icgc-storage-client/

In addition, Docker containers of common bioinformatics tools are available at: https://www.dockstore.org/search

The cloud resources can be controlled through the dashboard, but power users might prefer to use the APIs to programmatically start, stop and configure instances, networks, IP addresses, volumes, etc. OpenStack exposes RESTful APIs that can be accessed from inside the IP space allocated to Collaboratory (142.1.177.0/24) as well as from the IP addresses listed in the enrollment form submitted by users. To control the cloud resources programmatically, users can either install the OpenStack clients or use the SDKs (http://developer.openstack.org/).


###How to obtain your API settings from Dashboard

The settings needed to access the APIs can be obtained from the Dashboard by going to Project → Compute → Access & Security → API Access and clicking “Download OpenStack RC File”.
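The RC file is a small shell script that exports `OS_*` environment variables. In a shell, the usual approach is simply to `source` it. When the settings are needed inside a program, they can be read directly, as in the sketch below. The sample contents are an assumption about the typical layout of such a file, with a placeholder URL and names; `parse_rc_file` is a hypothetical helper.

```python
import shlex


def parse_rc_file(text: str) -> dict:
    """Collect literal `export OS_*=value` settings from an OpenStack RC file."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("export OS_"):
            continue
        name, _, value = line[len("export "):].partition("=")
        if value.startswith("$"):
            # Skip values filled in at run time, such as a prompted password.
            continue
        parts = shlex.split(value)  # strips surrounding shell quotes
        settings[name] = parts[0] if parts else ""
    return settings


# Assumed sample layout with placeholder values.
SAMPLE_RC = """#!/bin/bash
export OS_AUTH_URL=https://keystone.example.org:5000/v2.0
export OS_TENANT_NAME="demo-project"
export OS_USERNAME="demo-user"
read -sr OS_PASSWORD_INPUT
export OS_PASSWORD=$OS_PASSWORD_INPUT
"""

for key, value in sorted(parse_rc_file(SAMPLE_RC).items()):
    print(f"{key}={value}")
```

The password is deliberately not captured, since the file prompts for it instead of storing it.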


Security best practices when working in cloud environments:

  • Only allow access to the services and ports open on your instance from trusted IP addresses
  • Only allow SSH key-based access
  • Do not share SSH keys or other credentials
  • Always start by applying the latest security updates to your instances
  • Terminate unused resources
  • Do not share snapshots of instances that contain credentials with other users