---
layout: post
title: Using Google cloud computing
description: ""
category: research
tags: [gce, gcs]
---

{% include JB/setup %}

I recently started using cloud computing services.

Amazon seems to be the preferred provider of cloud services, and rightly so: the breadth of their services and their customization options are currently unparalleled. Although I had experimented with some Amazon Web Services (AWS) before (e.g. S3 storage), I had never used it for computing.

Unfortunately, Amazon has quite restrictive limits for new users (and you can't simply opt out of the Free Tier restrictions):

  • a limit of two usage zones (which wouldn't be a problem, were it not for the fact that:)
  • all the zones available to choose from are in the US
  • only really weak VMs are available

I contacted support to have my zones changed and my permissions raised so I could start real work, willing to pay the costs, but it took three days to get a response, which was basically "tough luck". This seems to be a general pattern.

So I figured other providers might offer better conditions to new users because of their smaller market share, and so I turned to Google Cloud Platform.

Features:

  • a $300 credit to spend over 60 days - very attractive;
  • unrestricted choice of zones;
  • more choice of VMs for new users;
  • a simpler interface (but also fewer features);
  • competitive prices per hour and per GB of storage compared with AWS.

Computing and storage aren't as separated as in AWS. The computing service is called Google Compute Engine (GCE) and is similar to AWS' EC2. Long-term storage is called Google Cloud Storage (GCS) and is equivalent to AWS' S3. Persistent disks can be attached and mounted on instances in a way equivalent to AWS' EBS storage.

What follows is a series of notes on how to interface with GCE and GCS, written mostly for my future self.

Instances

Mounting new disks in instances:

df -h  # see mounted volumes
sudo mkdir /projects  # create a mount point
sudo chown user:user /projects  # make it owned by your user
sudo /usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F" /dev/sdb /projects  # format (if empty) and mount the new disk
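
For reference, the disk itself can be created and attached beforehand with gcloud; a rough sketch, with placeholder disk/instance names, size and zone:

gcloud compute disks create my-data-disk --size 200GB --zone europe-west1-b  # create a persistent disk
gcloud compute instances attach-disk my-instance --disk my-data-disk --zone europe-west1-b  # attach it to a running instance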

To have the disk mounted at startup, add a line like the following to /etc/fstab (adjusting the device and mount point, e.g. /dev/sdb and /projects above):

/dev/sdaX /media/mydata ext4 defaults 0 0
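
To check the new entry without rebooting, remount everything listed in /etc/fstab:

sudo mount -a  # mount all filesystems from /etc/fstab; errors here mean the new line needs fixing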

SFTP

You can give an external IP to your instances and transfer files easily.

You can use FileZilla by adding your instance key (Edit -> Preferences -> SFTP -> Add key...) and connecting to sftp://<user>@<externalIP>.
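
The same works from the command line; a minimal sketch, assuming the default key that gcloud generates (~/.ssh/google_compute_engine) and placeholder user/IP:

sftp -i ~/.ssh/google_compute_engine <user>@<externalIP>  # interactive session
scp -i ~/.ssh/google_compute_engine mydata.txt <user>@<externalIP>:/projects/  # copy a single file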

Images

Pretty much the same as with AWS EC2: create a new instance, install all your software and save an image of the instance. Next time, start a new instance from this image and voilà, all your software is there.
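
For reference, something along these lines should do it with gcloud (image, disk and zone names are placeholders; the source disk should not be in active use):

gcloud compute images create my-base-image --source-disk my-boot-disk --source-disk-zone europe-west1-b  # create an image from the boot disk
gcloud compute instances create new-instance --image my-base-image --zone europe-west1-b  # boot a fresh instance from it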

Unfortunately, I haven't found a way of sharing images 😞.

Tools

  • gcloud : manage services, instances, configurations, permissions
  • gsutil : manage cloud storage (upload to and download from local storage)
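
Both ship with the Google Cloud SDK; a minimal setup sketch (the project ID is a placeholder):

gcloud auth login  # authenticate in the browser
gcloud config set project my-project-id  # set the default project
gcloud compute instances list  # list instances
gsutil ls  # list storage buckets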

Uploading to GCS

Upload in parallel to Google cloud storage:

pip install crcmod  # compiled CRC checksums, used by gsutil for fast integrity checks
# configure ~/.boto:
# uncomment the parallel_process_count line
# or use this one: https://github.com/afrendeiro/dotfiles/blob/master/.boto
# with rsync
gsutil -m rsync -r . gs://storage/data/
# selectively, using grep: keep samples matching .dups.bam, exclude some samples
# based on a string, then upload the list of files read from stdin (-I)
ls -d /localdir/data/mapped/* | \
grep ".dups.bam" | \
grep -v "_string_" | \
gsutil -m cp -I gs://storagedir/data/mapped/
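
Downloading works the same way in the other direction:

gsutil -m rsync -r gs://storage/data/ .  # sync a bucket path back to the local directory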

Change permissions

E.g. upload bigWig tracks and a track hub and make them publicly accessible:

gsutil -m rsync data/bigWig gs://storage/bigWig/  # sync the tracks
gsutil cp trackHub_hg19.txt gs://storage/bigWig/  # upload the hub file
gsutil -m acl ch -g All:R gs://storage/bigWig/*  # give read access to all users
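
Publicly readable objects are then served over HTTPS, so a genome browser or track hub can point straight at them; URLs have the form:

https://storage.googleapis.com/<bucket>/<object>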

Uploads are auto-resumable and pretty fast.

I uploaded ~250 BAM files (1-5 GB each) overnight!