How to use the torvalds server

Laura edited this page Sep 21, 2018 · 7 revisions

A new server has been set up called torvalds with the following specifications:

32 CPUs, 256 GB RAM and 8 Nvidia GeForce GTX 1080 Ti GPUs
Ubuntu 18.04.1 LTS (Bionic Beaver)

If you want to use it, send an email to Adrian to request an account.

  • Disks and quotas
    Your /home directory is stored on an SSD.
    Your /data directory is stored on an HDD. You should have a link /home/username/data -> /data/username

To check your quotas, current usage and limits, run:

$ /usr/sbin/xfs_quota -x -c 'report -h' 2>/dev/null
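To confirm the layout described above (home on the SSD, data on the HDD via the symlink), a quick check from your shell:

```shell
# Verify the data symlink and see free space on each filesystem
ls -ld ~/data          # should be a symlink pointing to /data/$USER
df -h /home /data      # usage of the SSD and HDD filesystems
```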

  • Use of GPUs

Users don’t have direct access to the GPUs. Instead, there is a user-gpu account that does have access to the GPUs through the queue system (Slurm). Some tutorials:
http://www.arc.ox.ac.uk/content/slurm-job-scheduler
https://support.ceci-hpc.be/doc/_contents/QuickStart/SubmittingJobs/SlurmTutorial.html

You must use the gpu wrapper to run any Slurm command as user-gpu. Examples:

$ gpu sbatch slurm.job
$ gpu squeue
$ gpu scancel 123
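A minimal slurm.job to go with the commands above might look like the following. This is a sketch: the --gres=gpu syntax assumes the GPUs are exposed as a generic resource named gpu, so check `gpu sinfo` and the site configuration for the actual resource and partition names.

```shell
#!/bin/bash
#SBATCH --job-name=test-gpu
#SBATCH --output=test-gpu-%j.out
#SBATCH --gres=gpu:1       # ask the scheduler to reserve one GPU

# The job runs as user-gpu, the only account with GPU access
nvidia-smi                 # show which GPU the scheduler assigned
```

Submit it with `gpu sbatch slurm.job` and monitor it with `gpu squeue`.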

Nvidia driver version 390.77 is installed.
CUDA toolkits 8.0 (default) and 9.1 are installed in /usr/local.
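To build or run against the non-default toolkit, prepend its directories to your environment. This is a sketch assuming the usual /usr/local/cuda-9.1 directory layout:

```shell
# Use CUDA 9.1 instead of the default 8.0 for this shell session
export PATH=/usr/local/cuda-9.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:$LD_LIBRARY_PATH
# Check the active compiler version with: nvcc --version
```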

  • Scipion in torvalds

There is a general Scipion installation in /usr/local/scipion.
It has been compiled with CUDA=True and OpenCV (some changes required).
Packages installed are Gautomatch 0.53, Gctf 1.06, bsoft1.9.0, chimera 1.10.1, ctffind4 4.1.10, eman 2.12, frealign 9.07, gEMpicker 1.1, motioncor2 1.1.0, relion 2.1, resmap 1.1.5s2, spider 21.13.
Gautomatch, Gctf, motioncor2, relion and xmipp work with CUDA 8. gEMpicker does not work because it needs CUDA 7.5 or below, which is not installed. Remember to adapt the variables in your $HOME/.config/scipion/scipion.conf to point to the right CUDA and binaries.
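For example, the CUDA entries in $HOME/.config/scipion/scipion.conf might look like the following. This is a hypothetical excerpt: variable names differ between Scipion versions, so compare against the central /usr/local/scipion/config/scipion.conf for the real ones.

```ini
# Hypothetical excerpt -- check the central installation for exact names
CUDA_LIB = /usr/local/cuda-8.0/lib64
CUDA_BIN = /usr/local/cuda-8.0/bin
```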

  • Installing your own Scipion

If you want to install your own Scipion, you need to make some changes:

Scipion must use queues. Otherwise, jobs will not work (errors like “no CUDA available” will appear). The main Scipion in Torvalds (/usr/local/scipion/) has already been configured to use the gpu wrapper. Copy /usr/local/scipion/config/hosts.conf to your own installation.
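Assuming your own copy lives under $HOME/scipion, the copy from the central installation is just:

```shell
# Reuse the preconfigured queue definition from the central installation
cp /usr/local/scipion/config/hosts.conf ~/scipion/config/hosts.conf
```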

Sqlite files must have 664 permissions (so that both user and user-gpu can access them).
You need to modify the following in pyworkflow/mapper/sqlite_db.py (check with a diff on /usr/local/scipion if you are not sure):

+import os

 from pyworkflow.utils import envVarOn

@@ -50,6 +51,10 @@ class SqliteDb():
             self.connection = self.OPEN_CONNECTIONS[dbName]
         else:
             self.connection = sqlite.Connection(dbName, timeout, check_same_thread=False)
+            try:
+                os.chmod(dbName, 0664)
+            except Exception as exc:
+                print ("cannot set permission", dbName, exc)
             self.connection.row_factory = sqlite.Row
             self.OPEN_CONNECTIONS[dbName] = self.connection
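After the patch, new project databases should come out group-writable. You can verify the permission pattern itself with a stand-in file (hypothetical file name, just to illustrate the 664 mode the patch sets):

```shell
# Create a stand-in file and give it the same 664 mode as the patch
touch project.sqlite
chmod 664 project.sqlite
stat -c '%a' project.sqlite    # prints 664 (rw-rw-r--)
```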

Change the MPI paths in your scipion/config/scipion.conf to the correct ones:

#MPI_INCLUDE = /usr/lib64/mpi/gcc/openmpi/include
#MPI_BINDIR = /usr/bin
MPI_LIBDIR = /usr/lib/x86_64-linux-gnu/openmpi/lib
MPI_INCLUDE = /usr/lib/x86_64-linux-gnu/openmpi/include
MPI_BINDIR = /usr/bin 

To compile Scipion you need to change the gcc and g++ versions (Ubuntu 18 defaults to version 6) to 5 in your scipion/config/scipion.conf:

CC = gcc-5
CXX = g++-5 

  • Using GPUs from Scipion
    As said before, the use of GPUs is restricted to the queue system, so to execute a protocol that requires a GPU, the 'Use queue' option must be selected.
    For Motioncor2, Gctf and Gautomatch, Scipion gives you the possibility to select the GPU devices to use. You should specify them as follows:
    To select 1 GPU: 0
    To select 2 GPUs: 0 1
    To select 3 GPUs: 0 1 2
    and so forth.
    For Relion protocols the GPU parameters are on the Additional tab and can either be left empty (in which case Relion will calculate the number of GPUs to use based on the MPIs/threads requested) or specified using the same syntax as above.

Note that it is not possible to select specific GPU devices: the queue system assigns your job to whichever GPUs are available, so the numbers above only indicate how many GPUs to use. If you specify something like 1 4 6, the job will fail.

Besides, when the Queue dialog appears, you should select the number of GPUs (which should be consistent with whatever was chosen in the protocol). Take into account that this is the number of GPUs that the queue system will reserve for your job, so do not over-request.

If you encounter any problems or need some library to be installed, contact Laura or Adrian.
