The Mason-Lab-Code GitHub organization is a centralised store for all of our code.
Scripts are organised into workflows. Each workflow has its own repository.
Repositories contain a README file, detailing the function of each script, all software requirements, links to any reference files needed, a workflow overview diagram, and instructions on how to implement the workflow. (GitHub account with Mason-Lab-Code membership required to view repositories.)
Viking is the University of York's high-performance computing (HPC) cluster. It has enough storage, memory and computing resources to carry out our bioinformatics projects.
Path to Mason Lab group space on Viking: /mnt/scratch/projects/biol-cancerinf-2020/
This directory has centralised locations for our raw data, reference data, and code (synced with Mason-Lab-Code GitHub).
biol-cancerinf-2020 directory structure (as of November 2023):
Please follow the instructions below and on the Mason Lab Linux onboarding information page.
If you haven't already, complete the Viking user application form to request a Viking account.
Enter the Viking project code: biol-cancerinf-2020
The Viking documentation explains how to access and use Viking.
Make sure you are comfortable logging into Viking, navigating around Viking using Unix/Linux commands, submitting batch jobs, and running interactive sessions.
Viking uses the Unix/Linux operating system. If you haven't used Linux before, you can read the Linux shell section of the Viking documentation, and/or follow The Unix Shell tutorial from Software Carpentry, which is designed for beginners.
If your project involves programming in R and Python, and you haven't used these languages before or just need to refresh your skills, you can follow the Programming with R and Programming with Python tutorials from Software Carpentry.
If you are unfamiliar with Git and GitHub, try watching these introductory videos:
You can try following this Introduction to Git workshop, which outlines how to use git on the command line.
If you don't already have a GitHub account, create one. Click Sign Up in the top right corner.
Once you have a GitHub account, ask Richard or Andrew to add you as a member of the Mason-Lab-Code organization. This will give you access to all of our code repositories.
- To create a personal access token, follow the instructions.
- Select "Tokens (classic)", not "Fine-grained tokens".
- When selecting scopes/permissions, tick "repo".
- Make a note of your personal access token.
- Every time Git prompts you for a "password" on the command line, enter your personal access token (not the password for your GitHub account).
Project directories (inside biol-cancerinf-2020/Projects/)
- YYYYMMDD-INITIALS-Project_name
- e.g. 20231006-RG-ATACseq_Bladder_vs_Ureter
Raw data directories (inside biol-cancerinf-2020/Raw-Data/)
- YYYYMMDD-Dataset_name
- e.g. 20230907-RNAseq_BK21dpi
Scripts
- Workflow_name_0N_<script_function/programme_command>.ext
- e.g. RNAseq_DE_04_kallisto_index.sh / RNAseq_28a_generate_log2fc_qval_table_donor-matched.R
- Number the script to denote at what stage in the workflow it is used (if applicable).
- Use letters a, b, c etc. for slightly different versions of the same script e.g. RNAseq_28a_generate_log2fc_qval_table_donor-matched.R and RNAseq_28b_generate_log2fc_qval_table_unmatched.R
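The project directory convention above can be generated and checked mechanically. The following is a minimal sketch, not part of the lab's tooling: the function names make_project_name and is_valid_project_name are hypothetical, and the check assumes initials are 2 to 4 capital letters, which the convention itself does not specify.

```shell
#!/bin/sh
# Hypothetical helpers for the naming convention (not lab-provided tooling).

# Build a project directory name: YYYYMMDD-INITIALS-Project_name.
# Usage: make_project_name INITIALS Project_name [YYYYMMDD]
# The date defaults to today if the third argument is omitted.
make_project_name() {
    printf '%s-%s-%s\n' "${3:-$(date +%Y%m%d)}" "$1" "$2"
}

# Succeed (exit status 0) if the name looks like YYYYMMDD-INITIALS-Project_name.
# Assumption: initials are 2-4 capital letters; the name uses letters, digits, underscores.
is_valid_project_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]{8}-[A-Z]{2,4}-[A-Za-z0-9_]+$'
}

make_project_name RG ATACseq_Bladder_vs_Ureter 20231006
# prints: 20231006-RG-ATACseq_Bladder_vs_Ureter
```

A typical use would be mkdir "$(make_project_name RG My_project)" from inside the Projects/ directory.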
Create a new directory to complete your project. Name it using the naming system above (YYYYMMDD-INITIALS-Project_name).
- Synchronise your work across Viking and a personal workstation
- Keep your work under version control
- Back up your work to a remote repository
- Easily share code and results on GitHub
git init
The .gitignore file specifies the files and directories that you don't want git to track. This includes large files that should not be backed up to GitHub.
The gitignore template (path/) lists many common extensions for large files (.fq.gz, .fastq.gz, .bam, etc.).
Copy the contents of the gitignore template to the .gitignore file in your git-initiated directory.
cp /mnt/scratch/projects/biol-cancerinf-2020/Mason-Lab-Code/gitignore.template /mnt/scratch/projects/biol-cancerinf-2020/Projects/<YYYYMMDD-INITIALS-Project_name>/.gitignore
Add any other expressions to match files and directories specific to your project that you don't want git to track.
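As an illustration of adding project-specific patterns, here is a minimal sketch that appends a pattern only if it is not already listed. The helper name add_ignore_pattern and the example patterns (*.sam, 03_bam/) are hypothetical; substitute whatever large or temporary files your own project produces. It runs in a throwaway directory so it can be tried anywhere.

```shell
#!/bin/sh
# Append a pattern to a .gitignore file unless that exact line is already present.
# add_ignore_pattern is a hypothetical helper name, not a git command.
add_ignore_pattern() {
    gitignore_file="$1"
    pattern="$2"
    touch "$gitignore_file"
    if ! grep -qxF -e "$pattern" "$gitignore_file"; then
        printf '%s\n' "$pattern" >> "$gitignore_file"
    fi
}

# Demonstration in a throwaway directory; in practice, point at your project's .gitignore.
demo_dir="$(mktemp -d)"
add_ignore_pattern "$demo_dir/.gitignore" "*.sam"
add_ignore_pattern "$demo_dir/.gitignore" "03_bam/"
add_ignore_pattern "$demo_dir/.gitignore" "*.sam"   # duplicate, so nothing is appended
cat "$demo_dir/.gitignore"
```

Inside a git repository, git check-ignore -v <file> reports which .gitignore line, if any, matches a given file, which is a quick way to confirm a new pattern behaves as intended.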
Write and run code, create new files and directories, move files around etc.
Add contents of the directory to be tracked by git.
git add .
Check the status of the git tracking.
git status
Commit changes
git commit -m "Commit message - what changes have been made?"
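Before committing, it can be worth confirming that the .gitignore is keeping large files out of the staging area. The sketch below walks through the add, check, and commit steps in a throwaway repository. The file names and the user.name/user.email values are placeholders, and the single *.fastq.gz pattern stands in for the fuller template.

```shell
#!/bin/sh
# Throwaway repository; file names and identity below are placeholders.
demo_repo="$(mktemp -d)"
git init -q "$demo_repo"

printf '%s\n' '*.fastq.gz' > "$demo_repo/.gitignore"
touch "$demo_repo/sample1.fastq.gz"      # stands in for a large raw file
touch "$demo_repo/RNAseq_01_fastqc.sh"   # stands in for a workflow script

# Stage everything that is not ignored.
git -C "$demo_repo" add .

# List the staged files: the .fastq.gz file should be absent.
git -C "$demo_repo" ls-files

# Commit; the -c options set an identity for this one command only.
git -C "$demo_repo" -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "Add FastQC script and .gitignore"
```

In your own project you would run the plain git add, git status and git commit commands from inside the project directory; git -C is used here only so the example does not change your working directory.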
Set up SSH Key (for authentication between Viking and GitHub)
On Viking, print the content of your public key using the command below.
cat ~/.ssh/id_alcescluster.pub
On GitHub, go to Settings > SSH and GPG keys > New SSH key and paste the content above into the Key box.
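If the cat command fails, the key file is probably missing or named differently. The sketch below wraps the same step with a readable error message. The helper name print_pubkey is hypothetical, and the dummy key written to a temporary file is only there so the example runs anywhere; it is not a real key.

```shell
#!/bin/sh
# Print a public key, or explain what is missing.
# print_pubkey is a hypothetical helper; the default path is the one used above.
print_pubkey() {
    key_file="${1:-$HOME/.ssh/id_alcescluster.pub}"
    if [ -r "$key_file" ]; then
        cat "$key_file"
    else
        echo "No readable public key at $key_file" >&2
        return 1
    fi
}

# Demonstration with a dummy key file (placeholder content, not a real key).
demo_key="$(mktemp)"
printf '%s\n' 'ssh-ed25519 AAAAexample user@host' > "$demo_key"
print_pubkey "$demo_key"

# Once the key has been added on GitHub, the connection can be tested with:
#   ssh -T git@github.com
```

On Viking you would call print_pubkey with no argument to use the default path.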
Create an empty remote repository on GitHub - go to the Repositories tab of your GitHub profile, and click New.
Link your local repository with the new remote repository.
git remote add origin git@github.com:<username>/new-remote-repo.git
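Push the contents of the local repository to GitHub. With the SSH remote above, Git authenticates using your SSH key; if you use an HTTPS remote instead, you will be prompted to enter your username and personal access token on the command line.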
git push -u origin master # or: git push -u origin main
Any further committed changes made to the local repository can now be pushed to the remote repository on GitHub.
git push
Any committed changes made to the remote repository on GitHub can be pulled to the local repository.
git pull
- Now that this repository is linked to a remote repository on GitHub, you can synchronise your work across Viking and personal workstations by cloning the repository (git clone) and then pushing and pulling changes on the different systems.
- This might be useful if you are running computationally demanding steps of your workflow on Viking, and more personalised analyses on a local workstation, and you want to keep all of your work neatly in one directory.
Create symlinks to raw files:
mkdir /mnt/scratch/projects/biol-cancerinf-2020/Projects/<YYYYMMDD-INITIALS-Project_name>/00_raw/
ln -s /mnt/scratch/projects/biol-cancerinf-2020/Raw-Data/<YYYYMMDD-Dataset_name>/raw-file.fastq.gz /mnt/scratch/projects/biol-cancerinf-2020/Projects/<YYYYMMDD-INITIALS-Project_name>/00_raw/raw-file.fastq.gz
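When a dataset contains many files, the ln -s step can be looped. The following is a minimal sketch: link_raw_files is a hypothetical helper name, it assumes the raw files end in .fastq.gz, and it is demonstrated on temporary directories that mimic the Raw-Data/ and Projects/ layout rather than on the real group space.

```shell
#!/bin/sh
# Symlink every *.fastq.gz file in RAW_DIR into LINK_DIR.
# Usage: link_raw_files RAW_DIR LINK_DIR
# RAW_DIR should be an absolute path so the links resolve from anywhere.
# link_raw_files is a hypothetical helper name.
link_raw_files() {
    raw_dir="$1"
    link_dir="$2"
    mkdir -p "$link_dir"
    for raw_file in "$raw_dir"/*.fastq.gz; do
        [ -e "$raw_file" ] || continue            # glob matched nothing
        link_path="$link_dir/$(basename "$raw_file")"
        if [ -e "$link_path" ] || [ -L "$link_path" ]; then
            continue                              # link already exists
        fi
        ln -s "$raw_file" "$link_path"
    done
}

# Demonstration on temporary directories that mimic the group-space layout.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/Raw-Data/20230907-RNAseq_BK21dpi"
touch "$demo_root/Raw-Data/20230907-RNAseq_BK21dpi/sample1.fastq.gz"
link_raw_files "$demo_root/Raw-Data/20230907-RNAseq_BK21dpi" \
               "$demo_root/Projects/demo/00_raw"
ls -l "$demo_root/Projects/demo/00_raw"
```

Because existing links are skipped, the function can be re-run safely after new raw files are added to the dataset directory.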
- Use symlinks to raw files as input, as opposed to the original raw files.
- Any reference files required (e.g. reference genome FASTA) will be in /mnt/scratch/projects/biol-cancerinf-2020/Reference-Data/
- The genome index files have already been created for some common aligners.
- If writing any custom bash scripts for batch submissions on Viking, you can use the sbatch script template (/mnt/scratch/projects/biol-cancerinf-2020/Mason-Lab-Code/sbatch-script-template.sh). Remember to name the script according to the naming system above. If the script would be a useful addition to a workflow, we can add it to the appropriate Mason-Lab-Code repository.
- Write a README.md file to accompany your project and add to your project directory.
- GitHub README.md template: /mnt/scratch/projects/biol-cancerinf-2020/Mason-Lab-Code/GitHub-README-template.md
- Make final commit and push changes to GitHub.
- Keep a lab book to document all of your work. Include code snippets, plots, and exact versions of any software used. Here is a lab book template.
- Stick to the file/directory naming system for project directories and scripts. For other files/subdirectories inside your project directory, use your own naming system and keep it as consistent as possible.
- Store intermediate files in chronologically numbered directories, e.g. 00_raw-links/, 01_quality-control/, 02_fastq-trimmed/, 03_bam/, 99_logs/, etc.
- Keep log files from batch submissions.
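The numbered directory layout suggested above can be created in one step at the start of a project. This is a minimal sketch using the example directory names from the list; create_project_skeleton is a hypothetical helper name, and the set of stages should be adapted to your own workflow.

```shell
#!/bin/sh
# Create the numbered subdirectories inside a project directory.
# Usage: create_project_skeleton PROJECT_DIR
# create_project_skeleton is a hypothetical helper; the stage names are examples only.
create_project_skeleton() {
    project_dir="$1"
    for sub in 00_raw-links 01_quality-control 02_fastq-trimmed 03_bam 99_logs; do
        mkdir -p "$project_dir/$sub"
    done
}

# Demonstration in a throwaway directory.
demo_project="$(mktemp -d)/20231006-RG-ATACseq_Bladder_vs_Ureter"
create_project_skeleton "$demo_project"
ls "$demo_project"

# To keep batch logs together, a Slurm script could direct its output there, e.g.
#   #SBATCH --output=99_logs/%x_%j.log
# where %x expands to the job name and %j to the job ID.
```

The --output path is resolved relative to the directory the job is submitted from, so this assumes you run sbatch from the project directory.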