---
title: "ABC.4: Introduction to bash language for bioinformatics"
author: "Samuele Soraggi, Manuel Peral Vazquez"
image: ./2024-09-03-ABC4/bash.png
date: 2024-09-03
categories: [bash, command line]
description: "Slides and bash intro at the ABC.4"
eval: true
---

# Introduction

The difficulty of learning bash is often underestimated: people approaching bioinformatics are frequently expected to pick it up on their own. Here we put together the first basic concepts and commands.

:::{.callout-note title="Why the bash command line?"}

Using the bash command line quickly becomes essential if you are doing bioinformatics.

First of all, you might need it to **access a computing cluster** (for example, GenomeDK at Aarhus University), since most clusters run on a [UNIX-based operating system](https://www.hpc.iastate.edu/guides/unix-introduction), such as Linux, using a bash command line.

Just as important, on a command line you can very easily **do operations on multiple and very large files**, something that is much harder to do using, for example, `R` or `python`. Large sequences of operations can be automated into **pipelines** (an advanced topic not covered in this tutorial).

With a command line you can **run many small programs, compose them together, and organize them in a chain of commands**. This type of program organization fits well with what a bioinformatics project consists of: many tools to be applied repeatedly on multiple large files, and organizing those programs in a specific sequence. An example could be aligning many raw bulk RNA sequencing files to a reference genome: the alignment operation must be repeated many times, and once the files are aligned, they might need to be merged if they come from the same sample.

:::
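As a minimal sketch of what such a chain of commands can look like, the example below creates three tiny files, then pipes their content through `cat`, `grep` and `wc`. The file names (`sampleA.fa`, ...) and their content are made up for illustration; in a real project the small tools would be replaced by, for example, an aligner.

```bash
# Work in a temporary folder so nothing else on your computer is touched
workdir=$(mktemp -d)

# Create three tiny example files (hypothetical names, made-up content)
for sample in sampleA sampleB sampleC; do
    printf '>read1\nACGT\n>read2\nTTGA\n' > "$workdir/$sample.fa"
done

# Chain of commands: print all files, keep only the header lines, count them
n_headers=$(cat "$workdir"/*.fa | grep '^>' | wc -l)
echo "Number of sequences in all files: $n_headers"

# Clean up the temporary folder
rm -r "$workdir"
```

Each file contains two sequences, so the count over the three files is 6. The same three commands would work unchanged on hundreds of files, which is the point of composing small programs.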

## Some terminology

When using a UNIX operating system (Linux, macOS), everything on your computer fits one of two categories: **processes and files**. Processes are running instances of a program, and a program is any executable file stored on your computer. A file is any collection of data (program, image, video, audio, ...).

Whenever we write a command in the terminal and press enter, a shell takes the code we wrote and sends it to the kernel. The **shell is the outer layer of the operating system**, which facilitates the communication with the kernel. The **kernel is the core of the operating system**, managing the computer's physical components (hardware) and interfacing them with the processes that need to run. In general, any program you open (browser, game, ...) or action you do (moving files, renaming folders, ...) on your computer ends up being a process managed by the kernel. This communication process is shown in @fig-shellkernel.

![Communication scheme where the outer layer is a bash shell command, which the shell then communicates to the kernel, which in turn manages the hardware resources to make the program actually run. Note that there can be **many languages for the UNIX shell**: bash is the most popular, but others exist and are used (for example *zsh* on macOS). *Figure credit: InnoKrea.*](./2024-09-03-ABC4/shellkernel.png){label="fig-shellkernel"}
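To make the idea of a process more concrete, here is a small sketch you can try in a bash terminal. It shows that the shell itself is a process with its own numeric ID (PID), and that a program started from the shell becomes a new process. The variable names `shell_pid` and `child_pid` are only chosen for this illustration.

```bash
# The shell you are typing in is itself a process with a numeric ID (PID)
shell_pid=$$
echo "PID of this shell: $shell_pid"

# Start a program in the background: it runs as a new, separate process
sleep 1 &
child_pid=$!
echo "PID of the sleep process: $child_pid"

# Wait until the background process has finished
wait "$child_pid"
echo "The sleep process has ended"
```

The two PIDs are different because the shell and the `sleep` program are two distinct processes, both managed by the kernel.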

## Efficiency and Speed

We can roughly compare typical languages like R and python, the command line, and bash pipelines in terms of the number and size of files they handle, their speed, and the amount of manual work they require:


| Programming mode | Number of files      | File size   | Operational speed | Manual work                                 |
| ---------------- | -------------------- | ----------- | ----------------- | ------------------------------------------- |
| R, python, ...   | from 1 to 10s        | Small       | Slow              | A lot                                       |
| Command line     | from 1 to 100s       | 1 to 10s GB | Fast              | Low to moderate                             |
| Unix pipeline    | from 1 to many 1000s | Many TB     | Fast              | Low (writing repeated operations only once) |

You will see in this tutorial how we can handle pretty large text files in a short time. Those files would take a long time to read in R and python, and the code to modify them would run just as slowly.
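As a rough illustration, and not a benchmark, the sketch below creates a text file with half a million lines and then counts its lines and searches it with two standard tools. The file is a made-up example containing only the numbers from 1 to 500000.

```bash
# Create a temporary text file with half a million lines (numbers 1 to 500000)
bigfile=$(mktemp)
seq 1 500000 > "$bigfile"

# Count the lines in the file
n_lines=$(wc -l < "$bigfile")
echo "Lines in the file: $n_lines"

# Count how many lines end with the digit 7
n_sevens=$(grep -c '7$' "$bigfile")
echo "Lines ending in 7: $n_sevens"

# Remove the temporary file
rm "$bigfile"
```

On most computers both commands should finish almost immediately. One number in ten ends with 7, so the second count is 50000.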



# Slides

Our slides introducing the bash command line:

&nbsp;

 <p align="center">
  <a href="https://abc.au.dk/documentation/slides/20240903-ABC.4.zip" style="background-color: #4266A1; color: #FFFFFF; padding: 30px 20px; text-decoration: none; border-radius: 5px;">
    Download Slides
  </a>
</p>

&nbsp;










# Tutorial

Here starts the tutorial. There is only one technical prerequisite: you need a **terminal**. The terminal is where you write your commands, which are then **interpreted** and sent to the computer to be executed.

- Mac and Linux computers already have a program called `Terminal` installed (both have UNIX-based operating systems)

- Windows has a different sort of terminal called PowerShell, which is not UNIX-based. Please install `Git Bash`
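Once a terminal is open, a quick way to check that everything works is to run a few harmless commands. The sketch below assumes nothing beyond a standard bash terminal; the variable `current_dir` is only used for illustration.

```bash
# Print the bash version (a fallback text is shown if another shell is running)
echo "bash version: ${BASH_VERSION:-not running bash}"

# Store and print the folder you are currently in
current_dir=$(pwd)
echo "You are in: $current_dir"

# List the content of that folder
ls "$current_dir"
```

If the first line reports that bash is not running, your terminal may be using another shell (for example *zsh* on macOS); typing `bash` and pressing enter usually starts a bash session.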

## Install packages

First of all, you need quite a few packages for bulk RNA analysis. The following installations will also help in the future analysis tutorial, where various plots are explored. Note how you install some packages with `install.packages` (from the default R repository, CRAN) and others with `BiocManager::install` (from Bioconductor).





<form id="quizForm">
    <h4>Claim: The 50% PI [.6,.8] is the more plausible one.</h4>
    <label>
      <input type="radio" name="q1" value="Yes"> Yes
    </label>
    <label>
      <input type="radio" name="q1" value="No"> No
    </label>
    <label>
      <input type="radio" name="q1" value="No answer possible"> No answer possible
    </label>
    <button type="button" onclick="submitQuiz()">Answer</button>
</form>

<script>
  // Compare the selected radio button with the correct answer and give feedback
  function submitQuiz() {
    var selectedOption = getSelectedOption("q1");
    var correctAnswer = "Yes";

    // Display feedback
    if (selectedOption === correctAnswer) {
      alert("Correct!");
    } else {
      alert("Wrong. The correct answer is *Yes*.");
    }
  }

  // Return the value of the checked radio button, or null if none is checked
  function getSelectedOption(questionName) {
    var radioButtons = document.getElementsByName(questionName);
    for (var i = 0; i < radioButtons.length; i++) {
      if (radioButtons[i].checked) {
        return radioButtons[i].value;
      }
    }
    return null;
  }
</script>