# <i class="fa fa-laptop"></i> Hands-on training 1: Using Linux in Bioinformatics

<div style="background-color: #86CBBB; 1px; height:3px " ></div>

## 1. Introduction

We are about to start a journey into real Bioinformatics. This series of **four hands-on training activities** will show you the basic workflows in bioinformatics: from **data management and processing** with Linux, through **visualization**, to subsequent **functional analyses**. The practicals are divided into two big parts: **Part I: Basics on Bioinformatics workflows: Transforming data into insight** and **Part II: Solving real cases in genomics**.

<center>
    <img src="https://raw.githubusercontent.com/marta-coronado/TAB-data-figs/refs/heads/main/logoTAB2019_10_27D8_47_29.png" width=55%>
</center>

These practicals also aim to help you gain other skills that are very valuable in research but rarely practised during the degree, such as **collaborating** and **doing reproducible research**.

### 1.1 Part I: Basics on Bioinformatics workflows

The first part, **Basics on Bioinformatics workflows: Transforming data into insight**, consists of two practicals.

In this first practical (P1), **Using Linux in Bioinformatics**, you will learn the basic commands of Linux. Getting familiar with these commands is key in bioinformatics, as many pipelines rely on combining different tools through a terminal. In addition, biological data are stored in large text files that can be easily processed using Linux. In the second practical (P2), **Data exploration and visualization**, we will transform biological data into informative graphs.

#### P1 Learning outcomes

* Get familiar with the Unix directories
* Learn basic text-processing Linux commands and piping
* Use the Linux command-line to perform tasks on the data

#### P2 Learning outcomes

* Learn the grammar of graphics of `ggplot2`
* Create the most common bioinformatics graphs (scatterplots, lineplots, barplots, ...)
* Understand the important elements of a ready-to-publish figure

### 1.2 Practicals organization

#### 1.2.1 Jupyter Notebook

In these practicals, we are going to use the **Jupyter Notebook** dashboard. A Jupyter Notebook is an interactive work environment that lets you develop code dynamically in Python (and other languages), integrating blocks of code, text, graphics and images in the same document. It emerged in 2014 to offer the scientific community a very powerful set of tools to work with data, visualize it and share the results with the community.

<center>
    <img src="https://raw.githubusercontent.com/marta-coronado/TAB-data-figs/refs/heads/main/jupyterpreview2019_10_27D10_6_30.png" width=45%>
</center>

This **new work philosophy** contrasts sharply with the usual idea of what programming and writing code is. This way of programming, called **literate programming**, emphasizes writing text that is comfortable to read and understand, interleaved with blocks of code. It combines text, equations and figures, and makes it very easy to share your research and results.

<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-book"></i> Read the following article published in Nature, where the benefits of using Jupyter Notebook for scientific research are emphasized: <b>Shen, H. (2014) Interactive notebooks: Sharing the code. <i>Nature</i> 515:151-152</b>.
</div>

<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> Go to section <b>2. Tools installation</b> for instructions on how to install Jupyter Notebook in your computer.
</div>

#### 1.2.2 Writing a report

Finally, to **write our scientific reports**, we recommend using an online $\LaTeX$ editor that's easy to use: [**Overleaf**](https://www.overleaf.com). We have created a ready-to-use template available [here](https://www.overleaf.com/read/vchzpswtycyg) so you can write your results.
 
### 1.3 Jupyter Notebook menu

All navigation and actions in the Jupyter Notebook are available using the mouse through the toolbar:

&emsp;<i class="fa fa-save"></i> save changes and create a checkpoint  
&emsp;<i class="fa fa-plus"></i> insert a cell below a selected cell  
&emsp;<i class="fa fa-scissors"></i> cut selected cell(s)  
&emsp;<i class="fa fa-copy"></i> copy selected cell(s)  
&emsp;<i class="fa fa-paste"></i> paste cell(s) below  
&emsp;<i class="fa fa-arrow-up"></i> move selected cell(s) up  
&emsp;<i class="fa fa-arrow-down"></i> move selected cell(s) down  
&emsp;<i class="fa fa-step-forward"></i> run a cell  
&emsp;<i class="fa fa-stop"></i> interrupt the kernel  
&emsp;<i class="fa fa-repeat"></i> restart the kernel  
&emsp;<i class="fa fa-forward"></i> restart the kernel, then re-run the whole notebook
 
##### Shortcuts

We recommend learning the command-mode shortcuts:

&emsp;**Basic navigation**: enter, shift-enter, up/k, down/j  
&emsp;**Saving the notebook**: s  
&emsp;**Change cell types**: y, m  
&emsp;**Cell creation**: a, b  
&emsp;**Cell editing**: x, c, v, d, z  

<div style="background-color: #86CBBB; 1px; height:3px " ></div>

# 2. Tools installation 
<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> We strongly recommend using a <b>Linux</b> operating system and getting used to working with the <b>terminal</b>.
</div>
<br>

If you're using **Windows 10**, a good alternative is to install **Ubuntu 22.04** on it. The application is available in the [Microsoft Store](https://www.microsoft.com/store/productId/9PN20MSR04DW). Simply click on the *Install* button, and it will be downloaded and installed automatically. When launched for the first time, Ubuntu will inform you that it's *Installing* and you'll need to wait a few moments. When complete, you'll be asked for a username and password specific to your Ubuntu installation. With this step complete, you'll find yourself at the Ubuntu bash command line.

If you're using **macOS**, you won't need to install Linux, as Apple has been using Unix as the underlying operating system on all of their computers since 2001. However, take into account that the installation of tools is different and may require additional steps that go beyond the scope of these practicals.

## 2.1 Installing `Jupyter Notebook` with `pip`

`pip` is a standard package-management system used to install and manage software packages written in Python.

1. Install pip through the terminal window (if not already installed):

<code style="background-color:#222D32; color:#FFF">sudo apt install python-pip   #python 2 (older Ubuntu releases only)
</code>
<code style="background-color:#222D32; color:#FFF">sudo apt install python3-pip  #python 3
</code>

2. Install Jupyter Lab (which includes Jupyter Notebook):

<code style="background-color:#222D32;color:#FFF">pip install jupyterlab</code><br>

## 2.2 Installing `Jupyter Notebook` with `conda`

Conda is an open-source, cross-platform, language-agnostic package manager and environment management system. It was originally developed to solve difficult package management challenges faced by Python data scientists, and today it is a popular package manager for Python and R.

1. Download the installer:

    1. [Miniconda installer for Linux](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh) (Linux 64 bits)
    2. [Anaconda installer for Linux](https://repo.anaconda.com/archive/Anaconda3-2025.06-1-Linux-x86_64.sh) (Linux 64 bits)

Both Anaconda and Miniconda use `conda` as the package manager. The difference is that Miniconda only has the package management system, while Anaconda also comes with a bundle of pre-installed packages (e.g. `jupyterlab`, which contains Jupyter Notebook).

2. In your terminal window, run:

    1. Miniconda: <code style="background-color:#222D32; color:#FFF">bash Miniconda3-latest-Linux-x86_64.sh</code>
    2. Anaconda: <code style="background-color:#222D32; color:#FFF">bash Anaconda3-2025.06-1-Linux-x86_64.sh
</code>

3. Follow the prompts on the installer screens.
4. If you are unsure about any setting, accept the defaults. You can change them later.
5. To make the changes take effect, close and then re-open your terminal window.

You'll know that `conda` is installed because your command-line prompt will be preceded by `(base)`, indicating that you are in your `base` conda environment.

6. Install Jupyter Lab (which includes Jupyter Notebook) in the `base` environment if it is not installed by default.

<code style="background-color:#222D32;color:#FFF">conda install -c conda-forge jupyterlab</code><br>

<div style="background-color: #86CBBB; 1px; height:3px " ></div>

# 3. Quick introduction to Linux

Linux is an open-source operating system, an alternative to Microsoft Windows or Apple macOS. It is modelled on Unix, an operating system created in 1969, when there was no such thing as a graphical user interface: instructions were sent to the computer by typing them on a keyboard instead of using a mouse.

In Linux, **folder navigation** and **program executions** are performed in the terminal by typing commands. The terminal allows you to type input to the computer (i.e. run programs, move/view files etc.) and to see output from those programs. You can open a terminal using the `Ctrl` + `Alt` + `T` shortcut, or by looking for the `Konsole` or `Terminal` application in the list of programs installed on your computer.

You can also use Linux commands inside the Jupyter Notebook by specifying `%%bash` at the top of the cell. If nothing is specified, then the commands are expected to be in the Python programming language.<br>

This table summarizes the most common Linux commands that we will use during this practical:

| Command | Function |
|---------|----------|
| `pwd` | print the current working directory |
| `cd <dir>` | change to the directory "dir" |
| `ls` | list the files and directories in the current directory |
| `touch <name>` | create an empty file |
| `echo <something>` | print something (to a file, if a redirection is specified) |
| `grep` | search a file for lines that match a given string or pattern |
| `cut` | extract sections (e.g. columns) from each line of a file |
| `cat` | print file content |
| `head` | print the first lines of a file |
| `tail` | print the last lines of a file |
| `less` and `more` | view file content page by page (large files) |
| `sort` | sort the lines of a file |
| `uniq` | remove or show repeated adjacent lines |
| `tr` | replace one character with another |
| `fold` | wrap lines to a specific width |
| `\| wc -l` | count the lines of the piped input |
| `mkdir <name>` | create a new folder with name "name" |
| `wget <url>` | download the contents of a link into the current directory |
| `cat filetmp \| cut -f1 > file` | redirect the output of a command to a file |
| `gunzip file.gz` | uncompress a gz file |


<i class="fa fa-search"></i> Example of the `ls` command running from the terminal:<br><br>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Documents/Project</span></b>$ ls</code>
</div><br>

<i class="fa fa-search"></i> Example of the `ls` command running from the Jupyter Notebook:<br><br>

In [1]:
%%bash

ls

P1.ipynb
P1_original.ipynb
P1_solutions.ipynb
PFAM_seqs.zip
pfam
seqs


<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> The way to get more comfortable with the command-line is: <b>PRACTICE</b>. And keep the <a href="https://files.fosswire.com/2007/08/fwunixref.pdf" target="_blank">cheat sheet</a> close to you.
</div>
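Several of the commands in the table become much more useful when chained with a pipe (`|`), which sends the output of one command to the next. The following is a minimal sketch of that idea; the file name `animals_demo.txt` and its contents are invented for illustration and are not part of the course data.

```shell
# Build a small example file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf 'lion\ndog\ncat\ndog\nlion\ndog\n' > "$workdir/animals_demo.txt"

# sort groups identical lines together, uniq collapses them,
# and wc -l counts the lines that remain. tr removes any padding spaces.
n_unique=$(sort "$workdir/animals_demo.txt" | uniq | wc -l | tr -d ' ')
echo "Distinct animals: $n_unique"

# uniq -c prefixes each line with its count; sort -nr puts the most
# frequent first; head -n 1 keeps only that first line.
top=$(sort "$workdir/animals_demo.txt" | uniq -c | sort -nr | head -n 1)
echo "Most frequent: $top"
```

Note that `uniq` only collapses repeated lines when they are next to each other, which is why `sort` comes first.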

<div style="background-color: #86CBBB; 1px; height:3px " ></div>

# 4. The Unix Tree

<center>
        <img src="https://raw.githubusercontent.com/marta-coronado/TAB-data-figs/refs/heads/main/linux-cmd-directory-1.png" width=50%>
</center>

Looking at directories from within a Unix terminal can often seem confusing. But bear in mind that these directories are exactly the same type of folders that you can see if you use Apple’s or Windows' graphical file-management programs. A tree analogy is often used when describing computer filesystems. From the root level (/) there can be one or more top level directories. In the example above, we show just three. When you log in to a computer you are working with your files in your home directory, and this will nearly always be inside a ‘Users’ directory. On many computers there will be multiple users.

<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> Everything is a file in Linux, including the keyboard, mouse, etc.
</div>

###  <i class="fa fa-cogs"></i> Navigating your filesystem

Where are you? The first step to navigate the Linux filesystem is to know where you are. The command `pwd` prints the current directory on the screen.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Documents/Project</span></b>$ pwd
    /home/1244149/Documents/Project</code>
</div><br>

When you log in to a Unix computer, you are typically placed into your home directory. In this example, we are not in our home directory (<code>/home/1244149/</code>) but in another subdirectory. The first forward slash that appears in a list of directory names always refers to the top level directory of the file system (known as the root directory). The remaining forward slashes delimit the various parts of the directory hierarchy. If you ever get ‘lost’ in Unix, remember the <code>pwd</code> command.

As you learn Unix you will frequently type commands that don’t seem to work. Most of the time this will be because you are in the wrong directory, so it’s a really good habit to get used to running the <code>pwd</code> command a lot.

<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> A tilde character (~) is used as a short-hand way of specifying a home directory.
</div>
<br>

To change directories, you type the command `cd` (change directory) followed by the path to the directory.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Documents/Project</span></b>$ cd /home/1244149/Desktop
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
<code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$
</code>
</div><br>

<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> Just typing <code>cd</code> takes you to your home directory.
</div>
<br>

Note that the above path starts with / (root). Paths of this kind are known as **absolute paths**. There is another type of path, called a **relative path**, which is written in relation to your current directory. Relative paths do not start with /.
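To make the difference concrete, here is a small sketch that reaches the same directory once with an absolute path and once with a relative one. It works in a throwaway directory created with `mktemp -d`, and the `Desktop/Data` names simply mirror the examples in this section; nothing in your real home directory is touched.

```shell
# Create a throwaway directory tree (hypothetical names) to experiment in.
base=$(mktemp -d)
mkdir -p "$base/Desktop/Data"

# Absolute path: written from the root, so it works from any location.
cd "$base/Desktop/Data"
abs_result=$(pwd -P)

# Relative path: interpreted starting from the current directory.
cd "$base"
cd Desktop/Data
rel_result=$(pwd -P)

echo "$abs_result"
echo "$rel_result"   # same directory as the line above
```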

To see the files contained within your current directory, you use `ls`. `ls` shows a list of subdirectories and files. It is useful if you don't know the content of a directory.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ ls
    <span style="color:#907AD6">Data</span> <span style="color:#FFF">PHB.pdf</span> <span style="color:#DB3A34">file.fasta.gz</span>
</code>
</div>

Directories, files and compressed files are shown with different colours. In this example, **<span style="color:#907AD6">Data</span>** is a directory, **PHB.pdf** is a file, and **<span style="color:#DB3A34">file.fasta.gz</span>** is a compressed file.

 <i class="fa fa-question-circle"></i> **How would you move to the <span style="color:#907AD6">Data</span> subdirectory? (Consider you are in the Desktop folder, following the previous code)**


In [None]:
%%bash

# Write here the command with the absolute and relative path

# Relative path (the absolute equivalent would be /home/1244149/Desktop/Data)
cd Data

**What if you wanted to go to an upper directory without having to write the full path?** Two dots (`..`) are used in Unix to refer to the parent directory (the previous one). Every directory, except the root, has a parent directory.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/Data</span></b>$ cd ..
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$
</code>
</div>

You can even combine more than one `..` (separated by /). Note that in these examples we are using a relative path (it does not start with /).

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/Data/HP</span></b>$ cd ../..
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$
</code>
</div>

<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> If you type more than one / by mistake, Linux is still able to recognise the path.
</div>
<br>

The current directory can be written in a path as `.`, or it can be omitted.
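The three path shortcuts discussed so far (`..`, `.` and repeated slashes) can be checked with a short sketch. As before, it uses a temporary directory tree whose names (`Desktop/Data/HP`) are only borrowed from the examples above.

```shell
# Throwaway directory tree (hypothetical names).
base=$(mktemp -d)
mkdir -p "$base/Desktop/Data/HP"
cd "$base/Desktop/Data/HP"

# Two levels up, written as a relative path.
cd ../..
up_two=$(pwd -P)

# "." is the current directory, so ./Data and Data are the same place.
cd ./Data
with_dot=$(pwd -P)
cd ..
cd Data
without_dot=$(pwd -P)

# Repeated slashes are treated as a single separator.
cd "$base/Desktop//Data"
double_slash=$(pwd -P)

echo "$up_two"
echo "$with_dot"
```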

<i class="fa fa-question-circle"></i> **Let's practice absolute and relative paths! Rewrite each of the following absolute paths as a relative path, assuming in every case that you start in `/home/apeiron`.**

In [1]:
%%bash

# Each command assumes that you start in /home/apeiron.

# /home/apeiron/dodos
cd dodos

# /home
cd ..

# /home/apeiron/dodos/alvilda
cd dodos/alvilda

# /tmp
cd ../../tmp

# /home/pablo/corona
cd ../pablo/corona


<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> <b>Tip</b>: when typing the name of a directory or file that exists, you can autocomplete it by pressing the <code>TAB</code> key.
</div>

###  <i class="fa fa-cogs"></i> Create a working directory

When starting any bioinformatics pipeline, usually the first step is to create the folder where we'll place all our data and scripts. This is called the _working directory_.

Directories are created in Linux with the command `mkdir`.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ mkdir TAB
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ ls
    <span style="color:#907AD6">Data</span> <span style="color:#FFF">PHB.pdf</span> <span style="color:#DB3A34">file.fasta.gz</span> <span style="color:#907AD6">TAB</span>
</code>
</div>

<br>
<div style="background-color:#ffddad;">  
<i class="fa fa-info-circle"></i> You can create a directory wherever you want if you specify the whole path to it.
</div>

You can also create a directory and a subdirectory at the same time by typing `mkdir -p`.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ mkdir -p DND/DM
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ ls
    <span style="color:#907AD6">Data</span> <span style="color:#907AD6">DND</span> <span style="color:#FFF">PHB.pdf</span> <span style="color:#DB3A34">file.fasta.gz</span> <span style="color:#907AD6">TAB</span>
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ cd DND
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#907AD6">DM</span>
</code>
</div>

If you want to remove an (empty) directory, you can use `rmdir`.

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ rmdir TAB
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop</span></b>$ ls
    <span style="color:#907AD6">Data</span> <span style="color:#907AD6">DND</span> <span style="color:#FFF">PHB.pdf</span> <span style="color:#DB3A34">file.fasta.gz</span>
</code>
</div>

<br> 
<div style="background-color:#ffddad;">  
<i class="fa fa-info-circle"></i> If a directory is not empty, you can force its removal using <code>rm -r</code>. But take into account that you <b>cannot undo it</b>. This is considered the <b>MOST DANGEROUS LINUX COMMAND YOU WILL EVER LEARN!</b></div>
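The difference between `rmdir` and `rm -r` can be tried safely inside a temporary directory. This is only a sketch: the `DND/DM` names are reused from the example above, and everything happens under a directory created with `mktemp -d`.

```shell
# Work inside a throwaway directory so nothing important can be deleted.
base=$(mktemp -d)
cd "$base"

# -p creates the parent (DND) and the child (DM) in one go.
mkdir -p DND/DM

# rmdir refuses to delete a directory that still has content.
if rmdir DND 2>/dev/null; then
    rmdir_status="removed"
else
    rmdir_status="refused"
fi
echo "rmdir DND: $rmdir_status"

# rm -r deletes the directory and everything inside it. There is no undo.
rm -r DND
[ -e DND ] && after_rm="still there" || after_rm="gone"
echo "after rm -r: $after_rm"
```

Running `ls` (or `pwd`) before any `rm -r` is a cheap habit that helps avoid deleting the wrong directory.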

 <i class="fa fa-question-circle"></i> **Create a directory called `P1_TAB` in your home folder to store all the files that we will create in the next section.**

In [2]:
%%bash 

# Write here the command to create that directory

mkdir ~/P1_TAB


<div style="background-color: #86CBBB; 1px; height:3px " ></div>

# 5. Working with files in Linux

###  <i class="fa fa-cogs"></i> Creating files

We can create an empty file using the command `touch`. Simply type that command followed by the path where you want to create the file.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#907AD6">DM</span>
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ touch races.txt
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#907AD6">DM</span> <span style="color:#FFF">races.txt</span>
</code>
</div>

You can also create files with content using `echo`.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ echo "Artificer" > classes.txt
</code>
</div>

<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> If you use <code>></code> in a file that already exists, its content will be overwritten.
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#FFF">classes.txt</span> <span style="color:#907AD6">DM</span> <span style="color:#FFF">races.txt</span>
</code>
</div>

For adding new lines to a file that already exists, use `>>` instead of `>`.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ echo "Bard" >> classes.txt
</code>
</div>
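The sketch below contrasts `>` and `>>` by counting the lines in a file after each step. The file name `classes_demo.txt` is made up for this illustration so that it does not clash with the `classes.txt` file used in the examples.

```shell
# Work in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir"

echo "Artificer" > classes_demo.txt    # creates the file with one line
echo "Bard" >> classes_demo.txt        # appends a second line
lines_after_append=$(wc -l < classes_demo.txt | tr -d ' ')

echo "Cleric" > classes_demo.txt       # overwrites: the previous lines are lost
lines_after_overwrite=$(wc -l < classes_demo.txt | tr -d ' ')
content=$(cat classes_demo.txt)

echo "Lines after >>: $lines_after_append"
echo "Lines after >:  $lines_after_overwrite"
```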

 <i class="fa fa-question-circle"></i> **Create an empty file called `animals.txt` in the `P1_TAB` directory that you just created. Then, use `echo` to add some animals to that file**.

In [3]:
%%bash

# Write here the commands used to create those files

# Assuming we are inside P1_TAB:
touch animals.txt
echo "lion" > animals.txt
echo "dog" >> animals.txt

echo -e "cat\nbird\nhorse" >> animals.txt

###  <i class="fa fa-cogs"></i> Moving and copying files

Sometimes, you may have created a file in the wrong folder. What do you do now? Should you delete it using `rm` and create it again in the correct path? This may be a solution if your file is empty or easy to create again. But imagine that you have spent 5 hours writing the introduction of your final degree project (TFG), you have saved it in the wrong folder and you don't have a graphical interface. The solution is simple: move the file. The command **`mv`** allows you to move a file from one path to another (the filename must be included in the path).

Let's see an example. Imagine you want to move the `races.txt` file into the `DM` folder.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#FFF">classes.txt</span> <span style="color:#907AD6">DM</span> <span style="color:#FFF">races.txt</span>
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ mv races.txt DM/races.txt
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#FFF">classes.txt</span> <span style="color:#907AD6">DM</span>
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cd DM
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND/DM</span></b>$ ls
    <span style="color:#FFF">races.txt</span>
</code>
</div>

In this case we have used the relative paths, but we could have used the absolute paths to any other directory of our computer.

Let's see another use of `mv`. For that we will create a new file called `spells.txt`.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ touch spels.txt
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#FFF">classes.txt</span> <span style="color:#907AD6">DM</span> <span style="color:#FFF">spels.txt</span>
</code>
</div>

Wait a minute... We wrote it wrong! Let's fix it using `mv`. For that you just have to move your file to the same directory but with a new name.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ mv spels.txt spells.txt
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#FFF">classes.txt</span> <span style="color:#907AD6">DM</span> <span style="color:#FFF">spells.txt</span>
</code>
</div>

Now it's perfect.

Suppose now that you want to copy your file `spells.txt` into the `DM` folder. We will use `cp` for that. It is used in the same way as `mv`, but your original file stays in place.

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cp spells.txt DM/spells.txt
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ ls
    <span style="color:#FFF">classes.txt</span> <span style="color:#907AD6">DM</span> <span style="color:#FFF">spells.txt</span>
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cd DM
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND/DM</span></b>$ ls
    <span style="color:#FFF">races.txt</span> <span style="color:#FFF">spells.txt</span>
</code>
</div>
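As a compact recap of `mv` and `cp`, the following sketch repeats the rename-then-copy sequence in a temporary directory and records which files exist afterwards. The file content ("Fireball") is invented for the example.

```shell
# Work in a temporary directory with a DM subfolder, as in the examples.
workdir=$(mktemp -d)
cd "$workdir"
mkdir DM
echo "Fireball" > spels.txt      # misspelled on purpose

# mv with a new name in the same directory renames the file.
mv spels.txt spells.txt
[ -e spels.txt ] && old_name="present" || old_name="absent"

# cp leaves the original where it was and creates a second copy.
cp spells.txt DM/spells.txt
[ -e spells.txt ] && original="present" || original="absent"
[ -e DM/spells.txt ] && copy="present" || copy="absent"

echo "old name: $old_name, original: $original, copy: $copy"
```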

 <i class="fa fa-question-circle"></i> **Change directory to your `home` folder. Create a new file there called `colours.txt` and add 5 lines to it using `echo`. Then, copy that file into the `P1_TAB` directory and remove the original file. After that, change the name of `colours.txt` to `colors.txt`.**

In [1]:
%%bash

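# Write the commands you used here

cd
# This takes us to the home directory, which is /home/apeiron if we continue from the earlier example

touch colours.txt
echo -e "First line\nSecond line\nThird line\nFourth line\nFifth line" > colours.txt

# Copy the file into P1_TAB, remove the original, and rename the copy
cp colours.txt P1_TAB/colours.txt
rm colours.txt
mv P1_TAB/colours.txt P1_TAB/colors.txt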

###  <i class="fa fa-cogs"></i> Reading file contents

You have already learned how to create files, and how to move them from one folder to another. But how can you read the contents of a file using the command line?

One of the most used commands is **`cat`**. `cat` prints the **whole** contents of a file into the terminal. To use it, just type `cat` followed by the path to your file.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cat classes.txt
    <span style="color:#FFF">Artificer
    Barbarian
    Bard
    Cleric
    Druid
    Fighter
    Monk
    Paladin
    Ranger
    Rogue
    Sorcerer
    Warlock
    Wizard</span>
</code>
</div>

Often, however, you will be working with a very long file. Printing all its contents when you only want to see the first lines is not helpful, as you will have to scroll up a lot to reach the beginning (and if the file is really long, you may not even be able to reach it).

Luckily, there are other commands that allow us to partially see a file. The first one is **`head`**. This command prints, by default, the first 10 lines of a file.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ head classes.txt
    <span style="color:#FFF">Artificer
    Barbarian
    Bard
    Cleric
    Druid
    Fighter
    Monk
    Paladin
    Ranger
    Rogue</span>
</code>
</div>

You can specify the number of lines to be shown with `head -n` followed by the number of lines.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ head -n 2 classes.txt
    <span style="color:#FFF">Artificer
    Barbarian</span>
</code>
</div>

You can even use `head` to show all lines except the last X ones by adding a `-` before the number.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ head -n -2 classes.txt
    <span style="color:#FFF">Artificer
    Barbarian
    Bard
    Cleric
    Druid
    Fighter
    Monk
    Paladin
    Ranger
    Rogue
    Sorcerer</span>
</code>
</div>

The opposite to `head` is **`tail`**. It shows, by default, the last 10 lines of a file.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ tail classes.txt
    <span style="color:#FFF">Cleric
    Druid
    Fighter
    Monk
    Paladin
    Ranger
    Rogue
    Sorcerer
    Warlock
    Wizard</span>
</code>
</div>

You can also specify the number of lines using `tail -n`.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ tail -n 3 classes.txt
    <span style="color:#FFF">Sorcerer
    Warlock
    Wizard</span>
</code>
</div>

You can also show all lines except the first ones by adding a `+` before the number. In this case, the number is the line where the output starts, so `tail -n +3` skips the first two lines.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ tail -n +3 classes.txt
    <span style="color:#FFF">Bard
    Cleric
    Druid
    Fighter
    Monk
    Paladin
    Ranger
    Rogue
    Sorcerer
    Warlock
    Wizard</span>
</code>
</div>
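
As a compact recap of the `-n` variants above, here is a minimal sketch. It assumes GNU `head` and `tail` (the versions found on most Linux systems), and `demo_lines.txt` is a hypothetical file created only for this illustration.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# demo_lines.txt (hypothetical) contains the six lines "line1" ... "line6".
printf 'line%s\n' 1 2 3 4 5 6 > demo_lines.txt

head -n 2 demo_lines.txt     # first two lines: line1, line2
head -n -4 demo_lines.txt    # all but the last four lines: line1, line2
tail -n 2 demo_lines.txt     # last two lines: line5, line6
tail -n +5 demo_lines.txt    # from line 5 to the end: line5, line6
```

Note that `-n 2` and `-n +5` give the same result with `tail` here only because the file has exactly six lines.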

<center>
        <img src="https://i.pinimg.com/originals/f9/d9/87/f9d98745802f8abe704e8b739be0dc21.jpg" width=50%>
</center>

There are two more commands (pun not intended) to see the contents of a file: **`more`** and **`less`**. Both let you view the contents one "page" at a time, from the beginning to the end. The difference between them is that `less` does not need to load the whole file before displaying it, which makes it faster on large files, and it lets you move both backward and forward.

You can try these commands in part 7 of this practical.

###  <i class="fa fa-cogs"></i> Getting information from a file

Sometimes, you need to find a specific line in your file, count words, or obtain some other piece of information from it. This can be done automatically using some commands, avoiding the tedious task of opening the file and doing it yourself (which is almost impossible with files like the ones in FASTA format).

The first command you will learn is **`grep`**. `grep` filters lines that contain (or not) a given string of characters. It accepts regular expressions and is case sensitive.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ grep "Fire" spells.txt
    Delayed Blast <span style="color:#DB3A34">Fire</span>ball    7    Evocation    1 action    Yes
    Faerie <span style="color:#DB3A34">Fire</span>               1    Evocation    1 action    Yes
    <span style="color:#DB3A34">Fire</span> Bolt                 0    Evocation    1 action    No
    <span style="color:#DB3A34">Fire</span> Shield               4    Evocation    1 action    No
    <span style="color:#DB3A34">Fire</span> Storm                7    Evocation    1 action    No
    <span style="color:#DB3A34">Fire</span>ball                  3    Evocation    1 action    No
    Wall of <span style="color:#DB3A34">Fire</span>              4    Evocation    1 action    Yes
</code>
</div>

There are additional options to use with `grep` that will allow you to make more sophisticated filters.

<br>
<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> You can see all the options available for any command by typing <code><b>man</b></code> followed by the command name.
</div>

For instance, you can use **`grep -i`** to **ignore case**, **`grep -v`** to keep only the lines that **do not** contain the pattern, **`grep -E`** to use extended regular expressions and search for **more than one pattern** at once (alternative patterns are separated by `|`), or **`grep -n`** to **print line numbers**.
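
The following sketch tries each of these options on a small example. The file `demo_spells.txt` and its five lines are made up for this illustration; they are not the course's `spells.txt`.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical five-line file; note the lowercase "fireball".
printf 'Fire Bolt\nfireball\nIce Storm\nWall of Fire\nAcid Splash\n' > demo_spells.txt

grep -i "fire" demo_spells.txt       # ignore case: Fire Bolt, fireball, Wall of Fire
grep -v "Fire" demo_spells.txt       # lines without "Fire": fireball, Ice Storm, Acid Splash
grep -E "Ice|Acid" demo_spells.txt   # either pattern: Ice Storm, Acid Splash
grep -n "Fire" demo_spells.txt       # with line numbers: 1:Fire Bolt, 4:Wall of Fire
```

Because `grep` is case sensitive by default, `fireball` is matched only when `-i` is given.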

A couple of useful options are **`grep -c`** and **`grep -cw`**, which count the number of lines that contain the pattern. Let's see the difference with an example:

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ grep -c "Fire" spells.txt
    <span style="color:#FFF">7</span>
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ grep -cw "Fire" spells.txt
    <span style="color:#FFF">5</span>
</code>
</div>

`grep -cw` counts fewer lines than `grep -c` because the `w` option stands for "word". `grep -cw` counts only the lines that contain *Fire* as a whole word, while `grep -c` also counts the lines where *Fire* is part of a longer word, such as *Fireball*. You can verify both counts against the output of the first `grep` example.

Speaking of counting words, the command **`wc`** counts the number of **lines**, **words**, and **bytes** of a file, printed in that order. In a plain ASCII text file each character takes one byte, so the last number is also the number of **characters**.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ wc classes.txt
    <span style="color:#FFF">13 13 96 classes.txt</span>
</code>
</div>

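The options `wc -l`, `wc -w` and `wc -c` print only the number of lines, words and bytes, respectively. For plain ASCII text the number of bytes equals the number of characters; for files that may contain other characters, use `wc -m` to count characters.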

Sometimes, like in the `spells.txt` file above, information is stored in fields separated by `TAB`s, dots, spaces or any other character. You may want to obtain the content of a specific field. In this case, you should use the command **`cut`**. This command extracts the field given to the **`-f`** option. Let's see an example:

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cut -f1 spells.txt
    Abi-Dalzim's Horrid Wilting
    Absorb Elements
    Acid Splash
    Aganazzar's Scorcher
    Aid
    Alarm (Ritual)
    [...]
</code>
</div>

The above code extracts the first field (`f1`) separated by `TAB`. By default, `cut` expects TAB-separated values. If you have fields separated by any other character, you must specify it with the option `-d"separator"`.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cut -f4 -d" " spells.txt </code>
</div>

You can also select more than one field by separating the field numbers with a comma. For example, here fields 1 and 3 are selected:

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cut -f1,3 spells.txt
    Abi-Dalzim's Horrid Wilting    Necromancy
    Absorb Elements                Abjuration
    Acid Splash                    Conjuration
    Aganazzar's Scorcher           Evocation
    Aid                            Abjuration
    Alarm (Ritual)                 Abjuration
    [...]
</code>
</div>

To select a range of consecutive fields, use a hyphen.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/DND</span></b>$ cut -f1-3 spells.txt
    Abi-Dalzim's Horrid Wilting    8    Necromancy
    Absorb Elements                1    Abjuration
    Acid Splash                    0    Conjuration
    Aganazzar's Scorcher           2    Evocation
    Aid                            2    Abjuration
    Alarm (Ritual)                 1    Abjuration
    [...]
</code>
</div>

In the above example, fields 1 to 3 have been selected. If you need all the fields from field X to the last one, do not write a second number after the hyphen (for example, `-f2-`).
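
To see the delimiter option and the different field selections side by side, here is a minimal sketch. The comma-separated file `demo_classes.csv` is a hypothetical example, not one of the course files.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical file with three comma-separated fields per line.
printf 'Bard,Charisma,d8\nWizard,Intelligence,d6\n' > demo_classes.csv

cut -d"," -f2 demo_classes.csv     # one field: Charisma, Intelligence
cut -d"," -f1,3 demo_classes.csv   # fields 1 and 3: Bard,d8 and Wizard,d6
cut -d"," -f2- demo_classes.csv    # field 2 to the last: Charisma,d8 and Intelligence,d6
```

When several fields are selected, `cut` joins them in the output with the same delimiter it used to split the input.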

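The next command, **`uniq`**, is very useful for files with repeated lines. You can remove all the repetitions but one (`uniq`), obtain ONLY the lines that are not repeated (`uniq -u`), obtain only the lines that are repeated (`uniq -d`), and even count the number of times each line is repeated (`uniq -c`). Keep in mind that `uniq` only compares **adjacent** lines, so repeated lines must be next to each other (for example, in a sorted file) to be detected.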
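
The sketch below shows the four variants on a made-up file, `demo_schools.txt`. Since `uniq` only compares adjacent lines, the file is first ordered with `sort` and passed on with `|`; both are explained later in this practical.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical file: Evocation x3, Abjuration x2, Necromancy x1, not sorted.
printf 'Evocation\nAbjuration\nEvocation\nNecromancy\nEvocation\nAbjuration\n' > demo_schools.txt

sort demo_schools.txt | uniq      # one copy of each: Abjuration, Evocation, Necromancy
sort demo_schools.txt | uniq -u   # never repeated: Necromancy
sort demo_schools.txt | uniq -d   # repeated: Abjuration, Evocation
sort demo_schools.txt | uniq -c   # counts: 2 Abjuration, 3 Evocation, 1 Necromancy
```

Running `uniq` directly on the unsorted file would print all six lines, because no two identical lines are adjacent.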

###  <i class="fa fa-cogs"></i> Modifying the contents of a file

Sometimes you do not want to extract information from a file but to modify its contents. In the next section you will learn how to save the outputs into files. For now, you'll see some commands that can be used to alter the contents of a file.

In Python, the method `translate` allows you to replace one character with another. The command **`tr`** does the same in Linux. Imagine that you want to transcribe the following DNA sequence into RNA.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat dna.txt
CACGGAGTTGTTTAGTTGTAATTATTGTACGCATAAGGATTGGTATCGTTGGGGGGATAATAAGCA
</code>
</div>

You just have to pass the DNA sequence to `tr` using `|` and change all Ts into Us.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat dna.txt | tr 'T' 'U'
CACGGAGUUGUUUAGUUGUAAUUAUUGUACGCAUAAGGAUUGGUAUCGUUGGGGGGAUAAUAAGCA
</code>
</div>

<div style="background-color:#ffddad;">  
    <i class="fa fa-info-circle"></i> <code>tr</code> does not replace whole patterns. It works character by character: each character of the first string is replaced by the character at the same position in the second string.
</div>

The `tr` command can also be used to squeeze letters that are repeated in tandem into a single one using **`tr -s`**, or to delete a given character from the string using **`tr -d`** (followed in both cases by the character to squeeze/delete).
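
A few short, self-contained examples may help here. The sequences are invented for the illustration and are passed to `tr` with `echo` and a pipe.

```shell
# Squeeze runs of the same character into a single one.
echo "AAAACCGGGT" | tr -s 'A'     # ACCGGGT

# Delete every occurrence of a character.
echo "ACNNGTNAC" | tr -d 'N'      # ACGTAC

# Character-by-character mapping: A->T, C->G, G->C, T->A.
echo "ACGT" | tr 'ACGT' 'TGCA'    # TGCA
```

The last command shows that `tr` maps characters by position, which is why it can complement a DNA sequence in a single step.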

Another useful command is **`sort`**, which, as the name states, orders the lines of a file. By default, lines are ordered lexicographically (**including numbers**, so `10` comes before `2`). You can sort in **reverse order** using **`sort -r`**, or sort **by field** using **`sort -k`** (followed by the field number). Numerical sorting can also be specified using **`sort -n`**.

**`fold`** is a command that may not seem useful at first, but that can be used to obtain information in a very elegant way. This command takes a file and folds each line to fit a specific number of characters (specified using **`-w`**). For example:
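
The difference between lexicographic and numerical sorting is easiest to see on a small example. The file `demo_levels.txt` below is hypothetical: each line holds a one-word name and a number separated by a space.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical file: name in field 1, number in field 2.
printf 'Wish 9\nAid 2\nFireball 3\nGate 10\n' > demo_levels.txt

sort demo_levels.txt          # by whole line: Aid 2, Fireball 3, Gate 10, Wish 9
sort -r demo_levels.txt       # reversed: Wish 9, Gate 10, Fireball 3, Aid 2
sort -k2 demo_levels.txt      # field 2 as text: Gate 10, Aid 2, Fireball 3, Wish 9
sort -k2 -n demo_levels.txt   # field 2 as numbers: Aid 2, Fireball 3, Wish 9, Gate 10
```

Without `-n`, the value `10` is placed before `2` because the comparison is done character by character and `1` comes before `2`.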

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ fold -w 3 dna.txt
CAC
GGA
GTT
GTT
TAG
TTG
TAA
TTA
TTG
TAC
GCA
TAA
GGA
TTG
GTA
TCG
TTG
GGG
GGA
TAA
TAA
GCA
</code>
</div>

Finally, there are 3 commands that can be used to combine information from 2 files: `paste`, `join` and `cat`.

The difference among them is the way they combine files: **`paste`** combines files **horizontally** (line by line, side by side), **`join`** combines them **by fields** (matching lines that share the same value in a field, the first one by default), and **`cat`** combines them **vertically** (one file after the other).
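
As an illustration of the three ways of combining files, consider the following sketch. All four files are hypothetical and created on the spot; note that `join` expects both inputs to be sorted on the field used for matching, which is the case here.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Two hypothetical one-column files.
printf 'Bard\nWizard\n' > names.txt
printf 'd8\nd6\n' > dice.txt

paste names.txt dice.txt   # side by side, TAB-separated: "Bard  d8" and "Wizard  d6"
cat names.txt dice.txt     # one after the other: Bard, Wizard, d8, d6

# Two hypothetical files that share their first field.
printf 'Bard Charisma\nWizard Intelligence\n' > ability.txt
printf 'Bard d8\nWizard d6\n' > hitdie.txt

join ability.txt hitdie.txt   # matched on field 1: "Bard Charisma d8", "Wizard Intelligence d6"
```

`paste` simply pairs lines by position, whereas `join` pairs them by content, so only `join` still gives meaningful results when the two files list their entries in different amounts or with some entries missing.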

<div style="background-color: #86CBBB; 1px; height:3px " ></div>

# 6. Pipes and redirections


###  <i class="fa fa-cogs"></i> Pipes

Pipelines are created by combining multiple instructions in a row, so that the output of one is the input for the next. In Linux, pipes (**`|`**) work the same way. There was an example above of a pipe, when you learned `tr`. In that case, the output of the `cat` command was sent to `tr` as an input.

<br>
<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat dna.txt | tr 'T' 'U'
CACGGAGUUGUUUAGUUGUAAUUAUUGUACGCAUAAGGAUUGGUAUCGUUGGGGGGAUAAUAAGCA
</code>
</div>

You can pipe more than two commands at the same time, creating what are called **one-liners**. The following one-liner converts a DNA sequence into RNA, separates each codon in a different line, and then filters only those lines that contain the codon "UAA".

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat dna.txt | tr 'T' 'U' | fold -w 3 | grep "UAA"
UAA
UAA
UAA
UAA
</code>
</div>

Pipes are very useful for combining many commands without having to create intermediate files that you will have to remove afterwards.
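
As a further illustration, the one-liner below chains six commands from part 5 to rank the codons of the example sequence by frequency. It is only a sketch: `demo_dna.txt` is a hypothetical copy of the sequence shown above, and the reading frame is assumed to start at the first base.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical copy of the example DNA sequence used in this practical.
echo "CACGGAGTTGTTTAGTTGTAATTATTGTACGCATAAGGATTGGTATCGTTGGGGGGATAATAAGCA" > demo_dna.txt

# transcribe -> one codon per line -> group equal codons -> count -> rank -> top 3
cat demo_dna.txt | tr 'T' 'U' | fold -w 3 | sort | uniq -c | sort -n -r | head -n 3
# The first two lines report a count of 4 (UAA and UUG); the third is GGA with 3.
```

No intermediate file is written at any step: each command reads the output of the previous one directly.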

 <i class="fa fa-question-circle"></i> **Combine the commands `cat`, `head` and/or `tail` to print ONLY the 3rd line of the `colors.txt` file that you have created in the `P1_TAB` directory.**

In [2]:
%%bash

# Write here the commands used

cat colors.txt | head -n 3 | tail -n 1
# head keeps the first 3 lines and tail keeps the last of those, i.e. the 3rd line


###  <i class="fa fa-cogs"></i> Redirections

Until now, all the outputs shown in the examples have been displayed in the terminal but not saved to any file. Remember the `>` and `>>` symbols from the earlier example of the `echo` command? Probably not, but those symbols are used as output redirectors.

There are 3 **input/output redirectors**: **`<`**, **`>`**, and **`>>`**.

1. **`<`(input file)**: you pass a file as an input to a command.

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat < dna.txt
CACGGAGTTGTTTAGTTGTAATTATTGTACGCATAAGGATTGGTATCGTTGGGGGGATAATAAGCA
</code>
</div>

2. **`>`(output file)**: you save the output of a command into a file, instead of showing it in the terminal. If the file already exists, its content is overwritten.

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat dna.txt | tr 'T' 'U' > rna.txt
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat rna.txt
CACGGAGUUGUUUAGUUGUAAUUAUUGUACGCAUAAGGAUUGGUAUCGUUGGGGGGAUAAUAAGCA
</code>
</div>


3. **`>>`(append file)**: you append the output of a command to a file. If the file does not exist, it is created.

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ echo "CACGCATCCTCCTTACGCTAGTCTTCTGGATCC" >> dna.txt
</code>
</div>

<div style="background-color:#222D32; color:#FFF">
    <code style="background-color:#222D32; color:#FFF">&emsp;(TAB)<b><span style="color:#86CBBB">1244149@MRB014</span>:<span style="color:#ffddad">~/Desktop/TAB</span></b>$ cat < dna.txt
CACGGAGTTGTTTAGTTGTAATTATTGTACGCATAAGGATTGGTATCGTTGGGGGGATAATAAGCA
CACGCATCCTCCTTACGCTAGTCTTCTGGATCC
</code>
</div>
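
The difference between overwriting and appending can be checked with a short experiment. This is a minimal sketch in which `demo_out.txt` is a hypothetical file created in a temporary directory.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "first"  > demo_out.txt    # the file does not exist yet, so it is created
echo "second" > demo_out.txt    # > overwrites: the file now contains only "second"
echo "third" >> demo_out.txt    # >> appends: the file now contains "second" and "third"

cat < demo_out.txt              # < passes the file to cat as its input
```

Because `>` replaces the previous contents without asking, it is worth double-checking the file name before redirecting into an existing file.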

 <i class="fa fa-question-circle"></i> **Practice redirections: Why don't you run some commands from part 5 again and save their outputs into files?**

<i class="fa fa-comment"></i> ...

<div style="background-color: #86CBBB; 1px; height:3px " ></div>

# The end: Integrative exercise


In your research group you're analyzing the proteome (the set of proteins expressed by an organism or cell) of different species. This information is available for downloading here: https://github.com/marta-coronado/TAB-data-figs/raw/refs/heads/main/PFAM_seqs.zip

Integrating the different commands that you have learned during this practical, answer the following questions:

1. Once you have downloaded the data (use `wget`!), unzip the file with the proper command according to the type of compression.
2. Indicate the basic properties of this data: how many folders and files have you unzipped? How large are the files and folders? How many lines do they have? etc.

Move the entire folder to the `P1_TAB` folder, into a folder called <code>IntegrativeExerciseP1</code>. Answer the following biological questions:

1. How many proteins does the human proteome have? And the mouse proteome? (The human proteome is stored in the file <code>Homo_sapiens.GRCh38.pep.all.fa</code> and the mouse proteome in <code>Mus_musculus.GRCm38.pep.all.fa</code>; both have the structure of a regular FASTA file.)
2. For each of the files that contain the longest protein (the files that have the extension `.liso`), find how many sequences it contains.
3. List all protein names without the comment (the comment always starts after the first space in the sequence identifier of the FASTA file).
4. How many genes are contained in the human genome?
5. How many genes have a single isoform?
6. How many genes have more than one isoform?
7. How many domains are there in the human proteome? How many are different? (The domains are in the file `Homo_sapiens.GRCh38.pep.all.fa.liso.dom` and always start with the identifier PF. Since the fields of the file are separated by multiple spaces, first use the command `sed -r 's/ +/ /g'` to replace them with a single space.)
8. How many times does the 19th most frequent domain appear? Which domain is it?


In [None]:
%%bash

#1:
cd
wget https://github.com/marta-coronado/TAB-data-figs/raw/refs/heads/main/PFAM_seqs.zip
unzip PFAM_seqs.zip

#2
# There are 2 folders.
cd pfam
wc -l Gallus_gallus.Galgal4.pep.all.fa.liso.dom
#29955
wc -l Homo_sapiens.GRCh38.pep.all.fa.liso.dom
#47173
wc -l Mus_musculus.GRCm38.pep.all.fa.liso.dom
#43355
cd ..
cd seqs
wc -l Gallus_gallus.Galgal4.pep.all.fa.liso
wc -l Homo_sapiens.GRCh38.pep.all.fa
wc -l Homo_sapiens.GRCh38.pep.all.fa.liso
wc -l Mus_musculus.GRCm38.pep.all.fa
wc -l Mus_musculus.GRCm38.pep.all.fa.liso

#1
grep -c ">" Mus_musculus.GRCm38.pep.all.fa
#58668
grep -c ">" Homo_sapiens.GRCh38.pep.all.fa
#102915

#2
grep -c ">" Mus_musculus.GRCm38.pep.all.fa.liso
#22769
grep -c ">" Homo_sapiens.GRCh38.pep.all.fa.liso
#22964
grep -c ">" Gallus_gallus.Galgal4.pep.all.fa.liso
#15508

#3
grep ">" Gallus_gallus.Galgal4.pep.all.fa.liso | cut -d " " -f1


#5
uniq