# <center>  Anti-rosalind Rosalind club

## Seminar 1, 11.10.2022

This is a seminar on solving bioinformatics problems from the [Rosalind](https://rosalind.info/problems/locations/) platform.

Teaching assistants:

- Nikita Vaulin, Nikita.Vaulin@skoltech.ru, tg: @nvaulin
    
- Oksana Kotovskaya, Oksana.Kotovskaya@skoltech.ru

***Notion 0***. Feel free to start using shortcuts when working with Jupyter Notebook:
- To run the cell: `Ctrl` + `Enter`
- To run the cell and move to the next one: `Shift` + `Enter`
- To run the cell and create a new one below: `Alt` + `Enter`

There are two modes of action: cell-editing (the line on the left is green) and cell-selecting (the line on the left is blue). 

- cell-editing mode &#8594; `Esc` &#8594; cell-selecting mode
- cell-selecting mode &#8594; `Enter` &#8594; cell-editing mode

In cell-selecting mode you can:

- Delete a cell: `DD`
- Undo the deletion: `Z`


***Notion 1***. On the Rosalind platform, input data is provided as a .txt file for download. For this reason, it is highly recommended to create a separate directory for the seminar on your computer. Inside it, you can conveniently store the .ipynb notebooks and create two additional directories: one for input files and one for output files.

In [1]:
#Provide here your own directories
input_dir = 'input/'
output_dir = 'output/'
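
If the two directories do not exist yet, they can be created straight from the notebook. This is a minimal sketch, assuming the `input/` and `output/` names used above (replace them with your own); `exist_ok=True` makes the cell safe to run more than once.

```python
import os

# Assumed directory names, the same as in the cell above
input_dir = 'input/'
output_dir = 'output/'

for d in (input_dir, output_dir):
    # exist_ok=True: do nothing if the directory is already there
    os.makedirs(d, exist_ok=True)
```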

***Notion 2***. On the Rosalind platform, it is not the program code that needs to be submitted as an answer, but the final result. The result of the program can be attached as a file, or copied and pasted into the text field. In the case of a file attachment, the resulting files should be stored in a separate directory as described above.

If you want to copy and paste text (as we will do, at least at first), it is very convenient to use the Python library `pyperclip`. It allows you to copy data to the clipboard and paste from it.

To install this library, write on the command line:

```
pip install pyperclip
```

To open the command line in Windows, press `Win`+`R`, type `cmd` and hit `Enter`. 

In [2]:
# First of all, we need to import the library we have just installed
import pyperclip

It exchanges text with the clipboard through two functions:
```
pyperclip.copy(text)      # put text on the clipboard
text = pyperclip.paste()  # read the clipboard contents as a string
```

After `pyperclip.copy()` has been executed, you can just use `Ctrl`+`V` to paste the text wherever you want.

## <center> Let's move on!

### #2. Bioinformatics Stronghold

#### DNA: Counting DNA Nucleotides

In [3]:
# You can specify the name of the file to open here
filename = f'{input_dir}rosalind_dna.txt'

with open(filename, "r") as f:
    seq = f.readline().upper().strip()

numbers = []
for n in ['A', 'C', 'G', 'T']:
    numbers.append(str(seq.count(n)))
    
print(' '.join(numbers))
pyperclip.copy(' '.join(numbers))

221 206 185 219
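
The same counts can also be collected in a single pass over the sequence with `collections.Counter` from the standard library. This is only a sketch on a short toy sequence (not a Rosalind dataset), without the file reading and the clipboard:

```python
from collections import Counter

# Toy sequence for illustration; in the seminar it would come from the input file
seq = 'AGCTTTTCATTCTGACTGCA'

counts = Counter(seq)  # one pass: {'A': ..., 'C': ..., ...}
# A Counter returns 0 for a missing nucleotide, so no extra checks are needed
result = ' '.join(str(counts[n]) for n in 'ACGT')
print(result)  # 4 5 3 8
```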


#### RNA: Transcribing DNA into RNA

Here we provide you with two solutions: one uses a `for` loop and the other applies the string method `.replace()`. The `%%time` cell magic prints the time needed to run the cell. Look at the CPU time and compare these two ways! Imagine that your data is several orders of magnitude bigger.

Also, we will not print long sequences. Don't be afraid: it works, the result is in your clipboard, just use `Ctrl`+`V`.

In [4]:
%%time

filename = f'{input_dir}rosalind_rna.txt'

with open(filename, "r") as f:
    seq = f.readline().upper().strip()

rna = ''
for n in seq:
    if n == "T":
        rna += "U"
    else:
        rna += n

pyperclip.copy(rna)

CPU times: total: 15.6 ms
Wall time: 14 ms


Another solution

In [5]:
%%time

filename = f'{input_dir}rosalind_rna.txt'

with open(filename, "r") as f:
    seq = f.readline().upper().strip()

rna = seq.replace('T', 'U')

pyperclip.copy(rna)

CPU times: total: 0 ns
Wall time: 3 ms
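
To see how the two approaches behave on bigger data, you can time them outside of `%%time` with the standard `timeit` module. The sketch below is an illustration under assumptions: the sequence is synthetic (a short fragment repeated many times), and the function names `transcribe_loop` and `transcribe_replace` are made up for this example. The exact timings depend on your machine, so none are shown here.

```python
import timeit

# Synthetic sequence, much longer than a typical Rosalind input
seq = 'GATGGAACTTGACTACGTAAATT' * 10_000

def transcribe_loop(dna):
    # Build the RNA string character by character
    rna = ''
    for n in dna:
        rna += 'U' if n == 'T' else n
    return rna

def transcribe_replace(dna):
    # Let the built-in string method do all the work
    return dna.replace('T', 'U')

# Both approaches must give the same answer
assert transcribe_loop(seq) == transcribe_replace(seq)

# Run each function 5 times and report the total time in seconds
t_loop = timeit.timeit(lambda: transcribe_loop(seq), number=5)
t_replace = timeit.timeit(lambda: transcribe_replace(seq), number=5)
print(f'loop: {t_loop:.4f} s, replace: {t_replace:.4f} s')
```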


#### REVC: Complementing a Strand of DNA

In [6]:
filename = f'{input_dir}rosalind_revc.txt'

with open(filename, "r") as f:
    seq = f.readline().upper().strip()

def reverse(seq):
    return seq[::-1]

def complement(seq):
    seq = seq.replace('A', 't')
    seq = seq.replace('T', 'a')
    seq = seq.replace('G', 'c')
    seq = seq.replace('C', 'g')
    return seq.upper()

pyperclip.copy(complement(reverse(seq))) 
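
As an alternative to the chain of `.replace()` calls, the complement can be taken with a translation table built by `str.maketrans`. This is a sketch, not a replacement for the solution above; the names `COMPLEMENT` and `reverse_complement` are introduced here only for illustration.

```python
# Translation table: each nucleotide is mapped to its complement
COMPLEMENT = str.maketrans('ACGT', 'TGCA')

def reverse_complement(seq):
    # Complement every base in one pass, then reverse the string
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement('AAAACCCGGT'))  # ACCGGGTTTT
```

Because all bases are swapped at once, there is no need for the lowercase trick that keeps already replaced letters from being replaced again.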

#### FIB: Rabbits and Recurrence Relations

If you are having trouble understanding the problem statement, feel free to contact the teaching assistants. 

Here we are faced with a slightly abstract but interesting problem, and the idea behind it comes up very often in bioinformatics. Here are two ways to solve it. The first is very simple and fast. The second, recursive one is slower here, but remember it as a template that will be very useful in complex cases with many inputs, conditions, and outcomes.

In [7]:
%%time

filename = f'{input_dir}rosalind_fib.txt'

with open(filename, "r") as f:
    n, k = map(int, f.readline().strip().split())

F = [0]*(n+1)
F[0] = 0
F[1] = 1

for i in range(2,n+1):
    F[i] = F[i-1] + k*F[i-2]

print(F[-1])
pyperclip.copy(F[-1])

574888488199
CPU times: total: 0 ns
Wall time: 2 ms


In [8]:
%%time

filename = f'{input_dir}rosalind_fib.txt'

with open(filename, "r") as f:
    n, k = map(int, f.readline().strip().split())


def F(n, k):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return F(n-1,k) + k * F(n-2, k)

    
res = F(n,k)
print(res)
pyperclip.copy(res)

574888488199
CPU times: total: 1.28 s
Wall time: 1.28 s
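
The recursive version is slow because it recomputes the same values of `F` over and over. One common way to keep the recursive template and avoid the repeated work is to cache the results with `functools.lru_cache`. A minimal sketch, assuming the same recurrence as above; the name `F_cached` is made up for this example:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # remember every (n, k) result that was already computed
def F_cached(n, k):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return F_cached(n - 1, k) + k * F_cached(n - 2, k)

print(F_cached(5, 3))  # 19
```

With the cache, each value is computed only once, so the number of calls grows linearly with `n` instead of exponentially.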


#### GC: Computing GC Content

In [9]:
def gc_count(seq):
    g = seq.upper().count('G')
    c = seq.upper().count('C')
    if len(seq) == 0:
        return 0
    gc = (g+c)*100/len(seq)
    return gc

names = []
seqs = []
filename = f'{input_dir}rosalind_gc.txt'

with open(filename, "r") as f:
    seq = ''
    for line in f:
        if not line.startswith(">"):
            seq += line.strip()
        else:
            names.append(line.strip()[1:])
            seqs.append(seq)
            seq = ''
    seqs.append(seq)

seqs = seqs[1:]   

_max = 0
index = 0
for i, seq in enumerate(seqs):
    gc = gc_count(seq)  # compute the GC content once per sequence
    if gc > _max:
        _max = gc
        index = i

res = f'{names[index]}\n{_max}'
print(res)
pyperclip.copy(res)

Rosalind_9288
52.36051502145923
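
FASTA files appear in many Rosalind problems, so it can be handy to move the parsing into a function. The sketch below is one possible way to do it, under assumptions: the helper names `parse_fasta` and `gc_content` are hypothetical, and the records are toy data, not a Rosalind dataset. It collects the sequences into a dictionary and picks the record with the highest GC content using `max` with a `key` function.

```python
def gc_content(seq):
    """GC percentage of a sequence; 0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) * 100 / len(seq) if seq else 0

def parse_fasta(lines):
    """Collect FASTA records from an iterable of lines into {name: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            name = line[1:]        # header line: start a new record
            records[name] = ''
        elif name is not None:
            records[name] += line  # sequence line: extend the current record
    return records

# Toy FASTA input; with a real file you could pass the open file object instead
fasta = """>seq_1
AATT
AAGC
>seq_2
GGCC
ATGC
""".splitlines()

records = parse_fasta(fasta)
best = max(records, key=lambda name: gc_content(records[name]))
print(best)                       # seq_2
print(gc_content(records[best]))  # 75.0
```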
