# Introduction to Coding

### Magic functions  
IPython provides a set of predefined *magic functions* that you can call with a command-line-style syntax. There are two kinds of magics, line-oriented and cell-oriented:

* **Line magics** are prefixed with the `%` character and work much like OS command-line calls: they get as an argument the rest of the line, where arguments are passed without parentheses or quotes.

* **Cell magics** are prefixed with a double `%%`, and they are functions that get as an argument not only the rest of the line, but also the lines below it in a separate argument.
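
To make the difference between the two signatures concrete, here is a minimal sketch in plain Python. The functions `shout` and `count_lines` are hypothetical names, not built-in magics. Inside an IPython session they could be registered with the `register_line_magic` and `register_cell_magic` decorators from `IPython.core.magic`; here they are called directly so that the example runs in any Python interpreter.

```python
def shout(line):
    """Line-magic style: receives the rest of the line as a single string."""
    return line.upper()


def count_lines(line, cell):
    """Cell-magic style: receives the rest of the first line and the cell body."""
    n = len(cell.splitlines())
    return f"{line.strip() or 'cell'}: {n} line(s)"


# Roughly what IPython would pass for `%shout hello world`
print(shout("hello world"))

# Roughly what IPython would pass for a cell that starts with `%%count_lines demo`
print(count_lines("demo", "a = 1\nb = 2\n"))
```

Note that the arguments arrive as raw strings, which is why magics are written without parentheses or quotes.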

These are the available **magic functions**:

In [None]:
%lsmagic

### How to measure the execution time

#### With **line magic**:

In [None]:
def make_squares(n):
    """Calculate the squares of the first n non-negative integers."""

    results = []
    for i in range(n):
        results.append(i ** 2)
    return results

In [None]:
# Time the calculation of the squares of the first 1000 numbers
%timeit make_squares(10 ** 3)

#### With **cell magic**:

In [None]:
%%timeit -n 10000 -r 7

make_squares(10 ** 3)
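
Outside a notebook, the standard-library `timeit` module gives comparable measurements. The sketch below assumes the usual correspondence between the magic's options and the module's parameters: `-n` maps to `number` (loops per run) and `-r` maps to `repeat` (number of runs). The loop count is reduced to 100 so the example finishes quickly, and `make_squares` is redefined so that the block is self-contained.

```python
import timeit


def make_squares(n):
    """Return the squares of the first n non-negative integers."""
    results = []
    for i in range(n):
        results.append(i ** 2)
    return results


# 7 runs of 100 loops each; every entry is the total time of one run in seconds
run_times = timeit.repeat(lambda: make_squares(10 ** 3), number=100, repeat=7)

# Report the fastest run, converted to microseconds per loop
best_per_loop = min(run_times) / 100
print(f"best of 7 runs: {best_per_loop * 1e6:.1f} us per loop")
```

The absolute numbers depend on the machine, so only the relative differences between implementations are meaningful.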

### Speed up the execution with Cython

To use the `Cython` compiler, install it first with either pip or conda:
* `pip install -U Cython`
* `conda install -c anaconda cython`

In [None]:
# load Cython
%load_ext Cython

This is a function that calculates the square of a number, run by the standard `CPython` interpreter:

In [None]:
def make_square_1(x):
    return x * x

This is the same function, compiled with `Cython`:

In [None]:
%%cython

def make_square_2(x):
    return x * x

This is the same function again, with a static C type (`int`) declared for its argument so that `Cython` can optimize it:

In [None]:
%%cython

def make_square_3(int x):
    return x * x

Now let's check the execution times:

In [None]:
%timeit -r 5 -n 10_000_000 make_square_1(10**6)  # 1.000.000

In [None]:
%timeit -r 5 -n 10_000_000 make_square_2(10**6)  # 1.000.000

In [None]:
%timeit -r 5 -n 10_000_000 make_square_3(10**6)  # 1.000.000

### Speed up the execution with concurrency

#### Calculate the nucleotide abundance of the first 10 human chromosomes
The chromosome sequences were downloaded from [Ensembl release 113](https://ftp.ensembl.org/pub/release-113/fasta/homo_sapiens/dna/).

In [None]:
# We import the required modules
import gzip
from multiprocessing import Pool
import matplotlib.pyplot as plt

#### Functions we are going to use

In [None]:
def parse_fasta_file(chromosome):
    """Count the bases of one chromosome and return them as percentages."""
    bases = {'A': 0, 'C': 0, 'G': 0, 'T': 0, 'N': 0}
    fasta_file = f'../data/Homo_sapiens.GRCh38.dna.chromosome.{chromosome}.fa.gz'
    
    with gzip.open(fasta_file, 'rb') as fd:
        for line in fd:
            line = line.decode().strip().upper()
            if line.startswith('>'):
                continue
            for base in 'ACTGN':
                bases[base] += line.count(base)

    percentages = create_percentages(bases) 
    return chromosome, percentages


def create_percentages(bases):
    """Convert absolute base counts to percentages rounded to two decimals."""
    length = sum(bases.values())
    percentages = dict()
    for base, value in bases.items():
        percentages[base] = round(value / length * 100, 2)
    return percentages


def plot_nucleotides(results, chromosomes=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)):
    """Plot the percentage of each nucleotide per chromosome."""
    # A tuple default avoids the shared mutable-default pitfall
    fig, ax = plt.subplots(figsize=(5, 3))

    for base in 'ATCG':
        y = [results[chrom][base] for chrom in chromosomes]
        ax.plot(chromosomes, y, 'o-', linewidth=2, alpha=0.5, label=base)

    # Put one tick on each plotted chromosome
    ax.set_xticks(list(chromosomes))
    ax.set_xlabel('Chromosome')
    ax.set_ylabel('Nucleotides (%)')
    
    ax.legend(loc=[1.02, 0])
    plt.tight_layout()
    plt.show()
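
The counting and percentage steps can be checked without downloading any data. This sketch applies the same logic to a short, made-up record. `count_bases` is a hypothetical helper that mirrors the inner loop of `parse_fasta_file`, and `create_percentages` is repeated here so that the block runs on its own.

```python
def count_bases(lines):
    """Count A, C, G, T and N over an iterable of FASTA lines, skipping headers."""
    bases = {'A': 0, 'C': 0, 'G': 0, 'T': 0, 'N': 0}
    for line in lines:
        line = line.strip().upper()
        if line.startswith('>'):
            continue
        for base in bases:
            bases[base] += line.count(base)
    return bases


def create_percentages(bases):
    """Convert absolute counts to percentages rounded to two decimals."""
    length = sum(bases.values())
    return {base: round(value / length * 100, 2) for base, value in bases.items()}


# Toy record, made up for illustration (not real chromosome data)
toy_fasta = ['>toy', 'ACGTN', 'aacc', 'G']
counts = count_bases(toy_fasta)
percentages = create_percentages(counts)
print(counts)       # {'A': 3, 'C': 3, 'G': 2, 'T': 1, 'N': 1}
print(percentages)  # {'A': 30.0, 'C': 30.0, 'G': 20.0, 'T': 10.0, 'N': 10.0}
```

The header line is skipped and lowercase bases are counted too, because every line is converted to uppercase before counting.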

In [None]:
# We analyze the first 10 chromosomes
chromosomes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

#### Single-core processing

In [None]:
%%time

results_singlecore = {}

for chromosome in chromosomes:
    chrom, percentages = parse_fasta_file(chromosome)
    results_singlecore[chrom] = percentages
    print(f'Chromosome {chrom}: {percentages}')

In [None]:
# Plot the results
plot_nucleotides(results_singlecore)

#### Multicore processing

In [None]:
%%time

NUM_CPU = 10  # Number of worker processes; ideally no more than the available CPU cores
results_multicore = {}

with Pool(NUM_CPU) as pool:
    for chrom, percentages in pool.imap_unordered(parse_fasta_file, chromosomes):
        results_multicore[chrom] = percentages
        print(f'Chromosome {chrom}: {percentages}')    
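
The hard-coded pool size assumes that the machine has at least ten cores available. A more portable choice, sketched here, caps the pool size at both the number of tasks and the number of CPUs reported by `os.cpu_count()`. `pick_workers` is a hypothetical helper and not part of the original notebook.

```python
import os


def pick_workers(n_tasks, max_workers=None):
    """Return a pool size that exceeds neither the task count nor the CPU count."""
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    limit = cpus if max_workers is None else min(cpus, max_workers)
    return max(1, min(n_tasks, limit))


chromosomes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
NUM_CPU = pick_workers(len(chromosomes))
print(f'Using {NUM_CPU} worker process(es)')
```

With fewer cores than chromosomes, the pool simply processes the remaining files as workers become free, so the results are unchanged and only the elapsed time differs.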

In [None]:
# Plot the results
plot_nucleotides(results_multicore)

#### Check that the results are the same

In [None]:
results_multicore == results_singlecore
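
This comparison is valid even though `imap_unordered` yields chromosomes in completion order: Python dictionaries compare equal when they hold the same key-value pairs, regardless of insertion order. A small sketch with made-up percentages illustrates this.

```python
# Made-up values for illustration only
in_order = {1: {'A': 29.0, 'T': 29.1}, 2: {'A': 29.8, 'T': 29.9}}

# Same content inserted in a different order, as an unordered pool might do
out_of_order = {}
out_of_order[2] = {'T': 29.9, 'A': 29.8}
out_of_order[1] = {'T': 29.1, 'A': 29.0}

same_content = in_order == out_of_order
same_key_order = list(in_order) == list(out_of_order)
print(same_content)    # True
print(same_key_order)  # False
```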