# Using NumPy to accelerate numeric operations

The following code simulates random DNA sequences as a matrix where each row is one sequence and each base is encoded as a number: 0 (A), 1 (C), 2 (T) or 3 (G). We want to count the number of Cs across all the sequences.

In [None]:
import numpy as np
# we set a random seed to make things reproducible
np.random.seed(1)

n_sequences = 10
sequence_length = 30
sequences = np.random.randint(0, 4, (n_sequences, sequence_length))
print(sequences)
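
To make the integer encoding concrete, NumPy fancy indexing can invert it: indexing a lookup array of letters with the integer matrix returns a matrix of letters with the same shape. This is a minimal sketch that assumes the 0→A, 1→C, 2→T, 3→G mapping described above. `BASES` and `decode` are hypothetical names used only for illustration, and `example` is a small made-up matrix.

```python
import numpy as np

# Lookup table: position i holds the letter encoded by integer i (0=A, 1=C, 2=T, 3=G)
BASES = np.array(["A", "C", "T", "G"])

def decode(sequences):
    """Turn an integer-encoded matrix into one string per row (hypothetical helper)."""
    letters = BASES[sequences]  # fancy indexing: same shape as sequences, one letter per cell
    return ["".join(row) for row in letters]

example = np.array([[0, 1, 2, 3],
                    [1, 1, 0, 2]])
print(decode(example))  # ['ACTG', 'CCAT']
```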

We first use vanilla Python code to count the number of Cs:

In [None]:
def count_c(sequences):
    # Visit every base of every sequence and count the ones encoded as 1 (C)
    number_of_cs = 0
    for sequence in sequences:
        for base in sequence:
            if base == 1:
                number_of_cs += 1
    return number_of_cs

In [None]:
%time print("Result: ", count_c(sequences))

**TASK**

Try changing `n_sequences` to a larger number (e.g. 1000000), rerun the cell that creates `sequences`, and see how long the above code takes to run.

Next, we implement the same functionality with NumPy. How much faster is NumPy than pure Python?

In [None]:
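def count_c_using_numpy(sequences):
    # Element-wise comparison: a boolean array that is True wherever the base is C
    is_c = sequences == 1
    # Summing booleans counts the True values
    return np.sum(is_c)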
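
The expression `sequences == 1` compares every element at once and returns a boolean array of the same shape. When the array is summed, `True` counts as 1 and `False` as 0, so `np.sum` on the mask gives the number of Cs. Here is a small worked example on a made-up 2×3 matrix, showing `np.count_nonzero` as an equivalent way to count the `True` values.

```python
import numpy as np

toy = np.array([[0, 1, 1],
                [3, 1, 2]])

is_c = toy == 1  # element-wise comparison, no Python loop
print(is_c)
# [[False  True  True]
#  [False  True False]]

print(np.sum(is_c))            # 3: each True is summed as 1
print(np.count_nonzero(is_c))  # 3: counts the True values directly
```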

In [None]:
%time print("Result: ", count_c_using_numpy(sequences))
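
`%time` reports a single run, which can be noisy. The sketch below uses the standard-library `timeit` module to time both approaches on the same larger matrix and prints the ratio. The matrix size (20,000 × 30), the seed and the repeat count are arbitrary choices, and the speedup will depend on your machine. The two functions are repeated so the block runs on its own.

```python
import timeit
import numpy as np

def count_c(sequences):
    # Pure-Python version: visit every element in nested loops
    number_of_cs = 0
    for sequence in sequences:
        for base in sequence:
            if base == 1:
                number_of_cs += 1
    return number_of_cs

def count_c_using_numpy(sequences):
    # Vectorized version: boolean mask, then sum
    return np.sum(sequences == 1)

rng = np.random.default_rng(1)
big = rng.integers(0, 4, size=(20_000, 30))

# Both implementations must agree before comparing speed
assert count_c(big) == count_c_using_numpy(big)

# Best of 3 runs, each run calling the function once
t_python = min(timeit.repeat(lambda: count_c(big), number=1, repeat=3))
t_numpy = min(timeit.repeat(lambda: count_c_using_numpy(big), number=1, repeat=3))
print(f"pure Python: {t_python:.3f} s, NumPy: {t_numpy:.5f} s")
print(f"NumPy is roughly {t_python / t_numpy:.0f}x faster on this run")
```

Taking the minimum of several runs is a common way to reduce the influence of background activity on the measurement.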

**TASK**

Assume we have the following base qualities (a number between 0 and 60 for each base in each sequence; below, each sequence is also called a read):

In [None]:
# randint excludes its upper bound, so 61 is needed to get qualities from 0 to 60
base_qualities = np.random.randint(0, 61, (n_sequences, sequence_length))
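
NumPy's `randint` treats its upper bound as exclusive, so `np.random.randint(0, 60, ...)` never produces 60. The sketch below shows two ways to draw qualities from 0 to 60 inclusive. The first passes 61 as the upper bound to the legacy `np.random.randint`. The second uses the newer `Generator.integers` with `endpoint=True`. The sizes and seeds are illustrative.

```python
import numpy as np

n_sequences, sequence_length = 10, 30

# Legacy API: the upper bound is exclusive, so pass 61 to include 60
np.random.seed(1)
qualities_legacy = np.random.randint(0, 61, (n_sequences, sequence_length))

# Generator API: endpoint=True makes the upper bound inclusive
rng = np.random.default_rng(1)
qualities_new = rng.integers(0, 60, size=(n_sequences, sequence_length), endpoint=True)

# Both arrays only contain values in the closed interval [0, 60]
print(qualities_legacy.min() >= 0 and qualities_legacy.max() <= 60)
print(qualities_new.min() >= 0 and qualities_new.max() <= 60)
```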


Given the above base qualities, count how many bases have a quality **above 30**.

If you have time, also try to answer these questions using NumPy:

* What is the mean base quality?
* What is the mean base quality of each read? (Hint: use the `axis=...` argument.)
* How many reads have mean base quality above 35?
* What is the standard deviation of the base qualities? (You can google numpy standard deviation if you don't know which function to use)
* What is the mean base quality of all the bases except the first base of each read?
* What is the mean base quality of all bases with quality above 30, excluding the first base of each read?
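
The questions above rely on a few NumPy building blocks:

* reductions such as `np.mean` and `np.std`
* the `axis` argument
* slicing away a column
* boolean masks

The sketch below shows each of them on a small made-up 2×4 quality matrix rather than on the exercise data, so the exercises themselves are still left to you.

```python
import numpy as np

# Hypothetical qualities: 2 reads (rows) x 4 bases (columns)
q = np.array([[10, 40, 20, 50],
              [30, 30, 60, 0]])

print(np.mean(q))          # 30.0: mean over all 8 values
print(np.mean(q, axis=1))  # [30. 30.]: one mean per row (read)
print(np.mean(q, axis=0))  # [20. 35. 40. 25.]: one mean per column (base position)
print(np.std(q))           # population standard deviation (ddof=0 by default)

without_first = q[:, 1:]   # drop column 0, i.e. the first base of every read
print(without_first)

mask = q > 25              # boolean mask with the same shape as q
print(q[mask])             # 1-D array of the selected values: [40 50 30 30 60]
print(np.sum(np.mean(q, axis=1) > 25))  # 2: number of rows whose mean exceeds 25
```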

In [None]:
# You can write code here