# 1) Analysis of the Serial Algorithm


## String-Matching

Algorithms that search for a substring, called the pattern (of length M), inside a larger string, called the text (of length N), are of fundamental importance and have applications in many different fields.

## Rabin-Karp Algorithm

Rabin-Karp is a string-search algorithm that uses a rolling hash function to achieve a running time on the order of O(M + N) with high probability. It takes a completely different approach from brute force: substring search is based on hashing.


### The operations can be summarized in these few lines:
* Compute a hash function for the pattern
* Search for a match by applying the same hash function to each possible M-character substring of the text
* If we find a text substring with the same hash value as the pattern, check the match (Note: a matching hash does NOT imply a matching string)
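
The three steps above can be sketched as follows. This is a deliberately naive version that rehashes every window from scratch, so it does not yet use a rolling hash. The names `simple_hash` and `hash_search` and the defaults `R = 256` and `Q = 997` are illustrative assumptions, not taken from the project code.

```python
def simple_hash(s, R=256, Q=997):
    """Hash a string as a base-R number reduced modulo Q (R and Q are assumed values)."""
    h = 0
    for ch in s:
        h = (h * R + ord(ch)) % Q
    return h


def hash_search(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    M, N = len(pattern), len(text)
    pattern_hash = simple_hash(pattern)              # step 1: hash the pattern
    for i in range(N - M + 1):                       # step 2: hash every M-character window
        window = text[i:i + M]
        if simple_hash(window) == pattern_hash:
            if window == pattern:                    # step 3: equal hashes still need a check
                return i
    return -1
```

Each window costs O(M) to hash here, which is why this version is no faster than brute force.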

## Role of the Rolling-Hash Function:

Be aware, however, that a direct implementation based on the above description would be much slower than a brute-force search: computing a hash function that involves every character of a window is likely much more expensive than just comparing characters. Rabin and Karp showed that the hash of each M-character substring can be computed in constant time (after some preprocessing). This leads to linear-time substring search in practical situations, i.e. the high-probability performance indicated above.

The underlying theory is the following: a string of length M corresponds to an M-digit base-R number (R is the alphabet size). To use a hash table of size Q for keys of this type, we need a hash function that converts an M-digit base-R number to an int value between 0 and Q-1. Modular hashing is a good choice:


$$\mathrm{hash}(x) = x \bmod Q$$


In practice, Q is a randomly chosen prime number, as large as possible.



As already mentioned, Rabin-Karp does not hash every M-character substring from scratch in order to find the one whose hash equals the pattern's and then make the comparison. Instead, it uses the "mathematical trick" exploited by Rabin and Karp, which, after preprocessing, lets each hash be computed in constant time. The trick builds on Horner's rule, which states that a polynomial of degree N can be evaluated with only N additions and N multiplications.



Example:

$$a_2 x^2 + a_1 x + a_0 = (a_2 x + a_1)\,x + a_0$$

The same idea works in modular arithmetic, which is what we use for hashing, so the following also holds:

$$(a_2 x^2 + a_1 x + a_0) \bmod z = \big(((a_2 x + a_1) \bmod z)\,x + a_0\big) \bmod z$$

where mod denotes the modulo operation (the `%` operator in code).
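
As a small check of the identity above, the sketch below evaluates a polynomial modulo `z` in two ways: with Horner's rule, reducing after every step, and by direct expansion, reducing once at the end. The function names and the sample coefficients are hypothetical and chosen only for illustration.

```python
def horner_mod(coeffs, x, z):
    """Evaluate a polynomial modulo z with Horner's rule.

    coeffs are ordered from the highest-degree coefficient down to a0.
    """
    result = 0
    for a in coeffs:
        result = (result * x + a) % z   # one multiplication and one addition per coefficient
    return result


def direct_mod(coeffs, x, z):
    """Evaluate the same polynomial term by term, reducing only at the end."""
    degree = len(coeffs) - 1
    return sum(a * x ** (degree - k) for k, a in enumerate(coeffs)) % z
```

With `coeffs = [3, 1, 4]`, `x = 10` and `z = 7`, both functions evaluate the number 314 modulo 7. The Horner version never holds an intermediate value larger than about `z * x`, which is what makes it suitable for hashing long strings.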

The Rabin-Karp method is based on efficiently computing the hash value $x_{i+1}$ for position $i+1$ in the text from its value $x_i$ for position $i$. With $t_i$ denoting the character at position $i$ of the text:

$$x_i = t_i R^{M-1} + t_{i+1} R^{M-2} + \dots + t_{i+M-1} R^0$$

$$x_{i+1} = t_{i+1} R^{M-1} + t_{i+2} R^{M-2} + \dots + t_{i+M} R^0$$

so that

$$x_{i+1} = (x_i - t_i R^{M-1})\,R + t_{i+M}$$

We do not have to keep the values of these numbers, only their remainders modulo Q. The algorithm is then applied iteratively, computing the hash of each window from that of the previous one with the formula above.
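
The update formula can be sketched in a few lines. The helper names `window_hash` and `roll` are assumptions made for this illustration, and `RM1` stands for the precomputed value of R^(M-1) mod Q (the "preprocessing" step).

```python
def window_hash(text, i, M, R, Q):
    """Hash of the M-character window starting at i, computed from scratch in O(M)."""
    h = 0
    for ch in text[i:i + M]:
        h = (h * R + ord(ch)) % Q
    return h


def roll(x_i, t_out, t_in, R, Q, RM1):
    """Compute x_{i+1} from x_i in O(1).

    t_out is the code of the character leaving the window (t_i),
    t_in is the code of the character entering it (t_{i+M}),
    RM1 is R^(M-1) mod Q.
    """
    x = (x_i - t_out * RM1) % Q      # remove the leading digit
    return (x * R + t_in) % Q        # shift by one digit and append the new one
```

A usage example, with `R = 256` and `Q = 997` picked arbitrarily:

```python
text, M, R, Q = "3141592653589793", 5, 256, 997
RM1 = pow(R, M - 1, Q)               # three-argument pow is modular exponentiation
x = window_hash(text, 0, M, R, Q)
for i in range(len(text) - M):
    x = roll(x, ord(text[i]), ord(text[i + M]), R, Q, RM1)
    assert x == window_hash(text, i + 1, M, R, Q)
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction needs no extra correction here. In C or C++ one would typically add Q before taking the remainder.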

In the original formulation, once we find an M-character substring of the text whose hash value matches that of the pattern, we do not check whether it is a true match or a collision. Confirming the match requires comparing all M characters of the pattern against the text, and this degrades performance. Instead, making Q very large makes the probability of a collision extremely small: with Q greater than $10^{20}$, the probability of a collision is about $10^{-20}$.

This version of the algorithm is an early and famous example of a Monte Carlo algorithm: it has a guaranteed completion time but, with small probability, fails to provide a correct answer.

In our implementation of the algorithm, we do check the pattern against the substring each time their hashes match, because using a very large value of Q can lead to overflows or slow down execution. As a result, any match reported by our program is guaranteed to be a true match.
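
A minimal Python sketch of this verifying variant is given below. It is not the project's C++ implementation in `headers/rabin_karp.h`; the function name `rabin_karp`, the return type (a list of all match positions) and the defaults `R = 256` and `Q = 1_000_000_007` are assumptions made for illustration.

```python
def rabin_karp(pattern, text, R=256, Q=1_000_000_007):
    """Return the start indices of all occurrences of pattern in text.

    Every hash match is verified character by character, so a reported
    position is always a true match, whatever the value of Q.
    """
    M, N = len(pattern), len(text)
    if M == 0 or M > N:
        return []
    RM1 = pow(R, M - 1, Q)                       # R^(M-1) mod Q, computed once
    p_hash = t_hash = 0
    for k in range(M):                           # hash the pattern and the first window
        p_hash = (p_hash * R + ord(pattern[k])) % Q
        t_hash = (t_hash * R + ord(text[k])) % Q
    matches = []
    for i in range(N - M + 1):
        if t_hash == p_hash and text[i:i + M] == pattern:
            matches.append(i)
        if i + M < N:                            # roll the hash to the next window
            t_hash = ((t_hash - ord(text[i]) * RM1) * R + ord(text[i + M])) % Q
    return matches
```

Because of the verification step, the result stays correct even with a tiny modulus such as `Q = 3`; a small Q only produces more collisions, and therefore more character comparisons.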

<img src="./images/a.png" width="300"></img>




# A Priori Study of Available Parallelism

The Rabin-Karp algorithm for string matching uses a rolling hash to avoid recomputing the hash of each window of text from scratch.

<img src="http://img.sparknotes.com/figures/E/e06912cfeeac4cade7d4527506538c17/rabin1.gif"></img>

This effectively means that there is always a data dependency between the current iteration and the one preceding it, which makes it hard to parallelize the algorithm itself.

The approach we chose was to split the text into N parts and assign one part to each slave process (here N is the number of slave processes, not the length of the text).

The same pattern is passed to each slave process.

Each slave process then searches for the pattern in its own portion of the text.
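
The splitting step can be sketched serially in Python, with a plain loop standing in for the slave processes. The text above does not say how occurrences that straddle a boundary between two parts are handled, so this sketch assumes that each part is extended by M-1 characters into the next one. The names `split_with_overlap`, `find_all` and `chunked_search` are hypothetical, and `find_all` uses `str.find` as a stand-in for the Rabin-Karp search.

```python
def split_with_overlap(text, parts, M):
    """Split text into at most `parts` chunks, returned as (offset, chunk) pairs.

    Assumption: each chunk is extended by M-1 characters so that a match
    crossing a chunk boundary is still fully contained in one chunk.
    """
    N = len(text)
    size = max(1, -(-N // parts))                # ceiling division, at least 1
    chunks = []
    for start in range(0, N, size):
        end = min(N, start + size + M - 1)
        chunks.append((start, text[start:end]))
    return chunks


def find_all(pattern, text):
    """Stand-in for the per-process search: all start indices of pattern in text."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def chunked_search(pattern, text, parts):
    """Search every chunk independently and translate local indices to global ones."""
    hits = []
    for offset, chunk in split_with_overlap(text, parts, len(pattern)):
        hits.extend(offset + i for i in find_all(pattern, chunk))
    return hits
```

With an overlap of exactly M-1 characters, every occurrence lies entirely inside the chunk in which it starts and in no other, so the merged result contains no duplicates.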






## Calculating 'S' and 'P' for Amdahl's law equation

We estimate the parallelizable fraction P and the serial fraction S = 1 - P from line counts, assuming that the ratio of C SLOCs to assembly SLOCs is approximately constant throughout the codebase.

#### Lines of C/C++ Code:

In [2]:
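def count_lines(pathname):
    """Count the non-blank lines of a source file, skipping `//` comment lines."""
    import re
    with open(pathname, "r") as f:
        source = f.read().split("\n")
    # drop blank lines
    source = [line for line in source if len(line.strip()) > 0]
    # remove all whitespace so that indented comments also start with '//'
    source = [re.sub(r"\s+", "", line) for line in source]
    # drop single-line comments
    source = [line for line in source if line[:2] != '//']
    return len(source)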



In [3]:

# the rabin-karp part and roughly half of the main (slave).
parallelizable = count_lines("headers/rabin_karp.h") + count_lines("parallel2.cpp") / 2

# total
total = parallelizable + count_lines("parallel2.cpp")/2 + count_lines("headers/read_file.h")

P = parallelizable/total
S = 1-P
S, P

(0.38372093023255816, 0.6162790697674418)

#### Applying Amdahl's Law Equation

In [4]:
speedup = lambda N : 1 / (S + P/N)

In [14]:
speedup(16)

2.368330464716007
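
Written out, the `speedup` function above is Amdahl's law, and substituting the values of S and P computed earlier reproduces the result for 16 processes. The rounding to four decimal places is ours.

```latex
\text{speedup}(N) = \frac{1}{S + \dfrac{P}{N}}
\qquad\Longrightarrow\qquad
\text{speedup}(16) = \frac{1}{0.3837 + \dfrac{0.6163}{16}} \approx 2.37
```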

#### With Assembly SLOCs Instead
(Assuming the ratio of C SLOCs to Asm SLOCs is NOT constant throughout the code.)

In [21]:
# ! cp ./headers/rabin_karp.h rabin_karp.cpp 
# ! echo "int main(){}" >> rabin_karp.cpp
# ! g++ rabin_karp.cpp -o a.out
# rabin_karp_sloc = ! objdump -d a.out | wc -l 
# ! rm rabin_karp.cpp
# ! rm a.out
# f"Asm slocs in 'rabin_karp.cpp': {int(rabin_karp_sloc[0])}"

"Asm slocs in 'rabin_karp.cpp': 352"

In [22]:
# ! cp ./headers/read_file.h read_file.cpp 
# ! echo "int main(){}" >> read_file.cpp
# ! g++ read_file.cpp -o a.out
# read_file_sloc = ! objdump -d a.out | wc -l 
# ! rm read_file.cpp
# ! rm a.out
# f"Asm slocs in 'read_file_sloc.cpp': {int(read_file_sloc[0])}"

"Asm slocs in 'read_file_sloc.cpp': 362"

In [28]:
! mpic++ -g -O0 -fno-builtin parallel2.cpp -o a.out
parallel2_sloc = ! objdump -d a.out | wc -l
!  objdump -fsd --source a.out > dump.txt
! rm a.out
f"Asm slocs 'in parallel2.cpp': {int(parallel2_sloc[0])}"

"Asm slocs 'in parallel2.cpp': 9487"

In [13]:

# m = (9487 - 352 - 362) # just the code in 'parallel2'
# s =  (m/2  + 362)/9487 
# p =  (m/2  + 352)/9487
# s, p


(0.5005270369979973, 0.4994729630020027)

In [31]:
# from dump.txt (objdump data)
slave =  749 - 604
master = 600 - 320
main_com = 315 - 277
search = 275 - 86
read = 79 - 1
tot = slave + master + main_com + search + read
par = (slave + main_com + search )/ tot
ser= 1 - par 
ser, par

(0.4904109589041096, 0.5095890410958904)

In [34]:
speedup = lambda N : 1 / (ser + par/N)
speedup(7)

1.7755385684503127
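
To see how this estimate behaves as processes are added, the sketch below tabulates Amdahl's law for a few process counts. The function name `amdahl_speedup` and the chosen process counts are illustrative assumptions; the serial fraction is the `ser` value computed above.

```python
def amdahl_speedup(serial, n):
    """Amdahl's law: speedup with n processes, given the serial fraction."""
    return 1.0 / (serial + (1.0 - serial) / n)


ser = 0.4904109589041096                 # serial fraction estimated from dump.txt

for n in (1, 2, 4, 7, 16, 64):
    print(f"N = {n:2d}  speedup = {amdahl_speedup(ser, n):.3f}")

# As n grows, the speedup approaches 1 / serial and never exceeds it.
print(f"upper bound = {1.0 / ser:.3f}")
```

With roughly half of the code estimated as serial, the bound `1 / ser` is about 2.04, so adding processes beyond a handful yields little further gain under this model.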

In [35]:
80/1.7

47.05882352941177
