# Project Assignment 4, Genomic Informatics, Academic Year 2021/2022

# Aleksandar Malovic 2021/3375

## Assignment

* Implement an algorithm for indexed string search using the Burrows-Wheeler transform and the FM index as described in the lesson slides, without additional optimizations
* Write tests for intermediate functions as well as the final algorithm
* Optimize the algorithm with regard to memory usage and performance. Run regression tests and check optimization results using the assigned testing parameters

## Testing remarks

Tests for individual functions are grouped within testing functions to separate and enclose testing scopes. This also simplifies calling the tests at other places in the code.

To accommodate the need for regression testing, some test cases are represented as dictionaries that include the class used to construct the instance under test, along with all the test case data. This makes regression testing simple: we just rerun the tests with a different class.
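As a rough illustration of this layout, the sketch below runs the same test cases against two interchangeable classes. `SlowLength`, `FastLength`, `runTest` and `runAllTests` are hypothetical names invented for this example; they are not part of the assignment code.

```python
# Hypothetical classes, used only to illustrate the test-case layout
class SlowLength(object):
    def __init__(self, text):
        self.value = sum(1 for _ in text)

class FastLength(object):
    def __init__(self, text):
        self.value = len(text)

def runTest(test):
    # The class under test is taken from the test case itself
    instance = test['class'](test['input'])
    return instance.value == test['expectedOutput']

def runAllTests(constr):
    cases = [('a', 1), ('abc', 3)]
    return all(
        runTest({'class': constr, 'input': text, 'expectedOutput': length})
        for text, length in cases
    )

# Regression run: same cases, different class
results = [runAllTests(SlowLength), runAllTests(FastLength)]
print(results)
```

Only the class passed to `runAllTests` changes between the two runs, so an optimised implementation can be checked against exactly the same expectations as the original one.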

## Helper functions

#### Is input a valid string

In [1]:
def isInputValid(t):
    return t is not None and len(t) > 0

def testInputValidation():
    # Test case 1: None, should return false
    assert not isInputValid(None)
    
    # Test case 2: Empty string, should return false
    assert not isInputValid('')
    
    # Test case 3: Valid input
    assert isInputValid('abc')

In [2]:
testInputValidation()

#### Is input a valid string with ending character

In [3]:
def isBWInputValid(t):
    return t is not None and len(t) > 0 and t != "$" and t.endswith('$')

# Named differently from testInputValidation above so that both tests stay callable
def testBWInputValidation():
    # Test case 1: None, should return false
    assert not isBWInputValid(None)
    
    # Test case 2: Empty string, should return false
    assert not isBWInputValid('')
    
    # Test case 3: String containing only the ending character, should return false
    assert not isBWInputValid('$')
    
    # Test case 4: String missing ending character, should return false
    assert not isBWInputValid('abc')
    
    # Test case 5: Valid string, should return true
    assert isBWInputValid('abc$')

In [4]:
testBWInputValidation()

#### Arrays are equal check, both length and order

In [5]:
def arraysEqual(output, expectedOutput):
    # Both inputs being None is considered an error as well
    if output is None or expectedOutput is None:
        return False
    if len(output) != len(expectedOutput):
        return False
    for i in range(0, len(expectedOutput)):
        if output[i] != expectedOutput[i]:
            return False
    return True

def testArraysEqual():
    assert not arraysEqual(None, None)
    
    assert not arraysEqual(None, [])
    
    assert not arraysEqual(['0', '1', '2'], ['0', '1', '2', '3'])
    
    assert not arraysEqual(['0', '1'], ['2', '3'])
    
    assert arraysEqual(['0', '1', '2', '3'], ['0', '1', '2', '3'])

In [6]:
testArraysEqual()

#### Sets are equal check, order not important, same elements

In [7]:
def setsEqual(output, expectedOutput):
    # Both inputs being None is considered an error as well
    if output is None or expectedOutput is None:
        return False
    if len(output) != len(expectedOutput):
        return False
    for e in output:
        if e not in expectedOutput:
            return False
    return True

def testSetsEqual():
    assert not setsEqual(None, None)
    
    assert not setsEqual(None, [])
    
    assert not setsEqual(['0', '1', '2'], ['0', '1', '2', '3'])
    
    assert not setsEqual(['1', '0'], ['2', '3'])
    
    assert setsEqual(['0', '1', '3', '2'], ['2', '0', '1', '3'])

In [8]:
testSetsEqual()

## Burrows-Wheeler transform

Burrows-Wheeler transform consists of three steps.
* Create an array of all input string rotations
* Sort array in alphabetical order (Burrows-Wheeler matrix)
* Take last column of the Burrows-Wheeler matrix

**ASSUMPTION 1:** Input string will already have the ending character appended before being subjected to the BWT, otherwise functions will return None to indicate error. Algorithm could create a local copy with the ending character appended but in case of large strings creating a local copy with just one additional character would be suboptimal from a memory standpoint considering other functions in the FM index rely on the same string.
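The three steps can be traced on a small input. The snippet below is a standalone sketch that does not use the notebook's functions; the variable names are chosen only for this illustration.

```python
# Illustrative sketch, independent of the functions defined in this notebook
text = 'abaaba$'

# Step 1: all rotations, read from the doubled string
doubled = text * 2
allRotations = [doubled[i:i + len(text)] for i in range(len(text))]

# Step 2: sort the rotations to obtain the Burrows-Wheeler matrix
bwMatrix = sorted(allRotations)

# Step 3: the last column of the matrix is the transform
bwt = ''.join(row[-1] for row in bwMatrix)

for row in bwMatrix:
    print(row)
print(bwt)  # abba$aa
```

The ending character `$` sorts before all letters, so the rotation starting with `$` is always the first row of the matrix.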

### Create array of all string rotations

Function appends the string to itself to make it simpler to calculate rotations (based on lesson slides). Implementation using splicing is also possible but is suboptimal from a memory standpoint.

In [9]:
def rotations(t):
    if not isBWInputValid(t):
        return None
    tt = t * 2
    return [ tt[i:i+len(t)] for i in range(0, len(t)) ]

#### Tests

In [10]:
def testRotations():
    # Test case 1: None, should return None
    assert rotations(None) == None
    
    # Test case 2: Empty string, should return None
    assert rotations('') == None
    
    # Test case 3: String containing only the ending character, should return None
    assert rotations('$') == None
    
    # Test case 4: String missing ending character, should return None
    assert rotations('abc') == None
    
    # Test case 5: Input string of just one character 
    inputValue = 'a$'
    expectedOutput = ['a$', '$a']
    output = rotations(inputValue)
    assert arraysEqual(output, expectedOutput)
    
    # Test case 6: Valid input string
    inputValue = 'abcd$'
    expectedOutput = ['abcd$', 'bcd$a', 'cd$ab', 'd$abc', '$abcd']
    output = rotations(inputValue)
    assert arraysEqual(output, expectedOutput)

In [11]:
testRotations()

### Sort string rotations in alphabetical order, Burrows-Wheeler Matrix

Based on lesson slides.

In [12]:
def calculateBurrowsWheelerMatrix(t):
    r = rotations(t)
    return sorted(r) if r != None else None

#### Tests

In [13]:
def testCalculateBurrowsWheelerMatrix():
    # Test case 1: None, should return None
    assert calculateBurrowsWheelerMatrix(None) == None
    
    # Test case 2: Empty string, should return None
    assert calculateBurrowsWheelerMatrix('') == None
    
    # Test case 3: String containing only the ending character, should return None
    assert calculateBurrowsWheelerMatrix('$') == None
    
    # Test case 4: String missing ending character, should return None
    assert calculateBurrowsWheelerMatrix('abc') == None
    
    # Test case 5
    inputValue = 'abcd$'
    expectedOutput = ['$abcd','abcd$', 'bcd$a', 'cd$ab', 'd$abc']
    output = calculateBurrowsWheelerMatrix(inputValue)
    assert arraysEqual(output, expectedOutput)

In [14]:
testCalculateBurrowsWheelerMatrix()

### Generate final Burrows-Wheeler transform

We take the last column of the sorted rotations matrix (based on the lesson slides).

In [15]:
# Calculates the actual Burrows-Wheeler transform, or L index (last column of the matrix)
def calculateLIndex(t):
    return ''.join(map(lambda x: x[-1], t)) if t != None else None

def calculateBurrowsWheelerTransform(t):
    r = calculateBurrowsWheelerMatrix(t)
    return calculateLIndex(r) if r != None else None

#### Tests

In [16]:
def testCalculateLIndex():
    # Test case 1: None, should return None
    assert calculateLIndex(None) == None
    
    # Test case 2:
    assert calculateLIndex(['$abcd','abcd$', 'bcd$a', 'cd$ab', 'd$abc']) == 'd$abc'

def testBurrowsWheelerTransform():
    # Test case 1: None, should return None
    assert calculateBurrowsWheelerTransform(None) == None
    
    # Test case 2: Empty string, should return None
    assert calculateBurrowsWheelerTransform('') == None
    
    # Test case 3: String containing only the ending character, should return None
    assert calculateBurrowsWheelerTransform('$') == None
    
    # Test case 4: String missing ending character, should return None
    assert calculateBurrowsWheelerTransform('abc') == None
    
    # Test case 5
    inputValue = 'abcd$'
    expectedOutput = 'd$abc'
    output = calculateBurrowsWheelerTransform(inputValue)
    assert output != None
    assert output == expectedOutput
    
    # Test case 6
    inputValue = 'abaaba$'
    expectedOutput = 'abba$aa'
    output = calculateBurrowsWheelerTransform(inputValue)
    assert output != None
    assert output == expectedOutput

In [17]:
testCalculateLIndex()

testBurrowsWheelerTransform()

## FM index

Core of the FM index structure consists of the following data:
* F index (first column of the Burrows-Wheeler matrix)
* L index (last column of the Burrows-Wheeler matrix, the Burrows-Wheeler transform itself)
* Tally (matrix of input string character ranks)
* Suffix array of the input string

To facilitate regression testing after optimisations, F Index, Tally and Suffix array will be implemented as classes wrapping the internal data structure, making testing possible without the need to know the internal data structure. This will also simplify creating an optimised FM index as the constructor will accept a dictionary of class names where we can swap between optimised and unoptimised classes.

### F Index

F Index represents the first column of the Burrows-Wheeler matrix.

In [75]:
class FIndex(object):
    def __init__(self, bwm):
        if bwm is None:
            raise ValueError('Invalid BWM supplied')
        # First character of every row of the Burrows-Wheeler matrix
        self.value = ''.join(map(lambda x: x[0], bwm))
        
    def first(self, char):
        """
        Returns first occurrence of a character in the F Index, or -1 if character doesn't exist
        """
        try:
            return self.value.index(char)
        except ValueError:
            return -1
    
    def last(self, char):
        """
        Returns last occurrence of a character in the F Index, or -1 if character doesn't exist
        """
        try:
            return self.value.rindex(char)
        except ValueError:
            return -1

#### Tests

To simplify function signatures and testing code, the following structure will be used to represent a test case:

```python
{ # F Index test case
    'class': class_name # used to construct testing instance
    'input': input for testing
    'expectedOutput': expected value for F Index
    'expectedFailure': true/false # whether we expect a failure
    'queryTests': [] # Test cases for querying the F Index
    """
    Query tests is an array of dictionaries with the following structure
    {
    'char': 'c'
    'expectedFirst': 0
    'expectedLast': 0
    }
    """
}
```

In [129]:
def _testFIndex(test):
    try:
        fIndex = test['class'](test['input'])
    except ValueError:
        assert test['expectedFailure']
        return
    assert fIndex.value == test['expectedOutput']
    for qTest in test['queryTests']:
        assert (fIndex.first(qTest['char']) == qTest['expectedFirst'] 
            and fIndex.last(qTest['char']) == qTest['expectedLast'])

def testFIndex(constr):
    # Test case 1: None
    _testFIndex(
        {
            'class': constr,
            'input': None,
            'expectedFailure': True
        }
    )
    
    # Test case 2: Valid BWM for input string 'abcd'
    _testFIndex(
        {
            'class': constr,
            'input': ['$abcd','abcd$', 'bcd$a', 'cd$ab', 'd$abc'],
            'expectedFailure': False,
            'expectedOutput': '$abcd',
            'queryTests': [
                {
                    'char': 'a',
                    'expectedFirst': 1,
                    'expectedLast': 1
                },
                {
                    'char': 'b',
                    'expectedFirst': 2,
                    'expectedLast': 2
                },
                {
                    'char': 'n',
                    'expectedFirst': -1,
                    'expectedLast': -1
                }
            ]
        }
    )
    
    # Test case 3: Valid BWM with repetition for input string 'aabbab'
    _testFIndex(
        {
            'class': constr,
            'input': ['$aabbab','aabbab$','ab$aabb','abbab$a','b$aabba','bab$aab','bbab$aa'],
            'expectedFailure': False,
            'expectedOutput': '$aaabbb',
            'queryTests': [
                {
                    'char': 'a',
                    'expectedFirst': 1,
                    'expectedLast': 3
                },
                {
                    'char': 'b',
                    'expectedFirst': 4,
                    'expectedLast': 6
                },
                {
                    'char': 'n',
                    'expectedFirst': -1,
                    'expectedLast': -1
                }
            ]
        }
    )

In [76]:
testFIndex(FIndex)

### Calculate tally

Tally is a matrix of character ranks over the L index (BWT).
Each row is assigned to one of the characters of the BWT, and the number of columns is equal to the length of the BWT. The value of a field is the rank of that row's character at that position of the BWT, that is, how many occurrences of the character appear in the BWT up to and including that position.
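For a concrete picture, here is a small standalone sketch that builds such a matrix with running sums. It is not the implementation used in this notebook; `itertools.accumulate` and the variable names are choices made only for this example.

```python
from itertools import accumulate

bwt = 'abba$aa'

# One row per distinct character; entry i counts occurrences in bwt[0..i]
tally = {
    ch: list(accumulate(1 if c == ch else 0 for c in bwt))
    for ch in sorted(set(bwt))
}

for ch, row in tally.items():
    print(ch, row)

# Rank of 'a' at position 3: 'a' appears twice in 'abba'
print(tally['a'][3])  # 2
```

Each column differs from the previous one in exactly one row, which is the redundancy that checkpointing can later exploit.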

In [77]:
class Tally(object):
    def __init__(self, bwt):
        if not isInputValid(bwt):
            raise ValueError("Invalid BWT supplied")
        self.value = {}
        for i in range(0, len(bwt)):
            # Copy previous column values to current one
            for row in self.value.values():
                row[i] = row[i-1]
            # Take current character in input string.
            # If a row for said character exists, increment rank.
            # If not, insert a row populated with 0s, and then increment.
            currentChar = bwt[i]
            if currentChar not in self.value:
                self.value[currentChar] = [0] * len(bwt)
            self.value[currentChar][i] += 1
     
    def query(self, char, j):
        """
        Returns character rank at position j, or -1 if j is below 0 or past the last position, or if character isn't in tally
        """
        # j == len(row) is already out of range, so the upper bound check uses >=
        if (char not in self.value) or (j < 0 or j >= len(self.value[char])):
            return -1
        return self.value[char][j]

#### Tests

To simplify function signatures and testing code, the following structure will be used to represent a test case:

```python
{ # Tally test case
    'class': class_name # used to construct testing instance
    'input': input for testing
    'expectedOutput': expected value for Tally
    'expectedFailure': true/false # whether we expect a failure
    'queryTests': [] # Test cases for querying the Tally
    """
    Query tests is an array of dictionaries with the following structure
    {
    'char': 'c'
    'position': 0
    'expectedRank': 0
    }
    """
}
```

In [50]:
def _testCalculateTally(test):
    try:
        tally = test['class'](test['input'])
    except ValueError:
        assert test['expectedFailure']
        return
    expectedOutput = test['expectedOutput']
    assert setsEqual(tally.value.keys(), expectedOutput.keys())
    for key in tally.value.keys():
        assert arraysEqual(tally.value[key], expectedOutput[key])
    for qTest in test['queryTests']:
        assert tally.query(qTest['char'], qTest['position']) == qTest['expectedRank']

def testCalculateTally(constr):
    # Test case 1: None
    _testCalculateTally(
        {
            'class': constr,
            'input': None,
            'expectedFailure': True
        }
    )
    
    
    # Test case 2: Empty string
    _testCalculateTally(
        {
            'class': constr,
            'input': '',
            'expectedFailure': True
        }
    )
    
    # Test case 3: One character string
    _testCalculateTally(
        {
            'class': constr,
            'input': 'a',
            'expectedFailure': False,
            'expectedOutput': {
                'a': [1]
            },
            'queryTests': [
                {
                    'char': 'a',
                    'position': 0,
                    'expectedRank': 1
                },
                {
                    'char': 'a',
                    'position': 2,
                    'expectedRank': -1
                },
                {
                    'char': 'n',
                    'position': 0,
                    'expectedRank': -1
                }
            ]
        }
    )
    
    # Test case 4: Valid BWT string
    _testCalculateTally(
        {
            'class': constr,
            'input': 'abcaab$c',
            'expectedFailure': False,
            'expectedOutput': {
                '$': [0, 0, 0, 0, 0, 0, 1, 1],
                'a': [1, 1, 1, 2, 3, 3, 3, 3],
                'b': [0, 1, 1, 1, 1, 2, 2, 2],
                'c': [0, 0, 1, 1, 1, 1, 1, 2]
            },
            'queryTests': [
                {
                    'char': 'a',
                    'position': 0,
                    'expectedRank': 1
                },
                {
                    'char': 'b',
                    'position': 5,
                    'expectedRank': 2
                },
                {
                    'char': 'n',
                    'position': 0,
                    'expectedRank': -1
                }
            ]
        }
    )

In [78]:
testCalculateTally(Tally)

### Calculate suffix array

Suffix array is a sorted array of all suffixes of an input string. The array itself contains tuples (offset, suffix), sorted by suffix, where offset is the offset of the suffix within the input string. In case of an FM index, we are storing a suffix array of the original input string.
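A short standalone sketch of this structure for a small string follows; the variable names are illustrative only. It also shows the known relationship between the suffix array and the BWT: the character preceding each sorted suffix is the corresponding BWT character.

```python
text = 'abaaba$'

# Offsets of all suffixes, ordered by the suffix they start
order = sorted(range(len(text)), key=lambda i: text[i:])

# (offset, suffix) tuples, sorted by suffix
suffixArray = [(i, text[i:]) for i in order]
for entry in suffixArray:
    print(entry)

# text[i - 1] wraps around to '$' for offset 0
bwt = ''.join(text[i - 1] for i in order)
print(bwt)  # abba$aa
```

Because rows of the Burrows-Wheeler matrix and entries of the suffix array are in the same order, a row range found in the matrix translates directly into offsets in the original string.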

In [79]:
class SuffixArray(object):
    def __init__(self, t):
        if not isInputValid(t):
            raise ValueError('Invalid value provided for Suffix Array calculation')
        suffixArray = []
        for i in range(0, len(t)):
            suffixArray.append((i, t[i:]))
        self.value = sorted(suffixArray, key=lambda x: x[1])
        
    """
    Returns offsets on indexes between start and end
    """
    def query(self, start, end):
        return [suffix[0] for suffix in self.value[start:end]]

#### Tests

To simplify function signatures and testing code, the following structure will be used to represent a test case:

```python
{ # Suffix Array test case
    'class': class_name # used to construct testing instance
    'input': input for testing
    'expectedOutput': expected value for Suffix Array
    'expectedFailure': true/false # whether we expect a failure
    'queryTests': [] # Test cases for querying the Suffix Array
    """
    Query tests is an array of dictionaries with the following structure
    {
    'start': 0
    'end': 2
    'expectedOffsets': [0, 1]
    }
    """
}
```

In [64]:
def _testCalculateSuffixArray(test):
    try:
        suffixArray = test['class'](test['input'])
    except ValueError:
        assert test['expectedFailure']
        return
    expectedOutput = test['expectedOutput']
    assert arraysEqual(suffixArray.value, expectedOutput)
    for qTest in test['queryTests']:
        assert arraysEqual(suffixArray.query(qTest['start'], qTest['end']), qTest['expectedOffsets'])

def testCalculateSuffixArray(constr):
    # Test case 1: None
    _testCalculateSuffixArray(
        {
            'class': constr,
            'input': None,
            'expectedFailure': True
        }
    )
    
    
    # Test case 2: Empty string
    _testCalculateSuffixArray(
        {
            'class': constr,
            'input': '',
            'expectedFailure': True
        }
    )
    
    # Test case 3: String with one character
    _testCalculateSuffixArray(
        {
            'class': constr,
            'input': 'a',
            'expectedFailure': False,
            'expectedOutput': [
                (0, 'a')
            ],
            'queryTests': [
                {
                    'start': 0,
                    'end': 1,
                    'expectedOffsets': [0]
                }
            ]
        }
    )
    
    # Test case 4: Valid string
    _testCalculateSuffixArray(
        {
            'class': constr,
            'input': 'abcaabc$',
            'expectedFailure': False,
            'expectedOutput': [
                (7, '$'),(3,'aabc$'), (4,'abc$'), (0,'abcaabc$'), (5,'bc$'), (1,'bcaabc$'),(6,'c$'),(2,'caabc$')
            ],
            'queryTests': [
                {
                    'start': 0,
                    'end': 3,
                    'expectedOffsets': [7, 3, 4]
                }
            ]
        }
    )

In [80]:
testCalculateSuffixArray(SuffixArray)

### FMIndex class with query support

FM index class will take two input values:
* t - string to be indexed and queried
* functions - dictionary containing class names for F Index, Tally and Suffix Array
```python
{
    'fIndex': class_name
    'tally': class_name
    'suffixArray': class_name
}
```

FM index querying is performed in the following way.

To find a pattern P, process its characters in reverse order. Find the rows that start with P's shortest suffix, then extend the suffix one character at a time until P is exhausted or the row range becomes empty, in which case we know there is no match.

**EXAMPLE:**

Let us assume P is ABCD and the string we are querying is T. All indexes are 0-based, both ends of the range are inclusive, tally(X, i) is the number of occurrences of X in the L index up to and including position i, and tally(X, -1) counts as 0.

* Start with D and take the first and last position of D in the F index.

 **\[start, end\] = \[index of first occurrence of D in F index, index of last occurrence of D in F index\]**
 
* Now we are looking for CD. Using LF mapping, the F index and the tally tell us which rows of the L index contain a C that precedes D.

 **\[start, end\] = \[
index of first occurrence of C in F index + tally(C, start - 1), 
index of first occurrence of C in F index + tally(C, end) - 1
\]**

* Repeat the process until P is exhausted, or until the range becomes empty (start > end) or a character is missing from the index.
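The sketch below traces this backward search end to end on small inputs. It is a simplified, standalone illustration that uses 0-based indexes and an inclusive row range; `buildIndex`, `countUpTo` and `backwardSearch` are hypothetical helpers written for this example, and ranks are counted directly on the L column instead of using a precomputed tally.

```python
def buildIndex(text):
    """Builds the F column, L column (BWT) and suffix offsets of a '$'-terminated text."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    firstCol = ''.join(text[i] for i in order)
    lastCol = ''.join(text[i - 1] for i in order)
    return firstCol, lastCol, order

def countUpTo(lastCol, char, i):
    """Number of occurrences of char in lastCol[0..i]; 0 when i is -1."""
    return lastCol[:i + 1].count(char)

def backwardSearch(text, pattern):
    firstCol, lastCol, order = buildIndex(text)
    if not pattern or pattern[-1] not in firstCol:
        return []
    # Rows starting with the last character of the pattern
    start = firstCol.index(pattern[-1])
    end = firstCol.rindex(pattern[-1])
    # Extend the matched suffix one character at a time, right to left
    for char in reversed(pattern[:-1]):
        if char not in firstCol:
            return []
        firstOcc = firstCol.index(char)
        start, end = (firstOcc + countUpTo(lastCol, char, start - 1),
                      firstOcc + countUpTo(lastCol, char, end) - 1)
        if start > end:
            return []
    return sorted(order[start:end + 1])

print(backwardSearch('abaaba$', 'aba'))  # [0, 3]
print(backwardSearch('abaaba$', 'bb'))   # []
```

For `'aba'` the row range shrinks from `[1, 4]` (rows starting with `a`) to `[5, 6]` (`ba`) to `[3, 4]` (`aba`), and the suffix array maps those two rows to offsets 3 and 0.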

In [66]:
class FMIndex:
    def __init__(self, t, functions):
        """
        Check if input is a valid string.
        If input doesn't already have the ending character, append the ending character.
        """
        if not isInputValid(t):
            raise ValueError("Invalid input string for calculating FM index")
        tt = t
        if not tt.endswith('$'):
            tt += '$'
        bwm = calculateBurrowsWheelerMatrix(tt)
        self.fIndex = functions['fIndex'](bwm)
        self.lIndex = calculateLIndex(bwm)
        self.tally = functions['tally'](self.lIndex)
        self.suffixArray = functions['suffixArray'](tt)
    
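    def query(self, p):
        """
        Queries the initial string t for substring p. Returns array of indexes within t where p is located.
        """
        if not isInputValid(p):
            return []
        # Rows starting with the last character of p, both ends inclusive
        start = self.fIndex.first(p[-1])
        end = self.fIndex.last(p[-1])
        if start == -1:
            return []
        # Extend the matched suffix one character at a time, right to left
        for currentChar in reversed(p[:-1]):
            firstOcc = self.fIndex.first(currentChar)
            if firstOcc == -1:
                return []
            # Occurrences of currentChar in the L index before row start and up to row end
            before = self.tally.query(currentChar, start - 1) if start > 0 else 0
            upTo = self.tally.query(currentChar, end)
            start = firstOcc + before
            end = firstOcc + upTo - 1
            if start > end:
                return []
        # SuffixArray.query treats its end index as exclusive
        return self.suffixArray.query(start, end + 1)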

#### Tests

To simplify function signatures and testing code, the following structure will be used to represent a test case:

```python
{ # FM Index test case
    'functions': dictionary with class name for F Index, Tally and Suffix Array # used to construct testing instance
    'input': input for testing
    'expectedFailure': true/false # whether we expect a failure
    'queryTests': [] # Test cases for querying the FM Index
    """
    Query tests is an array of dictionaries with the following structure
    {
    'substring': 'c'
    'expectedOutput': [0, 1]
    }
    """
}
```

In [72]:
def _testFMIndex(test):
    try:
        fmIndex = FMIndex(test['input'], test['functions'])
    except ValueError:
        assert test['expectedFailure']
        return
    for qTest in test['queryTests']:
        assert arraysEqual(fmIndex.query(qTest['substring']), qTest['expectedOutput'])

def testFMIndex(functions):
    # Test case 1: None
    _testFMIndex(
        {
            'functions': functions,
            'input': None,
            'expectedFailure': True
        }
    )
    
    # Test case 2: Empty string
    _testFMIndex(
        {
            'functions': functions,
            'input': '',
            'expectedFailure': True
        }
    )
    
    #Test case 3: Simple string without ending character
    _testFMIndex(
        {
            'functions': functions,
            'input': 'testtest',
            'expectedFailure': False,
            'queryTests': [
                {
                    'substring': 'te',
                    'expectedOutput': [4, 0]
                }
            ]
        }
    )
    
    #Test case 4: Simple string with ending character
    _testFMIndex(
        {
            'functions': functions,
            'input': 'testtest$',
            'expectedFailure': False,
            'queryTests': [
                {
                    'substring': 'te',
                    'expectedOutput': [4, 0]
                }
            ]
        }
    )

In [73]:
testFMIndex(
    {
        'fIndex': FIndex,
        'tally': Tally,
        'suffixArray': SuffixArray
    }
)

## FM Index Optimisations

### F Index optimisation

Since we know that the F index is stored in alphabetical order, we don't have to store the entire string. Instead, we can compress it into a dictionary mapping characters to their first occurrence in the array. This greatly speeds up F index searching and, for large strings with a lot of character repetition (such as DNA), significantly reduces memory usage.

**EXAMPLE:**

Uncompressed: `'$aaabbcccd'` <br>
Compressed: `{'$': 0, 'a': 1, 'b': 4, 'c': 6, 'd': 9}`
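The compression can be reproduced with a few lines. This is a minimal standalone sketch, assuming a Python version whose dictionaries preserve insertion order (3.7 or later); the names `first`, `last` and `keys` are used only in this example.

```python
fColumn = '$aaabbcccd'

# First occurrence of every character; F is sorted, so keys arrive in alphabetical order
first = {}
for i, ch in enumerate(fColumn):
    first.setdefault(ch, i)

# The last occurrence of a character is one position before the next character starts
keys = list(first)
last = {}
for k, ch in enumerate(keys):
    nextStart = first[keys[k + 1]] if k + 1 < len(keys) else len(fColumn)
    last[ch] = nextStart - 1

print(first)  # {'$': 0, 'a': 1, 'b': 4, 'c': 6, 'd': 9}
print(last)   # {'$': 0, 'a': 3, 'b': 5, 'c': 8, 'd': 9}
```

Only one entry per distinct character and the total length are needed to answer both first and last occurrence queries.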

In [127]:
import collections

class OptimisedFIndex(FIndex):
    def __init__(self, bwm):
        # Name the class explicitly: super(self.__class__, ...) recurses forever if this class is subclassed
        # self.value is still built by the parent class because the shared tests compare against it
        super(OptimisedFIndex, self).__init__(bwm)
        self.dict = {}
        # We are using two helper dictionaries to maintain O(1) search for the last occurrence of a char.
        # ki - mapping of a key to its index
        # ik - mapping of an index to its key
        #
        # When looking for the next entry in the dictionary, if we have k as the current key,
        # we find the next key as ik[ki[k] + 1]
        self.ik = {}
        self.ki = {}
        j = 0
        for i in range(0, len(self.value)):
            if self.value[i] not in self.dict:
                self.dict[self.value[i]] = i
                self.ik[j] = self.value[i]
                self.ki[self.value[i]] = j
                j += 1
            
    def first(self, char):
        return self.dict[char] if char in self.dict else -1
    
    def last(self, char):
        if char not in self.dict:
            return -1
        nextIndex = self.ki[char] + 1
        # The last key in alphabetical order runs to the end of the F index
        if nextIndex == len(self.dict):
            return len(self.value) - 1
        return self.dict[self.ik[nextIndex]] - 1

#### Tests

To make sure the optimised version behaves the same way as the unoptimised one, we will run the same tests.

In [130]:
testFIndex(OptimisedFIndex)

### Tally optimisation

Depending on the length of the input string and the number of distinct characters in it, the tally matrix can get very big, with potentially a lot of column repetition. A way to reduce the tally matrix size without impacting its O(1) search time complexity is to implement checkpoints. Instead of storing a column for every position in the BWT, we store a checkpoint, that is, the ranks for every N-th position in the array, where N is an integer greater than 1 and less than the input length (example: every 5th position).

When querying the tally matrix, we take the closest checkpoint to the position we need and then scan the BWT between that checkpoint and the position, adjusting the rank accordingly. Since we make at most N + 1 lookups, where N is the spacing of the checkpoints, the time complexity remains O(1) while the space used is greatly reduced.

In this implementation, we will store every 5th rank. This does mean that we suffer a performance penalty for small strings, but this is negligible compared to the memory saving for larger strings, where the tally matrix shrinks by roughly 80%.

In [136]:
class OptimisedTally(Tally):
    # Distance between two stored checkpoints
    STEP = 5

    def __init__(self, bwt):
        if not isInputValid(bwt):
            raise ValueError("Invalid BWT supplied")
        # The BWT itself is needed to count characters between checkpoints
        self.bwt = bwt
        # value[char][k] holds the rank of char at BWT position k * STEP
        self.value = {char: [] for char in set(bwt)}
        counts = {char: 0 for char in self.value}
        for i in range(0, len(bwt)):
            counts[bwt[i]] += 1
            # Store a checkpoint at every STEP-th position
            if i % self.STEP == 0:
                for char, row in self.value.items():
                    row.append(counts[char])
            
    def query(self, char, j):
        """
        Returns character rank at position j, or -1 if j is out of range or character isn't in tally
        """
        if (char not in self.value) or (j < 0 or j >= len(self.bwt)):
            return -1
        # Closest checkpoint at or before position j
        checkpoint = j // self.STEP
        rank = self.value[char][checkpoint]
        # Count the occurrences between the checkpoint and position j
        for i in range(checkpoint * self.STEP + 1, j + 1):
            if self.bwt[i] == char:
                rank += 1
        return rank

#### Tests