# De Bruijn graphs

You will be implementing one of the primary algorithms used today for assembly from short-read data. We will implement a simple form of the algorithm in which we _assume perfect sequencing_: every position of the input is sequenced exactly once, and the sequencing contains no errors or variants.

A graph is composed of **nodes** and **edges**, so we need a data structure to track the edges between nodes in our graph. We have provided the basic class structure, along with descriptions of the `add_edge` and `remove_edge` functions, which add an edge to the graph and remove an edge from it. You will need to implement these functions before you can build the de Bruijn graph.

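In our implementation below, we store the edges of the graph in a `defaultdict`: each key is a "left" node, and its value is the list of all "right" nodes connected to that left node. A right node is listed once per edge, so a k-mer that occurs several times in the input produces repeated entries.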
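
To make the storage scheme concrete, here is a minimal sketch of a `defaultdict` of lists used as an edge list. The variable name `edges` and the two-letter node labels are made up for illustration and are not part of the assignment code.

```python
from collections import defaultdict

# Each left node maps to the list of right nodes it points to.
edges = defaultdict(list)

# Appending to a missing key creates the empty list first, so no key check is needed.
edges["AT"].append("TG")
edges["AT"].append("TG")  # a repeated edge is stored as a repeated entry
edges["TG"].append("GC")

# list.remove deletes only the first matching entry, i.e. one copy of the edge.
edges["AT"].remove("TG")

print(dict(edges))  # {'AT': ['TG'], 'TG': ['GC']}
```

Keeping duplicates in the list (rather than using a set) preserves how many times each edge occurs, which matters when an edge has to be traversed once per occurrence.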

```
build_debruijn_graph:
define substring length k and input string
for each k-length substring (k-mer) of input:
  split the k-mer into its left and right (k-1)-mers (its first k-1 and last k-1 characters)
  add both (k-1)-mers as nodes, with a directed edge from the left (k-1)-mer to the right (k-1)-mer
```
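
The splitting step in the pseudocode can be sketched on its own, outside the class. The helper name `kmer_edges` is hypothetical and is used here only to show which (k-1)-mer pairs a short string produces.

```python
def kmer_edges(text, k):
    """Yield a (left, right) pair of (k-1)-mers for every k-mer in text."""
    # A string of length n contains n - k + 1 overlapping k-mers.
    for start in range(len(text) - k + 1):
        kmer = text[start:start + k]
        # Left node: first k-1 characters; right node: last k-1 characters.
        yield kmer[:-1], kmer[1:]

print(list(kmer_edges("ATGCA", 3)))
# [('AT', 'TG'), ('TG', 'GC'), ('GC', 'CA')]
```

Each pair overlaps by k-2 characters, and the right node of one k-mer is the left node of the next, which is what links consecutive edges into a path through the graph.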

In [None]:
from collections import defaultdict
import random

class DeBruijnGraph():
    """Main class for De Bruijn graphs
    
    Private Attributes:
        graph (defaultdict of lists): Edges for De Bruijn graph
        first_node (str): starting position for traversing the graph
    """

    def __init__(self, input_string, k):
        self.graph = defaultdict(list)
        self.first_node = ''
        self.build_debruijn_graph(input_string, k)
        
    def add_edge(self, left, right):
        ''' This function adds a new edge to the graph
        
        Args:
            left (str): The k-1 mer for the left node of the edge
            right (str): The k-1 mer for the right node of the edge

        Updates graph attribute to add right to the list named left in defaultdict
        '''
        # defaultdict creates the empty list the first time left is seen
        self.graph[left].append(right)

    def remove_edge(self, left, right):
        ''' This function removes an edge from the graph
        
        Args:
            left (str): The k-1 mer for the left node of the edge
            right (str): The k-1 mer for the right node of the edge

        Updates graph attribute to remove right from the list named left in defaultdict
        '''
        # list.remove drops only the first matching entry, so one copy of a
        # repeated edge is removed per call; raises ValueError if the edge is absent
        self.graph[left].remove(right)

    def build_debruijn_graph(self, input_string, k):
        ''' This function builds a De Bruijn graph from a string
        
        Args:
            input_string (str): string to use for building the graph
            k (int): k-mer length for graph construction

        Updates graph attribute to add all valid edges from the string
        
        Example:
        >>> dbg = DeBruijnGraph("this this this is a test", 4)
        >>> print(dbg.graph) #doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
        defaultdict(<class 'list'>, {'thi': ['his', 'his', 'his'], 'his': ['is ', 'is ', 'is '], ...)
        '''
        # The first k-1 mer is the starting position for traversing the graph
        self.first_node = input_string[:k - 1]
        # Slide a window of length k over the string, one character at a time
        for start in range(len(input_string) - k + 1):
            kmer = input_string[start:start + k]
            # Edge from the k-mer's prefix (left k-1 mer) to its suffix (right k-1 mer)
            self.add_edge(kmer[:-1], kmer[1:])

In [None]:
graph = DeBruijnGraph("fool me once shame on shame on you fool me", 6)
print(graph.graph)

Expected output:
defaultdict(<class 'list'>, {'fool ': ['ool m', 'ool m'], 'ool m': ['ol me', 'ol me'], 'ol me': ['l me '], 'l me ': [' me o'], ' me o': ['me on'], 'me on': ['e onc', 'e on ', 'e on '], 'e onc': [' once'], ' once': ['once '], 'once ': ['nce s'], 'nce s': ['ce sh'], 'ce sh': ['e sha'], 'e sha': [' sham'], ' sham': ['shame', 'shame'], 'shame': ['hame ', 'hame '], 'hame ': ['ame o', 'ame o'], 'ame o': ['me on', 'me on'], 'e on ': [' on s', ' on y'], ' on s': ['on sh'], 'on sh': ['n sha'], 'n sha': [' sham'], ' on y': ['on yo'], 'on yo': ['n you'], 'n you': [' you '], ' you ': ['you f'], 'you f': ['ou fo'], 'ou fo': ['u foo'], 'u foo': [' fool'], ' fool': ['fool ']})

In [None]:
import doctest
doctest.testmod()