# Graphical Methods
# Group: Jodie and Gabe

This is a Rosalind-based assignment. Consider (and do) the 7 new problems. These are all from chapter 3 of the text.

Please feel encouraged to use a module for keeping reusable functions and classes. Of course, remember to provide that module with your notebook.

| Program Name | Rosalind Problem |
|:-------------|:------------------------------------------------------|
|problem8| Generate the k-mer Composition of a String|
|problem9| Reconstruct a String from its Genome Path|
|problem10| Construct the Overlap Graph of a Collection of k-mers|
|problem11| Construct the De Bruijn Graph of a String |
|problem12| Construct the De Bruijn Graph of a Collection of k-mers|
|problem13| Find an Eulerian Path in a Graph|
|problem14| Reconstruct a String from its k-mer Composition|





In [1]:
def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines


# Problem 8: Generate the k-mer Composition of a String

In [21]:
def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines

class kmerComp:
    """
    Class for generating the kmer composition of a sequence.

    Attributes:
    seq: str - the genome sequence
    k: int - the size of kmer to generate
    """
    def __init__(self, seq, k) -> None:
        self.k = k
        self.seq = seq
        self.comp = self.generateComposition()
    
    def generateComposition(self):
        """
        Generates kmer composition.
        """
        out = []
        for i in range(len(self.seq) - self.k + 1):
            out.append(self.seq[i:i+self.k])
        return sorted(out)

def main(inFile=None):
    inp = inputFileReader(inFile)
    k = kmerComp(inp[1], int(inp[0]))
    with open("cm_8_out.txt", "w") as f:
        for l in k.comp:
            print(l, file=f)
            print(l)

if __name__ == "__main__":
    main(inFile="rosalind_ba3a.txt")

AAAAAAACGCTTATTTAAGTACAACCAAGAGTACTGCCCAACTTGGATAC
AAAAAACGCTTATTTAAGTACAACCAAGAGTACTGCCCAACTTGGATACG
AAAAAATCCCTATCTGCCACCGAGTCCCCCGATCTTCATCTCAGCTTCGA
AAAAACGCTTATTTAAGTACAACCAAGAGTACTGCCCAACTTGGATACGG
AAAAATCCCTATCTGCCACCGAGTCCCCCGATCTTCATCTCAGCTTCGAC
AAAACGCGGCATGCCTTGCGCAGAACTTGAGCCTTGGCGTGTCCGGCTAT
AAAACGCTCAGAAAACGCGGCATGCCTTGCGCAGAACTTGAGCCTTGGCG
AAAACGCTTATTTAAGTACAACCAAGAGTACTGCCCAACTTGGATACGGA
AAAACTAGCACGATCGCCGGATAATCATAATACAATTGAATCGGGTCAGC
AAAACTTGGTCATTTGTTTACCACTCCGCCTCACGTTATACCCGTTCTAA
AAAAGTGCATCTGTAGCAGCTGAGCGGCCTATTACAAGCAACACACCCCG
AAAATCCCTATCTGCCACCGAGTCCCCCGATCTTCATCTCAGCTTCGACC
AAAATGCAGCATCTAAGTTTAAACTAGCCAGCCTTACATTGGTCTGCCTT
AAACGCGGCATGCCTTGCGCAGAACTTGAGCCTTGGCGTGTCCGGCTATC
AAACGCTCAGAAAACGCGGCATGCCTTGCGCAGAACTTGAGCCTTGGCGT
AAACGCTTATTTAAGTACAACCAAGAGTACTGCCCAACTTGGATACGGAC
AAACTAGCACGATCGCCGGATAATCATAATACAATTGAATCGGGTCAGCC
AAACTAGCCAGCCTTACATTGGTCTGCCTTCTGAGTGATCCTACCCGAGG
AAACTTGGTCATTTGTTTACCACTCCGCCTCACGTTATACCCGTTCTAAG
AAAGAGCTAGTGGAATGACATTTTTCCCCTC

## Inspection Intro

For this problem I created a class, kmerComp, that takes the genome sequence and the desired k-mer size as input.
Its generateComposition method generates the k-mer composition and returns it in lexicographic order.
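As a small worked example of the sliding-window idea described above, here is a minimal sketch. The function name `kmer_composition` is hypothetical and not part of the notebook; it only mirrors what generateComposition does.

```python
def kmer_composition(seq, k):
    """Return every k-mer of seq in lexicographic order (duplicates kept)."""
    # A string of length n has n - k + 1 windows of width k.
    return sorted(seq[i:i + k] for i in range(len(seq) - k + 1))


for kmer in kmer_composition("CAATCCAAC", 5):
    print(kmer)
```

A 9-base string with k = 5 yields 9 - 5 + 1 = 5 k-mers.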

## Inspection Results

- Add docstrings
- Print result to stdout
- Add inFile option to main()

# Problem 9: Reconstruct a String from its Genome Path

In [22]:
def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines

class GenomePath:
    """
    Class constructs genome from kmers

    attributes:
    seqs: list - list of kmers (strings)
    """
    def __init__(self, seqs):
        self.seqs = seqs
        self.genome = self.makeGenome()
    def makeGenome(self):
        """
        Creates genome by iterating kmers
        """
        s = self.seqs[0] # initialize final genome output starting with the full first kmer
        for kmer in self.seqs[1:]:
            s += kmer[-1] # add the last char of every next kmer
        return s

def main(inFile=None):
    inp = inputFileReader(inFile)
    k = GenomePath(inp)
    with open("cm_9_out.txt", "w") as f:
        print(k.genome, file=f)
        print(k.genome)
         
if __name__ == "__main__":
    main(inFile="rosalind_ba3b.txt")

GTGTGAAAGAGACGCTTGCCACTCGGTGCCTTTGTATCCCGCGGTGTTGCCTTTGGGACCGACCGCTATGTCCACAAGGCGGCGATGCGTGCGGCGTACGACACAACAATGACGTTCCATTAAACCCAAAACACGCAGAGTCGGCAGCCTACCACGGATGCTTTCCATCCGTCCACAAAAGTGCCTGAATGTACGCCCTCAATCTCATTTTTAATTTGGAGATTCATATATCGCCGAGCGCATAATATTCTGGTCGATACCATCTGCGCTAGACAAGCTCGACAATTAAGTTTCCGTGCCATCGTCCTGAAGGGAGCTAAGGTCTTACAAACGTAAGTATCATTTTAGGGAGTTATGGTCGTAAGTCCAATGATCTAAGTTAGTTGCACCGTCATAAGCACGTACGTGTTCCTCAAACGCAAAGTCGTTCGTCACAGGGGAAAGGGGACGTCTGGGCTTGATGGGAAGTGACTTTCCGGAATATTGGGGGCAGCAATTGTTTAAATCTAGTAGGTAGATATCCTGTGGTCATAAGCCAGCCTACTTGCAGGGACCCCCATCTCTCGCAATTCTTCACTAGCCGACAGATTCTCTGATGAAGCTAGATTACGGCGCGTGCAGTGGCATGCTTAATCCGATACACCATCGCTGCCACCAAATTTGCTTCTCACAACTGAATACTACCGGTCATGCGTAGCCACGCGCCGAGGCAGCTTGGGGGAGGCGGAATCACCTCTAACCGAGGTCCGCTCCCCGCCGTTGTGAGCCACCATGGGGCCTCATGCCCGTCGTGAAAGTATTTTCCGAGACTAATCGAGCGCGAATGCGATGATAGTATACATACTCTCCGCTATAGACTAGCCATGTTATCGCTCCCTGACAAGCGCCGAAGCAACAACGTGAGGCCGAGTCCTACCAATTAGAACTCACTCGCTTTCCTATAGGGAGAGACTGGTTCCGTGTGCCGTCATTTATTCAAAATGGGGATGGGGAATAAACT

## Inspection Intro

This class takes in a list of k-mers, ordered as a genome path, and rebuilds the genome they came from.
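A minimal sketch of the same idea on a tiny input. It assumes each k-mer overlaps the previous one by k-1 bases; `genome_from_path` is a hypothetical name used only for this illustration.

```python
def genome_from_path(kmers):
    """Spell the genome from k-mers listed in path order."""
    # Keep the first k-mer whole, then each later k-mer adds only its last base.
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


print(genome_from_path(["ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"]))
```

With n k-mers of length k, the result has k + n - 1 bases.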

## Inspection Results

- Add inFile param to main
- Add comments
- Print to stdout too.

# Problem 10: Construct the Overlap Graph of a Collection of k-mers

In [23]:
def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines

class OverlapGraph:
    """
    Creates overlap graph from set of reads

    attributes:
    seqs: list - list of sequencing reads (str)
    """
    def __init__(self, seqs):
        self.seqs = seqs
        self.graph = self.createGraph()
    def createGraph(self):
        """
        Creates the graph by iterating reads
        """
        out = []
        for s1 in self.seqs:
            for s2 in self.seqs:
                # skip identical reads; add an edge when the (k-1)-suffix of s1 equals the (k-1)-prefix of s2
                if s1 != s2 and s1[1:] == s2[:-1]:
                    out.append((s1,s2))
        return sorted(out)
    
def main(inFile=None):
    inp = inputFileReader(inFile)
    k = OverlapGraph(inp)
    with open("cm_10_out.txt", "w") as f:
        for s1, s2 in k.graph:
            print(f"{s1} -> {s2}", file=f)
            print(f"{s1} -> {s2}")
         
if __name__ == "__main__":
    main(inFile="rosalind_ba3c.txt")

AAAAAAGGTTACAATACAGT -> AAAAAGGTTACAATACAGTG
AAAAAGGTTACAATACAGTG -> AAAAGGTTACAATACAGTGC
AAAAGAAGATTGTAGTGGGT -> AAAGAAGATTGTAGTGGGTG
AAAAGGTTACAATACAGTGC -> AAAGGTTACAATACAGTGCG
AAAAGTAACGCAACGACCAA -> AAAGTAACGCAACGACCAAT
AAACACTTTGGTTCAATCGT -> AACACTTTGGTTCAATCGTG
AAACCTTGTATAAGTATTCG -> AACCTTGTATAAGTATTCGT
AAAGAAGATTGTAGTGGGTG -> AAGAAGATTGTAGTGGGTGA
AAAGAGCTTCCAACCCAGTT -> AAGAGCTTCCAACCCAGTTG
AAAGCTAATCCTGTGCAATT -> AAGCTAATCCTGTGCAATTT
AAAGGGGCGGGCTATACGGT -> AAGGGGCGGGCTATACGGTA
AAAGGTTACAATACAGTGCG -> AAGGTTACAATACAGTGCGG
AAAGTAACGCAACGACCAAT -> AAGTAACGCAACGACCAATA
AAATAGAACTGTATCTAAAT -> AATAGAACTGTATCTAAATC
AAATAGCCGCCACGACACCG -> AATAGCCGCCACGACACCGT
AAATCACAAGCCCGCAATGG -> AATCACAAGCCCGCAATGGG
AAATCGGAAGAAACCTTGTA -> AATCGGAAGAAACCTTGTAT
AAATCGTCCGCCGATACTGG -> AATCGTCCGCCGATACTGGG
AAATGACCTTGTTTTGATGC -> AATGACCTTGTTTTGATGCA
AACACTTTGGTTCAATCGTG -> ACACTTTGGTTCAATCGTGT
AACAGGTTTTAGAGATTAGC -> ACAGGTTTTAGAGATTAGCC
AACCCAGTTGATCCCACCTT -> ACCCAGTTGATCCCACCTTG
AACCCTGTTC

## Inspection Intro

Creates an adjacency list (the overlap graph) from a list of sequencing reads.
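The cell above compares every pair of reads. As an alternative sketch (not the approach used in the notebook), the reads can first be indexed by their (k-1)-prefix so each read only looks up its own suffix. The name `overlap_graph` is hypothetical, and the sketch keeps the notebook's choice of skipping edges from a read to an identical read.

```python
from collections import defaultdict


def overlap_graph(kmers):
    """Return sorted (source, target) pairs where suffix(source) == prefix(target)."""
    # Index every read by its prefix of length k-1.
    by_prefix = defaultdict(list)
    for kmer in kmers:
        by_prefix[kmer[:-1]].append(kmer)

    edges = []
    for kmer in kmers:
        # Only reads whose prefix equals this read's suffix can follow it.
        for nxt in by_prefix.get(kmer[1:], []):
            if nxt != kmer:  # same exclusion as the notebook's s1 != s2 check
                edges.append((kmer, nxt))
    return sorted(edges)


for source, target in overlap_graph(["ATGCG", "GCATG", "CATGC", "AGGCA", "GGCAT"]):
    print(f"{source} -> {target}")
```

The lookup version does roughly one dictionary access per read instead of one comparison per pair of reads, which matters mostly for large inputs.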

## Inspection Results

- Add inFile argument to main()
- Add more comments to explain code
- Add better docstrings to explain attributes.

# Problem 11: Construct the De Bruijn Graph of a String

In [24]:
def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines

class kmerComp:
    """
    Class for generating the kmer composition of a sequence.

    Attributes:
    seq: str - the genome sequence
    k: int - the size of kmer to generate
    """
    def __init__(self, seq, k) -> None:
        self.k = k
        self.seq = seq
        self.comp = self.generateComposition()
    
    def generateComposition(self):
        """
        Generates kmer composition.
        """
        out = []
        for i in range(len(self.seq) - self.k + 1):
            out.append(self.seq[i:i+self.k])
        return sorted(out)

class DebruijnGraphFromString:
    """
    Class creates debruijn graph from string

    attributes:
    k: int - kmer size
    seq: str - input sequence 
    """
    def __init__(self, k, seq):
        self.k = k
        self.seq = seq
        self.graph = self.graphFromSeq()
    def graphFromSeq(self):
        """
        Generates debruijn graph from sequence.

        Uses kmerComp class from problem 8 to generate kmer composition.
        """
        out = {}
        # get kmers using solution from problem 8
        kmers = sorted(kmerComp(self.seq, self.k).comp)
        for k in kmers:
            left, right = k[:-1], k[1:]
            # check if we've seen this (k-1)-mer prefix yet
            if left not in out:
                out[left] = [right]
            else:
                out[left].append(right)
        return out
def main(inFile=None):
    inp = inputFileReader(inFile)
    k = DebruijnGraphFromString(int(inp[0]), inp[1])
    with open("cm_11_out.txt", "w") as f:
        for key, value in k.graph.items():
            value = ",".join(value)
            print(f"{key} -> {value}", file=f)
            print(f"{key} -> {value}")
         
if __name__ == "__main__":
    main(inFile="rosalind_ba3d.txt")


AAAAACCGGGG -> AAAACCGGGGT
AAAACACTGCG -> AAACACTGCGA
AAAACCGGGGT -> AAACCGGGGTT
AAAAGGCGACT -> AAAGGCGACTT
AAAAGTGCTAC -> AAAGTGCTACC
AAACAATCTGG -> AACAATCTGGC
AAACACTGCGA -> AACACTGCGAT
AAACCCCAGTT -> AACCCCAGTTG
AAACCGGGGTT -> AACCGGGGTTC
AAACGGGCCCC -> AACGGGCCCCC
AAACGTGAATT -> AACGTGAATTC
AAAGAGGAAAA -> AAGAGGAAAAG
AAAGATGAATC -> AAGATGAATCG
AAAGATTCACT -> AAGATTCACTG
AAAGCGGGTAC -> AAGCGGGTACT
AAAGGAGGTTC -> AAGGAGGTTCC
AAAGGCGACTT -> AAGGCGACTTG
AAAGGGTTGTC -> AAGGGTTGTCG
AAAGGTCCTTG -> AAGGTCCTTGC
AAAGTACTTGA -> AAGTACTTGAG
AAAGTGAGTTC -> AAGTGAGTTCT
AAAGTGCTACC -> AAGTGCTACCC
AAATAACGGTC -> AATAACGGTCA
AAATAATCAGT -> AATAATCAGTT
AAATATCCAAC -> AATATCCAACA
AAATCTAGATC -> AATCTAGATCC
AACAAGTGTGT -> ACAAGTGTGTA
AACAATCTGGC -> ACAATCTGGCC
AACACCCTTAA -> ACACCCTTAAT
AACACTGCGAT -> ACACTGCGATC
AACAGCGCAAG -> ACAGCGCAAGG
AACATTCGTTG -> ACATTCGTTGC
AACATTCTAGG -> ACATTCTAGGT
AACCACCATGT -> ACCACCATGTG
AACCACCGTTC -> ACCACCGTTCG
AACCACGCCCG -> ACCACGCCCGT
AACCCAGGAGG -> ACCCAGGAGGA
A

## Inspection Intro

This class generates the De Bruijn graph of a sequence given a k-mer size k.

We reuse the kmerComp class from problem 8 to help us build a solution for this problem.
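To make the construction concrete, here is a minimal sketch on a short string. Each k-mer contributes one edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function name `de_bruijn_from_string` is hypothetical, and the sketch uses a `defaultdict` instead of the explicit key check in the cell above.

```python
from collections import defaultdict


def de_bruijn_from_string(k, text):
    """Map each (k-1)-mer prefix to the sorted list of suffixes that follow it."""
    graph = defaultdict(list)
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])  # one edge per k-mer occurrence
    # Sort nodes and targets so the output order is deterministic.
    return {node: sorted(targets) for node, targets in sorted(graph.items())}


for node, targets in de_bruijn_from_string(4, "AAGATTCTCTAC").items():
    print(f"{node} -> {','.join(targets)}")
```

In this example the node TCT has two outgoing edges (to CTA and CTC) because two different 4-mers start with TCT.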

## Inspection Results

- Added kmerComp code to make cell runnable by itself.
- Added inFile arg to main()
- Added more comments

# Problem 12: Construct the De Bruijn Graph of a Collection of k-mers

In [25]:
def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines

class DebruijnGraphFromKmers:
    """
    Class generates debruijn graph from input of kmers.

    attributes:

    kmers: list - list of kmers (str)
    """
    def __init__(self, kmers):
        self.kmers = sorted(kmers)
        self.graph = self.graphFromKmers()
    def graphFromKmers(self):
        # initialize empty dict
        out = {}
        
        # iterate kmers
        for k in self.kmers:
            left, right = k[:-1], k[1:]
            # check if left kmer not seen yet
            if left not in out.keys():
                out[left] = [right]
            else:
                out[left].append(right)
        return out
def main(inFile=None):
    inp = inputFileReader(inFile)
    k = DebruijnGraphFromKmers(inp)
    with open("cm_12_out.txt", "w") as f:
        for key, value in k.graph.items():
            value = ",".join(value)
            print(f"{key} -> {value}", file=f)
            print(f"{key} -> {value}")
         
if __name__ == "__main__":
    main(inFile="rosalind_ba3e.txt")


AAAAACACATTACGGATTC -> AAAACACATTACGGATTCC
AAAACACATTACGGATTCC -> AAACACATTACGGATTCCA
AAAACACCCGATTCCATAA -> AAACACCCGATTCCATAAA
AAAACCCCACATTTAAGAA -> AAACCCCACATTTAAGAAT
AAAACCCTTTGAGTAGATA -> AAACCCTTTGAGTAGATAG
AAACACATTACGGATTCCA -> AACACATTACGGATTCCAC
AAACACCCGATTCCATAAA -> AACACCCGATTCCATAAAG
AAACATACCTTCGTAACCT -> AACATACCTTCGTAACCTC
AAACCATGTTCAACCATGA -> AACCATGTTCAACCATGAT
AAACCCCACATTTAAGAAT -> AACCCCACATTTAAGAATT
AAACCCTTTGAGTAGATAG -> AACCCTTTGAGTAGATAGA
AAAGAATAGCTACGGCGGA -> AAGAATAGCTACGGCGGAT
AAAGAGAATGTAACGAGGT -> AAGAGAATGTAACGAGGTA
AAAGATTCTTGAGGGCAGT -> AAGATTCTTGAGGGCAGTT
AAAGCAAAACACCCGATTC -> AAGCAAAACACCCGATTCC
AAAGCAGAGATATACATCT -> AAGCAGAGATATACATCTA
AAAGCGCATGGACCCCAGT -> AAGCGCATGGACCCCAGTA
AAAGCTACATGTCGCGGCA -> AAGCTACATGTCGCGGCAA
AAAGGCTGTCGACCTCAGT -> AAGGCTGTCGACCTCAGTA
AAAGGTTCAAAGCTACATG -> AAGGTTCAAAGCTACATGT
AAAGTACGATTGTCGACAT -> AAGTACGATTGTCGACATT
AAAGTTTGTTCCGTCTGAC -> AAGTTTGTTCCGTCTGACA
AAATAATACCAGAGGCGGC -> AATAATACCAGAGGCGGCC
AAATCATAGTG

## Inspection Intro

This class generates a De Bruijn graph from k-mers. It takes in a list of k-mers and returns the graph as an adjacency list keyed by (k-1)-mer.
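One detail worth illustrating is that a repeated k-mer produces a repeated edge in the adjacency list. The sketch below shows this on a small input and formats the result as `node -> a,b` lines. The names `de_bruijn_from_kmers` and `format_adjacency` are hypothetical and not part of the notebook.

```python
from collections import defaultdict


def de_bruijn_from_kmers(kmers):
    """Build prefix -> [suffix, ...] with one entry per k-mer, duplicates included."""
    graph = defaultdict(list)
    # Sorting the k-mers first keeps both the nodes and their targets in order.
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def format_adjacency(graph):
    """Render the graph as 'node -> target1,target2' lines."""
    return [f"{node} -> {','.join(targets)}" for node, targets in graph.items()]


sample = ["GAGG", "CAGG", "GGGG", "GGGA", "CAGG", "AGGG", "GGAG"]
for line in format_adjacency(de_bruijn_from_kmers(sample)):
    print(line)
```

CAGG appears twice in the sample, so the CAG node lists AGG twice.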

## Inspection Results

- Added inFile arg to main
- Added more comments
- Added print to stdout.

# Problem 13: Find an Eulerian Path in a Graph

In [26]:
from collections import defaultdict
import random

def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines

class Edge:
    """
    Class represents an edge in a graph

    attributes:
    source: Node - the source node
    dest: Node - the dest node
    traveled: bool - if the edge has been traveled
    """
    def __init__(self, source, dest) -> None:
        self.source = source
        self.dest  = dest
        self.traveled = False
        self.source.addOutEdge(self)
        self.dest.addInEdge(self)
    def markTraveled(self):
        """
        Marks edge as traveled
        """
        self.traveled = True

class Node:
    """
    Represents node in a graph

    attributes:
    name: str - the name of the node
    incomingEdges: list - edges that end at this node
    outgoingEdges: list - edges that start from this node
    """
    def __init__(self, name) -> None:
        self.name = name
        self.incomingEdges = []
        self.outgoingEdges = []
    def addInEdge(self, edge):
        """
        Add an in edge
        """
        self.incomingEdges.append(edge)
    def addOutEdge(self, edge):
        """
        Add an out edge
        """
        self.outgoingEdges.append(edge)
    def hasUntraveledEdges(self):
        """
        Checks if there are edges to explore from this node
        """
        for e in self.outgoingEdges:
            if not e.traveled:
                return True
        return False
    def unTraveledEdges(self):
        """
        Returns untraveled edges from this node
        """
        return [e for e in self.outgoingEdges if not e.traveled]

class EulerPath:
    """
    Generates eulerian path from input adjacency list

    attributes:
    inp: list - adjacency list
    """


    def __init__(self, inp) -> None:
        self.nodes, self.edges = self.parseInp(inp)

    def parseInp(self, inp):
        """
        Parses input adj. list into graph data structure
        """
        nodes = {}
        edges = []
        for line in inp:
            source, dest = line.strip().split(" -> ")
            if source not in nodes:
                nodes[source] = Node(source)
            for d in dest.split(","):
                if d not in nodes:
                    nodes[d] = Node(d)
                thisEdge = Edge(nodes[source], nodes[d])
                edges.append(thisEdge)
        return nodes, edges

    def mergeCycles(self, c1, c2):
        """
        Merges eulerian cycles
        """
        if not c1:
            # nothing to merge into yet, so the new walk becomes the path
            return c2
        else:
            index = c1.index(c2[0])
            merged = c1[:index] + c2 + c1[index+1:]
            return merged
    
    def findEulerPath(self):
        """
        Finds eulerian path through graph.
        """

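        # Find starting node: the node with more outgoing than incoming edges.
        # If the graph is balanced (an Eulerian cycle), fall back to the first node.
        start = next(iter(self.nodes.values()))
        for n in self.nodes:
            node = self.nodes[n]
            if len(node.outgoingEdges) > len(node.incomingEdges):
                start = node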
        
        # initialize empty cycle
        cycle = []

        while True:
            if len(cycle) == 0:
                currentNode = start
            else:
                # get explorable nodes 
                toExplore = [node for node in cycle if node.hasUntraveledEdges()]
                if len(toExplore) == 0:
                    return cycle
                # pick random node to explore
                currentNode = random.choice(toExplore)
            tempCycle = [currentNode]

            while True:
                if len(currentNode.outgoingEdges) == 0:
                    return cycle
                # explore a random edge
                edgesToExplore = currentNode.unTraveledEdges()
                currentEdge = random.choice(edgesToExplore)
                # mark the edge traveled
                currentEdge.markTraveled()
                # go to the next node
                nextNode = currentEdge.dest
                tempCycle.append(nextNode)

                # check if next node has edges to explore
                if nextNode.hasUntraveledEdges():
                    currentNode = nextNode
                else:
                    cycle = self.mergeCycles(cycle, tempCycle)
                    break
def main(inFile=None):
    inp = inputFileReader(inFile)
    k = EulerPath(inp)
    output = "->".join([p.name for p in k.findEulerPath()])
    
    
    with open("cm_13_out.txt", "w") as f:
         print(output, file=f)
         print(output)
if __name__ == "__main__":
    main(inFile="rosalind_ba3g.txt")



2209->1825->1827->1376->1375->882->463->464->528->527->526->464->465->148->136->82->83->2138->2137->2416->2417->2418->2137->2139->83->84->99->97->98->84->354->2143->2145->2781->2779->2780->2145->2144->354->353->931->932->933->1938->1937->2354->2355->2353->1937->1936->933->2470->2471->2472->933->353->1981->1982->1983->353->352->637->2494->2496->2495->637->638->639->352->84->31->8->892->893->894->8->7->48->1348->2511->2510->2509->1348->2271->2269->2270->1348->1350->1349->48->1488->1487->1486->48->47->46->135->133->383->1161->1160->1159->383->382->384->133->346->347->348->877->1033->1035->1034->877->879->878->1395->1394->1393->878->348->133->134->583->2084->2083->2085->583->585->584->1840->1841->1842->584->134->46->7->9->1090->1091->1092->1565->2105->2106->2104->1565->1566->1564->1092->9->122->123->128->2689->2690->2691->128->127->129->1928->1929->1927->129->552->551->550->957->955->2385->2383->2384->955->956->550->652->654->1441->2440->2442->2441->1441->1442->1914->1913->1912->1442->1443

## Inspection Intro

This class finds an Eulerian path through a graph that has one, given its adjacency list.
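For comparison, here is a compact sketch of a different technique: the stack-based form of Hierholzer's algorithm. It is not the random-walk-and-merge approach used in the cell above. The function name `eulerian_path` and the dict-of-lists input are assumptions made for this illustration, and the sketch assumes a non-empty graph that actually has an Eulerian path.

```python
def eulerian_path(adjacency):
    """adjacency: dict mapping node -> list of successor nodes."""
    # Copy the lists so the caller's graph is not consumed.
    remaining = {node: list(targets) for node, targets in adjacency.items()}

    in_degree = {}
    for targets in remaining.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1

    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(iter(remaining))
    for node, targets in remaining.items():
        if len(targets) - in_degree.get(node, 0) == 1:
            start = node
            break

    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow an unused edge out of the current node.
            stack.append(remaining[node].pop())
        else:
            # Dead end: this node is final in the (reversed) path.
            path.append(stack.pop())
    return path[::-1]


sample = {
    "0": ["2"], "1": ["3"], "2": ["1"], "3": ["0", "4"],
    "6": ["3", "7"], "7": ["8"], "8": ["9"], "9": ["6"],
}
print("->".join(eulerian_path(sample)))
```

Each edge is popped exactly once, so the walk is linear in the number of edges and needs no list merging or random choices.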

## Inspection Results

- Added infile arg to main
- Added more comments to explain path finding
- Added docstrings
- Removed typehints

# Problem 14: Reconstruct a String from its k-mer Composition

In [28]:
from collections import defaultdict
import random

def inputFileReader(file):
    """
    Reads Rosalind input file and returns list.

    Args:
        file (str): Path to input file.
    Returns:
        list: List of the arguments parsed from the file.
    """
    with open(file, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return lines

class Edge:
    """
    Class represents an edge in a graph

    attributes:
    source: Node - the source node
    dest: Node - the dest node
    traveled: bool - if the edge has been traveled
    """
    def __init__(self, source, dest) -> None:
        self.source = source
        self.dest  = dest
        self.traveled = False
        self.source.addOutEdge(self)
        self.dest.addInEdge(self)
    def markTraveled(self):
        """
        Marks edge as traveled
        """
        self.traveled = True

class Node:
    """
    Represents node in a graph

    attributes:
    name: str - the name of the node
    incomingEdges: list - edges that end at this node
    outgoingEdges: list - edges that start from this node
    """
    def __init__(self, name) -> None:
        self.name = name
        self.incomingEdges = []
        self.outgoingEdges = []
    def addInEdge(self, edge):
        """
        Add an in edge
        """
        self.incomingEdges.append(edge)
    def addOutEdge(self, edge):
        """
        Add an out edge
        """
        self.outgoingEdges.append(edge)
    def hasUntraveledEdges(self):
        """
        Checks if there are edges to explore from this node
        """
        for e in self.outgoingEdges:
            if not e.traveled:
                return True
        return False
    def unTraveledEdges(self):
        """
        Returns untraveled edges from this node
        """
        return [e for e in self.outgoingEdges if not e.traveled]

class EulerPath:
    """
    Generates eulerian path from input adjacency list

    attributes:
    inp: list - adjacency list
    """

    def __init__(self, inp) -> None:
        self.nodes, self.edges = self.parseInp(inp)

    def parseInp(self, inp):
        """
        Parses input from rosalind into graph data structure
        """
        nodes = {}
        edges = []
        for line in inp[1:]:
            source = line[:-1]
            dest= line[1:]
            if source not in nodes:
                nodes[source] = Node(source)
            for d in dest.split(","):
                if d not in nodes:
                    nodes[d] = Node(d)
                thisEdge = Edge(nodes[source], nodes[d])
                edges.append(thisEdge)
        return nodes, edges

    def mergeCycles(self, c1, c2):
        """
        Merges eulerian cycles
        """
        if not c1:
            # nothing to merge into yet, so the new walk becomes the path
            return c2
        else:
            index = c1.index(c2[0])
            merged = c1[:index] + c2 + c1[index+1:]
            return merged
    
    def findEulerPath(self):
        """
        Finds eulerian path through graph.
        """

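        # Find starting node: the node with more outgoing than incoming edges.
        # If the graph is balanced (an Eulerian cycle), fall back to the first node.
        start = next(iter(self.nodes.values()))
        for n in self.nodes:
            node = self.nodes[n]
            if len(node.outgoingEdges) > len(node.incomingEdges):
                start = node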
        
        # initialize empty cycle
        cycle = []

        while True:
            if len(cycle) == 0:
                currentNode = start
            else:
                # get explorable nodes 
                toExplore = [node for node in cycle if node.hasUntraveledEdges()]
                if len(toExplore) == 0:
                    return cycle
                # pick random node to explore
                currentNode = random.choice(toExplore)
            tempCycle = [currentNode]

            while True:
                if len(currentNode.outgoingEdges) == 0:
                    return cycle
                # explore a random edge
                edgesToExplore = currentNode.unTraveledEdges()
                currentEdge = random.choice(edgesToExplore)
                # mark the edge traveled
                currentEdge.markTraveled()
                # go to the next node
                nextNode = currentEdge.dest
                tempCycle.append(nextNode)

                # check if next node has edges to explore
                if nextNode.hasUntraveledEdges():
                    currentNode = nextNode
                else:
                    cycle = self.mergeCycles(cycle, tempCycle)
                    break
def main(inFile=None):
    inp = inputFileReader(inFile)
    k = EulerPath(inp)
    path = k.findEulerPath()
    output = ''
    for node in path:
        if len(output) == 0:
            output += node.name
            continue
        else:
            output += f'{node.name[-1:]}'
    
    
    with open("cm_14_out.txt", "w") as f:
         print(output, file=f)
         print(output)
if __name__ == "__main__":
    main(inFile="rosalind_ba3h.txt")



CTCGGTATTCTCAAATGTGCCGCCGCCAGTATGATGCCGTGGTGTGGTGGTGGAAGACAGATGGAAGGATACGCTTGCCATTCTGGCAGACAACTCTAACACTAGACCTTCAAAGACTGCTATTAAAGGCGTAATTGCTCATCTTTACTAATACAGACTACGTGAAGTGCAACGTTGGTTTCTCAATTACTCGTACATCCGAAAAGCTAGTTCTCCGGCGGGTCGAGCTGAGCAGCCCATATCTTGTTCAAGATCCGGTGGGTAGCGTTAGGCCACGGGATCAATTTGGATGGCCACGCGACTGCAAGATTATGTCTCACGCCAGCTTGATCATGGCAATATACTTTGGTTGCTACACATATGGCGTCGAAGCCACGATGAGTCAGAAAACAGCCCACGTAATGACGTTGAGCTTAAATCAGGAGCGGCGCACGGTTTCTCATGAAATAGTGACATCATTTCACGGGGAAAGCTCGTATTTGGTGGCTCTAGTTACGTCGACAGGTCCTTACAGACTCAGGCGCGCACTCAATGTTCTGTGATCCGGGAGATGCATGCGGGAGCTAAACTCACCTACAACGGAGTGGGCTACGAACTAGGTACTGATTGCGTAACGTAGATAGGTTCGGGTCTGACGTTGTCAGCGGGTTACCCTGCTCATCCAGTGTCGCGTCATGGCGTTAATTCACTCCTTTGATCTTGTTTTGTGAGGACTGGTCTAATGGTAACTGAGCTTAGTGTCAGTATATCTGGCTCGTGTCTAATTCGGCGATAGCTCTCCATCTATCGGCCATGGCAGGCATTTTACACTTTCAAGTCACGCAGGCTAGCTTTTGGTCCGGGTTATGATTACCTGTTCTGAGGATCCGACCCCAAACATAATAGGCGCTGCTAGGCACCTGTAACTTTTCACCACTGATCCCTGGTGGTTCAAGGGCGTTCTCCTCTCGTGACGAGAAAGGGTCTCTACCCACATTAGGTCTACCTCCAGAGGGTCATG

## Inspection Intro

This class reconstructs a string from its k-mer composition by finding an Eulerian path through the De Bruijn graph of the k-mers.

We reuse the class from problem 13 to solve this and adjust it slightly to accept kmers as input.
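The whole pipeline (k-mers to De Bruijn graph to Eulerian path to string) can be sketched in one short function. This is an illustration under stated assumptions, not the notebook's implementation: `reconstruct_string` is a hypothetical name, the path search is a stack-based Hierholzer walk rather than the cycle merging used above, and the input is assumed to be a non-empty list of k-mers that admits an Eulerian path.

```python
from collections import defaultdict


def reconstruct_string(kmers):
    """Spell a string whose k-mer composition is the given list of k-mers."""
    # De Bruijn graph: (k-1)-mer prefix -> list of (k-1)-mer suffixes.
    graph = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
        in_degree[kmer[1:]] += 1

    # Start where out-degree exceeds in-degree, else at the first node.
    start = next(iter(graph))
    for node, targets in graph.items():
        if len(targets) > in_degree[node]:
            start = node
            break

    # Stack-based Eulerian walk: follow unused edges, emit nodes at dead ends.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Spell the path: first node whole, then the last base of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct_string(["CTTA", "ACCA", "TACC", "GGCT", "GCTT", "TTAC"]))
```

The last step is the same spelling rule as in Problem 9, applied to (k-1)-mer nodes instead of k-mers.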

## Inspection Results

- Added inFile arg to main
- Added more comments
- Added print to stdout