<h1><span>Sequence Analysis Part 1 (Basics)</span></h1>

This "Part 1" is a collection of Python example code focusing on sequence analysis at a very basic level (see <a name="TOC">TOC</a>). It is motivated by the many job descriptions currently online for genome analysis, by advances in next-generation sequencing (NGS) and by my fascination with life science in general. Check out my website (www.bio-century.net) for demos, literature recommendations, links to databases and further inspiration. Apologies for the code often being a little sloppy and "easy-going". Due to a limited time frame, priority was given to "progress & visualization" over "finding the most efficient solution" :) <br>
Whenever a function or method is developed to fulfill the goals, it is outsourced to the subfolder "Part01". Take a look and have fun.

# Table of Contents
0. [Reference Table](#ReferenceTable)
1. [First Steps in Sequence Analysis (Groundwork)](#FirstStepsInSequenceAnalysis)
   1. [Color Strings in Terminal Output](#ColoringStrings)
   2. [Computation of Complement, Reverse and Reverse Complement of ssDNA](#ComplRevRevComplssDNA)
   3. [Identification of Sequences of Interest within a Target Sequence](#IdentOfSOIsInTargetSeq)
   4. [Identification of Multiple Sequences of Interest within a Target Sequence](#IdentMultipleOfSOIsInTargetSeq)

# 0. <a name="ReferenceTable"> Reference Table</a>

| Tag or Variable | Meaning              |
|-----------------|----------------------|
| mySequence      | Target Sequence      |
| mySOI           | Sequence of Interest |
| ssDNA           | single-stranded DNA  |

# 1. <a name="FirstStepsInSequenceAnalysis"> First Steps in Sequence Analysis (Groundwork)</a>

## 1.1 <a name="ColoringStrings">Color Strings in Terminal Output</a>

Goal:
1. Load a target DNA sequence ("mySequence") from an external file,
2. split the DNA at the first occurrence of a dedicated subsequence ("mySOI") and
3. color-print the result for visualization purposes.
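
The three steps above can be sketched without any external package. This is only a minimal illustration: the `Colors` class is a hypothetical stand-in for `ExternalPackages.TerminalColors` (it uses the standard ANSI codes 32, 94 and 39 for green, light blue and default foreground), and the helper names `split_at_first_soi` and `color` are made up for this example.

```python
# Hypothetical stand-in for ExternalPackages.TerminalColors (not the original module).
class Colors:
    GREEN = "\033[32m"       # ANSI: green foreground
    LIGHT_BLUE = "\033[94m"  # ANSI: bright blue foreground
    DEFAULT = "\033[39m"     # ANSI: reset foreground color


def split_at_first_soi(sequence, soi):
    """Return (prefix, match, suffix) around the first occurrence of soi, or None."""
    pos = sequence.upper().find(soi.upper())
    if pos == -1:
        return None
    end = pos + len(soi)
    return sequence[:pos], sequence[pos:end], sequence[end:]


def color(text, color_code):
    """Wrap text in a color code and reset to the default color afterwards."""
    return f"{color_code}{text}{Colors.DEFAULT}"


parts = split_at_first_soi("ctgcgccaaaaagacg", "CGCCAAAAA")
if parts is not None:
    prefix, match, suffix = parts
    # line 1: prefix followed by the highlighted SOI
    print(prefix + color(match.upper(), Colors.GREEN))
    # line 2: SOI aligned below line 1, followed by the remaining sequence
    print(" " * len(prefix) + color(match.upper(), Colors.LIGHT_BLUE) + suffix)
```

Returning the three parts separately keeps the slicing in one place, so the prefix, the match and the suffix cannot drift apart by an off-by-one index.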

In [1]:
# sources
# - https://www.bioinformatics.org/sms2/random_dna.html
# - https://pkg.go.dev/github.com/whitedevops/colors

# include packages
from ExternalPackages.TerminalColors import TerminalColors

mySOI = "CGCCAAAAA"

# load file line-by-line
with open('./Part01Data/01_01_SeqOfInterest.txt') as f:
    myLines = f.readlines()

# find mySOI, split and color it
for mySequence in myLines:
    SOI_POS = mySequence.upper().find(mySOI)
    if SOI_POS == -1:
        # mySOI does not occur in this line: print it unchanged
        print(mySequence, end = '')
        continue
    SOI_END = SOI_POS + len(mySOI)
    # first line: everything before mySOI, followed by mySOI in green
    print(mySequence[:SOI_POS]
          + f"{TerminalColors.Green}"
          + mySequence[SOI_POS:SOI_END].upper()
          + f"{TerminalColors.Default}", end = '\n')
    # second line: mySOI in light blue, aligned below the first, followed by the rest
    print(" " * SOI_POS
          + f"{TerminalColors.LightBlue}"
          + mySequence[SOI_POS:SOI_END].upper()
          + f"{TerminalColors.Default}" + mySequence[SOI_END:], end = '')

ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctg[32mCGCCAAAAA[39m
                                                     [94mCGCCAAAAA[39mgacgaacccaccatgcccagacgcatctggctaagctc

## 1.2 <a name="ComplRevRevComplssDNA">Computation of Complement, Reverse and Reverse Complement of ssDNA</a>

Goal: <br>
To determine all variants of a single-stranded DNA sequence, i.e.
1. the complement strand,
2. the reverse strand, as well as
3. the reverse complement strand,

and visualize them.
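
For readers without access to `Part01`, the three variants can be sketched in a few lines. This is not the actual `Groundwork.ComplRev`; the function names below are hypothetical, and the sketch assumes plain A/C/G/T input (any other character is passed through unchanged).

```python
# Watson-Crick base pairing for DNA: A<->T, C<->G.
# Characters other than A, C, G, T are left unchanged by translate().
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-wise complement, same reading direction as the input."""
    return seq.upper().translate(COMPLEMENT)


def reverse(seq):
    """Input read backwards, bases unchanged."""
    return seq.upper()[::-1]


def reverse_complement(seq):
    """Complement read backwards, i.e. the opposite strand in 5'->3' direction."""
    return complement(seq)[::-1]


seq = "CTGGGA"
print("Sequence:           ", seq)
print("Complement:         ", complement(seq))          # GACCCT
print("Reverse:            ", reverse(seq))             # AGGGTC
print("Reverse complement: ", reverse_complement(seq))  # TCCCAG
```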

In [2]:
# sources
# - https://www.bioinformatics.org/sms2/random_dna.html

# include packages
from ExternalPackages.TerminalColors import TerminalColors
import Part01.A_Groundwork as Groundwork

mySequenceInput      = "ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctgcgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc"
mySequence           = mySequenceInput.upper()
myNumberOfSOIs       = 3
mySOI                = [""] * myNumberOfSOIs
myColors             = [""] * myNumberOfSOIs
mySequenceColored    = [""] * myNumberOfSOIs
myColorsDNA          = [TerminalColors.Green, TerminalColors.Yellow, TerminalColors.Blue, TerminalColors.Magenta]
myDefaultColor       = TerminalColors.Default

returnSeq = Groundwork.ComplRev(mySequence,"Sequence")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Sequence: \t \t \t", mySequenceColoredDNA, end = '\n')
print("\t \t \t \t", ''.join('|' for i in returnSeq), end = '\n')

returnSeq = Groundwork.ComplRev(mySequence,"SequenceComplement")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Complement: \t \t \t", mySequenceColoredDNA, end = '\n')
print("")

returnSeq = Groundwork.ComplRev(mySequence,"SequenceReverse")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Reverse: \t \t \t", mySequenceColoredDNA, end = '\n')
print("\t \t \t \t", ''.join('|' for i in returnSeq), end = '\n')

returnSeq = Groundwork.ComplRev(mySequence,"SequenceReverseComplement")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Reverse Complement: \t \t", mySequenceColoredDNA, end = '\n')

Sequence: 	 	 	 [33mC[39m[35mT[39m[34mG[39m[34mG[39m[34mG[39m[32mA[39m[33mC[39m[35mT[39m[33mC[39m[35mT[39m[32mA[39m[34mG[39m[33mC[39m[35mT[39m[34mG[39m[32mA[39m[35mT[39m[33mC[39m[33mC[39m[32mA[39m[33mC[39m[33mC[39m[33mC[39m[34mG[39m[33mC[39m[33mC[39m[35mT[39m[32mA[39m[34mG[39m[34mG[39m[34mG[39m[33mC[39m[32mA[39m[34mG[39m[33mC[39m[32mA[39m[33mC[39m[32mA[39m[33mC[39m[32mA[39m[35mT[39m[32mA[39m[34mG[39m[34mG[39m[32mA[39m[33mC[39m[34mG[39m[35mT[39m[32mA[39m[34mG[39m[33mC[39m[35mT[39m[34mG[39m[33mC[39m[34mG[39m[33mC[39m[33mC[39m[32mA[39m[32mA[39m[32mA[39m[32mA[39m[32mA[39m[34mG[39m[32mA[39m[33mC[39m[34mG[39m[32mA[39m[32mA[39m[33mC[39m[33mC[39m[33mC[39m[32mA[39m[33mC[39m[33mC[39m[32mA[39m[35mT[39m[34mG[39m[33mC[39m[33mC[39m[33mC[39m[32mA[39m[34mG[39m[32mA[39m[33mC[39m[34mG[39m[33mC[39m[32mA[39m[35mT[39m[33mC[39m[35m
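
The per-base coloring visible in the output above can be sketched as follows. This is a minimal illustration and not the actual `Groundwork.ColorDNA`: the name `color_dna` is hypothetical, and the assumption that the four colors map to A, C, G, T in that order is inferred from the printed escape codes (32 for A, 33 for C, 34 for G, 35 for T).

```python
# Standard ANSI foreground codes, as they appear in the output above.
GREEN, YELLOW, BLUE, MAGENTA = "\033[32m", "\033[33m", "\033[34m", "\033[35m"
DEFAULT = "\033[39m"


def color_dna(seq, colors, default):
    """Wrap every base in its own color code; unknown characters get the default color."""
    palette = dict(zip("ACGT", colors))  # assumed order: A, C, G, T
    return "".join(
        f"{palette.get(base, default)}{base}{default}" for base in seq.upper()
    )


colored = color_dna("CTGA", [GREEN, YELLOW, BLUE, MAGENTA], DEFAULT)
print(colored)
```

Resetting the color after every single base is verbose, but it keeps each character independent, which is convenient when the colored string is later sliced or compared position by position.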

## 1.3 <a name="IdentOfSOIsInTargetSeq">Identification of Sequences of Interest within a Target Sequence</a>

Goal: To identify and mark the positions of an SOI (mySOI) in a given target sequence (mySequence). For each position ii of "mySequence", the return value "positionsSOI" shall be
1. 1 if the base at position ii is part of exactly one occurrence of mySOI,
2. \>1 if two or more mySOI regions overlap at position ii and
3. 0 otherwise.
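
One way to obtain such a per-position count is sketched below. It is an assumption about how `Groundwork.SOIPositions` might behave, not its actual code; the name `soi_positions` and the choice of a list of integers as return type are hypothetical.

```python
def soi_positions(sequence, soi):
    """Return (number of occurrences, per-base coverage) with overlapping hits included."""
    coverage = [0] * len(sequence)
    count = 0
    if not soi:
        return count, coverage
    start = sequence.find(soi)
    while start != -1:
        count += 1
        for i in range(start, start + len(soi)):
            coverage[i] += 1
        # advance by one character only, so overlapping occurrences are found too
        start = sequence.find(soi, start + 1)
    return count, coverage


count, coverage = soi_positions("CTGGGA", "GG")
print(count)     # 2
print(coverage)  # [0, 0, 1, 2, 1, 0]
```

In "CTGGGA" the SOI "GG" occurs at index 2 and again at index 3, so the middle G is covered twice and receives the value 2.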

In [3]:
# sources
# - https://www.bioinformatics.org/sms2/random_dna.html

# include packages
from ExternalPackages.TerminalColors import TerminalColors
import Part01.A_Groundwork as Groundwork

mySequenceInput      = "ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctgcgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc"
mySequence           = mySequenceInput.upper()
mySOI                = "GG"

myCount, mySOIPositions = Groundwork.SOIPositions(mySequence, mySOI)
myColor                 = TerminalColors.Green
myDefaultColor          = TerminalColors.Default

mySequenceColored       = Groundwork.ColorTheSeq(mySequence, mySOIPositions, myColor, myDefaultColor)
mySOIPositionsColored   = Groundwork.ColorTheSeq(mySOIPositions, mySOIPositions, myColor, myDefaultColor)

print("mySequence : \t \t" + mySequenceColored, end = "\n")
print("mySOIPositions: \t" + mySOIPositionsColored, end = '\n')

mySequence : 	 	[39mC[39m[39mT[39m[32mG[39m[32mG[39m[32mG[39m[39mA[39m[39mC[39m[39mT[39m[39mC[39m[39mT[39m[39mA[39m[39mG[39m[39mC[39m[39mT[39m[39mG[39m[39mA[39m[39mT[39m[39mC[39m[39mC[39m[39mA[39m[39mC[39m[39mC[39m[39mC[39m[39mG[39m[39mC[39m[39mC[39m[39mT[39m[39mA[39m[32mG[39m[32mG[39m[32mG[39m[39mC[39m[39mA[39m[39mG[39m[39mC[39m[39mA[39m[39mC[39m[39mA[39m[39mC[39m[39mA[39m[39mT[39m[39mA[39m[32mG[39m[32mG[39m[39mA[39m[39mC[39m[39mG[39m[39mT[39m[39mA[39m[39mG[39m[39mC[39m[39mT[39m[39mG[39m[39mC[39m[39mG[39m[39mC[39m[39mC[39m[39mA[39m[39mA[39m[39mA[39m[39mA[39m[39mA[39m[39mG[39m[39mA[39m[39mC[39m[39mG[39m[39mA[39m[39mA[39m[39mC[39m[39mC[39m[39mC[39m[39mA[39m[39mC[39m[39mC[39m[39mA[39m[39mT[39m[39mG[39m[39mC[39m[39mC[39m[39mC[39m[39mA[39m[39mG[39m[39mA[39m[39mC[39m[39mG[39m[39mC[39m[39mA[39m[39mT[39m[39mC[39m[39m

## 1.4 <a name="IdentMultipleOfSOIsInTargetSeq">Identification of Multiple Sequences of Interest within a Target Sequence</a>

Goal:

To generalize the algorithm above to 3 given SOIs (mySOIs):
1. identify their positions in a target sequence (mySequence),
2. label each SOI individually (green, blue and yellow) and
3. color a character of the target sequence red if it lies in an overlap between occurrences of the same or of different SOIs.
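
The merging step can be sketched as follows: compute one coverage list per SOI, sum them per position, and pick the color from the total. This is a simplified illustration under my own assumptions, not the actual `Groundwork.ColorTheSeqMerge`; the names `coverage` and `color_merged` are hypothetical.

```python
GREEN, BLUE, YELLOW, RED = "\033[32m", "\033[34m", "\033[33m", "\033[31m"
DEFAULT = "\033[39m"


def coverage(sequence, soi):
    """Per-base count of occurrences of soi covering each position (overlaps included)."""
    counts = [0] * len(sequence)
    start = sequence.find(soi) if soi else -1
    while start != -1:
        for i in range(start, start + len(soi)):
            counts[i] += 1
        start = sequence.find(soi, start + 1)
    return counts


def color_merged(sequence, sois, colors, default=DEFAULT, warning=RED):
    """Color each base by its SOI; bases covered more than once get the warning color."""
    per_soi = [coverage(sequence, soi) for soi in sois]
    if per_soi:
        total = [sum(column) for column in zip(*per_soi)]
    else:
        total = [0] * len(sequence)
    pieces = []
    for i, base in enumerate(sequence):
        if total[i] > 1:
            code = warning
        elif total[i] == 1:
            # exactly one SOI covers this base: use that SOI's color
            owner = next(k for k, counts in enumerate(per_soi) if counts[i])
            code = colors[owner]
        else:
            code = default
        pieces.append(f"{code}{base}{default}")
    return "".join(pieces), total


colored, total = color_merged("CTAGGAC", ["AGG", "GAC", "CTAG"], [GREEN, BLUE, YELLOW])
print(colored)
print(total)  # [1, 1, 2, 2, 2, 1, 1]
```

In the toy sequence "CTAGGAC", "CTAG" covers indices 0 to 3, "AGG" covers 2 to 4 and "GAC" covers 4 to 6, so indices 2, 3 and 4 are each covered twice and are printed in red.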

In [4]:
# include packages
from ExternalPackages.TerminalColors import TerminalColors
import Part01.A_Groundwork as Groundwork

myNumberOfSOIs       = 3
mySOI                = [""] * myNumberOfSOIs
myCount              = [0]  * myNumberOfSOIs
mySOIPositions       = [0]  * myNumberOfSOIs
myColors             = [""] * myNumberOfSOIs
mySequenceColored    = [""] * myNumberOfSOIs
myColors             = [TerminalColors.Green, TerminalColors.Blue, TerminalColors.Yellow]
myDefaultColor       = TerminalColors.Default
myColorWarning       = TerminalColors.Red

mySequenceInput      = "ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctgcgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc"
mySequence           = mySequenceInput.upper()
mySOI[0]             = "AGG"
mySOI[1]             = "GAC"
mySOI[2]             = "CTAG"

for ii in range(myNumberOfSOIs):
    myCount[ii], mySOIPositions[ii] = Groundwork.SOIPositions(mySequence, mySOI[ii])
    mySequenceColored[ii]           = Groundwork.ColorTheSeq(mySequence, mySOIPositions[ii], myColors[ii], myDefaultColor)
    print("mySOI ", ii, ": \t \t \t", mySequenceColored[ii], end = '\n')

mySequenceColored, mySOIPositionsTotalColored, mySOIPositionsTotal = Groundwork.ColorTheSeqMerge(mySequence,
                                                                                          mySOIPositions,
                                                                                          myColors,
                                                                                          myDefaultColor,
                                                                                          myColorWarning)

print("-" * 133, end = '\n')
print("mySOIs (total): \t \t", mySequenceColored, end = '\n')
print("mySOIPositions (total) : \t", mySOIPositionsTotalColored, end = '\n')

mySOI  0 : 	 	 	 [39mC[39m[39mT[39m[39mG[39m[39mG[39m[39mG[39m[39mA[39m[39mC[39m[39mT[39m[39mC[39m[39mT[39m[39mA[39m[39mG[39m[39mC[39m[39mT[39m[39mG[39m[39mA[39m[39mT[39m[39mC[39m[39mC[39m[39mA[39m[39mC[39m[39mC[39m[39mC[39m[39mG[39m[39mC[39m[39mC[39m[39mT[39m[32mA[39m[32mG[39m[32mG[39m[39mG[39m[39mC[39m[39mA[39m[39mG[39m[39mC[39m[39mA[39m[39mC[39m[39mA[39m[39mC[39m[39mA[39m[39mT[39m[32mA[39m[32mG[39m[32mG[39m[39mA[39m[39mC[39m[39mG[39m[39mT[39m[39mA[39m[39mG[39m[39mC[39m[39mT[39m[39mG[39m[39mC[39m[39mG[39m[39mC[39m[39mC[39m[39mA[39m[39mA[39m[39mA[39m[39mA[39m[39mA[39m[39mG[39m[39mA[39m[39mC[39m[39mG[39m[39mA[39m[39mA[39m[39mC[39m[39mC[39m[39mC[39m[39mA[39m[39mC[39m[39mC[39m[39mA[39m[39mT[39m[39mG[39m[39mC[39m[39mC[39m[39mC[39m[39mA[39m[39mG[39m[39mA[39m[39mC[39m[39mG[39m[39mC[39m[39mA[39m[39mT[39m[39mC[39m[39

In [5]:
# export code to html
import os
os.system("jupyter nbconvert SequenceAnalysisPart01.ipynb --to html")

0