Add optimization and translation functions for sequences #11

Merged
merged 39 commits into from
Oct 22, 2020
Changes from 4 commits
7378e85
Add optimization and translation functions for sequences
Koeng101 Jun 18, 2020
fa76b55
fixed godoc comments to use object name for exported objects.
TimothyStiles Jun 18, 2020
1612107
renamed struct names to be more readable.
TimothyStiles Jun 19, 2020
99bc4ee
made variable and function names more readable.
TimothyStiles Jun 19, 2020
09782b0
changed another variable name.
TimothyStiles Jun 19, 2020
b92ca4b
recommenting test function for now.
TimothyStiles Jun 19, 2020
3ac819f
updated with a new DefaultCodonTablesByName.
TimothyStiles Jun 19, 2020
64a9fdc
updated variable name in transformations_test.go.
TimothyStiles Jun 19, 2020
db33d75
merging prime branch.
TimothyStiles Sep 24, 2020
04a757b
added new codonFrequency and Translate functions.
TimothyStiles Sep 26, 2020
96e63b9
debugged GetCodonFrequency and optimized translate function.
TimothyStiles Sep 26, 2020
9b737aa
added basic codon optimization for arbitrary AnnotatedSequence.
TimothyStiles Oct 1, 2020
3158fc3
committing before roll back and refactor.
TimothyStiles Oct 2, 2020
a042748
basic tests pass for codon optimization.
TimothyStiles Oct 13, 2020
c46a45f
merging.
TimothyStiles Oct 13, 2020
7a713c6
fixed merge error.
TimothyStiles Oct 13, 2020
26f5174
refactored so CodonTable can be more portable.
TimothyStiles Oct 14, 2020
eb035aa
added comment and clarified variable name.
TimothyStiles Oct 14, 2020
4fb3387
prettified command related files and transformations.go.
TimothyStiles Oct 17, 2020
910c932
made .CreateWeights() method public.
TimothyStiles Oct 17, 2020
db0b4b8
refactored hash in commands.go.
TimothyStiles Oct 17, 2020
e676dde
switched test file to puc19.gbk. Backing up before test refactor.
TimothyStiles Oct 21, 2020
9e55a71
backing up progress before yet another refactor.
TimothyStiles Oct 21, 2020
3c371b6
hash command test refactored.
TimothyStiles Oct 21, 2020
be485cf
refactored command tests to spoof stdin and stdout.
TimothyStiles Oct 22, 2020
86c71a9
actually refactored command tests to spoof stdin and stdout.
TimothyStiles Oct 22, 2020
c3594a3
updating urfave/cli to latest commit.
TimothyStiles Oct 22, 2020
37fcc12
made it so codons are initialized with at least a weight of 1.
TimothyStiles Oct 22, 2020
e507f7d
fixed comment in hash.go
TimothyStiles Oct 22, 2020
b63a72b
rearranged main.go for easier debugging.
TimothyStiles Oct 22, 2020
754e77a
added flag.
TimothyStiles Oct 22, 2020
49ed9ee
added translation command and test.
TimothyStiles Oct 22, 2020
fb8af83
adding test.
TimothyStiles Oct 22, 2020
87e2d57
added variables for easy debugging.
TimothyStiles Oct 22, 2020
0fbc267
made application private.
TimothyStiles Oct 22, 2020
1472d4d
made private things public. Updated command comments.
TimothyStiles Oct 22, 2020
36a7848
made getFeatureSequence private.
TimothyStiles Oct 22, 2020
2ee4afa
fixed typo.
TimothyStiles Oct 22, 2020
816dd24
renamed functions.
TimothyStiles Oct 22, 2020
1 change: 1 addition & 0 deletions go.mod
@@ -5,6 +5,7 @@ go 1.13
require (
github.com/PuerkitoBio/goquery v1.5.1
github.com/google/go-cmp v0.4.1
github.com/mroth/weightedrand v0.2.1
github.com/pmezard/go-difflib v1.0.0
github.com/sergi/go-diff v1.1.0
github.com/urfave/cli/v2 v2.2.0
2 changes: 2 additions & 0 deletions go.sum
@@ -15,6 +15,8 @@ github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORN
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/mroth/weightedrand v0.2.1 h1:ivJastXlhBrj0q931DJ8IwhOLGwrYtPeENWd3WlVI0s=
github.com/mroth/weightedrand v0.2.1/go.mod h1:3p2SIcC8al1YMzGhAIoXD+r9olo/g/cdJgAD905gyNE=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/russross/blackfriday/v2 v2.0.1 h1:lPqVAte+HuHNfhJ/0LC98ESWRz8afy9tM/0RK8m9o+Q=
142 changes: 142 additions & 0 deletions transformations.go
@@ -0,0 +1,142 @@
package main

import (
"math/rand"
"time"

weightedRand "github.com/mroth/weightedrand"
)

// Codon holds information for a codon triplet in a struct
type Codon struct {
Triplet string
Occurrence int
}

// AminoAcid holds information for an amino acid and its related codons in a struct
type AminoAcid struct {
Letter string
Codons []Codon
}

// CodonTable holds information for a codon table.
type CodonTable struct {
StartCodons []string
StopCodons []string
AminoAcids []AminoAcid
}

// generateOptimizationTable generates a map of amino acid -> weighted codon chooser
func (codonTable CodonTable) generateOptimizationTable() map[string]weightedRand.Chooser {
rand.Seed(time.Now().UTC().UnixNano())
var optimizationMap = make(map[string]weightedRand.Chooser)
for _, aminoAcid := range codonTable.AminoAcids {
// Collect each triplet with its occurrence count as the weight.
// Length 0 plus a capacity hint keeps append from leaving zero-valued choices at the front.
codonChoices := make([]weightedRand.Choice, 0, len(aminoAcid.Codons))
for _, codon := range aminoAcid.Codons {
codonChoices = append(codonChoices, weightedRand.Choice{Item: codon.Triplet, Weight: uint(codon.Occurrence)})
}
optimizationMap[aminoAcid.Letter] = weightedRand.NewChooser(codonChoices...)
}
return optimizationMap
}
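
The chooser built above picks a codon for each amino acid with probability proportional to its occurrence count. As a rough, stdlib-only sketch of that idea (this is not the `weightedrand` implementation; `weightedChoice`, `picker`, `newPicker`, `pickAt`, and `pick` are hypothetical names used only for illustration), running totals plus a binary search are enough:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// weightedChoice is a hypothetical stand-in for one codon and its count.
type weightedChoice struct {
	item   string
	weight int
}

// picker stores running totals so that a pick is a binary search.
type picker struct {
	items  []string
	totals []int
}

// newPicker builds the running totals, skipping non-positive weights.
func newPicker(choices []weightedChoice) picker {
	p := picker{}
	running := 0
	for _, c := range choices {
		if c.weight <= 0 {
			continue
		}
		running += c.weight
		p.items = append(p.items, c.item)
		p.totals = append(p.totals, running)
	}
	return p
}

// pickAt maps r in [1, total] to the first item whose running total reaches r.
func (p picker) pickAt(r int) string {
	return p.items[sort.SearchInts(p.totals, r)]
}

// pick draws r uniformly; it panics if no positive weights were supplied.
func (p picker) pick() string {
	total := p.totals[len(p.totals)-1]
	return p.pickAt(rand.Intn(total) + 1)
}

func main() {
	p := newPicker([]weightedChoice{{"GGA", 1}, {"GGC", 3}})
	fmt.Println(p.pick()) // GGC is expected about three times as often as GGA
}
```

Note that a set of choices whose weights are all zero has no valid pick in this sketch, which is one reason to give every codon a positive starting weight.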

// Optimize takes an amino acid sequence and CodonTable and returns an optimized codon sequence
func Optimize(aminoAcids string, codonTable CodonTable) string {
var codons string
optimizationTable := codonTable.generateOptimizationTable()
for _, aminoAcid := range aminoAcids {
codons += optimizationTable[string(aminoAcid)].Pick().(string)
}
return codons
}

// generateTranslationTable generates a map of codon -> amino acid
func (codonTable CodonTable) generateTranslationTable() map[string]string {
var translationMap = make(map[string]string)
for _, aminoAcid := range codonTable.AminoAcids {
for _, codon := range aminoAcid.Codons {
translationMap[codon.Triplet] = aminoAcid.Letter
}

Maybe I don't know go very well. But is this a for loop in a for loop in a map >.>

}
return translationMap
}

// Translate translates a codon sequence to an amino acid sequence
func Translate(nucleotides string, codonTable CodonTable) string {
var aminoAcids string
translationTable := codonTable.generateTranslationTable()
length := len(nucleotides) / 3 // Integer division: assumes the length is divisible by 3; any trailing 1-2 nucleotides are ignored
for i := 0; i < length; i++ {
aminoAcids += translationTable[nucleotides[i*3:(i+1)*3]]
}
return aminoAcids
}
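
Translate above assumes a well-formed input: leftover nucleotides are dropped and an unknown triplet contributes an empty string. A minimal sketch of a stricter variant, assuming the caller would rather get an error (`translateStrict` and the plain `map[string]string` table are hypothetical and not part of this PR):

```go
package main

import (
	"fmt"
	"strings"
)

// translateStrict is a hypothetical variant of Translate that reports
// malformed input instead of silently skipping it.
func translateStrict(nucleotides string, codonToAminoAcid map[string]string) (string, error) {
	if len(nucleotides)%3 != 0 {
		return "", fmt.Errorf("sequence length %d is not a multiple of 3", len(nucleotides))
	}
	// strings.Builder avoids reallocating the result on every appended letter.
	var aminoAcids strings.Builder
	for i := 0; i < len(nucleotides); i += 3 {
		triplet := nucleotides[i : i+3]
		letter, ok := codonToAminoAcid[triplet]
		if !ok {
			return "", fmt.Errorf("unknown codon %q at position %d", triplet, i)
		}
		aminoAcids.WriteString(letter)
	}
	return aminoAcids.String(), nil
}

func main() {
	// A tiny hand-written table, just enough for the example sequence.
	table := map[string]string{"ATG": "M", "GGC": "G", "TGA": "*"}
	protein, err := translateStrict("ATGGGCTGA", table)
	fmt.Println(protein, err)
}
```

Whether to fail or to skip is a design choice; the error-returning form mainly helps when sequences come from user-supplied files.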

// generateCodonTable generates a codon table from the AAs and Starts strings published by NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
func generateCodonTable(aas, starts string) CodonTable {
Collaborator

What does aas stand for here? Amino acid string?

Contributor Author

It stands for amino acids. On the link, they just name it "AAs", so I just copied their nomenclature (also why starts is named starts)

base1 := "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
base2 := "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
base3 := "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"
// Add triplets to an amino acid -> triplet map, and if a possible start codon, add to start codon list
var aminoAcidMap = make(map[rune][]Codon)
var startCodons []string
var stopCodons []string
for i, aminoAcid := range aas {
triplet := string([]byte{base1[i], base2[i], base3[i]})
// append allocates the slice on first use, so no existence check is needed
aminoAcidMap[aminoAcid] = append(aminoAcidMap[aminoAcid], Codon{triplet, 0})
if starts[i] == 'M' {
startCodons = append(startCodons, triplet)
}
if starts[i] == '*' {
stopCodons = append(stopCodons, triplet)
}
}
// Convert amino acid -> triplet map to an amino acid list
var aminoAcidSlice []AminoAcid
for k, v := range aminoAcidMap {
aminoAcidSlice = append(aminoAcidSlice, AminoAcid{string(k), v})
}
return CodonTable{startCodons, stopCodons, aminoAcidSlice}
}
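
The three base strings above enumerate all 64 triplets in T, C, A, G order, so position i of an AAs string always refers to the same triplet. A small sketch, assuming only that ordering, derives the triplet arithmetically instead of indexing three strings (`tripletAt` is a hypothetical helper, not part of this PR):

```go
package main

import "fmt"

// bases is the nucleotide order used by the base1/base2/base3 strings.
const bases = "TCAG"

// tripletAt returns the codon at position i (0..63) of an NCBI AAs string.
// The first base changes every 16 positions, the second every 4, the third every 1.
func tripletAt(i int) string {
	return string([]byte{bases[i/16], bases[(i/4)%4], bases[i%4]})
}

func main() {
	// AAs string of the standard table (table 1 above).
	aas := "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
	for _, i := range []int{0, 10, 35, 63} {
		fmt.Printf("%s -> %c\n", tripletAt(i), aas[i])
	}
	// Prints:
	// TTT -> F
	// TAA -> *
	// ATG -> M
	// GGG -> G
}
```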

// DefaultCodonTables stores all codon tables published by NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
var DefaultCodonTables = map[int]CodonTable{
1: generateCodonTable("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "---M------**--*----M---------------M----------------------------"),
2: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG", "----------**--------------------MMMM----------**---M------------"),
3: generateCodonTable("FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------**----------------------MM---------------M------------"),
4: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "--MM------**-------M------------MMMM---------------M------------"),
5: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG", "---M------**--------------------MMMM---------------M------------"),
6: generateCodonTable("FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "--------------*--------------------M----------------------------"),
9: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG", "----------**-----------------------M---------------M------------"),
10: generateCodonTable("FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------**-----------------------M----------------------------"),
11: generateCodonTable("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "---M------**--*----M------------MMMM---------------M------------"),
12: generateCodonTable("FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------**--*----M---------------M----------------------------"),
13: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG", "---M------**----------------------MM---------------M------------"),
14: generateCodonTable("FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG", "-----------*-----------------------M----------------------------"),
16: generateCodonTable("FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------*---*--------------------M----------------------------"),
21: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG", "----------**-----------------------M---------------M------------"),
22: generateCodonTable("FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "------*---*---*--------------------M----------------------------"),
23: generateCodonTable("FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "--*-------**--*-----------------M--M---------------M------------"),
24: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSKVVVVAAAADDEEGGGG", "---M------**-------M---------------M---------------M------------"),
25: generateCodonTable("FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "---M------**-----------------------M---------------M------------"),
26: generateCodonTable("FFLLSSSSYY**CC*WLLLAPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------**--*----M---------------M----------------------------"),
27: generateCodonTable("FFLLSSSSYYQQCCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "--------------*--------------------M----------------------------"),
28: generateCodonTable("FFLLSSSSYYQQCCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------**--*--------------------M----------------------------"),
29: generateCodonTable("FFLLSSSSYYYYCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "--------------*--------------------M----------------------------"),
30: generateCodonTable("FFLLSSSSYYEECC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "--------------*--------------------M----------------------------"),
31: generateCodonTable("FFLLSSSSYYEECCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------**-----------------------M----------------------------"),
33: generateCodonTable("FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSKVVVVAAAADDEEGGGG", "---M-------*-------M---------------M---------------M------------")}

//func main() {
// rand.Seed(time.Now().UTC().UnixNano())
// codonTable := CodonTable{[]string{}, []string{}, []AminoAcid{{"M", []Codon{{"ATG", 1}}}, {"G", []Codon{{"GGA", 1}}}, {"*", []Codon{{"TAA", 1}, {"TGA", 1}}}}}
// translation := Translate("ATGGGCTGA", DefaultCodonTables[1])
//
// fmt.Println(Optimize(translation, codonTable))
//}
13 changes: 13 additions & 0 deletions transformations_test.go
@@ -0,0 +1,13 @@
package main

import (
"testing"
)

func TestTranslation(t *testing.T) {
gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*"
gfpDnaSequence := "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCTACATACGGAAAGCTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCATATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACATTGAAGATGGATCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAATAA"
if got := Translate(gfpDnaSequence, DefaultCodonTables[11]); got != gfpTranslation {
t.Errorf("TestTranslation has failed. Translate has returned %q, want %q", got, gfpTranslation)
}
}