Add optimization and translation functions for sequences #11

Koeng101 · 2020-06-18T21:58:06Z

This pull request adds codon optimization and translation functions for sequences, along with default values for all of the NCBI codon tables.

The general idea is that you have a codonTable object that stores amino acids, codons, and the number of occurrences of any given codon in proteins (defaults to 0). That codonTable has methods associated with it to build simple mappings between amino acids <-> codons.
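To make the shape of that object concrete, here is a minimal sketch of the idea. The `AminoAcids`, `Codons`, `Letter`, and `Triplet` names follow the diff excerpts quoted in this thread; the `Weight` field and both method names are assumptions for illustration, not necessarily what the PR ships.

```go
package main

import "fmt"

// Codon pairs a nucleotide triplet with the number of times it was observed.
// Weight is an assumed field name; its zero value matches "defaults to 0".
type Codon struct {
	Triplet string
	Weight  int
}

// AminoAcid groups every codon that encodes one amino acid letter.
type AminoAcid struct {
	Letter string
	Codons []Codon
}

// CodonTable holds all amino acids of one genetic code.
type CodonTable struct {
	AminoAcids []AminoAcid
}

// codonToAminoAcid builds the codon -> amino acid lookup used for translation.
func (t CodonTable) codonToAminoAcid() map[string]string {
	lookup := make(map[string]string)
	for _, aminoAcid := range t.AminoAcids {
		for _, codon := range aminoAcid.Codons {
			lookup[codon.Triplet] = aminoAcid.Letter
		}
	}
	return lookup
}

// aminoAcidToCodons builds the amino acid -> codons lookup used for optimization.
func (t CodonTable) aminoAcidToCodons() map[string][]string {
	lookup := make(map[string][]string)
	for _, aminoAcid := range t.AminoAcids {
		for _, codon := range aminoAcid.Codons {
			lookup[aminoAcid.Letter] = append(lookup[aminoAcid.Letter], codon.Triplet)
		}
	}
	return lookup
}

func main() {
	table := CodonTable{AminoAcids: []AminoAcid{
		{Letter: "M", Codons: []Codon{{Triplet: "ATG"}}},
		{Letter: "F", Codons: []Codon{{Triplet: "TTT"}, {Triplet: "TTC"}}},
	}}
	fmt.Println(table.codonToAminoAcid()["TTC"]) // F
	fmt.Println(table.aminoAcidToCodons()["F"])  // [TTT TTC]
}
```

Both lookups are derived from the same table, so the table stays the single source of truth.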

Some checks have not been added yet:

Translate requires input lengths to be divisible by 3
Creating an optimization table requires that every codon have at least 1 occurrence
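A rough sketch of what those two checks could look like. The `translate` and `checkWeights` helpers and their plain-map inputs are hypothetical stand-ins so the example stays self-contained; they are not the PR's actual signatures.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// translate converts a coding sequence into amino acid letters.
// Hypothetical helper: the lookup map stands in for the PR's codon table.
func translate(sequence string, codonToAminoAcid map[string]string) (string, error) {
	if len(sequence)%3 != 0 {
		return "", fmt.Errorf("sequence length %d is not divisible by 3", len(sequence))
	}
	var protein strings.Builder
	for i := 0; i < len(sequence); i += 3 {
		triplet := strings.ToUpper(sequence[i : i+3])
		letter, ok := codonToAminoAcid[triplet]
		if !ok {
			return "", fmt.Errorf("unknown codon %q at position %d", triplet, i)
		}
		protein.WriteString(letter)
	}
	return protein.String(), nil
}

// checkWeights rejects a table in which any codon was never observed.
func checkWeights(weights map[string]int) error {
	for triplet, weight := range weights {
		if weight < 1 {
			return errors.New("codon " + triplet + " has no occurrences")
		}
	}
	return nil
}

func main() {
	lookup := map[string]string{"ATG": "M", "TTT": "F", "TAA": "*"}
	fmt.Println(translate("ATGTTTTAA", lookup)) // MF* <nil>
	fmt.Println(translate("ATGTT", lookup))     // reports the length error
	fmt.Println(checkWeights(map[string]int{"ATG": 3, "TTT": 0}))
}
```

Returning an error rather than silently truncating keeps a frame-shifted input from producing a plausible-looking but wrong protein.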

Integration to think about:

How do we generate codonTable objects from genbank files? How can we save them as JSON for use later?
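For the JSON half of that question, one possible approach is a plain round trip through the standard `encoding/json` package. The struct layout, the JSON key names, and the `encodeCodonTable` / `decodeCodonTable` helpers below are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Assumed struct layout; the JSON key names are illustrative.
type Codon struct {
	Triplet string `json:"triplet"`
	Weight  int    `json:"weight"`
}

type AminoAcid struct {
	Letter string  `json:"letter"`
	Codons []Codon `json:"codons"`
}

type CodonTable struct {
	AminoAcids []AminoAcid `json:"amino_acids"`
}

// encodeCodonTable serializes a table to indented JSON.
func encodeCodonTable(table CodonTable) ([]byte, error) {
	return json.MarshalIndent(table, "", "  ")
}

// decodeCodonTable restores a table from JSON produced by encodeCodonTable.
func decodeCodonTable(data []byte) (CodonTable, error) {
	var table CodonTable
	err := json.Unmarshal(data, &table)
	return table, err
}

func main() {
	table := CodonTable{AminoAcids: []AminoAcid{
		{Letter: "M", Codons: []Codon{{Triplet: "ATG", Weight: 7}}},
	}}
	data, err := encodeCodonTable(table)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	restored, err := decodeCodonTable(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.AminoAcids[0].Codons[0].Weight) // 7
}
```

The resulting bytes can be written to disk with `os.WriteFile` and read back later, so a table built once from large GenBank files does not have to be rebuilt.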

@TimothyStiles please review and comment with anything you think I should add into the pull request.

TimothyStiles · 2020-06-19T00:44:21Z

transformations.go

+}
+
+// Function to generate default codon tables from NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
+func generateCodonTable(aas, starts string) CodonTable {


What does aas stand for here? Amino acid string?

Koeng101 · 2020-06-19

It stands for amino acids. On the link, they just name it "AAs", so I copied their nomenclature (which is also why starts is named starts).
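For context, the NCBI page lists each genetic code as 64-character lines, where position i corresponds to the i-th codon when the three bases are enumerated in T, C, A, G order. A minimal sketch of reading such a pair of lines follows. `parseNCBITable` is a hypothetical stand-in for `generateCodonTable`, and the starts string here is illustrative (it only marks TTG, CTG, and ATG with `M`), so treat it as an assumption rather than a copy of the NCBI page.

```go
package main

import "fmt"

// ncbiBases is the base order NCBI uses when laying out its 64-letter lines.
const ncbiBases = "TCAG"

// parseNCBITable returns amino acid letter -> codons and the start codons.
func parseNCBITable(aas, starts string) (map[string][]string, []string, error) {
	if len(aas) != 64 || len(starts) != 64 {
		return nil, nil, fmt.Errorf("expected 64 characters each, got %d and %d", len(aas), len(starts))
	}
	codonsByAminoAcid := make(map[string][]string)
	var startCodons []string
	for i := 0; i < 64; i++ {
		// Position i encodes three base-4 digits: first, second, third base.
		triplet := string([]byte{ncbiBases[i/16], ncbiBases[(i/4)%4], ncbiBases[i%4]})
		letter := string(aas[i])
		codonsByAminoAcid[letter] = append(codonsByAminoAcid[letter], triplet)
		if starts[i] == 'M' {
			startCodons = append(startCodons, triplet)
		}
	}
	return codonsByAminoAcid, startCodons, nil
}

func main() {
	// AAs line of the standard genetic code, split by first base (T, C, A, G).
	aas := "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG"
	// Illustrative starts line: M marks TTG, CTG and ATG.
	starts := "---M------------" + "---M------------" + "---M------------" + "----------------"

	byAminoAcid, startCodons, err := parseNCBITable(aas, starts)
	if err != nil {
		panic(err)
	}
	fmt.Println(startCodons)      // [TTG CTG ATG]
	fmt.Println(byAminoAcid["*"]) // [TAA TAG TGA]
}
```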

TimothyStiles · 2020-08-25T21:07:34Z

@Koeng101 do you have any idea on how we would generate codon tables for arbitrary sequences? Are there any examples that we can cite/work off of?

Koeng101 · 2020-08-25T21:21:44Z

@TimothyStiles Codon tables for arbitrary sequences can be generated by counting the codon occurrences in each CDS feature of a GenBank file. This is one of the reasons we were working on the "location" feature previously.
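A small sketch of that counting step, assuming the CDS sequences have already been extracted from the GenBank record and oriented on the coding strand. `countCodons` and its input slice are hypothetical; extracting the features is the part that depends on the "location" work.

```go
package main

import (
	"fmt"
	"strings"
)

// countCodons tallies triplets across coding sequences, reading each in frame.
// cdsSequences is a hypothetical input: one string per CDS feature.
func countCodons(cdsSequences []string) map[string]int {
	counts := make(map[string]int)
	for _, cds := range cdsSequences {
		cds = strings.ToUpper(cds)
		// Stop at the last complete triplet; a trailing partial codon is ignored.
		for i := 0; i+3 <= len(cds); i += 3 {
			counts[cds[i:i+3]]++
		}
	}
	return counts
}

func main() {
	counts := countCodons([]string{"ATGGCTGCTTAA", "atgaaagcttga"})
	fmt.Println(counts["ATG"], counts["GCT"], counts["TAA"]) // 2 3 1
}
```

Each CDS is counted on its own rather than after concatenation, so a feature whose length is not a multiple of 3 cannot shift the reading frame of the features that follow it.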

tylermaran · 2020-10-14T15:52:49Z

transformations.go

+	for _, aminoAcid := range codonTable.AminoAcids {
+		for _, codon := range aminoAcid.Codons {
+			translationMap[codon.Triplet] = aminoAcid.Letter
+		}


Maybe I don't know Go very well, but is this a for loop in a for loop in a map? >.>

Koeng101

One function should be split up because AnnotatedSequences can get big!

Koeng101 · 2020-10-14T16:41:03Z

transformations.go

+// Optimize takes an amino acid sequence and CodonTable and returns an optimized codon sequence
+func Optimize(aminoAcids string, annotatedSequence AnnotatedSequence, codonTable CodonTable) string {
+	var codons strings.Builder
+	var sequenceBuffer strings.Builder
+	for _, feature := range annotatedSequence.Features {
+		if feature.Type == "CDS" {
+			sequenceBuffer.WriteString(feature.getSequence())
+		}
+	}
+	optimizationTable := codonTable.generateOptimizationTable(sequenceBuffer.String())
+	for _, aminoAcid := range aminoAcids {
+		codons.WriteString(optimizationTable[string(aminoAcid)].Pick().(string))
+	}
+	return codons.String()
+}
+


The optimization table should be an input to the Optimize function. AnnotatedSequences can be huge (human chromosomes, for example), and you may need to load multiple files to build the correct optimization table (again, as with multiple human chromosomes). Once the table is generated, you won't need to do that again.

Ideally, there would be some kind of robust import / export function to use JSON codon tables. At minimum, split this function so that import / export can be added later.
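A sketch of the split being asked for, under stated assumptions: `buildOptimizationTable`, `pick`, and `optimize` are hypothetical names, the inputs are plain maps rather than the PR's types, and the weighted draw is a simple cumulative-sum pick standing in for whatever chooser the PR uses. Starting every codon at weight 1 mirrors the later commit in this PR that initializes codons with at least a weight of 1.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// weightedCodon is a hypothetical type: one candidate codon and its weight.
type weightedCodon struct {
	Triplet string
	Weight  int
}

// OptimizationTable maps an amino acid letter to its weighted codon choices.
type OptimizationTable map[string][]weightedCodon

// buildOptimizationTable is the expensive step, done once per organism.
// Every codon starts at weight 1 so unseen codons remain selectable.
func buildOptimizationTable(codonsByAminoAcid map[string][]string, counts map[string]int) OptimizationTable {
	table := make(OptimizationTable)
	for letter, triplets := range codonsByAminoAcid {
		for _, triplet := range triplets {
			choice := weightedCodon{Triplet: triplet, Weight: 1 + counts[triplet]}
			table[letter] = append(table[letter], choice)
		}
	}
	return table
}

// pick draws one codon with probability proportional to its weight.
// intn must return a value in [0, n), like rand.Intn.
func pick(choices []weightedCodon, intn func(int) int) string {
	total := 0
	for _, choice := range choices {
		total += choice.Weight
	}
	n := intn(total)
	for _, choice := range choices {
		if n < choice.Weight {
			return choice.Triplet
		}
		n -= choice.Weight
	}
	return choices[len(choices)-1].Triplet // not reached when weights are positive
}

// optimize is the cheap step: it needs only the prebuilt table.
func optimize(aminoAcids string, table OptimizationTable, intn func(int) int) (string, error) {
	var codons strings.Builder
	for _, aminoAcid := range aminoAcids {
		choices, ok := table[string(aminoAcid)]
		if !ok {
			return "", fmt.Errorf("no codons for amino acid %q", aminoAcid)
		}
		codons.WriteString(pick(choices, intn))
	}
	return codons.String(), nil
}

func main() {
	codonsByAminoAcid := map[string][]string{"M": {"ATG"}, "F": {"TTT", "TTC"}}
	table := buildOptimizationTable(codonsByAminoAcid, map[string]int{"ATG": 10, "TTC": 30})
	dna, err := optimize("MFF", table, rand.Intn)
	fmt.Println(dna, err)
}
```

With this shape the table can be built from several files, cached or exported, and reused for every call to `optimize`. Passing the random source as a function also makes the draw easy to pin down in tests.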

Koeng101

Looks good to me. I'll open a GitHub issue for JSON representations of the codon table stuff. <hold on, found a bug>

TimothyStiles · 2020-10-22T22:13:10Z

Alright, this PR is getting to be a monster. I refactored the way command line commands are tested for easier debugging, and we now have two simple command line utilities for translating and optimizing streams of sequence strings. The related library functions are included as well.

I'm tired and I'm merging it y'all.

🎉 🎉 🎉

Add optimization and translation functions for sequences

7378e85

Koeng101 requested a review from TimothyStiles June 18, 2020 21:58

TimothyStiles added 3 commits June 18, 2020 16:53

fixed godoc comments to use object name for exported objects.

fa76b55

renamed struct names to be more readable.

1612107

made variable and function names more readable.

99bc4ee

TimothyStiles reviewed Jun 19, 2020


TimothyStiles added 4 commits June 19, 2020 13:47

changed another variable name.

09782b0

recommenting test function for now.

b92ca4b

updated with a new DefaultCodonTablesByName.

3ac819f

updated variable name in transformations_test.go.

64a9fdc

TimothyStiles added this to Q3 2020 – Aug-Oct in Poly roadmap Aug 7, 2020

TimothyStiles mentioned this pull request Aug 25, 2020

Optimizing sequences #41

Closed

TimothyStiles added 8 commits September 24, 2020 12:50

merging prime branch.

db33d75

added new codonFrequency and Translate functions.

04a757b

debugged GetCodonFrequency and optimized translate function.

96e63b9

added basic codon optimization for arbitrary AnnotatedSequence.

9b737aa

committing before roll back and refactor.

3158fc3

basic tests pass for codon optimization.

a042748

merging.

c46a45f

fixed merge error.

7a713c6

tylermaran reviewed Oct 14, 2020


Koeng101 commented Oct 14, 2020


TimothyStiles added 2 commits October 14, 2020 15:58

refactored so CodonTable can be more portable.

26f5174

added comment and clarified variable name.

eb035aa

Koeng101 commented Oct 15, 2020


TimothyStiles added 3 commits October 17, 2020 12:59

prettified command related files and transformations.go.

4fb3387

made .CreateWeights() method public.

910c932

refactored hash in commands.go.

db0b4b8

TimothyStiles added 18 commits October 21, 2020 10:04

switched test file to puc19.gbk. Backing up before test refactor.

e676dde

backing up progress before yet another refactor.

9e55a71

hash command test refactored.

3c371b6

refactored command tests to spoof stdin and stdout.

be485cf

actually refactored command tests to spoof stdin and stdout.

86c71a9

updating urfave/cli to latest commit.

c3594a3

made it so codons are initialized with at least a weight of 1.

37fcc12

fixed comment in hash.go

e507f7d

rearranged main.go for easier debugging.

b63a72b

added flag.

754e77a

added translation command and test.

49ed9ee

adding test.

fb8af83

added variables for easy debugging.

87e2d57

made application private.

0fbc267

made private things public. Updated command comments.

1472d4d

made getFeatureSequence private.

36a7848

fixed typo.

2ee4afa

renamed functions.

816dd24

TimothyStiles merged commit 7b0947a into bebop:prime Oct 22, 2020

TimothyStiles mentioned this pull request Oct 22, 2020

add windows to test matrix #45

Closed

TimothyStiles moved this from Q3 2020 – Aug-Oct to Done in Poly roadmap May 6, 2021
