In [1]:
library(ggplot2)
library(reshape2)
library(dplyr)
library(stringr)
library(tidyr)
theme_set(theme_bw())
options(repr.plot.width=7, repr.plot.height=4)



# Introduction

In [clade-freqs](clade-freqs.ipynb), I annotated a list of identity elements (IDEs) that appear to be conserved in over 95% of tRNAs. Some of these are reflected in the literature; others are not. I expect the well-studied IDEs that are agreed to be universal to hold true. Thus, I need to look into the tRNAs that *don't* have a given IDE.

Do these exceptions function as tRNAs? Using a suite of supposedly gold standard IDEs, we would expect to be able to differentiate between bona fide tRNAs and tRNA pseudogenes.

I'll get a set of tRNAs that may or may not be missing a key IDE. I'll then proceed in two branches. 
1) IDE rules. We've learned something about which IDEs are required. We now know how to choose canonical tRNAs. Filter based on these IDEs or based on suites of IDEs, regenerate frequencies, rinse and repeat.
2) Interesting exceptions to the rule. Some tRNAs are exceptional. Look deeply into a few examples where they're missing a key IDE. Are any of them functional? Are they missing all of the other IDEs?
 
Branch 2 is easier to tackle first, since we isolate the tRNAs anyway. First, we'll recreate the frequency table.
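The two branches can be sketched on a toy table. This is only an illustration under stated assumptions: the data frame `toy`, its sequence names, and the two-rule list `rules` are hypothetical stand-ins for `identities` and for whichever suite of IDEs ends up being chosen.

```r
library(dplyr)

# Toy stand-in for the `identities` table; names and bases are hypothetical.
toy = data.frame(
  seqname = c("tRNA-1", "tRNA-2", "tRNA-3", "tRNA-4"),
  X14 = c("A", "A", "G", "A"),
  X18 = c("G", "G", "G", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical rule set: position -> bases that satisfy the IDE.
rules = list(X14 = "A", X18 = "G")

# Count how many rules each tRNA violates.
toy$n_missing = 0
for (pos in names(rules)) {
  toy$n_missing = toy$n_missing + as.integer(!(toy[[pos]] %in% rules[[pos]]))
}

canonical = toy %>% filter(n_missing == 0)   # branch 1: keep these, then regenerate frequencies
exceptions = toy %>% filter(n_missing > 0)   # branch 2: inspect these individually
```

Keeping the rules in a named list makes it cheap to swap in a different suite of IDEs and rerun the split.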

# Data wrangling

## Import alignment and bases


In [2]:
identities = read.delim('identities.tsv', sep='\t', stringsAsFactors=FALSE)
identities$quality = as.logical(identities$quality)
identities$restrict = as.logical(identities$restrict)

In [3]:
positions = colnames(identities)[which(str_detect(colnames(identities), "X\\d+\\.\\d+$"))]
positions = c(positions, 'X8', 'X9', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20', 'X20a', 'X21', 'X26', 'X32', 'X33', 'X34', 'X35', 'X36', 'X37', 'X38', 'X44', 'X45', 'X46', 'X47', 'X48', 'X54', 'X55', 'X56', 'X57', 'X58', 'X59', 'X60', 'X73')

## Get frequencies

In [86]:
clade_iso_ac_freqs = identities %>%
  select(match(c('clade', 'isotype_ac', 'anticodon', positions), colnames(identities))) %>%
  # rename rather than copy, so that isotype_ac is not gathered as if it were a position
  rename(isotype=isotype_ac) %>%
  gather(positions, bases, -clade, -isotype, -anticodon) %>%
  # count each base (or base pair) per clade, isotype, anticodon and position
  group_by(clade, isotype, anticodon, positions, bases) %>%
  tally() %>%
  mutate(freq=n) %>%
  group_by(clade, isotype, anticodon, positions) %>%
  summarize(A = sum(freq[bases == "A"]),
            C = sum(freq[bases == "C"]),
            G = sum(freq[bases == "G"]),
            U = sum(freq[bases == "U"]),
            Deletion = sum(freq[bases %in% c("-", ".")]), 
            Purine = sum(freq[bases %in% c("A", "G")]),
            Pyrimidine = sum(freq[bases %in% c("C", "U")]),
            Weak = sum(freq[bases %in% c("A", "U")]),
            Strong = sum(freq[bases %in% c("G", "C")]),
            Amino = sum(freq[bases %in% c("A", "C")]),
            Keto = sum(freq[bases %in% c("G", "U")]),
            B = sum(freq[bases %in% c("C", "G", "U")]),
            D = sum(freq[bases %in% c("A", "G", "U")]),
            H = sum(freq[bases %in% c("A", "C", "U")]),
            V = sum(freq[bases %in% c("A", "C", "G")]),
            GC = sum(freq[bases == "G:C"]),
            AU = sum(freq[bases == "A:U"]),
            UA = sum(freq[bases == "U:A"]),
            CG = sum(freq[bases == "C:G"]),
            GU = sum(freq[bases == "G:U"]),
            UG = sum(freq[bases == "U:G"]),
            PairDeletion = sum(freq[bases == "-:-"]), 
            PurinePyrimidine = sum(freq[bases %in% c("A:U", "G:C")]),
            PyrimidinePurine = sum(freq[bases %in% c("U:A", "C:G")]),
            StrongPair = sum(freq[bases %in% c("G:C", "C:G")]),
            WeakPair = sum(freq[bases %in% c("A:U", "U:A")]),
            Wobble = sum(freq[bases %in% c("G:U", "U:G")]),
            Paired = sum(freq[bases %in% c("A:U", "U:A", "C:G", "G:C", "G:U", "U:G")]),
            Bulge = sum(freq[bases %in% c("A:-", "U:-", "C:-", "G:-", "-:A", "-:G", "-:C", "-:U")]),
            Mismatched = sum(freq[bases %in% c("A:A", "G:G", "C:C", "U:U", "A:G", "A:C", "C:A", "C:U", "G:A", "U:C")])
            ) %>%
  mutate(total = A + B + Deletion + Paired + Mismatched + Bulge + PairDeletion) %>%
  melt(id.vars=c("clade", "isotype", "anticodon", "positions", "total")) %>%
  mutate(freq=value/total)
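The single-letter categories B, D, H and V in the `summarize()` call follow the IUPAC ambiguity codes (not A, not C, not G and not U, respectively). As a minimal sketch of the same bookkeeping, the counts can be built from a lookup table; the `classes` list mirrors the sets used above, and the `bases` vector is a hypothetical column of eight tRNAs.

```r
# Base sets for each category; B, D, H and V are IUPAC codes with U in place of T.
classes = list(
  Purine = c("A", "G"), Pyrimidine = c("C", "U"),
  Weak = c("A", "U"), Strong = c("G", "C"),
  Amino = c("A", "C"), Keto = c("G", "U"),
  B = c("C", "G", "U"), D = c("A", "G", "U"),
  H = c("A", "C", "U"), V = c("A", "C", "G")
)

# Hypothetical column of bases at one position.
bases = c("U", "U", "U", "C", "A", "U", "G", "U")

# Number of sequences falling in each category, and the matching frequency.
counts = sapply(classes, function(set) sum(bases %in% set))
freqs = counts / length(bases)
```

Because the categories overlap, their counts sum to more than the number of sequences, which is why `total` above is built only from mutually exclusive categories.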

In [11]:
clade_iso_freqs = clade_iso_ac_freqs %>%
  group_by(positions, isotype, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total))

euk_freqs = clade_iso_ac_freqs %>%
  group_by(positions, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total))

consensus = euk_freqs %>%
  filter(freq > 0.95) %>%
  group_by(positions) %>%
  # overlapping categories can all pass the cutoff; keep the least frequent (most specific) one
  filter(row_number(freq) == 1) %>%
  arrange(positions)
consensus

Unnamed: 0,positions,variable,count,freq
1,X10.25,Paired,108575,0.9849145
2,X14,A,109604,0.9942488
3,X15,Purine,109916,0.997079
4,X18,G,109042,0.9891507
5,X18.55,GU,107878,0.9785918
6,X19,G,108902,0.9878808
7,X19.56,Paired,108429,0.983599
8,X20,H,106812,0.9689218
9,X21,A,108344,0.982819
10,X26,D,107831,0.9781654
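The deduplication step keeps one category per position. A minimal sketch of what `row_number(freq) == 1` selects, using a hypothetical `toy_freqs` table in which only the Purine frequency is taken from the output above:

```r
library(dplyr)

# Hypothetical frequencies for one position; only the Purine value comes from the table above.
toy_freqs = data.frame(
  positions = "X15",
  variable = c("G", "Purine", "D", "V"),
  freq = c(0.80, 0.997079, 0.998, 0.999),
  stringsAsFactors = FALSE
)

picked = toy_freqs %>%
  filter(freq > 0.95) %>%
  group_by(positions) %>%
  filter(row_number(freq) == 1)   # rank 1 = smallest freq among the survivors

picked$variable   # "Purine"
```

Since broader categories always have frequencies at least as high as the narrower ones they contain, the lowest-ranked survivor is the narrowest category that still clears 95%. Ties are broken by row order.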


# Analysis of eukaryotic all-tRNA consensus identity elements

## 3:70

M&G: Even numbers across all pairs, 9 mismatches. G3-U70 unique to Ala. A few other isotypes have single exceptions. Antideterminant for Thr. C3-G70 positive for iMet. Dependent on 2-71 context.

First, let's check the iMet frequencies. 

In [12]:
head(clade_iso_freqs)

Unnamed: 0,positions,isotype,variable,count,freq
1,X10.25,Ala,A,0,0
2,X10.25,Ala,C,0,0
3,X10.25,Ala,G,0,0
4,X10.25,Ala,U,0,0
5,X10.25,Ala,Deletion,0,0
6,X10.25,Ala,Purine,0,0


In [13]:
clade_iso_ac_freqs %>% filter(positions == 'X3.70' & isotype == "iMet") %>%
  group_by(positions, isotype, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.05)

Unnamed: 0,positions,isotype,variable,count,freq
1,X3.70,iMet,CG,1195,0.9991639
2,X3.70,iMet,PyrimidinePurine,1195,0.9991639
3,X3.70,iMet,StrongPair,1195,0.9991639
4,X3.70,iMet,Paired,1196,1.0


M&G's frequencies with iMet are confirmed: C3:G70 appears in 99.9% of iMet tRNAs (1195 of 1196). This is a pretty strong determinant for initiator methionine.

For alanine, previous work (e.g. [Chihade et al. 1998](http://pubs.acs.org/doi/pdf/10.1021/bi9804636)) shows that G3-U70 is a strong determinant in *C. elegans*. M&G do find that a few other tRNAs also contain G3-U70. [Beuning et al. 2002](http://rnajournal.cshlp.org/content/8/5/659.full.pdf) also show that the orientation of a 2:71 purine:pyrimidine pair is helpful for charging. Let's see if G3-U70 is specifically enriched in alanine.

In [24]:
clade_iso_ac_freqs %>% filter(positions == 'X3.70' & variable == "GU") %>%
  group_by(positions, isotype, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.01)

clade_iso_ac_freqs %>% filter(positions == 'X3.70' & isotype == 'Ala' & variable == "GU") %>%
  mutate(isNGC=(str_detect(anticodon, "[AGCT]GC"))) %>%
  group_by(isNGC) %>%
  summarize(count=sum(value)) %>%
  mutate(freq=count/sum(count)) %>% # fraction of all Ala G3:U70 tRNAs, without hardcoding the total
  filter(freq > 0.001)

clade_iso_ac_freqs %>% filter(positions == 'X3.70' & isotype == 'Ala') %>%
  summarize(sum(value))

Unnamed: 0,positions,isotype,variable,count,freq
1,X3.70,Ala,GU,6658,0.2527043
2,X3.70,Cys,GU,27,0.01237964
3,X3.70,Gly,GU,62,0.0118865


Unnamed: 0,isNGC,count,freq
1,False,590,0.0886152
2,True,6068,0.9113848


Unnamed: 0,sum(value)
1,87549


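This basically confirms M&G's (non-)conclusions: G3:U70 is enriched in Ala (about 25% of Ala tRNAs, versus about 1% of Cys and Gly), though no reciprocal relationship exists, except for Ala-NGC, which accounts for 91% of the Ala tRNAs carrying G3:U70.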

## U8-A14

This pair is known to be extremely conserved, since it stabilizes the tertiary structure. M&G found that a variety of bacteria and archaea contain a C8 variation. Our data agree with the eukaryotic side of things, at 97%.
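The query behind this figure is not shown in this section. One way such a number could be checked is sketched below; the helper `ide_freq` and the ten-element vector `x8_14` are hypothetical and only illustrate the calculation.

```r
# Fraction of sequences whose base (or base pair) at one alignment column
# falls in an allowed set. Every entry, gaps included, counts toward the total.
ide_freq = function(column, allowed) {
  mean(column %in% allowed)
}

# Hypothetical X8.14 column for ten tRNAs.
x8_14 = c(rep("U:A", 8), "C:A", "U:G")

ide_freq(x8_14, "U:A")             # 0.8
ide_freq(x8_14, c("U:A", "C:A"))   # 0.9
```

On the real table this would be something like `ide_freq(identities$X8.14, "U:A")`, assuming the `X8.14` column is encoded the same way as the other pair columns.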

## R9 and 9:23

M&G: mostly a purine here. Interacts with base 23 in class I tRNAs. 

Our data support this and go a step further for class II tRNAs (Ser and Leu), where position 9 is almost always G (99.5%). The 9:23 interaction is not restricted to any particular base combination, which agrees with my previous [tertiary interactions analysis](../tertiary-interactions.ipynb), where I proposed that it is an isotype- and clade-specific IDE. As for fungi, the 9:23 interaction appears more diverged (14.7% paired, mostly G:U), but it is still typically mismatched, as in other clades. Then again, 9:23 is not even among the consensus IDEs found above.

In [47]:
clade_iso_ac_freqs %>% filter(!(isotype %in% c("Ser", "Leu")) & positions %in% c('X9', 'X9.23')) %>%
  group_by(positions, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.9)

clade_iso_ac_freqs %>% filter(isotype %in% c("Ser", "Leu") & positions %in% c('X9', 'X9.23')) %>%
  group_by(positions, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.9)

clade_iso_ac_freqs %>% filter(positions == 'X9.23') %>%
  group_by(positions, clade, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.1)

Unnamed: 0,positions,variable,count,freq
1,X9,Purine,99885,0.9863626
2,X9,D,100207,0.9895424
3,X9,V,100875,0.9961389
4,X9.23,Mismatched,95677,0.9448087


Unnamed: 0,positions,variable,count,freq
1,X9,G,8929,0.9952073
2,X9,Purine,8957,0.9983281
3,X9,Strong,8934,0.9957646
4,X9,Keto,8938,0.9962104
5,X9,B,8943,0.9967677
6,X9,D,8966,0.9993313
7,X9,V,8962,0.9988854
8,X9.23,Mismatched,8647,0.9637762


Unnamed: 0,positions,clade,variable,count,freq
1,X9.23,Fungi,GU,1173,0.1026606
2,X9.23,Fungi,Wobble,1173,0.1026606
3,X9.23,Fungi,Paired,1683,0.1472956
4,X9.23,Fungi,Mismatched,9743,0.8527044
5,X9.23,Insecta,Mismatched,901,0.9019019
6,X9.23,Mammalia,Mismatched,50648,0.9666571
7,X9.23,Nematoda,Mismatched,4615,0.9380081
8,X9.23,Spermatophyta,Paired,210,0.1171875
9,X9.23,Spermatophyta,Mismatched,1582,0.8828125
10,X9.23,Streptophyta,Mismatched,4975,0.9012681


## 10:25

M&G: 10/41 GC, 31/41 GU, positive determinant for yeast Asp, negative determinant for yeast M22G on 26, interacts with 45.

Our data mostly agree, except that G:C is the more common pair here (G:C $\approx$ 76%, G:U $\approx$ 22%).

In [51]:
euk_freqs %>% filter(positions == 'X10.25' & freq > 0.1)

Unnamed: 0,positions,variable,count,freq
1,X10.25,GC,83468,0.7571618
2,X10.25,GU,23788,0.2157877
3,X10.25,PurinePyrimidine,83984,0.7618426
4,X10.25,StrongPair,83774,0.7599376
5,X10.25,Wobble,23790,0.2158058
6,X10.25,Paired,108575,0.9849145


## A14

This is an invariant position, mentioned in M&G, involved in U8:A14.

## R15, 15:48

This is the Levitt base pair. M&G note that this usually forms R15:Y48, but has been shown in *E. coli* to tolerate different combinations. Our data support a weak R15:Y48 requirement: 89% of tRNAs have a purine:pyrimidine pair here, mostly G:C (80%).

In [54]:
clade_iso_ac_freqs %>% filter(positions == "X15.48") %>%
  group_by(positions, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.1)

Unnamed: 0,positions,variable,count,freq
1,X15.48,GC,88316,0.8011466
2,X15.48,PurinePyrimidine,98253,0.8912888
3,X15.48,StrongPair,88317,0.8011557
4,X15.48,Paired,101730,0.9228299


## G18:U55, 19:56

M&G: G18:U55, G19 in eukaryotes. G19:U56 is common enough. Their data show invariant G19:C56 in eukaryotes. 4 bases downstream from 14 is always a G, so G18.

Our data do show slightly more variation at G19:C56 than M&G report: it is G:C in 93.5% of tRNAs, not invariant.

In [59]:
clade_iso_ac_freqs %>% filter(positions %in% c("X18", "X55", "X18.55", "X19", "X56", "X19.56")) %>%
  group_by(positions, variable) %>% 
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.1) %>%
  group_by(positions) %>% # remove duplicates
  filter(row_number(freq) == 1)

Unnamed: 0,positions,variable,count,freq
1,X18,G,109042,0.9891507
2,X18.55,GU,107878,0.9785918
3,X19,G,108902,0.9878808
4,X19.56,GC,103109,0.9353393
5,X55,U,109063,0.9893412
6,X56,C,104296,0.946107


## H20

M&G: Mostly U. G20 is exclusive to Phe-GAA, and vice versa, barring 1 exception.

This base is almost certainly involved in 3D structure, whether as a spacer or as a stacking nucleotide. Our data show that about 14% of tRNAs don't have a U here. G20 does seem to be conserved in Phe-GAA too (98%). A20 also looks fairly well conserved in arginine (58% overall) relative to the other isotypes.

In [73]:
clade_iso_ac_freqs %>% filter(positions == "X20") %>%
  group_by(positions, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(variable %in% c("A", "G", "C", "U"))

clade_iso_ac_freqs %>% filter(positions == "X20") %>%
  group_by(positions, isotype, anticodon, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(variable %in% c("A", "G", "C") & freq > 0.5 & count > 100)

clade_iso_ac_freqs %>% filter(positions == "X20" & isotype == "Arg") %>%
  group_by(positions, isotype, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(variable %in% c("A", "G", "C", "U"))

Unnamed: 0,positions,variable,count,freq
1,X20,A,5721,0.05189681
2,X20,C,6061,0.05498104
3,X20,G,3420,0.03102378
4,X20,U,95030,0.8620439


Unnamed: 0,positions,isotype,anticodon,variable,count,freq
1,X20,Arg,ACG,A,897,0.7346437
2,X20,Arg,CCG,A,315,0.8583106
3,X20,Arg,CCT,A,578,0.8797565
4,X20,Arg,TCG,A,619,0.5992256
5,X20,Arg,TCT,A,826,0.7488667
6,X20,iMet,CAT,A,1113,0.9408284
7,X20,Met,CAT,C,947,0.7004438
8,X20,Phe,GAA,G,1598,0.9791667


Unnamed: 0,positions,isotype,variable,count,freq
1,X20,Arg,A,3321,0.5789749
2,X20,Arg,C,416,0.07252441
3,X20,Arg,G,22,0.003835425
4,X20,Arg,U,1977,0.3446653


## A21

M&G: A21 is invariant except for G21 in Met-CAT in *S. pombe*.

This base stacks with 8:14. However, I don't see G21 at all. 

In [95]:
clade_iso_ac_freqs %>% filter(positions == "X21") %>%
  group_by(positions, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.5) %>%
  filter(row_number(freq) == 1)

clade_iso_ac_freqs %>% filter(positions == "X21") %>%
  group_by(positions, clade, isotype, anticodon, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(clade == "Fungi" & isotype == "Met" & variable %in% c("U", "G", "C", "A"))

Unnamed: 0,positions,variable,count,freq
1,X21,A,108344,0.982819


Unnamed: 0,positions,clade,isotype,anticodon,variable,count,freq
1,X21,Fungi,Met,CAT,A,423,0.9929577
2,X21,Fungi,Met,CAT,C,0,0.0
3,X21,Fungi,Met,CAT,G,0,0.0
4,X21,Fungi,Met,CAT,U,0,0.0


Digging deeper, the *S. pombe* Met-CAT tRNAs score poorly and align better to the threonine CM. Previously, the alignment had U20-G20a-G21. Here, however, the alignment places gaps at 21 and 22 and puts the insertion before 20a.

This is an exceedingly rare exception: almost all other tRNAs have this region aligned properly, as shown in [euk-tRNAs](../euk-tRNAs.ipynb).

In [93]:
identities[identities$clade == "Fungi" & identities$isotype_ac == "Met" & identities$species == "schiPomb_972H", ]

Unnamed: 0,clade,domain,isotype,seqname,species,species_long,isotype_ac,anticodon,score,isoscore,GC,D.loop,AC.loop,TPC.loop,V.arm,intron,insertions,deletions,quality,restrict,X0i1,X0i2,X0i3,X0i4,X0i5,X0i6,X0i7,X0i8,X0i9,X0i10,X0i11,X1.72,X1,X1i1,X2.71,X2,X2i1,X3.70,X3,X3i1,X3i2,X3i3,X3i4,X3i5,X3i6,X3i7,X4.69,X4,X4i1,X4i2,X4i3,X4i4,X4i5,X4i6,X4i7,X4i8,X4i9,X4i10,X4i11,X5.68,X5,X5i1,X5i2,X5i3,X5i4,X5i5,X5i6,X5i7,X6.67,X6,X6i1,X7.66,X7,X7i1,X7i2,X7i3,X7i4,X7i5,X7i6,X7i7,X7i8,X7i9,X7i10,X7i11,X7i12,X7i13,X7i14,X8,X8.14.21,X8.14,X8i1,X8i2,X8i3,X8i4,X9,X9.12.23,X9.23,X9i1,X9i2,X9i3,X10.25,X10,X10.25.45,X10.45,X10i1,X11.24,X11,X12.23,X12,X12i1,X12i2,X12i3,X12i4,X13.22,X13,X13.22.46,X13i1,X14,X14i1,X14i2,X14i3,X14i4,X14i5,X14i6,X14i7,X14i8,X14i9,X14i10,X14i11,X14i12,X14i13,X14i14,X14i15,X14i16,X14i17,X14i18,X14i19,X14i20,X14i21,X14i22,X14i23,X14i24,X14i25,X15,X15.48,X16,X16i1,X16i2,X16i3,X16i4,X16i5,X16i6,X16i7,X16i8,X16i9,X16i10,X16i11,X16i12,X16i13,X16i14,X16i15,X16i16,X16i17,X16i18,X16i19,X16i20,X17,X17i1,X17i2,X17i3,X18,X18.55,X19,X19.56,X20,X20i1,X20i2,X20i3,X20i4,X20i5,X20i6,X20i7,X20i8,X20i9,X20i10,X20a,X20b,X21,X21i1,X22,X22.46,X22i1,X23,X23i1,X24,X24i1,X25,X25i1,X25i2,X25i3,X25i4,X25i5,X25i6,X25i7,X25i8,X25i9,X25i10,X25i11,X25i12,X25i13,X26,X26.44,X26i1,X26i2,X26i3,X26i4,X27.43,X27,X27i1,X27i2,X27i3,X27i4,X27i5,X28.42,X28,X28i1,X28i2,X28i3,X28i4,X29.41,X29,X29i1,X30.40,X30,X30i1,X31.39,X31,X32,X33,X34,X35,X35i1,X36,X37,X37i1,X37i2,X37i3,X37i4,X37i5,X37i6,X37i7,X37i8,X37i9,X37i10,X37i11,X37i12,X37i13,X37i14,X37i15,X37i16,X37i17,X37i18,X37i19,X37i20,X37i21,X37i22,X37i23,X37i24,X37i25,X37i26,X37i27,X37i28,X37i29,X37i30,X37i31,X37i32,X37i33,X37i34,X37i35,X37i36,X37i37,X37i38,X37i39,X37i40,X37i41,X37i42,X37i43,X37i44,X37i45,X37i46,X37i47,X37i48,X37i49,X37i50,X37i51,X37i52,X37i53,X37i54,X37i55,X37i56,X37i57,X37i58,X37i59,X37i60,X37i61,X37i62,X37i63,X37i64,X37i65,X37i66,X37i67,X37i68,X37i69,X37i70,X37i71,X37i72,X37i73,X37i74,X37i75,X37i76,X37i77,X37i78,X37i79,X37i80,X37i8
1,X37i82,X37i83,X37i84,X37i85,X37i86,X37i87,X37i88,X37i89,X37i90,X37i91,X37i92,X37i93,X37i94,X37i95,X37i96,X37i97,X37i98,X37i99,X37i100,X37i101,X37i102,X37i103,X37i104,X37i105,X37i106,X37i107,X37i108,X37i109,X37i110,X37i111,X37i112,X37i113,X37i114,X37i115,X37i116,X37i117,X37i118,X37i119,X37i120,X37i121,X37i122,X37i123,X37i124,X37i125,X37i126,X37i127,X37i128,X37i129,X37i130,X37i131,X37i132,X37i133,X37i134,X37i135,X37i136,X37i137,X37i138,X37i139,X37i140,X37i141,X37i142,X37i143,X37i144,X37i145,X37i146,X37i147,X37i148,X37i149,X37i150,X37i151,X37i152,X37i153,X37i154,X37i155,X37i156,X37i157,X37i158,X37i159,X37i160,X37i161,X37i162,X37i163,X37i164,X37i165,X37i166,X37i167,X37i168,X37i169,X37i170,X37i171,X37i172,X37i173,X37i174,X37i175,X37i176,X37i177,X37i178,X37i179,X37i180,X37i181,X37i182,X37i183,X37i184,X37i185,X37i186,X37i187,X37i188,X37i189,X37i190,X37i191,X37i192,X37i193,X37i194,X37i195,X37i196,X37i197,X37i198,X37i199,X37i200,X37i201,X37i202,X37i203,X37i204,X37i205,X37i206,X37i207,X37i208,X37i209,X37i210,X37i211,X37i212,X37i213,X37i214,X37i215,X37i216,X37i217,X37i218,X37i219,X37i220,X37i221,X37i222,X37i223,X37i224,X37i225,X37i226,X37i227,X37i228,X37i229,X37i230,X37i231,X37i232,X37i233,X37i234,X37i235,X37i236,X37i237,X37i238,X37i239,X37i240,X37i241,X37i242,X37i243,X37i244,X37i245,X37i246,X37i247,X37i248,X37i249,X37i250,X37i251,X37i252,X37i253,X37i254,X37i255,X37i256,X37i257,X37i258,X37i259,X37i260,X37i261,X37i262,X37i263,X37i264,X37i265,X37i266,X37i267,X37i268,X37i269,X37i270,X37i271,X37i272,X37i273,X37i274,X37i275,X37i276,X37i277,X37i278,X37i279,X37i280,X37i281,X37i282,X37i283,X37i284,X37i285,X37i286,X37i287,X37i288,X37i289,X37i290,X37i291,X37i292,X37i293,X37i294,X37i295,X37i296,X37i297,X37i298,X37i299,X37i300,X37i301,X37i302,X37i303,X37i304,X37i305,X37i306,X37i307,X37i308,X37i309,X37i310,X37i311,X37i312,X37i313,X37i314,X37i315,X37i316,X37i317,X37i318,X37i319,X37i320,X37i321,X37i322,X37i323,X37i324,X37i325,X37i326,X37i327,X37i328,X37i329,X37i330,X37i331,X37i332,X37i333,
X37i334,X37i335,X37i336,X37i337,X37i338,X37i339,X37i340,X37i341,X37i342,X37i343,X37i344,X37i345,X37i346,X37i347,X37i348,X37i349,X37i350,X37i351,X37i352,X37i353,X37i354,X37i355,X37i356,X37i357,X37i358,X37i359,X37i360,X37i361,X37i362,X37i363,X37i364,X37i365,X37i366,X37i367,X37i368,X37i369,X37i370,X37i371,X37i372,X37i373,X37i374,X37i375,X37i376,X37i377,X37i378,X37i379,X37i380,X37i381,X37i382,X38,X38i1,X39,X39i1,X40,X40i1,X40i2,X40i3,X40i4,X41,X41i1,X42,X42i1,X43,X44,X44i1,X44i2,X44i3,X44i4,X44i5,X44i6,X44i7,X44i8,X44i9,X44i10,X44i11,X44i12,X44i13,X44i14,X44i15,X44i16,X44i17,X44i18,X44i19,X44i20,X44i21,X44i22,X44i23,X45,V11.V21,V12.V22,V13.V23,V14.V24,V15.V25,V16.V26,V17.V27,V1,V2,V3,V4,V11,V12,V13,V14,V15,V16,V17,V21,V22,V23,V24,V25,V26,V27,X46,X46i1,X47,X47i1,X48,X49.65,X49,X49i1,X50.64,X50,X50i1,X50i2,X50i3,X50i4,X50i5,X50i6,X50i7,X50i8,X51.63,X51,X51i1,X51i2,X51i3,X52.62,X52,X52i1,X53.61,X53,X53i1,X54,X54.58,X54i1,X54i2,X54i3,X54i4,X54i5,X54i6,X54i7,X55,X55i1,X55i2,X55i3,X55i4,X55i5,X55i6,X55i7,X56,X56i1,X56i2,X56i3,X56i4,X56i5,X56i6,X56i7,X56i8,X56i9,X56i10,X56i11,X56i12,X56i13,X56i14,X56i15,X56i16,X56i17,X56i18,X56i19,X56i20,X56i21,X57,X57i1,X58,X58i1,X58i2,X58i3,X58i4,X58i5,X58i6,X58i7,X58i8,X58i9,X58i10,X58i11,X58i12,X58i13,X58i14,X58i15,X58i16,X58i17,X59,X59i1,X59i2,X59i3,X59i4,X60,X60i1,X60i2,X61,X61i1,X62,X63,X64,X64i1,X64i2,X64i3,X64i4,X64i5,X64i6,X65,X65i1,X65i2,X65i3,X65i4,X65i5,X65i6,X65i7,X65i8,X65i9,X65i10,X65i11,X65i12,X66,X66i1,X67,X67i1,X67i2,X68,X68i1,X68i2,X68i3,X68i4,X68i5,X68i6,X68i7,X68i8,X68i9,X68i10,X68i11,X69,X69i1,X70,X70i1,X70i2,X70i3,X70i4,X70i5,X70i6,X70i7,X70i8,X70i9,X70i10,X70i11,X71,X71i1,X71i2,X71i3,X71i4,X71i5,X72,X73
103031,Fungi,eukaryota,Thr,schiPomb_972H_chrI.trna50,schiPomb_972H,Schizosaccharomyces pombe 972h-,Met,CAT,57.1,76.3,0.2682927,12,16,8,0,9,2,2,False,True,.,.,.,.,.,.,.,.,.,.,.,G:C,G,.,C:G,C,.,U:A,U,.,.,.,.,.,.,.,U:G,U,.,.,.,.,.,.,.,.,.,.,.,C:C,C,.,.,.,.,.,.,.,U:A,U,.,G:U,G,.,.,.,.,.,.,.,.,.,.,.,.,.,.,U,U:A:-,U:A,.,.,.,.,A,A:U:G,A:G,.,.,.,G:C,G,G:C:G,G:G,.,C:A,C,U:G,U,.,.,.,.,C:-,C,C:-:G,.,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,G:C,U,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,C,.,.,.,G,G:U,G,G:C,U,.,.,.,.,G,.,.,.,.,.,G,.,-,.,-,-:G,U,G,.,A,.,C,.,.,.,.,.,.,.,.,.,.,.,.,.,A,A:A,.,.,.,.,U:A,U,.,.,.,.,.,C:G,C,.,.,.,.,C:G,C,.,C:G,C,.,U:G,U,C,U,C,A,.,U,A,U,U,A,U,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,A,C,A,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,A,.,G,.,G,.,.,.,.,G,.,G,.,A,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,-:-,-:-,-:-,-:-,-:-,-:-,-:-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,G,.,U,.,C,G:C,G,.,U:A,U,.,.,.,.,.,.,.,.,G:C,G,.,.,.,A:U,A,.,G:C,G,.,U,U:A,.,.,.,.,.,.,.,U,.,.,.,.,.,.,.,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,.,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,.,.,.,.,C,.,.,C,.,U,C,A,.,.,.,.,.,.,C,.,.,.,.,.,.,.,.,.,.,.,.,U,.,A,.,.,C,.,.,.,.,.,.,.,.,.,.,.,G,.,A,.,.,.,.,.,.,.,.,.,.,.,G,.,.,.,.,.,C,A
103032,Fungi,eukaryota,Thr,schiPomb_972H_chrII.trna7,schiPomb_972H,Schizosaccharomyces pombe 972h-,Met,CAT,57.2,76.3,0.2682927,12,14,8,0,7,2,2,False,True,.,.,.,.,.,.,.,.,.,.,.,G:C,G,.,C:G,C,.,U:A,U,.,.,.,.,.,.,.,U:G,U,.,.,.,.,.,.,.,.,.,.,.,C:C,C,.,.,.,.,.,.,.,U:A,U,.,G:U,G,.,.,.,.,.,.,.,.,.,.,.,.,.,.,U,U:A:-,U:A,.,.,.,.,A,A:U:G,A:G,.,.,.,G:C,G,G:C:G,G:G,.,C:A,C,U:G,U,.,.,.,.,C:-,C,C:-:G,.,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,G:C,U,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,C,.,.,.,G,G:U,G,G:C,U,.,.,.,.,G,.,.,.,.,.,G,.,-,.,-,-:G,U,G,.,A,.,C,.,.,.,.,.,.,.,.,.,.,.,.,.,A,A:A,.,.,.,.,U:A,U,.,.,.,.,.,C:G,C,.,.,.,.,C:G,C,.,C:G,C,.,U:G,U,C,U,C,A,.,U,A,U,A,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,U,G,A,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,A,.,G,.,G,.,.,.,.,G,.,G,.,A,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,-:-,-:-,-:-,-:-,-:-,-:-,-:-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,G,.,U,.,C,G:C,G,.,U:A,U,.,.,.,.,.,.,.,.,G:C,G,.,.,.,A:U,A,.,G:C,G,.,U,U:A,.,.,.,.,.,.,.,U,.,.,.,.,.,.,.,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,.,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,.,.,.,.,C,.,.,C,.,U,C,A,.,.,.,.,.,.,C,.,.,.,.,.,.,.,.,.,.,.,.,U,.,A,.,.,C,.,.,.,.,.,.,.,.,.,.,.,G,.,A,.,.,.,.,.,.,.,.,.,.,.,G,.,.,.,.,.,C,A
103039,Fungi,eukaryota,Thr,schiPomb_972H_chrII.trna58,schiPomb_972H,Schizosaccharomyces pombe 972h-,Met,CAT,57.2,76.3,0.2682927,12,14,8,0,7,2,2,False,True,.,.,.,.,.,.,.,.,.,.,.,G:C,G,.,C:G,C,.,U:A,U,.,.,.,.,.,.,.,U:G,U,.,.,.,.,.,.,.,.,.,.,.,C:C,C,.,.,.,.,.,.,.,U:A,U,.,G:U,G,.,.,.,.,.,.,.,.,.,.,.,.,.,.,U,U:A:-,U:A,.,.,.,.,A,A:U:G,A:G,.,.,.,G:C,G,G:C:G,G:G,.,C:A,C,U:G,U,.,.,.,.,C:-,C,C:-:G,.,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,G:C,U,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,C,.,.,.,G,G:U,G,G:C,U,.,.,.,.,G,.,.,.,.,.,G,.,-,.,-,-:G,U,G,.,A,.,C,.,.,.,.,.,.,.,.,.,.,.,.,.,A,A:A,.,.,.,.,U:A,U,.,.,.,.,.,C:G,C,.,.,.,.,C:G,C,.,C:G,C,.,U:G,U,C,U,C,A,.,U,A,U,G,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,U,G,A,U,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,A,.,G,.,G,.,.,.,.,G,.,G,.,A,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,-:-,-:-,-:-,-:-,-:-,-:-,-:-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,G,.,U,.,C,G:C,G,.,U:A,U,.,.,.,.,.,.,.,.,G:C,G,.,.,.,A:U,A,.,G:C,G,.,U,U:A,.,.,.,.,.,.,.,U,.,.,.,.,.,.,.,C,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,.,A,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,G,.,.,.,.,C,.,.,C,.,U,C,A,.,.,.,.,.,.,C,.,.,.,.,.,.,.,.,.,.,.,.,U,.,A,.,.,C,.,.,.,.,.,.,.,.,.,.,.,G,.,A,.,.,.,.,.,.,.,.,.,.,.,G,.,.,.,.,.,C,A
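
Most of the columns printed above are empty insertion slots, which makes the D-loop hard to read. Below is a minimal sketch of a compact view. The data frame `pombe` is a hypothetical stand-in for the filtered `identities` rows, with the values for X20 through X22 copied by hand from the three rows above.

```r
# Values copied from the three rows printed above; only D-loop columns are kept.
pombe = data.frame(
  seqname = c("schiPomb_972H_chrI.trna50",
              "schiPomb_972H_chrII.trna7",
              "schiPomb_972H_chrII.trna58"),
  X20 = "U", X20i5 = "G", X20a = "G", X20b = ".", X21 = "-", X22 = "-",
  stringsAsFactors = FALSE
)

# Drop columns that are an empty slot (".") in every row.
keep = sapply(pombe, function(col) !all(col == "."))
pombe[, keep]
```

The view shows the extra G sitting in insertion column `X20i5`, ahead of 20a, with gaps at 21 and 22. Against the real table the same idea would be `identities %>% filter(...) %>% select(seqname, X20:X22)`, assuming the columns are ordered as in the header above.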


## D26, D44, 26:44

M&G don't say much about this. Our data show a near-universal consensus for anything but C at these positions. In [tertiary-interactions](../tertiary-interactions.ipynb) I show that this is most commonly G26:A44 (about half of tRNAs), but it's not a strong identity element.

## Y32

M&G: WC pairs rare at 32:38, but there tends to be a weak pairing here for a more stable AC stem-loop. Auffinger and Westhof (1999) showcase a variety of weak pairings (e.g., single hydrogen bond).

In [105]:
clade_iso_ac_freqs %>% filter(positions %in% c("X32", "X38")) %>%
  group_by(positions, variable) %>%
  summarize(count=sum(value), freq=sum(value)/sum(total)) %>%
  filter(freq > 0.2)

Unnamed: 0,positions,variable,count,freq
1,X32,C,76145,0.6907328
2,X32,U,33729,0.3059653
3,X32,Pyrimidine,109874,0.9966981
4,X32,Weak,33936,0.307843
5,X32,Strong,76298,0.6921207
6,X32,Amino,76352,0.6926105
7,X32,Keto,33882,0.3073532
8,X32,B,110027,0.998086
9,X32,D,34089,0.3092309
10,X32,H,110081,0.9985758



X32	Pyrimidine
X33	U
X37	Purine
X3.70	Paired
X38	H
X44	D
X46	Purine
X48	Pyrimidine
X49.65	Paired
X53.61	Paired
X54	U
X54.58	Paired
X55	U
X56	V
X57	Purine
X58	A
X59	D
X60	Pyrimidine
X73	D

## Non-consensus identity elements

There are plenty of examples where our frequencies confirm known rules, supplant known rules, or indicate new rules. There are also plenty of rules that weren't recapitulated above, and those are worth looking into individually.

## C1:G72

## C5:G68

# No. tRNAs by missing IDEs