Commit
In Bio.Data.IUPACData:
- corrected masses for monophosphate nucleotides in unambiguous_dna_weights and unambiguous_rna_weights (most values were too high by a mass of 16 Da)
- added two dictionaries with monoisotopic masses for monophosphate nucleotides (monoisotopic_unambiguous_dna_weights and monoisotopic_unambiguous_rna_weights)
- added average and monoisotopic masses for selenocysteine and pyrrolysine in protein_weights and monoisotopic_protein_weights

In Bio.SeqUtils.__init__:
Rewrote method molecular_weight to
- correct the calculation (sum masses of sequence elements and subtract 18 Da for each formed bond)
- allow mass calculation for RNA and protein sequences
- allow mass calculation for double-stranded nucleic acids
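
The corrected calculation described in the commit message (sum the monomer masses, then subtract one water per formed bond) can be sketched in isolation. This is a minimal illustration and not the committed code: the function name `linear_weight` is hypothetical, and the table holds rounded average masses of the free deoxynucleoside monophosphates rather than the exact values in Bio.Data.IUPACData.

```python
# Rounded average masses (Da) of free deoxynucleoside monophosphates.
# Illustrative values only; the library's own table may differ in the decimals.
DNA_WEIGHTS = {"A": 331.22, "C": 307.20, "G": 347.22, "T": 322.21}

# Mass of the water released per formed bond, as used in the commit (18.02 Da).
WATER = 18.02


def linear_weight(seq, weights=DNA_WEIGHTS, water=WATER):
    """Sum monomer masses and subtract one water for each of the len(seq) - 1 bonds."""
    if not seq:
        # Assumption of this sketch: an empty sequence weighs nothing.
        return 0.0
    return sum(weights[x] for x in seq) - (len(seq) - 1) * water


# A dinucleotide forms exactly one bond, so one water is subtracted.
print(round(linear_weight("AC"), 2))
```

Note that the sketch returns 0.0 for an empty sequence, whereas an unguarded `(len(seq) - 1) * 18.02` term would add 18.02 Da in that case.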
@@ -165,12 +165,33 @@ def xGC_skew(seq, window=1000, zoom=100,
     canvas.configure(scrollregion=canvas.bbox(ALL))


-def molecular_weight(seq):
-    """Calculate the molecular weight of a DNA sequence."""
-    if isinstance(seq, str):
-        seq = Seq(seq, IUPAC.unambiguous_dna)
-    weight_table = IUPACData.unambiguous_dna_weights
-    return sum(weight_table[x] for x in seq)
+def molecular_weight(seq, type='DNA', double_stranded=False):
+    # Rewritten by Markus Piotrowski
+    """Calculate the molecular weight of a DNA, RNA or protein sequence."""
+    seq = ''.join(str(seq).split()).upper() # Do the minimum formatting
+
+    if type == 'DNA':
+        weight_table = IUPACData.unambiguous_dna_weights
+    elif type == 'RNA':
+        weight_table = IUPACData.unambiguous_rna_weights
+    elif type == 'protein':
+        weight_table = IUPACData.protein_weights
+    else:
+        raise ValueError('allowed types are DNA, RNA or protein')
+
+    try:
+        weight = sum(weight_table[x] for x in seq) - (len(seq)-1) * 18.02
+    except KeyError as e:
+        raise ValueError('%s is not a valid unambiguous letter for the ' %e +
+                         'chosen sequence type (DNA, RNA or protein)')
+    except:
+        raise
+
+    if type in ('DNA', 'RNA') and double_stranded:
+        seq = str(Seq(seq).complement())
+        weight += sum(weight_table[x] for x in seq) - (len(seq)-1) * 18.02
+
+    return weight


 def nt_search(seq, subseq):

If you want to generalise this function, it should obey the seq object's alphabet to get the sequence type (with the optional argument as a way to set this directly).
Personally I wonder if separate functions for DNA, RNA and protein weight would be better (which can include error checking for being fed a sequence with the wrong alphabet)?
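
The alternative raised in this comment, separate functions per sequence type with their own error checking, could look roughly like the sketch below. It is not part of the commit: `dna_weight`, `rna_weight` and the helper `_checked_weight` are hypothetical names, the tables hold rounded average masses of the free nucleoside monophosphates, and a protein variant would follow the same pattern with an amino acid table.

```python
# Rounded average masses (Da) of free nucleoside monophosphates.
# Illustrative values only, not copied from Bio.Data.IUPACData.
DNA_WEIGHTS = {"A": 331.22, "C": 307.20, "G": 347.22, "T": 322.21}
RNA_WEIGHTS = {"A": 347.22, "C": 323.20, "G": 363.22, "U": 324.18}

WATER = 18.02  # mass subtracted per formed bond, as in the commit


def _checked_weight(seq, weights, kind):
    """Validate the letters against one table, then apply the sum-minus-water rule."""
    seq = "".join(str(seq).split()).upper()  # same minimal formatting as the commit
    bad = sorted(set(seq) - set(weights))
    if bad:
        raise ValueError(
            "%s not valid for an unambiguous %s sequence" % (", ".join(bad), kind)
        )
    if not seq:
        return 0.0  # assumption of this sketch: empty input weighs nothing
    return sum(weights[x] for x in seq) - (len(seq) - 1) * WATER


def dna_weight(seq):
    """Weight of a single-stranded DNA sequence; rejects non-DNA letters."""
    return _checked_weight(seq, DNA_WEIGHTS, "DNA")


def rna_weight(seq):
    """Weight of a single-stranded RNA sequence; rejects non-RNA letters."""
    return _checked_weight(seq, RNA_WEIGHTS, "RNA")


print(round(dna_weight("acgt"), 2))
```

With this split, passing a DNA string to `rna_weight` fails on the letter T with a message naming the expected type, instead of depending on a `type` argument being set correctly by the caller.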