Biopython is an open-source Python library designed to make bioinformatics analyses and the processing of biological data easier. It provides tools to manipulate DNA, RNA and protein sequences, query biological databases, run phylogenetic analyses, and work with common file formats such as FASTA, GenBank and PDB. Its wide range of features and ease of use make Biopython particularly useful for automating repetitive tasks and building bioinformatics analysis pipelines.

In [4]:
from Bio.Seq import Seq 

my_seq = Seq("AGTACACTGGT")
print(my_seq)

AGTACACTGGT


In [8]:
print(f'Complement: {my_seq.complement()}')

print(f'Reverse complement: {my_seq.reverse_complement()}')

Complement: TCATGTGACCA
Reverse complement: ACCAGTGTACT


You can access elements of the sequence in the same way as for strings (but remember, Python counts from zero!):

In [9]:
print(my_seq[0])  # first letter
print(my_seq[2])  # third letter
print(my_seq[-1])  # last letter

A
T
T
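Slicing also works as it does for strings. The sketch below takes the first three letters and uses a step of -1 to reverse the sequence; note that a slice of a `Seq` is itself a `Seq`, and that simple reversal is not the same as the reverse complement.

```python
from Bio.Seq import Seq

my_seq = Seq("AGTACACTGGT")

first_three = my_seq[0:3]    # a slice returns a new Seq object, not a str
reversed_seq = my_seq[::-1]  # step -1 reverses the letters (no complementing)

print(repr(first_three))  # Seq('AGT')
print(reversed_seq)       # TGGTCACATGA
```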


Like a Python string, a Seq object has a .count() method, and, also like a string, it returns a non-overlapping count:

In [10]:
print(my_seq.count("A"))

3
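Counting a single letter cannot show the non-overlapping behaviour, so the sketch below uses the two-letter pattern "AA" in "AAAA". It compares `.count()` with `str.count()` and with Biopython's `count_overlap()` method, and cross-checks the overlapping result with a hypothetical sliding-window helper (`sliding_window_count` is illustrative, not a Biopython function).

```python
from Bio.Seq import Seq

dna = Seq("AAAA")

# .count() scans left to right and resumes *after* each match,
# exactly like str.count(), so "AA" is found only twice in "AAAA".
print(dna.count("AA"))          # 2
print(str(dna).count("AA"))     # 2

# count_overlap() also counts matches that share letters:
# "AA" starts at positions 0, 1 and 2.
print(dna.count_overlap("AA"))  # 3


def sliding_window_count(seq: str, sub: str) -> int:
    """Hypothetical helper: count every start index where sub matches."""
    width = len(sub)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == sub)


print(sliding_window_count(str(dna), "AA"))  # 3, same as count_overlap()
```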


For some biological uses, you may actually want an overlapping count (for example, "AA" occurs three times in "AAAA" if matches may overlap, whereas .count() returns 2). When searching for single letters, this makes no difference:

In [11]:
from Bio.Seq import Seq

long_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
print(len(long_seq))        # sequence length
print(long_seq.count("G"))  # number of G
# GC content as a percentage of the sequence length
print(100 * (long_seq.count("G") + long_seq.count("C")) / len(long_seq))

32
9
46.875

While you could use the snippet above to calculate GC content, note that the Bio.SeqUtils module already provides GC functions. For example, gc_fraction returns the GC content as a fraction between 0 and 1:

In [12]:
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction
long_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
print(gc_fraction(long_seq))

0.46875
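The sketch below checks that `gc_fraction` agrees with the manual percentage computed earlier, and illustrates its `ambiguous` parameter, assuming a recent Biopython version where `gc_fraction` accepts `ambiguous="remove"` (the default), `"ignore"` or `"weighted"`. The short sequence "GCNN" is an arbitrary illustrative input: with `"remove"` the N letters are excluded from the length, with `"ignore"` they stay in the length but are not counted as G or C.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction

long_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")

# gc_fraction returns a fraction in [0, 1]; multiply by 100 for a percentage.
print(100 * gc_fraction(long_seq))  # 46.875, matching the manual calculation

# How ambiguous letters such as N are treated depends on `ambiguous`.
ambiguous_seq = Seq("GCNN")
print(gc_fraction(ambiguous_seq, ambiguous="remove"))  # 1.0: N dropped from the length
print(gc_fraction(ambiguous_seq, ambiguous="ignore"))  # 0.5: N kept in the length, not GC
```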


Also note that, just like a normal Python string, a Seq object is immutable ("read-only"): assigning to one of its positions raises a TypeError. If you need to edit your sequence, for example to simulate a point mutation, see the MutableSeq objects section below, which covers the MutableSeq object.

In [13]:
long_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
long_seq[3] = 'F'  # fails: Seq objects do not support item assignment

<class 'TypeError'>: 'Seq' object does not support item assignment
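Because a Seq cannot be modified in place, one way to simulate a point mutation without MutableSeq is to build a new Seq from slices of the old one. In this sketch, the position (3) and replacement base ("A") are arbitrary illustrative choices; it relies on slicing returning a Seq and on Seq objects supporting concatenation with strings.

```python
from Bio.Seq import Seq

long_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")

# Seq is immutable, so the "mutation" builds a *new* Seq:
# everything before the position, the new base, then everything after it.
position, new_base = 3, "A"
mutated = long_seq[:position] + new_base + long_seq[position + 1:]

print(long_seq[position], "->", mutated[position])  # C -> A
print(mutated)                                      # GATAGATGGGCCTATATAGGATCGAAAATCGC
print(long_seq == mutated)                          # False: the original is untouched
```

This copies the whole sequence for every change; for many successive edits, the MutableSeq object is the better fit.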