Edinburgh-Genome-Foundry/sequenticon


Sequenticon is a Python library to generate identicons for DNA sequences. For instance, the sequence ATGGTGCA always gets converted into the same small, colorful icon.

When are sequenticons useful?

Identifying DNA sequences

In biological engineering, DNA sequence files often get updated or renamed. This can cause critical confusion when the wrong files or wrong sequence versions get used in a process. Ideally, laboratory information systems would prevent such mistakes, but when they do happen, they are difficult to trace back to the faulty sequences.

Therefore, when using software to process large batches of sequences, one may want a quick way to decide whether the sequence pLac3.gb used on March 15th is the same as plac3.gb, which appears in the April 18th batch.

Identicons provide a simple visual way to tell that two sequences are different (different identicons) or very probably the same (same identicon).

Note also that, in principle, even two long sequences differing by a single nucleotide will have very different sequenticons.
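Both properties come from hashing: an identicon is drawn from a digest of the sequence, so identical sequences always give the same picture, while any change produces an unrelated digest. The toy sketch below illustrates the idea with an MD5 digest and a mirrored text grid. The hash choice, the grid layout and the names `sequence_digest` and `toy_identicon` are assumptions made for illustration; this is not Sequenticon's (or Pydenticon's) actual rendering code.

```python
import hashlib


def sequence_digest(sequence: str) -> bytes:
    """Return the MD5 digest of a DNA sequence (hash choice is an assumption)."""
    return hashlib.md5(sequence.encode("ascii")).digest()


def toy_identicon(sequence: str, rows: int = 5, columns: int = 5) -> list[str]:
    """Render a mirrored rows x columns text grid from the sequence digest.

    Each cell of the left half (plus the middle column, if any) is filled
    when the corresponding digest byte is odd; the right half mirrors the
    left half. Requires rows * ceil(columns / 2) <= 16, the MD5 digest size.
    """
    digest = sequence_digest(sequence)
    half = (columns + 1) // 2
    if rows * half > len(digest):
        raise ValueError("grid too large for a 16-byte digest")
    grid = []
    for r in range(rows):
        left = ["#" if digest[r * half + c] % 2 else "." for c in range(half)]
        right = left[: columns - half][::-1]  # mirror, skipping the middle column
        grid.append("".join(left + right))
    return grid


# Identical sequences give identical grids; a one-nucleotide change gives
# an unrelated digest, hence very probably a different grid.
for seq in ["ATGGTGCA", "ATGGTGCA", "ATGGTGCT"]:
    print(seq, sequence_digest(seq).hex()[:8])
    print("\n".join(toy_identicon(seq)), end="\n\n")
```

This also shows why a matching icon only means "very probably the same": the 5x5 toy grid encodes just 15 bits, so two different sequences share a grid with a probability of roughly 1 in 2^15, far more often than they would share a full digest.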

Identifying biological samples

Sequenticons can also make biological samples easier for humans to handle. With the right printer, you can label your tubes with small, colorful sequenticons, which are much easier to tell apart than complex and often very similar text labels.




Library usage

from sequenticon import sequenticon

# Write a sequence to a PNG sequenticon file
sequenticon("ATGGTGCA", size=120, output_path="icon.png")

# Get a self-contained "<img/>" HTML string, to embed in a webpage
img_tag = sequenticon("ATGGTGCA", size=60, output_format="html_image")

To process a batch:

from sequenticon import sequenticon_batch, sequenticon_batch_pdf

sequences = [("seq1", "ATTGTG"), ("seq2", "TAAATGCC"), ...] # OR
sequences = ["record1.gb", "record2.fa", ...] # OR
sequences = [biopython_record_1, biopython_record_2, ...]

# Write a batch of sequences as PNG in a folder
sequenticon_batch(sequences, size=120, output_path="my_emoticons/")

# Get a list [(sequence_name, html_image_tag), (...)]
data = sequenticon_batch(sequences, size=60, output_format="html_image")

# Write a PDF report with every sequenticon
sequenticon_batch_pdf(sequences, "my_report.pdf")

The last command writes a PDF report showing the sequenticon of every sequence in the batch.

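For a browsable page instead of a PDF, the list returned by the `output_format="html_image"` batch call can be dropped into an HTML table. The following is a minimal sketch, assuming `data` has the `[(sequence_name, html_image_tag), ...]` shape described in the batch example; `write_html_report` is a hypothetical helper, not part of Sequenticon.

```python
import html
from typing import Iterable, Optional, Tuple


def write_html_report(
    data: Iterable[Tuple[str, str]], output_path: Optional[str] = None
) -> str:
    """Build an HTML page listing sequence names next to their icons.

    `data` is expected to be a list of (sequence_name, html_image_tag) pairs,
    as returned by sequenticon_batch(..., output_format="html_image").
    Names are HTML-escaped; the <img/> tags are inserted unchanged.
    If output_path is given, the page is also written to that file.
    """
    rows = "\n".join(
        f"  <tr><td>{html.escape(name)}</td><td>{img_tag}</td></tr>"
        for name, img_tag in data
    )
    page = (
        "<!DOCTYPE html>\n<html><body>\n<table>\n"
        f"{rows}\n"
        "</table>\n</body></html>\n"
    )
    if output_path is not None:
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(page)
    return page


# Usage with Sequenticon installed (sequences as in the batch example):
# from sequenticon import sequenticon_batch
# data = sequenticon_batch(sequences, size=60, output_format="html_image")
# write_html_report(data, "sequenticons.html")
```

Because the image tags are self-contained, the resulting page needs no separate image files.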

Installation

You can install Sequenticon through pip:

pip install sequenticon

Alternatively, you can unzip the sources in a folder and type:

python setup.py install

License = MIT

This project is open-source software originally written at the Edinburgh Genome Foundry by Zulko and released on GitHub under the MIT license (Copyright 2018 Edinburgh Genome Foundry).

Everyone is welcome to contribute!

More biology software

Sequenticon is part of the EGF Codons synthetic biology software suite for DNA design, manufacturing and validation.

Note: also check out Pydenticon. Sequenticon is really just a few lines of Python around the more generic pydenticon library. The upside of having an official sequenticon library is to make sure that the icons, colors, etc. remain consistent across projects.