# Downloads and Installation

Install Biopython and clone the GLAPD repository, then build GLAPD with `make`.

In [None]:
!pip install biopython

!git clone https://github.com/jiqingxiaoxi/GLAPD.git
%cd GLAPD

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting biopython
  Downloading biopython-1.79-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (2.3 MB)
     |████████████████████████████████| 2.3 MB 3.8 MB/s
Installing collected packages: biopython
Successfully installed biopython-1.79
Cloning into 'GLAPD'...
remote: Enumerating objects: 189, done.
remote: Total 189 (delta 0), reused 0 (delta 0), pack-reused 189
Receiving objects: 100% (189/189), 23.99 MiB | 13.44 MiB/s, done.
Resolving deltas: 100% (85/85), done.
/content/GLAPD


In [None]:
!make

gcc single.c -o Single -lm
gcc LAMP.c -o LAMP -lm


In [None]:
import pathlib

from Bio import Entrez, SeqIO


Entrez.email = "hello.devpatel@gmail.com"
Entrez.tool = "GLAPD_ITS_Design.ipynb"

data_dir = pathlib.Path.cwd() / "oak_wilt"
data_dir.mkdir(exist_ok=True)

Query and download the reference ITS sequence.

In [None]:
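ref_id = "FJ347031.1"
# Fetch the reference record from GenBank and parse it.
handle = Entrez.efetch(db="nucleotide", rettype="gbwithparts", retmode="text", id=ref_id)
record = SeqIO.read(handle, "genbank")
handle.close()

# Save the reference as FASTA for GLAPD.
SeqIO.write([record], str(data_dir / "ref_its.fasta"), "fasta")
record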

SeqRecord(seq=Seq('TCATTACTGAGTTTTCAACTCTTTAAAACCATTTGTGAACATACCATTTTTTTT...GTT'), id='FJ347031.1', name='FJ347031', description='Ceratocystis fagacearum 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8 S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence; and 26S ribosomal RNA gene, partial sequence', dbxrefs=[])

Query and download the dataset sequences (ITS regions from the genera listed below).

In [None]:
genus = ["Bretziella", "Ceratocystis"]
title = ["internal transcribed spacer 1", "internal transcribed spacer 2"]

genus_query = " OR ".join(f"({g}[Organism])" for g in genus)
title_query = " AND ".join(f"(\"{t}\"[Title])" for t in title)
query = f"(({genus_query}) AND ({title_query}))"
query

'(((Bretziella[Organism]) OR (Ceratocystis[Organism])) AND (("internal transcribed spacer 1"[Title]) AND ("internal transcribed spacer 2"[Title])))'
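The query construction above can be wrapped in a small helper so that the genus and title lists can be changed without touching the f-strings. This is a minimal sketch: `build_query` is a hypothetical name introduced here for illustration and is not part of Biopython.

```python
def build_query(genera, titles):
    """Combine organism and title terms into one Entrez search string.

    Organism terms are OR-ed together, title terms are AND-ed together,
    and the two groups are joined with AND.
    """
    genus_query = " OR ".join(f"({g}[Organism])" for g in genera)
    title_query = " AND ".join(f'("{t}"[Title])' for t in titles)
    return f"(({genus_query}) AND ({title_query}))"


query = build_query(
    ["Bretziella", "Ceratocystis"],
    ["internal transcribed spacer 1", "internal transcribed spacer 2"],
)
print(query)
```

With the same inputs as the cell above, the helper produces the same search string.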

In [None]:
handle = Entrez.esearch(db="nucleotide", retmax=1500, term=query, idtype="acc")
record = Entrez.read(handle)
handle.close()
dataset_ids = set(record["IdList"])

# The reference accession must be part of the dataset.
assert ref_id in dataset_ids
len(dataset_ids)

1159
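Fetching all 1159 accessions in a single request worked in this run. For larger result sets, splitting the IDs into batches is a common precaution. The sketch below shows only the batching step; `batched_ids`, the batch size of 200, and the placeholder accessions are assumptions made for illustration, and each batch would be passed as `id=` to a separate `Entrez.efetch` call.

```python
def batched_ids(ids, batch_size=200):
    """Yield the IDs in sorted order, batch_size at a time."""
    ordered = sorted(ids)  # sets are unordered; sort for reproducible batches
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Placeholder accessions (made up), same count as the search result above.
example_ids = {f"ACC{i:04d}.1" for i in range(1159)}

batches = list(batched_ids(example_ids))
print(len(batches), len(batches[-1]))
```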

In [None]:
desc_key = "internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence"
desc_key = desc_key.replace(" ", "")

handle = Entrez.efetch(db="nucleotide", rettype="gbwithparts", retmode="text", id=dataset_ids)
dataset = SeqIO.parse(handle, "genbank")

# Compare with spaces removed so descriptions with irregular spacing (e.g. "5.8 S") still match.
dataset = filter(lambda r: desc_key in r.description.replace(" ", ""), dataset)

SeqIO.write(dataset, str(data_dir / "dataset.fasta"), "fasta")

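# Collect the FASTA headers of the target species. Matching on "fagacearum"
# covers both Bretziella fagacearum and its older name Ceratocystis fagacearum.
with open(data_dir / "target.txt", "w") as target_f, \
     open(data_dir / "dataset.fasta", "r") as dataset_f:

    for line in dataset_f:
        if line.startswith(">") and "fagacearum" in line:
            target_f.write(line)
            print(line)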

>MH865866.1 Bretziella fagacearum culture CBS:130770 strain CBS 130770 small subunit ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence; and large subunit ribosomal RNA gene, partial sequence

>DQ318193.1 Ceratocystis fagacearum strain WIN(M) 892 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence; and 26S ribosomal RNA gene, partial sequence

>MH865196.1 Bretziella fagacearum culture CBS:129241 strain CBS 129241 small subunit ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence; and large subunit ribosomal RNA gene, partial sequence

>KC305152.1 Ceratocystis fagacearum strain C 520 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribe
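The spacing-tolerant description match used in the filter above can be checked in isolation, without network access. This is a sketch under assumptions: `strip_whitespace` and `has_complete_its` are hypothetical helper names, and unlike the cell above (which removes only spaces) this version removes all whitespace. The first test description is the reference record's description shown earlier; the second is made up.

```python
def strip_whitespace(text):
    """Remove every whitespace character so spacing differences are ignored."""
    return "".join(text.split())


DESC_KEY = strip_whitespace(
    "internal transcribed spacer 1, 5.8S ribosomal RNA gene, "
    "and internal transcribed spacer 2, complete sequence"
)


def has_complete_its(description):
    """Return True if the description reports complete ITS1, 5.8S, and ITS2."""
    return DESC_KEY in strip_whitespace(description)


# Description of the reference record, which writes "5.8 S" with a space.
irregular = (
    "Ceratocystis fagacearum 18S ribosomal RNA gene, partial sequence; "
    "internal transcribed spacer 1, 5.8 S ribosomal RNA gene, and internal "
    "transcribed spacer 2, complete sequence; and 26S ribosomal RNA gene, "
    "partial sequence"
)
# Made-up description of a record that covers only part of the region.
partial = "Example sp. internal transcribed spacer 1, partial sequence"

print(has_complete_its(irregular))
print(has_complete_its(partial))
```

The reference description passes despite the extra space, while the partial record is rejected.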

# Run GLAPD

In [None]:
!bowtie/bowtie-build oak_wilt/dataset.fasta oak_wilt/index 

Settings:
  Output files: "oak_wilt/index.*.ebwt"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 5 (one in 32)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  oak_wilt/dataset.fasta
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 63424
Using parameters --bmax 47568 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 47568 --dcv 1024
Constructing suffix-array element generato

In [None]:
!mkdir -p results

In [None]:
!./Single -in oak_wilt/ref_its.fasta -out candidates -dir results

It takes 0 seconds to prepare.
There ara 553 candidate primers used as F3/F2/B2/B3.
There are 507 candidate primers used as F1c/B1c.
It takes 4 seconds to identify candidate single primer regions.


In [None]:
!perl par.pl --in candidates --ref oak_wilt/ref_its.fasta --dir results --common oak_wilt/target.txt --left \
    --bowtie bowtie/bowtie --index oak_wilt/index

Now the program is handling the 1-th file, total files is 2...
# reads processed: 507
# reads with at least one reported alignment: 507 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 80407 alignments to 1 output stream(s)
    In this step, it takes 0 seconds.
Now the program is handling the 2-th file, total files is 2...
# reads processed: 553
# reads with at least one reported alignment: 553 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 92917 alignments to 1 output stream(s)
    In this step, it takes 1 seconds.


In [None]:
!./LAMP -in candidates -ref oak_wilt/ref_its.fasta -dir results -out results/success.txt -common -specific

It takes 0 seconds to prepare data.
Running: amplify 12 target genome.
Running: amplify 11 target genome.
It takes 1 seconds to design the 1-th LAMP primer set successfully.
Running: amplify 10 target genome.
Running: amplify 9 target genome.
Running: amplify 8 target genome.
Running: amplify 7 target genome.
Running: amplify 6 target genome.
Running: amplify 5 target genome.
Running: amplify 4 target genome.
Running: amplify 3 target genome.
Running: amplify 2 target genome.
Running: amplify 1 target genome.
It takes 0 seconds to free memory.

It takes total 1 seconds to finish this design.


In [None]:
!zip -r results.zip results/

  adding: results/ (stored 0%)
  adding: results/success.txt (deflated 51%)
  adding: results/Outer/ (stored 0%)
  adding: results/Outer/candidates-common.txt (deflated 81%)
  adding: results/Outer/candidates-specific.txt (deflated 83%)
  adding: results/Outer/candidates (deflated 83%)
  adding: results/Inner/ (stored 0%)
  adding: results/Inner/candidates-common.txt (deflated 82%)
  adding: results/Inner/candidates-specific.txt (deflated 83%)
  adding: results/Inner/candidates (deflated 83%)
  adding: results/Inner/candidates-common_list.txt (deflated 42%)
