### **Software info**

|Software     |Version|
|-------------|-------|
|`python`|3.13.0|
|`ipykernel`|[6.29.5](https://pypi.org/project/ipykernel/)|
|`Entrez-direct`|[22.4](https://anaconda.org/bioconda/entrez-direct)|
|`Emboss`|[6.6.0](https://anaconda.org/bioconda/emboss)|

Conda environment: `phoacr.yaml`<br>
Install the environment with:

In [None]:
! conda env create -f ../phoacr.yaml

Reload `VS Code` (close and reopen it), then select this environment as the notebook kernel
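After switching kernels, a quick check can confirm that the notebook actually sees the command-line tools from the software table. This is a minimal sketch using only the standard library. It assumes the tools are on `PATH` under the executable names used later in this notebook (`esearch`, `efetch`, `primersearch`); the helper name `missing_tools` is hypothetical.

```python
import shutil
import sys

# Executables invoked later in this notebook via `!` cells
REQUIRED_TOOLS = ("esearch", "efetch", "primersearch")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print(f"Python: {sys.version.split()[0]}")
missing = missing_tools()
if missing:
    print(f"Not found on PATH: {', '.join(missing)} (is the right kernel selected?)")
else:
    print("All required tools were found on PATH.")
```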

### **Hardware info**

- OS: Ubuntu 22.04 (Windows Subsystem for Linux)
- CPU: Intel Xeon E5-2670v3
- RAM: 32 GB (16 GB allocated to WSL)

In [29]:
! lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
    CPU family:          6
    Model:               63
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           1
    Stepping:            2
    BogoMIPS:            4589.37
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscal
                         l nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopo
                         logy cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1
                          sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervis
                         or lahf_lm abm invpcid_single pti ssbd i
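The `CPU(s): 24` line follows from the topology fields: logical CPUs = sockets × cores per socket × threads per core = 1 × 12 × 2 = 24. The sketch below recomputes this from `lscpu`-style text. The field names are taken from the output above; other `lscpu` versions may label them differently, and the helper `logical_cpus` is hypothetical.

```python
import re


def logical_cpus(lscpu_text: str) -> int:
    """Multiply sockets, cores per socket and threads per core from lscpu output."""
    def field(name: str) -> int:
        # Match e.g. "    Core(s) per socket:  12" on any line of the text
        match = re.search(rf"^\s*{re.escape(name)}:\s*(\d+)", lscpu_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"field {name!r} not found")
        return int(match.group(1))

    return field("Socket(s)") * field("Core(s) per socket") * field("Thread(s) per core")


excerpt = """\
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           1
"""
print(logical_cpus(excerpt))  # 24
```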

### **Step 1. Download sequences**

Download all coronavirus sequences from the `RefSeq` database

In [1]:
! esearch -db nucleotide \
    -query '("Alphacoronavirus"[Organism] OR "Betacoronavirus"[Organism] OR "Gammacoronavirus"[Organism] OR coronavirus[All Fields]) AND srcdb_refseq[PROP] AND viruses[filter]' \
    | efetch -format fasta > data/CoVs.fa
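The query above combines three genus-level `[Organism]` terms and a free-text `coronavirus[All Fields]` term with `OR`, then restricts the result to RefSeq (`srcdb_refseq[PROP]`) and to viruses (`viruses[filter]`). If the list of taxa needs to change, the string could be assembled programmatically. The helper `build_refseq_query` below is a hypothetical convenience, not part of Entrez Direct; with these arguments it reproduces the exact query used in the cell above.

```python
def build_refseq_query(organisms, extra_terms=()):
    """Compose an Entrez query: (taxa OR extra terms) AND RefSeq AND viruses."""
    terms = [f'"{name}"[Organism]' for name in organisms] + list(extra_terms)
    return f"({' OR '.join(terms)}) AND srcdb_refseq[PROP] AND viruses[filter]"


query = build_refseq_query(
    ["Alphacoronavirus", "Betacoronavirus", "Gammacoronavirus"],
    extra_terms=["coronavirus[All Fields]"],
)
print(query)
```

In a notebook `!` cell, IPython expands `{query}` from the Python variable, so the same string could be passed as `-query '{query}'`.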

Download all viral sequences from the `RefSeq` database

> **Warning**: The command below takes approximately 25 minutes to run

In [2]:
! esearch -db nucleotide \
    -query 'srcdb_refseq[PROP] AND viruses[filter]' \
    | efetch -format fasta > data/viruses.fa

Check how many coronavirus sequences have been downloaded from `RefSeq`

In [3]:
# Count FASTA records: every record starts with a header line beginning with ">"
with open("data/CoVs.fa", "r") as fasta_file:
    num_sequences = sum(1 for line in fasta_file if line.startswith(">"))
print(f"The number of sequences in `data/CoVs.fa` file: {num_sequences}")

The number of sequences in `data/CoVs.fa` file: 74
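Counting headers gives only the number of records. A streaming parser also shows which accessions were retrieved and how long each sequence is, which helps spot entries that are not complete genomes. This is a minimal, dependency-free sketch (Biopython's `SeqIO` could do the same, but it is not listed in the software table); `read_fasta` is a hypothetical helper name.

```python
from collections.abc import Iterator
from pathlib import Path


def read_fasta(path) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs without loading the whole file into memory."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Accession (first word of the header) and length of each downloaded record
fasta_path = Path("data/CoVs.fa")
if fasta_path.exists():
    for header, seq in read_fasta(fasta_path):
        print(f"{header.split()[0]}\t{len(seq)}")
```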


Check how many viral sequences have been downloaded from `RefSeq`

In [4]:
# Stream the file line by line instead of reading all ~18,000 records into memory
with open("data/viruses.fa", "r") as fasta_file:
    num_sequences = sum(1 for line in fasta_file if line.startswith(">"))
print(f"The number of sequences in `data/viruses.fa` file: {num_sequences}")

The number of sequences in `data/viruses.fa` file: 18748
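Both queries end with `srcdb_refseq[PROP] AND viruses[filter]`, and the coronavirus query only adds a further restriction. Assuming both files were downloaded at about the same time, every accession in `data/CoVs.fa` would therefore be expected to appear in `data/viruses.fa`. A minimal sketch of that consistency check, comparing the first word of each header line (the helper `fasta_accessions` is hypothetical):

```python
from pathlib import Path


def fasta_accessions(path) -> set[str]:
    """Collect the first word of every FASTA header (accession.version for NCBI records)."""
    with open(path) as handle:
        return {line[1:].split()[0] for line in handle if line.startswith(">") and line[1:].strip()}


cov_path, virus_path = Path("data/CoVs.fa"), Path("data/viruses.fa")
if cov_path.exists() and virus_path.exists():
    missing = fasta_accessions(cov_path) - fasta_accessions(virus_path)
    print(f"Coronavirus accessions absent from the full viral set: {len(missing)}")
    for accession in sorted(missing):
        print(accession)
```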


### **Step 2. _In silico_ PCR**

Create a directory to store the results

In [12]:
! mkdir -p results/

PCR primers were designed by our collaborators.<br>
Here we test their selectivity and sensitivity.
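`primersearch` reads primer pairs from the file given to `-infile` (here `data/primers`). According to the EMBOSS `primersearch` documentation (worth double-checking against the installed version), each line holds a pair name, the forward primer and the reverse primer, separated by whitespace. The sketch below validates such a file before running the search; `parse_primer_file` is a hypothetical helper and the primer pair is made up for illustration.

```python
import re

# IUPAC nucleotide codes, including ambiguity symbols often used in degenerate primers
IUPAC_DNA = re.compile(r"[ACGTRYSWKMBDHVN]+", re.IGNORECASE)


def parse_primer_file(text: str) -> dict[str, tuple[str, str]]:
    """Parse lines of the form: <pair name> <forward primer> <reverse primer>."""
    pairs = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        name, forward, reverse = fields
        for primer in (forward, reverse):
            if not IUPAC_DNA.fullmatch(primer):
                raise ValueError(f"line {lineno}: {primer!r} is not an IUPAC DNA sequence")
        pairs[name] = (forward.upper(), reverse.upper())
    return pairs


# Hypothetical primer pair, for illustration only (not the collaborators' primers)
example = "pair_A\tACGTACGTACGTACGTACGT\tTGCATGCATGCATGCATGCA\n"
print(parse_primer_file(example))
```

In this notebook the real file could be checked with `parse_primer_file(open("data/primers").read())`.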

Check selectivity

In [13]:
! primersearch -seqall data/CoVs.fa -infile data/primers -mismatchpercent 10 -outfile results/selectivity.txt

Search DNA sequences for matches with primer pairs
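`-mismatchpercent 10` lets each primer differ from its binding site by up to 10% of the primer length; for a 20-nt primer that is 2 mismatches. The function below is only a conceptual sketch of that criterion for a single, already-aligned site. It is not how `primersearch` scans sequences, it treats IUPAC ambiguity codes as ordinary letters, and it does not try to reproduce the program's exact rounding for primer lengths where 10% is not a whole number.

```python
def within_mismatch_limit(primer: str, site: str, mismatch_percent: float = 10) -> bool:
    """Check whether `primer` matches an equal-length `site` within the allowed mismatch percentage.

    Plain base-by-base comparison; IUPAC ambiguity codes are NOT treated as wildcards here.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    mismatches = sum(p != s for p, s in zip(primer.upper(), site.upper()))
    # Division-free form of: mismatches / len(primer) * 100 <= mismatch_percent
    return mismatches * 100 <= mismatch_percent * len(primer)


primer = "ACGTACGTACGTACGTACGT"     # hypothetical 20-nt primer
two_off = "ACGTACGTACGTACGTACAA"    # 2 mismatches = 10%, allowed
three_off = "ACGTACGTACGTACGTAAAA"  # 3 mismatches = 15%, rejected
print(within_mismatch_limit(primer, two_off), within_mismatch_limit(primer, three_off))
```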


Check sensitivity

In [21]:
! primersearch -seqall data/viruses.fa -infile data/primers -mismatchpercent 10 -outfile results/sensetivity.txt

Search DNA sequences for matches with primer pairs
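Both result files are plain-text `primersearch` reports. For downstream tabulation it can help to count amplimers per primer pair and record which sequences they came from. The parser below is a sketch that assumes the report layout shown in its docstring (a `Primer name` line, then `Amplimer N` blocks containing a `Sequence:` line and an `Amplimer length:` line). That layout is an assumption and should be checked against `results/selectivity.txt` before relying on the counts; `parse_primersearch` is a hypothetical helper.

```python
from pathlib import Path


def parse_primersearch(text: str) -> dict[str, list[tuple[str, int]]]:
    """Map each primer pair to a list of (sequence ID, amplimer length in bp).

    Assumed report layout (verify against your own results files):
        Primer name pair_A
        Amplimer 1
            Sequence: NC_000000.1
            ...
            Amplimer length: 150 bp
    """
    hits: dict[str, list[tuple[str, int]]] = {}
    primer, seq_id = None, ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Primer name"):
            primer = line[len("Primer name"):].strip()
            hits.setdefault(primer, [])  # keep pairs that produce no amplimers
        elif line.startswith("Sequence:"):
            parts = line.split()
            seq_id = parts[1] if len(parts) > 1 else ""
        elif line.startswith("Amplimer length:") and primer is not None:
            length = int(line.split(":", 1)[1].split()[0])
            hits[primer].append((seq_id, length))
    return hits


# File names as written by the primersearch cells in this notebook
for report in ("results/selectivity.txt", "results/sensetivity.txt"):
    if Path(report).exists():
        for primer, amplimers in parse_primersearch(Path(report).read_text()).items():
            print(f"{report}\t{primer}\t{len(amplimers)} amplimer(s)")
```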


### **Step 3. Visualization**

Please open `RStudio` and proceed to the `heatmap_journal.R` script
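The contents of `heatmap_journal.R` are not shown here, so the exact input it expects is unknown. As an illustration in Python of what such a heatmap typically encodes, the sketch below turns per-primer-pair hits into a presence/absence table (rows: primer pairs, columns: sequences) and serialises it as TSV. The function names, the input structure and the example accessions are all hypothetical.

```python
import csv
import io


def hit_matrix(hits: dict[str, list[str]], sequence_ids: list[str]) -> list[list[object]]:
    """Build a presence/absence table: one row per primer pair, 1 if it amplifies a sequence."""
    rows: list[list[object]] = [["primer_pair", *sequence_ids]]
    for pair in sorted(hits):
        amplified = set(hits[pair])
        rows.append([pair, *(1 if seq_id in amplified else 0 for seq_id in sequence_ids)])
    return rows


def to_tsv(rows: list[list[object]]) -> str:
    """Serialise the table as tab-separated text, which R reads with read.delim()."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical input: primer pair -> IDs of sequences with at least one amplimer
example_hits = {"pair_A": ["NC_000001.1"], "pair_B": ["NC_000001.1", "NC_000002.1"]}
print(to_tsv(hit_matrix(example_hits, ["NC_000001.1", "NC_000002.1"])))
```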