# Preface

The project consists of six *.py* files: (1) "basic_4_functions.py"; (2) "ext1.py"; (3) "ext12.py"; (4) "ext123.py"; (5) "extall.py"; and (6) "all_extensions_in_class.py".

This guide goes through each *.py* file and describes how to use it.

# (1) basic_4_functions.py

This is the most basic file that contains 4 necessary functions: **fun_1, fun_2, fun_3 and fun_4**.

- `fun_1` takes a message and turns it into a DNA str using codons of 4 bases.
- `fun_2` takes a DNA str and decodes it into the original message.
- `fun_3` writes the encoded DNA str to a text file called **"basic_version_encoded.txt"**.
- `fun_4` takes the name of the text file and extracts the stored DNA str from it.
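The sample output of `fun_1('hello world')` further down has 44 bases for 11 characters, and it is consistent with each byte being written as four base-4 digits using A=0, T=1, C=2, G=3 (for example, 'h' = 104 = 1\*64 + 2\*16 + 2\*4 + 0 gives `TCCA`). The sketch below reproduces that output. It is a reconstruction from the printed result, not the code in basic_4_functions.py, and the names `encode_text`, `decode_dna`, `store_dna`, `load_dna` and the file `sketch_encoded.txt` are hypothetical.

```python
from pathlib import Path

# Assumed base ordering (A=0, T=1, C=2, G=3), inferred from the sample output of fun_1.
BASES = "ATCG"
BASE_VALUE = {base: value for value, base in enumerate(BASES)}


def encode_text(message: str) -> str:
    """Turn each byte into one 4-base codon (4**4 = 256 codons, one per byte value)."""
    codons = []
    for byte in message.encode("ascii"):
        # Most significant base-4 digit first.
        digits = [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]
        codons.append("".join(BASES[d] for d in digits))
    return "".join(codons)


def decode_dna(dna: str) -> str:
    """Invert encode_text: read 4 bases at a time and rebuild each byte."""
    data = bytearray()
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = value * 4 + BASE_VALUE[base]
        data.append(value)
    return data.decode("ascii")


def store_dna(dna: str, filename: str = "sketch_encoded.txt") -> str:
    """Write the DNA string to a text file and return the filename (like fun_3)."""
    Path(filename).write_text(dna)
    return filename


def load_dna(filename: str) -> str:
    """Read the DNA string back from the text file (like fun_4)."""
    return Path(filename).read_text().strip()


dna = encode_text("hello world")
print(dna)  # TCCATCTTTCGATCGATCGGACAATGTGTCGGTGACTCGATCTA
print(decode_dna(load_dna(store_dna(dna))))  # hello world
```

With 4-base codons every possible byte value has its own codon, so decoding is a plain base-4 to integer conversion.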

```python
from basic_4_functions import *

# test fun_1:
# fun_1 also prints the encoded DNA str
test_fun_1 = fun_1('hello world')
```

Output:

```
TCCATCTTTCGATCGATCGGACAATGTGTCGGTGACTCGATCTA
```


```python
# test fun_2:
test_fun_2 = fun_2(test_fun_1)
```

Output:

```
The coded message is: hello world
```


```python
# test fun_3:
test_fun_3 = fun_3(test_fun_1)
```

Output:

```
The encoded DNA is stored in: basic_version_encoded.txt
```


```python
# test fun_4:
test_fun_4 = fun_4(test_fun_3)
print(test_fun_4)
```

Output:

```
file has been received
TCCATCTTTCGATCGATCGGACAATGTGTCGGTGACTCGATCTA
```


# (2) ext1.py

This file integrates the 4 functions from "basic_4_functions.py" and adds the first extension: the ability to locate and read an encoded message after it has been inserted into an arbitrary DNA sequence at an arbitrary position.

It has two main functions: **ext1_encode** and **ext1_decode**.

- `ext1_encode` takes a message, encodes it using codons of 4 bases, flanks the encoded DNA sequence with tags, inserts the tagged sequence into a random DNA library (default library length: 500), and finally stores the whole DNA string in a text file called **"ext1_encoded.txt"**.
- `ext1_decode` takes the name of the text file, extracts the stored DNA, locates the encoded region via the tags, and retrieves the encoded message.
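A minimal sketch of the tag-and-insert idea, assuming the payload is an already-encoded DNA string. The tag sequences `START_TAG`/`END_TAG` and the function names are hypothetical; the actual tags used by ext1.py are not shown in this guide. The encoder retries with a fresh random library whenever a tag occurs by chance in the wrong place, so that searching for the first start tag and the next end tag always recovers the payload.

```python
import random
from typing import Optional

# Hypothetical tag sequences; the real tags used by ext1.py may differ.
START_TAG = "GGGGCCCCGGGG"
END_TAG = "CCCCGGGGCCCC"


def insert_into_library(payload: str, library_length: int = 500,
                        rng: Optional[random.Random] = None,
                        max_tries: int = 1000) -> str:
    """Flank the payload with tags and hide it at a random position in a random library."""
    rng = rng or random.Random()
    tagged = START_TAG + payload + END_TAG
    for _ in range(max_tries):
        library = "".join(rng.choice("ATCG") for _ in range(library_length))
        pos = rng.randint(0, library_length)  # insertion point, both ends allowed
        dna = library[:pos] + tagged + library[pos:]
        payload_start = pos + len(START_TAG)
        # Accept only if the first START_TAG is ours and the first END_TAG after it
        # is ours too, so a tag that appears by chance cannot fool extract_payload.
        if (dna.find(START_TAG) == pos
                and dna.find(END_TAG, payload_start) == payload_start + len(payload)):
            return dna
    raise ValueError("could not place the payload without an accidental tag match")


def extract_payload(dna: str) -> str:
    """Find the first start tag, then the next end tag, and return what lies between."""
    start = dna.index(START_TAG) + len(START_TAG)
    end = dna.index(END_TAG, start)
    return dna[start:end]


payload = "TCCATCTTTCGATCGATCGG"
dna = insert_into_library(payload, rng=random.Random(0))
print(len(dna))  # 544 (500 library bases + 12 + 20 + 12)
print(extract_payload(dna) == payload)  # True
```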

```python
from ext1 import *

# test ext1_encode:
test_ext1_encode = ext1_encode('hello python')
```

Output:

```
The encoded DNA is stored in: ext1_encoded.txt
```


```python
# test ext1_decode:
test_ext1_decode = ext1_decode(test_ext1_encode)
```

Output:

```
The coded message is: hello python
```


# (3) ext12.py

This file adds another extension to "ext1.py": it appends a checksum value to the encoded message sequence as a cross-check on the integrity of the encoded sequence, and it can verify a decoded message against that checksum value.

It still has two main functions: **ext12_encode** and **ext12_decode**.

- `ext12_encode` works like `ext1_encode`, with one additional step that appends a checksum value to the encoded sequence:

  `DNA_seq_insert = 'Tag' + DNA_seq + 'checksum_val' + 'Tag'`

- `ext12_decode` works like `ext1_decode`, but also performs the checksum test and displays its result.
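The checksum algorithm used by ext12.py is not described in this guide. The sketch below uses a hypothetical one, the sum of base values (A=0, T=1, C=2, G=3) modulo 256 written as a single 4-base codon, only to show the append-then-verify flow of the `'Tag' + DNA_seq + 'checksum_val' + 'Tag'` layout; the tags are omitted and all function names are illustrative.

```python
from typing import Tuple

# Digit value of each base (A=0, T=1, C=2, G=3); an assumption for this sketch.
BASES = "ATCG"
BASE_VALUE = {base: value for value, base in enumerate(BASES)}


def checksum_codon(dna: str) -> str:
    """Hypothetical checksum: sum of base values modulo 256, written as one 4-base codon."""
    total = sum(BASE_VALUE[base] for base in dna) % 256
    return "".join(BASES[(total >> shift) & 0b11] for shift in (6, 4, 2, 0))


def add_checksum(dna: str) -> str:
    """Append the checksum codon (the 'checksum_val' part of the layout; tags omitted)."""
    return dna + checksum_codon(dna)


def split_and_check(dna_with_checksum: str) -> Tuple[str, bool]:
    """Split off the last codon and compare it with a checksum recomputed from the rest."""
    dna, stored = dna_with_checksum[:-4], dna_with_checksum[-4:]
    return dna, checksum_codon(dna) == stored


protected = add_checksum("TCCATCTTTCGATCGATCGG")
print(split_and_check(protected))  # ('TCCATCTTTCGATCGATCGG', True)
corrupted = "A" + protected[1:]  # change the first base from T to A
print(split_and_check(corrupted)[1])  # False
```

A plain sum cannot detect two swapped bases; a CRC such as `zlib.crc32` from the Python standard library would catch more corruption patterns at the cost of a longer checksum field.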

```python
from ext12 import *

# test ext12_encode:
test_ext12_encode = ext12_encode('hello python world')
```

Output:

```
The encoded DNA is stored in: ext12_encoded.txt, and a checksum value is added
```


```python
# test ext12_decode:
test_ext12_decode = ext12_decode(test_ext12_encode)
```

Output:

```
The result of the checksum test is: True
The coded message is: hello python world
```


# (4) ext123.py


This file adds another extension to "ext12.py": it can use a reference genome file as a one-time pad to encrypt the input message, and the message can only be decrypted with that same reference genome file.

It has two main functions: **ext123_encode** and **ext123_decode**.

- `ext123_encode` is similar to `ext12_encode`, with the extra ability to use a reference genome to further encrypt the message. The reference genome used here is the sequence of watermelon chromosome 1 (its FASTA sequence is available at https://www.ncbi.nlm.nih.gov/nuccore/CM018018.1?report=fasta).

- `ext123_decode` takes the filename and the reference genome and decodes the DNA sequence back into the stored message.
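How ext123.py combines the message with the reference genome is not spelled out in this guide. One way to use a genome as a pad, sketched here purely as an assumption, is to add each message base to the reference base at the same position modulo 4 (A=0, T=1, C=2, G=3) and subtract it again to decrypt. The function names are hypothetical; `clean_reference` upper-cases the genome and keeps only A, T, C and G, since FASTA data can contain lowercase letters and `N`.

```python
# Digit value of each base (A=0, T=1, C=2, G=3); an assumption for this sketch.
BASES = "ATCG"
BASE_VALUE = {base: value for value, base in enumerate(BASES)}


def clean_reference(reference: str) -> str:
    """Upper-case the genome and keep only A/T/C/G characters."""
    return "".join(base for base in reference.upper() if base in BASE_VALUE)


def apply_pad(dna: str, reference: str, decrypt: bool = False) -> str:
    """Add (encrypt) or subtract (decrypt) reference bases to message bases modulo 4."""
    pad = clean_reference(reference)
    if len(pad) < len(dna):
        raise ValueError("reference genome is shorter than the message")
    sign = -1 if decrypt else 1
    return "".join(
        BASES[(BASE_VALUE[m] + sign * BASE_VALUE[p]) % 4]
        for m, p in zip(dna, pad)
    )


reference = "ggatcNNacgtacgtacgtacgtacgtTTAA"
message_dna = "TCCATCTTTCGATCGATCGG"
cipher = apply_pad(message_dna, reference)
print(apply_pad(cipher, reference, decrypt=True) == message_dna)  # True
```

Modular addition is used here because it is exactly invertible for every pad base, so decryption needs nothing but the same reference sequence.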

```python
from ext123 import *

# test ext123_encode:
test_ext123_encode = ext123_encode('hello world again')
```

Output:

```
The encoded DNA is stored in: ext123_encoded.txt
```


```python
# test ext123_decode:
# prepare the reference genome (newlines removed so it is one continuous string)
with open('ref_genome_watermelon_chro1.txt', 'r') as f:
    reference_genome = f.read().replace('\n', '')

test_ext123_decode = ext123_decode(test_ext123_encode, reference_genome)
```

Output:

```
The result of the checksum test is: True
The coded message is: hello world again
```
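The cell above only removes newlines. If the saved genome file still contains the FASTA header line (the line starting with `>` in a FASTA download), that header text would end up inside `reference_genome`. Whether 'ref_genome_watermelon_chro1.txt' keeps the header is not stated here. A minimal sketch that skips header lines; `read_fasta_sequence` and the sample file `sketch_genome.fa` are hypothetical.

```python
from pathlib import Path


def read_fasta_sequence(path: str) -> str:
    """Return the sequence of a single-record FASTA file, skipping '>' header lines."""
    lines = Path(path).read_text().splitlines()
    return "".join(line.strip() for line in lines if line and not line.startswith(">"))


# Hypothetical sample file standing in for the real genome download.
Path("sketch_genome.fa").write_text(">sample_record example header\nACGTAC\ngtNNac\n")
print(read_fasta_sequence("sketch_genome.fa"))  # ACGTACgtNNac
```

The case and any `N` characters are kept as-is, so they can be handled separately by whatever code consumes the sequence.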


# (5) extall.py

This file combines all the extensions above, uses a randomised dictionary, and can take Unicode input.

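It has two main functions: **extall_encode** and **extall_decode**.

- `extall_encode` takes a message (Unicode input such as Chinese characters is supported) and stores the encoded DNA in **"extall_encoded.txt"**.

- `extall_decode` takes the filename and the reference genome, performs the checksum test, and decodes the DNA back into the original message.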
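The JSON file names used in the cells below (for example `Ranhex_to_codon_dict.json`) suggest that extall.py maps hex digits to codons through a randomly shuffled dictionary. A sketch of that idea, assuming the message is converted to UTF-8 bytes, written as hex, and that each hex digit maps to a 2-base codon (4^2 = 16 codons, one per hex digit). The codon length, the function names and the file `sketch_hex_to_codon.json` are assumptions, not taken from extall.py.

```python
import json
import random
from itertools import product
from typing import Dict, Optional

HEX_DIGITS = "0123456789abcdef"


def make_random_dicts(rng: Optional[random.Random] = None):
    """Shuffle the 16 two-base codons and pair them with the 16 hex digits."""
    rng = rng or random.Random()
    codons = ["".join(pair) for pair in product("ATCG", repeat=2)]
    rng.shuffle(codons)
    hex_to_codon = dict(zip(HEX_DIGITS, codons))
    codon_to_hex = {codon: h for h, codon in hex_to_codon.items()}
    return hex_to_codon, codon_to_hex


def encode_unicode(message: str, hex_to_codon: Dict[str, str]) -> str:
    """UTF-8 encode the message, then map each hex digit to its codon."""
    return "".join(hex_to_codon[h] for h in message.encode("utf-8").hex())


def decode_unicode(dna: str, codon_to_hex: Dict[str, str]) -> str:
    """Map 2-base codons back to hex digits and decode the UTF-8 bytes."""
    hex_string = "".join(codon_to_hex[dna[i:i + 2]] for i in range(0, len(dna), 2))
    return bytes.fromhex(hex_string).decode("utf-8")


hex_to_codon, codon_to_hex = make_random_dicts(random.Random(42))
# Persist the mapping so a later decoding session can reuse the same random dictionary.
with open("sketch_hex_to_codon.json", "w") as file_object:
    json.dump(hex_to_codon, file_object)
with open("sketch_hex_to_codon.json") as file_object:
    loaded = json.load(file_object)

dna = encode_unicode("你好", loaded)  # '你好' is Chinese for "hello"
print(len(dna))  # 24 (6 UTF-8 bytes -> 12 hex digits -> 24 bases)
print(decode_unicode(dna, {codon: h for h, codon in loaded.items()}))  # 你好
```

Because the dictionary is random, the encoder and decoder must share the saved JSON file; without it the codons cannot be mapped back to hex digits.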

```python
from extall import *

# test extall_encode ('你好' is Chinese for "hello"):
test_extall_encode = extall_encode('你好')
```

Output:

```
The encoded DNA is stored in: extall_encoded.txt
```


```python
import json

# load the randomised dictionaries from the JSON files
with open('Ranhex_to_codon_dict.json', 'r') as file_object:
    hex_to_codon_dict = json.load(file_object)

with open('Rancodon_to_hex_dict.json', 'r') as file_object:
    codon_to_hex_dict = json.load(file_object)

with open('allRancodon_to_number_dict.json', 'r') as file_object:
    codon_to_numbers_dict = json.load(file_object)

with open('allRannumber_to_codon_dict.json', 'r') as file_object:
    numbers_to_codon_dict = json.load(file_object)
```

```python
# test extall_decode:
test_extall_decode = extall_decode(test_extall_encode, reference_genome)
```

Output:

```
The result of the checksum test is: True
The coded message is: 你好
```


# (6) all_extensions_in_class.py
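
This file wraps all the extensions in two classes: **Encrypt** and **Decrypt**.

- `Encrypt` takes a message and the reference genome; its `encode()` method encodes the message and stores the DNA in **"class_encoded.txt"**.
- `Decrypt` takes the value returned by `encode()` and the reference genome; its `decode()` method performs the checksum test and recovers the message.
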
```python
from all_extensions_in_class import *

# test Encrypt.encode ('你好' is Chinese for "hello"):
test_class_encode = Encrypt('hello world + 你好', reference_genome)
test = test_class_encode.encode()
```

Output:

```
The encoded DNA is stored in: class_encoded.txt
```


```python
# test Decrypt.decode:
test_class_decode = Decrypt(test, reference_genome)
test_output = test_class_decode.decode()
```

Output:

```
The result of the checksum test is: True
The coded message is: hello world + 你好
```
