# P01: Replication Process

To start, we load the classes and functions used in this notebook. As explained in the README file, we use the `Replication` class to replicate DNA sequences. These sequences are obtained through the `LoaderFactory` class, which loads them either from a file path or directly from the NCBI API.

In [1]:
from replication.dna import DNA
from replication.replication import Replication
from replication.dna_utils import DNAUtils

from data_loader import LoaderFactory

## Local DNA Sequences

### Load DNA Sequences from a File

The first step is to load the DNA sequences from a file, when that is where the data lives. Loading follows the Factory Method pattern, which lets us create a `Loader` object for each data source, depending on the needs of the application. In this case, we use the `FileDataLoader` class to load the sequences from a file.

In [2]:
loader = LoaderFactory.initialize_loader('Files')
dna_seq = loader.collect('./data/Data1.txt')
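
The Factory Method idea described above can be sketched as follows. This is only an illustration, not the project's `data_loader` module: `SketchLoader`, `SketchFileLoader`, `SketchFactory`, and the registry dictionary are hypothetical names, and only the `initialize_loader(...)` / `collect(...)` call shape is taken from the cell above. The whitespace stripping and upper-casing are also assumptions.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchLoader(ABC):
    """Common interface that every (hypothetical) loader implements."""

    @abstractmethod
    def collect(self, source):
        """Return the sequence data found at `source`."""


class SketchFileLoader(SketchLoader):
    def collect(self, source):
        # Read the whole file, drop all whitespace, and normalize the case.
        text = Path(source).read_text()
        return "".join(text.split()).upper()


class SketchFactory:
    # Maps a source name to the loader class that handles it.
    _registry = {"Files": SketchFileLoader}

    @classmethod
    def initialize_loader(cls, kind):
        try:
            return cls._registry[kind]()
        except KeyError:
            raise ValueError(f"Unknown loader type: {kind!r}") from None
```

Under this design, supporting a new source means adding one class and one registry entry, while the calling code stays the same.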

In [3]:
def write_results(filename, dna):
    with open(filename, 'w') as file:
        file.write(dna)

Once we have obtained the sequence directly from the file, we can extract some information from it, such as the number of nucleotides that it contains.

In [4]:
dna = DNA(dna_seq)
dna.show_nucleotides()

Number of nucleotides: 935
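
Beyond the total length, a per-base breakdown is often useful. The sketch below is independent of the `DNA` class (whose internals are not shown here); `nucleotide_summary` is a hypothetical helper that works on a plain string and assumes the alphabet is limited to A, C, G, and T.

```python
from collections import Counter


def nucleotide_summary(sequence):
    """Count each base in a DNA string, rejecting unexpected symbols."""
    counts = Counter(sequence.upper())
    unexpected = set(counts) - set("ACGT")
    if unexpected:
        raise ValueError(f"Unexpected symbols: {sorted(unexpected)}")
    # Counter returns 0 for bases that do not occur in the sequence.
    return {base: counts[base] for base in "ACGT"}


summary = nucleotide_summary("ACGTTTGA")
total = sum(summary.values())
```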


## Simulation Process

We now simulate the replication of the DNA sequence using the `Replication` class, which simulates the creation of the enzymes that take part in the process. The output of its `execute` method shows, step by step, how replication occurs in the real world.

In [5]:
replication = Replication()
dna_son1, dna_son2 = replication.execute(dna)

Number of helicases: 4
----------------------------------------------------------------------------------------------------
Helicases bind to the DNA strand and unwind the double helix at the positions: [0, 273, 475, 699]
Leading strand replication process...
Helicase unwinds the DNA double helix...
The unique primer of each the leading strand is placed at the 5' end of the DNA strand in positions:  [0, 273, 475, 699, 935]
----------------------------------------------------------------------------------------------------
The ADN polymerase enzyme is added to each primer and begins to synthesize the new DNA strand...
The leading strand replication process is complete.
----------------------------------------------------------------------------------------------------
Lagging strand replication process...
The lagging strand is synthesized in fragments called Okazaki fragments.
Helicases unwind the DNA double helix at the following positions:  [0, 273, 475, 699, 935]
--------------------
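
The log above lists positions that split the strand into segments. A much simplified sketch of that idea: cut the template at the given positions, synthesize the complement of each segment, and join the pieces. This is not the `Replication` class; `complement` and `replicate_in_fragments` are hypothetical helpers, and treating the logged positions as fragment boundaries is an assumption. The sketch ignores primers, enzymes, and strand direction, and writes the new strand in the same index order as the template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Return the base-by-base complement of a DNA string."""
    return strand.translate(COMPLEMENT)


def replicate_in_fragments(strand, origins):
    """Split `strand` at `origins` and complement each fragment."""
    # Always include both ends so the fragments cover the whole strand.
    boundaries = sorted(set(origins) | {0, len(strand)})
    fragments = [strand[start:end]
                 for start, end in zip(boundaries, boundaries[1:])]
    new_strand = "".join(complement(fragment) for fragment in fragments)
    return fragments, new_strand


fragments, new_strand = replicate_in_fragments("ACGTACGTAC", [0, 4, 7])
```

Because complementing is done position by position, the joined result is the same as complementing the whole strand at once; the fragments only model where the work is divided.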

We now check that replication succeeded by comparing the original DNA strands with the replicated ones.

In [6]:
dna_utils = DNAUtils()
test1 = dna_utils.test_dna(dna.strand, dna_son2)
test2 = dna_utils.test_dna(dna.comp, dna_son1)

if test1 and test2:
    print("The replication process has been successful.")
else:
    print("There's been a mutation in the replication process.")

write_results('./data/son_1.txt', dna_son1)
write_results('./data/son_2.txt', dna_son2)

The replication process has been successful.
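
When a check like this fails, it helps to know where the strands differ. The helper below is a hypothetical sketch, not what `DNAUtils.test_dna` does internally (its implementation is not shown here): it compares two equal-length strings and returns the indices at which they disagree.

```python
def find_mismatches(expected, actual):
    """Return the 0-based positions where two strands differ."""
    if len(expected) != len(actual):
        raise ValueError(
            f"Length mismatch: {len(expected)} vs {len(actual)}"
        )
    return [index
            for index, (left, right) in enumerate(zip(expected, actual))
            if left != right]


identical = find_mismatches("ACGT", "ACGT")
one_change = find_mismatches("ACGT", "AGGT")
```

An empty list means the two strands are identical; any other result points at the positions to inspect.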


Given this result, we can move on to a more challenging task: loading DNA sequences from the NCBI API.

## Remote DNA Sequences

### Load DNA Sequences from the NCBI API

We now use the `LoaderFactory` class to load DNA sequences from the NCBI API, this time through the `ApiDataLoader` class.

In [7]:
loader = LoaderFactory.initialize_loader('APIncbi')
dna_seq = loader.collect()

Once connected to the API, we can iterate over the loaded sequences. Here we print only the first one.

In [8]:
for seq_id, seq in dna_seq:
    print(f'{seq_id}: {seq}')
    break

Found 10 results.
NR_197590.1: CTCCGTCCCTTCTATTCTCAGCGCCCGCCTGGCAGGACGACTGAGCAAGGCTTTGGAAAACCAGAGAGATTAGAGCGCAGAATGGGGAAATGGAGAGAGAACCTGAAAGAGCCCCAAACTCGAGGACCTATTGCTCCCCAAGAATAACATCTTCCAGAACTAGACAGAAACTAAGCGTCTGGAAACCCTGAAATCCTTGGAGGAGTAGCATCATCCTGACCCTCTGTGCTCCTTTTGGCAAAGGACTTGCTTCCATTGTTTGTTTGTTCAATTGTCTGTTTGTTAAATAAATAAAACTCTTTTCATATATCTTTAAAA


Each data load returns 10 DNA sequences by default, but this number can be changed, as the code below shows.

Note that by default the API query returns human DNA sequences from different chromosomes. To obtain sequences from other organisms or chromosomes, modify the query in the `ApiDataLoader` class.
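
For readers who want to see what such a query might involve, here is a hedged sketch based on the public NCBI E-utilities service, where `esearch.fcgi` returns matching record IDs and `efetch.fcgi` with `rettype=fasta` returns the sequences as FASTA text. How `ApiDataLoader` actually builds its query is not shown in this notebook, so `build_search_params`, `parse_fasta`, and the default search term are assumptions made for illustration. No network request is made.

```python
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_search_params(organism="Homo sapiens", max_results=10):
    """Query parameters for an esearch request (assumed query shape)."""
    return {
        "db": "nucleotide",
        "term": f"{organism}[Organism]",
        "retmax": max_results,
        "retmode": "json",
    }


def parse_fasta(text):
    """Turn FASTA text into a list of (record_id, sequence) pairs."""
    records = []
    record_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if record_id is not None:
                records.append((record_id, "".join(chunks)))
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        records.append((record_id, "".join(chunks)))
    return records
```

Changing the `organism` argument is one way the query could target other species under this assumed design.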

## Simulation Process

We will collect three different DNA sequences from the API and simulate the replication process of each of them.

In [9]:
dna_seq = loader.collect(3)

In [10]:
for seq_id, seq in dna_seq:
    dna = DNA(seq)
    dna.show_nucleotides()

    print(f'Replicating {seq_id}...')

    replication = Replication()

    rate = 0.005  # defined here but not passed to execute() in this cell
    scale = 20

    dna_son1, dna_son2 = replication.execute(dna, scale=scale)
    dna_utils = DNAUtils()
    test1 = dna_utils.test_dna(dna.strand, dna_son2)
    test2 = dna_utils.test_dna(dna.comp, dna_son1)

    # Skip sequences whose replicas do not pass both checks.
    if not (test1 and test2):
        continue

    write_results(f'./data/{seq_id}_son1.dna', dna_son1)
    write_results(f'./data/{seq_id}_son2.dna', dna_son2)

    print(f'{seq_id} has been replicated successfully.')
    print('\n\n')

Found 3 results.
Number of nucleotides: 318
Replicating NR_197590.1...
Number of helicases: 1
----------------------------------------------------------------------------------------------------
Helicases bind to the DNA strand and unwind the double helix at the positions: [0]
Leading strand replication process...
Helicase unwinds the DNA double helix...
The unique primer of each the leading strand is placed at the 5' end of the DNA strand in positions:  [0, 318]
----------------------------------------------------------------------------------------------------
The ADN polymerase enzyme is added to each primer and begins to synthesize the new DNA strand...
The leading strand replication process is complete.
----------------------------------------------------------------------------------------------------
Lagging strand replication process...
The lagging strand is synthesized in fragments called Okazaki fragments.
Helicases unwind the DNA double helix at the following positions:  [0,

## Accessing our API

We have developed a simple API that gives access to the DNA replicas generated above. It also accepts new DNA sequences and replicates them. Both use cases are shown below.

### Get Replica Method

In [11]:
api_path = 'http://127.0.0.1:5000'

In [12]:
import requests

In [13]:
get_dna_replica_path = '/replica?id=NR_105010.2'
response = requests.get(api_path + get_dna_replica_path)

In [14]:
response.json()

{'son1': 'TAAACTCTAGGCTCATACTAACCCCTTTATAGGACTTTATCAGATACTAAACTTCTTTACCTATCTACCGATCTTATTGTACGATCTACCATCAGTTCAATGATTGACGTCTTAACTAAACTATTTTAAGAGTAGCACAGAGGACTGAATCCTCTTTATAACCCAGACTACTTACCCATTTTAAGGGTCCACTTTTCTTAAACCAGACATTCTAAACCGTCGTGGGACTCCAAACGGTATAGACTACCGGACCGAAATCACCGTCGTGGATCACTTCCGGGTCCACAGGAAGACTGTTGGAAATGGTGGATGGCAAAACAGGAATGAATACAACTTAGTGATGTGTGTCCGACAACTTAAGTCGACAGACACCTCTTTAATATGGTTGGTGTACCAATCTGTCTTCCTTATGTTATATAACGAATTAAACATAGACTTTCGATTCGTTGTACCTTTTTTTAGTTCGTACGAGTCAGTGAAGGTACTCCTGTTTGCTTCTGTGAAGTTCCTCCTGTAACTGTACCTAGACGGTCGAAAGATTCTGAAAGGGGTCTAAGTTCAGGTCAGTGTAGTTCAACGGAATAATTTCCCAGACGGGTGAGTTTTCCTCTTTCACGTTTTCTACGAGTCGTTCCCATCGTTTGCGTCAGTCTACTATGAAGGACTGTCGTAGGTTCTCGTTCAGAGTGGTGGTGTAGACAGTTATTTTTTCTTTTGTTCCTCGACGGTTAAACAGATAAATATCCACGGTTATTTTAGGACGTGTCTT',
 'son2': 'ATTTGAGATCCGAGTATGATTGGGGAAATATCCTGAAATAGTCTATGATTTGAAGAAATGGATAGATGGCTAGAATAACATGCTAGATGGTAGTCAAGTTACTAACTGCAGAATTGATTTGATAAAATTCTCATCGTGTCTCCTGACTTAGGAGAAATATTGGGTCTGATGAATGGGTAAAATTCCCAGGTGAAAAGAATTTGGTCTG
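
A small client wrapper can make the lookup above less error-prone by encoding the ID and setting a timeout. The sketch below uses only the standard library (instead of `requests`) so it runs without extra dependencies; `build_replica_url` and `fetch_replica` are hypothetical helpers, and the `/replica?id=...` shape is copied from the cell above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_replica_url(base_url, seq_id):
    """Build the GET URL for a stored replica, URL-encoding the ID."""
    query = urlencode({"id": seq_id})
    return f"{base_url.rstrip('/')}/replica?{query}"


def fetch_replica(base_url, seq_id, timeout=10):
    """Fetch and decode the JSON body; urlopen raises on HTTP errors."""
    with urlopen(build_replica_url(base_url, seq_id), timeout=timeout) as resp:
        return json.load(resp)


url = build_replica_url("http://127.0.0.1:5000", "NR_105010.2")
```

Calling `fetch_replica("http://127.0.0.1:5000", "NR_105010.2")` requires the local server to be running, so it is not executed here.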

### Replicate Method

In [15]:
loader = LoaderFactory.initialize_loader('Files')
dna_seq = loader.collect('./data/Data1.txt')

In [16]:
post_dna_path = '/replicate'
data = {'dna': dna_seq}

response = requests.post(api_path + post_dna_path, json=data)
response.json()

{'son1': 'TGCGTTGTGCGTTGATCGTTCTTGCATAAGGGCTGCGTTGAAGGGCTGCGTTGTGCGTTGCTTGCATATCGTTAAGGGCTGCGTTGATCGTTCTTGCATATCGTTAAGGGCTAACAGCTTGCATCTTGCATTGCGTTGTGCGTTGTGCGTTGCTTGCATATCGTTAAGGGCCTTGCATTAACAGTAACAGAAGGGCCTTGCATTAACAGAAGGGCTAACAGTAACAGCTTGCATAAGGGCCTTGCATTGCGTTGAAGGGCTGCGTTGCTTGCATTGCGTTGAAGGGCCTTGCATTAACAGATCGTTATCGTTTGCGTTGATCGTTTGCGTTGAAGGGCTGCGTTGCTTGCATCTTGCATATCGTTATCGTTCTTGCATTAACAGTAACAGTGCGTTGTGCGTTGATCGTTCTTGCATATCGTTCTTGCATCTTGCATATCGTTCTTGCATTGCGTTGCTTGCATATCGTTTGCGTTGAAGGGCTAACAGTGCGTTGTAACAGTAACAGATCGTTAAGGGCTAACAGCTTGCATATCGTTATCGTTTGCGTTGTAACAGTAACAGATCGTTTGCGTTGATCGTTTGCGTTGCTTGCATTAACAGATCGTTAAGGGCTAACAGAAGGGCAAGGGCCTTGCATTAACAGCTTGCATATCGTTATCGTTCTTGCATTAACAGTAACAGATCGTTATCGTTCTTGCATATCGTTCTTGCATAAGGGCTGCGTTGCTTGCATTGCGTTGAAGGGCAAGGGCTGCGTTGTAACAGCTTGCATTAACAGTAACAGCTTGCATATCGTTATCGTTATCGTTTAACAGCTTGCATCTTGCATCTTGCATCTTGCATAAGGGCAAGGGCTGCGTTGTGCGTTGATCGTTTGCGTTGAAGGGCAAGGGCATCGTTCTTGCATTGCGTTGATCGTTATCGTTTAACAG',
 'son2': 'ACGCAACACGCAACTAGCAAGAACGTATTCCCGACGCAACTT