This notebook helps analyse BLAST results.
It gives you a list of the targets that one assay (forward primer, reverse primer and probe) can pick up.

It does this by:
1. Concatenating the forward primer, reverse primer and probe blastn results.
2. Looking for the specific target using keywords.
3. Checking whether the specified targets can be picked up.

At the end, it writes a CSV file listing the accession numbers and strain names that the assay can pick up.


How to use:

Run blastn on the forward primer, the reverse primer and the probe, and save each set of results in its own CSV file.

Be sure to clean up each file so that it contains only the following columns
(it is fine to leave the column names in the first row; please see the example file):

1. Description
2. Max Score
3. Total Score
4. Query Cover
5. E value
6. Per. Ident
7. Accession
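
The cleanup can also be done in pandas rather than by hand. This is a minimal sketch, assuming the raw export has a header row with exactly these column names. The in-memory CSV, the extra `Scientific Name` column and the accession value are invented for illustration.

```python
import io
import pandas as pd

# Columns the notebook expects, in order.
KEEP = ['Description', 'Max Score', 'Total Score', 'Query Cover',
        'E value', 'Per. Ident', 'Accession']

# Hypothetical raw export with one extra column; replace with your own file.
raw_csv = io.StringIO(
    "Description,Scientific Name,Max Score,Total Score,Query Cover,E value,Per. Ident,Accession\n"
    "Example virus strain A,Example virus,40.1,40.1,100%,0.004,100.0,AB000001.1\n"
)

# usecols drops everything not listed; indexing with KEEP fixes the column order.
trimmed = pd.read_csv(raw_csv, usecols=KEEP)[KEEP]
print(list(trimmed.columns))
```

If your export uses different header names, adjust `KEEP` to match before reading the file.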

Then use this notebook to retrieve the strains and accession numbers picked up by the assay. Remember to use the correct paths to your files,
and name your output file as you like.
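
One way to keep the file paths in a single place is `pathlib`. This is only a sketch: the folder name `blast_results` is hypothetical, while the file names follow the ones read later in this notebook.

```python
from pathlib import Path

blast_dir = Path('blast_results')  # hypothetical folder; replace with your own

# File names follow the ones read later in this notebook.
paths = {
    'forward': blast_dir / '_For.csv',
    'reverse': blast_dir / '_Rev.csv',
    'probe': blast_dir / '_P.csv',
}

# Report any file that is missing before trying to read it.
missing = [name for name, path in paths.items() if not path.exists()]
print(missing)
```

An empty list means all three files were found and can be passed to `pd.read_csv`.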

Types of files processed: CSV files
Python version: 3.7.1
Developed on: Jupyter Lab 0.35.6 [24 Jun 2019]


In [None]:
#import packages

import pathlib
import os
import csv
import pandas as pd


In [None]:
#Get files

for_file = pd.read_csv(r'C:\Users\EASlim\BlastN\BLASTNR_290519/_For.csv')
for_file = for_file.add_suffix('_F')

rev_file = pd.read_csv(r'C:\Users\EASlim\BlastN\BLASTNR_290519/_Rev.csv')
rev_file = rev_file.add_suffix('_R')

p_file = pd.read_csv(r'C:\Users\EASlim\BlastN\BLASTNR_290519/_P.csv')
p_file = p_file.add_suffix('_P')


In [None]:
#concat files

combined = pd.concat([for_file, rev_file, p_file], axis=1 )

#combined.head()

In [None]:
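# regex=False treats '.' as a literal dot rather than a regex wildcard.
combined.columns = combined.columns.str.replace('.', '', regex=False).str.replace(':', '', regex=False).str.replace(' ', '_', regex=False).str.lower()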

In [None]:
#combined.head()

## Finding unique accession numbers or unique strain names

Base the unique accession numbers on the probe: if the probe does not bind, it does not matter whether the forward
and reverse primers bind, which they usually do.

1. Extract the unique accession numbers from the probe accession column.
2. Check whether each one also appears in the forward and reverse accession columns.
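
The two steps above amount to a set intersection anchored on the probe hits. A minimal sketch on a toy table, assuming the same column names as `combined`; the accession values are invented:

```python
import pandas as pd

# Toy stand-in for `combined`; the accession values are invented.
toy = pd.DataFrame({
    'accession_f': ['A1', 'B2', 'C3', 'A1'],
    'accession_r': ['B2', 'A1', 'D4', None],
    'accession_p': ['A1', 'E5', 'B2', None],
})

# Unique, non-missing accessions hit by each primer.
f_hits = set(toy['accession_f'].dropna())
r_hits = set(toy['accession_r'].dropna())

# Keep probe accessions (in order of first appearance) that both primers also hit.
p_unique = toy['accession_p'].dropna().unique()
shared = [acc for acc in p_unique if acc in f_hits and acc in r_hits]
print(shared)  # ['A1', 'B2']
```

Membership tests against a set are fast, which can help when the BLAST tables are large.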

In [None]:
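# Unique probe accessions in order of first appearance; dropna() skips the
# empty cells that concat adds when the three files differ in length.
list_unique_p = combined['accession_p'].dropna().unique().tolist()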

In [None]:
print(list_unique_p[:10])

In [None]:
# Unique forward-primer accessions, ignoring empty cells.
list_unique_f = combined['accession_f'].dropna().unique().tolist()

In [None]:
# Unique reverse-primer accessions, ignoring empty cells.
list_unique_r = combined['accession_r'].dropna().unique().tolist()

In [None]:
all_3 = []

for target in list_unique_p:
    if target in list_unique_f and target in list_unique_r:
        all_3.append(target)

In [None]:
print(all_3[:10])

## Retrieve corresponding names of targets using found accession numbers

Take the all_3 list, look each accession number up in the table, and retrieve the list of strains that all three oligos bind to, confirming that
the assay picks up these strains.

Example usage:

combined.loc[combined['accession_p'] == 'MH045846.1', 'description_p'].iloc[0]
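
For many accession numbers, the same lookup can be done once through a mapping instead of filtering the table on every iteration. A sketch on a toy table, assuming the `accession_p` and `description_p` column names used above; the values and the `hits` list are invented:

```python
import pandas as pd

# Toy stand-in for `combined`; accessions and descriptions are invented.
toy = pd.DataFrame({
    'accession_p': ['A1', 'B2', 'A1'],
    'description_p': ['Strain one', 'Strain two', 'Strain one, duplicate hit'],
})

# First description seen for each probe accession.
name_by_accession = (
    toy.dropna(subset=['accession_p'])
       .drop_duplicates('accession_p')
       .set_index('accession_p')['description_p']
)

hits = ['B2', 'A1']  # hypothetical list playing the role of all_3
names = [name_by_accession[acc] for acc in hits]
print(names)  # ['Strain two', 'Strain one']
```

Like `.iloc[0]` in the example above, `drop_duplicates` keeps the first description recorded for each accession.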

In [None]:
target_list = []

for x in all_3:
    item = combined.loc[combined['accession_p'] == x, 'description_p'].iloc[0]
    target_list.append(item)

In [None]:
#for item in target_list:
#   print(item)

## Create list and export as a csv file

In [None]:
d = {'accession_nums': all_3, 'target': target_list}

Target_List = pd.DataFrame(d, columns=['accession_nums','target']) 

In [None]:
Target_List.head()

In [None]:
Target_List.to_csv('.csv', index=False, encoding='utf-8')  # put your chosen file name before '.csv'