### Guide to using Jupyter Notebook Christina's DNA Analysis codes.

<font color='green'>**Using functions you need**</font> \
You will need to read cells 1-5 regardless of which functions you want to analyse your DNA with. After that select whichever cell you need. Cells 6 and 7 must be used chronologically. Cells 8 and 9 must be used chronologically. 

<font color='green'>**Keeping Outputs concise**</font> \
Some readable code has been muted (#) to make analysis more streamlined. Delete only the four letters in #CODE, but don't change any of the spacing. Spacing is essential for readability of python scripts.

<font color='green'>**Data Quality checkpoints**</font> \
A number of cells are numbered as .1 and are data quality checkpoints. These are to make sure your imports and code is being read correctly. You don't need to run them if your code is working, but it's always good to double check.

<font color='green'>**Troubleshooting**</font> \
There is a separate Jupyter Notebook called Troubleshooting if you are having issues importing your files.

In [1]:
#1. Import relevant packages. Ensure you have them installed on your Anaconda environment or they will fail to import.

import numpy as np
from Bio import SeqIO
import pandas as pd
import os
import sys

from IPython.display import HTML, display
def set_background(color):    
    script = (
        "var cell = this.closest('.jp-CodeCell');"
        "var editor = cell.querySelector('.jp-Editor');"
        "editor.style.background='{}';"
        "this.parentNode.removeChild(this)"
    ).format(color)
    display(HTML('<img src onerror="{}" style="display:none">'.format(script)))

set_background('#000000')

In [2]:
#2. Import Fasta from Relative Path
    #Tip: Fasta file needs to be in the same folder as your Jupyter Notebook

file_name = 'C1.308.fa'
set_background('#000000')

In [3]:
#3. Open your Sequence file and look at the contents

with open(file_name, 'r') as file:
    sequences = SeqIO.parse(file, 'fasta')
    for seq_record in sequences:
        print(f"ID: {seq_record.id}")
        print(f"Description: {seq_record.description}")
        print(f"Sequence: {seq_record.seq}") #add f before "Sequence to see your whole file

set_background('#000000')

In [7]:
#4. Create a DataFrame from the parsed records

df = pd.DataFrame(sequences, columns=['FastaID', 'Description', 'Sequence'])

#Display the DataFrame to verify sorting of the columns.
#CODEprint(df.head())

set_background('#000000')

In [11]:
#5. Data cleansing

#Convert column Sequence to a string
df['Sequence'] = df['Sequence'].astype(str)
#CODEprint(df['Sequence'].dtypes)

# Add Spaces between the nucleotides in your dataframe.
#df['Sequence'] = df['Sequence'].apply(lambda x: ' '.join(list(x)))

#Turn all letters from upper to lowercase
df['Sequence'] = df['Sequence'].str.lower()

#Check your output
#CODEprint(df[f'Sequence'])


set_background('#000000')

object


In [9]:
#9. Convert DNA nucleotides to RNA

# Assuming your DataFrame is named df and the column containing DNA sequences is 'Sequence'
df['RNASequence'] = df['Sequence'].str.replace('t', 'u', case=False)

#Make column U_count and count number of U's in the sequence.
df['U_count'] = df['RNASequence'].str.count('u')
df['U_count'] = df['U_count'].astype(int)
print(df['U_count'])

Series([], Name: U_count, dtype: int32)


In [10]:
#10. Export nucleotide content and GC Percentage to a text file. 
#Make sure you've run cells 6 to 7.

#Store the current system output
original_stdout = sys.stdout

#Define the file to which you want to redirect the print statements
file_name2 = 'RNASequence.txt'

#Open the file in write mode
with open(file_name2, 'w') as f:
    #Redirect sys.stdout to the file
    sys.stdout = f
    #Your code with print statements
    print(df[['U_count', 'RNASequence']])
    #Restore the original stdout
    sys.stdout = original_stdout

#Confirm that the print statements were redirected by reading the file.
with open(file_name2, 'r') as f:
    content = f.read()
    print(content)

Empty DataFrame
Columns: [U_count, RNASequence]
Index: []

