# Reading and Writing Text Files
### Often, to perform an analysis, we must first get the files we want to manipulate

I introduce this now because you should feel more comfortable with Python and Pythonic ways of doing tasks.
In this demo we will read in a text file containing genetic information. Gene sequences are often obtained from https://www.ncbi.nlm.nih.gov/gene, where they can be downloaded in FASTA format or as a plain text file.
For now we will work with plain text files to get the basics down.

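# open( ) is the built-in function for working with files
`open('path', 'mode')` returns a file object that Python can use to read from or write to a file.
#### read (r): opens an existing file for reading; you can iterate through it line by line
#### write (w): creates a file and writes to it, replacing the content if the file already exists
#### append (a): adds content to the end of a file, creating it if needed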
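
The three modes can be tried out safely in a throwaway directory. This is only a minimal sketch: the file name `demo.txt` and the variable names are made up for illustration and are not part of the course data.

```python
import os
import tempfile

# Work in a temporary directory so nothing in Data/ is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    # 'w' creates the file, or empties it first if it already exists
    handle = open(demo_path, 'w')
    handle.write('ATGC\n')
    handle.close()

    # 'a' keeps the existing content and adds to the end
    handle = open(demo_path, 'a')
    handle.write('GGCC\n')
    handle.close()

    # 'r' reads the file; it raises FileNotFoundError if the path is missing
    handle = open(demo_path, 'r')
    demo_content = handle.read()
    handle.close()

print(repr(demo_content))
```

Opening the same path with 'w' a second time would have discarded the first line, which is why 'a' is used for the second write.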

## Reading a File 
### Methods and Practices 

In [None]:
#file reader methods

my_gene = open('Data/VEGFA.txt','r') #opens the file at the given path (a string) for reading

#my_gene.read(100) #returns the next 100 characters of the file

#reading lines 
#my_gene.readline() #returns the next whole line of the file, up to and including the \n character

#my_gene.readline(5) #returns at most 5 characters of the next line (it does not jump to the 5th line)

my_gene_content = my_gene.readlines() #reads every remaining line and returns them as a list of strings

#Special note: once readlines() has gone through the file, the file object is at the end of the file
#for line in my_gene:
    #print(line)
#this prints nothing, so we must use my_gene_content instead
#alternatively, call my_gene.seek(0) to go back to the start of the file


Special note: once you iterate through the file using readlines(), the file object is pointing to the end of the file
```python
for line in my_gene:
    print(line) #never runs: there are no lines left to read
```
This won't print anything! The reader is already at the end of the file
### use seek(0) to move the cursor back to the beginning of the file
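
The effect of `seek(0)` can be checked without the data file. The sketch below uses `io.StringIO`, an in-memory object that behaves like an open text file; the three-line content is made up for illustration.

```python
import io

# io.StringIO acts like an open text file but lives in memory,
# so this runs without Data/VEGFA.txt. The content is invented.
fake_gene = io.StringIO('>header\nATGC\nGGCC\n')

first_pass = fake_gene.readlines()   # consumes the whole file
second_pass = fake_gene.readlines()  # nothing left to read
fake_gene.seek(0)                    # rewind to the start
third_pass = fake_gene.readlines()   # everything is available again

fake_gene.seek(0)
partial = fake_gene.readline(3)      # at most 3 characters of the first line
rest = fake_gene.readline()          # the remainder of that same line

print(first_pass)
print(second_pass)
print(third_pass)
print(repr(partial), repr(rest))
```

The second call returns an empty list, which is exactly why a loop over the file prints nothing after `readlines()`.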


In [9]:
#common ways to iterate through a file using loops
my_gene.seek(0)
for line in my_gene:
    print(line)


>NC_000006.12:43770209-43786487 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly

TCGCGGAGGCTTGGGGCAGCCGGGTAGCTCGGAGGTCGTGGCGCTGGGGGCTAGCACCAGCGCTCTGTCG

GGAGGCGCAGCGGTTAGGTGGACCGGTCAGCGGACTCACCGGCCAGGGCGCTCGGTGCTGGAATTTGATA

TTCATTGATCCGGGTTTTATCCCTCTTCTTTTTTCTTAAACATTTTTTTTTAAAACTGTATTGTTTCTCG

TTTTAATTTATTTTTGCTTGCCATTCCCCACTTGAATCGGGCCGACGGCTTGGGGAGATTGCTCTACTTC

CCCAAATCACTGTGGATTTTGGAAACCAGCAGAAAGAGGAAAGAGGTAGCAAGAGCTCCAGAGAGAAGTC

GAGGAAGAGAGAGACGGGGTCAGAGAGAGCGCGCGGGCGTGCGAGCAGCGAAAGCGACAGGGGCAAAGTG

AGTGACCTGCTTTTGGGGGTGACCGCCGGAGCGCGGCGTGAGCCCTCCCCCTTGGGATCCCGCAGCTGAC

CAGTCGCGCTGACGGACAGACAGACAGACACCGCCCCCAGCCCCAGCTACCACCTCCTCCCCGGCCGGCG

GCGGACAGTGGACGCGGCGGCGAGCCGCGGGCAGGGGCCGGAGCCCGCGCCCGGAGGCGGGGTGGAGGGG

GTCGGGGCTCGCGGCGTCGCACTGAAACTTTTCGTCCAACTTCTGGGCTGTTCTCGCTTCGGAGGAGCCG

TGGTCCGCGCGGGGGAAGCCGAGCCGAGCGGAGCCGCGAGAAGTGCTAGCTCGGGCCGGGAGGAGCCGCA

GCCGGAGGAGGGGGAGGAGGAAGAAGAGAAGGAAGAGGAGAGGGGGCCGCAGTGGCGACTCGGCGCTCGG

AAGCCGGGCTCATGGACGGGTGAGGCGGCGGTGTGCGCAGACAGTGCT

In [None]:
#reading character by character
my_gene.seek(100) #moves the cursor to character offset 100 instead of the start of the file
for line in my_gene:
    for nucleotide in line:
        print(nucleotide)
    break #one line of characters 
my_gene.close() #good practice to close your files 

# Creating and Writing to a File
We can also write to files using only Python's built-in functions
## f = open('newfile.txt','w')
### This creates a file called newfile.txt in the current directory if it doesn't already exist; if it does exist, its content is erased


## Write Methods and Practices 

In [15]:
#using open to create a new file in the Data directory
#mode = 'w'
VEGFA_RNA = open('Data/VEGFA_RNA.txt','w')

#opening VEGFA as a text file in read mode 
VEGFA_DNA = open('Data/VEGFA.txt','r')
VEGFA_DNA.readline() #reads past the header line so the loop starts at the sequence

#this dictionary is used to convert DNA into RNA 
DNA_to_RNA = {'G':'C','C':'G','T':'A','A':'U','\n':''} #remember this dictionary?

for line in VEGFA_DNA: #loops through lines
    for nucleotide in line: #loops through chars 
        #here we start writing to VEGFA_RNA using write()
        #alternatively we could use writelines() to write multiple lines at once
        #however, since we are converting the DNA char by char, it makes more sense to use write()
        VEGFA_RNA.write(DNA_to_RNA[nucleotide]) 
    VEGFA_RNA.write('\n') #a new line at each line of DNA 

VEGFA_RNA.close()
VEGFA_DNA.close()
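
For comparison, here is a minimal sketch of the `writelines()` alternative mentioned in the comments. It converts every line first and then writes them all in one call. The two DNA lines, the file name `demo_rna.txt` and the lowercase variable names are made up for illustration.

```python
import os
import tempfile

# Same DNA-to-RNA lookup as in the cell above, without the newline entry.
dna_to_rna = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}

# Two invented lines of DNA standing in for the contents of VEGFA.txt.
dna_lines = ['TCGC\n', 'GGAG\n']

# Convert every line first, keeping one '\n' at the end of each.
rna_lines = []
for line in dna_lines:
    converted = ''.join(dna_to_rna[nucleotide] for nucleotide in line.strip())
    rna_lines.append(converted + '\n')

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'demo_rna.txt')  # hypothetical file name
    out_file = open(out_path, 'w')
    out_file.writelines(rna_lines)  # writelines() does not add newlines itself
    out_file.close()

    check_file = open(out_path, 'r')
    written = check_file.read()
    check_file.close()

print(repr(written))
```

Because `writelines()` writes the strings exactly as given, each one needs its own `'\n'` if the output should stay line by line.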


# Using 'with' for cleaner syntax and fewer errors
## with is a Python keyword that opens a file for the duration of an indented block and closes it when the block ends

## let's modify the above code to use 'with' 

In [18]:
rna = ''
with open('Data/VEGFA.txt','r') as VEGFA:
    VEGFA.readline()
    for line in VEGFA:
        for nucleotide in line:
            rna = rna+ DNA_to_RNA[nucleotide]
# with will automatically close files for you 
print(rna[0:100])

AGCGCCUCCGAACCCCGUCGGCCCAUCGAGCCUCCAGCACCGCGACCCCCGAUCGUGGUCGCGAGACAGCCCUCCGCGUCGCCAAUCCACCUGGCCAGUC
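
A short sketch can confirm that `with` really closes the file, even when the block ends in an error, and that one `with` statement can manage a read file and a write file together. The file names and contents are invented for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    dna_path = os.path.join(tmp_dir, 'demo_dna.txt')   # hypothetical file names
    copy_path = os.path.join(tmp_dir, 'demo_copy.txt')

    with open(dna_path, 'w') as dna_out:
        dna_out.write('>header\nATGC\n')
    closed_after_block = dna_out.closed  # 'with' has already closed the file

    # One 'with' statement can manage several files at once.
    with open(dna_path, 'r') as source, open(copy_path, 'w') as target:
        source.readline()                # skip the header line
        for line in source:
            target.write(line)

    # The file is closed even if the block ends with an error.
    try:
        with open(copy_path, 'r') as copy_in:
            copied = copy_in.read()
            raise RuntimeError('something went wrong mid-block')
    except RuntimeError:
        closed_after_error = copy_in.closed

print(closed_after_block, repr(copied), closed_after_error)
```

With plain `open()` and `close()`, the error in the last block would have skipped the `close()` call and left the file open.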


## File Writing Exercise


In [None]:
%load exercises/exercise_textfiles_1.py



In [None]:
%load answers/answer_textfiles_1.py


# try, except, finally
## these blocks of code are used to catch exceptions (errors)
Handling errors is important for writing safe and usable code. When opening files, it is common practice to have a backup plan in case the file can't be read: a 'FileNotFoundError' is raised, for example, when the file name is misspelled. 
```python
try: 
    with open('fish_dna.txt') as fishfile:
        fish = fishfile.readlines()
except FileNotFoundError as e:
    print(e)
```
Python has a whole list of built-in exceptions, which can be looked up at https://docs.python.org/3/library/exceptions.html.
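
The `else` and `finally` parts of the statement are easiest to see with a plain `open()` call. The helper `read_lines` below is a hypothetical function written for this sketch, and the path is deliberately one that should not exist.

```python
def read_lines(path):
    """Return the lines of a text file, or an empty list if it is missing."""
    lines = []
    try:
        handle = open(path, 'r')
    except FileNotFoundError as e:
        # the backup plan: report the problem and fall back to an empty list
        print('could not open file:', e)
    else:
        # runs only when open() succeeded
        try:
            lines = handle.readlines()
        finally:
            # runs whether or not readlines() raised an error
            handle.close()
    return lines

missing = read_lines('no_such_directory/fish_dna.txt')
print(missing)
```

The `finally` block is what guarantees the file gets closed, which is the same job `with` does automatically.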

# Assertions and AssertionErrors
An assertion checks that a condition is true. If the condition is false, Python raises an AssertionError, which can be 
handled in an except statement. 
# One Major Problem with Assertion Statements:
# Assertions can be turned off! (for example by running Python with the -O flag)
## Therefore it is crucial that data validation is done by raising errors explicitly and handling them in a try block
### example A demonstrates what not to do with assertions 
### example B illustrates how to raise an error

In [1]:
'''
EXAMPLE A: DONT DO THIS
'''

def isDNA_bad(DNA):
    try:
        assert type(DNA)==str , 'dna must be string' #this check disappears when assertions are turned off!
        DNA = DNA.upper() #upper() returns a new string, so it must be assigned
        for char in DNA:
            if(char=='G' or char=='C' or char=='T' or char=='A'):
                pass #do nothing 
            else:
                return False
    except AssertionError as e:
        print(e)
        return False #a non-string is not DNA
     
    return True


The example above illustrates how you could use an assertion and a subsequent AssertionError to catch any condition 
desired by the user. Checks like this will be essential when working with biological data! Think of faulty images or DNA that
has erroneous content. 
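
The reason Example A is risky can be checked directly. The sketch below runs a one-line program twice in a separate Python process, once normally and once with the `-O` flag, and compares the exit codes. It assumes Python 3.7 or newer (for `capture_output`) and that the `PYTHONOPTIMIZE` environment variable is not set.

```python
import subprocess
import sys

# A one-line program whose assertion always fails.
program = "assert False, 'dna must be string'"

# Normal run: the assertion fires and Python exits with a non-zero code.
normal = subprocess.run([sys.executable, '-c', program], capture_output=True)

# Same program with -O: assert statements are skipped entirely.
optimized = subprocess.run([sys.executable, '-O', '-c', program], capture_output=True)

print('normal exit code:', normal.returncode)
print('optimized exit code:', optimized.returncode)
```

With `-O` the program finishes as if the check had never been written, so any validation that lives only in an `assert` silently disappears.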

In [16]:
'''
EXAMPLE B: raise error 'ValueError'
'''
def isDNA(DNA):
    try:
        if not isinstance(DNA,str):
            raise ValueError('must be string') # slightly better than asserting value type 
        else:
            DNA = DNA.upper() #upper() returns a new string, so it must be assigned
            for char in DNA:
                if(char=='G' or char=='C' or char=='T' or char=='A'):
                    pass #do nothing 
                else:
                    return False
    except ValueError as e:
        print(e)
        return False
    
    return True



False
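
A variation on Example B is to let the error leave the function and have the caller decide what to do with it. The sketch below is one possible way to write that, not the only one: `validate_dna` and `VALID_NUCLEOTIDES` are hypothetical names, and using `TypeError` for the wrong type is a choice made here rather than something the examples above require.

```python
VALID_NUCLEOTIDES = set('GCTA')

def validate_dna(dna):
    """Return the upper-cased sequence, or raise if it is not valid DNA."""
    if not isinstance(dna, str):
        raise TypeError('dna must be a string')
    dna = dna.upper()  # strings are immutable, so keep the returned copy
    for position, char in enumerate(dna):
        if char not in VALID_NUCLEOTIDES:
            raise ValueError(f'unexpected character {char!r} at position {position}')
    return dna

# The caller handles the errors, so one bad sequence does not stop the rest.
for candidate in ['gattaca', 'GATXACA', 42]:
    try:
        print(validate_dna(candidate))
    except (TypeError, ValueError) as e:
        print('rejected:', e)
```

Unlike an `assert`, these `raise` statements still run when Python is started with `-O`.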

# Brief discussion about taking user input
## input( ) allows the user to enter text from the keyboard

In [30]:
#input will take a string message then prompt the user to enter something
userInput = input('enter anything:')
print(userInput +' is the type'+ str(type(userInput)))
#input() always returns a string; convert it afterwards if you need another type

#two inputs at once using .split() 
#split breaks a string apart at a given separator, the default is whitespace
x,y = input('enter  2 numbers punctuated by a space:').split()
print('number one:',x)
print('number two:',y)

#using a list comprehension to convert each piece to int
a,b,c = [int(x) for x in input('enter three numbers please:').split()]
print('a:',a)
print('b:',b)
print('c:',c)

enter anything:3
3 is the type<class 'str'>
enter  2 numbers punctuated by a space:3 3
number one: 3
number two: 3
enter three numbers please:3 7 9
a: 3
b: 7
c: 9


In [22]:
'''
using a try except block

one caveat: all user input is of type string, so isinstance(num,int) will evaluate to False.
int(num) will also raise a ValueError if the text can't be converted, for example if a 'y' was typed in 
as input. Thus a try except block is necessary. 
'''

num = input('please enter a number:' )
try:
    num = int(num)
except ValueError as e:
    print(e)
finally:
    print(num)


please enter a number:7
7
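
A common next step is to keep asking until the input converts cleanly. The sketch below is one way to do that. `ask_for_int`, its `reader` parameter and `max_tries` are hypothetical names introduced here, and the scripted replies stand in for a person typing so the cell runs unattended.

```python
def ask_for_int(prompt, reader=input, max_tries=3):
    """Keep asking until the reply converts to int, up to max_tries times."""
    for _ in range(max_tries):
        reply = reader(prompt)
        try:
            return int(reply)
        except ValueError:
            print(repr(reply), 'is not a whole number, try again')
    raise ValueError('no valid number entered')

# Scripted replies replace the keyboard: first a bad answer, then a good one.
scripted_replies = iter(['y', '7'])
number = ask_for_int('please enter a number:', reader=lambda prompt: next(scripted_replies))
print(number)
```

Calling `ask_for_int('please enter a number:')` without the `reader` argument falls back to the real `input()`.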


# Summing up
## I hope this gives you an understanding of reading from and writing to files
## don't get too worried about user input, it is non-essential 
## do try to implement try except blocks for crucial functions.