# File handling – exercise

- Let's write a (new) file
- Use "file.txt" as the file name
- Opening a file in mode "w" creates it, or overwrites (empties) it if a file with that name already exists
- The file is created in the current working directory, unless you put the full path in front of the file name

```python
# Open a file for writing
file = open("file.txt", "w")
print("Name of the file: ", file.name)
```

```
Name of the file:  file.txt
```
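You can check where a bare file name such as "file.txt" ends up by comparing it with the current working directory. This is a minimal sketch using the standard `os` module. The printed paths differ from machine to machine.

```python
import os

# A bare file name is resolved relative to the current working directory
cwd = os.getcwd()
relative_path = os.path.abspath("file.txt")
print("Current working directory:", cwd)
print("file.txt resolves to:     ", relative_path)

# Spelling out the full path to the working directory gives the same location
full_path = os.path.join(cwd, "file.txt")
print(full_path == relative_path)
```

To put the file somewhere else, pass a full path such as `os.path.join(some_directory, "file.txt")` to `open()`. Here `some_directory` is a placeholder for a directory of your choice.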


# File handling – good practice
- Always close a file when you are done with it
- The close() method of a file object flushes any data that has not been written yet and then closes the file. After that, no more reading or writing is possible.

```python
# Open a file
file = open("file.txt", "w")
# do stuff
file.close()
```
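Python can also call close() for you. The file is closed when a `with` block ends, even if an error occurs inside the block. This sketch writes to a hypothetical file in a fresh temporary directory so that nothing you already have gets overwritten.

```python
import os
import tempfile

# Hypothetical location: a new temporary directory
path = os.path.join(tempfile.mkdtemp(), "file.txt")

# The file is closed automatically when the indented block ends
with open(path, "w") as file:
    file.write("do stuff\n")

print(file.closed)  # True: close() was called for us
```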

# File handling – good practice
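- When you open a file for reading, it's also good to handle the case where the file does not exist. The example below tries to open the file and exits with a message if that fails.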

```python
import sys  # System-specific parameters and functions (here: sys.exit)

try:
    file = open("file.txt", "r")
    print("Name of the file: ", file.name)
    file.close()
except FileNotFoundError:  # catch only the "file is missing" error, not every exception
    sys.exit("File does not exist!")
```

```
Name of the file:  file.txt
```
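If you prefer to check before opening, `os.path.isfile()` reports whether a regular file exists at a path. This is a sketch, and the helper name `read_if_exists` is hypothetical:

```python
import os

def read_if_exists(filename):
    """Hypothetical helper: return the file's text, or None when it is missing."""
    # isfile() is True only for an existing regular file (not a directory)
    if not os.path.isfile(filename):
        return None
    with open(filename, "r") as file:
        return file.read()

contents = read_if_exists("does_not_exist.txt")
print(contents)  # None, assuming no file with that name exists
```

The try/except version above is generally more robust. The file could be deleted between the isfile() check and the open() call, and try/except still handles that case.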


# File handling – writing

- To write to a file, call the write() method of the file object:
    - file.write("What you want to write")
- write() does not add a newline for you, so append "\n" yourself to end a line

- Try writing some variables to a file yourself

```python
file = open("file.txt", "w")

file.write("Hello script!\n")  # write a string directly

line = "This is my output!"
file.write(line + "\n")  # write a string variable

line2 = "1 2 3 4 5"
file.write(line2 + "\n")

line3 = "blah blah blah"
file.write(line3 + "\n")

file.close()
```
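write() only accepts strings. Passing a number, as in `file.write(5)`, raises a TypeError, so other variable types must be converted first. This is a minimal sketch using `str()`, `join()` and an f-string. The file name and the `gc_content` value are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "variables.txt")  # hypothetical file name

numbers = [1, 2, 3, 4, 5]
gc_content = 0.42  # hypothetical example value

file = open(path, "w")
# Convert each number to a string and join them with spaces
file.write(" ".join(str(n) for n in numbers) + "\n")
file.write("GC content: " + str(gc_content) + "\n")
# An f-string converts and formats in one step
file.write(f"{len(numbers)} numbers written\n")
file.close()

with open(path) as check:
    written = check.read()
print(written)
```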

# File handling – reading
- Try to read back the lines you wrote to the file in the previous exercise

```python
# The most common way to read a file is line by line with a for loop.
# Iterating over the file object returns one line at a time, just like readline().

file = open("file.txt", "r")
for line in file:
    print(line, end="")  # each line already ends with "\n"
file.close()
```

```
Hello script!
This is my output!
1 2 3 4 5
blah blah blah
```
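Besides the for loop, file objects offer three other ways to read: readline(), readlines() and read(). This sketch recreates the four lines from the writing exercise in a temporary file and compares the three methods.

```python
import os
import tempfile

# Recreate the lines from the writing exercise in a temporary file
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as file:
    file.write("Hello script!\nThis is my output!\n1 2 3 4 5\nblah blah blah\n")

with open(path) as file:
    first = file.readline()   # one line, including its "\n"
    rest = file.readlines()   # all remaining lines as a list of strings

with open(path) as file:
    everything = file.read()  # the whole file as one string

print(repr(first))
print(rest)
print(len(everything))
```

For large files, the for loop or readline() is usually preferable. read() and readlines() load the whole file into memory at once.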


# Newline characters
- Each line you read from a file still ends with a newline character, which usually gets in the way. You can remove it with:

```python
line.rstrip()  # removes the trailing newline (and any other trailing whitespace)
```
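rstrip() without an argument removes all trailing whitespace, including spaces and tabs. That matters in tab-separated files, where a trailing tab can mark an empty last column. A short comparison, using a hypothetical line:

```python
line = "1 2 3 4 5  \n"  # a line with trailing spaces before the newline

print(repr(line.rstrip()))       # '1 2 3 4 5'   : all trailing whitespace removed
print(repr(line.rstrip("\n")))   # '1 2 3 4 5  ' : only the newline removed
print(repr("\tchr1\n".strip()))  # 'chr1'        : whitespace removed at both ends
```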

# File handling – splitting the lines

- To split a line on a fixed delimiter, use the string method line.split(delimiter), e.g. line.split("\t") for tab-separated columns

- To split on a more complex pattern, use re.split() from the re module instead, the regex split function we discussed this morning

```python
import re

file = open("file.txt", "r")
for line in file:
    line = line.rstrip("\n")              # drop the newline so print() does not add a blank line
    subline = re.sub("blah", "ja", line)  # replace every "blah" with "ja"
    print(subline)
    splitline = re.split(r"(\d)", line)   # any regex can act as the delimiter
    # print(splitline)
file.close()
```

```
Hello script!
This is my output!
1 2 3 4 5
ja ja ja
```
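This sketch contrasts str.split() on a fixed delimiter with re.split() on a pattern. The input strings are hypothetical.

```python
import re

line = "chr1\t1000\t2000\n"  # hypothetical tab-separated line

# str.split() with a fixed delimiter
fields = line.rstrip("\n").split("\t")
print(fields)  # ['chr1', '1000', '2000']

# re.split() with a pattern: any run of commas and/or whitespace
messy = "1, 2\t3  4,5"
parts = re.split(r"[,\s]+", messy)
print(parts)  # ['1', '2', '3', '4', '5']

# A capture group in the pattern keeps the delimiters in the result
print(re.split(r"(\d)", "a1b2c"))  # ['a', '1', 'b', '2', 'c']
```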



# File handling – exercise

- The file "exercise.bed" contains genomic regions and is rather large: the perfect opportunity to use Python to process it

- Read the BED file and write an output file where the "chr" prefix in front of the chromosome number is removed (e.g. chr1 becomes 1)

- Determine the number of regions on each chromosome (each line is one region; e.g. chromosome 1 has 71906 regions)

- Parse the file and print the total number of regions and the size covered for each chromosome

- Challenge: combine the first two tasks (removing "chr" and counting regions) in a single loop

```python
import re

counts = {}

in_file = open("exercise.bed", "r")
# Write to a NEW file: opening "exercise.bed" itself with "w" would erase the input.
# The output name "exercise_nochr.bed" is just an example.
out_file = open("exercise_nochr.bed", "w")

for line in in_file:
    line = line.rstrip("\n")
    sub = re.sub("^chr", "", line)  # remove "chr" only at the start of the line
    out_file.write(sub + "\n")
    spl = sub.split("\t")           # BED columns are tab-separated
    if spl[0] in counts:
        counts[spl[0]] += 1
    else:
        counts[spl[0]] = 1

in_file.close()
out_file.close()

print(counts)
# Type your code here
```

```
{'20': 21129, '21': 6894, '22': 12363, '1': 71906, '3': 55881, '2': 65512, '5': 42642, '4': 44452, '7': 49400, '6': 47934, '9': 29893, '8': 34334, 'Y': 56, 'X': 29180, '11': 40677, '10': 39957, '13': 23109, '12': 44469, '15': 23975, '14': 27399, '17': 32679, '16': 23597, '19': 19482, '18': 18883, 'MT': 4}
```
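For the "size covered" task, this sketch assumes the standard BED layout, where the first three columns are chromosome, start and end. BED coordinates are 0-based and half-open, so a region's length is end - start. The input lines are hypothetical stand-ins for "exercise.bed". Overlapping regions are simply added up here, not merged.

```python
from collections import defaultdict

# Hypothetical BED lines; the exercise reads these from "exercise.bed"
bed_lines = [
    "chr1\t100\t200\tregionA\n",
    "chr1\t150\t400\tregionB\n",
    "chrX\t0\t50\tregionC\n",
]

region_counts = defaultdict(int)
bases_covered = defaultdict(int)

for line in bed_lines:
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0][3:] if fields[0].startswith("chr") else fields[0]
    start, end = int(fields[1]), int(fields[2])
    region_counts[chrom] += 1
    # 0-based, half-open coordinates: length is end - start.
    # Overlaps are counted twice because regions are not merged.
    bases_covered[chrom] += end - start

for chrom in sorted(region_counts):
    print(chrom, region_counts[chrom], "regions,", bases_covered[chrom], "bp")
```

`defaultdict(int)` starts every new key at 0. That replaces the if/else counting pattern used in the exercise code.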


# As frequently requested, two examples:
- Restrict the user's input to nucleotides only:

```python
import re

input_str = ""
# Keep asking until the input consists only of A, C, G and T (case-insensitive)
while not re.match("^[acgt]+$", input_str, re.I):
    input_str = input("Please provide some nucleotides:")
print(input_str)
```

```
Please provide some nucleotides:CGAGAG12
Please provide some nucleotides:CFGAFAF
Please provide some nucleotides:CGAGAT
CGAGAT
```
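Moving the check into a function makes it testable without typing input. This sketch uses a hypothetical helper `is_nucleotides`. It relies on `re.fullmatch`, which requires the whole string to match. That is slightly stricter than `^...$` with `re.match`, because `$` also matches just before a trailing newline.

```python
import re

def is_nucleotides(text):
    """Hypothetical helper: True if text is non-empty and only A, C, G, T (any case)."""
    return re.fullmatch(r"[acgt]+", text, re.IGNORECASE) is not None

for attempt in ["CGAGAG12", "CFGAFAF", "CGAGAT", ""]:
    print(repr(attempt), is_nucleotides(attempt))
```

The input loop can then read `while not is_nucleotides(input_str): input_str = input("Please provide some nucleotides:")`.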


# As frequently requested, two examples:
- Making the reverse complement of a sequence is easy with a translation table:

```python
# str.maketrans builds a table mapping each base to its complement
dna_code = "aCGttgagatcagat"
dna_code2 = "CGCGATGC"
complement = str.maketrans("acgtACGT", "tgcaTGCA")
print("dna_code:           ", dna_code)
print("complement:         ", dna_code.translate(complement))
print("reverse complement: ", dna_code.translate(complement)[::-1])  # [::-1] reverses

print("dna_code:           ", dna_code2)
print("complement:         ", dna_code2.translate(complement))
print("reverse complement: ", dna_code2.translate(complement)[::-1])
```

```
dna_code:            aCGttgagatcagat
complement:          tGCaactctagtcta
reverse complement:  atctgatctcaaCGt
dna_code:            CGCGATGC
complement:          GCGCTACG
reverse complement:  GCATCGCG
```
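The same steps can be wrapped in a reusable function. This is a sketch, and the name `reverse_complement` is hypothetical. Characters that are not in the table, such as N, pass through `translate()` unchanged.

```python
# Translation table built once; str.maketrans is the Python 3 form of string.maketrans
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq):
    """Complement every base, then reverse the string with the [::-1] slice."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("CGCGATGC"))  # GCATCGCG
```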
