# File handling – reading and writing
- Python can also create and open files
- A file is opened or created with the ```open()``` function, which takes two parameters.
    - The first parameter is the filename and the second parameter is the mode in which the file is opened: ```"w"``` for writing, ```"r"``` for reading and ```"a"``` for appending
    - For example ```dna_sequences = open("dna_seqs.txt", "r")``` opens the file "dna_seqs.txt" for reading
- If we open the file in writing mode we can start writing to the file
    - You can write using ```dna_sequences.write("this is the text that goes in the file \n")```.
    - If you want the next text to start on a new line it is important to end with ```'\n'```
    - You can also write the contents of a variable!
- It is also good practice to close the file after using it.
    - ```dna_sequences.close()```

In [1]:
# Open a file
my_file = open("file.txt", "w")
print ("Name of the file: ", my_file.name)

## there should now be a file called "file.txt" in your workfolder

Name of the file:  file.txt


In [2]:
# Open a file
dna_seqs_file = open("dna_file.txt", "w")           #the "w" parameter means that it will write a new file called dna_file.txt
                                       #and overwrite it if it already exists
dna_seqs_file.write("actggcatcgatcgatcgatacgatcgatcagtcgatcgatcgatcga\n")  #writes this dna sequence to the file. The \n indicates a new line

dna = "acgtacgatcgatgcatacgcatcgatcagtac"   #creates a new variable containing a DNA seq

dna_seqs_file.write(dna + "\n")         #writes this variable to the file

dna_seqs_file.close()                   #closes the file. Try to open dna_file.txt and see what it contains.
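
The difference between the ```"w"``` and ```"a"``` modes is easiest to see by opening the same file several times. The sketch below is only an illustration: the file name `mode_demo.txt` and the temporary folder are assumptions, chosen so that nothing in your work folder is overwritten.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch)
demo_path = os.path.join(tempfile.mkdtemp(), "mode_demo.txt")

out = open(demo_path, "w")      # "w" creates the file (or empties an existing one)
out.write("first\n")
out.close()

out = open(demo_path, "w")      # opening with "w" again throws away "first"
out.write("second\n")
out.close()

out = open(demo_path, "a")      # "a" keeps the content and adds to the end
out.write("third\n")
out.close()

check = open(demo_path, "r")    # "r" only reads, the file is not changed
content = check.read()
check.close()
print(content)
```

Only "second" and "third" end up in the file, because the second ```"w"``` removed the first line.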

# File handling – good practice
- After opening a file, also close it:
- The close() method of a file object flushes any unwritten information and closes the file object, after which no more writing can be done. It is a good practice to use close().

In [3]:
# Open a file
my_file = open("file.txt", "w")
# do stuff
my_file.close()
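
Python can also close the file for you: a ```with``` block closes the file as soon as the indented code is finished, even when an error occurs inside it. A minimal sketch, assuming a hypothetical file `with_demo.txt` in a temporary folder:

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch)
demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(demo_path, "w") as my_file:
    my_file.write("acgt\n")
    # do more stuff with my_file here

# Outside the with block the file is already closed
print(my_file.closed)
```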

# File handling – good practice
- Before opening a file it's also good to check that the file actually exists

In [4]:
import sys ## module with system-specific parameters and functions
try:
    my_file = open("file.txt", "r")
except FileNotFoundError:  ## raised by open() when the file does not exist
    sys.exit("File does not exist!")
# do stuff
my_file.close()
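
As an alternative to catching the error, you can ask whether the file is there before opening it. The sketch below uses ```os.path.isfile()``` for this. The two file names are hypothetical and live in a temporary folder, so one is guaranteed to exist and the other is guaranteed to be missing.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
present = os.path.join(folder, "present.txt")   # hypothetical file that we create below
missing = os.path.join(folder, "missing.txt")   # hypothetical file that is never created
open(present, "w").close()                      # make an empty file

found = {}
for filename in [present, missing]:
    if os.path.isfile(filename):
        my_file = open(filename, "r")
        # do stuff
        my_file.close()
        found[filename] = True
    else:
        print("File does not exist:", filename)
        found[filename] = False
```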

# File handling – writing

- To write to a file you use the command below:
    - ```my_file.write("What you want to write")```

- Try yourself to write some variables to a file

In [5]:
my_file = open("file.txt", "w")

my_file.write("Hello script!\n") ## write directly

line = "This is my output!"
my_file.write(line+"\n") ## write string

my_file.close()

# File Handling - Writing - Exercise
1. Create a new file with the ```"w"``` parameter
2. Create a for loop that loops 500 times
3. In each iteration, write a random DNA sequence to the file. Remember, we made a script for this yesterday in the modules section
4. Close the file

In [9]:
# 1.
file=open('first_try.txt','w')
# 2. and 3.
import random
a=''
for i in range(500):
    a=random.choice('ATCG')+random.choice('ATCG')+random.choice('ATCG')
    file.write(a+'\t')
# 4.
file.close()

In [10]:
print(len(a))

3
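
The answer above writes short sequences of 3 nucleotides separated by tabs. A variation that writes one longer sequence per line could look like the sketch below. The helper `random_dna`, the length of 20 and the file name `random_dna.txt` are assumptions made for this example, not part of the exercise.

```python
import os
import random
import tempfile

def random_dna(length):
    """Return a random DNA sequence of the given length."""
    return "".join(random.choice("ACGT") for _ in range(length))

# Hypothetical output file in a temporary folder (assumption for this sketch)
out_path = os.path.join(tempfile.mkdtemp(), "random_dna.txt")

out_file = open(out_path, "w")
for i in range(500):
    out_file.write(random_dna(20) + "\n")   # one sequence per line
out_file.close()

# Read the file back to check what was written
in_file = open(out_path, "r")
sequences = [line.rstrip("\n") for line in in_file]
in_file.close()
print(len(sequences), "sequences written, the first one is", sequences[0])
```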


# File handling – reading
- To read files you also use the ```open()``` function, but this time with the ```"r"``` parameter
- ```file.read()``` returns a single string with all the characters in the file (including newlines, tabs and spaces)
- ```file.readline()``` reads a single line from the file
- Alternatively, you can read all the lines by looping over the file variable.
- Don't forget to close the file after using it

In [11]:
# Most frequently a file is read line by line using a loop.
# The loop returns one line at a time, just like readline()
# Example:

my_file = open("file.txt", "r")
for line in my_file:
    print (line)    # each line still ends with "\n", so a blank line follows it
my_file.close()

my_file = open("dna_file.txt", "r")
for line in my_file:
    print (line)
my_file.close()

Hello script!

This is my output!

actggcatcgatcgatcgatacgatcgatcagtcgatcgatcgatcga

acgtacgatcgatgcatacgcatcgatcagtac
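
To compare the three ways of reading side by side, the sketch below first writes a small file and then reads it with ```read()```, ```readline()``` and a loop. The file name `read_demo.txt` and its three lines are made up for this example.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch)
demo_path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
out = open(demo_path, "w")
out.write("line one\nline two\nline three\n")
out.close()

in_file = open(demo_path, "r")
everything = in_file.read()      # the whole file as one string
in_file.close()

in_file = open(demo_path, "r")
first = in_file.readline()       # only the first line, including its "\n"
second = in_file.readline()      # the next call returns the next line
rest = []
for line in in_file:             # the loop continues where readline() stopped
    rest.append(line)
in_file.close()

print(repr(everything))
print(repr(first), repr(second), rest)
```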



# File handling – reading + writing - exercise
1. Make a new file
2. Write five new lines containing data from your experiment
    - for example
    ```python
    563.4
    653.7
    112.4
    4324.997
    1
    ```
3. Close the file
4. Open the file with the reading parameter
5. Read all the lines and print them
6. Close the file

In [15]:
# 1. 
file = open("experiment.txt", "w")
# 2.
a="43,23,453,65,88"
file.write(a+"\n") 
# 3.
file.close()
# 4.
file = open("experiment.txt", "r")
# 5.
for a in file:
    print(a)
# 6.
file.close()

43,23,453,65,88
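
The answer above stores all values on a single line. A variation that follows the exercise more literally, with one value per line, is sketched below using the example numbers from the exercise. The file name `measurements.txt` is an assumption. Note that everything read from a file is a string, so the values are converted back with ```float()```.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "measurements.txt")
measurements = [563.4, 653.7, 112.4, 4324.997, 1]

out_file = open(path, "w")
for value in measurements:
    out_file.write(str(value) + "\n")    # write() only accepts strings
out_file.close()

in_file = open(path, "r")
values = []
for line in in_file:
    values.append(float(line.rstrip("\n")))   # convert the text back to a number
in_file.close()

print(values)
```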



# Newline characters
- If you read a line, the ```\n``` also stays in the string
- Most of the time the newline character at the end of a line is simply annoying. We can remove it, together with any other trailing whitespace, using the following command:

```python
line.rstrip() ## removes the newline character and other trailing whitespace
```
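
The difference between ```rstrip()``` and ```rstrip("\n")``` only shows when a line ends with other whitespace as well. A small sketch with a made-up line that ends with a space and a newline:

```python
line = "chr1\t925942\t926020 \n"          # made-up line: note the space before "\n"

stripped_all = line.rstrip()              # removes the space and the newline
stripped_newline = line.rstrip("\n")      # removes only the newline

print(repr(stripped_all))
print(repr(stripped_newline))
```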

# File handling – splitting the lines

- To split your line, you can use the ```line.split()``` method with any delimiter (example below)
- For example, splitting the string ```abc-def-ghi-jkl``` on ```-``` results in the list ```["abc", "def", "ghi", "jkl"]```
- If no parameter is provided, it splits on any whitespace (spaces, tabs and newlines)

In [8]:
my_file = open("file.txt", "r")
for line in my_file:
    splitline = line.split()## you can use different delimiters!
    print (splitline)
my_file.close()

dna_file = open("dna_file.txt", "r")
for line in dna_file:
    print (line.split('t'))
dna_file.close()

['Hello', 'script!']
['This', 'is', 'my', 'output!']
['ac', 'ggca', 'cga', 'cga', 'cga', 'acga', 'cga', 'cag', 'cga', 'cga', 'cga', 'cga\n']
['acg', 'acga', 'cga', 'gca', 'acgca', 'cga', 'cag', 'ac\n']
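
Notice that the last element of the DNA lists above still contains ```\n```, because splitting on ```'t'``` does not touch the newline. The sketch below compares a few ways of splitting on made-up strings, so no file is needed.

```python
record = "abc-def-ghi-jkl"
parts = record.split("-")                        # split on a chosen delimiter

bed_line = "chr1\t925942\t926020\n"
by_tab = bed_line.rstrip("\n").split("\t")       # strip first, then split on tabs

sentence = "Hello   script!\n"                   # three spaces between the words
by_whitespace = sentence.split()                 # no parameter: any run of whitespace
by_space = sentence.split(" ")                   # a single space: empty strings appear

print(parts)
print(by_tab)
print(by_whitespace)
print(by_space)
```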


# Very important!!!
## File handling -- File to list or dictionary

Often we want to store the contents of a file in a list or dictionary

In [10]:
### add to list
bed_file = open("exercise.bed","r")

bed_data = []

for line in bed_file:
    bed_data.append(line)
bed_file.close()
print (bed_data[0])

chr1	925942	926020



In [15]:
## add to dictionary (worth remembering)
bed_file = open("exercise.bed","r")

bed_data = {}
line_no = 0
for line in bed_file:
    data = line.split("\t")
    bed_data[line_no] = data
    line_no += 1
bed_file.close()
print (bed_data[1])

['chr1', '930150', '930341\n']
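
In the output above the last field still ends with ```\n``` and the positions are strings, so you cannot calculate with them yet. The sketch below strips the newline and converts the positions to integers. It assumes that every line has exactly three tab-separated columns, and it uses a small stand-in file (hypothetical name `mini.bed`) that holds only the two regions printed above.

```python
import os
import tempfile

# Small stand-in for exercise.bed in a temporary folder (assumption for this sketch)
bed_path = os.path.join(tempfile.mkdtemp(), "mini.bed")
out = open(bed_path, "w")
out.write("chr1\t925942\t926020\n")
out.write("chr1\t930150\t930341\n")
out.close()

bed_file = open(bed_path, "r")
bed_data = {}
line_no = 0
for line in bed_file:
    chrom, start, stop = line.rstrip("\n").split("\t")   # assumes exactly three columns
    bed_data[line_no] = [chrom, int(start), int(stop)]   # positions as numbers
    line_no += 1
bed_file.close()

region = bed_data[1]
region_size = region[2] - region[1]    # stop minus start
print(region, region_size)
```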


In [19]:
### Add to dictionary using a gene id as key 
xy_file = open("Human_ChrXY_coding_genes_start_stop.txt")
xy_data = {}
for line in xy_file:
    line = line.rstrip()
    data = line.split("|")
    xy_data[data[0]] = data[1:3]
xy_file.close()
print (list(xy_data.keys())[0])

>ENSG00000000003


In [None]:
### Add to dictionary using a gene id as key 
xy_file = open("Human_ChrXY_coding_genes_start_stop.txt")
xy_data = {}
for line in xy_file:
    line = line.rstrip()
    gene_data = line.split("|")
    xy_data[gene_data[0]] = {"start":gene_data[1],"stop":gene_data[2],"chr":gene_data[3]}
xy_file.close()
print (list(xy_data.keys())[0])

In [None]:
xy_data[">ENSG00000000003"]

# File handling – exercise (Assignment type)

The file "exercise.bed" contains genomic regions and is rather large, the perfect opportunity to use Python to process this file.

1. Import module "re"

2. Read the BED file and make an output file where "chr" in front of the number is removed (e.g. chr1 will be 1).

3. Determine the number of regions covered in each chromosome (each line is a region, e.g. chromosome 1 --> 71906 regions)

4. Parse and print to screen the total number of regions and size covered for each chromosome

5. Challenge: combine number of regions and size per chromosome in a nice report

In [None]:
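# 1.  Perfect answer!
import re

# 2.
# Open the input file using "r" for reading and the output file using "w" for writing
bedfile = open("exercise.bed",'r')
out_file = open("exercise.bed_without_chr",'w')
# Loop over the file and remove "chr". Note: chr only at the beginning of the line
for a in bedfile:
    a = re.sub("^chr",'',a)
    out_file.write(a)       # a still ends with "\n", so no extra newline is needed
bedfile.close()
out_file.close()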


In [None]:
# 3. Determine the number of regions covered in each chromosome
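
One possible approach, shown as a sketch rather than the official answer, is to keep two dictionaries with the chromosome as key: one for the number of regions and one for the total size. The size of a region is taken here as stop minus start. The stand-in file `mini.bed` and its four regions are made up so that the example runs without "exercise.bed".

```python
import os
import tempfile

# Made-up stand-in for exercise.bed in a temporary folder (assumption for this sketch)
bed_path = os.path.join(tempfile.mkdtemp(), "mini.bed")
out = open(bed_path, "w")
out.write("chr1\t100\t200\nchr1\t300\t450\nchr2\t50\t60\nchrX\t0\t1000\n")
out.close()

region_count = {}   # chromosome -> number of regions
region_size = {}    # chromosome -> total number of bases covered

bed_file = open(bed_path, "r")
for line in bed_file:
    chrom, start, stop = line.rstrip("\n").split("\t")[:3]
    if chrom not in region_count:       # first time we see this chromosome
        region_count[chrom] = 0
        region_size[chrom] = 0
    region_count[chrom] += 1
    region_size[chrom] += int(stop) - int(start)
bed_file.close()

# Simple report: one line per chromosome
for chrom in region_count:
    print(chrom, region_count[chrom], "regions,", region_size[chrom], "bp")
```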



# As frequently requested two examples:
- Limit the user input to nucleotides only:

In [None]:
import re
input_str = ""
while not re.match("^[actg]{1,}$", input_str,re.I):
    input_str = input("Please provide some nucleotides:")
print (input_str)
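
If you want to reuse the check, for example on lines read from a file, it can be wrapped in a function. This is a sketch: the function name `is_nucleotide_string` is made up, and it uses ```re.fullmatch()```, which only succeeds when the whole string matches the pattern.

```python
import re

def is_nucleotide_string(text):
    """Return True if text consists only of a, c, g or t (upper or lower case)."""
    return re.fullmatch("[actg]+", text, re.I) is not None

print(is_nucleotide_string("aCGt"))     # only nucleotides
print(is_nucleotide_string("acgu"))     # contains a "u"
print(is_nucleotide_string(""))         # empty input is rejected as well
```

When checking lines from a file, remember to remove the newline first, because a string that ends with ```\n``` does not pass this check.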

# As frequently requested two examples:
- Make the reverse complement of a sequence the easy way:

In [None]:
dna_code = "aCGttgagatcagat"
complement = str.maketrans("acgtACGT", "tgcaTGCA")  # in Python 3, maketrans is a method of str
print ("dna_code:           ", dna_code)
print ("complement:         ", dna_code.translate(complement))
print ("reverse complement: ", dna_code.translate(complement)[::-1])