# Introduction to Coding

### Reading and writing files

#### Different ways of opening a file

Python accesses files with the built-in function `open`. A file can be read or written depending on the mode argument we pass to `open`. Once we are done with it, the file has to be closed with the method `close()`, unless we use the `with` statement (recommended), which closes the file automatically.  
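As a minimal sketch of the difference between closing a file by hand and using `with`, the example below writes and then reads a small file. The file name `example.txt` and the temporary directory are assumptions made for illustration; they are not part of the course data.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only)
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# Explicit open/close: we are responsible for calling close() ourselves
fd = open(path, 'w')
try:
    fd.write('first line\n')
finally:
    fd.close()              # runs even if write() raises an error

# With the `with` statement the file is closed automatically
with open(path, 'r') as fd:
    content = fd.read()

print(content.strip())      # first line
print(fd.closed)            # True: closed when the `with` block ended
```

The `with` version is shorter and cannot forget the `close()` call, which is why it is the recommended form.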

Arguments available to open a file:  
```
    'r'  -->  read only  
    'w'  -->  write only (overwrite a file with the same name)  
    'a'  -->  append to the existing file (do not overwrite)  
    'r+' -->  open a file both in read and write mode  
    'b'  -->  binary mode  
    't'  -->  text mode (default)  
```
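The short sketch below tries some of these modes on a scratch file, to show how `'w'` replaces the content while `'a'` adds to it. The file name `modes.txt` and the temporary directory are hypothetical and used only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only)
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'w') as fd:     # 'w' creates the file
    fd.write('one\n')

with open(path, 'w') as fd:     # 'w' again: the previous content is lost
    fd.write('two\n')

with open(path, 'a') as fd:     # 'a' adds to the end of the file
    fd.write('three\n')

with open(path, 'r') as fd:     # 'r' reads the content as text (str)
    text = fd.read()

with open(path, 'rb') as fd:    # 'rb' reads the raw bytes instead
    raw = fd.read()

print(text.split())             # ['two', 'three']
print(type(text).__name__, type(raw).__name__)   # str bytes
```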

#### **Read** 

Let's read an example FASTA file with Python.

In [None]:
# Reads a file (stores the whole file in memory as a list of lines)
with open('../data/input.fa', 'r') as fd:
    whole = fd.readlines()

print(whole)

In [None]:
# Each line still ends with a newline (\n) character; remove it before printing
for line in whole:
    print(line.strip())   # similar to print(line, end=''), but strip() also removes other leading/trailing whitespace

Storing a whole file in memory can be risky, especially when the file is big, because it may not fit in the available memory. A safer option is to read the file line by line:

In [None]:
# Reads a file line by line (memory safe)
with open('../data/input.fa', 'r') as fd:
    for line in fd:
        # Here we write the instructions
        print(line.strip())

#### **Write**

#### **Pay attention!** Opening a file in `'w'` mode overwrites an existing file with the same name without any warning!

In [None]:
with open('../data/output.txt', 'w') as fd:
    fd.write('Hello world!!\n')

In [None]:
# Let's check if the file has been successfully written
with open('../data/output.txt', 'r') as fd:
    for line in fd:
        print(line.strip())   # strip() avoids printing an extra empty line
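Since `'w'` overwrites silently, one possible safeguard is to check whether the file already exists before writing. The sketch below is only one way to do it: the helper `write_if_new` and the temporary output path are hypothetical names introduced for this example.

```python
import os
import tempfile

# Hypothetical output path in a temporary directory (not the course data folder)
out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')

def write_if_new(path, text):
    """Write text to path only if no file with that name exists yet."""
    if os.path.exists(path):
        print('Refusing to overwrite', path)
        return False
    with open(path, 'w') as fd:
        fd.write(text)
    return True

first = write_if_new(out_path, 'Hello world!!\n')     # the file is created
second = write_if_new(out_path, 'Something else\n')   # the file is left untouched
print(first, second)                                  # True False
```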

### Exercise

Try to write code that reads the `../data/input.fa` file and writes only the DNA sequences to an output file (without the header lines, which start with `>`).

[![button](../figures/button_solution_small.png)](solutions.ipynb#Reading-and-writing-files)

### Write a script

Python can be used interactively (for instance in a terminal). This is fine for short tasks or for debugging a piece of code, but it is not recommended for sophisticated tasks.  
In such cases it is much easier to write a script that we can then execute. 

The following code counts the number of sequences in a FASTA file. Write the code in a text editor and then save it with the extension `.py` (for example `my_script.py`).

In [None]:
# This script counts the number of sequences in a FASTA file
count = 0

with open('../data/input.fa', 'r') as fd:
    for line in fd:
        if line.startswith('>'):
            count += 1

print("The file contains", count, "sequences")

Save the script somewhere, for instance in the folder `scripts` with the name `my_script.py`.  
Now execute the script as follows:
```
    python3 ../scripts/my_script.py
```
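To try the counting logic without depending on `../data/input.fa`, the same loop can be wrapped in a function and run on a small made-up file. This is only a sketch: the function name `count_sequences`, the file `demo.fa` and its content are assumptions introduced for this example.

```python
import os
import tempfile

def count_sequences(path):
    """Return the number of FASTA records, i.e. lines starting with '>'."""
    count = 0
    with open(path, 'r') as fd:
        for line in fd:
            if line.startswith('>'):
                count += 1
    return count

# Made-up FASTA content, written to a temporary file for this demo only
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.fa')
with open(demo_path, 'w') as fd:
    fd.write('>seq1\nACGT\nTTGA\n>seq2\nGGCC\n')

n = count_sequences(demo_path)
print("The file contains", n, "sequences")   # The file contains 2 sequences
```

Keeping the logic in a function makes it easy to reuse the script on other files by changing only the path that is passed in.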