# Working with files

Until now, we have been working with DNA and other datasets that were "hardcoded" into our notebooks. Typically, 
the information we want to work with is stored in a file. In this notebook we introduce working with files and moving code we have written into a Python script. 

First, let's find a file to work with using some shell commands: 

In [None]:
# see the content of the data directory

!ls ~/biocoding-2018/notebooks/data/

The `hiv1_genome.txt` file is listed as a text file, but it is actually in FASTA format: a header line starting with `>`, followed by one or more lines of sequence.

In [None]:
!cat ~/biocoding-2018/notebooks/data/hiv1_genome.txt 

Once we know the path of a file, we can save it for Python to use; filepaths should be strings. 

In [None]:
filepath = "data/hiv1_genome.txt"
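Note that the `ls` command above used a path starting with `~`, while `filepath` is relative to the directory the notebook is running in. The sketch below, which assumes only the standard-library `os.path` module, shows one way to expand either kind of path and check that it points to a file before opening it. The helper name `describe_path` is hypothetical and used only for illustration.

```python
import os

def describe_path(path):
    """Return the absolute form of a path and whether it points to an existing file."""
    # expanduser turns a leading "~" into the home directory;
    # abspath resolves a relative path against the current working directory
    full_path = os.path.abspath(os.path.expanduser(path))
    return full_path, os.path.isfile(full_path)

full_path, found = describe_path("data/hiv1_genome.txt")
print(full_path)
print("file found:", found)
```

If `file found` is `False`, the notebook is probably running from a different directory than the one that contains `data/`.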

In [None]:
"""Open a file for reading and print the lines in the file. 
   If the file cannot be opened, raise an error"""

try:
    #open a file for reading
    with open(filepath,'r') as input_file:
        for line in input_file:
            print(line)
except (IOError):
    raise
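Each line read from a file keeps its trailing newline character, and `print()` adds another one, which is why `print(line)` shows a blank line between lines. The sketch below uses a small temporary file with made-up contents (not the HIV-1 genome) so that it runs anywhere; it shows the newline with `repr()` and removes it with `strip()`.

```python
import os
import tempfile

# write a tiny two-line example file (made-up contents) to a temporary location
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(">example_sequence\nacgu\n")
    example_path = tmp.name

with open(example_path, 'r') as input_file:
    lines = list(input_file)

# repr() makes the trailing newline on each line visible
for line in lines:
    print(repr(line))

# strip() removes surrounding whitespace, including the newline
cleaned = [line.strip() for line in lines]
print(cleaned)

# remove the temporary file again
os.remove(example_path)
```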

### Asking for user input

We can also use the `input()` function to prompt the user for input: 

In [None]:
filepath = input("Please input a complete file path:")

try:
    #open a file for reading
    with open(filepath,'r') as input_file:
        for line in input_file:
            print(line)
except (IOError):
    raise
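Because `input()` returns whatever the user types, a typo in the path leads straight to an error. One possible refinement, sketched here under the assumption that re-prompting a few times is acceptable, is to check the answer before using it. The function name `ask_for_existing_file` and its `ask` and `max_tries` parameters are hypothetical; `ask` defaults to the built-in `input`.

```python
import os

def ask_for_existing_file(ask=input, max_tries=3):
    """Prompt until the user gives a path to an existing file, or give up."""
    for _ in range(max_tries):
        # input() always returns a string; strip() drops stray spaces around it
        answer = ask("Please input a complete file path:").strip()
        if os.path.isfile(answer):
            return answer
        print("Could not find a file at:", answer)
    # no valid path was given within max_tries attempts
    return None
```

In a notebook, `filepath = ask_for_existing_file()` would prompt up to three times and return `None` if no valid path was entered.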

## Manipulating and writing files

Finally, we can manipulate the contents of a file and write the result into a new file:

In [None]:
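fasta_header = ''
sequence_lines = []

try:
    #open a file for reading
    with open(filepath,'r') as input_file:
        for line in input_file:
            if line.startswith('>'):
                fasta_header = line.strip()
            else:
                # keep every sequence line; a FASTA sequence may span several lines
                sequence_lines.append(line.strip())
except (IOError):
    raise

# join the sequence lines into one string
sequence = ''.join(sequence_lines)

# make RNA into DNA

dna = sequence.replace('u','t')

#write the dna to file
# note: 'a' appends, so re-running this cell adds the record to the file again
with open('./dna_result.fasta','a') as result_file:
    result_file.write(fasta_header + '\n')
    result_file.write(dna + '\n')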

We can check the contents of the resulting file...

In [None]:
!cat dna_result.fasta
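The cell that writes `dna_result.fasta` opens it in `'a'` (append) mode, so running it more than once adds the same record to the file again. The sketch below compares `'a'` with `'w'` (overwrite) using a temporary file and a made-up record; the file name `mode_demo.fasta` is arbitrary.

```python
import os
import tempfile

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'mode_demo.fasta')

# write the same record twice in append mode: both copies are kept
for _ in range(2):
    with open(out_path, 'a') as result_file:
        result_file.write('>example\nacgt\n')
with open(out_path, 'r') as result_file:
    appended = result_file.read()

# write the same record twice in write mode: each write replaces the file
for _ in range(2):
    with open(out_path, 'w') as result_file:
        result_file.write('>example\nacgt\n')
with open(out_path, 'r') as result_file:
    overwritten = result_file.read()

print("records after appending twice:", appended.count('>'))
print("records after overwriting twice:", overwritten.count('>'))

# clean up the temporary file and directory
os.remove(out_path)
os.rmdir(out_dir)
```

Append mode is useful when collecting results from several runs; write mode is the safer choice when a cell should always produce the same file.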

### ... Make this into a script

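Try this using the command-line arguments that the shell passes to a script. Python stores them in the list `sys.argv`: `argv[0]` is the name of the script and `argv[1]` is the first argument after it.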

`from sys import argv`

`filepath = argv[1]`
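One way the pieces above could fit together in a script is sketched below. This is an illustration rather than a reference solution: the function names and the script name `rna_to_dna.py` are hypothetical, the input is assumed to hold a single FASTA record, and the result is printed instead of written to a file to keep the sketch free of side effects.

```python
import sys

def rna_to_dna(sequence):
    """Replace uracil with thymine, keeping the case of each letter."""
    return sequence.replace('u', 't').replace('U', 'T')

def read_fasta_record(filepath):
    """Return (header, sequence) for a FASTA file holding a single record."""
    fasta_header = ''
    sequence_lines = []
    with open(filepath, 'r') as input_file:
        for line in input_file:
            line = line.strip()
            if line.startswith('>'):
                fasta_header = line
            elif line:
                # skip blank lines, keep every sequence line
                sequence_lines.append(line)
    return fasta_header, ''.join(sequence_lines)

def main(argv):
    # argv[0] is the script name; argv[1] is the first argument
    if len(argv) < 2:
        print("usage: python rna_to_dna.py <fasta_file>")
        return 1
    try:
        fasta_header, sequence = read_fasta_record(argv[1])
    except IOError as error:
        print("could not read", argv[1], "-", error)
        return 1
    print(fasta_header)
    print(rna_to_dna(sequence))
    return 0

if __name__ == "__main__":
    main(sys.argv)
```

Saved as `rna_to_dna.py`, the script could be run from the shell with `python rna_to_dna.py data/hiv1_genome.txt`, and the output redirected to a file with `>`.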