# Files

While a program is running, its data is stored in random access memory (RAM). RAM is **volatile**: when the program ends, the data in RAM disappears.

To make data available the next time the computer is turned on and the program is started, it has to be written to a **non-volatile storage medium**, such as a hard drive, USB drive, or CD-RW.
Non-volatile storage can also be remote, managed by a server in the *cloud* and accessed over the network.

Data on **non-volatile** storage media is stored in **named locations** on the media called **files**. The operating system is in charge of managing these files, and can be invoked from within Python to read/write data. In the following, we discuss the abstractions made available by Python for accessing and managing files.

Working with files is like working with a *notebook*. 

1. To use a notebook, it has to be opened. 
2. When done, it has to be closed. 
3. While the notebook is open, it can either be read from or written to.
4. Reading and writing can proceed in the natural order, page after page, or skip some of the pages.

## Writing a file 

Let’s begin with a simple program that writes three lines of text into a file.

**Opening a file** creates what we call a **file handle**. In this example, the variable ```myfile``` refers to the new handle object. Our program calls **methods on the handle** (```.write``` and ```.close```) to make changes to the actual file located on the disk.

With **"w"** we indicate the **opening mode** of the file: 
1. if there is no file named ```test.txt``` on the disk, it will be created. 
2. if there already is a file named ```test.txt```, it will be replaced by the file we are writing.

In [2]:
myfile = open("test.txt", "w")  # opening in "w" mode

myfile.write("My first file written from Python\n")
myfile.write("---------------------------------\n")
myfile.write("Hello Salvatore, world!\n")

myfile.close()                  # close the file
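
The effect of the **"w"** mode on an existing file can be checked with a small sketch. The file name `overwrite_demo.txt` and the temporary directory are assumptions made only for this illustration, so that no real file is touched.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "overwrite_demo.txt")

f = open(path, "w")          # the file does not exist yet: it is created
f.write("first version\n")
f.close()

f = open(path, "w")          # the file exists: its old content is discarded
f.write("second version\n")
f.close()

f = open(path, "r")
content = f.read()
f.close()

print(content, end="")       # only the text written by the second open is left
```

After the second `open`, the text written the first time is gone, which is why **"w"** should be used with care on files that already hold data.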

## Reading a file line-at-a-time

Now that the file exists on our disk, we can open it for reading, and read all the lines
in the file, one at a time.  This time, the mode argument is **"r"** (reading).

We will use the method ```.readline```, which returns everything up to and including the **newline character**. Since the string we read already ends with its own newline, we suppress the **\n** character that ```print()``` normally appends to what it prints.

If we try to open a file that doesn't exist, we get a run-time error:
>  mynewhandle = open("wharrah.txt", "r")<br/>
>  **FileNotFoundError**: [Errno 2] No such file or directory: "wharrah.txt"

In [3]:
mynewhandle = open("test.txt", "r")

while True: # Keep reading forever
    theline = mynewhandle.readline() # Try to read next line
    if len(theline) == 0: # If there are no more lines
        break # leave the loop
    # Now process the line we’ve just read
    print(theline, end="") # print by suppressing the \n character

mynewhandle.close()

My first file written from Python
---------------------------------
Hello Salvatore, world!
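
The loop above stops when ```.readline``` returns an empty string. The following sketch, which uses a hypothetical file `readline_demo.txt` in a temporary directory, shows why the test on the length is safe: an empty line inside the file is returned as `"\n"` (length 1), while the end of the file is returned as `""` (length 0).

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "readline_demo.txt")

f = open(path, "w")
f.write("first\n")
f.write("\n")                     # an empty line in the middle of the file
f.write("last\n")
f.close()

f = open(path, "r")
results = []
for i in range(5):                # call readline more times than there are lines
    results.append(f.readline())
f.close()

print(results)   # ['first\n', '\n', 'last\n', '', '']
```

Calling ```.readline``` past the end of the file is not an error: it simply keeps returning the empty string.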


## Read a file as a list of lines

It is often useful to fetch data from a disk file and turn it into a list of lines. 

Suppose we have a file containing our friends and their email addresses, one per line in the file. 
If the file is not sorted, we could read everything into a list of lines (using the file method ```.readlines```), then
sort the list (using the list method ```.sort```), and then write the sorted list back to another file:

In [5]:

print('*** FILE UNSORTED ***')
f = open("friends.txt", "r")
xs = f.readlines()
print(type(xs))
for s in xs:
    print(s, end='')
f.close()

xs.sort()

g = open("sortedfriends.txt", "w")
for v in xs:
    g.write(v)
g.close()


print('\n\n*** FILE SORTED ***')
g = open("sortedfriends.txt", "r")
line = g.readline()
while len(line) != 0:
    print(line, end='')
    line = g.readline()
g.close()

*** FILE UNSORTED ***
<class 'list'>
Salvatore; orlando@unive.it
Mario; mario80@gmail.com
Andrea; andreaaa@yahoo.com


*** FILE SORTED ***
Andrea; andreaaa@yahoo.com
Mario; mario80@gmail.com
Salvatore; orlando@unive.it
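
The cell above assumes that `friends.txt` already exists. As a self-contained sketch of the same read, sort, write sequence, the code below first creates its own input file. The names `friends_demo.txt` and `sortedfriends_demo.txt` are hypothetical, and the file method ```.writelines``` is used here as a compact alternative to the explicit `for` loop.

```python
import os
import tempfile

# Hypothetical scratch files, created in a temporary directory
folder = tempfile.mkdtemp()
unsorted_path = os.path.join(folder, "friends_demo.txt")
sorted_path = os.path.join(folder, "sortedfriends_demo.txt")

f = open(unsorted_path, "w")      # create the unsorted input file
f.write("Salvatore; orlando@unive.it\n")
f.write("Mario; mario80@gmail.com\n")
f.write("Andrea; andreaaa@yahoo.com\n")
f.close()

f = open(unsorted_path, "r")
xs = f.readlines()                # list of strings, each ending with "\n"
f.close()

xs.sort()                         # sort the list in place

g = open(sorted_path, "w")
g.writelines(xs)                  # write every string of the list; no newline is added
g.close()

g = open(sorted_path, "r")
result = g.read()
g.close()
print(result, end="")
```

Since every element returned by ```.readlines``` keeps its trailing newline, the lines can be written back unchanged.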


## try ... except (Catching exceptions)

Things may go wrong when you try to read and write files. 
If you try to open a file that doesn’t exist, you get an error:

```python
    fin = open('bad_file')  # the default mode is "r"
    # FileNotFoundError: [Errno 2] No such file or directory: 'bad_file'
```

You can check the existence of a file by using functions like `os.path.exists` and `os.path.isfile`.
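
As a small sketch of such a check, the code below applies both functions to an existing file, a missing file, and a directory. All paths are hypothetical and live in a temporary directory created for the illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                          # an existing directory
existing = os.path.join(folder, "exists_demo.txt")   # hypothetical file names
missing = os.path.join(folder, "missing_demo.txt")

f = open(existing, "w")
f.write("hello\n")
f.close()

print(os.path.exists(existing), os.path.isfile(existing))  # True True
print(os.path.exists(missing), os.path.isfile(missing))    # False False
print(os.path.exists(folder), os.path.isfile(folder))      # True False
```

The last line shows the difference between the two functions: a directory exists, but it is not a regular file.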

However, it is usually better to go ahead and try, and deal with problems if they happen, which is exactly
what the `try` statement does in combination with `except`.

The syntax is similar to an `if...else` statement, where the `except` branch is taken only if an exception is raised in the `try` branch:

```python
    try:
        fin = open('bad_file')
    except:
        print('Something went wrong.')
```
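
A bare `except:` catches every kind of exception, including ones unrelated to the file. As a hedged sketch, the clause can instead name the exception it is meant to handle, here `FileNotFoundError`, and bind it to a variable (`err`) to inspect the error. The path is a hypothetical file inside a fresh temporary directory, so it is guaranteed not to exist.

```python
import os
import tempfile

# Hypothetical path: a file that does not exist in a new, empty directory
missing = os.path.join(tempfile.mkdtemp(), "bad_file")

try:
    fin = open(missing)            # raises FileNotFoundError
    message = "File opened"
    fin.close()
except FileNotFoundError as err:
    # err.strerror holds the operating system's description of the problem
    message = "Could not open the file: " + err.strerror

print(message)
```

Any other exception raised inside the `try` branch would not be caught by this clause and would stop the program as usual.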

Look at the following code box, which checks if a file exists. 
If it exists, the code prints it line by line, otherwise the file is created.

Note also that we show another way to read the lines of a text file: a `for` loop that iterates directly over the file handle:

```python
    for line in file_handle:
        ...
```

In [10]:
# Example of opening that can cause an exception
# try ...  except 

filename="newfile"
try:
    file=open(filename,'r')
    print("File opened for reading")
    for line in file:   # new way to read lines of a text file
        print(line, end='')
    file.close()
    print("file closed for reading")
except IOError:
    file=open(filename,'w')
    print("File created")
    for i in range(15):
        l = "This is line " + str(i) + "\n"
        file.write(l)
    print("write operation done")
    file.close()

File opened for reading
This is line 0
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10
This is line 11
This is line 12
This is line 13
This is line 14
file closed for reading


## Reading a text file as a single string

We use this method of processing files if we are not interested in the line structure
of the file, or if the file has no newlines that break the text into lines.

In the following, we open a file in read mode (the default mode), read a single string, and then split the string to produce a list of words. 



In [6]:
f = open("sortedfriends.txt")  # "r" is the default mode
content = f.read()             # single string
f.close()

words = content.split()
print("There are", len(words), "words in the file.")
print(words)                   # print the list obtained after the split

# remove ';' from the single words
for i in range(len(words)):
    words[i] = words[i].replace(';','')  
    
print(words)                   # print the list after removing the ';' characters


There are 6 words in the file.
['Andrea;', 'andreaaa@yahoo.com', 'Mario;', 'mario80@gmail.com', 'Salvatore;', 'orlando@unive.it']
['Andrea', 'andreaaa@yahoo.com', 'Mario', 'mario80@gmail.com', 'Salvatore', 'orlando@unive.it']


## Working with binary files

Files storing *photos*, *videos*, *zip files*, *executable programs*, etc. are called **binary files**.
They are not organized into lines, and cannot be opened with a normal text editor. 

Since the stream of bytes stored in the file does not represent printable characters, a text editor that tries to interpret the file as text in an ASCII encoding produces a meaningless result.

For such files, we read/write blocks of raw bytes.
In the example, we copy one binary file to another, using **"rb"** and **"wb"** as the opening modes (binary read/write).


In [7]:
f = open("sortedfriends.txt", "rb")
g = open("copied_file.txt", "wb")

while True:
    buf = f.read(20)  # read at most 20 bytes at a time
    if len(buf) == 0:
        break
    g.write(buf)

print(type(buf))    # note that in this case read() returns a type 'bytes', due to the mode "rb"
    
f.close()
g.close()

<class 'bytes'>
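
To see what the **"b"** in the mode changes, the following sketch reads the same file twice, once in text mode and once in binary mode. The file name `bytes_demo.txt` is hypothetical, and the content is plain ASCII text chosen so that each character corresponds to exactly one byte.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "bytes_demo.txt")

f = open(path, "w")
f.write("Hello")
f.close()

f = open(path, "r")       # text mode: read() returns a str
text = f.read()
f.close()

f = open(path, "rb")      # binary mode: read() returns bytes
raw = f.read()
f.close()

print(type(text), text)   # <class 'str'> Hello
print(type(raw), raw)     # <class 'bytes'> b'Hello'
print(list(raw))          # [72, 101, 108, 108, 111]
```

A `bytes` object is a sequence of integers between 0 and 255, which is why converting it to a list shows the numeric code of each character.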


## Other ways to open a file

**'a'**: Open for appending at the end of the file without truncating it. Creates a new file if it does not exist. All subsequent writes occur at the end of the file, increasing its size.

**'+'**: Open a disk file for updating (reading and writing).

If we use **'w+'**, we open for reading/writing and truncate the file to 0 bytes.
If we use **'r+'**, we open for reading/writing without truncation.
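
The following sketch tries these modes in sequence on a hypothetical file `modes_demo.txt` in a temporary directory. It also uses the file method ```.seek```, not discussed above, only to move back to the beginning of the file before reading what was just written.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")       # create the file with one line
f.write("line 1\n")
f.close()

f = open(path, "a")       # append: the existing content is kept
f.write("line 2\n")
f.close()

f = open(path, "r+")      # read and write, no truncation
before = f.read()         # reading leaves the position at the end of the file
f.write("line 3\n")       # so this write lands after the existing lines
f.close()

f = open(path, "r")
after_update = f.read()
f.close()

f = open(path, "w+")      # read and write, but the old content is discarded
f.write("fresh start\n")
f.seek(0)                 # move back to the beginning before reading
after_truncate = f.read()
f.close()

print(before, end="")          # line 1, line 2
print(after_update, end="")    # line 1, line 2, line 3
print(after_truncate, end="")  # fresh start
```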

## Fetching a file from the Web

Python includes modules (libraries of functions) to do almost everything. There is a simple module to download Web pages and, if needed, store them on the local disk.

```python
    import urllib.request
```


In [8]:
import urllib.request

url = "http://finanza.repubblica.it/BorsaItalia/Azioni"
destination_filename = "local.htm"

urllib.request.urlretrieve(url, destination_filename)

('local.htm', <http.client.HTTPMessage at 0x7f29ec18ce10>)

We can also open the URL (Uniform Resource Locator) of the page and read from it like a file.

In [9]:
import urllib.request

url = "http://finanza.repubblica.it/BorsaItalia/Azioni"

f = urllib.request.urlopen(url)
str_all = f.read(500) # read the first 500 bytes (the result is of type 'bytes')
f.close()

print(str_all)

b'\r\n\r\n<!DOCTYPE html>\r\n\r\n<html xmlns="http://www.w3.org/1999/xhtml" class="no-js" lang="it">\r\n<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>\r\n\tLa borsa italiana dalla A alla Z - Economia e Finanza - Repubblica.it\r\n</title><meta name="viewport" content="width=1024" /><link rel="dns-prefetch" href="//www.repstatic.it" /><link rel="dns-prefetch" href="//oasjs.repubblica.it" /><link rel="dns-prefetch" href="//oasjs.kataweb.it" /><link rel="dns-prefetch" href="//data'
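
The output starts with `b'...'` because the data read from a URL is of type `bytes`, not `str`. As a minimal sketch, assuming the data is encoded in UTF-8 (as the `charset=UTF-8` declaration in the page suggests), it can be converted to a string with the ```.decode``` method. To avoid depending on the network, the example decodes a short `bytes` literal standing in for the downloaded data.

```python
# A short bytes value standing in for the data returned by f.read();
# the two bytes \xc3\xa0 are the UTF-8 encoding of the letter 'à'
raw = b'Economia e Societ\xc3\xa0'

text = raw.decode("utf-8")     # assumption: the data is UTF-8 encoded

print(type(raw), len(raw))     # <class 'bytes'> 19
print(type(text), len(text))   # <class 'str'> 18
print(text)
```

The two lengths differ because a character outside the ASCII range takes more than one byte in UTF-8, so reading 500 bytes does not always mean reading 500 characters.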


## Exercises

1. Write a program that reads a file and writes out a new file with the lines in reversed order (i.e. the first line in the old file becomes the last one in the new file.)
2. Write a program that reads a text file and produces an output file which is a copy of the input text file, except for the first five columns of each line, which have to contain a four-digit line number followed by a space. Start numbering the first line in the output file at 1. Ensure that every line number is formatted to the same width in the output file. <br/>
*Hint: to build a string that contains an integer formatted to 4 digits (padded with leading 0's), use the following syntax:  `"{0:04d}".format(number)`*


In [None]:
# Ex 1

In [2]:
# Ex 2

line = "bla bla bla"
line = "{0:04d} ".format(34)  + line   # add 0034 at the beginning of the line
print(line)


    


0034 bla bla bla
