### CS102/CS103

Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics<br />
NUI Galway

# Lecture 14: File Processing

Files contain data and programs.  Files provide permanent storage.
Files are used to transport data from one place to another.
Dealing with the physical aspects of storing, retrieving and
updating data on file is one of the tasks of the operating
system. A `python` programmer need not worry about these issues
and can access files through an abstract interface.  We'll
discuss the `python` tools provided for this. 

But first, a few more thoughts on encoding and encrypting. 

##  Encryption

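Single-letter message units over the 37-letter alphabet
`0`, `1`, ..., `9`, `A`, `B`, ..., `Z`, `_`
are encrypted by an **affine encryption function** of the form
$$
f \colon \mathbb{Z}_{37} \to \mathbb{Z}_{37},\quad x \mapsto \alpha x + \beta,
$$
where $\alpha \not\equiv 0 \pmod{37}$, so that $f$ can be inverted (37 is prime, hence every nonzero $\alpha$ has an inverse modulo 37).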

In [1]:
alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_"

def my_ord(c):
    "encode character c relative to alphabet"
    i = alphabet.index(c)
    return i

my_ord('C')

12

In [2]:
def my_chr(code):
    "decode code relative to alphabet"
    c = alphabet[code]
    return c

my_chr(12)

'C'

In [3]:
def text2code(message):
    "convert a textual message into a sequence of code points"
    codes = []
    for c in message:
        code = my_ord(c)
        codes.append(code)
    return codes

In [4]:
text2code("HI_THERE")

[17, 18, 36, 29, 17, 14, 27, 14]

In [5]:
def code2text(codes):
    "efficiently convert code points into text"
    chars = []
    for code in codes:
        c = my_chr(code)
        chars.append(c)
    message = "".join(chars)
    return message

In [6]:
code2text([17, 18, 36, 29, 17, 14, 27, 14])

'HI_THERE'

In [7]:
secret = open("secret.txt").read().strip()
secret

'K1OSROWOANUKO7SCUK1OCUKIRIUTSUWUTSK0NITLSYU3SK1OS4UK1ONS94SC97ONTSR9CQ0KITL'

In [8]:
text2code(secret[-9:])

[27, 9, 12, 26, 0, 20, 18, 29, 21]

In [9]:
text2code("COMPUTING")

[12, 24, 22, 25, 30, 29, 18, 23, 16]

In [10]:
from modular import modular_inverse
modular_inverse(7, 37) * 8 % 37

17
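The value 17 is the multiplier $\alpha$ of the cipher. It can be found from a known-plaintext guess: assuming the last nine characters of the secret encrypt the word `COMPUTING` (as the two code sequences above suggest), `P` (25) is sent to `Q` (26) and `I` (18) is sent to `I` (18). Subtracting the two congruences eliminates $\beta$ and leaves $7\alpha \equiv 8 \pmod{37}$. The sketch below carries out this computation. It uses the built-in `pow(a, -1, m)` (Python 3.8 or later) instead of the course module `modular`, and the helper name `recover_key` is made up for this illustration.

```python
M = 37  # size of the alphabet

def recover_key(x1, y1, x2, y2, m=M):
    "find (alpha, beta) with y = alpha * x + beta (mod m) for both pairs"
    # subtracting the two congruences eliminates beta:
    #   alpha * (x1 - x2) = y1 - y2  (mod m)
    alpha = (y1 - y2) * pow(x1 - x2, -1, m) % m
    beta = (y1 - alpha * x1) % m
    return alpha, beta

# P -> Q is 25 -> 26 and I -> I is 18 -> 18, so alpha * 7 = 8 (mod 37)
alpha, beta = recover_key(25, 26, 18, 18)
print(alpha, beta)   # 17 8
```

The recovered pair matches the constants 17 and 8 used in the enciphering function that follows.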

In [11]:
def encipher(x):
    "affine enciphering function"
    return (17 * x + 8) % 37

def decipher(x):
    "affine deciphering function"
    return (24 * (x - 8)) % 37
    

In [12]:
encipher(23), encipher(16)

(29, 21)
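The constant 24 in `decipher` is the inverse of 17 modulo 37, which is why subtracting 8 and then multiplying by 24 undoes the enciphering. A quick check, repeating the two functions so that the cell runs on its own:

```python
def encipher(x):
    "affine enciphering function (as defined above)"
    return (17 * x + 8) % 37

def decipher(x):
    "affine deciphering function (as defined above)"
    return (24 * (x - 8)) % 37

# 24 undoes multiplication by 17 modulo 37
print(17 * 24 % 37)   # 1

# deciphering undoes enciphering for every code point
print(all(decipher(encipher(x)) == x for x in range(37)))   # True
```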

In [13]:
def encrypt(codes):
    "encryption"
    crypto = []
    for code in codes:
        crypto.append(encipher(code))
    return crypto

In [14]:
code2text(encrypt(text2code("COMPUTING")))

'R9CQ0KITL'

In [15]:
def decrypt(crypto):
    "decryption"
    codes = []
    for code in crypto:
        codes.append(decipher(code))
    return codes


In [16]:
code2text(decrypt(text2code(secret)))

'THE_CELEBRATED_MATHEMATICIAN_ALAN_TURING_WAS_THE_FATHER_OF_MODERN_COMPUTING'

## Files

* Conceptually, a file is a sequence of data that is stored in secondary memory,
usually on a disk drive. 
* Files can contain any type of data, but the easiest files to work with are those
containing text.
* Files of text can be read and understood by humans.
* They are easily created and edited with general purpose text editors or
word processors.
* A file can be regarded as a (possibly long) string.
* A file typically consists of several lines.
* `python` uses the newline character (`\n`; ASCII and Unicode code point 10) as the end-of-line marker.

In [17]:
ord('\n')

10
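To illustrate the view of a text file as one long string with embedded newline characters, here is a small sketch. The content of `text` is made up for this example and does not come from any file used in the lecture.

```python
text = "first line\nsecond line\n"   # made-up file content

print(len(text))            # 23
print(text.count("\n"))     # 2
print(text.splitlines())    # ['first line', 'second line']
```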

### Workflow

Working with files usually involves three steps.

1. **open** the file.
2. **read** from or **write** to the file.
3. **close** the file.

### Opening a file

* The `open()` function returns a **file object** and is most commonly used with two arguments:
```
open(<filename>, <mode>)
```
* The first argument, `<filename>`, is a string containing the name of the file in question.
* The second argument, `<mode>`, is another string, usually a single character, indicating
how the file will be used:
    * mode `'r'` opens the file for reading (only) - it is an error to
      open a file for reading that does not exist;
    * mode `'w'` opens the file for writing (only) - should a file with that name exist, its
      content is erased.
* The `<mode>` argument is optional and defaults to `'r'` if omitted.
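
The behaviour of the two modes can be tried out safely in a temporary directory. The following is a minimal sketch; the file name `demo.txt` is hypothetical, and the standard modules `tempfile` and `os` are used only to avoid touching real files.

```python
import os
import tempfile

# work in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as folder:
    name = os.path.join(folder, "demo.txt")   # hypothetical file name

    # mode 'r' on a file that does not exist raises an error
    try:
        open(name, 'r')
        missing = False
    except FileNotFoundError:
        missing = True

    # mode 'w' creates the file ...
    f = open(name, 'w')
    f.write("old content")
    f.close()

    # ... and erases existing content when the file is opened again
    f = open(name, 'w')
    f.write("new")
    f.close()

    f = open(name)          # mode defaults to 'r'
    content = f.read()
    f.close()

print(missing, content)   # True new
```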

In [18]:
infile = open("students.csv", "r")
outfile = open("usernames.txt", 'w')

### Reading from a file

* `<file>.read()` reads the **entire file** and returns its content as 
a **single string**.

* `<file>.readline()` reads the **next line** of the file
and returns it as a string (including the end-of-line marker `\n`).

* `<file>.readlines()` reads the **entire file** and returns
its content as a **list of strings**, one string per line.

* `for <line> in <file>:` loops over the lines of the file.
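
The four ways of reading can be compared side by side on a small file. This is a sketch on a made-up three-line file (`demo.txt` is a hypothetical name) created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    name = os.path.join(folder, "demo.txt")   # hypothetical file name
    f = open(name, 'w')
    f.write("one\ntwo\nthree\n")
    f.close()

    f = open(name)
    whole = f.read()            # the entire file as a single string
    f.close()

    f = open(name)
    first = f.readline()        # the first line only, with its '\n'
    rest = f.readlines()        # all remaining lines as a list
    f.close()

    f = open(name)
    stripped = [line.strip() for line in f]   # loop over the lines
    f.close()

print(repr(whole))   # 'one\ntwo\nthree\n'
print(repr(first))   # 'one\n'
print(rest)          # ['two\n', 'three\n']
print(stripped)      # ['one', 'two', 'three']
```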


In [19]:
infile.readline()

'CASEY,ELLEN\n'

In [20]:
infile.readline()

'O SHEA,HUGO\n'

In [21]:
lines = infile.readlines()    # reads all remaining lines
lines2 = infile.readlines()   # nothing is left to read, so this is empty

In [22]:
lines2

[]
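
The second call returns an empty list because a file object remembers its current position, and after `readlines()` that position is at the end of the file. The sketch below shows the same effect with `io.StringIO`, an in-memory object from the standard library that behaves like an open text file; the content is made up. The `seek(0)` method moves the position back to the start.

```python
import io

# io.StringIO behaves like an open text file but lives in memory
f = io.StringIO("one\ntwo\nthree\n")

lines = f.readlines()     # reads up to the end of the file
again = f.readlines()     # nothing left to read
print(lines)   # ['one\n', 'two\n', 'three\n']
print(again)   # []

f.seek(0)                 # move the position back to the start
print(f.readlines())   # ['one\n', 'two\n', 'three\n']
```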

### Writing to a file

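* `<file>.write(<string>)` writes a string to the file at the current position, unformatted:
no newline is added, and the number of characters written is returned.

* `print(<string>, file=<file>)` writes a string to the file; standard `print` formatting
is applied, so by default a newline is appended.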
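
The difference between the two shows up in the newline characters. A minimal sketch, using an in-memory `io.StringIO` object as a stand-in for a file opened in mode `'w'`:

```python
import io

out = io.StringIO()            # in-memory stand-in for a file opened with 'w'

out.write("abc")               # no newline is added
out.write("def\n")             # newlines must be written explicitly
print("ghi", file=out)         # print appends '\n' by default
print(1, 2, 3, sep="-", file=out)   # and converts and joins its arguments

print(repr(out.getvalue()))    # 'abcdef\nghi\n1-2-3\n'
```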

In [23]:
from username import username, csv2name

In [24]:
first, last = csv2name(lines[0])

In [25]:
uname = username(first, last)

In [26]:
uname

'sculliga'

In [27]:
outfile.write(uname)

8

### Closing a file

* `<file>.close()` closes the file.

In [28]:
infile.close()
outfile.close()
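
As an alternative to calling `close()` by hand, `python` offers the `with` statement, which closes the file automatically when its block ends, even if an error occurs inside it. This goes slightly beyond the three-step workflow above; the sketch uses a hypothetical file `demo.txt` in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    name = os.path.join(folder, "demo.txt")   # hypothetical file name

    # the file is closed automatically at the end of the with block
    with open(name, 'w') as outfile:
        print("hello", file=outfile)
    print(outfile.closed)   # True

    with open(name) as infile:
        print(repr(infile.read()))   # 'hello\n'
```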

### Example:  Batch Usernames

In batch processing, input and output are done through files.
Let's generate usernames for **all** students on file.

In [29]:
def userfile(in_name, out_name):
    "read student names from the file in_name, write usernames to the file out_name"

    # open the files
    infile = open(in_name, 'r')
    outfile = open(out_name, 'w')
    
    # process input file line by line
    for line in infile:
        first, last = csv2name(line)
        uname = username(first, last)
        print(uname, file = outfile)
    
    # close the files
    infile.close()
    outfile.close()

And off we go ...

In [30]:
userfile("students.csv", "usernames.txt")

## Summary: Files

* **Text files** are multi-line strings stored in **secondary memory**.

* A file may be opened for **reading** or for **writing**.

* When opened for writing, the existing content of the file is **erased**.

* `python` provides three **file-reading methods**: `read()`, `readline()` and `readlines()`.

* It is possible to **iterate through the lines** of a file with a `for` loop.

* Data is **written** to a file using the `write()` method or the `print` function.

* Files can also be used to **store `python` programs**.

* To read a `python` program **into the current session**, the `import` statement is used.