Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [13]:
NAME = ""
COLLABORATORS = ""

---

# File handling


## Creating our reference file

In our initial example we create a small reference file by executing the following cell. The cell-magic command `%%file` takes its first argument as a filename and writes the remaining content of the cell to that file.

[More about cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

In [14]:
%%file composers.txt
dufay
ockeghem
josquin
willaert
orlando
sweelinck

Overwriting composers.txt
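Outside a notebook, where cell magic is not available, a similar file could be written with plain Python. The following is only a sketch: the temporary directory and the names `workdir`, `path` and `names` are assumptions made for illustration, chosen so that nothing in the working directory is overwritten.

```python
import tempfile
from pathlib import Path

# Hypothetical location: a temporary directory, so that the
# reference file in the working directory is left untouched.
workdir = Path(tempfile.mkdtemp())
path = workdir / 'composers.txt'

names = ['dufay', 'ockeghem', 'josquin', 'willaert', 'orlando', 'sweelinck']

# Write one name per line, each terminated by a newline character.
with open(path, 'w') as f:
    for name in names:
        f.write(name + '\n')

# Read the file back to check what was written.
content = path.read_text()
print(repr(content))
```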



## Reading

In the first exercises we recall different ways of reading text files. Exercises 1-3 have single-line answers.



#### Exercise 1
What single line is used to return the file content as a single string? Note that the string will contain the end-of-line (newline) character `\n`.

In [15]:
# YOUR CODE HERE
################
open('composers.txt').read()

'dufay\nockeghem\njosquin\nwillaert\norlando\nsweelinck\n'

An underscore `_` refers to the value of the previous command. In the test cell we are thus checking the return value of the last line in the previous cell.

In [16]:
result = _
expected = 'dufay\nockeghem\njosquin\nwillaert\norlando\nsweelinck\n'
assert result == expected, f'{result=} != {expected=}'

#### Exercise 2
What single line is used to return the file content as a Python list of strings?

In [17]:
# YOUR CODE HERE
################
open('composers.txt').readlines()

['dufay\n',
 'ockeghem\n',
 'josquin\n',
 'willaert\n',
 'orlando\n',
 'sweelinck\n']

In [18]:
result = _
expected = ['dufay\n', 'ockeghem\n', 'josquin\n',  'willaert\n',  'orlando\n',  'sweelinck\n']
assert result == expected, f'{result=} != {expected}'

#### Exercise 3
Complete this function so that it returns the lines of the file without the end-of-line characters. Use the [`strip`](https://docs.python.org/3/library/stdtypes.html#str.strip) method and a [list comprehension](https://realpython.com/list-comprehension-python/).

In [19]:
def get_list_of_lines_without_eol_character(filename):
    return  [
    # YOUR CODE HERE
    ################
    ]
 
get_list_of_lines_without_eol_character('composers.txt')

[]

In [20]:
result =  get_list_of_lines_without_eol_character('composers.txt')
expected = ['dufay', 'ockeghem', 'josquin', 'willaert', 'orlando', 'sweelinck']
assert result == expected, f'{result=} != {expected}'

AssertionError: result=[] != ['dufay', 'ockeghem', 'josquin', 'willaert', 'orlando', 'sweelinck']
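The assertion fails because the list comprehension above is still empty. One possible way to fill it in is sketched below. The function name `lines_without_eol` and the demo file are hypothetical, and this is one approach among several rather than the intended answer.

```python
import tempfile
from pathlib import Path


def lines_without_eol(filename):
    """Return the lines of a text file with surrounding whitespace stripped."""
    with open(filename) as f:
        # Iterating over a file object yields one line at a time,
        # each still ending in '\n'; strip() removes it.
        return [line.strip() for line in f]


# Demonstration on a small throwaway file (hypothetical name).
demo = Path(tempfile.mkdtemp()) / 'composers_demo.txt'
demo.write_text('dufay\nockeghem\njosquin\n')

stripped = lines_without_eol(demo)
print(stripped)
```

Note that `strip()` also removes leading and trailing spaces. If those must be preserved, `line.rstrip('\n')` removes only the newline.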

## Reading numeric data
All reading and writing is done with text objects: reading a '7' from a file yields a one-character string, not a numeric value.

To compute with numbers read from a file we have to do a type conversion (typically with `int` or `float`).

Consider a file with one number per line. 

    1
    2
    3
    4
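The difference between text and numbers can be seen in a short sketch. The variable names are chosen only for illustration.

```python
text = '7'

# With strings, + means concatenation.
as_text = text + text

# After conversion, + means addition.
as_int = int(text) + int(text)
as_float = float('2.5') * 2

# int() ignores surrounding whitespace, including a trailing newline,
# which is why a whole line holding a single number can be converted.
from_line = int('7\n')

print(as_text, as_int, as_float, from_line)
```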


### Exercise 4

Write a function that takes a filename as input and returns the sum of the numbers in the file, assuming one number per line.

In [22]:
%%file numbers1.txt
1
2
3
4
5

Overwriting numbers1.txt


In [31]:
def sum_numbers_in_file(filename):
    number_sum = 0
    
    # YOUR CODE HERE
    ################
    # f = open('numbers1.txt') #do not hard-code the filename in source
    #f = open(filename)
    #for line in f:
    #    print(line, end="")
    #    number_sum = number_sum + int(line)
    #f.close()
    with open(filename) as f:
        for line in f:
            number_sum += int(line)
    
    return number_sum 
 
sum_numbers_in_file('numbers1.txt')

15

In [32]:
result = sum_numbers_in_file('numbers1.txt')
expected = 15
assert result == expected, f'{result=} != {expected=}'

### Exercise 5

Some files have several numbers per line. A file like

    1 2 3 4 5
    6 7 8 9 10
    
cannot be handled by this function; instead we get an error like

    E           ValueError: invalid literal for int() with base 10: '1 2 3 4 5\n'
    

In this exercise, generalize the function above by copying it here. Use the `split` method to split a line into words.
Example:

    >>> '1 2 3 4 5\n'.split()
    ['1', '2', '3', '4', '5']
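Combining `split` with a type conversion gives the numbers on a single line. The snippet below is a small sketch of that step only, with illustrative variable names.

```python
line = '1 2 3 4 5\n'

# split() with no arguments splits on any run of whitespace
# and discards the trailing newline.
words = line.split()

# Each word is still a string and must be converted before summing.
numbers = [int(word) for word in words]
line_total = sum(numbers)

# Tabs and repeated spaces are handled the same way.
messy = '  6\t7   8 \n'.split()

print(words, line_total, messy)
```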
   

In [33]:
%%file numbers2.txt
1 2 3 4 5
6 7 8 9 10



Writing numbers2.txt


In [34]:
# Calling the Exercise 4 function on this file raises a ValueError
sum_numbers_in_file('numbers2.txt')

ValueError: invalid literal for int() with base 10: '1 2 3 4 5\n'

In this exercise, generalize the function from Exercise 4. Use the `split` method to split a line into words.

In [37]:
def sum_numbers_in_file2(filename):
    number_sum = 0
    # 
    # YOUR CODE HERE
    ################
    with open(filename) as f:
        for line in f:
            for word in line.split():
                number_sum += int(word)
    
    return number_sum 
   
    
sum_numbers_in_file2('numbers2.txt')

55

In [38]:
result1 = sum_numbers_in_file2('numbers1.txt')
expected1 = 15
assert result1 == expected1, f'{result1=} != {expected1=}'

result2 = sum_numbers_in_file2('numbers2.txt')
expected2 = 55
assert result2 == expected2, f'{result2=} != {expected2=}'

## Writing files

When we call the builtin function `open` with a filename as its single argument, the default behaviour is to open an existing file for reading. If the file does not exist, Python raises a `FileNotFoundError` with a message like

    open('no_such_file')
    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
    <ipython-input-15-b99a18bdaf59> in <module>()
    ----> 1 open('no_such_file')

    FileNotFoundError: [Errno 2] No such file or directory: 'no_such_file'
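A program that should not stop on a missing file can catch the exception. The following is a minimal sketch; the temporary directory is an assumption used only to guarantee that the file does not exist.

```python
import os
import tempfile

# Hypothetical path inside a fresh, empty temporary directory,
# so the file is guaranteed not to exist.
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file')

try:
    f = open(missing)
except FileNotFoundError as err:
    # err.filename holds the path that could not be opened.
    found = False
    message = f'Cannot open {err.filename}'
else:
    # Only reached when open() succeeded.
    found = True
    message = 'File opened'
    f.close()

print(message)
```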
    
 
 
To open a new file for writing we have to supply an additional argument, the mode `'w'`.

    open('new_file', 'w')
 
The `open` function returns a file object which has methods for reading (`read`, `readlines`) and writing (`write`).

You can often look up the documentation of a function with the builtin `help` function:

In [None]:
help(open)
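The reading and writing methods of a file object can be tried out in a short sketch. The append mode `'a'` is not used elsewhere in this notebook and is included only for contrast with `'w'`; the temporary directory is an assumption that keeps the working directory clean.

```python
import tempfile
from pathlib import Path

# Hypothetical location for the demonstration file.
path = Path(tempfile.mkdtemp()) / 'new_file'

# 'w' creates the file (or empties an existing one) for writing.
with open(path, 'w') as f:
    f.write('first line\n')

# 'a' appends to the end of an existing file.
with open(path, 'a') as f:
    f.write('second line\n')

with open(path) as f:
    lines = f.readlines()

# Opening with 'w' again discards the previous content.
with open(path, 'w') as f:
    f.write('replaced\n')

with open(path) as f:
    after_rewrite = f.read()

print(lines, repr(after_rewrite))
```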

### Exercise 6

Using the content of the multi-line string, write code that writes it to a file named in the form first_last.txt, with the names capitalized (First, Last) in the text.

Hint: consider string methods

    >>> 'JOHN'.lower()
    'john'
    >>> 'JOHN'.capitalize()
    'John'
    
    

In [55]:
def try_out_names(first, last):
    
    content = f"""
    'Tis but thy name that is my enemy;
    Thou art thyself, though not a {last}.
    What's {last}? It is nor hand, nor foot,
    Nor arm, nor face, nor any other part
    Belonging to a man. O, be some other name!
    What's in a name? That which we call a rose
    By any other word would smell as sweet;
    So {first} would, were he not {first} call'd,
    Retain that dear perfection which he owes
    Without that title. {first}, doff thy name,
    And for that name which is no part of thee
    Take all myself.
    """
    
    return content

def write_to_file(first, last):
    """
    Writes to file first_last.txt balcony scene with names
    Filename will be in lower case
    Names in text will have initial letter capitalized
    """
    # YOUR CODE HERE
    ################
    # 1. open a file with lower case names
    # 2. write content with capitalized names
    filename = first.lower() + "_" + last.lower() + ".txt"
    with open(filename, mode='w') as f:
        content = try_out_names(first.capitalize(), last.capitalize())
        f.write(content)


    

In [56]:
# Now write to new file and check content
from pathlib import Path
from difflib import context_diff

def rm_f(filename):
    """
    Delete a file if it exists 
    (bash: $ rm -f filename)
    """
    file = Path(filename)
    file.unlink(missing_ok=True)

rm_f('ronald_smith.txt')
write_to_file('RONALD', 'SMITH')
result = open('ronald_smith.txt').read()
expected= """
    'Tis but thy name that is my enemy;
    Thou art thyself, though not a Smith.
    What's Smith? It is nor hand, nor foot,
    Nor arm, nor face, nor any other part
    Belonging to a man. O, be some other name!
    What's in a name? That which we call a rose
    By any other word would smell as sweet;
    So Ronald would, were he not Ronald call'd,
    Retain that dear perfection which he owes
    Without that title. Ronald, doff thy name,
    And for that name which is no part of thee
    Take all myself.
    """

def diff(left, right):
    left_lines = left.splitlines(True)
    right_lines = right.splitlines(True)
    diff_lines = context_diff(left_lines, right_lines)
    return "".join(diff_lines)

assert result == expected, diff(result, expected)

In [57]:
rm_f('romeo_montague.txt')
write_to_file('ROMEO', 'MONTAGUE')
result = open('romeo_montague.txt').read()
expected = """
    'Tis but thy name that is my enemy;
    Thou art thyself, though not a Montague.
    What's Montague? It is nor hand, nor foot,
    Nor arm, nor face, nor any other part
    Belonging to a man. O, be some other name!
    What's in a name? That which we call a rose
    By any other word would smell as sweet;
    So Romeo would, were he not Romeo call'd,
    Retain that dear perfection which he owes
    Without that title. Romeo, doff thy name,
    And for that name which is no part of thee
    Take all myself.
    """
assert result == expected, diff(result, expected)

### Exercise 7
Implement a file copy command with Python read and write commands.


In [77]:
def cp(source, target):
    """
    Copies textfile source to target
    """
    # YOUR CODE HERE
    ################
    # open source for reading
    # open target for writing
    #with open(source) as f:
    #    with open(target, mode='w') as g:
    #        for line in f:
    #             g.write(line)

    # read and write the whole file at once
    with open(source) as f, open(target, 'w') as g:
        #with open(target, 'w') as g:
            #content = f.read()
            #g.write(content)
        g.write(f.read())

In [78]:
%%file blake.txt
To see a World in a Grain of Sand
And a Heaven in a Wild Flower,
Hold Infinity in the palm of your hand 
And Eternity in an hour.

Overwriting blake.txt


In [79]:
import filecmp
cp('blake.txt', 'poem.txt')
assert filecmp.cmp('blake.txt', 'poem.txt'), 'blake.txt and poem.txt differ'
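For comparison, the standard library already provides a copy function, `shutil.copyfile`. The sketch below is not part of the exercise, which asks for explicit read and write calls; the temporary directory and sample text are assumptions for illustration.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path

# Hypothetical working area so existing files are not touched.
workdir = Path(tempfile.mkdtemp())
source = workdir / 'blake.txt'
target = workdir / 'poem.txt'
source.write_text('To see a World in a Grain of Sand\n')

# Copy the content of source to target.
shutil.copyfile(source, target)

# shallow=False forces a comparison of the actual file contents.
same = filecmp.cmp(source, target, shallow=False)
print(same)
```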