## Advanced Modules

We saw in the previous topic how to import some different modules and make use of the functions contained within them. In this subtopic we will take a closer look at the `copy` module, then move on to the `io` module and look at how to deal with files, which brings its own set of challenges. This will allow us to do some simple file manipulation such as reading and writing, which is essential for future work. Very few programs rely on data entered manually by the programmer. Usually data is read into a program from a variety of different sources, so learning how to handle this now will stand you in good stead as you move on to use your programming skills for problem solving.
Dealing with files is one of the first major areas where we are likely to run into Python errors that may not be our fault. We will learn how to handle possible problems in a programmatic way using the Python keywords `try`, `except`, `else` and `finally`. This anticipation of potential problems should mean that our code doesn't just crash when something unexpected happens, but instead we catch the problems and deal with them gracefully.




### The `copy` module

In this subtopic we will start with our second look at the `copy` module. As we saw in the previous topic, this is a tiny module with just a handful of functions available. In that topic we looked at the `copy` function and saw an example of copying a list successfully. One of the exercises was for you to copy a list using the `copy` function and try changing some values to prove to yourself that things work as expected.

In this topic we will take things one step further, and identify some more problems you might encounter when copying data structures, and how to resolve those issues using the `deepcopy` function. 

It is important you understand when you can use the `copy` function, and when you must use the `deepcopy` function, because the performance difference between them can be significant. When possible you should use `copy` and only use `deepcopy` as a last resort.

Below is an example that proves that the `copy` function does indeed work for lists.

In [1]:
import copy

L = [10,8,6,4,2]

L1 = copy.copy(L)

# Let's try changing an element in L
L[0] = 20

print(L)
print(L1)

[20, 8, 6, 4, 2]
[10, 8, 6, 4, 2]


However, the copy function doesn't always get the job done for us!

In [2]:
import copy

#L is a list of lists
L = [[1,2],[3,4],[5,6]]

L1 = copy.copy(L)

# Let's change an element in L
L[0][0] = 20

print(L)
print(L1)

[[20, 2], [3, 4], [5, 6]]
[[20, 2], [3, 4], [5, 6]]


This time the copy function has let us down. We tried to make a copy of our list of lists, and when we changed an element in one list it changed across both variables. We often refer to the `copy` function as a shallow copy. In simple terms it copies just "one level deep", so if your data structure doesn't contain any nested data structures the shallow copy will work for your needs. In our first example the list `L` was just a simple collection of integer values, and because it is just one level deep it copies correctly.

In the second example, the data is buried two layers down, stored in a list which is itself contained in a list, which is why the shallow copy fails. The shallow copy creates a new outer list, but both `L` and `L1` hold references to the same inner list objects. Hence, when we change a value in one of these nested lists, the change shows up through both variables.

To deal with this, we need to copy recursively, rather than just taking a shallow copy. The `copy` module provides a `deepcopy` function to recurse through your entire data structure all the way down to the base child objects and create a new copy of each child. This would allow `L` and `L1` to be totally independent, so we could change elements in either list without fear of impacting the other.

In [3]:
import copy

#L is a list of lists
L = [[1,2],[3,4],[5,6]]

L1 = copy.deepcopy(L)

# Let's change an element in L
L[0][0] = 20

print(L)
print(L1)

[[20, 2], [3, 4], [5, 6]]
[[1, 2], [3, 4], [5, 6]]
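
One way to see what each function has actually copied is to compare object identities with the `is` operator. The sketch below checks whether the outer list and the first inner list are the same objects in the original and in each copy. The variable names `shallow` and `deep` are just chosen for illustration.

```python
import copy

L = [[1, 2], [3, 4], [5, 6]]

shallow = copy.copy(L)
deep = copy.deepcopy(L)

# Both functions create a brand new outer list
outer_shared_shallow = shallow is L
outer_shared_deep = deep is L

# Only the shallow copy still refers to the original inner lists
inner_shared_shallow = shallow[0] is L[0]
inner_shared_deep = deep[0] is L[0]

print(outer_shared_shallow, outer_shared_deep)
print(inner_shared_shallow, inner_shared_deep)
```

This should print `False False` followed by `True False`: the shallow copy has a new outer list but shares its inner lists with `L`, while the deep copy shares nothing.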


As the `deepcopy` function looks to recurse through data structures to locate the child objects it can be quite slow, and is often noticeably slower than `copy`. You should try to make sure that you use the appropriate copy function if you want your code to be performant. Below you'll find some examples which demonstrate the performance difference! There are some exercises for you to practice using the two copy functions with different data structures to ensure you know which function to use when and why.   

In [4]:
import copy
import time

# Create a large nested data structure
nested_data = []

for i in range(1000):
    inner_list = []
    for j in range(100):
        inner_list.append(j)
    nested_data.append(inner_list)

# Measure shallow copy performance
start = time.perf_counter()
shallow = copy.copy(nested_data)
shallow_time = time.perf_counter() - start

# Measure deep copy performance
start = time.perf_counter()
deep = copy.deepcopy(nested_data)
deep_time = time.perf_counter() - start

# Print the results
print("Shallow copy time:", round(shallow_time, 6), "seconds")
print("Deep copy time:", round(deep_time, 6), "seconds")

Shallow copy time: 9.9e-05 seconds
Deep copy time: 0.04539 seconds


Numbers will vary with the underlying hardware, but for me this shows a performance difference of several hundred times (roughly 450x in the run above)! We may require a deep copy of our data for the functionality of our code, but it is important for performance that we select the correct method. This is particularly of concern when making copies of "shallow" data structures, where `copy` would be sufficient, but you might be tempted to use `deepcopy` for safety. The example below shows that this is a bad idea for performance!

In [5]:
import copy
import time

# Create a large "shallow" data structure
data = list(range(100000))

# Measure shallow copy performance
start = time.perf_counter()
shallow = copy.copy(data)
shallow_time = time.perf_counter() - start

# Measure deep copy performance
start = time.perf_counter()
deep = copy.deepcopy(data)
deep_time = time.perf_counter() - start

# Print the results
print("Shallow copy time:", round(shallow_time, 6), "seconds")
print("Deep copy time:", round(deep_time, 6), "seconds")

Shallow copy time: 0.000463 seconds
Deep copy time: 0.039204 seconds


`deepcopy` is still significantly slower, even though there are no nested data structures! In the run above it is roughly 85x slower, and the difference can be around 10-100x over multiple runs.
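
A single timing can be noisy, so it can help to repeat each measurement and take an average. The sketch below does this with the standard library `timeit` module, which is not used elsewhere in this notebook. The helper functions `make_shallow` and `make_deep`, the list size and the number of runs are all arbitrary choices for illustration.

```python
import copy
import timeit

data = list(range(10000))

def make_shallow():
    return copy.copy(data)

def make_deep():
    return copy.deepcopy(data)

# Run each copy 20 times and work out the average time per call
runs = 20
shallow_avg = timeit.timeit(make_shallow, number=runs) / runs
deep_avg = timeit.timeit(make_deep, number=runs) / runs

print("Average shallow copy time:", shallow_avg, "seconds")
print("Average deep copy time:", deep_avg, "seconds")
```

As before, the exact numbers will depend on your hardware.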

### The `io` module

Although we mention the `io` module in the title of this subtopic, we can actually do the majority of the file operations that we require without importing it. In fact, most operations that needed the `io` module in previous versions of Python can now be done using built-in Python functions (in Python 3 the built-in `open` function is the same function as `io.open`). So there is generally no need for us to import `io` even though we're doing input/output tasks. This is a clear example of how Python evolves over time, and we have to try to keep up with these changes!

#### Opening and closing files

This is surprisingly simple: we can use the built-in `open` function to open a file, and the `close` method of the file object it returns to close it again, as in the following code:

In [None]:
infile = open("files/example.txt")
infile.close()

If the file we name doesn't exist then Python will raise an error (a `FileNotFoundError`). We'll see more about errors later on in this notebook.

In [None]:
infile = open("files/missingExample.txt")
infile.close()

The first example above looks for a file called `example.txt` in a folder called `files` that should be in the same directory as this notebook. We generally use these relative pathnames, which identify where a file is located in relation to where we are currently operating from. Imagine the file you're trying to open is in the folder one level above the folder we're running this notebook from. If we wanted to access that file our code would look like this:

In [None]:
infile = open("../example.txt")
infile.close()
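
If you are ever unsure which file a relative pathname refers to, one option is to ask Python to expand it into a full (absolute) pathname. The sketch below uses `os.getcwd` and `os.path.abspath` from the standard library `os` module. The files do not need to exist for this to work, and the variable names are just for illustration.

```python
import os

# The folder that relative pathnames are resolved against
current_folder = os.getcwd()

# Expand the relative pathnames used above into absolute pathnames
in_subfolder = os.path.abspath("files/example.txt")
one_level_up = os.path.abspath("../example.txt")

print(current_folder)
print(in_subfolder)
print(one_level_up)
```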

There are actually a number of different ways in which we can open a file depending on how we want to interact with that file. Here is a non-exhaustive list of some different file modes:

| Mode             | Description                                  |
|------------------|----------------------------------------------|
| `r`              | Reading (default)                            |
| `w`              | Writing (if file exists, content is wiped)   |
| `a`              | Append (if file exists, writes are appended) |
| `r+`             | Reading and writing                          |
| `a+`             | Appending and reading                        |
| `t`              | Text (default)                               |
| `b`              | Binary                                       |

By default, if we open a file without specifying the mode it will be read-only in text format. If we want to be able to write data to the file we need to open it in a mode that allows writing, such as `w`, `a`, `r+` or `a+`. If we want to both read and write through the same file object we need one of the combined modes, such as `r+` or `a+`. We select a file mode by passing it as the second argument to the `open` function like this:

In [None]:
infile = open("files/example.txt", "a+")
infile.close()
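
To see how the `w`, `a` and `r` modes from the table differ, here is a small sketch that works on a scratch file in the system's temporary folder. The filename `modes_demo.txt` is made up for this example, and the `write` and `read` methods it uses are covered in more detail in the sections that follow.

```python
import os
import tempfile

# Hypothetical scratch file in the system's temporary folder
path = os.path.join(tempfile.gettempdir(), "modes_demo.txt")

# "w" creates the file (or wipes it if it already exists)
f = open(path, "w")
f.write("first\n")
f.close()

# "w" again: the earlier content is wiped before writing
f = open(path, "w")
f.write("second\n")
f.close()

# "a": the new text is added after the existing content
f = open(path, "a")
f.write("third\n")
f.close()

# "r" (the default): read back what the file now contains
f = open(path, "r")
contents = f.read()
f.close()

print(contents)

# Tidy up the scratch file
os.remove(path)
```

The file should end up containing only the lines `second` and `third`, because the second `w` wiped the first line.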

#### Reading data from a file
Once we've opened a file we should probably do something with it! There are a number of different functions for reading data in from a file, and we'll look at some of them below. First however, we must understand the file *cursor*. Every opened file has an associated cursor, which points to a particular location in the file. If we open the file for reading only then the cursor will be at the beginning of the file. If we open the file in an append mode (`a` or `a+`) the cursor will be at the end of the file. With mode `w` the existing content is wiped, so the cursor starts at the beginning of a file that is now empty.

Now that we understand where the cursor will be located, we can make sense of the following table which explains the four main ways of reading in data from a file:

| Function     | Description                                                                      |
|--------------|----------------------------------------------------------------------------------|
| `read(n)`    | Read n characters starting from cursor; if fewer than n characters remain, read until the end of file |
| `read()`     | Read starting from cursor up to the end of the file                              |
| `readline()` | Read starting from cursor up to, and including, the end of line character        |
| `readlines()`| Read starting from cursor up to the end of the file and return list of lines     |


If we are using file mode `a+` then the cursor is already at the end of the file, so if we want to read any data in from our file we will need to move the cursor first. (Modes `a` and `w` on their own do not allow reading at all.) We can move the cursor with the `seek` method and we will see this in an example below. For these first examples we will use the default file mode, so the cursor will be at the start of the file.

In [6]:
# This reads the first 20 characters of the file and stores them in a string

infile = open("files/example.txt")
s = infile.read(20)
print(s)
infile.close()

This is an example t


In [7]:
# This reads the entire contents of the file and stores it in a string

infile = open("files/example.txt")
s = infile.read()
print(s)
infile.close()

This is an example text file.
We will want to read this data in.


In [8]:
#This will read in the entire contents of the file and store each line as a string in a list.

infile = open("files/example.txt")
print(infile.readlines())
infile.close()

['This is an example text file.\n', 'We will want to read this data in.']


In [9]:
# This reads a single line from the start of the file

infile = open("files/example.txt")
print(infile.readline())
infile.close()

This is an example text file.



In [11]:
# This reads a line from the cursor, which is at the end of the file in this case, so we read nothing!

infile = open("files/example.txt", "a+")
print(infile.readline())
infile.close()




In [14]:
# We move the cursor to position 27, counted from the start of the file, using the seek method.

infile = open("files/example.txt", "a+")
infile.seek(27)
print(infile.readline())
infile.close()

e.



If we want to, we can change the value we give to the `seek` method to move the cursor to a different position in the file. For example, `seek(0)` moves it back to the very start. Give it a try!
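
To watch the cursor move, we can ask the file object where the cursor currently is using its `tell` method, which is not used elsewhere in this notebook. The sketch below recreates the example text in a scratch file (the filename `cursor_demo.txt` is made up), opens it in `a+` mode, and then moves the cursor back to the start.

```python
import os
import tempfile

# Hypothetical scratch file containing the same text as files/example.txt
path = os.path.join(tempfile.gettempdir(), "cursor_demo.txt")
f = open(path, "w")
f.write("This is an example text file.\nWe will want to read this data in.")
f.close()

f = open(path, "a+")
position_after_open = f.tell()   # the cursor starts at the end of the file
nothing = f.readline()           # so there is nothing left to read

f.seek(0)                        # move the cursor back to the very start
position_after_seek = f.tell()
first_line = f.readline()        # now we can read the first line
f.close()

print(position_after_open, position_after_seek)
print(repr(nothing))
print(first_line)

# Tidy up the scratch file
os.remove(path)
```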

#### Writing data to a file

If we want to write information to a file, we must start as before by opening a file using the `open` function. If we want to create a new file we can simply provide a filename that doesn't already exist, together with the write or append file mode. To write to an existing file we just use its filename and set the file mode to either append or write.

Let's create a new file for writing to:

In [18]:
outfile = open("test.txt", "w")
outfile.write("Hello.")
outfile.close()

We've just created our first output file, and written the string "Hello." into it. Head to your file explorer window on the left and the file `test.txt` should be visible in the same folder as this notebook. If you try changing the string and running this code block again you'll see that the file contains only that new string. This is because we've opened the file in write mode, which always wipes the contents of the file before it starts to write.

If we change the file mode now to be append, we can add some information to that file instead.

In [19]:
outfile = open("test.txt", "a")
outfile.write("Hello Again")
outfile.close()

If you take a look at the file we've created you'll notice something looks a bit strange. All of the text is on the first line, with no spaces or breaks between the different strings that we asked Python to write to this file. Adding line breaks is something that you'll have to do manually! We can use the newline character `\n` to introduce a line break, like the one we would normally get when we hit the enter key.

In [None]:
outfile = open("test.txt", "w")
outfile.write("Hello\n")
outfile.write("Hello Again")
outfile.close()
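
The same idea extends to writing several lines from a list. The sketch below adds `\n` to each string as it is written, then reopens the file for reading to check the result. It uses a made-up scratch file, `lines_demo.txt`, in the system's temporary folder so that it doesn't interfere with `test.txt`.

```python
import os
import tempfile

# Hypothetical scratch file in the system's temporary folder
path = os.path.join(tempfile.gettempdir(), "lines_demo.txt")

lines = ["Hello", "Hello Again", "Goodbye"]

outfile = open(path, "w")
for line in lines:
    # write does not add a line break for us, so we add "\n" ourselves
    outfile.write(line + "\n")
outfile.close()

# Reopen the file for reading to check what was written
infile = open(path)
written = infile.readlines()
infile.close()

print(written)

# Tidy up the scratch file
os.remove(path)
```

This should print `['Hello\n', 'Hello Again\n', 'Goodbye\n']`, with each string on its own line in the file.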

Remember though, if you open a file in write (`w`) or append (`a`) mode, it won't allow you to read data back in from the file. If you try this you'll be given an error (`io.UnsupportedOperation: not readable`).

In [None]:
outfile = open("test.txt", "w")
outfile.write("Hello\n")
outfile.write("Hello Again")

x = outfile.readlines()
print(x)

outfile.close()
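
If we are unsure what a particular file object allows, we can ask it before trying. File objects have `readable` and `writable` methods that return `True` or `False`. The sketch below checks three different modes on a made-up scratch file, `readable_demo.txt`.

```python
import os
import tempfile

# Hypothetical scratch file in the system's temporary folder
path = os.path.join(tempfile.gettempdir(), "readable_demo.txt")

# Write mode: we can write but not read
f = open(path, "w")
w_mode = (f.readable(), f.writable())
f.close()

# Read mode (the default): we can read but not write
f = open(path, "r")
r_mode = (f.readable(), f.writable())
f.close()

# Reading and writing mode: we can do both
f = open(path, "r+")
r_plus_mode = (f.readable(), f.writable())
f.close()

print("w :", w_mode)
print("r :", r_mode)
print("r+:", r_plus_mode)

# Tidy up the scratch file
os.remove(path)
```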

### Errors and Exceptions

Errors are a normal part of programming - at some point all of you will make a mistake somewhere in your code, even the very best programmers will get things wrong. There are three types of errors that you might encounter, regardless of which programming language you are working in:
1. Syntax errors - Errors where your code doesn't correctly follow the syntactic rules of Python. Maybe you have some indentation errors, or you've forgotten a colon for example.
2. Runtime errors - Your code follows the syntactic rules of Python correctly, but encounters an issue while running, causing the code to fail.
3. Semantic errors - Your code executes correctly, but the result is not as expected. This means you have made a mistake somewhere which has altered the meaning of your program. You have to track these down manually because Python cannot help you to locate these errors.
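
Semantic errors are the hardest of the three to picture, so here is a small made-up example. Both functions below run without any complaint from Python, but only one of them calculates the average of two numbers. The function names are purely illustrative.

```python
def average_wrong(a, b):
    # Division happens before addition, so this computes a + (b / 2)
    return a + b / 2

def average(a, b):
    # The brackets force the addition to happen first
    return (a + b) / 2

print(average_wrong(2, 4))
print(average(2, 4))
```

The first call prints `4.0` and the second prints `3.0`. Python has no way of knowing that the first result is not what we meant.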

Python generally gives you detailed information on syntax errors when you first try to run your code. It will usually identify a line number for you where it has found an error, though sometimes you might have to look to the line above the one identified to find the issue.

In [None]:
def f(n)
    if n > 0:
        print("Positive")
    else
        print("Negative")

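Python kindly tells us that the syntax on line 1 is invalid, and suggests where it thinks the issue is. In this case it is exactly right: we've forgotten the colon in the function definition. Note that Python only reports the first syntax error it finds. The `else` on line 4 is also missing its colon, and we would only be told about that after fixing line 1.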

Similarly, Python will give us some helpful tips if it encounters runtime errors. Below are some examples of different runtime errors that Python can identify.

In [None]:
print(a)

In [None]:
1 + 'a'

In [None]:
2/0

In [None]:
L = [1,2,3,4,5]
L[10]

You should make yourself aware of the different types of runtime error that Python can identify, and understand what they mean, as this will help you significantly when debugging your code. You may have seen some of these errors already as you've worked through the exercises, and you should have a good idea of how you would fix these issues.

Sometimes you may encounter runtime errors due to issues a little further beyond your control. For example, maybe you're trying to read in a file, but for some reason that file no longer exists, or you're trying to analyse some data and there is an unexpected zero in the dataset which can trigger a division by zero error.

Python provides us with some tools to help with these situations where we're worried that errors may occur. We have the keywords `try`, `except`, `else` and `finally` to allow us to gracefully deal with such issues. We can use all of these keywords together in the following structure:

In [None]:
try:
    print("We try to execute the code in this block, and see if we get an error")
except:
    print("If the previous code block raises an error we will run this code block")
else:
    print("If the first code block ran successfully we run this code block instead of the one above")
finally:
    print("Always run this regardless of if there was an error or not")

The `else` and `finally` blocks are optional, and the majority of the time you will see code that just contains `try` and `except` blocks. You might be wondering what the `finally` keyword is actually useful for, and to be honest it is a good question! Generally the `finally` block is used for cleanup operations that relate to the operations in the `try` block, particularly as it will still execute even if there is a `return` statement in any of the code blocks above it.
There are a couple of examples below of how you might use these tools:

In [None]:
def div(a, b):
    return a/b

try:
    print(div(5,0))
except ZeroDivisionError:
    # Name the error we expect, so that other mistakes are not hidden
    print("You tried to divide by zero.")
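
An `except` block can name the kind of error it is prepared to handle, and a `try` block can be followed by more than one `except` block. The sketch below is one possible way to handle two of the runtime errors we saw earlier with different messages. The function name `safe_div` is made up for this example.

```python
def safe_div(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        print("You tried to divide by zero.")
        return None
    except TypeError as error:
        # "as error" gives us access to Python's own error message
        print("Those values cannot be divided:", error)
        return None
    else:
        return result

print(safe_div(5, 2))
print(safe_div(5, 0))
print(safe_div(5, "a"))
```

The first call prints `2.5`. The other two each print their own message followed by `None`. Any other kind of error would not be caught here, which is usually what we want, because it means unexpected mistakes are not silently hidden.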

In [None]:
# Start with no file, so that the finally block can tell whether open succeeded
infile = None

try:
    infile = open("files/example.txt")
except OSError:
    print("Failed to open that file")
else:    
    print(infile.readline())
finally:
    # Only close the file if it was actually opened
    if infile is not None:
        infile.close()
    
# The program carries on from here whether or not the file could be opened
print("More code can run here")

This code will gracefully handle any issues where the file you're looking for doesn't exist. If the file does exist then the `open` function should complete successfully, so we can ignore the `except` block. We move on to the `else` block and execute the code there, which reads in the first line from the file and prints it out. We then execute the `finally` block which closes the file.

If the file doesn't exist (you could try changing the filename), then as we've seen, the `open` function would usually raise an error. However, now we catch that error, and execute the `except` code block instead, which prints a helpful message to say the file couldn't be opened. The `else` block will be ignored, and we move on to the `finally` block. As we didn't manage to open the file, `infile` is still `None`, so there is nothing to close. Without that check, calling `infile.close()` on a file that was never opened would itself raise an error. The nice thing here is that some code that would usually raise an error doesn't interrupt the execution of our program: we just deal with the error and continue with the rest of our code.
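
We mentioned earlier that a `finally` block still executes even when one of the blocks above it contains a `return` statement. The sketch below is one way to convince yourself of this. The function name `div_with_cleanup` and the list `events` are made up for this example.

```python
events = []

def div_with_cleanup(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
    finally:
        # Runs just before the function hands back its result, in both cases
        events.append("finally ran")

print(div_with_cleanup(6, 3))
print(div_with_cleanup(6, 0))
print(events)
```

This prints `2.0`, then `None`, then `['finally ran', 'finally ran']`, showing that the `finally` block ran once for each call regardless of which `return` was used.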

<br>

We've gone into much greater detail in this topic than we have in previous topics. Hopefully you can now appreciate the complexity in Python that is generally hidden away unless you need to exploit it. Again, the only way to fully understand these concepts is to practice them for yourself. Move on now and take a look at the exercises in the final notebook of this topic, where you will have the chance to practice some of the things we've learnt here. You may find yourself wanting to look for further reading materials and problems to tackle, and I would encourage you to do that to ensure you keep practicing.