# Loading and Saving Data

Here we'll look at how to save or load data in numpy. There are a variety of ways to save python objects - for a more general approach to save any python object, we would encourage you to look into the [pickle package](https://docs.python.org/3/library/pickle.html).

We will cover the following:
* Exceptions and exception handling
* Opening, reading and writing to text and binary files
* Saving and loading numpy arrays

Here we will use the native functions in numpy to save and load data. The simplest is `numpy.save()`, which can be used to save a single numpy array to a file, in `.npy` file format.

### Exceptions

First we'll very quickly cover exception handling. We've encountered lots of examples before where we'd do something incorrect, and python would raise some form of exception. For example, accessing a variable that hasn't been defined leads to a `NameError`.

So far we've just let these exceptions happen. Sometimes, however, when a particular kind of exception happens we might want to "handle" the exception using some specially written code. This can be achieved using a `try` block.

In [4]:
# In the try block, we put the code we want to run, that may cause an exception to be raised.
try:
    print(undefined_variable)
except NameError:
    print("A NameError happened here, but I handled it!")

# In the except block, we specify what exceptions we will handle (here NameError) and then provide 
# a block of code to handle the exception.

A NameError happened here, but I handled it!


Note that the `NameError` exception log is not shown, as the exception was handled.

Handling exceptions is important - if an exception is not handled, python will just display the error message and trace for the exception, and immediately exit the current program. This is very often not desired behaviour. 

For example, suppose you were writing a calculator program in python. If a user tried to calculate the square root of a negative number and your code called `math.sqrt(-1)`, it would raise a `ValueError` and, if unhandled, exit the calculator program completely.

How you want to handle exceptions will depend on the situation. In the calculator example, you might handle it by showing an error message to the user, but then keep the program running.
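
As a minimal sketch of the calculator idea, the helper `safe_sqrt` below (a hypothetical name, not part of any library) catches the `ValueError`, shows a message, and lets the program carry on with the remaining inputs:

```python
import math

def safe_sqrt(value):
    """Return the square root of value, or None if it has no real square root."""
    try:
        return math.sqrt(value)
    except ValueError:
        # Tell the user what went wrong, but keep the program running.
        print(f"Cannot take the square root of {value}")
        return None

# The second input raises a ValueError inside safe_sqrt, but the loop continues.
results = [safe_sqrt(v) for v in [4, -1, 9]]
print(results)
```

Returning `None` is just one possible design choice here; a real calculator might instead ask the user for a new input.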

`try` blocks can also have a `finally` block. This contains code that will always execute, whether or not an exception is raised. The most common use case for this is to close a file or other resource even if an error happens, so it isn't left open or corrupted.

In [2]:
import math
try:
    math.sqrt(-1) # This will throw a ValueError as sqrt(-1) is not a real number.
    print("This won't be executed, as the exception has been raised")
except ValueError:
    print("Handling the exception")
finally:
    print("This will always be executed, even if an exception is raised.")


Handling the exception
This will always be executed, even if an exception is raised.


Note that if an exception is thrown inside a function, and that function doesn't handle it, control flow will leave the function early:

In [3]:
def my_func():
    print("This line will be executed")
    math.sqrt(-1)
    print("This line will never run as this function doesn't handle the exception.")

try:
    my_func()
except ValueError:
    print("Handling the sqrt(-1) error")

This line will be executed
Handling the sqrt(-1) error


If functions call other functions, generally execution will keep exiting functions until the exception is handled. If it is never handled, the program will exit.

In [4]:
def func1():
    print("Starting func1()")
    try:
        func2()
    except ValueError:
        print("Handling the exception")
    print("Finishing func1()") # This line is executed, as the exception is handled.

def func2():
    print("Starting func2()")
    math.sqrt(-1)
    print("Finishing func2()") # Note this line is never executed, due to the exception

func1()

Starting func1()
Starting func2()
Handling the exception
Finishing func1()


### Files and file handles

Python has a built in function `open()` for opening files for both reading and writing.

This returns a file handle, which is an object representing the file. You work with the file through the handle's own methods, such as `write()`, and you can also pass the handle to library functions (including `numpy.save()`).

When we open the file, we want to be exception safe. This means that even if an exception occurs, we want to make sure the file is properly closed.

Because of this we'll use a `try` block with a `finally` clause that closes the file, so even if something goes wrong during writing, the file will definitely be closed.

In [10]:
# Open the file before the try block: if open() itself fails, there is no file to close.
file_handle = open("output/test_text.txt", "w")
# The "w" argument is a string that specifies the file mode.
# Here it's "w" which means write.
try:
    file_handle.write("This is a test string")
finally:
    file_handle.close()

This works fine, but writing to files is something we do all the time, and this is a bit tedious to write. It's also easy to make a mistake and forget to close a file.

Fortunately, python has a handy `with` statement, which lets you open a file for a block of code. It automatically closes the file at the end, and is exception safe, just like the example above.

In [12]:
with open("output/test_text.txt", "w") as file_handle:
    file_handle.write("This is another test string")
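
Reading a file works the same way, using the `"r"` (read) mode. The sketch below writes a string and then reads it back. It uses a temporary directory instead of the `output/` folder, which is an assumption made only so the example runs anywhere:

```python
import os
import tempfile

# A temporary directory is used here (an assumption) so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_text.txt")

    # "w" mode: open the file for writing.
    with open(path, "w") as file_handle:
        file_handle.write("This is another test string")

    # "r" mode: open the same file for reading.
    with open(path, "r") as file_handle:
        contents = file_handle.read()  # read the whole file into one string

print(contents)
```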

### Note: Escaping Characters and Newlines

Now that we've been using strings for a while, you may have spotted a potential problem. We write a string literal using quotes `""`, but what if we want to have a string literal that actually contains quotes? For example, how do we write the string `She said "Hello!".`?

We can do this by using the special backslash character `\`. This "escapes" the following character in the string. For `"`, this means the quote will be treated as a quote character within the string, not the end of the string:

In [1]:
print("She said \"Hello!\".")

She said "Hello!".


Note the escaped quotes `\"` do not cause the string to be terminated. This brings up another issue, of how to write the character `\` in a string. We do this using a double `\\`, so for example:

In [2]:
file_path = "C:\\Windows\\System32"
print(file_path)

C:\Windows\System32


As you can see the double `\\` in the string literal are interpreted as single backslash characters in the actual string. This is useful to know on Windows, where `\` is the path separator. If you want to write code that works on multiple OSs though, I'd encourage you to look at `os.path.join`.
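
As a short sketch of the `os.path.join` suggestion, the code below builds a path from its parts. The folder and file names are made up for illustration, and the printed result depends on the operating system you run it on:

```python
import os

# os.path.join inserts the correct separator for the OS the code is running on.
# The folder and file names here are hypothetical.
data_path = os.path.join("output", "results", "test_text.txt")
print(data_path)

# os.sep holds that separator: a backslash on Windows, "/" on Linux and macOS.
print(repr(os.sep))
```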

The `\` is also used to write some special characters, including the newline character `\n` and the tab character `\t`. The newline character is used to add a new line to a string, like pressing the `ENTER` key when using a text editor:

In [3]:
print("These\nare\nmultiple\nlines\nof\ntext.")

These
are
multiple
lines
of
text.


If you're writing multiline strings with triple quotes `"""` you don't need to manually add the newline characters. Every new line in your source code, within the `"""` quotes counts as a `\n` newline character:

In [7]:
print("""These
are
multiple
lines
of
text.""")

These
are
multiple
lines
of
text.


The `\t` character adds a tab character to the string, like pressing the `TAB` key when using a text editor. How these special characters will appear really depends on what text editor you use to view them, and how its settings are configured. It's pretty common for them to appear as a space the same width as four regular spaces. Some editors align tabs to the next multiple of 4 columns, so they can be a way to show things in columns (but remember the results might look completely different to another person using a different editor).

In [6]:
print("Some\ttext\twith\ttabs.")
print("Another\tline\tof\ttext.")

Some	text	with	tabs.
Another	line	of	text.


### Exercise: Comma Separated Value Files

You now know how to write text strings to a file. One very long-lived, but still very useful, format for data is the [Comma Separated Value (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) file. CSV files can store tables of data, and are still supported by modern spreadsheet software (LibreOffice Calc, Excel, Google Sheets, Numbers etc.). They're often a good way to quickly save some data from a python script and later load it and process it in a spreadsheet program.

The format of CSV files is super simple. They're just text files, and each entry in the table is separated from the next by a special character (usually a comma `,` or a semicolon `;`). Lines of the table are separated by newline characters. Here's a quick example:

<code>
Time, Temperature <br>
0, 20 <br>
1, 24 <br>
2, 25 <br>
3, 26 <br>
4, 24 <br>
</code>

This example contains a table with two columns and 6 rows, with the headings and values shown. If you try saving this text to a file called e.g. `times.csv`, you should then be able to open it in your spreadsheet software of choice.
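
As a rough sketch, the example table could be written from python like this. The list names are made up for illustration, and a temporary directory is assumed so the example runs anywhere; in practice you would write to a location you can open from your spreadsheet software:

```python
import os
import tempfile

# Hypothetical data matching the example table.
times = [0, 1, 2, 3, 4]
temperatures = [20, 24, 25, 26, 24]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "times.csv")

    with open(csv_path, "w") as csv_file:
        # Heading row first, ending with a newline character.
        csv_file.write("Time, Temperature\n")
        # Then one line per row of the table.
        for t, temp in zip(times, temperatures):
            csv_file.write(f"{t}, {temp}\n")

    # Read the file back to check what was written.
    with open(csv_path, "r") as csv_file:
        csv_text = csv_file.read()

print(csv_text)
```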

Try saving some data about the performance of your current machine (CPU usage, memory usage etc.) to a CSV file, and then load it in Excel, or your software of choice. Then use the software to plot a line graph of the values over time. There's code to obtain this information available below.

If you get an error related to importing `psutil`, you may be able to install it with `pip install psutil` (if you run into trouble with this try saving some other data).

In [1]:
import psutil
import time
from datetime import datetime

# Open the file for writing - make sure to use "w" mode.
# Your code here...

# Write the headings (e.g. "Time, CPU Usage, Memory Usage")
# Your code here...

print("Time, Cpu Usage, Memory Usage")
for i in range(10):
    # Here's how to get some info about time, cpu usage and memory usage from psutil:
    current_time = datetime.now().strftime("%H:%M:%S")
    cpu_usage = psutil.cpu_percent(interval=1)
    percent_memory_used = psutil.virtual_memory().percent

    # Here, save these values to your CSV file rather than just printing them.
    print(current_time, cpu_usage, percent_memory_used)
    # Your code here...


Time, Cpu Usage, Memory Usage
13:25:35 7.7 38.8
13:25:36 6.5 38.8
13:25:37 4.0 38.8
13:25:38 4.7 38.8
13:25:39 3.3 38.8
13:25:40 5.7 38.8
13:25:41 3.3 38.8
13:25:42 3.3 38.8
13:25:43 5.9 38.8
13:25:44 2.8 38.8


### Saving a single numpy array

Above we saw how to open, write and read general text files in Python. If you'd like to save the values in a numpy array, numpy has some convenient functions and its own file format to achieve this.

The first function is `np.save()`, which can be used to save an array to disk. By convention we use the `.npy` file extension when saving numpy arrays.

In [1]:
import numpy as np

array = np.random.rand(10)

print(array)

np.save("output/saved_array.npy", array)

[0.52564129 0.77443034 0.7243376  0.99637794 0.35926575 0.19174401
 0.51097426 0.48825699 0.22277847 0.55514604]


One thing to note is that the text files you wrote in the code and exercises above store values as ASCII characters - when you saved numbers to the file, they were converted into a string before saving. `np.save()` instead saves the individual values in the array directly to disk in binary format.

This is more efficient - for example consider saving the floating-point number `0.123456789`. If we save this as the string `"0.123456789"` it will occupy 11 bytes, one for each character in the string. But if the number is in 32-bit floating-point format, saving it directly will only occupy 4 bytes, less than half as much disk space (numpy's default 64-bit floats occupy 8 bytes each, which is still fewer than the string). We also won't have to convert the value from binary to decimal text and back, potentially losing some precision or slightly changing the value.
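
These sizes can be checked directly. The sketch below compares the number of bytes in the text form with the size of a one-element array in 32-bit and 64-bit floating-point format (it looks only at the raw values, not at the small header that a `.npy` file also contains):

```python
import numpy as np

# The number written out as text: one byte per ASCII character.
text = "0.123456789"
text_bytes = len(text.encode("ascii"))
print(text_bytes)

# The same number stored in binary as a one-element array.
single = np.array([0.123456789], dtype=np.float32)
double = np.array([0.123456789], dtype=np.float64)
print(single.nbytes)  # size of the 32-bit value in bytes
print(double.nbytes)  # size of the 64-bit value in bytes
```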

Now, let's try loading the array and check the contents are the same. This time we'll use the `np.load()` function.

In [2]:
loaded_array = np.load("output/saved_array.npy")

print(loaded_array)
if np.array_equal(loaded_array, array): # We don't need to use allclose() here as the arrays should be identical.
    print("The arrays are the same!")

[0.52564129 0.77443034 0.7243376  0.99637794 0.35926575 0.19174401
 0.51097426 0.48825699 0.22277847 0.55514604]
The arrays are the same!
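
`np.save()` and `np.load()` also accept a file handle in place of a file name. Here is a minimal sketch, assuming a temporary directory rather than the `output/` folder. Note the `"wb"` and `"rb"` modes: the `b` opens the file in binary mode, which is needed because `.npy` files are not text:

```python
import os
import tempfile

import numpy as np

array = np.arange(5, dtype=np.float64)

# A temporary directory is used here (an assumption) so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "saved_array.npy")

    # "wb" = write, binary mode.
    with open(path, "wb") as file_handle:
        np.save(file_handle, array)

    # "rb" = read, binary mode.
    with open(path, "rb") as file_handle:
        loaded_array = np.load(file_handle)

print(np.array_equal(loaded_array, array))
```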


## Further Reading

For more info on reading/writing files, see the [official documentation](https://docs.python.org/3/tutorial/inputoutput.html#tut-files).

As stated above, I'd recommend checking out the `pickle` module as a way to serialise general python data and classes - the [documentation is here](https://docs.python.org/3/library/pickle.html).
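
To give a flavour of what `pickle` looks like, here is a minimal sketch that converts a dictionary to bytes and back. The dictionary contents are made up for illustration:

```python
import pickle

# A hypothetical python object that is not a numpy array.
record = {"name": "run 1", "temperatures": [20, 24, 25], "valid": True}

data = pickle.dumps(record)    # serialise the object to a bytes object
restored = pickle.loads(data)  # rebuild an equal object from those bytes

print(restored == record)
```

To store the object in a file instead, `pickle.dump()` and `pickle.load()` take a file handle opened in binary mode (`"wb"` or `"rb"`).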