# Data Loading

This workbook will help you explore some examples of data loading. The practice itself, however, will be carried out using Spyder. The exercises consist of loading the downloaded data, printing it on screen, and plotting it. The visualisation (plotting) tools you have acquired so far are required here to plot your data.


## Example 1

In this example we load a simple text file, `Ex1_lines.txt`, from the `data_IO` folder. Since we are opening the file to extract data from it, we open it in read mode (`'r'`). Please note that every time you redefine the variable that holds the handle (link) to the file (`fname`), the position tracked by `next()` is reset to the beginning of the file. In this case we use iteration to access the file line by line as:

In [81]:
# First we open the file
fname = open('data_IO/Ex1_lines.txt', 'r')

# Let's print the first line
print(next(fname))


This is the first line



In [82]:
# and now we print the second line
print(next(fname))


This is line 2



In [83]:
# And let's try to print them all
print(next(fname))
print(next(fname))
print(next(fname))
print(next(fname))
print(next(fname))
print(next(fname))
print(next(fname))


This is line 3

This is line 4

This is the last line


StopIteration: 

As seen here, it is not possible to access a line beyond the end of the file: the call raises `StopIteration`. For this reason, we need a safe way to explore files that avoids such errors. One option is to loop over the file, which stops automatically after the last line:


In [84]:
# First we open the file
fname = open('data_IO/Ex1_lines.txt', 'r')

for line in fname:
    print(line)
    

This is the first line

This is line 2

This is line 3

This is line 4

This is the last line
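
The blank lines in the output above appear because each line read from the file still ends with its own `\n`, and `print` adds another one. A minimal sketch of one way to avoid this, assuming a temporary stand-in file (the path `demo_path` and its contents are made up for this illustration and are not the course file):

```python
import os
import tempfile

# Hypothetical stand-in for Ex1_lines.txt, created in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_lines.txt')
fname = open(demo_path, 'w')
fname.write('This is the first line\nThis is line 2\nThis is the last line')
fname.close()

cleaned = []
fname = open(demo_path, 'r')
for line in fname:
    # rstrip('\n') removes the trailing end-of-line character, if there is one
    cleaned.append(line.rstrip('\n'))
fname.close()

# Each line is now printed without an extra blank line in between
for line in cleaned:
    print(line)
```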


Alternatively, we can load the complete content at once: `readlines()` returns a list with one string per line, while `read()` returns the whole file as a single string.

In [85]:
# Note that we are reloading the file
fname = open('data_IO/Ex1_lines.txt', 'r')

# Here we print all the lines, returned as a list of strings
print(fname.readlines())

# Note that we are reloading the file
fname = open('data_IO/Ex1_lines.txt', 'r')

# Here we are printing the content of the file
print(fname.read())


['This is the first line\n', 'This is line 2\n', 'This is line 3\n', 'This is line 4\n', 'This is the last line']
This is the first line
This is line 2
This is line 3
This is line 4
This is the last line
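
Reopening the file, as done above, is one way to return to the beginning of its content. A sketch of an alternative, using the file object's `seek` method to rewind the reading position; the temporary file and its contents are made up for the illustration:

```python
import os
import tempfile

# Hypothetical stand-in file created in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_lines.txt')
fname = open(demo_path, 'w')
fname.write('line 1\nline 2\nline 3')
fname.close()

fname = open(demo_path, 'r')
first_read = fname.read()    # reads everything; the position is now at the end
second_read = fname.read()   # nothing is left to read, so this is an empty string
fname.seek(0)                # move the position back to the start of the file
third_read = fname.read()    # the full content is available again
fname.close()

print(repr(second_read))
print(first_read == third_read)
```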


In [86]:
# Note that we are reloading the file
fname = open('data_IO/Ex1_lines.txt', 'r')

# Here we can access the second line in particular
lines = fname.readlines()
print(lines[1])


This is line 2



At this point the file `Ex1_lines.txt` is still open, so other processes may be unable to modify it. As an exercise, go to the file in Windows Explorer and try to change its name. If you picked the correct file, an error like this will appear:

![](figures/Io_Error.png)

To solve this issue, it is necessary to close the file. Please keep in mind that forgetting to do so is a common source of errors in your code. Likewise, if the file is held open by another process (another Python instance or console, for example), it may not be possible to read or write it. To close the file we can use the method `close()` as:


In [87]:
# Here we are closing the handle fname.
fname.close()


If the operation was successful, you can now work with the file (rename, move, or delete it) as you normally would.
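
Whether a handle is still open can be checked through its `closed` attribute. A small sketch on a made-up temporary file, which also wraps the reading step in `try`/`finally` so that `close()` runs even if reading fails:

```python
import os
import tempfile

# Hypothetical stand-in file created in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_close.txt')
fname = open(demo_path, 'w')
fname.write('some content\n')
fname.close()

fname = open(demo_path, 'r')
state_before = fname.closed    # False: the handle is open
try:
    content = fname.read()
finally:
    # This line runs whether or not the read above raised an error
    fname.close()
state_after = fname.closed     # True: the handle has been closed

print(state_before, state_after)
```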

To avoid errors with file opening and closing, we are going to open files with the `with` statement. `with` provides a context in which the file is open for the enclosed block of code. Once the block finishes, `with` automatically closes the file.

In practice, this looks like:


In [88]:
with open('data_IO/Ex1_lines.txt', 'r') as fname:
    # Here the file in the fname variable will be available
    for line in fname:
        print(line)

# But it won't be available here: reading raises an error because the file is closed
print(next(fname))

This is the first line

This is line 2

This is line 3

This is line 4

This is the last line


ValueError: I/O operation on closed file
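
The file is also closed when an error interrupts the `with` block before it reaches its end. A sketch of this behaviour, using a made-up temporary file and a deliberately raised `RuntimeError` as the simulated failure:

```python
import os
import tempfile

# Hypothetical stand-in file created in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')
with open(demo_path, 'w') as fname:
    fname.write('some content\n')

try:
    with open(demo_path, 'r') as fname:
        first_line = next(fname)
        # Simulate a failure in the middle of processing the file
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

# The handle was closed on the way out of the with block
closed_after_error = fname.closed
print(closed_after_error)
```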

# Data writing

Writing data follows the same pattern as reading. The file is opened in a writing mode and has to be closed once writing is finished. There are two main writing modes: write (`'w'`) and append (`'a'`). Write mode overwrites any previous content of the file, while append mode adds the new content at the end, enlarging the file.

For data writing, instead of the method `read()`, we use the method `write(str)`. This method takes as its parameter the string to be inserted in the file. To start a new line, add an end-of-line character (`\n`) to indicate that the line is finished. Please keep in mind that you can only write strings to files opened this way, so other values have to be converted first (for example with `str()`).
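
The difference between the two modes can be sketched on a throwaway file. The file name and the strings written are invented for this illustration:

```python
import os
import tempfile

# Hypothetical file in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(demo_path, 'w') as fname:
    fname.write('first\n')

# Opening again in 'w' mode discards the previous content ('first')
with open(demo_path, 'w') as fname:
    fname.write('second\n')

# Opening in 'a' mode keeps the content and adds to the end of the file
with open(demo_path, 'a') as fname:
    fname.write('third\n')

with open(demo_path, 'r') as fname:
    content = fname.read()

print(content)
```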

Let us write an example that creates a file containing a vector.


In [92]:
# Create the vector that will be written
vec = range(10)

# Open the file (fname) in write mode, using the with statement
with open('files_out/exercise_file1.txt', 'w') as fname:
    # Make a loop to iterate over the elements of the vector
    for elem in vec:
        # Write each element in a separate line of the file (fname)
        fname.write(str(elem) + '\n')
        

Now, let's take a look at the file that has just been created (`'exercise_file1.txt'`). In this file, each element of the list has been written on its own line, because we added the end-of-line character. If the end-of-line character is not added, the file will instead contain one continuous sequence of characters, as in:

In [93]:
# Create the vector that will be written
vec = range(10)

# Open the file (fname) in write mode, using the with statement
with open('files_out/exercise_file2.txt', 'w') as fname:
    # Make a loop to iterate over the elements of the vector
    for elem in vec:
        # Write each element without an end-of-line character
        fname.write(str(elem))

Now take a look at the file that has just been created (`'exercise_file2.txt'`). Here you will see that the elements were written one right after the other (`0123456789`), exactly as instructed.
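
Because only strings are written, values read back from such a file are strings as well and have to be converted before they can be used as numbers. A minimal sketch of the round trip, assuming a temporary file in place of the `files_out` folder:

```python
import os
import tempfile

# Hypothetical file in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'vector_demo.txt')

vec = range(10)

# Write one element per line, converting each number to a string
with open(demo_path, 'w') as fname:
    for elem in vec:
        fname.write(str(elem) + '\n')

# Read the lines back and convert each one to an integer;
# int() ignores the surrounding whitespace, including the trailing '\n'
with open(demo_path, 'r') as fname:
    recovered = [int(line) for line in fname]

print(recovered)
```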

# Operations with regular formats

The examples presented above are a very low-level way of interacting with arbitrary files. In many cases the format of the file is known, and it is therefore possible to access its content in a systematic way. Common data formats include XLS, CSV, JSON (in fact, this Jupyter Notebook is a JSON file), XML, HTML, netCDF, etc. For many of these formats there are specialised libraries that parse the data directly, which avoids building your own parser and enables operations such as compression.
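
As an illustration of such a library, Python's standard `json` module reads and writes JSON files directly. The record below and the file name are invented for the example:

```python
import json
import os
import tempfile

# Hypothetical record and file path, used only for this illustration
record = {'station': 'A1', 'values': [1.5, 2.0, 3.25]}
demo_path = os.path.join(tempfile.mkdtemp(), 'record_demo.json')

# json.dump converts the dictionary to text and writes it to the file
with open(demo_path, 'w') as handle:
    json.dump(record, handle)

# json.load parses the text back into Python objects
with open(demo_path, 'r') as handle:
    loaded = json.load(handle)

print(loaded['values'])
```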

For common data types, the `numpy` and `scipy` libraries have a vast collection of functions that parse data into a `numpy.array`, which can then be used for further data processing and visualisation, among other things. As an example, let's use `numpy` to open the file `data_IO/csv_example.csv`, using the `loadtxt` function.

In [94]:
# load the numpy library
import numpy as np

# Load the csv values into the variable data
data = np.loadtxt('data_IO/csv_example.csv')

# Visualise the results
print(data)

[  1.   2.   3.   5.   4.   6.   7.   8.   9.  10.]


Note that here we neither open nor close the file ourselves. This is because the `np.loadtxt` function takes care of those operations, and we are only asked to provide the path to the file.

In the same way, it is possible to write files using the `np.savetxt` function. The behaviour is quite similar, except that we have to specify both where the data will be saved and the variable to be saved. Please keep in mind that this is only possible for variables that are `np.array`, or that can be turned into one.

Let us write a square matrix (`matrix_a`) into the file `'my_new_data.txt'` as:

In [95]:
# Create the matrix to be saved
matrix_a = np.array([[1,2],[3,4]])

# Define the name of the target file where the matrix will be saved
fname = 'files_out/my_new_data.txt'

# Save the matrix in text format
np.savetxt(fname, matrix_a)
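
By default `np.savetxt` writes the values in scientific notation, separated by spaces. The sketch below shows how the `fmt` and `delimiter` arguments change this, and how the matching `delimiter` is then needed to read the file back. The temporary path is an assumption made for the example:

```python
import os
import tempfile

import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])

# Hypothetical file in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'matrix_demo.csv')

# Write integers separated by commas instead of the default format
np.savetxt(demo_path, matrix_a, delimiter=',', fmt='%d')

# Look at the raw text that was written
with open(demo_path, 'r') as handle:
    raw_text = handle.read()
print(raw_text)

# Read it back; loadtxt returns floating point values by default
matrix_back = np.loadtxt(demo_path, delimiter=',')
print(matrix_back)
```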

Furthermore, numerical variables do not have to be saved as text files; they can be saved in formats that are much more efficient for the machine to read and write. In the particular case of `numpy`, we can save files in the `.npy` format. The files are saved and loaded in a similar manner:



In [96]:
# Create the matrix to be saved
matrix_a = np.array([[1,2],[3,4]])

# Define the name of the target file where the matrix will be saved
fname = 'files_out/my_new_data.npy'

# Save the matrix in the binary .npy format
np.save(fname, matrix_a)

# load the data into matrix b
matrix_b = np.load(fname)

print(matrix_b)

[[1 2]
 [3 4]]


Due to its characteristics, the `numpy` library is best suited to saving arrays of numbers, so it is not versatile enough to write arbitrary file types. Additionally, it is not meant for objects that cannot sensibly be converted to `np.array`, such as functions, dictionaries, or lists mixing different kinds of objects.

As an alternative, the "Pythonic" way of saving objects is the pickle (`cPickle`) format. This format can save most Python objects and is efficient to read and write. As an example, we are going to save a custom list (which cannot be turned into a numeric `numpy` array) into a file, and then retrieve it back:

In [97]:
# import the cPickle module. Mind the capitalisation of the word.
# In Python 3 the module is simply called pickle, so fall back to it there.
try:
    import cPickle
except ImportError:
    import pickle as cPickle

# Create my custom list
custom_list = ['cat', 'dog', 1, 2, np.sin]

# Define the path to the file where I'm saving my list
fname = 'files_out/my_pickle_file.pkl'

# Save the list in the pickle format using the dump command (binary write mode)
cPickle.dump(custom_list, open(fname, 'wb'))

# Load the list into another variable (binary read mode)
custom_list_2 = cPickle.load(open(fname, 'rb'))

# print the loaded list
print(custom_list_2)

['cat', 'dog', 1, 2, <ufunc 'sin'>]


The `cPickle` library works with file handles, much like the plain Python way of dealing with files (manually opening and closing). Here we show yet another alternative for carrying out this task. As we are not going to use the file beyond the specific save and load calls, we can create the handle (the `open(...)` call) directly in the function argument; the handle then goes out of scope when the function returns, and CPython closes it at that point. If the handle is assigned to a variable beforehand, it outlives the function call and has to be closed explicitly. Since relying on either behaviour is easy to get wrong, we can also use the same `with` syntax as at the beginning of the session:


In [98]:
# Create my custom list
custom_list = ['cat', 'dog', 1, 2, np.sin]

# Define the path to the file where I'm saving my list
fname = 'files_out/my_pickle_file.pkl'

# Note that the file is open in binary write mode
with open(fname, 'wb') as handle:
    # Save the list in the pickle format using the dump command
    cPickle.dump(custom_list, handle)

# Note that the file is open in binary read mode
with open(fname, 'rb') as handle:
    # Load the list into another variable
    custom_list_2 = cPickle.load(handle)

# print the loaded list
print(custom_list_2)

['cat', 'dog', 1, 2, <ufunc 'sin'>]
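
In Python 3 the `cPickle` module no longer exists; its functionality is part of the standard `pickle` module. A sketch of the same round trip under that assumption, using a made-up dictionary and a temporary file in place of the `files_out` folder:

```python
import os
import pickle
import tempfile

# Hypothetical dictionary and file path, used only for this illustration
settings = {'name': 'run_1', 'steps': [1, 2, 3], 'scale': 0.5}
demo_path = os.path.join(tempfile.mkdtemp(), 'settings_demo.pkl')

# Pickle files hold binary data, so they are opened in 'wb' and 'rb' modes
with open(demo_path, 'wb') as handle:
    pickle.dump(settings, handle)

with open(demo_path, 'rb') as handle:
    settings_back = pickle.load(handle)

print(settings_back)
```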
