# PYTHON COURSE FOR SCIENTIFIC PROGRAMMING 
**Contributors:** \
Artur Llabrés Brustenga: Artur.Llabres@e-campus.uab.cat \
Gerard Navarro Pérez: Gerard.NavarroP@e-campus.uab.cat \
Arnau Parrilla Gibert: Arnau.Parrilla@e-campus.uab.cat \
Jan Scarabelli Calopa: Jan.Scarabelli@e-campus.uab.cat \
Xabier Oyanguren Asua: Xabier.Oyanguren@e-campus.uab.cat

Course material can be found at: https://llacorp.github.io/Python-Course-for-Scientific-Programming/ 

# LECTURE V : File manipulation

- Files let us store the data a program produces, so that it is still available after the program finishes
- We will learn some of the most common functions for manipulating files
- Although each operating system has its own way of creating and accessing files, Python hides these differences behind a "file handle"

### Open
Firstly, let's see how to open or create a file.

```python
nameHandle = open("File.txt", "w")
```

where:
- `nameHandle` stands for the name of the file handle
- `open()` is the function to open a file
- `"File.txt"` is the name (string) of the file we want to open
- `"w"` indicates that we want to write to this file

A file can be opened in several modes, summarized in the following table. Adding `+` to a mode also enables the operation it lacks (reading or writing).

| Mode | Behaviour | With `+` | Initial pointer position |
| --- | --- | --- | --- |
| `r` / `r+` | Read only. Raises `FileNotFoundError` if the file does not exist | Also writing | Beginning |
| `w` / `w+` | Write only. Overwrites the file if it already exists; creates it otherwise | Also reading | Beginning |
| `x` / `x+` | Write only. Raises `FileExistsError` if the file already exists; creates it otherwise | Also reading | Beginning |
| `a` / `a+` | Write only, appending to the end of the file if it exists; creates it otherwise | Also reading | End |
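
The following sketch checks a few rows of the table. It assumes the current directory is writable; the file name `modes_demo.txt` is hypothetical and the file is removed at the end.

```python
import os

filename = "modes_demo.txt"  # hypothetical file name, used only for this demo

# "w": creates the file, or truncates it if it already exists
handle = open(filename, "w")
handle.write("first line\n")
handle.close()

# "x": refuses to open a file that already exists
try:
    handle = open(filename, "x")
    handle.close()
    x_failed = False
except FileExistsError:
    x_failed = True
print("'x' raised FileExistsError:", x_failed)

# "a+": appends and also allows reading, but the pointer starts at the end
handle = open(filename, "a+")
handle.write("second line\n")  # always written at the end of the file
at_end = handle.read()         # the pointer is at the end, so nothing is read
handle.seek(0)                 # move the pointer back to the beginning
full_content = handle.read()
handle.close()
print(repr(at_end))
print(repr(full_content))

os.remove(filename)  # clean up the demo file
```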

### Write and Close

We can think of the file handle as an object with associated methods that let us manipulate the file. One of them is `write()`. Let's see an example:

```python
nameHandle = open("File.txt", "w")  # creation of the file
nameHandle.write("Hi!\nWelcome to the python course.\n")
nameHandle.write("Enjoy!\n")
nameHandle.close()
```

You may have noticed `'\n'` in the strings above. The backslash (`\`) is an escape character: it tells Python that the character following it must be treated in a special way. Here, `'\n'` stands for a newline, so the text after it starts on a new line.
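
A short sketch of how escape sequences behave in Python strings: `repr()` shows the escapes instead of interpreting them, and `'\n'` counts as a single character.

```python
line = "Enjoy!\n"
print(len(line))    # "\n" is a single character, so the length is 7
print(repr(line))   # shows the escape sequence instead of a line break

tabbed = "a\tb"     # "\t" is a tab character
path = "C:\\data"   # "\\" produces one literal backslash
print(tabbed)
print(path, len(path))
```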

When we have finished writing, we call `close()`. Writes are buffered, so `close()` makes sure all pending data is actually saved to the file (so that other programs can access it) and releases the file handle.
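
It is easy to forget `close()`. A common idiom, which this lecture also uses later for the large data file, is the `with` statement: the file is closed automatically when the indented block ends, even if an error occurs inside it. A minimal sketch, where the file name `with_demo.txt` is hypothetical:

```python
import os

filename = "with_demo.txt"  # hypothetical file name for this example

# The file is closed automatically when the "with" block ends
with open(filename, "w") as handle:
    handle.write("Hi!\nWelcome to the python course.\n")
    handle.write("Enjoy!\n")

print(handle.closed)  # the handle was closed on leaving the block

with open(filename, "r") as handle:
    content = handle.read()
print(content)

os.remove(filename)  # clean up the demo file
```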

### Read

The `read()` method returns the whole content of the file as a single string. Let's see an example:

```python
nameHandle = open("File.txt", "r")  # read only
print(nameHandle.read())
nameHandle.close()
```

```
Hi!
Welcome to the python course.
Enjoy!
```

Python treats a file opened in text mode as a sequence of lines. Consequently, we can iterate over its contents with a `for` loop.

```python
nameHandle = open("File.txt", "r")
for line in nameHandle:
    print(line)
nameHandle.close()
```

```
Hi!

Welcome to the python course.

Enjoy!
```

Notice the blank line after each line: every line read from the file keeps its trailing `'\n'`, and `print()` adds another newline of its own. Since each line is a string, we can drop the `'\n'` by slicing off its last character.

```python
nameHandle = open("File.txt", "r")
for line in nameHandle:
    print(line[:-1])  # every character except the trailing '\n'
nameHandle.close()
```

```
Hi!
Welcome to the python course.
Enjoy!
```
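
Slicing with `[:-1]` assumes that every line ends with `'\n'`; if the last line of a file has no trailing newline, its last real character is cut off. A slightly more robust sketch strips only a trailing newline with `str.rstrip("\n")` (another option is `print(line, end="")`). The sample file `lines_demo.txt` is hypothetical and deliberately has no final newline.

```python
import os

filename = "lines_demo.txt"  # hypothetical file; note: no newline after "Enjoy!"
with open(filename, "w") as handle:
    handle.write("Hi!\nWelcome to the python course.\nEnjoy!")

sliced, stripped = [], []
with open(filename, "r") as handle:
    for line in handle:
        sliced.append(line[:-1])            # drops the last character blindly
        stripped.append(line.rstrip("\n"))  # drops only a trailing newline

print(sliced)    # the last entry loses its "!"
print(stripped)
os.remove(filename)  # clean up the demo file
```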


### Readline

The `readline()` method reads a single line, starting from the current position of the file pointer, and then moves the pointer to the beginning of the next line.

```python
nameHandle = open("File.txt", "r")
print(nameHandle.readline())  # first line, including its '\n'
nameHandle.close()
```

```
Hi!
```
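
Because the pointer advances after every call, successive calls to `readline()` return successive lines, and an empty string signals the end of the file. A sketch with a hypothetical file `readline_demo.txt`:

```python
import os

filename = "readline_demo.txt"  # hypothetical file name
with open(filename, "w") as handle:
    handle.write("Hi!\nWelcome to the python course.\nEnjoy!\n")

with open(filename, "r") as handle:
    first = handle.readline()      # first line; the pointer moves to the next one
    second = handle.readline()     # second line
    third = handle.readline()      # third line
    after_end = handle.readline()  # '' once the end of the file is reached

print(repr(first), repr(second), repr(third), repr(after_end))
os.remove(filename)  # clean up the demo file
```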



To print only a specific line of the file, say the second one, we can use the following instruction:

```python
nameHandle = open("File.txt", "r")
print(nameHandle.readlines()[1])  # index 1 is the second line
nameHandle.close()
```

```
Welcome to the python course.
```



This works because `readlines()` returns a list containing all the lines of the file (each still ending in `'\n'`), so index `1` selects the second line.
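
`readlines()` loads the whole file into memory. For large files, one alternative is to skip lines lazily with `itertools.islice`, which reads only as far as needed. A sketch, where the helper `get_line` and the file `islice_demo.txt` are hypothetical and the helper returns `None` if the file is too short:

```python
import os
from itertools import islice


def get_line(filename, index):
    """Hypothetical helper: return line `index` (0-based) without its newline, or None."""
    with open(filename, "r") as handle:
        for line in islice(handle, index, index + 1):
            return line.rstrip("\n")
    return None


filename = "islice_demo.txt"  # hypothetical file name
with open(filename, "w") as handle:
    handle.write("Hi!\nWelcome to the python course.\nEnjoy!\n")

second = get_line(filename, 1)
missing = get_line(filename, 10)
print(second)
print(missing)
os.remove(filename)  # clean up the demo file
```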

### Append

Each time an existing file is opened in `"w"` mode, its content is completely **erased**. To keep it, we use `"a"` (append) mode instead. As an example, we are going to overwrite the file using `"w"`, and then add another line without deleting anything.

```python
# Overwrite the file
nameHandle = open("File.txt", "w")
nameHandle.write("Everything has been erased!\n")
nameHandle.close()
# Print the result of "w"
nameHandle = open("File.txt", "r")
print(nameHandle.read())
nameHandle.close()

# Use of "a"
nameHandle = open("File.txt", "a")
nameHandle.write("Use 'a' parameter not to overwrite it!")
nameHandle.close()
# Print the result of "a"
nameHandle = open("File.txt", "r")
print(nameHandle.read())
nameHandle.close()
```

```
Everything has been erased!

Everything has been erased!
Use 'a' parameter not to overwrite it!
```
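
`write()` does not add a newline by itself, so the text appended above ends without one, and a second append would continue on the same line. A small sketch of appending whole lines, where each string carries its own `'\n'` (the file `log_demo.txt` is hypothetical):

```python
import os

filename = "log_demo.txt"  # hypothetical file name

with open(filename, "w") as handle:      # "w": start from an empty file
    handle.write("first entry\n")

for entry in ["second entry", "third entry"]:
    with open(filename, "a") as handle:  # "a": keep what is already there
        handle.write(entry + "\n")       # add the newline explicitly

with open(filename, "r") as handle:
    lines = handle.read().splitlines()
print(lines)
os.remove(filename)  # clean up the demo file
```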


## Numpy

#### Reading data from .txt files: numpy.loadtxt

With NumPy, we can read the numbers stored in a text file and build an array from them using `np.loadtxt()`.

Let's create a new file in which we write some example values. Important: `write()` only accepts strings, so the values must be written as text (e.g. `"1 2 3"`) rather than as integers, floats, etc.

```python
import numpy as np
```

```python
npHandle = open("npex.txt", "w")
npHandle.write("1 2 3")
npHandle.close()
```

```python
np.loadtxt("npex.txt")
```

```
array([1., 2., 3.])
```
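
Writing the numbers by hand as strings works for small examples. NumPy also provides `np.savetxt()`, which formats an array as text so that `np.loadtxt()` can read it back. A minimal round-trip sketch (the file name `savetxt_demo.txt` is hypothetical):

```python
import os

import numpy as np

filename = "savetxt_demo.txt"  # hypothetical file name
original = np.array([[1.5, 2.0, 3.25],
                     [4.0, 5.5, 6.0]])

np.savetxt(filename, original)   # one row per line, values separated by spaces
restored = np.loadtxt(filename)  # reads the text back into a float array

print(restored)
print(np.array_equal(original, restored))
os.remove(filename)  # clean up the demo file
```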

It also works with two-dimensional data: each line of the file becomes a row of the array.

```python
npHandle = open("npex.txt", "w")
npHandle.write("1 0 0\n0 1 0\n0 0 1\n")
npHandle.close()
```

```python
np.loadtxt("npex.txt")
```

```
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
```

By default, `np.loadtxt()` splits each line at whitespace and returns floating-point values, which is why the output shows `1.` instead of `1`. We can change this behaviour through its parameters. For example, imagine we want an array of integers and the data we receive uses `','` as the delimiter:

```python
npHandle = open("npex.txt", "w")
npHandle.write("1,0,0\n0,1,0\n0,0,1\n")
npHandle.close()
np.loadtxt("npex.txt", dtype='int', delimiter=',')
```

```
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
```
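
`np.loadtxt()` also ignores comment lines (those starting with `#` by default, controlled by the `comments` parameter) and accepts file-like objects, not only file names. A sketch using an in-memory `io.StringIO` buffer in place of a real file, with made-up sample values:

```python
import io

import numpy as np

# In-memory text standing in for a CSV file (sample values are made up)
text = io.StringIO("# x,y\n1,10\n2,20\n3,30\n")

data = np.loadtxt(text, delimiter=",", dtype=int)  # the "# x,y" line is skipped as a comment
print(data)
print(data.shape)
```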

There are more useful parameters we can use to adapt the data to what we want. Let's see some of them.

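##### skiprows and max_rows

`skiprows` skips the given number of lines at the beginning of the file, and `max_rows` limits how many rows are read after them.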

```python
npHandle = open("npex.txt", "w")
npHandle.write("1 0 0 0\n0 1 0 0\n0 0 1 0\n0 0 0 1\n")
npHandle.close()
```

```python
np.loadtxt("npex.txt", skiprows=1, max_rows=2)
```

```
array([[0., 1., 0., 0.],
       [0., 0., 1., 0.]])
```
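
A typical use of `skiprows` is a file whose first line is a header with column names, which `loadtxt` could not convert to numbers. A sketch with a hypothetical file `measurements.csv` and invented values:

```python
import os

import numpy as np

filename = "measurements.csv"  # hypothetical file with a header row (invented values)
with open(filename, "w") as handle:
    handle.write("time,temperature\n0,20.5\n1,21.0\n2,21.4\n3,21.9\n")

all_rows = np.loadtxt(filename, delimiter=",", skiprows=1)                # skip the header
first_two = np.loadtxt(filename, delimiter=",", skiprows=1, max_rows=2)  # then read 2 rows

print(all_rows.shape)
print(first_two)
os.remove(filename)  # clean up the demo file
```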

##### usecols

`usecols` selects which columns to read, counting from 0.

```python
np.loadtxt("npex.txt", usecols=[0, 3])
```

```
array([[1., 0.],
       [0., 0.],
       [0., 0.],
       [0., 1.]])
```
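
When each column holds a different quantity, `unpack=True` returns the selected columns as separate arrays, which combines naturally with `usecols`. A sketch with invented values in an in-memory buffer:

```python
import io

import numpy as np

# Three columns: time, position, velocity (values invented for the example)
text = io.StringIO("0 0.0 1.0\n1 0.5 1.5\n2 2.0 2.5\n")

t, x = np.loadtxt(text, usecols=(0, 1), unpack=True)  # one array per selected column
print(t)
print(x)
```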

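##### ndmin (0 as default)

`ndmin` sets the minimum number of dimensions of the returned array. With the default value, dimensions of length one are squeezed out, so a file with a single row gives a 1-D array.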

```python
npHandle = open("npex.txt", "w")
npHandle.write("1 2 3 4\n")
npHandle.close()
```

```python
np.loadtxt("npex.txt")
```

```
array([1., 2., 3., 4.])
```

```python
np.loadtxt("npex.txt", ndmin=2)
```

```
array([[1., 2., 3., 4.]])
```

The result is now a 2-D array of shape `(1, 4)`.
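
`ndmin` matters when the number of rows or columns in a file can vary: by default a single-column file is also squeezed to one dimension, so code that indexes `data[:, 0]` would fail. A sketch comparing the shapes, using an in-memory buffer with invented values:

```python
import io

import numpy as np

one_column = "1\n2\n3\n"  # a single column of numbers

squeezed = np.loadtxt(io.StringIO(one_column))          # default ndmin=0
kept_2d = np.loadtxt(io.StringIO(one_column), ndmin=2)  # at least 2 dimensions

print(squeezed.shape)  # (3,)
print(kept_2d.shape)   # (3, 1)
print(kept_2d[:, 0])   # column indexing works on the 2-D result
```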

#### Reading data from .npy or .npz files: numpy.load

A `.npy` file is NumPy's own binary file format for arrays. Like `.txt` or `.csv` files, it can hold the data we want to study; the difference is that it stores the values in binary form together with the array's shape and data type, so `numpy.load()` does not need to parse any text. For large datasets, this makes loading much faster.

If you want, you can see the difference by running the next cells.

```python
from time import time
```

```python
N = 10000000  # number of random data points
with open('data.txt', 'w') as data:
    for _ in range(N):
        data.write(str(10*np.random.random()) + ',')
# no close() needed: the "with" block closes the file automatically
```

```python
start = time()

with open('data.txt', 'r') as data:
    string_data = data.read()  # the whole file as one string

list_data = string_data.split(',')
list_data.pop()  # remove the empty string left after the final ','
data_array = np.array(list_data, dtype=float).reshape(10000, 1000)

end = time()

print("### 10 million points of data ###")
print("\nData summary:\n", data_array)
print("\nData shape:\n", data_array.shape)
print(f"\nTime to read: {round(end-start,5)} seconds.")
```

```
### 10 million points of data ###

Data summary:
 [[0.10820114 7.15596723 1.58844348 ... 1.86958122 9.88226739 6.14871312]
 [5.43550898 6.55428381 1.11211584 ... 7.88261123 2.72506794 3.46990941]
 [4.50830109 8.70623338 1.09992113 ... 8.45104537 0.90751348 4.02527644]
 ...
 [1.70389254 6.63619137 3.06363746 ... 3.69386645 6.01401345 9.02886082]
 [1.84023979 3.266891   3.30508298 ... 7.60141099 6.883527   5.20704968]
 [2.31848056 3.94781645 5.37541682 ... 9.82003134 7.73077793 7.28568301]]

Data shape:
 (10000, 1000)

Time to read: 3.73539 seconds.
```


This was the timing with a `.txt` file.

Let's see the `.npy` version:

```python
np.save('data.npy', data_array)
```

```python
start = time()

data_array = np.load('data.npy')

end = time()

print("### 10 million points of data ###")
print("\nData summary:\n", data_array)
print("\nData shape:\n", data_array.shape)
print(f"\nTime to read: {round(end-start,5)} seconds.")
```

```
### 10 million points of data ###

Data summary:
 [[0.10820114 7.15596723 1.58844348 ... 1.86958122 9.88226739 6.14871312]
 [5.43550898 6.55428381 1.11211584 ... 7.88261123 2.72506794 3.46990941]
 [4.50830109 8.70623338 1.09992113 ... 8.45104537 0.90751348 4.02527644]
 ...
 [1.70389254 6.63619137 3.06363746 ... 3.69386645 6.01401345 9.02886082]
 [1.84023979 3.266891   3.30508298 ... 7.60141099 6.883527   5.20704968]
 [2.31848056 3.94781645 5.37541682 ... 9.82003134 7.73077793 7.28568301]]

Data shape:
 (10000, 1000)

Time to read: 0.01522 seconds.
```
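
Besides speed, a `.npy` file keeps the array's data type and shape, whereas a text file has to be parsed again and `np.loadtxt()` returns floats by default. A sketch comparing both round trips (the file names `dtype_demo.npy` and `dtype_demo.txt` are hypothetical):

```python
import os

import numpy as np

counts = np.arange(6, dtype=np.int32).reshape(2, 3)  # small integer array

np.save("dtype_demo.npy", counts)               # hypothetical file name
np.savetxt("dtype_demo.txt", counts, fmt="%d")  # hypothetical file name

from_npy = np.load("dtype_demo.npy")     # dtype and shape are stored in the file
from_txt = np.loadtxt("dtype_demo.txt")  # loadtxt defaults to float

print(from_npy.dtype, from_npy.shape)
print(from_txt.dtype, from_txt.shape)

os.remove("dtype_demo.npy")  # clean up the demo files
os.remove("dtype_demo.txt")
```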


Instead of using text files as we did before, it is possible, and for numerical data usually preferable, to use `.npy` and `.npz` files. The syntax changes slightly. Imagine we want to store a data array and later load it to work with it. First, we write it to a `.npy` file with `np.save()`:

```python
np.save('npex.npy', np.random.rand(4, 5))
```

and then load the data back with `np.load()` to start working with it:

```python
np.load('npex.npy')
```

```
array([[0.64902097, 0.59225889, 0.92575478, 0.21710251, 0.3885865 ],
       [0.53283003, 0.7174583 , 0.29974907, 0.92281449, 0.66936547],
       [0.74472896, 0.80151565, 0.665304  , 0.3741239 , 0.9687948 ],
       [0.42185048, 0.26979424, 0.17137164, 0.33959611, 0.95260658]])
```
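
A `.npz` file bundles several arrays into one archive. `np.savez()` stores each keyword argument as a named array (`np.savez_compressed()` does the same with compression), and `np.load()` returns a dictionary-like object. A sketch, where the file name `npz_demo.npz` is hypothetical:

```python
import os

import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = x**2

np.savez("npz_demo.npz", x=x, y=y)  # hypothetical file; each keyword becomes a named array

with np.load("npz_demo.npz") as archive:  # the archive is closed when the block ends
    names = archive.files                 # names of the stored arrays
    x_loaded = archive["x"]
    y_loaded = archive["y"]

print(names)
print(np.array_equal(x, x_loaded), np.array_equal(y, y_loaded))
os.remove("npz_demo.npz")  # clean up the demo file
```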

## Further topics

Beyond the functions and topics covered in this course, Python offers many more features. Some of them are:
- Object Oriented Programming
- Assertions, error handling and raising exceptions
- `map()` and lambda functions
- Installing other packages with the Anaconda installer: *conda install*
- Using `pip` to install additional packages not available in the conda repository.
- ...

More free courses on Python and other programming languages can be found on:
- EdX
- Coursera
- ...