# PYTHON COURSE FOR SCIENTIFIC PROGRAMMING 
**Lecturer and main contributor to this lesson:**

Xabier Oianguren Asua: oiangu9@gmail.com

**The lecture is based on previous material prepared by:**

Jan Scarabelli Calopa: Jan.Scarabelli@e-campus.uab.cat 

---

# LECTURE V : File manipulation

### $(1.)$ - [*FILE HANDLING*](#1)
### $(2.)$ - [*MANIPULATE DIRECTORY STRUCTURES*](#2)

---
---

<a id='1'></a>
## $(1.)$ File Handling


Although each OS has its own way of creating and accessing files, Python offers its own file manipulation system, built around a common interface known as a "file handle".

### (a) Open


The key function for working with files in Python is the `open()` function. Its two main parameters are the name of the file and the `mode`.

There are four different modes for opening a file:

| Indicator (plain / with `+`) | Opens a file to... | The `+` variant adds | Initial pointer position |
| --- | --- | --- | --- |
| `r` / `r+` | Read it. | writing | Beginning |
| `w` / `w+` | Write on it. It **overwrites** the file if it already existed. Creates a new file otherwise. | reading | Beginning |
| `x` / `x+` | Write on it. A `FileExistsError` is raised and execution stops if it already existed (safe mode). Creates a new file otherwise. | reading | Beginning |
| `a` / `a+` | Write at the end of the file if it already exists. Creates a new file otherwise. | reading | End |

(In general, my advice is to use the modes without `+`, to avoid confusion.)

Let's see an example. We want to open a file called `"File.txt"` in `"w"` mode. That is, we want to create a new file with that name, or overwrite any existing file with that name, in order to write data to it. The instruction is `nameHandle = open("File.txt", "w")`,

where:
- `nameHandle` stands for the name we have invented to handle this file. That is, it will be our "access" to the file. If we want to write data to this file we will tell this object to do so.
- `open()` is the function to open a file, as we have already said.
- `"File.txt"` is the name (string) of the file we want to open.
- `"w"` indicates we want to write on this file (overwriting any previous content).
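Putting the pieces of the list above together, a minimal sketch could look as follows. The attributes `.name` and `.mode` are only printed here to inspect the handle, and the final `.close()` is explained in the next section.

```python
# Open (create or overwrite) "File.txt" in write mode
nameHandle = open("File.txt", "w")

print(nameHandle.name)  # the file name the handle points to
print(nameHandle.mode)  # the mode it was opened with

nameHandle.close()      # release the file once we are done
```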

##### Note on Encodings
As a side note, remember that in the end everything in the memory of the computer is saved and manipulated as bits (0s and 1s). This means that nowhere in memory is there really a letter `a`. Instead, if for example we want to encode the 27 characters of the alphabet, we could give each letter a unique tag using sequences of 5 bits (there are 32 possible sequences, since $2^5=32$). We could then say that `a` is the sequence $00000$, `b` is $00001$, `c` is $00010$, etc., so that the text `acb` is saved in memory as $000000001000001$. When we want to read it again, we need to specify the encoding in which the text was stored in order to convert the bits back to characters.
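The toy 5-bit code described above can be sketched in a few lines. This is only an illustration: the names `codes`, `encode` and `decode` are made up for this example, and only the 26 ASCII lowercase letters are covered to keep it short.

```python
import string

# Toy 5-bit code: 'a' -> 00000, 'b' -> 00001, 'c' -> 00010, ...
codes = {letter: format(i, "05b") for i, letter in enumerate(string.ascii_lowercase)}

def encode(text):
    """Concatenate the 5-bit tag of every letter."""
    return "".join(codes[ch] for ch in text)

def decode(bits):
    """Cut the bit string into chunks of 5 bits and look each one up."""
    letters = {code: letter for letter, code in codes.items()}
    return "".join(letters[bits[i:i + 5]] for i in range(0, len(bits), 5))

encoded = encode("acb")
print(encoded)          # 000000001000001
print(decode(encoded))  # acb
```

Without knowing the table `codes` (the encoding), the string of bits could not be turned back into text.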

One of the first encoding standards for text was the so-called ASCII standard, based on the alphabet and symbols of English. It uses 7 bits to encode 128 different characters, among them the alphabetic characters, the numerical digits, symbols like `?` or `!`, and control characters such as `\n`, which represents a line break.

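Today there are several standardized encodings, and each operating system (OS) has its preferred ones. ASCII was the main encoding used on the Internet until December 2007, when it was surpassed by UTF-8, a variable-width encoding that uses between one and four bytes (8 to 32 bits) per character. UTF-8 is backward compatible with ASCII and can also represent accented letters and symbols from other writing systems.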

In the function `open()`, we can also add an argument called `encoding`, indicating the **encoding** of the characters we will deal with in this file. It must be passed by keyword, because the third positional argument of `open()` is a different one (`buffering`). If we do not specify it, Python uses a default encoding that depends on the OS and its configuration. In general, we can specify any of the encodings listed in the following table of the official Python documentation:

[https://docs.python.org/3/library/codecs.html#standard-encodings](https://docs.python.org/3/library/codecs.html#standard-encodings)

For example, we could write `encoding='utf_8'` or `encoding='ascii'`, say:

`open("File.txt", "w", encoding="utf_8")`
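As a small sketch of why the encoding matters, the following writes a text with non-ASCII characters and reads it back, passing the encoding by keyword as `encoding=`. The file name `"Encoded.txt"` and the text are arbitrary choices for this illustration.

```python
# Write a text with non-ASCII characters, stating the encoding explicitly
handle = open("Encoded.txt", "w", encoding="utf_8")
handle.write("Llabrés, Šuvakov")
handle.close()

# Read it back with the same encoding to recover the same characters
handle = open("Encoded.txt", "r", encoding="utf_8")
text = handle.read()
handle.close()
print(text)

# In UTF-8 the accented letters need more than one byte each,
# so the text takes more bytes than it has characters
print(len(text), len(text.encode("utf_8")))
```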


### (b) Write and Close

The file handle we have created (whatever we decide to call the variable, for instance `fileHandle`) is in reality an object with a pointer to a specific position of the text file. You can imagine it as the **cursor** that appears when you are writing something on a computer, except that this time you cannot see where the cursor is (this is what we called "Pointer position" in the first table of the lecture).

Now, this file handle is like a variable that not only contains information, but also has associated functions (methods) that allow the user to do multiple things to the file it points to (it is a class instance, as we will comment at the end of the lecture). One of these functions is `.write()`. The string we pass to it as an argument is written to the file at the position of the cursor, that is, right after the last thing we added. This means we can call it multiple times and the added text will appear sequentially in the file.

After manipulating the file, we **must** call the `.close()` function! Only then can we be sure that the text has really been written to the file. Until then, parts of the text we have sent using `.write()` might be left floating in a limbo (called a text **buffer**).

Let's see an example.
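A minimal sketch of such an example follows. The strings are an assumption, chosen so that the resulting file matches the content shown right after.

```python
fileHandle = open("File.txt", "w")

# Each call continues where the previous one stopped (the cursor position)
fileHandle.write("Hi!\n")
fileHandle.write("Welcome to the python course.\n")
fileHandle.write("Enjoy\n")

fileHandle.close()  # only now are we sure that everything is in the file
```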

If you now open the file `"File.txt"` that has been created in the directory where the Jupyter Notebook is, you will see that it contains the following text:
```
Hi!
Welcome to the python course.
Enjoy

```

You may have noticed the strange character `\n` we introduced in the writing function as part of the string. The character `\` is an escape character, meaning that the character following it must be treated in a special way. In this case, for example, the string `\n` indicates the beginning of a new line. Another example is `\t`, which is interpreted as a TAB space.

Or, for example, imagine we would like to write a double or single quote, `"` or `'`, to the file. How could we do this? Since they are the string delimiters in Python, a quote of the same kind as the one delimiting the string would end it prematurely. Try it. For this, we have the special characters `\"` and `\'` respectively!
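The escape sequences mentioned so far can be tried in a short sketch (the sentences are arbitrary). The last lines show an alternative that also works: delimiting the string with the other kind of quote.

```python
# \t is a tab, \n a line break, \" and \' are literal quotes
text = "Tab:\there\nShe said \"hi\" and it\'s fine\n"
print(text)

# Alternative: delimit the string with the other kind of quote
same = 'She said "hi"'
print(same)
```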

Once again, after having edited the file, we want to save the changes and let other programs access its contents. To do so, we must use the `.close()` function.


#### Notes on Relative and Absolute Paths
As we said, the result of this operation is the creation of the file in the same directory (**directory==folder**) where the Jupyter Notebook is saved. But is it possible to create files in other locations? Absolutely!

For that, instead of just introducing the name of the file in `open()` as `"File.txt"`, we need to introduce the **path** to the directory where we want it to be saved. The path is the location of a file or directory in the tree of directories of the persistent memory of the computer. We can provide it in two ways. A **relative path** is a path given starting from the point where the Python interpreter is being executed (in our case, from the directory of the Jupyter Notebook downwards). For instance, imagine that the folder "Python_Course", where this Jupyter Notebook called `Lecture_05.ipynb` is, contains a folder "Folder_1", which in turn contains a folder "Save_the_texts_here_Folder".

Then, if we want to save the text in the folder "Save_the_texts_here_Folder", we write:

`open("Folder_1/Save_the_texts_here_Folder/File.txt", "w")`

This path `"Folder_1/Save_the_texts_here_Folder/File.txt"` is a so called **relative path**.

If instead we want to save things anywhere in the directory structure, we need to write the full path from the beginning of the tree of folders, for example something like `"/home/xabier/Documents/UAB/My_texts/File.txt"`. This is called an **absolute path**, since there is no doubt about where we are trying to save the file. That is, even if we move the Jupyter Notebook to another place in memory, the text file will always be created in the same place!

Of course, this will only work if folders like `"Folder_1"` or `"Documents"` already exist! Make sure they do. We will see in a moment how to create new directories using Python.
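To see the difference between the two kinds of path, here is a small sketch using `os.path.isabs()` and `os.path.abspath()`, two functions of the standard `os.path` module that are not otherwise covered in this lecture. Nothing is created on disk; the path is only inspected.

```python
import os

relative = "Folder_1/Save_the_texts_here_Folder/File.txt"

# A relative path depends on where the code is being run
print(os.path.isabs(relative))

# abspath() prepends the current working directory to it
absolute = os.path.abspath(relative)
print(os.path.isabs(absolute))
print(absolute)
```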

##### Important note for Windows vs Linux/MacOS users
In Linux and MacOS, the directory structure always starts at `"/"` and subfolders are separated with a slash `"/"`, like `"/home/navau/Documents/UAB/My_texts/File.txt"`. However, in Windows, directory trees start at the drive in which data is saved, typically `"C:\"`, and the separation between subdirectories is specified with a backslash `"\"`. For example `"C:\Users\navau\Documents\UAB\My_texts\File.txt"`.
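One practical consequence for Windows paths: the backslash is also the escape character of Python strings, so it has to be doubled or the string has to be declared as raw (prefix `r`). The sketch below illustrates this and, as one possible alternative, builds a path with `os.path.join()`, which uses the separator of the OS the code runs on.

```python
import os

# In a normal string the backslash starts an escape sequence,
# so either double the backslashes or use a raw string (prefix r)
doubled = "C:\\Users\\navau\\Documents\\UAB\\My_texts\\File.txt"
raw = r"C:\Users\navau\Documents\UAB\My_texts\File.txt"
print(doubled == raw)

# os.path.join inserts the separator of the OS the code runs on
portable = os.path.join("Folder_1", "Save_the_texts_here_Folder", "File.txt")
print(portable)
```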

##### Comment on File names
If you really are into jumping from the naive user side to a more advanced computer-user side, you should **stop naming files** with **spaces** and strange characters today! This is because many command-line tools (not Python itself, but you will save yourself problems in the future) interpret the space as a separator and will stop reading the name of the file at that point. That is, avoid naming files as `"¡My first text file is sugoi!.txt"` and instead name them `"My_first_text_file_is_sugoi.txt"` or something of the like.

##### Comment on Buffers
In reality, when you tell the file handle `.write("this or that")`, the text is not immediately saved in the text file on disk. This is because these tasks are managed by the operating system, and if we kept interrupting its activity by telling it to write little messages, the task could get very inefficient. Instead, the text is first accumulated by Python in a temporary so-called **buffer** (in the RAM). When this buffer is full, the whole chunk of information is handed to the operating system, which takes care of writing it to disk as soon as possible. Note that the operating system will in fact keep that text in yet another buffer until it finds the moment to save it to disk. When you call `.close()`, not only does Python stop holding the file open, letting other programs open it safely, but Python's buffer is also flushed, so the text is handed over to the operating system.

Of course, if the computer suddenly shuts down while `.close()` has still not been called, the data written with `.write()` could be lost forever (because it was still in an intermediate buffer). To avoid this and explicitly force Python to send the buffer as it is to the operating system, you can use `fileHandle.flush()`. In addition, you can force the operating system to write to disk using `os.fsync()`, a function from the `os` library (imported with `import os`) that takes the file descriptor of the handle, `fileHandle.fileno()`, as its argument. Yet, normally you need not force the operating system to do so; it is fast enough for general use cases.
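A minimal sketch of both calls follows. The file name `"Log.txt"` and the second handle `reader` are assumptions made only to show that the text is visible to others before `.close()` is called.

```python
import os

fileHandle = open("Log.txt", "w")
fileHandle.write("an important result\n")

fileHandle.flush()              # Python buffer -> operating system
os.fsync(fileHandle.fileno())   # operating system buffer -> disk

# Another handle can already read the text, although we have not closed yet
reader = open("Log.txt", "r")
seen = reader.read()
reader.close()
print(seen)

fileHandle.close()
```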

### (c) Read

Just as we had the function `.write()`, we also have `.read()`, which allows us to read the content of a file opened in read mode `"r"` (remember the encoding argument in case the text you read looks like nonsense).

Calling `.read()` will output a string which will be the text of the file. We can save it to a variable for further processing, or just display it with `print()`.

Let's see the following example
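A minimal sketch, assuming the file content used earlier in the lecture (the first lines recreate the file so that the example runs on its own):

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
content = fileHandle.read()   # the whole file as a single string
fileHandle.close()

print(content)
```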

Yes, even when reading it is good practice to call `.close()` when we are done, even though in this case it is usually safe to have multiple readers reading the same file.

Note that if the text file is very big (tens of gigabytes, say, although this is not the most typical case), reading the whole file at once could be problematic. For this, the `.read()` function accepts an integer argument indicating the number of characters we want to read from the file:
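For instance, a sketch that asks for only 5 characters of the same example file (recreated first so that the code is self-contained):

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
first = fileHandle.read(5)   # only the first 5 characters
fileHandle.close()

print(repr(first))  # repr() makes the line break visible as \n
```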

Note that the `\n` for the line break counts as a character as well!

Then, if we asked for another 5 characters before closing the file, we would receive the following 5. This is because the **cursor** of this particular handle has moved 5 positions forward.
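A sketch of two consecutive reads on the same handle, under the same assumptions about the file content:

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
first = fileHandle.read(5)
second = fileHandle.read(5)   # continues from where the cursor was left
fileHandle.close()

print(repr(first), repr(second))
```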

Don't forget you could save the read string to a variable!

Now, we can iterate over a file as well. When iterated, a file handle behaves like a sequence of strings where each element is a line of the file (lines being the fragments separated by `\n`s, each one including its final `\n`).

Consequently, we can use a `for` loop to iterate over the lines of the file:
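A minimal sketch of such a loop. The list `lines_seen` is only added here to keep the lines for later inspection.

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
lines_seen = []
for line in fileHandle:      # one line of the file per iteration
    print(line)
    lines_seen.append(line)
fileHandle.close()
```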

Notice the blank line between lines. This is because each `line` is a string ending in `\n`, so when we print it an extra new line appears, since `print()` adds a line break by default (this is the reason why you needed extra care in the star printing exercise in lecture 2).

As each `line` is a string, it is possible to avoid the `'\n'` by not taking the last character of the string.
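This can be sketched with a slice that leaves out the last character. The final comment mentions `str.rstrip()` as an alternative, which is not part of the lecture.

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
clean_lines = []
for line in fileHandle:
    clean_lines.append(line[:-1])   # drop the last character, the "\n"
    print(line[:-1])
fileHandle.close()

# line.rstrip("\n") is a safer alternative: it only removes a "\n" if there is one
```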

#### Readline

The `.readline()` function allows us to read just one line, without the need of iterations. Which line will be read will depend on the cursor's current position.
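A short sketch with two consecutive calls, assuming the same example file as before:

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
line_1 = fileHandle.readline()   # from the cursor up to (and including) the first "\n"
line_2 = fileHandle.readline()   # the cursor has moved, so this is the next line
fileHandle.close()

print(repr(line_1), repr(line_2))
```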

#### Readlines

This function returns a list of strings containing all the lines of the file in order:
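A sketch of its use on the example file (recreated first so that the code runs on its own):

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
all_lines = fileHandle.readlines()   # a list with one string per line
fileHandle.close()

print(all_lines)
```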

If we wanted to print only a specific line in the file, let's say the second line, we could use the following instruction.
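One way to do it, sketched under the same assumptions, is to index the list returned by `.readlines()`:

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
second_line = fileHandle.readlines()[1]   # index 1 is the second line
fileHandle.close()

print(second_line)
```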

### (d) Append

Each time an existing file is opened in `"w"` mode, its content is **completely** overwritten and the `.write()` method starts writing from the beginning. To avoid this, we can use the append mode `"a"`. When we call `.write()` in this mode, the text is written after the last line of the file, without overwriting anything.

As an example, we are going to modify the file using `"w"`, and then we are going to add another line without deleting anything.
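A minimal sketch of those two steps (the two sentences written to the file are arbitrary):

```python
fileHandle = open("File.txt", "w")        # overwrites whatever was there
fileHandle.write("A brand new first line\n")
fileHandle.close()

fileHandle = open("File.txt", "a")        # keeps the content, cursor at the end
fileHandle.write("An appended second line\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
content = fileHandle.read()
fileHandle.close()
print(content)
```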

If the file did not exist `"a"` will create a new one, just like `"w"`.

Now, what if we do not want any file to be overwritten or appended to? If we only want to create new files, and we want to be stopped whenever we try to write a file with the same name as an already existing one, we can use the `"x"` mode.
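The behaviour of `"x"` can be sketched as follows. The file name `"Unique.txt"` is hypothetical, and the `try`/`except` is only used here to show the error without stopping the execution.

```python
import os

# Start from a clean state so that the sketch can be run repeatedly
if os.path.exists("Unique.txt"):
    os.remove("Unique.txt")

fileHandle = open("Unique.txt", "x")   # works: the file did not exist
fileHandle.write("first version\n")
fileHandle.close()

try:
    fileHandle = open("Unique.txt", "x")   # the file exists now
except FileExistsError as error:
    refused = True
    print("Refused to overwrite:", error)
else:
    refused = False
    fileHandle.close()
```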

### (f) Tell and Seek

Now, imagine we would like to write or read not where the cursor of a certain file handle currently is, but somewhere else, or imagine we want to do something depending on where the cursor is in the file. For this we have two methods.

The method `.tell()` lets us know how far from the beginning of the file the cursor of a file handle currently is (for plain ASCII text, this is the number of characters).

The method `.seek()` takes an integer as its argument and moves the cursor to that position (relative to the beginning of the file).

Here is a silly example of their usage:
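A minimal sketch, assuming the plain ASCII example file used before (for such a file the positions coincide with the number of characters):

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

fileHandle = open("File.txt", "r")
start = fileHandle.tell()        # we are at the beginning of the file
fileHandle.read(3)               # read "Hi!" and move the cursor
after_read = fileHandle.tell()   # position after those three characters
fileHandle.seek(0)               # jump back to the beginning
again = fileHandle.read(3)       # the same three characters once more
fileHandle.close()

print(start, after_read, again)
```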

### (g) With

Finally, it is worth noting a very convenient way to open a file, in which closing the file is automatically done and you will not need to worry about it. This is done with a `with` statement:
```
with open("File.txt", "r") as fileHandle:
    # the indented code that uses fileHandle goes here, for example:
    content = fileHandle.read()
```
Under this statement, `fileHandle` works as expected, but when the block of code finishes, the `.close()` method is called automatically (even if an error occurs inside the block).

You can of course give the name you wish to the handle and you can use the mode you prefer `'r'`, `'a'` etc.

Note how, if we try to use the handle again after the block, Python complains: the variable still exists, but the file it pointed to is already closed.
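A sketch of what happens after the block. In current Python versions the handle variable remains defined but is marked as closed, and reading from it raises a `ValueError`.

```python
# (Re)create the example file so that this sketch runs on its own
with open("File.txt", "w") as fileHandle:
    fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")

with open("File.txt", "r") as fileHandle:
    first_line = fileHandle.readline()

print(fileHandle.closed)   # the with block closed the file for us

try:
    fileHandle.read()      # using the handle after the block fails
except ValueError as error:
    message = str(error)
    print("Cannot read:", message)
```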

#### Last Remark

It goes without saying that you can open multiple handles that read the same file, or multiple handles that write to different files simultaneously (the case of multiple handles all writing to the same file is a bit more tricky, as you can imagine, and might not make much sense). You can similarly see what happens for the other opening modes.
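As a sketch, two handles reading the same file and a third one writing to a different file (`"Copy.txt"` is a hypothetical name chosen for this example):

```python
# (Re)create the example file so that this sketch runs on its own
fileHandle = open("File.txt", "w")
fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")
fileHandle.close()

reader_1 = open("File.txt", "r")
reader_2 = open("File.txt", "r")
writer = open("Copy.txt", "w")

a = reader_1.readline()   # each handle has its own cursor...
b = reader_2.readline()   # ...so both get the first line
writer.write(a + b)

reader_1.close()
reader_2.close()
writer.close()
print(repr(a), repr(b))
```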

Or equivalently with some nested `with` statements.
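A sketch with nested `with` statements that copies one file into another (`"Copy.txt"` is again a hypothetical name). The last lines show that a single `with` can also open several files at once.

```python
# (Re)create the example file so that this sketch runs on its own
with open("File.txt", "w") as fileHandle:
    fileHandle.write("Hi!\nWelcome to the python course.\nEnjoy\n")

with open("File.txt", "r") as reader:
    with open("Copy.txt", "w") as writer:
        for line in reader:
            writer.write(line)

# Both files are closed here; a single `with` can also open several files
with open("File.txt", "r") as original, open("Copy.txt", "r") as copy:
    identical = original.read() == copy.read()
print(identical)
```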

## Saving and Loading a Dictionary: the JSON format
In general, all the functions that read or write files that you will find in Python libraries (e.g. the one we saw to import Excel files) use the `open()` function and file handles.

Another example is the `json` library. There is a standard format for data structures akin to Python dictionaries, called `JSON`, which is used in many databases, for example. Thus, data you download from a database will often be in this format. It is useful to know how to import those files and how to export your dictionaries back to files you can share. You could write a function that reads them using `open()` and does all the parsing, but somebody has already done that for you.

We will need to first write `import json` (you import a set of functions related with it), then there are two functions you will use inside:

#### `json.dump(<dictionary>, <file_handler_to_save>)` 
We place the dictionary as the first argument and then the file handle of the new file to write. If you only give the name of the file when opening it (the standard is to give it the `.json` extension), the file will be generated in the same place where the notebook is.

```
d = {'People': ["Arnau", "Gerard", "Artemis", "Jan", "Xabier"],
     'Surnames': ['Parrilla', 'Navarro', 'Llabrés', "Scarabelli", "Oianguren"],
     'Favourite Number': [1, 2, 3, 4, 5],
     'Counter': 1000
    }
```

```
import json

f = open("MyDictionary.json", "w")
json.dump(d, f)
f.close()
```

#### `json.load(<file_handler_to_load>)`

We place the file handle as the argument of the function and it returns a dictionary. As simple as that.

```
with open("MyDictionary.json", "r") as handler:
    D = json.load(handler)
print(D)
```

```
{'People': ['Arnau', 'Gerard', 'Artemis', 'Jan', 'Xabier'], 'Surnames': ['Parrilla', 'Navarro', 'Llabrés', 'Scarabelli', 'Oianguren'], 'Favourite Number': [1, 2, 3, 4, 5], 'Counter': 1000}
```


---
<a id='2'></a>
## $(2.)$ Manipulate Directory Structures

With Python we can not only program the reading, writing and formatting of text files, but also manipulate directories and files in general: create folders, delete files, etc.

These things are generally done with the `os` package, which we need to import.

```
import os
```

#### The working directory (the implicit part of the relative paths)

A first thing we might want to do in a program is to find out where in the directory structure the code is being run. For example, if we are running this Jupyter Notebook, we might want to have the absolute path of the directory where the Notebook is. It is there where the Python interpreter is working, meaning that by default (when using relative paths) the images, files and other outputs we generate, as well as the input files we look for, are searched for in this **current working directory** (or **cwd**).

We can get this path with the function `getcwd()` of the `os` library, that is, by calling `os.getcwd()`.

```
my_current_path = os.getcwd()

print(f"Currently code is being executed in {my_current_path}")
```

```
Currently code is being executed in /home/melanie/Desktop/Python-Course-for-Scientific-Programming/TEMPLATES_TO_TAKE_NOTES
```


Consequently, a file opened with just its name and a file opened with the working directory followed by that name are the exact same file (writing to one overwrites the other).
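A sketch of this equivalence. The variable names `relative_name` and `absolute_name` are made up for the example, and `os.path.join()` is used to glue the working directory and the file name together.

```python
import os

relative_name = "File.txt"
absolute_name = os.path.join(os.getcwd(), "File.txt")

first = open(relative_name, "w")
first.write("written through the relative path\n")
first.close()

second = open(absolute_name, "w")    # same file: it gets overwritten
second.write("written through the absolute path\n")
second.close()

with open(relative_name, "r") as check:
    content = check.read()
print(content)
```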

We could even change the working directory to a desired path with `os.chdir()`:

```
os.chdir("/home/melanie/Desktop/")  # we will leave it where it is, but you could change it if you wish
```

```
os.getcwd()
```

```
'/home/melanie/Desktop'
```

#### Creating directories

We can create a desired directory structure using `os.makedirs()`, which takes two main arguments. First we specify, as a string, the directory or chain of nested directories we wish to create (a path, relative or absolute). Then comes a boolean keyword argument called `exist_ok`, specifying what to do in case the directory we are trying to create already exists. If it is `True` and the directory already existed, it is left as it was and the code continues. If it is `False` (the default) and the directory already existed, an error is raised and the code execution stops. **Note**: Nothing is overwritten in either case!

```
os.makedirs("A_new_top_directory/A_new_in_between_dir/A_bottom_directory", exist_ok=True)
```

You can check that the directories have been created where the notebook is.
Now, if you re-execute the same line, nothing will happen, but if you execute the next line an error will be raised:

```
os.makedirs("A_new_top_directory/A_new_in_between_dir/A_bottom_directory", exist_ok=False)
```

```
FileExistsError: [Errno 17] File exists: 'A_new_top_directory/A_new_in_between_dir/A_bottom_directory'
```

Also, we could use absolute paths:

```
os.makedirs(my_current_path + "/A_new_top_directory/A_new_in_between_dir/A_bottom_directory", exist_ok=True)
# this was the same as the first example
os.makedirs("/home/xabier/Documents/New_top_directory/Bottom_new_dir", exist_ok=True)
```

But be careful, you might be creating directories in quite random places if you use absolute paths without a little thought!

#### Listing all the files and subdirectories in a directory

You can have a list of the names of the files and folders within a given path using `os.listdir()`:

```
os.chdir("./Python-Course-for-Scientific-Programming/")
```

```
print(os.listdir())
```

```
['_config.yml', 'README.md', 'LECTURES', 'EXERCISES', '.gitignore', 'TEMPLATES_TO_TAKE_NOTES', '.ipynb_checkpoints', '.git']
```


If you list the directory in which you ran `os.makedirs()` earlier, the new `A_new_top_directory` will appear in the list!

You can even get the tree of directories and files under a given path using `os.walk()` and passing it as its argument the path under which you want to walk over the tree.


As the example below shows, it recursively enters the subdirectories and, for each of them, provides us with the files and directories within that root subdirectory.

There is an additional argument in `os.walk()`, a boolean called `topdown`. Its default value is `True`, and the walk is done top-down. If set to `False`, the walk begins at the deepest layer and proceeds bottom-up.

```
for root, dirs, files in os.walk("./", topdown=True):
    print(f">>{root}")
    for directory in dirs:
        print(f"Folder {directory}")
    for file in files:
        print(f"File {file}")
```

```
>>./
Folder LECTURES
Folder EXERCISES
Folder TEMPLATES_TO_TAKE_NOTES
Folder .ipynb_checkpoints
Folder .git
File _config.yml
File README.md
File .gitignore
>>./LECTURES
Folder .ipynb_checkpoints
File Lecture_04.ipynb
File Lecture_03.ipynb
File Lecture_05.ipynb
File Lecture_01.ipynb
File Lecture_02.ipynb
>>./LECTURES/.ipynb_checkpoints
File Lecture_02-checkpoint.ipynb
File Lecture_05-checkpoint.ipynb
File Lecture_01-checkpoint.ipynb
File Lecture_04-checkpoint.ipynb
File Lecture_03-checkpoint.ipynb
>>./EXERCISES
Folder .ipynb_checkpoints
File Exercises_Lecture_05.ipynb
File Schrodinger_Eqt_Simul.gif
File Schrodinger_Eqt_Simul2.gif
File NBody_Simulation_IVa - Moth I_IVa.2.A_Šuvakov_.gif
File Exercises_Lecture_02.ipynb
File Exercises_Lecture_01.ipynb
File PYTHON.png
File NBody_Simulation_IVa - Moth I_IVa.2.A_Šuvakov.gif
File Exercises_Lecture_03.ipynb
File threeBodyInitialConditions.json
File Data.xlsx
File triangle.png
File Exercises_Lecture_04.ipynb
File proof.png
>>./EXERCISES/.ipynb_che
```

#### Removing files and directories

A file can be removed using `os.remove()`, specifying the path of the file. We can also remove a directory using `os.rmdir()`, specifying the path of the directory as its argument. Note that an error is raised if you try to remove a directory that is not empty. To remove a directory this way, we first need to remove everything it contains.

```
os.rmdir("A_new_top_directory/A_new_in_between_dir/A_bottom_directory")  # only works if the directory is empty
```
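The order of operations can be sketched safely inside a throwaway directory created with `tempfile.mkdtemp()` (a standard library function not covered in this lecture). The folder and file names are hypothetical.

```python
import os
import tempfile

# Work inside a throwaway directory so that nothing important is touched
base = tempfile.mkdtemp()
folder = os.path.join(base, "To_be_removed")
os.makedirs(folder)

file_path = os.path.join(folder, "File.txt")
with open(file_path, "w") as handle:
    handle.write("bye\n")

try:
    os.rmdir(folder)          # fails: the directory is not empty
except OSError as error:
    print("Could not remove:", error)

os.remove(file_path)          # first the file...
os.rmdir(folder)              # ...then the now empty directory
os.rmdir(base)
print(os.path.exists(folder))
```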

Now that you know how to walk a directory tree bottom-up (from the deepest layer up to the directory given as first argument of `os.walk()`), obtaining the paths to all its files and directories, and you know the functions to remove files and directories... are you thinking what I'm thinking?

<del> Yeah, you could **erase all the files in your computer**, forever.</del>

<img src=http://c.tenor.com/SIiE1YV8yloAAAAd/cat-boom.gif width="200" height="200"> 

Shhh, it's a secret!