# PYTHON COURSE FOR SCIENTIFIC PROGRAMMING 
**Lecturer and main contributor to this lesson:**

Xabier Oyanguren Asua: oiangu9@gmail.com

**The lecture is heavily based on previous material prepared by:**

Jan Scarabelli Calopa: Jan.Scarabelli@e-campus.uab.cat 


All the course material can be found at: https://llacorp.github.io/Python-Course-for-Scientific-Programming/ 

---

# LECTURE V : File manipulation

### $(1.)$ - [*FILE HANDLING*](#1)
### $(2.)$ - [*MANIPULATE DIRECTORY STRUCTURES*](#2)
### $(3.)$ - [*MANAGE NUMPY DATA*](#3)
### $(4.)$ - [*ADDITIONAL FILE TYPES*](#4)

### [FURTHER TOPICS NOT COVERED IN THE COURSE](#5)

---
---

<a id='1'></a>
## $(1.)$ File Handling


Although each OS has its own system to create and access files, Python has its own file manipulation system that uses a common interface known as a "file handle".

### (a) Open


The key function for working with files in Python is the `open()` function. It takes two parameters: `filename`, and `mode`.

There are four different methods (modes) for opening a file:

| Mode (plain / with `+`) | Open a file to... | The `+` variant adds | Pointer Position |
| --- | --- | --- | --- |
| r/r+ | Read it. `FileNotFoundError` is raised if the file does not exist. | +writing | Beginning |
| w/w+ | Write on it. It **overwrites** the file if it already existed. Creates a new file otherwise. | +reading | Beginning |
| x/x+ | Write on it. `FileExistsError` is raised and execution will stop if it already existed (safe mode). Creates a new file otherwise. | +reading | Beginning |
| a/a+ | Write at the end of the file if it already exists. Creates a new file otherwise. | +reading | End |

(In general, my advice is to use the "no +" modes of operation, to avoid confusion.)

Let's see an example. We want to open a file called `"File.txt"` in `"w"` mode. That is, we want to create a new file with that name, or just overwrite any existing file with that name, with the objective of writing data to it.

In [2]:
nameHandle = open("File.txt","w")

where:
- `nameHandle` stands for the name we have invented to handle this file. That is, it will be our "access" to the file. If we want to write data to this file we will tell this object to do so.
- `open()` is the function to open a file, as we have already said.
- `"File.txt"` is the name (string) of the file we want to open.
- `"w"` indicates we want to write on this file (overwriting its previous content).

##### Note on Encodings
As a side note, remember that in the end (in the memory of the computer) everything is saved and manipulated in bits (0-s and 1-s). This means that in reality there is no letter `a` anywhere in the memory of the computer. Instead, if for example we want to encode the 27 characters of the alphabet, we could give each letter a unique tag using sequences of 5 bits (there are 32 possible sequences since $2^5=32$). If so, we could say `a` will be the sequence $00000$, `b` will be $00001$, `c` will be $00010$ etc., such that in memory we will save the text `acb` as $000000001000001$. Then, when we want to read it again, we will need to specify in which encoding the text is codified to be able to convert it back to characters.

One of the first encoding standards for text was the so-called ASCII standard, based on the alphabet and symbols for English, which uses 7 bits to encode 128 different symbols or characters, among them the alphabetic characters, the numerical digits, symbols like `?` or `!` and some other special characters like `\n` representing line breaks, or some representing the end of a text file, among others.

Today there are several standardized encodings, and each operating system (OS) has its preferred ones. ASCII was the main encoding used on the Internet until December 2007, when it was surpassed by UTF-8, a variable-length encoding that uses between one and four 8-bit bytes per character (and thus allows many more symbols to be coded, while the first 128 coincide with ASCII).

In the function `open()`, we can also add an argument indicating the **encoding** for the characters we will deal with in this file. It must be passed by name, as `encoding=...`, because the third positional argument of `open()` is reserved for something else (`buffering`). If we do not specify it, Python will just use the default encoding of the platform. But in general, we could specify any of the encodings listed in the following table of the official Python documentation:

[https://docs.python.org/3/library/codecs.html#standard-encodings](https://docs.python.org/3/library/codecs.html#standard-encodings)

for example we could write `'utf_8'` or `'ascii'`, say:

`open("File.txt", "w", encoding="utf_8")`
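To make the idea of encodings a bit more tangible, here is a minimal sketch. It assumes a scratch folder created with `tempfile.mkdtemp()` and a hypothetical file name `encoding_demo.txt` (neither is part of the lecture files). It shows that the same text can occupy a different number of bytes depending on the encoding, and that the `encoding` argument is passed by name:

```python
import os
import tempfile

# The same kind of object (a character) can need a different number of bytes
ascii_bytes = "a".encode("ascii")   # an English letter fits in one byte
utf8_bytes = "ñ".encode("utf_8")    # a non-ASCII character needs more than one byte in UTF-8
print(len(ascii_bytes), len(utf8_bytes))

# Scratch folder and hypothetical file name, only for this illustration
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "encoding_demo.txt")

# Write the text specifying the encoding by name
handle = open(path, "w", encoding="utf_8")
handle.write("mañana\n")
handle.close()

# Read it back with the same encoding, so that the characters are recovered
handle = open(path, "r", encoding="utf_8")
text_back = handle.read()
handle.close()
print(text_back)
```

If you read the file with a different encoding than the one used to write it, the non-English characters will most likely look like nonsense.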


### (b) Write and Close

The file handle we have created (the variable we decided to call `nameHandle` above, and `fileHandle` in the examples below) is in reality an object with a pointer to a specific position of the text file. You can imagine it as the **cursor** that appears when you are writing something on a computer, only that this time you cannot see where the cursor is (this is what we called "Pointer position" in the first table of the lecture).

Now, this file handle is like a variable, but one that not only contains information: it also has associated functions that allow the user to do multiple things to the file pointed to by the handle (it is a class instance, as we will comment in the end of the lecture). One of these functions is `.write()`. The string we pass to it as an argument will be written to the file at the position of the cursor, that is, right after the last thing we added. Hence, we can call it multiple times and the added text will be sequential in the file.

After manipulating the file, we **must** call the `.close()` function! Only then is it guaranteed that all the text has really been written to the file. Until then, parts of the text we have sent using `.write()` might still be waiting in memory (in a so-called text **buffer**).

Let's see an example.

In [2]:
fileHandle = open("File.txt","w") #Creation of the file
fileHandle.write("Hi!\nWelcome to the python course.\n")
fileHandle.write("Enjoy!\n")
fileHandle.close()

If you now open the file "File.txt" that has been created in the directory where the Jupyter Notebook is, you will see that it contains the following text:

Hi!
Welcome to the python course.
Enjoy!

You may have noticed the strange character `\n` we introduced in the writing function as part of the string. The character `\` is an escape character, meaning that the following character must be treated in a special way. In this case, for example, the string `\n` indicates the beginning of a new line. Another example is `\t`, which will be interpreted as a TAB space.

Or for example, imagine we would like to write a double or single quote `"` or `'` to the file. How could we do this? Since they are the string delimiters in Python, writing one of them naively inside a string delimited by that same quote will end the string too early. Try it. For this, we have the special characters `\"` and `\'` respectively!
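As a small illustration of these escape sequences (the sentences used are of course made up), the following sketch builds a string with quotes, a TAB and line breaks, and also shows the alternative of delimiting the string with the other kind of quote:

```python
# Escaped quotes, a new line and a TAB inside a single string
message = "She said \"hello\" and it\'s fine.\n\tThis line starts with a TAB.\n"
print(message)

# Alternative: use single quotes as delimiters, so that the double quotes need no escaping
same_quote = 'She said "hello"'
print(same_quote)
```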

Once again, after having edited the file, we want to save the changes to let other programmes access its contents. To do so, we must use the `.close()` function.


#### Notes on Relative and Absolute Paths
As we said, the result of this operation will be the creation of the file in the same directory (**directory==folder**) as the Jupyter Notebook is saved. But, is it possible to create files in other locations? Absolutely! 

For that, instead of just introducing the name of the file in `open()` as `"File.txt"`, we will need to introduce the **path** to the directory where we want it to be saved. The path is the location of a file or directory in the tree of directories of the persistent memory of the computer. We can provide it in two ways. A **relative path** is a path given starting from the point where the Python interpreter is being executed (in our case, from the directory of the Jupyter Notebook downwards). For instance, imagine that the folder "Python_Course", where this Jupyter Notebook called Lecture_05.ipynb is, contains a folder "Folder_1", which in turn contains a folder "Save_the_texts_here_Folder".

Then, if we want to save the text in the folder "Save_the_texts_here_Folder" under the folder "Folder_1", we will do:

`open("Folder_1/Save_the_texts_here_Folder/File.txt", "w")`

This path `"Folder_1/Save_the_texts_here_Folder/File.txt"` is a so-called relative path.

If instead we want to save things anywhere in the directory structure, we will need to write the full path from the beginning of the tree of folders, for example something like `"/home/xabier/Documents/UAB/My_texts/File.txt"`. This is called an **absolute path**, since there is no doubt about where I am trying to save it. That is, even if I move the Jupyter Notebook to other places in memory, the text file will always be created in the same place!

Of course, this will only work if those folders like `"Folder_1"` or `"Documents"` etc. already exist! Make sure they do. We will see in a moment how to create new directories using Python.

##### Important note for Windows vs Linux/MacOS users
In Linux and MacOS, the directory structure always starts at `"/"` and the successive subfolders are separated by a slash `"/"`, like `"/home/xabier/Documents/UAB/My_texts/File.txt"`. However, in Windows, directory trees start at the drive in which data is saved, typically `"C:\"`, and the separation between subdirectories is specified with a backslash `"\"`. For example `"C:\Users\xabier\Documents\UAB\My_texts\File.txt"`.
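Since the backslash is also the escape character of Python strings, Windows paths need some care. The following sketch (using the example paths of this section, which probably do not exist in your computer) shows two ways to write them safely, and how `os.path.join()` from the standard `os` library builds a path with the separator of whatever OS is running the code:

```python
import os

# os.path.join glues the pieces using the separator of the current OS
relative_path = os.path.join("Folder_1", "Save_the_texts_here_Folder", "File.txt")
print(relative_path)

# A raw string (prefix r) tells Python not to interpret the backslashes as escapes...
windows_raw = r"C:\Users\xabier\Documents\UAB\My_texts\File.txt"
# ...which is equivalent to doubling every backslash in a normal string
windows_escaped = "C:\\Users\\xabier\\Documents\\UAB\\My_texts\\File.txt"
print(windows_raw == windows_escaped)
```

Building paths with `os.path.join()` is one way of keeping the same code working on Linux, MacOS and Windows.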

##### Comment on File names
If you really are into jumping from the naive user side to a more advanced computer user side, you should stop naming files with spaces and strange characters today. This is because many interpreters (not Python, but you will save yourself problems in the future) interpret the space as an end mark and will stop reading the name of the file at that point. That is, avoid naming files as `"¡My first text file is sugoi!.txt"` and instead name it `"My_first_text_file_is_sugoi.txt"` or something of the like.

### (c) Read

As we had the function `.write()`, we also have the function `.read()`, which, if the file is opened in read mode, allows us to read the content of a file (remember the encoding argument in case the text read looks like nonsense).

Calling `.read()` will output a string which will be the text of the file. We can save it to a variable for further processing, or just display it with `print()`.

Let's see the following example

In [7]:
fileHandle = open("File.txt","r") #read only
print(fileHandle.read())
fileHandle.close()

Hi!
Welcome to the python course.
Enjoy!



Yes, even when reading it is convenient to call the `.close()` function when we are done, even if in this case it is usually safe to have multiple readers reading the same file.

Note that if the text file was very big (tens of gigabytes, say, which is not the most typical case), reading the whole file at once could be problematic. For this, the `.read()` function accepts an integer argument indicating the number of characters we want to read from the file:

In [8]:
fileHandle = open("File.txt","r") #read only
print(fileHandle.read(5))
fileHandle.close()

Hi!
W


Then, if we asked another 5 characters before closing the file, we would receive the following 5. This is because the **cursor** of this particular handle will have moved on 5 positions.

In [None]:
fileHandle = open("File.txt","r") #read only
print(fileHandle.read(5))
print("well well")
print(fileHandle.read(5))
fileHandle.close()

Don't forget you could save the read string to a variable!

In [None]:
fileHandle = open("File.txt","r") #read only
text = fileHandle.read(5)

text = text + fileHandle.read(5)
print(text)
fileHandle.close()
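Putting the last ideas together, a big file could be processed piece by piece with a loop like the following. This is only a sketch: the file is created in a scratch folder (via `tempfile.mkdtemp()`) so that it is self-contained, and the tiny `chunk_size` is a hypothetical value chosen to make the effect visible. It relies on the fact that `.read()` returns an empty string once the cursor has reached the end of the file:

```python
import os
import tempfile

# Create a small example file in a scratch folder
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "File.txt")
full_text = "Hi!\nWelcome to the python course.\nEnjoy!\n"
fileHandle = open(path, "w")
fileHandle.write(full_text)
fileHandle.close()

chunk_size = 5  # hypothetical value; a real program would read many more characters at once
chunks = []
fileHandle = open(path, "r")
while True:
    chunk = fileHandle.read(chunk_size)
    if chunk == "":  # an empty string means the end of the file has been reached
        break
    chunks.append(chunk)  # here is where each piece would be processed
fileHandle.close()

print(chunks[:2])
print(len(chunks))
```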

Now, we can iterate over a file as well. When iterated, a file handle behaves like a sequence of strings where each element is a line of the file (the different parts separated by `\n`-s).

Consequently, we can use a `for` loop to iterate over the lines of the file:

In [9]:
fileHandle = open("File.txt","r")

for line in fileHandle:
    print(line)
    
fileHandle.close()

Hi!

Welcome to the python course.

Enjoy!



Notice the blank line between lines. This is because the `line` of each iteration is a string containing the line of the text with a `\n` at the end, so when we print it an extra new line is printed, because the `print()` function adds a new line by default.

As each `line` is a string, it is possible to avoid the `'\n'` by not taking the last character of the string.

In [10]:
fileHandle = open("File.txt","r")
for line in fileHandle:
    print(line[:-1])
fileHandle.close()

Hi!
Welcome to the python course.
Enjoy!


In [None]:
fileHandle = open("File.txt","r")
for line in fileHandle:
    line = line[:-1] + 'xDD'
    print(line)
fileHandle.close()
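One caveat of the slicing trick: if the last line of a file does not end with `\n`, then `line[:-1]` removes a real character. A more defensive alternative is the string method `.rstrip("\n")`, which only removes new-line characters at the end. A minimal sketch, using a made-up list of lines in place of a file:

```python
# Lines as they would come out of a file whose last line has no final "\n"
lines = ["Hi!\n", "Welcome to the python course.\n", "Enjoy!"]

sliced = []
stripped = []
for line in lines:
    sliced.append(line[:-1])            # always removes the last character
    stripped.append(line.rstrip("\n"))  # only removes trailing new-line characters

print(sliced)
print(stripped)
```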

#### Readline

The `.readline()` function allows us to read just one line, without the need of iterations. Which line will be read will depend on the cursor's current position.

In [11]:
fileHandle = open("File.txt","r")
one = fileHandle.read(5)
two = fileHandle.readline()
three = fileHandle.readline()
print(f"{one} \n {two} \n {three}")
fileHandle.close()

Hi!
W 
 elcome to the python course.
 
 Enjoy!

#### Readlines

This function returns a list of strings containing all the lines of the file in order:

In [13]:
fileHandle = open("File.txt","r")
list_of_lines = fileHandle.readlines()
fileHandle.close()

print(list_of_lines)

['Hi!\n', 'Welcome to the python course.\n', 'Enjoy!\n']


If we wanted to print only a specific line in the file, let's say the second line, we could use the following instruction.

In [14]:
fileHandle = open("File.txt","r")
second_line = fileHandle.readlines()[1]
fileHandle.close()
print(second_line)

Welcome to the python course.



### (d) Append

Each time an existing file is opened in `"w"` mode, its content is **completely** erased and the `.write()` method will start writing from the beginning. To avoid this, we can use the append mode `"a"`. When we call `.write()` in this mode, the text will be written after the last line of the file, without overwriting anything.

As an example, we are going to modify the file using `"w"`, and then we are going to add another line without deleting anything.

In [15]:
print("Result in writing mode:")
#Overwrite file
fileHandle = open("File.txt","w") 
fileHandle.write("Everything has been erased!\n")
fileHandle.close()
#Print result of "w"
fileHandle = open("File.txt","r")
print(fileHandle.read())
fileHandle.close()

print("Result with append mode:")
#Use of "a"
fileHandle = open("File.txt","a")
fileHandle.write("Use \'a\' parameter to avoid overwriting it!")
fileHandle.close()
#Print result of "a"
fileHandle = open("File.txt","r")
print(fileHandle.read())
fileHandle.close()

Result in writing mode:
Everything has been erased!

Result with append mode:
Everything has been erased!
Use 'a' parameter to avoid overwriting it!


If the file did not exist `"a"` will create a new one, just like `"w"`.

Now, what if we do not want any file to be overwritten, nor new data to be appended to it? If we only want to create new files, and we want to be stopped if we try to write a file with the same name as an already existing one, we can use the `"x"` mode.

In [16]:
#Should return error if file already exists
fileHandle = open("File.txt","x") 
fileHandle.write("Let's try x mode\n")
fileHandle.close()

FileExistsError: [Errno 17] File exists: 'File.txt'

In [17]:
fileHandle = open("File2.txt","x")  # it will create a new file if it does not already exist
fileHandle.write("Let's try x mode\n")
fileHandle.close()
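If we do not want the `FileExistsError` to stop the whole program, one option is to capture it with a `try`-`except` structure (a topic not covered in this course, so take this only as a sketch). The example below works in a scratch folder created with `tempfile.mkdtemp()` and tries to create the same file twice:

```python
import os
import tempfile

scratch_dir = tempfile.mkdtemp()  # scratch folder so that no real file is touched
path = os.path.join(scratch_dir, "File2.txt")

created = []
for attempt in range(2):
    try:
        fileHandle = open(path, "x")  # only succeeds if the file does not exist yet
        fileHandle.write(f"Written in attempt {attempt}\n")
        fileHandle.close()
        created.append(True)
    except FileExistsError:
        # the file was already there: nothing is overwritten and the program goes on
        created.append(False)

print(created)
```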

### (f) Tell and Seek

Now, imagine we would like to write or read not where the cursor currently is for a certain file handle, but in another place, or imagine we want to do something depending on where the cursor is in the file. For this we have two methods.

The method `.tell()` lets us know how far from the beginning of the file the cursor of a file handle currently is (for plain ASCII text like ours, this is the number of characters).

The method `.seek()` takes an integer as argument and moves the cursor to that position (relative to the beginning of the file).

Here is a silly example of their usage:

In [48]:
fileHandle = open("File.txt","r")
print(fileHandle.tell())
print(fileHandle.read(5)) # read first 5 characters
print(fileHandle.tell())

fileHandle.seek( fileHandle.tell() + len(fileHandle.readlines()[0]) ) # move cursor to the end of this first line
print(fileHandle.tell())
print(fileHandle.read()) # read remaining text from the position of the cursor
fileHandle.close()

0
Every
5
28
Use 'a' parameter to avoid overwriting it!
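A very common use of `.seek()` is going back to the beginning of the file with `.seek(0)`, in order to read it a second time without opening a new handle. A minimal sketch, again using a scratch folder (created with `tempfile.mkdtemp()`) so that the example is self-contained:

```python
import os
import tempfile

scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "File.txt")
fileHandle = open(path, "w")
fileHandle.write("Everything has been erased!\nUse 'a' parameter to avoid overwriting it!")
fileHandle.close()

fileHandle = open(path, "r")
first_pass = fileHandle.read()     # the cursor ends up at the end of the file
end_position = fileHandle.tell()
nothing_left = fileHandle.read()   # nothing remains after the cursor: empty string
fileHandle.seek(0)                 # move the cursor back to the beginning
second_pass = fileHandle.read()    # the whole text can be read again
fileHandle.close()

print(end_position)
print(first_pass == second_pass)
```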


### (g) With

Finally, it is worth noting a very convenient way to open a file, in which closing the file is automatically done and you will not need to worry about it. This is done with a `with` statement:
```
with open("File.txt", "r") as fileHandle:
    blah
    blah
```
Under this statement, the `fileHandle` works as expected, but when the block of code finishes, the `.close()` method is automatically called on it (even if an error occurred inside the block).

You can of course give the name you wish to the handle and you can use the mode you prefer `'r'`, `'a'` etc.

In [1]:
with open("File.txt", "r") as fileHandle:
    print(fileHandle.read())

Everything has been erased!
Use 'a' parameter to avoid overwriting it!


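Note how, if we try to use the handle again after the block, Python raises `ValueError: I/O operation on closed file.`: the variable still exists, but the file it pointed to has already been closed.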

In [None]:
with open("File.txt", "r") as handle:
    print(handle.read(10))
print(handle.read())
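We can check what happened to the handle through its `.closed` attribute. The following sketch (self-contained thanks to a temporary folder, and using a `try`-`except` structure only to capture the error message instead of stopping the execution) makes this explicit:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch_dir:
    path = os.path.join(scratch_dir, "File.txt")
    with open(path, "w") as handle:
        handle.write("Everything has been erased!\n")

    with open(path, "r") as handle:
        first_characters = handle.read(10)

    # Outside the with block the variable still exists, but the file is closed
    is_closed = handle.closed
    try:
        handle.read()
        error_message = None
    except ValueError as error:
        error_message = str(error)

print(first_characters)
print(is_closed)
print(error_message)
```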

#### Last Remark

It goes without saying that you can open multiple handles that read the same file, or multiple handles to write to different files simultaneously (the case of multiple handles all writing to the same file is a bit more tricky, as you can imagine, and might not make much sense). You can similarly see what happens for the other opening modes.

In [None]:
handle1_read = open("File.txt", "r")
handle2_read = open("File.txt", "r")
handle1_write = open("New.txt", "w")
handle2_write = open("New2.txt", "w")

handle1_write.write( handle1_read.read() )
print( handle2_read.read() )
handle2_write.write( "This is the last day!" )

handle1_read.close()
handle2_read.close()
handle1_write.close()
handle2_write.close()

Or equivalently with some nested `with` statements.
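As a sketch of how this could look (the file names are the ones of the previous cell, but placed in a scratch folder created with `tempfile.mkdtemp()` so that nothing real is overwritten), here are both the nested version and the more compact one, in which several handles are opened in a single `with` statement separated by commas:

```python
import os
import tempfile

scratch_dir = tempfile.mkdtemp()
source_path = os.path.join(scratch_dir, "File.txt")
copy_path = os.path.join(scratch_dir, "New.txt")

with open(source_path, "w") as handle:
    handle.write("Everything has been erased!\n")

# Nested version: the inner block has access to both handles
with open(source_path, "r") as handle_read:
    with open(copy_path, "w") as handle_write:
        handle_write.write(handle_read.read())

# Compact version: several handles in one with statement (here appending to the copy)
with open(source_path, "r") as handle_read, open(copy_path, "a") as handle_write:
    handle_write.write(handle_read.read())

with open(copy_path, "r") as handle:
    copied_text = handle.read()
print(copied_text)
```

In both versions all the handles are closed automatically when their block ends.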

<a id='2'></a>
## $(2.)$ Manipulate Directory Structures

Using Python we can program not only the reading, writing and formatting of text files, but also the manipulation of directories and general files: create folders, destroy files etc.

These things are generally done with the package `os`, which we will need to import.

In [None]:
import os

#### The working directory (the implicit part of the relative paths)

A first thing we might want to do in a program is to know where in the directory structure the code is being run. For example, if we are running this Jupyter Notebook, we might want to have the absolute path of the folder containing the Notebook. It is there where the Python interpreter is working, meaning that by default (by using relative paths) the images, files and other outputs we generate, or the input files we want to look for, are going to be searched for in this **current working directory** (or **cwd**).

We can get this path with the function `getcwd()` of the `os` library. That is, by calling `os.getcwd()`.

In [None]:
my_current_path = os.getcwd()

print(f"Currently code is being executed in {my_current_path}")

So the files generated with the following two statements will be in the exact same location (the second one overwrites the first):

In [None]:
with open("File.txt", "w") as fileHandle:
    fileHandle.write("Using a relative path")
    
with open(os.path.join(my_current_path, "File.txt"), "w") as fileHandle:  # join adds the missing separator
    fileHandle.write("Using an absolute path")
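A detail worth noticing is that `os.getcwd()` returns the path without a final separator, so simply gluing the file name to it with `+` would produce a different (wrong) path. The sketch below only builds and compares strings, it does not create any file; `os.path.abspath()` is the function of the `os` library that converts a relative path into an absolute one:

```python
import os

cwd = os.getcwd()

glued = cwd + "File.txt"                 # the separator between folder and file is missing
joined = os.path.join(cwd, "File.txt")   # the separator of the OS is added when needed

print(glued)
print(joined)
print(joined == os.path.abspath("File.txt"))
```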

We could even change the working directory to a desired path with `os.chdir()`:

In [None]:
os.chdir( my_current_path ) # we will leave it where it is, but you could change it if you wish

#### Creating directories

We can create a desired directory structure using `os.makedirs()`, which takes two main arguments. First we specify the directory, or chain of nested directories, we wish to create as a string (a path, relative or absolute), and then a boolean argument called `exist_ok` specifying what to do in case the directory we are trying to create already exists. If we say True and the directory already existed, it will be left as it was and the code will continue. If we say False (the default) and the directory already existed, an error will be raised and the code execution will be stopped.

In [None]:
os.makedirs( "A_new_top_directory/A_new_in_between_dir/A_bottom_directory", exist_ok=True) # relative path: no leading "/"

You can check that the directories have been created where the notebook is.
Now, if you re-execute the same line, nothing will happen, but if you execute the next line an error will be raised:

In [None]:
os.makedirs( "A_new_top_directory/A_new_in_between_dir/A_bottom_directory", exist_ok=False)

Also, we could use absolute paths:

In [None]:
os.makedirs( my_current_path+"/A_new_top_directory/A_new_in_between_dir/A_bottom_directory", exist_ok=True) 
# this was the same as the first example
os.makedirs( "/home/xabier/Documents/New_top_directory/Bottom_new_dir", exist_ok=True )

But be careful, you might be creating directories in quite random places if you use absolute paths without a little thought!
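If you want to experiment without filling your folders with test directories, one option is to work inside a scratch folder given by `tempfile.mkdtemp()` from the standard library. The following sketch does so, and captures the error of `exist_ok=False` with a `try`-`except` structure so that the execution is not stopped:

```python
import os
import tempfile

base = tempfile.mkdtemp()  # an empty scratch folder created by the OS
target = os.path.join(base, "A_new_top_directory", "A_new_in_between_dir", "A_bottom_directory")

os.makedirs(target, exist_ok=True)  # creates the three nested directories
os.makedirs(target, exist_ok=True)  # they already exist: nothing happens

try:
    os.makedirs(target, exist_ok=False)
    error_raised = False
except FileExistsError:
    error_raised = True

print(os.path.isdir(target))
print(error_raised)
```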

#### Listing all the files in a directory

You can get a list of the names of the files and directories in a given path using `os.listdir()`:

In [None]:
list_of_files_in_working_dir = os.listdir( my_current_path )
print(list_of_files_in_working_dir)

As you can see, there is the new directory we created previously!
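Note that `os.listdir()` returns both files and directories, only by name. To tell them apart we can use `os.path.isfile()` on the full path. As a sketch of how this combines with the file handling of the first section, the following example (with made-up file names, inside a scratch folder) builds a dictionary whose keys are the names of the files of a directory and whose values are their contents:

```python
import os
import tempfile

# Prepare a scratch folder with two text files and a subdirectory
folder = tempfile.mkdtemp()
for name, text in [("a.txt", "first file\n"), ("b.txt", "second file\n")]:
    with open(os.path.join(folder, name), "w") as handle:
        handle.write(text)
os.makedirs(os.path.join(folder, "A_subdirectory"))

contents = {}
for name in os.listdir(folder):
    full_path = os.path.join(folder, name)  # listdir only gives names, not paths
    if os.path.isfile(full_path):           # skip the directories
        with open(full_path, "r") as handle:
            contents[name] = handle.read()

print(sorted(contents))
```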

You can even get the tree of directories and files under a given path using `os.walk()` and passing it as its argument the path under which you want to walk over the tree.

In [None]:
for root, dirs, files in os.walk( my_current_path ):
    print(f"\nFiles under {root}")
    for name in files:
        print(name)
    print(f"\n\nDirectories under {root}")
    for name in dirs:
        print(name)

As you can see, it will recursively enter into the subdirectories and for each it will provide us with the files and directories within that root subdirectory.

There is an additional argument in `os.walk()`, a boolean called `topdown`. Its default value is True, in which case the walk is done top-down. If set to False, the walk will begin from the deepest layer and proceed bottom-up.

In [None]:
for root, dirs, files in os.walk( my_current_path, topdown=False ):
    print(f"\nFiles under {root}")
    for name in files:
        print(name)
    print(f"\n\nDirectories under {root}")
    for name in dirs:
        print(name)
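Since `files` and `dirs` only contain names, the full path of each item is obtained by joining it with `root`. A minimal sketch under the assumption of a small made-up tree inside a scratch folder, which collects the path of every file relative to the top directory (`os.path.relpath()` is the function of the `os` library that expresses a path relative to another one):

```python
import os
import tempfile

# Build a small tree: top/top.txt, top/sub/mid.txt, top/sub/deeper/low.txt
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "sub", "deeper"))
relative_names = ["top.txt", os.path.join("sub", "mid.txt"), os.path.join("sub", "deeper", "low.txt")]
for relative_name in relative_names:
    with open(os.path.join(top, relative_name), "w") as handle:
        handle.write("x")

all_files = []
for root, dirs, files in os.walk(top):
    for name in files:
        full_path = os.path.join(root, name)               # root + name gives the full path
        all_files.append(os.path.relpath(full_path, top))  # keep it relative to the top folder

print(sorted(all_files))
```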

#### Removing files and directories

A file can be removed using `os.remove()` and specifying the path of the file. We can also remove a directory using `os.rmdir()` and specifying as argument the path of the directory. Note that an error will be raised if you try to remove a directory that still contains files or other directories. To remove a directory this way we first need to remove all its contents.

In [None]:
os.remove("File.txt")
os.rmdir("A_new_top_directory/A_new_in_between_dir/A_bottom_directory") # relative path: no leading "/"
# you can check how both have disappeared

Now that you know how to walk a subdirectory tree bottom-up (from the deepest layer to the specified directory in the `os.walk()` first argument), such that you get the absolute path to all its files and directories, and you know the functions to remove files and directories...are you thinking what I'm thinking? 

<del> Yeah, you could **erase all the files in your computer**, forever.</del>

<img src=http://c.tenor.com/SIiE1YV8yloAAAAd/cat-boom.gif width="200" height="200"> 

Shhh, it's a secret!
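Used responsibly, the same idea is how a whole directory tree can be cleaned up. The sketch below does it **only** inside a scratch folder created for the occasion with `tempfile.mkdtemp()` (never point such a loop at a folder you care about). It first checks that `os.rmdir()` refuses to remove a non-empty directory, and then walks the tree bottom-up removing files first and the emptied directories afterwards:

```python
import os
import tempfile

# A scratch tree: top/A_new_top_directory/{note.txt, A_bottom_directory/}
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "A_new_top_directory", "A_bottom_directory"))
with open(os.path.join(top, "A_new_top_directory", "note.txt"), "w") as handle:
    handle.write("bye")

# rmdir refuses to remove a directory that is not empty
try:
    os.rmdir(os.path.join(top, "A_new_top_directory"))
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# Bottom-up walk: when we reach a directory, its subdirectories have already been emptied
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
os.rmdir(top)  # finally remove the (now empty) top folder itself

print(rmdir_failed)
print(os.path.exists(top))
```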

<a id='3'></a>
## $(3.)$ Manage Numpy Data

With numpy it is possible to read files in order to extract their information and build arrays to work with. Basically, we can work with text files or with .npy and .npz files, which turn out to be more efficient when it comes to loading data from them.

### Reading data from .txt files: .write() and .loadtxt()

Let's see text files first. To do so, let's create a new file in which we will introduce example values. Important: for now, the values must be written as strings and not as integers, floats, etc., as they will be written in text files and the only datatypes supported in them are $\texttt{char}$ or $\texttt{strings}$.

In [4]:
import numpy as np

In [9]:
with open("np.txt","w") as npHandle:
    npHandle.write("1 2 3")

Instead of $\texttt{read}()$ function, numpy uses $\texttt{loadtxt}()$ to read information from text files.

In [10]:
np.loadtxt("np.txt")

array([1., 2., 3.])

This is also possible with multidimensional arrays, where the '\n' character indicates the end of a row.

In [7]:
with open("np.txt","w") as npHandle:
    npHandle.write("1 0 0\n0 1 0\n0 0 1\n")

np.loadtxt("np.txt")

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

Notice how the data is written in np.txt and how the loadtxt function reads it. We can change this behaviour by modifying some parameters. For example, imagine we want an array of integers and the data we receive comes with the delimiter ',':

In [61]:
with open("npex.txt","w") as npHandle:
    npHandle.write("1,0,0\n0,1,0\n0,0,1\n")

np.loadtxt("npex.txt", dtype='int', delimiter=',')

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

#### savetxt()

Until now we have been writing 'arrays' as strings in a text file, but with $\texttt{savetxt}()$ we can also save numpy arrays in a .txt or .csv file and they will be automatically converted to strings:

In [15]:
a = np.array([1,2,3])

np.savetxt("np.txt", a, delimiter=',') # the default delimiter is ' ' (a space)

np.loadtxt("np.txt")

array([1., 2., 3.])
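The delimiter becomes relevant for two-dimensional arrays, where it separates the columns of each row. As a sketch of some other useful arguments of `np.savetxt()`, the example below (hypothetical file name `table.csv`, saved in a scratch folder) uses `fmt` to choose how each number is written and `header` to add a first line describing the columns. The header is written as a comment line starting with `#`, which `np.loadtxt()` skips by default:

```python
import os
import tempfile
import numpy as np

scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "table.csv")  # hypothetical file name

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# fmt="%d" writes integers instead of the default scientific notation
np.savetxt(path, table, fmt="%d", delimiter=",", header="x,y,z")

# Let's look at the raw text that has been written
with open(path, "r") as handle:
    raw_text = handle.read()
print(raw_text)

# The header line is treated as a comment and ignored when loading
loaded = np.loadtxt(path, dtype=int, delimiter=",")
print(loaded)
```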

There are more useful parameters we can use to adapt the data to what we want. Let's see some of them.

##### skiprows and max_rows

In [16]:
np.savetxt("np.txt", np.identity(4))
np.loadtxt("np.txt")

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [17]:
np.loadtxt("np.txt", skiprows=1, max_rows=2)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.]])

##### usecols

In [78]:
np.loadtxt("np.txt", usecols=[0,3]) # only the first and fourth columns of the 4x4 identity saved above

array([[1., 0.],
       [0., 0.],
       [0., 0.],
       [0., 1.]])

##### ndmin (0 as default)

In [87]:
npHandle = open("npex.txt","w")
npHandle.write("1 2 3 4\n")
npHandle.close()

In [89]:
np.loadtxt("npex.txt")

array([1., 2., 3., 4.])

In [93]:
np.loadtxt("npex.txt", ndmin=2).shape # with ndmin=2 the single row is loaded as a 2D array

(1, 4)

### Reading data from .npy or .npz files

Like a .txt or .csv file, a .npy file is a file from which we can extract the data we want to study. The difference is that it is a binary file, so when talking about big datasets, .npy files turn out to be much faster to load. An example of use is with datasets that are prepared to be used in machine learning algorithms.

Let's see how they work.

#### save() and load()

A numpy array can be saved into a .npy file by using $\texttt{save}()$ function, and it can be loaded with $\texttt{load}()$.

In [42]:
a = np.random.randint(low = 0, high = 100, size = (4,6))

np.save("npyfile.npy",a)
np.load("npyfile.npy")

array([[83,  7, 25,  0, 32, 33],
       [68, 10, 60, 38, 43, 77],
       [90, 91, 48, 41, 70, 67],
       [56, 83, 53, 94, 78, 39]])

On the other hand, we save data into a compressed .npz file with:

In [43]:
a = np.random.randint(low = 0, high = 100, size = (4,6))
b = np.random.randint(low = 0, high = 100, size = (4,6))

np.savez_compressed("npzfile.npz",a,b)
np.load("npzfile.npz")

<numpy.lib.npyio.NpzFile at 0x7f2ed4692af0>

As you can see, it does not print the arrays as expected. The reason is that, for .npz files, the $\texttt{load}()$ function returns a dictionary-like object containing the arrays. To access each of them, we can use `data_dict['arr_i']`, where 'i' stands for the ith array (starting from 0):

In [44]:
data_dict = np.load("npzfile.npz")
print("First array:\n")
print(data_dict["arr_0"])

print("\nSecond array:\n")
print(data_dict["arr_1"])

First array:

[[92 42 81 13 29 87]
 [76 59  3 15 84 67]
 [ 9 26 84 29 27 88]
 [80 71 28 49  2 68]]

Second array:

[[66  2  0 27 13 33]
 [18 24 76 39 39 48]
 [37 65 94 49 81 31]
 [55 78 20 84 83 93]]
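Instead of relying on the automatic names `arr_0`, `arr_1`..., the arrays can be passed as keyword arguments, in which case the keyword becomes the name of each array inside the file. A minimal sketch (the names `positions` and `velocities` and the file name `named.npz` are made up for the illustration), which also uses the `.files` attribute to list the names stored in the file:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "named.npz")  # hypothetical file in a scratch folder

positions = np.arange(6).reshape(2, 3)
velocities = np.ones((2, 3))

# The keyword used for each array becomes its name inside the .npz file
np.savez_compressed(path, positions=positions, velocities=velocities)

with np.load(path) as data_dict:
    names = sorted(data_dict.files)         # names of the arrays stored in the file
    loaded_positions = data_dict["positions"]

print(names)
print(loaded_positions)
```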


If you want to, you can see the difference in loading speed between text and .npy files by executing the next cells.

In [36]:
from time import time

The function `time()` returns the current time in seconds (counted from a fixed reference date) as a float, so subtracting the values returned by two calls gives the time elapsed between them.

In [37]:
N = 10000000  # number of random datapoints
with open('data.txt', 'w') as data:
    for _ in range(N):
        data.write(str(10*np.random.random())+',')
# no data.close() is needed: the with block already closes the file

In [38]:
start = time()

with open('data.txt', 'r') as data:
    string_data = data.read()
    
end = time()
 
list_data = string_data.split(',')
list_data.pop()
data_array = np.array(list_data, dtype=float).reshape(10000, 1000)


print("### 10 million points of data ###")
print("\nData summary:\n", data_array)
print("\nData shape:\n", data_array.shape)
print(f"\nTime to read: {round(end-start,5)} seconds.")

### 10 million points of data ###

Data summary:
 [[4.09144984 3.84302398 7.16627551 ... 2.01428061 2.79600872 3.30568972]
 [2.56943462 1.89302601 0.32476652 ... 0.82489118 5.30315169 1.1540871 ]
 [7.37108472 0.90150474 6.49690544 ... 7.12834269 7.51508654 9.27592637]
 ...
 [9.36692545 2.33595017 1.21974856 ... 2.17260358 1.37939516 3.70853589]
 [8.78659462 1.00940435 0.25049989 ... 3.405163   5.45185069 6.97280555]
 [2.71625771 0.0726352  9.87130631 ... 3.27120981 3.12439974 4.9982651 ]]

Data shape:
 (10000, 1000)

Time to read: 0.12279 seconds.


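Note that the time measured above only covers reading the raw text; splitting it and converting it into an array of floats takes additional time. Let's see the .npy version, where `np.load()` directly returns the array: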

In [40]:
np.save('data.npy', data_array)

In [41]:
start=time()

data_array = np.load('data.npy')

end=time()

print("### 10 million points of data ###")
print("\nData summary:\n", data_array)
print("\nData shape:\n", data_array.shape)
print(f"\nTime to read: {round(end-start,5)} seconds.")

### 10 million points of data ###

Data summary:
 [[4.09144984 3.84302398 7.16627551 ... 2.01428061 2.79600872 3.30568972]
 [2.56943462 1.89302601 0.32476652 ... 0.82489118 5.30315169 1.1540871 ]
 [7.37108472 0.90150474 6.49690544 ... 7.12834269 7.51508654 9.27592637]
 ...
 [9.36692545 2.33595017 1.21974856 ... 2.17260358 1.37939516 3.70853589]
 [8.78659462 1.00940435 0.25049989 ... 3.405163   5.45185069 6.97280555]
 [2.71625771 0.0726352  9.87130631 ... 3.27120981 3.12439974 4.9982651 ]]

Data shape:
 (10000, 1000)

Time to read: 0.0135 seconds.


For further information about numpy files, you can visit:

https://towardsdatascience.com/why-you-should-start-using-npy-file-more-often-df2a13cc0161

<a id='4'></a>

## $(4.)$ Additional File types

Since Python is open source, there are plenty of libraries for almost anything you can imagine. This means that you can find libraries for not only manipulating raw text files like we have, but also to automatically generate `excel` files with the data you have processed, `csv` files etc.

Not only that, but you can dump any kind of Python variable or object to a permanent file. Check the `pickle` or `dill` libraries for this. For dictionaries in particular, you can also use the `json` library (JSON is a standard file format as well).
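As a first taste, here is a minimal sketch using the `json` and `pickle` libraries of the standard library. The dictionaries and file names are made up, and everything is saved in a scratch folder. Note that `pickle` works with binary files, which are opened with the modes `"wb"` and `"rb"` (write/read binary), something we have not covered above:

```python
import json
import os
import pickle
import tempfile

scratch_dir = tempfile.mkdtemp()

# json: human-readable text file, suited for dictionaries of simple types
settings = {"name": "run_01", "steps": 1000, "tolerance": 1e-6}
json_path = os.path.join(scratch_dir, "settings.json")
with open(json_path, "w") as handle:
    json.dump(settings, handle, indent=2)
with open(json_path, "r") as handle:
    settings_back = json.load(handle)

# pickle: binary file, accepts almost any Python object (tuples, sets...)
measurement = {"label": "test", "values": (1, 2, 3), "unique": {4, 5}}
pickle_path = os.path.join(scratch_dir, "measurement.pkl")
with open(pickle_path, "wb") as handle:
    pickle.dump(measurement, handle)
with open(pickle_path, "rb") as handle:
    measurement_back = pickle.load(handle)

print(settings_back)
print(measurement_back)
```

A word of caution: only load pickle files that come from a source you trust, since loading them can execute arbitrary code.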

Also, we have not talked about it, but there are ways to not only dump the raw data to a file, but to dump it in a compressed manner! Such that the generated files occupy much less space. For numpy arrays for example, we have the `h5py` library that allows generating massive `.h5` dataset files in a compressed manner, with easy access to different numpy arrays just using a tag for them and without the need to import the whole dataset to RAM in order to use its parts! Yeah, miracles exist.

-------------------------------------
---
<a id='5'></a>

# Further Topics Not Covered in the Course!

Color in, color out...the course has arrived to its end!

But don't be sad! There are way more things to know about python that we didn't have time to cover here!

Here you have a list of the possible following topics you could attack to deepen your programming skills:

- Using `.py` scripts to code instead of a Jupyter Notebook, running scripts from a terminal/console.


- Object Oriented Programming: the `class` conundrum.


- Assertions and error controls, `try`-`except`-`finally` structures.


- `map` and `lambda` functions.


- Manipulating images, videos, sound and other more exotic data types (openCV, sounddevice ...)


- Machine Learning and Deep Learning (pytorch, scikit-learn, tensorflow ...)


- Parallel processing using several CPU cores or using the GPU itself (mpi4py, multiprocessing, pytorch ...)


- Calling C/C++ functions under the hood (ctypes ...)


- Graphical user interfaces (pyQt ...)


- Standalone executables (pyinstaller ...)


- Web/mobile application development (flask ...)


- ...

Now that you have the basic notions for Python programming, you could follow almost any online course to continue exploring everything Python has to offer! You can look for free courses in pages like EdX, Coursera or even Youtube itself!

What is more, once you have integrated the concepts you have learnt here, you could even give other lower level programming languages a chance as well! They are more complicated to program (you must take more things into account), but the essentials are the same.

And if not, note that unless you need some very very fast and optimized code you can manage with python. In fact, remember that most of the important libraries (numpy, pytorch, mpi4py etc.) in reality execute C or Fortran code under the hood, so the speed up is already available with Python, without requiring you to go down to the kitchen of hell!


It has been a pleasure to have you here!

For anything you need, you have our emails at the beginning of the notebooks.


**LLA & SCN$^\textbf{2}$** 

In [None]:
# A POTENTIAL EXERCISE
# Write an example that lists all the files in a directory and builds a dictionary with the contents of each one