# Reading and writing files

We have made good progress, so we can now move on to a more serious task: manipulating files. This is one of the most important topics in this training.


N.B.: Most of the files in `./data/` are only there to help us understand how opening files works. They have no other purpose.

To open or edit a file in Python, we use the `open()` function.

This function takes the path of the file (*relative* or *absolute*) as its first parameter and the opening mode, _i.e._ reading or writing, as its second parameter.

A **relative path** is a path that is interpreted from the current location. The path is **relative** to the current working directory, i.e. the folder the code is run from.

- **Example:** _./data/data.txt_

An **absolute path** is a complete path, starting from the root of the file system, that points to the same location regardless of where it is read from.

- **Example:** _/Users/becodian/Desktop/BeCode/ai-track/content/2.python/2.python_advanced/04.File-handling/data/data.txt_

The best practice is to use **relative** paths in your Python code whenever you can. This way your code can be shared **as it is** with your colleagues. An absolute path like the one above will most likely fail on their machine, since that folder exists only on your own computer.
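As a small illustration of the difference, the sketch below checks whether a path is absolute and resolves a relative path against the current working directory. It uses `os.path`, which is presented later in this chapter. The printed absolute path depends on where you run the code, so no output is shown for it.

```python
import os

relative_path = "./data/data.txt"  # same relative path as in the example above

# A relative path is not anchored at the root of the file system.
is_relative_absolute = os.path.isabs(relative_path)

# abspath() resolves the relative path against the current working directory.
resolved_path = os.path.abspath(relative_path)

print(is_relative_absolute)          # False
print(os.path.isabs(resolved_path))  # True
print(resolved_path)                 # depends on where the code is run from
```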



In [3]:
filename = "./data/data.txt"
my_file = open(filename, "r")  # r for "read"

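The second parameter can be one of the following modes:

- `"r"`, for a read opening (READ). This is the default mode when no second parameter is given.

- `"w"`, for a write opening (WRITE). Each time the file is opened, its content is overwritten. If the file does not exist, Python creates it.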

    *The Python docs say that `w+` will "overwrite the existing file if the file exists". So as soon as you open a file with `w+`, it is now an empty file: it contains 0 bytes. If it used to contain data, that data has been truncated — cut off and thrown away — and now the file size is 0 bytes, so you can't read any of the data that existed before you opened the file with `w+`. If you actually wanted to read the previous data and add to it, you should use `r+` instead of `w+`* [[Source]](https://stackoverflow.com/questions/16208206/confused-by-python-file-mode-w#comment83227862_16208298)
    
    

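- `"a"`, for an opening in append mode (APPEND): new data is added at the end of the file. If the file does not exist, Python creates it.

- `"x"`, for an exclusive creation: creates a new file and opens it for writing. If the file already exists, Python raises a `FileExistsError`.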

You can also append the characters `+` (read and write) and `b` (binary mode) to nearly all of the above modes, for example `"r+"` or `"rb"`. [[More info here]](https://stackabuse.com/file-handling-in-python/)
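The behaviour of the modes listed above can be checked with a short experiment. This is only a sketch: it works in a temporary directory, and `demo.txt` is a hypothetical file name, not one of the course files.

```python
import os
import tempfile

# Work in a throwaway directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # "w" creates the file, or empties it if it already exists.
    with open(demo_path, "w") as demo_file:
        demo_file.write("first")

    # "a" keeps the existing content and writes at the end.
    with open(demo_path, "a") as demo_file:
        demo_file.write(" second")

    with open(demo_path, "r") as demo_file:
        after_append = demo_file.read()

    # Opening with "w" again truncates the file before anything is written.
    with open(demo_path, "w") as demo_file:
        demo_file.write("third")

    with open(demo_path, "r") as demo_file:
        after_overwrite = demo_file.read()

    # "x" refuses to open a file that already exists.
    try:
        with open(demo_path, "x") as demo_file:
            demo_file.write("never written")
        x_failed = False
    except FileExistsError:
        x_failed = True

print(after_append)     # first second
print(after_overwrite)  # third
print(x_failed)         # True
```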

Like any open resource, a file must be closed again once you are done with it. To do this, we use the `close()` method.

In [4]:
my_file.close()

In [5]:
# Let's find out what's going on there
my_file = open(filename, "r")
print(my_file.read())
my_file.close()

Hi everyone, I'm adding sentences to the file !


Another possibility is to open the file with a **with** statement, which closes it automatically at the end of the block. This is a **best practice** and you should use it as much as you can.

In [6]:
with open(filename, "r") as my_file:
    print(my_file.read())

Hi everyone, I'm adding sentences to the file !
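To see what the `with` statement does for us, we can inspect the `closed` attribute of the file object. The sketch below uses a temporary directory and a hypothetical `demo.txt` file. It also shows that the file is closed even when an exception interrupts the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, "w") as demo_file:
        demo_file.write("hello")
        closed_inside = demo_file.closed  # still open inside the block

    closed_after = demo_file.closed  # closed as soon as the block ends

    # The file is also closed when an exception interrupts the block.
    try:
        with open(demo_path, "r") as demo_file:
            raise ValueError("simulated problem while reading")
    except ValueError:
        closed_after_error = demo_file.closed

print(closed_inside)       # False
print(closed_after)        # True
print(closed_after_error)  # True
```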


Can you create a list based on the contents of this file? Each word should be an element of the list.
*(Use `.split()` for example...)*

In [7]:
with open(filename, "r") as my_file:
    print(my_file.read().split(" "))

['Hi', 'everyone,', "I'm", 'adding', 'sentences', 'to', 'the', 'file', '!']
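Note that `split(" ")` and `split()` do not behave the same way once the text contains line breaks or repeated spaces. The sketch below uses a hypothetical string rather than the content of `data.txt`.

```python
# Hypothetical content with two consecutive spaces and a line break.
text = "Hi  everyone,\nI'm adding sentences"

# split(" ") cuts on every single space only.
words_on_space = text.split(" ")

# split() without argument cuts on any run of whitespace (spaces, tabs, newlines).
words_on_whitespace = text.split()

print(words_on_space)       # ['Hi', '', "everyone,\nI'm", 'adding', 'sentences']
print(words_on_whitespace)  # ['Hi', 'everyone,', "I'm", 'adding', 'sentences']
```

For a file with several lines, `split()` without argument is therefore usually the safer choice.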


To write to a file, just **open** it (whether it already exists or not), write to it and close it. We open it in mode `"w"` so that any previous data is deleted before the new data is written.

In [8]:

new_filename = "./data/data_new.txt"
file = open(new_filename, "w")
file.write("Hi everyone, I'm adding sentences to the file !")
file.close()

Can you take the content of the `data.txt` file from the `./data/` directory, capitalize all the words and append them to the file that you just created, after the sentence you added?


In [9]:
# One possible solution: read the words from the first file
# and append them, capitalized, to the second file.
with open(filename, "r") as input_file, open(new_filename, "a") as output_file:
    words = input_file.read().split(" ")
    for word in words:
        output_file.write(f" {word.capitalize()}")
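A detail worth knowing about `capitalize()`: it puts the first character in upper case and everything else in lower case. The sketch below compares it with `title()` and `upper()` on a hypothetical string.

```python
word = "hello WORLD"  # hypothetical input

capitalized = word.capitalize()  # first character upper case, the rest lower case
title_cased = word.title()       # first letter of every word upper case
upper_cased = word.upper()       # every letter upper case

print(capitalized)  # Hello world
print(title_cased)  # Hello World
print(upper_cased)  # HELLO WORLD
```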


## Management of directory paths...

The `os` module is a library that provides a portable way of using operating-system-dependent functionality.
In this chapter, we are interested in its powerful file path handling capabilities, available through `os.path`.

In [10]:
import os

Each file or folder has a path, a kind of address that makes it possible to find it without ambiguity. Two files inside the same folder cannot have exactly the same name (they can share a base name only if their extensions differ).

As said before, there are two kinds of paths: the absolute path, from the root of your file system, and the relative path, from the current folder.

Using the `help` function, we can see the available functions.

In [11]:
help(os.path)

Help on module posixpath:

NAME
    posixpath - Common operations on Posix pathnames.

MODULE REFERENCE
    https://docs.python.org/3.12/library/posixpath.html

    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    Instead of importing this module directly, import os and refer to
    this module as os.path.  The "os.path" name is an alias for this
    module on Posix systems; on other systems (e.g. Windows),
    os.path provides the same operations in a manner specific to that
    platform, and is an alias to another module (e.g. ntpath).

    Some of this can actually be useful on non-Posix systems too, e.g.
    for manipulation of the pathname component of URLs.

FUNCTIONS
    abspath(path)
        Return 

To get the absolute path of the current directory, use `abspath("")`.

In [12]:
# In Python a path is a string, so there are methods to manipulate it.
path = os.path.abspath("")
print(path)
print(type(path))

/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling
<class 'str'>


To get the **directory** containing a path, use `dirname(path)`.

In [13]:
path_1 = os.path.dirname(path)
print(path_1)

/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced


To get only the last component of a path (the file name, or the directory name if the path points to a directory), use `basename(path)`.

In [14]:
os.path.basename(path)

'04-FileHandling'
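`dirname()` and `basename()` are two halves of the same operation. The sketch below applies them to a hypothetical path, together with `os.path.split()` and `os.path.splitext()`, which return both parts at once. These functions only manipulate the string, so the file does not need to exist.

```python
import os

example_path = "/home/user/project/data/data.txt"  # hypothetical path

folder = os.path.dirname(example_path)
name = os.path.basename(example_path)

# split() returns (dirname, basename) in one call.
folder_and_name = os.path.split(example_path)

# splitext() separates the extension from the rest of the name.
stem, extension = os.path.splitext(name)

print(folder)           # /home/user/project/data
print(name)             # data.txt
print(folder_and_name)  # ('/home/user/project/data', 'data.txt')
print(stem, extension)  # data .txt
```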

To add a directory, let's say `"text"`, to the path, we use `join()`.

The cool thing is that it is compatible across operating systems: on Windows it automatically adds `\` between the arguments of `os.path.join`, and on Linux it adds `/`. The same code thus works on every operating system!

In [15]:
rep_text = os.path.join(path, "text")
print(rep_text)

/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/text
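`join()` accepts any number of components, and the separator it inserts is available as `os.sep`. The sketch below uses hypothetical folder and file names. It also shows one behaviour that can be surprising: a component that starts at the root discards everything before it.

```python
import os

# Hypothetical sub-folders and file name.
nested = os.path.join("data", "texts", "final.txt")

# The same result, built by hand with the separator of the current system.
built_by_hand = os.sep.join(["data", "texts", "final.txt"])

# A component that starts at the root replaces the components before it.
reset = os.path.join("data", "/tmp")

print(nested)                   # data/texts/final.txt on Linux and macOS
print(nested == built_by_hand)  # True
print(reset)                    # /tmp
```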


To retrieve all the elements of a folder as a list, you can use the `listdir()` function.

In [16]:
# Items are returned as a list that includes folders and hidden files.
os.listdir("../")

['06-Concurrency',
 '08-Typing',
 'README.md',
 '04-FileHandling',
 '01-OOP',
 '02-ExceptionHandling',
 '07-Decorators',
 '10-UnitTesting',
 '03-Regex',
 '09-DataStructures',
 '05-Scraping']
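`listdir()` returns bare names, in arbitrary order, without saying which ones are folders. A minimal sketch of how to sort them and tell files from folders, using a temporary directory with hypothetical entries:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical content: one sub-folder and one file.
    os.mkdir(os.path.join(tmp_dir, "sub_folder"))
    with open(os.path.join(tmp_dir, "notes.txt"), "w") as notes_file:
        notes_file.write("some text")

    # Sort the names to get a predictable order.
    entries = sorted(os.listdir(tmp_dir))

    # listdir() returns names only, so join them with the folder before testing.
    folders = [name for name in entries if os.path.isdir(os.path.join(tmp_dir, name))]
    files = [name for name in entries if os.path.isfile(os.path.join(tmp_dir, name))]

print(entries)  # ['notes.txt', 'sub_folder']
print(folders)  # ['sub_folder']
print(files)    # ['notes.txt']
```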

### How to display all the elements of a folder as well as its child folders? 

With the `walk()` function:

```
walk(top, topdown=True, onerror=None, followlinks=False)
```
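For each folder it visits, `walk()` yields a tuple made of the folder path, the list of its sub-folder names and the list of its file names. The sketch below builds a small hypothetical tree in a temporary directory and collects every file it contains, as paths relative to the top folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical tree: readme.md, data/a.txt and data/archive/b.txt
    os.makedirs(os.path.join(tmp_dir, "data", "archive"))
    relative_names = (
        "readme.md",
        os.path.join("data", "a.txt"),
        os.path.join("data", "archive", "b.txt"),
    )
    for relative_name in relative_names:
        with open(os.path.join(tmp_dir, relative_name), "w") as new_file:
            new_file.write("x")

    found = []
    # walk() visits tmp_dir, then data/, then data/archive/.
    for current_dir, sub_dirs, file_names in os.walk(tmp_dir):
        for file_name in file_names:
            full_path = os.path.join(current_dir, file_name)
            # Store the path relative to the top folder to keep the output short.
            found.append(os.path.relpath(full_path, tmp_dir))

found.sort()
print(found)
```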


In [17]:
folder_path = os.path.abspath("./")
print(folder_path)

for path, dirs, files in os.walk(folder_path):
    for filename in files:
        print(os.path.join(path, filename))

/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling
/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/file-handling.ipynb
/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX09.txt
/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/mail.txt
/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX01.txt
/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX90.txt
/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX89.txt
/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHan

Create a list of all the **`.txt` files** from the `data/` directory.

In [18]:

txt_files = []
new_path = os.path.abspath("./data")
print(new_path)

for path, dirs, files in os.walk(new_path):
    for filename in files:
        # endswith() only matches the extension, not ".txt" anywhere in the name
        if filename.endswith(".txt"):
            txt_files.append(os.path.join(path, filename))

print(txt_files)


/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data
['/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX09.txt', '/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/mail.txt', '/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX01.txt', '/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX90.txt', '/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX89.txt', '/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-PythonAdvanced/04-FileHandling/data/VOEUX75.txt', '/home/siegfried2021/Bureau/BeCode_AI/LGG-Thomas4-Mathieu/01-TheField/02-Python/02-Pyt

Open all the files of the list, and add their content into a new file `final.txt` that you will create in `data/`.

In [19]:
with open("./data/final.txt", "w", encoding="latin-1") as new_file:
    for filename in txt_files:
        with open(filename, "r", encoding="latin-1") as text_file:
            content = text_file.read()
            new_file.write(content)
            new_file.write("\n")
        
with open("./data/final.txt", "r", encoding="latin-1") as test:
    print(test.read())


L'année qui s'achève a été difficile pour tous. Aucun continent, aucun pays, aucun secteur n'a été épargné. La crise économique a imposé de nouvelles peines, de nouvelles souffrances, en France comme ailleurs. Je pense en particulier à ceux qui ont perdu leur emploi. Cependant notre pays a été moins éprouvé que beaucoup d'autres. Nous le devons à notre modèle social qui a amorti le choc, aux mesures énergiques qui ont été prises pour soutenir l'activité et surtout pour que personne ne reste sur le bord du chemin.

Mais c'est à chacun d'entre vous que revient le plus grand mérite. Je veux rendre hommage ce soir au sang-froid et au courage des Français face à la crise. Je veux rendre un hommage particulier aux partenaires sociaux qui ont fait preuve d'un grand sens des responsabilités, aux associations qui ont secouru ceux qui en avaient le plus besoin, aux chefs d'entreprises, ils sont nombreux, qui se sont efforcés de sauver des emplois.

Ensemble nous avons évité le pire. Mais nous av