# Reading and writing files

We have made good progress, and we can now get down to the more serious task of manipulating files. This is one of the most important topics in this training.


N.B.: Most of the files in `./data/` are only there to help us understand how file opening works. They have no other purpose.

To open or edit a file in Python, we use the `open()` function.

This function takes the path of the file (*relative* or *absolute*) as its first parameter and the opening mode, _i.e._ reading or writing, as its second parameter.

A **relative path** is a path that takes the current location into account: it is **relative** to the directory it is called from.

- **Example:** _./data/data.txt_

An **absolute path** is a complete path that can be resolved regardless of the current location.

- **Example:** _/Users/becodian/Desktop/BeCode/ai-track/content/2.python/2.python_advanced/04.File-handling/data/data.txt_

The best practice is to use **relative** paths in your Python code wherever possible. That way your code can be shared **as is** with your colleagues. An absolute path will usually generate an error on their machines, since it exists only on your own computer.
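As a small illustration of the difference, the sketch below checks whether a path is absolute and converts a relative path into an absolute one. It uses the example path from above; the file does not have to exist for these calls to work, and the printed absolute path will differ from machine to machine.

```python
import os

relative_path = "./data/data.txt"  # the example path used above

# A relative path is resolved against the current working directory.
print(os.path.isabs(relative_path))  # False

# abspath() prepends the current working directory and normalises the result.
absolute_path = os.path.abspath(relative_path)
print(os.path.isabs(absolute_path))  # True
print(absolute_path)                 # depends on where the code is run
```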




In [None]:
filename = "./data/data.txt"
my_file = open(filename, "r")  # r for "read"

- `"r"`, for opening in read mode (READ).

- `"w"`, for opening in write mode (WRITE). Each time the file is opened, its content is overwritten. If the file does not exist, Python creates it.

    *The Python docs say that `w+` will "overwrite the existing file if the file exists". So as soon as you open a file with `w+`, it is now an empty file: it contains 0 bytes. If it used to contain data, that data has been truncated, cut off and thrown away, and now the file size is 0 bytes, so you can't read any of the data that existed before you opened the file with `w+`. If you actually wanted to read the previous data and add to it, you should use `r+` instead of `w+`* [[Source]](https://stackoverflow.com/questions/16208206/confused-by-python-file-mode-w#comment83227862_16208298)

- `"a"`, for opening in append mode (APPEND): new data is added at the end of the file. If the file does not exist, Python creates it.

- `"x"`, creates a new file and opens it for writing. If the file already exists, the call fails.

You can also append the characters `+` (read and write) and `b` (binary mode) to nearly all of the above modes. [[More info here]](https://stackabuse.com/file-handling-in-python/)
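The behaviour of these modes can be checked with a short experiment. The sketch below works in a temporary directory so that no real file is touched; the file name `demo.txt` is hypothetical. The `with` blocks used here simply close each file automatically at the end of the block.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is modified.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # "w" creates the file (or empties it if it already exists).
    with open(demo_path, "w") as f:
        f.write("first\n")

    # "a" keeps the existing content and writes at the end.
    with open(demo_path, "a") as f:
        f.write("second\n")

    with open(demo_path, "r") as f:
        after_append = f.read()

    # Opening with "w" again discards the previous content.
    with open(demo_path, "w") as f:
        f.write("third\n")

    with open(demo_path, "r") as f:
        after_overwrite = f.read()

    # "x" refuses to open a file that already exists.
    try:
        open(demo_path, "x")
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_overwrite))  # 'third\n'
print(x_failed)               # True
```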

Like any opened resource, a file must be closed again once you are done with it. To do this, we use the `close()` method.

In [None]:
my_file.close()

In [None]:
# Let's find out what's going on there
my_file = open(filename, "r")
print(my_file.read())
my_file.close()

Another possibility is to open the file with a **with** statement, which closes it automatically at the end of the block. That's a **best practice** and you should use it as much as you can.

In [None]:
with open(filename, "r") as my_file:
    print(my_file.read())
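To see what the `with` statement does, one can inspect the `closed` attribute of the file object. The following is a minimal sketch in a temporary directory (the file name `demo.txt` is hypothetical); it also shows that the file is closed even when an error occurs inside the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, "w") as demo_file:
        demo_file.write("hello")
        print(demo_file.closed)  # False: still inside the block

    print(demo_file.closed)  # True: closed when the block ended

    # The file is closed even if the block is left because of an exception.
    try:
        with open(demo_path, "r") as demo_file:
            raise ValueError("simulated problem while reading")
    except ValueError:
        pass
    closed_after_error = demo_file.closed

print(closed_after_error)  # True
```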

Can you create a list based on the contents of this file? Each word should be an element of the list.
*(Use `.split()` for example...)*
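As a reminder of how `.split()` behaves, here is a small sketch on a made-up string (not the content of `data.txt`). Called without arguments, it splits on any run of whitespace, including newlines and tabs; called with `" "`, it splits on every single space.

```python
text = "Hello   world\nthis is\ta test"  # made-up sample text

# No argument: any run of whitespace is a separator, empty strings are dropped.
print(text.split())
# ['Hello', 'world', 'this', 'is', 'a', 'test']

# Explicit " ": every single space is a separator, newlines and tabs are kept.
print(text.split(" "))
# ['Hello', '', '', 'world\nthis', 'is\ta', 'test']
```

For splitting file contents into words, the argument-free form is usually the convenient one.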

In [None]:
# YOUR CODE HERE
# Define the file path
filename = "./data/data.txt"

# Open the file, read its contents, and split it into a list of words
with open(filename, "r") as my_file:
    text_content = my_file.read()

# Split the content into a list of words
words_list = text_content.split()

# Output the list
print(words_list)

To write to a file, just **open** it (whether it exists or not), write to it and close it. We open it in mode `"w"` so that any previous data is deleted before the new data is written.

In [None]:

new_filename = "./data/data_new.txt"
file = open(new_filename, "w")
file.write("Hi everyone, I'm adding sentences to the file !")
file.close()
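One detail worth knowing: `write()` does not add a line break on its own, so consecutive calls end up on the same line unless you write `"\n"` yourself. A minimal sketch, using a hypothetical `demo.txt` in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, "w") as demo_file:
        demo_file.write("line one")      # no newline written here
        demo_file.write("line two\n")    # explicit newline
        demo_file.write("line three\n")

    with open(demo_path, "r") as demo_file:
        lines = demo_file.read().splitlines()

print(lines)  # ['line oneline two', 'line three']
```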

Can you take the content of the `data.txt` file from the `./data/` directory, capitalize all the words and write them in the file that you created just before, after the sentences you added?


In [None]:
# It's up to you to write the end
array = []
with open(filename, "r") as input_file:
    text_content = input_file.read()
    # Convert every word of the input file to upper case
    words_list = [word.upper() for word in text_content.split()]
    array.extend(words_list)
    # Append mode keeps the sentence written earlier
    with open(new_filename, "a") as output_file:
        # Write your code here
        output_file.write(" ".join(words_list) + "\n")

print(array)

## Managing directory paths

The `os` module is a library that provides a portable way of using operating-system-dependent functionality.
In this chapter, we are interested in its powerful file path handling capabilities, available through `os.path`.

In [None]:
import os

Each file or folder is associated with a kind of address that makes it easy to find without ambiguity. Two files inside the same folder cannot have an identical name (unless their extensions differ).
As said before, there are two kinds of paths: the absolute path, from the root of your file system, and the relative path, from the folder being read.

By using the `help` function, we can see the available functions.

In [None]:
help(os.path)

To get your current absolute path, use `abspath('')`.

In [None]:
# In Python a path is a string, so there are methods to manipulate it.
path = os.path.abspath("")
print(path)
print(type(path))

To get the **directory** containing a path, use `dirname(path)`.

In [None]:
os.path.dirname(path)

To only get the file name of a path (or directory name if this is a directory), use `basename(path)`.

In [None]:
os.path.basename(path)
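To make the difference between the two functions concrete, the sketch below applies them to a fixed, hypothetical path rather than to the current directory. `os.path.splitext()` is an additional function from the same module that separates the extension from the file name.

```python
import os

example_path = "/home/user/project/data/data.txt"  # hypothetical path

# Everything before the last separator.
print(os.path.dirname(example_path))   # /home/user/project/data

# Everything after the last separator.
file_name = os.path.basename(example_path)
print(file_name)                       # data.txt

# Split the file name into its stem and its extension.
stem, extension = os.path.splitext(file_name)
print(stem, extension)                 # data .txt
```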

To add a directory, let's say `"text"`, to the path, we use `join()`.

The cool thing is that it is compatible across operating systems: on Windows it automatically adds `\` between the arguments of `os.path.join`, and on Linux it adds `/`. The same code thus works on every operating system!

In [None]:
rep_text = os.path.join(path, "text")
print(rep_text)
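`os.path.join()` also accepts more than two arguments. The following sketch builds a nested path from hypothetical folder and file names and compares it with the separator of the current operating system, which is available as `os.sep`.

```python
import os

# Hypothetical folder and file names, joined with the OS-specific separator.
nested_path = os.path.join("data", "text", "notes.txt")
print(nested_path)  # data/text/notes.txt on Linux/macOS, data\text\notes.txt on Windows

# The separator that was inserted is os.sep.
print(repr(os.sep))
print(nested_path.split(os.sep))  # ['data', 'text', 'notes.txt']
```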

To retrieve all the elements of a folder as a list, you can use the `listdir()` function.

In [None]:
# Items are returned as a list that includes folders and hidden files.
os.listdir("../")
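Since `listdir()` returns names only, without saying whether each one is a file or a folder, a common follow-up is to test each entry with `os.path.isfile()` or `os.path.isdir()`. A minimal sketch on a temporary folder whose contents are made up for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small hypothetical folder: one sub-folder and two files.
    os.mkdir(os.path.join(tmp_dir, "sub"))
    for name in ("a.txt", "b.csv"):
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("x")

    # listdir() gives no guaranteed order, so sort for a stable result.
    entries = sorted(os.listdir(tmp_dir))

    # The names must be joined with the folder before they can be tested.
    files_only = [n for n in entries if os.path.isfile(os.path.join(tmp_dir, n))]
    dirs_only = [n for n in entries if os.path.isdir(os.path.join(tmp_dir, n))]

print(entries)     # ['a.txt', 'b.csv', 'sub']
print(files_only)  # ['a.txt', 'b.csv']
print(dirs_only)   # ['sub']
```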

### How to display all the elements of a folder as well as its child folders?

With the `walk()` function:

```
walk(top, topdown=True, onerror=None, followlinks=False)
```


In [None]:
folder_path = os.path.abspath("./")
print(folder_path)

for path, dirs, files in os.walk(folder_path):
    for filename in files:
        print(os.path.join(path, filename))
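To see exactly what `walk()` yields at each step, the sketch below runs it on a small temporary tree (all folder and file names are made up) and records the three values it returns for every folder: the current folder, its sub-folders, and its files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical tree: root.txt, a/one.txt, a/b/two.txt
    os.makedirs(os.path.join(tmp_dir, "a", "b"))
    for relative in ("root.txt",
                     os.path.join("a", "one.txt"),
                     os.path.join("a", "b", "two.txt")):
        with open(os.path.join(tmp_dir, relative), "w") as f:
            f.write("")

    visited = []
    for current_dir, dirs, files in os.walk(tmp_dir):
        # Store the folder relative to the starting point for readability.
        relative_dir = os.path.relpath(current_dir, tmp_dir)
        visited.append((relative_dir, sorted(dirs), sorted(files)))

for step in visited:
    print(step)
# ('.', ['a'], ['root.txt'])
# ('a', ['b'], ['one.txt'])
# ('a/b', [], ['two.txt'])   (a\\b on Windows)
```

With the default `topdown=True`, a folder is always reported before its sub-folders.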

Create a list of all the **`.txt` files** from the `data/` directory

In [None]:

txt_files = []

Open all the files of the list, and add their content into a new file `final.txt` that you will create in `data/`.

In [None]:
""" I do this exercise"""
import os

# Define the directory and final file path
directory = "data/"
final_file_path = os.path.join(directory, "final.txt")

# List to hold .txt files
txt_files = []

# Walk through the directory and collect .txt files
for root, dirs, files in os.walk(directory):
    for file_name in files:
        file_path = os.path.join(root, file_name)
        # Skip final.txt itself so the output of a previous run is not read back in
        if file_name.endswith(".txt") and file_path != final_file_path:
            txt_files.append(file_path)

# Open the final file in write mode
with open(final_file_path, 'w') as final_file:
    # Iterate through the list of .txt files
    for txt_file in txt_files:
        # Open each .txt file and read its content
        with open(txt_file, 'r') as source_file:
            content = source_file.read()
            # Write the content to the final file
            final_file.write(content)
            # Optionally add a new line to separate contents from different files
            final_file.write("\n")