# Read and write files

We have made good progress, so we can now move on to a more serious task: manipulating files. This is one of the most important topics in this training.


N.B: Most of the files in `./data/` are files that we will use to understand how file opening works. They don't have a special purpose other than that. 

To open or edit a file in Python, we use the built-in `open()` function.

This function takes as first parameter the path of the file (*relative* or *absolute*) and as second parameter the type of opening, _i.e._ reading or writing mode.

A **relative path** is a path that is interpreted from the current location. The path is **relative** to where it is called from, that is, the current working directory of the program.

- **Example:** _./data/data.txt_

An **absolute path** is a complete path, starting from the root of the file system, that points to the same location regardless of where it is read from.

- **Example:** _/Users/becodian/Desktop/BeCode/ai-track/content/2.python/2.python_advanced/04.File-handling/data/data.txt_

The best practice is to use **relative** paths in your Python code whenever you can. That way your code can be shared **as it is** with your colleagues. An absolute path will usually fail on their machine, since it only exists on your own computer.
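To make the difference concrete, here is a small sketch that resolves the relative path used in this chapter against the current working directory. It only inspects path strings, so it runs even if `./data/data.txt` does not exist on your machine.

```python
import os

relative_path = "./data/data.txt"                # same relative path as in the text
absolute_path = os.path.abspath(relative_path)   # resolved against the current working directory

print(os.getcwd())                    # the directory that relative paths start from
print(os.path.isabs(relative_path))   # False
print(os.path.isabs(absolute_path))   # True
print(absolute_path)                  # differs from one computer to another
```

The last printed value is exactly the part that would break if it were hard-coded and shared with a colleague.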



In [1]:
filename = "./data/data.txt"
my_file = open(filename, "r")  # r for "read"

- `"r"`, for a read opening (READ).

- `"w"`, for a write opening (WRITE). Each time the file is opened, its content is overwritten. If the file does not exist, Python creates it.

    *The Python docs say that `w+` will "overwrite the existing file if the file exists". So as soon as you open a file with `w+`, it is now an empty file: it contains 0 bytes. If it used to contain data, that data has been truncated — cut off and thrown away — and now the file size is 0 bytes, so you can't read any of the data that existed before you opened the file with `w+`. If you actually wanted to read the previous data and add to it, you should use `r+` instead of `w+`* [[Source]](https://stackoverflow.com/questions/16208206/confused-by-python-file-mode-w#comment83227862_16208298)
    
    

- `"a"`, for an opening in add mode at the end of the file (APPEND). If the file does not exist, Python creates it.

- `"x"`, creates a new file and opens it for writing. If the file already exists, the call fails with a `FileExistsError`.

You can also append the characters `+` (open for both reading and writing) and `b` (binary mode) to nearly all of the above modes. [[More info here]](https://stackabuse.com/file-handling-in-python/)
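The modes listed above can be compared side by side. The following is a minimal sketch that works in a temporary directory, so no real file is touched; the file name `demo.txt` is only an illustration.

```python
import os
import tempfile

# Work in a throwaway directory that is deleted at the end of the block.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, "w") as f:   # "w" creates the file
        f.write("first\n")
    with open(demo_path, "w") as f:   # "w" again truncates it, so "first" is lost
        f.write("second\n")
    with open(demo_path, "a") as f:   # "a" keeps the content and writes at the end
        f.write("third\n")
    with open(demo_path, "r") as f:
        content = f.read()

    try:
        with open(demo_path, "x") as f:   # "x" refuses to open an existing file
            pass
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(content))   # 'second\nthird\n'
print(x_failed)        # True
```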

Like any open element, it must be closed again once the instructions have been completed. To do this, we use the `close()` method.

In [2]:
my_file.close()

In [13]:
# Let's find out what's going on there
my_file = open(filename, mode = "r", encoding = 'utf-8')
print(my_file.read(12)) # Read the first 12 characters
print(my_file.readlines()) # Read the remaining lines into a list

print(my_file.seek(0)) # Return to the beginning of the file (seek() returns the new position, 0)
print(my_file.read())
my_file.close()

Hi everyone,
[' \n', "I'm adding sentences to the file !\n", 'Yay! ']
0
Hi everyone, 
I'm adding sentences to the file !
Yay! 
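The output above shows that every read continues from where the previous one stopped. Here is a small sketch of that cursor behaviour on a temporary file; the file name and its three lines are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\ngamma\n")

    with open(sample_path, "r", encoding="utf-8") as f:
        first_chars = f.read(3)          # "alp": the cursor now sits after 3 characters
        rest_of_line = f.readline()      # "ha\n": up to and including the newline
        remaining = f.readlines()        # ["beta\n", "gamma\n"]
        at_end = f.read()                # "": nothing is left to read
        f.seek(0)                        # rewind to the start of the file
        first_line_again = f.readline()  # "alpha\n"

print(first_chars, repr(rest_of_line), remaining, repr(at_end), repr(first_line_again))
```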


Another possibility is to open the file with a **with** statement, which closes it automatically when the block ends. That's a **best practice** and you should use it as much as you can.

In [None]:
with open(filename, "r") as my_file:
    print(my_file.read())
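To check that `with` really closes the file, you can look at the `closed` attribute of the file object. The sketch below also shows a roughly equivalent manual form using `try`/`finally`; the file `note.txt` is a throwaway file created only for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    note_path = os.path.join(tmp_dir, "note.txt")  # hypothetical file

    with open(note_path, "w") as f:
        f.write("hello")
    closed_after_with = f.closed          # True: closed when the block ended

    # Manual form: close() must be called even if reading raises an error
    g = open(note_path, "r")
    try:
        text = g.read()
    finally:
        g.close()
    closed_after_finally = g.closed       # True

print(closed_after_with, closed_after_finally, text)
```

The `with` form is shorter and makes it much harder to forget the `close()` call.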

Can you create a list based on the contents of this file? Each word should be an element of the list
*(Use `.split()` for example...)*

In [24]:
# Relative path to the file
filename = "./data/data.txt"

word_list = []

# Open the file in read mode
with open(filename, "r") as my_file:
    # Read the entire file content as a string
    file_contents = my_file.read()

    # Split the file content into words using whitespace as the separator
    word_list = file_contents.split()
    
    print(word_list)


['Hi', 'everyone,', "I'm", 'adding', 'sentences', 'to', 'the', 'file', '!', 'Yay!']


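To write to a file, just **open** it (whether it already exists or not), write to it and close it. Opening it in mode `"w"` deletes the previous data before the new data is written. Opening it in `"a"` or `"a+"` mode, as in the cell below, keeps the existing content and adds the new data at the end, so running the cell several times repeats the sentences.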

In [23]:
new_filename = "./data/data_new.txt"

# Open the file in append mode ("a+"), which also allows reading
# The file is opened using 'with' and is automatically closed when the with block exits
with open(new_filename, "a+") as file:
    file.write("Hi there, I'm adding the first sentence to the file!\n")
    file.write("Hi again, I'm adding new sentences to the file!\n")

    # After writing, you can read the file's contents if needed
    file.seek(0)  # Move the cursor to the beginning of the file
    print(file.read())


Hi there, I'm adding the first sentence to the file !Hi again, I'm adding new sentences to the file !Hi there, I'm adding the first sentence to the file!
Hi again, I'm adding new sentences to the file!
Hi there, I'm adding the first sentence to the file!
Hi again, I'm adding new sentences to the file!
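In the first line of the output above, several sentences run together, which is what you get when strings are written without a trailing `\n`: neither `write()` nor `writelines()` adds a line break for you. A minimal sketch, using made-up data and a temporary file:

```python
import os
import tempfile

lines = ["first", "second", "third"]  # hypothetical data, without newlines

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name

    with open(out_path, "w", encoding="utf-8") as f:
        written = f.write("header\n")                  # returns the number of characters written
        f.writelines(line + "\n" for line in lines)    # the newline is added explicitly

    with open(out_path, "r", encoding="utf-8") as f:
        result = f.read()

print(written)        # 7
print(repr(result))   # 'header\nfirst\nsecond\nthird\n'
```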



Can you take the content of the `data.txt` file from the `./data/` directory, capitalize all the words and write them in the file that you created just before, after the sentences you added?


In [27]:
# ./data/data.txt 
filename = "./data/data.txt"

content = []
with open(filename, "r") as input_file:
    with open(new_filename, "a") as output_file:
        content = input_file.readlines() # Read content from input file
        capitalized_content = [line.upper() for line in content]
        output_file.writelines(capitalized_content) # Write the content to the output file
print(f"Content copied from '{filename}' to '{new_filename}'.")

# Print the content of the source file (unchanged)
print(f"Content of '{filename}':")
for line in content:
    print(line, end="")

# Print the content of the destination file (capitalized)
print(f"\nContent of '{new_filename}' (capitalized):")
for line in capitalized_content:
    print(line, end="")


Content copied from './data/data.txt' to './data/data_new.txt'.
Content of './data/data.txt':
Hi everyone, 
I'm adding sentences to the file !
Yay! 
Content of './data/data_new.txt' (capitalized):
HI EVERYONE, 
I'M ADDING SENTENCES TO THE FILE !
YAY! 

## Management of directory paths...

The `os` module is a library that provides a portable way of using operating system dependent functionality.
In this chapter, we are interested in using its powerful file path handling capabilities using `os.path`.

In [32]:
import os

# Specify the directory name
directory_name = "mydirectory"

# Check if the specified name exists and is a directory
if os.path.exists(directory_name) and os.path.isdir(directory_name):
    print(f"'{directory_name}' already exists as a directory.")
else:
    try:
        # Create the directory if it doesn't exist
        os.mkdir(directory_name)
        print(f"Directory '{directory_name}' created successfully.")
    except FileExistsError:
        print(f"A file with the name '{directory_name}' already exists.")

# List the contents of the current directory to verify
print("Contents of the current directory:")
for item in os.listdir():
    print(item)



'mydirectory' already exists as a directory.
Contents of the current directory:
data
file_handling.ipynb
mydirectory
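If you only need to be sure that a directory exists, `os.makedirs()` with `exist_ok=True` is a possible shortcut for the check-then-create pattern above. Unlike `os.mkdir()`, it also creates missing parent directories. The directory names below are purely illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "parent", "child")  # hypothetical nested directory

    os.makedirs(target, exist_ok=True)   # creates "parent" and then "child"
    os.makedirs(target, exist_ok=True)   # second call does nothing and raises no error
    created = os.path.isdir(target)

print(created)   # True
```

Note that `exist_ok=True` still raises `FileExistsError` if the name is taken by a regular file rather than a directory.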


Each file or folder has a path, a kind of address that makes it easy to find it without errors. Two files in the same folder cannot have identical names. The extension is part of the name, so `data.txt` and `data.csv` can coexist.

As said before, there are two kinds of paths: the absolute path from the root of your file system and the relative path from the folder being read.

By using the `help()` function, we can see the functions that `os.path` provides.

In [33]:
help(os.path)

Help on module ntpath:

NAME
    ntpath - Common pathname manipulations, WindowsNT/95 version.

MODULE REFERENCE
    https://docs.python.org/3.11/library/ntpath.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    Instead of importing this module directly, import os and refer to this
    module as os.path.

FUNCTIONS
    abspath(path)
        Return the absolute version of a path.
    
    basename(p)
        Returns the final component of a pathname
    
    commonpath(paths)
        Given a sequence of path names, returns the longest common sub-path.
    
    commonprefix(m)
        Given a list of pathnames, returns the longest common leading component
    
    dirname(p)
        Returns the di

To find the absolute path of your current directory, use `abspath('')`.

In [34]:
# In Python a path is a string, so there are methods to manipulate it.
path = os.path.abspath("")
print(path)
print(type(path))

c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling
<class 'str'>


To get the **directory** containing a path, use `dirname(path)`.

In [35]:
os.path.dirname(path)

'c:\\Users\\becode\\OneDrive\\Desktop\\week1\\repos\\DataOperator_Week1\\content\\course\\2.python\\2.python_advanced'

To only get the file name of a path (or directory name if this is a directory), use `basename(path)`.

In [36]:
os.path.basename(path)

'04.File-handling'
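`dirname()` and `basename()` have two close relatives in `os.path`: `split()` returns both parts at once, and `splitext()` separates the extension from the rest of the name. A short sketch on a made-up path (nothing is read from disk):

```python
import os

example_path = os.path.join("data", "reports", "summary.txt")  # hypothetical path

directory, file_name = os.path.split(example_path)   # same pair as dirname() and basename()
stem, extension = os.path.splitext(file_name)        # ("summary", ".txt")

print(directory)
print(file_name)        # summary.txt
print(stem, extension)  # summary .txt
```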

To add a directory, let's say `"text"` to the path, we use `join()`. 

The cool thing is that it is compatible across operating systems. Meaning that on Windows it will automatically add `\` between the arguments of `os.path.join`, and on Linux it will add `/`. The same code thus works on every operating system!

In [37]:
rep_text = os.path.join(path, "text")
print(rep_text)

c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling\text
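`join()` accepts any number of components, and the separator it inserts is available as `os.sep`. One behaviour worth knowing: if a later component is an absolute path, everything before it is discarded. A small sketch with made-up names:

```python
import os

nested = os.path.join("data", "2024", "report.txt")   # hypothetical components
print(nested)                 # data/2024/report.txt on Linux/macOS, data\2024\report.txt on Windows
print(nested.split(os.sep))   # ['data', '2024', 'report.txt']

# An absolute component restarts the path: "data" is dropped here.
absolute_part = os.path.abspath("other")
restarted = os.path.join("data", absolute_part, "file.txt")
print(restarted)
```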


To retrieve all the elements of a folder as a list, you can use the `listdir()` function.

In [38]:
# Items are returned as a list that includes folders and hidden files.
os.listdir("../")

['01.OOP',
 '02.Exception-handling',
 '03.Regex',
 '04.File-handling',
 '05.Scraping',
 '06.Concurrency',
 '07.Decorator',
 '08.Typing',
 '09.Good_practices',
 '10.Data-structure',
 '11.unittest']
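`listdir()` returns names only, without telling you which ones are files and which ones are folders. One way to separate them, sketched here on a temporary directory with made-up entries, is to combine it with `os.path.isfile()` and `os.path.isdir()`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a tiny tree: one sub-folder and one file (hypothetical names)
    os.mkdir(os.path.join(tmp_dir, "sub"))
    with open(os.path.join(tmp_dir, "a.txt"), "w") as f:
        f.write("x")

    entries = sorted(os.listdir(tmp_dir))   # the order is not guaranteed, so sort it
    # The names must be joined to the folder before they can be tested
    files_only = [e for e in entries if os.path.isfile(os.path.join(tmp_dir, e))]
    dirs_only = [e for e in entries if os.path.isdir(os.path.join(tmp_dir, e))]

print(entries)      # ['a.txt', 'sub']
print(files_only)   # ['a.txt']
print(dirs_only)    # ['sub']
```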

### How to display all the elements of a folder as well as its child folders? 

With the `walk()` function:

```
walk(top, topdown=True, onerror=None, followlinks=False)
```


In [39]:
# List all files within the current working directory and its subdirectories.
# os.walk() recursively traverses the directory tree, starting from folder_path.
# For each file, print its absolute path.

# "./" is the relative path of the current directory;
# abspath() converts it to an absolute path, so the printed paths are absolute too.
folder_path = os.path.abspath("./")
print(folder_path)

for path, dirs, files in os.walk(folder_path):
    # folder_path: starting directory for the traversal
    # path: current directory being traversed
    # dirs: a list of subdirectories within the current directory 
    # files: list of filenames within the current directory 
    
    for filename in files:
        # print absolute path, for each file name
        # join the path (current directory) and filename
        print(os.path.join(path, filename))

c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling
c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling\file_handling.ipynb
c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling\text
c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling\data\comptagevelo2017.csv
c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling\data\comptagevelo2017.xlsx
c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling\data\data.txt
c:\Users\becode\OneDrive\Desktop\week1\repos\DataOperator_Week1\content\course\2.python\2.python_advanced\04.File-handling\data\data_new.txt
c:\Users\becode\OneDrive\Desk
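With the default `topdown=True`, `walk()` lets you skip whole sub-folders: if you modify the `dirs` list in place, the removed folders are not visited. The sketch below builds a small temporary tree (the folder names `keep` and `skip` are invented) and prunes one branch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build: tmp_dir/keep/a.txt and tmp_dir/skip/b.txt
    for sub, name in [("keep", "a.txt"), ("skip", "b.txt")]:
        os.mkdir(os.path.join(tmp_dir, sub))
        with open(os.path.join(tmp_dir, sub, name), "w") as f:
            f.write("x")

    found = []
    for current_dir, dirs, files in os.walk(tmp_dir):   # topdown=True by default
        # Slice assignment edits the same list object that walk() uses internally
        dirs[:] = [d for d in dirs if d != "skip"]
        for name in files:
            full_path = os.path.join(current_dir, name)
            found.append(os.path.relpath(full_path, tmp_dir))

print(found)   # only the file under "keep"
```

Writing `dirs = [...]` instead of `dirs[:] = [...]` would create a new list and leave the traversal unchanged.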

Create a list of all the **`.txt` files** from the `data/` directory

In [41]:

import os

# Specify the directory path
directory_path = "./data"  # Replace with the actual path to your data directory

# Initialize an empty list to store the .txt files
txt_files = []

# Walk through the directory and its subdirectories
for root, dirs, files in os.walk(directory_path):
    for file in files:
        if file.endswith(".txt"):
            # Append the path of the .txt file (relative here, since directory_path is relative)
            txt_files.append(os.path.join(root, file))

# Print the list of .txt files
for txt_file in txt_files:
    print(txt_file)



./data\data.txt
./data\data_new.txt
./data\mail.txt
./data\VOEUX01.txt
./data\VOEUX05.txt
./data\VOEUX06.txt
./data\VOEUX07.txt
./data\VOEUX08.txt
./data\VOEUX09.txt
./data\VOEUX74.txt
./data\VOEUX75.txt
./data\VOEUX79.txt
./data\VOEUX83.txt
./data\VOEUX87.txt
./data\VOEUX89.txt
./data\VOEUX90.txt
./data\VOEUX94.txt
./data\write.txt
./data\lequipe-du sport en continu._files\f(10).txt
./data\lequipe-du sport en continu._files\f(2).txt
./data\lequipe-du sport en continu._files\f(3).txt
./data\lequipe-du sport en continu._files\f(4).txt
./data\lequipe-du sport en continu._files\f(5).txt
./data\lequipe-du sport en continu._files\f(6).txt
./data\lequipe-du sport en continu._files\f(7).txt
./data\lequipe-du sport en continu._files\f(8).txt
./data\lequipe-du sport en continu._files\f(9).txt
./data\lequipe-du sport en continu._files\f.txt
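As an alternative to `os.walk()` (not the approach used in this chapter), the standard `glob` module can collect the same kind of list with a pattern. With `recursive=True`, the `**` part matches any number of nested folders. The sketch uses a temporary tree with made-up file names.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build: a.txt, b.csv and nested/c.txt
    os.mkdir(os.path.join(tmp_dir, "nested"))
    for rel in ["a.txt", "b.csv", os.path.join("nested", "c.txt")]:
        with open(os.path.join(tmp_dir, rel), "w") as f:
            f.write("x")

    pattern = os.path.join(tmp_dir, "**", "*.txt")
    txt_paths = sorted(glob.glob(pattern, recursive=True))
    txt_names = [os.path.relpath(p, tmp_dir) for p in txt_paths]

print(txt_names)   # a.txt and nested/c.txt, but not b.csv
```

One difference to keep in mind: by default `glob` does not match names that start with a dot, whereas the `os.walk()` loop with `endswith(".txt")` would include them.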


Open all the files of the list, and add their content into a new file `final.txt` that you will create in `data/`.

In [42]:
import os

# Specify the directory path
directory_path = "./data"  # Replace with the actual path to your data directory

# Initialize an empty list to store the .txt files
txt_files = []

# Walk through the directory and its subdirectories
for root, dirs, files in os.walk(directory_path):
    for file in files:
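        # Skip 'final.txt' itself so that re-running the cell does not copy the output into itself
        if file.endswith(".txt") and file != "final.txt":
            # Append the path of the .txt file (relative here, since directory_path is relative)
            txt_files.append(os.path.join(root, file))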

# Create a new file named 'final.txt' in the data directory
output_file_path = os.path.join(directory_path, "final.txt")

# Open the 'final.txt' file for appending
with open(output_file_path, "a") as output_file:
    # Iterate over the list of .txt files and append their content to 'final.txt'
    for txt_file in txt_files:
        with open(txt_file, "r") as input_file:
            file_content = input_file.read()
            output_file.write(file_content)
            output_file.write("\n")  # Add a newline between file contents

print(f"Contents of {len(txt_files)} .txt files have been appended to 'final.txt'.")


Contents of 28 .txt files have been appended to 'final.txt'.


In [43]:

# Test for fun! 
# Print the 20th line of final.txt

# Specify the path to the 'final.txt' file in the data directory
final_file_path = os.path.join(directory_path, "final.txt")

# Check if the file exists before opening it
if os.path.isfile(final_file_path):
    with open(final_file_path, "r") as final_file:
        lines = final_file.readlines()
        if len(lines) >= 20:
            line_20 = lines[19]  # Index 19 corresponds to the 20th line (0-based index)
            print("20th line of 'final.txt':")
            print(line_20)
        else:
            print("There are fewer than 20 lines in 'final.txt'.")
else:
    print("'final.txt' does not exist in the data directory.")


20th line of 'final.txt':
matthewlulloff@mail.adlp.me
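`readlines()` loads the whole file into memory, which is fine here but wasteful for very large files. A possible alternative, sketched below, iterates over the file and stops at the requested line using `itertools.islice`. The helper name `read_line_number` and the demo file are invented for this example.

```python
import itertools
import os
import tempfile

def read_line_number(path, line_number):
    """Return line `line_number` (1-based) of the file, or None if the file is shorter."""
    with open(path, "r", encoding="utf-8") as f:
        # islice skips line_number - 1 lines, then yields at most one line
        return next(itertools.islice(f, line_number - 1, line_number), None)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "numbers.txt")  # hypothetical file with 30 lines
    with open(demo_path, "w", encoding="utf-8") as f:
        f.writelines(f"line {i}\n" for i in range(1, 31))

    twentieth = read_line_number(demo_path, 20)
    missing = read_line_number(demo_path, 99)

print(repr(twentieth))   # 'line 20\n'
print(missing)           # None
```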

