# Read and write files

We have made good progress, so we can now move on to a more serious task: manipulating files. This is one of the most important topics of this training.


N.B.: Most of the files in `./data/` only exist to help us understand how opening files works. They have no other purpose.

To open a file in Python, we use the `open()` function.

This function takes the path of the file (*relative* or *absolute*) as its first parameter and the opening mode (for example reading or writing) as its second parameter.

*A relative path is a path that is interpreted from the current location.*
The path is **relative** to the current working directory, i.e. the place the code is run from.

*An absolute path is a complete path that starts at the root of the file system, so it points to the same file regardless of the current location.*


In [72]:
filename = "./data/data.txt"
my_file = open(filename, "r")  # r for "read"

- `"r"`, for a read opening (READ).

- `"w"`, for a write opening (WRITE). Each time the file is opened in this mode, its previous content is erased. If the file does not exist, Python creates it.

    *The Python docs say that `w+` will "overwrite the existing file if the file exists". So as soon as you open a file with `w+`, it is now an empty file: it contains 0 bytes. If it used to contain data, that data has been truncated — cut off and thrown away — and now the file size is 0 bytes, so you can't read any of the data that existed before you opened the file with `w+`. If you actually wanted to read the previous data and add to it, you should use `r+` instead of `w+`* [[Source]](https://stackoverflow.com/questions/16208206/confused-by-python-file-mode-w#comment83227862_16208298)
    
    

- `"a"`, for an opening in append mode (APPEND): new data is added at the end of the file. If the file does not exist, Python creates it.

- `"x"`, creates a new file and opens it for writing. If the file already exists, Python raises a `FileExistsError`.

You can also append the characters `+` (read and write) and `b` (binary mode) to nearly all of the above modes. [[More info here]](https://stackabuse.com/file-handling-in-python/)
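The behaviour of these modes can be checked with a short experiment. The sketch below works in a temporary directory so that nothing in `./data/` is touched; the file name `demo.txt` is only a placeholder chosen for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, "w") as f:   # "w" creates the file
        f.write("first\n")
    with open(demo_path, "w") as f:   # "w" again: the previous content is erased
        f.write("second\n")
    with open(demo_path, "a") as f:   # "a" adds at the end of the file
        f.write("third\n")
    with open(demo_path, "r") as f:
        content = f.read()

    # "x" refuses to open a file that already exists.
    x_failed = False
    try:
        open(demo_path, "x")
    except FileExistsError:
        x_failed = True

print(repr(content))  # 'second\nthird\n'
print(x_failed)       # True
```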

Like any opened resource, a file must be closed once you are done with it. To do this, we use the `close()` method.

In [73]:
my_file.close()

In [74]:
# Let's find out what's going on there
my_file = open(filename, "r")
print(my_file.read())
my_file.close()

Hi everyone, I'm adding sentences to the file !


Another option is to open the file with a `with` statement, which closes it automatically at the end of the block, even if an error occurs. That's a **best practice** and you should use it as much as you can.

In [75]:
with open(filename, "r") as my_file:
    print(my_file.read())

Hi everyone, I'm adding sentences to the file !
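To see that `with` really closes the file, here is a minimal sketch. It uses a temporary file named `demo.txt` (a placeholder, not one of the training files) and simulates an error inside the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
    with open(demo_path, "w") as f:
        f.write("hello")

    try:
        with open(demo_path, "r") as handle:
            # Simulate a problem while the file is open.
            raise ValueError("simulated error")
    except ValueError:
        pass

# The file was closed when the block was left, despite the exception.
print(handle.closed)  # True
```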


**Can you put the contents of this file into a list in which each element is a sentence?**
*(Use `.split()` for example...)*

In [76]:
# YOUR CODE HERE
with open(filename, "r") as my_file:
    my_list = []
    for each_line in my_file:
        # Split every line on commas and keep all the pieces,
        # not only those of the last line.
        my_list.extend(each_line.split(","))
    print(my_list)


['Hi everyone', " I'm adding sentences to the file !"]
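The exercise above relies on the difference between reading a file as one string and reading it as a list of lines. The sketch below compares the two on a temporary file; the file name and its two sentences are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
    with open(demo_path, "w") as f:
        f.write("First sentence.\nSecond sentence.\n")

    with open(demo_path, "r") as f:
        whole_text = f.read()       # the whole file as a single string
    with open(demo_path, "r") as f:
        raw_lines = f.readlines()   # a list of lines, newline characters kept

# splitlines() cuts a string into lines and drops the newline characters.
clean_lines = whole_text.splitlines()

print(raw_lines)    # ['First sentence.\n', 'Second sentence.\n']
print(clean_lines)  # ['First sentence.', 'Second sentence.']
```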


To write to a file, open it (whether it already exists or not), write to it and close it. We open it in `"w"` mode so that the previous data is deleted and new data can be written.

In [79]:
newFile = "./data/Rana.txt"
file = open(newFile, "w")
file.write(" Hi everyone, I'm adding sentences to the file ! ")
file.close()

# Read the file back to check the effect of "w" mode
file = open(newFile, "r")
print(file.readlines())
file.close()


[" Hi everyone, I'm adding sentences to the file ! "]


Can you take the content of the `data.txt` file from the `./data/` directory, capitalize all the words and write them to the file that you just created, after the sentences you added?



In [80]:
# Open both files: one to read from, the other to append to.
# The with statement closes them, which also flushes the written data to disk.
with open(filename, "r") as originFile, open(newFile, "a") as secondfile:
    for each_line in originFile:
        secondfile.write(each_line.upper())



In [None]:
# It's up to you to write the end
array = []
with open(filename, "r+") as input_file:
    pass  # Add your code

## Management of directory paths...

The `os` module is a library that provides a portable way of using operating system dependent functionality.
In this chapter, we are interested in its file path handling capabilities, available through `os.path`.

In [29]:
import os

Each file or folder is associated with a kind of address, its path, that makes it possible to find it without ambiguity. Two files in the same folder cannot have exactly the same name (the extension is part of the name, so `data.txt` and `data.csv` can coexist).

As said before, there are two kinds of paths: the absolute path, from the root of your file system, and the relative path, from the current folder.

Using the `help` function, we can see the available functions.

In [30]:
help(os.path)

Help on module ntpath:

NAME
    ntpath - Common pathname manipulations, WindowsNT/95 version.

MODULE REFERENCE
    https://docs.python.org/3.10/library/ntpath.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    Instead of importing this module directly, import os and refer to this
    module as os.path.

FUNCTIONS
    abspath(path)
        Return the absolute version of a path.
    
    basename(p)
        Returns the final component of a pathname
    
    commonpath(paths)
        Given a sequence of path names, returns the longest common sub-path.
    
    commonprefix(m)
        Given a list of pathnames, returns the longest common leading component
    
    dirname(p)
        Returns the di

To get the absolute path of the current directory, use `abspath("")`.

In [31]:
# In Python a path is a string, so there are methods to manipulate it.
path = os.path.abspath("")
print(path)
print(type(path))

c:\Users\user\Desktop\Training\ANT-Theano-4\content\2.python\2.python_advanced\04.File-handling
<class 'str'>


To get the directory part of a path, use `dirname(path)`.

In [32]:
os.path.dirname(path)

'c:\\Users\\user\\Desktop\\Training\\ANT-Theano-4\\content\\2.python\\2.python_advanced'

To get only the final component of the path (a file name or, as here, the last folder), use `basename(path)`.

In [19]:
os.path.basename(path)

'04.File-handling'
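The same functions can be tried on a path that ends with a file. The sketch below uses a made-up path (it does not need to exist) and also shows `os.path.splitext`, which separates the extension from the rest of the name.

```python
import os

# Hypothetical path, built with join so that it works on any operating system.
example_path = os.path.join("training", "04.File-handling", "data.txt")

directory = os.path.dirname(example_path)      # everything before the last component
file_name = os.path.basename(example_path)     # the last component
stem, extension = os.path.splitext(file_name)  # name without extension, extension

print(directory)
print(file_name)        # data.txt
print(stem, extension)  # data .txt
```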

To add a directory, let's say `"text"`, to the path, we use `join()`.

The cool thing is that it works the same way across operating systems: on Windows, `os.path.join` inserts `\` between its arguments, and on Linux or macOS it inserts `/`. The same code thus works on every operating system!

In [33]:
rep_text = os.path.join(path, "text")
print(rep_text)

c:\Users\user\Desktop\Training\ANT-Theano-4\content\2.python\2.python_advanced\04.File-handling\text
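A quick way to check what `join()` does is to compare its result with the separator of the current system, `os.sep`. The names below are placeholders. The sketch also shows a behaviour that is easy to miss: when one of the arguments is an absolute path, everything before it is discarded.

```python
import os

joined = os.path.join("data", "text", "notes.txt")  # hypothetical names
print(joined)

# join() uses the separator of the operating system the code runs on.
same_as_sep = joined == os.sep.join(["data", "text", "notes.txt"])
print(same_as_sep)  # True

# An absolute component discards everything that comes before it.
absolute_part = os.path.abspath("notes.txt")
restarted = os.path.join("data", absolute_part)
print(restarted == absolute_part)  # True
```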


To retrieve all the elements of a folder as a list, you can use the `listdir()` function.

In [53]:
# Pass the function itself to help(); help(os.listdir()) would document
# the list returned by the call instead of the function.
help(os.listdir)

In [34]:
# Items are returned as a list that includes folders and hidden files.
os.listdir("../")

['01.OOP',
 '02.Exception-handling',
 '03.Regex',
 '04.File-handling',
 '05.Scraping',
 '06.Concurrency',
 '07.Decorator',
 '08.Typing',
 '09.Good_practices',
 '10.Data-structure',
 '11.unittest']
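Since `listdir()` mixes files and folders, it is often combined with `os.path.isfile` and `os.path.isdir`. The sketch below builds a small temporary folder (the names are invented for the example) and separates the two kinds of entries.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small folder: one subfolder and two files.
    os.mkdir(os.path.join(tmp_dir, "subfolder"))
    for name in ("a.txt", "b.csv"):
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("demo")

    # listdir() returns names only, in arbitrary order, so we sort them.
    entries = sorted(os.listdir(tmp_dir))

    # The names must be joined to the folder before they can be tested.
    files_only = [e for e in entries if os.path.isfile(os.path.join(tmp_dir, e))]
    folders_only = [e for e in entries if os.path.isdir(os.path.join(tmp_dir, e))]

print(entries)       # ['a.txt', 'b.csv', 'subfolder']
print(files_only)    # ['a.txt', 'b.csv']
print(folders_only)  # ['subfolder']
```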

### How to display all the elements of a folder as well as its child folders? 

With the `walk()` function:

```
walk(top, topdown=True, onerror=None, followlinks=False)
```


In [9]:
folder_path = os.path.abspath("./")
print(folder_path)

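# `root` is the folder being visited and `files` the files it contains.
# Separate names avoid overwriting the `path` and `filename` variables used earlier.
for root, dirs, files in os.walk(folder_path):
    for name in files:
        print(name)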

c:\Users\user\Desktop\Training\ANT-Theano-4\content\2.python\2.python_advanced\04.File-handling
comptagevelo2017.xlsx
data.txt
discours_politicien.zip
file_handling.ipynb
final.txt
L'équipe,du sport en continu..html
lequipe-du sport en continu._files
mail.txt
Rana.txt
VOEUX01.txt
VOEUX05.txt
VOEUX06.txt
VOEUX07.txt
VOEUX08.txt
VOEUX09.txt
VOEUX74.txt
VOEUX75.txt
VOEUX79.txt
VOEUX83.txt
VOEUX87.txt
VOEUX89.txt
VOEUX90.txt
VOEUX94.txt
weather_2012.csv
weather_2017.csv
write.txt
comptagevelo2017.csv
comptagevelo2017.xlsx
data.txt
discours_politicien.zip
L'équipe,du sport en continu..html
mail.txt
Rana.txt
VOEUX01.txt
VOEUX05.txt
VOEUX06.txt
VOEUX07.txt
VOEUX08.txt
VOEUX09.txt
VOEUX74.txt
VOEUX75.txt
VOEUX79.txt
VOEUX83.txt
VOEUX87.txt
VOEUX89.txt
VOEUX90.txt
VOEUX94.txt
weather_2012.csv
weather_2017.csv
write.txt
0a315.jpg
0CA20181029124202ADs201810291242024F98d8q4aBxcs.js
0d3f9.png
0f245.jpg
110.png
125278988146629
140(1).jpg
140(10).jpg
140(11).jpg
140(2).jpg
140(3).jpg
140(4).jpg
140(5
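The listing above only shows file names, so identical names from different folders cannot be told apart. Each iteration of `os.walk` yields the folder being visited, its subfolders and its files, which makes it possible to rebuild the full path of every file. The sketch below demonstrates this on a small temporary tree whose names are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small tree: tmp_dir/root.txt, tmp_dir/data/data.txt,
    # tmp_dir/data/pages/page.html
    os.makedirs(os.path.join(tmp_dir, "data", "pages"))
    for parts in (("root.txt",), ("data", "data.txt"), ("data", "pages", "page.html")):
        with open(os.path.join(tmp_dir, *parts), "w") as f:
            f.write("demo")

    found = []
    for root, dirs, files in os.walk(tmp_dir):
        for name in files:
            full_path = os.path.join(root, name)
            # Keep the path relative to the top folder to make it readable.
            found.append(os.path.relpath(full_path, tmp_dir))

found.sort()
print(found)
```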

Put all the **`.txt` files** from the `data/` directory into a variable.
Then, copy the content of all the files from this variable into a file in `data/` that you will name `final.txt`.


In [82]:
folder_path = os.path.abspath("./data")
print(folder_path)

# Put all the text files of data/ in one list.
# final.txt itself is skipped so that it is not copied into itself on a second run.
textfiles = []
for each_file in os.listdir(folder_path):
    if each_file.endswith('.txt') and each_file != "final.txt":
        textfiles.append(each_file)

# Create final.txt in data/ and copy the content of each text file into it.
with open(os.path.join(folder_path, "final.txt"), 'w') as final:
    for file_1 in textfiles:
        with open(os.path.join(folder_path, file_1), 'r') as originFile:
            for each_line in originFile:
                final.write(each_line)


c:\Users\user\Desktop\Training\ANT-Theano-4\content\2.python\2.python_advanced\04.File-handling\data


In [6]:
# Read the content of the zip file without extracting it.
# The zipfile module of the standard library handles zip archives.

import os
import zipfile
import io

xpath = os.path.abspath("./data")
print(xpath)
# os.path.join avoids backslashes, which Python may interpret as escape sequences.
target = os.path.join("data", "discours_politicien.zip")
zip_List = []
with zipfile.ZipFile(target) as archive:
    for member in archive.namelist():
        # Wrap the binary member in a text layer to read it line by line.
        with io.TextIOWrapper(archive.open(member), encoding='utf8', errors='ignore') as f:
            zip_List.extend(f.readlines())
#print(zip_List)


c:\Users\user\Desktop\Training\ANT-Theano-4\content\2.python\2.python_advanced\04.File-handling\data
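The technique used in this cell can be tried without any file on disk. The sketch below first builds a small zip archive in memory (the member names and their content are invented for the example) and then reads every member as text without extracting anything.

```python
import io
import zipfile

# Build a small archive in memory to have something to read.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("speech_1.txt", "first line\nsecond line\n")
    archive.writestr("speech_2.txt", "third line\n")
buffer.seek(0)

lines = []
with zipfile.ZipFile(buffer) as archive:
    for member in archive.namelist():
        # archive.open() gives a binary stream; TextIOWrapper decodes it as text.
        with io.TextIOWrapper(archive.open(member), encoding="utf8") as text:
            lines.extend(text.readlines())

print(lines)  # ['first line\n', 'second line\n', 'third line\n']
```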


New task: using a loop, can you open all the files from your `data/` directory and save all their contents in a variable?

In [21]:
import os                   # used to list and walk the directory
import zipfile              # used to read zip files
import io
import csv                  # used to read csv files
import openpyxl             # used to read xlsx files

#
# opening the directory
xpath = os.path.abspath("./data")
print(xpath)

# put all files in a list named SuperList:
SuperList = os.listdir('./data')
#print(len(SuperList),"\n",SuperList)

# Create one list per type of file; these lists will be the values of the final dictionary:
txt_List= []
csv_List = []
zip_List = []
xlsx_List = []
folder_List = []

# check the type of each file:
# open the file with its own method
# read the file
# append the contents in its list


for each_File in SuperList:
    
    if each_File == "lequipe-du sport en continu._files":
        # `path` is the folder that contains each file, so nested files are found too.
        for path, dirs, files in os.walk(os.path.join("./data", each_File)):
            for filename in files:
                fname = os.path.join(path, filename)
                with open(fname, 'r', encoding='utf8', errors='ignore') as folderfile:
                    folder_List.extend(folderfile.readlines())
        #print(folder_List)
        
    # If the file is a zip archive, read its members without extracting them
    # and put all the content in zip_List.
    elif each_File.endswith('.zip'):
        # Build the path from the current file name instead of hard-coding it.
        target = os.path.join("./data", each_File)
        with zipfile.ZipFile(target) as archive:
            for member in archive.namelist():
                with io.TextIOWrapper(archive.open(member), encoding='utf8', errors='ignore') as f:
                    zip_List.extend(f.readlines())

    # If it is a text file, open it and read its lines.
    elif each_File.endswith('.txt'):
        file_name = f"./data/{each_File}"
        with open(file_name, 'r') as text_file:
            txt_List.extend(text_file.readlines())

    # If it is a csv file, read it row by row.
    elif each_File.endswith('.csv'):
        csvfile_name = f"./data/{each_File}"
        # newline='' is what the csv module documentation recommends for file objects.
        with open(csvfile_name, 'r', newline='') as file:
            my_reader = csv.reader(file, delimiter=',')
            for row in my_reader:
                csv_List.append(row)
    #print(csv_List)

    # If it is an xlsx file, load it with openpyxl.
    elif each_File.endswith('.xlsx'):
        xlsx_file_name = f"./data/{each_File}"
        print(xlsx_file_name)
        # Create the workbook object.
        xlsxFile_wb = openpyxl.load_workbook(xlsx_file_name)
        print(xlsxFile_wb)
        # Get the active worksheet of the workbook.
        activeFile = xlsxFile_wb.active

        xlsx_List.append(activeFile)
    #print(xlsx_List)
        
# Create a dictionary whose keys are the file types and whose values are the corresponding lists.
final_dictionary = {}
final_dictionary["Text File"] = txt_List
final_dictionary["CSV File"] = csv_List
final_dictionary["Xlsx File"] = xlsx_List
final_dictionary["Zip File"] = zip_List
final_dictionary["Subfolder"] = folder_List

#print(final_dictionary)





c:\Users\user\Desktop\Training\ANT-Theano-4\content\2.python\2.python_advanced\04.File-handling\data
[]
./data/comptagevelo2017.xlsx
<openpyxl.workbook.workbook.Workbook object at 0x00000198541C09D0>
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]
[<Worksheet "comptagevelo2017">]


Finally, save all this gathered information in a new file.
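One possible way to approach this last step is sketched below; it is an illustration, not the expected solution. It assumes a dictionary shaped like `final_dictionary`, here replaced by a small hand-written `collected` dictionary with invented values, and writes one titled section per file type into a hypothetical `summary.txt` created in a temporary directory.

```python
import os
import tempfile

# Hypothetical stand-in for the dictionary built in the previous cell.
collected = {
    "Text File": ["first line\n", "second line\n"],
    "CSV File": [["date", "temperature"], ["2012-01-01", "-1.8"]],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "summary.txt")  # hypothetical file name

    with open(output_path, "w", encoding="utf8") as output_file:
        for file_type, items in collected.items():
            output_file.write(f"=== {file_type} ===\n")
            for item in items:
                # csv rows are lists of strings; text lines are plain strings.
                text = ",".join(item) if isinstance(item, list) else str(item)
                # Make sure every entry ends with exactly one newline.
                output_file.write(text.rstrip("\n") + "\n")

    with open(output_path, "r", encoding="utf8") as output_file:
        saved = output_file.read()

print(saved)
```

Objects that are not text, such as the worksheet stored in `xlsx_List`, would need to be converted to rows of values before being written this way.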