## 4 Accessing files and OS functionality in Python

Navigating the file system and reading from and writing to files are important aspects of working with data. That's why we will take a closer look at the functionality Python provides for these tasks.

### Accessing the OS-specific file system
Whenever you want to access directories, navigate the file system or create files, it is usually a good idea to use the "os" module, which is part of the Python standard library. You just need to import it like this:

In [None]:
import os

Get current working directory:

In [None]:
print(f"os.getcwd(): {os.getcwd()}")

Different operating systems usually address folders and files differently. While Windows uses "\\" to separate folder levels (e.g. "C:\Windows\system32"), POSIX systems like Linux and macOS use "/". Hardcoding these separators can lead to compatibility issues. Luckily, the os module offers solutions to this:

In [None]:
print(f"Separator used in your current OS: {os.sep}")

l = ["Path","to","folder"]
print(f"os.sep.join(): {os.sep.join(l)}")

path_win = r"C:\Windows\system32"  # raw string, so the backslashes are not treated as escape sequences
path_posix = "/home/myprofile/testdata"
path_extension = "path_extension"

print(f"win_path with extension: {os.path.join(path_win,path_extension)}")
print(f"posix_path with extension: {os.path.join(path_posix,path_extension)}")

Depending on your OS, one of the two paths printed at the end should look correct (win_path on Windows, posix_path on Linux and macOS).

If you want to split up a given path, the function os.path.split() might come in handy. It returns a tuple of two strings (head, tail), the second of which contains every character after the final slash. To make this more clear:

In [None]:
# for windows

print(f"windows path: {path_win}")
split_path = os.path.split(path_win)
print(f"windows path split: {split_path}  | head: '{split_path[0]}'   tail: '{split_path[-1]}'")

In [None]:
# for linux and macos

print(f"posix path: {path_posix}")
split_path = os.path.split(path_posix)
print(f"posix path split: {split_path}  | head: '{split_path[0]}'   tail: '{split_path[-1]}'")

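As you will notice, the result depends on the OS you run the cells on, because <i>os.path.split()</i> follows the path rules of the underlying system. On Linux and macOS only the POSIX path is split as expected, since "\\" is not treated as a separator there. On Windows both cells split correctly, because Windows also accepts "/" as an alternative separator.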
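
The platform dependence can be made visible on any OS by calling the two implementations behind os.path directly. The following is a minimal sketch that uses the standard library modules ntpath (Windows rules) and posixpath (POSIX rules); the variable names are only chosen for illustration:

```python
import ntpath      # the Windows flavour of os.path
import posixpath   # the POSIX flavour of os.path

demo_win = r"C:\Windows\system32"
demo_posix = "/home/myprofile/testdata"

# ntpath treats both "\" and "/" as separators
win_rules_win_path = ntpath.split(demo_win)
win_rules_posix_path = ntpath.split(demo_posix)

# posixpath only knows "/", so the Windows path stays in one piece
posix_rules_posix_path = posixpath.split(demo_posix)
posix_rules_win_path = posixpath.split(demo_win)

print(f"Windows rules, Windows path: {win_rules_win_path}")
print(f"Windows rules, POSIX path:   {win_rules_posix_path}")
print(f"POSIX rules, POSIX path:     {posix_rules_posix_path}")
print(f"POSIX rules, Windows path:   {posix_rules_win_path}")
```

In everyday code you would normally stick to os.path, which picks the matching implementation for you.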

Another useful function is os.path.splitext(), which also returns a tuple of two strings, this time split at the final dot (the second string still contains the leading "." of the extension!):

In [None]:
print(f"splitext() win: {os.path.splitext(os.path.join(path_win,'abc.pdf'))}")
print(f"splitext() posix: {os.path.splitext(os.path.join(path_posix,'abc.pdf'))}")
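
A few edge cases are worth trying out, since only the final dot counts and a leading dot is not treated as the start of an extension. The file names below are made up for illustration:

```python
import os

example_names = ["report.pdf", "archive.tar.gz", "README", ".bashrc"]

# map every example name to its (root, extension) tuple
splitext_results = {name: os.path.splitext(name) for name in example_names}

for name, (root, ext) in splitext_results.items():
    print(f"{name!r:18} -> root: {root!r:14} extension: {ext!r}")
```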

There are also functions that help you evaluate whether or not some path exists as well as if it leads to a file or a folder.

In [None]:
print(f"Does {os.path.join(path_win,path_extension)} exist? {os.path.exists(os.path.join(path_win,path_extension))}")

testfolder_path = os.path.join(os.getcwd(),'testfolder')
testfile_path = os.path.join(os.getcwd(),"04_accessing_files.ipynb")

print(f"Does {testfolder_path} exist? {os.path.exists(testfolder_path)}")
print(f"Does {testfile_path} exist? {os.path.exists(testfile_path)}")

In [None]:
print(f"Is {testfolder_path} a file? {os.path.isfile(testfolder_path)}")
print(f"Is {testfolder_path} a directory? {os.path.isdir(testfolder_path)}")

print(f"Is {testfile_path} a file? {os.path.isfile(testfile_path)}")
print(f"Is {testfile_path} a directory? {os.path.isdir(testfile_path)}")

#### Creating files and directories

You can also use the os module to create a new directory directly from your Python script.

In [None]:
os.mkdir(testfolder_path)

Blindly creating files or folders can lead to errors, as seen above if the folder already exists. A better way of doing this is to check first whether the file or folder already exists:

In [None]:
if not os.path.exists(testfolder_path):
    os.mkdir(testfolder_path)
else:
    print("Path already exists. Not creating new directory.")
    
if not os.path.exists(testfile_path):
    # create an empty file; os.mkdir() would create a directory with this name instead
    open(testfile_path, "w").close()
else:
    print("Path already exists. Not creating new file.")
    
new_folder = os.path.join(os.getcwd(),"new_folder")

if not os.path.exists(new_folder):
    os.mkdir(new_folder)
    print(f"Created folder {new_folder}")
else:
    print("Path already exists. Not creating new directory.")
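
If several nested folders are needed at once, os.makedirs() can replace the manual existence check. The sketch below works inside a temporary directory so that it does not leave anything behind; the folder names "level_1" and "level_2" are arbitrary:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    nested_folder = os.path.join(base_dir, "level_1", "level_2")

    # creates "level_1" and "level_2" in one go
    os.makedirs(nested_folder, exist_ok=True)
    # calling it again does not raise an error because of exist_ok=True
    os.makedirs(nested_folder, exist_ok=True)

    nested_folder_exists = os.path.isdir(nested_folder)
    print(f"Nested folder exists: {nested_folder_exists}")
```

Without exist_ok=True the second call would raise a FileExistsError, just like os.mkdir() does.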

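You can use os.mknod() to create a new, empty file. Note that os.mknod() is only available on Unix-like systems: it does not exist on Windows (calling it there raises an AttributeError) and it may fail on macOS, typically with a permission error. A portable alternative is to open the path in exclusive-creation mode with open(path, "x"). The os.mknod() variant looks like this: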

In [None]:
new_file = os.path.join(os.getcwd(),"new_file.pdf")

if not os.path.exists(new_file):
    os.mknod(new_file)
    print(f"Created file {new_file}")
else:
    print("Path already exists. Not creating new file.")
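
A portable way to create an empty file is the built-in open() with mode "x" (exclusive creation), which raises a FileExistsError if the path already exists. The helper name create_empty_file is hypothetical and only used for this sketch, which runs inside a temporary directory:

```python
import os
import tempfile

def create_empty_file(path):
    """Create an empty file. Return True if it was created, False if the path already existed."""
    try:
        with open(path, "x"):
            pass  # nothing to write, the file only needs to exist
        return True
    except FileExistsError:
        return False

with tempfile.TemporaryDirectory() as base_dir:
    demo_file = os.path.join(base_dir, "new_file.pdf")

    first_attempt = create_empty_file(demo_file)    # file is created
    second_attempt = create_empty_file(demo_file)   # file already exists
    demo_file_size = os.path.getsize(demo_file)

    print(f"first attempt created the file: {first_attempt}")
    print(f"second attempt created the file: {second_attempt}")
    print(f"size of the new file: {demo_file_size} bytes")
```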

#### Deleting files and directories
Files and directories can also be deleted. There are different options which are explained in more detail [here](https://stackoverflow.com/questions/6996603/how-can-i-delete-a-file-or-folder-in-python#6996628). Deleting files and folders via the os package can be performed in the following way:

In [None]:
new_file2 = os.path.join(os.getcwd(),"new_file2.pdf")

os.remove(new_file2)

Trying to remove a file that does not exist raises a "FileNotFoundError" and, if the error is not handled, crashes your program. Thus you should always check if the path exists, just like in the examples for creating files and folders. A better way of deleting a file would be:

In [None]:
if os.path.exists(new_file):
    os.remove(new_file)
    print(f"Deleted file {new_file}")
else:
    print(f"File '{new_file}' does not exist.")
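
An alternative to checking first is to simply try the deletion and handle the error if it occurs. This is a minimal sketch of that approach; the helper name remove_if_present is hypothetical and the file lives in a temporary directory:

```python
import os
import tempfile

def remove_if_present(path):
    """Delete the file at path. Return True if it was deleted, False if it did not exist."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as base_dir:
    demo_file = os.path.join(base_dir, "to_delete.txt")
    with open(demo_file, "w") as file:
        file.write("temporary content\n")

    deleted_first_time = remove_if_present(demo_file)    # file exists and is removed
    deleted_second_time = remove_if_present(demo_file)   # file is already gone

    print(f"deleted on first call: {deleted_first_time}")
    print(f"deleted on second call: {deleted_second_time}")
```

Both styles work; the try/except variant additionally covers the case where the file disappears between the check and the deletion.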

Empty directories can be deleted in a similar fashion when using the os package:

In [None]:
if os.path.exists(new_folder):
    os.rmdir(new_folder)
    print(f"Deleted directory {new_folder}")
else:
    print(f"Directory '{new_folder}' does not exist.")

os.rmdir() only works if the folder you want to delete is empty, i.e. it does not contain any files or subfolders. The following example therefore runs into an error:

In [None]:
if not os.path.exists(new_folder):
    os.mkdir(new_folder)
    print(f"Created folder {new_folder}")
else:
    print("Path already exists. Not creating new directory.")
    
new_folder_file = os.path.join(new_folder,"new_folder_file")

if not os.path.exists(new_folder_file):
    os.mknod(new_folder_file)
    print(f"Created file {new_folder_file}")
else:
    print("Path already exists. Not creating new file.")
    
if os.path.exists(new_folder):
    os.rmdir(new_folder)
    print(f"Deleted directory {new_folder}")
else:
    print(f"Directory '{new_folder}' does not exist.")

One way to circumvent this problem is to manually delete everything in that folder and then fall back on the os module. An easier way in this case is to use the "shutil" module, which is also part of the Python standard library:

In [None]:
import shutil

if os.path.exists(new_folder):
    shutil.rmtree(new_folder)
    print(f"Deleted directory {new_folder}")
else:
    print(f"Directory '{new_folder}' does not exist.")

#### More useful commands

List all files and subdirs in a folder:

In [None]:
dir_list = os.listdir(os.getcwd())
print(f"list of files/folders in directory: {dir_list}")

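Get the size (in bytes) of a specific file. For a folder, os.path.getsize() returns the size of the directory entry itself, not the combined size of its contents: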

In [None]:
print(f"Size of '{testfile_path}': {os.path.getsize(testfile_path)} bytes")
print(f"Size of '{testfolder_path}': {os.path.getsize(testfolder_path)} bytes")
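
To get the combined size of everything inside a folder, the sizes of the individual files have to be summed up, for example with os.walk(). The function name directory_size and the file names are assumptions made for this sketch, which builds a small folder structure in a temporary directory:

```python
import os
import tempfile

def directory_size(folder):
    """Sum up the sizes (in bytes) of all files below folder, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

with tempfile.TemporaryDirectory() as base_dir:
    sub_dir = os.path.join(base_dir, "sub")
    os.mkdir(sub_dir)

    # write 10 bytes and 5 bytes in binary mode so the sizes are exact
    with open(os.path.join(base_dir, "a.bin"), "wb") as file:
        file.write(b"x" * 10)
    with open(os.path.join(sub_dir, "b.bin"), "wb") as file:
        file.write(b"y" * 5)

    total_bytes = directory_size(base_dir)
    print(f"Total size of all files: {total_bytes} bytes")
```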

### Reading and Writing files
Now that you have learned how to use the os module to navigate the file system and get the paths of files and folders, it's time to take a look at how to work with some of the file types you might encounter. This notebook focuses on reading from and writing to text files; examples of handling other types of files (like images or dataframes) will be shown in upcoming notebooks.<br>
<table style="width:50%" align="left">
  <tr>
    <th>Mode</th>
    <th>Parameter</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>write</td>
    <td>w </td>
    <td>Creates and writes to file if it does not exist yet. Otherwise opens existing file and <i><b>overwrites</b></i> any of the content in it. You can not read the content of the file while in this mode ("UnsupportedOperation: not readable").</td>
  </tr>
   <tr>
    <td>read</td>
    <td>r</td>
    <td>Opens an <i><b>existing</b></i> file for reading -> Contents of file cannot be changed in this mode ("UnsupportedOperation: not writable").</td>
  </tr>
  <tr>
    <td>append</td>
    <td>a</td>
    <td>Similar to 'write', but appends text to the end of a file instead of overwriting its previous content. You can not read the content of the file while in this mode ("UnsupportedOperation: not readable").</td>
  </tr>
</table>
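
The behaviour of the three modes can be checked with a short experiment. The sketch below uses a temporary directory and an arbitrary file name ("modes_demo.txt"), so nothing is left behind afterwards:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    demo_path = os.path.join(base_dir, "modes_demo.txt")

    with open(demo_path, "w") as file:   # creates the file
        file.write("first\n")
    with open(demo_path, "w") as file:   # overwrites the previous content
        file.write("second\n")
    with open(demo_path, "a") as file:   # appends to the end
        file.write("third\n")
    with open(demo_path, "r") as file:   # read-only access
        demo_content = file.read()

    # reading while in append mode is not allowed
    try:
        with open(demo_path, "a") as file:
            file.read()
        read_in_append_failed = False
    except io.UnsupportedOperation as error:
        read_in_append_failed = True
        print(f"Reading in append mode failed: {error}")

print(demo_content)
```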


#### Writing to a file
The following examples show how to use the different modes. With the "with" syntax you do not have to close the opened file explicitly, as it is closed automatically once the indented block ends:

In [None]:
f = open("text_write_open.txt","w")

f.write("Writing to files is oftentimes needed.")
f.write("Once you wrap your head around the different modes it's not that complicated.")
f.write("You just have to try it out and get familiar with it.")

f.close()

# this will achieve the same result, but you do not manually have to remember to close the file afterwards.
with open("text_write.txt","w") as file:
    file.write("Writing to files is oftentimes needed.")
    file.write("Once you wrap your head around the different modes it's not that complicated.")
    file.write("You just have to try it out and get familiar with it.")

Now go to your file explorer (or even easier, use your IDE) and open up the files you just created. Are there any differences between them? Is this the way you intended to insert the text?
Run the next code snippet and compare the results:

In [None]:
with open("text_write_lb.txt","w") as file:
    file.write("Writing to files is oftentimes needed.\n")
    file.write("Once you wrap your head around the different modes it's not that complicated.\n")
    file.write("You just have to try it out and get familiar with it.\n")

The same result can also be achieved with writelines(). Note that the variable name you use to access the file can be chosen arbitrarily:

In [None]:
l = ["Writing to files is oftentimes needed.\n","Once you wrap your head around the different modes it's not that complicated.\n","You just have to try it out and get familiar with it.\n"]

with open("text_writelines.txt","w") as random_variable_name:
    random_variable_name.writelines(l)

It's easy to forget that the write() and writelines() methods <i><b>do not automatically add a linebreak</b></i> when used, so you have to manually add this to the text you want to insert (\n).
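
If your lines do not contain line breaks yet, there are two simple ways to add them while writing: append "\n" on the fly when calling writelines(), or use print() with its file argument, which adds the line break for you. The file names below are arbitrary and the files are created in a temporary directory:

```python
import os
import tempfile

plain_lines = ["first line", "second line", "third line"]

with tempfile.TemporaryDirectory() as base_dir:
    path_writelines = os.path.join(base_dir, "with_writelines.txt")
    path_print = os.path.join(base_dir, "with_print.txt")

    # variant 1: add the line break to every line while writing
    with open(path_writelines, "w") as file:
        file.writelines(line + "\n" for line in plain_lines)

    # variant 2: print() appends "\n" automatically
    with open(path_print, "w") as file:
        for line in plain_lines:
            print(line, file=file)

    with open(path_writelines, "r") as file:
        content_writelines = file.read()
    with open(path_print, "r") as file:
        content_print = file.read()

print(content_writelines == content_print)
```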

#### Appending to a file

In [None]:
with open("text_append.txt","a") as file:
    file.write("I am learning Python!\n")
    file.write("I am really enjoying it!\n")
    file.write("And I want to add more lines to say how much I like it\n")

If you want to add further lines to this file, you can simply open it up again in append mode and write to it:

In [None]:
with open("text_append.txt","a") as file:
    file.write("This line will be appended at the end of the file\n")

#### Reading from a file
There are different ways of reading from a file, the simplest one being this:

In [None]:
with open("text_write_lb.txt","r") as file:
    print(file.read())

The readlines() method reads a file upfront, line-by-line, and stores its content in a list. This is my preferred way of working with (smaller) files due to its simplicity, but it can become problematic once text files become very big (as it reads all the content upfront -> might become time consuming or even run into memory errors).

In [None]:
with open("text_write_lb.txt","r") as file:
    lines = file.readlines()
    print(f"lines: {lines}\n")
    for line in lines:
        print(line)

The readline() method works similarly, but the content of the file isn't "cached" in a list; each line is only read from the file once it is needed. This offers advantages when it comes to big files, but the syntax is a little less convenient:

In [None]:
with open("text_write_lb.txt","r") as file:
    l = file.readline()
    while l:
        print(l)
        l = file.readline()
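
A file object can also be used directly in a for loop, which reads one line at a time as well. Since every line still ends with "\n", the sketch below strips it (or tells print() not to add a second one). The file name and its content are made up for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    demo_path = os.path.join(base_dir, "iterate_demo.txt")
    with open(demo_path, "w") as file:
        file.write("alpha\nbeta\ngamma\n")

    stripped_lines = []
    with open(demo_path, "r") as file:
        for line in file:                  # lines are read one by one, not all upfront
            print(line, end="")            # end="" avoids printing an extra empty line
            stripped_lines.append(line.rstrip("\n"))

print(stripped_lines)
```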

### Tasks

#### 4.1
Write code that generates four arbitrarily named files with the suffix ".pdf" as well as two arbitrarily named files with a random suffix or no suffix at all in the "testfolder" subdirectory. Additionally, add two directories with names of your choice to the same directory ("testfolder"); the name of at least one of those subdirectories should end with the suffix ".pdf".

In [None]:
### Your code here

#### 4.2
Write a list comprehension that puts all contents of the "testfolder" directory into a list.

In [None]:
### Your code here

#### 4.3
Now write two more list comprehensions, but this time use conditional statements to make sure that one list only contains actual files while the other only contains the folders from "testfolder". 

In [None]:
### Your code here

#### 4.4
Use list comprehensions again to only add files (not folders) containing the ".pdf" extension to the resulting list.

In [None]:
### Your code here

#### 4.5
Create a folder named "writelines" which contains a file named "text_45.txt" and write five random lines of text to it, then close the file. Afterwards read from it, change the content of the third line and write it back to the stored file.

In [None]:
### Your code here

#### 4.6
Create a file named "text_46.txt" in the folder "writelines" and write the numbers from 1 to 10 (including 10) to it, each in its own line. Then open this file for reading, sum up all the values and afterwards append the result to it.

In [None]:
### Your code here