## 4 Accessing files and OS functionality in Python

Navigating the file system and reading from and writing to files are important parts of working with data. This section therefore takes a closer look at some of the tools Python provides for these tasks.

### Accessing the OS-specific file system
Whenever you want to access directories, navigate the file system or create files, it is usually a good idea to use the "os" module, which is part of Python's standard library. You just need to import it like this:

In [5]:
import os

Get current working directory:

In [50]:
print(f"os.getcwd(): {os.getcwd()}")

os.getcwd(): /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os


Different operating systems address folders and files differently. Windows separates folder levels with a backslash (e.g. "C:\Windows\system32"; inside a regular Python string literal the backslash has to be written as "\\"), whereas Linux and macOS use "/". Hardcoding these separators can lead to compatibility issues. Luckily, the os module offers solutions to this:

In [25]:
print(f"Separator used in your current OS: {os.sep}")

l = ["Path","to","folder"]
print(f"os.sep.join(): {os.sep.join(l)}")

# Raw string, so that the backslashes are not interpreted as escape sequences
path_win = r"C:\Windows\system32"
path_linux = "/home/myprofile/testdata"
path_extension = "path_extension"

print(f"win_path with extension: {os.path.join(path_win,path_extension)}")
print(f"linux_path with extension: {os.path.join(path_linux,path_extension)}")

Separator used in your current OS: /
os.sep.join(): Path/to/folder
win_path with extension: C:\Windows\system32/path_extension
linux_path with extension: /home/myprofile/testdata/path_extension
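
Paths can also be taken apart again. The following is a minimal sketch, assuming a made-up relative path (`data/raw/measurements.csv` is purely illustrative and does not need to exist), that joins the components with os.path.join and then splits the result with os.path.split and os.path.splitext:

```python
import os

# Hypothetical path components, chosen only for illustration.
parts = ["data", "raw", "measurements.csv"]
path = os.path.join(*parts)

# Split off the last component, then separate the file extension.
folder, filename = os.path.split(path)
stem, suffix = os.path.splitext(filename)

print(f"path: {path}")
print(f"folder: {folder}")
print(f"filename: {filename}")
print(f"stem: {stem}, suffix: {suffix}")
```

Because every step goes through os.path, the same code should work regardless of which separator the current operating system uses.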


There are also functions that check whether a path exists and whether it points to a file or a directory.

In [43]:
print(f"Does {os.path.join(path_win,path_extension)} exist? {os.path.exists(os.path.join(path_win,path_extension))}")

testfolder_path = os.path.join(os.getcwd(),'testfolder')
testfile_path = os.path.join(os.getcwd(),'testfile.pdf')

print(f"Does {testfolder_path} exist? {os.path.exists(testfolder_path)}")
print(f"Does {testfile_path} exist? {os.path.exists(testfile_path)}")

Does C:\Windows\system32/path_extension exist? False
Does /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfolder exist? True
Does /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfile.pdf exist? True


In [49]:
print(f"Is {testfolder_path} a file? {os.path.isfile(testfolder_path)}")
print(f"Is {testfolder_path} a directory? {os.path.isdir(testfolder_path)}")

print(f"Is {testfile_path} a file? {os.path.isfile(testfile_path)}")
print(f"Is {testfile_path} a directory? {os.path.isdir(testfile_path)}")

Is /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfolder a file? False
Is /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfolder a directory? True
Is /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfile.pdf a file? True
Is /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfile.pdf a directory? False
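
The three checks can be combined into one small helper. This is only a sketch: the function name `describe_path` and the temporary test files are assumptions made for illustration and are not part of the os module.

```python
import os
import tempfile

def describe_path(path):
    """Return 'missing', 'file', 'directory' or 'other' for the given path."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "other"  # e.g. a device node or a socket

# Try the helper on a throwaway directory so that no real data is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = os.path.join(tmp_dir, "example.txt")
    with open(example_file, "w") as handle:
        handle.write("hello")

    results = [
        describe_path(tmp_dir),
        describe_path(example_file),
        describe_path(os.path.join(tmp_dir, "does_not_exist")),
    ]

print(results)
```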


### Creating files and directories

You can also use the os module to create a new directory directly from your Python script.

In [52]:
os.mkdir(testfolder_path)

FileExistsError: [Errno 17] File exists: '/media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfolder'

Blindly creating files or folders can lead to errors, as seen above, where the folder already exists. A safer approach is to check first whether the file or folder already exists:

In [81]:
if not os.path.exists(testfolder_path):
    os.mkdir(testfolder_path)
else:
    print("Path already exists. Not creating new directory.")
    
if not os.path.exists(testfile_path):
    # Create an empty file; os.mkdir would create a directory instead.
    open(testfile_path, "x").close()
else:
    print("Path already exists. Not creating new file.")
    
new_folder = os.path.join(os.getcwd(),"new_folder")

if not os.path.exists(new_folder):
    os.mkdir(new_folder)
    print(f"Created folder {new_folder}")
else:
    print("Path already exists. Not creating new directory.")

Path already exists. Not creating new directory.
Path already exists. Not creating new file.
Created folder /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_folder
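
As an alternative to checking first, os.makedirs accepts an `exist_ok` argument and also creates missing parent directories. The sketch below assumes a throwaway temporary directory and made-up folder names (`level_1`, `level_2`):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, "level_1", "level_2")

    # Creates level_1 and level_2 in one call.
    os.makedirs(nested, exist_ok=True)
    # Calling it again raises no error because exist_ok=True.
    os.makedirs(nested, exist_ok=True)

    created = os.path.isdir(nested)

print(f"Nested directory was created: {created}")
```

Note that `exist_ok=True` only tolerates an existing directory. If the path already exists as a regular file, os.makedirs still raises a FileExistsError.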


You can use os.mknod() to create a new, empty file. Note that this function is only available on Unix-like systems: it does not exist on Windows, and on macOS it may fail without elevated privileges:

In [85]:
new_file = os.path.join(os.getcwd(),"new_file.pdf")

if not os.path.exists(new_file):
    os.mknod(new_file)
    print(f"Created file {new_file}")
else:
    print("Path already exists. Not creating new file.")

Created file /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_file.pdf
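
A more portable way to create an empty file is the built-in open() with mode "x" (exclusive creation), which works on all platforms and raises a FileExistsError if the path already exists. The helper name `create_empty_file` below is an assumption made for this sketch:

```python
import os
import tempfile

def create_empty_file(path):
    """Create an empty file; return False if the path already exists."""
    try:
        # Mode "x" refuses to open a path that already exists.
        with open(path, "x"):
            pass
    except FileExistsError:
        return False
    return True

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "new_file.pdf")
    first_attempt = create_empty_file(target)   # file is created
    second_attempt = create_empty_file(target)  # file already exists
    size = os.path.getsize(target)

print(first_attempt, second_attempt, size)
```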


### Deleting files and directories
Files and directories can also be deleted. The different options are explained in more detail in this Stack Overflow answer: https://stackoverflow.com/questions/6996603/how-can-i-delete-a-file-or-folder-in-python#6996628 . With the os module, files and folders can be deleted in the following way:

In [68]:
new_file2 = os.path.join(os.getcwd(),"new_file2.pdf")

os.remove(new_file2)

FileNotFoundError: [Errno 2] No such file or directory: '/media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_file2.pdf'

Trying to remove a file or folder that does not exist raises a "FileNotFoundError", which stops your program unless the error is handled. You should therefore check whether the path exists first, just like in the examples for creating files and folders. A better way of deleting a file would be:

In [84]:
if os.path.exists(new_file):
    os.remove(new_file)
    print(f"Deleted file {new_file}")
else:
    print(f"File '{new_file}' does not exist.")

Deleted file /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_file.pdf
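
Another common pattern is to attempt the deletion and catch the error instead of checking beforehand. This avoids the short gap between the check and the deletion in which another program could remove the file. The following is a minimal sketch; `remove_if_present` is a hypothetical helper name:

```python
import os
import tempfile

def remove_if_present(path):
    """Delete the file at path; return False if there was nothing to delete."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "obsolete.txt")
    open(target, "w").close()  # create an empty file to delete

    removed_first = remove_if_present(target)   # file existed
    removed_second = remove_if_present(target)  # file is already gone

print(removed_first, removed_second)
```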


Empty directories can be deleted in a similar fashion with the os module:

In [86]:
if os.path.exists(new_folder):
    os.rmdir(new_folder)
    print(f"Deleted directory {new_folder}")
else:
    print(f"Directory '{new_folder}' does not exist.")

Deleted directory /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_folder


os.rmdir only works if the folder you want to delete is empty, i.e. it contains no files or subfolders. The following example therefore runs into an error:

In [101]:
if not os.path.exists(new_folder):
    os.mkdir(new_folder)
    print(f"Created folder {new_folder}")
else:
    print("Path already exists. Not creating new directory.")
    
new_folder_file = os.path.join(new_folder,"new_folder_file")

if not os.path.exists(new_folder_file):
    os.mknod(new_folder_file)
    print(f"Created file {new_folder_file}")
else:
    print("Path already exists. Not creating new file.")
    
if os.path.exists(new_folder):
    os.rmdir(new_folder)
    print(f"Deleted directory {new_folder}")
else:
    print(f"Directory '{new_folder}' does not exist.")

Created folder /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_folder
Created file /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_folder/new_folder_file


OSError: [Errno 39] Directory not empty: '/media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_folder'

One way to circumvent this problem is to manually delete everything in that folder first and then fall back on the os module. An easier way in this case is to use the "shutil" module, which is also part of Python's standard library:

In [102]:
import shutil

if os.path.exists(new_folder):
    shutil.rmtree(new_folder)
    print(f"Deleted directory {new_folder}")
else:
    print(f"Directory '{new_folder}' does not exist.")

Deleted directory /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_folder
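
For comparison, the manual approach mentioned above can be sketched with os.walk. The function name `remove_tree_manually` is an assumption for this example, and the sketch deliberately ignores special cases such as symbolic links or read-only files, which shutil.rmtree is better equipped to handle:

```python
import os
import tempfile

def remove_tree_manually(root):
    """Delete root and everything below it using only the os module."""
    # topdown=False yields the deepest directories first, so every
    # directory is already empty by the time os.rmdir is called on it.
    for current_dir, subdirs, files in os.walk(root, topdown=False):
        for name in files:
            os.remove(os.path.join(current_dir, name))
        for name in subdirs:
            os.rmdir(os.path.join(current_dir, name))
    os.rmdir(root)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small tree: new_folder/b.txt and new_folder/sub/a.txt
    root = os.path.join(tmp_dir, "new_folder")
    os.makedirs(os.path.join(root, "sub"))
    open(os.path.join(root, "b.txt"), "w").close()
    open(os.path.join(root, "sub", "a.txt"), "w").close()

    remove_tree_manually(root)
    still_there = os.path.exists(root)

print(f"Folder still exists: {still_there}")
```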


### More useful commands

List all files and subdirectories in a folder:

In [107]:
dir_list = os.listdir(os.getcwd())
print(f"list of files/folders in directory: {dir_list}")

list of files/folders in directory: ['04_accessing_files.ipynb', 'new_file.pdf', 'testfile.pdf', 'testfile.pdf2', 'testfolder', 'testfolder.pdf']


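Get the size (in bytes) of a specific file or folder. Note that for a folder, os.path.getsize reports the size of the directory entry itself, not the combined size of its contents: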

In [116]:
print(f"Size of {new_file}: {os.path.getsize(new_file)} bytes")
print(f"Size of {testfolder_path}: {os.path.getsize(testfolder_path)} bytes")

Size of /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/new_file.pdf: 0 bytes
Size of /media/dan/1407183E63647361/Sciebo/WHK/Exercises/Python/programming_python/code/ntbks/04_files_os/testfolder: 0 bytes
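
To get the combined size of everything inside a folder, the sizes of the individual files have to be added up. The following is a minimal sketch using os.walk; the helper name `total_size` and the test files are assumptions for illustration, and symbolic links are not treated specially:

```python
import os
import tempfile

def total_size(root):
    """Sum the sizes (in bytes) of all files below root."""
    total = 0
    for current_dir, _subdirs, files in os.walk(root):
        for name in files:
            total += os.path.getsize(os.path.join(current_dir, name))
    return total

with tempfile.TemporaryDirectory() as tmp_dir:
    # Two small files: 5 bytes at the top level, 3 bytes in a subfolder.
    os.mkdir(os.path.join(tmp_dir, "sub"))
    with open(os.path.join(tmp_dir, "a.bin"), "wb") as handle:
        handle.write(b"12345")
    with open(os.path.join(tmp_dir, "sub", "b.bin"), "wb") as handle:
        handle.write(b"123")

    size = total_size(tmp_dir)

print(f"Total size: {size} bytes")  # prints "Total size: 8 bytes"
```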


## Tasks

### 4.1
Write code that generates four arbitrarily named files with the suffix ".pdf" as well as two arbitrarily named files with a different suffix or no suffix at all in the "testfolder" subdirectory. Additionally, add two directories with names of your choice to the same directory ("testfolder").

In [None]:
### Your code here

### 4.2
Write a list comprehension that puts all contents of the "testfolder" directory into a list.

In [None]:
### Your code here

### 4.3
Now write two more list comprehensions, but this time use conditional statements to make sure that one list only contains actual files while the other only contains the folders from "testfolder". 

In [117]:
### Your code here

### 4.4
Use list comprehensions again to only add files containing the ".pdf" extension to the resulting list.

In [None]:
### Your code here