# File Handling

### File Path

A ***path*** is a string that gives the location of a file. It is made up of a directory path and a file name.

For example, in the file path `C:\Users\john\hello.txt`, `C:\Users\john` is the directory path and `hello.txt` is the file name.

### Be careful with backslashes

Note that:

- Windows uses backslashes: `C:\Users\john\hello.txt`
- macOS and Linux use forward slashes: `/Users/john/hello.txt`

In Python string literals, the backslash is the escape character, so when copying and pasting a Windows file path into code, change each backslash to a forward slash:

```python
'C:/Users/john/hello.txt'
```
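As a minimal sketch of why the slashes matter (reusing the illustrative path from above), the block below writes the same Windows path in three ways that Python accepts: with doubled backslashes, as a raw string, and with forward slashes. The variable names are chosen only for this example.

```python
# Three ways to write the same Windows path in Python source code.
escaped = "C:\\Users\\john\\hello.txt"   # each backslash doubled
raw = r"C:\Users\john\hello.txt"         # raw string: backslashes are literal
forward = "C:/Users/john/hello.txt"      # forward slashes

# The escaped and raw spellings produce the identical string.
print(escaped == raw)

# Converting a pasted path to forward slashes is a plain string replacement.
converted = raw.replace("\\", "/")
print(converted == forward)
```

Both comparisons print `True`. A raw string is often the least error-prone choice when a path is pasted in unchanged.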

In [2]:
import os.path

windows_path = "C:/Users/John/Documents/file.txt"

head, tail = os.path.split(windows_path)
print(head)
print(tail)

C:/Users/John/Documents
file.txt


In [3]:
import os.path

unix_path = "/home/user/my_file.txt"

head, tail = os.path.split(unix_path)
print(head)
print(tail)

/home/user
my_file.txt


#### Exercise

Which of the following is an invalid file path in Python?

- `file_path_a = "C:\Users\john\hello.txt"`
- `file_path_b = "C:\\Users\\john\\hello.txt"`
- `file_path_c = "C:/Users/john/hello.txt"`
- `file_path_d = "/Users/john/hello.txt"`

Answer: 

The following functions from the `os` module are often used with files and directories:

```py

import os

# Create a directory
os.mkdir('new_dir') 

# Get current directory 
current_dir = os.getcwd()

# Change directory
os.chdir('new_dir')

# Create a new file
with open('file.txt', 'w') as f:
    f.write('Hello World!')

# Rename file   
os.rename('file.txt', 'hello.txt') 

# Get file size
file_size = os.path.getsize('hello.txt')

# Check if file exists
file_exists = os.path.isfile('hello.txt') 

# Delete file
os.remove('hello.txt')

# Go back to the original directory, then delete the (now empty) directory
os.chdir(current_dir)
os.rmdir('new_dir')

# List files in directory  
files = os.listdir('.')

print(files)

```

### Current Working Directory `os.getcwd()`

- Absolute file paths always begin with the root folder. Example: `C:\` or `/`.

- Relative file paths do not begin with the root folder. Example: `example.txt` or `./example.txt`.

Run the following to get the current working directory:

In [1]:
import os

os.getcwd()

'c:\\Users\\thund\\OneDrive\\Teach\\Python\\tutorials'
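The relationship between relative paths and the current working directory can be sketched with `os.path.isabs()` and `os.path.abspath()`. This is only an illustration: `example.txt` does not need to exist, because these functions work on the path string alone.

```python
import os

cwd = os.getcwd()

# The current working directory is always reported as an absolute path.
print(os.path.isabs(cwd))

# A bare file name is relative...
relative = "example.txt"
print(os.path.isabs(relative))

# ...and is resolved against the current working directory.
absolute = os.path.abspath(relative)
print(absolute == os.path.join(cwd, relative))
```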

### What do `.` and `..` mean in file paths

- `.` means the current directory
- `..` means the parent directory

Hence, `os.listdir('.')` lists all files and directories in the current directory.
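To see what the two shorthand names stand for, a small sketch can turn them into absolute paths with `os.path.abspath()`. The printed paths depend on where the code is run, so no particular output is shown.

```python
import os

# "." resolves to the current working directory itself.
here = os.path.abspath(".")

# ".." resolves to the directory that contains it.
parent = os.path.abspath("..")

print(here)
print(parent)
print(parent == os.path.dirname(here))
```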

#### Exercise

- Use `os.listdir('.')`
- Use `os.listdir('..')`

In [None]:
# try it

### Create and remove file

In [2]:
# Create a new file
with open('example.txt', 'w') as f:
    f.write('Hello World!')

In [5]:
# Check if a file exists
file_path = "./example.txt"
if os.path.exists(file_path):
    print("Yes. Exists")
else:
    print("Not Found")

Not Found


In [4]:
os.remove('example.txt')

### Create and remove directory

In [6]:
import os

os.mkdir("new_directory")

In [9]:
# Check if a directory exists
file_path = "new_directory"
if os.path.exists(file_path):
    print("Yes. Exists")
else:
    print("Not Found")

Not Found


In [8]:
os.rmdir("new_directory")

### Find files on your system

In [15]:
# Compute the total size of a directory tree in KB, MB, or GB
def get_file_size(file_path):
    """Return (size, unit) for all files under file_path."""
    # see: https://stackoverflow.com/a/1392549
    size = sum(
        os.path.getsize(os.path.join(dirpath, filename))
        for dirpath, dirnames, filenames in os.walk(file_path)
        for filename in filenames
    )
    size_kb = size / 1024
    size_mb = size_kb / 1024
    size_gb = size_mb / 1024
    if size_gb > 0.1:
        return size_gb, "GB"
    elif size_mb > 0.1:
        return size_mb, "MB"
    return size_kb, "KB"

In [2]:
import os
import platform

def walk_directory(root_dir):
    """Walks through a directory and prints information about its files and subdirectories."""

    for root, directories, files in os.walk(root_dir):
        print(f"Root directory: {root}")

        # Print directories
        for directory in directories:
            print(f"- Directory: {os.path.join(root, directory)}")

        # Print files
        for file in files:
            print(f"- File: {os.path.join(root, file)}")

        print("-" * 20)  # Separate entries for readability

In [23]:
# Operating system-specific file paths
if platform.system() == "Windows":
    user_folder = os.path.expanduser("~")
    downloads_folder = os.path.join(user_folder, "Downloads")
    
    system_root = os.environ.get("SystemRoot", "C:\\Windows")
    built_in_apps_folder = os.path.join(system_root, "System32")
    
    apps_folder = os.environ.get("ProgramFiles", "C:\\Program Files")
elif platform.system() == "Darwin":  # macOS
    user_folder = os.path.expanduser("~")
    downloads_folder = os.path.join(user_folder, "Downloads")
    
    system_root = "/"
    built_in_apps_folder = "/System/Applications"
    
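    apps_folder = "/Applications"
else:
    # Other systems (e.g. Linux) are not covered by this example
    raise NotImplementedError(f"Unsupported platform: {platform.system()}")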

# Walk through common file paths
print(f'{system_root}')
print(os.listdir(system_root))

print(f'{user_folder}')
print(os.listdir(user_folder))

# File Size
size, unit = get_file_size(apps_folder)
print(f"Size of {apps_folder}: {size:.2f} {unit}")

size, unit = get_file_size(downloads_folder)
print(f"Size of {downloads_folder}: {size:.2f} {unit}")

size, unit = get_file_size(built_in_apps_folder)
print(f"Size of {built_in_apps_folder}: {size:.2f} {unit}")

print('downloads folder:')
walk_directory(downloads_folder)

C:\Windows
['appcompat', 'apppatch', 'AppReadiness', 'assembly', 'bcastdvr', 'bfsvc.exe', 'BitLockerDiscoveryVolumeContents', 'Boot', 'bootstat.dat', 'Branding', 'BrowserCore', 'CbsTemp', 'Containers', 'Core.xml', 'CSC', 'Cursors', 'debug', 'diagnostics', 'DiagTrack', 'DigitalLocker', 'Downloaded Program Files', 'DtcInstall.log', 'ELAMBKUP', 'en-US', 'explorer.exe', 'Fonts', 'GameBarPresenceWriter', 'Globalization', 'Help', 'HelpPane.exe', 'hh.exe', 'IdentityCRL', 'IME', 'ImmersiveControlPanel', 'InboxApps', 'INF', 'InputMethod', 'Installer', 'L2Schemas', 'LanguageOverlayCache', 'LiveKernelReports', 'Logs', 'lsasetup.log', 'Media', 'mib.bin', 'Microsoft.NET', 'Migration', 'ModemLogs', 'notepad.exe', 'OCR', 'Offline Web Pages', 'Panther', 'Performance', 'PFRO.log', 'PLA', 'PolicyDefinitions', 'Prefetch', 'PrintDialog', 'Professional.xml', 'Provisioning', 'py.exe', 'pyshellext.amd64.dll', 'pyw.exe', 'regedit.exe', 'Registration', 'RemotePackages', 'rescache', 'Resources', 'SchCache', 'sc

# File I/O

*Reading* and *writing* files is one of the main ways for a program to **interact with the real world**:

* `I: input:  read files`
* `O: output: write files`

The key function for working with files in Python is the `open()` function.

The `open()` function is most often called with two parameters: `<filename>` and `<mode>`. The mode is a string made up of the following characters:

| Character | Meaning |
|---|---|
| r | Open for reading (default) |
| w | Open for writing, truncating the file first |
| x | Open for exclusive creation, failing if the file already exists |
| a | Open for writing, appending to the end of file if it exists |
| b | Binary mode (e.g., images, audio, or video) |
| t | Text mode (default) |
| + | Open for updating (reading and writing) |
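The modes in the table can be tried out safely in a temporary directory. The following is a minimal sketch, assuming a throwaway file name (`modes_demo.txt`, chosen only for this example); it exercises `w`, `a`, `r`, and `x` in turn.

```python
import os
import tempfile

# Work inside a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # "w" creates the file (or truncates an existing one).
    with open(path, "w") as f:
        f.write("first\n")

    # "a" keeps the existing content and appends to the end.
    with open(path, "a") as f:
        f.write("second\n")

    # "r" (the default) opens the file for reading.
    with open(path, "r") as f:
        content = f.read()

    # "x" refuses to overwrite: the file already exists, so this fails.
    try:
        with open(path, "x") as f:
            f.write("never written\n")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(content))
print(exclusive_failed)
```

The file ends up containing both lines, and the `x` attempt raises `FileExistsError` because the file was already created by the earlier `w` call.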

### Read the file in one go

`file.read()` reads the entire file and returns it as a string.

In [10]:
file = open('./data/students.txt')
print(file.read())
file.close()

Sophia
Bell
Firdaus


### Read line by line

`file.readline()` reads a single line from the file; a newline character (`\n`) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline.

In [13]:
file = open('./data/students.txt')
print(file.readline())
print(file.readline())
file.close()

Sophia

Bell



In [14]:
file = open('./data/students.txt')
while True:
    line = file.readline()
    if line == '': # an empty string means the end of the file
        break
    line = line.rstrip() # remove the newline character "\n"
    print(line)
file.close()

Sophia
Bell
Firdaus


### Automatically close files using the `with` block (Context Manager)

A `with` block closes the file automatically when the block ends, even if an error occurs inside it, so an explicit `file.close()` is not needed.

In [20]:
file_path = './data/students.txt'
with open(file_path, "r") as f:
    while True:
        line = f.readline()
        if line == '': # an empty string means the end of the file
            break
        line = line.rstrip() # remove the newline character "\n"
        print(line)

Sophia
Bell
Firdaus


Let's now print the line number as well: `i` starting from `1`.

In [15]:
file_path = './data/students.txt'
i = 1
with open(file_path, "r") as f:
    while True:
        line = f.readline()
        if line == '': # an empty string means the end of the file
            break
        line = line.rstrip() # remove the newline character "\n"
        print(i, line)
        i += 1

1 Sophia
2 Bell
3 Firdaus
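
An alternative sketch of the same numbered loop: a file object can be iterated over directly, and the built-in `enumerate()` can supply the line number. The file used here is a temporary stand-in containing the same three names, created on the fly so that the block runs without `./data/students.txt`.

```python
import os
import tempfile

# Create a temporary stand-in for students.txt (same three names as above).
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "students.txt")
    with open(file_path, "w") as f:
        f.write("Sophia\nBell\nFirdaus\n")

    numbered = []
    with open(file_path, "r") as f:
        # Iterating over the file yields one line at a time until the end.
        for i, line in enumerate(f, start=1):
            numbered.append(f"{i} {line.rstrip()}")

for entry in numbered:
    print(entry)
```

Because the loop ends only when the file is exhausted, a blank line in the middle of the file does not stop it early.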


#### Exercise

- Read the file `customers.csv` in the `data` folder and print it.
- Count the number of **lines** in the file.
- Count the number of **characters** in the file.
- Count the number of **words** in the file.

In [None]:
# try it

### Handle File Errors

When working with files, it is important to handle errors that can occur. For example:

1. the file may not exist
1. the file may be write-protected

To handle errors when working with files, you can use a `try...except` block. The `try` block contains the code that you want to execute. If an error occurs, the `except` block will be executed.

Here is an example of how to handle errors when working with files in Python:

In [16]:
file_path = 'new_file.txt'
with open(file_path, "w") as f:
    f.write("zip zap")

In [16]:
try:
    with open("new_file.txt", "r") as f:
        contents = f.read()
        print('success')
except FileNotFoundError:
    print("The file does not exist!")
except PermissionError:
    print("You do not have permission to access the file!")
except Exception as e:
    print("Some error happened:", e)

The file does not exist!
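
To trigger each branch deliberately, the same pattern can be wrapped in a small helper and called on one path that exists and one that does not. This is a sketch under stated assumptions: the name `read_text_safely` and both file names are hypothetical, and everything happens inside a temporary directory.

```python
import os
import tempfile


def read_text_safely(path):
    """Return (contents, message); contents is None if reading failed."""
    try:
        with open(path, "r") as f:
            return f.read(), "success"
    except FileNotFoundError:
        return None, "The file does not exist!"
    except PermissionError:
        return None, "Permission denied!"
    except OSError as e:
        return None, f"Some other I/O error happened: {e}"


with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, "missing.txt")
    present = os.path.join(tmp_dir, "present.txt")
    with open(present, "w") as f:
        f.write("zip zap")

    missing_result = read_text_safely(missing)
    present_result = read_text_safely(present)

print(missing_result)
print(present_result)
```

Listing the more specific exceptions first matters, because `FileNotFoundError` and `PermissionError` are both subclasses of `OSError`.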


# The `csv` module

CSV stands for Comma-Separated Values.

The `csv` module implements classes to read and write tabular data in CSV format.

In [17]:
import csv

In [23]:
data = []
with open('./data/customers.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data.append(row)

print(data)

[['name', 'phone', 'email'], ['Anna', '+91 99999 11111', 'anna@example.com'], ['Sia', '+91 98765 12345', 'sia@example.com'], ['Leo', '+91 90909 10101', 'leo@example.com'], ['Bob', '+91 99999 23232', 'bob@example.com'], ['Den', '+91 98765 11223', 'den@example.com'], ['Mark', '+91 90909 88776', 'mark@example.com']]


In [28]:
# Open the CSV file for reading.
data = []
with open('./data/customers.csv', "r") as f:
    reader = csv.reader(f)

    # Iterate over the rows of the CSV file.
    for row in reader:
        data.append(row)

header = data[0]
rows = data[1:]

In [29]:
print(header)
print(rows)

['name', 'phone', 'email']
[['Anna', '+91 99999 11111', 'anna@example.com'], ['Sia', '+91 98765 12345', 'sia@example.com'], ['Leo', '+91 90909 10101', 'leo@example.com'], ['Bob', '+91 99999 23232', 'bob@example.com'], ['Den', '+91 98765 11223', 'den@example.com'], ['Mark', '+91 90909 88776', 'mark@example.com']]
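
Instead of splitting the header off by hand, `csv.DictReader` can use the first row as keys. The sketch below reads from an in-memory string holding only the header and the first two rows of the data shown above, so it runs without `./data/customers.csv`.

```python
import csv
import io

# A small in-memory stand-in for customers.csv (first two data rows only).
sample = io.StringIO(
    "name,phone,email\n"
    "Anna,+91 99999 11111,anna@example.com\n"
    "Sia,+91 98765 12345,sia@example.com\n"
)

# DictReader uses the first row as the keys of each record.
reader = csv.DictReader(sample)
records = list(reader)

print(reader.fieldnames)
for record in records:
    print(record["name"], record["email"])
```

Each record is a mapping from column name to value, which tends to be easier to read than positional indexes such as `row[2]`.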


In [31]:
# Open the CSV file for writing.
with open("asdf.csv", "w") as f:
    writer = csv.writer(f, lineterminator='\n')

    # Write the header row.
    writer.writerow(["name", "age"])

    # Write the data rows.
    rows = [
        ["Alice", 25],
        ["Bob", 30],
    ]
    writer.writerows(rows)
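
A quick way to check a written file is to read it back. The following sketch writes the same two rows to a temporary file (the name `people.csv` is hypothetical) and reads them again. It opens the file with `newline=""`, which the `csv` module documentation recommends, rather than setting `lineterminator`.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "age"])
        writer.writerows([["Alice", 25], ["Bob", 30]])

    with open(path, "r", newline="") as f:
        read_back = list(csv.reader(f))

print(read_back)
```

Note that the ages come back as the strings `'25'` and `'30'`: CSV stores text only, so numeric columns need converting (for example with `int()`) after reading.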