# Comprehensive Tutorial on Using Pathlib in Python for File System Manipulation

## Introduction

One of the most frustrating aspects of Python up until version 3.4 was file system manipulation. Developers often struggled with tangled strings representing paths. Their code broke frequently due to path inconsistencies on different operating systems (Windows vs. Unix-like). That's when the `pathlib` module was introduced to the standard library.

`pathlib` offers a long-awaited object-oriented approach to path manipulation. It provides an elegant way to handle file system paths, behaves consistently across platforms, and promotes code clarity and maintainability.

The module has matured significantly over the years, making it an essential tool for any Pythonista. This comprehensive tutorial will teach you the features and methods of `pathlib` that will probably be enough for 99% of your daily needs. Let's get started.

## Python `os` module vs. `pathlib`

Some of our readers might ask, "Why learn a new library when we have the Python `os` module?" That's a fair question.

Let's say we want to find all `png` files directly inside a given directory (a common task in data science). If we were using the `os` module, we would have to write code like this:

In [None]:
import os

dir_path = "/home/user/documents"

# Find all PNG files inside a directory
files = [
    os.path.join(dir_path, f)
    for f in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".png")
]

This snippet has several disadvantages:
1. It is long and hard to read for such a simple operation.
2. It requires knowledge of list comprehensions.
3. It involves string operations, which are error-prone.

If we were using `pathlib`, then our code would be much simpler:

In [None]:
from pathlib import Path

# Create a path object
dir_path = Path(dir_path)

# Find all PNG files inside a directory
files = list(dir_path.glob("*.png"))
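
Note that `glob("*.png")` only looks at the directory itself. If sub-directories need to be searched as well, `rglob` is one option. The sketch below builds a throwaway directory tree (the file and folder names are made up for illustration) and compares the two calls:

```python
import tempfile
from pathlib import Path

# Build a small temporary tree so the example is self-contained.
# All file and folder names here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested" / "deeper").mkdir(parents=True)
    (root / "top.png").touch()
    (root / "notes.txt").touch()
    (root / "nested" / "inner.png").touch()
    (root / "nested" / "deeper" / "deep.png").touch()

    # glob("*.png") matches only files directly inside `root`
    top_level = sorted(p.name for p in root.glob("*.png"))

    # rglob("*.png") also descends into every sub-directory
    recursive = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.png")
    )

print(top_level)  # ['top.png']
print(recursive)  # ['nested/deeper/deep.png', 'nested/inner.png', 'top.png']
```

The results are sorted here only to make the output predictable; neither method guarantees a particular order.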

If you continue reading the article, you will discover many more benefits of `pathlib` over the `os` module besides simplicity and readability. So, shall we?

## `Path` objects

The entire `pathlib` library revolves around `Path` objects:

In [2]:
from pathlib import Path

These objects represent file system paths in a structured and - this is key - platform-independent way. Unlike working with raw strings, these objects offer a more user-friendly approach to manipulating file system paths.

We can create `Path` objects in several ways:

1. __From strings__

You can directly create a `Path` object by passing a string that represents a file system path:

In [4]:
file_path_str = "data/union_data.csv"
data_path = Path(file_path_str)

print(type(data_path))

<class 'pathlib.PosixPath'>


2. __From other `Path` objects__

Existing `Path` objects can be building blocks to create new paths. You can combine them using the `/` operator:

In [10]:
base_path = Path("/home/user")
data_dir = Path("data")

file_path = base_path / data_dir / "prices.csv"  # Combining multiple paths
print(file_path)

/home/user/data/prices.csv


By using a forward slash, you can extend `Path` objects with another object or a string path. 
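
As a rough illustration of how joining behaves, the sketch below uses `PurePosixPath` (a path class that never touches the file system) so that the output is the same on every operating system. The paths themselves are just examples:

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the output identical on Windows, macOS and Linux
base = PurePosixPath("/home/user")

# A plain string on the right-hand side works as well as a path object
joined = base / "data" / "prices.csv"

# joinpath() is the method equivalent of the / operator
same = base.joinpath("data", "prices.csv")

# An absolute right-hand operand replaces everything before it
replaced = base / "/etc/hosts"

print(joined)          # /home/user/data/prices.csv
print(joined == same)  # True
print(replaced)        # /etc/hosts
```

The last case is worth remembering: joining with an absolute path silently discards the left-hand side.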

3. __From the current working directory__

The `Path.cwd()` method gives quick access to the current working directory as a `Path` object:

In [9]:
cwd = Path.cwd()

print(cwd)

/home/bexgboost/articles/2024/4_april/8_pathlib


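4. __From the home directory__

Similarly, the `Path.home()` method returns the current user's home directory as a `Path` object, which you can extend like any other path: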

In [11]:
home = Path.home()

home / "downloads" / "projects"

PosixPath('/home/bexgboost/downloads/projects')

__An important note__: Creating a `Path` object doesn't touch the file system. It doesn't validate the path, and it doesn't create directories or files. The object simply represents a path that you can manipulate. To actually interact with the file system (checking existence, reading/writing files), we will have to use special methods of `Path` objects and, for some advanced cases, get help from the `os` module. More on this later.
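
To make this concrete, here is a minimal sketch that works inside a temporary directory. The `reports/summary.txt` path is hypothetical; the point is that nothing exists on disk until a method that performs I/O is called:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Building the path object does not create anything on disk
    report = Path(tmp) / "reports" / "summary.txt"
    existed_before = report.exists()

    # These methods are the ones that actually touch the file system
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text("hello", encoding="utf-8")

    existed_after = report.exists()
    content = report.read_text(encoding="utf-8")

print(existed_before)  # False
print(existed_after)   # True
print(content)         # hello
```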

## `Path` components 

Just like a physical address has different parts (street number, city, country, zip code, etc.), a file system path can be broken down into smaller components. `pathlib` allows us to access and manipulate these components using path attributes through dot-notation. 

Here are some common path attributes and how to retrieve them in `pathlib`:

- __Root__: This refers to the top level of the file system (e.g., `/` on Unix-like systems). On Windows, the drive letter (such as `C:`) is not part of the root; it is available separately through the `.drive` attribute.

In [13]:
image_file = home / "downloads" / "midjourney.png"

image_file.root

'/'

- __Parent__: This attribute returns a `Path` object representing the directory containing the current path.

In [14]:
image_file.parent

PosixPath('/home/bexgboost/downloads')

- __name__: This attribute returns the entire filename (including extension) as a string.

In [15]:
image_file.name

'midjourney.png'

- __suffix__: This attribute returns the file extension (including the dot) as a string, or an empty string if there's no extension.

In [16]:
image_file.suffix

'.png'

- __stem__: This attribute returns the file name without the extension. It is useful when converting files to different formats.

In [17]:
image_file.stem

'midjourney'
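
The attributes above can be combined in a few ways. The sketch below uses pure path classes so it runs the same on any operating system; the example paths are assumptions made for illustration:

```python
from pathlib import PurePosixPath, PureWindowsPath

image = PurePosixPath("/home/user/downloads/midjourney.png")

# stem plus a new extension: handy when converting between formats
converted_name = image.stem + ".jpg"

# with_suffix() returns a new path with the extension swapped
converted = image.with_suffix(".jpg")

# suffix returns only the last extension; suffixes returns all of them
archive = PurePosixPath("backup.tar.gz")

# On Windows-style paths the drive letter lives in .drive, not in .root
win = PureWindowsPath("C:/Users/user/midjourney.png")

print(converted_name)    # midjourney.jpg
print(converted)         # /home/user/downloads/midjourney.jpg
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
print(win.drive)         # C:
```

For the Windows-style path, `win.root` is a single backslash, and `win.anchor` combines the drive and the root.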

If you want to split a `Path` object into its components, you can use the `.parts` attribute:

In [18]:
image_file.parts

('/', 'home', 'bexgboost', 'downloads', 'midjourney.png')

If you want every ancestor directory of a path as a `Path` object, you can use the `.parents` attribute, which returns an immutable sequence:

In [19]:
list(image_file.parents)

[PosixPath('/home/bexgboost/downloads'),
 PosixPath('/home/bexgboost'),
 PosixPath('/home'),
 PosixPath('/')]
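
Since `.parents` behaves like a sequence, it can be measured, indexed and searched without converting it to a list first. A short sketch, again with a pure path and a made-up location:

```python
from pathlib import PurePosixPath

image = PurePosixPath("/home/user/downloads/midjourney.png")
parents = image.parents

# The sequence is ordered from the closest ancestor to the root
print(len(parents))  # 4
print(parents[0])    # /home/user/downloads
print(parents[1])    # /home/user

# The first entry is the same object you get from .parent
print(parents[0] == image.parent)  # True

# Membership tests work too
print(PurePosixPath("/home") in parents)  # True
```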

## Common path operations using `pathlib`

### Listing directories

### Checking path existence

### Creating and deleting paths

## Advanced path manipulation

### Relative vs. absolute paths

### Joining and splitting paths

### Globbing

## Working with files

### Reading files

### Writing files

### File renaming and moving

## Additional functionalities

### Iterating over file trees

### Temporary files and directories

### Permissions and file system information

## Conclusion