# Pathlib 
The pathlib module was introduced in Python 3.4.
> Credit: [Geir Arne Hjelle](https://realpython.com/team/gahjelle/) @[RealPython.com](https://realpython.com/python-pathlib/)

Working with files and interacting with the file system are important for many different reasons. 

The simplest cases involve only reading or writing files, but sometimes more complex tasks are at hand. Maybe you need to:
- list all files of a given type in a directory,
- find the parent directory of a given file, or
- create a unique file name that does not already exist.

Traditionally, Python has represented file paths using regular text strings. With support from the ```os.path``` module in the standard library, this has been adequate although a bit cumbersome. However, since paths are not strings, important functionality is spread all around the standard library, including modules like os, glob, and shutil. 

The following example needs three import statements just to move all text files to an archive directory:

In [179]:
import glob
import os
import shutil

for file_name in glob.glob('*.txt'):
    new_path = os.path.join('archive', file_name)
    shutil.move(file_name, new_path)
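
For comparison, the same task can be sketched with pathlib as the only import. This is a minimal sketch rather than code from the original notebook: the helper name `archive_text_files` is hypothetical, and it assumes the archive directory should be created when it is missing.

```python
import pathlib


def archive_text_files(directory):
    """Move every .txt file in `directory` into its 'archive' subdirectory."""
    directory = pathlib.Path(directory)
    archive = directory / 'archive'
    archive.mkdir(exist_ok=True)  # assumption: create the target if it is missing
    moved = []
    # sorted() collects all matches first, so files are not moved mid-scan
    for file_path in sorted(directory.glob('*.txt')):
        target = archive / file_path.name
        file_path.replace(target)  # note: silently overwrites an existing target
        moved.append(target)
    return moved
```

Calling `archive_text_files(pathlib.Path.cwd())` would mirror the three-import version above.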

**Iterate over files in a directory**

In [165]:
import pathlib

p = pathlib.Path.cwd()
print(p)
list(p.iterdir())

/Users/ratchet/Projects/desertpy


[PosixPath('/Users/ratchet/Projects/desertpy/archive'),
 PosixPath('/Users/ratchet/Projects/desertpy/python_basics_stdlib.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/file_paths.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/python_basics_stdlib.slides.html'),
 PosixPath('/Users/ratchet/Projects/desertpy/string_formatting.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/python-string-formatting-flowchart.4ecf0148fd87.png'),
 PosixPath('/Users/ratchet/Projects/desertpy/README.md'),
 PosixPath('/Users/ratchet/Projects/desertpy/dataclasses.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/haversine_formula_150.fb2b87d122a4.png'),
 PosixPath('/Users/ratchet/Projects/desertpy/.ipynb_checkpoints'),
 PosixPath('/Users/ratchet/Projects/desertpy/data'),
 PosixPath('/Users/ratchet/Projects/desertpy/Logging'),
 PosixPath('/Users/ratchet/Projects/desertpy/survey_results.ipynb')]

**Find the subdirectories of a directory**

In [132]:
p = pathlib.Path.cwd()
[x for x in p.iterdir() if x.is_dir()]

[PosixPath('/Users/ratchet/Projects/desertpy/archive'),
 PosixPath('/Users/ratchet/Projects/desertpy/.ipynb_checkpoints'),
 PosixPath('/Users/ratchet/Projects/desertpy/data'),
 PosixPath('/Users/ratchet/Projects/desertpy/Logging')]

**Find files matching a pattern**

In [186]:
p = pathlib.Path.cwd()
print(type(p))
list(p.glob("*.ipynb"))

<class 'pathlib.PosixPath'>


[PosixPath('/Users/ratchet/Projects/desertpy/python_basics_stdlib.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/file_paths.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/string_formatting.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/dataclasses.ipynb'),
 PosixPath('/Users/ratchet/Projects/desertpy/survey_results.ipynb')]

***
# Creating Paths

All you really need to know about is the ```pathlib.Path``` class. 

There are a few different ways of creating a path. First of all, there are classmethods like ```.cwd()``` (Current Working Directory) and ```.home()``` (your user’s home directory):

In [134]:
p = pathlib.Path.cwd()
data_path = p / "data/"
data_path

PosixPath('/Users/ratchet/Projects/desertpy/data')

In [135]:
# A path can also be explicitly created from its string representation
pathlib.Path(r'Projects/desertpy/file_paths.ipynb/file.txt')

PosixPath('Projects/desertpy/file_paths.ipynb/file.txt')

In [136]:
# A third way to construct a path is to join the parts of the path using the special operator '/'
pathlib.Path.home() / 'Projects' / 'desertpy' / 'test.py'

PosixPath('/Users/ratchet/Projects/desertpy/test.py')

The / can join several paths or a mix of paths and strings (as above) as long as there is at least one Path object. If you do not like the special / notation, you can do the same thing with the .joinpath() method:

In [137]:
pathlib.Path.home().joinpath('Projects', 'desertpy', 'test.py')

PosixPath('/Users/ratchet/Projects/desertpy/test.py')
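
The three ways of joining described above can be checked against each other. The sketch below uses `pathlib.PurePosixPath` (a pathlib class that does not touch the file system) only so that the result is the same on every operating system; the variable names are illustrative.

```python
import pathlib

# PurePosixPath keeps the example independent of the current OS and user
base = pathlib.PurePosixPath('/Users/ratchet')

with_operator = base / 'Projects' / 'desertpy' / 'test.py'
with_method = base.joinpath('Projects', 'desertpy', 'test.py')
# A plain string may sit on the left, as long as the other operand is a path
from_string = '/Users' / pathlib.PurePosixPath('ratchet/Projects/desertpy/test.py')

print(with_operator == with_method == from_string)  # True
```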

***
# Checking for Existence

In [138]:
list(data_path.iterdir())

[PosixPath('/Users/ratchet/Projects/desertpy/data/iris.csv'),
 PosixPath('/Users/ratchet/Projects/desertpy/data/.ipynb_checkpoints')]

In [143]:
sample_data = data_path / "iris.csv"
print(sample_data)
print(type(sample_data))
print(sample_data.is_file())
sample_data.exists()

/Users/ratchet/Projects/desertpy/data/iris.csv
<class 'pathlib.PosixPath'>
True


True
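
To see how these checks behave for a path that does not exist, here is a small self-contained sketch. It works in a temporary directory, and the file names and contents are placeholders rather than the notebook's real data.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    present = folder / 'iris.csv'          # placeholder file, created below
    missing = folder / 'no_such_file.csv'  # never created
    present.write_text('placeholder\n')

    checks = {
        'present.exists()': present.exists(),
        'present.is_file()': present.is_file(),
        'folder.is_dir()': folder.is_dir(),
        'missing.exists()': missing.exists(),
        'missing.is_file()': missing.is_file(),
    }

for label, result in checks.items():
    print(f'{label}: {result}')
```

The checks on the missing path return `False` instead of raising an exception, which makes them convenient in `if` statements.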

***
# Reading and Writing Files

Traditionally, the way to read or write a file in Python has been to use the built-in open() function. This still works, because open() accepts Path objects directly (since Python 3.6). The following example uses the equivalent Path.open() method to find all headers in a Markdown file and print them:

In [187]:
path = pathlib.Path.cwd() / 'README.md'
with path.open(mode='r') as f:
    headers = [line.strip() for line in f if line.startswith('#')]

print('\n'.join(headers))

# Hello DesertPy!


In fact, Path.open() is calling the built-in open() behind the scenes. Which option you use is mainly a matter of taste.

For simple reading and writing of files, Path objects offer a few convenience methods:
- .read_text(): open the path in text mode and return the contents as a string.
- .read_bytes(): open the path in binary/bytes mode and return the contents as a bytestring.
- .write_text(): open the path and write string data to it.
- .write_bytes(): open the path in binary/bytes mode and write data to it.

Each of these methods handles the opening and closing of the file, making them trivial to use, for instance:

In [145]:
path = pathlib.Path.cwd() / 'README.md'
path.read_text()

'# Hello DesertPy!'

In [147]:
# or...
pathlib.Path('README.md').read_text()

'# Hello DesertPy!'

In [148]:
# or slice it...
pathlib.Path('README.md').read_text()[:5]

'# Hel'
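
The write methods follow the same pattern. The following is a minimal round-trip sketch, assuming a throwaway file in a temporary directory (the file name `note.md` is made up for illustration):

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    note = pathlib.Path(tmp) / 'note.md'

    # write_text() creates or overwrites the file and returns the
    # number of characters written
    written = note.write_text('# Hello DesertPy!\n', encoding='utf-8')
    text = note.read_text(encoding='utf-8')  # contents as str
    raw = note.read_bytes()                  # contents as bytes

print(written, repr(text), raw)
```

Passing `encoding` explicitly avoids depending on the platform's default encoding.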

***
# Picking Out Components of a Path
The different parts of a path are conveniently available as properties. Basic examples include:

- .name: the file name without any directory
- .parent: the directory containing the file, or the parent directory if path is a directory
- .stem: the file name without the suffix
- .suffix: the file extension
- .anchor: the part of the path before the directories

Here are these properties in action:

In [103]:
path

PosixPath('/Users/ratchet/Projects/desertpy/README.md')

In [104]:
path.name

'README.md'

In [105]:
path.stem # os.path.splitext(os.path.basename('/Users/ratchet/Projects/desertpy/README.md'))[0]

'README'

In [106]:
path.suffix # filename, file_extension = os.path.splitext('/Users/ratchet/Projects/desertpy/README.md')

'.md'

In [107]:
path.parent # os.path.abspath(os.path.join('/Users/ratchet/Projects/desertpy/README.md', os.pardir))

PosixPath('/Users/ratchet/Projects/desertpy')

In [108]:
path.parent.parent

PosixPath('/Users/ratchet/Projects')

In [109]:
path.anchor

'/'
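
File names with more than one extension are a common source of surprises. The sketch below uses a hypothetical `archive.tar.gz` path, built with `pathlib.PurePosixPath` so the output does not depend on the operating system, and also shows the related `.suffixes` property:

```python
import pathlib

# Hypothetical path, used only to illustrate multi-part extensions
archive_path = pathlib.PurePosixPath('/Users/ratchet/Projects/desertpy/data/archive.tar.gz')

print(archive_path.name)      # archive.tar.gz
print(archive_path.stem)      # archive.tar
print(archive_path.suffix)    # .gz
print(archive_path.suffixes)  # ['.tar', '.gz']
print(archive_path.parent)    # /Users/ratchet/Projects/desertpy/data
print(archive_path.anchor)    # /
```

Only the last extension counts as the suffix, so `.stem` still contains `.tar`.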

***
# Moving and Deleting Files
Through pathlib, you also have access to basic file system level operations like moving, updating, and even deleting files. 

For the most part, these methods do not give a warning or wait for confirmation before information or files are lost. ***Be careful when using these methods.***

To move a file, use .replace(). Note that if the destination already exists, .replace() will overwrite it. Unfortunately, pathlib does not explicitly support safe moving of files. To avoid possibly overwriting the destination path, the simplest is to test whether the destination exists before replacing:

In [149]:
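# source must be a Path object (not the file's text) so that .replace() is the Path method
source = pathlib.Path('README.md')
destination = pathlib.Path.cwd() / 'README.md'

# Only move when nothing exists at the destination yet
if not destination.exists():
    source.replace(destination)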

However, this does leave the door open for a possible race condition. 

Another process may add a file at the destination path between the execution of the if statement and the ```.replace()``` method. If that is a concern, a safer way is to open the destination path for exclusive creation and explicitly copy the source data:

In [150]:
with destination.open(mode='xb') as fid:
    fid.write(source.read_bytes())

FileExistsError: [Errno 17] File exists: '/Users/ratchet/Projects/desertpy/README.md'

The code above will raise a ```FileExistsError``` if destination already exists. Technically, this copies a file. To perform a move, simply delete source after the copy is done (see below). Make sure no exception was raised though.
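
Putting the copy and the delete together, a move that refuses to overwrite could be sketched as follows. The function name `safe_move` is hypothetical, and the sketch reads the whole file into memory, so it is only suitable for small files.

```python
import pathlib


def safe_move(source, destination):
    """Move `source` to `destination`, refusing to overwrite an existing file."""
    source = pathlib.Path(source)
    destination = pathlib.Path(destination)
    # Read first, so a missing source fails before anything is created
    data = source.read_bytes()
    # Mode 'xb' raises FileExistsError if the destination already exists
    with destination.open(mode='xb') as fid:
        fid.write(data)
    # Only reached when the copy succeeded, so the source can be removed
    source.unlink()
    return destination
```

If `FileExistsError` is raised, the source file is left untouched.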

When you are renaming files, useful methods might be ```.with_name()``` and ```.with_suffix()```. They both return a new path with the name or the suffix replaced, respectively; the file on disk is not changed until you call something like ```.replace()```.

For instance:

In [156]:
path

PosixPath('/Users/ratchet/Projects/desertpy/README.md')

In [157]:
path.with_suffix('.md')

PosixPath('/Users/ratchet/Projects/desertpy/README.md')

In [158]:
path.replace(path.with_suffix('.md'))
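
Because the cells above swap `.md` for `.md`, the path comes back unchanged. A sketch with a different suffix and name makes the effect visible; it uses `pathlib.PurePosixPath` so nothing on disk is touched, and the new names are made up for illustration.

```python
import pathlib

readme = pathlib.PurePosixPath('/Users/ratchet/Projects/desertpy/README.md')

# Both methods return new path objects; `readme` itself is not modified
as_text = readme.with_suffix('.txt')
renamed = readme.with_name('NOTES.md')

print(as_text)   # /Users/ratchet/Projects/desertpy/README.txt
print(renamed)   # /Users/ratchet/Projects/desertpy/NOTES.md
```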

Directories and files can be deleted using ```.rmdir()``` and ```.unlink()``` respectively. Note that ```.rmdir()``` only removes empty directories. ***(Again, be careful!)***
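
The order of deletion matters, because a directory must be empty before it can be removed. The following sketch demonstrates this in a temporary directory; the names `scratch` and `old.log` are placeholders.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp) / 'scratch'
    folder.mkdir()
    leftover = folder / 'old.log'
    leftover.write_text('stale\n')

    try:
        folder.rmdir()       # fails: the directory still contains a file
    except OSError:
        rmdir_failed = True
    else:
        rmdir_failed = False

    leftover.unlink()        # delete the file first
    folder.rmdir()           # now the empty directory can be removed
    folder_gone = not folder.exists()

print(rmdir_failed, folder_gone)
```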

***
# Examples
In this section, you will see some examples of how to use pathlib to deal with simple challenges.

### Counting Files
There are a few different ways to list many files. The simplest is the .iterdir() method, which iterates over all entries (files and subdirectories) in the given directory. The following example combines .iterdir() with the collections.Counter class to count how many entries there are of each suffix in the current directory; directories and files without an extension are counted under the empty suffix '':

In [159]:
import collections
collections.Counter(p.suffix for p in pathlib.Path.cwd().iterdir())

Counter({'': 4, '.ipynb': 5, '.html': 1, '.png': 2, '.md': 1})
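
If directories should not be counted, or if subdirectories should be searched too, the same idea can be wrapped in a small helper. This is a sketch under those assumptions; the name `count_suffixes` and its `recursive` flag are hypothetical.

```python
import collections
import pathlib


def count_suffixes(directory, recursive=False):
    """Count regular files by suffix, optionally descending into subdirectories."""
    directory = pathlib.Path(directory)
    # rglob('*') walks the whole tree, iterdir() only the top level
    entries = directory.rglob('*') if recursive else directory.iterdir()
    return collections.Counter(entry.suffix for entry in entries if entry.is_file())
```

For example, `count_suffixes(pathlib.Path.cwd(), recursive=True)` would also include the files inside `data` and `Logging`.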

### Display a Directory Tree
The next example defines a function, tree(), that will print a visual tree representing the file hierarchy, rooted at a given directory. Here, we want to list the contents of subdirectories as well, so we use the .rglob() method:

In [160]:
def tree(directory):
    print(f'+ {directory}')
    for path in sorted(directory.rglob('*')):
        depth = len(path.relative_to(directory).parts)
        spacer = '    ' * depth
        print(f'{spacer}+ {path.name}')
        
tree(pathlib.Path.cwd())

+ /Users/ratchet/Projects/desertpy
    + .ipynb_checkpoints
        + README-checkpoint.md
        + dataclasses-checkpoint.ipynb
        + file_paths-checkpoint.ipynb
        + python_basics_stdlib-checkpoint.ipynb
        + string_formatting-checkpoint.ipynb
        + survey_results-checkpoint.ipynb
    + Logging
        + .ipynb_checkpoints
            + config-checkpoint.yaml
            + example-checkpoint.log
            + file-checkpoint.conf
            + logging-checkpoint.ipynb
        + config.yaml
        + example.log
        + file.conf
        + logging.ipynb
    + README.md
    + archive
        + file.txt
    + data
        + .ipynb_checkpoints
            + iris-checkpoint.csv
        + iris.csv
    + dataclasses.ipynb
    + file_paths.ipynb
    + haversine_formula_150.fb2b87d122a4.png
    + python-string-formatting-flowchart.4ecf0148fd87.png
    + python_basics_stdlib.ipynb
    + python_basics_stdlib.slides.html
    + string_formatting.ipynb
    + survey_results.ipy

### Find the Last Modified File
The .iterdir(), .glob(), and .rglob() methods are great fits for generator expressions and list comprehensions. To find the file in a directory that was last modified, you can use the .stat() method to get information about the underlying files. For instance, .stat().st_mtime gives the time of last modification of a file:

In [161]:
from datetime import datetime


directory = pathlib.Path.cwd()

time, file_path = max((f.stat().st_mtime, f) for f in directory.iterdir())

print(datetime.fromtimestamp(time), file_path)

2019-05-22 11:32:59.466734 /Users/ratchet/Projects/desertpy/file_paths.ipynb
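
A variation on the same idea uses the `key` argument of `max()` and skips subdirectories. This is a sketch, not the notebook's method; the helper name `latest_file` is hypothetical, and it returns `None` for a directory without files.

```python
import pathlib


def latest_file(directory):
    """Return the most recently modified regular file in `directory`, or None."""
    files = [f for f in pathlib.Path(directory).iterdir() if f.is_file()]
    # key= compares modification times only; default= handles an empty list
    return max(files, key=lambda f: f.stat().st_mtime, default=None)
```

Using `key` avoids building `(time, path)` tuples, at the cost of calling `.stat()` again if the timestamp itself is needed.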


### Create a Unique File Name
The last example will show how to construct a unique numbered file name based on a template. First, specify a pattern for the file name, with room for a counter. Then, check the existence of the file path created by joining a directory and the file name (with a value for the counter). If it already exists, increase the counter and try again:

In [164]:
def unique_path(directory, name_pattern):
    counter = 0
    while True:
        counter += 1
        path = directory / name_pattern.format(counter)
        if not path.exists():
            return path

path = unique_path(pathlib.Path.cwd(), 'test_{:03d}.txt')

print(path)

/Users/ratchet/Projects/desertpy/test_001.txt
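
Like the move example earlier, checking `.exists()` and creating the file later leaves room for a race condition. One possible way around it is to create the file while searching, using `Path.touch(exist_ok=False)`, which raises `FileExistsError` when the file is already there. The helper name `claim_unique_path` is hypothetical.

```python
import pathlib


def claim_unique_path(directory, name_pattern):
    """Create and return the first numbered file that does not exist yet."""
    counter = 0
    while True:
        counter += 1
        path = pathlib.Path(directory) / name_pattern.format(counter)
        try:
            # Creates an empty file, or raises if the name is already taken
            path.touch(exist_ok=False)
        except FileExistsError:
            continue
        return path
```

Unlike `unique_path()`, this version leaves an empty file behind, so each call reserves a new name.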


***
# Why Pathlib? 
(_IMHO_)

- _Easier_ and safer handling of pathnames
- Less ```os.path.*``` noise in your code
- Moves your path names (variables) towards the left and _in focus_
    ```python
    # Before
    if os.path.isdir(path):
        os.rmdir(path)

    # After
    if path.is_dir():
        path.rmdir()
    ```
- More powerful, with most necessary methods and properties available directly on the object
- More consistent across operating systems, as peculiarities of the different systems are hidden by the ```Path``` object