# The Pathlib module

## Windows or not Windows ...  

As you may know, there are different Operating Systems (Windows, macOS or Linux among others). They do not write file paths the same way, which matters as soon as you need to open a file.


In [1]:
# Opening a file with a Mac : 

with open(r'/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile.txt','r') as f:
    for line in f:
        print(line)
        
# <with> : context manager. Automatically closes the file once we exit the instruction block
# <r'...'> : raw string prefix; backslashes in the path are taken literally, not as escape characters
# <, 'r'> : specifies that we open the file for reading
# <as f:> : name given to the file object; ":" opens the instruction block

Hi there !



Blabla


In [2]:
# Same code with a Windows-style path : 

with open(r'\Users\Basile\Documents\Dauphine\M2_203\Python\Session_5\testfile.txt','r') as f:
    for line in f:
        print(line)
        

FileNotFoundError: [Errno 2] No such file or directory: '\\Users\\Basile\\Documents\\Dauphine\\M2_203\\Python\\Session_5\\testfile.txt'

The only difference is the separator: </> is used with Mac whereas <\> is used with Windows. 

To avoid such mismatches, which can be really troubling and hard to track down, Python provides a standard-library module, pathlib. It creates path objects with the appropriate semantics for the operating system in use. In other words, with such Paths, you can manipulate files whatever computer you are working on.
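To see the separator difference without switching machines, the pure path classes (presented in the next section) can render the same components in both styles. This is only a small sketch: the folder names are those used above, and the `C:` drive letter is an assumption added for the Windows case.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same components, joined according to each flavour's conventions.
parts = ('Users', 'Basile', 'Documents', 'testfile.txt')

posix_path = PurePosixPath('/', *parts)
windows_path = PureWindowsPath('C:/', *parts)  # 'C:' is an assumed drive letter

print(posix_path)    # /Users/Basile/Documents/testfile.txt
print(windows_path)  # C:\Users\Basile\Documents\testfile.txt
```

Both lines run on any operating system, because pure paths never touch the filesystem.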

## First operation

In [3]:
import pathlib

## Pure Paths and Concrete Paths

A Pure Path handles path operations that do not access the filesystem. In other words, you can navigate among the different paths but you cannot read a '.txt' file for instance.

A Concrete Path is a sub-class of a Pure Path. It provides the same methods as Pure Paths and adds input/output operations. In other words, with Concrete Paths, you can read or write a '.txt' file for instance. 


There are 3 "flavours" (= classes) of Pure Paths : 
    - PurePath : Generic class. When you don't specify the flavour; instantiating it gives a PurePosixPath or a PureWindowsPath depending on the system you run on
    - PurePosixPath : Non-Windows (POSIX) path syntax, as used on macOS and Linux
    - PureWindowsPath : Windows path syntax

There are 3 "flavours" (= classes) of Concrete Paths : 
    - Path : Generic class. When you don't specify the flavour; instantiating it gives a PosixPath or a WindowsPath depending on the system you run on
    - PosixPath : Non-Windows filesystem; can only be instantiated on a non-Windows system
    - WindowsPath : Windows filesystem; can only be instantiated on Windows

We will focus on Concrete Paths because they are the most commonly used and because ultimately we will need to access the data within the files. 
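The difference between the two families can be checked directly. In this short sketch, the path string is only illustrative and the file does not need to exist:

```python
from pathlib import Path, PurePath, PurePosixPath

pure = PurePosixPath('/Users/Basile/Documents/testfile.txt')
concrete = Path('/Users/Basile/Documents/testfile.txt')

# Path manipulation works on both kinds of object.
print(pure.parent)    # /Users/Basile/Documents
print(concrete.name)  # testfile.txt

# Only the concrete path offers filesystem operations such as exists().
print(hasattr(pure, 'exists'))      # False
print(hasattr(concrete, 'exists'))  # True

# A concrete path is also a pure path (sub-class relationship).
print(isinstance(concrete, PurePath))  # True
```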

## Creating a (Concrete) Path ...

There are several ways to create a Path (as all Paths lead to Rome) :

### ... With an explicit string 

In [4]:
Path_1 = pathlib.Path(r'/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile.txt')
Path_1

PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile.txt')

We see that we correctly obtain a path object, adapted to my Operating System !

### ... Building it

In [5]:
import pathlib

pathlib.Path.home()/'Documents'/'Dauphine'/'M2_203'/'Python'/'Session_5'/'testfile.txt'


# <pathlib.Path.home()> returns a new path object representing 
# the user's home directory
# </> joins the components with the separator of your Operating System

PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile.txt')

In [6]:
# Or, with the equivalent .joinpath() method (note: this one points at Session_1)
pathlib.Path.home().joinpath('Documents', 'Dauphine', 'M2_203','Python','Session_1','testfile.txt')

PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_1/testfile.txt')
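A quick way to convince yourself that the `/` operator and `.joinpath()` build the same object is to compare the two results. This sketch uses a pure path with a fixed, illustrative base so that it gives the same result on every machine:

```python
from pathlib import PurePosixPath

base = PurePosixPath('/Users/Basile')  # illustrative base folder

with_operator = base / 'Documents' / 'Dauphine' / 'testfile.txt'
with_joinpath = base.joinpath('Documents', 'Dauphine', 'testfile.txt')

print(with_operator)                   # /Users/Basile/Documents/Dauphine/testfile.txt
print(with_operator == with_joinpath)  # True
```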

## Some important Methods and Properties of Pure and Concrete Paths

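Property : holds a value, accessed without parentheses (e.g. Path_1.name)
Method : performs an action and returns a value, called with parentheses (e.g. Path_1.exists())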

### .parts

You can split a Path and access its different parts with the following methods and properties : 

In [7]:
Path_1.parts # Return a tuple with the path's components

# you can thus access each element of a path easily


('/',
 'Users',
 'Basile',
 'Documents',
 'Dauphine',
 'M2_203',
 'Python',
 'Session_5',
 'testfile.txt')

### .parents[x] 

You can also access the ancestors of the path : 

In [10]:
Path_1.parents[0]
# here, you want to access the first ancestor of the 'testfile.txt' file
# i.e. the folder 'Session_5'

Path_1.parents[1]
# here, you want to access the second ancestor
# i.e. the folder 'Python'

Path_1.parents[0].parents[0]
# equivalently, the first ancestor of the first ancestor
# i.e. again the folder 'Python'

PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python')
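As a complement, the sketch below checks the relationship between the indices. It relies on the `.parent` property (not used above), which is a shortcut for `.parents[0]`, and on a pure path so that it runs identically on any system:

```python
from pathlib import PurePosixPath

p = PurePosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile.txt')

print(p.parent)      # /Users/Basile/Documents/Dauphine/M2_203/Python/Session_5
print(p.parents[1])  # /Users/Basile/Documents/Dauphine/M2_203/Python
print(p.parents[1] == p.parent.parent)  # True

# parents is a sequence: it can be looped over, from the closest
# ancestor up to the root '/'.
ancestors = list(p.parents)
print(len(ancestors))  # 8
print(ancestors[-1])   # /
```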

### .name / .suffix / .stem

You can also decompose the filename into its sub-components : 

filename = stem + suffix

It can be useful when you want to select every file which has a Python
extension (.py) for instance !

In [12]:
Path_1.name

'testfile.txt'

In [11]:
Path_1.suffix

'.txt'

In [13]:
Path_1.stem

'testfile'
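The filtering idea mentioned above can be sketched as follows. The file names in the list are hypothetical and only serve the illustration:

```python
from pathlib import PurePosixPath

# Hypothetical file names, for illustration only.
files = [
    PurePosixPath('Session_5/testfile.txt'),
    PurePosixPath('Session_5/exercise.py'),
    PurePosixPath('Session_5/solution.py'),
]

# Keep only the files whose extension is '.py'.
python_files = [f for f in files if f.suffix == '.py']

print([f.name for f in python_files])  # ['exercise.py', 'solution.py']
print([f.stem for f in python_files])  # ['exercise', 'solution']

# For each of these files, name is stem + suffix.
print(all(f.name == f.stem + f.suffix for f in files))  # True
```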

## Concrete Paths Methods

### .cwd()

In [14]:
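# Return a Path representing the current working directory 
# (i.e. the folder Session_5 here). cwd() is a class method: the result
# does not depend on Path_1, and pathlib.Path.cwd() gives the same answer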

Path_1.cwd()


PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5')

### .home()

In [15]:
# Return a Path representing the user's home directory
# (also a class method: same result as pathlib.Path.home())

Path_1.home()

PosixPath('/Users/Basile')

### .exists()

In [16]:
# Return a boolean telling whether the path points to an existing file or directory

Path_1.exists()


True

In [17]:
pathlib.Path(r'/Users/Basile/Documents/Dauphine').exists()

True

In [18]:
pathlib.Path(r'/Users/Basile/Documents/Dauphine/MoreThan14InSto1').exists()

False

### .glob(x) 

In [19]:
sorted(Path_1.cwd().glob('*.txt'))

# here, using the method .cwd() which points at the current directory :
# '*.txt' matches every '.txt' file located directly in that directory



[PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile copy.txt'),
 PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile.txt')]

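.glob(x) can also look inside sub-folders. A pattern starting with '*/' searches one level down, in every folder directly under the starting folder (a pattern starting with '**/' would search all levels recursively). Here, we search the folders that share the same ancestor :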

In [20]:
sorted(Path_1.parents[1].glob('*/*.txt'))
# here : parents[1] points at the 'Python' folder, the ancestor of 
# the ancestor of 'testfile.txt'
# <*/*.txt> : you ask Python to search in all the folders directly inside 'Python'
# to find all the '.txt' files



[PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_1/Another_Text_File.txt'),
 PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_1/testfile copy.txt'),
 PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile copy.txt'),
 PosixPath('/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/testfile.txt')]
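The three pattern styles can be compared on a throwaway directory tree. This is a minimal sketch: the temporary folder and the file names `a.txt`, `b.txt` and `c.txt` are invented for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Throwaway tree: root/a.txt, root/Session_1/b.txt, root/Session_1/deep/c.txt
    (root / 'Session_1' / 'deep').mkdir(parents=True)
    for rel in ('a.txt', 'Session_1/b.txt', 'Session_1/deep/c.txt'):
        (root / rel).write_text('x')

    same_folder = sorted(p.name for p in root.glob('*.txt'))
    one_level_down = sorted(p.name for p in root.glob('*/*.txt'))
    every_level = sorted(p.name for p in root.glob('**/*.txt'))

print(same_folder)     # ['a.txt']
print(one_level_down)  # ['b.txt']
print(every_level)     # ['a.txt', 'b.txt', 'c.txt']
```

Only the `**/` pattern reaches `c.txt`, which sits two levels below the starting folder.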

### .mkdir()

In [21]:
# Create a folder at the given path (here Sub_Folder_3 does not exist yet !)
pathlib.Path(r'/Users/Basile/Documents/Dauphine/M2_203/Python/Session_5/Sub_Folder_3').mkdir()
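The remark "does not exist yet" matters: a plain `.mkdir()` raises `FileExistsError` if the folder is already there, and `FileNotFoundError` if a parent folder is missing. The sketch below shows the two optional arguments that relax this, `parents` and `exist_ok`, inside a temporary folder so that nothing is left behind. The name `nested` is invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'Sub_Folder_3' / 'nested'

    # parents=True also creates the missing 'Sub_Folder_3' on the way.
    target.mkdir(parents=True)
    created = target.is_dir()

    # A second plain mkdir() on the same path fails...
    try:
        target.mkdir()
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    # ...unless exist_ok=True is passed.
    target.mkdir(exist_ok=True)

print(created, second_call_failed)  # True True
```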

### .open()

In [22]:
with Path_1.open() as f:
    print(f.readlines())
    
# Leaving the indentation block will automatically close the file

['Hi there !\n', '\n', 'Blabla']


### .read_text()

In [23]:
Path_1.read_text()

'Hi there !\n\nBlabla'

### .write_text(x) 

In [24]:
temp = Path_1.read_text()
Path_1.write_text(temp + '\nNew text')

# be careful with write_text : it will erase all the previous content in 
# the file ! That's why I use a temporary variable
# the returned value (27 here) is the number of characters written

27
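The overwriting behaviour can be observed safely on a temporary file. The sketch also shows one possible alternative to the temporary variable, namely opening the file in append mode (`'a'`); this is offered as an option, not as the approach used above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'testfile.txt'

    demo.write_text('Hi there !')
    demo.write_text('Blabla')  # overwrites: 'Hi there !' is gone
    after_overwrite = demo.read_text()

    # Opening in append mode ('a') adds to the end of the file instead.
    with demo.open('a') as f:
        f.write('\nNew text')
    after_append = demo.read_text()

print(repr(after_overwrite))  # 'Blabla'
print(repr(after_append))     # 'Blabla\nNew text'
```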

### .replace(x)

In [25]:
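# Move (rename) 'testfile copy.txt' to the given target, overwriting the target if it exists
# A relative target is interpreted relative to the current working directory
Path_1.parents[0].joinpath('testfile copy.txt').replace('Sub_Folder_1/testfile_v2.txt')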

PosixPath('Sub_Folder_1/testfile_v2.txt')
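To make the effect of `.replace()` visible, here is a small sketch run in a temporary folder. It reuses the file and folder names of the cell above; everything else is created just for the demonstration. It assumes Python 3.8 or later, where `.replace()` returns the new path.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / 'testfile copy.txt'
    source.write_text('Blabla')

    # The destination folder must already exist.
    (root / 'Sub_Folder_1').mkdir()
    target = source.replace(root / 'Sub_Folder_1' / 'testfile_v2.txt')

    source_still_there = source.exists()
    moved_content = target.read_text()

print(source_still_there)  # False
print(moved_content)       # Blabla
```

The file is moved, not copied: the original location no longer exists afterwards.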

# Conclusion

Pathlib is a useful library for easily manipulating files and folders. It helps you avoid mistakes caused by differences between operating systems, and will make your life easier and brighter ! 