# Understanding Pathlib for path handling

## Description 

This is a note on path handling using the `pathlib` module. It covers these topics:
- accessing a specific path
- accessing a directory and listing its contents
- checking whether a path is a file or a directory
- file management such as renaming, removing, and creating files

### pathlib module

To use `pathlib`, import it in one of two ways:
- `import pathlib`, then create paths with `pathlib.Path()`
- `from pathlib import Path`, then create paths with `Path()`
- `pathlib.__all__` lists the public names the module exports, such as `Path` and `PurePath`

This note covers `pathlib`, which is used for file path handling. The alternative is the `os.path` module. `pathlib` was designed as an object-oriented alternative to `os.path`, so some operations look similar and some don't. I mention the `os.path` equivalents along the way. If you want to know more about `os.path`, refer to my other note on it. The main difference between the two:
- `pathlib` represents paths as objects
- `os.path` works on strings
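
A minimal sketch of that difference (the folder and file names are hypothetical). `PurePosixPath` is used so the printed result looks the same on every OS; on Windows, `os.path.join()` would produce backslashes instead.

```python
import os.path
from pathlib import PurePosixPath

# os.path: a path is a plain string, and the functions return strings
s = os.path.join("demo_testfile", "notes", "a.txt")
print(type(s).__name__, s)

# pathlib: a path is an object; its pieces are available as attributes
p = PurePosixPath("demo_testfile") / "notes" / "a.txt"
print(type(p).__name__, p)
print(p.name, p.suffix, p.parent)
```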

### Structure of Path

This is the example directory used in this note; it shows the hierarchy of the directories and files.

<a id='toc'></a>
## TOC (Table of Contents):
* [import module](#library)
* [Part 1 Using Pathlib](#Part1)
    - [1. Show Current Directory](#1.cwp)
    - [2. Path Resolution](#2.PathResolution)
    - [3. Access Specific Directory](#3.AccessSpecificDirectory)
    - [4. List entries in a directory](#4.listcontent)
    - [5. Checking file/directory](#5.checkingfiledir)
    - [6. File extension](#6.fileExt)
    - [7. Matching File Patterns (Non-Recursive and Recursive)](#7.matchingFileGlob)
* [Part 2 File and Directory Operations](#Part2)
    - [1. Create file/directory](#Create_fileDirectory)
    - [2. Remove Directory and File](#RemoveDir_File)
    - [3. Copying](#copying)
    - [4. Rename/moving](#RenameCopying)
    - [5. Read and write file](#ReadwriteFile)
* [Summary example](#Summary)

<a id='library'></a>
## import module [🔝](#toc)

In [243]:
# import libraries
import os
from pathlib import Path

# check Python version
from platform import python_version
print(python_version())

3.10.9


<a id='Part1'></a>
## Part 1 Using Pathlib [🔝](#toc)

- [1. Show Current Directory](#1.cwp)
    - [1.1 get current directory path](#1.1-get-current-directory-path)
        - `Path.cwd()`
    - [1.2 get default path](#1.2-get-default-path)
        - `Path.home()`: user's home directory
        - `Path("~").expanduser()`: Expands `~` to user's home directory
        - `Path("~/Pictures").expanduser()`: using specific directory of user's home directory
- [2. Path Resolution](#2.PathResolution)
	- [2.1 Path()](#2.1-Path())
		- `Path()`: same as `.`, which is current working directory
		- `Path('filename').absolute()`: convert a relative path to absolute path
	- [2.2 Resolving Relative Paths with `.` or `..`](#2.2Resolving)
		- `Path('filename').resolve()`: converts the path to an absolute path, collapsing `..` and resolving symlinks
- [3. Access Specific Directory](#3.AccessSpecificDirectory)
	- [3.1 Access specific path](#3.1-Access-specific-path)
		- `Path(path)`: create a Path object for a specific directory or file
	- [3.2 Parent Directory Access](#3.2-Parent-Directory-Access)
		- `Path.parent`: parent of the path (file or directory)
		- `Path.parents`: sequence of all ancestors, indexed from the immediate parent upward
	- [3.3 Join Path](#3.3-Join-Path)
		- `Path.joinpath()`: combines multiple path segments into one path
		- `/` operator: joins path segments, e.g. `Path.cwd() / "img"`
- [4. List entries in a directory](#4.listcontent)
	- `os.listdir()`: list the contents of a directory (default: current working directory) as strings
	- `Path().iterdir()`: list the contents of a directory as Path objects
- [5. Checking file/directory](#5.checkingfiledir) 
	- [5.1 checking file type or exist](#5.1-checking-file-type-or-exist)
		- `Path(file/dir).exists()`
		- `Path(file).is_file()`
		- `Path(dir).is_dir()`
		- `Path(path).is_absolute()`
- [6. File extension](#6.fileExt)
	- [6.1 Path Attribute](#6.1-Path-Attribute)
		- `Path().name`: gets the filename
		- `Path().suffix`: gets the file extension
		- `Path().stem`: gets the filename without the extension
		- `Path().parts`: splits the path into its components
		- `Path().anchor`: the drive and root part of the path, before the directories
	- [6.2 Parent Directory Access](#6.2-Parent-Directory-Access)
		- `Path.parent`: returns the parent directory of the path
		- `Path.parents`: sequence of all ancestors indexed from the immediate parent upward
	- [6.3 startswith() and endswith()](#6.3startswith_endswith)
- [7. Matching File Patterns (Non-Recursive and Recursive)](#7.matchingFileGlob)
	- [7.1 Non-Recursive search](#7.1-Non-Recursive-search)
		- `Path.glob(pattern)`: match a pattern against the entries of one directory
	- [7.2 Recursive search](#7.2-Recursive-search)
		- `Path.rglob(pattern)`: match a pattern in the directory and all its subdirectories
		- `**`: recursive directory searching, e.g. `Path.glob("**/*.csv")`, which works like `Path.rglob()`

<a id='1.cwp'></a>
### 1. Show Current Directory [🔙](#Part1)
You can use `Path.cwd()` and `Path.home()`; the `os` alternative for the current directory is `os.getcwd()`.

#### 1.1 get current directory path
Shows the full path of your current working directory, like the Linux `pwd` command.

##### - `Path.cwd()`:
Returns the current working directory as a `Path` object. The alternative, `os.getcwd()`, returns it as a string.

In [19]:
#get your current working directory full path 
print(Path.cwd()) 

C:\Users\test\python_test


#### 1.2 get default path　

##### - `Path.home()`: Returns the user's home directory.

In [8]:
print(Path.home()) # e.g. C:\Users\test

C:\Users\test


##### - `Path("~").expanduser()`: Expands `~` to the user's home directory 
> - Linux (`~`): /home/username
> - Windows (`~`): C:\Users\username


In [21]:
print(Path("~").expanduser()) # e.g. C:\Users\test

C:\Users\test


##### - `Path("~/Pictures").expanduser()`
Returns a specific folder (e.g., "Pictures") in the user's home directory.

In [20]:
print(Path("~/Pictures").expanduser()) #C:\Users\test\Pictures

C:\Users\test\Pictures
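
A quick sketch checking that `expanduser()` and `Path.home()` agree. The `Pictures` folder does not need to exist: only the leading `~` is replaced, and a path without `~` is returned unchanged (`data/report.txt` is a hypothetical name).

```python
from pathlib import Path

home = Path.home()
pictures = Path("~/Pictures").expanduser()

# expanduser() replaces the leading "~" with the home directory,
# so both spellings point to the same place
print(pictures == home / "Pictures")

# a path without a leading "~" is returned unchanged
print(Path("data/report.txt").expanduser())
```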


<a id='2.PathResolution'></a>
### 2. Path Resolution  [🔙](#Part1)

If you want the absolute path of a specific file or directory, you can use `Path()`, `Path().absolute()`, and `Path().resolve()`. The `os.path` alternative is `os.path.abspath()`, which, like `resolve()`, collapses `..` but does not follow symbolic links.

- `Path(relativepath)`: creates a Path object from a relative or absolute path. Without an argument it refers to the current directory (`"."`).
- `Path().absolute()`: converts a relative path to an absolute path, without normalizing `..` or resolving symlinks.
- `Path().resolve()`: converts a relative path to an absolute path, resolving symlinks and eliminating `..` components.

#### 2.1 Path()
Creates a `Path` object; on its own it does not convert the path to an absolute path.

##### - `Path()`: 
Represents a path object without resolution. Without an argument it refers to the current directory (`"."`). You can call `absolute()` or `resolve()` on it:
> - `Path().absolute()`: returns the absolute path as a Path object (does not resolve symbolic links).
> - `Path().resolve()`: returns the absolute path as a Path object, with symbolic links resolved and `..` removed.

In [36]:
print(Path()) #. 
print(Path().absolute())# get absolute path  
print(Path().cwd()) # returns the current working directory
print(Path().resolve())# use resolve to show full path

.
C:\Users\test\python_test
C:\Users\test\python_test
C:\Users\test\python_test


##### - `Path('filename').absolute()` : 
Creates a path to a specific file and converts the relative path to an absolute (full) path.

In [37]:
print(Path('plotpy.py'))  # relative path, printed as-is
print(Path('plotpy.py').absolute())  # full path
print(Path('plotpy.py').resolve())  # full path

plotpy.py
C:\Users\test\python_test\plotpy.py
C:\Users\test\python_test\plotpy.py


<a id='2.2Resolving'></a>
#### 2.2 Resolving Relative Paths with `.` or `..`

##### - `Path('filename').resolve()`
> - `..`: the parent directory (one level up)
> - `.`: the current directory
> - `resolve()`: converts the path to an absolute path, collapsing `..` and resolving symbolic links.

In [30]:
print(Path(".."))# Outputs: ..
print(Path("..").absolute())  # absolute path, but `..` is kept
print(Path("..").resolve())  # `..` is resolved to the parent directory

..
C:\Users\test\python_test\..
C:\Users\test


In [38]:

print(Path('../plotpy.py')) 
print(Path('../plotpy.py').absolute()) #absolute path remain .. 
print(Path('../plotpy.py').resolve()) #Resolves the full canonical path of the file

..\plotpy.py
C:\Users\test\python_test\..\plotpy.py
C:\Users\test\plotpy.py
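
The sketch below reproduces the same `absolute()` vs `resolve()` difference in a throwaway temporary folder, so it runs on any machine. The file names mirror the example above but are otherwise arbitrary.

```python
import tempfile
from pathlib import Path

# throwaway folder; resolve() first because the temp folder itself may sit
# behind a symlink (e.g. /tmp on macOS)
base = Path(tempfile.mkdtemp()).resolve()
(base / "python_test").mkdir()
(base / "plotpy.py").touch()

# an absolute path that still contains a ".." segment
messy = base / "python_test" / ".." / "plotpy.py"

print(messy.absolute())  # ".." is kept as-is
print(messy.resolve())   # ".." is collapsed: <base>/plotpy.py
```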


<a id='3.AccessSpecificDirectory'></a>
### 3. Access Specific Directory [🔙](#Part1)

Understanding the different slash styles:
- Backslash (`\`): the Windows separator. Use a raw string prefix (e.g. `r"C:\Users"`) so the backslashes are not treated as escape sequences.
- Double backslash (`\\`): in a normal string, escape each backslash.
- Forward slash (`/`): the Linux/macOS separator; `pathlib` also accepts it on Windows.


#### 3.1 Access specific path

In [40]:
print(Path(r"C:\Users\User")) # Raw string with a single backslash
print(Path("C:\\Users\\User")) # Double backslash
print(Path("C:/Users/User")) # Forward slash

C:\Users\User
C:\Users\User
C:\Users\User
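
`PureWindowsPath` can be used to check, on any OS, that the three spellings produce the same path. This is a sketch; `PureWindowsPath` only manipulates the path text and never touches the disk.

```python
from pathlib import PureWindowsPath

raw = PureWindowsPath(r"C:\Users\User")       # raw string, single backslashes
escaped = PureWindowsPath("C:\\Users\\User")  # escaped backslashes
forward = PureWindowsPath("C:/Users/User")    # forward slashes

# all three spellings produce the same path object
print(raw == escaped == forward)
# the Windows flavour always displays backslashes
print(str(forward))
print(forward.parts)
```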


#### - `Path(path)`

In [42]:
my_dir=Path(r"C:\demo_testfile\test")
print(my_dir)
print(my_dir.absolute())  # already absolute, so absolute() returns it unchanged

C:\demo_testfile\test
C:\demo_testfile\test


#### 3.2 Parent Directory Access

You can get a path's parent with the `Path.parent` attribute, or all of its ancestors with `Path.parents`. The `os.path` alternative is `os.path.dirname(path)`.

#### - `Path.parent`
Accesses the parent of the path (file or directory).

In [46]:
my_dir=Path(r"C:\demo_testfile\test")
my_file=Path(r"C:\demo_testfile\AAA.txt")
# parent directory
print(my_dir)
print(f"parent directory: {my_dir.parent}") #Returns the parent directory.
print(f"parent directory of file: {my_file.parent}") #Returns the parent directory.


C:\demo_testfile\test
parent directory: C:\demo_testfile
parent directory of file: C:\demo_testfile


#### - `Path.parents`
> - `Path.parents`: is a sequence of all ancestors, indexed from the immediate parent upward
> - `Path.parent`: is a direct property to access only the immediate parent of the path

In [48]:
# get the Nth parent folder
print(my_dir.parents[0]) #immediate parent
print(my_dir.parents[1]) #the root
print(my_dir.parent.absolute()) #show the parent path
print(my_dir.absolute().parent) #show the parent path

C:\demo_testfile
C:\
C:\demo_testfile
C:\demo_testfile


#### 3.3 Join Path

Combines multiple path segments into one path. You can use `joinpath()` or the `/` operator. The `os.path` alternative is `os.path.join()`.

##### - `Path.joinpath()`: joins multiple path segments

In [53]:
#pathlib joinpath
newpath=Path("c:\\").joinpath("test") #c:/test
print(newpath)

c:\test


##### - `/` to join paths (e.g. `Path.cwd()/newpath`):
Uses the `/` operator for path joining.

In [52]:
#pathlib join path with slash 
print(Path(r'C:\demo_testfile')/'file.txt')  # raw string avoids the invalid '\d' escape
print(Path('c:\\')/'file.txt')
print(Path.cwd()/"img")

C:\demo_testfile\file.txt
c:\file.txt
C:\Users\test\python_test\img
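
A short sketch comparing `joinpath()` with `/`, using `PurePosixPath` so the output does not depend on the OS. It also shows a common gotcha: joining a segment that is itself absolute discards everything before it (`os.path.join()` behaves the same way).

```python
from pathlib import PurePosixPath

base = PurePosixPath("/demo_testfile")

# joinpath() and the / operator build the same path
a = base.joinpath("notes", "file.txt")
b = base / "notes" / "file.txt"
print(a == b, a)

# caution: an absolute segment restarts the path from that segment
c = base / "/etc" / "hosts"
print(c)
```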


<a id='4.listcontent'></a>
### 4. List entries in a directory [🔙](#Part1)

This is like the command-line `ls -al` on Linux or `dir` on Windows, which lists all files and directories in the current working directory. You can use:
> - `Path.iterdir()`, which yields **Path objects**
> - `os.listdir()`, which returns **strings**

Both list the contents of a directory and are used often.

#### - `os.listdir()`
Lists the names of all files and directories in the specified directory (the current working directory by default).

In [2]:
import os 
print(os.listdir()) #default current directory
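# specific directory; note that 'c:' without a backslash means the current
# directory on drive C (not the drive root), which is why the output matches
print(os.listdir(r'c:'))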

['.ipynb_checkpoints', 'FileMatching_glob_fnmatch.ipynb', 'iterate_filter_file.ipynb', 'os.path.ipynb', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib.ipynb']
['.ipynb_checkpoints', 'FileMatching_glob_fnmatch.ipynb', 'iterate_filter_file.ipynb', 'os.path.ipynb', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib.ipynb']


#### - `Path().iterdir()`
Returns a generator of Path objects for all files and directories in the specified directory.

In [62]:
#print(list(Path().iterdir()))  # list the working directory
print(list(Path('C:\\tmp').iterdir()))  # list a specific directory

[WindowsPath('C:/tmp/git'), WindowsPath('C:/tmp/powerlevel10k'), WindowsPath('C:/tmp/powerlevel10k_full'), WindowsPath('C:/tmp/takuya.omp.json')]


In [60]:
# iterate over the current directory; uncomment the if-check
# (and indent print) to list only directories
for p in Path().iterdir():
    #filter directory under current directory 
    #if p.is_dir():
    print(p) 

.ipynb_checkpoints
Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
iterate_filter_file.ipynb
jupterNote.ipynb
numpy.ipynb
os.path_reorganize.ipynb
Pandas
Pathlib Basics(orgainize_final).ipynb
Pathlib_reorganize.ipynb
tmp.ipynb
Understanding File Pattern Matching with Glob and Fnmatch.ipynb


In [78]:
# list comprehension: keep only the directories
print([p for p in Path().iterdir() if p.is_dir() ])

[WindowsPath('.ipynb_checkpoints'), WindowsPath('DataType'), WindowsPath('done_toremove'), WindowsPath('elog_example'), WindowsPath('hahow_scrap_pandas_mathplot'), WindowsPath('Pandas')]


In [75]:
# keep only .ipynb files
for file in Path().iterdir():
    if str(file).endswith('.ipynb'):
        print(file)

Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb
iterate_filter_file.ipynb
jupterNote.ipynb
numpy.ipynb
os.path_reorganize.ipynb
Pathlib Basics(orgainize_final).ipynb
Pathlib_reorganize.ipynb
tmp.ipynb
Understanding File Pattern Matching with Glob and Fnmatch.ipynb


In [73]:
# skip .ipynb files and print everything else
for file in Path().iterdir():
    if str(file).endswith('.ipynb'):
        continue 
    print(file)

.ipynb_checkpoints
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
Pandas
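
The filters above can also be written with `Path` attributes instead of string checks. A minimal sketch in a temporary folder with hypothetical file names; the entries are sorted because `iterdir()` yields them in arbitrary order.

```python
import tempfile
from pathlib import Path

# throwaway folder with hypothetical entries
root = Path(tempfile.mkdtemp())
(root / "notes").mkdir()
for name in ["numpy.ipynb", "tmp.ipynb", "readme.txt"]:
    (root / name).touch()

entries = sorted(root.iterdir())  # iterdir() order is arbitrary

dirs = [p.name for p in entries if p.is_dir()]
notebooks = [p.name for p in entries if p.suffix == ".ipynb"]
others = [p.name for p in entries if p.suffix != ".ipynb"]

print(dirs)
print(notebooks)
print(others)
```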


<a id='5.checkingfiledir'></a>
### 5. Checking file/directory [🔙](#Part1)

This is useful for checking whether a path exists and whether it is a file or a directory, for example when filtering the contents of a working directory. Use `exists()` to check whether a path exists, and `is_file()` or `is_dir()` to check its type. The `os.path` alternatives are `os.path.exists()`, `os.path.isfile()`, and `os.path.isdir()`.


#### 5.1 checking file type or exist 
Use these methods to check whether a path exists and whether it is a file or a directory.

In [82]:
filename='index.html'
f=Path(filename).absolute()
print(f'path: {f}')
print(f"check file or directory exist: {f.exists()}") 
print(f"checking file exist: {f.is_file()}") 
print(f"checking not exist directory: {f.is_dir()}") 
print(f"checking exist directory: {(Path.cwd()/ 'notes').is_dir()}")
print(f"checking absolute path: {f.is_absolute()}") 

path: C:\Users\test\python_test\index.html
check file or directory exist: False
checking file exist: False
checking not exist directory: False
checking exist directory: False
checking absolute path: True


##### - `Path(file/dir).exists()`:
Checks if the path exists.

In [107]:
path2 = Path("C:\\demo_testfile")
if path2.exists():
    print("Path exists")
else:
    print("Path does not exist")

Path exists


In [115]:
for item in path2.iterdir():  # Iterate through items in the directory
    # exists() should be True for every entry iterdir() just returned
    print(item, "result: ", item.exists())  # full path and exists() result

C:\demo_testfile\AAA.txt result:  True
C:\demo_testfile\data result:  True
C:\demo_testfile\datanew result:  True
C:\demo_testfile\ex1.py result:  True
C:\demo_testfile\ex2.py result:  True
C:\demo_testfile\hell[o].txt result:  True
C:\demo_testfile\index.html result:  True
C:\demo_testfile\notes result:  True
C:\demo_testfile\page.html result:  True
C:\demo_testfile\read.txt result:  True
C:\demo_testfile\renamefile result:  True
C:\demo_testfile\test result:  True
C:\demo_testfile\test -copy result:  True
C:\demo_testfile\test.txt result:  True
C:\demo_testfile\testFile1.txt result:  True


##### - `Path(file).is_file()`
Checks if the path is a file.

In [116]:
for item in path2.iterdir():  # Iterate through items in the directory
    if item.is_file():  # Check if the item is a file
        print(item)  # Print the full path

C:\demo_testfile\AAA.txt
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py
C:\demo_testfile\hell[o].txt
C:\demo_testfile\index.html
C:\demo_testfile\page.html
C:\demo_testfile\read.txt
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt


##### - `Path(dir).is_dir()`
Checks if the path is a directory

In [119]:
for item in path2.iterdir():  # Iterate through items in the directory
    if item.is_dir():  # Check if the item is a directory
        print(item)  # Print the full path

C:\demo_testfile\data
C:\demo_testfile\datanew
C:\demo_testfile\notes
C:\demo_testfile\renamefile
C:\demo_testfile\test
C:\demo_testfile\test -copy


##### - `Path(path).is_absolute()`
Checks if the path is an absolute path.

In [146]:
filename='index.html'
print(f"path:{filename}, checking path is absolute path => {Path(filename).is_absolute()}")
abspath=Path(filename).absolute()
print(f"path:{abspath}, checking path is absolute path => {abspath.is_absolute()}")

path:index.html, checking path is absolute path => False
path:C:\Users\test\python_test\index.html, checking path is absolute path => True
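
A self-contained sketch of the checks above in a temporary folder (`index.html` and `missing.txt` are hypothetical names). A missing path simply returns `False` from all three methods rather than raising an error.

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())       # an existing directory
page = root / "index.html"            # hypothetical file, created below
page.write_text("<h1>hello</h1>")
missing = root / "missing.txt"        # never created

for p in (root, page, missing):
    # name, exists(), is_file(), is_dir()
    print(p.name, p.exists(), p.is_file(), p.is_dir())

# a bare file name is relative; a path built from the temp folder is absolute
print(Path("index.html").is_absolute(), page.is_absolute())
```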


<a id='6.fileExt'></a>
### 6. File extension [🔙](#Part1)

#### 6.1 Path Attribute:
- `.name`: gets the filename (like `os.path.basename`)
- `.suffix`: gets the file extension (like `os.path.splitext`)
- `.stem`: gets the filename without the extension (like `os.path.splitext`)
- `.parts`: splits the path into its components
- `.anchor`: the drive and root part of the path, before the directories (similar to `os.path.splitdrive`, which returns only the drive)

##### - `Path().name`
Gets the filename (like `os.path.basename`)

In [149]:
my_file2=Path(r"C:\demo_testfile\file_1.txt")
my_dir=Path(r"C:\demo_testfile")
print(f"Path().name for file: {my_file2.name}")
print(f"Path().name for dir: {my_dir.name}")

Path().name for file: file_1.txt
Path().name for dir: demo_testfile


In [285]:
# list the names of all entries
directory = Path()  # current directory as a Path object
names = [entry.name for entry in directory.iterdir()]
print(names)
print('='*80)
# list only the directories
directories = [entry.name for entry in directory.iterdir() if entry.is_dir()]
print(directories)

['.ipynb_checkpoints', 'Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb', 'DataType', 'done_toremove', 'elog_example', 'hahow_scrap_pandas_mathplot', 'iterate_filter_file.ipynb', 'jupterNote.ipynb', 'numpy.ipynb', 'os.path_reorganize.ipynb', 'Pandas', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib_reorganize.ipynb', 'test123', 'tmp.ipynb', 'Understanding File Pattern Matching with Glob and Fnmatch.ipynb']
['.ipynb_checkpoints', 'DataType', 'done_toremove', 'elog_example', 'hahow_scrap_pandas_mathplot', 'Pandas', 'test123']


##### - `Path().suffix`
Gets the file extension (like `os.path.splitext`)

In [160]:
# Path.suffix: get the file extension
print(f"Path for file: {my_file2}")
print(f"suffix for file => {my_file2.suffix}")
print(f"Path for dir: {my_dir}")
print(f"suffix for dir => {my_dir.suffix}") #empty because it's directory no ext

Path for file: C:\demo_testfile\file_1.txt
suffix for file => .txt
Path for dir: C:\demo_testfile
suffix for dir => 


##### - `Path().stem`
Gets the filename without the extension (like os.path.splitext)

In [161]:
# Path.stem: get the file name without the extension
print(f"Path for file: {my_file2}")
print(f"stem for file => {my_file2.stem}")
print(f"Path for dir: {my_dir}")
print(f"stem for dir => {my_dir.stem}")

Path for file: C:\demo_testfile\file_1.txt
stem for file => file_1
Path for dir: C:\demo_testfile
stem for dir => demo_testfile


##### -  `Path().parts`
Splits the path into its components, which is cleaner and more structured than splitting the string manually. There is no single `os.path` equivalent; with strings you would split on the separator yourself.

In [169]:
my_file2=Path(r"C:\demo_testfile\file_1.txt")
my_dir=Path(r"C:\demo_testfile")

print(f"File path: {my_file2} => {my_file2.parts}")
print(f"Dir path: {my_dir} => {my_dir.parts}")

File path: C:\demo_testfile\file_1.txt => ('C:\\', 'demo_testfile', 'file_1.txt')
Dir path: C:\demo_testfile => ('C:\\', 'demo_testfile')


##### - `Path().anchor`
The drive and root part of the path: for `C:\test` the anchor is `C:\`. The closest `os.path` alternative is `os.path.splitdrive(path)[0]`, which returns only the drive (`C:`) without the trailing backslash.

In [171]:
print(f"File path: {my_file2} => {my_file2.anchor}")
print(f"Dir path: {my_dir} => {my_dir.anchor}")

File path: C:\demo_testfile\file_1.txt => C:\
Dir path: C:\demo_testfile => C:\
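
These attributes only look at the path text, so `PureWindowsPath` can demonstrate them on any OS. A sketch; the `archive.tar.gz` name is hypothetical and shows that only the last dot starts the suffix, while `.drive` shows the part that `os.path.splitdrive()` returns.

```python
from pathlib import PureWindowsPath

f = PureWindowsPath(r"C:\demo_testfile\file_1.txt")
print(f.name)    # file_1.txt
print(f.suffix)  # .txt
print(f.stem)    # file_1
print(f.parts)   # ('C:\\', 'demo_testfile', 'file_1.txt')
print(f.anchor)  # drive + root
print(f.drive)   # drive only, like os.path.splitdrive(...)[0]

# hypothetical name with two dots: only the last dot starts the suffix
g = PureWindowsPath(r"C:\demo_testfile\archive.tar.gz")
print(g.suffix, g.stem, g.suffixes)
```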


#### 6.2 Parent Directory Access

##### - `Path.parent`:

Returns the parent directory of the Path.

In [182]:
my_dir=Path(r"C:\demo_testfile\test")
my_file=Path(r"C:\demo_testfile\AAA.txt")
print(f"dir Path: {my_dir}, \tparent dir=> {my_dir.parent}") #Returns the parent directory.
print(f"file Path: {my_file}, \tparent dir=> {my_file.parent}") #Returns the parent directory.


dir Path: C:\demo_testfile\test, 	parent dir=> C:\demo_testfile
file Path: C:\demo_testfile\AAA.txt, 	parent dir=> C:\demo_testfile


In [186]:
print(f'File path: {my_file}')
print(my_file.parent.absolute()) #show the parent path of the file
print(my_file.absolute().parent) #show the parent path of the file

File path: C:\demo_testfile\AAA.txt
C:\demo_testfile
C:\demo_testfile


##### - `Path.parents`:
A sequence of all ancestors, indexed from the immediate parent upward. It is not a list, but you can index it, take its `len()`, iterate over it, or convert it with `list()`.


In [194]:
#get the Nth parent folder
print(f'File path: {my_file}')
print(my_file.parents[0]) # parent of file
print(my_file.parents[1]) #the root of the file

File path: C:\demo_testfile\AAA.txt
C:\demo_testfile
C:\


In [201]:
my_file2=Path(r"C:\demo_testfile\test\AAA.txt")
print(f'File path: {my_file2}')

print(my_file2.parents[0]) # parent of file
print(my_file2.parents[1]) # grandparent directory
print(my_file2.parents[2]) #the root of the file

File path: C:\demo_testfile\test\AAA.txt
C:\demo_testfile\test
C:\demo_testfile
C:\


In [209]:
#determine how many parent levels are available
my_file2=Path(r"C:\demo_testfile\test\AAA.txt")
print(f"File path: {my_file2}")

print("Method1" .center(50, '#'))
print(f"Total parent levels: {len(my_file2.parents)}")  # Count available parent levels


print("Method2" .center(50, '#'))

# Iterate through all available parents
for i, parent in enumerate(my_file2.parents):
    print(f"Parent {i}: {parent}")

File path: C:\demo_testfile\test\AAA.txt
#####################Method1######################
Total parent levels: 3
#####################Method2######################
Parent 0: C:\demo_testfile\test
Parent 1: C:\demo_testfile
Parent 2: C:\
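
Because `parents` is a sequence, `len()`, indexing, and `list()` all work on it. A sketch using `PureWindowsPath` so it runs on any OS:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"C:\demo_testfile\test\AAA.txt")

ancestors = list(p.parents)  # copy the sequence into a real list
for a in ancestors:
    print(a)

print(len(p.parents))                 # number of ancestor levels
print(p.parents[0] == p.parent)       # index 0 is the immediate parent
last = p.parents[len(p.parents) - 1]  # the top-most ancestor
print(last == PureWindowsPath(p.anchor))  # ...is the anchor (drive root)
```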


<a id='6.3startswith_endswith'></a>
#### 6.3 startswith() and endswith(): 
In `pathlib` you can use the attributes shown above. Alternatively, you can use the `startswith()` and `endswith()` string methods, which are commonly used with `os.path` to filter files by prefix or extension. `Path` objects do not have these methods, so call them on `str(path)` or `path.name`.

> - `startswith()`: checks whether a string starts with a given prefix
> - `endswith()`: checks whether a string ends with a given suffix

##### - `startswith('directory/file')`:
Checks if a filename starts with a specific prefix; returns True if it does, else False.


In [134]:
filename_check='index.html'
dir_check='test'
print(f"check file startswith : {filename_check.startswith('index')}")
print(f"check directory startswith : {dir_check.startswith('tes')}")

check file startswith : True
check directory startswith : True


##### - `endswith('directory/file')`:
Checks if a filename ends with a specific suffix, such as an extension; returns True if it does, else False.


In [135]:
filename_check='index.html'
dir_check='test'
print(f"check file endswith : {filename_check.endswith('.html')}")
print(f"check directory endswith : {dir_check.endswith('st')}")

check file endswith : True
check directory endswith : True


In [212]:
path = Path("C:/demo_testfile")

for file in path.iterdir():
    if file.suffix == ".txt":  # Equivalent to endswith(".txt")
        print(f"Text file: {file.name}")

    if file.name.startswith("report_"):  # no report_ files here, so nothing is printed
        print(f"Report file: {file.name}")

Text file: AAA.txt
Text file: hell[o].txt
Text file: read.txt
Text file: test.txt
Text file: testFile1.txt
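
A sketch comparing the string methods with the `suffix` attribute, using a hypothetical list of file names. `str.endswith()` also accepts a tuple of suffixes, and lowering the suffix makes the extension check case-insensitive.

```python
from pathlib import PurePosixPath

# hypothetical file names
names = ["AAA.txt", "ex1.py", "report_2023.csv", "notes", "README.TXT"]
paths = [PurePosixPath("demo_testfile") / n for n in names]

# str.endswith() on the name; a tuple checks several suffixes at once
by_endswith = [p.name for p in paths if p.name.endswith((".py", ".txt"))]

# pathlib way: compare the suffix attribute; lower() ignores case
by_suffix = [p.name for p in paths if p.suffix.lower() in {".py", ".txt"}]

# prefix check on the file name only, not on the whole path
reports = [p.name for p in paths if p.name.startswith("report_")]

print(by_endswith)
print(by_suffix)
print(reports)
```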


<a id='7.matchingFileGlob'></a> 
### 7. Matching File Patterns (Non-Recursive and Recursive) [🔙](#Part1)

This part covers the basics of `glob()` and `rglob()`; file pattern matching is covered in more detail in a separate note. Keep in mind that `glob.glob()` (from the `glob` module) and `pathlib.Path.glob()` are two different functions: the first returns a list of strings, the second a generator of `Path` objects.


**`pathlib` glob methods:**
> - `pathlib.Path.glob()`: non-recursive search
> - `pathlib.Path.rglob()`: recursive search
>
> **Return value:** a generator, which is a memory-efficient way to produce a sequence of values one at a time, e.g.
> `print(new_dir.glob("*")) #<generator object Path.glob at 0x000001B0AD66DA80>`

**Wildcard Pattern:**
- `*`: **Wildcard**. Matches zero or more characters (e.g., `*.txt` matches `file.txt`, `notes.txt`).
    - `**`: **Recursive wildcard**. Matches the current directory and all subdirectories (e.g., `**/*.txt`).
- `?`: **Single-character wildcard**. Matches exactly one character (e.g., `test??` matches `test12`, `testAB`; `h?t` matches `hat`, `hit`, `hot`).
- `[]`: **Character set**. Matches any single character inside the brackets (e.g., `b[ae]ll` matches `ball`, `bell`).
    - `-`: Specifies a range of characters (e.g., `[a-z]` matches any lowercase letter; `b[a-c]d` matches `bad`, `bbd`, `bcd`).

[Glob Documentation Link](https://docs.python.org/3/library/glob.html)

#### 7.1 Non-Recursive search 

##### - `Path.glob(pattern)`

In [569]:
# iterate over the generator; "*" matches every entry in the current directory
for p in Path().glob("*"):
    print(p)

.ipynb_checkpoints
Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb
data123
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
iterate_filter_file.ipynb
jupterNote.ipynb
newtest
numpy.ipynb
os.path_reorganize.ipynb
Pandas
Pathlib Basics(orgainize_final).ipynb
Pathlib_reorganize.ipynb
test.txt
test123
testdir1
testFile1.txt
tmp.ipynb
Understanding File Pattern Matching with Glob and Fnmatch.ipynb


In [575]:
# list the same content as above, this time storing the generator first
current_dir = Path()  # current working directory
allfiles = current_dir.glob('*')
for file in allfiles:
    print(file)

.ipynb_checkpoints
Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb
data123
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
iterate_filter_file.ipynb
jupterNote.ipynb
newtest
numpy.ipynb
os.path_reorganize.ipynb
Pandas
Pathlib Basics(orgainize_final).ipynb
Pathlib_reorganize.ipynb
test.txt
test123
testdir1
testFile1.txt
tmp.ipynb
Understanding File Pattern Matching with Glob and Fnmatch.ipynb


In [592]:
# filter single file type
print("Filter txt file type" .center(50, '#')) 
new_dir = Path(r"C:\demo_testfile")
singlefiles = new_dir.glob("*.txt")
print('single file type')
for p in singlefiles:
    print(p)
print("Filter multiple file type" .center(50, '#'))   
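# filter multiple file types. Caution: "*py*" also matches "test -copy"
# because "copy" contains "py"; use "*.py" to match only Python files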
files = list(new_dir.glob("*.txt")) + list(new_dir.glob("*py*"))
for p in files:
    print(p)

###############Filter txt file type###############
single file type
C:\demo_testfile\BBB.txt
C:\demo_testfile\hell[o].txt
C:\demo_testfile\read.txt
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt
############Filter multiple file type#############
C:\demo_testfile\BBB.txt
C:\demo_testfile\hell[o].txt
C:\demo_testfile\read.txt
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py
C:\demo_testfile\test -copy


In [598]:
# find all entries in new_dir whose names start with "ex"
for file_path in  new_dir.glob("ex*"):
    print(file_path) 

C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py


In [599]:
# find all files with a three-character name and the ".py" extension
for file_path in new_dir.glob("???.py"):
    print(file_path)

C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py


#### 7.2 Recursive search 

#### - `Path.rglob(pattern)`
Recursively matches a file pattern in the directory and all its subdirectories (returns a generator), for example:
```python
list(Path().rglob("*.csv"))  # converts the generator to a list of matches
```
> - The `glob` module uses `**` in the pattern (with `recursive=True`) for recursive searching.
> - `pathlib` uses the `rglob()` method, or `**` inside `Path.glob()`.


In [626]:
# search through all subdirectories
testdirectory = Path(r"C:\demo_testfile")
for p in testdirectory.rglob("*"):
    print(p)

C:\demo_testfile\BBB.txt
C:\demo_testfile\data
C:\demo_testfile\datanew
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py
C:\demo_testfile\hell[o].txt
C:\demo_testfile\index.html
C:\demo_testfile\notes
C:\demo_testfile\page.html
C:\demo_testfile\read.txt
C:\demo_testfile\renamefile
C:\demo_testfile\test
C:\demo_testfile\test -copy
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt
C:\demo_testfile\data\copy2_AAA.txt
C:\demo_testfile\data\copy2_ex1.py
C:\demo_testfile\data\copy2_ex2.py
C:\demo_testfile\data\copy2_hell[o].txt
C:\demo_testfile\data\copy2_index.html
C:\demo_testfile\data\copy2_page.html
C:\demo_testfile\data\copy_AAA.txt
C:\demo_testfile\data\copy_ex1.py
C:\demo_testfile\data\copy_ex2.py
C:\demo_testfile\data\copy_file - with - space-1.py
C:\demo_testfile\data\copy_file - with - space-1.py.bk
C:\demo_testfile\data\copy_hell[o].txt
C:\demo_testfile\data\copy_index.html
C:\demo_testfile\data\copy_page.html
C:\demo_testfile\datanew\copy2_page.html
C:\demo_testfile\datanew

In [609]:
# search all subdirectories recursively for .csv files
for p in testdirectory.rglob("*.csv"):
    print(p)

C:\demo_testfile\notes\billboard_ratings.csv
C:\demo_testfile\notes\billboard_songs.csv
C:\demo_testfile\notes\billboard_songs2023.csv
C:\demo_testfile\test\data\billboard.csv
C:\demo_testfile\test\data\concat_1.csv
C:\demo_testfile\test\data\concat_2.csv
C:\demo_testfile\test\data\concat_3.csv
C:\demo_testfile\test\data\country_timeseries.csv
C:\demo_testfile\test\data\pew.csv
C:\demo_testfile\test\data\survey_person.csv
C:\demo_testfile\test\data\survey_site.csv
C:\demo_testfile\test\data\survey_survey.csv
C:\demo_testfile\test\data\survey_visited.csv
C:\demo_testfile\test\data\weather.csv
C:\demo_testfile\test -copy\data\billboard.csv
C:\demo_testfile\test -copy\data\concat_1.csv
C:\demo_testfile\test -copy\data\concat_2.csv
C:\demo_testfile\test -copy\data\concat_3.csv
C:\demo_testfile\test -copy\data\country_timeseries.csv
C:\demo_testfile\test -copy\data\pew.csv
C:\demo_testfile\test -copy\data\survey_person.csv
C:\demo_testfile\test -copy\data\survey_site.csv
C:\demo_testfile\te

In [627]:
# match multiple extensions by checking each path's suffix
for file_path in testdirectory.rglob("*"):
    if file_path.suffix in {".py", ".txt"}:
        print(file_path)   

C:\demo_testfile\BBB.txt
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py
C:\demo_testfile\hell[o].txt
C:\demo_testfile\read.txt
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt
C:\demo_testfile\data\copy2_AAA.txt
C:\demo_testfile\data\copy2_ex1.py
C:\demo_testfile\data\copy2_ex2.py
C:\demo_testfile\data\copy2_hell[o].txt
C:\demo_testfile\data\copy_AAA.txt
C:\demo_testfile\data\copy_ex1.py
C:\demo_testfile\data\copy_ex2.py
C:\demo_testfile\data\copy_file - with - space-1.py
C:\demo_testfile\data\copy_hell[o].txt
C:\demo_testfile\renamefile\data\copy2_file - with - space-1.py
C:\demo_testfile\renamefile\data\copy_file - with - space-1.py
C:\demo_testfile\renamefile\data_copy\copy2_file - with - space-1.py
C:\demo_testfile\renamefile\data_copy\copy_file - with - space-1.py
C:\demo_testfile\renamefile\data_copyfile22\file - with - space-1.py
C:\demo_testfile\test\1.txt
C:\demo_testfile\test\2.txt
C:\demo_testfile\test -copy\1.txt
C:\demo_testfile\test -copy\2.txt


In [625]:
# skip specific directories
SKIP_DIRS = ["notes", "data", "renamefile"]
directory = Path(r'c:\demo_testfile')  # raw string avoids the invalid '\d' escape
# keep only items whose path parts contain none of the skipped names
for item in directory.rglob("*"):
    if set(item.parts).isdisjoint(SKIP_DIRS):
        print(item)

c:\demo_testfile\BBB.txt
c:\demo_testfile\datanew
c:\demo_testfile\ex1.py
c:\demo_testfile\ex2.py
c:\demo_testfile\hell[o].txt
c:\demo_testfile\index.html
c:\demo_testfile\page.html
c:\demo_testfile\read.txt
c:\demo_testfile\test
c:\demo_testfile\test -copy
c:\demo_testfile\test.txt
c:\demo_testfile\testFile1.txt
c:\demo_testfile\datanew\copy2_page.html
c:\demo_testfile\datanew\copy_page.html
c:\demo_testfile\test\1.txt
c:\demo_testfile\test\2.txt
c:\demo_testfile\test -copy\1.txt
c:\demo_testfile\test -copy\2.txt
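One caveat with the `item.parts` check above: `parts` also contains the components of the root path itself, so a root such as `C:\data\project` would hide every item because `data` appears in it. A minimal sketch that only tests the components *below* the root via `Path.relative_to()`; the helper name `iter_without_skipped` and the temporary tree are hypothetical, made up for illustration.

```python
import tempfile
from pathlib import Path

SKIP_DIRS = {"notes", "data", "renamefile"}

def iter_without_skipped(root):
    """Yield everything under root, skipping items inside SKIP_DIRS.

    Only the parts *below* root are checked, so a root path that itself
    contains a name like "data" does not hide everything.
    """
    root = Path(root)
    for item in root.rglob("*"):
        if set(item.relative_to(root).parts).isdisjoint(SKIP_DIRS):
            yield item

# Hypothetical tree: <tmp>/data/project/{keep.txt, notes/skip.txt}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data" / "project"
    (root / "notes").mkdir(parents=True)
    (root / "keep.txt").touch()
    (root / "notes" / "skip.txt").touch()
    found = sorted(p.relative_to(root).as_posix() for p in iter_without_skipped(root))

print(found)  # ['keep.txt']
```

With the original `item.parts` check, this tree would print nothing, because `data` is part of the root path.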


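##### - `**`: Recursive Directory Searching
This is another recursive method: the `**` wildcard in a pattern such as `**/*.csv` matches the current directory and all of its subdirectories, so `Path.glob('**/*.csv')` works just like `Path.rglob('*.csv')`.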

In [604]:
for fileref in Path('C:/demo_testfile/').glob('**/*.csv'):
    filename = str(fileref)
    print(filename)

C:\demo_testfile\notes\billboard_ratings.csv
C:\demo_testfile\notes\billboard_songs.csv
C:\demo_testfile\notes\billboard_songs2023.csv
C:\demo_testfile\test\data\billboard.csv
C:\demo_testfile\test\data\concat_1.csv
C:\demo_testfile\test\data\concat_2.csv
C:\demo_testfile\test\data\concat_3.csv
C:\demo_testfile\test\data\country_timeseries.csv
C:\demo_testfile\test\data\pew.csv
C:\demo_testfile\test\data\survey_person.csv
C:\demo_testfile\test\data\survey_site.csv
C:\demo_testfile\test\data\survey_survey.csv
C:\demo_testfile\test\data\survey_visited.csv
C:\demo_testfile\test\data\weather.csv
C:\demo_testfile\test -copy\data\billboard.csv
C:\demo_testfile\test -copy\data\concat_1.csv
C:\demo_testfile\test -copy\data\concat_2.csv
C:\demo_testfile\test -copy\data\concat_3.csv
C:\demo_testfile\test -copy\data\country_timeseries.csv
C:\demo_testfile\test -copy\data\pew.csv
C:\demo_testfile\test -copy\data\survey_person.csv
C:\demo_testfile\test -copy\data\survey_site.csv
C:\demo_testfile\te
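To check that `glob('**/*.csv')` and `rglob('*.csv')` really return the same files, here is a small sketch on a throwaway tree built with `tempfile` (the file names are made up). Paths are printed relative to the root with `as_posix()` so the output looks the same on Windows and Linux.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a" / "b").mkdir(parents=True)
    for rel in ["top.csv", "a/mid.csv", "a/b/deep.csv", "a/b/skip.txt"]:
        (root / rel).touch()

    # "**" matches this directory and every subdirectory below it
    via_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.csv"))
    # rglob(pattern) behaves like glob("**/" + pattern)
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))

print(via_glob)               # ['a/b/deep.csv', 'a/mid.csv', 'top.csv']
print(via_glob == via_rglob)  # True
```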

<a id='Part2'></a> 
## Part 2 File and Directory Operations  [🔝](#toc)

- [1. Create file/directory](#Create_fileDirectory)
    - [1.1 File](#1.1-File)
        - `Path().touch()`
    - [1.2 Directory](#1.2-Directory)
        - `Path().mkdir()`
        - `Path().mkdir(exist_ok=True)`
        - `Path().mkdir(parents=True, exist_ok=True)`
- [2. Remove Directory and File](#RemoveDir_File)
    - [2.1 Remove Empty Directory](#2.1-Remove-Empty-Directory)
        - `Path.rmdir()`: remove empty directory
    - [2.2 Remove non-empty directory](#2.2-Remove-non-empty-directory)
        - `shutil.rmtree(path)`: remove non empty directory
    - [2.3 Remove File](#2.3-Remove-File) 
        - `Path.unlink()`: permanently delete a file
        - `send2trash()`: move a file to the Recycle Bin/Trash
- [3. Copying](#copying)
    - [3.1 File](#3.1-File)
        - `shutil.copy()`: copy file content and permission bits, but not timestamps
        - `shutil.copy2()`: copy file content plus metadata such as timestamps
        - `Path.stat()`: show file status information (size, timestamps, etc.)
    - [3.2 Directory](#3.2-Directory)
        - `shutil.copytree(src, dst)`
- [4. Rename/moving](#RenameCopying)
    - `Path.rename(target)`: rename file/directory (like `os.rename()`)
    - `shutil.move(src, dst)`: move file/directory, also across filesystems
- [5. Read and write file](#ReadwriteFile)
    - [5.1 using pathlib](#5.1-using-pathlib)
        - `Path().write_text`: write file
        - `Path().read_text`: read file
    - [5.2 using with built-in](#5.2-using-with-built-in)
        - `with open(filename, mode)`: using with built-in read and write

<a id='Create_fileDirectory'></a> 
### 1. Create file/directory [🔙](#Part2)

#### 1.1 File 

#### - `Path().touch()`
Creates an empty file; if the file already exists, only its modification time is updated.

In [396]:
new_file=Path.cwd()/"testFile.txt" 
new_file.touch()
print(new_file.exists())
for item in Path().glob("*.txt"):  # non-recursive; use rglob("*.txt") to include subdirectories
    print(item)

True
testFile.txt
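`touch()` can be called safely on an existing file: by default (`exist_ok=True`) it only refreshes the timestamps. Passing `exist_ok=False` makes it raise `FileExistsError` instead. A minimal sketch in a temporary directory (the file name `testFile.txt` is just an example):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "testFile.txt"
    f.touch()                    # creates an empty file
    f.touch()                    # file exists: only its timestamps are updated
    try:
        f.touch(exist_ok=False)  # refuse to touch an existing file
        raised = False
    except FileExistsError:
        raised = True
    size = f.stat().st_size      # the file is still empty

print(raised, size)  # True 0
```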


#### 1.2 Directory 

##### - `Path('directoryname').mkdir()`
Creates a new directory at the specified path. If the directory already exists, it raises `FileExistsError`; if a parent directory is missing, it raises `FileNotFoundError`. The `os` equivalent is `os.mkdir()`.

In [251]:
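# mkdir method 1: relative path (created in the current working directory)
pathdir = Path("new")  # Create a Path object
pathdir.mkdir()  # Create the directory (raises FileExistsError if it already exists)

In [259]:
# or method 2: absolute path built from the current working directory
newfolder = Path.cwd() / "new2"
newfolder.mkdir()  # Create the directory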

In [261]:
# Catch FileExistsError instead of checking whether the directory exists first
filecreate=Path('new')
try:
    filecreate.mkdir()
except FileExistsError as ex:
    print(ex)

[WinError 183] Cannot create a file when that file already exists: 'new'


In [252]:
# Check if the directory exists
if pathdir.exists() and pathdir.is_dir():
    print(f"Directory '{pathdir}' was created successfully.")
else:
    print(f"Failed to create directory '{pathdir}'.")

Directory 'new' was created successfully.


In [287]:
# List all files and folders in the current directory
for item in Path().glob("*"):  # non-recursive; use rglob("*") to include subdirectories
    print(item)

.ipynb_checkpoints
Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
iterate_filter_file.ipynb
jupterNote.ipynb
numpy.ipynb
os.path_reorganize.ipynb
Pandas
Pathlib Basics(orgainize_final).ipynb
Pathlib_reorganize.ipynb
test123
tmp.ipynb
Understanding File Pattern Matching with Glob and Fnmatch.ipynb


##### - `Path('directoryname').mkdir(exist_ok=True)`
> - `exist_ok=True`: prevents errors if the directory already exists.


In [255]:
pathdir = Path("new")  # Create a Path object
# "new" was created above, so creating it again raises FileExistsError unless exist_ok=True
pathdir.mkdir(exist_ok=True)  # no error if the directory already exists

##### - `Path('directoryname').mkdir(parents=True, exist_ok=True)`
- Creates the directory together with any missing parent directories. Without `parents=True`, a missing parent raises `FileNotFoundError`. The `os` equivalent is `os.makedirs()`.
    > - `parents=True`: ensures all missing parent directories are created.
    > - `exist_ok=True`: prevents errors if the directory already exists.
    

In [318]:
directory = Path()  # Path() with no argument refers to the current directory
directories = [entry.name for entry in directory.iterdir() if entry.is_dir()]
print(directories)

['.ipynb_checkpoints', 'DataType', 'done_toremove', 'elog_example', 'hahow_scrap_pandas_mathplot', 'Pandas', 'test123']


In [339]:
path = Path('test123')
# Check if there are subdirectories before creation
subdirs = [subdir for subdir in path.rglob('*') if subdir.is_dir()]
if not subdirs:
    print('empty subdirectory')
else:
    for subdir in subdirs:
        print(subdir)

test123\emptydir
test123\emptydir\emptydir2


In [340]:
# Create new subdirectories
emptysubdir = path / 'emptydir' / 'emptydir2'
emptysubdir.mkdir(parents=True, exist_ok=True)

# Display subdirectories after creation
print("\nAfter creating subdirectories:")
for subdir in path.rglob('*'):
    if subdir.is_dir():
        print(subdir)


After creating subdirectories:
test123\emptydir
test123\emptydir\emptydir2


In [387]:
#without storing in list 
path = Path('test123')
found = False  # Track if any subdirectory is found

for item in Path('test123').rglob('*'):
    if item.is_dir():
        print(item)
        found = True  # Mark as found
  
if not found:
    print("Empty subdirectory")
    emptysubdir = path / 'emptydir' / 'emptydir2'
    emptysubdir.mkdir(parents=True, exist_ok=True)
                      
# After creation, check again
if any(item.is_dir() for item in path.rglob('*')):
    print("Subdirectory exists now.")
else:
    print("Still no subdirectories.")    

test123\emptydir
test123\emptydir\emptydir2
Subdirectory exists now.
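A minimal sketch of the difference `parents=True` makes, run in a temporary directory so it does not touch your own folders (the `a/b/c` names are just placeholders):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b" / "c"

    # Without parents=True, the missing parent "a" is an error
    missing_parent_error = False
    try:
        nested.mkdir()
    except FileNotFoundError:
        missing_parent_error = True

    # parents=True creates a and a/b as needed; exist_ok=True makes reruns safe
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)  # second call: no error
    created = nested.is_dir()

print(missing_parent_error, created)  # True True
```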


<a id='RemoveDir_File'></a> 
### 2. Remove Directory and File [🔙](#Part2)

#### 2.1 Remove Empty Directory

##### - `Path.rmdir()`:
Removes an **empty directory**. If the directory contains files or other directories, it raises `OSError`.

In [408]:
# rmdir: remove empty directory 
newdir=Path.cwd()/'test123'
if not newdir.exists(): #if directory not exist create it
    newdir.mkdir()
print("======check directory exist====")      
print(Path('test123').exists())       
print("======Remove directory=====")        
Path('test123').rmdir()
print(Path('test123').exists())

True
False


In [410]:
# Check that removing a non-empty directory raises an error
print("======Create directory and file=====")       
newdir=Path.cwd()/'test123'
newdir.mkdir()
newfile = newdir / 'test.py'  
newfile.touch()  # Create the file
print("======check directory and file exist====")
print(f'check dir exist:{newdir.exists()}')
print(f'check file exist:{newfile.exists()}')
print("======Remove directory=====")        
Path('test123').rmdir()

check dir exist:True
check file exist:True


OSError: [WinError 145] The directory is not empty: 'test123'

In [411]:
# Catch the exception so a non-empty directory does not stop the notebook with an error
try:
    Path('test123').rmdir()
except OSError as e:
    print('Error, seem like your directory is not empty')

Error, seem like your directory is not empty
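Instead of catching the error, you can also check whether a directory is empty before calling `rmdir()`. A minimal sketch; `remove_if_empty` is a hypothetical helper name, and `any(path.iterdir())` stops at the first entry so it does not need to list the whole directory. Another process could still add a file between the check and the `rmdir()` call, so the `try`/`except` version above remains the more robust one.

```python
import tempfile
from pathlib import Path

def remove_if_empty(path):
    """Remove path with rmdir() only if it is an empty directory; return True if removed."""
    path = Path(path)
    if path.is_dir() and not any(path.iterdir()):
        path.rmdir()
        return True
    return False

with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty"
    full = Path(tmp) / "full"
    empty.mkdir()
    full.mkdir()
    (full / "test.py").touch()
    results = (remove_if_empty(empty), remove_if_empty(full))
    still_there = (empty.exists(), full.exists())

print(results, still_there)  # (True, False) (False, True)
```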


#### 2.2 Remove non-empty directory

##### - `shutil.rmtree(path)`: 
As shown above, `rmdir()` cannot remove a non-empty directory. `shutil.rmtree()` removes a **non-empty directory** and all of its contents (files and subdirectories) **recursively**. It deletes the whole directory tree permanently; nothing is moved to the Recycle Bin.

> - import shutil module: `import shutil`

In [419]:
import shutil
for item in newdir.rglob('*'):
    print(item)
        
print(f'check dir exist:{newdir.exists()}')
shutil.rmtree(newdir)
print(f'check dir exist:{newdir.exists()}')


C:\Users\test\python_test\test123\test.py
check dir exist:True
check dir exist:False


#### 2.3 Remove File
Delete a file from the filesystem. `Path.unlink()` and `os.remove()` raise `FileNotFoundError` if the file does not exist.

##### - `Path.unlink()`:
Removes **individual files** only (use `rmdir()` or `shutil.rmtree()` for directories). It is fast and direct, but offers no recovery option. The `os` module equivalent is `os.remove()`. To remove several files, call `unlink()` inside a loop, or use `shutil.rmtree()` to delete a whole directory tree.

> - Hard Delete: Permanently removes the file from the system.
> - No Recovery: Once deleted, the file cannot be restored.
> - Built in: part of the standard library (`pathlib`, or `os.remove()` in the `os` module); no installation needed


In [421]:
# remove a file
new_file = Path.cwd() / "testFile.txt"
new_file.touch()
print(f'check file exist: {new_file.exists()}')  # True
new_file.unlink()  # pathlib
# os.remove(new_file)  # os module equivalent
print(f'check file exist: {new_file.exists()}')  # False

check file exist: True
check file exist: False
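Since Python 3.8, `Path.unlink()` accepts `missing_ok=True`, which skips the `FileNotFoundError` when the file is already gone. A small sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "testFile.txt"
    f.touch()
    f.unlink()                 # deletes the file
    f.unlink(missing_ok=True)  # already gone: no error with missing_ok=True
    try:
        f.unlink()             # default missing_ok=False raises
        raised = False
    except FileNotFoundError:
        raised = True
    exists_after = f.exists()

print(raised, exists_after)  # True False
```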


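##### - `send2trash()`:
Moves the file to the Recycle Bin (Windows) or Trash (macOS/Linux) instead of deleting it, which is safer and more user-friendly. It comes from the third-party `send2trash` package, not the standard library.

> - Soft Delete: Moves the file to the Recycle Bin or Trash, allowing for potential recovery
> - User-Friendly: Provides a safety net for accidental deletions
> - install module: `pip install send2trash`
> - module: `import send2trash as s2t`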

In [8]:
import os
from pathlib import Path
import send2trash as s2t
new_file = Path(r'C:\demo_testfile\deletefile.txt')  # raw string: single backslashes
new_file.touch()
os.listdir(new_file.parent)

['AAA.txt',
 'data',
 'datanew',
 'deletefile.txt',
 'ex1.py',
 'ex2.py',
 'hell[o].txt',
 'index.html',
 'notes',
 'page.html',
 'read.txt',
 'renamefile',
 'test',
 'test -copy',
 'test.txt',
 'testFile1.txt']

In [9]:
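# delete a single file (moves it to the Recycle Bin)
import send2trash as s2t
try:
    s2t.send2trash(new_file)
    print('Job Done')
except OSError as e:  # avoid a bare except that would hide unrelated errors
    print(f"Can't delete that file: {e}")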
os.listdir(new_file.parent)

Job Done


['AAA.txt',
 'data',
 'datanew',
 'ex1.py',
 'ex2.py',
 'hell[o].txt',
 'index.html',
 'notes',
 'page.html',
 'read.txt',
 'renamefile',
 'test',
 'test -copy',
 'test.txt',
 'testFile1.txt']

In [None]:
# search for specific files and send them to the trash
target = Path(r'C:\demo_testfile')

for x in target.iterdir():
    if x.suffix == '.css':
        # x is already a full path; the old `target + x` dropped the "\" separator
        s2t.send2trash(x)

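When a permanent delete is acceptable, the same search can use `glob()` and `Path.unlink()` in a loop, as mentioned for deleting multiple files. A minimal sketch in a throwaway temporary directory; the `.css`/`.html` file names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp)
    for name in ["a.css", "b.css", "keep.html"]:
        (target / name).touch()

    # glob() yields full Path objects, so no manual path concatenation is needed
    for css_file in target.glob("*.css"):
        css_file.unlink()  # permanent delete: does not go to the Recycle Bin

    remaining = sorted(p.name for p in target.iterdir())

print(remaining)  # ['keep.html']
```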
<a id='copying'></a>
### 3. Copying [🔙](#Part2)

#### 3.1 File
Copies a file to a new location. Both functions come from `shutil`, and `dst` may be a file path or an existing directory.

> - `shutil.copy()`: single file
Copies the **file's content and permission bits**. Other metadata is not kept, so the new file gets the **current time** as its modification time.
>> ex: `shutil.copy("source.txt", "destination.txt")`

> - `shutil.copy2()`: single file with metadata (timestamps, permissions)
Copies **both the content and metadata** of the file. The destination **keeps the original timestamps and permission bits**; owner and group are not copied.
>> ex: `shutil.copy2("source.txt", "destination.txt")`

> - `os.stat()`
Check the **metadata** of a file (size, timestamps, mode, ...).

In [423]:
import os
import shutil
from pathlib import Path

path = Path(r'C:\demo_testfile\renamefile')
datadir = path / 'data'
# Create the data directory if it doesn't exist (no error if it does)
datadir.mkdir(exist_ok=True)

for file in os.listdir(path):
    # Skip unwanted files or directories
    if file in ['__pycache__', 'data', '.idea']:
        continue
    # Full path to the source file
    source_file = path / file
    print(source_file)  # C:\demo_testfile\renamefile\file....
    # copy()/copy2() only handle files, so skip any other directories
    if not source_file.is_file():
        continue

    # Paths for copied files
    copy_path = datadir / f"copy_{file}"
    copy2_path = datadir / f"copy2_{file}"

    # Copy the file
    shutil.copy(source_file, copy_path)    # content + permission bits, new mtime
    shutil.copy2(source_file, copy2_path)  # content + metadata (keeps mtime)

    # Display modification timestamps using os.stat
    original_stats = os.stat(source_file)
    copy_stats = os.stat(copy_path)
    copy2_stats = os.stat(copy2_path)

    print(f"File: {file}")
    print(f"  Original Modified Time: {original_stats.st_mtime}")
    print(f"  Copy Modified Time:     {copy_stats.st_mtime}")
    print(f"  Copy2 Modified Time:    {copy2_stats.st_mtime}")
    print()

C:\demo_testfile\renamefile\file - with - space-1.py
File: file - with - space-1.py
  Original Modified Time: 1736910459.6833546
  Copy Modified Time:     1740037415.2194378
  Copy2 Modified Time:    1736910459.6833546

C:\demo_testfile\renamefile\file - with - space-1.py.bk
File: file - with - space-1.py.bk
  Original Modified Time: 1736910459.6833546
  Copy Modified Time:     1740037415.229438
  Copy2 Modified Time:    1736910459.6833546
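The timestamps above come from a real run. To reproduce the `copy()` vs `copy2()` difference anywhere, here is a sketch that first sets the source's modification time to a fixed past value with `os.utime()` (an assumption made only so the comparison is deterministic) and then checks which copy kept it. The file names are made up.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.txt"
    src.write_text("hello")
    old = 1_600_000_000        # an arbitrary past Unix timestamp (September 2020)
    os.utime(src, (old, old))  # set access and modification time

    dst_copy = Path(tmp) / "copy.txt"
    dst_copy2 = Path(tmp) / "copy2.txt"
    shutil.copy(src, dst_copy)    # content + permission bits
    shutil.copy2(src, dst_copy2)  # content + metadata such as timestamps

    copy_keeps_mtime = dst_copy.stat().st_mtime == old
    copy2_keeps_mtime = dst_copy2.stat().st_mtime == old
    same_content = dst_copy.read_text() == dst_copy2.read_text() == "hello"

print(copy_keeps_mtime, copy2_keeps_mtime, same_content)  # False True True
```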



#### - `stat()`
`Path.stat()` returns the file's status information (an `os.stat_result`): size, timestamps, mode, and so on.

In [425]:
copypath=Path(r'C:\demo_testfile\renamefile\data')
for file in copypath.iterdir():
    if file.is_file():
        print(f"File: {file.name}")
        print(f"File stat: {file.stat()}")

File: copy2_file - with - space-1.py
File stat: os.stat_result(st_mode=33206, st_ino=43347146413752563, st_dev=4271817228, st_nlink=1, st_uid=0, st_gid=0, st_size=14, st_atime=1740037620, st_mtime=1736910459, st_ctime=1737699389)
File: copy2_file - with - space-1.py.bk
File stat: os.stat_result(st_mode=33206, st_ino=39125021763269030, st_dev=4271817228, st_nlink=1, st_uid=0, st_gid=0, st_size=14, st_atime=1740037620, st_mtime=1736910459, st_ctime=1737699389)
File: copy_file - with - space-1.py
File stat: os.stat_result(st_mode=33206, st_ino=25051272927612597, st_dev=4271817228, st_nlink=1, st_uid=0, st_gid=0, st_size=14, st_atime=1740037620, st_mtime=1740037620, st_ctime=1737699389)
File: copy_file - with - space-1.py.bk
File stat: os.stat_result(st_mode=33206, st_ino=18014398509968944, st_dev=4271817228, st_nlink=1, st_uid=0, st_gid=0, st_size=14, st_atime=1740037620, st_mtime=1740037620, st_ctime=1737699389)


In [526]:
# convert the time to a readable format
from datetime import datetime
from pathlib import Path

file = Path(r'C:\demo_testfile')  # raw string avoids the invalid "\d" escape
filestat = file.stat()
print(filestat)

# convert st_mtime (modification time) from a Unix timestamp to local time
convertime = datetime.fromtimestamp(filestat.st_mtime)  # local time
print(convertime)

os.stat_result(st_mode=16895, st_ino=75435293758701416, st_dev=4271817228, st_nlink=1, st_uid=0, st_gid=0, st_size=4096, st_atime=1740104578, st_mtime=1740104577, st_ctime=1733472133)
2025-02-21 10:22:57.535324
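A small sketch that collects a few readable fields from `Path.stat()`; `describe` is a hypothetical helper name and the file is created in a temporary directory. Keep in mind that `st_ctime` means creation time on Windows but last metadata change on Unix, so `st_mtime` is the more portable field for "last modified".

```python
import tempfile
from datetime import datetime
from pathlib import Path

def describe(path):
    """Return a few human-readable fields from Path.stat()."""
    path = Path(path)
    st = path.stat()
    return {
        "size_bytes": st.st_size,
        # local time, rounded to whole seconds
        "modified": datetime.fromtimestamp(st.st_mtime).isoformat(sep=" ", timespec="seconds"),
        "is_file": path.is_file(),
    }

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "read.txt"
    f.write_text("Hello")  # 5 ASCII characters -> 5 bytes
    info = describe(f)

print(info["size_bytes"], info["is_file"])  # 5 True
```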


#### 3.2 Directory

##### - `shutil.copytree(src, dst)`
Copy directories (including all files & subdirectories)

In [442]:
path=Path(r'C:\demo_testfile\renamefile')
# List the subdirectories if the base directory exists, otherwise create it
if path.exists():
    directories = [entry.name for entry in path.iterdir() if entry.is_dir()]
    print(directories)
else:
    print('directory not exist')
    path.mkdir(exist_ok=True)

copydatadir = path / 'data_copy'
# copytree raises FileExistsError if the destination exists;
# dirs_exist_ok=True (Python 3.8+) copies into the existing directory instead
shutil.copytree(path / 'data', copydatadir, dirs_exist_ok=True)

# Alternative (run before copytree): remove an existing copy first
# if copydatadir.exists():
#     shutil.rmtree(copydatadir)

['.idea', 'data', 'data_copy']


WindowsPath('C:/demo_testfile/renamefile/data_copy')
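`copytree()` also takes an `ignore` callable, and `shutil.ignore_patterns()` builds one from glob-style patterns. A sketch that copies a made-up `data` tree while skipping `*.bk` backup files (like the `.py.bk` files above), with `dirs_exist_ok=True` (Python 3.8+) so an existing destination is allowed:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data"
    (src / "sub").mkdir(parents=True)
    for rel in ["a.py", "a.py.bk", "sub/b.py"]:
        (src / rel).touch()

    dst = Path(tmp) / "data_copy"
    # Skip backup files; copy into dst even if it already exists
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.bk"), dirs_exist_ok=True)
    copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*"))

print(copied)  # ['a.py', 'sub', 'sub/b.py']
```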

<a id='RenameCopying'></a>
### 4. Rename/moving [🔙](#Part2)

##### - `Path.rename(target)`
Renames a file or directory (the `os` equivalent is `os.rename()`). It can fail when the target is on a different filesystem or drive.

##### - `shutil.move(src, dst)`
Moves files/directories, also across filesystems: if a plain rename is not possible, it copies first and then deletes the source.

In [461]:
import os
import shutil
from pathlib import Path

base_path = Path(r"C:\demo_testfile\renamefile")  # Your base directory
data_path = base_path / "data_copyfile22"  # Target directory inside renamefile

data_path.mkdir(exist_ok=True)  # create it if it doesn't exist

for file in os.listdir(base_path):
    # ignore these entries and process the rest
    if file in ['__pycache__', 'data', '.idea']:
        continue
    if file.endswith('.py'):
        full_file_path = base_path / file  # Convert to full path
        shutil.move(str(full_file_path), str(data_path))  # Move the file into data_path

C:\demo_testfile\renamefile\file - with - space-1.py
C:\demo_testfile\renamefile\data_copyfile22


In [467]:
# move the file out of data_copyfile22 and then back again
base_path = Path(r"C:\demo_testfile\renamefile")
data_path = base_path / "data_copyfile22"
file_name = "file - with - space-1.py"

# Move file up one level
shutil.move(str(data_path / file_name), str(base_path))

# Undo move: Move it back into "data_copyfile22"
shutil.move(str(base_path / file_name), str(data_path))

'C:\\demo_testfile\\renamefile\\data_copyfile22\\file - with - space-1.py'
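For a plain rename on the same drive, `Path.rename()` is enough. A sketch in a temporary directory (file names made up) that also shows `Path.replace()`, which overwrites an existing target on every platform; `rename()` raises `FileExistsError` on Windows in that case, while on Unix it silently replaces the target.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    old = folder / "file - with - space-1.py"
    old.write_text("print('hi')")

    # rename() returns the new Path (Python 3.8+); the old path no longer exists
    new = old.rename(folder / "renamed.py")

    # replace() overwrites an existing target on every platform
    other = folder / "other.py"
    other.write_text("old content")
    new.replace(other)

    names = sorted(p.name for p in folder.iterdir())
    content = other.read_text()

print(names, content)  # ['other.py'] print('hi')
```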

<a id='ReadwriteFile'></a>
### 5. Read and write file [🔙](#Part2)

You can use either the pathlib methods `write_text()` / `read_text()` or the built-in `open()` with a `with` statement. If you work with `os.path` (plain strings), `with open(...)` is the natural choice.

#### 5.1 using pathlib 
`write_text()` creates the file if it does not exist and overwrites it if it does; `read_text()` raises `FileNotFoundError` if the file is missing. Both open and close the file for you.

##### - `Path().write_text`: write file
##### - `Path().read_text`: read file

In [486]:
#write and read pathlib
new_file=Path.cwd()/"testFile1.txt" 

#read and write using pathlib
#write file
new_file.write_text("Hello")
#read file
print(new_file.read_text())

Hello
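`write_text()` and `read_text()` use the platform's default encoding unless you pass `encoding=`, so specifying it explicitly avoids surprises on Windows. There are also byte-level twins, `write_bytes()` and `read_bytes()`. A minimal sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "testFile1.txt"

    # write_text returns the number of characters written
    n_chars = f.write_text("Héllo", encoding="utf-8")
    text = f.read_text(encoding="utf-8")

    # Bytes API: "é" takes two bytes in UTF-8, so 5 characters become 6 bytes
    raw = f.read_bytes()
    f.write_bytes(b"\x00\x01")  # overwrite with two raw bytes
    size = f.stat().st_size

print(n_chars, len(raw), size, text == "Héllo")  # 5 6 2 True
```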


#### 5.2 using with built-in

##### - `with open(filename, mode)`: using the built-in `open()` with `with`

> Mode:
>> - `r`: read (default)
>> - `w` for write: Overwrites the file if it exists, creates new if it doesn't.
>> - `a` for append: Opens file for writing, keeps existing content, appends new data.
>> - `x` for Exclusive Create: Fails if file exists, only creates a new file.
>> - `b` for Binary: Used with `rb`, `wb`, etc., for binary files.
>> - `+` for read and write:
>>     - `r+`: Read & write (error if the file doesn't exist; the file is not truncated).
>>     - `w+`: Read & write (truncates/overwrites the file).
>>     - `a+`: Read & append (keeps existing content; writes go to the end).

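A small sketch of two of the modes listed above, run in a temporary directory (the file name `report.txt` is made up): `x` refuses to overwrite an existing file, and `a+` appends but starts positioned at the end, so it must `seek(0)` before reading.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "report.txt"

    with open(f, "x", encoding="utf-8") as fh:  # succeeds: the file did not exist
        fh.write("first run\n")

    try:
        with open(f, "x", encoding="utf-8") as fh:  # fails: the file now exists
            fh.write("second run\n")
        second_failed = False
    except FileExistsError:
        second_failed = True

    with open(f, "a+", encoding="utf-8") as fh:  # append, then read everything back
        fh.write("appended\n")
        fh.seek(0)  # a+ is positioned at the end, so rewind before reading
        lines = fh.read().splitlines()

print(second_failed, lines)  # True ['first run', 'appended']
```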
In [518]:
# write file 
with open(new_file, 'w') as file:
    file.write('This is test file')

In [519]:
#read
#shorter way with read_text
print(new_file.read_text())

with new_file.open() as file:
    print(file.read())
with open(new_file, "r") as f:
    print(f.read())

This is test file
This is test file
This is test file


In [520]:
# Keeps existing content, adds new content at the end.
with open(new_file, 'a') as file:
    file.write('\nThis is test file')
print(new_file.read_text())

This is test file
This is test file


## Part 3 Other Utilities (optional)

### OS Path Separators

When using `os.path`, you don't need to worry about platform-specific path separators: it automatically uses the appropriate separator for the current operating system. However, this can cause issues when handling paths that come from a different OS.

- Why should you use this:

> **Handling paths from a different OS**: If you are **developing on Linux** and need to **handle Windows paths**, or working on Windows and need to handle Linux paths, use the OS-specific module explicitly.
>> - **Linux handling Windows paths**: On Linux, `os.path` is **`posixpath`**, which does not treat backslashes `\` as separators or drive letters like `C:` as drives.
>> - **Windows handling Linux paths**: On Windows, `os.path` is **`ntpath`**, which builds and normalizes paths with backslashes, so a POSIX path such as `/home/test` comes back as `\home\test` after `os.path.normpath()`.
>> - **Solution**: To handle paths for another operating system, explicitly use `ntpath` (for Windows-style paths) or `posixpath` (for POSIX-style paths).

##### - `ntpath.sep`: 
Represents the default path separator (`\`).
##### - `ntpath.altsep`: 
Represents an alternative path separator (`/`), which is also supported by Windows.

In [559]:
# compare os.path with ntpath (on Windows, os.path is ntpath, so the values match)
import ntpath 
print("os.path.sep:", os.path.sep)
print("ntpath.sep", ntpath.sep)
print("os.path.altsep:", os.path.altsep)
print("ntpath.altsep", ntpath.altsep)

os.path.sep: \
ntpath.sep \
os.path.altsep: /
ntpath.altsep /


In [561]:
import ntpath
import posixpath
file = r"C:\demo_testfile\ex1.py"  # raw string: single backslashes are enough
directory = r"C:\demo_testfile\notes"
# Gets the filename from the path.
print(ntpath.basename(file))
# Gets the directory name from the path.
print(ntpath.dirname(directory))
# Splits the path into head and tail.
head, filename = ntpath.split(file)
print(head, ' ', filename)
# Splits the drive and the rest of the path
print(ntpath.splitdrive(head))

ex1.py
C:\demo_testfile
C:\demo_testfile   ex1.py
('C:', '\\demo_testfile')


##### - `ntpath.join`
Join path using ntpath

In [565]:
print(ntpath.join(*["c:\\", "user", "documents", "file.txt"])) 
print(ntpath.join('c:\\', 'dir', 'subdir', 'filename.ext'))
print(ntpath.join('c:', 'dir', 'subdir', 'filename.ext'))#c:dir\subdir\filename.ext
print(ntpath.join('c:', ntpath.sep, 'dir', 'subdir', 'filename.ext'))# c:\dir\subdir\filename.ext

c:\user\documents\file.txt
c:\dir\subdir\filename.ext
c:dir\subdir\filename.ext
c:\dir\subdir\filename.ext


In [563]:
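# altsep.join is plain str.join with "/"; unlike ntpath.join it does not handle drives or extra separators
print(ntpath.altsep.join(['dir', 'subdir', 'filename.ext']))#dir/subdir/filename.ext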
print(ntpath.altsep.join(['c:', 'subdir', 'filename.ext']))#c:/subdir/filename.ext
print(posixpath.join('home','test'))

dir/subdir/filename.ext
c:/subdir/filename.ext
home/test
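pathlib has the same idea built in: `PureWindowsPath` and `PurePosixPath` parse paths with the rules of a specific OS regardless of where the code runs, and they never touch the filesystem. A short sketch reusing the example paths from this note:

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\demo_testfile\notes\ex1.py")
print(win.drive, win.name, win.parent)  # C: ex1.py C:\demo_testfile\notes
print(win.as_posix())                   # C:/demo_testfile/notes/ex1.py

posix = PurePosixPath("/home/test") / "ex1.py"
print(posix, posix.parts)               # /home/test/ex1.py ('/', 'home', 'test', 'ex1.py')

# Windows paths accept both separators; POSIX paths treat "\" as an ordinary character
print(PureWindowsPath("dir/subdir/file.ext"))  # dir\subdir\file.ext
print(PurePosixPath(r"dir\file.ext").name)     # dir\file.ext
```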


### Comparing Files and Directories

#####  - `filecmp.cmp()`:
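In shallow mode (the default), it first compares the `os.stat()` signatures (file type, size, modification time). If the signatures match, the files are considered identical without reading them; if the sizes differ, the result is `False`; otherwise the contents are compared. This is fast, but two files with the same size and modification time are reported as equal even if their contents differ.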

In [548]:
from pathlib import Path
import filecmp

path = Path(r'C:\demo_testfile')
file1 = path / 'ex1.py'
file2 = path / 'ex2.py'
# ex1.py and ex2.py have identical content; a hash such as MD5 can also confirm this
print(filecmp.cmp(file1, file2))


True
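As the comment above suggests, a hash gives a content-based check that ignores timestamps. A minimal sketch; `hash_file` is a hypothetical helper that reads the file in chunks, and it defaults to SHA-256 because MD5 may be unavailable on FIPS-restricted Python builds (`algorithm="md5"` works elsewhere). The files are created in a temporary directory for the demo.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "ex1.py"
    b = Path(tmp) / "ex2.py"
    c = Path(tmp) / "other.py"
    a.write_text("print('hello')\n")
    b.write_text("print('hello')\n")
    c.write_text("print('bye')\n")
    same = hash_file(a) == hash_file(b)
    different = hash_file(a) != hash_file(c)

print(same, different)  # True True
```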


In [554]:
#current working directory
print(Path.cwd())
print(filecmp.cmp('testFile1.txt', 'test.txt'))  # relative names are resolved against the cwd

C:\Users\test\python_test
False


##### - `filecmp.cmp(..., shallow=False)`: 
Compares the actual contents of both files (read in chunks) instead of trusting the `os.stat()` signature. Slower, but it detects any difference in content.

In [556]:
print(Path.cwd())
print(filecmp.cmp('testFile1.txt', 'test.txt', shallow=False))  # relative names are resolved against the cwd

C:\Users\test\python_test
False
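The section title also mentions directories: `filecmp.dircmp` compares two folders. A sketch modeled on the `test` / `test -copy` folders, but built in a temporary directory with made-up contents; `left_only`, `same_files`, and `diff_files` are attributes of the `dircmp` object, and common files are compared shallowly, like the default `filecmp.cmp`.

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    left = Path(tmp) / "test"
    right = Path(tmp) / "test -copy"
    left.mkdir()
    right.mkdir()
    (left / "1.txt").write_text("same")
    (right / "1.txt").write_text("same")
    (left / "2.txt").write_text("left version")
    (right / "2.txt").write_text("right version!")  # different size, so it differs
    (left / "only_left.txt").touch()

    dc = filecmp.dircmp(left, right)
    result = (sorted(dc.left_only), sorted(dc.same_files), sorted(dc.diff_files))

print(result)  # (['only_left.txt'], ['1.txt'], ['2.txt'])
```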


## Summary 

### List Directory

In [None]:
# list the subdirectories of the current working directory
for item in Path().iterdir():
    if item.is_dir():
        print(item)

# list all content (non-recursive)
for item in Path().glob("*"):
    print(item)

# list content ending with .txt
for item in Path().glob("*.txt"):
    print(item)

In [535]:
# List the directories in the current working directory
directories = [entry.name for entry in Path().iterdir() if entry.is_dir()]
directories

['.ipynb_checkpoints',
 'data123',
 'DataType',
 'done_toremove',
 'elog_example',
 'hahow_scrap_pandas_mathplot',
 'newtest',
 'Pandas',
 'test123',
 'testdir1']

In [534]:
# list subdirectories recursively
subdirs = [subdir for subdir in Path().rglob('*') if subdir.is_dir()]
if not subdirs:
    print('empty subdirectory')
else:
    for subdir in subdirs:
        print(subdir)

.ipynb_checkpoints
data123
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
newtest
Pandas
test123
testdir1
DataType\.ipynb_checkpoints
DataType\reports
DataType\reports\Reports
DataType\reports\Reports\REPORTS-in_HERE
done_toremove\.ipynb_checkpoints
elog_example\.ipynb_checkpoints
elog_example\2024_flex
elog_example\excelconvert
elog_example\review
elog_example\2024_flex\.ipynb_checkpoints
elog_example\excelconvert\.ipynb_checkpoints
elog_example\review\.ipynb_checkpoints
newtest\subdir
Pandas\.ipynb_checkpoints
Pandas\notebooks
Pandas\notebooks\.ipynb_checkpoints
Pandas\notebooks\data
Pandas\notebooks\notes
Pandas\notebooks\notes\.ipynb_checkpoints
test123\subdir
testdir1\subdir
testdir1\subdir\emptydir


In [628]:
SKIP_DIRS = ["notes", "data", "renamefile"]
directory = Path(r'c:\demo_testfile')  # raw string avoids the invalid "\d" escape warning
# With a for loop (non-recursive: top-level entries only)
for item in directory.glob("*"):
    if set(item.parts).isdisjoint(SKIP_DIRS):
        print(item)

c:\demo_testfile\BBB.txt
c:\demo_testfile\datanew
c:\demo_testfile\ex1.py
c:\demo_testfile\ex2.py
c:\demo_testfile\hell[o].txt
c:\demo_testfile\index.html
c:\demo_testfile\page.html
c:\demo_testfile\read.txt
c:\demo_testfile\test
c:\demo_testfile\test -copy
c:\demo_testfile\test.txt
c:\demo_testfile\testFile1.txt


## Reference
- https://builtin.com/software-engineering-perspectives/python-pathlib