# Understanding os.path module for path handling

Description: This note covers path handling with the `os.path` module, including:
- building and resolving specific paths
- accessing a directory and listing its contents
- checking whether a path is a file or a directory
- file management such as rename, remove, and create

This note focuses on `os` and `os.path`, which do not cover everything. A separate note covers `pathlib`, a newer object-oriented alternative that I find much easier to use. `os.path` and `pathlib` can be used together: some cases suit `os.path`, others suit `pathlib.Path`. Use whichever you prefer. There are many more advanced techniques that these notes do not cover; I have picked only the ones I think are useful in real-world work.

**Structure of Path**

The examples in this note use the directory `C:\demo_testfile`; the hierarchy of its directories and files appears in the cell outputs throughout the note.

<a id='toc'></a>
## TOC (Table of Contents):
* [import module](#library)
* [Part 1 Using os.path](#Part1)
    - [1. Show current directory path](#1.cwp)
    - [2. Path Resolution](#2.PathResolution)
    - [3. Access specific path](#3.Access-specific-path)
    - [4. Lists entries in a directory](#4.listDirectory)
    - [5. Checking file/directory](#checkFileDir)
    - [6. File extension](#6.FileExt)
    - [7. Matching Pattern Files and recursive](#7.FileMatching)
* [Part 2 File and Directory Operations](#part2)
    - [1. Create directory](#part2_createdir)
    - [2. Remove Directory and File](#part2_RemoveDirectoryFile)
    - [3. Copying](#part2_Copying)
    - [4. Rename](#part4-Rename)
    - [5. Read and write file](#part2-ReadWriteFile)
* [Summary example](#Summary)

<a id='library'></a>
import module

In [4]:
# import libraries
import os, glob

# check Python version
from platform import python_version
print(python_version())

3.10.9


<a id='Part1'></a>
## Part 1 Using os.path [🔝](#toc)

- [1.Show current directory path](#1.cwp)
    - [1.1 get current directory path](#1.1-get-current-working-directory-path)
        - `os.getcwd()`: get current working directory
    - [1.2 get default path](#1.2-get-default-path)
        - `os.environ.get()`: read an environment variable, such as the home directory
- [2. Path Resolution](#2.PathResolution)
    - [2.1 Get absolute Path](#2.1-Get-absolute-Path)
        - `os.path.abspath`: get absolute path
- [3. Access specific path](#3.Access-specific-path)
    - [3.1 Access specific path](#3.1-Access-specific-path)
        - `os.chdir(path)`: change the current working directory
    - [3.2 Parent Directory Access](#3.2-Parent-Directory-Access)
        - `os.path.dirname(path)`: get parent Directory
    - [3.3 join path](#3.3-join-path)
        - `os.path.join(path1, path2)`: join path with file
- [4. Lists entries in a directory](#4.listDirectory)
    - `os.listdir(path)`: list directory
- [5. Checking file/directory](#checkFileDir)
    - `os.path.exists(path)`: check whether a file or directory exists
    - `os.path.isfile(path)`: check whether the path is an existing file
    - `os.path.isdir(path)`: check whether the path is an existing directory
    - `os.path.isabs(path)`: check whether the path is absolute
- [6. file extension](#6.FileExt)
    - [6.1 Attribute](#6.1-Attribute)
        - `os.path.basename(path)`
        - `os.path.splitext(path)`
        - `os.path.split(path)`
        - `split(os.sep)`
        - `os.path.splitdrive(path)`
    - [6.2 startswith() and endswith()](#statswith_endswith)
        - `startswith()`: file string starts with
        - `endswith()`: file string ends with
- [7. Matching Pattern Files and recursive](#7.FileMatching)
    - `glob.glob()`: search by file pattern
    - `glob.glob(recursive=True)`: recursive file matching
    - `os.walk()`: recursive searching

<a id='1.cwp'></a>
### 1. Show current working directory path [🔙](#Part1)

#### 1.1 get current working directory path

#### -  `os.getcwd()`: Returns the current working directory as a string.

In [6]:
print(os.getcwd())

C:\Users\test\python_test\filehandling


#### 1.2 get default path

##### -  `os.environ.get('USERPROFILE')`: Gets the home directory 
- **Windows**: use `'USERPROFILE'` for the home directory
- **Linux**: use `'HOME'` for the home directory

In [9]:
print(os.environ.get('USERPROFILE'))

C:\Users\test
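
Reading `USERPROFILE` directly only works on Windows. As a hedged alternative, `os.path.expanduser("~")` resolves the home directory on Windows as well as Linux/macOS. The variable names and the `Downloads` folder below are only illustrative.

```python
import os

# expanduser("~") consults HOME on Linux/macOS and USERPROFILE on Windows,
# so the same line works on both.
home = os.path.expanduser("~")
print(home)

# Build a path under the home directory (the folder name is just an example).
downloads = os.path.join(home, "Downloads")
print(downloads)
```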


<a id='2.PathResolution'></a>
### 2. Path Resolution [🔙](#Part1)

#### 2.1 Get absolute Path

#### - `os.path.abspath('path')`: Returns the absolute path as a string. 

In [7]:
# Absolute path of the current working directory
print(os.path.abspath('.'))  # '.' means the current directory
print(os.path.abspath('my_file.txt'))  # full path of a specific file (it does not need to exist)
print(os.path.abspath(r'C:\demo_testfile'))  # already absolute, so it is returned as-is

C:\Users\test\python_test\filehandling
C:\Users\test\python_test\filehandling\my_file.txt
C:\demo_testfile


In [24]:
#Check path for current working directory 
print(os.path.abspath(os.getcwd()))

C:\Users\test\python_test


In [25]:
# Parent directory of a specific file
print(os.path.dirname(os.path.abspath("my_file.txt")))  # the directory that contains my_file.txt (here the CWD)

C:\Users\test\python_test
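
To make the behaviour of `abspath` concrete, here is a minimal sketch: it joins a relative path onto the current working directory and normalizes the result, without checking whether the file exists. The file and folder names are made up for illustration.

```python
import os

# abspath() joins a relative path onto the current working directory
# and then normalizes it (collapsing "." and "..").
relative = os.path.join("data", "..", "my_file.txt")
absolute = os.path.abspath(relative)
print(absolute)

# The same result built by hand:
manual = os.path.normpath(os.path.join(os.getcwd(), relative))
print(absolute == manual)
```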


<a id='3.Access-specific-path'></a>
### 3. Access specific path [🔙](#Part1)

Understanding the different slashes used to write a path:

- backslash (`\`): the Windows separator; use a raw string (`r"C:\demo"`) so the backslashes are not treated as escape sequences.
- double backslash (`\\`): escapes each backslash in a normal string.
- forward slash (`/`): the separator on Linux, and also accepted by Python on Windows, including by pathlib.
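
The three spellings can be compared directly. This is a small sketch that uses `ntpath` (the Windows flavour of `os.path`, importable on any OS) so the result does not depend on where it runs; the path itself is only an example.

```python
import ntpath  # the Windows flavour of os.path, importable on any OS

raw = r"C:\demo_testfile\example.txt"        # raw string, single backslashes
escaped = "C:\\demo_testfile\\example.txt"   # normal string, escaped backslashes
forward = "C:/demo_testfile/example.txt"     # forward slashes

print(raw == escaped)                        # both spell the same string
print(ntpath.normpath(forward) == raw)       # normpath converts / to \ under Windows rules
```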


#### 3.1 Access specific path

In [9]:
base_path = r"C:\demo_testfile"
file_path = os.path.join(base_path, "example.txt")
file_path

'C:\\demo_testfile\\example.txt'

#### - `os.chdir(path)`: change the current working directory

In [30]:
# change directory
print(os.getcwd())  # current working directory
currentdir = os.getcwd()  # save it so we can return later
print(currentdir)
os.chdir('c:\\')  # change the working directory
print(os.getcwd())  # now c:\

os.chdir(currentdir)  # return to the original path
print(os.getcwd())

C:\Users\test\python_test
C:\Users\test\python_test
c:\
C:\Users\test\python_test


#### 3.2 Parent Directory Access

#### - `os.path.dirname(path)`: get the parent directory
Returns everything in the path before the last component, that is, the full path of the parent directory.

In [32]:
#get parent directory
print(os.path.dirname(r"C:\demo_testfile\test")) #get the parent of directory

C:\demo_testfile


In [10]:
# get the parent directory of a file
fullpath = r'C:\demo_testfile\test\heello.txt'
print(f'Path: {fullpath}')
print(f'parent dir: {os.path.dirname(fullpath)}')  # the directory that contains the file

Path: C:\demo_testfile\test\heello.txt
parent dir: C:\demo_testfile\test


#### 3.3 join path
Combines multiple path components into one.
 > syntax: `os.path.join(path, *paths)`

#### - `os.path.join(path1, path2)`

Joins the components into a single string, using the path separator of the current OS.

In [35]:
print(os.path.join(r'C:\demo_testfile', 'file.txt'))
print(os.path.join(os.getcwd(),'images'))

C:\demo_testfile\file.txt
C:\Users\test\python_test\images
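
Two behaviours of `join` are easy to miss, so here is a small sketch of them. It uses `posixpath` (the Linux flavour of `os.path`) for the last two calls so the results are the same on any OS; all paths are illustrative.

```python
import os
import posixpath  # POSIX flavour of os.path, so the results below do not depend on the OS

# join() accepts any number of components and inserts the separator for you.
relative = os.path.join("demo_testfile", "data", "file.txt")
print(relative)

# If a later component is absolute, everything before it is discarded.
override = posixpath.join("/home/test", "/etc", "hosts")
print(override)   # /etc/hosts

# An empty last component leaves a trailing separator.
trailing = posixpath.join("/home/test", "")
print(trailing)   # /home/test/
```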


<a id='4.listDirectory'></a>
### 4. Lists entries in a directory [🔙](#Part1)
This is like the shell commands `ls -al` (Linux) or `dir` (Windows), which list the files and directories in the current working directory.

#### - `os.listdir(path)`
Lists all files and directories in the specified directory. Without an argument, it lists the current working directory.

In [13]:
print(os.listdir())  # list the current working directory
print(os.listdir('C:\\tmp'))  # list a specific directory

['.ipynb_checkpoints', 'File Pattern Matching with Glob and Fnmatch.ipynb', 'iterate_filter_file.ipynb', 'Juypter Notebook.ipynb', 'os.path.ipynb', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib.ipynb']
['git', 'powerlevel10k', 'powerlevel10k_full', 'takuya.omp.json']


In [40]:
#Iterate current directory
for p in os.listdir(): 
    print(p)

.ipynb_checkpoints
Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
iterate_filter_file.ipynb
jupterNote.ipynb
numpy.ipynb
os.path_reorganize.ipynb
Pandas
Pathlib Basics(orgainize_final).ipynb
Pathlib_reorganize.ipynb
tmp.ipynb
Understanding File Pattern Matching with Glob and Fnmatch.ipynb


In [41]:
# skip unwanted entries and print the rest
for file in os.listdir():
    # skip the __pycache__ folder and any .txt file
    if file in ['__pycache__'] or file.endswith('.txt'):
        continue
    print(file)

.ipynb_checkpoints
Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb
DataType
done_toremove
elog_example
hahow_scrap_pandas_mathplot
iterate_filter_file.ipynb
jupterNote.ipynb
numpy.ipynb
os.path_reorganize.ipynb
Pandas
Pathlib Basics(orgainize_final).ipynb
Pathlib_reorganize.ipynb
tmp.ipynb
Understanding File Pattern Matching with Glob and Fnmatch.ipynb


List only the directories in a file's parent path

In [12]:
fullpath = r'C:\demo_testfile\heello.txt'
directory = os.path.dirname(fullpath)  # get the parent directory of the file
print(directory)  # C:\demo_testfile
all_entries = os.listdir(directory)
# keep only the entries that are directories
directories = [entry for entry in all_entries if os.path.isdir(os.path.join(directory, entry))]
directories

C:\demo_testfile


['data', 'datanew', 'notes', 'renamefile', 'test', 'test -copy']

<a id='checkFileDir'></a>
### 5. Checking file/directory [🔙](#Part1)

In [51]:
filename='index.html'
f=os.path.abspath(filename)
print(f'Path: {f}')

print(f"check file or directory exist: {os.path.exists(f)}") 
print(f"checking is it file: {os.path.isfile(f)}") 
print(f"checking is it directory: {os.path.isdir(f)}") 
print(f"checking absolute path: {os.path.isabs(f)}") 

Path: C:\Users\test\python_test\index.html
check file or directory exist: False
checking is it file: False
checking is it directory: False
checking absolute path: True


#### - `os.path.exists(path)`:
Returns True if the path (file or directory) exists and is accessible; otherwise returns False.

In [55]:
path2 = 'C:\\demo_testfile' 
if os.path.exists(path2):
    print("File exists")
else:
    print("File does not exist")

File exists


#### - `os.path.isfile(path)`
Returns True if the path is an existing file, and False for a directory or a missing path. `os.listdir()` returns bare names, so when iterating over a specific directory you need `os.path.join` to combine the directory and the file name before checking.

In [84]:
# list only the files in a specific path
path2 = 'C:\\demo_testfile'
for file in os.listdir(path2):
    full_path = os.path.join(path2, file)  # combine path2 + file
    # keep only regular files
    if os.path.isfile(full_path):
        print(full_path)

C:\demo_testfile\AAA.txt
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py
C:\demo_testfile\hell[o].txt
C:\demo_testfile\index.html
C:\demo_testfile\page.html
C:\demo_testfile\read.txt
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt


In [95]:
# list comprehension
files = [
    os.path.join(path2, f)
    for f in os.listdir(path2)
    # keep only regular files
    if os.path.isfile(os.path.join(path2, f))
]
print("\n".join(files))

C:\demo_testfile\AAA.txt
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py
C:\demo_testfile\hell[o].txt
C:\demo_testfile\index.html
C:\demo_testfile\page.html
C:\demo_testfile\read.txt
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt


Why the directory must be joined with the file name: `os.listdir()` returns bare names without the directory.

In [91]:
path2 = 'C:\\demo_testfile'
for file in os.listdir(path2):
    print(f'filename: {file} ,and path: {path2}')
    #full_path = os.path.join(path2, file) # Combine path2 + file
    #print(full_path)
    #print(os.path.isfile(full_path))  # Checks inside 'C:\demo_testfile'

filename: AAA.txt ,and path: C:\demo_testfile
filename: data ,and path: C:\demo_testfile
filename: datanew ,and path: C:\demo_testfile
filename: ex1.py ,and path: C:\demo_testfile
filename: ex2.py ,and path: C:\demo_testfile
filename: hell[o].txt ,and path: C:\demo_testfile
filename: index.html ,and path: C:\demo_testfile
filename: notes ,and path: C:\demo_testfile
filename: page.html ,and path: C:\demo_testfile
filename: read.txt ,and path: C:\demo_testfile
filename: renamefile ,and path: C:\demo_testfile
filename: test ,and path: C:\demo_testfile
filename: test -copy ,and path: C:\demo_testfile
filename: test.txt ,and path: C:\demo_testfile
filename: testFile1.txt ,and path: C:\demo_testfile


In [77]:
# Incorrect: checking the bare file name without joining it to path2
print("Current Working Directory:", os.getcwd())  # show where Python is looking

for file in os.listdir(path2):
    print(f"Checking: {file} -> Exists in CWD? {os.path.isfile(file)}")
    # `file` is just a name, so Python resolves it against os.getcwd() -> CWD\file
    # os.path.join(path2, file) gives an explicit location -> path2\file

Current Working Directory: C:\Users\test\python_test
Checking: AAA.txt -> Exists in CWD? False
Checking: data -> Exists in CWD? False
Checking: datanew -> Exists in CWD? False
Checking: ex1.py -> Exists in CWD? False
Checking: ex2.py -> Exists in CWD? False
Checking: hell[o].txt -> Exists in CWD? False
Checking: index.html -> Exists in CWD? False
Checking: notes -> Exists in CWD? False
Checking: page.html -> Exists in CWD? False
Checking: read.txt -> Exists in CWD? False
Checking: renamefile -> Exists in CWD? False
Checking: test -> Exists in CWD? False
Checking: test -copy -> Exists in CWD? False
Checking: test.txt -> Exists in CWD? False
Checking: testFile1.txt -> Exists in CWD? False


#### - `os.path.isdir(path)`
Returns True if the path is an existing directory, and False for a file or a missing path.
    

In [96]:
# list only the directories in a specific path
path2 = 'C:\\demo_testfile'
for file in os.listdir(path2):
    full_path = os.path.join(path2, file)  # combine path2 + file
    # keep only directories
    if os.path.isdir(full_path):
        print(full_path)

C:\demo_testfile\data
C:\demo_testfile\datanew
C:\demo_testfile\notes
C:\demo_testfile\renamefile
C:\demo_testfile\test
C:\demo_testfile\test -copy


#### - `os.path.isabs(path)`
Checks if the path is an absolute path.

In [98]:
path2 = 'C:\\demo_testfile' 
if os.path.isabs(path2):
    print("is absolute path")
else:
    print("not absolute path")

is absolute path
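
What counts as "absolute" depends on the platform's rules. A short sketch, using `ntpath` and `posixpath` explicitly so both sets of rules can be tried on any machine (the paths are examples only):

```python
import ntpath     # Windows path rules
import posixpath  # Linux/macOS path rules

print(ntpath.isabs(r"C:\demo_testfile"))     # True: drive letter plus root
print(ntpath.isabs(r"demo_testfile\data"))   # False: relative to the CWD
print(posixpath.isabs("/home/test"))         # True: starts at the root
print(posixpath.isabs("test/file.txt"))      # False
```

Note that `isabs` only inspects the string; it does not check whether the path exists.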


In [99]:
path2 = 'C:\\demo_testfile'
for file in os.listdir(path2):
    full_path = os.path.join(path2, file)  # combine path2 + file
    # every joined path is absolute because path2 itself is absolute
    if os.path.isabs(full_path):
        print(full_path)

C:\demo_testfile\AAA.txt
C:\demo_testfile\data
C:\demo_testfile\datanew
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py
C:\demo_testfile\hell[o].txt
C:\demo_testfile\index.html
C:\demo_testfile\notes
C:\demo_testfile\page.html
C:\demo_testfile\read.txt
C:\demo_testfile\renamefile
C:\demo_testfile\test
C:\demo_testfile\test -copy
C:\demo_testfile\test.txt
C:\demo_testfile\testFile1.txt


<a id='6.FileExt'></a>
### 6. File extension [🔙](#Part1)

#### 6.1 Attribute
You can use these functions to split a path into its parts, such as the file name and its extension:
- `os.path.basename(path)`
- `os.path.splitext(path)`
- `os.path.split(path)`
- `os.path.splitdrive(path)`

##### - `os.path.basename(path)`
Returns the last component of the path: the file name including its extension.

In [129]:
my_file=r"C:\demo_testfile\file_1.txt"
#print(f"basename with directory: {os.path.basename(my_dir)}") 
print(f"basename with filename: {os.path.basename(my_file)}") 


basename with filename: file_1.txt


##### - `os.path.splitext(path)`
`splitext` splits a path into a `(root, extension)` pair; the extension includes the leading dot.

In [131]:
# using splitext to split filename and it's extension
my_file=r"file_1.txt"
my_filewithpath=r"C:\demo_testfile\file_1.txt"
print(f'using splitext with file only: {os.path.splitext(my_file)}') 
print(f'using splitext with path: {os.path.splitext(my_filewithpath)}') #not common use case

using splitext with file only: ('file_1', '.txt')
using splitext with path: ('C:\\demo_testfile\\file_1', '.txt')


In [150]:
#get directory name of the file 
my_file2 = '/test/file_1.txt'
print(f'file path=> {my_file2}')
print("parent dir:", os.path.dirname(my_file2))
subdirname = os.path.basename(os.path.dirname(my_file2))
print("basename of parent dir: " , subdirname)

#split directory and file 
filename=os.path.split(my_file2) 
print(filename)
dirname, basename = os.path.split(my_file2)
print(f'parent dir=>{dirname}, and filename=> {basename}')

file path=> /test/file_1.txt
parent dir: /test
basename of parent dir:  test
('/test', 'file_1.txt')
parent dir=>/test, and filename=> file_1.txt


In [138]:
# iterate through the working directory and split each file name from its extension
for file in os.listdir():
    if os.path.isfile(file):
        name, ext=os.path.splitext(file)
        print(f"file name: {name} , ext file: {ext}")

file name: Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified) , ext file: .ipynb
file name: iterate_filter_file , ext file: .ipynb
file name: jupterNote , ext file: .ipynb
file name: numpy , ext file: .ipynb
file name: os.path_reorganize , ext file: .ipynb
file name: Pathlib Basics(orgainize_final) , ext file: .ipynb
file name: Pathlib_reorganize , ext file: .ipynb
file name: tmp , ext file: .ipynb
file name: Understanding File Pattern Matching with Glob and Fnmatch , ext file: .ipynb
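
A few edge cases of `splitext` are worth knowing: only the last extension is split off, and a leading dot does not count as an extension. The sketch below also includes `change_ext`, a hypothetical helper (not part of `os.path`) that swaps an extension.

```python
import os

# Only the last suffix is split off; dotfiles and extension-less names get ''.
samples = ["report.txt", "archive.tar.gz", ".gitignore", "README"]
results = {name: os.path.splitext(name) for name in samples}
for name, pair in results.items():
    print(name, "->", pair)

def change_ext(path, new_ext):
    """Hypothetical helper: swap the extension of `path` for `new_ext`."""
    root, _old_ext = os.path.splitext(path)
    return root + new_ext

print(change_ext("notes/01-intro.ipynb", ".py"))   # notes/01-intro.py
```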


##### - `os.path.split(path)`
`os.path.split()` splits off only the last component and returns a `(head, tail)` pair. For a file path, head is the **parent path** and tail is the **file name** (not often used in file handling). For example, `c:\test.txt` splits into `c:\` and `test.txt`.

In [132]:
my_file=r"file_1.txt"
my_filewithpath=r"C:\demo_testfile\file_1.txt"
print(f'using split with file only: {os.path.split(my_file)}') #not common case
print(f'using split with path:: {os.path.split(my_filewithpath)}') 

using split with file only: ('', 'file_1.txt')
using split with path:: ('C:\\demo_testfile', 'file_1.txt')


If you want to split the file name from its extension with the string method `split`, you can use this approach

In [137]:
# split the file name at the first dot
filename = os.path.basename(my_file).split('.', 1)  # maxsplit=1: split only at the first dot
print(f"split file with extension :{filename}")
print(f"file name :{filename[0]}, and ext: {filename[1]}")

split file with extension :['file_1', 'txt']
file name :file_1, and ext: txt


##### - `split(os.sep)`
`os.path.split()` splits off only the last component, while the string method `split(os.sep)` splits the path into all of its parts at the OS separator (`\` on Windows).

In [173]:
# use os.sep to split on the path separator
pathwindow='c:\\test\\subtest'
print(f'os.sep for window: {pathwindow.split(os.sep)}')

os.sep for window: ['c:', 'test', 'subtest']
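
Splitting on the separator only works cleanly when the path uses one consistent separator. A hedged sketch: normalizing first handles mixed or trailing slashes. It uses `ntpath` (Windows rules) so the output matches the example above on any OS; the path is illustrative.

```python
import ntpath  # Windows path rules, importable on any OS

mixed = "c:/test\\subtest/"        # mixed separators and a trailing slash
clean = ntpath.normpath(mixed)     # c:\test\subtest
parts = clean.split(ntpath.sep)    # split at every backslash
print(parts)                       # ['c:', 'test', 'subtest']
```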


##### - `os.path.splitdrive(path)`
Splits the path into two parts: the drive and the rest of the path.

In [168]:
path='c:\\test\\subtest'
print(f'path: {path}')
drive= os.path.splitdrive(path)
print(f'splitdrive: {drive}')
print("Drive of path '%s:'" %path, drive[0]) 
print("Tail of path '%s:'" %path, drive[1], "\n") 

path: c:\test\subtest
splitdrive: ('c:', '\\test\\subtest')
Drive of path 'c:\test\subtest:' c:
Tail of path 'c:\test\subtest:' \test\subtest 



<a id='statswith_endswith'></a>
#### 6.2 startswith() and endswith(): 
`startswith()` and `endswith()` are string methods that are commonly used with os.path when filtering files by prefix or extension.

> - startswith(): the name starts with a specific string (prefix)
> - endswith(): the name ends with a specific string (suffix)

##### - `startswith('prefix')`:
Checks whether a file or directory name starts with a specific prefix; returns True if it does, else False


In [117]:
directory = 'testfile' 
file='test.py'
# check whether each name starts with 'test'
print(directory.startswith('test'))
print(file.startswith('test'))

True
True


##### - `endswith('suffix')`:
Checks whether a file or directory name ends with a specific suffix, such as an extension; returns True if it does, else False

In [119]:
directory = 'testfile' 
file='test.py'
# check the ending of each name
print(directory.endswith('st'))
print(file.endswith('.py'))

False
True


In [122]:
# os.listdir() returns bare names; with no argument it lists the current working directory
path2 = 'C:\\demo_testfile' 
print(os.listdir())

['.ipynb_checkpoints', 'Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb', 'DataType', 'done_toremove', 'elog_example', 'hahow_scrap_pandas_mathplot', 'iterate_filter_file.ipynb', 'jupterNote.ipynb', 'numpy.ipynb', 'os.path_reorganize.ipynb', 'Pandas', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib_reorganize.ipynb', 'tmp.ipynb', 'Understanding File Pattern Matching with Glob and Fnmatch.ipynb']


In [128]:
#filter python file on specific path
for item in os.listdir(path2):
    if item.endswith('.py'):
        print(item)

ex1.py
ex2.py


In [127]:
path2 = 'C:\\demo_testfile'
# keep only regular files with a .py extension
for item in os.listdir(path2):
    full_path = os.path.join(path2, item)
    if os.path.isfile(full_path) and item.endswith(".py"):
        print(full_path)

C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py


In [126]:
# list comprehension
path3 = 'C:\\demo_testfile'
files = [
    os.path.join(path3, f)
    for f in os.listdir(path3)
    # add `os.path.isfile(os.path.join(path3, f)) and` to skip directories whose names end in .py
    if f.endswith(".py")
]
print(files)
print("\n".join(files))

['C:\\demo_testfile\\ex1.py', 'C:\\demo_testfile\\ex2.py']
C:\demo_testfile\ex1.py
C:\demo_testfile\ex2.py


<a id='7.FileMatching'></a>
### 7. Matching Pattern Files and recursive [🔙](#Part1)

In [188]:
# import glob module
import glob 

##### - `glob.glob()`: search file pattern 
Returns a list of paths that match a pattern. Use `*` to match everything or `*.ext` to match one extension, which is useful for filtering by file extension.

In [187]:
# print all content
print(glob.glob("*"))
print()
# filter .ipynb files
print(glob.glob("*.ipynb"))

['Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb', 'DataType', 'done_toremove', 'elog_example', 'hahow_scrap_pandas_mathplot', 'iterate_filter_file.ipynb', 'jupterNote.ipynb', 'numpy.ipynb', 'os.path_reorganize.ipynb', 'Pandas', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib_reorganize.ipynb', 'tmp.ipynb', 'Understanding File Pattern Matching with Glob and Fnmatch.ipynb']

['Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb', 'iterate_filter_file.ipynb', 'jupterNote.ipynb', 'numpy.ipynb', 'os.path_reorganize.ipynb', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib_reorganize.ipynb', 'tmp.ipynb', 'Understanding File Pattern Matching with Glob and Fnmatch.ipynb']


In [197]:
#search specific path 
print(glob.glob("C:\\demo_testfile\\*txt"))

['C:\\demo_testfile\\AAA.txt', 'C:\\demo_testfile\\hell[o].txt', 'C:\\demo_testfile\\read.txt', 'C:\\demo_testfile\\test.txt', 'C:\\demo_testfile\\testFile1.txt']


##### - `glob.glob('*/**', recursive=True)`: Recursive file matching
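With `recursive=True`, a `**` that forms a whole path component matches any number of nested sub-directories. When `**` is only part of a component, as in `**.txt`, it behaves like a single `*`, so `*/**.txt` matches `.txt` files exactly one level down.

In [203]:
# search the first-level sub-directories of a specific path for .txt files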
glob.glob('C:\\demo_testfile\\*/**.txt', recursive=True) 


['C:\\demo_testfile\\data\\copy2_AAA.txt',
 'C:\\demo_testfile\\data\\copy2_hell[o].txt',
 'C:\\demo_testfile\\data\\copy_AAA.txt',
 'C:\\demo_testfile\\data\\copy_hell[o].txt',
 'C:\\demo_testfile\\test\\1.txt',
 'C:\\demo_testfile\\test\\2.txt',
 'C:\\demo_testfile\\test -copy\\1.txt',
 'C:\\demo_testfile\\test -copy\\2.txt']
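
To see the difference between `**` as a whole path component and `**` inside a file name, here is a minimal sketch on a throwaway directory tree. The file and folder names are made up; `tempfile` is used so nothing real is touched.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small tree: root/a.txt, root/sub/b.txt, root/sub/deep/c.txt
    os.makedirs(os.path.join(root, "sub", "deep"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "deep", "c.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("demo")

    # "**" as a whole path component + recursive=True descends to any depth.
    deep = glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)
    # "**.txt" is not a whole component, so it behaves like "*.txt" (one level only).
    shallow = glob.glob(os.path.join(root, "*", "**.txt"), recursive=True)

    # Strip the temporary prefix so the results are easy to read.
    deep_names = sorted(os.path.relpath(p, root) for p in deep)
    shallow_names = sorted(os.path.relpath(p, root) for p in shallow)

print(deep_names)      # all three files
print(shallow_names)   # only sub/b.txt
```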

In [204]:
# recursively list everything under each sub-directory of the current working directory
glob.glob('*/**', recursive=True) 

['DataType\\',
 'DataType\\List.ipynb',
 'DataType\\reports',
 'DataType\\reports\\Reports',
 'DataType\\reports\\Reports\\REPORTS-in_HERE',
 'DataType\\reports\\Reports\\REPORTS-in_HERE\\TPS-Report-01-11-2022.txt',
 'DataType\\reports\\Reports\\REPORTS-in_HERE\\TPS-Report-01-Aug.txt',
 'DataType\\reports\\Reports\\REPORTS-in_HERE\\TPS-Report-01-September-2021.txt',
 'DataType\\reports\\Reports\\REPORTS-in_HERE\\TPS-Report-01-September-2022.txt',
 'DataType\\strings in Python.ipynb',
 'DataType\\stringtest.ipynb',
 'DataType\\test.py',
 'done_toremove\\',
 'done_toremove\\pathlib[ok].ipynb',
 'done_toremove\\[debug]_directoryFile_renmae_remove.ipynb',
 'elog_example\\',
 'elog_example\\2024_flex',
 'elog_example\\2024_flex\\11.txt',
 'elog_example\\2024_flex\\elog_gnb_du_layer2.0',
 'elog_example\\2024_flex\\elog_multiplyUE',
 'elog_example\\2024_flex\\[2024]Log Parser CDU_Quick_understand.ipynb',
 'elog_example\\elog_multiplyUE',
 'elog_example\\elog_singleIUE',
 'elog_example\\excelc

##### - `os.walk()`:  Recursive Searching
Recursively iterates through a directory tree. For each directory it yields a tuple of the directory path, the list of its sub-directories, and the list of its files.

In [205]:
directory=r'c:\demo_testfile'
for  path, subdirs, files in os.walk(directory):
    print(f'path:{path}')
    print(f'directory:{subdirs}')
    print(f'files:{files}')
    print('==========')

path:c:\demo_testfile
directory:['data', 'datanew', 'notes', 'renamefile', 'test', 'test -copy']
files:['AAA.txt', 'ex1.py', 'ex2.py', 'hell[o].txt', 'index.html', 'page.html', 'read.txt', 'test.txt', 'testFile1.txt']
path:c:\demo_testfile\data
directory:[]
files:['copy2_AAA.txt', 'copy2_ex1.py', 'copy2_ex2.py', 'copy2_hell[o].txt', 'copy2_index.html', 'copy2_page.html', 'copy_AAA.txt', 'copy_ex1.py', 'copy_ex2.py', 'copy_file - with - space-1.py', 'copy_file - with - space-1.py.bk', 'copy_hell[o].txt', 'copy_index.html', 'copy_page.html']
path:c:\demo_testfile\datanew
directory:[]
files:['copy2_page.html', 'copy_page.html']
path:c:\demo_testfile\notes
directory:['.ipynb_checkpoints']
files:['.gitignore', '01-intro.ipynb', '02-tidy.ipynb', '03-merge.ipynb', 'billboard_ratings.csv', 'billboard_songs.csv', 'billboard_songs2023.csv']
path:c:\demo_testfile\notes\.ipynb_checkpoints
directory:[]
files:['01-intro-checkpoint.ipynb', '02-tidy-checkpoint.ipynb']
path:c:\demo_testfile\renamefile


In [207]:
#concatenate the directory and file name
directory=r'c:\demo_testfile'
for  path, subdirs, files in os.walk(directory):
    for name in files:
        print(os.path.join(path, name))

c:\demo_testfile\AAA.txt
c:\demo_testfile\ex1.py
c:\demo_testfile\ex2.py
c:\demo_testfile\hell[o].txt
c:\demo_testfile\index.html
c:\demo_testfile\page.html
c:\demo_testfile\read.txt
c:\demo_testfile\test.txt
c:\demo_testfile\testFile1.txt
c:\demo_testfile\data\copy2_AAA.txt
c:\demo_testfile\data\copy2_ex1.py
c:\demo_testfile\data\copy2_ex2.py
c:\demo_testfile\data\copy2_hell[o].txt
c:\demo_testfile\data\copy2_index.html
c:\demo_testfile\data\copy2_page.html
c:\demo_testfile\data\copy_AAA.txt
c:\demo_testfile\data\copy_ex1.py
c:\demo_testfile\data\copy_ex2.py
c:\demo_testfile\data\copy_file - with - space-1.py
c:\demo_testfile\data\copy_file - with - space-1.py.bk
c:\demo_testfile\data\copy_hell[o].txt
c:\demo_testfile\data\copy_index.html
c:\demo_testfile\data\copy_page.html
c:\demo_testfile\datanew\copy2_page.html
c:\demo_testfile\datanew\copy_page.html
c:\demo_testfile\notes\.gitignore
c:\demo_testfile\notes\01-intro.ipynb
c:\demo_testfile\notes\02-tidy.ipynb
c:\demo_testfile\notes\

In [210]:
# filter by a single file type
def single_filtertype(top):
    for root, dirs, files in os.walk(top):
        for filename in files:
            if filename.endswith('.txt'):
                print(filename)

# filter by several file types: endswith() also accepts a tuple of suffixes
def multiple_filtertype(top):
    for root, dirs, files in os.walk(top):
        for filename in files:
            if filename.endswith(('.txt', '.py')):
                print(filename)

In [211]:
directory = r'c:\demo_testfile' 
#filter only one file type
single_filtertype(directory)

AAA.txt
hell[o].txt
read.txt
test.txt
testFile1.txt
copy2_AAA.txt
copy2_hell[o].txt
copy_AAA.txt
copy_hell[o].txt
1.txt
2.txt
1.txt
2.txt


In [213]:
#filter multiple file type, ex: .py and .txt
multiple_filtertype(directory)

AAA.txt
ex1.py
ex2.py
hell[o].txt
read.txt
test.txt
testFile1.txt
copy2_AAA.txt
copy2_ex1.py
copy2_ex2.py
copy2_hell[o].txt
copy_AAA.txt
copy_ex1.py
copy_ex2.py
copy_file - with - space-1.py
copy_hell[o].txt
file - with - space-1.py
copy2_file - with - space-1.py
copy_file - with - space-1.py
1.txt
2.txt
1.txt
2.txt
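
The filters above print bare file names, so duplicates such as `1.txt` cannot be told apart. As a hedged variant, the sketch below returns full paths and skips unwanted folders by editing `dirs` in place, which `os.walk()` honours in its default top-down mode. `find_files` and its parameters are hypothetical names, and the tree is a throwaway one.

```python
import os
import tempfile

def find_files(top, extensions, skip_dirs=(".ipynb_checkpoints", "__pycache__")):
    """Hypothetical helper: full paths under `top` whose names end with `extensions`."""
    matches = []
    for root, dirs, files in os.walk(top):
        # Editing `dirs` in place tells os.walk() not to descend into those folders.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for filename in files:
            if filename.endswith(extensions):  # `extensions` may be a str or a tuple
                matches.append(os.path.join(root, filename))
    return sorted(matches)

# Try it on a temporary tree so nothing real is touched.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "notes", ".ipynb_checkpoints"))
    for rel in ("ex1.py", "read.txt", "page.html",
                os.path.join("notes", "todo.txt"),
                os.path.join("notes", ".ipynb_checkpoints", "old.txt")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("demo")
    found = [os.path.relpath(p, top) for p in find_files(top, (".txt", ".py"))]

print(found)
```

Here `page.html` is filtered out by extension and `old.txt` is never visited because its folder was pruned.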


<a id='part2'></a>
## Part 2 File and Directory Operations  [🔝](#toc)

Note: `os.path` has no `touch`-style function for creating a file; to create a file, see the read and write section.

- [1. Create directory](#part2_createdir)
    - `os.mkdir(path)`
    - `os.makedirs(path)`
- [2. Remove Directory and File](#part2_RemoveDirectoryFile)
    - [2.1 Remove Empty Directory](#2.1-Remove-Empty-Directory)
        - `os.rmdir(path)`
        - `os.removedirs()`
    - [2.2 Remove individual File](#2.2-removeFile)
    - [2.3 Remove Non Empty content](#part2-2.3-RemoveNonEmptyDir)
        - `shutil.rmtree`
- [3. Copying](#part2_Copying)
    - [3.1 Copy File](#3.1-Copy-File)
        - `shutil.copy` and `shutil.copy2` 
    - [3.2 Copy Directory](#3.2-Copy-Directory) 
        - `shutil.copytree`
- [4. Rename](#part4-Rename)
    - `os.rename(src, dst)`
- [5. Read and write file](#part2-ReadWriteFile)
    - [5.1 with built-in](#5.1-with-built-in)

<a id='part2_createdir'></a>
### 1. Create directory [🔙](#part2)

##### - `os.mkdir(path)`
Creates a new directory at the specified path.

In [217]:
#os module
os.mkdir('test123') #Creates a directory.
print(os.listdir())

['.ipynb_checkpoints', 'Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb', 'DataType', 'done_toremove', 'elog_example', 'hahow_scrap_pandas_mathplot', 'iterate_filter_file.ipynb', 'jupterNote.ipynb', 'numpy.ipynb', 'os.path_reorganize.ipynb', 'Pandas', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib_reorganize.ipynb', 'test123', 'tmp.ipynb', 'Understanding File Pattern Matching with Glob and Fnmatch.ipynb']


In [219]:
# create the directory only if it does not already exist
if not os.path.exists('dirname'):
    os.mkdir('dirname')
else:
    print('directory already exists')
print(os.listdir())

['.ipynb_checkpoints', 'Basic Guide to Pandas! Tricks, Shortcuts( Python Simplified).ipynb', 'DataType', 'dirname', 'done_toremove', 'elog_example', 'hahow_scrap_pandas_mathplot', 'iterate_filter_file.ipynb', 'jupterNote.ipynb', 'numpy.ipynb', 'os.path_reorganize.ipynb', 'Pandas', 'Pathlib Basics(orgainize_final).ipynb', 'Pathlib_reorganize.ipynb', 'test123', 'tmp.ipynb', 'Understanding File Pattern Matching with Glob and Fnmatch.ipynb']


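##### - `os.makedirs(path)`
Creates a directory together with any missing parent directories. With `exist_ok=True`, no error is raised if the directory already exists.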

In [240]:
path = os.path.join(os.getcwd(), "test123", "subdir")
os.makedirs(path, exist_ok=True)
for root, dirs, files in os.walk(path):
    print(f"Root: {root}")
    for d in dirs:
        print(f" - {os.path.join(root, d)}")

Root: C:\Users\test\python_test\test123\subdir
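
The difference between `os.mkdir` and `os.makedirs` shows up when a parent directory is missing or the target already exists. A minimal sketch in a temporary directory; the folder names reuse the ones above but are otherwise arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, "test123", "subdir")

    # mkdir() creates only the last component, so a missing parent is an error.
    try:
        os.mkdir(nested)
        mkdir_result = "created"
    except FileNotFoundError:
        mkdir_result = "parent missing"

    # makedirs() creates every missing level; exist_ok=True makes a repeat call harmless.
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    nested_exists = os.path.isdir(nested)

    # Without exist_ok, creating an existing directory raises FileExistsError.
    try:
        os.mkdir(nested)
        second_mkdir = "created"
    except FileExistsError:
        second_mkdir = "already exists"

print(mkdir_result, nested_exists, second_mkdir)
```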


<a id='part2_RemoveDirectoryFile'></a>
### 2. Remove Directory and File [🔙](#part2)

#### 2.1 Remove Empty Directory

##### - `os.rmdir(path)`
Deletes an empty directory; raises `OSError` if the directory is not empty.

In [19]:
# Create directories
path = "testdir/subdir/emptydir"
os.makedirs(path, exist_ok=True)

def list_dirs(base_path):
    return [os.path.join(root, d) for root, dirs, files in os.walk(base_path) for d in dirs]

# Show directories before removal
print("Before Removal:", list_dirs("testdir"))

# Remove only 'emptydir' (works since it's empty)
os.rmdir("testdir/subdir/emptydir")
# Show directories after removal
print("After  Removal:", list_dirs("testdir"))

Before Removal: ['testdir\\subdir', 'testdir\\subdir\\emptydir']
After  Removal: ['testdir\\subdir']


In [21]:
#display subdirectory

# Create directories
path = "testdir/subdir/emptydir"
os.makedirs(path, exist_ok=True)

    
# Function to show directory structure
def show_dirs(base_path, title):
    print(f"\n{title}")
    for root, dirs, files in os.walk(base_path):
        print(f"Root: {root}")
        #for d in dirs:
            #print(f" - {os.path.join(root, d)}")

# Show structure before removal
show_dirs("testdir", "Before Removal")

# Remove only 'emptydir' (works since it's empty)
os.rmdir("testdir/subdir/emptydir")

# Show structure after removal
show_dirs("testdir", "After Removal")



Before Removal
Root: testdir
Root: testdir\subdir
Root: testdir\subdir\emptydir

After Removal
Root: testdir
Root: testdir\subdir


##### - `os.removedirs()`
Deletes the given empty directory, then removes each parent directory in turn for as long as it is empty.

In [22]:
path = "testdir/subdir/emptydir"
# Create directories
os.makedirs(path, exist_ok=True)

def list_dirs(base_path):
    return [os.path.join(root, d) for root, dirs, files in os.walk(base_path) for d in dirs]

# Show directories before removal
print("Before Removal:", list_dirs("testdir"))

# Remove 'emptydir', then its now-empty parents 'subdir' and 'testdir'

os.removedirs("testdir/subdir/emptydir")
# Show directories after removal
print("After  Removal:", list_dirs("testdir"))

Before Removal: ['testdir\\subdir', 'testdir\\subdir\\emptydir']
After  Removal: []


Display the directory structure before and after removal:

In [23]:
# Create directories
path = "testdir/subdir/emptydir"
os.makedirs(path, exist_ok=True)
    
# Function to show directory structure
def show_dirs(base_path, title):
    print(f"\n{title}")
    for root, dirs, files in os.walk(base_path):
        print(f"Root: {root}")
        #for d in dirs:
            #print(f" - {os.path.join(root, d)}")

# Show structure before removal
show_dirs("testdir", "Before Removal")

# Remove 'emptydir' and every parent directory that becomes empty
os.removedirs("testdir/subdir/emptydir")

# Show structure after removal
show_dirs("testdir", "After Removal")



Before Removal
Root: testdir
Root: testdir\subdir
Root: testdir\subdir\emptydir

After Removal
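
In the run above every parent was empty, so `testdir` itself disappeared. As a minimal sketch of the other case, the example below puts a file in the top folder so that `os.removedirs` stops there. The temporary directory and the file name `keep.txt` are assumptions made for the demo.

```python
import os
import tempfile

base = tempfile.mkdtemp()
leaf = os.path.join(base, "testdir", "subdir", "emptydir")
os.makedirs(leaf)

# Put a file in 'testdir' so that it is not empty (hypothetical file name)
keep_file = os.path.join(base, "testdir", "keep.txt")
with open(keep_file, "w") as f:
    f.write("keep me")

# Removes 'emptydir', then 'subdir', then stops at the non-empty 'testdir'
os.removedirs(leaf)

subdir_exists = os.path.exists(os.path.join(base, "testdir", "subdir"))
testdir_exists = os.path.exists(os.path.join(base, "testdir"))
print(subdir_exists, testdir_exists)
```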


#### 2.2 Remove individual File

<a id='2.2-removeFile'></a>
##### - `os.remove()`  or  `os.unlink()`
Removes a single file (not a directory). `os.unlink(path)` is an alias for `os.remove(path)`.

In [54]:
import os

dir_path = r'C:\demo_testfile\datanew'  # Directory path
filename = os.path.join(dir_path, 'testfile.txt')  # File path

def list_files(directory):
    """List the names of all entries in the specified directory."""
    return os.listdir(directory)

# Step 1: Show files before creation
print("Before file creation:", list_files(dir_path))

# Step 2: Create a file
with open(filename, 'w') as file:
    file.write('This is a test file')

# Step 3: Show files after creation
print("After file creation:", list_files(dir_path))

# Step 4: Delete the file
os.remove(filename)

# Step 5: Show files after deletion
print("After file deletion:", list_files(dir_path))



Before file creation: ['copy2_page.html', 'copy_page.html']
After file creation: ['copy2_page.html', 'copy_page.html', 'testfile.txt']
After file deletion: ['copy2_page.html', 'copy_page.html']
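
Calling `os.remove` on a file that no longer exists raises `FileNotFoundError`. One possible way to make the deletion safe to repeat is `contextlib.suppress`, sketched here in a temporary directory (the path is an assumption for the demo, not the `C:\demo_testfile` folder above).

```python
import os
import tempfile
from contextlib import suppress

base = tempfile.mkdtemp()
filename = os.path.join(base, "testfile.txt")

with open(filename, "w") as f:
    f.write("This is a test file")

os.remove(filename)  # first call deletes the file

# A second call would raise FileNotFoundError; suppress() ignores exactly that error
with suppress(FileNotFoundError):
    os.remove(filename)

still_there = os.path.exists(filename)
print(still_there)
```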


<a id='part2-2.3-RemoveNonEmptyDir'></a>
#### 2.3 Remove Non-Empty Directory

##### - `shutil.rmtree`
> import module: `import shutil`

- Create a file in a directory

In [56]:
import shutil
path = "testdir"

def create_file(path):
    # Create the directory if it does not exist yet
    os.makedirs(path, exist_ok=True)

    # Create a file inside the directory
    file_path = os.path.join(path, "example.txt")
    with open(file_path, 'w') as f:
        f.write("This is a file inside the directory.")  # Writing some content to the file

def checkdir(path):
    # Print every folder under `path` together with the files it contains
    if os.path.exists(path):
        for root, dirs, files in os.walk(path):
            print(f"Root: {root}")
            for name in files:
                print(f" - {os.path.join(root, name)}")
    else:
        print('directory not exist')

print('Create directory and file')
create_file(path)
checkdir(path)


Create directory and file
Root: testdir
 - testdir\example.txt


- Removing a non-empty directory with `os.rmdir` raises an error

In [58]:
# os.rmdir on a non-empty directory raises OSError
os.rmdir(path)  

OSError: [WinError 145] The directory is not empty: 'testdir'

- Use `shutil.rmtree` to remove a non-empty directory

In [60]:
import shutil
checkdir(path)
print('Delete the directory:')   
shutil.rmtree("testdir")
checkdir(path)

Root: testdir
 - testdir\example.txt
Delete the directory:
directory not exist
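
`shutil.rmtree` raises an error if the directory is already gone. A minimal sketch, assuming a temporary directory instead of the real `testdir`, shows that passing `ignore_errors=True` makes a repeated call harmless:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "testdir")
os.makedirs(os.path.join(target, "subdir"))
with open(os.path.join(target, "subdir", "example.txt"), "w") as f:
    f.write("This is a file inside the directory.")

shutil.rmtree(target)                      # removes the folder and everything in it
shutil.rmtree(target, ignore_errors=True)  # no error even though it is already gone

target_exists = os.path.exists(target)
print(target_exists)
```

Since `rmtree` deletes everything under the path without asking, it is worth double-checking the path before calling it.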


<a id='part2_Copying'></a>
### 3. Copying File/Directory [🔙](#part2)

#### 3.1 Copy File

##### - `shutil.copy` and `shutil.copy2`

> - `copy`: copies a single file
Copies the file's **content and permission bits**. Other metadata is not kept, so the new file gets a fresh modification time.

> - `copy2`: copies a single file and as much metadata as possible
Works like `copy`, but also tries to preserve **timestamps** (for example the modification time). Ownership is generally not preserved, and exactly which metadata survives depends on the platform.

> - `os.stat`
Returns a file's **metadata** (size, timestamps, permissions, etc.).

In [102]:
import shutil
import os

path = r'C:\demo_testfile'

os.chdir(r'C:\demo_testfile\renamefile')

# Ensure 'data2' directory exists
if not os.path.exists("data2"):
    os.mkdir('data2')

# Iterate over files in the directory
for file in os.listdir():
    if file in ['__pycache__', 'data', '.idea']:
        continue

    # Ensure we only process files, not directories
    if not os.path.isfile(file):
        print(f"Skipping directory: {file}")
        continue

    # Paths for copied files
    copy_path = os.path.join('data2', f"copy_{file}")
    copy2_path = os.path.join('data2', f"copy2_{file}")

    print(f"Copying: {file} -> {copy_path}")
    shutil.copy(file, copy_path)    # Copies content and permission bits; timestamps are new
    shutil.copy2(file, copy2_path)  # Also tries to preserve timestamps

    # Read timestamps using os.stat
    original_stats = os.stat(file)
    copy_stats = os.stat(copy_path)
    copy2_stats = os.stat(copy2_path)

    print(f"File: {file}")
    print(f"  Original Modified Time: {original_stats.st_mtime}")
    print(f"  Copy Modified Time:     {copy_stats.st_mtime}")
    print(f"  Copy2 Modified Time:    {copy2_stats.st_mtime}")
    print()
# Switch back to original directory
os.chdir(path)


Copying: copy2_file - with - space-1.py -> data2\copy_copy2_file - with - space-1.py
File: copy2_file - with - space-1.py
  Original Modified Time: 1736910459.6833546
  Copy Modified Time:     1740388569.6539938
  Copy2 Modified Time:    1736910459.6833546

Copying: copy_file - with - space-1.py -> data2\copy_copy_file - with - space-1.py
File: copy_file - with - space-1.py
  Original Modified Time: 1740037620.6427584
  Copy Modified Time:     1740388569.6598165
  Copy2 Modified Time:    1740037620.6427584

Skipping directory: data2
Copying: fff.txt -> data2\copy_fff.txt
File: fff.txt
  Original Modified Time: 1740384182.245061
  Copy Modified Time:     1740388569.6638126
  Copy2 Modified Time:    1740384182.245061

Copying: file - with - space-1.py.bk -> data2\copy_file - with - space-1.py.bk
File: file - with - space-1.py.bk
  Original Modified Time: 1736910459.6833546
  Copy Modified Time:     1740388569.6688087
  Copy2 Modified Time:    1736910459.6833546

Copying: test.txt -> data

In [None]:
# Same idea, copying into 'data' instead of 'data2'
os.makedirs('data', exist_ok=True)

for file in os.listdir():
    # Skip known folders and anything that is not a regular file
    if file in ['__pycache__', 'data', '.idea'] or not os.path.isfile(file):
        continue
    # Paths for copied files
    copy_path = os.path.join('data', f"copy_{file}")
    copy2_path = os.path.join('data', f"copy2_{file}")
    print(f"copy_path: {copy_path}")
    shutil.copy(file, copy_path)  # Copies content and permission bits; timestamps are new
    print(f"copy2_path: {copy2_path}")
    shutil.copy2(file, copy2_path)  # Also tries to preserve timestamps

    # Read timestamps using os.stat
    original_stats = os.stat(file)
    copy_stats = os.stat(copy_path)
    copy2_stats = os.stat(copy2_path)

    print(f"File: {file}")
    print(f"  Original Modified Time: {original_stats.st_mtime}")
    print(f"  Copy Modified Time:     {copy_stats.st_mtime}")
    print(f"  Copy2 Modified Time:    {copy2_stats.st_mtime}")
    print()

#switch to default directory
os.chdir(path)
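
The timestamp difference between `shutil.copy` and `shutil.copy2` can also be reproduced without the demo folder. The sketch below is an illustration under assumptions: it works in a temporary directory, and it sets an arbitrary old modification time with `os.utime` so that the difference is easy to see.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "original.txt")
with open(src, "w") as f:
    f.write("some content")

# Pretend the file was last modified long ago (an arbitrary timestamp from 2001)
old_time = 1_000_000_000
os.utime(src, (old_time, old_time))

# Both functions return the path of the new file
copy_path = shutil.copy(src, os.path.join(base, "copy_original.txt"))
copy2_path = shutil.copy2(src, os.path.join(base, "copy2_original.txt"))

src_mtime = os.stat(src).st_mtime
copy_mtime = os.stat(copy_path).st_mtime    # time at which the copy was made
copy2_mtime = os.stat(copy2_path).st_mtime  # carried over from the source

print(f"Original: {src_mtime}")
print(f"copy:     {copy_mtime}")
print(f"copy2:    {copy2_mtime}")
```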

#### 3.2 Copy Directory

##### - `shutil.copytree`: copy directory

In [307]:
path = r'C:\demo_testfile\renamefile'

# Check if the directory exists
if os.path.exists(path):
    directories = [entry for entry in os.listdir(path) if os.path.isdir(os.path.join(path, entry))]
    print(directories)
else:
    print('Directory does not exist')
    os.makedirs(path, exist_ok=True)  # Create the directory

copydatadir = os.path.join(path, 'data_copy2')

# Ensure the destination directory does not exist before using copytree
if os.path.exists(copydatadir):
    shutil.rmtree(copydatadir)

shutil.copytree(os.path.join(path, 'data'), copydatadir)

['.idea', 'data', 'data_copy']


'C:\\demo_testfile\\renamefile\\data_copy2'
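
Deleting the destination first is one way to avoid the error `copytree` raises when the target already exists. On Python 3.8 and later, `dirs_exist_ok=True` is an alternative that copies into the existing folder instead. A minimal sketch, with a temporary directory and file names that are assumptions for the demo:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src_dir = os.path.join(base, "data")
dst_dir = os.path.join(base, "data_copy2")

os.makedirs(os.path.join(src_dir, "nested"))
with open(os.path.join(src_dir, "nested", "a.txt"), "w") as f:
    f.write("hello")

shutil.copytree(src_dir, dst_dir)                      # destination must not exist yet
shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)  # Python 3.8+: reuse existing folder

# Collect the copied files as paths relative to the destination
copied = sorted(
    os.path.relpath(os.path.join(root, name), dst_dir)
    for root, dirs, files in os.walk(dst_dir)
    for name in files
)
print(copied)
```

Note that with `dirs_exist_ok=True` files already in the destination are overwritten by files of the same name, while extra files there are left alone.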

<a id='part4-Rename'></a>
### 4. Rename [🔙](#part2)

##### - `os.rename(src, dst)`
Renames a file or directory. It can also move it, as long as source and destination are on the same filesystem.

In [106]:
path = r'C:\demo_testfile'
old_name = os.path.join(path, 'AAA.txt')  # Full path of the existing file
new_name = os.path.join(path, 'BBB.txt')  # Full path of the new file
print('#####list directory\'s filename####')
for file in os.listdir(path):
    # Print only the .txt files
    if file.endswith('.txt'):
        print(file)
print('#####rename\'s filename####')        
if os.path.exists(old_name):  # Check if AAA.txt exists
    os.rename(old_name, new_name)
    print(f"Renamed: {old_name} -> {new_name}")
else:
    print(f"{old_name} does not exist")

#####list directory's filename####
BBB.txt
hell[o].txt
read.txt
test.txt
testFile1.txt
#####rename's filename####
C:\demo_testfile\AAA.txt does not exist


In [78]:
import os

path = r'C:\demo_testfile'
old_name = os.path.join(path, 'AAA.txt')  # Full path of the existing file
new_name = os.path.join(path, 'BBB.txt')  # Full path of the new file

if os.path.exists(old_name):  # Check if AAA.txt exists
    os.rename(old_name, new_name)
    print(f"Renamed: {old_name} -> {new_name}")
else:
    print(f"{old_name} does not exist")


C:\demo_testfile\AAA.txt does not exist
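
When the new name already exists, `os.rename` behaves differently per platform (on Windows it raises `FileExistsError`). If overwriting is what you want, `os.replace` does it on every platform. The sketch below assumes a temporary directory and reuses the names `AAA.txt` and `BBB.txt` only for illustration.

```python
import os
import tempfile

base = tempfile.mkdtemp()
old_name = os.path.join(base, "AAA.txt")
new_name = os.path.join(base, "BBB.txt")

with open(old_name, "w") as f:
    f.write("new content")
with open(new_name, "w") as f:
    f.write("old content")

# os.rename would fail on Windows because BBB.txt already exists;
# os.replace overwrites the destination on every platform.
os.replace(old_name, new_name)

with open(new_name) as f:
    content = f.read()
print(content)
```

For moves across different drives or filesystems, `shutil.move` is the usual choice.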


<a id='part2-ReadWriteFile'></a>
### 5. Read and write file [🔙](#part2)

#### 5.1 With the built-in `open()`

##### - `with open(filename, mode)`: the `with` statement closes the file automatically

> Mode:
>> - `r` for read (default): Fails if the file doesn't exist.
>> - `w` for write: Overwrites the file if it exists, creates a new one if it doesn't.
>> - `a` for append: Keeps existing content and writes new data at the end.
>> - `x` for exclusive create: Creates a new file, fails if the file already exists.
>> - `b` for binary: Combined with another mode (`rb`, `wb`, etc.) for binary files.
>> - `+` for read and write:
>>     - `r+`: Read & write (error if file doesn't exist, existing content is not truncated).
>>     - `w+`: Read & write (overwrites file).
>>     - `a+`: Read & append (keeps existing content).

In [76]:
with open ('test.txt', 'w') as file:
    file.write('This is test file')

In [77]:
with open('test.txt', "r") as f:
    print(f.read())

This is test file


In [319]:
# Keeps existing content, adds new content at the end.
with open ('test.txt', 'a') as file:
    file.write('\nThis is test file')
with open('test.txt', "r") as f:
    print(f.read())

This is test file
This is test file
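
The cells above cover `w`, `r` and `a`. As a small sketch of two other modes from the list, the example below uses `x` (exclusive create) and `r+` (read and write without truncating). It runs in a temporary directory, which is an assumption made so that it does not touch the real `test.txt`.

```python
import os
import tempfile

base = tempfile.mkdtemp()
filename = os.path.join(base, "test.txt")

with open(filename, "x") as f:  # creates the file; it must not exist yet
    f.write("This is test file")

try:
    with open(filename, "x") as f:  # second attempt fails: file already exists
        f.write("never written")
    created_twice = True
except FileExistsError:
    created_twice = False

with open(filename, "r+") as f:  # read and write without truncating
    f.write("THAT")              # overwrites the first four characters only
    f.seek(0)                    # go back to the start before reading
    content = f.read()

print(created_twice)
print(content)
```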


<a id='Summary'></a>
## Summary Example [🔝](#toc)

- join path: build a full file path that you can then write to, rename, etc.

In [61]:
dir_path = r'C:\demo_testfile\datanew'  # Directory path
filename = os.path.join(dir_path, 'testfile.txt')
print(filename)

C:\demo_testfile\datanew\testfile.txt


- list the contents of a directory

In [72]:
base_path = r'C:\demo_testfile'
os.listdir(base_path)

['BBB.txt',
 'data',
 'datanew',
 'ex1.py',
 'ex2.py',
 'hell[o].txt',
 'index.html',
 'notes',
 'page.html',
 'read.txt',
 'renamefile',
 'test',
 'test -copy',
 'test.txt',
 'testFile1.txt']

- list all subdirectories (recursively)

In [74]:
base_path = r'C:\demo_testfile'
[os.path.join(root, d) for root, dirs, files in os.walk(base_path) for d in dirs]

['C:\\demo_testfile\\data',
 'C:\\demo_testfile\\datanew',
 'C:\\demo_testfile\\notes',
 'C:\\demo_testfile\\renamefile',
 'C:\\demo_testfile\\test',
 'C:\\demo_testfile\\test -copy',
 'C:\\demo_testfile\\notes\\.ipynb_checkpoints',
 'C:\\demo_testfile\\renamefile\\.idea',
 'C:\\demo_testfile\\renamefile\\data',
 'C:\\demo_testfile\\renamefile\\data_copy',
 'C:\\demo_testfile\\renamefile\\data_copyfile',
 'C:\\demo_testfile\\renamefile\\data_copyfile22',
 'C:\\demo_testfile\\renamefile\\.idea\\inspectionProfiles',
 'C:\\demo_testfile\\test\\data',
 'C:\\demo_testfile\\test -copy\\data']

- list entries, skipping those that match a condition

In [75]:
base_path = r'C:\demo_testfile'
# Skip '__pycache__' and any .txt file; print everything else
for file in os.listdir(base_path):
    if file in ['__pycache__'] or file.endswith('.txt'):
        continue
    print(file)

data
datanew
ex1.py
ex2.py
index.html
notes
page.html
renamefile
test
test -copy
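
The same filter can be wrapped in a small reusable function. This is only a sketch: the helper name `list_entries`, its parameters, and the temporary folder contents are all assumptions for illustration. It uses `os.path.splitext` to compare extensions.

```python
import os
import tempfile

def list_entries(base_path, skip_names=(), skip_exts=()):
    """Return sorted entry names in base_path, minus skipped names and extensions."""
    result = []
    for name in os.listdir(base_path):
        ext = os.path.splitext(name)[1].lower()  # e.g. '.txt', or '' for folders
        if name in skip_names or ext in skip_exts:
            continue
        result.append(name)
    return sorted(result)

# Demo in a throwaway temporary directory (hypothetical contents)
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "data"))
os.makedirs(os.path.join(base, "__pycache__"))
for name in ("ex1.py", "read.txt", "index.html"):
    with open(os.path.join(base, name), "w") as f:
        f.write("")

entries = list_entries(base, skip_names={"__pycache__"}, skip_exts={".txt"})
print(entries)
```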
