# Working with the OS Module

Python has several built-in modules and functions for handling files. These functions are spread out over several modules such as os, os.path, shutil, and pathlib, to name a few.


In this notebook, you’ll learn how to:

1. Retrieve file properties
2. Create directories
3. Match patterns in filenames
4. Traverse directory trees
5. Make temporary files and directories
6. Delete files and directories
7. Copy, move, or rename files and directories
8. Create and extract ZIP and TAR archives
9. Open multiple files using the fileinput module

# Python’s “with open(…) as …” Pattern

Reading and writing data to files using Python is pretty straightforward. To do this, you must first open files in the appropriate mode. Here’s an example of how to use Python’s “with open(…) as …” pattern to open a text file and read its contents:

In [2]:
with open('Employee.txt','r') as f:
    data=f.read()

open() takes a filename and a mode as its arguments. 'r' opens the file in read-only mode. To write data to a file, pass 'w' as the mode instead. Note that 'w' truncates the file if it already exists:

In [4]:
with open('Employee.txt','w') as f:
    print('This is the data that I have shown')
    f.write(data)

This is the data that I have shown


In the examples above, open() opens files for reading or writing and returns a file handle (f in this case) that provides methods for reading data from or writing data to the file. The with statement closes the file automatically when the block ends.
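As a self-contained illustration of the pattern, the sketch below writes a file and then reads it back. The temporary directory and the file name example.txt are assumptions made so the snippet runs anywhere; they are not part of the notebook's data.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched (illustrative setup).
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'example.txt')  # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists.
    with open(file_path, 'w') as f:
        f.write('first line\nsecond line\n')

    # 'r' opens the file read-only; the handle is closed when the block exits.
    with open(file_path, 'r') as f:
        data = f.read()

    # After the with block, the handle reports that it is closed.
    handle_closed = f.closed

print(data)
```

Because both handles are managed by with, there is no need to call f.close() explicitly, even if an exception is raised inside the block.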

# Getting a Directory Listing

The examples in this section list directories on a Windows machine (paths such as E://Python Basics/). Substitute a directory that exists on your own system.

The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory in the filesystem, use os.listdir(), which is available in every version of Python, or os.scandir(), which was added in Python 3.5. os.scandir() is the preferred method if you also want file and directory properties such as file size and modification date.

os.listdir() is the traditional way to get a directory listing, and it was the only option in the os module before Python 3.5:

In [16]:
import os
print(os.getcwd())
entries=os.listdir('E://Python Basics/')
print(entries)

E:\Python Basics
['.ipynb_checkpoints', '2019-12-14 (2).png', '5 and 7.png', 'acme.csv', 'acme_file.csv', 'Andrew NG Notes.txt', 'Andrew NG Notes_reversed.txt', 'Daily Activities.txt', 'debug.log', 'Employee.txt', 'employee_file.csv', 'employee_file2.csv', 'Notes.txt', 'Notes_reversed.txt', 'OS Module In Python.ipynb', 'Python Basics.ipynb', 'Reading and  Writing File in Python.ipynb', 'Session 1.1 - Python-Installation and Basics.ipynb', 'Snip.txt', 'Snippets.txt', 'Snip_r.txt']


os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument.

A directory listing like that isn’t easy to read. Printing out the output of a call to os.listdir() using a loop helps clean things up:

In [18]:
entries=os.listdir('E://Machine Learning notes/')
for entry in entries:
    print(entry)

Communication Skills.txt
Daily Activities.txt
flask_demo
Flask_demo notes.txt
flask_demo.zip
interview-question-data-science--master
interview-question-data-science--master.zip
Machine Learning Algorithms.txt
MachineLearningModelToAWS-master
MachineLearningModelToAWS-master (1).zip
MachineLearningModelToAzure-master
MachineLearningModelToAzure-master.zip
MachineLearningModelToGCP-master
MachineLearningModelToGCP-master.zip
MachineLearningModelToHeroku-master
MachineLearningModelToHeroku-master.zip
sample resume
sample resume -20191223T164759Z-001.zip
Understanding the Data.txt


# Directory Listing in Modern Python Versions

In modern versions of Python, an alternative to os.listdir() is to use os.scandir() or pathlib.Path().

os.scandir() was introduced in Python 3.5 and is documented in PEP 471. os.scandir() returns an iterator as opposed to a list when called:

In [19]:
entries=os.scandir('E://Machine Learning Notes/')
entries

<nt.ScandirIterator at 0x2bb068e2600>

The ScandirIterator points to all the entries in the directory you passed to os.scandir(). You can loop over the contents of the iterator and print out the filenames:

In [21]:
with os.scandir('E://Machine Learning Notes') as entries:
    for entry in entries:
        print(entry.name)

Communication Skills.txt
Daily Activities.txt
flask_demo
Flask_demo notes.txt
flask_demo.zip
interview-question-data-science--master
interview-question-data-science--master.zip
Machine Learning Algorithms.txt
MachineLearningModelToAWS-master
MachineLearningModelToAWS-master (1).zip
MachineLearningModelToAzure-master
MachineLearningModelToAzure-master.zip
MachineLearningModelToGCP-master
MachineLearningModelToGCP-master.zip
MachineLearningModelToHeroku-master
MachineLearningModelToHeroku-master.zip
sample resume
sample resume -20191223T164759Z-001.zip
Understanding the Data.txt


Here, os.scandir() is used in conjunction with the with statement because it supports the context manager protocol (since Python 3.6). The context manager closes the iterator and frees the acquired resources as soon as the with block exits, even if the iterator has not been exhausted.

Another way to get a directory listing is to use the pathlib module:

In [27]:
from pathlib import Path

entries=Path('E://Machine Learning Notes/')
for entry in entries.iterdir():
    print(entry.name)

Communication Skills.txt
Daily Activities.txt
flask_demo
Flask_demo notes.txt
flask_demo.zip
interview-question-data-science--master
interview-question-data-science--master.zip
Machine Learning Algorithms.txt
MachineLearningModelToAWS-master
MachineLearningModelToAWS-master (1).zip
MachineLearningModelToAzure-master
MachineLearningModelToAzure-master.zip
MachineLearningModelToGCP-master
MachineLearningModelToGCP-master.zip
MachineLearningModelToHeroku-master
MachineLearningModelToHeroku-master.zip
sample resume
sample resume -20191223T164759Z-001.zip
Understanding the Data.txt


The objects returned by Path are either PosixPath or WindowsPath objects depending on the OS.

pathlib.Path() objects have an .iterdir() method for creating an iterator of all files and folders in a directory. Each entry yielded by .iterdir() is itself a Path object that gives access to information about the file or directory, such as its name and, through methods like .stat(), its file attributes. pathlib was first introduced in Python 3.4 and is a great addition to Python that provides an object-oriented interface to the filesystem.

In the example above, you call pathlib.Path() and pass a path argument to it. Next is the call to .iterdir(), which returns an iterator over all files and directories in that path.

pathlib offers a set of classes featuring most of the common operations on paths in an easy, object-oriented way. In most cases, using pathlib is about as efficient as using the functions in os. Another benefit of using pathlib over os is that it reduces the number of imports you need to make to manipulate filesystem paths.

Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way of getting a directory listing, especially when you’re working with code that needs the file type and file attribute information. pathlib.Path() offers much of the file and path handling functionality found in os and shutil, and its methods are often more convenient than their counterparts in these modules. We will discuss how to get file properties shortly.

| Function | Description |
| --- | --- |
| os.listdir() | Returns a list of the names of all files and folders in a directory |
| os.scandir() | Returns an iterator of os.DirEntry objects for everything in a directory, including file attribute information |
| pathlib.Path.iterdir() | Returns an iterator of Path objects for everything in a directory; file attributes are available through methods such as .stat() |


These functions return everything in the directory, including subdirectories. This might not always be the behavior you want. The next section will show you how to filter the results from a directory listing.
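To compare the three approaches side by side, here is a small sketch that builds a throwaway directory and lists it with each function. The directory contents (notes.txt and sub_dir) are invented for illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Illustrative contents: one file and one subdirectory.
    Path(tmp_dir, 'notes.txt').write_text('hello')
    Path(tmp_dir, 'sub_dir').mkdir()

    # os.listdir() returns a list of plain strings.
    listdir_names = sorted(os.listdir(tmp_dir))

    # os.scandir() yields os.DirEntry objects; .name holds the bare name.
    with os.scandir(tmp_dir) as entries:
        scandir_names = sorted(entry.name for entry in entries)

    # Path.iterdir() yields Path objects; .name is the final path component.
    iterdir_names = sorted(p.name for p in Path(tmp_dir).iterdir())

print(listdir_names)
print(scandir_names)
print(iterdir_names)
```

The names are sorted before comparison because none of the three functions guarantees a particular order.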

# Listing All Files in a Directory

This section will show you how to print out the names of files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and only list files from a directory listing produced by os.listdir(), use os.path:

In [34]:

# List all files in a directory using os.listdir
basepath = 'E://CarPrediction Project/'

for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)

app.py
car data.csv
CAR DETAILS FROM CAR DEKHO.csv
Car_Prediction.ipynb
main.py
Procfile
random_forest_regression_model.pkl
README.md
requirement.txt


Here, the call to os.listdir() returns a list of everything in the specified path, and then that list is filtered by os.path.isfile() to only print out files and not directories.

An easier way to list files in a directory is to use os.scandir() or pathlib.Path():

In [41]:


# List all files in a directory using scandir()
basepath = 'E://Machine Learning Notes/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)

Communication Skills.txt
Daily Activities.txt
Flask_demo notes.txt
flask_demo.zip
interview-question-data-science--master.zip
Machine Learning Algorithms.txt
MachineLearningModelToAWS-master (1).zip
MachineLearningModelToAzure-master.zip
MachineLearningModelToGCP-master.zip
MachineLearningModelToHeroku-master.zip
sample resume -20191223T164759Z-001.zip
Understanding the Data.txt


Using os.scandir() has the advantage of looking cleaner and being easier to understand than using os.listdir(), even though it is one line of code longer. Calling entry.is_file() on each item in the ScandirIterator returns True if the object is a file. The same listing can be produced with pathlib.Path():

In [42]:
from pathlib import Path

basepath = Path('E://Machine Learning Notes/')
files_in_basepath = basepath.iterdir()
for item in files_in_basepath:
    if item.is_file():
        print(item.name)

Communication Skills.txt
Daily Activities.txt
Flask_demo notes.txt
flask_demo.zip
interview-question-data-science--master.zip
Machine Learning Algorithms.txt
MachineLearningModelToAWS-master (1).zip
MachineLearningModelToAzure-master.zip
MachineLearningModelToGCP-master.zip
MachineLearningModelToHeroku-master.zip
sample resume -20191223T164759Z-001.zip
Understanding the Data.txt


The code above can be made more concise if you combine the for loop and the if statement into a single generator expression.

In [43]:
from pathlib import Path

# List all files in directory using pathlib
basepath = Path('E://Machine Learning Notes/')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)


Communication Skills.txt
Daily Activities.txt
Flask_demo notes.txt
flask_demo.zip
interview-question-data-science--master.zip
Machine Learning Algorithms.txt
MachineLearningModelToAWS-master (1).zip
MachineLearningModelToAzure-master.zip
MachineLearningModelToGCP-master.zip
MachineLearningModelToHeroku-master.zip
sample resume -20191223T164759Z-001.zip
Understanding the Data.txt


This produces exactly the same output as the example before it. This section showed that filtering files or directories using os.scandir() and pathlib.Path() feels more intuitive and looks cleaner than using os.listdir() in conjunction with os.path.

# Listing Subdirectories

To list subdirectories instead of files, use one of the methods below. Here’s how to use os.listdir() and os.path.isdir():


In [45]:
import os

# List all subdirectories using os.listdir
basepath = 'E://CarPrediction Project/'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)

.ipynb_checkpoints
templates


Manipulating filesystem paths this way can quickly become cumbersome when you have multiple calls to os.path.join().
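One way to cut down on os.path.join() calls is pathlib's / operator, which joins path components. The sketch below builds the same nested path both ways; the directory and file names are hypothetical and nothing is created on disk.

```python
import os
from pathlib import Path

# Hypothetical path components used only for illustration.
base = 'projects'
parts = ['car_prediction', 'templates', 'index.html']

# os.path: every component goes through os.path.join().
joined_with_os = os.path.join(base, *parts)

# pathlib: the / operator joins components and returns a new Path.
joined_with_pathlib = Path(base) / 'car_prediction' / 'templates' / 'index.html'

print(joined_with_os)
print(joined_with_pathlib)
```

Both spellings describe the same location; the pathlib version simply keeps the result as a Path object, so methods such as .is_dir() can be called on it directly.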

In [52]:


# List all subdirectories using os.listdir
basepath = 'E://New Downloads/'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)

4.3.0
Credit-Card-Dataset
Datasets for SAS Training
Download from C
ICU-Data
imarticus-master
lwmc1
lwpg3
New Folder For Important
PGDDS DAY9 ACTIVITY DATA SETS
shakespeare
Verzeo-master
visibility_climate


In [54]:
# Here’s how to use os.scandir()
# List all subdirectories using scandir()
basepath = 'E://New Downloads/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)

4.3.0
Credit-Card-Dataset
Datasets for SAS Training
Download from C
ICU-Data
imarticus-master
lwmc1
lwpg3
New Folder For Important
PGDDS DAY9 ACTIVITY DATA SETS
shakespeare
Verzeo-master
visibility_climate


As in the file listing example, here you call .is_dir() on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True, and the directory’s name is printed out. The output is the same as above.

In [55]:
# Here’s how to use pathlib.Path():
from pathlib import Path

# List all subdirectories using pathlib
basepath = Path('E://New Downloads/')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)


4.3.0
Credit-Card-Dataset
Datasets for SAS Training
Download from C
ICU-Data
imarticus-master
lwmc1
lwpg3
New Folder For Important
PGDDS DAY9 ACTIVITY DATA SETS
shakespeare
Verzeo-master
visibility_climate


Calling .is_dir() on each entry yielded by basepath.iterdir() checks whether the entry is a directory. If it is, its name is printed out to the screen, and the output produced is the same as the one from the previous example.

# Getting File Attributes

Python makes retrieving file attributes such as file size and modified times easy. This is done through os.stat(), os.scandir(), or pathlib.Path().

os.scandir() retrieves a directory listing together with file attribute information, and pathlib.Path() objects expose the same information through .stat(). Using os.scandir() can be more efficient than using os.listdir() to list files and then getting file attribute information for each file separately.
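As a rough illustration of reading attributes straight from a listing, the sketch below creates two small files in a temporary directory and collects their sizes via entry.stat(). The file names and contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Illustrative files with known sizes (3 and 5 bytes).
    for name, content in [('a.txt', b'abc'), ('b.txt', b'hello')]:
        with open(os.path.join(tmp_dir, name), 'wb') as f:
            f.write(content)

    sizes = {}
    with os.scandir(tmp_dir) as entries:
        for entry in entries:
            if entry.is_file():
                # st_size is the file size in bytes.
                sizes[entry.name] = entry.stat().st_size

print(sizes)
```

The same stat result also carries st_mtime, which the examples that follow use to report modification times.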

The examples below show how to get the time the files in a directory were last modified. The output is in seconds since the epoch:

In [60]:
with os.scandir('E://Machine Learning Notes/') as dir_contents:
    for entry in dir_contents:
        info = entry.stat()
        print(info.st_mtime)

1603110328.0842679
1603810546.7673225
1603031992.683642
1603028692.7671752
1602937053.0821383
1602934498.4650266
1602934491.3675172
1603110422.6012292
1603279827.573068
1603277919.2488546
1603281155.087778
1603281150.242499
1603207371.660478
1603207367.342267
1603128992.3564856
1603128563.5810304
1602935270.191883
1602935258.3482625
1600613451.2655282


os.scandir() returns a ScandirIterator object. Each entry in a ScandirIterator object has a .stat() method that retrieves information about the file or directory it points to. .stat() provides information such as file size and the time of last modification. In the example above, the code prints out the st_mtime attribute, which is the time the content of the file was last modified.

The pathlib module has corresponding methods for retrieving file information that give the same results:

In [66]:
from pathlib import Path
current_dir = Path('E://Python Basics/')
for path in current_dir.iterdir():
    info = path.stat()
    print(info.st_mtime)

1603891733.5769243
1603872842.7686799
1603809835.720226
1603885347.4449027
1603888049.2689757
1603880153.822116
1603880751.0309038
1603890766.5830302
1603806907.6014438
1603892599.388816
1603887704.1365414
1603888866.4990203
1603872001.8037064
1603880679.1256607
1603897101.9928901
1603804605.7740688
1603891192.9194372
1603289504.5499544
1603871334.2964723
1603871028.7012851
1603881144.9319036


In the example above, the code loops through the object returned by .iterdir() and retrieves file attributes through a .stat() call for each file in the directory list. The st_mtime attribute returns a float value that represents seconds since the epoch. To convert the values returned by st_mtime for display purposes, you could write a helper function to convert the seconds into a datetime object:

In [73]:
from datetime import datetime, timezone
from os import scandir

def convert_date(timestamp):
    # Interpret the timestamp as UTC (utcfromtimestamp() is deprecated since Python 3.12)
    d = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    formatted_date = d.strftime('%d %b %Y')
    return formatted_date

def get_files():
    # The with statement closes the scandir() iterator when the block exits
    with scandir('E://Machine Learning Notes/') as dir_entries:
        for entry in dir_entries:
            if entry.is_file():
                info = entry.stat()
                print(f'{entry.name}\t Last Modified: {convert_date(info.st_mtime)}')

In [74]:
get_files()

Communication Skills.txt	 Last Modified: 19 Oct 2020
Daily Activities.txt	 Last Modified: 27 Oct 2020
Flask_demo notes.txt	 Last Modified: 18 Oct 2020
flask_demo.zip	 Last Modified: 17 Oct 2020
interview-question-data-science--master.zip	 Last Modified: 17 Oct 2020
Machine Learning Algorithms.txt	 Last Modified: 19 Oct 2020
MachineLearningModelToAWS-master (1).zip	 Last Modified: 21 Oct 2020
MachineLearningModelToAzure-master.zip	 Last Modified: 21 Oct 2020
MachineLearningModelToGCP-master.zip	 Last Modified: 20 Oct 2020
MachineLearningModelToHeroku-master.zip	 Last Modified: 19 Oct 2020
sample resume -20191223T164759Z-001.zip	 Last Modified: 17 Oct 2020
Understanding the Data.txt	 Last Modified: 20 Sep 2020


This first gets the files in the directory along with their attributes and then calls convert_date() to convert each file’s last modified time into a human-readable form. convert_date() makes use of .strftime() to convert the resulting datetime into a string.

The arguments passed to .strftime() are the following:

1. %d: the day of the month
2. %b: the month, in abbreviated form
3. %Y: the year
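To see these format codes in action without touching the filesystem, here is a small sketch that formats two fixed timestamps. The timestamps are arbitrary values chosen for illustration, format_timestamp() is a hypothetical helper, and UTC is assumed so the result does not depend on the local time zone.

```python
from datetime import datetime, timezone

def format_timestamp(seconds):
    # Hypothetical helper: interpret seconds since the epoch as UTC.
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # %d = day of month, %b = abbreviated month name, %Y = four-digit year.
    return moment.strftime('%d %b %Y')

first = format_timestamp(1600000000)
next_day = format_timestamp(1600000000 + 86400)  # 86400 seconds = one day

print(first)
print(next_day)
```

Note that %b follows the active locale, so the month abbreviation may differ if the program has changed its locale settings.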

# Making Directories

Sooner or later, the programs you write will have to create directories in order to store data in them. os and pathlib include functions for creating directories. We’ll consider these:

| Function | Description |
| --- | --- |
| os.mkdir() | Creates a single subdirectory |
| pathlib.Path.mkdir() | Creates single or multiple directories |
| os.makedirs() | Creates multiple directories, including intermediate directories |
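The difference between creating one directory and creating a whole chain of them can be sketched as follows. The directory names (2020/10/28 and reports/final) are hypothetical, and everything happens inside a temporary directory.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, '2020', '10', '28')

    # os.mkdir() creates one level only; a missing parent raises FileNotFoundError.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    # os.makedirs() creates the intermediate directories as well.
    os.makedirs(nested)
    nested_exists = os.path.isdir(nested)

    # Path.mkdir(parents=True) is the pathlib equivalent.
    report_dir = Path(tmp_dir) / 'reports' / 'final'
    report_dir.mkdir(parents=True)
    report_exists = report_dir.is_dir()

print(mkdir_failed, nested_exists, report_exists)
```

Path.mkdir() without parents=True behaves like os.mkdir() and fails in the same way when a parent directory is missing.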



# Creating a Single Directory

To create a single directory, pass a path to the directory as a parameter to os.mkdir():

In [80]:
import os
os.mkdir('E://New Directory/')

If a directory already exists, os.mkdir() raises FileExistsError. Alternatively, you can create a directory using pathlib:

In [82]:
from pathlib import Path

p = Path('E://example_directory/')
p.mkdir()

In [83]:
p.mkdir()

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'E:\\example_directory'

As the error above shows, .mkdir() raises a FileExistsError if the path already exists.

To avoid errors like this, catch the error when it happens and let your user know:

In [84]:
from pathlib import Path

p = Path('E://example_directory/')
try:
    p.mkdir()
except FileExistsError as exc:
    print(exc)

[WinError 183] Cannot create a file when that file already exists: 'E:\\example_directory'


Alternatively, you can ignore the FileExistsError by passing the exist_ok=True argument to .mkdir():

In [85]:
from pathlib import Path

p = Path('E://example_directory/')
p.mkdir(exist_ok=True)
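Here is a short sketch of what exist_ok=True does and does not cover, run inside a temporary directory with hypothetical names. Repeating the call on an existing directory is harmless, but the error is still raised when the path exists and is a regular file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / 'example_directory'

    # Calling mkdir() twice is safe when exist_ok=True.
    target.mkdir(exist_ok=True)
    target.mkdir(exist_ok=True)
    still_a_directory = target.is_dir()

    # exist_ok=True does not hide a name clash with a regular file.
    file_path = Path(tmp_dir) / 'not_a_directory'  # hypothetical file name
    file_path.write_text('data')
    try:
        file_path.mkdir(exist_ok=True)
        raised = False
    except FileExistsError:
        raised = True

print(still_a_directory, raised)
```

This makes exist_ok=True a reasonable default for setup code that may run more than once, while still surfacing a genuine conflict.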