# Sort files sensibly

By default:

1. The order in which files are listed from disk (for example by `glob.glob`) is not predictable and depends on the file system, so when the order matters we need to specify it.
1. Python's default string sort may not give the intended order. For example, *file9.txt*, *file10.txt*, and *file11.txt* sort with *file9.txt* last, because strings are compared character by character (lexicographically) and `1` sorts before `9`. Zero-padding the numbers in the filenames (e.g., *file09.txt* instead of *file9.txt*) would fix this, but we may not always have the access rights needed to rename files.
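A quick check of the second point, using only the built-in `sorted()` on plain strings (no files needed):

```python
# Strings are compared character by character, so '1' < '9'
# places 'file10.txt' and 'file11.txt' before 'file9.txt'.
names = ['file9.txt', 'file10.txt', 'file11.txt']
print(sorted(names))   # ['file10.txt', 'file11.txt', 'file9.txt']

# Zero-padding the integer portion restores the intended order.
padded = ['file09.txt', 'file10.txt', 'file11.txt']
print(sorted(padded))  # ['file09.txt', 'file10.txt', 'file11.txt']
```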

This function:

1. Assumes that filenames have a text portion followed by an integer portion followed by a filename extension, e.g., *file9.txt* consists of *file* (text portion), then *9* (integer portion), then *.txt* (filename extension). The integer can be of any length and may or may not have leading (padding) zeros. There must be no other digits anywhere in the file path, including directory names.
1. Isolates the integer (e.g., *9*) and uses it to sort the filenames.
1. Returns the filenames in sorted order.

For example, if the glob pattern matches `['file10.txt', 'file9.txt', 'file11.txt']`, the function returns `['file9.txt', 'file10.txt', 'file11.txt']`.
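The isolate-then-sort steps can be sketched on a plain list of names, without touching the disk. This is a minimal sketch that reuses the same regular expression as `sort_filenames`; the helper name `integer_key` is hypothetical and not part of the function.

```python
import re

# Same pattern the function uses: a run of non-digits, then a captured run of digits.
pat = re.compile(r'\D+(\d+)')

def integer_key(name):
    """Return the integer portion of a name such as 'file9.txt' (hypothetical helper)."""
    # int() also drops any leading zeros, e.g. '09' -> 9
    return int(pat.match(name).group(1))

names = ['file10.txt', 'file9.txt', 'file11.txt']
print([integer_key(n) for n in names])  # [10, 9, 11]
print(sorted(names, key=integer_key))   # ['file9.txt', 'file10.txt', 'file11.txt']
```

Because the key is an `int`, 9 compares less than 10, which is exactly the ordering the plain string sort gets wrong.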

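The function is not guaranteed to return meaningful results if the filenames do not conform to the assumptions above. If a path contains no digits, or begins with a digit, the regular expression does not match and the function raises an `AttributeError`.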
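The failure modes follow from how `re.match` anchors at the start of the string. A small check with the same pattern (the example paths here are made up for illustration):

```python
import re

pat = re.compile(r'\D+(\d+)')

# A digit earlier in the path is captured instead of the intended integer.
print(pat.match('run2/file9.txt').group(1))  # '2', not '9'

# A name with no digits does not match, so calling .group() on the result would fail.
print(pat.match('readme.txt'))  # None

# \D+ needs at least one non-digit first, so a leading digit also prevents a match.
print(pat.match('9file.txt'))  # None
```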

```python
# Initialize
import glob  # manage file access
import os  # build file paths
import re  # to isolate integer portion of filename
import pprint  # pretty print output
pp = pprint.PrettyPrinter(indent=4)
```

```python
def sort_filenames(filepath_pattern):
    r"""Sort filenames by integer component

    Parameter: filepath_pattern as string
        (filepath glob pattern, with wildcard for integer portion)

    Returns: list of file paths as strings

    Notes:
        Assumes filenames of shape \D+\d+\.\D+, that is,
            non-digits, then digits, then dot, then non-digits
        Assumes no digits anywhere else in the path
    """
    pat = re.compile(r'\D+(\d+)')  # regex to capture integer portion of filename
    filepaths = glob.glob(filepath_pattern)
    filepaths.sort(key=lambda x: int(pat.match(x).group(1)))  # sort by integer portion
    return filepaths
```
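If only the filename itself follows the text-integer-extension pattern, one way to relax the "no digits anywhere else in the path" restriction is to apply the regex to the final path component only. The sketch below is a hypothetical variant, `sort_filenames_by_basename`, not part of the original function; it is exercised in a throwaway directory (`run2023`, also hypothetical) whose name contains digits that would otherwise be captured.

```python
import glob
import os
import re
import tempfile

pat = re.compile(r'\D+(\d+)')

def sort_filenames_by_basename(filepath_pattern):
    """Hypothetical variant of sort_filenames: match the regex against the basename only."""
    filepaths = glob.glob(filepath_pattern)
    filepaths.sort(key=lambda p: int(pat.match(os.path.basename(p)).group(1)))
    return filepaths

# Create empty files ch10.pkl, ch9.pkl, ch1.pkl, ch2.pkl in a folder named 'run2023'.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'run2023')
    os.mkdir(folder)
    for n in (10, 9, 1, 2):
        open(os.path.join(folder, f'ch{n}.pkl'), 'w').close()
    result = sort_filenames_by_basename(os.path.join(folder, 'ch*.pkl'))
    print([os.path.basename(p) for p in result])  # ['ch1.pkl', 'ch2.pkl', 'ch9.pkl', 'ch10.pkl']
```

The rest of the original assumptions still apply: the basename must start with non-digits and contain digits.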

```python
# Example to test sorting
path = os.path.join('private', 'ch*.pkl')  # join path components with the OS separator
print("Filepath glob\n---")
pp.pprint(f"{path=}")  # show file glob path
print("---\nUnsorted order\n---")
pp.pprint(f"{glob.glob(path)=}")  # show unsorted order
print("---\nSorted order")
pp.pprint(f"{sort_filenames(path)=}")  # show sorted order
```

Filepath glob
---
"path='private/ch*.pkl'"
---
Unsorted order
---
("glob.glob(path)=['private/ch1.pkl', 'private/ch2.pkl', 'private/ch3.pkl', "
 "'private/ch7.pkl', 'private/ch6.pkl', 'private/ch4.pkl', 'private/ch5.pkl', "
 "'private/ch20.pkl', 'private/ch21.pkl', 'private/ch23.pkl', "
 "'private/ch22.pkl', 'private/ch32.pkl', 'private/ch26.pkl', "
 "'private/ch27.pkl', 'private/ch33.pkl', 'private/ch19.pkl', "
 "'private/ch25.pkl', 'private/ch31.pkl', 'private/ch30.pkl', "
 "'private/ch24.pkl', 'private/ch18.pkl', 'private/ch15.pkl', "
 "'private/ch29.pkl', 'private/ch28.pkl', 'private/ch14.pkl', "
 "'private/ch16.pkl', 'private/ch17.pkl', 'private/ch13.pkl', "
 "'private/ch12.pkl', 'private/ch10.pkl', 'private/ch11.pkl', "
 "'private/ch8.pkl', 'private/ch9.pkl']")
---
Sorted order
("sort_filenames(path)=['private/ch1.pkl', 'private/ch2.pkl', "
 "'private/ch3.pkl', 'private/ch4.pkl', 'private/ch5.pkl', 'private/ch6.pkl', "
 "'private/ch7.pkl', 'private/ch8.pkl', 'private/ch9.pkl', 'p