## File System Navigation

Folders can contain files and other folders. Since folders can contain other folders, they are a recursive data structure. In fact, they are a kind of recursive structure called a tree: there is a topmost or "root" value, and every other value has exactly one parent. Traversing such a recursive data structure is a natural use of a recursive algorithm!
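
To make the tree idea concrete without touching the real file system, here is a minimal sketch that models a folder tree as nested dictionaries. The names `sampleTree` and `countFiles` are made up for this illustration: a dict stands in for a folder and a string stands in for a file.

```python
# A hypothetical folder tree: a dict is a folder, a string is a file.
sampleTree = {
    'notes.txt': 'hello',
    'photos': {
        'cat.png': '...',
        'trips': {
            'paris.png': '...',
            'rome.png': '...',
        },
    },
    'empty': { },
}

def countFiles(tree):
    # Base Case: a file (anything that is not a dict) counts as one file.
    if not isinstance(tree, dict):
        return 1
    # Recursive Case: a folder. Add up the counts of everything inside it.
    total = 0
    for name in tree:
        total += countFiles(tree[name])
    return total

print(countFiles(sampleTree)) # prints 4
```

The functions that follow have the same shape: a file is the base case, and a folder is the recursive case.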


In [None]:
import os
def printFiles(path):
    # Base Case: a file. Just print the path name.
    if os.path.isfile(path):
        print(path)
    else:
        # Recursive Case: a folder. Iterate through its files and folders.
        for filename in os.listdir(path):
            printFiles(path + '/' + filename)

printFiles('sample_data')

# Note: if you see .DS_Store files in the sample_data folder, or in the
# output of your function (as often happens with Macs, in particular),
# don't worry: these are just metadata files and can be safely ignored.

### listFiles

In [None]:
import os
def listFiles(path):
    if os.path.isfile(path):
        # Base Case: return a list of just this file
        return [ path ]
    else:
        # Recursive Case: create a list of all the recursive results from
        # all the folders and files in this folder
        files = [ ]
        for filename in os.listdir(path):
            files += listFiles(path + '/' + filename)
        return files

print(listFiles('sample_data'))
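
If you do not have a `sample_data` folder handy, one way to try this out is to build a small throwaway folder tree first. The sketch below is only an illustration: `listFilesSorted` and `makeDemoTree` are hypothetical helpers, not part of the notes. It follows the same recursion as `listFiles`, but uses `os.path.join` to build paths and sorts each folder's contents so the output order is predictable.

```python
import os
import tempfile

def listFilesSorted(path):
    # Same idea as listFiles, with os.path.join and sorted folder contents.
    if os.path.isfile(path):
        # Base Case: return a list of just this file
        return [ path ]
    # Recursive Case: combine the results from everything in this folder
    files = [ ]
    for filename in sorted(os.listdir(path)):
        files += listFilesSorted(os.path.join(path, filename))
    return files

def makeDemoTree(root):
    # Create root/a.txt, root/sub/b.txt, and root/sub/deeper/c.txt
    os.makedirs(os.path.join(root, 'sub', 'deeper'))
    for parts in [('a.txt',), ('sub', 'b.txt'), ('sub', 'deeper', 'c.txt')]:
        with open(os.path.join(root, *parts), 'w') as f:
            f.write('demo')

# The temporary folder is deleted automatically when the with block ends.
with tempfile.TemporaryDirectory() as root:
    makeDemoTree(root)
    demoFiles = [os.path.relpath(p, root) for p in listFilesSorted(root)]

print(demoFiles)
```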

### removeTempFiles

Note: Be careful when using os.remove(): it's permanent and cannot be undone!

That said, this can be handy, say to remove .DS_Store files on Macs, and can be modified to remove other kinds of files, too. Just be careful.

In [None]:
import os
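def removeTempFiles(path, suffix='.DS_Store'):
    if os.path.isfile(path):
        # Base Case: a file. Remove it only if its name ends with the suffix.
        if path.endswith(suffix):
            print(f'Removing file: {path}')
            os.remove(path)
    elif os.path.isdir(path):
        # Recursive Case: a folder. Check everything inside it.
        for filename in os.listdir(path):
            removeTempFiles(path + '/' + filename, suffix)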

removeTempFiles('sample_data') # be careful
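
Since `os.remove()` cannot be undone, one cautious option is a "dry run" that only reports what would be removed. The sketch below is an assumption about how you might do that, not part of the original function: `findTempFiles` is a hypothetical name, and the demo folder is a throwaway one created with `tempfile`.

```python
import os
import tempfile

def findTempFiles(path, suffix='.DS_Store'):
    # Dry run: return the files that would be removed, but delete nothing.
    if os.path.isfile(path):
        return [ path ] if path.endswith(suffix) else [ ]
    elif os.path.isdir(path):
        matches = [ ]
        for filename in os.listdir(path):
            matches += findTempFiles(os.path.join(path, filename), suffix)
        return matches
    else:
        # Neither a regular file nor a folder: nothing to report.
        return [ ]

with tempfile.TemporaryDirectory() as root:
    # Build a tiny demo folder with two .DS_Store files in it.
    os.mkdir(os.path.join(root, 'photos'))
    for name in ['keep.txt', '.DS_Store', os.path.join('photos', '.DS_Store')]:
        with open(os.path.join(root, name), 'w') as f:
            f.write('demo')
    found = sorted(os.path.relpath(p, root) for p in findTempFiles(root))
    stillThere = os.path.isfile(os.path.join(root, '.DS_Store'))

print(found)      # the two .DS_Store paths, relative to the demo folder
print(stillThere) # True: the dry run did not delete anything
```

Once the list looks right, you can run the real `removeTempFiles` on the same folder.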

## Memoization

Memoization is a general technique to make certain recursive functions more efficient. The big idea: when a program does a lot of repetitive computation, store results as they are computed, then look up and reuse results when available.

1. The problem

In [None]:
def fib(n):
    if (n < 2):
        return 1
    else:
        return fib(n-1) + fib(n-2)

import time

def testFib(maxN=40):
    for n in range(maxN+1):
        start = time.time()
        fibOfN = fib(n)
        ms = 1000*(time.time() - start)
        print(f'fib({n:2}) = {fibOfN:8}, time = {ms:5.2f}ms')

testFib() # gets really slow!
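
To see where the time goes, it can help to count the calls. The sketch below uses a hypothetical helper, `countedFib`, which has the same logic as `fib` but records every call in a `Counter`. Computing `countedFib(20)` only ever needs 21 distinct arguments (0 through 20), yet it makes 21,891 calls, because the same values are recomputed over and over.

```python
from collections import Counter

callCounts = Counter()

def countedFib(n):
    # Same logic as fib, but records every call in callCounts.
    callCounts[n] += 1
    if (n < 2):
        return 1
    else:
        return countedFib(n-1) + countedFib(n-2)

result = countedFib(20)
print(f'countedFib(20) = {result}')                   # 10946
print(f'total calls    = {sum(callCounts.values())}') # 21891
print(f'distinct args  = {len(callCounts)}')          # 21
```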

2. A solution:

In [None]:
fibResults = dict()

def fib(n):
    if (n in fibResults):
        return fibResults[n]
    if (n < 2):
        result = 1
    else:
        result = fib(n-1) + fib(n-2)
    fibResults[n] = result
    return result

import time
def testFib(maxN=40):
    for n in range(maxN+1):
        start = time.time()
        fibOfN = fib(n)
        ms = 1000*(time.time() - start)
        print(f'fib({n:2}) = {fibOfN:8}, time = {ms:5.2f}ms')

testFib() # ahhh, much better!

3. A more elegant solution:

In [None]:
def memoized(f):
    # You are not responsible for how this decorator works. You can just use it!
    import functools
    cachedResults = dict()
    @functools.wraps(f)
    def wrapper(*args):
        if args not in cachedResults:
            cachedResults[args] = f(*args)
        return cachedResults[args]
    return wrapper

@memoized
def fib(n):
    if (n < 2):
        return 1
    else:
        return fib(n-1) + fib(n-2)

import time
def testFib(maxN=40):
    for n in range(maxN+1):
        start = time.time()
        fibOfN = fib(n)
        ms = 1000*(time.time() - start)
        print(f'fib({n:2}) = {fibOfN:8}, time = {ms:5.2f}ms')

testFib() # ahhh, much better!
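
For comparison, Python's standard library includes a ready-made memoizing decorator, `functools.lru_cache`. The sketch below applies it to the same function under a different name (`cachedFib`, chosen here so it does not replace the `fib` above). It is shown as an alternative to the `memoized` decorator, not as how `memoized` works internally.

```python
import functools

@functools.lru_cache(maxsize=None) # maxsize=None means the cache never evicts
def cachedFib(n):
    if (n < 2):
        return 1
    else:
        return cachedFib(n-1) + cachedFib(n-2)

print(cachedFib(40))          # 165580141
print(cachedFib.cache_info()) # reports cache hits, misses, and size
```

Each of the 41 distinct arguments (0 through 40) is computed exactly once, and every other call is answered from the cache.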