# File I/O and directory management


**ENGSCI233: Computational Techniques and Computer Systems** 

*Department of Engineering Science, University of Auckland*

This notebook contains a few exercises to show how Python is used to explore the file system and extract data from text files.

Let's start with directory exploration. The utility below prints the directory tree.

In [None]:
from pathlib import Path

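def tree(directory=None):
    # resolve the default at call time; a default of Path.cwd() is fixed when the function is defined
    directory = Path.cwd() if directory is None else Path(directory)
    print(f'+ {directory}')
    for path in sorted(directory.rglob('*')):
        depth = len(path.relative_to(directory).parts)   # nesting level below the root
        spacer = '    ' * depth
        print(f'{spacer}+ {path.name}')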
        
tree()

In [None]:
Path.cwd()

In [None]:
tree(Path.cwd() / 'data')

In [None]:
Path.cwd() / 'data'

In [None]:
Path.cwd()

In [None]:
Path.home()

Aside from the three notebook files `*.ipynb`, we can see there are three folders - `.ipynb_checkpoints`, `data` and `processed` - one of which, `data`, contains two subfolders - `station1` and `station2`. There are various text files inside the folders.

Ordinarily, you could open each text file from the filesystem and inspect its contents. However, this is far too vanilla for a Python partisan, and so we can also [do it from within a notebook](http://www.reactiongifs.com/r/but-why.gif) using the `%pycat` command.

In [None]:
# note the use of a '/' to indicate that 'processed' is a directory and that 
# 'station_corrections.txt' is contained within it
%pycat processed/station_corrections.txt

In [None]:
import numpy as np
# read the ';'-delimited table, skipping the header row; .T gives one array per column
output = np.genfromtxt('processed/station_corrections.txt', delimiter=';', skip_header=1).T
print(output)
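
As a minimal sketch of what `skip_header` and the transpose do, the cell below reads a small in-memory table instead of the real file. The column names and values are hypothetical and are not taken from `station_corrections.txt`.

```python
from io import StringIO
import numpy as np

# Hypothetical contents: one header row, then ';'-separated numeric columns.
sample = StringIO("station;correction\n1;0.5\n2;-0.25\n3;1.0\n")

# skip_header=1 drops the header row; .T turns the rows into one array per column,
# so each column can be unpacked into its own variable.
stations, corrections = np.genfromtxt(sample, delimiter=';', skip_header=1).T

print(stations)
print(corrections)
```

Without the `.T`, each row of the file would be one row of the array, and the unpacking above would fail for a table with more than two rows.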


In [None]:
%pycat data/station_list.txt

Let's read the file `station_list.txt` a different way: by opening with a file pointer, and processing the lines, as strings, one by one.

In [None]:
fp = open('data/station_list.txt','r')         # opening in read mode, note again the '/'

hdr = fp.readline()
ln1 = fp.readline()

print(hdr)
print(ln1)

fp.close()
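
The same two reads can also be wrapped in a `with` block, which closes the file automatically, even if an error occurs partway through. The sketch below uses a temporary stand-in file with hypothetical contents so that it runs anywhere; the file name `sample_list.txt` is an assumption for illustration only.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a file with a header row followed by data lines.
    sample = Path(tmp) / 'sample_list.txt'
    sample.write_text('header line\nfirst data line\n')

    with open(sample, 'r') as fp:
        hdr = fp.readline()    # first call returns the first line
        ln1 = fp.readline()    # each further call returns the next line

    # Once the with block ends, the file is closed; no fp.close() is needed.
    closed_after_block = fp.closed

print(hdr.strip())
print(ln1.strip())
print(closed_after_block)
```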

In [None]:
type(ln1)

In [None]:
parts = ln1.split(':')
print(parts)  # Note the whitespace

In [None]:
ln1_stripped = [d.strip() for d in parts]
print(ln1_stripped)

In [None]:
sum([float(d.strip()) for d in ln1_stripped[1].split(',')])

In [None]:
def process_data_input(ln1):
    # split 'name: v1, v2, ...' into the name and the sum of the comma-separated values
    parts = ln1.split(':')
    ln1_stripped = [d.strip() for d in parts]
    total = sum([float(d.strip()) for d in ln1_stripped[1].split(',')])
    return ln1_stripped[0], total

process_data_input(ln1)
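
Once the parsing lives in a function, it can be applied to every line rather than just the first. The sketch below assumes each data line has the layout `name: v1, v2, ...`, which is what the `split(':')` and `split(',')` calls above imply; the sample lines are hypothetical and do not come from `station_list.txt`.

```python
def process_data_input(line):
    """Split 'name: v1, v2, ...' into the name and the sum of the values."""
    name, values = [part.strip() for part in line.split(':')]
    return name, sum(float(v) for v in values.split(','))

# Hypothetical lines in the assumed 'name: values' layout.
sample_lines = [
    "station1: 1.0, 2.5, 3.5\n",
    "station2: 4.0, 0.5\n",
    "\n",                        # blank lines are skipped below
]

totals = {}
for line in sample_lines:
    if not line.strip():
        continue                 # nothing to parse on an empty line
    name, total = process_data_input(line)
    totals[name] = total

print(totals)
```

With a real file, the loop could run over the file pointer directly (`for line in fp:`) after one `fp.readline()` call to skip the header.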

Now suppose we want to start looking at files in many different directories. We won't know ahead of time how many directories there are, so we'll need code that is smart and flexible.

In [None]:
path = Path.cwd()
path

In [None]:
path.glob('*')

In [None]:
list(path.glob('*'))

In [None]:
output = list(path.glob('data/*'))
output

In [None]:
# the glob command lets us find PATHS to files and folders that match a particular name pattern
from glob import glob

# for example, ALL files and folders inside the current directory
output = glob('*')
print(output)

In [None]:
# for example, ALL files and folders inside the subdirectory 'data'
output = glob('data/*')
print(output)
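
When the number of subdirectories is not known ahead of time, a recursive pattern avoids listing them one by one. The sketch below builds a small temporary tree modelled on the `data/station1` and `data/station2` layout and then finds every text file beneath `data`; the file name `readings.txt` and its contents are hypothetical.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Build a small, hypothetical tree: data/station1 and data/station2.
    for station in ('station1', 'station2'):
        folder = root / 'data' / station
        folder.mkdir(parents=True)
        (folder / 'readings.txt').write_text('hypothetical contents\n')

    # '**' matches any number of nested directories, so this pattern finds
    # every .txt file below 'data' regardless of how deep it sits.
    found = sorted(p.relative_to(root).as_posix()
                   for p in root.glob('data/**/*.txt'))

print(found)
```

The same code keeps working if more station folders are added later, because nothing in the pattern depends on their names or count.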