# File I/O and directory management


**ENGSCI233: Computational Techniques and Computer Systems** 

*Department of Engineering Science, University of Auckland*

This notebook contains a few exercises to show how Python is used to explore the file system and extract data from text files.

Let's start with directory exploration. The utility below prints the directory tree.

In [None]:
from tree import tree
tree(skip='*py*')             # the skip argument hides any folders conforming to *py* 

Aside from the two notebook files and `tree.py`, we can see there are two folders - `data` and `processed` - one of which, `data`, contains two subfolders - `station1` and `station2`. There are various text files inside the folders.
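The source of `tree.py` is not shown here, so the following is only a rough sketch of how a utility like it could work, using `os.walk` from the standard library. The function name `sketch_tree`, its `skip` handling, and the throwaway folder layout are all assumptions made for illustration; they are not the course's implementation.

```python
import fnmatch
import os
import tempfile


def sketch_tree(root, skip=None):
    """Return indented lines listing the folders and files below root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # editing dirnames in place stops os.walk descending into skipped folders
        if skip is not None:
            dirnames[:] = [d for d in dirnames if not fnmatch.fnmatch(d, skip)]
        dirnames.sort()
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == '.' else rel.count(os.sep) + 1
        lines.append('    ' * depth + os.path.basename(dirpath) + os.sep)
        for name in sorted(filenames):
            lines.append('    ' * (depth + 1) + name)
    return lines


# build a small throwaway layout (hypothetical, not the notebook's real folders)
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'data', 'station1'))
    os.makedirs(os.path.join(root, '__pycache__'))
    open(os.path.join(root, 'data', 'station1', 'data.txt'), 'w').close()
    tree_lines = sketch_tree(root, skip='*py*')

print('\n'.join(tree_lines))
```

Pruning `dirnames` in place is what keeps folders such as `__pycache__` out of the listing, which mirrors the effect of the `skip` argument used above.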

Ordinarily, you could open each text file from the filesystem and inspect its contents. However, this is far too vanilla for a Python partisan, and so we can also [do it from within a notebook](http://www.reactiongifs.com/r/but-why.gif) using the `%pycat` command.

In [None]:
# note the use of a '/' to indicate that 'processed' is a directory and that 
# 'station_corrections.txt' is contained within it
%pycat processed/station_corrections.txt

# TASKS
# 1. modify the command above to inspect the contents of station_list.txt
#    and the two data.txt files in the subfolders station1 and station2

# 2. uncomment the genfromtxt command below to pull the corrections out of station_corrections.txt
#    - why does the first row contain 'nan'?

#import numpy as np
#output = np.genfromtxt('processed/station_corrections.txt', delimiter = ';', skip_header=1).T
#print(output)

# 3. duplicate the genfromtxt command to try to pull data out of the other files


Let's read the file `station_list.txt` a different way: by opening with a file pointer, and processing the lines, as strings, one by one.
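As a warm-up for the line-by-line approach, here is a minimal sketch that runs without any files on disk. It uses `io.StringIO` as a stand-in for an open file, and the header, station name, and numbers are invented for illustration; the real `station_list.txt` may be laid out differently.

```python
import io

# stand-in for a text file; the contents are made up for this example
fake_file = io.StringIO("station: x, y, z\nalpha: 1.5, 2.0, 3.25\n")

header = fake_file.readline()    # first line, still ending in the newline character
line = fake_file.readline()      # second line

# strip() removes the trailing newline, split(':') cuts the string at the colon
name, values = line.strip().split(':')

# a second split separates the values; float() ignores the surrounding spaces
numbers = [float(v) for v in values.split(',')]

print(name, numbers)
```

Every item that `split()` returns is still a string, which is why the explicit `float()` conversion is needed before doing any arithmetic.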

In [None]:
fp = open('data/station_list.txt','r')         # opening in read mode, note again the '/'

hdr = fp.readline()
ln1 = fp.readline()

print(hdr)
print(ln1)

fp.close()

# TASKS
# 1. uncomment the split method below and answer the following questions:
#    - what does split() do?
#    - what are the outputs? what are their types?
#    - what happens if you change ':' to ","? (look carefully, which commas are inside strings
#      and which are separating items of a list)
#    - what does '\n' mean?

#parts = ln1.split(':')
#print(parts)

# 2. uncomment the strip method below and answer the following questions:
#    - what does strip() do?
#    - isn't it pretty rad that we can chain two methods together like that?

#ln1_stripped = ln1.strip().split(':')
#print(ln1_stripped)

# 3. uncomment the float command below to convert the last entry to a number
#    - what is the difference between last_entry and last_number? (hint: try adding 1 to both)

#last_entry = ln1.strip().split(',')[-1]
#last_number = float(last_entry)
#print(last_entry)
#print(last_number)

# 4. using two split() methods, extract all three numbers from ln1. Do the same for
#    the third line of station_list.txt


Now suppose we want to start looking at files in many different directories. We won't know ahead of time how many directories there are, so we'll need code that is smart and flexible.
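One way to stay flexible is to ask the file system what exists rather than hard-coding folder names. The sketch below does this with `glob` (which returns paths matching a pattern) and `os.path`, on a throwaway layout created with `tempfile`. The folder names, file names, and file contents are assumptions chosen to resemble the `data` folder, not its actual contents.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as root:
    # hypothetical layout: two station folders and one loose file
    for folder in ('station1', 'station2'):
        os.makedirs(os.path.join(root, folder))
        with open(os.path.join(root, folder, 'data.txt'), 'w') as fp:
            fp.write('1.0, 2.0\n')
    with open(os.path.join(root, 'notes.txt'), 'w') as fp:
        fp.write('not a folder\n')

    found = []
    for path in sorted(glob(os.path.join(root, '*'))):
        # only folders are of interest; loose files are skipped
        if os.path.isdir(path):
            candidate = os.path.join(path, 'data.txt')
            if os.path.isfile(candidate):
                found.append(os.path.relpath(candidate, root))

print(found)
```

`os.path.join` inserts the correct separator for the operating system, so the same loop works whether paths use `/` or `\`. The loop also never needs to know how many folders there are.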

In [None]:
# the glob command lets us find PATHS to files and folders that conform to a particular name pattern
from glob import glob

# for example, ALL files and folders inside the current directory
output = glob('*')
print(output)

# for example, ALL files and folders inside the subdirectory 'data'
output = glob('data/*')
print(output)

# TASKS
# 1. Glob outputs a list. A list of what? What types?

# 2. Uncomment the os.path commands below, which test whether a path is a directory or a file
#    - try putting one of these tests inside an IF condition and then print whether a path
#      points to a directory or a file

#import os
#print(output[0])
#print(os.path.isfile(output[0]))
#print(os.path.isdir(output[0]))

# 3. Uncomment the commands below that concatenate strings to create new paths.
#    - add to the last command so that it prints the path to a file 'data.txt'
#    - test the new path inside os.path.isfile() to verify it returns True

#print(output[1])
#print(output[1] + '/')
#print(output[1] + os.sep)

# 4. Write a loop over all the paths that:
#    - tests whether the path is a directory
#    - if it IS a directory, reads the data inside of it using np.genfromtxt
