# Working with files (file system I/O)
HCI574 lectures 11, 25 and 26

- file: 
    - a file (text, image, data) on your "local" drive (incl. cloud connections like Box)
    - a stream of data between your local drive and your python code
- as opposed to a stream of data from an internet source (server, via URL)
- advanced topic: there are also "fake" files (streams) that live entirely in memory
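
A minimal sketch of such an in-memory "fake" file, assuming the standard library's io.StringIO is what you want for text data. It accepts the same print(..., file=...), write() and readlines() calls as a real file object, but nothing is written to the drive.

```python
import io

# An in-memory "file": behaves like a text file object but never touches the drive
fake = io.StringIO()
print("first line", file=fake)   # same print(..., file=...) idiom used for real files
fake.write("second line\n")      # write() works too, but adds no newline by itself

content = fake.getvalue()        # the whole buffer as one string
fake.seek(0)                     # rewind to the start before reading
lines = fake.readlines()         # list of lines, each still ending in \n
fake.close()

print(repr(content))
print(lines)
```

This is handy for testing code that expects a file object without creating real files.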

## OS (operating system) file I/O in python
- python modules: os, os.path, glob
- simulates what human users do via an OS file manager (or terminal)
    - see a list of files (os.listdir() or glob.glob())
    - make a new folder os.mkdir()
    - rename, move, copy, delete files or folders (os.rename(), os.remove(), module shutil)
- if something fails (no permission, file not found) you'll get an exception with error info
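
As a sketch of what those exceptions can look like: Python raises specific subclasses of OSError, so you can catch exactly the failure you expect. The example works inside a throwaway folder made with tempfile; the names "tmp" and "no_such_file.txt" are only illustrative.

```python
import os
import tempfile

results = []  # collect what happened, for printing at the end

# work inside a throwaway folder so nothing on your drive is touched
with tempfile.TemporaryDirectory() as scratch:
    folder = os.path.join(scratch, "tmp")
    os.mkdir(folder)                      # first call succeeds

    try:
        os.mkdir(folder)                  # second call: folder already exists
    except FileExistsError as e:
        results.append("exists: " + type(e).__name__)

    try:
        os.remove(os.path.join(scratch, "no_such_file.txt"))
    except FileNotFoundError as e:
        results.append("missing: " + type(e).__name__)

    names = os.listdir(scratch)           # only 'tmp' is in there

print(results)
print(names)
```

Catching a specific exception (instead of a blanket Exception) keeps unrelated bugs from being silently swallowed.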


- the interpreter always "sits" inside a folder within your folder tree 
- when a .py/.ipynb file is started, the interpreter typically sits in the same folder as that file (exception: if you have a workspace in VS Code, it will be the folder with the .code-workspace file)
- a path is the description of a route through folders (string!)
- Windows: 
    - drive letter as root, use \ as separator 
    - use a raw string (r in front): r"c:\Users\Chris\tmp\bla.txt" (path to a file)
- Mac/*nix: / is root, use / as separator, no need for r: "/Users/chris/tmp/bla.txt"
- absolute path: always starts from root: r"c:\Users\Chris\tmp" (folder); note that a raw string cannot end in a single \
- relative path: starts from the current folder: 
    - assume we are sitting in "c:\Users\Chris"
    - "../Bob/tmp/stuff.doc" means go up from Chris to Users (..), then into Bob, etc.
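
The path rules above can be tried out without touching the drive. This sketch uses ntpath and posixpath (the Windows and Mac/*nix flavors of os.path) so it gives the same result on any machine; the lowercase "bob" folder in the Mac/*nix case is made up for illustration. In everyday code you would normally just call os.path.join() and os.path.abspath().

```python
import ntpath      # Windows path rules (this is what os.path is on Windows)
import posixpath   # Mac/*nix path rules (this is what os.path is elsewhere)

# relative path from the bullet above, resolved against the folder we "sit" in
win_cwd = r"c:\Users\Chris"
win_full = ntpath.normpath(ntpath.join(win_cwd, "../Bob/tmp/stuff.doc"))

# same idea on Mac/*nix
nix_cwd = "/Users/chris"
nix_full = posixpath.normpath(posixpath.join(nix_cwd, "../bob/tmp/stuff.doc"))

print(win_full)   # c:\Users\Bob\tmp\stuff.doc
print(nix_full)   # /Users/bob/tmp/stuff.doc

# isabs() tells absolute and relative paths apart
print(ntpath.isabs(win_full), ntpath.isabs("../Bob/tmp/stuff.doc"))   # True False
```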


In [2]:
# some common file system related operations
import os

# what folder (path) are we sitting in right now?
print(os.getcwd())  # get current working directory (=folder)

# make a folder in current folder
try:
    os.mkdir("tmp")  # will raise an error if folder already exists!
except Exception as e:
    print(e) # print error message
else:
    print("tmp was created")

print("currently I'm sitting in", os.getcwd()) # we are still in the same folder!

c:\Users\charding\Box\HCI584\HCI584_python_refresher\refresher part 4
[WinError 183] Cannot create a file when that file already exists: 'tmp'
currently I'm sitting in c:\Users\charding\Box\HCI584\HCI584_python_refresher\refresher part 4
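
The WinError 183 above shows up whenever the folder already exists. If "create it unless it is already there" is what you want, one possible alternative (not what the cell above does) is os.makedirs() with exist_ok=True. The sketch runs in a throwaway folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:   # throwaway parent folder
    target = os.path.join(scratch, "tmp")

    os.makedirs(target, exist_ok=True)   # creates the folder
    os.makedirs(target, exist_ok=True)   # second call does nothing and raises no error

    created = os.path.isdir(target)      # check that the folder is really there

print(created)
```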


In [3]:
# go into tmp (relative path)
os.chdir("./tmp") 
print("currently I'm sitting in", os.getcwd())

currently I'm sitting in c:\Users\charding\Box\HCI584\HCI584_python_refresher\refresher part 4\tmp
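
Note that os.chdir() changes the current folder for the rest of the session, so every later relative path is affected. A hedged sketch of one way to keep this under control: remember the starting folder and go back in a finally block. The scratch folder comes from tempfile and is only for illustration.

```python
import os
import tempfile

start = os.getcwd()                      # remember where we started

with tempfile.TemporaryDirectory() as scratch:
    try:
        os.chdir(scratch)                # go into the scratch folder
        inside = os.getcwd()             # now we sit somewhere else
    finally:
        os.chdir(start)                  # always go back, even after an error

back = os.getcwd()
print(back == start)
```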


## Writing to and reading from files

- open() takes a file name and a mode (read, write, append) and creates a file object (file pointer in C/C++)
- we then read/write through this file object and close it when done
- reading is done with read() (or readlines())
- writing/appending is done with print(<data>, file=<file object>) or with <file object>.write(<string>)
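
A minimal sketch of the three modes, using a with block so the file is closed automatically (a common alternative to calling close() yourself). The file name notes.txt is hypothetical, and the file lives in a throwaway folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "notes.txt")    # hypothetical file name

    with open(path, "w") as fo:                  # "w": create or overwrite
        fo.write("line 1\n")                     # write() adds no newline itself
    with open(path, "a") as fo:                  # "a": append to the end
        print("line 2", file=fo)                 # print() adds the \n for us
    with open(path, "r") as fo:                  # "r": read
        full = fo.read()                         # entire content as one string

print(repr(full))
```

Each with block closes its file object when the block ends, even if an exception occurs inside it.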


In [4]:
# Write string into a new text file and read it back

# Long text with several lines (\n)
txt = """ Python Wikipedia intro

Python is an interpreted, high-level, general-purpose programming language. 
Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace.

End.
"""
print(repr(txt)) # repr() will make us see the \n

# make file object pointing to file name, (over)write mode
fo = open("python_intro.txt", "w+") # w+ means write and read; like "w", it overwrites the file if it already exists
print(txt, file=fo) # print() adds one more \n after txt
fo.close() # <= important!

# again, now for reading 
fo = open("python_intro.txt", "r")
full = fo.read() # read in entire string
fo.close()
print(full)

fo = open("python_intro.txt", "r")
lines = fo.readlines() # read in but turn into list of lines (\n for line ends)
fo.close()

# split each line into words
for line in lines:
    print(line.split())



" Python Wikipedia intro\n\nPython is an interpreted, high-level, general-purpose programming language. \nCreated by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace.\n\nEnd.\n"
 Python Wikipedia intro

Python is an interpreted, high-level, general-purpose programming language. 
Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace.

End.


['Python', 'Wikipedia', 'intro']
[]
['Python', 'is', 'an', 'interpreted,', 'high-level,', 'general-purpose', 'programming', 'language.']
['Created', 'by', 'Guido', 'van', 'Rossum', 'and', 'first', 'released', 'in', '1991,', "Python's", 'design', 'philosophy', 'emphasizes', 'code', 'readability', 'with', 'its', 'notable', 'use', 'of', 'significant', 'whitespace.']
[]
['End.']
[]
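
The empty lists in the output come from blank lines; the very last one is the extra \n that print() appends after txt. As a sketch of an alternative to readlines(), you can loop over the file object directly (one line at a time) and skip the blank lines. The text here is a shortened stand-in, and the file is written to a throwaway folder.

```python
import os
import tempfile

txt = "Python Wikipedia intro\n\nEnd.\n"   # shortened stand-in for the text above

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "python_intro.txt")
    with open(path, "w") as fo:
        print(txt, file=fo)        # print() appends one more \n after txt

    words_per_line = []
    with open(path, "r") as fo:
        for line in fo:            # a file object yields one line at a time
            words = line.split()
            if words:              # skip blank lines (split() gives an empty list)
                words_per_line.append(words)

    with open(path, "r") as fo:
        raw = fo.read()            # raw content, to see the extra \n

print(words_per_line)
print(repr(raw))
```

Looping over the file object avoids loading every line into a list first, which can matter for large files.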


In [5]:
import shutil # OS shell util 

# copy guido.jpg from parent folder to current folder (.)
shutil.copy("../guido.jpg", ".") # even on Win, you can usually use / for os functions


# list names of file/folders following a pattern description
from glob import glob # simple pattern matcher

print(glob("*")) # * matches all file names
print(glob("*.txt")) # all files ending in .txt
print(glob("g*.jpg")) # all files starting with g and ending in .jpg

# delete file
try:
    os.remove("guido.jpg")  # for some reason there's no shutil.remove() !
except Exception as e:
    print(e) # print error message

# rename file
try:
    os.rename("python_intro.txt", "bla.txt")
except Exception as e:
    print(e) # print error message

# list content of tmp
print(glob("../tmp/*"))

['guido.jpg', 'python_intro.txt']
['python_intro.txt']
['guido.jpg']
['../tmp\\bla.txt']
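
The cells above cover copy, delete and rename. As a hedged sketch of the remaining operations, shutil.move() moves a file into another folder and shutil.rmtree() deletes a folder with everything in it. The folder name archive is made up, and everything happens inside a throwaway folder from tempfile.

```python
import os
import shutil
import tempfile
from glob import glob

scratch = tempfile.mkdtemp()                     # throwaway folder
try:
    src = os.path.join(scratch, "bla.txt")
    with open(src, "w") as fo:
        fo.write("hello\n")

    archive = os.path.join(scratch, "archive")   # hypothetical folder name
    os.mkdir(archive)
    shutil.move(src, archive)                    # move the file into the folder

    # glob returns full paths here; basename strips the folder part
    found = [os.path.basename(p) for p in glob(os.path.join(archive, "*.txt"))]
    still_there = os.path.exists(src)            # the old location is now empty
finally:
    shutil.rmtree(scratch)                       # delete folder and everything in it

gone = not os.path.exists(scratch)
print(found, still_there, gone)
```

Be careful with shutil.rmtree(): it deletes without asking and does not use the recycle bin.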
