## This chapter in Dive Into Python also covers sys.modules, directory traversal, and for loops.

## Basics: Python uses try...except to handle exceptions and raise to generate them.

Standard exception types:

    · Accessing a non-existent dictionary key will raise a KeyError exception.
    · Searching a list for a non-existent value will raise a ValueError exception.
    · Calling a non-existent method will raise an AttributeError exception.
    · Referencing a non-existent variable will raise a NameError exception.
    · Mixing datatypes without coercion will raise a TypeError exception.
    · Screwing up file IO will raise an IOError exception.
    · Failing an assert statement will raise an AssertionError exception (I love assert). assert is part of the base language.
    · Importing a module that does not exist will raise an ImportError exception.
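
The list above can be checked directly. The sketch below is written in Python 3 syntax (an assumption; the notes themselves use Python 2), where opening a missing file raises FileNotFoundError and a failed import raises ModuleNotFoundError. Both are subclasses of the exceptions named in the list (IOError is an alias of OSError in Python 3). The helper names and the module name are hypothetical.

```python
import os
import tempfile

def raised_by(action):
    """Call a zero-argument function and return the class of the exception it raises."""
    try:
        action()
    except Exception as exc:
        return type(exc)
    return None

def open_missing_file():
    # A path inside a fresh temporary directory is guaranteed not to exist.
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "missing.txt"))

def fail_assert():
    assert 1 == 2, "failed on purpose"

def import_missing_module():
    import module_that_does_not_exist_xyz  # hypothetical name, assumed not installed

results = {
    "dictionary key": raised_by(lambda: {}["missing"]),
    "list search": raised_by(lambda: [1, 2].index(3)),
    "method": raised_by(lambda: "abc".no_such_method()),
    "variable": raised_by(lambda: undefined_variable),
    "mixed types": raised_by(lambda: "abc" + 1),
    "file IO": raised_by(open_missing_file),
    "assert": raised_by(fail_assert),
    "import": raised_by(import_missing_module),
}

for what, exc_class in results.items():
    print(what, "->", exc_class.__name__)
```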
    
You can also define your own exceptions by creating a class that inherits from the built-in Exception class, and
then raise your exceptions with the raise statement.
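
A minimal sketch of a custom exception, in Python 3 syntax (an assumption; the notes use Python 2). The class name FileInfoError and the function check_extension are hypothetical and are not part of the chapter's code.

```python
class FileInfoError(Exception):
    """Hypothetical custom exception: inherits everything from Exception."""

def check_extension(filename):
    """Return the extension of filename, or raise FileInfoError if there is none."""
    if "." not in filename:
        raise FileInfoError("no extension in %r" % filename)
    return filename.rsplit(".", 1)[1]

try:
    check_extension("README")
except FileInfoError as err:
    # The message passed to the constructor is available through str(err).
    message = str(err)

print(message)
print(check_extension("kairo.mp3"))
```

Because FileInfoError inherits from Exception, a plain `except Exception:` handler would also catch it.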

A try...except block can have an else clause, like an if statement. If no exception is raised during the
try block, the else clause is executed afterwards. Example:

    try:
        import msvcrt
    except ImportError:
        try:
            from EasyDialogs import AskPassword
        except ImportError:
            getpass = default_getpass
        else:
            getpass = AskPassword
    else:
        getpass = win_getpass
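
The example above depends on platform-specific modules, so here is a smaller sketch of the same try...except...else shape that runs anywhere. It is in Python 3 syntax (an assumption), and parse_and_double is a hypothetical function.

```python
def parse_and_double(text):
    """Convert text to an int and double it; report bad input instead of raising."""
    try:
        value = int(text)
    except ValueError:
        # Reached only when int() raised.
        return "not a number"
    else:
        # Reached only when the try block finished without an exception.
        return value * 2

print(parse_and_double("21"))
print(parse_and_double("twenty-one"))
```

Putting the follow-up work in the else clause keeps the try block down to the one call that is expected to fail.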

## File Handling

    f = open("/music/_singles/kairo.mp3", "rb")
    
### file methods

The tell() method of a file object tells you your current position in the open file.

The seek() method of a file object moves to another position in the open file. The second parameter
specifies what the first one means; 0 means move to an absolute position (counting from the start of the
file), 1 means move to a relative position (counting from the current position), and 2 means move to a
position relative to the end of the file.

The read() method reads a specified number of bytes from the open file and returns a string with the data
that was read.

The readline() method reads to the next newline character.
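
The four methods can be tried on a small scratch file. This is a sketch in Python 3 syntax (an assumption; in Python 3 a file opened in "rb" mode returns bytes rather than a str). The file contents are made up for the example.

```python
import os
import tempfile

# Create a scratch file so the example does not depend on a real path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nsecond line\n")   # 23 bytes in total
    path = tmp.name

with open(path, "rb") as f:
    start = f.tell()        # position right after opening
    f.seek(6, 0)            # 0: absolute position, 6 bytes from the start
    word = f.read(4)        # read exactly 4 bytes
    after_read = f.tell()   # reading advanced the position by 4
    f.seek(-5, 2)           # 2: relative to the end, 5 bytes before it
    tail = f.read()         # no size given: read to the end of the file
    f.seek(0, 0)            # back to the start
    first = f.readline()    # read up to and including the next newline
    f.seek(7, 1)            # 1: relative to the current position
    word_again = f.read(4)

os.remove(path)
print(start, word, after_read, tail, first, word_again)
```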


In [85]:
print file.__doc__ #explains the modes

file(name[, mode[, buffering]]) -> file object

Open a file.  The mode can be 'r', 'w' or 'a' for reading (default),
writing or appending.  The file will be created if it doesn't exist
when opened for writing or appending; it will be truncated when
opened for writing.  Add a 'b' to the mode for binary files.
Add a '+' to the mode to allow simultaneous reading and writing.
If the buffering argument is given, 0 means unbuffered, 1 means line
buffered, and larger numbers specify the buffer size.  The preferred way
to open a file is with the builtin open() function.
Add a 'U' to mode to open the file for input with universal newline
support.  Any line ending in the input file will be seen as a '\n'
in Python.  Also, a file so opened gains the attribute 'newlines';
the value for this attribute is one of None (no newline read yet),
'\r', '\n', '\r\n' or a tuple containing all the newline types seen.

'U' cannot be combined with 'w' or '+' mode.



In [86]:
fname = "/mnt/xferUbuntu/jupyterNotes/PyRefreshers/music_singles/01_Red Rain_Peter Gabriel.mp3"
f = open(fname, "rb")
f.closed

False

In [87]:
f.close()
f.closed # checks whether the file is closed.

True

In [88]:
# graceful way to read and close a file
try:
    fsock = open(fname, "rb", 0)
    try:
        fsock.seek(-128, 2) # tags are stored in the last 128 bytes of an MP3 file
        tagdata = fsock.read(128)
        print tagdata
    finally:
        fsock.close()
except IOError:
    pass

TAGRed Rain                      Peter Gabriel                 So                            1986                             
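
The 128 bytes printed above are an ID3v1 tag: the marker "TAG" followed by fixed-width fields. The sketch below, in Python 3 syntax (an assumption), builds a synthetic tag in a scratch file and slices it with the ID3v1 offsets (title 3-33, artist 33-63, album 63-93, year 93-97). The function names read_id3v1 and pad are hypothetical, and `with` is used in place of the try...finally shown above.

```python
import os
import tempfile

def pad(text, width):
    """Encode text and pad it with NUL bytes to a fixed width."""
    return text.encode("latin-1").ljust(width, b"\x00")

# Synthetic tag: marker, title, artist, album, year, comment, genre byte.
fake_tag = (b"TAG" + pad("Red Rain", 30) + pad("Peter Gabriel", 30)
            + pad("So", 30) + pad("1986", 4) + pad("", 30) + b"\x00")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff" * 1000 + fake_tag)   # stand-in audio bytes, then the tag
    path = tmp.name

def read_id3v1(filename):
    """Return a dict of tag fields, or None if the file has no readable tag."""
    try:
        with open(filename, "rb") as fsock:   # closed automatically on exit
            fsock.seek(-128, 2)               # 128 bytes before the end
            tagdata = fsock.read(128)
    except IOError:
        return None
    if tagdata[:3] != b"TAG":
        return None

    def field(start, end):
        return tagdata[start:end].decode("latin-1").strip(" \x00")

    return {"title": field(3, 33), "artist": field(33, 63),
            "album": field(63, 93), "year": field(93, 97)}

tags = read_id3v1(path)
os.remove(path)
print(tags)
```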


### Environment variables: access them through os.environ (a dictionary-like mapping).
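
A short sketch of dictionary-style access to os.environ, in Python 3 syntax (an assumption). The variable names PYREFRESHER_DEMO and PYREFRESHER_NOT_SET are hypothetical.

```python
import os

os.environ["PYREFRESHER_DEMO"] = "hello"     # values must be strings
value = os.environ["PYREFRESHER_DEMO"]       # plain key lookup

# .get() returns a default instead of raising for a missing variable.
missing = os.environ.get("PYREFRESHER_NOT_SET", "fallback")

try:
    os.environ["PYREFRESHER_NOT_SET"]
    raised_key_error = False
except KeyError:
    # Same behavior as a missing key in a regular dictionary.
    raised_key_error = True

del os.environ["PYREFRESHER_DEMO"]           # clean up after the example
print(value, missing, raised_key_error)
```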

## Accessing modules from sys.modules

sys.modules is a dictionary that maps each module's name (a string) to the loaded module object.

sys also exposes other interpreter settings, such as the recursion limit:

    sys.getrecursionlimit()
    sys.setrecursionlimit(100)
    
As new modules are imported, they are added to sys.modules. This explains why importing the
same module twice is very fast: Python has already loaded and cached the module in
sys.modules, so importing the second time is simply a dictionary lookup.
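
The caching described above can be observed directly. A minimal sketch in Python 3 syntax (an assumption), using the standard json module only as a convenient example:

```python
import sys
import json

# After the import, the module object is stored under its name.
cached = sys.modules["json"]
same_object = cached is json

# A second import does not reload anything: it returns the cached object.
import json as json_again
reimport_same = json_again is json

print(same_object, reimport_same, type(sys.modules).__name__)
```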



In [89]:
import sys, os
type(sys)

module

In [90]:
## The function below uses sys.modules to pull the correct file info class for the file type

In [125]:
class FileInfo:
    """base class for FileInfo stack"""
    def __init__(self, filename=None):
        print "FileInfo init"
        self.fname = filename
        
    def __repr__(self):
        return str(self.__class__) and "class %s: filename %s".format(str(self.__class__), str(self.fname)) or "None"
        
        
class TXTFileInfo(FileInfo):
    def __init__(self, filename):
        print "TXTFileInfo init"
        FileInfo.__init__(self, filename)

    
    def __repr__(self):
        FileInfo.__repr__(self)
        
class MP3FileInfo(FileInfo):
    def __init__(self, filename):
        print "MP3FileInfo init"
        FileInfo.__init__(self, filename)
    
    def __repr__(self):
        FileInfo.__repr__(self)

def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
    "get file info class from filename extension"
    
    # get the name of the right subclass of FileInfo
    # os.path.splitext splits the filename into a basename and an extension that includes the period (i.e., ".txt")
    # slicing the extension so it starts at 1 ditches the period.
    # upper() creates the uppercase. 
    # note the use of the formatting operator %
    subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
    
    # note the "A and B or C" idiom:
    # if the extension has no matching FileInfo subclass, A (the hasattr call) is False, so C (the base class, FileInfo) is returned.
    # if it's a recognized type, B (the subclass) is returned. This is safe here because a class object is normally true.
    return hasattr(module, subclass) and getattr(module, subclass) or FileInfo

finst = getFileInfoClass("filename.txt")("filename.txt")
print finst.__class__
print finst
print getFileInfoClass("filename.mp3")
print getFileInfoClass("filename.avi")

TXTFileInfo init
FileInfo init
__main__.TXTFileInfo
__main__.TXTFileInfo
filename.txt


TypeError: __str__ returned non-string (type NoneType)
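
The TypeError above has a simple cause: the `__repr__` methods in TXTFileInfo and MP3FileInfo call `FileInfo.__repr__(self)` but never return its result, so they return None. There is a second, quieter problem in `FileInfo.__repr__`: it mixes `%s` placeholders with `str.format`, which only fills `{}` placeholders. The sketch below shows one possible fix in Python 3 syntax (an assumption); the init print statements are left out to keep it short.

```python
class FileInfo:
    """base class for FileInfo stack"""
    def __init__(self, filename=None):
        self.fname = filename

    def __repr__(self):
        # The % operator fills %s placeholders; str.format would not.
        return "class %s: filename %s" % (self.__class__.__name__, self.fname)

class TXTFileInfo(FileInfo):
    # No __repr__ here: the base class version is inherited as-is.
    pass

class MP3FileInfo(FileInfo):
    def __repr__(self):
        # If the method is overridden, the result must be returned.
        return FileInfo.__repr__(self)

# What the original format call produces: the placeholders are left untouched.
unformatted = "class %s: filename %s".format("A", "B")

txt_repr = repr(TXTFileInfo("filename.txt"))
mp3_repr = repr(MP3FileInfo("filename.mp3"))
print(txt_repr)
print(mp3_repr)
```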

## Directories and directory names

os.path is a module whose implementation is system-dependent: it is posixpath on Unix-like systems and ntpath on Windows.
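
Both implementations can be imported by name on any platform, which makes the difference easy to see. A sketch in Python 3 syntax (an assumption); the Windows path is made up for the example.

```python
import ntpath
import os
import posixpath

# os.path is simply whichever implementation matches the current system.
os_path_is_known = os.path in (posixpath, ntpath)

posix_joined = posixpath.join("/home/carolyn", "Music")
nt_joined = ntpath.join("C:\\Users\\carolyn", "Music")

# split() returns (directory part, last component) under both conventions.
posix_split = posixpath.split(posix_joined)
nt_split = ntpath.split(nt_joined)

# splitext() keeps the period with the extension.
name, ext = posixpath.splitext("kairo.mp3")

print(posix_joined, nt_joined, posix_split, nt_split, name, ext)
```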

In [92]:
import os

# Here's a function I didn't know about. It gets you the 'home directory' on whatever system you're working on.
# it will get you other people's home directories if you use the '~other' syntax.
os.path.expanduser("~")
dirname = os.path.join(os.path.expanduser("~carolyn"), "Music" )
print dirname

/home/carolyn/Music


In [93]:
# os.path.split splits a full path filename into the directory part and the file part.
# it assumes the file part is whatever is on the end.
splitme = os.path.split(dirname)
splitme

('/home/carolyn', 'Music')

In [94]:
# os.path.isfile tells you if the path is a file; os.path.isdir tells you if it is a directory.
# os.path.isfile(dirname)
os.path.isdir(dirname)

True

In [95]:
os.listdir("~") # DOESN'T WORK: "~" is expanded by the shell, not by os functions

OSError: [Errno 2] No such file or directory: '~'
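
The error above happens because os.listdir treats "~" as a literal directory name. A sketch in Python 3 syntax (an assumption) that expands the tilde first; the isdir guard is only there so the example also runs in environments without a usable home directory.

```python
import os

home = os.path.expanduser("~")   # replaces "~" with the home directory path

try:
    os.listdir("~")              # looks for a directory literally named "~"
    tilde_listed = True
except OSError:
    tilde_listed = False

# Listing works once the path has been expanded.
entries = os.listdir(home) if os.path.isdir(home) else []

print(home, tilde_listed, len(entries))
```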

In [96]:
dirlist = os.listdir(os.path.expanduser("~")) 
dirlist

['.compiz',
 'Templates',
 '.local',
 '.mozilla',
 'Documents',
 '.condarc',
 'pyprosail',
 '.Xauthority',
 'mount.sh',
 '.dbus',
 'examples.desktop',
 '.cache',
 '.vboxclient-draganddrop.pid',
 '.bash_history',
 '.profile',
 '.condarc~',
 '.jupyter',
 '.bashrc-anaconda.bak',
 '.conda',
 'mount.sh~',
 '.xsession-errors.old',
 '.bashrc',
 'Pictures',
 '.gconf',
 'Music',
 '.continuum',
 'Desktop',
 'jupyter-notebooks',
 '.vboxclient-seamless.pid',
 '.config',
 '.qgis2',
 '.emacs.d',
 '.ICEauthority',
 'Downloads',
 'Videos',
 '.vboxclient-display.pid',
 'Public',
 'HW1.Prob4.pdf',
 '.bash_logout',
 'anaconda',
 '.xsession-errors',
 '.ipython',
 '.vboxclient-clipboard.pid',
 '.dmrc']

In [97]:
#WORKS: full paths of everything in the home directory (pdirlist is built in cell 98 below).

pdirlist

['/home/carolyn/.compiz',
 '/home/carolyn/Templates',
 '/home/carolyn/.local',
 '/home/carolyn/.mozilla',
 '/home/carolyn/Documents',
 '/home/carolyn/.condarc',
 '/home/carolyn/pyprosail',
 '/home/carolyn/.Xauthority',
 '/home/carolyn/mount.sh',
 '/home/carolyn/.dbus',
 '/home/carolyn/examples.desktop',
 '/home/carolyn/.cache',
 '/home/carolyn/.vboxclient-draganddrop.pid',
 '/home/carolyn/.bash_history',
 '/home/carolyn/.profile',
 '/home/carolyn/.condarc~',
 '/home/carolyn/.jupyter',
 '/home/carolyn/.bashrc-anaconda.bak',
 '/home/carolyn/.conda',
 '/home/carolyn/mount.sh~',
 '/home/carolyn/.xsession-errors.old',
 '/home/carolyn/.bashrc',
 '/home/carolyn/Pictures',
 '/home/carolyn/.gconf',
 '/home/carolyn/Music',
 '/home/carolyn/.continuum',
 '/home/carolyn/Desktop',
 '/home/carolyn/jupyter-notebooks',
 '/home/carolyn/.vboxclient-seamless.pid',
 '/home/carolyn/.config',
 '/home/carolyn/.qgis2',
 '/home/carolyn/.emacs.d',
 '/home/carolyn/.ICEauthority',
 '/home/carolyn/Downloads',
 '/ho

In [98]:
# I had a lot of trouble just now with the one-step list comprehension:
# justdirs = [y for y in [os.path.join(os.path.expanduser("~"), x) for x in dirlist] if os.path.isdir(y)]
# so I did it in two, and that worked.
pdirlist = [os.path.join(os.path.expanduser("~"), x) for x in dirlist]
justdirs = [y for y in pdirlist if os.path.isdir(y)]

In [99]:
justdirs

['/home/carolyn/.compiz',
 '/home/carolyn/Templates',
 '/home/carolyn/.local',
 '/home/carolyn/.mozilla',
 '/home/carolyn/Documents',
 '/home/carolyn/pyprosail',
 '/home/carolyn/.dbus',
 '/home/carolyn/.cache',
 '/home/carolyn/.jupyter',
 '/home/carolyn/.conda',
 '/home/carolyn/Pictures',
 '/home/carolyn/.gconf',
 '/home/carolyn/Music',
 '/home/carolyn/.continuum',
 '/home/carolyn/Desktop',
 '/home/carolyn/jupyter-notebooks',
 '/home/carolyn/.config',
 '/home/carolyn/.qgis2',
 '/home/carolyn/.emacs.d',
 '/home/carolyn/Downloads',
 '/home/carolyn/Videos',
 '/home/carolyn/Public',
 '/home/carolyn/anaconda',
 '/home/carolyn/.ipython']

In [100]:
justdirs2 = [y for y in [os.path.join(os.path.expanduser("~"), x) for x in dirlist] if os.path.isdir(y)]

In [101]:
justdirs2 # it worked!

['/home/carolyn/.compiz',
 '/home/carolyn/Templates',
 '/home/carolyn/.local',
 '/home/carolyn/.mozilla',
 '/home/carolyn/Documents',
 '/home/carolyn/pyprosail',
 '/home/carolyn/.dbus',
 '/home/carolyn/.cache',
 '/home/carolyn/.jupyter',
 '/home/carolyn/.conda',
 '/home/carolyn/Pictures',
 '/home/carolyn/.gconf',
 '/home/carolyn/Music',
 '/home/carolyn/.continuum',
 '/home/carolyn/Desktop',
 '/home/carolyn/jupyter-notebooks',
 '/home/carolyn/.config',
 '/home/carolyn/.qgis2',
 '/home/carolyn/.emacs.d',
 '/home/carolyn/Downloads',
 '/home/carolyn/Videos',
 '/home/carolyn/Public',
 '/home/carolyn/anaconda',
 '/home/carolyn/.ipython']
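
The two-step and one-step versions above can be compared on a throwaway directory. A sketch in Python 3 syntax (an assumption) with made-up file and directory names. The results are sorted because os.listdir returns entries in arbitrary order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "Music"))
    os.mkdir(os.path.join(root, ".config"))
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("not a directory")

    names = os.listdir(root)

    # Two steps: build the full paths, then keep the directories.
    full_paths = [os.path.join(root, n) for n in names]
    two_step = sorted(p for p in full_paths if os.path.isdir(p))

    # One step: the inner comprehension feeds the outer filter.
    one_step = sorted(y for y in [os.path.join(root, x) for x in names]
                      if os.path.isdir(y))

    dir_names = [os.path.basename(p) for p in two_step]

print(dir_names)
```

The join matters: os.path.isdir on a bare name from os.listdir would be resolved against the current working directory, not against the listed directory.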

## the glob module

The glob module, on the other hand, takes a wildcard pattern and returns the paths of all files and
directories that match it (full paths when the pattern itself is a full path). In short, it expands wildcards the way the Unix shell does.

In [102]:
import glob
glob.glob(os.path.join(os.path.expanduser("~"), ".*")) #finds all your dot files & dirs

['/home/carolyn/.compiz',
 '/home/carolyn/.local',
 '/home/carolyn/.mozilla',
 '/home/carolyn/.condarc',
 '/home/carolyn/.Xauthority',
 '/home/carolyn/.dbus',
 '/home/carolyn/.cache',
 '/home/carolyn/.vboxclient-draganddrop.pid',
 '/home/carolyn/.bash_history',
 '/home/carolyn/.profile',
 '/home/carolyn/.condarc~',
 '/home/carolyn/.jupyter',
 '/home/carolyn/.bashrc-anaconda.bak',
 '/home/carolyn/.conda',
 '/home/carolyn/.xsession-errors.old',
 '/home/carolyn/.bashrc',
 '/home/carolyn/.gconf',
 '/home/carolyn/.continuum',
 '/home/carolyn/.vboxclient-seamless.pid',
 '/home/carolyn/.config',
 '/home/carolyn/.qgis2',
 '/home/carolyn/.emacs.d',
 '/home/carolyn/.ICEauthority',
 '/home/carolyn/.vboxclient-display.pid',
 '/home/carolyn/.bash_logout',
 '/home/carolyn/.xsession-errors',
 '/home/carolyn/.ipython',
 '/home/carolyn/.vboxclient-clipboard.pid',
 '/home/carolyn/.dmrc']
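
A sketch of the same idea on a throwaway directory, in Python 3 syntax (an assumption) with made-up filenames. It also shows that "*" skips dot files, which is why the ".*" pattern above is needed to find them.

```python
import glob
import os
import tempfile

def names_matching(root, pattern):
    """Return the sorted base names of everything in root matching pattern."""
    return sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(root, pattern)))

with tempfile.TemporaryDirectory() as root:
    for name in ("a.mp3", "b.mp3", "notes.txt", ".hidden"):
        open(os.path.join(root, name), "w").close()

    mp3s = names_matching(root, "*.mp3")
    visible = names_matching(root, "*")      # dot files are not matched
    dotfiles = names_matching(root, ".*")

    # Results carry the directory from the pattern, so they are full paths here.
    all_full = all(p.startswith(root)
                   for p in glob.glob(os.path.join(root, "*")))

print(mp3s, visible, dotfiles, all_full)
```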

## SUMMARY

Putting all the pieces together.

In [103]:
def listDirectory(directory, fileExtList):
    "get list of file info objects for files of particular extensions"
    
    #normcase normalizes the filename case. For Linux this does nothing.
    fileList = [os.path.normcase(f)
        for f in os.listdir(directory)] # multiline is okay for a list comprehension
    
    
    # note this is one of those 2-step list comprehensions like I did before.
    # the 2nd step keeps only the files whose extension is in fileExtList and builds their full pathnames.
    fileList = [os.path.join(directory, f)
        for f in fileList
        if os.path.splitext(f)[1] in fileExtList]
    
    # this function was analyzed above. It returns a subclass of FileInfo if the extension is matched. 
    # it returns FileInfo otherwise. 
    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
        "get file info class from filename extension"
        
        # gets the extension part of the filename, uppercases it, and removes the period, then sticks it onto the classname
        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
        # print subclass (for debug)
        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
    
    # getFileInfoClass(f) returns the class object for the subclass of FileInfo that matches your extension.
    # say that's TXTFileInfo. Then the second call, TXTFileInfo(f), creates an instance of that class from the filename.
    return [getFileInfoClass(f)(f) for f in fileList]

In [104]:
hasattr(sys.modules[FileInfo.__module__], "MP3FileInfo")

True

In [106]:
inst = listDirectory("/mnt/xferUbuntu/jupyterNotes/PyRefreshers/music_singles", [".mp3"])


In [109]:
len(inst)
print inst

TypeError: __repr__ returned non-string (type NoneType)
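
This message names `__repr__`, while the earlier one named `__str__`. That is because printing a single object uses str(), while printing a list uses repr() on each element. A minimal sketch in Python 3 syntax (an assumption), with a hypothetical Track class:

```python
class Track:
    """Hypothetical class with deliberately different str and repr output."""
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "str:" + self.name

    def __repr__(self):
        return "repr:" + self.name

single = str(Track("a"))          # what print(obj) displays
in_list = str([Track("a")])       # what print([obj]) displays

print(single)
print(in_list)
```

So a list of FileInfo instances can only be printed once every class in the hierarchy returns a string from `__repr__`.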