## File-like objects

In the simplest case, a file-like object is any object with a read method that takes an optional size parameter and
returns a string. Called with no size parameter, it reads everything there is to read from the input source and
returns all the data as a single string.

minidom.parse accepts any file-like object (an input stream). If you hand it a plain string instead, it treats the string as a filename.
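
To make the definition concrete, here is a minimal sketch of a home-made file-like object. The class name StringSource is hypothetical (it is not part of any library); the only thing it implements is read with an optional size. The sketch is written to run on Python 3, and it assumes minidom.parse needs nothing beyond read, which seems to hold for the standard parser.

```python
from xml.dom import minidom


class StringSource(object):
    """Hypothetical minimal file-like object: it only implements read()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size=-1):
        # With no size (or a negative one), return everything that is left.
        if size is None or size < 0:
            end = len(self._data)
        else:
            end = min(self._pos + size, len(self._data))
        chunk = self._data[self._pos:end]
        self._pos = end
        return chunk  # an empty string signals the end of the input


source = StringSource("<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>")
xmldoc = minidom.parse(source)
root_name = xmldoc.documentElement.tagName
```

No close, seek, or readline is defined here, so this is about as small as a file-like object can get.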

In [4]:
### file object as a file-like object

from xml.dom import minidom

fsock = open('binary.xml')
xmldoc = minidom.parse(fsock)
fsock.close()
print xmldoc.toxml()


<?xml version="1.0" ?><!DOCTYPE grammar  PUBLIC '-//diveintopython.org//DTD Kant Generator Pro v1.0//EN'  'kgp.dtd'><grammar>
<ref id="bit">
  <p>0</p>
  <p>1</p>
</ref>
<ref id="byte">
  <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>


In [5]:
### url input stream as a file-like object
import urllib
usock = urllib.urlopen('http://slashdot.org/slashdot.rdf') #usock is a file-like object.
xmldoc = minidom.parse(usock)
usock.close()
# print xmldoc.toxml() -- you won't usually want to do this here.

In [8]:
### ERROR: a plain string is not a file-like object; minidom.parse treats it as a filename
contents = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"
ssock = contents
xmldoc = minidom.parse(ssock)
ssock.close()
print xmldoc.toxml()

IOError: [Errno 2] No such file or directory: "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"

In [9]:
### wrap the string so that it behaves like a file-like object. Then it works.
import StringIO

contents = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"

#wrap contents so it looks like a file object
ssock = StringIO.StringIO(contents)
xmldoc = minidom.parse(ssock)
ssock.close()
print xmldoc.toxml()

<?xml version="1.0" ?><grammar><ref id="bit"><p>0</p><p>1</p></ref></grammar>
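
A side note for Python 3, where the StringIO module is gone and the class lives in io instead. The sketch below repeats the wrapping trick with io.StringIO and compares it with minidom.parseString, which takes the string directly. It is only a sketch under the assumption that the input is already in memory as a str.

```python
import io
from xml.dom import minidom

contents = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"

# Route 1: wrap the string in a file-like object, as in the cell above.
doc_from_stream = minidom.parse(io.StringIO(contents))

# Route 2: let minidom take the string directly.
doc_from_string = minidom.parseString(contents)

# Both routes should build the same document.
same_xml = doc_from_stream.toxml() == doc_from_string.toxml()
```

The wrapper is still the more general tool, because it works with any function that expects a stream, not just with minidom.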


In [10]:
# note that ssock has all these file-like methods
dir(ssock)

['__doc__',
 '__init__',
 '__iter__',
 '__module__',
 'buflist',
 'close',
 'closed',
 'flush',
 'getvalue',
 'isatty',
 'len',
 'next',
 'read',
 'readline',
 'readlines',
 'seek',
 'softspace',
 'tell',
 'truncate',
 'write',
 'writelines']
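
A few of those methods in action. This is a small sketch using io.StringIO (the Python 3 home of the class; the cells above use the Python 2 StringIO module), showing read with and without a size, plus tell and seek.

```python
import io

ssock = io.StringIO("<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>")

first = ssock.read(9)      # read(size) returns at most size characters
position = ssock.tell()    # current offset into the buffer
rest = ssock.read()        # no size: everything that is left
at_end = ssock.read()      # already exhausted, so an empty string

ssock.seek(0)              # rewind to the beginning
again = ssock.read(9)
ssock.close()
```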

### Open Anything. 

This function tries to open the source as a local file, then as a URL, and finally treats it as a string. If it can't even be coerced to a string, I guess you're out of luck.



In [14]:
def openAnything(source):
    # try to open with native open function (works if source is filename)
    try:
        fsock = open(source)
        print "this is a file"
        return fsock
    except (IOError, OSError):
        pass

    # try to open with urllib (if source is http, ftp, or file URL)
    import urllib
    try:
        usock = urllib.urlopen(source)
        print "This is a url"
        return usock
    except (IOError, OSError):
        pass

    # treat source as string
    import StringIO
    print "treating this as a string"
    return StringIO.StringIO(str(source))
    
sock = openAnything("binary.xml") # opened by the native open(), as the output shows

this is a file


In [16]:
sock = openAnything('http://slashdot.org/slashdot.rdf')
sock.close()

This is a url


In [17]:
# this doesn't work: the object urllib returns has no 'closed' attribute, so it is only partly file-like.
sock.closed

AttributeError: addinfourl instance has no attribute 'closed'
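
For reference, a possible Python 3 port of openAnything. This is a sketch, not the original: the names open_anything and is_closed are my own, urllib.urlopen becomes urllib.request.urlopen, and I am assuming that a string without a URL scheme makes urlopen raise ValueError (hence the wider except clause). The is_closed helper is one defensive way around the missing attribute shown above.

```python
import io
import urllib.request


def open_anything(source):
    """Return a file-like object for a filename, a URL, or a plain string."""
    # 1. Try the built-in open (works if source is a filename).
    try:
        return open(source)
    except OSError:
        pass

    # 2. Try urllib (works if source is an http, ftp, or file URL).
    #    Note that the object it returns yields bytes, not str.
    try:
        return urllib.request.urlopen(source)
    except (OSError, ValueError):
        pass

    # 3. Fall back to treating source as the data itself.
    return io.StringIO(str(source))


def is_closed(sock):
    """Report sock.closed if the object has it, else None (unknown)."""
    return getattr(sock, "closed", None)
```

Usage would look like `sock = open_anything("binary.xml")`, followed by `sock.read()` and `sock.close()` as in the cells above.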

## Command-line arguments

Each command-line argument passed to the program will be in sys.argv, which is just a list.
sys.argv[0] is always the name of the Python script you're running.
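
A tiny sketch of that split. The helper name split_argv and the script name kgp.py are hypothetical; the list passed in stands in for what sys.argv would hold for the command `python kgp.py -g binary.xml -d`.

```python
import sys


def split_argv(argv):
    """Separate the script name from the arguments that follow it."""
    script_name = argv[0]
    arguments = argv[1:]   # this slice is what gets handed to getopt
    return script_name, arguments


# Hypothetical invocation: python kgp.py -g binary.xml -d
script_name, arguments = split_argv(["kgp.py", "-g", "binary.xml", "-d"])

# For the running process, the same slice of the real list:
real_arguments = sys.argv[1:]
```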

For anything more complicated, you need getopt. (There is something better by now: the standard library also ships argparse.)

The example at the end of this section shows getopt in use (I can't run it because I'm in a notebook).

getopt() takes 3 inputs:

1. the argument list (which is argv[1:])
2. a string containing all the possible single-character command line flags the script accepts
3. a list of longer command line flags that are equivalent to the single character flags

Note the colon in the single-character flags "hg:d", and that the list of long flags contains only "help" and "grammar=" (the -d flag has no long form).

The colon is there because the 'g' flag, which it follows, must be followed by an argument. The equals sign on 'grammar=' is there for the same reason.

On the command line, single character flags appear as '-d'. 
Full keyword flags appear with double hyphens: '--help'. 
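getopt itself does not link a short flag to a long one; the script pairs them up when it checks opt against both spellings. Listing the short flags that have long equivalents first, in the same order, is just a convention that keeps the two specifications easy to compare.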

What does it return? A pair: opts, args.

If there is a problem like a non-matching argument, getopt throws getopt.GetoptError. 

opts is a list of pairs: (flag, argument). If the flag (like -d) takes no argument, then argument is an empty string. args is the list of whatever is left over after the flags.
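
Since getopt.getopt takes a plain list, it can be tried without a real command line. A small sketch follows, using the same flag specifications as the example in this section; the argument lists themselves are made up for illustration.

```python
import getopt

short_flags = "hg:d"
long_flags = ["help", "grammar="]

# Hypothetical command line: -g binary.xml -d --help source1 source2
opts, args = getopt.getopt(
    ["-g", "binary.xml", "-d", "--help", "source1", "source2"],
    short_flags,
    long_flags,
)
# opts -> [('-g', 'binary.xml'), ('-d', ''), ('--help', '')]
# args -> ['source1', 'source2']

# The long form with '=' carries its argument the same way.
long_opts, _ = getopt.getopt(["--grammar=binary.xml"], short_flags, long_flags)
# long_opts -> [('--grammar', 'binary.xml')]

# A flag that was never declared raises getopt.GetoptError.
try:
    getopt.getopt(["-x"], short_flags, long_flags)
    unknown_flag_rejected = False
except getopt.GetoptError:
    unknown_flag_rejected = True
```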


    import getopt
    import sys

    _debug = 0

    # usage() and KantGenerator are not shown here

    def main(argv):
        ''' argv is actually sys.argv[1:] '''

        grammar = "kant.xml"
        try:
            opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
        except getopt.GetoptError:
            usage()
            sys.exit(2)
        for opt, arg in opts: # note we are iterating over the tuples in 'opts'
            if opt in ("-h", "--help"):
                usage()
                sys.exit()
            elif opt == '-d':
                global _debug
                _debug = 1
            elif opt in ("-g", "--grammar"):
                grammar = arg

        # anything left must be command-line arguments, as opposed to flags
        # if there is nothing except the flags, then args will be an empty list
        source = "".join(args)
        k = KantGenerator(grammar, source)
        print k.output()
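
As a possible alternative to getopt, the standard library's argparse module can express the same three flags. This is a sketch rather than a drop-in replacement: the function name parse_args, the help texts, and the attribute name debug are my own choices, and it only parses the flags without building a KantGenerator.

```python
import argparse


def parse_args(argv):
    """Parse the same flags as the getopt version; argv is sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="hypothetical equivalent")
    # -h/--help is added by argparse automatically.
    parser.add_argument("-g", "--grammar", default="kant.xml",
                        help="grammar file to use")
    parser.add_argument("-d", dest="debug", action="store_true",
                        help="turn on debugging output")
    # anything that is not a flag ends up in 'source'
    parser.add_argument("source", nargs="*")
    return parser.parse_args(argv)


options = parse_args(["-g", "binary.xml", "-d", "source1"])
defaults = parse_args([])
```

The result is an object with attributes (options.grammar, options.debug, options.source) instead of a list of pairs to loop over.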
