# Reading QM outputs

In [1]:
# Reload edited modules automatically before each cell runs, so we never execute stale code, just in case.
%load_ext autoreload
%autoreload 2

There are a few ways of opening text files, reading them, and parsing their contents. I'll use one of our CO2 frequency outputs as an example.

In [2]:
import os

In [3]:
os.listdir()

['.git',
 '.ipynb_checkpoints',
 'Reading QM outputs.ipynb',
 'qm_files',
 'Plotting.html',
 'Plotting.ipynb',
 'Frequency Calculations.ipynb',
 'LICENSE.txt',
 'Frequency Calculations.html',
 '.gitignore',
 'README.md',
 'Reading QM outputs.html']

In [4]:
help(os.listdir)

Help on built-in function listdir in module posix:

listdir(...)
    listdir(path='.') -> list_of_filenames
    
    Return a list containing the names of the files in the directory.
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.
    
    path can be specified as either str or bytes.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    On some platforms, path may also be specified as an open file descriptor;
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.



In [5]:
os.listdir(path="qm_files")

['drop_0375_0qm_0mm.out']

In [6]:
filename = "qm_files/drop_0375_0qm_0mm.out"
print(filename)

qm_files/drop_0375_0qm_0mm.out


## Opening a file 1

Python has a built-in function called `open` that takes a filename as a string and returns a file object, or "handle", that we can read from, loop over, and close.

In [7]:
help(open)

Help on built-in function open in module io:

open(...)
    open(file, mode='r', buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None) -> file object
    
    Open file and return a stream.  Raise IOError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the f

In [8]:
handle = open(filename)
print(handle)

<_io.TextIOWrapper name='qm_files/drop_0375_0qm_0mm.out' mode='r' encoding='UTF-8'>


In [9]:
print(type(handle))

<class '_io.TextIOWrapper'>


These will say something slightly different in Python 2 vs. 3, but we work with them in exactly the same way. Here are all the attributes (both methods and data) defined on the handle to our file.

In [10]:
dir(handle)

['_CHUNK_SIZE',
 '__class__',
 '__del__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_checkClosed',
 '_checkReadable',
 '_checkSeekable',
 '_checkWritable',
 '_finalizing',
 'buffer',
 'close',
 'closed',
 'detach',
 'encoding',
 'errors',
 'fileno',
 'flush',
 'isatty',
 'line_buffering',
 'mode',
 'name',
 'newlines',
 'read',
 'readable',
 'readline',
 'readlines',
 'seek',
 'seekable',
 'tell',
 'truncate',
 'writable',
 'write',
 'writelines']

This list of strings contains the names of all the methods and member variables (attributes) we can access on the handle.

Our file handle is called `handle`, and `name` appears in that list, so let's find out what it does.

In [11]:
help(handle.name)

no Python documentation found for 'qm_files/drop_0375_0qm_0mm.out'



In [12]:
type(handle.name)

str

In [13]:
handle.name

'qm_files/drop_0375_0qm_0mm.out'

In [14]:
handle.name()

TypeError: 'str' object is not callable

So, through a little experimentation, we've figured out that `name` isn't a method; it's a variable (an attribute holding a string). If it were a method, we could call it like above.

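You're probably wondering what all the names that begin with `_` or `__` are. By convention, they aren't meant to be used directly: names with a single leading `_` are internal details, and the `__special__` ones are called by Python itself behind the scenes (for example, `__iter__` when we loop over the handle, and `__enter__`/`__exit__` in a `with` statement). Let's look only at the parts of `handle` we're supposed to use.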

In [15]:
print([m for m in dir(handle) if m[0] != '_'])

['buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']


We can look at the type of all these too:

In [16]:
for m in dir(handle):
    if m[0] != '_':
        # getattr(handle, m) looks up an attribute by name, no eval() needed
        print(m, type(getattr(handle, m)))

buffer <class '_io.BufferedReader'>
close <class 'builtin_function_or_method'>
closed <class 'bool'>
detach <class 'builtin_function_or_method'>
encoding <class 'str'>
errors <class 'str'>
fileno <class 'builtin_function_or_method'>
flush <class 'builtin_function_or_method'>
isatty <class 'builtin_function_or_method'>
line_buffering <class 'bool'>
mode <class 'str'>
name <class 'str'>
newlines <class 'NoneType'>
read <class 'builtin_function_or_method'>
readable <class 'builtin_function_or_method'>
readline <class 'builtin_function_or_method'>
readlines <class 'builtin_function_or_method'>
seek <class 'builtin_function_or_method'>
seekable <class 'builtin_function_or_method'>
tell <class 'builtin_function_or_method'>
truncate <class 'builtin_function_or_method'>
writable <class 'builtin_function_or_method'>
write <class 'builtin_function_or_method'>
writelines <class 'builtin_function_or_method'>
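The types above mix methods with plain data. One way to split the public names into the two groups is `callable()`: methods such as `read` are callable, while data such as `name` or `closed` is not. This is a small sketch; the throwaway file `demo.out` in a temporary directory is made up so the block runs on its own.

```python
import os
import tempfile

# Hypothetical throwaway file so this runs without the Q-Chem output.
path = os.path.join(tempfile.mkdtemp(), "demo.out")
out = open(path, "w")
out.write("Welcome to Q-Chem\n")
out.close()

f = open(path)
public = [name for name in dir(f) if not name.startswith("_")]
# callable() is True for methods like read(), False for data like name.
methods = [name for name in public if callable(getattr(f, name))]
attributes = [name for name in public if not callable(getattr(f, name))]
f.close()

print("methods:   ", methods)
print("attributes:", attributes)
```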


Since we're interested in reading from a file, there are a few methods that sound like they can read.

In [17]:
help(handle.readable)

Help on built-in function readable:

readable(...) method of _io.TextIOWrapper instance



In [18]:
handle.readable()

True

We can read from the file. I should hope so; we just opened it!

In [19]:
help(handle.read)

Help on built-in function read:

read(...) method of _io.TextIOWrapper instance



Well, that isn't very helpful... let's look at the official documentation.

In [20]:
# This will let us do some neat stuff with the notebook, like embed webpages and videos.
import IPython

In [21]:
website = "https://docs.python.org/3.5/tutorial/inputoutput.html#reading-and-writing-files"
IPython.lib.display.IFrame(website, width=800, height=800)

After a bit of light reading, it looks like we can do `contents = handle.read()` and `contents` will be a giant string that contains all of the file contents. Only one way to find out...

In [22]:
contents = handle.read()

In [23]:
print(contents)

                  Welcome to Q-Chem
     A Quantum Leap Into The Future Of Chemistry


 Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)

 Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  
 J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  
 M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  
 T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  
 M. A. Rohrdanz,  R. P. Steele,  E. J. Sundstrom,  H. L. Woodcock III,  
 P. M. Zimmerman,  D. Zuev,  B. Albrecht,  E. Alguire,  B. Austin,  
 S. A. Baeppler,  G. J. O. Beran,  Y. A. Bernard,  E. Berquist,  
 K. Brandhorst,  K. B. Bravaya,  S. T. Brown,  D. Casanova,  C.-M. Chang,  
 Y. Chen,  S. H. Chien,  K. D. Closser,  D. L. Crittenden,  M. Diedenhofen,  
 R. A. DiStasio Jr.,  H. Do,  A. D. Dutoi,  R. G. Edgar,  P.-T. Fang,  
 S. Fatehi,  Q. Feng,  L. Fusti-Molnar,  A. Ghysels,  
 A. Golubeva-Zadorozhnaya,  J. Gomes,  A. Gunina,  M. W. D. Hanson-Heine, 

In [24]:
print(len(contents))

17970


Can you see why we don't normally print entire files to the screen? This isn't even a big one.

I wonder if the `handle` is still `readable`...

In [25]:
handle.readable()

True

What if I try to read from it again?

In [26]:
contents2 = handle.read()
print(contents2)




Nothing! `read()` returned an empty string, because the first call already consumed the whole file and left the handle positioned at the end. We might as well close it, since we'll be working with `contents`, not the file (handle) itself.
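The handle remembers its position in the file: `tell()` reports it, and `seek(0)` moves it back to the start so that `read()` works again. The sketch below uses an in-memory `io.StringIO` as a stand-in for the output file (an assumption made so it runs anywhere); it supports the same `read`, `seek`, and `tell` methods. For files opened in text mode, the number `tell()` returns is best treated as opaque: pass it back to `seek()`, or seek to `0`, rather than doing arithmetic on it.

```python
import io

# In-memory text stream standing in for the Q-Chem output file.
stream = io.StringIO("Welcome to Q-Chem\nA Quantum Leap Into The Future Of Chemistry\n")

first = stream.read()    # consumes everything
end = stream.tell()      # position is now at the end of the stream
second = stream.read()   # nothing left, so this is ''
print(repr(second))      # ''

stream.seek(0)           # rewind to the beginning
third = stream.read()    # reads the whole text again
print(third == first)    # True
stream.close()
```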

In [27]:
handle.close()
handle.closed

True

Just to reiterate, here's what we actually did to open a file, read it into a string, then close it:

In [28]:
handle = open(filename)
contents = handle.read()
handle.close()

## Opening a file 2

There were a few other methods that we could call on our `handle` that had to do with reading, specifically `handle.readline()` and `handle.readlines()`.

`readline` reads a single line from an open file handle, up to and including the next newline (the character `'\n'`). Every time you see a line break or hit return, this invisible character is present.

In [29]:
handle = open(filename)

first_line = handle.readline()
second_line = handle.readline()

print(first_line)
print(second_line)
handle.close()

                  Welcome to Q-Chem

     A Quantum Leap Into The Future Of Chemistry



Notice the extra blank line after each line we printed. Each string returned by `readline` already ends in `'\n'`, and `print()` adds another newline of its own by default.

In [30]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [31]:
print(first_line, end='')
print(second_line, end='')

                  Welcome to Q-Chem
     A Quantum Leap Into The Future Of Chemistry
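Passing `end=''` is one fix; another is to remove the newline from the string itself. `rstrip('\n')` removes only trailing newlines, while `strip()` with no argument also removes leading spaces, which would throw away the indentation Q-Chem uses to line up its output. The two strings below are copied from the `readline()` results above.

```python
line1 = "                  Welcome to Q-Chem\n"
line2 = "     A Quantum Leap Into The Future Of Chemistry\n"

# Drop only the trailing newline; the indentation is preserved.
print(line1.rstrip("\n"))
print(line2.rstrip("\n"))

# strip() removes all surrounding whitespace, including the leading spaces.
print(repr(line1.strip()))  # 'Welcome to Q-Chem'
```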


`handle.readlines()` does the same thing as calling `readline()` repeatedly until the end of the file, and returns the results as a list of strings.
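To see that equivalence concretely, here is a small sketch that calls `readline()` until it returns an empty string (its end-of-file signal) and compares the result with `readlines()`. A blank line in the file comes back as `'\n'`, not `''`, so the loop doesn't stop early. As before, `io.StringIO` stands in for the file, and the text is a shortened copy of the header.

```python
import io

text = "Welcome to Q-Chem\nA Quantum Leap Into The Future Of Chemistry\n\n\nQ-Chem 4.3 (beta)\n"

stream = io.StringIO(text)
collected = []
while True:
    line = stream.readline()
    if line == "":           # '' only at end of file; blank lines are '\n'
        break
    collected.append(line)
stream.close()

print(collected == io.StringIO(text).readlines())  # True
```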

In [32]:
handle = open(filename)

contents3 = handle.readlines()

handle.close()

In [33]:
print(contents3[:10])

['                  Welcome to Q-Chem\n', '     A Quantum Leap Into The Future Of Chemistry\n', '\n', '\n', ' Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)\n', '\n', ' Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  \n', ' J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  \n', ' M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  \n', ' T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  \n']


In [34]:
contents3[:10]

['                  Welcome to Q-Chem\n',
 '     A Quantum Leap Into The Future Of Chemistry\n',
 '\n',
 '\n',
 ' Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)\n',
 '\n',
 ' Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  \n',
 ' J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  \n',
 ' M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  \n',
 ' T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  \n']

There's another convenient string method that lets us do the same thing to one large string: rather than calling `split()`, which splits on whitespace, we call `splitlines()`, which splits on line boundaries.

In [35]:
contents.splitlines()[:10]

['                  Welcome to Q-Chem',
 '     A Quantum Leap Into The Future Of Chemistry',
 '',
 '',
 ' Q-Chem 4.3 (beta), Q-Chem, Inc., Pleasanton, CA (2015)',
 '',
 ' Y. Shao,  Z. Gan,  E. Epifanovsky,  A. T. B. Gilbert,  M. Wormit,  ',
 ' J. Kussmann,  A. W. Lange,  A. Behn,  J. Deng,  X. Feng,  D. Ghosh,  ',
 ' M. Goldey,  P. R. Horn,  L. D. Jacobson,  I. Kaliman,  R. Z. Khaliullin,  ',
 ' T. Kus,  A. Landau,  J. Liu,  E. I. Proynov,  Y. M. Rhee,  R. M. Richard,  ']

Notice that the newlines have been removed in this case, unlike with `readlines()`. Hopefully that doesn't bite us in the future; it may or may not matter for what we do.
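If the missing newlines ever do matter, `splitlines()` accepts `keepends=True`. It's also worth knowing how it differs from `split('\n')`, which leaves an empty string at the end when the text ends in a newline (as most output files do). The two-line string below is shortened from the output file.

```python
text = "Frequency:  621.29\nForce Cnst:  2.9340\n"

no_ends = text.splitlines()                 # newlines removed
with_ends = text.splitlines(keepends=True)  # newlines kept, like readlines()
by_split = text.split("\n")                 # note the trailing ''

print(no_ends)    # ['Frequency:  621.29', 'Force Cnst:  2.9340']
print(with_ends)  # ['Frequency:  621.29\n', 'Force Cnst:  2.9340\n']
print(by_split)   # ['Frequency:  621.29', 'Force Cnst:  2.9340', '']
```

`splitlines()` also treats `\r\n`, `\r`, and a few other characters as line boundaries, which makes it the more forgiving choice for files written on other operating systems.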

## Opening a file 3

I think I've shown the main ways the contents of a file can be *read*, but what about the opening and closing? There's an easier way, one where the file will be closed for us automatically.

In [36]:
with open(filename) as handle2:
    contents4 = handle2.read()

In [37]:
handle2.closed

True

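This is what we call "syntactic sugar": a convenient shorthand. When the indented block ends, even because of an error, the file is closed for us. Using either approach is fine, but I personally prefer this one: if you open a bunch of files and forget to close them, over and over again, you can eventually hit the operating system's limit on how many files a program may have open at once, and data you've written may not be flushed to disk when you expect.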
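Roughly speaking, a `with` block does what a `try`/`finally` would: the `finally` clause runs whether or not the body raises. This sketch shows both, and checks that a file opened with `with` is closed even when an exception escapes the block. The temporary file and its contents are made up so the example runs on its own.

```python
import os
import tempfile

# Hypothetical throwaway file so this runs without the Q-Chem output.
path = os.path.join(tempfile.mkdtemp(), "example.out")
out = open(path, "w")
out.write("Welcome to Q-Chem\n")
out.close()

# Approximately what `with open(path) as f1:` does for us.
f1 = open(path)
try:
    text = f1.read()
finally:
    f1.close()               # runs even if read() raises
print(f1.closed)             # True

# The with statement also closes the file when an error escapes the block.
try:
    with open(path) as f2:
        raise ValueError("pretend parsing failed here")
except ValueError:
    pass
print(f2.closed)             # True
```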

## Looping over a file 1

As with most things in Python, there are a couple of ways to do this. We can either loop over the contents of the file we've stored in our `contents` variable, or we can loop over the file directly. Yes, file handles are iterable, just like lists and tuples!

Here's directly looping over the file:

In [38]:
with open(filename) as handle:
    for line in handle:
        if 'Albrecht' in line:
            print(line)
        if 'Berquist' in line:
            print(line)
        if 'Lambrecht' in line:
            print(line)

 P. M. Zimmerman,  D. Zuev,  B. Albrecht,  E. Alguire,  B. Austin,  

 S. A. Baeppler,  G. J. O. Beran,  Y. A. Bernard,  E. Berquist,  

 C.-P. Hsu,  Y. Jung,  J. Kong,  D. S. Lambrecht,  W. Liang,  C. Ochsenfeld,  



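The blank lines between matches appear because each `line` read from the handle still ends in `'\n'`, and `print()` adds another (see the `print()` discussion above). And here's looping over our stored variable, where `splitlines()` has already removed the newlines: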

In [39]:
for line in contents.splitlines():
    if 'time' in line:
        print(line)

 Total DFTman time = 0.22 CPUs 0.22 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.25 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.25 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 Total DFTman time = 0.24 CPUs 0.24 Wall
 SCF time:  CPU 2.83 s  wall 2.83 s
 Total DFTman time = 1.83 CPUs 1.83 Wall
 Total DFTman time = 1.55 CPUs 1.55 Wall
 Total DFTman time = 1.81 CPUs 1.82 Wall
 Total DFTman time = 1.82 CPUs 1.82 Wall
 Total DFTman time = 1.86 CPUs 1.86 Wall
 Total DFTman time = 1.82 CPUs 1.82 Wall
 Total DFTman time = 1.82 CPUs 1.82 Wall
 Total DFTman time = 2.29 CPUs 2.29 Wall
 Total DFTman time = 5.54 CPUs 5.54 Wall
 Gradient time:  CPU 21.70 s  wall 21.71 s
 Total job time:  24.77s(wall), 24.75s(cpu) 


Notice that I called `contents.splitlines()`; this way, we make a list of strings, so iterating will give us one string at a time.

We can't loop over `contents` directly. Why not?

In [40]:
for line in contents[2000:2500]:
    print(line)

u
,
 
 
I
.
 
Y
.
 
Z
h
a
n
g
,
 
 
X
.
 
Z
h
a
n
g
,
 
 
Y
.
 
Z
h
a
o
,
 
 


 
B
.
 
R
.
 
B
r
o
o
k
s
,
 
 
G
.
 
K
.
 
L
.
 
C
h
a
n
,
 
 
D
.
 
M
.
 
C
h
i
p
m
a
n
,
 
 
C
.
 
J
.
 
C
r
a
m
e
r
,
 
 


 
W
.
 
A
.
 
G
o
d
d
a
r
d
 
I
I
I
,
 
 
M
.
 
S
.
 
G
o
r
d
o
n
,
 
 
W
.
 
J
.
 
H
e
h
r
e
,
 
 
A
.
 
K
l
a
m
t
,
 
 


 
H
.
 
F
.
 
S
c
h
a
e
f
e
r
 
I
I
I
,
 
 
M
.
 
W
.
 
S
c
h
m
i
d
t
,
 
 
C
.
 
D
.
 
S
h
e
r
r
i
l
l
,
 
 
D
.
 
G
.
 
T
r
u
h
l
a
r
,
 
 


 
A
.
 
W
a
r
s
h
e
l
,
 
 
X
.
 
X
u
,
 
 
A
.
 
A
s
p
u
r
u
-
G
u
z
i
k
,
 
 
R
.
 
B
a
e
r
,
 
 
A
.
 
T
.
 
B
e
l
l
,
 
 
N
.
 
A
.
 
B
e
s
l
e
y
,
 
 


 
J
.
-
D
.
 
C
h
a
i
,
 
 
A
.
 
D
r
e
u
w
,
 
 
B
.
 
D
.
 
D
u
n
i
e
t
z
,
 
 
T
.
 
R
.
 
F
u
r
l
a
n
i
,
 
 
S
.
 
R
.
 
G
w
a
l
t
n
e
y
,
 
 


 
C
.
-
P
.
 
H
s
u
,
 
 
Y
.
 
J
u
n
g
,
 
 
J
.
 
K
o
n
g
,
 
 
D
.
 
S
.
 
L
a
m
b
r
e
c
h
t
,
 
 
W
.
 
L
i
a
n
g
,
 
 
C
.
 
O
c
h
s
e
n
f
e
l
d
,
 
 


 
V
.
 
A
.
 
R
a
s
s
o
l
o
v
,
 
 
L
.
 
V
.
 
S
l
i
p
c


`contents` is a *string*, and iterating over a string gives you its characters one at a time, not its lines. Clearly that's nonsense for our purposes. Be careful!
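A quick way to check what a loop will hand you is to look at the first few items. This sketch uses a two-line string modeled on the output: iterating the string gives characters, while iterating `splitlines()` gives lines, and `enumerate` adds line numbers, which helps when you later need to know *where* a match occurred.

```python
text = "Mode: 1\nFrequency: 621.29\n"

print(list(text)[:5])        # characters: ['M', 'o', 'd', 'e', ':']

matches = []
for number, line in enumerate(text.splitlines(), start=1):
    if "Frequency:" in line:
        matches.append((number, line))
print(matches)               # [(2, 'Frequency: 621.29')]
```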

## Looping over a file 2

## Extracting frequencies

Now we can use our file opening/closing/looping knowledge to extract useful information from files.

Let's say I want to extract all of the vibrational frequencies from an output file, and store them in a list called `frequencies` as floating-point numbers.

The key to extracting information from QM outputs (or any text file, really) is to understand the context in which the information appears. What's the file structured like? How do we actually *get* the information we want?
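One way to use that context, sketched here under the assumption that the frequencies always follow a `VIBRATIONAL ANALYSIS` banner (as in the excerpt shown later in this section), is to find the banner's index first and search only the lines after it. The list holds a few lines copied from this output file, with most of the file left out.

```python
# A few lines copied from the output file; most of the file is omitted.
lines = [
    " SCF time:  CPU 2.83 s  wall 2.83 s",
    " **                       VIBRATIONAL ANALYSIS                       **",
    " Mode:                 1                      2                      3",
    " Frequency:       621.29                1410.25                2498.02",
]

# Index of the first line containing the banner, or None if it never appears.
start = next((i for i, line in enumerate(lines) if "VIBRATIONAL ANALYSIS" in line), None)

if start is not None:
    section = lines[start:]     # only the part of the file we care about
    print(start, len(section))  # 1 3
```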

I'll split the contents on newlines to make it easier to work with.

In [41]:
contents_splitlines = contents.splitlines()

The information I want occurs near the end.

In [42]:
contents_splitlines[370:]

[' **                                                                  **',
 ' **                       VIBRATIONAL ANALYSIS                       **',
 ' **                       --------------------                       **',
 ' **                                                                  **',
 ' **        VIBRATIONAL FREQUENCIES (CM**-1) AND NORMAL MODES         **',
 ' **     FORCE CONSTANTS (mDYN/ANGSTROM) AND REDUCED MASSES (AMU)     **',
 ' **                  INFRARED INTENSITIES (KM/MOL)                   **',
 ' **                                                                  **',
 ' **********************************************************************',
 ' ',
 '',
 ' Mode:                 1                      2                      3',
 ' Frequency:       621.29                1410.25                2498.02',
 ' Force Cnst:      2.9340                18.6957                47.3623',
 ' Red. Mass:      12.9010                15.9552                12.8822',
 ' IR

We can see the frequencies all occur on a single line:

```
' Frequency:       621.29                1410.25                2498.02',
```

so maybe we can check whether `Frequency:` is in a line to find the frequencies.

Let's check every line for a match, and print out any line that matches:

In [43]:
for line in contents_splitlines:
    if 'Frequency:' in line:
        print(line)

 Frequency:       621.29                1410.25                2498.02


It worked! You'll have to take my word for it that if this occurred on multiple lines of an output (say, if there were more than 3 vibrational frequencies), this would catch every instance. That's the beauty of looping.

The only problem with the code above is that once we match the line, we don't actually *do* anything with it other than print it. Let's try storing it in a variable `s` so we can manipulate it after our loop is complete.

In [44]:
for line in contents_splitlines:
    if 'Frequency:' in line:
        s = line

In [45]:
print(s)

 Frequency:       621.29                1410.25                2498.02


Now, to make it into a list of floats:

In [46]:
s.split()

['Frequency:', '621.29', '1410.25', '2498.02']

In [47]:
s.split()[1:]

['621.29', '1410.25', '2498.02']

In [48]:
map(float, s.split()[1:])

<map at 0x7f0e64391b38>

In [49]:
list(map(float, s.split()[1:]))

[621.29, 1410.25, 2498.02]

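In Python 3, `map` returns a lazy `map` object rather than a list, which is why we wrapped it in `list()`. Now we know how to turn a frequency line into a list of numbers we can work with in some other piece of code.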
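A `map` object can only be consumed once: a second `list()` call on the same object gives an empty list. A list comprehension builds the list in one step and is often considered more readable. The line below is copied from the output above.

```python
freq_line = " Frequency:       621.29                1410.25                2498.02"

lazy = map(float, freq_line.split()[1:])
first_pass = list(lazy)
second_pass = list(lazy)    # the map object is already exhausted
print(first_pass)           # [621.29, 1410.25, 2498.02]
print(second_pass)          # []

# Equivalent list comprehension: builds the list directly.
values = [float(token) for token in freq_line.split()[1:]]
print(values)               # [621.29, 1410.25, 2498.02]
```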

But now we have *another* problem.

What if we have more than one line that contains `Frequency:`? Since `s` is overwritten on every match, this will only keep the very last one! We need to do all this work *inside* the loop.

In [50]:
frequencies = []

for line in contents_splitlines:
    if 'Frequency:' in line:
        frequencies_oneline = list(map(float, line.split()[1:]))
        frequencies.extend(frequencies_oneline)

In [51]:
print(frequencies)

[621.29, 1410.25, 2498.02]


We can't use `list.append()` here: it would add each line's list as a single element, so we'd end up with a list of lists. We want one flat list, so we `extend()` it instead.
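To convince ourselves that the loop catches every `Frequency:` line, not just one, here is the same logic wrapped in a small hypothetical helper, `parse_frequencies`, run on a made-up two-block excerpt. The numbers are placeholders, not real results, and the layout is assumed to match the Q-Chem output above. For comparison, it also shows what `append()` would have produced.

```python
def parse_frequencies(lines):
    """Collect every number that follows 'Frequency:' in an iterable of lines.

    Hypothetical helper; assumes the Q-Chem layout shown above, where each
    'Frequency:' line holds one or more values after the label.
    """
    frequencies = []
    for line in lines:
        if 'Frequency:' in line:
            frequencies.extend(float(tok) for tok in line.split()[1:])
    return frequencies


# Made-up excerpt with two blocks; the values are placeholders, not real results.
sample = """\
 Mode:                 1                      2                      3
 Frequency:       100.00                 200.00                 300.00
 Mode:                 4                      5
 Frequency:       400.00                 500.00
"""

print(parse_frequencies(sample.splitlines()))  # [100.0, 200.0, 300.0, 400.0, 500.0]

# For comparison, append() nests each line's list inside the result.
nested = []
for line in sample.splitlines():
    if 'Frequency:' in line:
        nested.append([float(tok) for tok in line.split()[1:]])
print(nested)  # [[100.0, 200.0, 300.0], [400.0, 500.0]]
```

Because the helper accepts any iterable of lines, it should work equally well on `contents_splitlines` or directly on an open handle, e.g. `with open(filename) as handle: frequencies = parse_frequencies(handle)`.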