# General overview

Each running notebook is connected to its own Python process on the server.
That process holds in memory any variables the notebook creates,
independently of any other IPython notebook,
for as long as the notebook is running.

# Available obituary codings

+ **coding100** A random sample of 100 obituaries. Only used for testing.
+ **coding1000** A random sample of 1000 obituaries. Large enough for many purposes, at about 1.7% of the 60k total obituaries.

# Setup

At the beginning of each file there is a section titled "Setup".
Its cells should be run first, before any other cell in the notebook.
In general, cells should be run in order, and this matters most for the setup cells:
nothing will run without them.

Typically this section imports the necessary packages, such as the libraries written specifically for coding obituaries.
It also loads any obituaries the notebook will use.

In [1]:
import sys
from os import path
sys.path.append("/home/alec/projects/nytimes-obituaries/lib")

import occ, nlp, wiki

nlp.nlp.restricted = True

Loading term-code associations into variable 'codes' from /home/alec/projects/nytimes-obituaries/lib/../w2c_source/compiledCodes.csv...
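Because every later cell depends on the library directory being importable, it can help to check the path before importing. The following is a minimal sketch, not part of the project: `add_lib_dir` is a hypothetical helper, and the directory is the one used in the setup cell above.

```python
import sys
from os import path

# Directory used by the setup cell in this notebook.
LIB_DIR = "/home/alec/projects/nytimes-obituaries/lib"

def add_lib_dir(lib_dir):
    """Append lib_dir to sys.path at most once; return True if it exists on disk."""
    if lib_dir not in sys.path:
        sys.path.append(lib_dir)
    return path.isdir(lib_dir)

# Hypothetical guard: warn early instead of failing later at `import occ, nlp, wiki`.
if not add_lib_dir(LIB_DIR):
    print("Warning: %s not found; the project imports will fail" % LIB_DIR)
```

Re-running the cell does not add the directory a second time, which keeps `sys.path` tidy when setup cells are executed repeatedly.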


In [2]:
coder = occ.Coder()
coder.loadPreviouslyCoded("coding1000")

loaded 998 documents


# Available metadata on obituaries

In [10]:
print( "The variable `coder.obituaries` is a %s" % type(coder.obituaries) )

The variable `coder.obituaries` is a <class 'list'>


In [11]:
print( "It currently contains %s obituaries" % len(coder.obituaries) )

It currently contains 998 obituaries


In [42]:
# Let's take a look at one
one_obituary = coder.obituaries[3]

In [43]:
one_obituary.info.keys()

odict_keys(['fName', 'date', 'name', 'title', 'fullBody', 'first500', 'nWordsReport', 'nWordsCalc', 'distinctWords', 'fS_str'])
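Since `info` behaves like an ordered dictionary, its fields can be skimmed in a loop rather than one at a time. This is a minimal sketch using a stand-in dictionary filled with a few values printed in this notebook; `preview` is a hypothetical helper, not a function from the project libraries.

```python
from collections import OrderedDict

# Stand-in for one_obituary.info, using a subset of its keys and
# values that appear in this notebook's output.
info = OrderedDict([
    ("fName", "Indobit-A-NYT2007 - July 1 - Sep 30-79.txt"),
    ("date", "july 27, 2007"),
    ("title", "E. A. Boyse, 83, Dies; Multifaceted Doctor"),
])

def preview(info, width=40):
    """Return one line per field, truncating long values to `width` characters."""
    lines = []
    for key, value in info.items():
        text = str(value).replace("\n", " ")
        if len(text) > width:
            text = text[:width - 3] + "..."
        lines.append("%-8s %s" % (key, text))
    return lines

for line in preview(info):
    print(line)
```

With the real object, the same loop would be called as `preview(one_obituary.info)`, assuming `info` supports `.items()` as its `odict_keys` output suggests.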

In [44]:
print( "The original filename: '%s'" % one_obituary.info['fName'] )

The original filename: 'Indobit-A-NYT2007 - July 1 - Sep 30-79.txt'


In [45]:
print( "The date, parsed from the original format: %s" % one_obituary.info['date'] )

The date, parsed from the original format: july 27, 2007
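The date is stored as a lowercase string, so sorting or comparing obituaries by date requires converting it first. The sketch below uses only the standard library; `parse_obituary_date` is a hypothetical helper, and it assumes that every date follows the "month day, year" pattern shown above and that month names are in English.

```python
from datetime import datetime

def parse_obituary_date(raw):
    """Convert a string such as 'july 27, 2007' into a datetime.date.

    Assumes the 'month day, year' layout seen in this notebook;
    raises ValueError for anything else.
    """
    # Title-casing gives 'July 27, 2007', which matches the %B month-name format.
    return datetime.strptime(raw.strip().title(), "%B %d, %Y").date()

parsed = parse_obituary_date("july 27, 2007")
print(parsed.isoformat())  # 2007-07-27
```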


In [46]:
print( "The title given to the article: '%s'" % one_obituary.info['title'] )

The title given to the article: 'E. A. Boyse, 83, Dies; Multifaceted Doctor'


In [52]:
print( "The full body of the obituary:")
print( one_obituary.info['fullBody'] )

The full body of the obituary:

 Dr. Edward A. Boyse, a physician who did some of the earliest studies of the
 immune response, the body's system for recognizing and fighting off disease
 organisms and other foreign substances, and later worked on the genetic basis of
 bodily odors, died on July 14 in Tucson. He was 83.
 The cause was pneumonia, his family said.
 In the 1960s and 1970s, Dr. Boyse and others were among the earliest researchers
 to look at how antibodies formed by mouse cells are used in assembling an
 immune-system response to a surface protein, called an antigen. They were
 particularly interested in the response of white blood cells, or lymphocytes.
 The researchers were able to use the antibodies to differentiate among the cells
 that produced them. Their discoveries led to  ''a cornerstone of how
 immunologists have dissected the cellular basis of our immune response,'' said
 Harvey Cantor, a professor of pathology at Harvard.
 Dr. Cantor added that Dr. Boyse's use 

In [48]:
print( "A record of our guess as to the OCC code of the obituary:")
print()
print( one_obituary.guess )

A record of our guess as to the OCC code of the obituary:

[{'occ': ['306', '070', '461', '306', '306'], 'preps': [], 'word': 'physician', 'state': 'vague_occWords'}, {'occ': [], 'preps': [], 'word': 'system', 'state': 'unclassified'}]
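The guess is a list of dictionaries, one per candidate word, each carrying a list of OCC codes under `'occ'`. One way to summarize it is to count how often each code appears. This is an illustrative sketch only: the majority count below is an assumption for demonstration, not the rule the project uses to pick a final code.

```python
from collections import Counter

# The guess printed above, copied as a literal.
guess = [
    {'occ': ['306', '070', '461', '306', '306'], 'preps': [],
     'word': 'physician', 'state': 'vague_occWords'},
    {'occ': [], 'preps': [], 'word': 'system', 'state': 'unclassified'},
]

def most_common_code(guess):
    """Return (code, count) for the most frequent OCC code, or None if there are none."""
    counts = Counter(code for entry in guess for code in entry['occ'])
    if not counts:
        return None
    return counts.most_common(1)[0]

print(most_common_code(guess))  # ('306', 3)
```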


In [49]:
print( one_obituary.age )

83
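The title of this obituary also contains the age, which allows a quick consistency check against `one_obituary.age`. The sketch below is a hypothetical cross-check, not how the project libraries compute the age; it assumes titles of the form "Name, age, Dies; ...".

```python
import re

def age_from_title(title):
    """Return the first number set off by commas in the title, or None if absent."""
    match = re.search(r",\s*(\d{1,3}),", title)
    return int(match.group(1)) if match else None

title = "E. A. Boyse, 83, Dies; Multifaceted Doctor"
print(age_from_title(title))  # 83
```

Titles that omit the age return `None`, so a mismatch report should treat that case separately from a genuine disagreement.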


In [50]:
print( one_obituary.nameS )

Dr. Edward A. Boyse
