# Python hands-on with the five things a computer can do
Carl provided a notebook with a high-level overview of the basic building blocks of pretty much any coding task. In this notebook, we'll put all of those into action to explore what seems like it should be a fairly simple question: in any given year, how many of the books printed by William Bowyer were by living authors, and how many were by dead ones?

If you haven't done much coding before, it may seem like there are an awful lot of steps involved, but what we're doing is breaking that broad question down into the smaller "computationally tractable" tasks it takes to answer it. If you spend more time doing this kind of work, a lot of what we'll do in this notebook will come to seem completely routine.

(This notebook provides code snippets with minimal discussion. We'll use this notebook in our class session. There's another notebook in today's `reference` folder that works through the same code, but with much fuller discussion.)

## Let's just think about the problem for a second
If we were looking at these records one at a time in a browser window, we wouldn't have any trouble saying whether the author of the book was alive or dead at the time of publication.  

Take a look at the first five records and, in the next cell, jot down some notes about what your thought process would be for determining which authors were living and which were dead when the following texts were published.

>* Davila, Arrigo Caterino,(1576-1631). *The history of the civil wars of France*. (London:printed for D. Browne..., MDCCLVIII. \[1758\]).
>
>* Holland, Richard,(1688-1730). *Observations on the small pox: or, An essay to discover a more effectual method of cure*. (London:printed for John Brindley...,1728).
>
>* Hasledine, William,(1713 or 14-1773). *The beau and the academick*. (London: printed for J. Roberts,\[1733\]).
>
> * Clarke, John,(1687-1734).
>  *A new grammar of the Latin tongue, comprising all necessary for grammar-schools*. (London: printed for L. Hawes, W. Clarke, and R. Collins..., M.DCC.LXVII. \[1767\]).
>
> * Spinckes, Nathaniel,(1654-1727). *The new pretenders to prophecy re-examined*. (London: printed for Richard Sare..., 1710).

Take a minute or two to think about the mental processes involved in deciding whether the author of the work was alive or dead in the year the book was published, and jot a phrase or a sentence or two about them in the cell below. Also take note of anything you see in these summaries that might cause you to spend a fraction of a second longer considering some of them.


In [None]:
#Jot down some notes on how you go about answering the simple question of
#whether the author of the record was living or dead in the year of publication.







## Getting the tools we need


In [None]:
#Code cell #1
#Download and install the Pymarc package so that it's available for use in our
#Python environment
!pip install pymarc

In [None]:
#Code cell #2
#Import the MARCReader class from the Pymarc package.
from pymarc import MARCReader

## Opening a MARC file


In [None]:
#Code cell #3
from google.colab import drive
drive.mount('/gdrive/')

In [None]:
#Code cell #4
directory_path = '/gdrive/MyDrive/rbs_digital_approaches_2023/2023_data_class/'

In [None]:
#Code cell #5
#Open a file in readable binary mode and refer to it as marc_file
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  #Give marc_file to MARCReader and refer to whatever we get back as marc_reader
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record)

### Getting individual fields from each MARC record

In [None]:
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record['245'])

In [None]:
#Code cell #6
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record['245']['a'])

In [None]:
#Code cell #7
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    full_title = record['245']['a'] + record['245']['b']
    print(full_title)

In [None]:
#Write-your-own-code cell A
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    #Construct the value of the full_title variable in a way that includes a
    #space between the content of the two subfields
    full_title =
    print(full_title)

### Conditionals

In [None]:
#Code cell 8
#This cell will raise an error message
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record['100']['a'] + ' ' + record['100']['d'])

In [None]:
#Code cell #9
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record)

In [None]:
#Code cell #10
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      print(record['100']['a'] + ' ' + record['100']['d'])
    else :
      print('No 100 field')

### Try this for yourself
In the cell below, write code that will:
1. Open the MARC file we've been working with;
2. Pass that file to MARCReader;
3. Iterate through the records and print:
    * The contents of field 100, subfield a and field 100, subfield d, **if that field exists**
    * The contents of field 260, subfield c, **if that field exists**

In [None]:
#Write-your-own-code cell B
#Your code goes here:


### One solution
If you're running into trouble, click to reveal a solution.

In [None]:
#@title
#Suggestion code cell 1
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      print(record['100']['a'] + ' ' + record['100']['d'])
    if record.get('260', None) is not None :
      print(record['260']['c'])

## Getting the information we need
The MARC 008 field is a structured data field that tells us about things like country of publication, language, and date of publication. Let's have a look at that field. (Note that MARC "control fields," including the 008 field, don't have subfields, so we need to use `.data` to get the content of those fields.)

In [None]:
#Code cell #11
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record['008'].data)

Looking at [the documentation for the 008 field](https://www.loc.gov/marc/bibliographic/bd008a.html), we can see what parts of this fixed-length field are actually helpful to us.
* Character 6 tells us what *kind* of date (or dates) the field provides: a single year, multiple years, etc. (There are only single years in this subset, but there are other combinations to be found in the full set.)
* If there's only one year, we'll find it in characters 7-10; if there are two years, characters 7-10 will give us the first year and characters 11-14 will give us the second year.

### Slicing a substring from a longer string
* In Python, we start counting at zero, rather than one. (The MARC documentation also starts counting at zero, so we're okay there.)
* In Python, when we're slicing something like a string of text, the starting point of a slice is inclusive, but the ending point is exclusive: we get everything up to *but not including* the ending.

You can tell how many characters you're going to get back by subtracting the first number from the second: `[7:10]` would only get us three characters. We want `[7:11]`, instead, which will get us the four characters starting at character 7 and going up to (but not including) character 11.
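
To try the slicing rule without opening the MARC file, here's a minimal sketch using a made-up 008 string. (The value of `sample_008` is an assumption for illustration, not a record from our data set.)

```python
# A made-up 008 value for illustration only; it is not taken from the
# Bowyer data set.
sample_008 = '850101s1758    enk           000 0 eng d'

# Character 6 is the "type of date" code.
date_type = sample_008[6]

# Characters 7-10: the slice starts at 7 and stops just before 11.
pub_year = sample_008[7:11]

# Stopping at 10 instead would drop the last digit of the year.
too_short = sample_008[7:10]

print(date_type)
print(pub_year)
print(too_short)
print(len(pub_year))  # 11 - 7 = 4 characters
```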

In [None]:
#Code cell #12
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record['008'].data[7:11])

### More ways of manipulating strings of text
Python has a lot of different [methods for working with strings of text](https://docs.python.org/3/library/stdtypes.html?highlight=split#string-methods)—more than we can reasonably cover in an hour or so. We'll try to explain various methods that we use as they come up in the code we'll use this week, but feel free to ask about anything that seems unclear.

In [None]:
#Code cell #13
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      print(record['100']['d'])

#### Finding a substring

In [None]:
#Code cell #14
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      dates = record['100']['d']
      print("record['100']['d'] = " + dates)
      #Find the hyphen
      hyphen_position = dates.find('-')
      #What is hyphen_position, actually?
      print('hyphen_position = ' + str(hyphen_position))

      #Use hyphen position as the starting point of our slice. No ending position
      #means we'll get the remainder of the string
      print('dates[hyphen_position:] = ' + dates[hyphen_position:])

      #Opening indices are inclusive. Add 1 to get rid of the hyphen
      print('dates[hyphen_position+1:] = ' + dates[hyphen_position+1:])

      #One-liner
      print(dates[dates.find('-')+1:])
      print('---------')


We need to get rid of the period at the end of the string, and there are several different approaches we could take.
* We could use `find()` again to find the period and use its position as our ending index.
* We could use an ending index of -1 to go up to (but not including) the last character in the string.
* We could use the `strip()` method to eliminate the period from the string.

Any of those approaches will give us the same result.
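
That holds as long as the string actually ends in a period. A minimal sketch on two literal strings (both made up for illustration, and `death_year_three_ways` is a hypothetical helper name) shows where the three approaches would part ways if it didn't:

```python
def death_year_three_ways(dates):
  """Return the text after the hyphen, trimmed three different ways."""
  start = dates.find('-') + 1
  by_find = dates[start:dates.find('.')]
  by_negative_index = dates[start:-1]
  by_strip = dates[start:].strip('.')
  return by_find, by_negative_index, by_strip

# Hypothetical date strings, not read from the MARC file
with_period = death_year_three_ways('1688-1730.')
without_period = death_year_three_ways('1688-1730')

print(with_period)
print(without_period)
```

When there is no trailing period, `find('.')` returns -1 (its "not found" value), so the first two approaches both cut off the last digit of the year, while `strip('.')` leaves the year intact.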

In [None]:
#Code cell #15
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      dates = record['100']['d']
      print('Finding the period as an ending index = ' + \
            dates[dates.find('-')+1:dates.find('.')])
      print('Using -1 as an ending index = ' + dates[dates.find('-')+1:-1])
      print("Using strip('.') = " + dates[dates.find('-')+1:].strip('.'))
      print('----------')

#### Splitting a string to a list

In [None]:
#Code cell #16
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      date_string = record['100']['d']
      dates = date_string.split('-')
      print(dates)

Working with the elements of a list works just like getting substrings of longer strings, because items in lists also have indices, and they behave the same way:

* `my_list[1:5]` would get the second through fifth items in a list
* `my_list[2:]` would return everything from the third item of the list to the end
* `my_list[:-1]` would return all of the items except the last
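
Here are those three slices applied to a short list, as a quick sketch. (The list contents are made up for illustration.)

```python
# A made-up list for illustration
my_list = ['a', 'b', 'c', 'd', 'e', 'f']

# Indices 1, 2, 3, and 4: the second through fifth items
second_through_fifth = my_list[1:5]

# Index 2 through the end of the list
third_to_end = my_list[2:]

# Everything except the last item
all_but_last = my_list[:-1]

print(second_through_fifth)
print(third_to_end)
print(all_but_last)
```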

Let's have a look at the structure of a list using `enumerate()` (which explicitly returns list indices as well as values).


In [None]:
#Code cell #17
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      date_string = record['100']['d']
      dates = date_string.split('-')
      print(dates)
      #Note: another for loop
      for index, value in enumerate(dates) :
        print(str(index) + ': ' + value)
      print(dates[0])
      print(dates[1])
      print('----------')

### Write your own code: assign list items to variables
Fill in the code to assign items from the `dates` list to two new variables (`birth_year` and `death_year`).

In [None]:
#Code cell #18
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      author_name = record['100']['a']
      date_string = record['100']['d']
      dates = date_string.split('-')
      #Assign values to the variables birth_year and death_year, using the list index of
      #the appropriate item in the dates list
      birth_year =
      death_year =
      print(author_name)
      print('--Birth year: ' + birth_year)
      print('--Death year: ' + death_year)

#### Solution
If you're having trouble, click here to reveal a solution.

In [None]:
#Suggestion code cell 2
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      author_name = record['100']['a']
      date_string = record['100']['d']
      dates = date_string.split('-')
      #Assign values to the variables birth_year and death_year, using the list index of
      #the appropriate item in the dates list
      birth_year = dates[0]
      #Let's go ahead and remove the period from the end of the death year now,
      #rather than having to do it in a second step.
      death_year = dates[1].rstrip('.')
      print(author_name)
      print('--Birth year: ' + birth_year)
      print('--Death year: ' + death_year)

### You thought we were pretty much in the clear at this point, didn't you?



In [None]:
#Code cell #19
with open(directory_path + '2023_d1_estc_bowyer_problem_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      author_name = record['100']['a']
      #A new wrinkle in this particular set, and a different bit of PyMarc...
      if record.get('100', None).get('c', None) is not None :
        author_name += ' ' + record['100']['c']
      date_string = record['100']['d']
      print(author_name + ' (' + date_string + ')')

### Introducing Regular Expressions
Regular expressions provide us with a way of searching for *patterns* of text, even if we don't know the specific form that the text will take.

Consider US phone numbers, for example, which often take the form of (xxx) xxx-xxxx. A phone number should only include numerical characters, and not letters.

Written as a regular expression, that might take the form:
> `'\([0-9]{3}\)\s[0-9]{3}\-[0-9]{4}'`

* The backslash "escapes" certain characters that would otherwise be interpreted as part of the regular expression syntax (like the parentheses in this example). When a character is "escaped," we are searching for the character itself. (A hyphen only has a special meaning inside square brackets, so escaping it here is harmless but not strictly required.)
* The square brackets enclose sets of characters that *could* appear in the pattern we're looking for.
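
As a quick sketch, here's that phone number pattern tried against two made-up strings using Python's `re` library. (The sample text and the variable names are assumptions for illustration.)

```python
import re

phone_pattern = r'\([0-9]{3}\)\s[0-9]{3}\-[0-9]{4}'

# Made-up sample text; the second string has the letter O in place of a zero
good = re.search(phone_pattern, 'Call (434) 555-0199 today')
bad = re.search(phone_pattern, 'Call (434) 555-O199 today')

# re.search() returns a match object if the pattern is found...
print(good.group())

# ...and None if it isn't
print(bad)
```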

#### Dealing with unhelpful date fields
For our purposes, there are a few different patterns that are going to spell different kinds of trouble (roughly in descending order of severity):
* Any date field that begins with "b." (There's no death year. We can't use this at all.)
* Any date field that begins with "fl." (Those aren't necessarily real dates. We shouldn't use these.)
* Any date field that contains "ca."
  - If it's expressing approximation about the birth year (i.e., before the hyphen), we might be able to use it.
  - If it's expressing approximation about the death date (i.e., after the hyphen), we can't use it.
* Any date field that contains "or"
  - If it's expressing a choice between birth years, but the death year is solid, we're okay with it
  - If the birth year is solid, but the death year is uncertain, we're less okay with it
* Any date field beginning with "d." (At least we have a death year, though we should be on the lookout for other tricky bits like "d. ca. 1778"  or "d. 1750 or 51")

Rather than trying to explain the regular expressions and control structures here, I'll add comments to the code to explain things as we go.

Note that I'm using `re.compile()` to define my regular expressions. I could simply write the regular expressions in my control structures, but I prefer to compile them first for two reasons:
1. It strikes me as usually more readable to use a variable name as the first argument in `re.search()` or `re.findall()`, rather than a potentially gnarly regular expression.
2. It allows me to reuse the same regular expression in different places, if need be.
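
A minimal sketch of that reuse, on made-up date strings rather than on our MARC records. It also shows a difference that matters when testing for results: `re.findall()` always gives back a list (possibly an empty one), while `re.search()` gives back `None` when nothing matches.

```python
import re

# Compile once, then reuse the same pattern object in several calls
year_pattern = re.compile(r'\d{4}')

# Made-up date strings for illustration
samples = ['1688-1730.', '1713 or 14-1773.', '17th cent.']

for date_string in samples:
  all_years = re.findall(year_pattern, date_string)   # always a list
  first_match = re.search(year_pattern, date_string)  # a match object or None
  print(date_string, all_years, first_match is not None)

# A compiled pattern also has methods of its own, so this is equivalent to
# re.findall(year_pattern, '1688-1730.')
same_years = year_pattern.findall('1688-1730.')
print(same_years)
```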

In [None]:
#Code cell 20
#We need to import Python's re (Regular Expression) library to be able to work
#with regular expressions
import re

#Compile a regular expression to match a series of four digits
#(equivalent to [0-9]{4})
year_pattern = re.compile(r'\d{4}')

#Compile a regular expression to match strings starting with "b." or "fl."
#(any combination of one or two of the characters b, f, or l, optionally
#followed by a period); OR (the "|" signals alternatives) including a hyphen,
#optionally followed by a space, followed by the characters "ca", optionally
#followed by a period.
born_flourished_cadeath = re.compile(r'^[bfl]{1,2}\.?|\-\s?ca\.?')

#Compile a regular expression to match any string starting with "d", optionally
#followed by a period.
died = re.compile(r'^d\.?')

#Compile a regular expression to match any string with "or" appearing after a
#hyphen (that is, a hyphen followed by one or more characters of any kind, followed
#by a space, the characters "or" and another space).
ordeath = re.compile(r'\-.+\sor\s')

with open(directory_path + '2023_d1_estc_bowyer_problem_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      author_name = record['100']['a']
      date_string = record['100']['d']
      print(author_name + ' (' + date_string + ')')
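      #Look for occurrences of the year pattern (there should be either one or
      #two, but some records may not have valid years [e.g. "17th-century"]).
      #re.findall() always returns a list (never None), so we use re.search()
      #to test whether there is at least one year.
      if re.search(year_pattern, date_string) is not None :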

        #If the born_flourished_cadeath pattern is found (or, I guess, isn't
        #not found), we can't use this record.
        if re.search(born_flourished_cadeath, date_string) is not None :
          print('--Cannot use this date field.')


        #If the died pattern is found...
        elif re.search(died, date_string) is not None :
          #Print the first (i.e. only) year
          print('--' + re.findall(year_pattern, date_string)[0])

        #If the ordeath pattern is found...
        elif re.search(ordeath, date_string) is not None :
          #Print the second (i.e., death) year
          print('--' + re.findall(year_pattern, date_string)[1])

        #In any other case, we should be dealing with a standard yyyy-yyyy pattern
        #So find all instances of the year_pattern and print the second one.
        else :
          print('--' + re.findall(year_pattern, date_string)[1])

      else :
        print('No valid dates')



#### Refactoring code to create reusable functions

In [None]:
#Code cell 21
def get_death_year(date_string) :
  year_pattern = re.compile(r'\d{4}')
  born_flourished_cadeath = re.compile(r'^[bfl]{1,2}\.?|\-\s?ca\.?')
  died = re.compile(r'^d\.?')
  ordeath = re.compile(r'\-.+\sor\s')

  if re.search(year_pattern, date_string) is None :
    #We don't just want to print results now, we want to process a string and
    #produce a result. If there are no valid years, we need a way to return
    #a result of None.
    result = None

  elif re.search(born_flourished_cadeath, date_string) is not None :
    #If we can't provide a *usable* year, we should also return a result of None
    result = None

  elif re.search(died, date_string) is not None :
    result = re.findall(year_pattern, date_string)[0]

  elif re.search(ordeath, date_string) is not None :
    result = re.findall(year_pattern, date_string)[1]

  else :
    result = re.findall(year_pattern, date_string)[1]

  return result
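
Because the function takes a plain string, we can try it out on a few hand-typed examples before running it over a whole file. The sketch below repeats a condensed copy of `get_death_year` so that it runs on its own; the sample strings are made up, loosely following the problem patterns listed earlier.

```python
import re

def get_death_year(date_string):
  # Condensed copy of the function above, repeated so this sketch runs alone
  year_pattern = re.compile(r'\d{4}')
  born_flourished_cadeath = re.compile(r'^[bfl]{1,2}\.?|\-\s?ca\.?')
  died = re.compile(r'^d\.?')
  years = re.findall(year_pattern, date_string)
  # No four-digit year at all, or a pattern we decided we can't use
  if not years or re.search(born_flourished_cadeath, date_string):
    return None
  # "d." strings carry only a death year, so take the first year found
  if re.search(died, date_string):
    return years[0]
  # Otherwise the death year is the second year found
  return years[1]

# Made-up date strings for illustration
samples = ['1688-1730.', 'b. 1702', 'fl. 1720-1740', '1688-ca. 1730',
           'd. 1750 or 51', '1713 or 14-1773.', '17th cent.']

results = {}
for sample in samples:
  results[sample] = get_death_year(sample)
  print(sample, '->', results[sample])
```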

In [None]:
#Code cell 22
with open(directory_path + '2023_d1_estc_bowyer_problem_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None:
      author_name = record['100']['a']
      date_string = record['100']['d']

      #Call the get_death_year function, passing date_string as the argument
      death_year = get_death_year(date_string)

      #Depending on the result that the get_death_year function returns, do
      #something
      if death_year is not None :
        print(author_name + ' died in ' + death_year)
      else :
        print('No death year for ' + author_name)

In [None]:
#Code cell 23
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    if record.get('100', None) is not None :
      #You know, let's rearrange this author name a little...
      author_name = record['100']['a'].rstrip(',')
      author_name_parts = author_name.split(', ')
      author_name = author_name_parts[1] + ' ' + author_name_parts[0]
      date_string = record['100']['d']
      death_year = get_death_year(date_string)
      if death_year is not None :
        print(author_name + ' died in ' + death_year)
      else :
        print('No death year for ' + author_name)

## So, about data types...
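
Everything we slice out of a MARC field is a string, even when it looks like a number. Here's a minimal sketch (with made-up values) of why that matters before we start comparing years:

```python
# Made-up values: years as they come out of a MARC field, i.e. as strings
pub_year = '1758'
death_year = '999'

# Strings are compared character by character, so '9' versus '1' decides it
string_comparison = death_year > pub_year
print(string_comparison)

# Converting to integers gives the numerical comparison we actually want
integer_comparison = int(death_year) > int(pub_year)
print(integer_comparison)

print(type(pub_year))
print(type(int(pub_year)))
```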

In [None]:
#Code cell 24
with open(directory_path + '2023_d1_estc_bowyer_sample.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    pub_year = record['008'].data[7:11]
    print('Publication year: ' + pub_year)
    print(type(pub_year))

In [None]:
#Code cell 25
with open(directory_path + '2023_d1_estc_bowyer_full.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    print(record['001'].data)
    #Get a substring from MARC field 008 and convert it to an integer
    pub_year = int(record['008'].data[7:11])
    if record.get('100', None) is not None :
      author_name = record['100']['a'].rstrip(',')
      #Just in case some records don't have a date for the author, at all,
      #as can happen with ancient authors as well as pseudonymous ones.
      if record.get('100', None).get('d', None) is not None :
        date_string = record['100']['d']
        death_year = get_death_year(date_string)
        if death_year is not None :
          #Convert the death_year to an integer
          death_year = int(death_year)
          if death_year > pub_year :
            print(author_name + ' died in ' + str(death_year) + ', and so was alive in ' + str(pub_year))
          elif death_year == pub_year :
            print(author_name + ' died in the same year the work was published. Need more information.')
          else :
            print(author_name + ' died in ' + str(death_year) + ', and so was already dead by ' + str(pub_year))
      else :
        print('No dates for ' + author_name)

## Construct a data structure to hold on to all this information
For much of the week, we'll be working with a data type called the DataFrame, which is actually not native to Python, but is made available by the now ubiquitous `pandas` package. For now, though, let's stick with native Python data types. We've seen strings and lists, now it's time for a "dictionary."

For our purposes, let's keep track of living vs. dead authors for each year. One way to organize our data might be:

```
{
  1710:
        {'total_works': <integer-value>,
         'no_author': <integer_value>,
         'living_authors': <integer-value>,
         'dead_authors': <integer-value>,
         'ambiguous': <integer_value>
        },
  1711:
        {'total_works': <integer-value>,
         'no_author': <integer_value>,
         'living_authors': <integer-value>,
         'dead_authors': <integer-value>,
         'ambiguous': <integer_value>
        },
...
  1778:
        {'total_works': <integer-value>,
         'no_author': <integer_value>,
         'living_authors': <integer-value>,
         'dead_authors': <integer-value>,
         'ambiguous': <integer_value>
        }
}
```
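
One way to build a structure like that is the dictionary method `setdefault()`. This is a minimal sketch on made-up (year, category) pairs standing in for MARC records; the name `demo_count` and the sample data are assumptions for illustration.

```python
# Made-up (publication year, category) pairs standing in for MARC records
observations = [(1710, 'living_authors'), (1710, 'dead_authors'),
                (1711, 'living_authors'), (1710, 'living_authors')]

demo_count = {}
for pub_year, category in observations:
  # setdefault() adds the year with zeroed counts only if it is missing,
  # and leaves an existing entry untouched
  demo_count.setdefault(pub_year, {'total_works': 0,
                                   'no_author': 0,
                                   'living_authors': 0,
                                   'dead_authors': 0,
                                   'ambiguous': 0})
  demo_count[pub_year]['total_works'] += 1
  demo_count[pub_year][category] += 1

print(demo_count[1710])
print(demo_count[1711])
```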


In [None]:
#Code cell #26
#Create an empty dictionary to hold our information
author_count = {}

with open(directory_path + '2023_d1_estc_bowyer_full.mrc', 'rb') as marc_file :
  marc_reader = MARCReader(marc_file)
  for record in marc_reader :
    #Get a substring from MARC field 008 and convert it to an integer
    pub_year = int(record['008'].data[7:11])

    #If we don't already have an entry in the author_count dictionary for this
    #year, create one, with the year as a key. Give that key a value of a
    #dictionary starting with empty counts for all the categories
    author_count.setdefault(pub_year, {'total_works': 0,
                                       'no_author': 0,
                                       'living_authors': 0,
                                       'dead_authors': 0,
                                       'ambiguous': 0
                                       })
    #Immediately increase the number of total works—we don't know anything else
    #yet, but we do know that much.
    author_count[pub_year]['total_works'] += 1

    if record.get('100', None) is not None :
      author_name = record['100']['a'].rstrip(',')
      #Just in case some records don't have a date for the author, at all,
      #as can happen with ancient authors as well as pseudonymous ones.
      if record.get('100', None).get('d', None) is not None :
        date_string = record['100']['d']

        #Pass the date string to the get_death_year function
        death_year = get_death_year(date_string)

        #If we get back a result from get_death_year other than 'None'...
        if death_year is not None :

          #Turn that result into an integer
          death_year = int(death_year)

          #Compare the death year to the publication year, and increment
          #values in our dictionary accordingly
          if death_year > pub_year :
            author_count[pub_year]['living_authors'] += 1
          elif death_year == pub_year :
            author_count[pub_year]['ambiguous'] += 1
          else :
            author_count[pub_year]['dead_authors'] += 1

      #If there are no dates
      else :
        author_count[pub_year]['ambiguous'] += 1
    #If there's no author at all
    else :
      author_count[pub_year]['no_author'] += 1

In [None]:
#Code cell #27
for k, v in author_count.items() :
  print(k)
  print(v)

## Do something with these data
Now that we have all our data in a dictionary, we'll want to do something with them so we can get a sense of the answer to our question. First we'll save the data to a spreadsheet, then we'll construct a (rudimentary) bar chart.

### Saving to a file
For the sake of convenience, we'll use the `pandas` package to convert the data in our dictionary to a DataFrame and save it as an Excel spreadsheet. (We could use the native Python `csv` package and export this as a comma-separated-value file. Vanilla .csv files *definitely* have their uses. But so do Excel files.)
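
For comparison, here's a rough sketch of the `csv` route, using made-up counts and writing to an in-memory buffer rather than to a file. (The column names and the `demo_count` data are assumptions for illustration.)

```python
import csv
import io

# Made-up counts in the same shape as our dictionary
demo_count = {1710: {'total_works': 3, 'living_authors': 2, 'dead_authors': 1},
              1711: {'total_works': 1, 'living_authors': 1, 'dead_authors': 0}}

# Write to an in-memory buffer here; passing an open file object instead
# (opened with mode 'w' and newline='') would write to disk
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['year', 'total_works',
                                            'living_authors', 'dead_authors'])
writer.writeheader()
for year in sorted(demo_count):
  # Each row is a dictionary: the year plus that year's counts
  row = {'year': year}
  row.update(demo_count[year])
  writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```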

In [None]:
#Code cell #28
#Import pandas and os library—in case we're trying to write a file to
#a folder that doesn't exist yet
import pandas as pd
import os

#Set a variable containing the path to the output directory. See if this
#directory exists and, if it doesn't, create it
output_directory = '/gdrive/MyDrive/rbs_digital_approaches_2023/output/'
if not os.path.exists(output_directory) :
  os.makedirs(output_directory)

df = pd.DataFrame.from_dict(author_count, orient='index')
df.sort_index(inplace=True)
df.to_excel(output_directory + 'living_vs_dead_authors.xlsx')
#Display the DataFrame (a notebook only shows the value of a cell's last line)
df

## Making a bar chart with plotly
It seems like we ought to have at least something to look at by the end of all this. Let's use the `plotly` package to draw a bar chart comparing the number of living and dead authors in each year.

In [None]:
#Code cell #29
#Import the module we need from the plotly package. (Plotly is installed by
#default in Google Colaboratory. In another environment, you might need to
#install it using pip)
import plotly.graph_objects as go

#Create empty lists for the years that we'll use as our x axis and for the counts
#that we'll chart on our y axis
years = []
living_counts = []
dead_counts = []

#Iterate through the keys and add values to our lists
for year, values in author_count.items() :
  years.append(year)
  living_counts.append(values['living_authors'])
  dead_counts.append(values['dead_authors'])

#There's a more compact syntax for creating those lists, but one that might be
#a little opaque if you're new to Python:
#years = [k for k, v in author_count.items()]
#living_counts = [v['living_authors'] for k, v in author_count.items()]
#dead_counts = [v['dead_authors'] for k, v in author_count.items()]

#Create the chart using the plotly graph objects module. The "data" is a list of
#two graph_objects "bars," each of which has a label ("name"), and each of which
#takes its values for the x and y axes from the lists we constructed above.
fig = go.Figure(data=[
    go.Bar(name='Living', x=years, y=living_counts),
    go.Bar(name='Dead', x=years, y=dead_counts)
])

# Change the bar mode to group the bars together
fig.update_layout(barmode='group')

#Display the chart
fig.show()
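
The "more compact syntax" mentioned in the comments of the cell above is called a list comprehension. As a quick sketch on made-up counts (the `demo_count` data is an assumption for illustration), the loop version and the compact version build the same lists:

```python
# Made-up counts in the same shape as author_count
demo_count = {1710: {'living_authors': 2, 'dead_authors': 1},
              1711: {'living_authors': 1, 'dead_authors': 0}}

# Loop version: start with empty lists and append one item at a time
years = []
living_counts = []
for year, values in demo_count.items():
  years.append(year)
  living_counts.append(values['living_authors'])

# List comprehension version: the same loop folded into a single expression
years_compact = [k for k, v in demo_count.items()]
living_compact = [v['living_authors'] for k, v in demo_count.items()]

print(years == years_compact)
print(living_counts == living_compact)
```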