# Preservation Notebook

This is a test notebook for exploring how Jupyter notebooks can be used to teach digital preservation.

We will be looking at two tools: [bagit-python](https://github.com/LibraryOfCongress/bagit-python) and ExifTool.

This is an interactive notebook that lets you run Python code and command-line instructions from within a web browser. Below you will find instructions as well as cells that contain code. To run the code in a cell, select it and press Enter (or Return) while holding down the Shift key. Alternatively, if you are running the live notebook, you can open the Cell menu above and choose Run Cells, or even Run All (but that ruins the suspense).

When you run a cell, the output of the commands (if any) will be shown below the cell.

You can change the code in this notebook and run the cell again to see the new output. Feel free to experiment; you can't break it!

We'll start by looking at bagit-python.



### Import Bagit
The first thing we need to do is import the bagit code into our notebook. Importing modules is the usual way to add functionality to Python scripts, and there are modules for just about everything in Python. The code to import bagit is shown below: pretty simple, isn't it?

Go ahead and run the cell by selecting it and pressing Shift-Enter (or Shift-Return).

In [None]:
import bagit

### Setting up our directory and files
Before we can create our first bag, we need to do a little setting up. We are going to create a directory on disk to use as the basis for the bag. We'll call the directory **bagdir_a**, and we will start by copying some files into it.

In Jupyter notebooks we can run a command-line instruction directly by putting a ! in front of it. If you were working directly on the command line, you would not need the !. The first command below creates the directory and the next two each copy a file into it.

In [None]:
# make the directory (-p means no error if it already exists, so the cell can be re-run)
!mkdir -p bagdir_a

# copy two files into the directory
!cp files/testdoc.txt bagdir_a
!cp files/tux.png bagdir_a
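If you would rather stay in Python than shell out with !, the same setup can be done with the standard library. The sketch below is our own illustration, not part of bagit, and the function name `copy_into_new_dir` is hypothetical. It creates the directory if needed (like `mkdir -p`), copies each file into it (like `cp`) and returns the directory listing (like `ls`).

```python
from pathlib import Path
import shutil


def copy_into_new_dir(dest, sources):
    """Create dest if needed and copy each source file into it.

    Hypothetical helper for illustration. Returns the sorted names of
    everything now in dest, similar to running `ls dest`.
    """
    dest_path = Path(dest)
    # parents=True, exist_ok=True: like `mkdir -p`, re-running does not fail
    dest_path.mkdir(parents=True, exist_ok=True)
    for src in sources:
        # copy2 copies the contents and also tries to keep file metadata
        # such as the modification time
        shutil.copy2(src, dest_path / Path(src).name)
    return sorted(p.name for p in dest_path.iterdir())
```

For example, in a fresh directory `copy_into_new_dir("bagdir_a", ["files/testdoc.txt", "files/tux.png"])` should return `['testdoc.txt', 'tux.png']`. We chose `shutil.copy2` over a plain copy because keeping original timestamps is often worthwhile in a preservation workflow.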

We can list the files in the directory using the **ls** command:

In [None]:
!ls bagdir_a

There should be two files in the directory - a text file and a png image file. 

### Creating a Bag in place
We will now create a bag in place, which means bagit turns the existing directory itself into a bag rather than making a copy. This is done with the **make_bag** function. The function needs some extra information, called arguments. The first argument is the location of the directory we are using for the bag, in this case **bagdir_a**. We can also add metadata to the bag at the same time as we create it: here we add values for *Contact-Name* and *Source-Organization*. Run the following code block to create the bag.

In [None]:
# turn bagdir_a into a bag and record two metadata fields
bag = bagit.make_bag('bagdir_a', {'Contact-Name': 'Wiley Coyote', 'Source-Organization': 'ACME'})

The line above also created a reference to our bag and stored it in the variable **bag**. We can use this variable any time we want to refer to our bag. For example, we can look at the bag's info.

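Note that the data returned is a Python dictionary, which pairs each metadata field name with its value. Besides the fields we supplied, you will probably also see fields that bagit added itself, such as Bagging-Date and Payload-Oxum. If you are running Python 2, a u in front of a string marks it as a Unicode string; Python 3 does not show this prefix.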

In [None]:
bag.info

You can also change this info. Try the following code, then edit it and re-run it to add different metadata and see how the output changes. You can add any metadata you like, but think about how it will be used in your system. Note that the BagIt format reserves some field names (we've already been using some of these); see the [full list](https://datatracker.ietf.org/doc/html/rfc8493#section-2.2.2) in the BagIt specification, RFC 8493. For example, you could add metadata for **Contact-Phone**, **Contact-Email** or **External-Description**, or you could add your own fields, e.g. **My-Library-Bib-Number**.

In [None]:
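# add a custom metadata field (this only changes the copy in memory)
bag.info['Authors'] = "Road Runner"

# write the updated metadata to bag-info.txt on disk
bag.save()

# and print the info again
bag.info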
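Which of your fields are reserved and which are your own? The sketch below splits a metadata dictionary such as `bag.info` into the two groups. The reserved names are transcribed by hand from RFC 8493 section 2.2.2 as we read it, so double-check them against the spec; the names `RESERVED_BAG_INFO_NAMES` and `split_metadata` are our own.

```python
# Reserved bag-info.txt element names, transcribed from RFC 8493 section 2.2.2.
# Hand-copied for illustration: verify against the specification.
RESERVED_BAG_INFO_NAMES = {
    "Source-Organization", "Organization-Address", "Contact-Name",
    "Contact-Phone", "Contact-Email", "External-Description",
    "Bagging-Date", "External-Identifier", "Bag-Size", "Payload-Oxum",
    "Bag-Group-Identifier", "Bag-Count", "Internal-Sender-Identifier",
    "Internal-Sender-Description",
}


def split_metadata(info):
    """Split a bag-info mapping into (reserved, custom) dictionaries.

    Names are compared exactly as written; this is a simplification.
    """
    reserved = {k: v for k, v in info.items() if k in RESERVED_BAG_INFO_NAMES}
    custom = {k: v for k, v in info.items() if k not in RESERVED_BAG_INFO_NAMES}
    return reserved, custom
```

`split_metadata(bag.info)` should put Authors in the custom group. Any field bagit-python may add on its own that is not in the RFC list, such as Bag-Software-Agent, will also land in the custom group.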

In [None]:
!ls bagdir_a

OK. There have been quite a few changes. For a start, our two files are no longer listed! Instead there are some new text files and a directory called data. Have a look inside the data directory, again using the **ls** command:



In [None]:
!ls bagdir_a/data/
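`ls` shows one directory at a time. To see the whole bag at once, here is a small helper that walks the directory tree with pathlib; `list_bag_tree` is a hypothetical name of our own, not a bagit function.

```python
from pathlib import Path


def list_bag_tree(bag_dir):
    """Return every file under bag_dir as sorted '/'-separated relative paths."""
    root = Path(bag_dir)
    return sorted(
        p.relative_to(root).as_posix()  # e.g. 'data/tux.png'
        for p in root.rglob("*")        # every entry, at any depth
        if p.is_file()                  # skip directories such as data/ itself
    )
```

`list_bag_tree("bagdir_a")` should include bagit.txt, bag-info.txt and your two files under data/, plus one or more manifest-*.txt and tagmanifest-*.txt files whose exact names depend on the checksum algorithms your version of bagit uses.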

Let's look at the contents of some of the new files. The bag-info.txt file looks interesting; the **cat** command displays the contents of a file.

In [None]:
!cat bagdir_a/bag-info.txt
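bag-info.txt is plain text with one `Label: value` element per line, which is why bagit can hand it back as a dictionary. Here is a sketch of reading it yourself; the function name `parse_bag_info` is ours. An indented line continues the previous value (joining the pieces with a single space is our simplification), and because a label may appear more than once, each label maps to a list of values.

```python
def parse_bag_info(text):
    """Parse bag-info.txt content into {label: [value, ...]}."""
    info = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and last_label is not None:
            # indented line: continuation of the most recent value
            info[last_label][-1] += " " + line.strip()
            continue
        # split on the first colon only, so values may contain colons
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Label: value' line: {line!r}")
        last_label = label.strip()
        info.setdefault(last_label, []).append(value.strip())
    return info
```

To try it on the real file: `parse_bag_info(open("bagdir_a/bag-info.txt", encoding="utf-8").read())`.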

You can also open these files directly in your browser:

* Click here to see the [bag-info.txt](bagdir_a/bag-info.txt) file
* Click here to see the [bagit.txt](bagdir_a/bagit.txt) file

Now let's look at the payload manifest:

In [None]:
!cat bagdir_a/manifest-sha256.txt

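Each line of the manifest gives the SHA-256 checksum of one file in the bag's data directory, followed by that file's path. Depending on your version of bagit-python, you may also see a manifest for another algorithm, such as manifest-sha512.txt.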
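To see where these lines come from, here is a sketch that builds the same kind of manifest with Python's hashlib: one line per payload file, giving the hex SHA-256 digest, whitespace, then the path relative to the bag. The function names are our own. We separate the fields with two spaces, the layout `sha256sum` uses; as we read RFC 8493, any run of whitespace is allowed. The sketch also skips the spec's rules for escaping unusual characters in file names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(bag_dir):
    """Build 'checksum  data/relative/path' lines for every payload file."""
    root = Path(bag_dir)
    lines = []
    for p in sorted((root / "data").rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            lines.append(f"{sha256_of(p)}  {rel}")
    return lines
```

`set(manifest_lines("bagdir_a"))` should match the set of lines in manifest-sha256.txt, as long as no payload file has been edited. We compare as sets because bagit may write the lines in a different order.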

### Validate a bag
Remember that we are working in Python and have our bag stored in a variable called **bag**. Validating the bag is very easy; validate() returns True if everything checks out:

In [None]:
bag.validate()
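What does validation actually check? Among other things, a validator re-computes each file's checksum and compares it with the manifest. The sketch below shows just that core step using hashlib. It is our own simplified illustration (the function name `find_checksum_mismatches` is hypothetical), not how bagit-python is implemented, and unlike a full validator it does not look for files in data/ that the manifest fails to list.

```python
import hashlib
from pathlib import Path


def find_checksum_mismatches(bag_dir, algorithm="sha256"):
    """Re-hash each file listed in manifest-<algorithm>.txt and report problems.

    Returns a list of (path, problem) tuples; an empty list means every
    listed file is present and its checksum matches.
    """
    root = Path(bag_dir)
    manifest = root / f"manifest-{algorithm}.txt"
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # checksum first, then the path (which may itself contain spaces)
        expected, rel_path = line.split(maxsplit=1)
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
            continue
        digest = hashlib.new(algorithm)
        with open(target, "rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

On the untouched bag, `find_checksum_mismatches("bagdir_a")` should return an empty list.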

What happens if we modify one of the files, and so make our bag invalid? Run the following code, which adds another line to the end of the testdoc.txt file.

In [None]:
!echo "another line" >> bagdir_a/data/testdoc.txt

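Now let's see what happens when we run validate again. This time, instead of returning True, validate() should raise an error (a bagit.BagValidationError) describing what it found wrong.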

In [None]:
bag.validate()
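The error message may mention Payload-Oxum rather than a checksum. Payload-Oxum, which you saw in bag-info.txt, records the total size in bytes and the number of files in the payload, written as bytes.files. It is a cheap first check: appending a line changes the byte count, so the mismatch shows up without hashing anything. bagit-python also offers a fast validation mode (`fast=True`) that, as we understand it, checks only this value. The sketch below recomputes it with pathlib; the helper names are our own. An edit that keeps the file size the same, such as changing one character, would slip past this check, which is why full checksum validation still matters.

```python
from pathlib import Path


def compute_payload_oxum(bag_dir):
    """Return 'bytes.files' for all files under bag_dir/data."""
    files = [p for p in (Path(bag_dir) / "data").rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return f"{total_bytes}.{len(files)}"


def oxum_matches(bag_dir):
    """Compare the Payload-Oxum in bag-info.txt with the payload on disk.

    Returns (recorded, actual, matches). Raises KeyError if bag-info.txt
    has no Payload-Oxum line.
    """
    recorded = None
    info_text = (Path(bag_dir) / "bag-info.txt").read_text(encoding="utf-8")
    for line in info_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Payload-Oxum":
            recorded = value.strip()
            break
    if recorded is None:
        raise KeyError("Payload-Oxum not found in bag-info.txt")
    actual = compute_payload_oxum(bag_dir)
    return recorded, actual, recorded == actual
```

After the echo above, `oxum_matches("bagdir_a")` should report a recorded value that no longer equals the actual one.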

## Exiftool
We are now going to look at another tool, ExifTool, which reads (and can also write) the metadata embedded in files such as images.

First, we import a Python wrapper around ExifTool and print the version of ExifTool that we are using. As before, run the cell with Shift-Enter (or Shift-Return).
Click on the Help menu above if you are having trouble.

In [None]:
# import pyexifinfo, a Python wrapper around the ExifTool command-line program
import pyexifinfo as e

# print the version of ExifTool installed on this machine
!exiftool -ver


We can get the metadata for a single image with the command below. Try changing the file name to tux.svg, which is a different file in a different format (SVG rather than PNG).

In [None]:
# ask ExifTool (via pyexifinfo) for this file's metadata in JSON form
e.get_json('files/tux.png')
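pyexifinfo is a wrapper around the exiftool command-line program. If you want to see what happens underneath, or pyexifinfo is not installed, here is a sketch that calls exiftool directly through Python's subprocess module. It relies only on two ExifTool options, `-ver` (print the version) and `-j` (write metadata as JSON, one object per file), and it assumes `exiftool` is on your PATH. The helper names are our own.

```python
import json
import subprocess


def exiftool_version():
    """Return the installed ExifTool version string (runs `exiftool -ver`)."""
    result = subprocess.run(["exiftool", "-ver"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def parse_exiftool_json(text):
    """ExifTool's -j output is a JSON array with one object per file; return the first."""
    records = json.loads(text)
    if not records:
        raise ValueError("ExifTool returned no records")
    return records[0]


def exiftool_json(path):
    """Run `exiftool -j <path>` and return that file's metadata as a dict."""
    result = subprocess.run(["exiftool", "-j", str(path)], capture_output=True, text=True, check=True)
    return parse_exiftool_json(result.stdout)


def pick(metadata, keys):
    """Keep only the requested tags, skipping any the file does not have."""
    return {k: metadata[k] for k in keys if k in metadata}
```

For example, `pick(exiftool_json("files/tux.png"), ["FileType", "MIMEType", "ImageWidth", "ImageHeight"])` should give a small dictionary of basic technical metadata. Check the full output first, since the tag names ExifTool reports vary by file format.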