# Preservation Notebook

This is a test notebook to examine functionality of Jupyter notebooks for teaching digital preservation.

When you have finished exploring this notebook you should have:

* imported bagit into Python
* created a new bag
* validated a bag
* created metadata for a bag

We will be looking at two tools - [bagit-python](https://github.com/LibraryOfCongress/bagit-python) and [exiftool](https://www.sno.phy.queensu.ca/~phil/exiftool/). 

This is an interactive notebook that lets you run python and command line instructions from within a web browser. If you have never used an interactive notebook before you might like to check out [this documentation](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#notebook-user-interface). 

You will note below that there are instructions and text as well as cells that have code in them. To run the code in a cell, first click on the cell and then hold the shift key while pressing enter (or return) on your keyboard. Alternatively, you can open the Cell menu above and choose Run Cells or even Run All (but that ruins the surprise).

When you run a cell, the output of the commands (if any) will be shown below the cell.

You can make changes to the code from within this notebook and run the cell again to see the new output - feel free to play, you can't break it!

We'll start off looking at bagit-python.



### Import Bagit
The first thing we need to do is import the bagit code into our notebook. This is a common way to add functionality to Python scripts, and there are modules for just about everything in Python. The code to import bagit is simply <code>import bagit</code> (pretty simple, isn't it?).

Go ahead and run the cell by selecting it and then shift-enter (or shift-return). You should see some output after the cell runs.

In [None]:
# Try to import bagit. If it is not installed, Python raises an ImportError.
try:
    import bagit
    print("We have bagit installed!")
except ImportError:
    print("Oops. Something went wrong.")
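
If a module is missing, the import above stops with an `ImportError`. As a minimal sketch, the standard library can also tell you whether a module is available without importing it. The helper name `is_installed` is made up for this example and is not part of bagit.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Check the two modules this notebook relies on.
for name in ("bagit", "pyexifinfo"):
    status = "installed" if is_installed(name) else "missing"
    print(name, "is", status)
```

If bagit is reported as missing, you can usually install it from a notebook cell with `!pip install bagit` and then run the import again.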


### Setting up our directory and files
Before we can create our first bag, there is a little bit of setting up we need to do. We are going to create a directory on our disk that we will use as the basis for the bag. We'll call the directory **bagdir** and we will first copy some files into it.

In Jupyter notebooks we can call the command line directly by putting a ! in front of the command. If you were working directly on the command line you would not need the !. The first command below creates a directory and the second copies a couple of files into it.

In [None]:
# make the directory (-p avoids an error if it already exists)
!mkdir -p bagdir

# copy some files into the directory
!cp files/* bagdir/

# now list the files in the bagdir directory
!ls -G bagdir


These files come from a corpus of material from [openpreserve](https://github.com/openpreserve/format-corpus).
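
The same setup can be done in plain Python instead of shell commands, which is handy on systems without `mkdir`, `cp` and `ls`. This is only a sketch: the function name `copy_files` and the demo file `example.txt` are made up for illustration, and the demo runs in a temporary directory so it does not touch the real **bagdir**.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files(source_dir, target_dir):
    """Copy every regular file in source_dir into target_dir, creating it if needed."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)       # like mkdir -p
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)  # like cp
    return sorted(p.name for p in target.iterdir())  # like ls


# Try it in a throwaway directory with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    demo_source = Path(tmp) / "files"
    demo_source.mkdir()
    (demo_source / "example.txt").write_text("hello\n")
    demo_listing = copy_files(demo_source, Path(tmp) / "bagdir")
    print(demo_listing)
```

In the notebook itself you would call `copy_files('files', 'bagdir')` to get the same result as the shell commands above.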


### Creating a Bag in place
We will now create a bag in place. This is done using the <code>make_bag</code> function. Note that the function requires you to enter some extra information called arguments. The first argument is the location of the directory that we are using for the bag, in this case **bagdir**. We can also add some metadata to the bag at the same time as we create it. In this case we are going to add some metadata for *Contact-Name* and *Source-Organization*. Run the following code block to create the bag.

In [None]:
bag = bagit.make_bag('bagdir', {'Contact-Name': 'Wiley Coyote','Source-Organization': 'ACME'})

The line above also created a reference to our bag and stored this in the variable **bag**. We can now use this variable any time we want to refer to our bag. For example we can have a look at the info in the bag. 

Note that the data returned is a Python dictionary of field names and values. It should be self-explanatory. If you are running Python 2, the u in front of a value marks it as a unicode string.

In [None]:
bag.info

You can also change this info. Try the following and see how the metadata changes, then edit the code below and re-run it to add different metadata. You can add any metadata you like, but you need to think about how this metadata will be used in your system. Note that the BagIt format has some reserved fields (we've been using some of these already). See the [full list](full list). For example, you could add metadata for **Contact-Phone**, **Contact-Email** or **External-Description**. Or you could have your own metadata, e.g. **My-Library-Bib-Number**.

In [None]:
bag.info['Authors'] = "Road Runner"

# and save the information to the bag
bag.save()

#and print the info again
bag.info

### Contents of the bag

Let's have a look at the contents of our directory with the command below. You should notice that there have been quite a few changes. 



In [None]:
!ls -F bagdir

Our files have moved into the data directory - this is how a bag is organised. You can check that they are in there by listing the contents of the data directory:

In [None]:
!ls bagdir/data/

And let's look at the content of the bag-info.txt file. It should look familiar.

In [None]:
!cat bagdir/bag-info.txt
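
The bag-info.txt file is plain text made of `Label: value` lines, which is why it maps so neatly onto the dictionary we saw in `bag.info`. The sketch below shows the idea with a made-up helper called `parse_bag_info`. It is a simplification and not how bagit itself reads the file: for example, it keeps only the last value when a label is repeated.

```python
def parse_bag_info(text):
    """Turn 'Label: value' lines into a dictionary (simplified)."""
    info = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and last_label is not None:
            # An indented line continues the previous value.
            info[last_label] += " " + line.strip()
            continue
        label, _, value = line.partition(":")
        last_label = label.strip()
        info[last_label] = value.strip()
    return info


# Sample text using the two fields we set when creating the bag.
sample = "Contact-Name: Wiley Coyote\nSource-Organization: ACME\n"
parsed = parse_bag_info(sample)
print(parsed)
```

To try it on the real file you could pass in `open('bagdir/bag-info.txt').read()`.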

Now let's have a look at one of the manifest files.

In [None]:
!cat bagdir/manifest-sha256.txt

These are the SHA-256 checksums for each file in our bag, which we will use later to check that our files have not changed.
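
To see where a checksum comes from, here is a minimal sketch that computes one with Python's standard `hashlib` module. The function name `sha256_of` and the file `example.txt` are made up for this example, and the demo uses a temporary file rather than the files in our bag.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "example.txt"
    demo_file.write_bytes(b"hello\n")
    demo_digest = sha256_of(demo_file)
    # A manifest line is the checksum, some whitespace, then the path inside the bag.
    print(demo_digest, "data/example.txt")
```

Changing even one byte of the file gives a completely different checksum, which is what makes the manifest useful for spotting changes.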

### Validate a bag
Remember we are working in Python and have our bag stored in a variable called **bag**. Validating is very easy:

In [None]:
bag.is_valid()

What happens if we modify one of the files (and hence make our bag invalid)? Run the following code, which adds another line to the end of the README.md file.

In [None]:
!echo "another line" >> bagdir/data/README.md

Now let's see what happens when we run validate again.

In [None]:
bag.is_valid()

#### Using checksums to see if files have changed

If we want to check whether the files in our data directory have changed, we can check their checksums. When the bag is created, a list of checksums is written to the manifest file, so we can use this file to see if they still match. This is shown below.

In [None]:
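# Each ! command runs in its own shell, so a separate "!cd bagdir" would not carry over.
# Change into the bagdir directory and check our list of checksums in a single command.
!cd bagdir && shasum -a 256 -c manifest-sha256.txt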

You should see a report above showing which files have checksums that match and are OK or if they FAILED and so are not exactly the same file. 
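
The same check can be sketched in pure Python, which helps if `shasum` is not available on your system. The helper `check_manifest` is made up for this example and is not part of bagit. It assumes each manifest line is a checksum followed by a path, and that every listed file exists. The demo builds a tiny stand-in bag in a temporary directory instead of using **bagdir**.

```python
import hashlib
import tempfile
from pathlib import Path


def check_manifest(bag_dir, manifest_name="manifest-sha256.txt"):
    """Return {path: True/False} by re-hashing each file listed in the manifest."""
    bag_path = Path(bag_dir)
    results = {}
    for line in (bag_path / manifest_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_path / relative_path).read_bytes()).hexdigest()
        results[relative_path] = (actual == expected)
    return results


# Build a tiny stand-in bag with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    demo_bag = Path(tmp)
    (demo_bag / "data").mkdir()
    payload = demo_bag / "data" / "example.txt"
    payload.write_bytes(b"hello\n")
    checksum = hashlib.sha256(b"hello\n").hexdigest()
    (demo_bag / "manifest-sha256.txt").write_text(checksum + "  data/example.txt\n")

    before = check_manifest(demo_bag)
    with open(payload, "a") as handle:  # modify the file, as we did above
        handle.write("another line\n")
    after = check_manifest(demo_bag)

print(before)
print(after)
```

A value of `True` corresponds to OK in the shasum report and `False` corresponds to FAILED. On the real bag you would call `check_manifest('bagdir')`.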

## Exiftool
We are now going to look at another tool (TODO move to a new notebook)

We can print out the version of exiftool that we are using. Again, press shift-enter (or shift-return) to run the cell.
Click on the Help menu item above if you are having trouble.

In [None]:
# this imports the tool we need - in this case a Python wrapper around ExifTool
import pyexifinfo as e

# print the version of exiftool that is installed
!exiftool -ver


We can get the metadata for a single file with this command. Try changing the image to tux.svg (a different file) and running the cell again.

In [None]:
e.get_json('files/image-24bit-300ppi.bmp')

## Conclusion
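You have reached the end of the notebook. Along the way you imported bagit, created a bag in place, added metadata to it, validated it, and looked at file metadata with exiftool. [next](index.ipynb)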