# With metadata to better data! (Meta)Data transfer from and to Coscine

The [Coscine research data platform](https://www.coscine.de) provides an API for transferring metadata-annotated data to Coscine in automated processes. In this workshop, we will show step by step how to move data to Coscine using a Jupyter notebook (Python) and a personal Coscine access token, and how to specify the metadata using the application profile provided by Coscine. Prior knowledge of Python is desirable.

We will go through the basics first to get everyone familiar with them. This part is mostly 'fill in the blank' (that is, live coding on the instructor's end). Then we will pull metadata out of an image and create a Coscine resource to upload it to. We'll come up with most of this code together. If time permits, we can build a small query system to look up what (meta)data we have stored in Coscine, and we will decide as a group how to go about it.

## Setup

First things first, you need a Coscine project. You can use your own if you have access, or we will add you to this one: https://coscine.rwth-aachen.de/p/fdmwerkstatt/ 



Create a file named `config.json` in the top-level directory with the following content. The values are filled in during the next steps.

```json
{
    "token": "",
    "resourceName": "",
    "projectName": ""
}
```

Next, head to your [user profile](https://coscine.rwth-aachen.de/user/) and get your Access Token. Copy this into your config file under `token`.


We have already created a resource to work with here titled `BasicTransfer` (but feel free to follow along with your own). 

Go to the resource settings and copy all relevant information into your config file according to the template above. This includes the resource name and the project name.

Now let's load all dependencies and configurations into our Jupyter notebook.

We'll need the Coscine Python SDK. If it is not installed yet, install it first, for example with `pip install coscine`.

We will need the following packages:

In [None]:
import coscine
import json
from datetime import datetime
from pathlib import Path

Load the configuration:
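A minimal sketch of loading the configuration, assuming the `config.json` template shown above. The helper name `load_config` and the check for empty values are our own additions for illustration, not part of the Coscine SDK.

```python
import json
from pathlib import Path

# Keys taken from the config.json template above.
REQUIRED_KEYS = ("token", "resourceName", "projectName")


def load_config(path="config.json"):
    """Read the JSON config file and check that every expected key is filled in.

    Hypothetical helper for this workshop, not part of the Coscine SDK.
    """
    with Path(path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    # Collect keys that are absent or still set to an empty string.
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config is missing values for: {', '.join(missing)}")
    return config
```

In the notebook you would then call `config = load_config("config.json")` and read the values with `config["token"]`, `config["projectName"]` and `config["resourceName"]`.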

## Interacting with Coscine Metadata Forms and (meta)Data upload

We use the Coscine package to connect to the Coscine REST API, which enables us to interact with our project and resource.

For more information and other examples: [Coscine Python SDK](https://git.rwth-aachen.de/coscine/community-features/coscine-python-sdk)

Let's create an instance of the Coscine client and designate the project and resource using the loaded configuration:

We can take a look at the project and resource details using a print statement:

In this first part, our goal is to upload files and fill out the associated metadata using a very basic, fictional example.

For this, there are a couple of dummy text files included within the `data` folder.

But first, let's see how we can interact with the Coscine metadata form.

Get the resource metadata form and take a look at it:

This form is a dictionary-like data structure, so you can interact with it like a Python dictionary. We can try to fill in some strings, for example for the field `Title`:

The metadata table above shows which data types are expected (`Type` column). So let's try to fill in a string for `Type`:

The error occurs because the field uses a controlled vocabulary (indicated by the `V` in the `C` column above). Let's see what is allowed by looking at the controlled vocabulary for that field:

It would be nice if we could just select one instead of typing it out. Let's start by saving the options to a list:

Let's take a look at our options:

Now we can select an index and assign metadata:
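The list-and-index steps above can be sketched without the SDK. This is only an illustration: the `vocabulary` dictionary below is a hypothetical stand-in for whatever the SDK returns for the field, and its entries are example values, not necessarily the ones offered by the workshop resource.

```python
# Hypothetical stand-in for a controlled vocabulary: label -> identifier.
# In the notebook, the allowed values come from the metadata form instead.
vocabulary = {
    "Dataset": "http://purl.org/dc/dcmitype/Dataset",
    "Image": "http://purl.org/dc/dcmitype/Image",
    "Text": "http://purl.org/dc/dcmitype/Text",
}

# Save the allowed labels to a list so they can be addressed by index.
options = list(vocabulary.keys())

# Take a look at the options together with their index.
for index, option in enumerate(options):
    print(index, option)

# Select one entry by index; this value would then be assigned to the form field.
selected = options[1]
print(selected)
```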

See if it worked:

Another option would be to get some user input on which value to use. Let's fill in `Subject Area` like that. Again, let's create a list of the allowed values:

First, we initialize an empty dictionary. Then we use the list indices as keys and the options as values:
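As a small sketch of this step: the `subject_areas` list is a hypothetical placeholder for the allowed values of `Subject Area`, while `controlled_vocab` is the dictionary name used in the next step.

```python
# Hypothetical example values; in the notebook this list is built from
# the controlled vocabulary of the `Subject Area` field.
subject_areas = ["Chemistry", "Materials Science", "Physics"]

# Start with an empty dictionary, then map each list index to its option.
controlled_vocab = {}
for index, option in enumerate(subject_areas):
    controlled_vocab[index] = option

print(controlled_vocab)
```

The same mapping can be written in one line as `dict(enumerate(subject_areas))`.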

Now, we can use the built-in `input` function to get user input based on the options saved in our `controlled_vocab` dictionary:
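One possible way to do this is sketched below. The function name `choose_option` and the `ask` parameter are our own assumptions for illustration; `ask` defaults to the built-in `input` and only exists so the function can be tried without typing.

```python
def choose_option(controlled_vocab, ask=input):
    """Show the numbered options and return the one the user picks.

    Hypothetical helper: `controlled_vocab` maps an integer index to an option.
    """
    for index, option in controlled_vocab.items():
        print(f"{index}: {option}")
    while True:
        answer = ask("Enter the number of your choice: ")
        try:
            index = int(answer)
        except ValueError:
            index = None  # not a number, ask again
        if index in controlled_vocab:
            return controlled_vocab[index]
        print("Please enter one of the listed numbers.")
```

In the notebook you would call `selection = choose_option(controlled_vocab)` and type the number into the prompt that appears below the cell.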

Assign the selection to the `Subject Area` field:

Take a look at the form:

Let's deal with the date. According to the metadata form, it needs to be a datetime object. Let's start with a date written as a string:

We can try to assign this and the Python SDK will try to validate our entry:

We can double check what it's looking for by looking at the metadata form (print or scroll up).

So, we need to convert to `datetime` type. If you scroll up to the beginning, you will see that we loaded the datetime module. Convert the string to a datetime object as follows and assign it to the field: 
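A minimal sketch of the conversion, assuming the date string is written in ISO form (`YYYY-MM-DD`). The example date is made up; adjust the format string if your date is written differently.

```python
from datetime import datetime

# Hypothetical example date, written as a string in ISO form.
date_string = "2023-05-17"

# strptime parses the string according to the given format and
# returns a datetime object (the time defaults to midnight).
date_value = datetime.strptime(date_string, "%Y-%m-%d")

print(type(date_value).__name__, date_value)
```

The resulting `date_value` is what you would assign to the date field of the metadata form.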

We're still missing `Creator`. Depending on the properties set for each field in the application profile, we can enter multiple values. For example, the `Creator` field takes a value of type `[str]`, indicating that a list of strings may be entered. Let's add multiple creators:

Now we upload the metadata and the data. We'll use a dummy text file here called `myData.txt`.

## S3

If you were working with a nested directory structure or larger data, you'd want to use the S3 credentials to interact with the resource via the S3 protocol.

We can get these using the API:

This lets us make directories:

And upload files to a directory:

We cannot add metadata via S3, so we use the API to update the metadata:

Let's add metadata to the folder as well:

Check if it worked:

## Let's get fancier and extract metadata from a file

Let's try getting some metadata out of an image file. For this we use the [Pillow (PIL)](https://pillow.readthedocs.io/en/stable/) module.

The `data` folder includes a dataset titled `3dsem`, which contains some JPEG and TIFF images.

##### data source 

Authors: Tafti, Ahmad P and Kirkpatrick, Andrew B and Holz, Jessica D and Owen, Heather A and Yu, Zeyun

DOI: [10.7910/DVN/HVBW0Q](https://doi.org/10.7910/DVN/HVBW0Q)

License: CC-0

In [None]:
from PIL import Image, ExifTags

In [None]:
im = Image.open('data/3dsem/tapetal001.jpg')

In [None]:
im.size

In [None]:
print(im.format, im.size, im.mode)

In [None]:
getattr(im, "n_frames", 1)

In [None]:
im.show()

In [None]:
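# _getexif() is a private Pillow helper; it returns None when the image has no EXIF data
image_exif = im._getexif() or {}
# Map numeric tag IDs to readable names and skip raw binary values
exif = {ExifTags.TAGS[k]: v for k, v in image_exif.items() if k in ExifTags.TAGS and not isinstance(v, bytes)}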


In [None]:
exif

We can extend the base profile in Coscine to fit some of this metadata, which we've already done. Yay!

Let's create a new resource within our project. We can do this via the [Coscine web interface](https://coscine.rwth-aachen.de/), or we can try using the API and Python SDK (but in tests we got unspecified errors...)
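Before EXIF values go into a metadata form, they usually need a little cleaning. The sketch below is one way to do that, under assumptions: `exif_to_metadata` is a hypothetical helper, the sample dictionary is made up, and the output keys are placeholders that would have to match the field names of the actual application profile. It relies on EXIF storing `DateTime` as `YYYY:MM:DD HH:MM:SS`.

```python
from datetime import datetime


def exif_to_metadata(exif):
    """Turn a readable EXIF dictionary into values suitable for a metadata form.

    Hypothetical helper; the output keys are placeholders, not Coscine field names.
    """
    metadata = {}
    # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS".
    if "DateTime" in exif:
        metadata["Date"] = datetime.strptime(exif["DateTime"], "%Y:%m:%d %H:%M:%S")
    # Copy a few text tags if present, stripping padding whitespace.
    for tag in ("Make", "Model", "Software"):
        if tag in exif:
            metadata[tag] = str(exif[tag]).strip()
    return metadata


# Made-up sample in the shape of the `exif` dictionary built above.
sample_exif = {"DateTime": "2015:03:12 10:45:30", "Software": "ExampleSoft 1.0 ", "Orientation": 1}
print(exif_to_metadata(sample_exif))
```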

In [None]:
# upload all the img files and metadata 
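As a starting point for this cell, here is a sketch of collecting the image files to upload. It only uses the standard library and makes no Coscine calls; the helper name `find_images` and the suffix list are assumptions based on the JPEG and TIFF files mentioned above.

```python
from pathlib import Path

# File extensions treated as images (assumption: JPEG and TIFF only).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff"}


def find_images(folder):
    """Return all image files below `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling `find_images("data/3dsem")` gives a list of paths you can loop over, extracting the metadata of each file and uploading it together with the file.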


## Querying (meta)Data

Now we have some data in Coscine. Let's try to build a query system to search for files or get statistics on what we have.
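One possible starting point, to be reshaped by the group: once the metadata of each file has been fetched into plain dictionaries, searching and counting need nothing beyond the standard library. Everything in this sketch is an assumption for illustration: the `records` list is made-up sample data, and `search` and `count_by` are hypothetical helpers, not SDK functions.

```python
from collections import Counter

# Made-up sample records; in practice each dictionary would hold
# the metadata of one file retrieved from Coscine.
records = [
    {"file": "tapetal001.jpg", "Type": "Image", "Creator": "A"},
    {"file": "tapetal002.jpg", "Type": "Image", "Creator": "B"},
    {"file": "myData.txt", "Type": "Text", "Creator": "A"},
]


def search(records, **criteria):
    """Return the records whose fields equal all given criteria."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


def count_by(records, field):
    """Count how often each value of `field` occurs."""
    return Counter(record.get(field) for record in records)


print(search(records, Type="Image", Creator="A"))
print(count_by(records, "Type"))
```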