# With metadata to better data! (Meta)Data transfer from and to Coscine

The [Coscine research data platform](https://www.coscine.de) provides an API for transferring metadata-annotated data to Coscine in automated processes. In this workshop, we show step by step how to move data to Coscine using a Jupyter notebook (Python) and your personal Coscine access token, and how to specify the metadata using the application profile that Coscine provides. Prior knowledge of Python is desirable.

First things first, you need a Coscine project. You should have received an invite to the Coscine Project: Automating (Meta)Data.


Create a configuration file:

`config.json`

Copy and paste the following template into it. The `TOKEN` value stays empty until you have your token:

```JSON
{
    "RESOURCE": "Test",
    "PROJECT": "Automating (Meta)Data", 
    "TOKEN": ""
}
```

Next, head to your [user profile](https://coscine.rwth-aachen.de/user/) and get your Access Token. 

Copy and paste it into your config file as the value of `TOKEN`.

Install the Python and Jupyter extensions from the extensions tab of your editor. This may make it easier to code along with me.

Now let's load all dependencies and configurations into our Jupyter notebook.

If you have any errors loading the packages, run the code below with the associated package name:

In [None]:
pip install PACKAGE_NAME

In [None]:
! pip install coscine

Load the configuration:
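A minimal sketch of this step, assuming `config.json` sits next to the notebook and uses the three keys from the template above. The helper name `load_config` is made up for this example.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("RESOURCE", "PROJECT", "TOKEN")


def load_config(path="config.json"):
    """Read the JSON config file and check that every expected entry is filled in."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"Missing or empty config entries: {', '.join(missing)}")
    return config


# Only load when the file exists, so the cell does not fail before config.json is created.
if Path("config.json").exists():
    config = load_config()
    print("Project:", config["PROJECT"], "| Resource:", config["RESOURCE"])
```

Checking for empty entries early gives a clearer message than a failed login later on, for example when the token has not been pasted in yet.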

We use the Coscine package to connect to the Coscine REST API, which enables us to interact with our project and resource.

For more information and other examples: [Coscine Python SDK](https://git.rwth-aachen.de/coscine/community-features/coscine-python-sdk)
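As a rough sketch, connecting could look like the function below. It assumes a recent release of the `coscine` package that exposes `coscine.ApiClient`, `client.project(name)` and `project.resource(name)`; other releases may use different names, so check the SDK documentation for your installed version. The wrapper `connect` is a hypothetical helper.

```python
def connect(config):
    """Return (client, project, resource) for the names stored in the config.

    Assumption: the installed coscine SDK provides ApiClient,
    client.project(name) and project.resource(name).
    """
    import coscine  # third-party package, installed with `pip install coscine`

    client = coscine.ApiClient(config["TOKEN"])
    project = client.project(config["PROJECT"])
    resource = project.resource(config["RESOURCE"])
    return client, project, resource
```

With the configuration loaded into a dictionary called `config`, the call would be `client, project, resource = connect(config)`.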

Get the metadata form and take a look at it:

This form is a dictionary-like data structure, so you can interact with it like a Python dictionary:
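To show what dictionary-style access means, the sketch below uses a plain `dict` as a stand-in for the metadata form. The field names are illustrative, and it is an assumption that the real form supports every dictionary method used here.

```python
# Stand-in for the metadata form; the field names are illustrative only.
form = {"Title": "", "Creator": "", "Creation Date": ""}

# Iterating over the form yields the field names
for field in form:
    print(field)

# Values are assigned and read by field name
form["Title"] = "Pollen image test"
print(form["Title"])

# Collect the fields that still need a value
empty_fields = [field for field, value in form.items() if not value]
print(empty_fields)
```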

Now we will fill in the metadata form and upload a file with that metadata to Coscine.

The error occurs because the field uses a controlled vocabulary. Let's see which values are allowed:

Now we will look at an example where we ask the user which value they want.

`enumerate` pairs each item in the list with its index and yields the pairs as tuples. This lets us refer to an item by its number without counting the items manually.
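The selection step can be sketched as follows. The vocabulary values and the helper `choose` are hypothetical; in the notebook the list would come from the metadata form and the reply from `input()`.

```python
def choose(options, reply):
    """Map a user's numeric reply (given as text) onto one of the options."""
    index = int(reply)
    if not 0 <= index < len(options):
        raise ValueError(f"Please enter a number between 0 and {len(options) - 1}")
    return options[index]


# Hypothetical vocabulary values, for illustration only
options = ["Image", "Text", "Dataset"]

# Show each option together with its index
for index, value in enumerate(options):
    print(index, value)

# In the notebook the reply would come from input("Which number? ")
print(choose(options, "1"))  # prints Text
```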

Let's deal with the date. The date string needs to be converted into a Python `date` object:

In [None]:
from datetime import datetime  # provides strptime for parsing the date string
metadata["Creation Date"] = datetime.strptime(date, "%Y-%m-%d").date()
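
The conversion can also be tried on its own, independent of Coscine. This sketch wraps it in a hypothetical helper, `parse_date`, that reports badly formatted input with a readable message.

```python
from datetime import datetime


def parse_date(text):
    """Convert a YYYY-MM-DD string into a datetime.date object."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"Expected a date like 2024-01-31, got {text!r}") from None


print(parse_date("2023-05-17"))  # prints 2023-05-17
```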

In [None]:
print(metadata)

Add whatever else is missing:

Now we upload the metadata and the data. We supplied a dummy text file called `myData.txt`. Rename it to `your_name_test` so that your upload has its own name.

If you know that someone else has the same name, pick another file name that is unique.

The key is to avoid the error raised when a file with the same name has already been uploaded.
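One way to derive such a name in code is sketched below. The helper `unique_name` is hypothetical, and keeping the original file extension is a choice made for this example.

```python
from pathlib import Path


def unique_name(local_file, owner):
    """Build a file name such as 'your_name_test.txt' from a local file and a person's name."""
    suffix = Path(local_file).suffix  # keeps the extension, e.g. ".txt"
    safe_owner = owner.strip().lower().replace(" ", "_")
    return f"{safe_owner}_test{suffix}"


print(unique_name("myData.txt", "Ada Lovelace"))  # prints ada_lovelace_test.txt
```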

<!-- If you were working with a nested directory structure or larger data, you'd want to use the S3 credentials to interact with the resource via s3 protocol. 

We can get these using the API: -->

Both web resources and S3 resources allow you to create directories.

S3 is more stable if you have large files or many files. For our workshop we will use a web resource.

Due to the storage infrastructure, the first file in a folder will often have the same metadata as the folder itself.

This lets us make directories:

In [None]:
# we can supply a path to our coscine_file_name and it will create a folder
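
A small sketch of building such a path. It assumes remote paths use forward slashes; the folder and file names are placeholders, and `remote_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def remote_path(folder, file_name):
    """Join a folder and a file name with forward slashes, independent of the local OS."""
    return str(PurePosixPath(folder) / file_name)


coscine_file_name = remote_path("images", "your_name_test.txt")
print(coscine_file_name)  # prints images/your_name_test.txt
```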


And upload files to a directory:

Let's add metadata to the folder as well:

## Let's get fancier and extract metadata from a file

Let's try getting some metadata out of an image file. For this we use the Pillow (PIL) module.

The data folder includes a dataset titled `3dsem`, which contains some JPEG and TIFF images.

Data source:

Authors: Tafti, Ahmad P and Kirkpatrick, Andrew B and Holz, Jessica D and Owen, Heather A and Yu, Zeyun

DOI: 10.7910/DVN/HVBW0Q

License: CC-0

We can extend the base profile in Coscine to fit some of this metadata, which we've already done. Yay!

Let's create a new resource within our project. We can do this via the Coscine web interface.

In [None]:
from PIL import Image, ExifTags

In [None]:
image = Image.open("data/Pollen1001.jpg")

In [None]:
image.size

In [None]:
print(image.format, image.size, image.mode)

In [None]:
getattr(image, "n_frames", 1)

In [None]:
image.show()

In [None]:
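# _getexif() returns None when the image carries no EXIF data, so fall back to an empty dict
image_exif = image._getexif() or {}
# Keep only tags with a known name and skip raw byte values
exif = {ExifTags.TAGS[k]: v for k, v in image_exif.items() if k in ExifTags.TAGS and not isinstance(v, bytes)}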

In [None]:
exif

Now here's a challenge. See if you can extract metadata from the file, update the metadata form, and upload the file to Coscine.
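As a possible starting point, the sketch below translates an EXIF dictionary like the one built earlier into form fields. The mapping in `FIELD_MAP` and the target field names are assumptions; adapt them to the application profile of your resource.

```python
from datetime import datetime

# Hypothetical mapping from EXIF tag names to metadata form field names
FIELD_MAP = {"Artist": "Creator", "DateTime": "Creation Date", "Software": "Software"}


def exif_to_fields(exif, field_map=FIELD_MAP):
    """Translate an EXIF dictionary into {form field: value} pairs."""
    fields = {}
    for tag, field in field_map.items():
        if tag not in exif:
            continue  # skip tags the image does not carry
        value = exif[tag]
        if tag == "DateTime":
            # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS"
            value = datetime.strptime(value, "%Y:%m:%d %H:%M:%S").date()
        fields[field] = value
    return fields


# Made-up EXIF values, for illustration only
example = {"Artist": "A. Researcher", "DateTime": "2015:03:02 10:15:30"}
print(exif_to_fields(example))
```

The result can then be copied into the form one field at a time, for example with `for field, value in exif_to_fields(exif).items(): metadata[field] = value`.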

- Coscine/Python Script Examples: https://coscine.pages.rwth-aachen.de/community-features/coscine-technical-adaption/
- Coscine API Documentation: https://coscine.rwth-aachen.de/coscine/apps/apidocs/#tag/admin
- Python SDK Documentation: https://coscine.pages.rwth-aachen.de/community-features/coscine-python-sdk/index.html
