# MORE REALISTIC PIPELINE

In this notebook we set up a more realistic pipeline that takes metadata from elabFTW and inserts them into an HDF5 file. This time we start without knowing which experiments, or how many, are on elabFTW. We only know that we have already processed the experiments with ID 48 and 49.
We will:
- get all the registered experiments by using the experiments API;
- look for experiments whose ID is not 48 or 49;
- get the metadata from elabFTW by using its API;
- read the output file;
- look for fields of type 'items' and use the API again to decode them;
- create an empty HDF5 file;
- map the metadata we are interested in onto the HDF5 fields.

In practice, what changes with respect to the simple case is that we have to look up the experiment IDs ourselves and that the metadata structure will, in principle, be more complicated.



### Set up

In [48]:
pip install nexusformat


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.1.2[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [1]:
import datetime
# the python library for elabftw
import elabapi_python
import json
import csv
import os
import ast
import pprint
import h5py
from nexusformat.nexus import *

path = ""  # insert your path, including the trailing "/" (it is prepended to file names)


###  API configuration

In [2]:

# replace with the URL of your instance
API_HOST_URL = 'https://nffa-di-electronic-lab.areasciencepark.it/api/v2/'
# replace with your api key
API_KEY = '' #put your key

### ElabFTW get

Now we use the elabFTW API to get all the experiments and save the response in a JSON file, 'exps.json'. Instead of the Python library elabapi_python, we use the 'os' package to run the curl command as if we were working in a shell.

In [3]:
# GET <API_HOST_URL>experiments with curl and redirect the JSON response to exps.json
os.system('curl -H "Authorization: ' + API_KEY + '" -H "accept: application/json" '
          '-X GET "' + API_HOST_URL + 'experiments" > "' + path + 'exps.json"')

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6194    0  6194    0     0  29709      0 --:--:-- --:--:-- --:--:-- 29778


0

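The return value is 0: this is the exit status of the shell command and means that curl completed without errors. Note that curl can exit with 0 even when the server answers with an HTTP error, so the content of the file is still worth checking.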
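
As an alternative to calling curl through the shell, the same request can be sketched with the standard library module urllib. This is only a sketch under assumptions: the function names `build_experiments_request` and `fetch_experiments` are hypothetical helpers introduced here, not part of elabapi_python, and the endpoint is assumed to be the host URL followed by 'experiments', as in the curl call above.

```python
import json
import urllib.request


def build_experiments_request(api_host_url, api_key):
    """Build (but do not send) a GET request for the experiments endpoint."""
    # Assumption: the endpoint is <api_host_url>/experiments, as in the curl call.
    url = api_host_url.rstrip("/") + "/experiments"
    headers = {"Authorization": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_experiments(api_host_url, api_key, out_file):
    """Send the request, save the JSON response to out_file and return it parsed."""
    request = build_experiments_request(api_host_url, api_key)
    # urlopen raises urllib.error.HTTPError when the server answers with 4xx/5xx
    with urllib.request.urlopen(request) as response:
        experiments = json.load(response)
    with open(out_file, "w") as jsonfile:
        json.dump(experiments, jsonfile)
    return experiments
```

With the variables defined in the configuration cell, the call would be `fetch_experiments(API_HOST_URL, API_KEY, path + 'exps.json')`. Unlike the shell version, an HTTP error raises an exception instead of silently writing an error page to the file.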
Now let's look at the file.

In [4]:
with open(path + 'exps.json', "r") as jsonfile:
    dic_exp = json.load(jsonfile)
dic_exp

[{'id': 49,
  'title': 'NXem_simplified_id1',
  'date': '2024-10-10',
  'body': '',
  'status': None,
  'rating': 0,
  'userid': 3,
  'elabid': '20241010-4a6bef7cb22f520f1e396250213b67ca7845fb1a',
  'locked': 0,
  'lockedby': None,
  'locked_at': None,
  'timestamped': 0,
  'timestampedby': None,
  'timestamped_at': None,
  'canread': '{"base": 30, "teams": [], "users": [], "teamgroups": []}',
  'canwrite': '{"base": 20, "teams": [], "users": [], "teamgroups": []}',
  'content_type': 1,
  'created_at': '2024-10-10 09:50:59',
  'modified_at': '2024-10-10 09:51:43',
  'lastchangeby': 3,
  'metadata': '{"elabftw": {"extra_fields_groups": [{"id": 1, "name": "Sample"}, {"id": 2, "name": "Coordinate_System_Set"}, {"id": 3, "name": "Fields"}]}, "extra_fields": {"Sample": {"type": "items", "value": 24, "group_id": 1, "required": true}, "Definition": {"type": "text", "value": "NXem", "group_id": 3, "required": true}, "Start_Time": {"type": "date", "value": "2024-10-10", "group_id": 3, "required

It is a list whose elements are dictionaries. For every experiment we have **all the information** present in the experiment page (all the metadata!). We now look for the IDs we want to process.

In [10]:
id_exp_done = [48, 49]
exp_to_do = []
for k in dic_exp:
    if k['id'] not in id_exp_done:
        exp_to_do.append(k)

Let's see how many experiments we have to process

In [11]:
print(len(exp_to_do))

1


In [17]:
# the metadata is stored as a JSON string, so parse it with json.loads
dic_exp_meta = json.loads(exp_to_do[0]['metadata'])
dic_exp_meta

{'elabftw': {'extra_fields_groups': [{'id': 1, 'name': 'Sample'},
   {'id': 2, 'name': 'Coordinate_System_Set'},
   {'id': 3, 'name': 'Fields'}]},
 'extra_fields': {'Sample': {'type': 'items',
   'value': 24,
   'group_id': 1,
   'required': True},
  'Definition': {'type': 'text',
   'value': 'NXem',
   'group_id': 3,
   'required': True},
  'Start_Time': {'type': 'date',
   'value': '2024-10-07',
   'group_id': 3,
   'required': True},
  'Experiment_alias': {'type': 'text',
   'value': 'EM_S1',
   'group_id': 3,
   'required': True},
  'Coordinate_System': {'type': 'items',
   'value': 25,
   'group_id': 2,
   'required': True},
  'Attribute_Definition_Version': {'type': 'text',
   'value': 'v1',
   'group_id': 3,
   'required': True}}}
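
One of the steps listed at the beginning is to look for fields of type 'items', whose value is only the numeric ID of an elabFTW item and still has to be decoded through the API. A minimal sketch of that search follows. The helper name `find_item_fields` and the dictionary `example_meta` are hypothetical, and the example data is a trimmed-down copy of the output above.

```python
def find_item_fields(metadata):
    """Return {field name: item ID} for every extra field of type 'items'."""
    extra_fields = metadata.get("extra_fields", {})
    return {
        name: field["value"]
        for name, field in extra_fields.items()
        if field.get("type") == "items"
    }


# Trimmed-down copy of the metadata shown above, used only for illustration.
example_meta = {
    "extra_fields": {
        "Sample": {"type": "items", "value": 24, "group_id": 1},
        "Definition": {"type": "text", "value": "NXem", "group_id": 3},
        "Coordinate_System": {"type": "items", "value": 25, "group_id": 2},
    }
}

item_fields = find_item_fields(example_meta)
print(item_fields)  # {'Sample': 24, 'Coordinate_System': 25}
```

Each ID returned this way (here 24 and 25) is what would be passed to the API to retrieve the linked item.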

# Exercise

Everything is very similar to what we saw in the 'Simple pipeline' notebook, except for two main differences:
 1. all three extra_fields_groups are used (Sample, Coordinate_System_Set and Fields);
 2. the name of the group Coordinate_System_Set in extra_fields_groups differs from the corresponding key in the extra_fields dictionary, Coordinate_System. In my convention this means that, when translating into NeXus, Coordinate_System_Set has a different NXclass from Coordinate_System. Indeed, if you look at the NeXus structure, Coordinate_System is a group inside the group Coordinate_System_Set.
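
To see which fields belong to which group, the 'group_id' of each extra field can be matched against the 'id' entries of extra_fields_groups. The following is a minimal sketch of that lookup. The helper name `fields_by_group` and the dictionary `sample_meta` are hypothetical, and the data is a reduced copy of the metadata printed earlier.

```python
def fields_by_group(metadata):
    """Return {group name: [field names]} by matching group_id to group id."""
    group_names = {
        group["id"]: group["name"]
        for group in metadata["elabftw"]["extra_fields_groups"]
    }
    grouped = {name: [] for name in group_names.values()}
    for field_name, field in metadata["extra_fields"].items():
        grouped[group_names[field["group_id"]]].append(field_name)
    return grouped


# Reduced copy of the experiment metadata, used only for illustration.
sample_meta = {
    "elabftw": {
        "extra_fields_groups": [
            {"id": 1, "name": "Sample"},
            {"id": 2, "name": "Coordinate_System_Set"},
            {"id": 3, "name": "Fields"},
        ]
    },
    "extra_fields": {
        "Sample": {"type": "items", "value": 24, "group_id": 1},
        "Definition": {"type": "text", "value": "NXem", "group_id": 3},
        "Coordinate_System": {"type": "items", "value": 25, "group_id": 2},
    },
}

groups = fields_by_group(sample_meta)
print(groups)
```

In this example the group Coordinate_System_Set contains the single field Coordinate_System, which is the case where group name and field name differ.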

