# Using the NOMAD API

In [1]:
from pprint import pprint

from nomad_utility_workflows.utils.users import get_user_by_id, who_am_i, search_users_by_name
from nomad_utility_workflows.utils.utils import get_authentication_token
from nomad_utility_workflows.utils.entries import get_entry_by_id, get_entries_of_upload, query_entries, get_entries_of_my_uploads
from nomad_utility_workflows.utils.datasets import retrieve_datasets, create_dataset, delete_dataset, get_dataset_by_id
from nomad_utility_workflows.utils.uploads import upload_files_to_nomad, get_upload_by_id, publish_upload, edit_upload_metadata, get_all_my_uploads, delete_upload

from decouple import config as environ

In [2]:
import logging
# Configure the logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)  # Set the logging level

# Create a console handler and set the level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# Create a formatter and set it for the handler
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)

# Add the handler to the logger
logger.addHandler(ch)

## NOMAD URLs

The NOMAD URL specifies the base address of the API for the NOMAD deployment of interest. Typically, this URL is structured as `https://<deployment_base_path>/api/v1`.

By default, `nomad_utility_workflows` uses the Test deployment of NOMAD to make API calls. This is simply a safety mechanism so that users do not accidentally publish something during testing. 

All API functions allow the user to specify the URL with the optional keyword argument `url`. If you want to use the central NOMAD URLs, you can simply set `url` equal to `prod`, `staging`, or `test`, which correspond to the following deployments (see full URLs below):

- prod: the official NOMAD deployment.
    - Updated least frequently (updates are announced in #software-updates on the NOMAD Discord Server).
- staging: the beta version of NOMAD.
    - Updated more frequently than prod, integrating new features.
- test: a test NOMAD deployment.
    - The data is occasionally wiped, so that publishing can be tested safely.

Note that the `prod` and `staging` deployments share a common database, and that publishing on either will result in publicly available data.

As an alternative to these short names, you can pass the full API address of any other NOMAD deployment to `url`.

In [3]:
from nomad_utility_workflows.utils.utils import NOMAD_TEST_URL, NOMAD_STAGING_URL, NOMAD_PROD_URL
print(NOMAD_PROD_URL, NOMAD_STAGING_URL, NOMAD_TEST_URL)

https://nomad-lab.eu/prod/v1/api/v1 https://nomad-lab.eu/prod/v1/staging/api/v1 https://nomad-lab.eu/prod/v1/test/api/v1
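
The short-name convention can be pictured as a small lookup. The following is a minimal sketch and not the package's actual implementation: `KNOWN_DEPLOYMENTS` and `resolve_nomad_url` are hypothetical names introduced for illustration, and the addresses are copied from the output printed above.

```python
# Hypothetical helper illustrating the short-name convention; this is NOT
# the implementation used inside nomad_utility_workflows.
KNOWN_DEPLOYMENTS = {
    "prod": "https://nomad-lab.eu/prod/v1/api/v1",
    "staging": "https://nomad-lab.eu/prod/v1/staging/api/v1",
    "test": "https://nomad-lab.eu/prod/v1/test/api/v1",
}


def resolve_nomad_url(url=None):
    """Map a short name to a full API address; pass full URLs through."""
    if url is None:
        # Assumed default, mirroring the safety mechanism described above.
        return KNOWN_DEPLOYMENTS["test"]
    if url in KNOWN_DEPLOYMENTS:
        return KNOWN_DEPLOYMENTS[url]
    if url.startswith(("http://", "https://")):
        # Treat anything that looks like a full address as a custom deployment.
        return url.rstrip("/")
    raise ValueError(f"Unrecognized NOMAD URL or short name: {url!r}")


print(resolve_nomad_url("staging"))
```

Rejecting unknown strings, rather than silently falling back to a default, makes a typo such as `'prd'` fail loudly instead of sending requests to an unintended deployment.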


## Authentication

Some API calls, e.g., making uploads or accessing your own non-published uploads, require an authentication token. To generate this token, `nomad_utility_workflows` expects that your NOMAD credentials are stored in a `.env` file in the plugin root directory in the format:

```bash
NOMAD_USERNAME="<your_nomad_username>"
NOMAD_PASSWORD="<your_nomad_password>"
```

You can access these explicitly with:

In [4]:
NOMAD_USERNAME = environ("NOMAD_USERNAME")
NOMAD_PASSWORD = environ("NOMAD_PASSWORD")
NOMAD_USERNAME

'JFRudzinski'

Use `get_authentication_token()` with your credentials to explicitly obtain and store a token (the `nomad_utility_workflows` functions automatically obtain a token for API calls that require authentication):



In [5]:
token = get_authentication_token(username=NOMAD_USERNAME, password=NOMAD_PASSWORD, url='test')
token

'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJmb1hmZnM5QlFQWHduLU54Yk5PYlExOFhnZnlKU1FNRkl6ZFVnWjhrZzdVIn0.eyJleHAiOjE3MjkwMjE5ODYsImlhdCI6MTcyODkzNTU4NiwianRpIjoiMzJlZDc0OWUtMzMyNS00YmZiLWE3YTEtNmI4YWMwZDA3YzMzIiwiaXNzIjoiaHR0cHM6Ly9ub21hZC1sYWIuZXUvZmFpcmRpL2tleWNsb2FrL2F1dGgvcmVhbG1zL2ZhaXJkaV9ub21hZF9wcm9kIiwic3ViIjoiOGYwNTJlMWYtMTkwNi00MWZkLWIyZWItNjkwYzAzNDA3Nzg4IiwidHlwIjoiQmVhcmVyIiwiYXpwIjoibm9tYWRfcHVibGljIiwic2Vzc2lvbl9zdGF0ZSI6ImYwMTVkYjYyLTY1NGItNDIyMS1hMDE0LWI4ZjdiMWZmYTY1YiIsInNjb3BlIjoib3BlbmlkIHByb2ZpbGUgZW1haWwiLCJzaWQiOiJmMDE1ZGI2Mi02NTRiLTQyMjEtYTAxNC1iOGY3YjFmZmE2NWIiLCJlbWFpbF92ZXJpZmllZCI6dHJ1ZSwibmFtZSI6Ikpvc2VwaCBSdWR6aW5za2kiLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJqZnJ1ZHppbnNraSIsImdpdmVuX25hbWUiOiJKb3NlcGgiLCJmYW1pbHlfbmFtZSI6IlJ1ZHppbnNraSIsImVtYWlsIjoicnVkemluc2tpQG1waXAtbWFpbnoubXBnLmRlIn0.JcDM3bYXZbA7fY7MKTiInZJ6zRvByNzrLvBjKM4n9eacmF4GBQcKD7l3dOxtrefLtlMMif36gZ0r2dEAI7kRCy5z4KiSzcyS5TVAk-FlItaHlyHGYt_D286vd1MSs88ZfiSHDkIACKtGbMgwIhhZyPUq8QtHSpCQwwJEkPW6-IF
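
The token shown above has the three-part `header.payload.signature` layout of a JSON Web Token (JWT). Assuming that format, the sketch below reads the standard `exp` claim from the payload to see when a stored token stops being valid. `token_expiry` and `is_expired` are hypothetical helpers, not part of `nomad_utility_workflows`, and the signature is deliberately not verified, so this is only suitable for inspecting your own token.

```python
import base64
import datetime as dt
import json


def token_expiry(token):
    """Return the expiry time (UTC) stored in a JWT's `exp` claim."""
    # A JWT consists of three dot-separated parts: header.payload.signature
    payload_b64 = token.split(".")[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return dt.datetime.fromtimestamp(payload["exp"], tz=dt.timezone.utc)


def is_expired(token, now=None):
    """Check whether the token's `exp` claim lies in the past."""
    now = now or dt.datetime.now(dt.timezone.utc)
    return token_expiry(token) <= now
```

With a token obtained as above, `is_expired(token)` can indicate whether to call `get_authentication_token()` again before a long-running batch of API calls.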

### NOMAD User Metadata

`nomad_utility_workflows` uses the `NomadUser()` class to store the following user metadata:

```python
class NomadUser:
    user_id: str
    name: str
    first_name: str 
    last_name: str 
    username: str 
    affiliation: str 
    affiliation_address: str 
    email: str
    is_oasis_admin: bool 
    is_admin: bool
    repo_user_id: str 
    created: dt.datetime
```


You can retrieve your own user info with `who_am_i()`:

In [13]:
nomad_user_me = who_am_i(url='test')
nomad_user_me

NomadUser(name='Joseph Rudzinski')

Similarly, you can query NOMAD for other users with `search_users_by_name()`:

In [11]:
nomad_users = search_users_by_name('Rudzinski', url='test')
nomad_users

[NomadUser(name='Joseph Rudzinski'), NomadUser(name='Joseph Rudzinski')]

When a search returns multiple matches, or when you need to identify particular users (e.g., coauthors) reliably later on, it is useful to store their `user_id`, a persistent identifier for each user account. You can then use `get_user_by_id()` to retrieve the user info.

In [12]:
nomad_user = get_user_by_id(nomad_users[0].user_id, url='test')
nomad_user

NomadUser(name='Joseph Rudzinski')
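
When a name search returns several accounts, as in the example above, a second field is needed to pick the right one before storing its `user_id`. The sketch below uses a stand-in class, `UserStub`, carrying three of the `NomadUser` fields listed earlier; `select_user` is a hypothetical helper, and it assumes the search results actually expose `email`.

```python
from dataclasses import dataclass


@dataclass
class UserStub:
    """Stand-in carrying three of the NomadUser fields listed above."""
    user_id: str
    name: str
    email: str


def select_user(users, email):
    """Return the single user whose email matches, or raise if ambiguous."""
    matches = [u for u in users if u.email.lower() == email.lower()]
    if len(matches) != 1:
        raise LookupError(
            f"Expected exactly one match for {email!r}, found {len(matches)}."
        )
    return matches[0]


# Two accounts sharing a name, distinguished only by email (made-up data).
users = [
    UserStub("id-1", "Jane Doe", "jane.doe@example.org"),
    UserStub("id-2", "Jane Doe", "j.doe@example.com"),
]

# Store the persistent identifier for later use with get_user_by_id().
coauthor_ids = {"Jane Doe": select_user(users, "j.doe@example.com").user_id}
print(coauthor_ids)
```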

### Uploading Data

`nomad_utility_workflows` uses the `NomadUpload()` class to store the following upload metadata:

```python
class NomadUpload:
    upload_id: str
    upload_create_time: dt.datetime
    main_author: NomadUser
    process_running: bool
    current_process: str
    process_status: str
    last_status_message: str
    errors: list[Any]
    warnings: list[Any]
    coauthors: list[str]
    coauthor_groups: list[Any]
    reviewers: list[NomadUser]
    reviewer_groups: list[Any]
    writers: list[NomadUser]
    writer_groups: list[Any]
    viewers: list[NomadUser]
    viewer_groups: list[Any]
    published: bool
    published_to: list[Any]
    with_embargo: bool
    embargo_length: float
    license: str
    entries: int
    n_entries: Optional[int] 
    upload_files_server_path: Optional[str] 
    publish_time: Optional[dt.datetime] 
    references: Optional[list[str]] 
    datasets: Optional[list[str]] 
    external_db: Optional[str] 
    upload_name: Optional[str]
    comment: Optional[str] 
    url: Optional[str]
    complete_time: Optional[dt.datetime]
```

You can make an upload using the `upload_files_to_nomad()` function with input `filename=<path_to_a_zip_file_with_your_upload_data>`, as follows:  

In [14]:
test_upload_fnm = './test.zip'

In [15]:
upload_id = upload_files_to_nomad(filename=test_upload_fnm, url='test')
upload_id

'z4QvhZ7qSCmgIFv_qJqlyQ'

### Checking the upload status

The returned `upload_id` can then be used to directly access the upload, e.g., to check the upload status, using `get_upload_by_id()`:

In [10]:
nomad_upload = get_upload_by_id(upload_id, url='test')

pprint(nomad_upload)

NomadUpload(upload_id='DN61X4r7SCyzm5q1kxcEcw',
            upload_create_time=datetime.datetime(2024, 10, 14, 10, 55, 12, 410000),
            main_author=NomadUser(name='Joseph Rudzinski'),
            process_running=False,
            current_process='process_upload',
            process_status='SUCCESS',
            last_status_message='Process process_upload completed successfully',
            errors=[],
            coauthors=[],
            coauthor_groups=[],
            reviewers=[],
            reviewer_groups=[],
            writers=[NomadUser(name='Joseph Rudzinski')],
            writer_groups=[],
            viewers=[NomadUser(name='Joseph Rudzinski')],
            viewer_groups=[],
            published=False,
            published_to=[],
            with_embargo=False,
            embargo_length=0.0,
            license='CC BY 4.0',
            entries=1,
            n_entries=None,
            upload_files_server_path='/nomad/test/fs/staging/D/DN61X4r7SCyzm5q1kxcEcw',

A common use of this function is to ensure that an upload has been processed successfully before acting on it further, e.g., editing the metadata or publishing. For this purpose, one can require `process_running == False` or `process_status == 'SUCCESS'`, e.g.:

```python
import time

max_wait_time = 20 * 60  # 20 minutes in seconds
interval = 2 * 60  # 2 minutes in seconds
elapsed_time = 0

while elapsed_time < max_wait_time:
    nomad_upload = get_upload_by_id(upload_id, url='test')

    # Check if the upload is complete (NomadUpload exposes attributes, not dict keys)
    if nomad_upload.process_status == 'SUCCESS':
        break
    
    # Wait for 2 minutes before the next call
    time.sleep(interval)
    elapsed_time += interval
else:
    raise TimeoutError("Maximum wait time of 20 minutes exceeded. Upload is not complete.")
```
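
The same waiting pattern recurs after every processing step (upload, metadata edit, publish), so it can be convenient to factor the loop above into a reusable function. The following is a minimal sketch: `wait_for` is a hypothetical helper, not part of `nomad_utility_workflows`, and the `sleep` parameter exists only so that the function can be exercised without real delays.

```python
import time


def wait_for(check, max_wait=20 * 60, interval=2 * 60, sleep=time.sleep):
    """Call `check()` until it returns a truthy value or `max_wait` elapses.

    Times are in seconds. Returns the truthy result of `check()`.
    """
    elapsed = 0
    while True:
        result = check()
        if result:
            return result
        if elapsed >= max_wait:
            raise TimeoutError(f"Condition not met after {max_wait} seconds.")
        # Pause before polling again, and track the time spent waiting.
        sleep(interval)
        elapsed += interval
```

For an upload, the condition could be written as `wait_for(lambda: get_upload_by_id(upload_id, url='test').process_status == 'SUCCESS')`.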

### Editing the upload metadata

After your upload is processed successfully, you can add coauthors, references, and other comments, as well as link to a dataset and provide a name for the upload. Note that a coauthor is specified by an email address, which should correspond to the email linked to the person's NOMAD account and can be accessed from `NomadUser.email`. The metadata should be stored as a dictionary as follows:

```python
metadata = {
    "metadata": {
        "upload_name": '<new_upload_name>',
        "references": ["https://doi.org/xx.xxxx/xxxxxx"],
        "datasets": '<dataset_id>',
        "embargo_length": 0,
        "coauthors": ["coauthor@affiliation.de"],
        "comment": 'This is a test upload...'
    },
}
```

For example:

In [16]:
metadata_new = {'upload_name': "Test Upload", 'comment': "This is a test upload..."}
edit_upload_metadata(upload_id, url='test', **metadata_new)

{'upload_id': 'z4QvhZ7qSCmgIFv_qJqlyQ',
 'data': {'process_running': True,
  'current_process': 'edit_upload_metadata',
  'process_status': 'PENDING',
  'last_status_message': 'Pending: edit_upload_metadata',
  'errors': [],
  'complete_time': '2024-10-14T20:20:39.868000',
  'upload_id': 'z4QvhZ7qSCmgIFv_qJqlyQ',
  'upload_create_time': '2024-10-14T20:20:38.757000',
  'main_author': '8f052e1f-1906-41fd-b2eb-690c03407788',
  'coauthors': [],
  'coauthor_groups': [],
  'reviewers': [],
  'reviewer_groups': [],
  'writers': ['8f052e1f-1906-41fd-b2eb-690c03407788'],
  'writer_groups': [],
  'viewers': ['8f052e1f-1906-41fd-b2eb-690c03407788'],
  'viewer_groups': [],
  'published': False,
  'published_to': [],
  'with_embargo': False,
  'embargo_length': 0,
  'license': 'CC BY 4.0',
  'entries': 1,
  'upload_files_server_path': '/nomad/test/fs/staging/z/z4QvhZ7qSCmgIFv_qJqlyQ'}}

Before moving on, let's again check that this additional process is complete:

In [17]:
nomad_upload = get_upload_by_id(upload_id, url='test')

pprint(nomad_upload)

NomadUpload(upload_id='z4QvhZ7qSCmgIFv_qJqlyQ',
            upload_create_time=datetime.datetime(2024, 10, 14, 20, 20, 38, 757000),
            main_author=NomadUser(name='Joseph Rudzinski'),
            process_running=False,
            current_process='edit_upload_metadata',
            process_status='SUCCESS',
            last_status_message='Process edit_upload_metadata completed '
                                'successfully',
            errors=[],
            coauthors=[],
            coauthor_groups=[],
            reviewers=[],
            reviewer_groups=[],
            writers=[NomadUser(name='Joseph Rudzinski')],
            writer_groups=[],
            viewers=[NomadUser(name='Joseph Rudzinski')],
            viewer_groups=[],
            published=False,
            published_to=[],
            with_embargo=False,
            embargo_length=0.0,
            license='CC BY 4.0',
            entries=1,
            n_entries=None,
            upload_files_server_path='/n

### Accessing individual entries of an upload

During the upload process, NOMAD automatically identifies representative files that indicate the presence of data that can be parsed by the parser plugins included in a given deployment. Each upload can therefore contain multiple *entries*, the fundamental unit of storage within the NOMAD database.

You can query the individual entries within a known upload with `get_entries_of_upload()`:

In [18]:
entries = get_entries_of_upload(upload_id, url='test', with_authentication=True)
entries


In [14]:
entry = get_entry_by_id(entries[0].entry_id, url='test', with_authentication=True)
entry

NomadEntry(entry_id='7A6lJb-14xR9lxXO8kjuYt5-vxg2', upload_id='DN61X4r7SCyzm5q1kxcEcw', references=[], origin='Joseph Rudzinski', n_quantities=0, nomad_version='1.3.7.dev55+ge83de27b3', upload_create_time=datetime.datetime(2024, 10, 14, 10, 55, 12, 410000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), nomad_commit='', processing_errors=[], entry_name='test.archive.json', last_processing_time=datetime.datetime(2024, 10, 14, 10, 55, 12, 808000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), parser_name='parsers/archive', calc_id='7A6lJb-14xR9lxXO8kjuYt5-vxg2', published=False, writers=[NomadUser(name='Joseph Rudzinski')], processed=True, mainfile='test.archive.json', main_author=NomadUser(name='Joseph Rudzinski'), entry_create_time=datetime.datetime(2024, 10, 14, 10, 55, 12, 563000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), with_embargo=False, entry_type=None, license='CC BY 4.0', domain=None, comment='This is a test upload...', upload_name='Test
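
Since each entry records its own `processing_errors`, it can be useful to scan the entries of an upload for parsing problems before publishing. The sketch below works on a stand-in class, `EntryStub`, that carries only three of the fields visible in the output above; `entries_with_errors` is a hypothetical helper and the data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class EntryStub:
    """Stand-in carrying three of the NomadEntry fields shown above."""
    entry_id: str
    mainfile: str
    processing_errors: list = field(default_factory=list)


def entries_with_errors(entries):
    """Map entry_id -> list of errors, for entries that reported any."""
    return {e.entry_id: e.processing_errors for e in entries if e.processing_errors}


# Made-up entries: one clean, one with a parsing error.
example_entries = [
    EntryStub("entry-a", "test.archive.json"),
    EntryStub("entry-b", "md.log", ["Error parsing interactions."]),
]

problems = entries_with_errors(example_entries)
for entry_id, errors in problems.items():
    print(entry_id, errors)
```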

In [15]:
# delete_upload(upload_id, url='test')

In [16]:
publish_upload(nomad_upload.upload_id, url='test')

{'upload_id': 'DN61X4r7SCyzm5q1kxcEcw',
 'data': {'process_running': True,
  'current_process': 'publish_upload',
  'process_status': 'PENDING',
  'last_status_message': 'Pending: publish_upload',
  'errors': [],
  'complete_time': '2024-10-14T10:55:16.139000',
  'upload_id': 'DN61X4r7SCyzm5q1kxcEcw',
  'upload_name': 'Test Upload',
  'upload_create_time': '2024-10-14T10:55:12.410000',
  'main_author': '8f052e1f-1906-41fd-b2eb-690c03407788',
  'coauthors': [],
  'coauthor_groups': [],
  'reviewers': [],
  'reviewer_groups': [],
  'writers': ['8f052e1f-1906-41fd-b2eb-690c03407788'],
  'writer_groups': [],
  'viewers': ['8f052e1f-1906-41fd-b2eb-690c03407788'],
  'viewer_groups': [],
  'published': False,
  'published_to': [],
  'with_embargo': False,
  'embargo_length': 0,
  'license': 'CC BY 4.0',
  'entries': 1,
  'upload_files_server_path': '/nomad/test/fs/staging/D/DN61X4r7SCyzm5q1kxcEcw'}}

In [17]:
my_datasets = retrieve_datasets(user_id=nomad_user_me.user_id, url='prod', max_datasets=20)
pprint(my_datasets)

TypeError: NomadDataset.__init__() got an unexpected keyword argument 'user_id'

In [4]:
# dataset_id = create_dataset("test dataset", url='test')
# dataset_id

dataset_id = create_dataset("test martignac", url='prod')
dataset_id

'ebwoNMcASJyL2ZDeO4R3SQ'

In [23]:
metadata_new = {'dataset_id': dataset_id}
edit_upload_metadata(upload_id, url='test', **metadata_new)

{'upload_id': 'aMG_RQZLRFmd861n8EdVNg',
 'data': {'process_running': True,
  'current_process': 'edit_upload_metadata',
  'process_status': 'PENDING',
  'last_status_message': 'Pending: edit_upload_metadata',
  'errors': [],
  'complete_time': '2024-07-01T11:08:34.973000',
  'upload_id': 'aMG_RQZLRFmd861n8EdVNg',
  'upload_name': 'Test Upload',
  'upload_create_time': '2024-07-01T11:08:22.913000',
  'main_author': '7c85bdf1-8b53-40a8-81a4-04f26ff56f29',
  'coauthors': [],
  'coauthor_groups': [],
  'reviewers': [],
  'reviewer_groups': [],
  'writers': ['7c85bdf1-8b53-40a8-81a4-04f26ff56f29'],
  'writer_groups': [],
  'viewers': ['7c85bdf1-8b53-40a8-81a4-04f26ff56f29'],
  'viewer_groups': [],
  'published': True,
  'published_to': [],
  'publish_time': '2024-07-01T11:08:34.959000',
  'with_embargo': False,
  'embargo_length': 0,
  'license': 'CC BY 4.0',
  'entries': 1}}

In [24]:
dataset = get_dataset_by_id(dataset_id, url='test')
dataset

NomadDataset(dataset_id='71V3LrAhQOaYcqILcetQTg', dataset_create_time=datetime.datetime(2024, 7, 1, 11, 9, 26, 760000), dataset_name='test dataset', dataset_type='DatasetType.owned', dataset_modified_time=datetime.datetime(2024, 7, 1, 11, 9, 26, 760000), user=NomadUser(name='Joseph Rudzinski'), doi=None, pid=None, m_annotations=None)

In [25]:
query_entries(dataset_id=dataset_id, url='test')

[NomadEntry(entry_id='Me3_ysgR6EaWCvPbYei7hFPBQU8b', upload_id='aMG_RQZLRFmd861n8EdVNg', references=[], origin='Joseph Rudzinski', n_quantities=0, nomad_version='1.3.0', upload_create_time=datetime.datetime(2024, 7, 1, 11, 8, 22, 913000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), nomad_commit='', processing_errors=[], entry_name='test.archive.json', last_processing_time=datetime.datetime(2024, 7, 1, 11, 8, 23, 424000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), parser_name='parsers/archive', calc_id='Me3_ysgR6EaWCvPbYei7hFPBQU8b', published=True, writers=[NomadUser(name='Joseph Rudzinski')], processed=True, mainfile='test.archive.json', main_author=NomadUser(name='Joseph Rudzinski'), entry_create_time=datetime.datetime(2024, 7, 1, 11, 8, 23, 217000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), with_embargo=False, entry_type=None, license='CC BY 4.0', domain=None, comment='This is a test upload...', upload_name='Test Upload', text_search_cont

In [26]:
# delete_dataset(dataset_id, url='test')

In [27]:
get_entries_of_my_uploads(url='test')

[NomadEntry(entry_id='u6jMV0SRNoWbZS3dy5GwUPTzI0pf', upload_id='E70NAf4KR7S2Eal5K1ojVQ', references=[], origin='Tristan Bereau', n_quantities=0, nomad_version='1.2.2.dev365+g0c980916a', upload_create_time=datetime.datetime(2024, 5, 1, 11, 57, 40, 911000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), nomad_commit='', processing_errors=['Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.', 'Error parsing interactions.', 'Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.', 'Error creating MDAnalysis universe.'], entry_name='GROMACS MolecularDynamics simulation', last_processing_time=datetime.datetime(2024, 5, 1, 11, 57, 42, 262000, tzinfo=datetime.timezone(datetime.timedelta(0), '+0000')), parser_name='parsers/gromacs', calc_id='u6jMV0S

In [28]:
get_all_my_uploads(url='test')



Entries can be organized hierarchically into *uploads*, *workflows*, and *datasets*. Because parsing relies on the automated identification of representative files, users are free to group simulations arbitrarily upon upload. In this case, multiple entries are created with the corresponding simulation data, and an additional unique identifier, `upload_id`, is assigned to this group of entries. Although the grouping of entries into an upload is not necessarily scientifically meaningful, it is practically useful for submitting batches of files from multiple simulations to NOMAD. Concretely, Martignac uses uploads to group all $\lambda$ coupling points of a thermodynamic-integration calculation. This is particularly convenient because NOMAD retains the original directory structure when storing the raw and processed data.
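
To make the relation between entries and uploads concrete, the sketch below groups a flat list of entries by their `upload_id`, as one might do with the result of `get_entries_of_my_uploads()`. `group_by_upload` is a hypothetical helper, and `SimpleNamespace` objects with made-up identifiers stand in for `NomadEntry` instances; only the `entry_id` and `upload_id` attributes are assumed.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_upload(entries):
    """Return {upload_id: [entry_id, ...]}, preserving the input order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.upload_id].append(entry.entry_id)
    return dict(grouped)


# Made-up entries: two coupling points in one upload, one entry in another.
example_entries = [
    SimpleNamespace(entry_id="lambda-0.0", upload_id="upload-1"),
    SimpleNamespace(entry_id="lambda-0.5", upload_id="upload-1"),
    SimpleNamespace(entry_id="single-run", upload_id="upload-2"),
]

by_upload = group_by_upload(example_entries)
for upload_id, entry_ids in by_upload.items():
    print(upload_id, entry_ids)
```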