<a href="https://colab.research.google.com/github/Exabyte-io/api-examples/blob/dev/examples/job/run-simulations-and-extract-properties.ipynb" target="_parent">
<img alt="Open in Google Colab" src="https://user-images.githubusercontent.com/20477508/128780728-491fea90-9b23-495f-a091-11681150db37.jpeg" width="150" border="0">
</a>

# Run Simulations and Extract Properties

This example demonstrates how to use the Mat3ra RESTful API to programmatically create simulation [Jobs](https://docs.mat3ra.com/jobs/overview/) for multiple [Materials](https://docs.mat3ra.com/materials/overview/) at once and to extract the resulting [Properties](https://docs.mat3ra.com/properties/overview/) into a [Pandas](https://pandas.pydata.org/) DataFrame.

This approach can work with any [Workflow](https://docs.mat3ra.com/workflows/overview/). For demonstration purposes, we use Density Functional Theory (DFT) and extract the electronic band gap as the property of interest.
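
For reference, the two band-gap values extracted later in this notebook (direct and indirect) can be illustrated with band energies sampled on k-points. The sketch below is a minimal, self-contained illustration that assumes valence and conduction band energies (in eV) are already separated per k-point; it is not how the platform computes these properties, and the numbers are made up.

```python
from typing import Sequence, Tuple


def band_gaps(
    valence: Sequence[Sequence[float]], conduction: Sequence[Sequence[float]]
) -> Tuple[float, float]:
    """Return (direct_gap, indirect_gap) in the units of the input energies.

    valence[k] holds the occupied band energies at k-point k,
    conduction[k] holds the unoccupied band energies at the same k-point.
    """
    if len(valence) != len(conduction) or not valence:
        raise ValueError("valence and conduction must cover the same, non-empty set of k-points")
    # Valence band maximum and conduction band minimum at each k-point
    vbm_per_k = [max(bands) for bands in valence]
    cbm_per_k = [min(bands) for bands in conduction]
    # Direct gap: smallest vertical transition at a single k-point
    direct = min(c - v for v, c in zip(vbm_per_k, cbm_per_k))
    # Indirect (fundamental) gap: global CBM minus global VBM, possibly at different k-points
    indirect = min(cbm_per_k) - max(vbm_per_k)
    return direct, indirect


# Toy two-k-point example (energies in eV, made up for illustration)
direct, indirect = band_gaps(valence=[[-1.0, 0.0], [-0.5, -0.2]], conduction=[[3.0, 4.0], [1.1, 2.0]])
print(round(direct, 3), round(indirect, 3))  # → 1.3 1.1
```

The indirect gap can never exceed the direct gap, since it compares the global extrema rather than values at the same k-point.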

> <span style="color: orange">**IMPORTANT NOTE**</span>: In order to run this example in full, an active Mat3ra.com account is required. The bank workflow used below (`espresso-band-gap`) is based on Quantum ESPRESSO. Readers may substitute it with another one (for example, an equivalent VASP workflow, which requires access to VASP, the Vienna Ab initio Simulation Package) and adjust the extraction of the results accordingly ("Extract results" section). RESTful API credentials should be set in [settings](../../utils/settings.json).


## Steps

We follow the steps below:

- Import materials from [Materials Bank](https://docs.mat3ra.com/materials/bank/)

- Group imported materials inside a [materials set](https://docs.mat3ra.com/entities-general/sets/)

- Create jobs for the materials and group them inside a [jobs set](https://docs.mat3ra.com/entities-general/sets/)

- Submit the jobs and monitor their progress

- Extract the [final structure](https://docs.mat3ra.com/properties/structural/final-structure) (relaxed structure) and its properties

- Output the results as a Pandas DataFrame

## Pre-requisites

The explanation below assumes that the reader is familiar with the concepts used in the Mat3ra platform and its RESTful API. We list the relevant resources below and direct the reader to the original sources of information:

- [Generating RESTful API authentication parameters](../system/get_authentication_params.ipynb)
- [Importing materials from Materials Bank](../material/get_materials_by_formula.ipynb)
- [Creating and submitting jobs](../job/create_and_submit_job.ipynb)

## Complete Authorization Form and Initialize Settings

This cell also determines the environment (Jupyter, Google Colab, or JupyterLite) and sets all required environment variables.

If you are running this notebook in Google Colab, the following cell takes about 1 minute to execute.

`ACCOUNT_ID` and `AUTH_TOKEN`: authentication parameters needed when making requests to [Mat3ra.com's API Endpoints](https://docs.mat3ra.com/rest-api/endpoints/).

`ORGANIZATION_ID`: authentication parameter needed when working with collaborative accounts (see https://docs.mat3ra.com/collaboration/organizations/overview/).

> <span style="color: orange">**NOTE**</span>: If you are running this notebook from Jupyter, the variables ACCOUNT_ID, AUTH_TOKEN, MATERIALS_PROJECT_API_KEY, and ORGANIZATION_ID should be set in the file [settings.json](../../utils/settings.json) if you need to use them. For instructions on obtaining the API token parameters, see the documentation: https://docs.mat3ra.com/accounts/ui/preferences/api/
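
The sketch below shows one way such a settings file could be read, with non-empty environment variables taking precedence (as the Colab branch of the next cell sets them). The key names are the ones listed in the note above; the flat-JSON file layout and the `load_settings` helper are assumptions for illustration, not the actual `utils.settings` implementation.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Keys named in the note above; the flat-JSON layout is an assumption.
SETTINGS_KEYS = ("ACCOUNT_ID", "AUTH_TOKEN", "MATERIALS_PROJECT_API_KEY", "ORGANIZATION_ID")


def load_settings(path: Path, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: read settings.json, letting environment variables override it."""
    environ = os.environ if environ is None else environ
    file_values = json.loads(path.read_text()) if path.exists() else {}
    settings = {}
    for key in SETTINGS_KEYS:
        # A non-empty environment variable wins over the value stored in the file
        settings[key] = environ.get(key) or file_values.get(key, "")
    return settings
```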

In [None]:
# @title Authorization Form
ACCOUNT_ID = "ACCOUNT_ID"  # @param {type:"string"}
AUTH_TOKEN = "AUTH_TOKEN"  # @param {type:"string"}
ORGANIZATION_ID = "ORGANIZATION_ID"  # @param {type:"string"}

import os
import sys
import json

if "COLAB_JUPYTER_IP" in os.environ:
    os.environ.update(
        dict(
            ACCOUNT_ID=ACCOUNT_ID,
            AUTH_TOKEN=AUTH_TOKEN,
            ORGANIZATION_ID=ORGANIZATION_ID,
        )
    )

    !GIT_BRANCH="dev"; export GIT_BRANCH; curl -s "https://raw.githubusercontent.com/Exabyte-io/api-examples/${GIT_BRANCH}/scripts/env.sh" | bash

if sys.platform == "emscripten":
    apiConfig = data_from_host.get("apiConfig")
    os.environ.update(data_from_host.get("environ", {}))
    os.environ.update(
        dict(
            ACCOUNT_ID=apiConfig.get("accountId"),
            AUTH_TOKEN=apiConfig.get("authToken"),
            ORGANIZATION_ID=apiConfig.get("organizationId", ""),
            CLUSTERS=json.dumps(apiConfig.get("clusters", [])),
        )
    )

    import micropip

    await micropip.install("mat3ra-api-examples", deps=False)
    from utils.jupyterlite import install_packages

    await install_packages("api_examples")

### Import packages

In [None]:
import time
from IPython.display import IFrame

# Import settings file and utils file
from utils.settings import ENDPOINT_ARGS, ACCOUNT_ID
from utils.generic import wait_for_jobs_to_finish, get_property_by_subworkflow_and_unit_indicies, dataframe_to_html

import pandas as pd

# Relevant functions from the API client
from exabyte_api_client.endpoints.jobs import JobEndpoints
from exabyte_api_client.utils.materials import flatten_material
from exabyte_api_client.endpoints.projects import ProjectEndpoints
from exabyte_api_client.endpoints.materials import MaterialEndpoints
from exabyte_api_client.endpoints.bank_workflows import BankWorkflowEndpoints
from exabyte_api_client.endpoints.bank_materials import BankMaterialEndpoints
from exabyte_api_client.endpoints.properties import PropertiesEndpoints

#### Materials

- **MATERIAL_SI**: the name of the silicon material to import from Standata
- **MATERIAL_GE**: the name of the germanium material to import from Standata
- **TAGS**: a list of [tags](https://docs.mat3ra.com/entities-general/data/#tags) to assign to imported materials
- **MATERIALS_SET_NAME**: the name of the materials set


In [None]:
# Names of the Si and Ge materials to look up in Standata
MATERIAL_SI = "Si"
MATERIAL_GE = "Ge"
MATERIALS_SET_NAME = "materials-set"
TAGS = ["tag1", "tag2"]

#### Jobs

Parameters for the jobs to be run for the imported materials:

- **JOB_NAME_PREFIX**: prefix used for the job name, following the "{JOB_NAME_PREFIX} {FORMULA}" convention (e.g. "Job Name Prefix Si")
- **JOBS_SET_NAME**: the name of the jobs set

In [None]:
JOB_NAME_PREFIX = "Job Name Prefix"
JOBS_SET_NAME = "jobs-set"
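
The naming convention above amounts to joining the prefix and the material formula with a space. The `job_name` helper below is a hypothetical illustration of that convention, not part of the API client.

```python
def job_name(prefix: str, formula: str) -> str:
    """Build a job name following the "{JOB_NAME_PREFIX} {FORMULA}" convention."""
    return f"{prefix} {formula}"


print(job_name("Job Name Prefix", "Si"))  # → Job Name Prefix Si
```

Since names are derived from the formula alone, two materials with the same formula would produce identical job names; a more specific prefix keeps them distinguishable.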

#### Workflow

This example is based on [this](https://platform.mat3ra.com/bank/workflows/xxx) bank workflow, which is later copied to the account's workflows collection. The workflow uses the Quantum ESPRESSO simulation engine at version

In [None]:
BANK_WORKFLOW_SYSTEM_NAME = "espresso-band-gap"

#### Compute

Set up the compute parameters. See [this](https://docs.mat3ra.com/infrastructure/compute/parameters/) for more information about compute parameters.

- **CLUSTER_NAME**: the fully qualified domain name (FQDN) or alias of the cluster to submit the jobs to. Defaults to the first available cluster, or "cluster-001".
- **PPN**: number of MPI processes per node. Defaults to 1.
- **NODES**: number of nodes. Defaults to 1.
- **QUEUE_NAME**: the name of the queue to submit the jobs to. Defaults to "D".
- **TIME_LIMIT**: job walltime. Defaults to "01:00:00" (one hour).
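
`TIME_LIMIT` is given as an `HH:MM:SS` string. A small validator such as the hypothetical `walltime_to_seconds` below (not part of the API client) can catch malformed values before jobs are created, and converts the limit to seconds.

```python
import re

# HH:MM:SS, with hours unbounded and minutes/seconds in 00..59
_WALLTIME = re.compile(r"(\d+):([0-5]\d):([0-5]\d)")


def walltime_to_seconds(time_limit: str) -> int:
    """Convert an "HH:MM:SS" walltime string to seconds, rejecting malformed input."""
    match = _WALLTIME.fullmatch(time_limit)
    if match is None:
        raise ValueError(f"expected HH:MM:SS, got {time_limit!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(walltime_to_seconds("01:00:00"))  # → 3600
```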

> <span style="color: orange">**NOTE**</span>: Although the queue is set to debug (`D`) here, the job may run out of memory and end up in an errored status. If this happens, we suggest switching from `QUEUE_NAME = "D"` to `QUEUE_NAME = "OR"` to avoid memory limitations.


In [None]:
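# Use the first available cluster and queue from the CLUSTERS environment variable;
# an unset or empty CLUSTERS falls back to an empty list and the defaults below
cluster_config = next(iter(json.loads(os.getenv("CLUSTERS") or "[]")), {})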
queue_configs = cluster_config.get("queues", [])

CLUSTER_NAME = cluster_config.get("displayName", "cluster-001")
PPN = 1
NODES = 1
QUEUE_NAME = next(iter(queue_configs), {}).get("NAME", "D")
TIME_LIMIT = "01:00:00"
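
The cell above picks the first cluster and its first queue from the `CLUSTERS` environment variable, falling back to defaults. The function below reproduces that selection logic in isolation so it can be checked against sample input. The JSON shape (a list of clusters, each with `displayName` and `queues`) is inferred from the keys the cell reads, and the sample values are made up.

```python
import json
from typing import Optional, Tuple


def pick_cluster_and_queue(clusters_json: Optional[str]) -> Tuple[str, str]:
    """Return (cluster_name, queue_name) from the first entries, using the notebook's defaults."""
    clusters = json.loads(clusters_json) if clusters_json else []
    cluster_config = next(iter(clusters), {})
    queue_configs = cluster_config.get("queues", [])
    cluster_name = cluster_config.get("displayName", "cluster-001")
    queue_name = next(iter(queue_configs), {}).get("NAME", "D")
    return cluster_name, queue_name


sample = json.dumps([{"displayName": "cluster-007", "queues": [{"NAME": "OR"}]}])
print(pick_cluster_and_queue(sample))  # → ('cluster-007', 'OR')
print(pick_cluster_and_queue(None))  # → ('cluster-001', 'D')
```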

### Initialize endpoints

In [None]:
job_endpoints = JobEndpoints(*ENDPOINT_ARGS)
project_endpoints = ProjectEndpoints(*ENDPOINT_ARGS)
material_endpoints = MaterialEndpoints(*ENDPOINT_ARGS)
property_endpoints = PropertiesEndpoints(*ENDPOINT_ARGS)
bank_workflow_endpoints = BankWorkflowEndpoints(*ENDPOINT_ARGS)
bank_material_endpoints = BankMaterialEndpoints(*ENDPOINT_ARGS)

Next, we retrieve the owner and project IDs, as they are needed by the endpoints. The owner ID is the organization ID when working with a collaborative account and the account ID otherwise; the project ID is taken from the owner's default project. The owner ID can also be extracted from any of the account's [entities](https://docs.mat3ra.com/entities-general/overview/).

In [None]:
OWNER_ID = os.getenv("ORGANIZATION_ID") or ACCOUNT_ID
project_id = project_endpoints.list({"isDefault": True, "owner._id": OWNER_ID})[0]["_id"]
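
The owner ID selection above prefers the organization ID and falls back to the account ID. Isolated as a hypothetical `resolve_owner_id` function for illustration, it behaves as follows; note that an empty string also falls back to the account ID, because `or` treats it as false.

```python
from typing import Mapping


def resolve_owner_id(environ: Mapping[str, str], account_id: str) -> str:
    """Prefer a non-empty ORGANIZATION_ID, otherwise fall back to the account ID."""
    return environ.get("ORGANIZATION_ID") or account_id


print(resolve_owner_id({"ORGANIZATION_ID": "org-123"}, "acct-456"))  # → org-123
print(resolve_owner_id({"ORGANIZATION_ID": ""}, "acct-456"))  # → acct-456
```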

### Create workflow

Copy bank workflow (template) to the account's workflows collection.

In [None]:
bank_workflow_id = bank_workflow_endpoints.list({"systemName": BANK_WORKFLOW_SYSTEM_NAME})[0]["_id"]
workflow_id = bank_workflow_endpoints.copy(bank_workflow_id, OWNER_ID)["_id"]
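
The workflow lookup above takes the first element of a `list(...)` result, which raises a bare `IndexError` when nothing matches (for example, a misspelled `BANK_WORKFLOW_SYSTEM_NAME`). A hypothetical helper such as `first_or_raise` below makes that failure explicit; the sample record is made up.

```python
from typing import Any, Dict, List


def first_or_raise(records: List[Dict[str, Any]], description: str) -> Dict[str, Any]:
    """Return the first record of a query result, or raise a descriptive error if it is empty."""
    if not records:
        raise LookupError(f"no {description} found")
    return records[0]


workflows = [{"_id": "wf-1", "systemName": "espresso-band-gap"}]
print(first_or_raise(workflows, "bank workflow")["_id"])  # → wf-1
```

In the notebook, it could wrap the existing call, e.g. `first_or_raise(bank_workflow_endpoints.list({"systemName": BANK_WORKFLOW_SYSTEM_NAME}), "bank workflow")["_id"]`.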

In [None]:
# Visualize the bank workflow below
# NOTE: might not be rendered on GitHub
IFrame("https://platform.mat3ra.com/analytics/workflows/{}".format(bank_workflow_id), width=900, height=650)

### Import materials

Get the materials from Standata and upload them to the account's materials collection.

In [None]:
from mat3ra.standata.materials import Materials

material_json_si = Materials.get_by_name_first_match(MATERIAL_SI)
material_json_ge = Materials.get_by_name_first_match(MATERIAL_GE)

# Add tags to the materials
material_json_si["tags"] = TAGS
material_json_ge["tags"] = TAGS

# Create materials in the account's materials collection
material_id_si = material_endpoints.create(material_json_si, OWNER_ID)["_id"]
material_id_ge = material_endpoints.create(material_json_ge, OWNER_ID)["_id"]

material_si = material_endpoints.list({"_id": material_id_si})[0]
material_ge = material_endpoints.list({"_id": material_id_ge})[0]

materials = [material_ge, material_si]

Create a materials set and move the materials into it.

In [None]:
materials_set = material_endpoints.create_set({"name": MATERIALS_SET_NAME, "owner": {"_id": OWNER_ID}})
for material in materials:
    material_endpoints.move_to_set(material["_id"], "", materials_set["_id"])

### Create jobs

Create jobs for the materials above.

In [None]:
compute = job_endpoints.get_compute(CLUSTER_NAME, PPN, NODES, QUEUE_NAME, TIME_LIMIT)
jobs = job_endpoints.create_by_ids(materials, workflow_id, project_id, JOB_NAME_PREFIX, OWNER_ID, compute)

Create a jobs set and move the jobs into it.

In [None]:
jobs_set = job_endpoints.create_set({"name": JOBS_SET_NAME, "projectId": project_id, "owner": {"_id": OWNER_ID}})
for job in jobs:
    job_endpoints.move_to_set(job["_id"], "", jobs_set["_id"])

Submit the jobs for execution.

In [None]:
for job in jobs:
    job_endpoints.submit(job["_id"])

Monitor the jobs and print the status until they are all finished.

In [None]:
job_ids = [job["_id"] for job in jobs]
wait_for_jobs_to_finish(job_endpoints, job_ids)
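
`wait_for_jobs_to_finish` is provided by this repository's `utils.generic` module. As a rough sketch of the general polling pattern (not that helper's actual implementation), the function below repeatedly queries each job's status until all jobs reach a terminal state or a timeout expires. The status names in `TERMINAL_STATUSES` and the `get_status` callable are assumptions for illustration; in the notebook, the status would come from the platform.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed terminal status names; check the platform documentation for the real set.
TERMINAL_STATUSES = frozenset({"finished", "error", "terminated"})


def wait_for_statuses(
    get_status: Callable[[str], str],
    job_ids: Iterable[str],
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
) -> Dict[str, str]:
    """Poll until every job is in a terminal status; return the final status of each job."""
    pending = set(job_ids)
    final: Dict[str, str] = {}
    deadline = time.monotonic() + timeout
    while pending:
        for job_id in sorted(pending):
            status = get_status(job_id)
            if status in TERMINAL_STATUSES:
                final[job_id] = status
        # Stop polling jobs that have already finished
        pending.difference_update(final)
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"jobs still running: {sorted(pending)}")
        time.sleep(poll_interval)
    return final


statuses = iter(["active", "finished"])
print(wait_for_statuses(lambda job_id: next(statuses), ["job-1"], poll_interval=0.0))  # → {'job-1': 'finished'}
```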

### Extract results

For each material, the corresponding simulation job is located, and the final structure, pressure, and band gaps are extracted.

- The final structure and pressure are extracted from the first unit (index 0) of the job's first subworkflow (index 0).

- The band gaps are extracted from the second unit (index 1) of the job's first subworkflow (index 0), matching the indices used in the code below. These indices depend on the layout of the chosen workflow and may need adjusting for a different one.
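
The extraction code locates a unit by position, via `job["workflow"]["subworkflows"][i]["units"][j]["flowchartId"]`. The sketch below wraps that lookup in a hypothetical `unit_flowchart_id` helper with bounds checks, using a made-up job document trimmed to the keys the lookup touches.

```python
from typing import Any, Dict


def unit_flowchart_id(job: Dict[str, Any], subworkflow_index: int, unit_index: int) -> str:
    """Return the flowchartId of a unit, addressed by subworkflow and unit indices."""
    subworkflows = job["workflow"]["subworkflows"]
    if not 0 <= subworkflow_index < len(subworkflows):
        raise IndexError(f"job has {len(subworkflows)} subworkflow(s); index {subworkflow_index} requested")
    units = subworkflows[subworkflow_index]["units"]
    if not 0 <= unit_index < len(units):
        raise IndexError(f"subworkflow has {len(units)} unit(s); index {unit_index} requested")
    return units[unit_index]["flowchartId"]


# Made-up job document trimmed to the keys used above
sample_job = {"workflow": {"subworkflows": [{"units": [{"flowchartId": "unit-0"}, {"flowchartId": "unit-1"}]}]}}
print(unit_flowchart_id(sample_job, 0, 1))  # → unit-1
```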

In [None]:
results = []
for material in materials:
    # Find the job created for this material
    job = next(job for job in jobs if job["_material"]["_id"] == material["_id"])

    # Final structure and pressure: first unit (index 0) of the first subworkflow (index 0)
    final_structure = get_property_by_subworkflow_and_unit_indicies(
        property_endpoints, "final_structure", job, 0, 0
    )["data"]
    pressure = get_property_by_subworkflow_and_unit_indicies(property_endpoints, "pressure", job, 0, 0)["data"]["value"]

    # Band gaps: second unit (index 1) of the first subworkflow (index 0)
    unit_flowchart_id = job["workflow"]["subworkflows"][0]["units"][1]["flowchartId"]
    band_gap_direct = property_endpoints.get_direct_band_gap(job["_id"], unit_flowchart_id)
    band_gap_indirect = property_endpoints.get_indirect_band_gap(job["_id"], unit_flowchart_id)

    results.append(
        {
            "initial_structure": material,
            "final_structure": final_structure,
            "pressure": pressure,
            "band_gap_direct": band_gap_direct,
            "band_gap_indirect": band_gap_indirect,
        }
    )
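
Looking up each material's job with `next(...)` scans the whole job list per material and raises `StopIteration` if a job is missing. An alternative, sketched below under the assumption that each job document carries `_material._id` (as used above), builds a dictionary once. The sample documents are made up, and the duplicate check reflects this notebook's one-job-per-material setup.

```python
from typing import Any, Dict, List


def jobs_by_material_id(jobs: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each material ID to its job; raise if two jobs share the same material."""
    index: Dict[str, Dict[str, Any]] = {}
    for job in jobs:
        material_id = job["_material"]["_id"]
        if material_id in index:
            raise ValueError(f"more than one job for material {material_id}")
        index[material_id] = job
    return index


sample_jobs = [{"_id": "job-1", "_material": {"_id": "mat-si"}}, {"_id": "job-2", "_material": {"_id": "mat-ge"}}]
print(jobs_by_material_id(sample_jobs)["mat-ge"]["_id"])  # → job-2
```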

### Flatten results

The for-loop below iterates over the results and flattens each one into a row of the final Pandas DataFrame.

In [None]:
table = []
for result in results:
    # Initial structure columns, then final structure columns, then scalar properties
    data = flatten_material(result["initial_structure"])
    data.extend(flatten_material(result["final_structure"]))
    data.extend([result["pressure"], result["band_gap_direct"], result["band_gap_indirect"]])
    table.append(data)
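
Each row concatenates the initial-structure fields, the final-structure fields, and three scalar results, which is the layout the headers in the "Output results" section expect. The real `flatten_material` comes from `exabyte_api_client`; the `flatten_material_sketch` stand-in below is hypothetical and assumes the 10 fields named in those header keys (ID, name, tags, number of sites, six lattice parameters) and a material document shape that is also assumed.

```python
from typing import Any, Dict, List


def flatten_material_sketch(material: Dict[str, Any]) -> List[Any]:
    """Hypothetical stand-in for flatten_material: ID, name, tags, site count, six lattice parameters."""
    lattice = material["lattice"]
    return [
        material["_id"],
        material["name"],
        ", ".join(material.get("tags", [])),
        len(material["basis"]["coordinates"]),
        lattice["a"],
        lattice["b"],
        lattice["c"],
        lattice["alpha"],
        lattice["beta"],
        lattice["gamma"],
    ]


def result_row(result: Dict[str, Any]) -> List[Any]:
    """Concatenate initial-structure fields, final-structure fields, and the three scalar results."""
    row = flatten_material_sketch(result["initial_structure"])
    row.extend(flatten_material_sketch(result["final_structure"]))
    row.extend([result["pressure"], result["band_gap_direct"], result["band_gap_indirect"]])
    return row


# Made-up placeholder document; real material documents contain many more keys
placeholder = {
    "_id": "material-1",
    "name": "Si",
    "tags": ["tag1", "tag2"],
    "basis": {"coordinates": [{"value": [0.0, 0.0, 0.0]}, {"value": [0.25, 0.25, 0.25]}]},
    "lattice": {"a": 1.0, "b": 1.0, "c": 1.0, "alpha": 90.0, "beta": 90.0, "gamma": 90.0},
}
row = result_row(
    {
        "initial_structure": placeholder,
        "final_structure": placeholder,
        "pressure": 0.0,
        "band_gap_direct": 1.0,
        "band_gap_indirect": 0.5,
    }
)
print(len(row))  # → 23
```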

### Output results

Form the Pandas DataFrame headers according to the table generated above, using the following abbreviations:

- **"INI"**: INITIAL
- **"FIN"**: FINAL
- **"NS"**: Number of Sites
- **"LAT"**: LATTICE

In [None]:
headers = []
keys = ["ID", "NAME", "TAGS", "NS", "LAT-A", "LAT-B", "LAT-C", "LAT-ALPHA", "LAT-BETA", "LAT-GAMMA"]
headers.extend(["-".join(("INI", key)) for key in keys])
headers.extend(["-".join(("FIN", key)) for key in keys])
headers.extend(["PRESSURE", "DIRECT-GAP", "INDIRECT-GAP"])
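
Since the header list must line up with the row layout, a quick sanity check helps. The standalone snippet below rebuilds the same headers as the cell above and asserts the expected count of 10 initial-structure columns, 10 final-structure columns, and 3 scalar results.

```python
keys = ["ID", "NAME", "TAGS", "NS", "LAT-A", "LAT-B", "LAT-C", "LAT-ALPHA", "LAT-BETA", "LAT-GAMMA"]
headers = [f"INI-{key}" for key in keys] + [f"FIN-{key}" for key in keys] + ["PRESSURE", "DIRECT-GAP", "INDIRECT-GAP"]

# 10 initial-structure columns + 10 final-structure columns + 3 scalar results
assert len(headers) == 2 * len(keys) + 3 == 23
print(headers[:2], headers[-1])  # → ['INI-ID', 'INI-NAME'] INDIRECT-GAP
```

In the notebook itself, the analogous check against the generated rows would be `assert all(len(row) == len(headers) for row in table)`.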

Create the final Pandas DataFrame and display it as an HTML table.

In [None]:
df = pd.DataFrame(data=table, columns=headers)
html = dataframe_to_html(df)
html
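
To keep a copy of the results outside the notebook, the table can be written to CSV. With pandas this is `df.to_csv("results.csv", index=False)`; the standard-library sketch below does the same for plain `headers` and `table` lists. The helper names and the file name are illustrative assumptions.

```python
import csv
from pathlib import Path
from typing import Any, List, Sequence


def write_results_csv(path: Path, headers: Sequence[str], table: Sequence[Sequence[Any]]) -> None:
    """Write the header row followed by one row per material."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)
        writer.writerows(table)


def read_results_csv(path: Path) -> List[List[str]]:
    """Read the rows back; every value is returned as a string."""
    with path.open(newline="") as handle:
        return list(csv.reader(handle))
```

In the notebook, this would be called as `write_results_csv(Path("results.csv"), headers, table)`. Values containing commas (such as the joined tags) are quoted automatically by the `csv` module.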