# Customising and chaining atomate2 workflows

So far, you have run the stock atomate2 workflows directly without customisation. In practice, you'll want to configure the input sets for most calculations. Furthermore, one of the most powerful aspects of atomate2 is the ease of developing new workflows by chaining together existing jobs and flows.

In this hands-on, we will tackle both of these points. Topics covered include:
- The BaseVaspMaker for VASP jobs
- Input set generators and how to customise them
- Powerups
- Adding metadata
- Querying the job store and MongoDB syntax
- Chaining jobs
- Writing a simple flow

Just like in the first session, we'll be running calculations through [jobflow-remote](https://github.com/Matgenix/jobflow-remote) on the CECAM HPC cluster. If you're experiencing any issues with your installation, just let the helpers know.

## VASP jobs and the BaseVaspMaker 

Let's start by looking in detail at the structure of a VASP job Maker. As with every Maker, the way you create jobs is through the make function. VASP job Makers can be found in

```python
atomate2.vasp.jobs.<subpackage>
```

In [None]:
from atomate2.vasp.jobs.core import RelaxMaker

rm = RelaxMaker()
rm.make?

As we saw in the last session, the VASP makers accept a pymatgen [Structure](https://github.com/materialsproject/pymatgen/blob/master/src/pymatgen/core/structure.py) object as input and an optional directory from the previous calculation.

The make function itself doesn't provide any way of customising the VASP settings. Instead, this is controlled by the Maker fields. These were outlined in the lecture, but let's see a reminder here.

In [None]:
RelaxMaker?

Here we have options for controlling every stage of the VASP calculation:
- Copying files from the previous directory (`copy_vasp_kwargs`)
- Writing inputs (`input_set_generator`)
- Running VASP (`run_vasp_kwargs`)
- Loading the task document from the output files (`task_document_kwargs`)
- Checking the calculation was successful (`stop_children_kwargs`)

In particular, these keyword arguments are passed to the relevant functions which are called inside the make method. Jupyter lab does not render the links to these functions, but if you view the [RelaxMaker](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.jobs.core.RelaxMaker.html) page in the atomate2 documentation, you can follow the links to see exactly what can be configured.

We can specify custom parameters by initialising the RelaxMaker appropriately. For example, below we ensure that the calculation always returns successfully whether or not electronic and ionic convergence has been achieved.

In [None]:
rm = RelaxMaker(stop_children_kwargs={"handle_unsuccessful": False})

You should read the docstring for [should_stop_children](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.run.should_stop_children.html#atomate2.vasp.run.should_stop_children) to check you understand the above code.
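
To make the pattern concrete, here is a toy sketch of how a Maker can forward its `*_kwargs` fields to the functions called inside `make`. The class `ToyMaker` and the helpers `toy_copy_outputs` and `toy_should_stop` are hypothetical stand-ins written for this illustration. They are not atomate2 code, and the real functions accept different arguments.

```python
from dataclasses import dataclass, field


def toy_copy_outputs(prev_dir, additional_files=()):
    """Hypothetical stand-in for the file-copying stage."""
    return [f"{prev_dir}/{name}" for name in ("POSCAR", *additional_files)]


def toy_should_stop(converged, handle_unsuccessful=True):
    """Hypothetical stand-in for the success check."""
    return handle_unsuccessful and not converged


@dataclass
class ToyMaker:
    # each dict is unpacked into the matching stage function inside make()
    copy_kwargs: dict = field(default_factory=dict)
    stop_children_kwargs: dict = field(default_factory=dict)

    def make(self, prev_dir, converged):
        copied = toy_copy_outputs(prev_dir, **self.copy_kwargs)
        stop = toy_should_stop(converged, **self.stop_children_kwargs)
        return {"copied": copied, "stop_children": stop}


default = ToyMaker().make("prev", converged=False)
custom = ToyMaker(
    copy_kwargs={"additional_files": ["extra.dat"]},
    stop_children_kwargs={"handle_unsuccessful": False},
).make("prev", converged=False)

print(default)
print(custom)
```

The point of the design is that `make` keeps a small, fixed signature while every stage remains configurable when the Maker is created.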

### Activity – copying files

Now you have seen how to customise the stages of the VASP job, it is time to practice yourself. The goal of this activity is to configure the job to copy the CHGCAR from the previous directory. The steps to achieve this are as follows:

1. Decide which of the stages controls copying files.
2. Look up the docstring for the relevant function.
3. Determine how to configure the copied files.
4. Apply these options in the RelaxMaker.

If you get stuck, we've provided some hints below.

<details>
<summary> Hint 1 </summary>

[copy_vasp_outputs](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.files.copy_vasp_outputs.html) is responsible for copying previous outputs to the current directory.
</details>

</br>

<details>
<summary> Hint 2 </summary>
The <code>additional_vasp_files</code> option controls which files to copy. In this case we'll need to set this as:
<code>["CHGCAR"]</code>
</details>

</br>

<details>
<summary> Answer </summary>
Putting the hints together, you should get:
</br></br>

```python
rm = RelaxMaker(copy_vasp_kwargs={"additional_vasp_files": ["CHGCAR"]})
```
</details>

In [None]:
# Use this space to solve the activity

rm = RelaxMaker(
    ...  # update me
)

All of the above functionality is provided by BaseVaspMaker, from which all VASP job makers derive. You won't ever use this class directly, but will subclass it when creating new VASP makers. More details on the [BaseVaspMaker](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.jobs.base.BaseVaspMaker.html) are available in the online documentation.

## Creating inputs with input set generators

Input set generators are responsible for converting a pymatgen [Structure](https://pymatgen.org/pymatgen.core.html#pymatgen.core.structure.Structure) object into the input files needed to run VASP. As a reminder, these are:

- **POSCAR**: Lattice, atomic positions, and atom types.
- **INCAR**: Calculation settings (functional, convergence criteria, etc)
- **POTCAR**: Pseudopotentials
- **KPOINTS**: K-point sampling (optional)

Let's start by creating a RelaxSetGenerator (used by the RelaxMaker above). VASP input sets live either in pymatgen or atomate2 at the following paths:

```python
# in atomate2
atomate2.vasp.sets.<subpackage>

# in pymatgen
pymatgen.io.vasp.sets
```

For the purposes of this tutorial, we'll use the standard atomate2 input sets. The pymatgen input sets differ in the choice of functional (PBE in pymatgen vs PBEsol in atomate2) and pseudopotential versions (PBE in pymatgen vs PBE_54 in atomate2).

In [None]:
from atomate2.vasp.sets.core import RelaxSetGenerator

isg = RelaxSetGenerator()

By itself, the input set generator is just a mechanism to generate the VASP inputs at a later date. We do this using the get_input_set function. For that we'll need to provide a structure as input.

In [None]:
from pymatgen.core import Structure

structure = Structure.from_file("Si.vasp")

input_set = isg.get_input_set(structure, potcar_spec=True)

Note, we have to use `potcar_spec=True` in this notebook as we don't have the VASP pseudopotential files installed. In practice, this function is only called when the job is executed, so the POTCARs only need to be configured on the HPC where the job runs.

We can access the individual VASP input files using the attributes of the input set.

In [None]:
incar = input_set.incar
kpoints = input_set.kpoints
poscar = input_set.poscar

print(f"""Default RelaxSetGenerator inputs for Si

INCAR
=====
{incar}
      
KPOINTS
=======
{kpoints}

POSCAR
======
{poscar}""")

We can also write the inputs to a directory using the write_input function. This will create the folder if it doesn't already exist.

In [None]:
input_set.write_input("Si_inputs")

Look in the Si_inputs folder and confirm that the files have been written successfully. What energy cutoff does atomate2 use by default, and what are the energy and force convergence criteria?

<details>
<summary> Hint </summary>

The parameters are specified by the tags:

- `ENCUT`: Plane wave energy cutoff.
- `EDIFF`: Energy convergence criteria.
- `EDIFFG`: Force convergence criteria.

</details>


## Customising input sets

It is very common to modify the VASP input settings for a job. Use cases include:

- Tuning parallelisation settings
- Selecting the exchange-correlation functional
- Modifying convergence settings 
- Changing the pseudopotentials
- Increasing the k-point sampling density

All of these can be controlled through the input set generator. To understand how, let's look at the docstring for RelaxSetGenerator.

In [None]:
RelaxSetGenerator?

Unfortunately, the docstring doesn't render properly. Instead we need to look at the docstring for VaspInputGenerator. This is the superclass from which all input set generators derive.

In [None]:
from atomate2.vasp.sets.base import VaspInputGenerator

VaspInputGenerator?

The docstring contains many options, making it very flexible. In day-to-day usage of atomate2, you won't need most of these options and can instead focus on:
- **user_incar_settings**: To set specific INCAR settings
- **user_kpoints_settings**: For configuring the k-point mesh density
- **user_potcar_settings**: To control specific POTCAR choice (e.g. "Bi" vs "Bi_d").
- **user_potcar_functional**: To control the version of pseudopotentials used, e.g. "PBE_54" vs "PBE_64".

For example, we can set a custom energy cutoff.

In [None]:
isg = RelaxSetGenerator(user_incar_settings={"ENCUT": 1000})

We can confirm this worked successfully.

In [None]:
input_set = isg.get_input_set(structure, potcar_spec=True)
incar = input_set.incar

print("The ENCUT value is", incar["ENCUT"])

We can configure the k-point mesh sampling in two ways, either through setting the reciprocal density (larger values indicate denser meshes) or by specifying a pymatgen [Kpoints](https://pymatgen.org/pymatgen.io.vasp.html#pymatgen.io.vasp.inputs.Kpoints) object directly. This second option is not recommended except in specific circumstances. Reciprocal density is more flexible: for example, when creating a supercell, the k-point sampling will adjust accordingly.

In [None]:
from pymatgen.io.vasp import Kpoints

# option 1 - reciprocal density
isg = RelaxSetGenerator(user_kpoints_settings={"reciprocal_density": 200})

# option 2 - Kpoints object
kpoints = Kpoints.automatic([6, 6, 6])
isg = RelaxSetGenerator(user_kpoints_settings=kpoints)

As the warning indicates, it is also possible to specify the k-point mesh through the [KSPACING](https://www.vasp.at/wiki/index.php/KSPACING) INCAR tag. This is used by some of the Materials Project r2SCAN workflows. With this option, no KPOINTS file is generated.

In [None]:
# option 3 - KSPACING
isg = RelaxSetGenerator(user_incar_settings={"KSPACING": 0.44})

input_set = isg.get_input_set(structure, potcar_spec=True)

print(input_set.kpoints)

Note, in this case, the kpoints attribute is set to None.
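
To see why a reciprocal density adapts to the cell size while a fixed mesh does not, the sketch below uses a deliberately simplified model: the total number of k-points is taken as the reciprocal density times the reciprocal-cell volume, and the divisions along each axis are inversely proportional to the cell length. The function `toy_mesh` and the 4 Å cubic cell are assumptions made for illustration. The meshes pymatgen generates for a real structure may differ.

```python
import math


def toy_mesh(lengths, volume, reciprocal_density):
    """Estimate a k-point mesh from cell edge lengths (Å) and volume (Å^3)."""
    reciprocal_volume = (2 * math.pi) ** 3 / volume
    n_kpoints = reciprocal_density * reciprocal_volume  # target number of k-points
    a, b, c = lengths
    scale = (n_kpoints * a * b * c) ** (1 / 3)
    # longer cell edges get proportionally fewer divisions
    return tuple(max(1, math.floor(scale / length)) for length in lengths)


a = 4.0  # hypothetical cubic cell edge in Å
unit_cell = toy_mesh((a, a, a), a**3, reciprocal_density=200)
supercell = toy_mesh((2 * a, 2 * a, 2 * a), (2 * a) ** 3, reciprocal_density=200)

print(unit_cell)  # (9, 9, 9)
print(supercell)  # (4, 4, 4)
```

With the same reciprocal density, doubling the cell roughly halves the mesh, so the sampling stays comparable. A hard-coded `Kpoints` object would instead apply the same mesh to both cells.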


We use a similar process to configure which pseudopotentials to use. Note, in most cases the default will be fine as atomate2 is configured to use the recommended pseudopotentials listed on the [VASP website](https://www.vasp.at/wiki/index.php/Choosing_pseudopotentials#Recommended_PAW_potentials). For example, to use the "Bi_d" pseudopotential for bismuth:

In [None]:
isg = RelaxSetGenerator(user_potcar_settings={"Bi": "Bi_d"})

### Activity – kpoint sampling

Now you have seen how to customise input sets, it is time to practice yourself. 
The goal of this activity is to investigate which reciprocal density is equivalent to an 8x8x8 k-point mesh for Si. To achieve this you need to:

1. Create an input set generator object with a specified reciprocal density.
2. Generate the input set with the silicon structure.
3. Determine the resulting k-point mesh.
4. Repeat until you find the density that produces an 8x8x8 k-point mesh.

If you get stuck, we've provided a hint below.

<details>
<summary> Hint 1 </summary>
The k-point mesh sampling for a specific reciprocal density can be obtained with:
</br></br>

```python
isg = RelaxSetGenerator(user_kpoints_settings={"reciprocal_density": 100})
input_set = isg.get_input_set(structure, potcar_spec=True)
kpoints = input_set.kpoints
print(kpoints.kpts)
```
</details>

</br>

<details>
<summary> Answer </summary>
Putting the hints together, you could loop over a few densities
</br></br>

```python
for density in range(100, 500, 50):
    isg = RelaxSetGenerator(user_kpoints_settings={"reciprocal_density": density})
    input_set = isg.get_input_set(structure, potcar_spec=True)
    kpoints = input_set.kpoints
    print(f"reciprocal_density {density} = {kpoints.kpts}")
```

The answer is something around 350 reciprocal density.
</details>


In [None]:
# Use this space to solve the activity



## Using custom inputs in a job

So far we have just been updating the input set generator. Now we will update the job Maker to use the custom settings. To do this, we can override the input_set_generator field of the Maker.

In [None]:
isg = RelaxSetGenerator(user_incar_settings={"ENCUT": 600})
rm = RelaxMaker(input_set_generator=isg)

While this looks simple here, for most practical workflows this can get quite tricky as you always have to make sure to select the correct input set generator for each Maker. For example, consider the case where we are trying to update the ENCUT for the band structure workflow. 

This workflow includes the following calculations:
- A static calculation to generate the charge density.
- A line-mode band structure non-self-consistent field (NSCF) calculation.
- A uniform band structure NSCF calculation.

This workflow has two Makers as arguments, a static maker and NSCF maker. We have to update both the input set generators and makers to customise the workflow.

In [None]:
from atomate2.vasp.flows.core import BandStructureMaker
from atomate2.vasp.sets.core import StaticSetGenerator, NonSCFSetGenerator
from atomate2.vasp.jobs.core import StaticMaker, NonSCFMaker

custom_settings = {"ENCUT": 600}

# customise the input sets
ssg = StaticSetGenerator(user_incar_settings=custom_settings)
nsg = NonSCFSetGenerator(user_incar_settings=custom_settings)

# customise the job makers
sm = StaticMaker(input_set_generator=ssg)
nsm = NonSCFMaker(input_set_generator=nsg)

# customise the flow maker
maker = BandStructureMaker(
    static_maker=sm,
    bs_maker=nsm,
)

An alternative approach is to update the input set generator directly.

In [None]:
rm = RelaxMaker()
rm.input_set_generator.user_incar_settings["ENCUT"] = 600

But this is also cumbersome for nested flows, and requires knowledge of the specific makers in the workflow.

In [None]:
maker = BandStructureMaker()
maker.static_maker.input_set_generator.user_incar_settings["ENCUT"] = 600
maker.bs_maker.input_set_generator.user_incar_settings["ENCUT"] = 600

This same process can be achieved using the update_maker_kwargs function of a Flow or job object.

In [None]:
flow = BandStructureMaker().make(structure)

flow.update_maker_kwargs(
    {"_set": {"input_set_generator->user_incar_settings->ENCUT": 600}},
    dict_mod=True,
)
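
The `->` in the key above separates nested names. As a rough mental model only (this is not the jobflow implementation), the same idea applied to plain dictionaries looks like the sketch below. `set_nested` is a hypothetical helper and the starting settings are arbitrary.

```python
def set_nested(settings, path, value, sep="->"):
    """Set a value in a nested dict, creating intermediate dicts as needed."""
    *parents, last = path.split(sep)
    node = settings
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value
    return settings


# arbitrary starting settings, used only to show that existing keys survive
maker_settings = {"input_set_generator": {"user_incar_settings": {"NSW": 99}}}
set_nested(maker_settings, "input_set_generator->user_incar_settings->ENCUT", 600)

print(maker_settings)
```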

However, this code is quite complicated and not very user friendly. A better approach is to use powerups.

## Using powerups

Powerups provide a convenient tool for customisation.
Powerups take jobs, flows, or Makers as input and return a modified copy.
We have developed custom powerups specific for the VASP workflows in atomate2.

To start with, let's use the update_user_incar_settings powerup to modify input generator settings. In atomate2, powerups live in either

```python
# common powerups
atomate2.common.powerups

# vasp powerups
atomate2.vasp.powerups
```

In [None]:
from atomate2.vasp.powerups import update_user_incar_settings

flow = BandStructureMaker().make(structure)
flow = update_user_incar_settings(flow, {"ENCUT": 600})

This is much cleaner than any previous method. We can also use the filtering options to only update specific parts of the workflow. Two options are available, filtering by name or class.

In [None]:
# filtering by name
flow = update_user_incar_settings(flow, {"ENCUT": 600}, name_filter="static")

# filtering by class
flow = update_user_incar_settings(flow, {"ENCUT": 600}, class_filter=StaticMaker)
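
The behaviour of a powerup (return a modified copy, optionally restricted by a filter) can be pictured with the toy sketch below. `ToyJob` and `toy_update_incar` are hypothetical and are not the atomate2 implementation. The job names are illustrative, and the filter is assumed to be a simple substring match on the job name.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    name: str
    incar: dict = field(default_factory=dict)


def toy_update_incar(jobs, updates, name_filter=None):
    """Return modified copies of the jobs; the originals are left untouched."""
    new_jobs = deepcopy(jobs)
    for job in new_jobs:
        # assumption: a job is selected when the filter is a substring of its name
        if name_filter is None or name_filter in job.name:
            job.incar.update(updates)
    return new_jobs


jobs = [ToyJob("static"), ToyJob("non-scf line"), ToyJob("non-scf uniform")]
updated = toy_update_incar(jobs, {"ENCUT": 600}, name_filter="static")

print([job.incar for job in updated])  # only the static job is changed
print([job.incar for job in jobs])     # the original jobs are unchanged
```

Because a copy is returned, remember to assign the result, as in `flow = update_user_incar_settings(flow, ...)` above.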

### Activity - powerups

Now you have seen how to use powerups, it is time to practice yourself.
The goal of this activity is to explore the other VASP powerups available in atomate2.
Specifically, can you figure out how to perform the following updates:

1. Update the k-point reciprocal_density for all calculations to 200.
2. Update the potcar functional to LDA.

If you get stuck, we've provided a hint below.

<details>
<summary> Hint 1 </summary>

You can see the powerups available on the atomate2 VASP [powerups](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.powerups.html) page.
</details>

</br>

<details>
<summary> Hint 2 </summary>
You need to use the functions:

- [update_user_kpoints_settings](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.powerups.update_user_kpoints_settings.html#atomate2.vasp.powerups.update_user_kpoints_settings) 
- [update_user_potcar_functional](https://materialsproject.github.io/atomate2/reference/atomate2.vasp.powerups.update_user_potcar_functional.html#atomate2.vasp.powerups.update_user_potcar_functional)
</details>

</br>

<details>
<summary> Answer </summary>
Putting the hints together, you can solve the problem using:
</br></br>

```python
from atomate2.vasp.powerups import update_user_kpoints_settings, update_user_potcar_functional

flow = BandStructureMaker().make(structure)
flow = update_user_kpoints_settings(flow, {"reciprocal_density": 200})
flow = update_user_potcar_functional(flow, "LDA")
```
</details>



In [None]:
# Use this space to solve the activity


## Database management through metadata

When running high-throughput workflows, managing your data is a key challenge.
You might be used to using your folder structure to organise calculations. However, this won't scale to thousands of systems, and codes like jobflow-remote use a flat directory structure based on the time a calculation ran. 
Instead of accessing output files on disk, you access calculation outputs via the database.

The easiest way to make sense of your calculations (which often cover multiple projects, levels of theory, versions of parameters, etc) is through metadata. Typically, this should be information that is not readily available from the rest of the task document and might include:

- Materials Project ID of the material
- Project name or number
- Author of the calculations
- Crystal phase
- Version of the calculation settings

Jobflow has a built-in approach to adding metadata to calculations. It automatically gets stored alongside the calculation output. Let's create a workflow with metadata now:

In [None]:
relax_job = RelaxMaker().make(structure)
relax_job.update_metadata(
    {
        "author": "Alex", 
        "project": "cecam-school", 
        "tags": ["GGA", "v1"]
    }
)

The structure of the metadata is completely up to you - there is no correct answer. The guiding principle is that it should make your calculations easier to manage once they are completed. It can often be useful to think carefully about the potential future steps of the project and design the metadata accordingly.

Let's now submit our calculation using jobflow-remote. If this looks unfamiliar, look over the resources from the first atomate2 session for a refresher.

In [None]:
from jobflow_remote import submit_flow

submit_flow(
    relax_job, 
    worker="cecam",
    resources={"nodes": 1, "ntasks": 36, "time": "03:00:00"} , 
    exec_config="vasp_6.4.3_cecam"
)

You can check the status of the calculation with:

In [None]:
! jf job info {relax_job.uuid}

Remember, if the jobflow-remote runner is not started, you can start it with:

```bash
! jf runner start
```

Once the job has finished running, we use the metadata to query it from the job store. Note, in this case we have access to the job uuid, so we could simply query the output that way, but in the future you likely won't have the uuid to hand.

In [None]:
from jobflow_remote import get_jobstore

store = get_jobstore()
store.connect()

result = store.query_one({"metadata.project": "cecam-school", "metadata.tags": "v1"})

Currently, result is a raw dictionary, but we can convert it into the task document by deserialising it with monty.

In [None]:
from monty.json import MontyDecoder

result = MontyDecoder().process_decoded(result)

We can now navigate the task document more easily.

In [None]:
print(f"""metadata {result.metadata}
energy {result.output.output.energy}
functional {result.output.input.incar["GGA"]}""")

## Advanced database queries

Often you want to search your database for very specific calculations. For example, all calculations containing oxygen but not hydrogen, calculated using the PBEsol functional. Predicting how to structure your metadata in advance to enable this query would be extremely difficult and require significant foresight and vision for the project direction. However, the task document includes a large amount of "standard" metadata that makes this possible when combined with advanced MongoDB queries.

To see how to construct these queries, let's first run static calculations for a number of rocksalt alkaline earth metal oxides.

In [None]:
from pymatgen.core import Structure
from atomate2.vasp.jobs.core import StaticMaker

sm = StaticMaker()

for metal in ["Mg", "Ca", "Sr"]:
    structure = Structure.from_spacegroup(
        "Fm-3m",
        [[5, 0, 0], [0, 5, 0], [0, 0, 5]],
        [metal, "O"],
        [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
    )

    static_job = sm.make(structure)
    static_job.update_metadata({"author": "Alex", "project": "metal-oxides"})

    submit_flow(
        static_job, 
        worker="cecam",
        resources={"nodes": 1, "ntasks": 36, "time": "03:00:00"} , 
        exec_config="vasp_6.4.3_cecam"
    )

We can monitor the progress of these jobs using some jobflow-remote magic.

In [None]:
! jf job list --hours 1 --name "static"

Once they are all COMPLETED, we can develop some queries. Let's start by simply querying for all calculations containing oxygen.

In [None]:
results = store.query({"output.elements": "O"})

for result in results:
    print(result["output"]["formula_pretty"])

Next, let's get all outputs that contain oxygen but not Sr.

In [None]:
results = store.query(
    {
        "$and": [
            {"output.elements": "O"}, 
            {"output.elements": {"$ne": "Sr"}}
        ]
    }
)

for result in results:
    print(result["output"]["formula_pretty"])

What about outputs that contain either both Ca and O, or Si?

In [None]:
# $all is applied to the list of elements, not to the formula string
results = store.query(
    {
        "$or": [
            {"output.elements": {"$all": ["Ca", "O"]}}, 
            {"output.elements": "Si"}
        ]
    }
)

for result in results:
    print(result["output"]["formula_pretty"])

As a reminder from the tutorial, a wide range of queries are available. 

| Operation            | Syntax                        | Example                             |
|----------------------|-----------------------------|-------------------------------------|
| **Comparison queries** | | |
| Equality            | `{"key": "value"}`           | `{"formula" : "SiO2"}`             |
| Less Than          | `{"key": {"$lt": "value"}}`  | `{"nsites" : {"$lt": 4}}`          |
| Less Than/Equal to | `{"key": {"$lte": "value"}}` | `{"nsites" : {"$lte": 3}}`         |
| Greater Than       | `{"key": {"$gt": "value"}}`  | `{"nsites" : {"$gt": 5}}`          |
| Greater Than/Equal to | `{"key": {"$gte": "value"}}` | `{"nsites" : {"$gte": 5}}`         |
| Not Equal to       | `{"key": {"$ne": "value"}}`  | `{"formula" : {"$ne": "SiO2"}}`    |
| Value is in       | `{"key": {"$in": [<v1>, …]}}` | `{"nsites" : {"$in": [1, 2]}}`     |
| Value is Not in   | `{"key": {"$nin": [<v1>, …]}}` | `{"formula" : {"$nin": ["SiO2"]}}` |
| Contains all      | `{"key": {"$all": [<v1>, …]}}` | `{"elements": {"$all": ["H", "O"]}}` |
| **Logical queries**| | |
| AND | `{"$and": [<c1>, <c2>, …]}` | `{"$and": [{"elements": {"$all": ["O", "Ni"]}}, {"elements": {"$nin": ["H", "F"]}}]}` |
| NOT | `{"key": {"$not": {<operator expression>}}}` | `{"nsites": {"$not": {"$gt": 5}}}` |
| OR | `{"$or": [<c1>, <c2>, …]}` | `{"$or": [{"elements": {"$all": ["O", "Ni"]}}, {"nsites": {"$lt": 4}}]}` |
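
To build intuition for how these operators combine, the sketch below evaluates a small subset of the query language in pure Python against made-up documents. The function `matches` is a hypothetical, heavily simplified stand-in for what the database does. It ignores missing fields, `$not`, and many other details.

```python
def get_field(doc, dotted_key):
    """Follow a dotted key such as "output.elements" through nested dicts."""
    for part in dotted_key.split("."):
        doc = doc[part]
    return doc


def equals(value, target):
    """Equality that also matches when `target` is an element of a list field."""
    return target in value if isinstance(value, list) else value == target


OPERATORS = {
    "$ne": lambda v, t: not equals(v, t),
    "$lt": lambda v, t: v < t,
    "$lte": lambda v, t: v <= t,
    "$gt": lambda v, t: v > t,
    "$gte": lambda v, t: v >= t,
    "$in": lambda v, t: any(equals(v, x) for x in t),
    "$nin": lambda v, t: not any(equals(v, x) for x in t),
    "$all": lambda v, t: all(equals(v, x) for x in t),
}


def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for key, condition in query.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in condition)
        elif isinstance(condition, dict):
            value = get_field(doc, key)
            ok = all(OPERATORS[op](value, t) for op, t in condition.items())
        else:
            ok = equals(get_field(doc, key), condition)
        if not ok:
            return False
    return True


# made-up documents that mimic the layout of the job store
docs = [
    {"output": {"formula_pretty": "MgO", "elements": ["Mg", "O"], "nelements": 2}},
    {"output": {"formula_pretty": "SrO", "elements": ["O", "Sr"], "nelements": 2}},
    {"output": {"formula_pretty": "Si", "elements": ["Si"], "nelements": 1}},
]

query = {"$and": [{"output.elements": "O"}, {"output.elements": {"$ne": "Sr"}}]}
selected = [d["output"]["formula_pretty"] for d in docs if matches(d, query)]
print(selected)  # ['MgO']
```

Note how a plain value such as `"O"` matches a list field when it is one of the list's elements, which is why `{"output.elements": "O"}` works in the queries above.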


### Activity - queries

Now you have seen how to use MongoDB queries, it is time to practice yourself.
The goal of this activity is to construct the queries to select the right calculations.
Specifically, you should try and filter for the following (separate) criteria.

1. Calculations containing oxygen with an energy less than -6 eV/atom.
2. Calculations with more than 1 element but not containing Ca.

If you get stuck, we've provided hints below.

<details>
<summary> Hint Query 1 </summary>

The task document fields you want to filter on are:

- "output.elements"
- "output.output.energy_per_atom"

</details>

</br>

<details>
<summary> Hint Query 2 </summary>
    
The task document fields you want to filter on are:

- "output.nelements"
- "output.elements"

</details>

</br>

<details>
<summary> Answer </summary>
Putting the hints together, you can solve the problem using:
</br></br>

```python
# query 1
results = store.query({"output.elements": "O", "output.output.energy_per_atom": {"$lt": -6}})

# query 2
results = store.query({"output.nelements": {"$gt": 1}, "output.elements": {"$ne": "Ca"}})
```
</details>



In [None]:
# Use this space to solve the activity



## Chaining jobs

So far we have run pre-prepared VASP workflows and jobs, but often you'll want to combine them in novel ways. To do this, let's make the most basic multi-stage workflow: a relaxation followed by a static calculation.

In [None]:
from jobflow import Flow

structure = Structure.from_file("Si.vasp")

relax_job = RelaxMaker().make(structure)
static_job = StaticMaker().make(
    relax_job.output.structure,
    prev_dir=relax_job.output.dir_name
)
flow = Flow([relax_job, static_job])
flow.update_metadata({"author": "alex", "project": "cecam-custom-flow"})

The key aspects of this process are:

1. Using the output structure of the relax job as input to the static job.
2. Passing the directory of the relax job to the prev_dir keyword argument of the static job.
3. Putting the relax and static jobs into a Flow.
4. Updating the metadata in the same way for flows as for jobs.

In principle, we could skip step 2 and not pass the previous directory. However, passing it has several advantages:

- It allows optimisations in the input set, for example disabling spin polarisation if the magnetic moments are zero.
- It enables copying VASP outputs like CHGCAR from a previous directory (useful for band structure flows).
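
Note that `relax_job.output.structure` is not a structure yet: when the flow is built, nothing has run, so the expression acts as a placeholder that is filled in once the relax job finishes. The toy sketch below illustrates the idea with plain Python. `ToyReference` and `run_in_order` are hypothetical and much simpler than the real jobflow machinery.

```python
class ToyReference:
    """Placeholder for a value that only exists once a job has run."""

    def __init__(self, job_name, key):
        self.job_name = job_name
        self.key = key

    def resolve(self, finished_outputs):
        return finished_outputs[self.job_name][self.key]


def run_in_order(jobs):
    """Run (name, function, inputs) jobs in sequence, resolving references first."""
    finished = {}
    for name, function, inputs in jobs:
        resolved = {
            k: v.resolve(finished) if isinstance(v, ToyReference) else v
            for k, v in inputs.items()
        }
        finished[name] = function(**resolved)
    return finished


def toy_relax(structure):
    return {"structure": structure + " (relaxed)", "dir_name": "/scratch/relax"}


def toy_static(structure, prev_dir):
    return {"energy_of": structure, "copied_from": prev_dir}


jobs = [
    ("relax", toy_relax, {"structure": "Si"}),
    ("static", toy_static, {
        "structure": ToyReference("relax", "structure"),
        "prev_dir": ToyReference("relax", "dir_name"),
    }),
]
outputs = run_in_order(jobs)
print(outputs["static"])
```

The references also tell the workflow manager which job depends on which, so the static job is only started after the relax job has completed.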

We can now run our Flow.

In [None]:
submit_flow(
    flow, 
    worker="cecam",
    resources={"nodes": 1, "ntasks": 36, "time": "03:00:00"} , 
    exec_config="vasp_6.4.3_cecam"
)

And check the status using jobflow-remote.

In [None]:
! jf flow info {flow.uuid}

Finally, we can get the final relaxed structure and energy.

In [None]:
result = store.query_one({"metadata.project": "cecam-custom-flow", "name": "static"})
result = MontyDecoder().process_decoded(result)

print(f"""Relaxed structure
{result.output.structure}

Relaxed energy
{result.output.output.energy}""")

## Writing a simple workflow

Let's now use these principles to write a more thorough workflow. In particular, we will create a workflow to calculate the equation of state (energy volume curve). We'll create a Maker to achieve this.

The overall process is as follows:

1. Take a structure as input.
2. Strain the lattice vectors by ±3% in steps of 1%, which changes the volume by roughly ±9%.
3. Run a static calculation for each volume.

We will then perform post-processing to plot the results.

In [None]:
from dataclasses import dataclass, field
from jobflow import Maker

@dataclass
class EOSMaker(Maker):
    """Maker for a flow of static calculations on strained structures."""

    name: str = "eos"
    strains: tuple[int, ...] = (-3, -2, -1, 0, 1, 2, 3)  # linear strains in percent
    static_maker: Maker = field(default_factory=StaticMaker)

    def make(self, structure):
        jobs = []

        for strain in self.strains:
            strain_fraction = strain / 100  # convert percent to a fraction
            strain_structure = structure.apply_strain(
                strain_fraction,
                inplace=False  # don't update the original structure
            )

            static_job = self.static_maker.make(strain_structure)
            jobs.append(static_job)

        return Flow(jobs, name=self.name)

Read through the code and try and understand the overall structure.
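One thing that may not be familiar is `field(default_factory=StaticMaker)`. This sets the default value of the static maker by creating a fresh `StaticMaker` for each `EOSMaker` instance, rather than sharing a single object between them.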
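
One detail worth checking: assuming `apply_strain` with a single number scales every lattice vector by `1 + strain` (as described in the pymatgen docstring), the cell volume changes by roughly three times the linear strain. The short calculation below shows the volume change for each strain used by the maker.

```python
strains_percent = (-3, -2, -1, 0, 1, 2, 3)

volume_changes = []
for strain in strains_percent:
    linear_scale = 1 + strain / 100  # factor applied to each lattice vector
    volume_change = (linear_scale**3 - 1) * 100  # resulting volume change in percent
    volume_changes.append(volume_change)
    print(f"strain {strain:+d}% -> volume change {volume_change:+.2f}%")
```

A linear strain of ±3% therefore spans a volume range of about -8.7% to +9.3%.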

Let's run the workflow on silicon and submit it to the cluster.

In [None]:
flow = EOSMaker().make(structure)
flow.update_metadata({"author": "alex", "project": "cecam-eos"})

In [None]:
submit_flow(
    flow, 
    worker="cecam",
    resources={"nodes": 1, "ntasks": 36, "time": "03:00:00"} , 
    exec_config="vasp_6.4.3_cecam"
)

And check the status using jobflow-remote.

In [None]:
! jf flow info {flow.uuid}

We can now query for the relevant task documents and extract the energies and volumes.

In [None]:
results = store.query({"metadata.project": "cecam-eos"})

volumes = []
energies = []

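for result in results:
    volumes.append(result["output"]["output"]["structure"]["lattice"]["volume"])
    energies.append(result["output"]["output"]["energy"])

# sort by volume so the curve is drawn in order, whatever order the jobs finished in
volumes, energies = zip(*sorted(zip(volumes, energies)))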

Finally, let's plot the results.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(volumes, energies, "-o")
ax.set(xlabel="Volume (Å$^3$)", ylabel="Energy (eV)")
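
If you want a quick numerical estimate of the equilibrium volume from such a curve, one simple option is to put a parabola through the lowest-energy point and its two neighbours. The sketch below does this on synthetic data (not calculation results). `parabola_minimum` is a hypothetical helper, and for real work a proper equation-of-state fit such as Birch-Murnaghan is preferable.

```python
def parabola_minimum(points):
    """Return the x position of the vertex of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    numerator = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    denominator = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 - 0.5 * numerator / denominator


# synthetic data: E(V) = 0.01 * (V - 40)^2 - 10, so the true minimum is at V = 40
toy_volumes = [37.0, 38.0, 39.0, 40.5, 41.0, 42.0]
toy_energies = [0.01 * (v - 40.0) ** 2 - 10.0 for v in toy_volumes]

# take the lowest-energy interior point and its two neighbours on the sorted curve
pairs = sorted(zip(toy_volumes, toy_energies))
i_min = min(range(1, len(pairs) - 1), key=lambda i: pairs[i][1])
v_min = parabola_minimum(pairs[i_min - 1 : i_min + 2])

print(f"Estimated equilibrium volume: {v_min:.2f} Å^3")
```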

### Activity - creating workflows

Now you have seen how to chain jobs and construct simple workflows, it is time to practice yourself.
The goal of this activity is to construct a flow maker to interpolate between two structures and calculate the potential energy surface.
Specifically, the tasks you should aim to achieve are:

1. Create a flow maker with the following fields:
   - nimages: defining the number of interpolated structures along the path with a default value of 5
   - static_maker: defining the static calculation job maker
2. Define a make method on the maker that accepts two structures as inputs, one called initial_structure and one called final_structure.
3. Generate the interpolated structures along the path, create static jobs for each of them, and return a Flow with the jobs.

You should use the `interpolate` function of the pymatgen structure object to help with the interpolation. You can see the docstring below.

We have provided a hint below if you get stuck.

<details>
<summary> Hint </summary>

The interpolated structures can be generated using:

```python
structures = initial_structure.interpolate(final_structure, nimages)
```

</details>

</br>

<details>
<summary> Answer </summary>
Putting the hints together, you can solve the problem using:
</br></br>

```python
from dataclasses import dataclass, field
from jobflow import Maker

@dataclass
class InterpolateMaker(Maker):
    name: str = "interpolate"
    nimages: int = 5
    static_maker: Maker = field(default_factory=StaticMaker)

    def make(self, initial_structure, final_structure):
        jobs = []

        structures = initial_structure.interpolate(
            final_structure,
            nimages=self.nimages
        )
        for structure in structures:
            static_job = self.static_maker.make(structure)
            jobs.append(static_job)

        return Flow(jobs)

flow = InterpolateMaker().make(initial_structure, final_structure)
flow.update_metadata({"author": "alex", "project": "cecam-interpolate"})

submit_flow(
    flow, 
    worker="cecam",
    resources={"nodes": 1, "ntasks": 36, "time": "03:00:00"} , 
    exec_config="vasp_6.4.3_cecam"
)

results = store.query({"metadata.project": "cecam-interpolate"})

energies = []
for result in results:
    energies.append(result["output"]["output"]["energy"])

fig, ax = plt.subplots()
ax.plot(energies, "-o")
ax.set(xlabel="Image", ylabel="Energy (eV)")
```
</details>



In [None]:
structure.interpolate?
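
As a rough picture of what interpolation does, the toy sketch below linearly interpolates fractional coordinates between two sets of sites. `toy_interpolate` is a hypothetical helper that ignores the lattice and periodic boundary conditions, both of which the pymatgen method can account for. It assumes, as the docstring above describes for an integer `nimages`, that both end points are included, giving `nimages + 1` images.

```python
def toy_interpolate(start, end, nimages):
    """Linearly interpolate between two lists of fractional coordinates.

    Returns nimages + 1 images, including both end points.
    """
    images = []
    for i in range(nimages + 1):
        x = i / nimages  # fraction of the way from start to end
        images.append([
            tuple(s + x * (e - s) for s, e in zip(site_start, site_end))
            for site_start, site_end in zip(start, end)
        ])
    return images


# made-up two-site example: the second site moves from (1/2, 1/2, 1/2) to (1/4, 1/4, 1/4)
start = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
end = [(0.0, 0.0, 0.0), (0.25, 0.25, 0.25)]
path = toy_interpolate(start, end, nimages=5)

print(len(path))
print(path[0][1], path[-1][1])
```

Keep the extra end point in mind when counting how many static jobs your flow will contain.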

In [None]:
# Load initial and final structures
initial_structure = Structure.from_file("BP_rocksalt.cif")
final_structure = Structure.from_file("BP_zincblende.cif")

# Use this space to solve the activity

