# Lightshow Basic Usage

The [Lightshow](https://github.com/AI-multimodal/Lightshow) software package is a one-stop shop for writing computational spectroscopy input files. In this tutorial, we'll show you how to initialize a `Database` object from the Materials Project, and use that database to write input files for the FEFF, VASP, EXCITING, XSpectra and OCEAN codes.

📝 **Note:** This notebook is a tutorial designed to be run via online hosting services. You can of course run it locally, although that might require a few modifications.

📝 **Note:** You can find our arXiv Preprint here: https://arxiv.org/abs/2211.04452.

⚠️ **Important Note:** In order to pull data from the Materials Project, you will have to set up an API key. Lightshow currently uses the v2 API of the Materials Project, and you can find the instructions on how to get an API key here: https://materialsproject.org/api.

In [None]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi = False

In [None]:
from pprint import pprint

## Summary

Below, we showcase a two-step workflow for constructing a structure-spectra (or more generally, structure-property) database for future use in, e.g., data-driven analysis techniques, or simply for convenience.
1. Database construction: information is pulled from the Materials Project.
2. Input file writing: all input files for the desired calculations are saved systematically. This includes all desired metadata and the precise states of all of the input file writers.

**Note:** For convenience, set your Pymatgen API key in an environment variable called `MP_API_KEY`. Alternatively, you can pass it directly when calling the `from_materials_project` classmethod below.
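The fallback described in this note can be sketched in a few lines. This is a minimal illustration and not Lightshow's own code: `resolve_api_key` is a hypothetical helper, and the demonstration uses a throwaway variable named `DEMO_API_KEY` (also hypothetical) so that any real `MP_API_KEY` in your environment is left alone.

```python
import os


def resolve_api_key(api_key=None, env_var="MP_API_KEY"):
    """Return an explicitly passed key, else the value of an environment variable.

    Hypothetical helper for illustration; not part of Lightshow.
    """
    if api_key is not None:
        return api_key
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key was passed and {env_var} is not set.")
    return key


# Demonstration with a placeholder value (never hard-code a real key).
os.environ["DEMO_API_KEY"] = "placeholder-key"
from_env = resolve_api_key(env_var="DEMO_API_KEY")  # read from the environment
explicit = resolve_api_key("explicit-key")          # an explicit argument wins
print(from_env, explicit)
```

Keeping the key in the environment means the notebook itself never contains it, which makes the notebook safer to share.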

# Install Lightshow

You can install Lightshow via `pip` with a single command:

In [None]:
!pip install lightshow

# Demo

Before you do anything else, you'll need to get a Materials Project API key. Instructions can be found here: https://materialsproject.org/api. Once you have it, simply replace `_YOUR_API_KEY_` below with your key, and run the cell.

⚠️ **Warning:** In general, do not share API keys or save them in notebooks. These keys are tied to YOU, and at the very least, the terms of service usually dictate that you are the only one allowed to use the key.

In [None]:
api_key = "_YOUR_API_KEY_"  # replace the placeholder inside the quotes with your key

## The `Database` object

The Lightshow API is designed with simplicity in mind. It contains two core objects: the `Database` and parameter files that inherit from `lightshow.parameters._base._BaseParameters`.

You can import the `Database` object via

```python
from lightshow import Database
```

It is a lightweight wrapper for two primary pieces of information. The first is the structures:

```python
db = Database(...)
db.structures
```

which is a dictionary mapping keys (usually Materials Project IDs) to `pymatgen.core.structure.Structure` objects.

The second piece of information is the metadata:

```python
db.metadata
```

which generally is curated metadata from the Materials Project.

Let's import the `Database` object below and instantiate it from a few materials. In the cells below, the Pymatgen API key is read from an environment variable (`MP_API_KEY`), but you can always pass it directly to any of the `Database` classmethods by using the `api_key` keyword argument.

In [None]:
import sys
sys.path.append("..")

In [None]:
import lightshow
from lightshow import Database
print(lightshow.__version__)

In [None]:
database = Database.from_materials_project(material_ids=[
    "mp-390",
    "mp-1215",
    "mp-1840",
    "mp-2657",
    "mp-2664",
    "mp-430",
    "mp-458",
    "mp-10734",
    "mp-754147",
])

You can always access the structures, metadata and any errors that were logged during the retrieval of the data:
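A minimal sketch of such an inspection is shown below. It assumes only that the object exposes `structures` and `metadata` dictionaries as described earlier. The `errors` attribute name is an assumption, so it is read defensively, and `summarize` is a hypothetical helper. A stand-in object is used so that the sketch runs without network access.

```python
from types import SimpleNamespace


def summarize(db):
    """Return a small summary of a Database-like object (hypothetical helper)."""
    structures = db.structures            # dict: material ID -> Structure
    metadata = db.metadata                # dict of curated metadata
    errors = getattr(db, "errors", None)  # assumed attribute name; may differ
    return {
        "n_structures": len(structures),
        "ids": sorted(structures),
        "n_metadata": len(metadata),
        "errors": errors,
    }


# Stand-in for a real Database, with placeholder contents.
fake_db = SimpleNamespace(
    structures={"mp-390": object(), "mp-2657": object()},
    metadata={"mp-390": {}, "mp-2657": {}},
    errors={},
)
summary = summarize(fake_db)
print(summary)
```

In the notebook, the same helper could be called as `summarize(database)`.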

## The `Parameters` and workflow

Once the `Database` is created, you can write various input files for each type of spectroscopy code of interest. Currently available are:
* FEFF
* VASP
* XSpectra
* EXCITING
* OCEAN

though for this tutorial we'll focus on FEFF and VASP. The remaining workflow is as follows:
1. Instantiate a parameter object for each of the input files of interest
2. Pass those parameter objects to the `Database` `write` method

Let's import the parameters we need.

In [None]:
from lightshow import FEFFParameters, VASPParameters, OCEANParameters, EXCITINGParameters, XSpectraParameters

### FEFF Parameters
Documentation can be found [here](https://ai-multimodal.github.io/Lightshow/quickstart.html#feff).

Next let's construct the FEFF inputs, which mimic Pymatgen's API, but allow for easier systematic input file generation, which we'll explain when we run the `write` method.

The `FEFFParameters` takes a few core arguments, including `cards` (which generally mirror the cards of the FEFF input file itself), `edge` (the type of edge you want, e.g. `"K"` or `"L3"`) and `radius`. The `radius` argument is very important, as it determines the size of the cluster around the absorbing atom that will be generated and saved.

In [None]:
feff_params = FEFFParameters(
    cards={
        "S02": "0",
        "COREHOLE": "RPA",
        "CONTROL": "1 1 1 1 1 1",
        "XANES": "4 0.04 0.1",
        "SCF": "7.0 0 100 0.2 3",
        "FMS": "9.0 0",
        "EXCHANGE": "0 0.0 0.0 2",
        "RPATH": "-1"
    },
    edge="K",
    radius=10.0
)

Other input keyword arguments to `FEFFParameters` include:
- `spectrum`, which can be either `"XANES"` or `"EXAFS"`
- `name`, which defaults to `"FEFF"` and indicates the name of the directory corresponding to this parameter (note that every `Parameters` object requires this keyword argument to be set)
- Any other keyword arguments passed will be passed to the `FEFFDictSet` Pymatgen object
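To make the correspondence between the `cards` dictionary and a FEFF input file concrete, the sketch below renders a few cards as `CARD value` lines. It is only an illustration: `render_cards` is a hypothetical helper, not the writer that Lightshow or Pymatgen actually use, and a real FEFF input also contains the atomic cluster and other sections.

```python
def render_cards(cards):
    """Format a cards dictionary as one `NAME value` line per card."""
    return "\n".join(f"{name} {value}" for name, value in cards.items())


# A subset of the cards used in this tutorial.
cards = {
    "S02": "0",
    "COREHOLE": "RPA",
    "XANES": "4 0.04 0.1",
}
feff_text = render_cards(cards)
print(feff_text)
```

Each dictionary key becomes a card name and each value is passed through unchanged, which is why the values are written as strings in the same form FEFF expects.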

### VASP Parameters

Documentation can be found [here](https://ai-multimodal.github.io/Lightshow/quickstart.html#vasp).

Next, we'll do the same thing for VASP. This is a bit more complex, since VASP requires quite a few different types of input files. Check out the documentation to see exactly what each of these do. For now, we'll use many of the defaults.

Note that each parameter type takes different arguments. We also note that there are some defaults that you can make use of, and modify accordingly. For example, we'll use the `VASP_INCAR_DEFAULT_COREHOLE_POTENTIAL` default:

In [None]:
from lightshow.defaults import VASP_INCAR_DEFAULT_COREHOLE_POTENTIAL

Setting the `VASPParameters` is a bit trickier than that of FEFF, as the code requires a bit more specificity in its input files.

#### INCAR file

The `INCAR` parameters are passed to the `incar` keyword argument, and look something like this:

```
{
 'ALGO': 'Normal',
 'CH_LSPEC': True,
 'CH_NEDOS': 40000,
 'CH_SIGMA': 0.05,
 'CLL': 0,
 'CLN': 1,
 'CLNT': 1,
 'CLZ': 1.0,
 'EDIFF': 1e-05,
 'EDIFFG': -0.01,
 'ICORELEVEL': 2,
 'ISMEAR': 0,
 'ISPIN': 2,
 'KPAR': 1,
 'LAECHG': True,
 'LCHARG': False,
 'LREAL': 'Auto',
 'LWAVE': False,
 'NCORE': 1,
 'NELM': 200,
 'NSIM': 16,
 'PREC': 'Accurate',
 'SIGMA': 0.05,
 'SYMPREC': 1e-05
}
 ```
 
This is a standard corehole calculation input file. Note a few important parameters, such as the `Normal` algorithm type, `CH_LSPEC = True`, and `CH_NEDOS`, the number of points to compute on the spectrum (the default is much too small).
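The sketch below shows one way to override a default entry without touching the defaults, and how such a dictionary maps onto `KEY = value` lines in an INCAR file. It uses a stand-in dictionary holding a few of the entries listed above. `render_incar` is a hypothetical helper, and the overridden `CH_SIGMA` value is arbitrary, chosen for illustration only.

```python
def render_incar(incar):
    """Format a flat dictionary as INCAR-style `KEY = value` lines."""
    lines = []
    for key in sorted(incar):
        value = incar[key]
        if isinstance(value, bool):
            # VASP writes logical values as .TRUE. / .FALSE.
            value = ".TRUE." if value else ".FALSE."
        lines.append(f"{key} = {value}")
    return "\n".join(lines)


# Stand-in for a few entries of the default core-hole INCAR shown above.
incar_defaults = {
    "ALGO": "Normal",
    "CH_LSPEC": True,
    "CH_NEDOS": 40000,
    "CH_SIGMA": 0.05,
}

# Build a new dictionary with one override; `incar_defaults` is not modified.
incar = {**incar_defaults, "CH_SIGMA": 0.1}
incar_text = render_incar(incar)
print(incar_text)
```

Because the INCAR dictionary is flat, merging with `{**defaults, ...}` is enough to keep the defaults intact.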

#### POTCAR directory

As the VASP code requires a license, we cannot provide POTCAR files directly. Instead, we provide a mechanism for accessing the user's potentials and incorporating them into the input-file-writing process.

Below, we set `potcar_directory` to `None`, in which case Lightshow looks for the environment variable `VASP_POTCAR_DIRECTORY`. If it is not found, POTCAR files will not be written.

The directory that `VASP_POTCAR_DIRECTORY` points to contains subdirectories that look like this:

```
Ac
Ag
Ag_GW
Ag_pv
Ag_sv_GW
Al
...
Cr
Cr_pv
...
```

Each of these subdirectories contains a `POTCAR` file, which is used to construct the final `POTCAR` for the calculation.

There is also the question of which subdirectory to use for each element. For example, for `Ag`, we could use `Ag`, `Ag_GW`, `Ag_pv` or `Ag_sv_GW`. The default settings for each element are provided in `lightshow.parameters.vasp.VASP_POTCAR_DEFAULT_ELEMENT_MAPPING`. Specific entries can be overridden by setting the `potcar_element_mapping` dictionary in `VASPParameters` during instantiation.
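The idea of combining per-element `POTCAR` files through an element mapping can be sketched as follows. This is an illustration under stated assumptions, not Lightshow's implementation: `build_potcar` is a hypothetical helper, and the directory contents are placeholders created on the fly so that the example runs without any licensed files.

```python
import tempfile
from pathlib import Path


def build_potcar(potcar_directory, elements, element_mapping):
    """Concatenate one POTCAR per element, in the order given.

    `element_mapping` maps an element symbol to the subdirectory to use,
    e.g. {"Ag": "Ag_pv"}; elements without an entry use their own symbol.
    """
    parts = []
    for element in elements:
        subdirectory = element_mapping.get(element, element)
        potcar_path = Path(potcar_directory) / subdirectory / "POTCAR"
        parts.append(potcar_path.read_text())
    return "".join(parts)


# Throwaway directory with placeholder files standing in for real potentials.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Ag_pv", "Cr"):
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "POTCAR").write_text(f"placeholder POTCAR for {name}\n")
    potcar = build_potcar(tmp, ["Ag", "Cr"], {"Ag": "Ag_pv"})

print(potcar)
```

The order of `elements` matters, since VASP expects the entries of the combined `POTCAR` to follow the order of the species in the `POSCAR`.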

Additionally, you can set the most important calculational parameters:
 - `kpoints`: a method for defining the resolution of the k-grid in the Brillouin zone; details on this parameter can be found [here](https://arxiv.org/abs/2303.17089). The method is described in the docstring of `lightshow.common.kpoints.GenericEstimatorKpoints`, which takes two inputs, `cutoff` (the effective crystal size) and `max_radii`, in Bohr. Based on our benchmark study on Ti K-edge XANES, 43 Bohr is a good choice for VASP and XSpectra, and 33 Bohr is a good choice for OCEAN and EXCITING.
 - `nbands`: a method for defining the number of unoccupied bands to be considered. The method is described in the docstring of `lightshow.common.nbands.UnitCellVolumeEstimate`, which takes one input, `e_range` (the expected energy range).
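To give a feel for what an effective crystal size cutoff implies, the sketch below picks the smallest number of k-points along each axis such that the lattice length times the number of k-points reaches the cutoff. This criterion is an assumption made for illustration, and `GenericEstimatorKpoints` may well use a different one, so consult its docstring. The function name and the cell lengths are likewise hypothetical.

```python
import math

# Conversion factor: 1 Bohr = 0.529177210903 Angstrom.
BOHR_PER_ANGSTROM = 1.0 / 0.529177210903


def kgrid_from_cutoff(lattice_lengths_angstrom, cutoff_bohr):
    """Smallest grid (n1, n2, n3) with n_i * a_i >= cutoff, lengths in Bohr."""
    grid = []
    for length in lattice_lengths_angstrom:
        length_bohr = length * BOHR_PER_ANGSTROM
        grid.append(max(1, math.ceil(cutoff_bohr / length_bohr)))
    return tuple(grid)


# Illustrative tetragonal cell lengths in Angstrom.
grid = kgrid_from_cutoff((4.6, 4.6, 2.96), cutoff_bohr=43)
print(grid)  # (5, 5, 8)
```

Under this reading, shorter lattice vectors get denser sampling, and a larger cutoff increases the grid along every axis.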

In [None]:
vasp_params_corehole = VASPParameters(
    incar=VASP_INCAR_DEFAULT_COREHOLE_POTENTIAL,
    potcar_directory=None,
    force_spin_unpolarized=False,
    kpoints=lightshow.common.kpoints.GenericEstimatorKpoints(cutoff=43),
    nbands=lightshow.common.nbands.UnitCellVolumeEstimate(e_range=40)
)

### OCEAN Parameters

Next let's construct the OCEAN inputs, which mimic Pymatgen's API, but allow for easier systematic input file generation, which we'll explain when we run the `write` method.

The `OCEANParameters` takes a few core arguments, including `cards` (which generally mirror the OCEAN input file itself) and `edge` (the type of edge you want, e.g. `"K"` or `"L3"`).

Additionally, you can set the most important calculational parameters:
 - `kpoints`: a method for defining the resolution of the k-grid in the Brillouin zone; details on this parameter can be found [here](https://arxiv.org/abs/2303.17089). The method is described in the docstring of `lightshow.common.kpoints.GenericEstimatorKpoints`, which takes two inputs, `cutoff` (the effective crystal size) and `max_radii`, in Bohr.
 - `nbands`: a method for defining the number of unoccupied bands to be considered. The method is described in the docstring of `lightshow.common.nbands.UnitCellVolumeEstimate`, which takes one input, `e_range` (the expected energy range).

In [None]:
mycards={'dft': 'qe', 'ecut': '-1', 'opf.program': 'hamann', 'para_prefix': 'srun -n 36'}
mycards['pp_database'] = 'ONCVPSP-PBE-PDv0.4-stringent'
mycards['core_offset'] = 'true'
mycards['screen.nkpt'] = '-2.25'
mycards['screen_energy_range'] = '100'
mycards['screen.shells'] = '3.5 4.0 4.5 5.0 5.5 6.0'
mycards['cnbse.rad'] = '3.5'

ocean_params = OCEANParameters(
    edge="K",
    diel=6.84,
    cards=mycards,
    kpoints=lightshow.common.kpoints.GenericEstimatorKpoints(cutoff=33),
    nbands=lightshow.common.nbands.UnitCellVolumeEstimate(e_range=40)
)

### EXCITING Parameters

Documentation can be found [here](https://exciting-code.org/)

EXCITING requires two types of input files: the `input.xml` file and the `species` files.

The `input.xml` is created by Lightshow, while the path to the `species` files can be provided. By default, Lightshow sets the calculation directory as the path for the `species` files, which means before running the calculations, you need to copy `species` files to working directories manually.

The `EXCITINGParameters` takes the arguments `cards` (which generally mimics the EXCITING input itself; refer to the EXCITING documentation for details) and `edge` (as for the other codes). Lightshow provides a default `cards` dictionary at `lightshow.defaults.EXCITING_DEFAULT_CARDS`. The details of the default cards are:
```
{'structure': {'speciespath': './', 'autormt': 'true'},
 'groundstate': {'xctype': 'GGA_PBE',
  'nempty': '200',
  'rgkmax': '9.0',
  'do': 'fromscratch',
  'gmaxvr': '25',
  'lmaxmat': '10',
  'lmaxvr': '10',
  'lmaxapw': '10'},
 'xs': {'xstype': 'BSE',
  'vkloff': '0.05 0.03 0.13',
  'nempty': '150',
  'gqmax': '4.0',
  'broad': '0.0327069',
  'tevout': 'true',
  'tappinfo': 'true',
  'energywindow': {'intv': '178.2 180.5', 'points': '1000'},
  'screening': {'screentype': 'full', 'nempty': '150'},
  'BSE': {'xas': 'true',
   'bsetype': 'singlet',
   'nstlxas': '1 20',
   'distribute': 'true',
   'eecs': '1000'},
  'qpointset': {'qpoint': {'text()': '0.0 0.0 0.0'}}}}
 ```
This default is a plain dictionary, so you can change input values using ordinary dictionary operations. For example, to switch `bsetype` to an independent-particle (`IP`) spectrum calculation:

In [None]:
import copy
from lightshow.defaults import EXCITING_DEFAULT_CARDS
# Work on a deep copy so the module-level defaults stay unchanged
mycards = copy.deepcopy(EXCITING_DEFAULT_CARDS)
mycards['xs']['BSE']['bsetype'] = 'IP'
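Because the default cards are nested dictionaries, it is worth being careful about how they are copied before editing. The sketch below uses a small stand-in dictionary (not the real defaults) to show that plain assignment and shallow copies still share the inner dictionaries, whereas `copy.deepcopy` gives a fully independent copy.

```python
import copy

# Stand-in for nested default cards; not the real Lightshow defaults.
DEFAULT_CARDS = {"xs": {"xstype": "BSE", "BSE": {"bsetype": "singlet"}}}

alias = DEFAULT_CARDS                       # same object under another name
shallow = dict(DEFAULT_CARDS)               # new outer dict, shared inner dicts
independent = copy.deepcopy(DEFAULT_CARDS)  # fully separate copy

# Editing the deep copy leaves the defaults alone.
independent["xs"]["BSE"]["bsetype"] = "IP"
after_deep_edit = DEFAULT_CARDS["xs"]["BSE"]["bsetype"]
print(after_deep_edit)  # singlet

# Editing through the shallow copy changes the defaults as well.
shallow["xs"]["BSE"]["bsetype"] = "IP"
after_shallow_edit = DEFAULT_CARDS["xs"]["BSE"]["bsetype"]
print(after_shallow_edit)  # IP
```

In a long-running notebook session, modifying shared defaults in place can silently affect every later cell that reads them, so a deep copy is the safer habit.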

Please note that some input parameters are controlled by separate arguments. For example, to change `speciespath`, you need to pass the value to `species_directory` in `EXCITINGParameters` instead of modifying the `cards` parameter.

Additionally, you can set the most important calculational parameters:
 - `kpoints`: a method for defining the resolution of the k-grid in the Brillouin zone; details on this parameter can be found [here](https://arxiv.org/abs/2303.17089). The method is described in the docstring of `lightshow.common.kpoints.GenericEstimatorKpoints`, which takes two inputs, `cutoff` (the effective crystal size) and `max_radii`, in Bohr.
 - `nbands`: a method for defining the number of unoccupied bands to be considered. The method is described in the docstring of `lightshow.common.nbands.UnitCellVolumeEstimate`, which takes one input, `e_range` (the expected energy range).

In [None]:
exciting_params = EXCITINGParameters(
    cards=mycards,
    edge="K",
    kpoints=lightshow.common.kpoints.GenericEstimatorKpoints(cutoff=33),
    nbands=lightshow.common.nbands.UnitCellVolumeEstimate(e_range=40)  
)

### XSPECTRA Parameters

Documentation can be found [here](https://www.quantum-espresso.org/Doc/INPUT_XSPECTRA)

XSpectra calculates XANES spectra in two steps: an SCF calculation and a spectrum calculation. `es.in` is the input file for the prerequisite SCF calculation, which is run with Quantum ESPRESSO. `xanes.in` is the input file for the spectrum calculation. Lightshow generates three folders named `dipole{1..}`, corresponding to the three polarization directions. The `gs.in` file used for the ground-state calculation is also generated.

#### Pseudopotential
XSpectra requires a core-hole pseudopotential for the absorbing atom in `es.in`. Lightshow does not ship any core-hole pseudopotentials, so users need to generate them themselves. Users also need to supply the pseudopotentials for the elements other than the absorber. In input files such as `es.in` and `gs.in`, a placeholder is used for the pseudopotential of each element, for example `Ti.upf`.

If users have a pseudopotential database they want to link, such as the [SSSP database](https://www.materialscloud.org/discover/sssp/table/precision), they can point to its directory using the `psp_directory` parameter. They will also need to create a cutoff table file in **JSON** format containing some basic information about the database, and provide the name of that file to `psp_cutoff_table`. An example cutoff table in **JSON** format looks like:
```
{
    "Ti": {
        "filename": "Ti.upf",
        "cutoff_wfc": 50.0,
        "cutoff_rho": 200.0
    },
    "O": {
        "filename": "O.upf",
        "cutoff_wfc": 60.0,
        "cutoff_rho": 240.0
    }
}
```
where the keys are element names and each value holds the `filename`, `cutoff_wfc`, and `cutoff_rho` for that element. When `psp_directory` and the corresponding `psp_cutoff_table` are set, Lightshow copies the pseudopotential files to the working directory and sets the pseudopotential filenames and corresponding energy cutoffs in the input files.
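Since the cutoff table must be valid JSON (double-quoted strings, no trailing commas), it can be convenient to generate and check it from Python. The sketch below is one possible approach: `validate_cutoff_table` is a hypothetical helper, and the check for the three fields reflects the description in this section rather than any validation that Lightshow itself performs.

```python
import json

REQUIRED_FIELDS = {"filename", "cutoff_wfc", "cutoff_rho"}


def validate_cutoff_table(text):
    """Parse a JSON cutoff table and check each element has the expected fields."""
    table = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    for element, entry in table.items():
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            raise ValueError(f"{element} is missing {sorted(missing)}")
    return table


cutoff_table = {
    "Ti": {"filename": "Ti.upf", "cutoff_wfc": 50.0, "cutoff_rho": 200.0},
    "O": {"filename": "O.upf", "cutoff_wfc": 60.0, "cutoff_rho": 240.0},
}

# json.dumps always emits valid JSON, so this text can be saved to a file as is.
cutoff_text = json.dumps(cutoff_table, indent=4)
checked = validate_cutoff_table(cutoff_text)
print(sorted(checked))
```

Writing `cutoff_text` to a file and passing that file's name to `psp_cutoff_table` would complete the setup.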

#### Core-hole Pseudopotential
Users can link the directory where the core-hole pseudopotentials and the core wavefunctions are stored using the `chpsp_directory` parameter in Lightshow. The core-hole pseudopotentials and core wavefunctions must strictly follow the naming convention *element*.fch.upf and Core_*element*.wfc, where *element* is the target element.
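A quick way to catch naming mistakes before writing input files is to check the directory against this convention. The sketch below does so for a placeholder directory. Both helper functions are hypothetical and are not part of Lightshow.

```python
import tempfile
from pathlib import Path


def expected_chpsp_files(element):
    """File names required by the naming convention for one absorbing element."""
    return [f"{element}.fch.upf", f"Core_{element}.wfc"]


def missing_chpsp_files(chpsp_directory, element):
    """Return the expected files that are not present in the directory."""
    directory = Path(chpsp_directory)
    return [
        name
        for name in expected_chpsp_files(element)
        if not (directory / name).is_file()
    ]


# Placeholder directory containing only one of the two required files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Ti.fch.upf").write_text("placeholder\n")
    missing = missing_chpsp_files(tmp, "Ti")

print(missing)  # ['Core_Ti.wfc']
```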

The `XSpectraParameters` takes the arguments `cards` and `edge`. Below is the default `cards` dictionary provided by Lightshow, which is available as `lightshow.defaults.XSPECTRA_DEFAULT_CARDS`.
```
{'QE': {'control': {'restart_mode': 'from_scratch', 'wf_collect': '.true.'},
  'electrons': {'conv_thr': 1e-08, 'mixing_beta': 0.4},
  'system': {'degauss': 0.002,
   'ecutrho': 320,
   'ecutwfc': 40,
   'nspin': 1,
   'occupations': 'smearing',
   'smearing': 'gauss'}},
 'XS': {'cut_occ': {'cut_desmooth': 0.3},
  'input_xspectra': {'outdir': '../',
   'prefix': 'pwscf',
   'xcheck_conv': 200,
   'xerror': 0.01,
   'xniter': 5000,
   'xcoordcrys': '.false.'},
  'kpts': {'kpts': '2 2 2', 'shift': '0 0 0'},
  'plot': {'cut_occ_states': '.true.',
   'terminator': '.true.',
   'xemax': 70,
   'xemin': -15.0,
   'xnepoint': 400}}}
```
The default `cards` is just a dictionary with two main keys: `QE` governs the basic parameters of the Quantum ESPRESSO calculation, and `XS` determines the XANES calculation. You can change the default parameters using simple dictionary operations. Please note that some of the default parameters can be overwritten by other keyword arguments of `XSpectraParameters`, such as the energy cutoff, number of k-points, and convergence threshold.

In [None]:
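import copy
from lightshow.defaults import XSPECTRA_DEFAULT_CARDS
# Work on a deep copy so later edits do not alter the module-level defaults
mycards = copy.deepcopy(XSPECTRA_DEFAULT_CARDS)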

Additionally, you can set the most important calculational parameters:
 - `kpoints`: a method for defining the resolution of the k-grid in the Brillouin zone; details on this parameter can be found [here](https://arxiv.org/abs/2303.17089). The method is described in the docstring of `lightshow.common.kpoints.GenericEstimatorKpoints`, which takes two inputs, `cutoff` (the effective crystal size) and `max_radii`, in Bohr.

In [None]:
from lightshow.common.kpoints import GenericEstimatorKpoints

In [None]:
xspectra_params = XSpectraParameters(
    cards=mycards,
    edge="K",
    kpoints=lightshow.common.kpoints.GenericEstimatorKpoints(cutoff=43),
)

## Write

Once the parameter files are created, we can call the `write` method. This does a few things, and reading the documentation for `write` is advised. However, for starters, note that there are a few key parameters that should be set in general.
* `root`: the root directory for saving the files. Everything gets saved here.
* `absorbing_atoms`: when doing any spectroscopy calculation, `absorbing_atoms` must be specified. The exception is for "global" calculations, such as a pure SCF VASP calculation, where `absorbing_atoms=None` is permitted.
* `options`: this is a list of the parameter files you've defined above. This tells `write` which types of spectroscopy input files to create.
* There are also various "global" property cutoffs the user can specify, such as `max_primitive_total_atoms`, which should be self-explanatory. For now though, leaving them as the default causes them to behave sensibly.

In [None]:
database.write("test", options=[feff_params, vasp_params_corehole, ocean_params, exciting_params, xspectra_params], absorbing_atoms=["Ti"])

# MSONable

Every object of importance in `Lightshow` can be serialized as a Python dictionary. This allows users to save any object they want and reload it from disk in a readable way, significantly extending the transparency of the core objects. For example:

In [None]:
pprint(database.as_dict())
pprint(feff_params.as_dict())
pprint(vasp_params_corehole.as_dict())
pprint(ocean_params.as_dict())

The code makes use of this ability in `write`, which always saves a `writer_metadata.json` file detailing every aspect of the input files.
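The same idea can be used by hand to save and reload the state of an object. The sketch below round-trips a dictionary through a JSON file using only the standard library. The dictionary is a stand-in for the output of an `as_dict()` call, with placeholder keys and values, and the file name is hypothetical. It assumes every value is JSON-compatible; if that is not the case for a real object, an encoder such as `MontyEncoder` from the `monty` package may be needed.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the result of `some_object.as_dict()`; all entries are placeholders.
state = {
    "@module": "example.module",
    "@class": "ExampleParameters",
    "edge": "K",
    "radius": 10.0,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_state.json"
    # Human-readable, deterministic output makes saved files easy to compare.
    path.write_text(json.dumps(state, indent=4, sort_keys=True))
    reloaded = json.loads(path.read_text())

print(reloaded == state)  # True
```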

# Optional: download your files

If you are using Google Colab or something like it, you can download your files via something like this.

In [None]:
!zip -r /content/test.zip /content/test

In [None]:
from google.colab import files
files.download("/content/test.zip")
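If you are running locally rather than on Colab, a similar archive can be created with the standard library alone. The sketch below is one possible approach: `zip_directory` is a hypothetical helper, and the directory it archives here is a throwaway placeholder rather than real Lightshow output.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_directory(directory):
    """Create `<directory>.zip` next to the directory and return its path."""
    directory = Path(directory)
    return shutil.make_archive(
        str(directory),              # archive path without the .zip suffix
        "zip",
        root_dir=directory.parent,   # paths in the archive are relative to this
        base_dir=directory.name,     # only this directory is archived
    )


# Placeholder directory standing in for the output of `write`.
with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "test"
    output.mkdir()
    (output / "example.txt").write_text("placeholder\n")
    archive = zip_directory(output)
    archive_name = Path(archive).name
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(archive_name, names)
```

For the directory written in this tutorial, the call would be `zip_directory("test")`.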