<br>
<p align="center">
<img src="./logo.png" alt= "MXgap logo" width=600>
</p>

<h1 style="text-align: center;"> MXgap: A Machine Learning Program to Predict MXene Bandgaps </h1>


`mxgap` is a computational tool designed to streamline electronic structure calculations for MXenes using hybrid functionals like PBE0. By employing Machine Learning (ML) models, `mxgap` predicts the PBE0 bandgap from features extracted from a PBE calculation. Besides its command-line interface (CLI), it can also be imported as a Python module. This notebook shows some examples.

## 1. Getting Started

Predictions can be made with either the `run_prediction()` or the `ML_prediction()` function. `run_prediction()` takes the same arguments as the CLI, validates the input, and then calls `ML_prediction()` internally, whereas `ML_prediction()` runs the prediction directly with the chosen ML model. Both return the prediction (or predictions, when a C+R model combination is chosen) in a list and write a report of the calculation to a file (`mxgap.info`) in the selected folder.

For example, to use the best model available (a combination of the GBC classifier and the RFR regressor):

In [1]:
from mxgap import run_prediction

path         = "."                  # Path to the folder where the CONTCAR and DOSCAR are present
model        = "GBC+RFR_onlygap"    # "best" or "default" can also be used to get the best model.
prediction   =  run_prediction(path = path, model = model)




                            MXgap Report                           

Date:            2025-02-10 21:47:50
Model Used:      GBC+RFR_onlygap
Folder Path:     .
CONTCAR file:    ./CONTCAR
DOSCAR file:     ./DOSCAR
Output Path:     ./mxgap.info

    
Predicted ML_isgap  =  1 (Semiconductor)
Predicted ML_gap    =  1.961

Finished successfully in 0.53s


The paths to the CONTCAR and DOSCAR files can also be given directly with the `files` argument:

In [2]:
from mxgap import run_prediction

files        = ["./CONTCAR","./DOSCAR"]     # List with the CONTCAR and DOSCAR files
model        = "GBC+RFR_onlygap"            # "best" or "default" can also be used to get the best model.
prediction   =  run_prediction(files = files, model = model)



                            MXgap Report                           

Date:            2025-02-10 21:47:54
Model Used:      GBC+RFR_onlygap
Folder Path:     None
CONTCAR file:    ./CONTCAR
DOSCAR file:     ./DOSCAR
Output Path:     ./mxgap.info

    
Predicted ML_isgap  =  1 (Semiconductor)
Predicted ML_gap    =  1.961

Finished successfully in 0.06s


The same can be done with the `ML_prediction()` function:

In [3]:
from mxgap import ML_prediction

contcar_path    = "./CONTCAR"            # Path to the CONTCAR file
doscar_path     = "./DOSCAR"             # Path to the DOSCAR file
model           = "GBC+RFR_onlygap"      # ML model
prediction      =  ML_prediction(contcar_path,doscar_path,model)

Predicted ML_isgap  =  1 (Semiconductor)
Predicted ML_gap    =  1.961
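
The returned list can be unpacked for further processing. The following is a minimal sketch, assuming that a C+R combination returns `[ML_isgap, ML_gap]` in that order, as the printed report suggests. `describe_prediction` is a hypothetical helper and not part of `mxgap`.

```python
def describe_prediction(prediction):
    """Turn a C+R prediction list into a small dictionary.

    Assumption: the list is ordered as [ML_isgap, ML_gap], matching the
    order in which the report prints them. This helper is illustrative
    and not part of mxgap.
    """
    if len(prediction) != 2:
        raise ValueError("expected [ML_isgap, ML_gap] from a C+R model")
    isgap, gap = prediction
    is_semiconductor = bool(isgap)
    return {
        "semiconductor": is_semiconductor,
        # A gap value is only kept when the classifier predicts a gap
        "gap": float(gap) if is_semiconductor else None,
    }


# Values taken from the report shown above
print(describe_prediction([1, 1.961]))
```

Storing `None` for metallic systems is a design choice of this sketch; it keeps a regressor value from being mistaken for a real gap when the classifier says there is none.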


Several models are available, both classifiers and regressors, and a classifier can be combined with a regressor. Generally, the models trained without DOS information (`_notDOS`) are faster and do not require the DOSCAR file, but their results are less accurate. We recommend the default model, `"GBC+RFR_onlygap"`, which combines a classifier (metallic/semiconductor) with a regressor (bandgap prediction). More information about the ML models is available in the `models/` folder.

In [4]:
from mxgap.utils import load_models_list

models_list, models_list_string = load_models_list()
print(models_list_string)

Classifiers	     Regressors (full)    Regressors (only gap) Regressors (edges)
GBC                  GBR                  GBR_onlygap          	GBR_edges           
RFC                  RFR                  RFR_onlygap          	RFR_edges           
SVC                  SVR                  SVR_onlygap          	SVR_edges           
MLPC                 MLPR                 MLPR_onlygap         	MLPR_edges          
LR                   KRR                  KRR_onlygap          	KRR_edges            
GBC_notDOS           GBR_notDOS           GBR_onlygap_notDOS   	GBR_edges_notDOS    
RFC_notDOS           RFR_notDOS           RFR_onlygap_notDOS   	RFR_edges_notDOS    
SVC_notDOS           SVR_notDOS           SVR_onlygap_notDOS   	SVR_edges_notDOS    
MLPC_notDOS          MLPR_notDOS          MLPR_onlygap_notDOS  	MLPR_edges_notDOS   
LR_notDOS            KRR_notDOS           KRR_onlygap_notDOS   	KRR_edges_notDOS    
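
The naming scheme in this table can be read programmatically. The sketch below is a hypothetical helper, not `mxgap`'s own parsing. It assumes that a `+` joins a classifier and a regressor and that the `_notDOS` suffix marks models that do not need the DOSCAR file. The `"best"` and `"default"` aliases are not handled.

```python
def parse_model_name(model):
    """Split a model string such as "GBC+RFR_onlygap" into its parts.

    Assumptions (illustrative, not mxgap's own parsing): a "+" joins a
    classifier and a regressor, and a "_notDOS" suffix marks models that
    do not need the DOSCAR file.
    """
    parts = model.split("+")
    if len(parts) > 2:
        raise ValueError(f"expected at most one '+' in {model!r}")
    return {
        "models": parts,
        "combined": len(parts) == 2,
        # The DOSCAR is needed if any part was trained with DOS features
        "needs_doscar": any(not p.endswith("_notDOS") for p in parts),
    }


print(parse_model_name("GBC+RFR_onlygap"))
print(parse_model_name("RFR_onlygap_notDOS"))
```

A check like `needs_doscar` can be useful before a batch run, to decide whether folders without a DOSCAR file can still be processed.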



## 2. Batch calculations

The program can be run in batch mode to quickly screen different MXenes. Here it is applied to the examples in the `test/examples/` folder, but you can use whatever paths you need:

In [None]:
import os
from mxgap import run_prediction

examples_folder = "../mxgap/test/examples/" 
paths   = [examples_folder + e for e in os.listdir(examples_folder)]

for mxene_path in paths:
    print(mxene_path)
    prediction = run_prediction(mxene_path)
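
For larger screenings it may help to skip folders that lack the input files and to keep the results in one place. The following is a minimal sketch under those assumptions. `find_mxene_folders` and `screen` are hypothetical helpers, not part of `mxgap`, and `predict` stands for any callable that takes a folder path.

```python
from pathlib import Path


def find_mxene_folders(root, required=("CONTCAR", "DOSCAR")):
    """Return the sorted subfolders of `root` that contain every required file."""
    root = Path(root)
    return sorted(
        folder
        for folder in root.iterdir()
        if folder.is_dir() and all((folder / name).is_file() for name in required)
    )


def screen(root, predict):
    """Run `predict(folder_path)` on each valid folder and collect the results.

    `predict` is any callable taking a folder path string. Results are
    stored by folder name; if a system fails, the exception is stored
    instead so that the remaining systems are still processed.
    """
    results = {}
    for folder in find_mxene_folders(root):
        try:
            results[folder.name] = predict(str(folder))
        except Exception as error:  # keep going if one system fails
            results[folder.name] = error
    return results
```

With `mxgap` installed, this could be called as `screen("../mxgap/test/examples/", lambda p: run_prediction(path=p))`, using the `path` keyword shown earlier.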


## 3. Feature extraction

If needed, you can easily extract the feature arrays that the ML models use to predict the bandgap:

In [5]:
from mxgap.features import get_contcar_array, get_doscar_array

# Non-normalized arrays from CONTCAR and DOSCAR files
contcar_array   = get_contcar_array("./CONTCAR")    # Elemental + structural features from the CONTCAR file
doscar_array    = get_doscar_array("./DOSCAR")      # DOS features extracted from the DOSCAR
print(contcar_array,doscar_array,sep="\n")

[ 1.          0.          0.          4.00767035  6.52237803  1.78952712
  1.78952712  2.92510078  2.92510078  3.9986449   3.9986449   2.74218817
  2.74218817 57.          3.          6.          1.1         0.5575462
  2.43        1.95        6.         14.          2.55        1.26211361
  1.7         0.7        17.         17.          3.          3.16
  3.61272528  1.75        1.        ]
[-2.74500000e+00 -2.10400000e+00  6.41000000e-01  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  1.61024000e+00  5.43692000e+00
  1.81104800e+01  9.98550000e+00  4.97735000e+00  1.30065220e+00
  3.16461800e+01  2.04537200e+01  2.33658200e+01  3.86112040e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0

The ML models actually receive a normalized version of these arrays, produced by the `make_data_array()` function, which takes care of everything:

In [6]:
from mxgap.features import make_data_array
from mxgap.utils import load_normalization

# Final feature array, the one that the model actually reads
norm_x_contcar, norm_x_doscar, norm_y = load_normalization()    # We need normalization constants
data_array = make_data_array("CONTCAR","DOSCAR",needDOS=True,norm_x_contcar=norm_x_contcar,norm_x_doscar=norm_x_doscar)
data_array
# The DOS part is actually not normalized, to preserve the different number of electrons between systems

array([ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  9.09631933e-01,
        2.51629672e-01,  5.91180702e-01,  5.91425345e-01,  7.28368395e-01,
        7.30668148e-01,  8.51341178e-01,  8.53155363e-01,  6.02698551e-01,
        6.33307511e-01,  6.79245283e-01,  0.00000000e+00,  1.00000000e+00,
       -1.05263158e-01,  5.72541818e-01,  1.42307692e+00,  1.33333333e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  1.00000000e+00,
        1.00000000e+00,  1.00000000e+00,  3.07692308e-01,  1.00000000e+00,
        5.00000000e-01,  5.63829787e-01,  1.00000000e+00,  6.77083333e-01,
        6.52173913e-01,  2.06422535e-01,  2.78647887e-01,  3.02930057e-01,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  1.61024000e+00,
        5.43692000e+00,  1.81104800e+01,  9.98550000e+00,  4.97735000e+00,
        1.30065220e+00,  3.16461800e+01,  2.04537200e+01,  2.33658200e+01,
        3.86112040e+00,  
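
To illustrate the idea of scaling one block of features while leaving another block raw, here is a minimal sketch that assumes min-max scaling. Whether `mxgap` uses exactly this form is not stated here; the real scaling is defined by the constants returned by `load_normalization()`. `normalize_prefix` is a hypothetical helper and the numbers are toy values.

```python
def normalize_prefix(features, mins, maxs):
    """Min-max scale the first len(mins) entries and leave the rest untouched.

    Assumption: min-max scaling, x' = (x - min) / (max - min), with bounds
    taken from some reference data. Illustration only; this is not
    mxgap's implementation.
    """
    if len(mins) != len(maxs) or len(mins) > len(features):
        raise ValueError("inconsistent normalization bounds")
    scaled = []
    for x, lo, hi in zip(features, mins, maxs):
        # Constant features (hi == lo) are mapped to 0 to avoid dividing by zero
        scaled.append((x - lo) / (hi - lo) if hi != lo else 0.0)
    # Entries beyond the bounds (e.g. raw DOS values) are passed through as-is
    return scaled + list(features[len(mins):])


# Toy numbers, not real mxgap features
print(normalize_prefix([4.0, 6.5, 18.1, 9.9], mins=[3.0, 4.5], maxs=[5.0, 8.5]))
```

With this kind of scaling, a feature that lies outside the reference range maps to a value below 0 or above 1.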

## 4. Structure

To handle all the structural information, a `Structure` class was created that inherits from `ase.Atoms`. It has all the properties of `ase.Atoms` plus some extra functionality designed for MXenes, such as getting the stacking and hollows, adding a termination to the surface, or getting the *M*, *X*, *T* positions or symbols separately. Here are some examples:

In [7]:
from mxgap.structure import Structure

structure = Structure("CONTCAR")

## Sets vacuum to M2X or M2XT2 structure.
structure.add_vacuum(vacuum=30)

## Shifts the slab a certain amount
structure.shift(3)

## Shifts to zero/origin all the atoms 
structure.to_zero()

## Separated M, X, T atoms
M_pos,X_pos,T_pos = structure.getMXT()                          # By positions
M_symbols,X_symbols,T_symbols =structure.getMXT(symbols=True)   # By symbols

## Get stacking and T hollow position
stack, hollows = structure.get_stack_hollows()

## Adds Termination to structure.
structure.addT("O",hollow="HX")
structure.addT("H",hollow="HX")

## Write as a new POSCAR file
structure.write("POSCAR_new","vasp",direct=True)

## Convert to FHI-AIMS geometry.in
structure.write("geometry.in","aims",scaled=True)

## Extracts geometry parameters (lattice parameter and width, with extra=True also bond distances, etc)
geom = structure.get_geom(extra=True)