<br>
<p align="center">
<img src="./logo.png" alt= "MXgap logo" width=600>
</p>

<h1 style="text-align: center;"> MXgap: A Machine Learning Program to Predict MXene Bandgaps </h1>


`mxgap` is a computational tool designed to streamline electronic structure calculations for MXenes using hybrid functionals like PBE0. By employing Machine Learning (ML) models, `mxgap` predicts the PBE0 bandgap from features extracted from a PBE calculation. Besides its command-line interface (CLI), it can be imported and used as a Python module. This notebook shows some examples.

## 1. Getting Started

Predictions can be made with either the `run_prediction()` or the `ML_prediction()` function. `run_prediction()` receives the same arguments as the CLI, validates the input, and then runs `ML_prediction()` internally, whereas `ML_prediction()` runs the prediction directly with the chosen ML model. Both return the prediction (or predictions, when a classifier+regressor (C+R) model combination is chosen) in a list, and write a report of the calculation to a file (`mxgap.info`) in the selected folder.

For example, to use the best model available (a combination of the GBC classifier and the RFR regressor):

In [1]:
from mxgap import run_prediction

path         = "."                  # Path to the folder where the CONTCAR and DOSCAR are present
model        = "GBC+RFR_onlygap"    # "best" or "default" can also be used to get the best model.
prediction   =  run_prediction(path = path, model = model)




                            MXgap Report                           

Date:            2024-10-16 10:44:57
Model Used:      GBC+RFR_onlygap
Folder Path:     .
CONTCAR file:    ./CONTCAR
DOSCAR file:     ./DOSCAR

    
Predicted ML_isgap  =  1 (Semiconductor)
Predicted ML_gap    =  1.961

Finished successfully in 0.56s


The paths to the CONTCAR and DOSCAR files can also be given directly with the `files` argument:

In [2]:
from mxgap import run_prediction

files        = ["./CONTCAR","./DOSCAR"]     # List with the CONTCAR and DOSCAR files
model        = "GBC+RFR_onlygap"            # "best" or "default" can also be used to get the best model.
prediction   =  run_prediction(files = files, model = model)



                            MXgap Report                           

Date:            2024-10-16 10:44:58
Model Used:      GBC+RFR_onlygap
Folder Path:     None
CONTCAR file:    ./CONTCAR
DOSCAR file:     ./DOSCAR

    
Predicted ML_isgap  =  1 (Semiconductor)
Predicted ML_gap    =  1.961

Finished successfully in 0.07s


And the same can be done with the `ML_prediction()` function:

In [3]:
from mxgap import ML_prediction

contcar_path    = "./CONTCAR"            # Path to the CONTCAR file
doscar_path     = "./DOSCAR"             # Path to the DOSCAR file
model           = "GBC+RFR_onlygap"      # ML model
prediction      =  ML_prediction(contcar_path,doscar_path,model)

Predicted ML_isgap  =  1 (Semiconductor)
Predicted ML_gap    =  1.961
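
Since both functions return a list, the result of a classifier+regressor only-gap combination can be unpacked into named values. The following is a minimal sketch: `unpack_cr_onlygap` is a hypothetical helper (not part of `mxgap`), and the order `[ML_isgap, ML_gap]` is an assumption based on the order in which the report prints the two values, so check it against your own output.

```python
def unpack_cr_onlygap(prediction):
    """Hypothetical helper: name the values of a C+R only-gap prediction.

    Assumes the list is ordered [ML_isgap, ML_gap], as printed in the report.
    """
    if len(prediction) != 2:
        raise ValueError(f"expected 2 values, got {len(prediction)}")
    is_gap, gap = prediction
    return {"is_semiconductor": bool(is_gap), "gap": float(gap)}


# Values copied by hand from the report above; in practice, pass the list
# returned by run_prediction() or ML_prediction().
summary = unpack_cr_onlygap([1, 1.961])
print(summary)
```

Models that return a different number of values (for example a single classifier) are rejected by this helper rather than guessed at.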


Several models are available, divided into classifiers and regressors, which can be combined. Generally, the models trained without DOS information (`_notDOS`) are faster and do not require the DOSCAR file, but their results are less accurate. We recommend the default model, `"GBC+RFR_onlygap"`, which combines a classifier (metallic/semiconductor) with a regressor (bandgap prediction). More information about the ML models can be found in the `models/` folder.

In [4]:
from mxgap.utils import load_models_list

models_list, models_list_string = load_models_list("../mxgap/models/MODELS_LIST.txt")
print(models_list_string)

Classifiers	     Regressors (full)    Regressors (only gap) Regressors (edges)
GBC                  GBR                  GBR_onlygap          	GBR_edges           
RFC                  RFR                  RFR_onlygap          	RFR_edges           
SVC                  SVR                  SVR_onlygap          	SVR_edges           
MLPC                 MLPR                 MLPR_onlygap         	MLPR_edges          
LR                   KRR                  KRR_onlygap          	KRR_edges            
GBC_notDOS           GBR_notDOS           GBR_onlygap_notDOS   	GBR_edges_notDOS    
RFC_notDOS           RFR_notDOS           RFR_onlygap_notDOS   	RFR_edges_notDOS    
SVC_notDOS           SVR_notDOS           SVR_onlygap_notDOS   	SVR_edges_notDOS    
MLPC_notDOS          MLPR_notDOS          MLPR_onlygap_notDOS  	MLPR_edges_notDOS   
LR_notDOS            KRR_notDOS           KRR_onlygap_notDOS   	KRR_edges_notDOS    
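
The naming convention in the list above (components joined by `+`, and a `_notDOS` suffix for models trained without DOS information) can be checked before a run to see whether a DOSCAR file will be needed. This is only a sketch of that convention: `parse_model_name` is a hypothetical helper, not part of `mxgap`, and it does not resolve the `"best"` or `"default"` aliases.

```python
def parse_model_name(model):
    """Hypothetical helper: split a model string and infer DOSCAR needs.

    Relies only on the naming convention of the models list:
    components are joined by "+", and "_notDOS" marks models
    that do not use DOS information.
    """
    parts = model.split("+")
    if len(parts) > 2 or not all(parts):
        raise ValueError(f"unexpected model string: {model!r}")
    return {
        "components": parts,
        "is_combination": len(parts) == 2,
        # A DOSCAR is needed as soon as one component uses DOS information.
        "needs_doscar": any(not p.endswith("_notDOS") for p in parts),
    }


for name in ("GBC+RFR_onlygap", "RFR_edges_notDOS"):
    print(name, parse_model_name(name))
```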



## 2. Batch calculations

The program can be run in batch mode to quickly screen different MXenes. Here it is applied to the examples in the `test/examples/` folder, but you can use whatever paths you need:

In [None]:
import os
from mxgap import run_prediction

examples_folder = "../mxgap/test/examples/"
# One path per entry in the examples folder
paths   = [os.path.join(examples_folder, e) for e in os.listdir(examples_folder)]

for mxene_path in paths:
    print(mxene_path)
    prediction = run_prediction(mxene_path)
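
When screening many systems, it can help to keep the predictions in a dictionary and to continue if one folder fails. The sketch below is one possible way to do this. `screen_folders` is a hypothetical helper, not part of `mxgap`; it takes the prediction function as an argument, and it assumes that a failed prediction raises an exception.

```python
import os


def screen_folders(root, predict):
    """Run `predict` on every subfolder of `root`.

    Returns (results, failures): two dicts keyed by subfolder name.
    """
    results, failures = {}, {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # skip stray files that are not calculation folders
        try:
            results[name] = predict(folder)
        except Exception as exc:  # keep going if one system fails
            failures[name] = str(exc)
    return results, failures
```

With `mxgap` installed, this could be called as `results, failures = screen_folders("../mxgap/test/examples/", run_prediction)`, passing the folder positionally as in the loop above.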
