## Error Correction for Spontaneous Magnetization

Identify and correct systematic biases in DFT-based simulations of spontaneous magnetization (Ms), using a combination of:

- Statistical modeling of error patterns
- AI-driven correction functions
- Data augmentation for missing experimental values
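As a minimal sketch of what "systematic bias" means here, the snippet below summarises the signed, absolute, and relative error between simulated and experimental Ms values. The function name `bias_summary` and the toy numbers are assumptions for illustration only; they are not part of the `src` package and not taken from the dataset.

```python
import numpy as np

def bias_summary(ms_sim, ms_exp):
    """Summarise how simulated Ms deviates from experimental Ms (hypothetical helper)."""
    ms_sim = np.asarray(ms_sim, dtype=float)
    ms_exp = np.asarray(ms_exp, dtype=float)
    err = ms_sim - ms_exp  # positive values mean the simulation overestimates
    return {
        "mean_signed_error": float(err.mean()),
        "mean_abs_error": float(np.abs(err).mean()),
        "mean_relative_error": float((err / ms_exp).mean()),
    }

# Synthetic values in A/m: the "simulation" overestimates by 25 % everywhere
ms_exp = np.array([1.0e5, 2.0e5, 4.0e5])
ms_sim = 1.25 * ms_exp
summary = bias_summary(ms_sim, ms_exp)
print(summary)
```

A mean signed error that is clearly non-zero while the relative error stays roughly constant would point to a multiplicative bias, which is the kind of pattern a correction function could target.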

In [1]:
import yaml

from src.ms_aux import *
from src.load_data import load_data
from src.combine_dataframes import combine_dataframes
from src.augment_data import augment_data_experimental
from src.data_pipeline import run_data_processing_pipeline

### Data Pre-Processing

In [2]:
path_to_config_file = 'configs/data_pipeline_ms.yml'

In [3]:
# Load the pipeline configuration from the YAML file
with open(path_to_config_file, "r") as f:
    config = yaml.safe_load(f)

In [4]:
run_data_processing_pipeline(config)

loading oqmd ...
loading literature ...
loading bhandari_i ...
loading bhandari_xii ...
loading bhandari_xiii ...
loading magnetic_materials_exp ...
loading magnetic_materials_sim ...
Loading done!
Number of rows with both experimental and simulation values: 6978
Combined DF saved to: data/merged_df_python.csv
Nr of simulated values: 157555
Nr of experimental values: 11996
Combined DF RE Materials: 76114
Combined DF RE-Free Materials: 86459
Shape combined dataframe from different data sources: 162573
------------ Remove small Ms values ------------
Shape DF after rm small Ms values: 58599
Nr RE Materials after rm small Ms values: 20367
Nr RE-Free Materials after rm small Ms values: 38232
------------ Create Pairwise Dataset without small Ms values ------------
Shape Pairwise DF: 4225
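The last two steps in the log (removing small Ms values, then building the pairwise dataset) can be sketched in pandas as follows. This is not the implementation inside `run_data_processing_pipeline`: the helper names, the threshold value, and the choice to filter on the simulated column are all assumptions. Only the column names are taken from the dataframe printed later in this notebook.

```python
import numpy as np
import pandas as pd

EXP_COL = "Ms (ampere/meter)_e"  # experimental Ms
SIM_COL = "Ms (ampere/meter)_s"  # simulated Ms

def remove_small_ms(df, threshold):
    """Keep rows whose simulated Ms is at least `threshold` (assumed criterion)."""
    return df[df[SIM_COL] >= threshold].reset_index(drop=True)

def make_pairwise(df):
    """Keep only rows that have both an experimental and a simulated value."""
    return df.dropna(subset=[EXP_COL, SIM_COL]).reset_index(drop=True)

# Synthetic rows with placeholder compositions, not real materials
toy = pd.DataFrame({
    "composition": ["A", "B", "C", "D"],
    EXP_COL: [np.nan, 2.0e5, 9.0e5, 50.0],
    SIM_COL: [1.0e5, 3.0e5, 1.0e6, 10.0],
})

filtered = remove_small_ms(toy, threshold=1.0e3)  # hypothetical threshold in A/m
pairs = make_pairwise(filtered)
print(len(filtered), len(pairs))
```

In the toy frame, row D is dropped by the threshold and row A is dropped from the pairwise set because it has no experimental value.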


### Augment Data

In [6]:
augment_data_experimental(config)

------ Augment experimental values ------
Pairs All Materials: 4225
Pair RE Materials 896
Pair RE-Free Materials 3329
-------------------------------------------------------------------------
Before mock data generation, exp-val NaNs in ALL: 50444
Before mock data generation, exp-val NaNs in RE: 18501
Before mock data generation, exp-val NaNs in RE-Free: 31943
-------------------------------------------------------------------------
Number of measured experimental values before mock data generation:
ALL: 8155
RE: 1866
RE-Free: 0
-------------------------------------------------------------------------
Number of duplicate rows: 0
Shape df_augmented: 54669
RE-Materials after augmentation: 19397
RE-Free Materials after augmentation: 35272
<bound method NDFrame.head of       composition  Ms (ampere/meter)_e  Ms (ampere/meter)_s  material_id  \
0       Ac(CoO2)2                  NaN        104130.562861           88   
1       Ac(CuO2)2                  NaN        300206.820820          360
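
One plausible way to generate mock experimental values for rows where `Ms (ampere/meter)_e` is `NaN` is to fit a simple relation on the paired rows and apply it to the unpaired ones. The sketch below uses a linear least-squares fit for that purpose. This is an assumption for illustration; the log does not say which method `augment_data_experimental` uses, and `augment_experimental` and the `is_mock` flag are hypothetical names.

```python
import numpy as np
import pandas as pd

EXP_COL = "Ms (ampere/meter)_e"
SIM_COL = "Ms (ampere/meter)_s"

def augment_experimental(df):
    """Fill missing experimental Ms from simulated Ms via a linear fit (illustrative only)."""
    out = df.copy()
    paired = out.dropna(subset=[EXP_COL, SIM_COL])
    # Fit exp ≈ slope * sim + intercept on rows that have both values
    slope, intercept = np.polyfit(
        paired[SIM_COL].to_numpy(), paired[EXP_COL].to_numpy(), deg=1
    )
    missing = out[EXP_COL].isna()
    out["is_mock"] = missing  # mark generated values so they can be told apart later
    out.loc[missing, EXP_COL] = slope * out.loc[missing, SIM_COL] + intercept
    return out

# Synthetic values in A/m: experiment is 80 % of simulation, one value missing
toy = pd.DataFrame({
    SIM_COL: [1.0e5, 2.0e5, 3.0e5, 4.0e5],
    EXP_COL: [0.8e5, 1.6e5, np.nan, 3.2e5],
})
augmented = augment_experimental(toy)
print(augmented)
```

Keeping a flag such as `is_mock` makes it possible to exclude generated values when evaluating a model against measured data.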

### Model Training