# Machine Learning Approaches for Magnetic Characterization
### Two-dimensional magnetic materials
Trevor David Rhone, Rensselaer Polytechnic Institute

The associated video tutorial is on YouTube:
https://www.youtube.com/watch?v=yiyFQNWs2F4

In [None]:
# import python modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

In [None]:
# import all machine learning functions
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

In [None]:
# global variables
TEST_SIZE = 0.2
RANDOM_STATE = 42

### Download the 2D magnetic materials formation energy data set

Download the data from:
https://archive.materialscloud.org/record/2019.0020/v1

A description of the data and the corresponding study can be found here:
https://www.nature.com/articles/s41598-020-72811-z

- Save the file to your Google Drive (if using Colab) or to your local drive (if using Jupyter Notebook).
- You can also get the file from GitHub: https://github.com/trevorguru/materials_informatics_tutorial

Verify that your drive is mounted (Colab) and check the path to the CSV file. Change the path below as needed.

Open and load "magneticmoment_Ef_data.csv" using pandas.
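
If you are unsure where the file ended up, a small helper that tries several locations and fails with a readable message can save some debugging. This is a minimal sketch: `load_first_existing` and `CANDIDATE_PATHS` are hypothetical names, and the paths are assumptions (a local clone of the repository, and the default Colab mount point `/content/drive/MyDrive`).

```python
from pathlib import Path

import pandas as pd

# Illustrative candidate locations (assumptions): a local clone of the
# repository and a Google Drive mounted in Colab at /content/drive.
CANDIDATE_PATHS = [
    "../magnetic_materials_2d/data/magneticmoment_Ef_data.csv",
    "/content/drive/MyDrive/magneticmoment_Ef_data.csv",
]


def load_first_existing(paths):
    """Read the first CSV in `paths` that exists; raise a clear error otherwise."""
    for path in map(Path, paths):
        if path.is_file():
            return pd.read_csv(path)
    raise FileNotFoundError(
        "magneticmoment_Ef_data.csv not found; checked: " + ", ".join(map(str, paths))
    )
```

Usage would be `df = load_first_existing(CANDIDATE_PATHS)` in place of the direct `pd.read_csv` call below.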

In [None]:
df = pd.read_csv("../magnetic_materials_2d/data/magneticmoment_Ef_data.csv")

In [None]:
df.columns

In [None]:
df.head(n = 6)

In [None]:
from magnetic_materials_2d.data.dictionaries import column_meaning_map, formation_energy_map, magnetic_moment_map

In [None]:
# extract only the numeric descriptors
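numeric_df = df.select_dtypes(include=['float64', 'int64'])
# drop the leftover index column that pandas wrote when the CSV was saved;
# reassigning (rather than inplace=True) avoids modifying a possible copy of df
numeric_df = numeric_df.drop(columns = ["Unnamed: 0"])
print("There are", len(numeric_df.columns), "numeric descriptors.")
numeric_df.head(n = 6)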

In [None]:
from magnetic_materials_2d.utils import sorted_descriptors, top12

In [None]:
formation_energy_descriptors_linear = sorted_descriptors(numeric_df,
                                                         formation_energy_map["label"],
                                                         LinearRegression())
top12(formation_energy_descriptors_linear, column_meaning_map, "formation energy", "linear")
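
The source of `magnetic_materials_2d.utils` is not reproduced in this notebook. One plausible reading of `sorted_descriptors` is that it scores every descriptor on its own and sorts them from most to least predictive. The sketch below implements that idea under this assumption, using the mean 5-fold cross-validated $R^2$ of a model trained on a single column; `rank_single_descriptors` and its `drop` argument (for excluding, e.g., the other target column) are hypothetical, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def rank_single_descriptors(df, target, model, drop=(), cv=5):
    """Rank descriptors by the mean CV R^2 of `model` trained on each one alone.

    Hypothetical stand-in for sorted_descriptors; `drop` lists columns to
    ignore, such as a second target that should not act as a descriptor.
    """
    y = df[target]
    scores = {}
    for col in df.columns.drop([target, *drop]):
        scores[col] = cross_val_score(model, df[[col]], y, cv=cv, scoring="r2").mean()
    return sorted(scores, key=scores.get, reverse=True)  # best descriptor first


# Synthetic demo: only x_signal carries information about the target.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"x_signal": rng.normal(size=200), "x_noise": rng.normal(size=200)})
demo["E_f"] = 3.0 * demo["x_signal"] + 0.1 * rng.normal(size=200)
rank = rank_single_descriptors(demo, "E_f", LinearRegression())
print(rank)
```

Cross-validation is used here rather than the training score so that flexible models such as random forests are not rewarded for memorizing a single noisy column.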

In [None]:
formation_energy_descriptors_rf = sorted_descriptors(numeric_df,
                                                     formation_energy_map["label"],
                                                     RandomForestRegressor(random_state = RANDOM_STATE))
top12(formation_energy_descriptors_rf, column_meaning_map, "formation energy", "random forest")

In [None]:
formation_energy_descriptors_et = sorted_descriptors(numeric_df,
                                                     formation_energy_map["label"],
                                                     ExtraTreesRegressor(random_state = RANDOM_STATE))
top12(formation_energy_descriptors_et, column_meaning_map, "formation energy", "extra trees")

In [None]:
magnetic_moment_descriptors_linear = sorted_descriptors(numeric_df,
                                                        magnetic_moment_map["label"],
                                                        LinearRegression())
top12(magnetic_moment_descriptors_linear, column_meaning_map, "magnetic moment", "linear")

In [None]:
magnetic_moment_descriptors_rf = sorted_descriptors(numeric_df,
                                                    magnetic_moment_map["label"],
                                                    RandomForestRegressor(random_state = RANDOM_STATE))
top12(magnetic_moment_descriptors_rf, column_meaning_map, "magnetic moment", "random forest")

In [None]:
magnetic_moment_descriptors_et = sorted_descriptors(numeric_df,
                                                    magnetic_moment_map["label"],
                                                    ExtraTreesRegressor(random_state = RANDOM_STATE))
top12(magnetic_moment_descriptors_et, column_meaning_map, "magnetic moment", "extra trees")

In [None]:
from magnetic_materials_2d.utils import best_descriptors, print_best_descriptors

In [None]:
best_formation_energy_descriptors_linear = best_descriptors(numeric_df,
                                                            formation_energy_descriptors_linear,
                                                            LinearRegression(),
                                                            formation_energy_map["label"])
print_best_descriptors(best_formation_energy_descriptors_linear,
                       column_meaning_map,
                       formation_energy_map["label"],
                       "linear regression")
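
`best_descriptors` takes a ranked descriptor list and a model, which suggests (this is an assumption, since its source is not shown) a greedy forward selection: walk down the ranking and keep a descriptor only if adding it improves the model. A minimal sketch of that technique, with a hypothetical `forward_select` function, a tolerance `tol` to ignore negligible gains, and synthetic data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def forward_select(df, ranked, target, model, cv=5, tol=1e-3):
    """Greedy forward selection over a pre-ranked descriptor list.

    A descriptor is kept only if it raises the mean CV R^2 by more than `tol`.
    Returns the kept descriptors and the best score reached.
    """
    y = df[target]
    chosen, best = [], -np.inf
    for col in ranked:
        score = cross_val_score(model, df[chosen + [col]], y, cv=cv, scoring="r2").mean()
        if score > best + tol:
            chosen.append(col)
            best = score
    return chosen, best


# Synthetic demo: y depends on a and b; "noise" should be rejected.
rng = np.random.default_rng(0)
demo = pd.DataFrame({name: rng.normal(size=200) for name in ["a", "b", "noise"]})
demo["y"] = 2.0 * demo["a"] + 1.0 * demo["b"] + 0.1 * rng.normal(size=200)
chosen, score = forward_select(demo, ["a", "b", "noise"], "y", LinearRegression())
print(chosen, round(score, 3))
```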

In [None]:
best_formation_energy_descriptors_rf = best_descriptors(numeric_df,
                                                        formation_energy_descriptors_rf,
                                                        RandomForestRegressor(random_state = RANDOM_STATE),
                                                        formation_energy_map["label"])
print_best_descriptors(best_formation_energy_descriptors_rf,
                       column_meaning_map,
                       formation_energy_map["label"],
                       "random forest")

In [None]:
best_formation_energy_descriptors_et = best_descriptors(numeric_df,
                                                        formation_energy_descriptors_et,
                                                        ExtraTreesRegressor(random_state = RANDOM_STATE),
                                                        formation_energy_map["label"])
print_best_descriptors(best_formation_energy_descriptors_et,
                       column_meaning_map,
                       formation_energy_map["label"],
                       "extra trees")

In [None]:
best_magnetic_moment_descriptors_linear = best_descriptors(numeric_df,
                                                           magnetic_moment_descriptors_linear,
                                                           LinearRegression(),
                                                           magnetic_moment_map["label"])
print_best_descriptors(best_magnetic_moment_descriptors_linear,
                       column_meaning_map,
                       magnetic_moment_map["label"],
                       "linear regression")

In [None]:
best_magnetic_moment_descriptors_rf = best_descriptors(numeric_df,
                                                       magnetic_moment_descriptors_rf,
                                                       RandomForestRegressor(random_state = RANDOM_STATE),
                                                       magnetic_moment_map["label"])
print_best_descriptors(best_magnetic_moment_descriptors_rf,
                       column_meaning_map,
                       magnetic_moment_map["label"],
                       "random forest")

In [None]:
best_magnetic_moment_descriptors_et = best_descriptors(numeric_df,
                                                       magnetic_moment_descriptors_et,
                                                       ExtraTreesRegressor(random_state = RANDOM_STATE),
                                                       magnetic_moment_map["label"])
print_best_descriptors(best_magnetic_moment_descriptors_et,
                       column_meaning_map,
                       magnetic_moment_map["label"],
                       "extra trees")

In [None]:
from magnetic_materials_2d.utils import important_descriptors

In [None]:
important_descriptors_formation_energy_rf = important_descriptors(numeric_df,
                                                                  formation_energy_map["label"],
                                                                  RandomForestRegressor(random_state = RANDOM_STATE))

In [None]:
important_descriptors_formation_energy_et = important_descriptors(numeric_df,
                                                                  formation_energy_map["label"],
                                                                  ExtraTreesRegressor(random_state = RANDOM_STATE))

In [None]:
important_descriptors_magnetic_moment_rf = important_descriptors(numeric_df,
                                                                 magnetic_moment_map["label"],
                                                                 RandomForestRegressor(random_state = RANDOM_STATE))

In [None]:
important_descriptors_magnetic_moment_et = important_descriptors(numeric_df,
                                                                 magnetic_moment_map["label"],
                                                                 ExtraTreesRegressor(random_state = RANDOM_STATE))
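
Tree ensembles expose a built-in ranking through `feature_importances_`, which is presumably what `important_descriptors` relies on (an assumption; its source is not shown). The sketch below fits the ensemble on all descriptors and sorts the impurity-based importances; `rank_by_importance` is a hypothetical name and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_by_importance(df, target, model, drop=()):
    """Fit a tree ensemble on all descriptors and return their impurity-based
    importances as a Series sorted from most to least important."""
    X = df.drop(columns=[target, *drop])
    model.fit(X, df[target])
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


# Synthetic demo: a dominates, b matters a little, c is pure noise.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(300, 3)), columns=["a", "b", "c"])
demo["y"] = 5.0 * demo["a"] + 0.5 * demo["b"] + 0.1 * rng.normal(size=300)
imp = rank_by_importance(demo, "y", RandomForestRegressor(n_estimators=100, random_state=42))
print(imp)
```

Impurity-based importances are computed on the training data and can favor descriptors with many distinct values; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.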

In [None]:
from magnetic_materials_2d.utils import optimum_importance

In [None]:
optimum_importance_descriptors_formation_energy_rf = optimum_importance(numeric_df,
                                                                        important_descriptors_formation_energy_rf,
                                                                        RandomForestRegressor(random_state = RANDOM_STATE),
                                                                        formation_energy_map["label"])
print_best_descriptors(optimum_importance_descriptors_formation_energy_rf,
                       column_meaning_map,
                       formation_energy_map["label"],
                       "random forest")

In [None]:
optimum_importance_descriptors_formation_energy_et = optimum_importance(numeric_df,
                                                                        important_descriptors_formation_energy_et,
                                                                        ExtraTreesRegressor(random_state = RANDOM_STATE),
                                                                        formation_energy_map["label"])
print_best_descriptors(optimum_importance_descriptors_formation_energy_et,
                       column_meaning_map,
                       formation_energy_map["label"],
                       "extra trees")

In [None]:
optimum_importance_descriptors_magnetic_moment_rf = optimum_importance(numeric_df,
                                                                        important_descriptors_magnetic_moment_rf,
                                                                        RandomForestRegressor(random_state = RANDOM_STATE),
                                                                        magnetic_moment_map["label"])
print_best_descriptors(optimum_importance_descriptors_magnetic_moment_rf,
                       column_meaning_map,
                       magnetic_moment_map["label"],
                       "random forest")

In [None]:
optimum_importance_descriptors_magnetic_moment_et = optimum_importance(numeric_df,
                                                                        important_descriptors_magnetic_moment_et,
                                                                        ExtraTreesRegressor(random_state = RANDOM_STATE),
                                                                        magnetic_moment_map["label"])
print_best_descriptors(optimum_importance_descriptors_magnetic_moment_et,
                       column_meaning_map,
                       magnetic_moment_map["label"],
                       "extra trees")
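
Given an importance ranking, a natural way to pick "the optimum" subset, and possibly what `optimum_importance` does (an assumption), is to score the top-$k$ descriptors for every $k$ and keep the $k$ with the best cross-validated score. A sketch of that approach with a hypothetical `best_top_k` function on synthetic data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def best_top_k(df, ranked, target, model, cv=5):
    """Score the top-k ranked descriptors for k = 1..len(ranked) and return
    the best-scoring prefix together with all scores."""
    y = df[target]
    scores = [
        cross_val_score(model, df[list(ranked[:k])], y, cv=cv, scoring="r2").mean()
        for k in range(1, len(ranked) + 1)
    ]
    k_best = int(np.argmax(scores)) + 1
    return list(ranked[:k_best]), scores


# Synthetic demo: y depends equally on a and b; n is noise.
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "n"])
demo["y"] = demo["a"] + demo["b"] + 0.1 * rng.normal(size=200)
subset, scores = best_top_k(demo, ["a", "b", "n"], "y",
                            RandomForestRegressor(n_estimators=50, random_state=42))
print(subset, np.round(scores, 3))
```

Unlike forward selection, this only ever considers prefixes of the ranking, so it costs one model evaluation per $k$.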

## Model creation and prediction
### Linear regression, random forest, and extra trees

In [None]:
from magnetic_materials_2d.utils import print_loss, single_descriptor_regression

In [None]:
# best_descriptor = best_formation_energy_descriptors_linear[0]
# single_descriptor_regression(numeric_df, best_descriptor, column_meaning_map,
#                              formation_energy_map["unit"],
#                              formation_energy_map["label"],
#                              LinearRegression())

In [None]:
# best_descriptor = best_formation_energy_descriptors_rf[0]
# single_descriptor_regression(numeric_df, best_descriptor, column_meaning_map,
#                              formation_energy_map["unit"],
#                              formation_energy_map["label"],
#                              RandomForestRegressor(random_state = RANDOM_STATE))

In [None]:
# best_descriptor = best_formation_energy_descriptors_et[0]
# single_descriptor_regression(numeric_df, best_descriptor, column_meaning_map,
#                              formation_energy_map["unit"],
#                              formation_energy_map["label"],
#                              ExtraTreesRegressor(random_state = RANDOM_STATE))

In [None]:
# best_descriptor = best_magnetic_moment_descriptors_linear[0]
# single_descriptor_regression(numeric_df, best_descriptor, column_meaning_map,
#                              magnetic_moment_map["unit"],
#                              magnetic_moment_map["label"],
#                              LinearRegression())

In [None]:
# best_descriptor = best_magnetic_moment_descriptors_rf[0]
# single_descriptor_regression(numeric_df, best_descriptor, column_meaning_map,
#                              magnetic_moment_map["unit"],
#                              magnetic_moment_map["label"],
#                              RandomForestRegressor(random_state = RANDOM_STATE))

In [None]:
# best_descriptor = best_magnetic_moment_descriptors_et[0]
# single_descriptor_regression(numeric_df, best_descriptor, column_meaning_map,
#                              magnetic_moment_map["unit"],
#                              magnetic_moment_map["label"],
#                              ExtraTreesRegressor(random_state = RANDOM_STATE))
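
The `single_descriptor_regression` cells above are commented out. If you want a quick visual check of how well one descriptor explains a target, a sketch like the following fits a model on a single column and overlays its predictions on the data. `plot_single_descriptor_fit` is hypothetical and is not the package's implementation; the demo data and the "eV" unit are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def plot_single_descriptor_fit(df, descriptor, target, model, unit=""):
    """Fit `model` on one descriptor and plot its predictions over the data."""
    X, y = df[[descriptor]], df[target]
    model.fit(X, y)
    # Evaluate on a dense, sorted grid (same column name) so the curve is smooth
    xs = np.linspace(X[descriptor].min(), X[descriptor].max(), 200)
    grid = pd.DataFrame({descriptor: xs})
    fig, ax = plt.subplots()
    ax.scatter(X[descriptor], y, s=10, alpha=0.6, label="data")
    ax.plot(xs, model.predict(grid), color="k", label=type(model).__name__)
    ax.set_xlabel(descriptor)
    ax.set_ylabel(f"{target} ({unit})" if unit else target)
    ax.legend()
    return fig, ax


# Synthetic demo
rng = np.random.default_rng(0)
demo = pd.DataFrame({"x": rng.uniform(0, 1, 100)})
demo["y"] = 2.0 * demo["x"] + 0.1 * rng.normal(size=100)
fig, ax = plot_single_descriptor_fit(demo, "x", "y", LinearRegression(), unit="eV")
```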

TASK #4
- Use X_train to train a linear model
- Generate predictions for both X_train and X_test
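
A minimal sketch of Task #4, using the `TEST_SIZE` and `RANDOM_STATE` values defined at the top of the notebook. The synthetic `data` frame and its column names are placeholders; in the notebook you would take `X` from your chosen descriptors in `numeric_df` and `y` from the target column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

TEST_SIZE = 0.2
RANDOM_STATE = 42

# Synthetic stand-in for numeric_df (placeholder descriptors d1..d3)
rng = np.random.default_rng(RANDOM_STATE)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["d1", "d2", "d3"])
data["target"] = 1.5 * data["d1"] - 0.5 * data["d2"] + 0.05 * rng.normal(size=100)

X = data[["d1", "d2", "d3"]]
y = data["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE
)

model = LinearRegression().fit(X_train, y_train)  # train on X_train only
y_train_pred = model.predict(X_train)              # in-sample predictions
y_test_pred = model.predict(X_test)                # held-out predictions
```

Comparing errors on `y_train_pred` and `y_test_pred` is what reveals overfitting: a model that does much better on the training set than on the test set has memorized rather than generalized.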

In [None]:
from magnetic_materials_2d.utils import test_performance

In [None]:
test_performance(numeric_df,
                 best_formation_energy_descriptors_linear,
                 formation_energy_map["unit"],
                 formation_energy_map["label"],
                 LinearRegression())

In [None]:
test_performance(numeric_df,
                 best_formation_energy_descriptors_rf,
                 formation_energy_map["unit"],
                 formation_energy_map["label"],
                 RandomForestRegressor(random_state = RANDOM_STATE))

In [None]:
test_performance(numeric_df,
                 best_formation_energy_descriptors_et,
                 formation_energy_map["unit"],
                 formation_energy_map["label"],
                 ExtraTreesRegressor(random_state = RANDOM_STATE))

In [None]:
test_performance(numeric_df,
                 best_magnetic_moment_descriptors_linear,
                 magnetic_moment_map["unit"],
                 magnetic_moment_map["label"],
                 LinearRegression())

In [None]:
test_performance(numeric_df,
                 best_magnetic_moment_descriptors_rf,
                 magnetic_moment_map["unit"],
                 magnetic_moment_map["label"],
                 RandomForestRegressor(random_state = RANDOM_STATE))

In [None]:
test_performance(numeric_df,
                 best_magnetic_moment_descriptors_et,
                 magnetic_moment_map["unit"],
                 magnetic_moment_map["label"],
                 ExtraTreesRegressor(random_state = RANDOM_STATE))
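
The cells above evaluate each model separately. To compare them side by side on exactly the same split, you can tabulate common regression metrics in one table. This is a sketch under the assumption that MAE, RMSE, and $R^2$ are the quantities of interest (not necessarily what `test_performance` prints); `compare_models` is hypothetical and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, models, test_size=0.2, random_state=42):
    """Fit every model on one shared train/test split and tabulate test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    rows = []
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        rows.append({
            "model": name,
            "MAE": mean_absolute_error(y_test, pred),
            "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
            "R2": r2_score(y_test, pred),
        })
    return pd.DataFrame(rows).set_index("model")


# Synthetic demo
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["d1", "d2", "d3"])
y = 1.5 * X["d1"] - 0.5 * X["d2"] + 0.05 * rng.normal(size=200)
results = compare_models(X, y, {
    "linear": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "extra trees": ExtraTreesRegressor(n_estimators=50, random_state=42),
})
print(results)
```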

Notice that RandomForestRegressor() has more than one hyperparameter.
- Do a two-dimensional grid search (for example over max_depth and n_estimators) instead of a one-dimensional grid search. Choose an appropriate range of values for each hyperparameter.
- Display your results using plt.imshow()
- Determine the best combination of hyperparameters
- Create a model using the best combination of hyperparameters
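
A minimal sketch of the two-dimensional grid search over `max_depth` and `n_estimators`, scoring each pair by mean cross-validated $R^2$ and showing the score grid with `plt.imshow`. The function names, grid values, and synthetic data are assumptions for illustration; `magnetic_materials_2d.hyper_search.best_hyperparameters` may search differently.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def grid_search_2d(X, y, max_depths, n_estimators_list, cv=5, random_state=42):
    """Mean CV R^2 for every (max_depth, n_estimators) pair, plus the best pair."""
    scores = np.empty((len(max_depths), len(n_estimators_list)))
    for i, depth in enumerate(max_depths):
        for j, n_est in enumerate(n_estimators_list):
            model = RandomForestRegressor(max_depth=depth, n_estimators=n_est,
                                          random_state=random_state)
            scores[i, j] = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    i_best, j_best = np.unravel_index(np.argmax(scores), scores.shape)
    return scores, max_depths[i_best], n_estimators_list[j_best]


def show_grid(scores, max_depths, n_estimators_list):
    """Heat map of the score grid: rows are max_depth, columns n_estimators."""
    fig, ax = plt.subplots()
    im = ax.imshow(scores, origin="lower", cmap="viridis", aspect="auto")
    ax.set_xticks(range(len(n_estimators_list)))
    ax.set_xticklabels([str(n) for n in n_estimators_list])
    ax.set_yticks(range(len(max_depths)))
    ax.set_yticklabels([str(d) for d in max_depths])
    ax.set_xlabel("n_estimators")
    ax.set_ylabel("max_depth")
    fig.colorbar(im, ax=ax, label="mean CV $R^2$")
    return fig, ax


# Synthetic demo with a small grid
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=150)
depths, n_list = [2, 4, 8], [10, 50]
scores, best_depth, best_n = grid_search_2d(X, y, depths, n_list, cv=3)
fig, ax = show_grid(scores, depths, n_list)
best_model = RandomForestRegressor(max_depth=best_depth, n_estimators=best_n, random_state=42)
```

scikit-learn's `GridSearchCV` with `param_grid={"max_depth": ..., "n_estimators": ...}` performs the same search; the explicit loop just makes the 2D score array easy to pass to `imshow`.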

TASK #7
- Plot the DFT formation energy versus the machine-learning-predicted formation energy for both the training set and the test set
  - Use the machine learning model (and hyperparameters) with the best performance
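
A sketch of the parity plot requested in Task #7: DFT values on the x-axis, model predictions on the y-axis, training and test sets in different colors, and a $y = x$ reference line on which perfect predictions would fall. `parity_plot` is a hypothetical helper, the demo arrays are synthetic, and "eV" is a placeholder; in the notebook, pass `formation_energy_map["unit"]` instead.

```python
import matplotlib.pyplot as plt
import numpy as np


def parity_plot(y_train, y_train_pred, y_test, y_test_pred, label, unit):
    """Scatter DFT vs. predicted values for train and test with a y = x line."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_train, y_train_pred, s=12, alpha=0.6, label="train")
    ax.scatter(y_test, y_test_pred, s=12, alpha=0.8, label="test")
    # Reference line spanning the full range of true and predicted values
    all_vals = np.concatenate([np.asarray(v) for v in (y_train, y_train_pred, y_test, y_test_pred)])
    lo, hi = all_vals.min(), all_vals.max()
    ax.plot([lo, hi], [lo, hi], "k--", lw=1, label="y = x")
    ax.set_xlabel(f"DFT {label} ({unit})")
    ax.set_ylabel(f"ML predicted {label} ({unit})")
    ax.set_aspect("equal")
    ax.legend()
    return fig, ax


# Synthetic demo
rng = np.random.default_rng(0)
y_tr, y_te = rng.normal(size=80), rng.normal(size=20)
fig, ax = parity_plot(y_tr, y_tr + 0.1 * rng.normal(size=80),
                      y_te, y_te + 0.2 * rng.normal(size=20),
                      label="formation energy", unit="eV")
```

Points scattering further from the line for the test set than for the training set is the visual signature of overfitting.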

In [None]:
from magnetic_materials_2d.hyper_search import best_hyperparameters

In [None]:
max_depth, n_estimators = best_hyperparameters(numeric_df,
                                               best_formation_energy_descriptors_rf,
                                               formation_energy_map["label"])
test_performance(numeric_df,
                 best_formation_energy_descriptors_rf,
                 formation_energy_map["unit"],
                 formation_energy_map["label"],
                 RandomForestRegressor(random_state = RANDOM_STATE,
                                       max_depth = max_depth,
                                       n_estimators = n_estimators))

In [None]:
max_depth, n_estimators = best_hyperparameters(numeric_df,
                                               best_magnetic_moment_descriptors_rf,
                                               magnetic_moment_map["label"])
test_performance(numeric_df,
                 best_magnetic_moment_descriptors_rf,
                 magnetic_moment_map["unit"],
                 magnetic_moment_map["label"],
                 RandomForestRegressor(random_state = RANDOM_STATE,
                                       max_depth = max_depth,
                                       n_estimators = n_estimators))

=====================================================================================

CONGRATULATIONS!!! 👏

You've completed the exercises and are well on your way to becoming an expert in materials informatics.