**Artificial Intelligence-Aided Analysis of Hydrogen-Based Monoclinic Structures and Modeling of Structure-Property Relationships**

**1. Data Collection Steps**

Göktuğ USTA & Sedef KORKMAZ | Izmir Democracy University - Electrical and Electronics Engineering Department - 2025

goktugustaa@gmail.com 

sedefkorkmaz67@hotmail.com

Data were retrieved through the Materials Project API (https://next-gen.materialsproject.org/).

*Special thanks to Assoc. Prof. Dr. Selgin AL (https://scholar.google.com/citations?hl=tr&user=TXOSXvoAAAAJ&view_op=list_works&sortby=pubdate) and Assoc. Prof. Dr. Ahmet İYİGÖR (https://scholar.google.com/citations?user=5PJlG-AAAAAJ&hl=tr) for their valuable advice and guidance during the preparation of this paper. Their suggestions were essential to the completion of this project.*

*Many thanks also to Patrick Huck (https://scholar.google.com/citations?user=1iJjyrYAAAAJ&hl=en) for his help.*

@article{Jain2013,
  title = {Commentary: The Materials Project: A materials genome approach to accelerating materials innovation},
  author = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin A.},
  journal = {APL Materials},
  year = {2013},
  volume = {1},
  issue = {1},
  pages = {011002},
  doi = {10.1063/1.4812323},
  url = {https://dx.doi.org/10.1063/1.4812323}
}

**Data Collection with a Materials Project API Key**

If you have trouble finding your API key, see https://next-gen.materialsproject.org/api

*DO NOT SHARE YOUR API KEY WITH ANYONE.*

In [14]:
# --- IMPORT LIBRARIES --- #
import getpass
from mp_api.client import MPRester
import pandas as pd
import numpy as np

# --- READ YOUR API KEY --- #
# getpass hides the key while you type; it only reads the key and does not contact the API.
try:
    api_key = getpass.getpass(prompt="Materials Project API Key: ")
    print("Successfully read your API key!")
except Exception as exc:
    # Report the exception instance that was raised, not the Exception class itself.
    print(f"Error: {exc}")

Successfully read your API key!
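For non-interactive runs (scripts or scheduled jobs), the key could be read from an environment variable first, with the hidden prompt as a fallback. The following is a minimal sketch and not part of the original notebook: the helper name `load_api_key` is hypothetical, and the variable name `MP_API_KEY` is an assumption that should be adjusted to your own setup.

```python
import getpass
import os


def load_api_key(env_var="MP_API_KEY"):
    """Return the API key from the environment, or prompt for it.

    `env_var` is an assumed variable name; change it to match your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to a hidden interactive prompt, as in the cell above.
        key = getpass.getpass(prompt="Materials Project API Key: ").strip()
    if not key:
        raise ValueError("No API key provided.")
    return key
```

Keeping the key in the environment rather than in the notebook also means it is not saved with the file when the notebook is shared.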


In [15]:
# --- DEFINE TARGET FIELDS TO FETCH --- #
FIELDS_TO_FETCH = [
    'material_id',
    'formula_pretty',
    'formula_anonymous',
    'elements',
    'structure',
    'chemsys',
    'volume',
    'density',
    'band_gap',
    'total_magnetization',
    'universal_anisotropy',
    'symmetry'
]
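When a field list is assembled by hand, repeated names are easy to introduce. A small sketch of removing repeats while keeping the original order is shown here; the helper name `dedupe_preserving_order` and the short example list are illustrative only.

```python
def dedupe_preserving_order(fields):
    """Drop repeated entries while keeping first-occurrence order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(fields))


# Illustrative list with one repeated name.
fields = ["material_id", "elements", "structure", "elements", "symmetry"]
unique_fields = dedupe_preserving_order(fields)
print(unique_fields)
```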

In [16]:
raw_data = []
# --- Monoclinic crystal system, materials that contain hydrogen --- #
with MPRester(api_key) as mpr:
    docs = mpr.materials.summary.search(
        crystal_system="Monoclinic",
        elements=["H"],  # must contain H; other elements are allowed
        fields=FIELDS_TO_FETCH
    )
print(f"{len(docs)} documents retrieved.")

Retrieving SummaryDoc documents:   0%|          | 0/4270 [00:00<?, ?it/s]

4270 documents retrieved.


In [17]:
# --- PARSING THE RESPONSES --- #   
for doc in docs:
    row = {
        "material_id": str(doc.material_id),
        "formula": doc.formula_pretty,
        "elements": str(doc.elements),
        "volume": doc.volume,
        "density": doc.density,
        "band_gap": doc.band_gap,
        "magnetization": doc.total_magnetization,
        # lattice parameters
        "lattice_a": doc.structure.lattice.a if doc.structure else None,
        "lattice_b": doc.structure.lattice.b if doc.structure else None,
        "lattice_c": doc.structure.lattice.c if doc.structure else None,
    }
    raw_data.append(row)
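A monoclinic cell is not fully described by its three edge lengths, so the lattice angles may be worth storing as well. The sketch below is one possible way to flatten both into columns and to store element symbols as a plain string. The helper names `lattice_columns` and `element_symbols` are hypothetical, and the code assumes that the lattice object exposes `a`, `b`, `c`, `alpha`, `beta`, `gamma` and that each element exposes `symbol`. Stand-in objects are used so that it runs without an API call.

```python
from types import SimpleNamespace


def lattice_columns(structure):
    """Return lattice lengths and angles as a flat dict; values are None if missing."""
    names = ("a", "b", "c", "alpha", "beta", "gamma")
    lattice = getattr(structure, "lattice", None)
    return {f"lattice_{n}": getattr(lattice, n, None) for n in names}


def element_symbols(elements):
    """Join element symbols into one sorted string, e.g. 'C-H'."""
    # Assumes each element has a `symbol` attribute; falls back to str() otherwise.
    return "-".join(sorted(getattr(el, "symbol", str(el)) for el in elements))


# Stand-in objects (not real API documents) with made-up values.
demo_structure = SimpleNamespace(
    lattice=SimpleNamespace(a=2.0, b=3.0, c=4.0, alpha=90.0, beta=105.0, gamma=90.0)
)
demo_elements = [SimpleNamespace(symbol="H"), SimpleNamespace(symbol="C")]

print(lattice_columns(demo_structure))
print(element_symbols(demo_elements))
```

Inside the loop above, such helpers could be merged into each row with `row.update(lattice_columns(doc.structure))`.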

In [18]:
# --- SAVE IN CSV FORMAT --- #
df = pd.DataFrame(raw_data)
df.to_csv("monoclinic_hydrogen_data.csv", index=False)
print(f"Success: Retrieved {len(df)} materials.")
print("All data saved to 'monoclinic_hydrogen_data.csv'")
df.head()

Success: Retrieved 4270 materials.
All data saved to 'monoclinic_hydrogen_data.csv'


| | material_id | formula | elements | volume | density | band_gap | magnetization | lattice_a | lattice_b | lattice_c |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mp-995200 | HC3 | [Element C, Element H] | 73.158575 | 1.681455 | 0.0 | 1.2e-05 | 2.463933 | 3.663947 | 8.107367 |
| 1 | mp-1217971 | Ta2H | [Element H, Element Ta] | 38.193485 | 15.777973 | 0.0 | 0.0 | 2.919064 | 2.919064 | 4.883885 |
| 2 | mp-642644 | V2H | [Element H, Element V] | 28.577852 | 5.978561 | 0.0 | 5e-06 | 2.656866 | 2.656866 | 4.428967 |
| 3 | mp-995184 | HC2 | [Element C, Element H] | 58.118465 | 1.430258 | 0.0 | 1.4e-05 | 6.33287 | 6.33287 | 3.813987 |
| 4 | mp-995197 | HC | [Element C, Element H] | 148.834584 | 1.161986 | 3.5572 | 0.001015 | 6.10339 | 6.10339 | 4.861911 |
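Before moving on to modeling, a quick sanity check of the saved rows can be useful. The sketch below is illustrative only: it uses the standard `csv` module on a few rows and columns copied from the table above (written inline so it runs without the file), and the helper name `summarize` is hypothetical.

```python
import csv
import io

# A few rows and columns copied from the table above, as inline CSV text.
SAMPLE = """material_id,formula,band_gap,lattice_a
mp-995200,HC3,0.0,2.463933
mp-1217971,Ta2H,0.0,2.919064
mp-995197,HC,3.5572,6.10339
"""


def summarize(csv_text):
    """Count rows, repeated material IDs, and entries with a zero band gap."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ids = [r["material_id"] for r in rows]
    return {
        "rows": len(rows),
        "duplicate_ids": len(ids) - len(set(ids)),
        "zero_band_gap": sum(1 for r in rows if float(r["band_gap"]) == 0.0),
    }


summary = summarize(SAMPLE)
print(summary)
```

The same checks could be run on the full file by passing its contents, for example `summarize(open("monoclinic_hydrogen_data.csv").read())`.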


In [19]:
# # --- PARSING RESPONSE --- #
# def pull_data(doc):
#     data = {
#         "material_id": str(getattr(doc, "material_id", None)),
#         "formula_pretty": getattr(doc, "formula_pretty", None),
#         "formula_anonymous": getattr(doc, "formula_anonymous", None),
#         "elements": str(getattr(doc, "elements", None)),
#         "structure": str(getattr(doc, "structure", None)),
#         "chemsys": getattr(doc, "chemsys", None),
#         "volume": getattr(doc, "volume", None),
#         "density": getattr(doc, "density", None),
#         "band_gap": getattr(doc, "band_gap", None),
#         "total_magnetization": getattr(doc, "total_magnetization", None),
#         "universal_anisotropy" : getattr(doc, "universal_anisotropy", None),
#         "crystal_system": str(doc.symmetry.crystal_system) if doc.symmetry else None}
#     #Lattice constant values
#     try:
#         lattice = doc.structure.lattice
#         data["lattice_a"] = lattice.a
#         data["lattice_b"] = lattice.b
#         data["lattice_c"] = lattice.c
#         # Angles, if needed: lattice.alpha, lattice.beta, lattice.gamma
#     except AttributeError:
#         data["lattice_a"] = None
#         data["lattice_b"] = None
#         data["lattice_c"] = None
    
#     return data

In [20]:
# processed_data = []
# for doc in docs:
#     extracted = pull_data(doc)
#     processed_data.append(extracted)

# # Create the DataFrame and save it
# df = pd.DataFrame(processed_data)
# df.to_csv("monoclinic_hydrogen_data.csv", index=False)

# print("Data successfully saved!")
# print(df.head())