[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/caer200/ocelot_mlp/blob/main/dataset.ipynb)

### Step 1: Install Required Packages
Installs the library this notebook depends on:

`ase` (Atomic Simulation Environment): for working with atomic structures and simulations.

In [1]:
!pip install ase

Collecting ase
  Downloading ase-3.25.0-py3-none-any.whl.metadata (4.2 kB)
Downloading ase-3.25.0-py3-none-any.whl (3.0 MB)
Installing collected packages: ase
Successfully installed ase-3.25.0


### Step 2: Clone the Ocelot MLP Repository
Clones the GitHub repository containing the pretrained M3GNet model and example structures.

In [2]:
!git clone https://github.com/caer200/ocelot_mlp.git

Cloning into 'ocelot_mlp'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 24 (delta 4), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (24/24), 899.52 KiB | 9.37 MiB/s, done.
Resolving deltas: 100% (4/4), done.


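### Step 3: Extract Structure, Energy, Stress, and Forces from the CIF File
Parses the total energy, per-atom forces, and stress rows from the comment lines at the top of the CIF file, then reads the atomic structure with ASE.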
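
The parser in this step matches three kinds of comment lines: one starting with `# Total Energy:`, lines starting with `#   Atom` (forces), and any later lines starting with `#   ` (stress rows). It stops at the first line that is not a comment. The actual header of `test.cif` is not shown here, so the following is only a minimal sketch: `SAMPLE_HEADER` and its values are hypothetical, constructed to match those prefixes, and `parse_header` is an illustrative helper that applies the same matching rules to a string instead of a file.

```python
import numpy as np

# Hypothetical header, constructed only to match the prefixes the parser checks.
# The numbers and the "# Forces:" / "# Stress:" label lines are made up.
SAMPLE_HEADER = """\
# Total Energy: -1.25 eV
# Forces:
#   Atom 1: 0.01 -0.02 0.03
#   Atom 2: -0.01 0.02 -0.03
# Stress:
#   0.1 0.0 0.0
#   0.0 0.2 0.0
#   0.0 0.0 0.3
data_example
"""

def parse_header(text):
    """Parse energy, forces, and stress rows from leading comment lines."""
    energy, forces, stress = None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# Total Energy:"):
            # First token after the colon is the numeric value
            energy = float(line.split(":")[1].split()[0])
        elif line.startswith("#   Atom"):
            # Force components follow the colon
            forces.append([float(p) for p in line.split(":")[1].split()])
        elif line.startswith("#   ") and forces:
            # Indented numeric rows after the forces are treated as stress rows
            stress.append([float(x) for x in line.strip("# ").split()])
        elif not line.startswith("#"):
            break  # first non-comment line ends the metadata block
    return energy, np.array(forces), np.array(stress)

energy, forces, stress = parse_header(SAMPLE_HEADER)
print(energy)
print(forces.shape)
print(stress.shape)
```

Label lines such as `# Forces:` fall through every branch and are skipped, which is why they do not disturb the parse. A blank line inside the header would end parsing early, since it does not start with `#`.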

In [5]:
import numpy as np
from ase.io import read
from ase.calculators.singlepoint import SinglePointCalculator

def parse_metadata_from_cif(cif_path):
    """Read energy, forces, and stress from the leading comment lines of a CIF file, and the structure via ASE."""
    forces = []
    stress = []
    energy = None

    with open(cif_path, "r") as f:
        lines = f.readlines()

    for line in lines:
        line = line.strip()
        if line.startswith("# Total Energy:"):
            energy = float(line.split(":")[1].split()[0])
        elif line.startswith("#   Atom"):
            parts = line.split(":")[1].strip().split()
            force = [float(p) for p in parts]
            forces.append(force)
        elif line.startswith("#   ") and len(forces) > 0:  # stress lines
            stress_row = [float(x) for x in line.strip("# ").split()]
            stress.append(stress_row)
        elif not line.startswith("#"):
            break  # Exit after metadata block

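    forces = np.array(forces)  # one row of force components per atom
    stress = np.array(stress)  # one row per parsed stress line
    # Read the structure from the path passed in, not the global `filename`
    atoms = read(cif_path)
    return atoms, energy, forces, stress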

# === Read structure, energy, forces, and stress from the CIF file ===
filename = "ocelot_mlp/test.cif"
atoms, energy, forces, stress = parse_metadata_from_cif(filename)


print("Energy (eV):", energy)
print("Forces:\n", forces)
print("Stress tensor:\n", stress)
print("Structure:", atoms)

Energy (eV): -505.800042
Forces:
 [[ 0.0174  0.0219  0.01  ]
 [ 0.0095  0.0125 -0.0685]
 [-0.0183 -0.0697  0.0236]
 [ 0.0183  0.0697 -0.0236]
 [-0.0174 -0.0219 -0.01  ]
 [-0.0095 -0.0125  0.0685]
 [ 0.006  -0.0298  0.0679]
 [ 0.0315  0.0123 -0.061 ]
 [-0.0315 -0.0123  0.061 ]
 [ 0.033   0.0108  0.0243]
 [-0.0265 -0.0248 -0.0093]
 [-0.006   0.0298 -0.0679]
 [ 0.0265  0.0248  0.0093]
 [-0.033  -0.0108 -0.0243]
 [-0.0003 -0.0338  0.0449]
 [ 0.0023 -0.016   0.0093]
 [-0.0023  0.016  -0.0093]
 [ 0.016   0.0345 -0.0251]
 [-0.0228  0.0405  0.0091]
 [-0.016  -0.0345  0.0251]
 [ 0.0003  0.0338 -0.0449]
 [ 0.0483 -0.1534  0.0081]
 [-0.0567  0.0949  0.027 ]
 [ 0.0652 -0.0065 -0.0768]
 [-0.0483  0.1534 -0.0081]
 [ 0.0228 -0.0405 -0.0091]
 [-0.0652  0.0065  0.0768]
 [ 0.0567 -0.0949 -0.027 ]
 [ 0.0436  0.0184 -0.0061]
 [ 0.0126  0.0682 -0.0125]
 [-0.0034  0.0565  0.0683]
 [ 0.0034 -0.0565 -0.0683]
 [-0.0076 -0.0876  0.0616]
 [-0.0205 -0.0326 -0.0214]
 [-0.0126 -0.0682  0.0125]
 [ 0.0205  0.0326  0.