
# ReaxKit — MolFra Performance & Analysis Notebook

This notebook has two parts:
1. **Performance test** for `MolFraHandler`, using `time.perf_counter()` for timing and `tracemalloc` for memory.
2. **Analysis** with the `molfra_analyzer` helpers: extract the largest molecule per iteration and parse its atom counts.

> **Files expected:** `molfra.out` in the working directory (adjust path if needed).


In [6]:

# Imports
import time
import tracemalloc
from pathlib import Path
import pandas as pd

from reaxkit.io.molfra_handler import MolFraHandler
from reaxkit.analysis.molfra_analyzer import (
    largest_molecule_by_individual_mass,
    atoms_in_the_largest_molecule_wide_format,
    atoms_in_the_largest_molecule_long_format,
)

# Jupyter display helpers
from IPython.display import display



## 0) Paths & basic checks


In [7]:

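molfra_path = Path("molfra.out")  # change if your file lives elsewhere

# Basic check: fail early if the file is missing
if not molfra_path.is_file():
    raise FileNotFoundError(f"{molfra_path} not found.")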
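Beyond checking that the file exists, it can help to confirm its size and glance at its first lines before a multi-second parse. The helper below is a minimal, hypothetical sketch (`peek_file` is not part of ReaxKit) that uses only `pathlib`.

```python
from pathlib import Path


def peek_file(path, n_lines=5):
    """Return (size_in_bytes, first n_lines) for a quick sanity check of a text file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found.")
    size = path.stat().st_size
    lines = []
    # Read lazily so only the first few lines of a large file are touched
    with path.open("r", errors="replace") as fh:
        for _, line in zip(range(n_lines), fh):
            lines.append(line.rstrip("\n"))
    return size, lines


# Hypothetical usage in this notebook:
# size, head = peek_file(molfra_path)
# print(f"{size / 1e6:.1f} MB")
# print("\n".join(head))
```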



## 1) Performance test for `MolFraHandler`

Measures wall-clock time and traced memory (current and peak) while parsing `molfra.out`.  
It then prints the first rows of the molecule-level table and, if available, the per-iteration totals, and saves both tables to CSV.
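Because `tracemalloc` hooks every allocation, a time measured while tracing is active includes that overhead. One option, sketched below under the assumption that the call has no side effects, is to time the call with tracing off and then repeat it under `tracemalloc` only to record the peak. `measure` is a hypothetical helper, not part of ReaxKit.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Time `fn` without tracing, then re-run it under tracemalloc for memory.

    Returns (result, elapsed_seconds, peak_bytes). `fn` is called twice, so it
    should be side-effect free (e.g. parsing a file into a new object).
    """
    # 1) Wall-clock timing with tracemalloc off, so tracing overhead is excluded
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start

    # 2) Second run only to record the peak traced allocation size
    tracemalloc.start()
    try:
        fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical usage with the handler from this notebook:
# (df_mol, meta), t, peak = measure(lambda: MolFraHandler(molfra_path)._parse())
```

The trade-off is that the file is parsed twice, so the cell takes roughly twice as long, in exchange for a timing figure that is not distorted by memory tracing.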


In [8]:

def test_molfra_performance(file_path="molfra.out"):
    file_path = Path(file_path)
    if not file_path.exists():
        raise FileNotFoundError(f"{file_path} not found.")

    print("\n--- Performance Test for MolFraHandler ---")
    print(f"File: {file_path}")

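    # Measure time and memory. tracemalloc hooks every allocation, so the
    # elapsed time below includes tracing overhead and overstates plain parsing time.
    tracemalloc.start()
    start = time.perf_counter()

    handler = MolFraHandler(file_path)
    df_mol, meta = handler._parse()

    end = time.perf_counter()
    # current = memory still allocated after parsing; peak = high-water mark during parsing
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()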

    # Retrieve totals if available
    df_tot = getattr(handler, "_df_totals", None)

    # ---- Summary output ----
    print(f"\nParsing time: {end - start:.3f} s")
    print(f"Memory used: {current / 1e6:.2f} MB (peak {peak / 1e6:.2f} MB)")
    print(f"Molecule rows parsed: {len(df_mol)}")
    print(f"Iterations: {meta.get('n_iterations')}")
    print(f"Unique molecule types: {len(meta.get('molecule_types', []))}")
    print(f"Iteration range: {meta.get('iteration_min')} – {meta.get('iteration_max')}\n")

    print("--- Sample molecule-level data ---")
    display(df_mol.head())

    if df_tot is not None:
        print("\n--- Sample totals per iteration ---")
        display(df_tot.head())
        print(f"Total-rows parsed: {len(df_tot)}")
    else:
        print("\nNo totals table detected.")

    # Save outputs for later use
    df_mol.to_csv("molfra_molecules.csv", index=False)
    if df_tot is not None:
        df_tot.to_csv("molfra_totals.csv", index=False)
        print("Saved: molfra_molecules.csv, molfra_totals.csv")
    else:
        print("Saved: molfra_molecules.csv")

# Run the performance test
test_molfra_performance(molfra_path)



--- Performance Test for MolFraHandler ---
File: molfra.out

Parsing time: 4.899 s
Memory used: 24.19 MB (peak 132.61 MB)
Molecule rows parsed: 275005
Iterations: None
Unique molecule types: 0
Iteration range: None – None

--- Sample molecule-level data ---

|   | iter | molecular_formula | freq | molecular_mass |
|---|---|---|---|---|
| 0 | 0 | N128Al128 | 1 | 5245.696 |
| 1 | 0 | H2O | 10 | 18.015 |
| 2 | 25 | N128Al128 | 1 | 5245.696 |
| 3 | 25 | H2O | 10 | 18.015 |
| 4 | 50 | N128Al128 | 1 | 5245.696 |

--- Sample totals per iteration ---

|   | total_molecules | total_atoms | total_molecular_mass | iter |
|---|---|---|---|---|
| 0 | 11 | 286 | 5425.846 | 0 |
| 1 | 11 | 286 | 5425.846 | 25 |
| 2 | 11 | 286 | 5425.846 | 50 |
| 3 | 11 | 286 | 5425.846 | 75 |
| 4 | 11 | 286 | 5425.846 | 100 |

Total-rows parsed: 80001
Saved: molfra_molecules.csv, molfra_totals.csv
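In this run the metadata fields came back as `None` / `0`, although the molecule table itself carries the iteration and formula columns. The sampled totals are also consistent with the molecule rows: 1 × N128Al128 (256 atoms, 5245.696) plus 10 × H2O (30 atoms, 10 × 18.015 = 180.15) gives 11 molecules, 286 atoms and 5425.846. The sketch below derives the summary directly from `df_mol` and re-checks molecule counts and masses against `df_tot` for every iteration. It assumes only the column names shown above; `summarize_molecules` and `check_totals` are hypothetical helpers.

```python
import pandas as pd


def summarize_molecules(df_mol: pd.DataFrame) -> dict:
    """Derive run statistics from the molecule-level table (columns as shown above)."""
    return {
        "n_iterations": int(df_mol["iter"].nunique()),
        "n_molecule_types": int(df_mol["molecular_formula"].nunique()),
        "iteration_min": int(df_mol["iter"].min()),
        "iteration_max": int(df_mol["iter"].max()),
    }


def check_totals(df_mol: pd.DataFrame, df_tot: pd.DataFrame, atol: float = 1e-3) -> pd.DataFrame:
    """Recompute per-iteration molecule count and mass; return rows that disagree with df_tot."""
    df = df_mol.assign(mass_x_freq=df_mol["freq"] * df_mol["molecular_mass"])
    recomputed = df.groupby("iter").agg(
        molecules_recomputed=("freq", "sum"),
        mass_recomputed=("mass_x_freq", "sum"),
    )
    # Align both tables on the iteration number
    merged = df_tot.set_index("iter")[["total_molecules", "total_molecular_mass"]].join(
        recomputed, how="inner"
    )
    mismatch = (merged["total_molecules"] != merged["molecules_recomputed"]) | (
        (merged["total_molecular_mass"] - merged["mass_recomputed"]).abs() > atol
    )
    return merged[mismatch]


# Hypothetical usage, reading back the CSVs saved above:
# df_mol = pd.read_csv("molfra_molecules.csv"); df_tot = pd.read_csv("molfra_totals.csv")
# print(summarize_molecules(df_mol)); print(check_totals(df_mol, df_tot))
```

An empty result from `check_totals` means the two tables agree to within `atol` for every shared iteration.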



## 2) Largest molecule & atom-count parsing

- Extract the **largest molecule by individual mass** per iteration.  
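- Parse **atom counts** from that selection (the helpers `atoms_in_the_largest_molecule_wide_format` and `atoms_in_the_largest_molecule_long_format` are imported above; the cell below performs only the first step).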
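To make "largest by individual mass" concrete: it compares the mass of a single molecule, not `freq × molecular_mass`, so ten H2O molecules never outrank one N128Al128 cluster. The plain-pandas sketch below picks one row per iteration under that reading. It is an illustration, not necessarily how `largest_molecule_by_individual_mass` is implemented; `largest_per_iteration` is a hypothetical name.

```python
import pandas as pd


def largest_per_iteration(df_mol: pd.DataFrame) -> pd.DataFrame:
    """For each iteration, keep the row with the largest single-molecule mass.

    Ties keep the first matching row (idxmax behaviour).
    """
    idx = df_mol.groupby("iter")["molecular_mass"].idxmax()
    return (
        df_mol.loc[idx, ["iter", "molecular_formula", "molecular_mass"]]
        .reset_index(drop=True)
    )
```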


In [12]:

# Create a fresh handler and parse (the handler from the performance cell is
# local to test_molfra_performance and cannot be reused here)
handler = MolFraHandler(molfra_path)
handler._parse()

# Largest molecule (by individual molecular mass) per iteration
df_largest = largest_molecule_by_individual_mass(handler)
print("Largest molecule by individual mass (head):")
display(df_largest.head())

# Save to CSV for further analysis
df_largest.to_csv("molfra_largest_molecules.csv", index=False)
print("Saved: molfra_largest_molecules.csv")


Largest molecule by individual mass (head):

|   | iter | molecular_formula | molecular_mass |
|---|---|---|---|
| 0 | 0 | N128Al128 | 5245.696 |
| 1 | 25 | N128Al128 | 5245.696 |
| 2 | 50 | N128Al128 | 5245.696 |
| 3 | 75 | N128Al128 | 5245.696 |
| 4 | 100 | N128Al128 | 5245.696 |

Saved: molfra_largest_molecules.csv
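The atom-count step is not performed in the cell above. The signatures of the imported `atoms_in_the_largest_molecule_*` helpers are not shown in this notebook, so the following is an independent sketch rather than their API: it parses formula strings such as `N128Al128` or `H2O` with a regular expression and builds wide (one column per element) and long (`iter`, `element`, `count`) tables. `parse_formula`, `atom_counts_wide` and `atom_counts_long` are hypothetical names, and the parser assumes plain formulas without parentheses or charges.

```python
import re

import pandas as pd

# Element symbol (capital letter + optional lowercase) followed by an optional count
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict:
    """Turn a formula like 'N128Al128' into {'N': 128, 'Al': 128}; a missing count means 1."""
    counts = {}
    for element, n in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def atom_counts_wide(df_largest: pd.DataFrame) -> pd.DataFrame:
    """One row per iteration, one column per element (absent elements become 0)."""
    rows = [parse_formula(f) for f in df_largest["molecular_formula"]]
    wide = pd.DataFrame(rows).fillna(0).astype(int)
    wide.insert(0, "iter", df_largest["iter"].to_numpy())
    return wide


def atom_counts_long(df_wide: pd.DataFrame) -> pd.DataFrame:
    """Melt the wide table into (iter, element, count) rows."""
    return df_wide.melt(id_vars="iter", var_name="element", value_name="count")


# Hypothetical usage with df_largest from the cell above:
# atom_counts_wide(df_largest).to_csv("molfra_atom_counts.csv", index=False)
```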
