<a href="https://colab.research.google.com/github/Laere11/Laere11/blob/Material-Sciences/Matminer_feature_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install necessary dependencies
!pip install pymatgen matminer

This code demonstrates how to use Matminer (in conjunction with pymatgen) to extract features from the composition of a simple, hand-built crystal structure.

**Using Matminer for Feature Extraction**
This example creates a simple NaCl crystal structure using pymatgen and then applies a Matminer featurizer (ElementProperty with the “magpie” preset) to extract composition‑based features.

What this code does:

• Installs and imports pymatgen and matminer.
• Creates a simple NaCl crystal structure.
• Uses Matminer’s ElementProperty featurizer (with the Magpie preset) to compute a set of descriptors from the material’s composition.
• Displays the features as an HTML table (a second, simpler cell prints the full unfiltered list).

Why was this code created, and what purpose does it serve?
This code illustrates the kind of post-processing one might apply to structures generated by **MatterGen**. MatterGen represents a new paradigm of materials design enabled by generative AI: rather than screening a fixed list of known materials, it generates candidate structures directly, so it can explore a significantly larger space of materials. It can also be more efficient, because prompts guide the exploration toward the desired materials.


In [18]:
# Import required modules
from pymatgen.core import Lattice, Structure
from matminer.featurizers.composition import ElementProperty
from IPython.display import display, HTML

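# Create a rock-salt NaCl structure (space group Fm-3m, conventional cubic cell):
# Na on the 4a site (0, 0, 0) and Cl on the 4b site (1/2, 1/2, 1/2)
lattice = Lattice.cubic(5.64)  # approximate lattice parameter in Å
structure = Structure.from_spacegroup(
    "Fm-3m", lattice, ["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]]
)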

# Extract the composition from the structure
comp = structure.composition

# Initialize a Matminer featurizer using the 'magpie' preset
ep = ElementProperty.from_preset("magpie")
features = ep.featurize(comp)
feature_labels = ep.feature_labels()

# Filter out label/value pairs where the value is exactly 0 or 1
# Also remove the redundant "MagpieData" text from each label
filtered_pairs = []
for label, value in zip(feature_labels, features):
    if not (value == 0 or value == 1):
        clean_label = label.replace("MagpieData", "")  # Remove redundant text
        filtered_pairs.append(f"{clean_label.strip()}: {value:.4f}")

# Create rows with 6 cells per row
rows_html = ""
cols_per_row = 6
for i in range(0, len(filtered_pairs), cols_per_row):
    row_cells = filtered_pairs[i:i+cols_per_row]
    row_html = "<tr>" + "".join(
        f"<td style='border: 1px solid black; padding: 5px;'>{cell}</td>"
        for cell in row_cells
    ) + "</tr>"
    rows_html += row_html

# Build complete HTML table with a title block
html_output = f"""
<div style="font-family: Arial, sans-serif;">
  <h2 style="text-align: center;">The Extracted Features for NaCl</h2>
  <table style="width:100%; border-collapse: collapse; text-align: left;">
    {rows_html}
  </table>
</div>
"""

display(HTML(html_output))


**The Extracted Features for NaCl**

| | | | | | |
|---|---|---|---|---|---|
| minimum Number: 11.0000 | maximum Number: 17.0000 | range Number: 6.0000 | mean Number: 14.0000 | avg_dev Number: 3.0000 | mode Number: 11.0000 |
| minimum MendeleevNumber: 2.0000 | maximum MendeleevNumber: 94.0000 | range MendeleevNumber: 92.0000 | mean MendeleevNumber: 48.0000 | avg_dev MendeleevNumber: 46.0000 | mode MendeleevNumber: 2.0000 |
| minimum AtomicWeight: 22.9898 | maximum AtomicWeight: 35.4530 | range AtomicWeight: 12.4632 | mean AtomicWeight: 29.2214 | avg_dev AtomicWeight: 6.2316 | mode AtomicWeight: 22.9898 |
| minimum MeltingT: 171.6000 | maximum MeltingT: 370.8700 | range MeltingT: 199.2700 | mean MeltingT: 271.2350 | avg_dev MeltingT: 99.6350 | mode MeltingT: 171.6000 |
| maximum Column: 17.0000 | range Column: 16.0000 | mean Column: 9.0000 | avg_dev Column: 8.0000 | minimum Row: 3.0000 | maximum Row: 3.0000 |
| mean Row: 3.0000 | mode Row: 3.0000 | minimum CovalentRadius: 102.0000 | maximum CovalentRadius: 166.0000 | range CovalentRadius: 64.0000 | mean CovalentRadius: 134.0000 |
| avg_dev CovalentRadius: 32.0000 | mode CovalentRadius: 102.0000 | minimum Electronegativity: 0.9300 | maximum Electronegativity: 3.1600 | range Electronegativity: 2.2300 | mean Electronegativity: 2.0450 |
| avg_dev Electronegativity: 1.1150 | mode Electronegativity: 0.9300 | maximum NsValence: 2.0000 | mean NsValence: 1.5000 | avg_dev NsValence: 0.5000 | maximum NpValence: 5.0000 |
| range NpValence: 5.0000 | mean NpValence: 2.5000 | avg_dev NpValence: 2.5000 | maximum NValence: 7.0000 | range NValence: 6.0000 | mean NValence: 4.0000 |
| avg_dev NValence: 3.0000 | mean NsUnfilled: 0.5000 | avg_dev NsUnfilled: 0.5000 | mean NpUnfilled: 0.5000 | avg_dev NpUnfilled: 0.5000 | minimum GSvolume_pa: 24.4975 |


Explanation
Dependencies:
The code installs and imports pymatgen and matminer for creating the structure and extracting features. It also uses IPython’s HTML display for a nicely formatted table.

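Structure Creation & Feature Extraction:
An NaCl crystal is created, and the ElementProperty featurizer (Magpie preset) computes descriptors from its composition: for each elemental property (atomic number, Mendeleev number, atomic weight, melting temperature, electronegativity, valence-electron counts, and so on) it reports the minimum, maximum, range, mean, average deviation (avg_dev), and mode across the elements present. Because only the composition is used, the atomic arrangement does not change these values.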
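To make these statistics concrete, here is a minimal pure-Python sketch of the six Magpie-style aggregates (minimum, maximum, range, mean, avg_dev, mode) for a single elemental property. It assumes, as we understand matminer's `PropertyStats` to behave, that the mean and average deviation are weighted by each element's atomic fraction and that the mode returns the value of the most abundant element, breaking ties with the smallest value. The function name `magpie_style_stats` is hypothetical, and the Na/Cl property values are copied from the NaCl output in this notebook rather than looked up in the Magpie data files.

```python
def magpie_style_stats(values, amounts):
    """Weighted statistics over one elemental property (sketch of Magpie-style features).

    values  -- property value for each element in the composition
    amounts -- amount (or atomic fraction) of each element, used as weights
    """
    total = sum(amounts)
    weights = [a / total for a in amounts]  # normalise, so Na4Cl4 behaves like NaCl
    mean = sum(v * w for v, w in zip(values, weights))
    # Average deviation: weighted mean of |value - mean|
    avg_dev = sum(abs(v - mean) * w for v, w in zip(values, weights))
    # Mode: value of the most abundant element; ties are broken by the smallest value
    max_w = max(weights)
    mode = min(v for v, w in zip(values, weights) if w == max_w)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "avg_dev": avg_dev,
        "mode": mode,
    }


# NaCl: equal amounts of Na and Cl (values as printed in the output above)
number_stats = magpie_style_stats([11, 17], [1, 1])    # atomic number
en_stats = magpie_style_stats([0.93, 3.16], [1, 1])    # electronegativity
print(number_stats)
print({k: round(v, 4) for k, v in en_stats.items()})
```

Run on equal amounts of Na and Cl, the sketch reproduces the Number and Electronegativity entries reported for NaCl (for example, a mean electronegativity of 2.045 and an avg_dev of 1.115). Because the weights are normalised, passing amounts of 4 and 4 (the Na4Cl4 conventional cell) gives the same result as 1 and 1.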

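Filtering:
The code excludes any feature whose value is exactly 0 or 1 and strips the redundant "MagpieData" prefix from each label. Note that this also hides some informative values, such as minimum Column = 1 (sodium is in group 1).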
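The filter in the table cell uses exact comparisons (`value == 0 or value == 1`), which works for these integer-valued features but can miss values that are only numerically close to 0 or 1. A minimal sketch of a more tolerant helper follows; the name `filter_features`, its `drop` and `prefix` parameters, and the assumption that missing elemental data appears as NaN are illustrative choices, not part of matminer.

```python
import math


def filter_features(labels, values, drop=(0.0, 1.0), prefix="MagpieData"):
    """Return (clean_label, value) pairs, skipping NaN values and any value
    within a small tolerance of one of the numbers in `drop`."""
    kept = []
    for label, value in zip(labels, values):
        if math.isnan(value):
            continue  # assumption: missing elemental data shows up as NaN
        if any(math.isclose(value, d, abs_tol=1e-9) for d in drop):
            continue  # tolerant version of `value == 0 or value == 1`
        kept.append((label.replace(prefix, "").strip(), value))
    return kept


# A few label/value pairs modelled on the NaCl output above
labels = [
    "MagpieData minimum Column",
    "MagpieData maximum Column",
    "MagpieData range Row",
    "MagpieData mean Row",
]
values = [1.0, 17.0, 0.0, 3.0]
print(filter_features(labels, values))
```

Passing `drop=()` keeps every non-NaN value, which is useful when informative entries such as minimum Column = 1 should stay in the table.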

The code below is a basic version of the cell above; it prints a plain-text list of all feature values (unfiltered).

In [17]:
# Import required modules
from pymatgen.core import Lattice, Structure
from matminer.featurizers.composition import ElementProperty

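# Create a rock-salt NaCl structure (space group Fm-3m, conventional cubic cell):
# Na on the 4a site (0, 0, 0) and Cl on the 4b site (1/2, 1/2, 1/2)
lattice = Lattice.cubic(5.64)  # approximate lattice parameter in Å
structure = Structure.from_spacegroup(
    "Fm-3m", lattice, ["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]]
)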

# Extract the composition from the structure
comp = structure.composition

# Initialize a Matminer featurizer using the 'magpie' preset (statistics over a variety of elemental properties)
ep = ElementProperty.from_preset("magpie")
features = ep.featurize(comp)

# Retrieve the names of the features for reference
feature_labels = ep.feature_labels()

print("Extracted Features for NaCl:")
for label, value in zip(feature_labels, features):
    print(f"{label}: {value}")


Extracted Features for NaCl:
MagpieData minimum Number: 11.0
MagpieData maximum Number: 17.0
MagpieData range Number: 6.0
MagpieData mean Number: 14.0
MagpieData avg_dev Number: 3.0
MagpieData mode Number: 11.0
MagpieData minimum MendeleevNumber: 2.0
MagpieData maximum MendeleevNumber: 94.0
MagpieData range MendeleevNumber: 92.0
MagpieData mean MendeleevNumber: 48.0
MagpieData avg_dev MendeleevNumber: 46.0
MagpieData mode MendeleevNumber: 2.0
MagpieData minimum AtomicWeight: 22.98976928
MagpieData maximum AtomicWeight: 35.453
MagpieData range AtomicWeight: 12.463230720000002
MagpieData mean AtomicWeight: 29.221384640000004
MagpieData avg_dev AtomicWeight: 6.231615360000001
MagpieData mode AtomicWeight: 22.98976928
MagpieData minimum MeltingT: 171.6
MagpieData maximum MeltingT: 370.87
MagpieData range MeltingT: 199.27
MagpieData mean MeltingT: 271.235
MagpieData avg_dev MeltingT: 99.635
MagpieData mode MeltingT: 171.6
MagpieData minimum Column: 1.0
MagpieData maximum Column: 17.0
Magpie


# **Further Explanation of MatterGen**

MatterGen is a generative AI model that designs new crystalline materials. It can be steered toward materials with specific target properties, such as mechanical, electronic, or magnetic properties.

How it works

• MatterGen starts from a random arrangement of atoms
• It gradually adjusts the atom positions, the element types, and the periodic lattice
• It iteratively refines this arrangement into a stable, crystalline structure
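The three steps above can be illustrated with a deliberately simplified toy in NumPy. It is not MatterGen: instead of a learned model, the hypothetical `refine` function already knows two target sites, and it only updates atom positions (element types and the lattice stay fixed). What it does show is the shape of the procedure: start from random fractional coordinates and repeatedly nudge them toward an ordered arrangement, measuring displacements with the minimum-image convention so that moves respect the periodicity of the crystal.

```python
import numpy as np

# Hypothetical target sites that the toy "denoiser" is allowed to know:
# one atom at (0, 0, 0) and one at (1/2, 1/2, 1/2), in fractional coordinates.
TARGET = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])


def minimum_image(delta):
    """Shift fractional displacements by whole cells so they lie in [-0.5, 0.5]."""
    return delta - np.round(delta)


def refine(coords, step=0.3, n_steps=50):
    """Toy refinement: each iteration moves every atom a fraction `step` of the
    periodic (minimum-image) displacement toward its target site, then wraps
    the result back into the unit cell."""
    for _ in range(n_steps):
        coords = np.mod(coords + step * minimum_image(TARGET - coords), 1.0)
    return coords


rng = np.random.default_rng(0)
start = rng.random(TARGET.shape)   # step 1: random arrangement of atoms
final = refine(start)              # steps 2-3: iterative adjustment and refinement
error = np.abs(minimum_image(final - TARGET)).max()
print("initial error:", np.abs(minimum_image(start - TARGET)).max())
print("final error:", error)
```

Each iteration shrinks the remaining displacement by a factor of `1 - step`, so the error falls geometrically; a real generative model has to learn the equivalent of this update from data instead of being given the answer.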

Features   

• MatterGen can be fine-tuned to generate materials with specific design requirements
• It can generate materials with desired chemical systems and space groups   
• It can tackle the problem of finding materials with low supply-chain risk   
• It can be applied to specific domains in materials science, such as developing biodegradable materials

Training  

• MatterGen is trained on a dataset of over 600,000 known materials
• It uses a diffusion model, a type of generative model that has also been effective in image generation and protein design
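The diffusion idea can be sketched from the other direction: during training, clean structures are progressively corrupted with noise, and the model learns to undo that corruption. The toy below shows only this forward (noising) step for fractional coordinates, wrapping positions back into the unit cell because the crystal is periodic. The function name `wrapped_gaussian_noise` and the three noise levels are hypothetical; this is not MatterGen's actual noise schedule or code.

```python
import numpy as np


def wrapped_gaussian_noise(frac_coords, sigma, rng):
    """Toy forward-noising step: add Gaussian noise of scale `sigma` to fractional
    coordinates, then wrap them back into the unit cell (mod 1) because the
    crystal is periodic."""
    noisy = frac_coords + sigma * rng.standard_normal(frac_coords.shape)
    return np.mod(noisy, 1.0)


x0 = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])  # clean sites (NaCl-like example)
rng = np.random.default_rng(42)
for sigma in (0.01, 0.1, 1.0):  # hypothetical noise levels, from small to large
    xt = wrapped_gaussian_noise(x0, sigma, rng)
    print(f"sigma={sigma}:", np.round(xt, 3).tolist())
```

At small `sigma` the atoms stay near their sites; at `sigma=1.0` the positions are close to uniformly random in the cell, which is the kind of random starting arrangement the generator then refines.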

Benefits

• MatterGen can help researchers find candidate materials beyond those already catalogued in existing databases
• It can accelerate the discovery of materials that can lead to new technological advancements


