# PyTorch Featurization

**Note: `aenet-python` needs to be installed with the `[torch]` requirements (`pip install aenet[torch]`) for this notebook to work.**

This notebook demonstrates how to featurize larger data sets efficiently with the AUC method.

## 1. Basic Featurization: Water Molecule

For convenience, `TorchAUCFeaturizer` offers the same API as `AenetAUCFeaturizer`. In many cases, this is the simplest way to featurize a moderate number of atomic structures.

In [1]:
import aenet.io.structure
from aenet.torch_featurize import TorchAUCFeaturizer

struc = aenet.io.structure.read('water.xyz')

descriptor = TorchAUCFeaturizer(
    typenames=['O', 'H'],
    rad_order=10,      # Radial polynomial order
    rad_cutoff=4.0,    # Radial cutoff (Angstroms)
    ang_order=3,       # Angular polynomial order
    ang_cutoff=1.5     # Angular cutoff (Angstroms)
)

featurized_structure = descriptor.featurize_structure(struc)
print(featurized_structure.atom_features) 

[[ 1.73009825 -0.90147023 -0.79067349  1.72543339 -1.00740569 -0.67561297
   1.71146394 -1.10790861 -0.55690914  1.68826525 -1.20243702  0.0835817
  -0.02038995 -0.07363335  0.05631601  1.73009825 -0.90147023 -0.79067349
   1.72543339 -1.00740569 -0.67561297  1.71146394 -1.10790861 -0.55690914
   1.68826525 -1.20243702  0.0835817  -0.02038995 -0.07363335  0.05631601]
 [ 1.55242929 -0.61883393 -1.00049978  1.32680075 -0.12552333 -0.98685814
   0.79500362  0.12479959 -0.54970475  0.29804719 -0.06287795  0.
   0.          0.          0.         -0.17766897  0.28263629 -0.20982628
  -0.39863264  0.88188236 -0.31124517 -0.91646032  1.2327082   0.00720439
  -1.39021806  1.13955907  0.          0.          0.          0.        ]
 [ 1.55242929 -0.61883393 -1.00049978  1.32680075 -0.12552333 -0.98685814
   0.79500362  0.12479959 -0.54970475  0.29804719 -0.06287795  0.
   0.          0.          0.         -0.17766897  0.28263629 -0.20982628
  -0.39863264  0.88188236 -0.31124517 -0.91646032  1.

## 2. Low-level ChebyshevDescriptor

When the featurizer is used as part of a larger workflow, or when feature gradients are needed, the low-level `ChebyshevDescriptor` should be used directly. Unlike `TorchAUCFeaturizer`, `ChebyshevDescriptor` operates directly on atomic positions given as NumPy arrays or Torch tensors.

In [2]:
import torch
from aenet.torch_featurize import ChebyshevDescriptor

# Create descriptor
descriptor = ChebyshevDescriptor(
    species=['O', 'H'],
    rad_order=10,
    rad_cutoff=4.0,
    ang_order=3,
    ang_cutoff=1.5
)

# Featurize structure
positions = torch.tensor([
    [0.0, 0.0, 0.12],   # O
    [0.0, 0.76, -0.47], # H
    [0.0, -0.76, -0.47] # H
], dtype=torch.float64)

species = ['O', 'H', 'H']
features = descriptor.forward_from_positions(positions, species)
print(features)

tensor([[ 1.7278, -0.8966, -0.7972,  1.7241, -0.9921, -0.6944,  1.7128, -1.0833,
         -0.5885,  1.6941, -1.1697,  0.0813, -0.0202, -0.0713,  0.0555,  1.7278,
         -0.8966, -0.7972,  1.7241, -0.9921, -0.6944,  1.7128, -1.0833, -0.5885,
          1.6941, -1.1697,  0.0813, -0.0202, -0.0713,  0.0555],
        [ 1.5480, -0.6125, -1.0039,  1.3167, -0.1090, -0.9877,  0.7768,  0.1370,
         -0.5404,  0.2865, -0.0696,  0.0000,  0.0000,  0.0000,  0.0000, -0.1798,
          0.2841, -0.2066, -0.4073,  0.8831, -0.2933, -0.9360,  1.2203,  0.0481,
         -1.4075,  1.1001,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 1.5480, -0.6125, -1.0039,  1.3167, -0.1090, -0.9877,  0.7768,  0.1370,
         -0.5404,  0.2865, -0.0696,  0.0000,  0.0000,  0.0000,  0.0000, -0.1798,
          0.2841, -0.2066, -0.4073,  0.8831, -0.2933, -0.9360,  1.2203,  0.0481,
         -1.4075,  1.1001,  0.0000,  0.0000,  0.0000,  0.0000]],
       dtype=torch.float64)
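
The printed values can be cross-checked by hand. The following pure-Python sketch is not the library implementation. It assumes a Chebyshev expansion with a cosine cutoff function, a radial argument rescaled as `2r/rad_cutoff - 1`, and an angular argument `cos(theta)`. These formulas are assumptions inferred from the numbers above, and the helper names (`chebyshev`, `cosine_cutoff`, `radial_features`, `angular_features`) are illustrative only.

```python
import math

def chebyshev(x, order):
    """Return [T_0(x), ..., T_order(x)] from the three-term recurrence."""
    values = [1.0, x]
    for _ in range(2, order + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: order + 1]

def cosine_cutoff(r, r_cut):
    """Assumed cutoff function: 1 at r = 0, smoothly reaching 0 at r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)

def radial_features(center, neighbors, order, r_cut):
    """Sum cutoff-weighted Chebyshev terms over all neighbors."""
    feats = [0.0] * (order + 1)
    for pos in neighbors:
        r = math.dist(center, pos)
        fc = cosine_cutoff(r, r_cut)
        x = 2.0 * r / r_cut - 1.0  # assumed rescaling of r to [-1, 1]
        for n, t in enumerate(chebyshev(x, order)):
            feats[n] += fc * t
    return feats

def angular_features(center, neighbors, order, r_cut):
    """Sum Chebyshev terms of cos(theta) over unordered neighbor pairs."""
    feats = [0.0] * (order + 1)
    for j in range(len(neighbors)):
        for k in range(j + 1, len(neighbors)):
            v_j = [a - b for a, b in zip(neighbors[j], center)]
            v_k = [a - b for a, b in zip(neighbors[k], center)]
            r_j, r_k = math.hypot(*v_j), math.hypot(*v_k)
            weight = cosine_cutoff(r_j, r_cut) * cosine_cutoff(r_k, r_cut)
            cos_theta = sum(a * b for a, b in zip(v_j, v_k)) / (r_j * r_k)
            for n, t in enumerate(chebyshev(cos_theta, order)):
                feats[n] += weight * t
    return feats

# Same geometry and parameters as in the cell above
oxygen = (0.0, 0.0, 0.12)
hydrogens = [(0.0, 0.76, -0.47), (0.0, -0.76, -0.47)]
radial = radial_features(oxygen, hydrogens, order=10, r_cut=4.0)
angular = angular_features(oxygen, hydrogens, order=3, r_cut=1.5)
print([round(v, 4) for v in radial])
print([round(v, 4) for v in angular])
```

Under these assumptions, the two lists should agree to the printed precision with entries 0 to 10 (radial) and 11 to 14 (angular) of the oxygen row, which is the first row of the tensor above.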


## 3. Periodic System: Crystal Structure

The low-level routine also supports periodic structures. In that case, the cell vectors and the periodic boundary flags are passed as the additional arguments `cell` and `pbc`.

In [3]:
# AuCu crystal structure
positions_pbc = torch.tensor([
    [0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0]
], dtype=torch.float64)
species_pbc = ['Cu', 'Cu', 'Au', 'Au']

# Unit cell
cell = torch.tensor([
    [4.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 4.0]
], dtype=torch.float64)

# Periodic boundary conditions in all dimensions
pbc = torch.tensor([True, True, True], dtype=torch.bool)

# Create descriptor for Au-Cu system
descriptor_aucu = ChebyshevDescriptor(
    species=['Au', 'Cu'],
    rad_order=8,
    rad_cutoff=3.5,
    ang_order=5,
    ang_cutoff=3.5
)

# Featurize with PBC
features_pbc = descriptor_aucu.forward_from_positions(
    positions_pbc, species_pbc, cell=cell, pbc=pbc
)

print(f"Crystal feature shape: {features_pbc.shape}")
print(f"Cu atom 0 features (first 10): {features_pbc[0, :10]}")
print(f"Au atom 2 features (first 10): {features_pbc[2, :10]}")

Crystal feature shape: torch.Size([4, 30])
Cu atom 0 features (first 10): tensor([ 2.7079, -1.6137, -0.7845,  2.5488, -2.2533,  0.1369,  2.0901, -2.6281,
         1.0422,  2.4442], dtype=torch.float64)
Au atom 2 features (first 10): tensor([ 2.7079, -1.6137, -0.7845,  2.5488, -2.2533,  0.1369,  2.0901, -2.6281,
         1.0422,  2.4442], dtype=torch.float64)
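
The feature length of 30 matches a simple count. The sketch below assumes that an expansion of order N contributes N + 1 coefficients, and that a two-species descriptor stores one unweighted and one species-weighted copy of the radial and angular blocks. Both assumptions are inferred from the outputs in this notebook, not taken from the library documentation, and the function name `n_features` is hypothetical.

```python
def n_features(rad_order, ang_order):
    """Feature count per atom for a two-species descriptor (assumed layout)."""
    per_copy = (rad_order + 1) + (ang_order + 1)
    # Assumption: one unweighted copy plus one species-weighted copy
    return 2 * per_copy

aucu_count = n_features(rad_order=8, ang_order=5)    # Au-Cu descriptor
water_count = n_features(rad_order=10, ang_order=3)  # O-H descriptor
print(aucu_count, water_count)
```

Both parameter sets used in this notebook give 30 features per atom, which agrees with the shapes printed for the water and Au-Cu examples.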


## 4. Batch Processing

When efficiency matters and many structures need to be featurized, a descriptor can be wrapped in a `BatchedFeaturizer`, which processes a list of structures in a single call.

In [4]:
import torch
from aenet.torch_featurize import ChebyshevDescriptor, BatchedFeaturizer

# Create base descriptor
descriptor = ChebyshevDescriptor(
    species=['O', 'H'],
    rad_order=10,
    rad_cutoff=4.0,
    ang_order=3,
    ang_cutoff=1.5
)

# Wrap in BatchedFeaturizer for efficient batch processing
batch_featurizer = BatchedFeaturizer(descriptor)

# Prepare batch of structures (different sizes allowed)
batch_positions = [
    torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),  # 3 atoms
    torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),                   # 2 atoms
    torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # 3 atoms
]

batch_species = [
    ['O', 'H', 'H'],
    ['O', 'H'],
    ['O', 'H', 'H']
]

# Process entire batch at once
features, batch_indices = batch_featurizer(batch_positions, batch_species)
for i, f in zip(batch_indices, features):
    print(f'Structure {i:.0f}:', f)

Structure 0: tensor([ 1.7071e+00, -8.5355e-01, -8.5355e-01,  1.7071e+00, -8.5355e-01,
        -8.5355e-01,  1.7071e+00, -8.5355e-01, -8.5355e-01,  1.7071e+00,
        -8.5355e-01,  6.2500e-02,  3.8270e-18, -6.2500e-02, -1.1481e-17,
         1.7071e+00, -8.5355e-01, -8.5355e-01,  1.7071e+00, -8.5355e-01,
        -8.5355e-01,  1.7071e+00, -8.5355e-01, -8.5355e-01,  1.7071e+00,
        -8.5355e-01,  6.2500e-02,  3.8270e-18, -6.2500e-02, -1.1481e-17],
       dtype=torch.float64)
Structure 0: tensor([ 1.5756e+00, -6.3825e-01, -1.0249e+00,  1.4154e+00, -1.5777e-01,
        -1.1462e+00,  1.0060e+00,  2.0336e-01, -9.4833e-01,  5.2893e-01,
         2.8494e-01,  2.0122e-03,  1.4228e-03, -3.2358e-19, -1.4228e-03,
        -1.3155e-01,  2.1531e-01, -1.7135e-01, -2.9171e-01,  6.9578e-01,
        -2.9265e-01, -7.0113e-01,  1.0569e+00, -9.4776e-02, -1.1782e+00,
         1.1385e+00, -2.0122e-03, -1.4228e-03,  3.2358e-19,  1.4228e-03],
       dtype=torch.float64)
Structure 0: tensor([ 1.5756e+00, -6.382
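
The loop above prints one row per atom, labeled with the index of the structure that the atom belongs to. A minimal sketch of regrouping such a flat result into per-structure blocks follows. It uses plain Python lists as stand-ins for the returned tensors and assumes that `batch_indices` holds one structure index per feature row, as the printed labels suggest. `group_by_structure` is a hypothetical helper, not part of `aenet`.

```python
from collections import defaultdict

def group_by_structure(features, batch_indices):
    """Collect per-atom feature rows under their structure index."""
    groups = defaultdict(list)
    for idx, row in zip(batch_indices, features):
        groups[int(idx)].append(row)
    return dict(groups)

# Stand-in data: 3 + 2 + 3 atoms with one dummy feature value per atom
batch_indices = [0, 0, 0, 1, 1, 2, 2, 2]
features = [[float(i)] for i in range(8)]

grouped = group_by_structure(features, batch_indices)
for idx, rows in grouped.items():
    print(f"Structure {idx}: {len(rows)} atoms")
```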

## 5. GPU Acceleration

If a CUDA-capable GPU is available, the descriptor can be created on the GPU for faster featurization.

In [5]:
if torch.cuda.is_available():

    # Create descriptor on GPU
    descriptor = ChebyshevDescriptor(
        species=['O', 'H'],
        rad_order=10,
        rad_cutoff=4.0,
        ang_order=3,
        ang_cutoff=1.5,
        device='cuda'  # Use GPU
    )
    
    # Input tensors automatically moved to GPU
    features = descriptor.forward_from_positions(positions, species)

else:
    print("CUDA is not available.")

CUDA is not available.


## 6. Gradient Computation

Feature gradients with respect to the atomic positions are obtained through PyTorch autograd by enabling `requires_grad` on the position tensor.

In [6]:
# Enable gradient tracking
positions_torch = positions.detach().clone().requires_grad_(True)

# Compute features with gradients
features_torch = descriptor(positions_torch, species)

# Compute gradient via backpropagation
loss = features_torch.sum()
loss.backward()

print("Position gradients:")
print(f"Shape: {positions_torch.grad.shape}")
print(f"Oxygen gradient: {positions_torch.grad[0]}")

Position gradients:
Shape: torch.Size([3, 3])
Oxygen gradient: tensor([0.0000, 0.0000, 6.0276], dtype=torch.float64)
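
A common sanity check for autograd gradients is a comparison with central finite differences. The sketch below applies this check to a simplified stand-in loss: the sum of the unweighted radial features only, assuming a cosine cutoff and the rescaling `2r/rad_cutoff - 1`, neither of which is confirmed by this notebook. Its z component is therefore not expected to reproduce the value 6.0276 above. It does show why the x and y components vanish: the molecule lies in the yz plane and is mirror-symmetric in y. The function names are illustrative.

```python
import math

def radial_sum(positions, order=10, r_cut=4.0):
    """Toy loss: sum of unweighted radial Chebyshev features over all atoms."""
    total = 0.0
    for i, p_i in enumerate(positions):
        for j, p_j in enumerate(positions):
            if i == j:
                continue
            r = math.dist(p_i, p_j)
            if r >= r_cut:
                continue
            fc = 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)  # assumed cutoff
            x = 2.0 * r / r_cut - 1.0                         # assumed rescaling
            t_prev, t_curr = 1.0, x
            total += fc * t_prev                              # T_0 term
            for _ in range(order):                            # T_1 ... T_order
                total += fc * t_curr
                t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return total

def finite_difference_gradient(func, positions, atom, h=1e-5):
    """Central-difference gradient of func with respect to one atom."""
    grad = []
    for axis in range(3):
        plus = [list(p) for p in positions]
        minus = [list(p) for p in positions]
        plus[atom][axis] += h
        minus[atom][axis] -= h
        grad.append((func(plus) - func(minus)) / (2.0 * h))
    return grad

positions = [
    [0.0, 0.0, 0.12],    # O
    [0.0, 0.76, -0.47],  # H
    [0.0, -0.76, -0.47], # H
]
grad_o = finite_difference_gradient(radial_sum, positions, atom=0)
print("Oxygen gradient (finite differences):", grad_o)
```

With a real descriptor, the same helper could be pointed at a function that evaluates the summed features for a given set of positions, and the result compared with `positions_torch.grad`.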
