# Creating a basic Cluster Expansion

In [1]:
import numpy as np
import json
from monty.serialization import loadfn
from smol.cofe import ClusterSubspace, StructureWrangler, \
    ClusterExpansion, RegressionData

In [2]:
prim = loadfn('../data/lmto_prim.json')

Load the DFT data for the Li-Mn-Ti-O system.

In [3]:
computed_entries = loadfn('../data/lmto_entries.json')
print(f"{len(computed_entries)} structures in dataset.")

630 structures in dataset.


### 0) The prim structure
Active sites have fractional compositions. Vacancies are allowed in sites where the composition does not sum to one.

0. Is active. The allowed species are: Li+, Ti4+ and Mn3+.
1. Is not active. Only O2- is allowed.
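
To make the rule above concrete, here is a minimal sketch in plain Python. It does not use the `pymatgen` or `smol` API: the `prim_sites` list and the `describe_site` helper are hypothetical stand-ins that only mirror the site compositions printed below. A site counts as active when more than one species (including an implied vacancy) can occupy it.

```python
# Toy stand-in for the prim's site compositions (hypothetical, not the smol API).
prim_sites = [
    {"Li+": 1 / 3, "Ti4+": 1 / 3, "Mn3+": 1 / 3},  # site 0
    {"O2-": 1.0},                                   # site 1
]


def describe_site(composition, tol=1e-8):
    """Classify a site from its fractional composition."""
    total = sum(composition.values())
    # A composition that sums to less than one leaves room for vacancies.
    allows_vacancy = total < 1 - tol
    n_species = len(composition) + (1 if allows_vacancy else 0)
    return {
        "active": n_species > 1,
        "allows_vacancy": allows_vacancy,
        "n_species": n_species,
    }


site_info = [describe_site(c) for c in prim_sites]
# A hypothetical half-occupied Li site: Li+ and an implied vacancy.
half_li = describe_site({"Li+": 0.5})

for index, info in enumerate(site_info):
    print(index, info)
print("half-occupied Li site:", half_li)
```

Under these assumptions, site 0 is active with three species, site 1 is inactive, and the half-occupied site is active because the vacancy counts as a second species.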

In [4]:
print(prim)

Full Formula (Li0.33333333 Ti0.33333333 Mn0.33333333 O1)
Reduced Formula: Li0.33333333Ti0.33333333Mn0.33333333O1
abc   :   2.969848   2.969848   2.969848
angles:  60.000000  60.000000  60.000000
Sites (2)
  #  SP                                   a    b    c
---  ---------------------------------  ---  ---  ---
  0  Li+:0.333, Ti4+:0.333, Mn3+:0.333  0    0    0
  1  O2-                                0.5  0.5  0.5


### 1) Create a cluster subspace
The `ClusterSubspace` represents all the orbits (groups of equivalent clusters) that will be considered when fitting the cluster expansion. Its main purpose is to compute the **correlation functions** for each included orbit, given a structure in the compositional space defined by the prim.

In [5]:
subspace = ClusterSubspace.from_cutoffs(prim,
                                        cutoffs={2: 7.1, 3: 4, 4: 4},
                                        basis='sinusoid',
                                        supercell_size='volume')
# subspace.add_external_term(EwaldTerm(eta=None))

print(subspace) # single site and empty orbits are always included.

ClusterBasis: [Prim Composition] Li+0.33333333 Mn3+0.33333333 Ti4+0.33333333 O2-1
    [Size] 0
      [Orbit] id: 0  orderings: 1
    [Size] 1
      [Orbit] id: 1  orderings: 2   multiplicity: 1    no. symops: 48  
              [Base Cluster] Radius: 0.0   Centroid: [0. 0. 0.]         Points: [[0. 0. 0.]]         
    [Size] 2
      [Orbit] id: 2  orderings: 3   multiplicity: 6    no. symops: 8   
              [Base Cluster] Radius: 1.48  Centroid: [0.  0.  0.5]      Points: [[0. 0. 1.]  [0. 0. 0.]]                  
      [Orbit] id: 3  orderings: 3   multiplicity: 3    no. symops: 16  
              [Base Cluster] Radius: 2.1   Centroid: [0.5 0.5 0.5]      Points: [[1. 1. 0.]  [0. 0. 1.]]                  
      [Orbit] id: 4  orderings: 3   multiplicity: 12   no. symops: 4   
              [Base Cluster] Radius: 2.57  Centroid: [0.  0.5 0.5]      Points: [[0. 1. 1.]  [0. 0. 0.]]                  
      [Orbit] id: 5  orderings: 3   multiplicity: 6    no. symops: 8   
              

#### 1.1) Computing a correlation vector.
The correlation vector of a specific structure is its feature vector: it is what is used to train the expansion and to predict target values.
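
As an illustration of the idea only, the sketch below computes a correlation vector for a toy binary ring of sites in plain Python. It is not the `smol` implementation: the `correlation_vector` helper is hypothetical, the site function is a simple ±1 occupation variable rather than the `'sinusoid'` basis used above, and only the empty, point and nearest-neighbour pair terms are included.

```python
def correlation_vector(spins):
    """Correlations of a periodic 1D chain with +1/-1 occupation variables.

    Returns [empty, point, nearest-neighbour pair], each averaged over
    all symmetrically equivalent clusters in the chain.
    """
    n = len(spins)
    empty = 1.0  # the empty cluster is always 1
    point = sum(spins) / n
    pair = sum(spins[i] * spins[(i + 1) % n] for i in range(n)) / n
    return [empty, point, pair]


ordered = correlation_vector([1, -1, 1, -1])  # alternating occupation
uniform = correlation_vector([1, 1, 1, 1])    # single species everywhere

print("ordered:", ordered)
print("uniform:", uniform)
```

Two structures with different orderings give different feature vectors, which is what lets a linear model distinguish their energies.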

### 2) Create a structure wrangler
The `StructureWrangler` is a class used to create and organize the data that will be used to train (and possibly test) the cluster expansion. It makes sure that all the supplied structures appropriately match the prim structure, and obtains the information needed to correctly normalize the target properties (such as energy) used for training.
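
The normalization itself is simple arithmetic. The sketch below shows the idea for one of the structures in this dataset, `Li+4 Mn3+4 O2-8` with a total energy of -104.09167402. The `energy_per_prim` helper is hypothetical, and it assumes the number of prim cells can be read off the oxygen count because the prim holds exactly one O2- site. The wrangler determines the supercell size internally, so this is only meant to show what "per prim" means.

```python
def energy_per_prim(total_energy, n_oxygen, oxygen_per_prim=1):
    """Normalize a total energy by the number of prim cells in the structure.

    Assumes the oxygen sublattice is fully occupied, so the number of
    prim cells equals n_oxygen / oxygen_per_prim.
    """
    n_prims = n_oxygen / oxygen_per_prim
    return total_energy / n_prims


# Li+4 Mn3+4 O2-8 from the dataset: 8 oxygens, hence 8 prim cells.
normalized = energy_per_prim(-104.09167402, n_oxygen=8)
print(f"{normalized:.6f} eV/prim")
```

Normalized values of this kind are what make structures of different supercell sizes comparable in a single fit.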

In [6]:
wrangler = StructureWrangler(subspace)

# you can add any number of properties and name them
# whatever you want. You should use something descriptive.
# In this case we'll call it 'total_energy'.
for i, entry in enumerate(computed_entries):
    print("processed: {}/{}".format(i, len(computed_entries)))
    wrangler.add_data(entry.structure,
                      properties={'total_energy': entry.energy},
                      verbose=True)
# The verbose flag will print structures that fail to match.

print(f'\nTotal structures that match {wrangler.num_structures}/{len(computed_entries)}')

processed: 0/630
processed: 1/630
processed: 2/630
processed: 3/630
processed: 4/630
processed: 5/630
processed: 6/630
processed: 7/630
processed: 8/630
processed: 9/630
processed: 10/630
processed: 11/630
processed: 12/630
processed: 13/630
processed: 14/630
processed: 15/630
processed: 16/630
processed: 17/630


 Index 13 - Li+16 Mn3+16 O2-32{'total_energy': -422.52336463}
Index 16 - Mn3+6 Li+6 O2-12{'total_energy': -157.651775}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 18/630
processed: 19/630


 Index 13 - Li+16 Mn3+16 O2-32{'total_energy': -422.52336463}
Index 16 - Mn3+6 Li+6 O2-12{'total_energy': -157.651775}
Index 19 - Mn3+6 Li+6 O2-12{'total_energy': -157.61200258}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 20/630
processed: 21/630


 Index 13 - Li+16 Mn3+16 O2-32{'total_energy': -422.52336463}
Index 16 - Mn3+6 Li+6 O2-12{'total_energy': -157.651775}
Index 19 - Mn3+6 Li+6 O2-12{'total_energy': -157.61200258}
Index 21 - Mn3+6 Li+6 O2-12{'total_energy': -157.86858139999998}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 22/630
processed: 23/630
processed: 24/630


 Index 13 - Li+16 Mn3+16 O2-32{'total_energy': -422.52336463}
Index 16 - Mn3+6 Li+6 O2-12{'total_energy': -157.651775}
Index 19 - Mn3+6 Li+6 O2-12{'total_energy': -157.61200258}
Index 21 - Mn3+6 Li+6 O2-12{'total_energy': -157.86858139999998}
Index 23 - Mn3+6 Li+6 O2-12{'total_energy': -157.88582681999998}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 25/630
processed: 26/630
processed: 27/630
processed: 28/630
processed: 29/630
processed: 30/630
processed: 31/630
processed: 32/630
processed: 33/630
processed: 34/630
processed: 35/630
processed: 36/630
processed: 37/630
processed: 38/630
processed: 39/630
processed: 40/630
processed: 41/630
processed: 42/630
processed: 43/630
processed: 44/630
processed: 45/630
processed: 46/630
processed: 47/630
processed: 48/630
processed: 49/630
processed: 50/630
processed: 51/630
processed: 52/630
processed: 53/630
processed: 54/630
processed: 55/630
processed: 56/630
processed: 57/630
processed: 58/630
processed: 59/630
processed: 60/630
processed: 61/630
processed: 62/630
processed: 63/630
processed: 64/630


 Index 43 - Li+4 Mn3+4 O2-8{'total_energy': -104.09167402}
Index 60 - Mn3+4 Li+4 O2-8{'total_energy': -104.11239212999999}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 65/630
processed: 66/630
processed: 67/630
processed: 68/630
processed: 69/630
processed: 70/630
processed: 71/630
processed: 72/630
processed: 73/630
processed: 74/630
processed: 75/630


 Index 53 - Mn3+4 Li+4 O2-8{'total_energy': -104.93106644999999}
Index 65 - Li+4 Mn3+4 O2-8{'total_energy': -105.05485193}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 51 - Mn3+4 Li+4 O2-8{'total_energy': -103.79666542}
Index 72 - Li+4 Mn3+4 O2-8{'total_energy': -103.27479317999999}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 61 - Li+4 Mn3+4 O2-8{'total_energy': -104.07207165999999}
Index 74 - Li+4 Mn3+4 O2-8{'total_energy': -104.0335375}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 76/630
processed: 77/630
processed: 78/630
processed: 79/630
processed: 80/630
processed: 81/630
processed: 82/630
processed: 83/630
processed: 84/630
processed: 85/630
processed: 86/630


 Index 66 - Mn3+4 Li+4 O2-8{'total_energy': -104.90261661}
Index 81 - Mn3+4 Li+4 O2-8{'total_energy': -104.76946991}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 76 - Li+4 Mn3+4 O2-8{'total_energy': -104.39110529}
Index 83 - Mn3+4 Li+4 O2-8{'total_energy': -104.45155711}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 85 - Mn3+3 Li+3 O2-6{'total_energy': -78.95606344}
Index 86 - Li+3 Mn3+3 O2-6{'total_energy': -79.08766976999999}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 48 - Mn3+4 Li+4 O2-8{'total_energy': -104.93574505999999}
Index 92 - Li+2 Mn3+2 O2-4{'total_energy': -52.29711425}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 87/630
processed: 88/630
processed: 89/630
processed: 90/630
processed: 91/630
processed: 92/630
processed: 93/630
processed: 94/630
processed: 95/630
processed: 96/630
processed: 97/630
processed: 98/630
processed: 99/630
processed: 100/630
processed: 101/630
processed: 102/630
processed: 103/630
processed: 104/630
processed: 105/630
processed: 106/630
processed: 107/630
processed: 108/630
processed: 109/630
processed: 110/630
processed: 111/630


 Index 13 - Li+16 Mn3+16 O2-32{'total_energy': -422.52336463}
Index 16 - Mn3+6 Li+6 O2-12{'total_energy': -157.651775}
Index 19 - Mn3+6 Li+6 O2-12{'total_energy': -157.61200258}
Index 21 - Mn3+6 Li+6 O2-12{'total_energy': -157.86858139999998}
Index 23 - Mn3+6 Li+6 O2-12{'total_energy': -157.88582681999998}
Index 108 - Li+4 Mn3+4 O2-8{'total_energy': -105.63097}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 85 - Mn3+3 Li+3 O2-6{'total_energy': -78.95606344}
Index 86 - Li+3 Mn3+3 O2-6{'total_energy': -79.08766976999999}
Index 109 - Li+18 Mn3+18 O2-36{'total_energy': -473.68233}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 112/630
processed: 113/630
processed: 114/630


 Index 85 - Mn3+3 Li+3 O2-6{'total_energy': -78.95606344}
Index 86 - Li+3 Mn3+3 O2-6{'total_energy': -79.08766976999999}
Index 109 - Li+18 Mn3+18 O2-36{'total_energy': -473.68233}
Index 112 - Li+18 Mn3+18 O2-36{'total_energy': -474.24243}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 85 - Mn3+3 Li+3 O2-6{'total_energy': -78.95606344}
Index 86 - Li+3 Mn3+3 O2-6{'total_energy': -79.08766976999999}
Index 109 - Li+18 Mn3+18 O2-36{'total_energy': -473.68233}
Index 112 - Li+18 Mn3+18 O2-36{'total_energy': -474.24243}
Index 115 - Mn3+18 Li+18 O2-36{'total_energy': -473.80347}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 115/630
processed: 116/630
processed: 117/630
processed: 118/630
processed: 119/630
processed: 120/630
processed: 121/630
processed: 122/630


 Index 85 - Mn3+3 Li+3 O2-6{'total_energy': -78.95606344}
Index 86 - Li+3 Mn3+3 O2-6{'total_energy': -79.08766976999999}
Index 109 - Li+18 Mn3+18 O2-36{'total_energy': -473.68233}
Index 112 - Li+18 Mn3+18 O2-36{'total_energy': -474.24243}
Index 115 - Mn3+18 Li+18 O2-36{'total_energy': -473.80347}
Index 120 - Mn3+18 Li+18 O2-36{'total_energy': -473.90453}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 123/630
processed: 124/630
processed: 125/630
processed: 126/630
processed: 127/630
processed: 128/630
processed: 129/630
processed: 130/630
processed: 131/630
processed: 132/630
processed: 133/630
processed: 134/630
processed: 135/630
processed: 136/630
processed: 137/630
processed: 138/630
processed: 139/630
processed: 140/630
processed: 141/630
processed: 142/630
processed: 143/630
processed: 144/630
processed: 145/630
processed: 146/630
processed: 147/630
processed: 148/630
processed: 149/630
processed: 150/630
processed: 151/630
processed: 152/630
processed: 153/630
processed: 154/630
processed: 155/630
processed: 156/630
processed: 157/630
processed: 158/630
processed: 159/630
processed: 160/630
processed: 161/630
processed: 162/630
processed: 163/630
processed: 164/630
processed: 165/630
processed: 166/630
processed: 167/630
processed: 168/630
processed: 169/630
processed: 170/630
processed: 171/630
processed: 172/630
processed: 173/630
processed: 174/630
...

 Index 96 - Li+18 Ti4+9 O2-27{'total_energy': -377.54954127}
Index 194 - Li+24 Ti4+12 O2-36{'total_energy': -503.42615}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 195 - Li+24 Ti4+12 O2-36{'total_energy': -504.6361}
Index 198 - Li+4 Ti4+2 O2-6{'total_energy': -84.105641}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 196/630
processed: 197/630
processed: 198/630
processed: 199/630
processed: 200/630
processed: 201/630
processed: 202/630
processed: 203/630
processed: 204/630
processed: 205/630
processed: 206/630
processed: 207/630
processed: 208/630
processed: 209/630


 Index 96 - Li+18 Ti4+9 O2-27{'total_energy': -377.54954127}
Index 194 - Li+24 Ti4+12 O2-36{'total_energy': -503.42615}
Index 203 - Li+2 Ti4+1 O2-3{'total_energy': -41.919018}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 96 - Li+18 Ti4+9 O2-27{'total_energy': -377.54954127}
Index 194 - Li+24 Ti4+12 O2-36{'total_energy': -503.42615}
Index 203 - Li+2 Ti4+1 O2-3{'total_energy': -41.919018}
Index 205 - Li+6 Ti4+3 O2-9{'total_energy': -125.91502}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 210/630
processed: 211/630
processed: 212/630
processed: 213/630
processed: 214/630
processed: 215/630
processed: 216/630
processed: 217/630
processed: 218/630
processed: 219/630
processed: 220/630
processed: 221/630
processed: 222/630
processed: 223/630
processed: 224/630
processed: 225/630
processed: 226/630
processed: 227/630
processed: 228/630
processed: 229/630
processed: 230/630
processed: 231/630
processed: 232/630
processed: 233/630
processed: 234/630
processed: 235/630
processed: 236/630
processed: 237/630
processed: 238/630
processed: 239/630
processed: 240/630
processed: 241/630
processed: 242/630
processed: 243/630
processed: 244/630
processed: 245/630
processed: 246/630
processed: 247/630
processed: 248/630
processed: 249/630
processed: 250/630
processed: 251/630
processed: 252/630
processed: 253/630
processed: 254/630
processed: 255/630
processed: 256/630
processed: 257/630
processed: 258/630
processed: 259/630
processed: 260/630
processed: 261/630
...

 Index 266 - Mn3+15 Li+19 Ti4+2 O2-36{'total_energy': -478.20534}
Index 267 - Mn3+15 Li+19 Ti4+2 O2-36{'total_energy': -478.14156}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 268 - Li+19 Mn3+15 Ti4+2 O2-36{'total_energy': -478.19176}
Index 269 - Li+19 Ti4+2 Mn3+15 O2-36{'total_energy': -478.54775}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 271/630
processed: 272/630
processed: 273/630
processed: 274/630
processed: 275/630
processed: 276/630
processed: 277/630
processed: 278/630
processed: 279/630
processed: 280/630
processed: 281/630
processed: 282/630
processed: 283/630
processed: 284/630
processed: 285/630
processed: 286/630
processed: 287/630
processed: 288/630
processed: 289/630
processed: 290/630
processed: 291/630
processed: 292/630
processed: 293/630
processed: 294/630
processed: 295/630
processed: 296/630
processed: 297/630
processed: 298/630
processed: 299/630
processed: 300/630
processed: 301/630
processed: 302/630
processed: 303/630
processed: 304/630
processed: 305/630
processed: 306/630
processed: 307/630
processed: 308/630
processed: 309/630
processed: 310/630
processed: 311/630
processed: 312/630
processed: 313/630
processed: 314/630
processed: 315/630
processed: 316/630
processed: 317/630
processed: 318/630
processed: 319/630
processed: 320/630
processed: 321/630
processed: 322/630
...

 Index 96 - Li+18 Ti4+9 O2-27{'total_energy': -377.54954127}
Index 194 - Li+24 Ti4+12 O2-36{'total_energy': -503.42615}
Index 203 - Li+2 Ti4+1 O2-3{'total_energy': -41.919018}
Index 205 - Li+6 Ti4+3 O2-9{'total_energy': -125.91502}
Index 323 - Ti4+12 Li+24 O2-36{'total_energy': -503.49035}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 326/630
processed: 327/630
processed: 328/630
processed: 329/630
processed: 330/630
processed: 331/630
processed: 332/630
processed: 333/630
processed: 334/630
processed: 335/630
processed: 336/630
processed: 337/630
processed: 338/630
processed: 339/630
processed: 340/630


 Index 151 - Mn3+18 Li+18 O2-36{'total_energy': -473.16304}
Index 340 - Li+18 Mn3+18 O2-36{'total_energy': -473.18089}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 151 - Mn3+18 Li+18 O2-36{'total_energy': -473.16304}
Index 340 - Li+18 Mn3+18 O2-36{'total_energy': -473.18089}
Index 341 - Mn3+18 Li+18 O2-36{'total_energy': -473.08014}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 341/630
processed: 342/630
processed: 343/630
processed: 344/630
processed: 345/630
processed: 346/630
processed: 347/630
processed: 348/630
processed: 349/630
processed: 350/630
processed: 351/630
processed: 352/630
processed: 353/630
processed: 354/630
processed: 355/630
processed: 356/630
processed: 357/630
processed: 358/630
processed: 359/630
processed: 360/630
processed: 361/630
processed: 362/630
processed: 363/630
processed: 364/630
processed: 365/630
processed: 366/630
processed: 367/630
processed: 368/630
processed: 369/630
processed: 370/630
processed: 371/630
processed: 372/630
processed: 373/630
processed: 374/630
processed: 375/630
processed: 376/630
processed: 377/630
processed: 378/630
processed: 379/630
processed: 380/630
processed: 381/630
processed: 382/630
processed: 383/630
processed: 384/630
processed: 385/630
processed: 386/630
processed: 387/630
processed: 388/630
processed: 389/630
processed: 390/630
processed: 391/630
processed: 392/630


 Index 380 - Mn3+6 Li+22 Ti4+8 O2-36{'total_energy': -493.32099}
Index 390 - Li+22 Mn3+6 Ti4+8 O2-36{'total_energy': -493.33222}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 380 - Mn3+6 Li+22 Ti4+8 O2-36{'total_energy': -493.32099}
Index 390 - Li+22 Mn3+6 Ti4+8 O2-36{'total_energy': -493.33222}
Index 392 - Li+22 Mn3+6 Ti4+8 O2-36{'total_energy': -493.38501}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 380 - Mn3+6 Li+22 Ti4+8 O2-36{'total_energy': -493.32099}
Index 390 - Li+22 Mn3+6 Ti4+8 O2-36{'total_energy': -493.33222}
Index 392 - Li+22 Mn3+6 Ti4+8 O2-36{'total_energy': -493.38501}
Index 393 - Li+22 Ti4+8 Mn3+6 O2-36{'total_energy': -493.33785}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 393/630
processed: 394/630
processed: 395/630
processed: 396/630


 Index 391 - Li+22 Ti4+8 Mn3+6 O2-36{'total_energy': -493.20104}
Index 394 - Li+22 Ti4+8 Mn3+6 O2-36{'total_energy': -493.20144}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 397/630
processed: 398/630
processed: 399/630
processed: 400/630
processed: 401/630
processed: 402/630
processed: 403/630
processed: 404/630


 Index 395 - Li+23 Mn3+3 Ti4+10 O2-36{'total_energy': -498.74487}
Index 402 - Li+23 Ti4+10 Mn3+3 O2-36{'total_energy': -498.74407}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 405/630
processed: 406/630
processed: 407/630
processed: 408/630


 Index 395 - Li+23 Mn3+3 Ti4+10 O2-36{'total_energy': -498.74487}
Index 402 - Li+23 Ti4+10 Mn3+3 O2-36{'total_energy': -498.74407}
Index 408 - Li+23 Ti4+10 Mn3+3 O2-36{'total_energy': -498.73567}
 Consider adding more terms to the clustersubspace or filtering duplicates.
 Index 395 - Li+23 Mn3+3 Ti4+10 O2-36{'total_energy': -498.74487}
Index 402 - Li+23 Ti4+10 Mn3+3 O2-36{'total_energy': -498.74407}
Index 408 - Li+23 Ti4+10 Mn3+3 O2-36{'total_energy': -498.73567}
Index 410 - Li+23 Ti4+10 Mn3+3 O2-36{'total_energy': -498.74938}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 409/630
processed: 410/630
processed: 411/630
processed: 412/630
processed: 413/630
processed: 414/630
processed: 415/630
processed: 416/630
processed: 417/630
processed: 418/630
processed: 419/630
processed: 420/630
processed: 421/630
processed: 422/630
processed: 423/630
processed: 424/630
processed: 425/630
processed: 426/630


 Index 184 - Mn3+18 Li+18 O2-36{'total_energy': -472.72205}
Index 425 - Mn3+18 Li+18 O2-36{'total_energy': -473.1071}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 427/630
processed: 428/630
processed: 429/630
processed: 430/630
processed: 431/630
processed: 432/630
processed: 433/630
processed: 434/630
processed: 435/630
processed: 436/630
processed: 437/630
processed: 438/630
processed: 439/630
processed: 440/630
processed: 441/630
processed: 442/630
processed: 443/630
processed: 444/630
processed: 445/630
processed: 446/630
processed: 447/630
processed: 448/630
processed: 449/630
processed: 450/630
processed: 451/630
processed: 452/630
processed: 453/630
processed: 454/630
processed: 455/630
processed: 456/630
processed: 457/630
processed: 458/630
processed: 459/630
processed: 460/630
processed: 461/630
processed: 462/630
processed: 463/630
processed: 464/630
processed: 465/630
processed: 466/630
processed: 467/630
processed: 468/630
processed: 469/630
processed: 470/630
processed: 471/630
processed: 472/630
processed: 473/630
processed: 474/630
processed: 475/630
processed: 476/630
processed: 477/630
processed: 478/630
...

 Index 506 - Li+23 Ti4+10 Mn3+3 O2-36{'total_energy': -498.01167}
Index 507 - Li+23 Mn3+3 Ti4+10 O2-36{'total_energy': -497.84187}
 Consider adding more terms to the clustersubspace or filtering duplicates.


processed: 511/630
processed: 512/630
processed: 513/630
processed: 514/630
processed: 515/630
processed: 516/630
processed: 517/630
processed: 518/630
processed: 519/630
processed: 520/630
processed: 521/630
processed: 522/630
processed: 523/630
processed: 524/630
processed: 525/630
processed: 526/630
processed: 527/630
processed: 528/630
processed: 529/630
processed: 530/630
processed: 531/630
processed: 532/630
processed: 533/630
processed: 534/630
processed: 535/630
processed: 536/630
processed: 537/630
processed: 538/630
processed: 539/630
processed: 540/630
processed: 541/630
processed: 542/630
processed: 543/630
processed: 544/630
processed: 545/630
processed: 546/630
processed: 547/630
processed: 548/630
processed: 549/630
processed: 550/630
processed: 551/630
processed: 552/630
processed: 553/630
processed: 554/630
processed: 555/630
processed: 556/630
processed: 557/630
processed: 558/630
processed: 559/630
processed: 560/630
processed: 561/630
processed: 562/630
...

### 3) Training

Training a cluster expansion is one of the most critical steps: it is how you obtain the **effective cluster interactions (ECIs)**. To do so, you need an estimator class that implements some form of regression model. In this case we will use simple least-squares regression with the `LinearRegression` estimator from `scikit-learn`.

In `smol`, the coefficients from the fit are not exactly the ECIs, but the ECIs multiplied by the multiplicity of their orbit.
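
A quick worked example of that relation, using numbers printed elsewhere in this tutorial. It assumes the fourth fitted coefficient (0.793385756) belongs to the first pair orbit (id 2), whose multiplicity is 6 in the subspace printout. Other functions may carry an additional ordering multiplicity that this sketch does not cover.

```python
# Fourth fitted coefficient printed in section 4 of this tutorial.
coefficient = 0.793385756
# Multiplicity of pair orbit id 2, taken from the subspace printout
# (assumption: this coefficient belongs to that orbit).
multiplicity = 6

# The ECI is the fitted coefficient divided by the orbit multiplicity.
eci = coefficient / multiplicity
print(f"ECI = {eci:.9f}")
```

The result agrees with the fourth entry of the effective cluster interactions printed in section 4.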

In [7]:
from sklearn.linear_model import LinearRegression
# Set fit_intercept to False because we already do this using
# the empty cluster.
estimator = LinearRegression(fit_intercept=False)
estimator.fit(wrangler.feature_matrix,
              wrangler.get_property_vector('total_energy'))
coefs = estimator.coef_
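
To see why no separate intercept is needed, consider a toy fit in plain Python. This is only a sketch under simplifying assumptions: the `fit_two_features` helper is hypothetical, solves the 2x2 normal equations by hand, and stands in for what `LinearRegression(fit_intercept=False)` does on the real feature matrix. The first feature is always 1, just like the empty-cluster correlation, so its coefficient plays the role of the intercept.

```python
def fit_two_features(rows, targets):
    """Least-squares fit of y = c0 * f0 + c1 * f1 via the normal equations."""
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * t for r, t in zip(rows, targets))
    b2 = sum(r[1] * t for r, t in zip(rows, targets))
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det]


# First column is the constant "empty cluster" feature, second a point term.
toy_features = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
toy_targets = [1.0, 3.0, 5.0]  # generated from y = 3 + 2 * x

toy_coefs = fit_two_features(toy_features, toy_targets)
print(toy_coefs)
```

The coefficient of the constant column recovers the offset of the data, which is why adding a second intercept on top of the empty cluster would be redundant.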

#### 3.1) Check the quality of the fit

In [8]:
from sklearn.metrics import mean_squared_error, max_error

train_predictions = np.dot(wrangler.feature_matrix, coefs)

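# Take the square root explicitly: the `squared=False` argument is
# deprecated and no longer available in newer scikit-learn releases.
rmse = np.sqrt(mean_squared_error(wrangler.get_property_vector('total_energy'),
                                  train_predictions))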
maxer = max_error(wrangler.get_property_vector('total_energy'),
                  train_predictions)

print(f'RMSE {rmse} eV/prim')
print(f'MAX {maxer} eV/prim')

RMSE 0.013719719976843134 eV/prim
MAX 0.10229881737488533 eV/prim
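
For reference, both metrics are easy to write out by hand. The sketch below uses plain Python and made-up numbers (the `y_true` and `y_pred` lists are illustrative, not taken from the dataset). The helper names are hypothetical and only restate what `mean_squared_error` (after taking the root) and `max_error` compute.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def max_abs_error(y_true, y_pred):
    """Largest absolute residual."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


# Illustrative values in eV/prim (not from the dataset).
y_true = [-13.0, -13.5, -14.0]
y_pred = [-13.1, -13.5, -13.8]

toy_rmse = root_mean_squared_error(y_true, y_pred)
toy_max = max_abs_error(y_true, y_pred)
print(f"RMSE {toy_rmse:.4f} eV/prim")
print(f"MAX {toy_max:.4f} eV/prim")
```

The RMSE summarizes the typical error, while the maximum error flags the single worst-predicted structure. Keep in mind that both values above are in-sample, since they are computed on the training data.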


### 4) Create a cluster expansion

In [9]:
# save details of the regression used to fit
reg_data = RegressionData.from_sklearn(
    estimator, feature_matrix=wrangler.feature_matrix,
    property_vector=wrangler.get_property_vector('total_energy'))

expansion = ClusterExpansion(
    subspace, coefficients=coefs, regression_data=reg_data)

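# Pick a random training structure by index, so numpy never has to
# convert the list of structures into an array.
structure = wrangler.structures[np.random.randint(wrangler.num_structures)]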
prediction = expansion.predict(structure, normalize=True)

print(f'The predicted energy for a structure with composition '
      f'{structure.composition} is {prediction} eV/prim.\n')
print(f'The fitted coefficients are:\n{expansion.coefs}\n')
print(f'The effective cluster interactions are:\n{expansion.eci}\n')
print(expansion)

The predicted energy for a structure with composition Mn3+9 Li+21 Ti4+6 O2-36 is -13.550963091022656 eV/prim.

The fitted coefficients are:
[-1.41649677e+11 -3.54124192e+11  1.22672219e+11  7.93385756e-01
 -5.30298691e-01  1.55481332e-01  4.38792541e-01 -2.09197422e-01
  5.72421679e-02  1.99144459e-01  1.62832331e-01 -1.10884296e-01
  1.44084032e-01  6.09243067e-02 -3.38844332e-02  2.38991409e-01
 -1.30564341e-01 -1.23614042e-02 -2.50345094e-01 -1.03712695e-01
  4.50283643e-01  7.79379756e-02 -6.63553619e-02 -1.05656329e-01
  2.81985642e-01  1.38585686e-01 -1.30902417e-01]

The effective cluster interactions are:
[-1.41649677e+11 -3.54124192e+11  1.22672219e+11  1.32230959e-01
 -4.41915576e-02  2.59135553e-02  1.46264180e-01 -3.48662371e-02
  1.90807226e-02  1.65953715e-02  6.78468044e-03 -9.24035801e-03
  2.40140053e-02  5.07702555e-03 -5.64740553e-03  1.99159507e-02
 -5.44018086e-03 -1.03011701e-03 -3.12931368e-02 -4.32136227e-03
  1.87618185e-02  9.74224694e-03 -3.31776809e-02 -1.32



### 5) Saving your work
All core classes in `smol` are `MSONable`, so they can be saved using their `as_dict` methods or, better yet, with `monty.serialization.dumpfn`.

Currently there is also a convenience function in `smol` that saves all of your work in a standardized way. Work saved with the `save_work` function is stored as a dictionary with standardized names for the classes. Since a workflow should contain only one of each core class, the function will complain if you give it two objects of the same class (e.g. two wranglers).
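
The pattern behind this can be sketched with the standard library alone. The code below is a toy illustration, not the `smol` or `monty` implementation: `ToyExpansion`, `ToyWrangler` and `save_toy_work` are hypothetical names. They only mimic the `as_dict`/`from_dict` round trip and the "one object per class" rule described above.

```python
import json


class ToyExpansion:
    """Hypothetical stand-in for a serializable core class."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)

    def as_dict(self):
        return {"coefficients": self.coefficients}

    @classmethod
    def from_dict(cls, d):
        return cls(d["coefficients"])


class ToyWrangler(ToyExpansion):
    """Second hypothetical class, so the work holds two different types."""


def save_toy_work(*objects):
    """Serialize objects into one JSON string keyed by class name."""
    work = {}
    for obj in objects:
        name = type(obj).__name__
        if name in work:
            # Mirrors the rule that a workflow holds one object per class.
            raise ValueError(f"Only one {name} can be saved per work file.")
        work[name] = obj.as_dict()
    return json.dumps(work)


saved = save_toy_work(ToyWrangler([0.1]), ToyExpansion([1.0, 2.0]))
loaded = json.loads(saved)
restored = ToyExpansion.from_dict(loaded["ToyExpansion"])
print(sorted(loaded), restored.coefficients)
```

Keying the saved dictionary by class name is what makes the file self-describing on load, and it is also why two objects of the same class cannot share one file.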

In [10]:
from smol.io import load_work, save_work

file_path = '../data/lmto_sinusoid.mson'
# we can save the subspace as well, but since both the wrangler
# and the expansion have it, there is no need to do so.
save_work(file_path, wrangler, expansion)

#### 5.1) Loading previously saved work

In [11]:
from smol.io import load_work, save_work

work = load_work(file_path)
for name, obj in work.items():
    print(f'{name}: {type(obj)}\n')

StructureWrangler: <class 'smol.cofe.wrangling.wrangler.StructureWrangler'>

ClusterExpansion: <class 'smol.cofe.expansion.ClusterExpansion'>

