# CifEnsemble

## Initialize with folder path and preprocess automatically

- `cifkit` standardizes the site labels in `atom_site_label`. Some site labels contain a comma or a placeholder symbol such as `M` because of atomic mixing. `cifkit` reformats each `atom_site_label` so that it can be parsed into an element type that matches `atom_site_type_symbol`.

- `cifkit` removes the content of `publ_author_address`. This section often has an incorrect format that otherwise requires manual modifications.

- `cifkit` relocates any ill-formatted files, such as those with duplicate labels in `atom_site_label`, missing fractional coordinates, or files that require supercell generation.
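
To make the label standardization in the first bullet more concrete, here is a minimal sketch of how an element symbol might be read from a site label. It is not `cifkit`'s actual implementation: the function name `element_from_label`, the `KNOWN_ELEMENTS` set, and the label `Co2A` are assumptions made for illustration only.

```python
import re

# Element symbols for this example only (assumption); a real tool would
# check against the full periodic table.
KNOWN_ELEMENTS = {"Er", "Co", "In"}


def element_from_label(label, known_elements=KNOWN_ELEMENTS):
    """Return the element symbol that starts a site label, or None."""
    # Take the leading run of letters, e.g. "Co2A" -> "Co", "In3" -> "In".
    match = re.match(r"[A-Za-z]+", label)
    if match is None:
        return None
    letters = match.group(0)
    # Try a two-letter symbol first, then fall back to a one-letter symbol.
    for length in (2, 1):
        candidate = letters[:length].capitalize()
        if candidate in known_elements:
            return candidate
    return None


labels = ["Er1", "Co2A", "In3", "M1"]
parsed = {label: element_from_label(label) for label in labels}
print(parsed)
```

In this sketch a label such as `M1` maps to `None`, because a mixed-site placeholder cannot be resolved to a single element from the label alone. That is the kind of label that needs reformatting before it can match `atom_site_type_symbol`.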

In [1]:
from cifkit import CifEnsemble, Example

# Initialize
ensemble = CifEnsemble(Example.ErCoIn_folder_path)

# Initialize with nested .cif files in the folder
ensemble_nested = CifEnsemble(Example.ErCoIn_folder_path, add_nested_files=True)

# Get .cif file count in the folder
print("File count:", ensemble.file_count) # 6

# Get .cif file count in the folder including nested files
print("File count including nested:", ensemble_nested.file_count) # 7

# Get the directory path
print("Directory path:", ensemble.dir_path)

# Get all file paths in the folder
print("File paths:", ensemble.file_paths)

# Get all Cif objects initialized
print("Cif objects:", ensemble.cifs)


CIF Preprocessing in Example.ErCoIn_folder_path begun...

Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/Er10Co9In20.cif (1/3)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/ErCo2.68In0.32.cif (2/3)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/ErCoIn5.cif (3/3)

SUMMARY
# of files moved to 'error_operations' folder: 0
# of files moved to 'error_duplicate_labels' folder: 0
# of files moved to 'error_wrong_loop_value' folder: 0
# of files moved to 'error_coords' folder: 0
# of files moved to 'error_invalid_label' folder: 0
# of files moved to 'error_others' folder: 0


CIF Preprocessing in Example.ErCoIn_folder_path begun...

Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/Er10Co9In20.cif (1/3)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/ErCo2.68In0.32.cif (2/3)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/ErCoIn5.cif (3/3)

SUMMARY
# of files moved to 'error_operation

## Get individual .cif properties

You can loop through the list of `Cif` objects to access both the instant and the computed properties described in Section 1.

In [2]:
# Print each property for each .cif file
for cif in ensemble.cifs:
    print(f"\n{cif.file_name}")
    print("Formula:", cif.formula)
    print("Tag:", cif.tag)
    print("Mixing type:", cif.site_mixing_type)
    print("Unique bond pairs:", cif.bond_pairs)


Er10Co9In20.cif
Formula: Er10Co9In20
Tag: 
Mixing type: full_occupancy
Unique bond pairs: {('Er', 'Er'), ('Co', 'In'), ('Co', 'Co'), ('Co', 'Er'), ('In', 'In'), ('Er', 'In')}

ErCo2.68In0.32.cif
Formula: ErCo2.68In0.32
Tag: 
Mixing type: full_occupancy_atomic_mixing
Unique bond pairs: {('Er', 'Er'), ('Co', 'In'), ('Co', 'Co'), ('Co', 'Er'), ('In', 'In'), ('Er', 'In')}

ErCoIn5.cif
Formula: ErCoIn5
Tag: rt
Mixing type: full_occupancy
Unique bond pairs: {('Er', 'Er'), ('Co', 'In'), ('Co', 'Co'), ('Co', 'Er'), ('In', 'In'), ('Er', 'In')}


## Get unique properties

The following properties return the set of unique values that each attribute takes across all .cif files in the ensemble.


In [3]:
# Get unique formulas
print("Unique formulas:", ensemble.unique_formulas)

# Get unique elements
print("Unique elements:", ensemble.unique_elements)

# Get unique structures
print("Unique structures:", ensemble.unique_structures)

# Get unique atomic mixing types
print("Unique atomic mixing types:", ensemble.unique_site_mixing_types)

# Get unique elements, including nested files
print("Unique elements including nested:", ensemble_nested.unique_elements)

# Get unique space group names
print("Unique space group names:", ensemble.unique_space_group_names)

# Get unique space group numbers
print("Unique space group numbers:", ensemble.unique_space_group_numbers)

# Get unique tags
print("Unique tags:", ensemble.unique_tags)

# Get unique composition types
print("Unique composition types:", ensemble.unique_composition_types)

Unique formulas: {'ErCo2.68In0.32', 'ErCoIn5', 'Er10Co9In20'}
Unique elements: {'Er', 'Co', 'In'}
Unique structures: {'Ho10Ni9In20', 'HoCoGa5', 'PuNi3'}
Unique atomic mixing types: {'full_occupancy', 'full_occupancy_atomic_mixing'}
Unique elements including nested: {'Er', 'Co', 'In'}
Unique space group names: {'R-3mh', 'P4/mmm', 'P4/nmm(originchoice2)'}
Unique space group numbers: {129, 123, 166}
Unique tags: {'', 'rt'}
Unique composition types: {3}


## Get unique and average computed properties

As shown in Section 1, the following properties require computing all distances.

In [16]:
from cifkit import CifEnsemble, Example

# Initialize
ensemble = CifEnsemble(Example.ErCoIn_folder_path)

for cif in ensemble.cifs:
    print(cif.file_name)
    print(cif.site_mixing_type)
    # Print the coordination number results from the best methods for each site
    print(cif.CN_best_methods)

# print("CN_unique_values_by_min_dist_method:", ensemble.CN_unique_values_by_min_dist_method)
# print("CN_unique_values_by_best_methods:", ensemble.CN_unique_values_by_best_methods)


CIF Preprocessing in Example.ErCoIn_folder_path begun...

Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/Er10Co9In20.cif (1/3)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/ErCo2.68In0.32.cif (2/3)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn/ErCoIn5.cif (3/3)

SUMMARY
# of files moved to 'error_operations' folder: 0
# of files moved to 'error_duplicate_labels' folder: 0
# of files moved to 'error_wrong_loop_value' folder: 0
# of files moved to 'error_coords' folder: 0
# of files moved to 'error_invalid_label' folder: 0
# of files moved to 'error_others' folder: 0

Er10Co9In20.cif
full_occupancy
{'Er4': {'volume_of_polyhedron': 108.299, 'distance_from_avg_point_to_center': 0.126, 'number_of_vertices': 17, 'number_of_edges': 45, 'number_of_faces': 30, 'shortest_distance_to_face': 2.404, 'shortest_distance_to_edge': 2.297, 'volume_of_inscribed_sphere': 58.23, 'packing_efficiency': 0.538, 'method_used': 'dist_by_shortest_dist'},

TypeError: 'NoneType' object is not subscriptable

## Get overall stats by attribute

Get the number of files for each unique value of a property.

In [None]:
# Get file count per structure
print("Structure stats:", ensemble.structure_stats)

# Get file count per formula
print("Formula stats:", ensemble.formula_stats)

# Get file count per tag
print("Tag stats:", ensemble.tag_stats)

# Get file count per space group number
print("Space group number stats:", ensemble.space_group_number_stats)

# Get file count per space group name
print("Space group name stats:", ensemble.space_group_name_stats)

# Get file count per composition type
print("Composition type stats:", ensemble.composition_type_stats)

# Get file count per element
print("Unique elements stats:", ensemble.unique_elements_stats)

# Get file count per site mixing type
print("Site mixing type stats:", ensemble.site_mixing_type_stats)

# Get file count per supercell atom count
print("Supercell size stats:", ensemble.supercell_size_stats)

# Get file count per min distance
print("Min distance stats:", ensemble.min_distance_stats)

# Get file count per CN value by min dist method
print("CN value using min dist method stats:", ensemble.unique_CN_values_by_min_dist_method_stat)

# Get file count per CN value by best methods
print("CN value using best methods stats:", ensemble.unique_CN_values_by_method_methods_stat)
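
As a rough mental model, each `*_stats` property can be thought of as a count of files per unique value. The sketch below reproduces that idea with `collections.Counter`, using the tag and site mixing type values printed for the three ErCoIn files earlier in this notebook. The helper `attribute_stats` and the record layout are assumptions for illustration, and the exact return format of `cifkit` may differ.

```python
from collections import Counter

# Per-file values taken from the ErCoIn example output in this notebook.
files = [
    {"formula": "Er10Co9In20", "tag": "", "site_mixing_type": "full_occupancy"},
    {"formula": "ErCo2.68In0.32", "tag": "", "site_mixing_type": "full_occupancy_atomic_mixing"},
    {"formula": "ErCoIn5", "tag": "rt", "site_mixing_type": "full_occupancy"},
]


def attribute_stats(records, key):
    """Count how many records share each value of `key` (hypothetical helper)."""
    return dict(Counter(record[key] for record in records))


tag_stats = attribute_stats(files, "tag")
mixing_stats = attribute_stats(files, "site_mixing_type")
print("Tag stats:", tag_stats)
print("Site mixing type stats:", mixing_stats)
```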

## Filter .cif containing specific attributes


In [None]:
# Return file paths by formulas
ensemble.filter_by_formulas(["LaRu2Ge2"])
ensemble.filter_by_formulas(["LaRu2Ge2", "Mo"]) # LaRu2Ge2 or Mo

# Return file paths by structures
ensemble.filter_by_structures(["CeAl2Ga2"])

# Return file paths by space group names
ensemble.filter_by_space_group_names(["Im-3m"])

# Return file paths by space group numbers
ensemble.filter_by_space_group_numbers([139])

# Return file paths by site mixing types
ensemble.filter_by_site_mixing_types(["full_occupancy"])
ensemble.filter_by_site_mixing_types(["full_occupancy", "deficiency_without_atomic_mixing"])

# Return file paths by composition types (1 -> unary, 2 -> binary, 3 -> ternary)
ensemble.filter_by_composition_types([3])



## Filter .cif by specific attributes

Filter .cif files that either contain at least one of the values passed or match the values passed exactly. `cifkit` supports both modes for elements and coordination numbers.
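
The difference between the two modes can be sketched with plain Python sets. This is only an illustration of the semantics described above, not `cifkit`'s implementation. The file names, the `elements_by_file` mapping, and both helper functions are hypothetical.

```python
# Hypothetical file names mapped to the set of elements in each file.
elements_by_file = {
    "LaRu2Ge2.cif": {"La", "Ru", "Ge"},
    "Mo.cif": {"Mo"},
    "ErCoIn5.cif": {"Er", "Co", "In"},
}


def filter_containing(elements_by_file, query):
    """Keep files that contain at least one of the queried elements."""
    query = set(query)
    return {name for name, elements in elements_by_file.items() if elements & query}


def filter_exact_matching(elements_by_file, query):
    """Keep files whose element set equals the queried set."""
    query = set(query)
    return {name for name, elements in elements_by_file.items() if elements == query}


containing = filter_containing(elements_by_file, ["Mo", "Ge"])
exact = filter_exact_matching(elements_by_file, ["La", "Ru", "Ge"])
partial = filter_exact_matching(elements_by_file, ["La", "Ru"])
print(sorted(containing))  # ['LaRu2Ge2.cif', 'Mo.cif']
print(sorted(exact))       # ['LaRu2Ge2.cif']
print(partial)             # set()
```

Exact matching is the stricter mode: a query listing only some of a file's elements returns nothing for that file.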


In [None]:

# Return a list of files that contain one of the elements
ensemble.filter_by_elements_containing(["Mo"])

# Return a list of files that contain exactly the specified elements
ensemble.filter_by_elements_exact_matching(["La", "Ru", "Ge"])

# Return a list of files that contain CN value of 9
ensemble.filter_by_CN_dist_method_containing([9])
ensemble.filter_by_CN_best_methods_containing([9])

# # Return a list of files whose CN values exactly match the values passed
# ensemble.filter_by_CN_dist_method_exact_matching([16, 12, 5])
# ensemble.filter_by_CN_best_methods_exact_matching([16, 10, 12])

AttributeError: 'Cif' object has no attribute '_CN_unique_values_by_min_dist_method'

## Filter by range

In [20]:
# Return a set of .cif file paths with min distance between 2.5 Å and 4.0 Å
ensemble.filter_by_min_distance(2.5, 4.0)

# Return a set of .cif file paths with supercell atom count above 300 and below 500.
ensemble.filter_by_supercell_count(300, 500)

set()
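
A range filter returns an empty set, as above, when no file falls inside the requested interval. The sketch below shows the idea with made-up values. The file names, the distances, the helper `filter_by_range`, and the choice of inclusive bounds are all assumptions, so check the `cifkit` source for the actual boundary behavior.

```python
# Hypothetical minimum distances (in Å) per file.
min_distance_by_file = {
    "a.cif": 2.28,
    "b.cif": 2.75,
    "c.cif": 4.0,
    "d.cif": 4.6,
}


def filter_by_range(values_by_file, lower, upper):
    """Keep files whose value lies in [lower, upper] (inclusive bounds assumed)."""
    return {name for name, value in values_by_file.items() if lower <= value <= upper}


selected = filter_by_range(min_distance_by_file, 2.5, 4.0)
none_selected = filter_by_range(min_distance_by_file, 300, 500)
print(sorted(selected))  # ['b.cif', 'c.cif']
print(none_selected)     # set()
```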

## Generate histograms

Histograms of the number of files per property are saved and, when `display=True` is passed, also displayed.


In [4]:
from cifkit import CifEnsemble, Example
# Generate histograms
ensemble = CifEnsemble(Example.ErCoIn_big_folder_path)
ensemble.generate_stat_histograms(display=True)


CIF Preprocessing in Example.ErCoIn_big_folder_path begun...

Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1818414.cif (1/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1840445.cif (2/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1233938.cif (3/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1140826.cif (4/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1634753.cif (5/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1803318.cif (6/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1956508.cif (7/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1234749.cif (8/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1803512.cif (9/16)
Preprocessing /Users/imac/Downloads/cifkit/src/cifkit/data/ErCoIn_big/1234747.cif (10/16)
Preprocessing /Users/imac/Downloads/cifkit/src

TypeError: 'NoneType' object is not subscriptable

## Move and copy files

Assume you have a set of file paths filtered using the functions described in the previous sections. Because this notebook relies on predefined examples, you need to provide your own `file_paths` and `dest_dir_path` for your system.

In [None]:
file_paths = {
    "tests/data/cif/ensemble_test/300169.cif",
    "tests/data/cif/ensemble_test/300171.cif",
    "tests/data/cif/ensemble_test/300170.cif",
}

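# Destination directory (placeholder; replace with a path on your system)
dest_dir_path = "path/to/destination"

# To move files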
ensemble.move_cif_files(file_paths, dest_dir_path)

# To copy files instead (after a move, the original paths no longer exist)
ensemble.copy_cif_files(file_paths, dest_dir_path)
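
The difference between moving and copying can be checked safely in a temporary directory before touching real data. The sketch below uses only the standard library (`shutil`, `tempfile`, `pathlib`) on empty placeholder files. It illustrates the file-system behavior and is not how `cifkit` implements `move_cif_files` or `copy_cif_files`. The directory names are assumptions.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "source"
    copy_dir = Path(tmp) / "copied"
    move_dir = Path(tmp) / "moved"
    for directory in (src_dir, copy_dir, move_dir):
        directory.mkdir()

    # Create two empty placeholder .cif files.
    file_paths = {src_dir / "300169.cif", src_dir / "300170.cif"}
    for path in file_paths:
        path.touch()

    # Copy: the originals stay in the source folder.
    for path in file_paths:
        shutil.copy(path, copy_dir / path.name)
    copied_names = sorted(p.name for p in copy_dir.iterdir())
    still_in_source = sorted(p.name for p in src_dir.iterdir())

    # Move: the originals leave the source folder.
    for path in file_paths:
        shutil.move(str(path), str(move_dir / path.name))
    moved_names = sorted(p.name for p in move_dir.iterdir())
    left_in_source = sorted(p.name for p in src_dir.iterdir())

print("Copied:", copied_names, "| still in source:", still_in_source)
print("Moved:", moved_names, "| left in source:", left_in_source)
```

Copying first and moving second works here because the copy leaves the originals in place. The reverse order would fail, since the source paths no longer exist after a move.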