In [2]:
import __init__

Navigated to package root: /home/cyprien/CrystaLLMv2_PKV
Added package root to Python path


#### 1st pass finetune - Mattergen XRD
- Dataset Source: [Mattergen Alex-MP-20](https://github.com/microsoft/mattergen/tree/main/data-release/alex-mp)
  - Columns: Database (manual) 
  - Reduced Formula (Source)
  - CIF (pmg - CifWriter with symprec 0.1)
  - XRD 'Condition Vector' (with [_calculate_XRD.py](_utils/_preprocessing/_calculate_XRD.py))
    - pmg - XRDCalculator(wavelength="CuKa")
    - top 20 most intense peaks selected ($2\theta$ and int)
    - Normalisations
      - $2\theta$ min-max for 0,90
      - intensities min-max for 0,100
- Deduplicated
- Cleaned for CIF augmentation
  - Note: I didn't filter to context length here because that was not implemented yet; the filter-to-context flag was set to True during model training, which has the same effect but is less efficient
- dataset pushed to HuggingFace as: c-bone/mattergen_XRD (90:10 train/valid sets)
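
The condition-vector layout described above can be sketched in plain Python. This is a minimal illustration, not the code in `_calculate_XRD.py`: the function name `build_condition_vector`, the ordering (all $2\theta$ slots first, then all intensity slots, with peaks sorted by decreasing intensity), the rounding to three decimals and the `-100` padding value are assumptions inferred from the condition vectors printed later in this notebook. Intensities are divided by the strongest peak, which should match the 0 to 100 min-max scaling if the strongest calculated peak is scaled to 100.

```python
PAD_VALUE = -100.0    # assumed padding value for unused peak slots
TWO_THETA_MAX = 90.0  # upper bound used for the 2-theta normalisation


def build_condition_vector(peaks, n_peaks=20):
    """Turn (2-theta in degrees, intensity) pairs into a fixed-length vector.

    Hypothetical helper: keeps the n_peaks most intense peaks, scales
    2-theta by 90 and intensity by the strongest peak, then pads each
    half of the vector with PAD_VALUE.
    """
    if not peaks:
        return [PAD_VALUE] * (2 * n_peaks)
    # Keep the most intense peaks, strongest first
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_peaks]
    max_intensity = top[0][1]
    if max_intensity <= 0:
        raise ValueError("the strongest peak must have a positive intensity")
    thetas = [round(t / TWO_THETA_MAX, 3) for t, _ in top]
    intensities = [round(i / max_intensity, 3) for _, i in top]
    padding = [PAD_VALUE] * (n_peaks - len(top))
    return thetas + padding + intensities + padding


# Two peaks, three slots: the stronger peak (45 degrees) comes first
print(build_condition_vector([(30.0, 50.0), (45.0, 200.0)], n_peaks=3))
# [0.5, 0.333, -100.0, 1.0, 0.25, -100.0]
```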

In [None]:
!torchrun --nproc_per_node=2 _train.py --config '_config_files/training/conditional/xrd_studies/mattergen_XRD-slider.jsonc'

#### 2nd pass finetune - COD XRD
- Dataset Source: [COD hkl data](https://www.crystallography.net/hkl/)
  - Columns: Database (manual) 
  - Reduced Formula (automated extraction from source)
  - CIF
    - automated extraction of material id from COD source
    - converted to structure using pmg COD.get_structure_by_id
    - CifWriter with symprec 0.1 for CIF
    - note: this was done because a lot of COD CIFs aren't in a clean, standard format. Pymatgen already does most of the work of cleaning them up, so there is no need to reinvent the wheel by taking CIF data straight from the source.
  - XRD data
    - For every Material ID that has experimental hkl data and associated intensities, we extract it
    - Then:
      1. Calculate d_hkl from crystal structure.lattice.d_hkl([h, k, l])
      2. Use Bragg's law: sin($\theta$) = $\lambda$/($2$ × d_hkl)
      3. Find $\theta$ = arcsin($\lambda$/($2$ × d_hkl))
      4. Convert to degrees: $2\theta$ = $2$ × $\theta$ × (180/$\pi$)
    - Where:
      - $\lambda$: X-ray wavelength ($1.5406$ $\AA$ for Cu K$\alpha$)
      - d_hkl: d-spacing for the (hkl) planes
      - $\theta$: Bragg angle
    - Created 'Condition Vector'
      - top 20 most intense peaks selected ($2\theta$ and int)
      - Normalisations
        - $2\theta$ min-max for 0,90
        - intensities min-max for 0,100
  - Filtered out all hydrocarbons
    - symbols = struct.symbol_set
    - if "C" in symbols and "H" in symbols, remove it
  - Then cleaning for CIF augmentation
    - set --make_disordered_ordered flag
      - Rounds each site occupancy to the nearest integer when it lies within $\pm 0.05$ of that integer. The element set must be preserved exactly, otherwise the structure is discarded.
    - Filtered to a context length of 1024
  - Pushed to HuggingFace as c-bone/COD_XRD_small_nohc
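
Steps 1 to 4 of the hkl-to-$2\theta$ conversion above can be sketched as follows. The function name `two_theta_from_d` is hypothetical. In the pipeline described above the d-spacing would come from pymatgen's `lattice.d_hkl([h, k, l])`; here it is passed in directly so the example has no dependencies. Returning `None` for reflections with $\lambda/(2d) > 1$ is a choice made for this sketch, not something stated in the source.

```python
import math

CU_KA_WAVELENGTH = 1.5406  # Angstrom, the Cu K-alpha value quoted above


def two_theta_from_d(d_hkl, wavelength=CU_KA_WAVELENGTH):
    """Return 2-theta in degrees for a d-spacing given in Angstrom.

    Uses Bragg's law, sin(theta) = wavelength / (2 * d_hkl). Returns None
    when the reflection cannot be reached at this wavelength.
    """
    if d_hkl <= 0:
        raise ValueError("d_hkl must be positive")
    sin_theta = wavelength / (2.0 * d_hkl)
    if sin_theta > 1.0:
        return None  # no real Bragg angle exists for this d-spacing
    theta = math.asin(sin_theta)      # Bragg angle in radians
    return 2.0 * math.degrees(theta)  # 2-theta in degrees


# When d equals the wavelength, sin(theta) = 0.5, so theta = 30 degrees
print(round(two_theta_from_d(1.5406), 3))
# 60.0
```

Peaks with a computed $2\theta$ outside the 0 to 90 range used for normalisation would still need to be handled before building the condition vector; how the pipeline does this is not stated here.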

### Training
> **Note**: The hyperparameters here differ from regular finetuning because this is the 2nd pass. Backbone learning rates were set to decay from $5\times10^{-8}$ to $5\times10^{-10}$, while the learning rates for the newly initialised conditioning parameters were set 100 times higher.
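
To make the two learning-rate groups concrete, here is a minimal sketch of such a schedule. The start and end values and the factor of 100 come from the note above; the cosine shape of the decay and the names `cosine_decay` and `learning_rates` are assumptions for illustration only, since the actual schedule is defined in the training config.

```python
import math

BACKBONE_LR_START = 5e-8   # from the note above
BACKBONE_LR_END = 5e-10    # from the note above
CONDITIONING_LR_FACTOR = 100  # conditioning parameters use a 100x higher rate


def cosine_decay(step, total_steps, lr_start, lr_end):
    """Cosine interpolation from lr_start to lr_end (assumed schedule shape)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * progress))


def learning_rates(step, total_steps):
    """Return the learning rate of each parameter group at a given step."""
    backbone = cosine_decay(step, total_steps, BACKBONE_LR_START, BACKBONE_LR_END)
    return {
        "backbone": backbone,
        "conditioning": CONDITIONING_LR_FACTOR * backbone,
    }


# Print both rates at the start, middle and end of a hypothetical 1000-step run
for step in (0, 500, 1000):
    rates = learning_rates(step, total_steps=1000)
    print(step, f"{rates['backbone']:.2e}", f"{rates['conditioning']:.2e}")
```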

In [5]:
!python _utils/_preprocessing/_save_dataset_to_HF.py \
    --input_parquet 'HF-databases/COD_dev/COD_xrd_clean_nohc_small.parquet' \
    --output_parquet 'HF-databases/COD_XRD_small_nohc_full.parquet' \
    --valid_size 0.000 \
    --test_size 0.125 \
    --save_hub

Loading Hugging Face API key from API_keys.jsonc
Loading data from HF-databases/COD_dev/COD_xrd_clean_nohc_small.parquet as Parquet with zstd compression
Uploading the dataset shards:   0%|                       | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 100%|█████████| 2/2 [00:00<00:00, 65.69ba/s][A
Uploading the dataset shards: 100%|███████████████| 1/1 [00:01<00:00,  1.23s/it]
Uploading the dataset shards:   0%|                       | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 100%|████████| 1/1 [00:00<00:00, 164.92ba/s][A
Uploading the dataset shards: 100%|███████████████| 1/1 [00:00<00:00,  1.48it/s]
Dataset saved to Hugging Face Hub as c-bone/COD_XRD_small_nohc_full


In [None]:
!torchrun --nproc_per_node=2 _train.py --config '_config_files/training/conditional/xrd_studies/COD_XRD_small-slider-opt.jsonc'

### Generating

In [None]:
!python _utils/_generating/make_prompts.py \
    --HF_dataset 'c-bone/COD_XRD_small_nohc' \
    --split 'test' \
    --automatic \
    --output_parquet '_artifacts/cod-XRD/COD-small-XRD-nohc-test_prompts.parquet' \
    --level 'level_3' \
    --condition_columns 'Condition Vector'

#### Generate materials using 2-pass finetuning and XRD information

In [None]:
import __init__

Navigated to package root: /home/cyprien/CrystaLLMv2_PKV
Added package root to Python path


In [None]:
!python _utils/_generating/generate_CIFs.py --config '_config_files/generation/conditional/xrd_studies/COD-mgen-xrd_eval.jsonc'

python: can't open file '/home/cyprien/CrystaLLMv2_PKV/notebooks/_utils/_generating/generate_CIFs.py': [Errno 2] No such file or directory


In [None]:
!python _utils/_generating/postprocess.py \
    --input_parquet '_artifacts/cod-xrd/COD-mgen-topp-test_gen.parquet' \
    --output_parquet '_artifacts/cod-xrd/COD-mgen-topp-test_post.parquet' \
    --num_workers 32 \
    --column_name 'Generated CIF'

In [None]:
!python _utils/_metrics/XRD_metrics.py \
    --input_parquet '_artifacts/cod-xrd/cod-ft-20perp-test_post.parquet' \
    --num_gens 20 \
    --ref_parquet '_artifacts/cod-xrd/cod-test_ref.parquet' \
    --output_parquet '_artifacts/cod-xrd/cod-ft-20perp-test_metrics.parquet' \
    --num_workers 32 \
    --validity_check "none"

In [None]:
!python _utils/_metrics/XRD_metrics.py \
    --input_parquet '_artifacts/cod-xrd/cod-ft-20perp-test_post.parquet' \
    --num_gens 1 \
    --ref_parquet '_artifacts/cod-xrd/cod-test_ref.parquet' \
    --output_parquet '_artifacts/cod-xrd/cod-ft-1perp-test_metrics.parquet' \
    --num_workers 32 \
    --validity_check "none"

In [None]:
!python _utils/_metrics/XRD_metrics.py \
    --input_parquet '_artifacts/cod-xrd/cod-ft-20perp-test_post.parquet' \
    --num_gens 1 \
    --ref_parquet '_artifacts/cod-xrd/cod-test_ref.parquet' \
    --output_parquet '_artifacts/cod-xrd/cod-ft-1rand-test_metrics.parquet' \
    --num_workers 32 \
    --validity_check "none" \
    --sort_gens "random"

In [9]:
import __init__
from _utils import get_metrics_xrd
import pandas as pd

df_test = pd.read_parquet('_artifacts/cod-xrd/cod-test_ref.parquet')
df_metrics = pd.read_parquet('_artifacts/cod-xrd/cod-ft-20perp-test_metrics.parquet')
metrics = get_metrics_xrd(df_metrics, n_test=len(df_test), only_matched=False)
df_metrics = pd.read_parquet('_artifacts/cod-xrd/cod-ft-1perp-test_metrics.parquet')
metrics = get_metrics_xrd(df_metrics, n_test=len(df_test), only_matched=False)
df_metrics = pd.read_parquet('_artifacts/cod-xrd/cod-ft-1rand-test_metrics.parquet')
metrics = get_metrics_xrd(df_metrics, n_test=len(df_test), only_matched=False)

Computing metrics on all (also unmatched) structures (198 entries, 89 matched)
Number of matched structures: 89 / 198
Mean RMS-d: 0.1091
Percent Matched (%): 44.95% (89/198)
a MAE: 0.8845
b MAE: 0.6077
c MAE: 0.9357
Volume MAE: 54.3311
a R^2: 0.8651
b R^2: 0.9146
c R^2: 0.9305
Volume R^2: 0.9792
Average Score: 1.2760
Computing metrics on all (also unmatched) structures (198 entries, 61 matched)
Number of matched structures: 61 / 198
Mean RMS-d: 0.0773
Percent Matched (%): 30.81% (61/198)
a MAE: 2.1187
b MAE: 1.9925
c MAE: 2.9155
Volume MAE: 80.4460
a R^2: 0.5489
b R^2: 0.4371
c R^2: 0.4298
Volume R^2: 0.9508
Average Score: 1.2575
Computing metrics on all (also unmatched) structures (198 entries, 59 matched)
Number of matched structures: 59 / 198
Mean RMS-d: 0.0947
Percent Matched (%): 29.80% (59/198)
a MAE: 1.7297
b MAE: 1.6916
c MAE: 2.2709
Volume MAE: 67.6643
a R^2: 0.6424
b R^2: 0.5010
c R^2: 0.6053
Volume R^2: 0.9672
Average Score: 1.3053


#### Generate materials using 2-pass finetuning (Mattergen XRD + COD XRD nohc) but no XRD information fed during inference
> **Note**: The condition_vector column of the prompt dataframe made above was replaced with a series of [-100] missing values, so no XRD information is fed to the model during generation.
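
A minimal sketch of that masking step is shown below. It assumes each condition vector is stored as a comma-separated string with 40 slots (20 for $2\theta$ and 20 for intensity, as in the vectors printed elsewhere in this notebook) and that a fully masked vector sets every slot to `-100`. The helper name `mask_condition_vectors` and the row-as-dict representation are illustrative, not the notebook's actual code.

```python
PAD_VALUE = -100  # value used to mark a missing slot
N_SLOTS = 40      # assumed: 20 two-theta slots followed by 20 intensity slots


def mask_condition_vectors(rows, column="condition_vector"):
    """Return copies of the prompt rows with the XRD condition fully masked."""
    masked = ",".join([str(PAD_VALUE)] * N_SLOTS)
    # Copy each row so the original prompts are left untouched
    return [{**row, column: masked} for row in rows]


prompts = [
    {"Prompt": "<bos>\ndata_[Ti2O4]\n", "condition_vector": "0.305,0.401,0.604"},
]
masked_prompts = mask_condition_vectors(prompts)
print(masked_prompts[0]["condition_vector"].split(",")[:3])
# ['-100', '-100', '-100']
```

With a pandas dataframe the same effect is a single column assignment, for example `df["condition_vector"] = masked`.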

In [10]:
import __init__

In [16]:
!python _utils/_generating/generate_CIFs.py --config '_config_files/generation/conditional/xrd_studies/cod-xrd-uncond_eval.jsonc'

Environment info
Available GPUs: 2
GPU 0: NVIDIA L4
GPU 1: NVIDIA L4

Generation settings
Total sequences per prompt-condition pair: 20
Will save generated CIFs to _artifacts/cod-xrd/cod-ft-uncond-20perp-test_gen.parquet
Model's max_length: 1024
Tokenizer validation passed: token vocabulary is consistent.
Generation kwargs: {'max_length': 1024, 'pad_token_id': 371, 'eos_token_id': 373, 'renormalize_logits': True, 'remove_invalid_values': True, 'num_return_sequences': 20, 'do_sample': True, 'top_k': 10, 'top_p': 0.95, 'temperature': 0.75}

Generation Strategy
Number of condition-prompt pairs: 198
Target valid CIFs per prompt: 20
Will save all CIFs ranked by LOGP score (up to 20 per prompt)
Tokenizer validation passed: token vocabulary is consistent.
Generating CIFs...:   0%|                              | 0/3960 [00:00<?, ?it/s]Tokenizer validation passed: token vocabulary is consistent.
Tokenizer validation passed: token vocabulary is consistent.
Generating CIFs...:   0%|              

In [17]:
!python _utils/_generating/postprocess.py \
    --input_parquet '_artifacts/cod-xrd/cod-ft-uncond-20perp-test_gen.parquet' \
    --output_parquet '_artifacts/cod-xrd/cod-ft-uncond-20perp-test_post.parquet' \
    --num_workers 32 \
    --column_name 'Generated CIF'

Processing 3900 records using 32 worker(s)
Processing Generated CIF: 100%|███████████| 3900/3900 [00:03<00:00, 1124.07it/s]
Multi-worker processing completed: 3900 records


In [18]:
!python _utils/_metrics/XRD_metrics.py \
    --input_parquet '_artifacts/cod-xrd/cod-ft-uncond-20perp-test_post.parquet' \
    --num_gens 20 \
    --ref_parquet '_artifacts/cod-xrd/cod-test_ref.parquet' \
    --output_parquet '_artifacts/cod-xrd/cod-ft-uncond-20perp-test_metrics.parquet' \
    --num_workers 32 \
    --validity_check "none"

Using 20 generation(s) per compound
Using 32 workers for parallel processing (based on input size)
Loaded 195 materials from _artifacts/cod-xrd/cod-ft-uncond-20perp-test_post.parquet
Using 195 matched materials from test DB
Parsing true CIFs: 100%|█████████████████████| 195/195 [00:01<00:00, 173.24it/s]
Processing 3900 CIFs across 195 materials
Parsing and sensible check for gen CIFs: 100%|█| 3900/3900 [00:03<00:00, 1263.40
Materials processed: 195
Materials with sensible structures: 195
Comparing structures: 100%|███████████████████| 195/195 [00:13<00:00, 14.42it/s]

Results saved to: _artifacts/cod-xrd/cod-ft-uncond-20perp-test_metrics.parquet

Metrics:
  match_rate: 0.4256
  rms_dist: 0.0987
  n_matched: 83.0000
  a_diff: 1.0799
  b_diff: 0.8767
  c_diff: 1.4244


In [19]:
!python _utils/_metrics/XRD_metrics.py \
    --input_parquet '_artifacts/cod-xrd/cod-ft-uncond-20perp-test_post.parquet' \
    --num_gens 1 \
    --ref_parquet '_artifacts/cod-xrd/cod-test_ref.parquet' \
    --output_parquet '_artifacts/cod-xrd/cod-ft-uncond-1perp-test_metrics.parquet' \
    --num_workers 32 \
    --validity_check "none"

Using 1 generation(s) per compound
Using 32 workers for parallel processing (based on input size)
Using rank=1 rows for num_gens=1 (rank column detected)
Loaded 195 materials from _artifacts/cod-xrd/cod-ft-uncond-20perp-test_post.parquet
Using 195 matched materials from test DB
Parsing true CIFs: 100%|█████████████████████| 195/195 [00:01<00:00, 174.72it/s]
Processing 195 CIFs across 195 materials
Parsing and sensible check for gen CIFs: 100%|█| 195/195 [00:00<00:00, 518.39it/
Materials processed: 195
Materials with sensible structures: 195
Comparing structures: 100%|██████████████████| 195/195 [00:01<00:00, 151.49it/s]

Results saved to: _artifacts/cod-xrd/cod-ft-uncond-1perp-test_metrics.parquet

Metrics:
  match_rate: 0.2872
  rms_dist: 0.0702
  n_matched: 56.0000
  a_diff: 2.2329
  b_diff: 1.9798
  c_diff: 3.0517


In [1]:
metrics_1perp_parquet = '_artifacts/cod-xrd/cod-ft-1perp-test_metrics.parquet'
metrics_20perp_parquet = '_artifacts/cod-xrd/cod-ft-20perp-test_metrics.parquet'
metrics_uncond_1perp_parquet = '_artifacts/cod-xrd/cod-ft-uncond-1perp-test_metrics.parquet'
metrics_uncond_20perp_parquet = '_artifacts/cod-xrd/cod-ft-uncond-20perp-test_metrics.parquet'

# make a table with all the results
import __init__
from _utils import get_metrics_xrd
import pandas as pd
import numpy as np

paths = {
    'cond-20perp': metrics_20perp_parquet,
    'cond-1perp': metrics_1perp_parquet,
    'uncond-20perp': metrics_uncond_20perp_parquet,
    'uncond-1perp': metrics_uncond_1perp_parquet,
}
results = {}

for names, path in paths.items():
    df = pd.read_parquet(path)
    metrics_result = get_metrics_xrd(df, n_test=198, only_matched=False, verbose=False)
    results[names] = metrics_result
        
# Create final table with all results
final_table = pd.DataFrame.from_dict(results, orient='index')
final_table

Navigated to package root: /home/cyprien/CrystaLLMv2_PKV
Added package root to Python path


| | Number of matched structures | Total number of structures | Mean RMS-d | Percent Matched (%) | a MAE | b MAE | c MAE | Volume MAE | a R^2 | b R^2 | c R^2 | Volume R^2 | Average Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cond-20perp | 89 | 198 | 0.109084 | 44.949495 | 0.884482 | 0.607716 | 0.935713 | 54.331123 | 0.865129 | 0.914586 | 0.930499 | 0.979238 | 1.276041 |
| cond-1perp | 61 | 198 | 0.077264 | 30.808081 | 2.118677 | 1.99252 | 2.915536 | 80.446039 | 0.548865 | 0.437087 | 0.429849 | 0.950824 | 1.257517 |
| uncond-20perp | 83 | 198 | 0.098674 | 41.919192 | 1.07987 | 0.876746 | 1.424404 | 66.146401 | 0.831813 | 0.864943 | 0.786621 | 0.972178 | 1.236938 |
| uncond-1perp | 56 | 198 | 0.07021 | 28.282828 | 2.232943 | 1.979762 | 3.051658 | 105.704272 | 0.537775 | 0.486227 | 0.518912 | 0.937082 | 1.219801 |
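
The match rates printed by `XRD_metrics.py` for the unconditional runs (0.4256 and 0.2872) differ slightly from the Percent Matched values in the table (41.92% and 28.28%). The numbers are consistent with the script dividing by the 195 materials it loaded, while `get_metrics_xrd` was called with `n_test=198`. A short check of that arithmetic, where the helper name `match_rate` is only illustrative:

```python
def match_rate(n_matched, n_total):
    """Fraction of reference materials that have a matching generation."""
    return n_matched / n_total


N_PROMPTS = 198  # n_test passed to get_metrics_xrd
N_SCORED = 195   # materials loaded by XRD_metrics.py for the unconditional runs

for label, n_matched in [("uncond-20perp", 83), ("uncond-1perp", 56)]:
    per_scored = round(match_rate(n_matched, N_SCORED), 4)
    per_prompt = round(100 * match_rate(n_matched, N_PROMPTS), 2)
    print(label, per_scored, per_prompt)
# uncond-20perp 0.4256 41.92
# uncond-1perp 0.2872 28.28
```

Using the full 198 prompts as the denominator counts the three materials without scored generations as unmatched, which keeps the conditional and unconditional rows comparable.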


### Testing on some real data

- I was given some XRD data calculated by a group, covering the brookite, anatase and rutile polymorphs of TiO2
- Anatase and rutile were seen during training and finetuning (in pretrain data and the mattergen xrd 1st pass finetune dataset), brookite was not
- Can the model generate the correct structures for experimental XRDs for materials seen in training, and one unseen?

1. First we make a dataset with the true structures, taken from their Materials Project entries
2. To this we add a prompt for each of the structures
3. And a condition vector as per below
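
The raw patterns used for step 3 are two-column text blocks with one header line. As a rough sketch of the parsing that has to happen before normalisation, the hypothetical helper below turns such a block into `(2θ, intensity)` pairs. It is not the implementation of `process_xrd_to_condition_vector`, whose internals are not shown in this notebook.

```python
def parse_xrd_text(raw):
    """Parse a two-column XRD text block into (two_theta, intensity) pairs.

    Assumes the first line is a header and that the columns are separated
    by tabs and/or spaces. Lines that do not have two fields are skipped.
    """
    peaks = []
    for line in raw.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) != 2:
            continue
        peaks.append((float(fields[0]), float(fields[1])))
    return peaks


sample = """2θ [°] Cu	Intensity
25.2280719392351	281.55012
62.592362748181	    81.58042"""

print(parse_xrd_text(sample))
# [(25.2280719392351, 281.55012), (62.592362748181, 81.58042)]
```

Splitting on any whitespace rather than on a single tab tolerates rows where the tab is followed by extra spaces.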

In [28]:
import __init__
from _utils import process_xrd_to_condition_vector

anatase_raw_data = """2θ [°] Cu	Intensity
25.2280719392351	281.55012
30.7984477760649	148.62471
36.4922566866002	122.62704
37.6908921523268	119.93007
41.9139352787292	114.27506
48.0759377770583	93.23776
55.0743148175043	135.06294
62.592362748181	    81.58042
65.9512366402823	78.86014
77.6190112038205	75.32634"""

rutile_raw_data = """2θ [°] Cu	Intensity
23.4685203891387	203.0
27.4456323189006	922.0
30.8154418109335	163.0
36.1036065436626	473.0
39.224627779168	    112.0
41.2593176108602	270.0
44.0079436046877	133.0
46.2453487299531	109.0
54.3526968350901	450.0
56.6773553113041	186.0
62.8816130603703	127.0
64.1161640720417	117.0
69.0374099680916	164.0
69.9017250464323	127.0
82.4052975899525	90.0"""

brookite_raw_data = """2θ [°] Cu	Intensity
21.7068951575405	197.0
25.4158554191106	615.0
30.8494290781386	362.0
36.357093687628	159.0
37.387173876481	130.0
40.2006181429811	139.0
42.4506080337455	123.0
46.2786823244331	116.0
48.1590101936239	180.0
49.2875816475943	124.0
54.3691099189334	142.0
55.3364355858856	166.0
57.3462424126941	95.0
62.2385043787467	93.0
63.7639034593297	107.0
65.1227905836474	107.0
68.8799626318172	80.0
84.4701610611719	103.0"""


# Test the function
anatase = process_xrd_to_condition_vector(anatase_raw_data)
rutile = process_xrd_to_condition_vector(rutile_raw_data)
brookite = process_xrd_to_condition_vector(brookite_raw_data)

print(anatase)
print(rutile)
print(brookite)

Theta scaled to [0,1] (0 to 90), Intensity scaled to [0,1] (relative to max in pattern), -100 for padding
Theta scaled to [0,1] (0 to 90), Intensity scaled to [0,1] (relative to max in pattern), -100 for padding
Theta scaled to [0,1] (0 to 90), Intensity scaled to [0,1] (relative to max in pattern), -100 for padding
0.28,0.342,0.612,0.405,0.419,0.466,0.534,0.695,0.733,0.862,-100,-100,-100,-100,-100,-100,-100,-100,-100,-100,1.0,0.528,0.48,0.436,0.426,0.406,0.331,0.29,0.28,0.268,-100,-100,-100,-100,-100,-100,-100,-100,-100,-100
0.305,0.401,0.604,0.458,0.261,0.63,0.767,0.342,0.489,0.699,0.777,0.712,0.436,0.514,0.916,-100,-100,-100,-100,-100,1.0,0.513,0.488,0.293,0.22,0.202,0.178,0.177,0.144,0.138,0.138,0.127,0.121,0.118,0.098,-100,-100,-100,-100,-100
0.282,0.343,0.241,0.535,0.615,0.404,0.604,0.447,0.415,0.548,0.472,0.514,0.708,0.724,0.939,0.637,0.692,0.765,-100,-100,1.0,0.589,0.32,0.293,0.27,0.259,0.231,0.226,0.211,0.202,0.2,0.189,0.174,0.174,0.167,0.154,0.151,0.13,-100,-100


In [29]:
import pandas as pd
df = pd.read_parquet('_artifacts/cod-xrd/amil/amil-TiO2_ref_prompts.parquet')
df.head()

Unnamed: 0,MP_ehull,CIF,is_in_train,Material ID,name,Prompt,condition_vector
0,0.043639,# generated using pymatgen\ndata_TiO2\n_symmet...,True,mp-2657,Rutile (P4_2/mnm),<bos>\ndata_[Ti2O4]\n,"0.305,0.401,0.604,0.458,0.261,0.63,0.767,0.342..."
1,0.0,# generated using pymatgen\ndata_TiO2\n_symmet...,True,mp-390,Anatase (I4_1/amd),<bos>\ndata_[Ti4O8]\n,"0.28,0.342,0.612,0.405,0.419,0.466,0.534,0.695..."
2,0.003041,# generated using pymatgen\ndata_TiO2\n_symmet...,False,mp-1840,Brookite (Pbca),<bos>\ndata_[Ti8O16]\n,"0.282,0.343,0.241,0.535,0.615,0.404,0.604,0.44..."


> Note: we can use this as both prompt and ref dfs because it contains all the relevant columns

In [35]:
!python _utils/_generating/generate_CIFs.py \
    --config '_config_files/generation/conditional/xrd_studies/cod-amil-xrd_eval.jsonc'

Environment info
Available GPUs: 2
GPU 0: NVIDIA L4
GPU 1: NVIDIA L4

Generation settings
Total sequences per prompt-condition pair: 20
Will save generated CIFs to _artifacts/cod-xrd/amil/amil-TiO2_gen.parquet
Model's max_length: 1024
Tokenizer validation passed: token vocabulary is consistent.
Generation kwargs: {'max_length': 1024, 'pad_token_id': 371, 'eos_token_id': 373, 'renormalize_logits': True, 'remove_invalid_values': True, 'num_return_sequences': 20, 'do_sample': True, 'top_k': 10, 'top_p': 0.95, 'temperature': 0.75}

Generation Strategy
Number of condition-prompt pairs: 3
Target valid CIFs per prompt: 20
Will save all CIFs ranked by LOGP score (up to 20 per prompt)
Tokenizer validation passed: token vocabulary is consistent.
Generating CIFs...:   0%|                                | 0/60 [00:00<?, ?it/s]Tokenizer validation passed: token vocabulary is consistent.
Tokenizer validation passed: token vocabulary is consistent.
Generating CIFs...:  22%|████▉                  | 13

In [36]:
!python _utils/_generating/postprocess.py \
    --input_parquet '_artifacts/cod-xrd/amil/amil-TiO2_gen.parquet' \
    --output_parquet '_artifacts/cod-xrd/amil/amil-TiO2_gen.parquet' \
    --num_workers 32 \
    --column_name 'Generated CIF'

Processing 60 records using 32 worker(s)
Processing Generated CIF: 100%|███████████████| 60/60 [00:00<00:00, 1814.75it/s]
Multi-worker processing completed: 60 records


Did we recover structures in the 20 generations?

In [37]:
!python _utils/_metrics/XRD_metrics.py \
    --input_parquet '_artifacts/cod-xrd/amil/amil-TiO2_gen.parquet' \
    --num_gens 20 \
    --ref_parquet '_artifacts/cod-xrd/amil/amil-TiO2_ref_prompts.parquet' \
    --output_parquet '_artifacts/cod-xrd/amil/amil-TiO2_metrics.parquet' \
    --num_workers 4 \
    --validity_check "none"

Using 20 generation(s) per compound
Using 32 workers for parallel processing (based on input size)
Loaded 3 materials from _artifacts/cod-xrd/amil/amil-TiO2_gen.parquet
Using 3 matched materials from test DB
Parsing true CIFs: 100%|█████████████████████████| 3/3 [00:00<00:00, 357.10it/s]
Processing 60 CIFs across 3 materials
Parsing and sensible check for gen CIFs: 100%|█| 60/60 [00:00<00:00, 252.37it/s]
Materials processed: 3
Materials with sensible structures: 3
Comparing structures: 100%|███████████████████████| 3/3 [00:00<00:00,  6.38it/s]

Results saved to: _artifacts/cod-xrd/amil/amil-TiO2_metrics.parquet

Metrics:
  match_rate: 1.0000
  rms_dist: 0.1613
  n_matched: 3.0000
  a_diff: 0.1962
  b_diff: 0.4166
  c_diff: 0.8394


In [4]:
import __init__
import pandas as pd

df = pd.read_parquet('_artifacts/cod-xrd/amil/amil-TiO2_gen.parquet')
df_ref = pd.read_parquet('_artifacts/cod-xrd/amil/amil-TiO2_ref_prompts.parquet')
df_metrics = pd.read_parquet('_artifacts/cod-xrd/amil/amil-TiO2_metrics.parquet')
