## get list of CoRE MOF - CSD - unmodified

In [14]:
import json

In [16]:
with open("./list_coremof_csd_unmodified_20250227.json", "r") as f:
    csd_unmodified = json.load(f)

### there are two subsets (CR & NCR)

```text
CSD Unmodified Dataset   # (N = 12,261)
    │
    ├── CR               # computation-ready (N = 4,703)
    │   │
    │   ├── ASR          # all solvent removed (N = 1,894)
    │   ├── FSR          # free solvent removed (N = 2,657)
    │   └── Ion          # with ion (N = 152)
    │
    └── NCR              # not computation-ready (N = 7,558)
```


In [36]:
csd_unmodified.keys()

dict_keys(['CR', 'NCR'])

In [38]:
csd_unmodified["CR"].keys()

dict_keys(['ASR', 'FSR', 'ION'])
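
As a quick sanity check, the number of entries in each subset can be compared with the counts in the tree above. The following is a minimal sketch; `subset_sizes` is a hypothetical helper (not part of CoREMOF_tools), and `toy` is a stand-in for `csd_unmodified` with the same nesting.

```python
def subset_sizes(data):
    """Count entries per subset in a dict shaped like csd_unmodified.

    Nested dicts (e.g. "CR") are reported per sub-key as "CR/ASR";
    plain lists (e.g. "NCR") are reported under their own key.
    """
    sizes = {}
    for name, value in data.items():
        if isinstance(value, dict):
            for sub, entries in value.items():
                sizes[f"{name}/{sub}"] = len(entries)
        else:
            sizes[name] = len(value)
    return sizes


# Toy stand-in with the same layout as the JSON file (values are placeholders).
toy = {
    "CR": {"ASR": [["A", "id1"], ["B", "id2"]], "FSR": [["C", "id3"]], "ION": []},
    "NCR": ["D", "E", "F"],
}
print(subset_sizes(toy))
```

Calling `subset_sizes(csd_unmodified)` on the loaded file should, if the file matches the tree, report 1,894 / 2,657 / 152 for the CR subsets and 7,558 for NCR.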

**Note that the refcode in the CoRE MOF DB has an additional string ending such as “_ASR_pacman”.**

**For the CR subset, each entry holds both the REFCODE (first element) and the CoRE MOF ID (second element).**
**For the NCR subset, each entry holds only the REFCODE.**

In [60]:
# example for CR
print(csd_unmodified["CR"]["ASR"][1][0])
print(csd_unmodified["CR"]["ASR"][1][1])

ABUXUT_ASR_pacman
2016[Cu][nan]3[ASR]1


In [64]:
# example for NCR
print(csd_unmodified["NCR"][0])

ABECIX_FSR_pacman
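
Since every name carries a suffix, it can help to strip it once and keep a lookup from bare REFCODE to CoRE MOF ID. This is a minimal sketch: `refcode_of` and `cr_id_map` are hypothetical helpers, and it assumes that a bare refcode never contains an underscore and that each CR entry is a `[name, CoRE MOF ID]` pair as printed above.

```python
def refcode_of(name):
    """Return the bare refcode from a name such as 'ABUXUT_ASR_pacman'."""
    return name.split("_", 1)[0]


def cr_id_map(cr_entries):
    """Map bare refcode -> CoRE MOF ID for one CR subset.

    Assumes each entry is indexable with the name at [0] and the ID at [1].
    If a refcode appears twice, the later entry wins.
    """
    return {refcode_of(entry[0]): entry[1] for entry in cr_entries}


# Sample pair taken from the CR example output above.
sample = [["ABUXUT_ASR_pacman", "2016[Cu][nan]3[ASR]1"]]
print(refcode_of("ABECIX_FSR_pacman"))
print(cr_id_map(sample))
```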


## download original CIFs from CSD
You need to install the [*CSD Python API*](https://downloads.ccdc.cam.ac.uk/documentation/API/installation_notes.html) and activate the licence first.

Then install [CoREMOF_tools](https://coremof-tools.readthedocs.io/en/latest/index.html) with `pip install CoREMOF-tools`.

In [70]:
from CoREMOF.structure import download_from_CSD

In [74]:
### download ASR structure

for refcodes in csd_unmodified["CR"]["ASR"][:10]: # test for 10 structures
    refcode = refcodes[0].replace("_ASR_pacman", "")
    download_from_CSD(refcode=refcode, output_folder="./structures/CR/ASR")

In [76]:
### download FSR structure

for refcodes in csd_unmodified["CR"]["FSR"][:10]: # test for 10 structures
    refcode = refcodes[0].replace("_FSR_pacman", "")
    download_from_CSD(refcode=refcode, output_folder="./structures/CR/FSR")

In [78]:
### download Ion structure

for refcodes in csd_unmodified["CR"]["ION"][:10]: # test for 10 structures
    refcode = refcodes[0].split("_")[0]  # strip the suffix regardless of its letter case
    download_from_CSD(refcode=refcode, output_folder="./structures/CR/Ion")

In [86]:
### download NCR structure

for refcode in csd_unmodified["NCR"][:10]: # test for 10 structures
    refcode = refcode.split("_")[0]
    download_from_CSD(refcode=refcode, output_folder="./structures/NCR/")
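
The four loops above share the same shape, so they can be folded into one helper that also records failures. This is only a sketch: `download_subset` is a hypothetical function, the downloader is passed in as an argument and is assumed to accept the `refcode=` and `output_folder=` keywords used above, and whether `download_from_CSD` raises on a missing entry is an assumption.

```python
from pathlib import Path


def download_subset(names, output_folder, download, limit=None):
    """Download the first `limit` structures of a subset.

    names         : list of names such as 'ABECIX_FSR_pacman'
    download      : callable taking refcode= and output_folder=
    Returns a list of (refcode, error message) for failed downloads.
    """
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    failed = []
    for name in names[:limit]:
        refcode = name.split("_", 1)[0]  # drop the "_XXX_pacman" suffix
        try:
            download(refcode=refcode, output_folder=output_folder)
        except Exception as exc:  # keep going if one refcode fails
            failed.append((refcode, str(exc)))
    return failed


# Intended usage (requires the CSD Python API and a licence):
# from CoREMOF.structure import download_from_CSD
# asr_names = [entry[0] for entry in csd_unmodified["CR"]["ASR"]]
# failed = download_subset(asr_names, "./structures/CR/ASR", download_from_CSD, limit=10)
# failed = download_subset(csd_unmodified["NCR"], "./structures/NCR", download_from_CSD, limit=10)
```

Passing the downloader as a parameter keeps the helper testable without a CSD licence.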

### process the structures

Because solvent removal is not required for these structures, the only processing steps needed are reducing each structure to its primitive cell and converting it to P1 symmetry.

In [110]:
from CoREMOF.structure import make_primitive_p1

In [116]:
structure_pri = make_primitive_p1(filename="./structures/CR/ASR/"+csd_unmodified["CR"]["ASR"][0][0].split("_")[0]+".cif") # ABAVOP 

Predict partial atomic charges with [PACMAN Charge](https://pubs.acs.org/doi/10.1021/acs.jctc.4c00434).
Install it with `pip install PACMAN-charge`.

In [18]:
from PACMANCharge import pmcharge
pmcharge.predict(cif_file="./structures/CR/ASR/"+csd_unmodified["CR"]["ASR"][0][0].split("_")[0]+".cif", # ABAVOP 
                 charge_type="DDEC6",
                 digits=10,
                 atom_type=True,neutral=True,
                 keep_connect=False)

CIF Name: ./structures/CR/ASR/ABAVOP.cif
Charge Type: DDEC6
Digits: 10
Atom Type: True
Neutral: True
Keep Connect: False
Compelete and save as ./structures/CR/ASR/ABAVOP_pacman.cif


**If you prefer to name the file by its CoRE MOF ID, rename it from the REFCODE-based name to the CoRE MOF ID.**

In [22]:
import os

In [26]:
os.rename("./structures/CR/ASR/"+csd_unmodified["CR"]["ASR"][0][0].split("_")[0]+"_pacman.cif", "./structures/CR/ASR/" + csd_unmodified["CR"]["ASR"][0][1]+".cif")  
# ABAVOP_pacman.cif -> 2004[Co][rtl]3[ASR]2.cif
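
To rename a whole folder rather than a single file, the same step can be applied per entry. A minimal sketch, assuming the charge-assigned files are named `<REFCODE>_pacman.cif` as in the output above; `rename_to_mof_id` is a hypothetical helper, and entries whose file is missing are skipped.

```python
from pathlib import Path


def rename_to_mof_id(folder, cr_entries, suffix="_pacman.cif"):
    """Rename '<REFCODE>_pacman.cif' to '<CoRE MOF ID>.cif' inside `folder`.

    cr_entries : list of [name, CoRE MOF ID] pairs from one CR subset
    Returns the new file names of the files that were renamed.
    """
    folder = Path(folder)
    renamed = []
    for entry in cr_entries:
        refcode = entry[0].split("_", 1)[0]
        src = folder / f"{refcode}{suffix}"
        if not src.exists():  # not downloaded or not yet processed
            continue
        dst = folder / f"{entry[1]}.cif"
        src.rename(dst)
        renamed.append(dst.name)
    return renamed


# Intended usage:
# rename_to_mof_id("./structures/CR/ASR", csd_unmodified["CR"]["ASR"][:10])
```

Note that `Path.rename` silently overwrites an existing target on Unix but raises on Windows, so check for existing files first if that matters.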