In [1]:
import eunomia
from eunomia.agents import Eunomia
from eunomia.tools import EunomiaTools

In [2]:
import logging
import warnings

# Silence every log record below CRITICAL on the root logger
logging.getLogger().setLevel(logging.CRITICAL)

# Disable all warning messages
warnings.filterwarnings("ignore")

# Obtaining Information from Online Datasets

## Get chemical structures (CIF files) from the Crystallography Open Database (COD)

In [3]:
tools = EunomiaTools(tool_names=['get_cif_from_COD', 'rename_cif']).get_tools()
agent = Eunomia(tools=tools, model='gpt-4', get_cost=True)
agent.run(prompt="Download cif files for this doi: 10.1021/cg301691d from COD and rename them.")



> Entering new AgentExecutor chain...
I need to first download the CIF files from the Crystallography Open Database (COD) using the given DOI. Then, I will rename the files using the MOF-Name associated with a target key from each CIF file.
Action: get_cif_from_COD
Action Input: 10.1021/cg301691d
Observation: ['4507517.cif', '4507518.cif', '4507519.cif', '4507520.cif', '4507521.cif', '4507522.cif']
Thought: Now that I have the CIF files, I need to rename each of them using the MOF-Name associated with a target key from each CIF file.
Action: rename_cif
Action Input: '4507517.cif'
Observation: None
Thought: I have renamed the first file. Now I need to rename the rest of the files.
Action: rename_cif
Action Input: '4507518.cif'
Observation: None
Thought: I have renamed the second file. Now I need to rename the rest of the files.
Action: rename_cif
Action Input: '45075

'All CIF files have been downloaded and renamed.'
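
The log above says each file is renamed "using the MOF-Name associated with a target key from each CIF file". How Eunomia's `rename_cif` tool does this internally is not shown here, so the following is only a minimal sketch of the idea. The helper names and the default key `_chemical_name_common` are assumptions made for illustration, and only single-line CIF data items are handled.

```python
import re
from pathlib import Path
from typing import Optional


def extract_cif_value(cif_text: str, key: str) -> Optional[str]:
    """Return the value of a single-line CIF data item, or None if absent or unknown."""
    for line in cif_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith(key):
            continue
        rest = stripped[len(key):]
        # Require whitespace after the key so "_name" does not match "_name_extra"
        if not rest or not rest[0].isspace():
            continue
        value = rest.strip().strip("'\"")
        # "?" and "." are the CIF placeholders for unknown / inapplicable values
        if value and value not in {"?", "."}:
            return value
    return None


def safe_filename(name: str) -> str:
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")


def rename_cif_by_key(path: Path, key: str = "_chemical_name_common") -> Path:
    """Rename a CIF file after the value stored under `key` (hypothetical helper).

    The file is left untouched if the key is missing or the target name is taken.
    """
    value = extract_cif_value(path.read_text(encoding="utf8", errors="replace"), key)
    if value is None:
        return path
    target = path.with_name(safe_filename(value) + ".cif")
    if target.exists():
        return path  # never overwrite an existing file
    return path.rename(target)
```

Calling `rename_cif_by_key(Path('4507517.cif'))` would rename the file only if the chosen key is present. Returning the original path otherwise makes it easy to see which files were skipped.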

## Get chemical structures (CIF files) from the Cambridge Crystallographic Data Centre (CCDC)

In [4]:
tools = EunomiaTools(tool_names=['get_cif_from_CCDC']).get_tools()
agent = Eunomia(tools=tools, model='gpt-4', get_cost=True)
agent.run(prompt="Download cif files for this doi: 10.1021/cg301691d from CCDC.")



> Entering new AgentExecutor chain...
I need to use the tool get_cif_from_CCDC to download the CIF file for the given DOI.
Action: get_cif_from_CCDC
Action Input: 10.1021/cg301691d
Observation: None
Thought: The CIF file for the DOI 10.1021/cg301691d has been downloaded.
Final Answer: The CIF file for the DOI 10.1021/cg301691d has been downloaded.

> Finished chain.
Tokens Used: 484
	Prompt Tokens: 397
	Completion Tokens: 87
Successful Requests: 2
Total Cost (USD): $0.01713


'The CIF file for the DOI 10.1021/cg301691d has been downloaded.'
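
The cost reported above can be reproduced from the token counts. The per-token rates are not stated in the notebook. The sketch below assumes $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens, which is an assumption that happens to match the logged total.

```python
# Assumed rates in USD per 1,000 tokens (not stated in the notebook)
PROMPT_RATE_PER_1K = 0.03
COMPLETION_RATE_PER_1K = 0.06


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_rate: float = PROMPT_RATE_PER_1K,
                  completion_rate: float = COMPLETION_RATE_PER_1K) -> float:
    """Estimate the cost in USD of a run from its prompt and completion token counts."""
    return (prompt_tokens / 1000) * prompt_rate + (completion_tokens / 1000) * completion_rate


# Token counts taken from the run above: 397 prompt + 87 completion = 484 total
cost = estimate_cost(prompt_tokens=397, completion_tokens=87)
print(f"Total Cost (USD): ${cost:.5f}")  # prints "Total Cost (USD): $0.01713"
```

Completion tokens are priced higher than prompt tokens under these assumed rates, so an agent that takes more reasoning steps would cost more per run.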

# Obtaining Information from Local PDFs

In [10]:
paper_id = 69
paper_file = f'../Data/chosen_papers/{paper_id}.pdf'
docs_processor = eunomia.LoadDoc(file_name=paper_file, encoding="utf8")
sliced_pages = docs_processor.process(['references ', 'acknowledgement', 'acknowledgments', 'references\n'],
                                      chunk_size=1000, chunk_overlap=20, chunking_type='fixed-size')
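
The call above passes a list of section keywords together with `chunk_size=1000`, `chunk_overlap=20` and `chunking_type='fixed-size'`. The internals of `LoadDoc.process` are not shown here, so the following is only a rough sketch of what such a step could look like. It assumes the keywords mark trailing sections to discard and that chunks are measured in characters. Both function names are hypothetical.

```python
from typing import List


def truncate_at_keywords(text: str, keywords: List[str]) -> str:
    """Drop everything from the earliest trailing-section keyword onward.

    Assumption: for each keyword the last occurrence is taken as the section
    heading, and the text is cut at the earliest of those positions.
    """
    lowered = text.lower()
    positions = [lowered.rfind(keyword.lower()) for keyword in keywords]
    positions = [p for p in positions if p != -1]
    return text[:min(positions)] if positions else text


def fixed_size_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 20) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

The small overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, at the cost of a little duplicated text in the index.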

In [11]:
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

Embedding_model = 'text-embedding-ada-002'  # OpenAI embedding model used to build the index
# Path for a saved index; defined here but not used in the cells shown
faiss_index_path = f"../Data/chosen_papers/faiss/{Embedding_model}/faiss_index_{Embedding_model}_{paper_id}.pkl"
faiss_index = FAISS.from_documents(sliced_pages, OpenAIEmbeddings(model=Embedding_model))

In [14]:
tools = EunomiaTools(tool_names=['read_doc', 'eval_justification', 'recheck_justification'],
                     vectorstore=faiss_index).get_tools()
agent = Eunomia(tools=tools, model='gpt-4', get_cost=True)
# Adjacent string literals are concatenated, which avoids stray backslashes and indentation in the prompt
agent.run(prompt="Read the document and find the MOFs and their water stability. "
                 "Check your answers and if the reasoning is not valid, try to find better reasons.")



> Entering new AgentExecutor chain...
I need to first read the document to identify the MOFs and their water stability.
Action: read_doc
Action Input: User's original prompt
k=9 results hitting the token limit for the chosen LLM. Reducing k and retrying...

k=8 results hitting the token limit for the chosen LLM. Reducing k and retrying...

Observation: [36;1m[1;3m1. MOF: [ZrO(O2C-R-CO2)] (R =C6H4(MIL-140A), C10H6(B), C12H8(C), C12N2H6Cl2(D))
   Water Stability: Stable
   Probability Score: 0.9
   Justification: "The MIL-140 solids crystallize in the space group(SG) C2/c(No. 15) (MIL-140A and C) or Cc (No. 9) (MIL-140B and D; Supporting Information, Table S3). These solids all have the same cparameter (ca. 7.8 /C138), consistent with the inorganic subunit of complex zirconium oxide chains, ori- ented along the caxis (Figure 1), connected to six other chains through the dicarboxylate linkers. This delimits triangular channels along the caxis." and "The MIL-14

'The water stability of the MOFs are as follows: \n1. MOF: [ZrO(O2C-R-CO2)] (R =C6H4(MIL-140A), C10H6(B), C12H8(C), C12N2H6Cl2(D)) is stable. The justification is that the MOF exhibits almost no loss of free water and has a hydrophobic character.\n2. MOF: UiO-66(Zr) is stable. The justification is that the MOF remains unaltered upon water adsorption/desorption cycles.\n3. The water stability of MOF: [Zr6O4(OH)4(O2C-R-CO2)6] is not provided in the document.'
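
The "Reducing k and retrying..." messages in the log above suggest that the retrieval step lowers the number of retrieved chunks until they fit the model's context window. The actual logic inside `read_doc` is not shown, so the following is only a sketch of that pattern. The function name is hypothetical, and whitespace word counting is a crude stand-in for a real tokenizer.

```python
from typing import Callable, List


def fit_results_to_budget(results: List[str], token_budget: int,
                          count_tokens: Callable[[str], int] = lambda s: len(s.split())) -> List[str]:
    """Keep the top-k results, lowering k until their combined token count fits the budget.

    `results` is assumed to be ordered from most to least relevant.
    """
    k = len(results)
    while k > 0:
        total = sum(count_tokens(result) for result in results[:k])
        if total <= token_budget:
            return results[:k]
        print(f"k={k} results hitting the token limit. Reducing k and retrying...")
        k -= 1
    return []  # not even the single best result fits


# Toy usage with three 3-word "chunks" and a budget of 6 tokens
kept = fit_results_to_budget(["a b c", "d e f", "g h i"], token_budget=6)
print(kept)  # prints ['a b c', 'd e f']
```

Dropping the least relevant chunks first keeps the highest-ranked context, though a real implementation would also have to reserve budget for the prompt and the model's answer.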