### Case I: Protein design and analysis

In [None]:
import os
from dotenv import load_dotenv

PROJECT_ROOT = os.getcwd()
load_dotenv(os.path.join(PROJECT_ROOT, 'example.env'), override=True)

In [2]:
from scripts.custom_kg_retrievers import KGTableRetriever
from scripts.retrieve_tool_info import execute_tool, check_toxicity
from scripts.utils import load_index_from_storage_dir, construct_question
from app.llms.planning import generate_tools_plan, generate_plan_input, generate_output
from app.llms.extraction import extract_tool_path, extract_answer_parts
from app.tools.safety.toxicity_checker import check_toxicity_by_type
from app.tools.safety import extract_smiles_or_sequences
from app.tools import run_tool
from app.core.config import Config

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core import QueryBundle
from llama_index.embeddings.openai import OpenAIEmbedding

config = Config()
Settings.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL_NAME)
Settings.llm = OpenAI(temperature=0, model=config.MODEL_NAME)




In [None]:
question = "Design an alpha-structure protein consisting of 200 amino acids and generate its corresponding sequence. Then, convert this sequence into a 3D structure, producing a PDB file and calculating the residue accuracy (e.g., pLDDT values). Analyze the secondary structure composition of the protein using the generated PDB file. Next, calculate both the unfolding force and energy from the protein sequence. Finally, calculate the Anisotropic Network Model to analyze the first 10 vibrational frequencies and dynamic motions of the protein."
is_add_filename = False
file_path_list = []

question = construct_question(question, file_path_list, is_add_filename)
storage_context, kg_index = load_index_from_storage_dir(config.PERSIST_DIR)

INFO:llama_index.core.indices.loading:Loading all indices.
Knowledge Graph Index loaded successfully from KG/storage_graph_large


In [4]:
kg_table_retriever = KGTableRetriever(
    index=kg_index, 
    similarity_top_k=6, 
    graph_store_query_depth=3, 
    max_tools=10
)        
query_bundle = QueryBundle(query_str=question)
tools_retrieved = await kg_table_retriever._retrieve(query_bundle)

INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
['

In [None]:
responses_content = ""
total_output = ""

tools_plan = await generate_tools_plan(
    question=question, 
    tools_retrieved_list=tools_retrieved, 
    model=config.MODEL_NAME, 
    previous_answer=total_output, 
)
tools_path = extract_tool_path(tools_plan)
print(tools_path)

INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
['DesignProteinAlpha', 'SequenceToPdb', 'AnalyzeProteinStructure', 'CalculateForceEnergyFromSequence', 'CalculateProteinANM']


In [9]:
responses_content = ""
total_output = ""

In [6]:
for tool_name in tools_path:
    print(f"Executing tool: {tool_name}")
    current_output = await execute_tool(
        tool_name,
        question,
        question,
        file_path_list,
        config.MODEL_NAME,
        kg_table_retriever,
        total_output,
    )
    print(f"current_output: {current_output}")
    total_output += current_output
    responses_content += current_output

Executing tool: DesignProteinAlpha
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
input_str: 200
current_output: 

Now we have run the **DesignProteinAlpha** and obtained result
 ## Protein Design Result
- **Protein Filename**: `alpha_protein`
- **CATH Annotation**: `alpha`
- **Sequence Length**: `200`

**Protein Sequence**:  
`SEEEKIKEIREKLKDAVLEAYELFRQGRPQEGEQRLIEAMKEAAKELNVKLSDEEIQREATRLLVEAFREQAKKEGIDEDLAVIAFAELMRALGDTSPLAQEVLREAEELRQKDVLGNKIKEIIEMAKKGNYDEAEKKFKQLKETVDKLIEELRRAADLYASRVSPEEAAAFRQRIDTLLSEMKEKVEKAENEIKKIKEK`.
Executing tool: SequenceToPdb
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
input_str: SEEEKIKEIREKLKDAVLEAYELFRQGRPQEGEQRLIEAMKEAAKELNVKLSDEEIQREATRLLVEAFREQAKKEGIDEDLAVIAFAELMRALGDTSPLAQEVLREAEELRQKDVLGNKIKEIIEMAKKGNYDEAEKKFKQLKETVDKLIEELRRAADLYASRVSPEEAAAFRQRIDTLLSEMKEKVEKAENEIKKIKEK
current_output: 

Now we have run the **SequenceToPdb** and obtained result
 The fil

In [7]:
answer_text = await generate_output(question, total_output, model=config.MODEL_NAME)
is_completed, current_final_answer = extract_answer_parts(answer_text)
responses_content += f"\nTask completion status: {is_completed}\nFinal Answer: {current_final_answer}"


INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"


In [8]:
print(is_completed)
print(current_final_answer)

Completed
Yes, your problem has been solved.

- **Protein Design Result**:
  - **Protein Filename**: `alpha_protein`
  - **CATH Annotation**: `alpha`
  - **Sequence Length**: `200`
  - **Protein Sequence**: `SEEEKIKEIREKLKDAVLEAYELFRQGRPQEGEQRLIEAMKEAAKELNVKLSDEEIQREATRLLVEAFREQAKKEGIDEDLAVIAFAELMRALGDTSPLAQEVLREAEELRQKDVLGNKIKEIIEMAKKGNYDEAEKKFKQLKETVDKLIEELRRAADLYASRVSPEEAAAFRQRIDTLLSEMKEKVEKAENEIKKIKEK`

- **PDB File**:
  - The file has been stored with the name `alpha_protein.pdb`
  - **plddt Value**: `78.67665230769232`

- **Secondary Structure Composition**:
  - **H (Helix)**: `85.5%`
  - **B (Isolated beta-bridge)**: `0.0%`
  - **E (Extended strand)**: `0.0%`
  - **G (3-helix (310 helix))**: `0.0%`
  - **I (5-helix (pi helix))**: `0.0%`
  - **T (Turn)**: `7.0%`
  - **S (Bend)**: `1.5%`
  - **P (Polyproline II helix)**: `0.0%`
  - **- (Coil)**: `6.0%`

- **Unfolding Force and Energy**:
  - **Unfolding Force**: `0.379`
  - **Energy**: `0.371`

- **ANM Calculation Results**:
  - **
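
The secondary structure report above uses DSSP-style one-letter codes, and its percentages sum to 100% (85.5 + 7.0 + 1.5 + 6.0). As a minimal sketch of how such a composition can be derived from a per-residue assignment string, assuming the tool works from one code per residue (its internals are not shown in this notebook), the function `composition` and the example string below are hypothetical:

```python
from collections import Counter

# DSSP-style codes, matching the labels used in the report above.
DSSP_LABELS = {
    "H": "Helix",
    "B": "Isolated beta-bridge",
    "E": "Extended strand",
    "G": "3-helix (310 helix)",
    "I": "5-helix (pi helix)",
    "T": "Turn",
    "S": "Bend",
    "P": "Polyproline II helix",
    "-": "Coil",
}


def composition(ss_string: str) -> dict[str, float]:
    """Return the percentage of residues assigned to each code."""
    if not ss_string:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss_string)
    total = len(ss_string)
    return {
        code: round(100.0 * counts.get(code, 0) / total, 1)
        for code in DSSP_LABELS
    }


# Hypothetical 200-residue assignment chosen only to reproduce the reported
# split: 171 helix, 14 turn, 3 bend, 12 coil. The real residue order differs.
example = "H" * 171 + "T" * 14 + "S" * 3 + "-" * 12
result = composition(example)
for code, pct in result.items():
    print(f"{code} ({DSSP_LABELS[code]}): {pct}%")
```

For a 200-residue protein each residue contributes 0.5 percentage points, which is why every reported value is a multiple of 0.5.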

***

### Case II: Chemical reactivity prediction and analysis

In [None]:
import os
from dotenv import load_dotenv

PROJECT_ROOT = os.getcwd()

load_dotenv(os.path.join(PROJECT_ROOT, 'example.env'), override=True)

PERSIST_DIR = os.path.join(PROJECT_ROOT, os.getenv("PERSIST_DIR"))
DATA_FILE_PATH = os.path.join(PROJECT_ROOT, os.getenv("DATA_FILE_PATH"))

print("PROJECT_ROOT:", PROJECT_ROOT)
print("PERSIST_DIR:", PERSIST_DIR)
print("DATA_FILE_PATH:", DATA_FILE_PATH)


PROJECT_ROOT: /home/hjj/project/SciToolAgent
PERSIST_DIR: /home/hjj/project/SciToolAgent/KG/storage_graph_large
DATA_FILE_PATH: /home/hjj/project/SciToolAgent/data/toolsKG_full.xlsx


In [2]:
import asyncio
import logging
import os
import pdb
import sys

from scripts.custom_kg_retrievers import KGTableRetriever
from scripts.retrieve_tool_info import execute_tool, check_toxicity
from scripts.utils import load_index_from_storage_dir, construct_question
from app.llms.planning import generate_tools_plan, generate_plan_input, generate_output
from app.llms.extraction import extract_tool_path, extract_answer_parts
from app.tools.safety.toxicity_checker import check_toxicity_by_type
from app.tools.safety import extract_smiles_or_sequences
from app.tools import run_tool
from app.core.config import Config

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core import QueryBundle
from llama_index.embeddings.openai import OpenAIEmbedding

config = Config()
Settings.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL_NAME)
Settings.llm = OpenAI(temperature=0, model=config.MODEL_NAME)




In [3]:
question = "Please predict the reactivity of chemical reactants, provided in SMILES format in Test.csv. You can try different molecular features, such as RDK fingerprints, Morgan fingerprints and electrical descriptors, and use a machine learning algorithm (such as MLP) to predict reactivity. Tell me which feature yields the best predictive performance."

is_add_filename = True
file_path_list = ['/home/hjj/project/ToolsKG/TempFiles/ChemLab/demo_test.csv']

question = construct_question(question, file_path_list, is_add_filename)
print(config.PERSIST_DIR)
storage_context, kg_index = load_index_from_storage_dir(config.PERSIST_DIR)

KG/storage_graph_large
INFO:llama_index.core.indices.loading:Loading all indices.
Knowledge Graph Index loaded successfully from KG/storage_graph_large


In [4]:
responses_content = ""
total_output = ""

kg_table_retriever = KGTableRetriever(
    index=kg_index, 
    similarity_top_k=5, 
    graph_store_query_depth=3, 
    max_tools=10
)        
query_bundle = QueryBundle(query_str=question)
tools_retrieved = await kg_table_retriever._retrieve(query_bundle)

INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embedding

In [None]:
tools_plan = await generate_tools_plan(
    question=question, 
    tools_retrieved_list=tools_retrieved, 
    model=config.MODEL_NAME, 
    previous_answer=total_output, 
)
tools_path = extract_tool_path(tools_plan)
print(tools_path)

INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
['GenerateRDKFingerprintsFromCSV', 'GenerateMorganfingerprintsFromCSV', 'GenerateElectricalDescriptorsFromCSV', 'MLPClassifier', 'MLPClassifier', 'MLPClassifier']


In [11]:
for tool_name in tools_path:
    print(f"Executing tool: {tool_name}")
    current_output = await execute_tool(
        tool_name,
        question,
        question,
        file_path_list,
        config.MODEL_NAME,
        kg_table_retriever,
        total_output,
    )
    print(f"\ncurrent_output: {current_output}\n")
    total_output += current_output
    responses_content += current_output

Executing tool: GenerateRDKFingerprintsFromCSV
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
input_str: demo_test.csv

current_output: Now we have run the GenerateRDKFingerprintsFromCSV and obtained result 
smiles in the demo_test.csv file have been processed, and the fingerprints features are saved to the path ToolsKG/TempFiles/ChemLab/demo_test_RDK_fingerprints.csv
.

Executing tool: GenerateMorganfingerprintsFromCSV
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
input_str: demo_test.csv

current_output: Now we have run the GenerateMorganfingerprintsFromCSV and obtained result 
SMILES strings in the demo_test.csv file have been processed, and the morgan fingerprints features are saved to the path ToolsKG/TempFiles/ChemLab/demo_test_Morgan_fingerprints.csv
.

Executing tool: GenerateElectricalDescriptorsFromCSV
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions

In [None]:
answer_text = await generate_output(question, total_output, model=config.MODEL_NAME)
is_completed, current_final_answer = extract_answer_parts(answer_text)
responses_content += f"\nTask completion status: {is_completed}\nFinal Answer: {current_final_answer}"
is_completed

INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"


'Completed'

In [15]:
print(is_completed)
print(current_final_answer)

Completed
The problem has been solved. The best predictive performance was achieved using electrical descriptors, with a test accuracy of 0.9091.


In [16]:
question = "Based on the results that electrical descriptors perform best for predicting reactivity, I would like to compare the performance of different machine learning algorithms. Please use multiple algorithms (MLP, AdaBoost, and Random Forest) to predict reactivity and identify which algorithm yields the best accuracy."
is_add_filename = True
file_path_list = ['/home/hjj/project/ToolsKG/TempFiles/ChemLab/demo_test.csv']

question = construct_question(question, file_path_list, is_add_filename)

In [17]:
responses_content = ""
total_output = ""

kg_table_retriever = KGTableRetriever(
    index=kg_index, 
    similarity_top_k=5, 
    graph_store_query_depth=3, 
    max_tools=10
)        
query_bundle = QueryBundle(query_str=question)
tools_retrieved = await kg_table_retriever._retrieve(query_bundle)

INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embedding

In [None]:
tools_plan = await generate_tools_plan(
    question=question, 
    tools_retrieved_list=tools_retrieved, 
    model=config.MODEL_NAME, 
    previous_answer=total_output, 
)
tools_path = extract_tool_path(tools_plan)
print(tools_path)

INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
['GenerateElectricalDescriptorsFromCSV', 'MLPClassifier', 'AdaBoostClassifier', 'RandomForestClassifier']


In [20]:
for tool_name in tools_path:
    print(f"Executing tool: {tool_name}")
    current_output = await execute_tool(
        tool_name,
        question,
        question,
        file_path_list,
        config.MODEL_NAME,
        kg_table_retriever,
        total_output,
    )
    print(f"\ncurrent_output: {current_output}\n")
    total_output += current_output
    responses_content += current_output

Executing tool: GenerateElectricalDescriptorsFromCSV
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
input_str: demo_test.csv

current_output: Now we have run the GenerateElectricalDescriptorsFromCSV and obtained result Processed SMILES strings and saved electrical descriptors to ToolsKG/TempFiles/ChemLab/demo_test_electrical_descriptors.csv.

Executing tool: MLPClassifier
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
input_str: ToolsKG/TempFiles/ChemLab/demo_test_electrical_descriptors.csv

current_output: Now we have run the MLPClassifier and obtained result 
### MLP Classifier Results for Electrical Descriptors

- **Training Accuracy**: 0.7703
- **Test Accuracy**: 0.9545

.

Executing tool: AdaBoostClassifier
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
input_str: ToolsKG/TempFiles/ChemLab/demo_test_electrical_descriptors.csv

current_o

In [21]:
answer_text = await generate_output(question, total_output, model=config.MODEL_NAME)
is_completed, current_final_answer = extract_answer_parts(answer_text)
responses_content += f"\nTask completion status: {is_completed}\nFinal Answer: {current_final_answer}"

INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"


In [22]:
print(is_completed)
print(current_final_answer)

Completed
Yes, the problem has been solved. The performance of three different machine learning algorithms (MLP, AdaBoost, and Random Forest) for predicting reactivity using electrical descriptors has been compared. The test accuracies are as follows:

- MLP Classifier: Test Accuracy = 0.9545
- AdaBoost Classifier: Test Accuracy = 0.8636
- Random Forest Classifier: Test Accuracy = 0.9545

Both MLP and Random Forest classifiers yield the best test accuracy of 0.9545.
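
The comparison above ends in a tie, so a plain `max` over the scores would silently report only one winner. A small sketch of a tie-aware selection, using the three test accuracies from the final answer (the helper `best_models` is hypothetical and not part of SciToolAgent):

```python
# Test accuracies copied from the final answer above.
test_accuracy = {"MLP": 0.9545, "AdaBoost": 0.8636, "Random Forest": 0.9545}


def best_models(scores: dict[str, float], tol: float = 1e-9) -> list[str]:
    """Return every model whose score ties for the maximum."""
    best = max(scores.values())
    return [name for name, acc in scores.items() if abs(acc - best) <= tol]


winners = best_models(test_accuracy)
print(winners)  # ['MLP', 'Random Forest']

# Assumption: the test split has 22 samples. The notebook does not state the
# split size; this only checks that each accuracy equals k/22 for some k.
consistent = {
    name: any(round(k / 22, 4) == acc for k in range(23))
    for name, acc in test_accuracy.items()
}
print(consistent)
```

If the 22-sample assumption holds, the gap between 0.9545 and 0.8636 is two test samples. The MLP on electrical descriptors also scored 0.9091 in the first run and 0.9545 in the second, so a one-sample difference lies within run-to-run variation and the ranking should be read with that in mind.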


***

### Case III: Chemical synthesis and analysis

In [None]:
import os
from dotenv import load_dotenv

PROJECT_ROOT = os.getcwd()

load_dotenv(os.path.join(PROJECT_ROOT, 'example.env'), override=True)

PERSIST_DIR = os.path.join(PROJECT_ROOT, os.getenv("PERSIST_DIR"))
DATA_FILE_PATH = os.path.join(PROJECT_ROOT, os.getenv("DATA_FILE_PATH"))

print("PROJECT_ROOT:", PROJECT_ROOT)
print("PERSIST_DIR:", PERSIST_DIR)
print("DATA_FILE_PATH:", DATA_FILE_PATH)


PROJECT_ROOT: /home/hjj/project/SciToolAgent
PERSIST_DIR: /home/hjj/project/SciToolAgent/KG/storage_graph_large
DATA_FILE_PATH: /home/hjj/project/SciToolAgent/data/toolsKG_full.xlsx


In [2]:
import asyncio
import logging
import os
import pdb
import sys

from scripts.custom_kg_retrievers import KGTableRetriever
from scripts.retrieve_tool_info import execute_tool, check_toxicity
from scripts.utils import load_index_from_storage_dir, construct_question
from app.llms.planning import generate_tools_plan, generate_plan_input, generate_output
from app.llms.extraction import extract_tool_path, extract_answer_parts
from app.tools.safety.toxicity_checker import check_toxicity_by_type
from app.tools.safety import extract_smiles_or_sequences
from app.tools import run_tool
from app.core.config import Config

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core import QueryBundle
from llama_index.embeddings.openai import OpenAIEmbedding

config = Config()
Settings.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL_NAME)
Settings.llm = OpenAI(temperature=0, model=config.MODEL_NAME)




In [17]:
question = "Please predict the product of the chemical reaction between OC(=O)CCC(=O)O and O=C(C)OC(=O)C, and provide a textual description of the corresponding SELFIES. Moreover, check whether the substance has been patented and if it is explosive through CAS."
is_add_filename = False
file_path_list = []

question = construct_question(question, file_path_list, is_add_filename)

In [18]:
responses_content = ""
total_output = ""

storage_context, kg_index = load_index_from_storage_dir(config.PERSIST_DIR)
kg_table_retriever = KGTableRetriever(
    index=kg_index, 
    similarity_top_k=5, 
    graph_store_query_depth=3, 
    max_tools=10
)        
query_bundle = QueryBundle(query_str=question)
tools_retrieved = await kg_table_retriever._retrieve(query_bundle)

INFO:llama_index.core.indices.loading:Loading all indices.


Knowledge Graph Index loaded successfully from KG/storage_graph_large
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/embeddings "HTTP/1.1 200 OK"
['CheckExplosiveness', 'TexToMoleculeSELFIES', 'RXNRetrosynthetic', 'GenerateMoleculeDescription', 'RXNPredict', 'SMILEStoSELFIES', 'Convert3DMolecules2SMILES', 'InChIKeyToSMILES', 'SELFIEStoSMILES', 'CheckPatent']


In [None]:
tools_plan = await generate_tools_plan(
    question=question, 
    tools_retrieved_list=tools_retrieved, 
    model=config.MODEL_NAME, 
    previous_answer=total_output, 
)
tools_path = extract_tool_path(tools_plan)
print(tools_path)

Request URL: https://api.shubiaobiao.com/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
['RXNPredict', 'SMILEStoSELFIES', 'GenerateMoleculeDescription', 'CheckPatent', 'CheckExplosiveness']


In [20]:
for tool_name in tools_path:
    print(f"Executing tool: {tool_name}")
    current_output = await execute_tool(
        tool_name,
        question,
        question,
        file_path_list,
        config.MODEL_NAME,
        kg_table_retriever,
        total_output,
    )
    print(f"current_output: {current_output}")
    total_output += current_output
    responses_content += current_output

Executing tool: RXNPredict
Request URL: https://api.shubiaobiao.com/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
input_str: OC(=O)CCC(=O)O.O=C(C)OC(=O)C
current_output: Now we have run the RXNPredict and obtained result 
### Reaction Prediction

#### Reactants

- **Reactants**: `OC(=O)CCC(=O)O.O=C(C)OC(=O)C`

#### Predicted Product

- **Product**: `O=C1CCC(=O)O1`

#### More Information

- [IBM RXN for Chemistry](https://rxn.res.ibm.com)
.
Executing tool: SMILEStoSELFIES
Request URL: https://api.shubiaobiao.com/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"
input_str: O=C1CCC(=O)O1
current_output: Now we have run the SMILEStoSELFIES and obtained result 
**SMILES to SELFIES**
Input: O=C1CCC(=O)O1

**Result**
[O][=C][C][C][C][=Branch1][C][=O][O][Ring1][=Branch1]
.
Executing tool: GenerateMoleculeDescription
Request URL: https://api.shubiaobiao.com/v1/

In [21]:
answer_text = await generate_output(question, total_output, model=config.MODEL_NAME)
is_completed, current_final_answer = extract_answer_parts(answer_text)
responses_content += f"\nTask completion status: {is_completed}\nFinal Answer: {current_final_answer}"
is_completed

Request URL: https://api.shubiaobiao.com/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.shubiaobiao.com/v1/chat/completions "HTTP/1.1 200 OK"


'Completed'

In [22]:
print(current_final_answer)

Your problem has been solved. The predicted product of the chemical reaction between OC(=O)CCC(=O)O and O=C(C)OC(=O)C is `O=C1CCC(=O)O1`. The corresponding SELFIES is `[O][=C][C][C][C][=Branch1][C][=O][O][Ring1][=Branch1]`, which describes the molecule as a tetrahydrofurandione that is succinic anhydride substituted by oxo groups at positions 2 and 5, functioning as a metabolite. The molecule has been checked for patents and is considered novel according to SureChEMBL. Additionally, the CAS number indicates that the molecule is not explosive.
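
In this case each tool takes its input from what earlier tools produced: `RXNPredict` returns the product SMILES, and `SMILEStoSELFIES` is then called on that SMILES. The following is a minimal sketch of that accumulate-and-pass-on loop with stub tools. The stub functions, the `STUB_TOOLS` registry, and `run_plan` are hypothetical stand-ins; in the real `execute_tool` the input appears to be chosen by an LLM call, which is replaced here by a regular expression.

```python
import asyncio
import re

# SELFIES string copied from the SMILEStoSELFIES output above.
SELFIES = "[O][=C][C][C][C][=Branch1][C][=O][O][Ring1][=Branch1]"


async def rxn_predict(context: str) -> str:
    # Stub: returns the product reported in the run above.
    return "Product: O=C1CCC(=O)O1"


async def smiles_to_selfies(context: str) -> str:
    # Stub: recovers its input from the accumulated context instead of
    # performing a real conversion.
    match = re.search(r"Product: (\S+)", context)
    if match is None:
        raise ValueError("no product SMILES found in earlier output")
    return f"SELFIES for {match.group(1)}: {SELFIES}"


STUB_TOOLS = {
    "RXNPredict": rxn_predict,
    "SMILEStoSELFIES": smiles_to_selfies,
}


async def run_plan(tools_path: list[str], tools: dict) -> str:
    """Run tools in order, feeding each one the output accumulated so far."""
    total_output = ""
    for name in tools_path:
        current_output = await tools[name](total_output)
        total_output += f"[{name}] {current_output}\n"
    return total_output


final = asyncio.run(run_plan(["RXNPredict", "SMILEStoSELFIES"], STUB_TOOLS))
print(final)
```

Inside a notebook, where an event loop is already running, `await run_plan(...)` would replace `asyncio.run(...)`. Reordering the plan so that `SMILEStoSELFIES` runs first raises the `ValueError`, which illustrates why the planned tool order matters.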


***

### Case IV: MOF materials analysis

In [1]:
import os
from dotenv import load_dotenv


PROJECT_ROOT = os.getcwd()
load_dotenv(os.path.join(PROJECT_ROOT, 'example.env'), override=True)

True

In [2]:
from scripts.custom_kg_retrievers import KGTableRetriever
from scripts.retrieve_tool_info import execute_tool, check_toxicity
from scripts.utils import load_index_from_storage_dir, construct_question
from app.llms.planning import generate_tools_plan, generate_plan_input, generate_output
from app.llms.extraction import extract_tool_path, extract_answer_parts
from app.tools.safety.toxicity_checker import check_toxicity_by_type
from app.tools.safety import extract_smiles_or_sequences
from app.tools import run_tool
from app.core.config import Config

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core import QueryBundle
from llama_index.embeddings.openai import OpenAIEmbedding

config = Config()
Settings.embed_model = OpenAIEmbedding(model=config.EMBEDDING_MODEL_NAME)
Settings.llm = OpenAI(temperature=0, model=config.MODEL_NAME)




In [None]:
question = "I want to know the thermal stability and CO2 absorption of this mof material, and then convert the smiles of this mof material into cas to query the corresponding price."

is_add_filename = True
file_path_list = [
    "/home/hjj/project/ToolsKG/TempFiles/cif/HKUST-1.cif",
]

question = construct_question(question, file_path_list, is_add_filename)
print(question)

I want to know the thermal stability and CO2 absorption of this mof material, and then convert the smiles of this mof material into cas to query the corresponding price Here is the list of files: HKUST-1.cif


In [4]:

storage_context, kg_index = load_index_from_storage_dir(config.PERSIST_DIR)
kg_table_retriever = KGTableRetriever(
    index=kg_index, 
    similarity_top_k=10, 
    graph_store_query_depth=5, 
    max_tools=10
)        
query_bundle = QueryBundle(query_str=question)
tools_retrieved = await kg_table_retriever._retrieve(query_bundle)

INFO:llama_index.core.indices.loading:Loading all indices.
Knowledge Graph Index loaded successfully from KG/storage_graph_large
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/embeddings "HTTP/1.1 200 OK"
['PredictStability', 'PredictAdsorption', 'GetFloatingSolventMolecules', 'MofLattice', 'MofFractionalCoordinates', 'GetTerminalIndices', 'MOFToSMILES', 'CASToPrice', 'SMILESToCAS', 'MofNeighborIndices']


In [None]:
responses_content = ""
total_output = ""

tools_plan = await generate_tools_plan(
    question=question, 
    tools_retrieved_list=tools_retrieved, 
    model=config.MODEL_NAME, 
    previous_answer=total_output, 
)
tools_path = extract_tool_path(tools_plan)
print(tools_path)

https://api.ai-gaochao.cn/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
['PredictStability', 'PredictAdsorption', 'MOFToSMILES', 'SMILESToCAS', 'CASToPrice']


In [6]:
for tool_name in tools_path:
    print(f"Executing tool: {tool_name}")
    current_output = await execute_tool(
        tool_name, 
        question, 
        question, 
        file_path_list, 
        config.MODEL_NAME, 
        kg_table_retriever, 
        total_output
    )
    print(f"current_output: {current_output}")
    total_output += current_output
    responses_content += current_output

Executing tool: PredictStability
https://api.ai-gaochao.cn/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
input_str: HKUST-1.cif
current_output: 

Now we have run the **PredictStability** and obtained result
    filename  ANN_thermal_prediction  ANN_solvent_removal_prediction
HKUST-1.cif              324.049896                        0.948832.
Executing tool: PredictAdsorption
https://api.ai-gaochao.cn/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"
input_str: HKUST-1.cif
current_output: 

Now we have run the **PredictAdsorption** and obtained result
    filename  finished  CO2_absolute_mg/g  H2_absolute_mg/g  CO2_excess_mg/g  H2_excess_mg/g
HKUST-1.cif      True         106.885632          0.052386       106.885632        0.052386.
Executing tool: MOFToSMILES
https://api.ai-gaochao.cn/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.ai-gaoch

In [7]:
answer_text = await generate_output(question, total_output, model=config.MODEL_NAME)
is_completed, current_final_answer = extract_answer_parts(answer_text)
responses_content += f"\nTask completion status: {is_completed}\nFinal Answer: {current_final_answer}"


https://api.ai-gaochao.cn/v1/chat/completions
INFO:httpx:HTTP Request: POST https://api.ai-gaochao.cn/v1/chat/completions "HTTP/1.1 200 OK"


In [8]:
print(is_completed)
print(current_final_answer)

Completed
1. Thermal Stability of HKUST-1: 324.049896 Kelvin
2. CO2 Absorption of HKUST-1: 106.885632 mg/g
3. SMILES to CAS Conversion: The SMILES [O]C(=O)c1cc(C([O])=O)cc(C([O])=O)c1 corresponds to CAS number 554-95-0.
4. Price Query: The average price for CAS number 554-95-0 is ￥0.4096666666666667.
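
`PredictStability` and `PredictAdsorption` return their results as a whitespace-aligned, single-row table with a sentence-ending period appended to the last value. A minimal sketch of turning such a string into a dictionary, assuming a header line followed by exactly one data row and no spaces inside any cell (the helper `parse_single_row_table` is hypothetical):

```python
def parse_single_row_table(text: str) -> dict[str, str]:
    """Parse a header line plus one data row into a column -> value mapping."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError("expected a header line and one data row")
    header, row = lines[0].split(), lines[1].split()
    if len(header) != len(row):
        raise ValueError("header and row have different column counts")
    # Drop the trailing period that the tool wrapper appends to the output.
    row[-1] = row[-1].rstrip(".")
    return dict(zip(header, row))


# Table text copied from the PredictAdsorption output in this case.
raw = """
    filename  finished  CO2_absolute_mg/g  H2_absolute_mg/g  CO2_excess_mg/g  H2_excess_mg/g
HKUST-1.cif      True         106.885632          0.052386       106.885632        0.052386.
"""

parsed = parse_single_row_table(raw)
co2_uptake = float(parsed["CO2_absolute_mg/g"])
print(parsed["filename"], co2_uptake)
```

Keeping the values as strings until a specific column is needed avoids guessing types for fields such as `finished`.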
