## First kind of query task: What is XXX?
This notebook shows a standard way to use BRICK to look up a term such as a gene, disease, or cell in the knowledge graph.
As an example, we query the gene Isl1.

Define the user question, then load and configure the BRICK module.

In [1]:
user_question = "Which disease is most related to the gene Isl1?"

In [2]:
import sys
import os
import json

import BRICK
import scanpy as sc
from dotenv import dotenv_values

# Read the connection settings from the environment file.
config = dotenv_values('/workspace/data/brick.env')
kg_url = config.get('KG_URL')
kg_auth = (config.get('KG_AUTH_USER'), config.get('KG_AUTH_PASS'))
model = config.get("MODEL_TYPE")
url = config.get("LLM_URL")
api_key = config.get("API_KEY")
# LLM_PARAMS is stored in the file as a JSON string.
llm_params = json.loads(config.get("LLM_PARAMS"))

# Connect BRICK to the knowledge graph and to the LLM backend.
BRICK.config(url=kg_url, auth=kg_auth)
BRICK.config_llm(modeltype=model, base_url=url, api_key=api_key, llm_params=llm_params)

Graph database has been configured and initialized successfully.
LLM has been configured and initialized successfully.
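
The configuration cell passes the LLM_PARAMS entry straight to json.loads, which raises a TypeError when the key is absent from the file because config.get then returns None. A minimal sketch of a more tolerant parse is shown here; the helper name parse_llm_params and the example values are assumptions for illustration and are not part of BRICK or python-dotenv.

```python
import json


def parse_llm_params(config):
    """Return LLM_PARAMS as a dict, tolerating a missing or empty entry."""
    raw = config.get("LLM_PARAMS")
    if not raw:
        # Key absent or empty: fall back to no extra parameters.
        return {}
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("LLM_PARAMS must be a JSON object")
    return params


# Stand-in for the mapping returned by dotenv_values(...); values are made up.
example_config = {"MODEL_TYPE": "example-model", "LLM_PARAMS": '{"temperature": 0.2}'}
llm_params = parse_llm_params(example_config)
missing = parse_llm_params({})
```

Whether an empty dict is an acceptable default for llm_params depends on what BRICK.config_llm expects, which the notebook does not state.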


Search for the Isl1 gene node in the knowledge graph. The cells below first extract the entity name from the question and resolve it with a fuzzy search, then query the node with "BRICK.qr.query_node".
The query parameters are source_entity_set, source_entity_type, and query_attribution.
source_entity_set is the set of source entities to query; it can be a list or a set.
source_entity_type is the type of the source entities, such as "Gene", "Tissue", or "Cell".
query_attribution is the attribute of the source entities to match on; it is usually "name".
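
Because the entity set may arrive as a single name, a list, or a set, it can help to normalize it before querying. The sketch below is one way to do that in plain Python; as_entity_list is a hypothetical helper, not a BRICK function, and whether BRICK performs similar normalization internally is not stated in the notebook.

```python
def as_entity_list(entities):
    """Normalize a name or a collection of names into a de-duplicated list."""
    if isinstance(entities, str):
        entities = [entities]
    elif isinstance(entities, (set, frozenset)):
        # Sets have no stable order, so sort them for reproducible queries.
        entities = sorted(entities)
    seen = set()
    result = []
    for name in entities:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


single = as_entity_list("Isl1")
from_list = as_entity_list(["Isl1", " Isl1 ", ""])
```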


In [3]:
source_entity = BRICK.inp.extract_entities_node(user_question)
print(source_entity)

['Isl1']


In [4]:
client = BRICK.se.BRICKSearchClient()
print(f"String index: {client.search_config.string_index_name}")
print(f"Vector index: {client.search_config.vector_index_name or 'disabled'}\n")

payload_fuzzy_auto = client.build_untyped_payload(query_text=source_entity, top_k=1)
print(f">>> [Fuzzy] No category specified: searching all categories (query_text='{source_entity}')")
df_all = client.search_fuzzy(payload_fuzzy_auto)
entity_id = df_all.head(1)['entity_id'].iloc[0]
print(entity_id)


String index: brick_index_*_string
Vector index: brick_index_*_vector

>>> [Fuzzy] No category specified: searching all categories (query_text='['Isl1']')
NCBI:16392


In [5]:
df_all

Unnamed: 0,entity_id,primary_name,type_key,node_type,string_score,matched_alias
0,NCBI:16392,Isl1,Gene|Protein,Gene,67.38008,


In [6]:
entity_name = df_all.head(1)['primary_name'].iloc[0]
entity_name

'Isl1'
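
The cells above take the first row of the search result with head(1) and .iloc[0]. That raises an IndexError when the search returns no rows, and it assumes the rows are already ordered by score. A minimal plain-Python sketch of a guarded selection follows; pick_top_hit is a hypothetical helper, and the single row is copied from the search output shown above.

```python
def pick_top_hit(rows, score_key="string_score"):
    """Return the highest-scoring row, or None when the search found nothing."""
    if not rows:
        return None
    return max(rows, key=lambda row: row[score_key])


# The one hit returned by the fuzzy search in this notebook.
hits = [
    {
        "entity_id": "NCBI:16392",
        "primary_name": "Isl1",
        "node_type": "Gene",
        "string_score": 67.38008,
    },
]
top = pick_top_hit(hits)
no_hit = pick_top_hit([])
```

With a DataFrame, the same rows could be obtained with df_all.to_dict("records").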

In [7]:
BRICK.qr.query_node(source_entity_set=entity_name, source_entity_type="Gene", query_attribution="name")

Unnamed: 0,n.def,n.id,n.name,n.synonym,n.type
0,"ISL1 transcription factor, LIM/homeodomain<loc>:13 D2.2|13 64.87 cM<xref>MGI:101791|ENSEMBL:ENSMUSG00000042258</xref>",NCBI:16392,Isl1,Undef,Gene


To answer the question "What is Isl1?",
querying the node alone is not enough; we also need the nodes that are related to Isl1.
We therefore use the function "BRICK.qr.query_neighbor" to find the neighbors of Isl1.


In [10]:
neighbor_df = BRICK.qr.query_neighbor(
    source_entity_set=entity_name,
    relation=None,
    source_entity_type="Gene",
    target_entity_type=None,
    multi_hop=1,
    directed=True,
    query_attribution="name",
    return_type="dataframe",
)
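
To make the idea of a one-hop neighbor query concrete, here is a small sketch over an in-memory edge list. It is not BRICK's implementation: the reading of directed=True as "follow only edges that start at the source" is an assumption, and the edge pointing into Isl1 from "GeneX" is a made-up placeholder. The other two edges come from the result table in this notebook.

```python
def one_hop_neighbors(edges, source, directed=True):
    """Collect (relation, neighbor) pairs one edge away from source."""
    neighbors = []
    for head, relation, tail in edges:
        if head == source:
            neighbors.append((relation, tail))
        elif not directed and tail == source:
            # Undirected mode also follows edges that point at the source.
            neighbors.append((relation, head))
    return neighbors


edges = [
    ("Isl1", "regulated", "0610039K10Rik"),
    ("Isl1", "marker_of", "facial nerve"),
    ("GeneX", "regulated", "Isl1"),  # hypothetical incoming edge
]
outgoing = one_hop_neighbors(edges, "Isl1", directed=True)
both_ways = one_hop_neighbors(edges, "Isl1", directed=False)
```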

Aggregate the neighbor/path records by target term and count the number of unique source entities matched. This produces a summary DataFrame for ranking, filtering, and visualization.

In [11]:
target_df = BRICK.rk.match_count(neighbor_df)
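
The aggregation described above can be pictured as grouping records by target and counting distinct sources. The sketch below shows that idea in plain Python; count_unique_sources is a hypothetical helper and only approximates what "BRICK.rk.match_count" is described as doing. The records mirror rows from this notebook, where "facial nucleus" is reached twice from Isl1 but still has a match count of 1.

```python
from collections import defaultdict


def count_unique_sources(records):
    """Count distinct source entities per target, highest count first."""
    sources_by_target = defaultdict(set)
    for source, target in records:
        sources_by_target[target].add(source)
    counts = {target: len(sources) for target, sources in sources_by_target.items()}
    # Sort by descending count, then by target name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


records = [
    ("Isl1", "facial nucleus"),
    ("Isl1", "facial nucleus"),  # second path to the same target
    ("Isl1", "facial nerve"),
]
summary = count_unique_sources(records)
```

With a single source entity every count is 1, so the count only separates targets when several source entities are queried together.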

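Keep up to ten nodes related to Isl1 from each target category ("path.2.type"). Note that groupby(...).head(10) takes the first ten rows of each group in the table's current order.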

In [12]:
grouped = target_df.groupby('path.2.type').head(10)
grouped

Unnamed: 0,path.0.name,path.1.relation,path.1.info_source_length,path.1.relation_confidence,path.2.id,path.2.name,path.2.type,path.2.match_count
0,[Isl1],[regulated],[1],[nan],NCBI:68386,0610039K10Rik,Gene,1
977,[Isl1],[marker_of],[3],"[[1, 1, 1]]",UBERON:0003339,ganglion of central nervous system,Tissue,1
953,[Isl1],[marker_of],[1],[[1]],EMAPA:30984,female external genitalia,Tissue,1
952,[Isl1],[marker_of],[1],[[1]],EMAPA:16660,facio-acoustic preganglion complex,Tissue,1
951,[Isl1],[marker_of],[1],[[1]],EMAPA:16794,facio-acoustic ganglion complex,Tissue,1
950,[Isl1],[marker_of],[1],[[1]],UBERON:0006232,facio-acoustic VII-VIII preganglion complex,Tissue,1
949,"[Isl1, Isl1]","[marker_of, marker_of]","[6, 6]","[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]",UBERON:0000127,facial nucleus,Tissue,1
948,[Isl1],[marker_of],[2],"[[0.7926, 0.7926]]",UBERON:0001647,facial nerve,Tissue,1
947,[Isl1],[marker_of],[12],"[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]",EMAPA:17569,facial ganglion,Tissue,1
946,"[Isl1, Isl1]","[marker_of, marker_of]","[1, 1]","[[1], [1]]",UBERON:0001456,face,Tissue,1
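
The per-category selection can be sketched without pandas to show what groupby(...).head(n) keeps: the first n rows of each group, in the order they already appear, with no sorting. head_per_group is a hypothetical helper; the rows are taken from the table above and the value n=2 is chosen only to keep the example short.

```python
def head_per_group(rows, group_key, n):
    """Keep the first n rows of each group, preserving the input order."""
    kept = []
    seen = {}
    for row in rows:
        group = row[group_key]
        seen[group] = seen.get(group, 0) + 1
        if seen[group] <= n:
            kept.append(row)
    return kept


rows = [
    {"path.2.name": "0610039K10Rik", "path.2.type": "Gene"},
    {"path.2.name": "ganglion of central nervous system", "path.2.type": "Tissue"},
    {"path.2.name": "female external genitalia", "path.2.type": "Tissue"},
    {"path.2.name": "facio-acoustic preganglion complex", "path.2.type": "Tissue"},
]
top_two = head_per_group(rows, "path.2.type", 2)
names = [row["path.2.name"] for row in top_two]
```

If the rows should reflect relatedness, the table needs to be sorted by the chosen ranking column before this step.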


The agent then generates an answer based on the "grouped" DataFrame.

In [13]:
ans = BRICK.inp.interpret_query(user_question,grouped)
print(ans)

Based on the provided table, to determine which disease is most related to the gene *Isl1*, we need to look for entries where *Isl1* is associated with a disease. The "path.2.type" column indicates the type of entity, and we are specifically interested in those marked as "Disease".

From the table, there are several instances where *Isl1* is related to diseases:

- **Ventricular septal defect (DOID:1657)**
- **Type 1 diabetes mellitus (DOID:9744)**
- **Type 2 diabetes mellitus (DOID:9352)**
- **Congenital heart disease (DOID:1682)**
- **Myocardial infarction (DOID:5844)**

To determine the most related disease, we can consider the frequency and the confidence of the relationships. However, since the table does not provide a clear confidence score for all disease relations, we will count the number of times each disease appears.

- **Ventricular septal defect (DOID:1657)**: 1 occurrence
- **Type 1 diabetes mellitus (DOID:9744)**: 1 occurrence
- **Type 2 diabetes mellitus (DOID:9352)**: 