Merged
26 commits
0da7ce1
feat: Add DNA and RNA search functionality
CHERRY-ui8 Nov 26, 2025
27c5723
Merge branch 'open-sciencelab:main' into feat/add-dna-rna-search
CHERRY-ui8 Nov 26, 2025
9a26138
fix: fix pylint style issues
CHERRY-ui8 Nov 26, 2025
ea2214c
Merge branch 'feat/add-dna-rna-search' of github.com:CHERRY-ui8/Graph…
CHERRY-ui8 Nov 26, 2025
ef270b8
refactor: unify searcher interfaces and improve error handling
CHERRY-ui8 Nov 26, 2025
71fba90
Add UniProt IDs to search_protein_demo.jsonl
CHERRY-ui8 Nov 26, 2025
9f8c837
add: add UniProt IDs to search_protein_demo.jsonl
CHERRY-ui8 Nov 26, 2025
c60784e
feat: unify search interfaces to use gene ID as unified data source
CHERRY-ui8 Nov 27, 2025
0dac99d
add: an gene id example in DNA demo
CHERRY-ui8 Nov 27, 2025
1865120
feat: unify search interfaces to use RNA id as unified data source
CHERRY-ui8 Nov 27, 2025
bdba4f9
Merge branch 'feat/add-dna-rna-search' of github.com:CHERRY-ui8/Graph…
CHERRY-ui8 Nov 27, 2025
8678e33
fix: fix pylint style issues
CHERRY-ui8 Nov 27, 2025
40ef49e
fix: reduce nested blocks and fix all pylint issues
CHERRY-ui8 Nov 27, 2025
9382660
feat: add DNA RNA local blast
CHERRY-ui8 Nov 29, 2025
2a715de
style: reduce return statements and branches in searcher methods
CHERRY-ui8 Nov 29, 2025
b48930a
perf: optimize code style and search efficiency
ChenZiHong-Gavin Nov 30, 2025
bb84c0b
fix: fix import error
ChenZiHong-Gavin Nov 30, 2025
58ef1ec
fix: delete rate_limiter
ChenZiHong-Gavin Nov 30, 2025
d767096
perf: simplify RNA searcher and align with DNA searcher logic
CHERRY-ui8 Nov 30, 2025
ea30cef
fix: fix search params in get_best_hit
ChenZiHong-Gavin Dec 1, 2025
3adb956
perf: optimize search logic in rnacentral_searcher
ChenZiHong-Gavin Dec 1, 2025
5526381
perf: optimize search logic in rnacentral_searcher
ChenZiHong-Gavin Dec 1, 2025
e1530f9
perf: optimize code style
ChenZiHong-Gavin Dec 1, 2025
61f7f44
fix: fix lint problems
ChenZiHong-Gavin Dec 1, 2025
2c00b9e
fix: search setup problems
CHERRY-ui8 Dec 1, 2025
6d0be7a
feat: more examples in search demo
CHERRY-ui8 Dec 1, 2025

14 changes: 0 additions & 14 deletions graphgen/configs/search_config.yaml

This file was deleted.

17 changes: 17 additions & 0 deletions graphgen/configs/search_dna_config.yaml
@@ -0,0 +1,17 @@
pipeline:
  - name: read_step
    op_key: read
    params:
      input_file: resources/input_examples/search_dna_demo.jsonl # input file path, support json, jsonl, txt, pdf. See resources/input_examples for examples

  - name: search_step
    op_key: search
    deps: [read_step] # search_step depends on read_step
    params:
      data_sources: [ncbi] # data source for searcher, support: wikipedia, google, uniprot, ncbi, rnacentral
      ncbi_params:
        email: test@example.com # NCBI requires an email address
        tool: GraphGen # tool name for NCBI API
        use_local_blast: true # whether to use local blast for DNA search
        local_blast_db: /your_path/refseq_241 # path to local BLAST database (without .nhr extension)
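The `local_blast_db` value is a BLAST database prefix, hence the "(without .nhr extension)" comment. A minimal sketch of a pre-flight check for such a prefix, assuming the standard BLAST+ file layout: a single-volume database (`prefix.nhr` / `prefix.phr`), the first volume of a multi-volume database (`prefix.00.nhr`), or an alias file (`prefix.nal` / `prefix.pal`). The helper `blast_db_exists` is hypothetical and not part of GraphGen.

```python
from pathlib import Path

# Header-file and alias-file extensions used by BLAST+ databases:
# "n*" files belong to nucleotide databases, "p*" files to protein databases.
_EXTENSIONS = {
    "nucl": ("nhr", "nal"),
    "prot": ("phr", "pal"),
}


def blast_db_exists(prefix: str, dbtype: str = "nucl") -> bool:
    """Hypothetical helper: return True if `prefix` looks like a BLAST+ database.

    `prefix` is the value given as `local_blast_db`, without any extension.
    Accepts a single-volume database (prefix.nhr), the first volume of a
    multi-volume database (prefix.00.nhr), or an alias file (prefix.nal).
    """
    if dbtype not in _EXTENSIONS:
        raise ValueError(f"dbtype must be 'nucl' or 'prot', got {dbtype!r}")
    header_ext, alias_ext = _EXTENSIONS[dbtype]
    base = Path(prefix)
    candidates = (
        base.parent / f"{base.name}.{header_ext}",
        base.parent / f"{base.name}.00.{header_ext}",
        base.parent / f"{base.name}.{alias_ext}",
    )
    return any(path.is_file() for path in candidates)
```

For example, `blast_db_exists("/your_path/refseq_241")` for the DNA config, or `blast_db_exists("/your_path/2024_01/uniprot_sprot", "prot")` for the protein one. Failing fast on a bad prefix gives a clearer error than a BLAST run that aborts mid-pipeline.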

15 changes: 15 additions & 0 deletions graphgen/configs/search_protein_config.yaml
@@ -0,0 +1,15 @@
pipeline:
  - name: read_step
    op_key: read
    params:
      input_file: resources/input_examples/search_protein_demo.jsonl # input file path, support json, jsonl, txt, pdf. See resources/input_examples for examples

  - name: search_step
    op_key: search
    deps: [read_step] # search_step depends on read_step
    params:
      data_sources: [uniprot] # data source for searcher, support: wikipedia, google, uniprot
      uniprot_params:
        use_local_blast: true # whether to use local blast for uniprot search
        local_blast_db: /your_path/2024_01/uniprot_sprot # format: /path/to/${RELEASE}/uniprot_sprot
        # options: uniprot_sprot (recommended, high quality), uniprot_trembl, or uniprot_${RELEASE} (merged database)
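With `use_local_blast: true`, a searcher has to call a BLAST+ program against `local_blast_db`. The sketch below only assembles the argument list one might pass to `subprocess.run`; it assumes `blastp` for the UniProt protein database and `blastn` for the nucleotide databases used by the NCBI and RNAcentral configs. The function `build_blast_command` and its default thresholds are hypothetical, not GraphGen's settings; the flags (`-query`, `-db`, `-outfmt`, `-evalue`, `-max_target_seqs`) are standard BLAST+ options.

```python
from typing import List

# Assumed mapping from molecule type to BLAST+ program.
_PROGRAMS = {"protein": "blastp", "dna": "blastn", "rna": "blastn"}


def build_blast_command(
    molecule: str,
    query_fasta: str,
    local_blast_db: str,
    evalue: float = 1e-5,
    max_target_seqs: int = 5,
) -> List[str]:
    """Hypothetical helper: return an argv list for a local BLAST+ search.

    Tabular output (-outfmt 6) is requested so hits can be parsed line by line.
    """
    try:
        program = _PROGRAMS[molecule]
    except KeyError:
        raise ValueError(f"unknown molecule type: {molecule!r}") from None
    return [
        program,
        "-query", query_fasta,
        "-db", local_blast_db,
        "-outfmt", "6",
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
    ]
```

Usage, assuming BLAST+ is installed and on `PATH`: `subprocess.run(build_blast_command("protein", "query.fasta", "/your_path/2024_01/uniprot_sprot"), capture_output=True, text=True, check=True)`. Passing a list rather than a shell string avoids quoting problems with database paths.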
16 changes: 16 additions & 0 deletions graphgen/configs/search_rna_config.yaml
@@ -0,0 +1,16 @@
pipeline:
  - name: read_step
    op_key: read
    params:
      input_file: resources/input_examples/search_rna_demo.jsonl # input file path, support json, jsonl, txt, pdf. See resources/input_examples for examples

  - name: search_step
    op_key: search
    deps: [read_step] # search_step depends on read_step
    params:
      data_sources: [rnacentral] # data source for searcher, support: wikipedia, google, uniprot, ncbi, rnacentral
      rnacentral_params:
        use_local_blast: true # whether to use local blast for RNA search
        local_blast_db: /your_path/refseq_rna_241 # format: /path/to/refseq_rna_${RELEASE}
        # can also use DNA database with RNA sequences (if already built)
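Once BLAST+ has written tabular output (`-outfmt 6`), reducing it to one best hit per query is a small parsing step. A minimal sketch, assuming the default 12 columns of `-outfmt 6` (`qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`) and ranking hits by lowest e-value, then highest bit score. The name `best_hits` is hypothetical; this is not GraphGen's own `get_best_hit`.

```python
from typing import Dict, Iterable, Tuple

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def best_hits(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: keep the best-scoring hit for each query ID."""
    best: Dict[str, Tuple[Tuple[float, float], Dict[str, str]]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        # Lower e-value is better; ties are broken by higher bit score.
        key = (float(hit["evalue"]), -float(hit["bitscore"]))
        current = best.get(hit["qseqid"])
        if current is None or key < current[0]:
            best[hit["qseqid"]] = (key, hit)
    return {qid: hit for qid, (_, hit) in best.items()}
```

Because the function accepts any iterable of lines, it works equally on `result.stdout.splitlines()` or on an open output file.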

2 changes: 1 addition & 1 deletion graphgen/graphgen.py
@@ -45,7 +45,7 @@ def __init__(

         # llm
         self.tokenizer_instance: Tokenizer = tokenizer_instance or Tokenizer(
-            model_name=os.getenv("TOKENIZER_MODEL")
+            model_name=os.getenv("TOKENIZER_MODEL", "cl100k_base")
         )

         self.synthesizer_llm_client: BaseLLMWrapper = (
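This change gives `TOKENIZER_MODEL` a default, so `Tokenizer` receives `"cl100k_base"` (the name of a tiktoken encoding) instead of `None` when the variable is unset. The sketch below isolates that lookup; `resolve_tokenizer_model` and its `treat_empty_as_unset` flag are hypothetical. Note that `os.getenv` falls back only when the variable is absent: an exported but empty `TOKENIZER_MODEL=` still yields `""`.

```python
import os


def resolve_tokenizer_model(treat_empty_as_unset: bool = False) -> str:
    """Hypothetical helper reproducing the TOKENIZER_MODEL lookup in graphgen.py."""
    # os.getenv returns the default only when the variable is not set at all.
    value = os.getenv("TOKENIZER_MODEL", "cl100k_base")
    if treat_empty_as_unset and not value:
        # Optional stricter behaviour: treat TOKENIZER_MODEL="" like "unset".
        return "cl100k_base"
    return value
```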
2 changes: 2 additions & 0 deletions graphgen/models/__init__.py
@@ -26,6 +26,8 @@
     RDFReader,
     TXTReader,
 )
+from .searcher.db.ncbi_searcher import NCBISearch
+from .searcher.db.rnacentral_searcher import RNACentralSearch
 from .searcher.db.uniprot_searcher import UniProtSearch
 from .searcher.kg.wiki_search import WikiSearch
 from .searcher.web.bing_search import BingSearch
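These exports make `NCBISearch` and `RNACentralSearch` importable alongside `UniProtSearch`, matching the `data_sources` names used in the search configs (`ncbi`, `rnacentral`, `uniprot`). A minimal sketch of how a search step could map those names to searcher classes; the stub classes, `SEARCHERS` and `resolve_searchers` are hypothetical stand-ins, not GraphGen's dispatch code, and the real constructors are not shown in this diff.

```python
from typing import Dict, List, Type


class BaseSearch:
    """Stand-in for a searcher base class (hypothetical)."""


# Empty stubs standing in for the real searcher classes.
class NCBISearch(BaseSearch): ...
class RNACentralSearch(BaseSearch): ...
class UniProtSearch(BaseSearch): ...


# Hypothetical registry keyed by the names used in `data_sources`.
SEARCHERS: Dict[str, Type[BaseSearch]] = {
    "ncbi": NCBISearch,
    "rnacentral": RNACentralSearch,
    "uniprot": UniProtSearch,
}


def resolve_searchers(data_sources: List[str]) -> List[Type[BaseSearch]]:
    """Map configured source names to searcher classes, rejecting unknown names."""
    unknown = [name for name in data_sources if name not in SEARCHERS]
    if unknown:
        raise ValueError(f"unsupported data_sources: {unknown}")
    return [SEARCHERS[name] for name in data_sources]
```

Validating every name before constructing any searcher means a typo in `data_sources` fails immediately instead of after earlier sources have already run.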