This repository is the official implementation of LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval.
Large language models (LLMs) exhibit strong semantic understanding, yet struggle when user instructions involve ambiguous or conceptually misaligned terms. We propose the Language Graph Model (LGM) to enhance conceptual clarity by extracting meta-relations—inheritance, alias, and composition—from natural language. The model further employs a reflection mechanism to validate these meta-relations. Leveraging a Concept Iterative Retrieval Algorithm, these relations and related descriptions are dynamically supplied to the LLM, improving its ability to interpret concepts and generate accurate responses. Unlike conventional Retrieval-Augmented Generation (RAG) approaches that rely on extended context windows, our method enables large language models to process texts of any length without the need for truncation. Experiments on standard benchmarks demonstrate that the LGM consistently outperforms existing RAG baselines.
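As a rough illustration of the idea (not the repository's implementation), concept iterative retrieval can be sketched as a breadth-first expansion over meta-relation edges, collecting concept descriptions until a budget is reached. All graph contents and names below are hypothetical:

```python
from collections import deque

# Hypothetical concept graph: concept -> list of (meta-relation, neighbor).
# Relations mirror the paper's meta-relations: inheritance, alias, composition.
GRAPH = {
    "LLM": [("inheritance", "language model"), ("alias", "large language model")],
    "language model": [("composition", "tokenizer")],
    "large language model": [],
    "tokenizer": [],
}

DESCRIPTIONS = {
    "LLM": "A neural model trained on large text corpora.",
    "language model": "A model assigning probabilities to token sequences.",
    "large language model": "Alias of LLM.",
    "tokenizer": "Splits text into tokens.",
}

def iterative_retrieve(seed, max_concepts=3):
    """Breadth-first expansion over meta-relations, collecting descriptions.

    Returns a list of (concept, description) pairs to supply to the LLM.
    """
    seen, queue, context = set(), deque([seed]), []
    while queue and len(context) < max_concepts:
        concept = queue.popleft()
        if concept in seen:
            continue
        seen.add(concept)
        context.append((concept, DESCRIPTIONS.get(concept, "")))
        for _relation, neighbor in GRAPH.get(concept, []):
            queue.append(neighbor)
    return context
```

Because the expansion is bounded by `max_concepts` rather than by a context window, the same loop works over a graph built from arbitrarily long source texts.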
- We use neo4j-community-3.5.13 as the database for graph storage. Download the Windows or macOS/Linux version and follow the official manual for installation.
- Then configure the Neo4j URI, username, and password in sources\config.ini.
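For reference, these settings can be read with Python's standard `configparser`. The section and key names below are assumptions; match them to the actual layout of sources\config.ini:

```python
import configparser

def load_neo4j_settings(path="sources/config.ini"):
    # Section/key names here are illustrative; check the shipped config.ini.
    cfg = configparser.ConfigParser()
    cfg.read(path)
    neo4j = cfg["neo4j"]
    return neo4j["uri"], neo4j["user"], neo4j["password"]
```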
- To query the Neo4j database more efficiently, you can create a full-text index with the following statement:
CALL db.index.fulltext.createNodeIndex("root_sentence_lemma_index", ["_ROOT_"], ["sentenceLemma"]);

The Python version is 3.10.x. To install the requirements:
pip install -r requirements.txt
- The HotpotQA dataset can be downloaded from http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json
- The Musique dataset can be downloaded from https://huggingface.co/datasets/bdsaglam/musique/blob/main/musique_ans_v1.0_dev.jsonl
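Both files are plain JSON / JSON Lines, so they can be inspected with a short loader. This sketch assumes the public dataset formats (HotpotQA: one JSON array; Musique: one record per line), not any repository-specific wrapper:

```python
import json

def load_hotpot(path):
    # HotpotQA dev set: a single JSON array of question records.
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def load_musique(path):
    # Musique dev set: JSON Lines, one record per line.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```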
To test an online model, prepare the corresponding API_KEY and URL, enter them in sources\config.ini, and configure the relevant parameters. For local models, only the corresponding URL is required. The Deepseek model must be configured as the answer-matching model. To test the LLama3 model, you also need to configure it.
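The exact keys depend on the shipped sources\config.ini template; an illustrative fragment (all section and key names hypothetical, values are placeholders) might look like:

```ini
[neo4j]
uri = bolt://localhost:7687
user = neo4j
password = your_password

[deepseek]
api_key = YOUR_API_KEY
url = https://api.deepseek.com

[llama]
url = http://localhost:8000/v1
```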
Please run the learning stage first. Then run the answering stage.
To evaluate HotpotQA, run:
python evaluate/eval.py --dataset hotpot --model deepseek --stage learn --path path/to/hotpot_dev_distractor_v1.json
python evaluate/eval.py --dataset hotpot --model deepseek --stage answer --path path/to/hotpot_dev_distractor_v1.json
python evaluate/eval.py --dataset hotpot --model llama --stage learn --path path/to/hotpot_dev_distractor_v1.json
python evaluate/eval.py --dataset hotpot --model llama --stage answer --path path/to/hotpot_dev_distractor_v1.json

To evaluate Musique, run:
python evaluate/eval.py --dataset musique --model deepseek --stage learn --path path/to/musique_data_v1.0/musique_ans_v1.0_dev.jsonl
python evaluate/eval.py --dataset musique --model deepseek --stage answer --path path/to/musique_data_v1.0/musique_ans_v1.0_dev.jsonl
python evaluate/eval.py --dataset musique --model llama --stage learn --path path/to/musique_data_v1.0/musique_ans_v1.0_dev.jsonl
python evaluate/eval.py --dataset musique --model llama --stage answer --path path/to/musique_data_v1.0/musique_ans_v1.0_dev.jsonl

To evaluate Reflection, run:
python tests/test_reflection.py

Our model achieves the following performance on HotpotQA and Musique:
| Model | HotpotQA (Deepseek v3-0324) | HotpotQA (Llama-3.3-70B-Instruct-AWQ) | HotpotQA (AVG) | Musique (Deepseek v3-0324) | Musique (Llama-3.3-70B-Instruct-AWQ) | Musique (AVG) |
|---|---|---|---|---|---|---|
| Language Graph Model | 89.46% | 87.06% | 88.26% | 68.13% | 63.07% | 65.60% |
| GraphRAG 1 | 88.55% | 82.59% | 85.57% | 64.98% | 63.16% | 64.07% |
| GraphRAG 2 | 86.90% | 69.21% | 78.06% | 48.98% | 48.61% | 48.79% |
| LightRAG 2 | 87.94% | 76.34% | 82.14% | 65.36% | 50.33% | 57.84% |
| FastRAG 3 | 72.66% | 72.26% | 72.46% | 39.91% | 36.51% | 38.21% |
| Dify | 68.53% | 43.64% | 56.09% | 52.32% | 18.27% | 35.29% |
We analyze the contribution of each component via an ablation on HotpotQA (DeepSeek v3-0324), varying the maximum input size from 120,000 down to 30,000 characters. The figure shows that F1 varies only mildly (std 0.009) and Recall remains stable (std 0.0038). The best F1 (89.46%) occurs at 60,000 characters with a Recall of 99.09%, indicating robustness to the context budget.
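F1 and Recall here refer to the standard token-overlap metrics used in HotpotQA-style evaluation. A minimal sketch of those definitions (not necessarily the repository's exact implementation):

```python
import re
import string
from collections import Counter

def normalize(text):
    # Standard answer normalization: lowercase, drop punctuation and
    # articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def f1_recall(prediction, gold):
    """Token-level F1 and recall between a predicted and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return f1, recall
```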
The Language Graph Model source code is released under the MIT License.

