Text2KG

This repository contains the implementation of Text2KG, a research project on automated knowledge graph (KG) generation from unstructured data using three language models: GPT-4, LLaMA 2 (13B), and BERT. The goal is to simplify KG creation for GraphRAG (graph-based retrieval-augmented generation) systems and to compare the models on several performance metrics.

Features

  • Automated Knowledge Graph Creation: Generate KGs directly from raw text without manual relationship classification.
  • Multi-Model Support: Scripts for GPT-4, LLaMA 2, and BERT are provided.
  • Performance Evaluation: Includes metrics such as Precision, Recall, F1-Score, Graph Edit Distance, and Semantic Similarity for comparing the generated KGs.
  • Dataset: Contains the Wikipedia excerpt on the C programming language used as input data.
  • Visualization: Generates visual representations of KGs for all three models.
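As an illustration of the triple-level metrics listed above, here is a minimal sketch of precision, recall, and F1-score computed over sets of (subject, relation, object) triples. It is not taken from the scripts in `metrics/`; the function names, the exact-match criterion after lowercasing, and the sample triples are all assumptions made for this example.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def normalize(triple: Triple) -> Triple:
    """Lowercase and trim each element so formatting noise is not scored as an error."""
    subject, relation, obj = triple
    return (subject.strip().lower(), relation.strip().lower(), obj.strip().lower())


def triple_scores(predicted: Iterable[Triple], reference: Iterable[Triple]) -> dict:
    """Exact-match precision, recall, and F1 between two collections of triples."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference and model output, for illustration only.
reference = [
    ("C", "designed_by", "Dennis Ritchie"),
    ("C", "developed_at", "Bell Labs"),
    ("Unix", "written_in", "C"),
]
predicted = [
    ("c", "designed_by", "Dennis Ritchie "),
    ("C", "developed_at", "Bell Labs"),
    ("C", "influenced", "C++"),
    ("C", "is_a", "programming language"),
]

scores = triple_scores(predicted, reference)
print(scores)
```

Exact matching is strict: a model that writes `created_by` where the reference says `designed_by` gets no credit, which is one reason a semantic similarity metric is useful alongside these scores.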

Repository Structure

.
├── data/                  # Input dataset (Wikipedia excerpt)
├── models/                # Scripts for GPT-4, LLaMA 2, and BERT
│   ├── gpt4.py            # GPT-4 KG generation script
│   ├── llama2.py          # LLaMA 2 KG generation script
│   └── bert.py            # BERT KG generation script
├── metrics/               # Scripts to calculate evaluation metrics
├── visualizations/        # Generated KG visualizations
├── results/               # Output tables and evaluation results
├── README.md              # Project overview (this file)
└── requirements.txt       # Dependencies for the project

Usage

  1. Clone the repository:

    git clone https://github.com/AhanBhatt/Text2KG.git
    cd Text2KG
  2. Install dependencies:

    pip install -r requirements.txt
  3. Run KG generation:

    • To generate KGs using GPT-4:
      python models/gpt4.py
    • To generate KGs using LLaMA 2:
      python models/llama2.py
    • To generate KGs using BERT:
      python models/bert.py
  4. Visualize and Evaluate:

    • Visualizations are saved in the visualizations/ directory.
    • Evaluation metrics are stored in the results/ directory.

Key Highlights

  • Provides a systematic comparison of GPT-4, LLaMA 2, and BERT for KG creation.
  • Reproducible methodology with pre-written scripts for all three models.
  • Evaluation covers triple-level correctness (precision, recall, F1-score), structural fidelity (graph edit distance), and semantic similarity of the generated KGs.
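To make the idea of structural fidelity concrete, the sketch below counts the node and edge insertions and deletions needed to turn one KG into another when nodes are matched by identical labels. This is a deliberate simplification and not the repository's metric code: general graph edit distance searches over node mappings and allows substitutions, so this count can overestimate it. The function name and sample triples are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def nodes_of(triples: Set[Triple]) -> Set[str]:
    """Collect every subject and object that appears in the triples."""
    return {node for subject, _, obj in triples for node in (subject, obj)}


def label_matched_edit_count(graph_a: Iterable[Triple], graph_b: Iterable[Triple]) -> int:
    """Unit-cost insertions and deletions separating two label-matched graphs.

    Assumption: a node in one graph corresponds to a node in the other only
    if their labels are identical, so no search over node mappings is needed.
    """
    edges_a, edges_b = set(graph_a), set(graph_b)
    node_edits = len(nodes_of(edges_a) ^ nodes_of(edges_b))  # nodes present in only one graph
    edge_edits = len(edges_a ^ edges_b)                      # edges present in only one graph
    return node_edits + edge_edits


# Hypothetical graphs, for illustration only.
kg_reference = [
    ("C", "designed_by", "Dennis Ritchie"),
    ("C", "developed_at", "Bell Labs"),
]
kg_generated = [
    ("C", "designed_by", "Dennis Ritchie"),
    ("C", "influenced", "C++"),
]

distance = label_matched_edit_count(kg_reference, kg_generated)
print(distance)
```

A count of zero means the two graphs contain exactly the same labeled edges; larger values indicate more structural disagreement.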

Applications

This project is a step toward automating KG creation for GraphRAGs and has applications in:

  • Information retrieval
  • Knowledge representation
  • AI-driven reasoning systems

Contributing

Feel free to contribute by extending the dataset, optimizing model implementations, or suggesting additional evaluation metrics.


About

This repository contains a Python-based project that automates the creation of knowledge graphs (KGs) from text using OpenAI's GPT-4 API. It leverages PDF document parsing, natural language processing, and graph visualization techniques to extract entities and relationships from unstructured text and represent them as a directed graph.
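The directed-graph representation described above can be sketched with the standard library alone. The example below stores triples as an adjacency mapping and renders them as Graphviz DOT text; the function names, the DOT output format, and the sample triples are assumptions for illustration, and the repository's own scripts may use a different graph library and layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_adjacency(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject to its outgoing (relation, object) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def escape(label: str) -> str:
    """Escape backslashes and double quotes so a label is valid inside a DOT string."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def to_dot(triples: Iterable[Triple], name: str = "KG") -> str:
    """Render triples as a Graphviz DOT digraph with relation names as edge labels."""
    lines = [f"digraph {name} {{"]
    for subject, relation, obj in triples:
        lines.append(f'  "{escape(subject)}" -> "{escape(obj)}" [label="{escape(relation)}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical extracted triples, for illustration only.
triples = [
    ("C", "designed_by", "Dennis Ritchie"),
    ("C", "developed_at", "Bell Labs"),
    ("Unix", "written_in", "C"),
]

adjacency = build_adjacency(triples)
dot_text = to_dot(triples)
print(dot_text)
```

The resulting text can be saved to a `.dot` file and rendered with any Graphviz-compatible tool.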
