Added Crew4AI for URL extraction alongside unstructured.io for PDF extraction. Crew4AI extracts structured data from unstructured sources such as web pages and documents. With this integration, web-based information can be incorporated into the knowledge graph generation process, so the resulting knowledge graphs capture information from both PDF documents and web sources.
- Docker
- Docker Compose
- Google API Key for Gemini model
.
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── src/
│ └── kg_generator/
├── tests/
├── examples/
├── initial_pdfs/ # Mount point for initial PDF files
└── additional_pdfs/ # Mount point for additional PDF files to update the graph
- Clone the repository:
git clone https://github.com/OGsiji/Enhanced_GraphRAG.git
cd Enhanced_GraphRAG
- Create a `.env` file:
touch .env
- Edit the `.env` file and add your Google API key and any other required keys:
GOOGLE_API_KEY=your_google_api_key_here
- Create PDF directories and add your PDF files:
mkdir initial_pdfs additional_pdfs
# Add initial PDFs
cp path/to/your/initial/pdfs/*.pdf initial_pdfs/
# Add additional PDFs (optional)
cp path/to/your/additional/pdfs/*.pdf additional_pdfs/
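
Before building the containers, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the repository: the helper names `parse_env_file` and `preflight` are hypothetical, and it assumes the `.env` file uses plain `KEY=VALUE` lines. It only reports warnings, since `initial_pdfs/` may legitimately be empty if you plan to process URLs only.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(project_root="."):
    """Return a list of human-readable warnings; an empty list means ready."""
    root = Path(project_root)
    problems = []

    # Look for the key in .env first, then fall back to the process environment.
    env_path = root / ".env"
    env = parse_env_file(env_path) if env_path.is_file() else {}
    key = env.get("GOOGLE_API_KEY") or os.environ.get("GOOGLE_API_KEY", "")
    if not key or key == "your_google_api_key_here":
        problems.append("GOOGLE_API_KEY is missing or still the placeholder")

    # The base graph is built from initial_pdfs/, so warn when it has no PDFs.
    initial = root / "initial_pdfs"
    if not initial.is_dir() or not any(initial.glob("*.pdf")):
        problems.append("initial_pdfs/ contains no PDF files")

    return problems
```

Calling `preflight()` from the repository root returns a list of warnings that you can print before running Docker Compose.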
# Add URLs
Navigate to `src/kg_generator/url.py` and add/edit the URLs.
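
The exact layout of `url.py` is not shown here, so the following is only a sketch under the assumption that it holds a plain Python list of URL strings. The variable name `URLS`, the example addresses, and the helpers `is_valid_url` and `clean_urls` are all hypothetical. It illustrates one way to check entries before they are passed on for extraction.

```python
from urllib.parse import urlparse

# Hypothetical contents; the real variable name in url.py may differ.
URLS = [
    "https://example.com/article",
    "https://example.org/docs/page.html",
]


def is_valid_url(url):
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def clean_urls(urls):
    """Drop invalid entries and duplicates while preserving order."""
    seen = set()
    result = []
    for url in urls:
        url = url.strip()
        if is_valid_url(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Running `clean_urls(URLS)` after editing the list catches typos such as a missing scheme before the containers start.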
## Select what to run
You can choose whether to process URLs, PDFs, or both in `src/kg_generator/config.py` by setting `LinkConfig.url` or `LinkConfig.pdf`. The default value is `true`.
- Build and run the containers:
docker-compose up --build
- The system first processes all PDFs in the `initial_pdfs` directory to create the base knowledge graph.
- If any PDFs exist in the `additional_pdfs` directory, they are processed and used to update the existing knowledge graph.
- Both directories are mounted as volumes, so you can add or remove PDFs without rebuilding the container.
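
The run order described above, together with the `LinkConfig` switches, can be sketched as follows. This is an illustration only: the `LinkConfig` dataclass below is a hypothetical stand-in for the one in `src/kg_generator/config.py`, `plan_run` and `list_pdfs` are made-up helpers, and placing URL processing after the PDFs is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LinkConfig:
    # Hypothetical stand-in for the switches in src/kg_generator/config.py.
    url: bool = True
    pdf: bool = True


def list_pdfs(directory):
    """Return sorted PDF file names, or an empty list if the directory is absent."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.pdf"))


def plan_run(config, project_root="."):
    """List the processing steps in order: create, then update, then URLs."""
    root = Path(project_root)
    steps = []
    if config.pdf:
        # The base graph is always built from initial_pdfs/ first.
        steps.append(("create", list_pdfs(root / "initial_pdfs")))
        # The update step only happens when additional PDFs exist.
        additional = list_pdfs(root / "additional_pdfs")
        if additional:
            steps.append(("update", additional))
    if config.url:
        # Assumption: URLs are handled after PDFs; the real order may differ.
        steps.append(("urls", []))
    return steps
```

For example, `plan_run(LinkConfig(url=False))` lists only the PDF steps, which mirrors disabling URL processing in the configuration.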
To run the tests in a Docker container:
docker-compose run --rm kg_generator pytest

A Python application that generates knowledge graphs from PDF documents using FalkorDB and Google's Gemini model. The Knowledge Graph generator extends the GraphRAG-SDK framework to handle PDF files using the Unstructured-IO library.