Simple CLI Retrieval Augmented Generation Scanner

Aim of the project: A showcase of a RAG scanner written in Java with Spring AI. It scans the documents you point it at and lets you ask an LLM questions about them.

Disclaimer

This tool is intended for educational and productivity purposes only. It is designed to assist users in managing and querying their own documents. Any illegal or unethical use of this software is strictly prohibited.

Requirements

Java 21 installed on your device
Docker
An environment variable named GOOGLE_API_KEY set to your Google Gemini API key
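
Before starting the shell, it can help to confirm that the variable is actually visible to the JVM. The snippet below is a minimal sketch and not part of the project: the class ApiKeyCheck and its method findKey are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not from the project): checks that GOOGLE_API_KEY is set.
class ApiKeyCheck {
    static final String VAR_NAME = "GOOGLE_API_KEY";

    // Returns the trimmed key if the variable is present and non-blank, otherwise empty.
    static Optional<String> findKey(Map<String, String> env) {
        String value = env.get(VAR_NAME);
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    public static void main(String[] args) {
        // System.getenv() returns the environment of the current process.
        boolean present = findKey(System.getenv()).isPresent();
        System.out.println(VAR_NAME + (present ? " is set" : " is missing or blank"));
    }
}
```

The environment is passed in as a map so the check can be exercised without changing the real environment.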

Installation

Navigate to the project directory
Open CMD/Powershell/Terminal
On Windows run .\mvnw.cmd clean install; on Linux/Mac run ./mvnw clean install

How to use:

Run docker-compose up in your CMD/PowerShell/Terminal
Run the project using the Maven wrapper: on Windows run .\mvnw.cmd spring-boot:run, on Linux/Mac run ./mvnw spring-boot:run
When the shell opens, type collection-size 768 (768 is the vector size compatible with Gemini)
Place your files in a directory, copy the full path of the directory, and run load /your/path. Wait until the files are chunked and loaded into the Qdrant vector database
Finally, type ask "your question here" in the shell, and that's it
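
To illustrate why the collection size has to match the embedding size, and what happens when you ask a question, here is a minimal in-memory sketch of similarity search. It is an assumption-laden illustration, not the project's code: in the project this work is done by Qdrant, and the names TopKRetrieval, Scored, cosine and topK are hypothetical. Cosine similarity is used here as one common choice of metric; the metric the project configures is not stated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for a vector database lookup.
class TopKRetrieval {
    // Index of a stored vector together with its similarity to the query.
    record Scored(int index, double score) {}

    // Cosine similarity; vectors of different sizes cannot be compared,
    // which is why the collection size must match the embedding size.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector has no direction
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns at most k stored vectors, most similar first.
    static List<Scored> topK(float[] query, List<float[]> stored, int k) {
        List<Scored> scored = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            scored.add(new Scored(i, cosine(query, stored.get(i))));
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return new ArrayList<>(scored.subList(0, Math.min(k, scored.size())));
    }
}
```

In a real run each vector would have 768 entries and the chunks behind the top results would be passed to the LLM as context for the answer.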

Notes

This is a simple project that still needs many improvements, such as:

Improve document chunking (currently documents are chunked by token size)
Support more file types (currently supports txt, HTML, JSON, MD, docx, ppt, pdf, and many more)
Support other chat models such as GPT, Ollama, etc. (currently supports Gemini, version gemini-1.5-flash-latest; I chose Gemini because it has a good free tier)
Ship a standalone executable and a jar file (currently you can build and run it yourself without problems, but I will simplify this)
Support other vector databases (currently supports Qdrant, which, to be honest, is good enough)
Support a custom system context and a custom number of similar documents returned from the DB (the default, for now, is 5)
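
For readers unfamiliar with chunking by token size, the idea can be sketched as follows. This is a simplified illustration, not the project's implementation: it treats whitespace-separated words as tokens, whereas a real tokenizer would split text differently, and the names TokenChunker and chunk are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of fixed-size chunking; words stand in for tokens.
class TokenChunker {
    // Splits text into consecutive chunks of at most maxTokens words each.
    static List<String> chunk(String text, int maxTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to chunk
        }
        String[] tokens = trimmed.split("\\s+");
        for (int start = 0; start < tokens.length; start += maxTokens) {
            int end = Math.min(start + maxTokens, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
        }
        return chunks;
    }
}
```

A fixed-size split like this can cut a sentence or paragraph in half, which is one reason better chunking is on the list above.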

Rabbit hole

Don't try to retrieve an API key from older .git versions; it's a rabbit hole :)

Please create an Issue if something is wrong and I will look into it. Feel free to contribute to the project.

Lunatix01/ragscan

Folders and files

Latest commit

History

Repository files navigation

Simple CLI Retrieval Augmented Generation Scanner

Disclaimer

Requirements

Installation

How to use:

Notes

Rabbit hole

Please create an Issue, if something is wrong I will look into it, and feel free to contribute to the project.

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages