Datadoc: Your AI DOC Assistant

🌟 Feature Update: Offline LLM support. You can now run the whole system offline. Click here.

Welcome to Datadoc, your personal AI document assistant. Datadoc is designed to help you extract information from documents without having to read or remember the entire content. It's like having a personal assistant who has read all your documents and can instantly recall any piece of information from them.

Features 🚀

Document RAG Search: Datadoc uses a Retrieval-Augmented Generation (RAG) approach for document search. This involves retrieving relevant documents or passages and then using them to generate a response. This allows Datadoc to provide detailed and contextually relevant answers.
- I uploaded the whole book Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, along with several other books, and picked random questions from the book's exercise sections. It answered each of them precisely.
Offline Support: Datadoc supports an offline mode, i.e. you can now run the LLM locally on your system. No GPU is required. Use this feature if you prefer to run the LLM locally.
- Download the Model: Download the mistral-7b-openorca.gguf2.Q4_0.gguf model from the Model Explorer section of the GPT4All website.
- Place the model at models/mistral-7b-openorca.gguf2.Q4_0.gguf.
Child Mode: Makes the LLM explain a topic as it would to a child, which gives detailed yet easy-to-follow explanations.
- Without child mode:
- With child mode:
Vector Database: Datadoc uses ChromaDB to store embeddings of the data. Embeddings are vector representations of text that capture semantic meaning. Storing these embeddings in a vector database allows for fast and efficient similarity search, enabling Datadoc to quickly find relevant information in your documents.
Supports Multiple Formats: Datadoc can read information from various document formats such as PDF, DOCX, Markdown, and more.
Image Search: Datadoc can also answer queries based on the content of an uploaded image using the gemini-pro-vision model.
Fast and Efficient: Built on LangChain, with ChromaDB storing the document embeddings, Datadoc answers quickly once the embeddings have been created.
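
To illustrate the similarity search that the Vector Database feature relies on, here is a minimal, self-contained sketch. It is not Datadoc's code: the `embed` function is a toy word-count stand-in for a real embedding model, and `top_k` is a hypothetical helper that ranks passages in plain Python rather than through ChromaDB.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (hypothetical helper)."""
    query_vec = embed(query)
    ranked = sorted(
        passages,
        key=lambda p: cosine_similarity(query_vec, embed(p)),
        reverse=True,
    )
    return ranked[:k]


passages = [
    "Gradient descent updates model weights step by step",
    "ChromaDB stores embeddings for similarity search",
    "Streamlit renders the chat interface",
]
best = top_k("how are embeddings stored", passages, k=1)
print(best)
```

A real embedding model would also match passages that share meaning but no words with the query, which this word-count stand-in cannot do.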

How Datadoc Works

Intelligent Fusion: Datadoc combines Google's Gemini model, accessed through Langchain, with ChromaDB's embedding storage.
Versatile Processing: Datadoc reads PDF, DOCX, Markdown, and other document formats.
Image Understanding: Image-related queries are answered by the gemini-pro-vision model through the Gemini API.
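
To make the retrieve-then-generate flow concrete, the sketch below shows how retrieved passages could be combined with a question into a single prompt, with an optional switch in the spirit of Child Mode. The function name `build_prompt` and the prompt wording are assumptions made for illustration; Datadoc's actual prompts may differ.

```python
def build_prompt(question: str, passages: list[str], child_mode: bool = False) -> str:
    """Assemble an LLM prompt from retrieved passages (hypothetical helper)."""
    # Number the passages so the model can tell them apart.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    # Assumed wording: the style line is the only part that child mode changes.
    style = (
        "Explain the answer in simple words, as if to a child."
        if child_mode
        else "Answer concisely and precisely."
    )
    return (
        "Use only the context below to answer the question.\n"
        f"{style}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What is overfitting?",
    ["Overfitting happens when a model memorizes the training data."],
    child_mode=True,
)
print(prompt)
```

The resulting string would then be sent to whichever LLM is configured, whether Gemini or the local model.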

Getting Started 🎉

Clone the repository

git clone https://github.com/Adiii1436/datadoc.git
cd datadoc

Create and activate a virtual environment

python3 -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

Install the dependencies

pip install -r requirements.txt

Put all your files inside the Transcripts folder.
Run the app and start asking questions!

streamlit run app.py

You also need a Gemini API key, which you can get from here.
Note that initial execution may take some time to create document embeddings and parse various document types, but subsequent runs will be faster.
Important Note: click here
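
The note about slower first runs suggests that embeddings are built once and then reused. As a rough sketch of how an app could decide whether a rebuild is needed, the code below fingerprints a document folder and compares it with a saved value. The helper names and the JSON state file are hypothetical and not part of Datadoc.

```python
import hashlib
import json
from pathlib import Path


def folder_fingerprint(folder: Path) -> str:
    """Hash the relative path, size, and modification time of every file."""
    digest = hashlib.sha256()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        stat = path.stat()
        digest.update(str(path.relative_to(folder)).encode("utf-8"))
        digest.update(str(stat.st_size).encode("utf-8"))
        digest.update(str(stat.st_mtime_ns).encode("utf-8"))
    return digest.hexdigest()


def needs_reembedding(folder: Path, state_file: Path) -> bool:
    """Return True when the folder changed since the last recorded run.

    The state file is a hypothetical JSON file; keep it outside `folder`,
    otherwise writing it would change the fingerprint.
    """
    current = folder_fingerprint(folder)
    previous = None
    if state_file.exists():
        previous = json.loads(state_file.read_text()).get("fingerprint")
    if current == previous:
        return False
    # Record the new fingerprint so the next unchanged run can skip the rebuild.
    state_file.write_text(json.dumps({"fingerprint": current}))
    return True
```

Calling `needs_reembedding(Path("Transcripts"), Path("embed_state.json"))` at startup would return `True` on the first run and after any file is added or changed, and `False` otherwise.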

Contributing 🤝

We welcome contributions from developers. Feel free to fork this repository, make changes, and submit a pull request.

License 📄

This project is licensed under the MIT License.
