LangChain Chroma Agent

This project demonstrates how to use LangChain with Chroma for document embedding and retrieval. It uses Azure OpenAI to generate the embeddings and to power the chat interactions.

Features

- PDF Embedding: Convert PDF documents into embeddings using Azure OpenAI
- Chroma Vector Store: Store and retrieve document embeddings using Chroma DB
- Chunking: Intelligent document chunking with customizable size and overlap
- Chat Agent: Interact with the system using a chat-based interface powered by LangChain and Azure OpenAI

Prerequisites

- Python 3.8+
- Azure OpenAI account
- Chroma DB (running in Docker)

Setup

Clone the repository:

git clone <repository-url>
cd <repository-directory>

Start Chroma DB:

docker run -d -p 8000:8000 -v C:/chroma/data:/vector_data -e CHROMA_SERVER_CORS_ALLOW_ORIGINS='["http://localhost:8090"]' -e PERSIST_DIRECTORY=/vector_data --name chromadb chromadb/chroma
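
Before running the agent, it can help to confirm that the container is accepting connections on the published port. The sketch below is a standard-library TCP check and is not part of this repository; the helper name `is_port_open` is hypothetical, and its defaults simply mirror the `-p 8000:8000` mapping in the command above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Hypothetical helper: the defaults assume the port mapping from the
    docker command in this README.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A `True` result only shows that something is listening on the port, not that it is Chroma; `docker ps` remains the direct way to check the container itself.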

Install dependencies:
```
pip install -r requirements.txt
```
Environment Variables:
- Copy .env_sample to .env and fill in the required API keys and endpoints.

Environment Variables Description

- OPENAI_API_KEY: Your OpenAI API key
- AZURE_OPENAI_API_KEY: Your Azure OpenAI API key
- AZURE_OPENAI_ENDPOINT: The endpoint URL for Azure OpenAI
- AZURE_OPENAI_API_VERSION: The API version for Azure OpenAI
- LANGSMITH_TRACING: Enable or disable LangSmith tracing (true/false)
- LANGSMITH_ENDPOINT: The endpoint URL for LangSmith
- LANGSMITH_API_KEY: Your LangSmith API key
- LANGSMITH_PROJECT: The project name for LangSmith
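
A missing variable usually surfaces only as an authentication error at the first API call, so an early check can save time. The following is a minimal sketch, not code from this repository: the helper `missing_env_vars` is hypothetical, and treating only the three Azure variables as required is an assumption. It reads the process environment, so the `.env` file must already have been loaded by whatever mechanism the project uses.

```python
import os
from typing import List, Mapping

# Assumption: these are the variables the agent cannot run without.
# The OpenAI and LangSmith variables are treated as optional here.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` at startup and stopping when the returned list is non-empty gives a clearer message than a failed request later on.
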

Prepare PDF Embeddings:
- Place your PDF files in the ./data/input/ directory
- Run the embedding process to create and store document embeddings in Chroma DB
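
To see which files the embedding step would have available, the input directory can be listed first. This is an illustrative sketch only: `find_pdfs` is a hypothetical helper, and the assumption is that PDFs sit directly in `./data/input/` rather than in subfolders.

```python
from pathlib import Path
from typing import List


def find_pdfs(input_dir: str = "./data/input") -> List[Path]:
    """Return the PDF files directly inside input_dir, sorted by path.

    Hypothetical helper; returns an empty list if the directory is missing.
    """
    folder = Path(input_dir)
    if not folder.is_dir():
        return []
    # Match the extension case-insensitively so "REPORT.PDF" is included.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```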

Usage

Start the Application:
```
python agent.py
```
Interact with Documents:
- The system processes the PDFs in the ./data/input/ directory
- Use the chat interface to ask questions about your documents
- The agent retrieves relevant passages from the Chroma vector store
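
Conceptually, retrieval means embedding the question and returning the stored chunks whose embeddings are closest to it. The toy sketch below shows that idea with brute-force cosine similarity over plain Python lists. It is a stand-in for illustration, not how this project or Chroma implements search, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

A real vector store typically uses an index instead of scanning every vector, which is what keeps retrieval fast as the collection grows.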

Configuration

You can customize the document processing by adjusting these parameters:

- Chunk size (default: 1000 characters)
- Chunk overlap (default: 200 characters)
- Collection name for vector storage
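
To make the two chunking parameters concrete, here is a minimal fixed-window sketch in plain Python. It is not this project's implementation, which may well use a LangChain text splitter that also respects paragraph and sentence boundaries; `chunk_text` is a hypothetical name. Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so consecutive chunks share `chunk_overlap` characters.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap.

    Hypothetical helper; the defaults mirror the values listed above.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600. A larger overlap preserves more context across chunk boundaries at the cost of storing more near-duplicate text.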

Project Structure

- agent.py: Main script to run the chat agent
- embeddings_tools.py: Functions to create and manage document embeddings with Chroma
- tools.py: Utility functions for file writing and message extraction
- .env_sample: Sample environment configuration file
- data/: Directory for input PDFs and output markdown files

Acknowledgments

- LangChain for the foundational framework
- Chroma for vector storage and retrieval
- Azure OpenAI for embedding and chat capabilities

Epommier/rag-agent
