This code is experimental and a work in progress. It is a great learning tool and should be regarded as such. You will find bugs and things that don't work.
Most of the code was written by different versions of Claude (Sonnet 3.5, 3.7 and Opus 4), closely supervised by me. This was part of the learning journey too.
Funes is a system that enhances local Large Language Models with persistent memory capabilities and a Retrieval-Augmented Generation (RAG) pipeline, inspired by Jorge Luis Borges' short story "Funes the Memorious" (Funes el Memorioso). Just as the character Funes remembered everything he experienced, our system provides local LLMs with a persistent and contextually relevant memory system.
The system supports multiple LLM backends including Ollama, llama.cpp, and HuggingFace, while maintaining a persistent memory of previous interactions and knowledge in a PostgreSQL database with vector storage capabilities for semantic search.
- Linux-based system (Ubuntu/Debian recommended)
- Sudo privileges (non-root user)
- Internet connection for downloading dependencies
The entire installation process has been simplified with a bash script that handles all the necessary setup:
```bash
# Clone the repository
git clone https://github.com/Auto2ML/FunesServer.git
cd FunesServer

# Make the install script executable
chmod +x install.sh

# Run the installation script
./install.sh
```
The installation script automatically:
- Installs system dependencies
- Sets up LLM backends (Ollama, required libraries for llama.cpp and HuggingFace)
- Configures PostgreSQL with pgvector extension
- Creates the required database and user
- Sets up Python virtual environment
- Downloads the required LLM models
- Creates a launcher script for easy execution
After installation, you can run Funes with the provided launcher script:
```bash
./run_funes.sh
```
The Gradio interface will be available at http://localhost:7860
Funes can be configured through the `config.py` file to use different LLM backends and models:
```python
# LLM model configuration
LLM_CONFIG = {
    'model_name': 'llama3.2:latest',      # The model name or path
    'backend_type': 'ollama',             # Options: 'ollama', 'llamacpp', 'huggingface'
    'system_prompt': "You are Funes..."   # System prompt for the LLM
}

# Embedding model configuration
EMBEDDING_CONFIG = {
    'model_name': 'all-MiniLM-L6-v2',     # Embedding model name
}
```
- Ollama Backend:
  - Set `'backend_type': 'ollama'`
  - Use any model available in Ollama for `model_name`
- llama.cpp Backend:
  - Set `'backend_type': 'llamacpp'`
  - For `model_name`, provide the path to your GGUF model file
- HuggingFace Backend:
  - Set `'backend_type': 'huggingface'`
  - Use any model identifier from HuggingFace for `model_name`
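As an illustration of how a config like the one above might be vetted before a backend is constructed, here is a minimal sketch. The function name and validation logic are assumptions for illustration, not part of Funes' actual code:

```python
# Hypothetical sketch: validating an LLM_CONFIG dict before use.
# The backend names mirror the options listed above; the function itself
# is illustrative, not Funes' actual implementation.

SUPPORTED_BACKENDS = {'ollama', 'llamacpp', 'huggingface'}

def validate_llm_config(config: dict) -> dict:
    """Check that a config dict names a known backend and a model."""
    backend = config.get('backend_type')
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown backend_type: {backend!r}")
    if not config.get('model_name'):
        raise ValueError("model_name must be set")
    return config

cfg = validate_llm_config({
    'model_name': 'llama3.2:latest',
    'backend_type': 'ollama',
    'system_prompt': 'You are Funes...',
})
print(cfg['backend_type'])
```

Failing fast on an unknown `backend_type` gives a clearer error than letting a typo surface deep inside backend initialization.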
```python
# Memory configuration
MEMORY_CONFIG = {
    'short_term_capacity': 10,      # Number of messages to keep in short-term memory
    'short_term_ttl_minutes': 30,   # Time-to-live for short-term memory items
    'default_top_k': 3              # Default number of relevant memories to retrieve
}
```
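To illustrate what the capacity and TTL settings control, here is a minimal, hypothetical short-term memory with a size cap and time-based expiry. This is a sketch of the configured behavior, not Funes' actual implementation:

```python
from collections import deque
from datetime import datetime, timedelta

class ShortTermMemory:
    """Illustrative sketch: keep at most `capacity` messages,
    each expiring `ttl_minutes` after it was added."""

    def __init__(self, capacity=10, ttl_minutes=30):
        self.capacity = capacity
        self.ttl = timedelta(minutes=ttl_minutes)
        self.items = deque()  # (timestamp, message) pairs, oldest first

    def add(self, message):
        self.items.append((datetime.now(), message))
        while len(self.items) > self.capacity:  # enforce the capacity cap
            self.items.popleft()

    def recall(self):
        cutoff = datetime.now() - self.ttl  # drop expired items on read
        return [msg for ts, msg in self.items if ts >= cutoff]

memory = ShortTermMemory(capacity=3, ttl_minutes=30)
for i in range(5):
    memory.add(f"message {i}")
print(memory.recall())  # only the 3 most recent messages survive
```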
Funes includes a robust tools system that allows the LLM to interact with external functionalities. This system enables the LLM to:
- Access Real-time Information: Get current date/time information and weather conditions
- Process and Format Data: Extract, transform, and present data in natural language
- Extend Capabilities: The framework allows for easy addition of new tools
Date/time tool:
- Provides current date and time information
- Supports multiple timezones
- Configurable output formats (full, date, time, iso)
- Example usage: "What's the current time in Tokyo?"

Weather tool:
- Retrieves weather information for specified locations
- Supports different temperature formats (Celsius/Fahrenheit)
- Provides details like temperature, conditions, humidity, and wind speed
- Example usage: "What's the weather like in Paris today?"
The tools system uses a flexible architecture based on the `GenericTool` class, making it easy to implement new tools. Each tool defines:
- A unique name
- A descriptive explanation of its functionality
- Required and optional parameters
- An execution method that performs the actual functionality
The system includes a response enhancement mechanism that converts raw tool outputs into natural, conversational responses. This makes the LLM's responses feel more human-like when using tools.
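As a sketch of what such an enhancement step might look like, the hypothetical function below turns a raw weather-tool payload into a conversational sentence. The field names and function are assumptions for illustration only:

```python
# Hypothetical sketch of a response-enhancement step: turn a raw tool
# result (a dict) into a natural-language sentence. Field names here
# are invented for the example; Funes' actual mechanism may differ.

def enhance_response(tool_name: str, raw: dict) -> str:
    if tool_name == "weather":
        return (f"It's currently {raw['temperature']}°C and {raw['conditions'].lower()} "
                f"in {raw['location']}, with {raw['humidity']}% humidity.")
    # Fall back to the raw payload for tools without a template
    return str(raw)

reply = enhance_response("weather", {
    "location": "Paris",
    "temperature": 18,
    "conditions": "Partly cloudy",
    "humidity": 65,
})
print(reply)
```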
To create a new tool, extend the `GenericTool` class and implement the required methods:
- Define properties: `name`, `description`, and `parameters`
- Implement the `execute` method that performs the tool's functionality
- Place the tool in the `tools/` directory
- The tool will be automatically detected and made available to the LLM
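The steps above can be sketched as follows. A minimal `GenericTool` base is defined inline so the example runs standalone; Funes' real base class may differ in detail, and the example tool itself is hypothetical:

```python
import random

# Minimal stand-in for Funes' GenericTool base class, so this sketch is
# self-contained. The real base class in the tools/ directory may differ.
class GenericTool:
    name: str = ""
    description: str = ""
    parameters: dict = {}

    def execute(self, **kwargs):
        raise NotImplementedError

class DiceRollTool(GenericTool):
    """Hypothetical example tool: rolls an n-sided die."""
    name = "dice_roll"
    description = "Rolls a die with a configurable number of sides."
    parameters = {
        "sides": {"type": "integer", "required": False, "default": 6},
    }

    def execute(self, sides: int = 6):
        # Return a structured result the response-enhancement layer can format
        return {"sides": sides, "result": random.randint(1, sides)}

tool = DiceRollTool()
outcome = tool.execute(sides=20)
print(outcome["result"])  # an integer between 1 and 20
```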
If you prefer to install components manually, follow these steps:
```bash
# For macOS/Linux (using curl)
curl -fsSL https://ollama.ai/install.sh | sh

# For Windows
# Download the installer from https://ollama.ai/download
```
```bash
# Install PostgreSQL and development libraries
sudo apt update
sudo apt install postgresql postgresql-contrib postgresql-server-dev-all

# Install pgvector
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```
```
# Create database and user
sudo -u postgres psql
postgres=# CREATE DATABASE funes;
postgres=# CREATE USER llm WITH PASSWORD 'llm';
postgres=# GRANT ALL PRIVILEGES ON DATABASE funes TO llm;
postgres=# \c funes
postgres=# CREATE EXTENSION vector;
postgres=# \q
```
```bash
# Create a virtual environment
python3 -m venv funes-env
source funes-env/bin/activate

# Install required packages
pip install --upgrade pip
pip install -r requirements.txt
```
Create a `.env` file in the project root:
```
DATABASE_URL=postgresql://llm:llm@localhost:5432/funes
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=llm
POSTGRES_PASSWORD=llm
POSTGRES_DB=funes
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
```
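For illustration, the individual `POSTGRES_*` variables above might be assembled into a connection URL like this. The helper below is a hypothetical sketch (Funes may simply read `DATABASE_URL` directly):

```python
import os

# Hypothetical sketch: build a PostgreSQL DSN from the environment
# variables above. Defaults mirror the sample .env values.
def postgres_dsn() -> str:
    user = os.environ.get("POSTGRES_USER", "llm")
    password = os.environ.get("POSTGRES_PASSWORD", "llm")
    host = os.environ.get("POSTGRES_HOST", "localhost")
    port = os.environ.get("POSTGRES_PORT", "5432")
    db = os.environ.get("POSTGRES_DB", "funes")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

print(postgres_dsn())  # e.g. postgresql://llm:llm@localhost:5432/funes with the defaults
```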
Funes implements a Retrieval-Augmented Generation (RAG) pipeline that enhances LLM capabilities by:
- Dual Memory System:
- Short-term memory for recent conversation context
- Long-term memory stored as vector embeddings in PostgreSQL
- Embedding Storage: Converting previous conversations and knowledge into vector embeddings stored in PostgreSQL using pgvector
- Semantic Retrieval: Finding contextually relevant information from past conversations using vector similarity search
- Context Enhancement: Augmenting LLM prompts with retrieved context to provide more informed responses
- Persistent Memory: Maintaining knowledge across sessions for continuous learning and improvement
This architecture allows Funes to provide more accurate, contextual responses based on conversation history and stored knowledge.
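The semantic-retrieval step can be sketched with toy vectors. In Funes the embeddings come from the configured embedding model and the similarity search runs inside PostgreSQL via pgvector; this in-memory sketch with made-up three-dimensional embeddings only illustrates the top-k idea:

```python
import math

# Toy sketch of semantic retrieval: rank stored memories by cosine
# similarity to the query embedding and keep the top k. The memories
# and 3-d "embeddings" below are invented for illustration.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vec, memories, k=3):
    """memories: (text, embedding) pairs, as rows from pgvector might be."""
    ranked = sorted(memories,
                    key=lambda m: cosine_similarity(query_vec, m[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

memories = [
    ("User's name is Ada",       [0.9, 0.1, 0.0]),
    ("User likes hiking",        [0.1, 0.9, 0.1]),
    ("Capital of France: Paris", [0.0, 0.2, 0.9]),
]
print(retrieve_top_k([0.8, 0.2, 0.0], memories, k=2))
```

In production the same ranking is delegated to the database (pgvector's distance operators over an indexed column), which avoids loading every stored embedding into application memory.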
- Open your web browser and navigate to `http://localhost:7860`
- The interface provides:
  - A chat interface for interacting with the LLM
  - Memory management options
  - Context visualization tools
  - RAG pipeline configuration
- PostgreSQL Connection Issues:
  - Verify PostgreSQL is running: `sudo systemctl status postgresql`
  - Check database credentials in `config.py`
- LLM Backend Connection:
  - For Ollama backend: verify available models with `ollama list`
  - For llama.cpp: ensure the model path in `config.py` is correct
  - For HuggingFace: verify internet connection or local model availability
- Python Dependencies:
  - If you encounter module-not-found errors: `pip install -r requirements.txt --upgrade`
- Model Behavior Issues:
  - If the model seems to be "talking to itself" or duplicating context, try:
    - Clearing the conversation history
    - Checking your `config.py` settings
    - Verifying that your LLM model is properly installed
Contributions are welcome! Please feel free to submit a Pull Request.
Copyright (c) 2025 AUTO2ML - J. Rodriguez Martino
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Inspired by Jorge Luis Borges' "Funes the Memorious"
- Built with PostgreSQL, pgvector, and Gradio
- Supports multiple LLM backends: Ollama, llama.cpp, and HuggingFace
- Special thanks to the open-source community