A comprehensive toolkit for large language models (LLMs) and embedding models, supporting chat, embeddings, and reranking with flexible configuration options and multi-engine support. Whether handling synchronous or asynchronous calls, LLM Tools manages a wide range of AI model integration tasks efficiently.
- LLM Chat: Interact with large language models (e.g., GPT) with support for streaming output and formatted responses
- Embedding Models: Support for single and multi-sentence embedding generation for semantic analysis and retrieval
- Reranking Models: Rank documents by query similarity with support for single and multi-sentence inputs
- Highly Configurable: Flexibly adjust parameters through YAML configuration files
- Multi-Engine Support: Support for Azure OpenAI, local models, and various embedding engines
- Async Support: Provides async interfaces for enhanced performance
- Memory Management: Built-in chat memory management with customizable history length
- Response Caching: Optional LLM response caching for improved efficiency
- LLM Engines: Compatible with OpenAI SDK format
- Embedding Models: m3e-base, bge-m3, and other embedding models
- Reranking Models: bge-reranker-large and other reranking models
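Because the LLM engines follow the OpenAI SDK format, any OpenAI-compatible endpoint (Azure OpenAI, a local vLLM/FastChat server, etc.) can serve as a backend. As a sketch, this is the shape of the chat-completion request body such an endpoint accepts; nothing is sent here, the payload is only constructed:

```python
import json

# Shape of an OpenAI-compatible chat completion request. Any server that
# accepts this JSON format can back an engine entry in models.yaml.
payload = {
    "model": "Qwen2-7B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    "temperature": 0.2,
    "max_tokens": 1000,
}
print(json.dumps(payload, indent=2))
```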
# Clone the repository
git clone https://github.com/LLMSystems/llm_tools.git
cd llm_tools
# Install the package
pip install -e .
# Or install with development dependencies
pip install -e ".[dev]"

Create a `configs/models.yaml` file based on `example_configs/models.yaml` and configure the model parameters:
params:
  default:
    temperature: 0.2
    max_tokens: 1000
    top_p: 1
    frequency_penalty: 1.4
    presence_penalty: 0

LLM_engines:
  gpt-4o:
    model: "gpt-4o"
    azure_api_base: "your_azure_api_base_url"
    azure_api_key: "your_azure_api_key"
    azure_api_version: "your_azure_api_version"
  Qwen2-7B-Instruct:
    model: "Qwen2-7B-Instruct"
    local_api_key: "Empty"
    local_base_url: "http://localhost:8887/v1"
    translate_to_cht: true # Optional: translate responses to Traditional Chinese

embedding_models:
  m3e-base:
    model: "m3e-base"
    local_api_key: "Empty"
    local_base_url: "http://localhost:8887/v1"

reranking_models:
  bge-reranker-large:
    model: "bge-reranker-large"
    local_api_key: "Empty"
    local_base_url: "http://localhost:8887/v1"

- default: Shared generation parameters: `temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`
- Azure OpenAI: Configure `azure_api_base`, `azure_api_key`, and `azure_api_version` (note: usage may incur costs)
- Local models: Configure `local_api_key` and `local_base_url`
- `translate_to_cht`: When set to `true`, results are automatically translated to Traditional Chinese
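A config like this is typically consumed by layering each engine's settings over the shared `default` block. A minimal sketch of that merge, assuming the YAML has already been parsed into a dict (`resolve_params` is an illustrative helper, not part of llm_tools):

```python
# Illustrative parsed config; the gpt-4o entry overrides the default temperature.
config = {
    "params": {"default": {"temperature": 0.2, "max_tokens": 1000, "top_p": 1}},
    "LLM_engines": {
        "gpt-4o": {"model": "gpt-4o", "temperature": 0.7},
        "Qwen2-7B-Instruct": {"model": "Qwen2-7B-Instruct"},
    },
}

def resolve_params(config, engine):
    merged = dict(config["params"]["default"])    # start from shared defaults
    merged.update(config["LLM_engines"][engine])  # engine entry wins on conflicts
    return merged

print(resolve_params(config, "gpt-4o")["temperature"])             # 0.7 (overridden)
print(resolve_params(config, "Qwen2-7B-Instruct")["temperature"])  # 0.2 (default)
```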
from llm_chat import LLMChat
# Initialize LLM chat
llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")
# Simple chat
response, history = llmchat.chat(query="Hello, how are you?")
print(response)
# Interactive chat with history
history = []
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response, history = llmchat.chat(query=user_input, history=history)
    print(f"AI: {response}")

from llm_chat import LLMChat
llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")
# Streaming response
for chunk in llmchat.chat(query="Tell me a story", stream=True):
    print(chunk, end="", flush=True)
print()

from llm_chat import LLMChat
from memory import ChatMemory
# Initialize chat memory
system_prompt = "You are a professional assistant who answers questions in Traditional Chinese."
chat_memory = ChatMemory(system_prompt=system_prompt, max_len=1000)
llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")
# Streaming with memory
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    text = ''
    for chunk in llmchat.chat(query=user_input, history=chat_memory.get_history(), stream=True):
        text += chunk
        print(chunk, end="", flush=True)
    print()
    chat_memory.add_user_message(user_input)
    chat_memory.add_system_response(text)

import numpy as np
from embed_rerank_model import EmbeddingModel, RerankingModel
# Embedding generation
embed_model = EmbeddingModel(embedding_model="m3e-base", config_path="./configs/models.yaml")
query_embedding = np.array(embed_model.embed_query("The food is delicious."))
print(f"Embedding shape: {query_embedding.shape}")
# Document embedding
documents = ["The food is great.", "The service is excellent.", "The atmosphere is nice."]
doc_embeddings = embed_model.embed_documents(documents)
print(f"Document embeddings: {len(doc_embeddings)} vectors")
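Embedding vectors like these are typically compared with cosine similarity for retrieval. A self-contained sketch with stand-in vectors (real vectors would come from `embed_query` and `embed_documents`):

```python
import math

# Cosine similarity: dot product divided by the product of the L2 norms.
def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; in practice these come from the embedding model.
query_vec = [0.9, 0.1, 0.0]
docs = {
    "The food is great.": [1.0, 0.0, 0.0],
    "The service is excellent.": [0.0, 1.0, 0.0],
    "The atmosphere is nice.": [0.5, 0.5, 0.0],
}

# Rank documents by similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine_sim(query_vec, docs[d]), reverse=True)
print(ranked[0])  # "The food is great."
```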
# Document reranking
rerank_model = RerankingModel(reranking_model="bge-reranker-large", config_path="./configs/models.yaml")
query = "Tell me about the food quality"
ranked_docs = rerank_model.rerank_documents(documents, query)
print(f"Reranked documents: {ranked_docs}")

import asyncio
from async_llm_chat import AsyncLLMChat
async def async_chat_example():
    # Initialize async LLM chat
    async_llm = AsyncLLMChat(model="gpt-4o", config_path="./configs/models.yaml")

    # Concurrent requests
    async def query_a():
        response, _ = await async_llm.chat(query="What is artificial intelligence?")
        return response

    async def query_b():
        response, _ = await async_llm.chat(query="What is machine learning?")
        return response

    # Execute concurrently
    responses = await asyncio.gather(query_a(), query_b())
    for i, response in enumerate(responses):
        print(f"Response {i+1}: {response}")

# Run async example
asyncio.run(async_chat_example())

from async_llm_chat import AsyncLLMChat
# Enable caching
cache_config = {
    'enable': True,
    'cache_file': './cache/llm_cache.json'
}
async_llm = AsyncLLMChat(
    model="gpt-4o",
    config_path="./configs/models.yaml",
    cache_config=cache_config
)

llm_tools/
├── llm_chat.py             # Synchronous LLM chat functionality
├── async_llm_chat.py       # Asynchronous LLM chat functionality
├── embed_rerank_model.py   # Embedding and reranking models
├── memory.py               # Chat memory management
├── llm_response_cache.py   # Response caching functionality
├── tutorial.py             # Tutorial examples
├── tutorial.ipynb          # Jupyter notebook tutorial
├── example_configs/        # Configuration examples
│   └── models.yaml         # Model configuration template
├── pyproject.toml          # Project configuration
├── README_zh-CN.md         # Chinese README
└── README.md               # This file (English)
For detailed usage examples, please refer to:
- `tutorial.py`: Python script examples
- `tutorial.ipynb`: Jupyter notebook with interactive examples
This project is licensed under the MIT License.