ChatClue is an interactive platform where users can speak naturally to their computers, robots, and IoT devices in real time. Devices can speak back using the built-in TTS (text-to-speech) adapters and perform real-world actions based on the user's requests.
The video below shows an example of the system being used to control a robotic car.
Primary features in the current version
- Natural language processing and computer responses are handled automatically through OpenAI's API.
- Conversation storage and references are built-in and managed using PostgreSQL with the pgvector extension enabled.
- Video feed processing is handled automatically, allowing users to ask their computer/robot/device questions like, "What do you think of this shirt?", "How many fingers am I holding up?", "Describe the current scene", or "Do you see any animals or unexpected people?"
- OpenAI Tool calls are handled automatically by the system, allowing you to create an endless set of capabilities. Tools can be created in the `tools/` directory; please refer to the README there.
- Built-in broadcasting to listening devices/robots is handled through the system's `Broadcaster`, which uses WebSockets. The Broadcaster uses an adapter architecture, so additional broadcaster types other than WebSockets (e.g. MQTT) can be implemented with little trouble.
Out of the box, after running the `install_dependencies.sh` script and setting up your OpenAI credentials as described below, ChatClue (Osiris) will immediately start listening for a conversation and will carry on a back-and-forth exchange until the session is terminated. If you have a camera running, it will automatically determine whether something in the video needs to be analyzed and perform the analysis for you, based on your natural language requests.
Broadcasting can be used to interact and communicate with other devices throughout the world (or just on your local network, as it is configured by default).
- Installation: Run the `install_dependencies.sh` script to install all necessary dependencies (tested on Ubuntu 22 and primarily designed for Debian systems currently).
- Configuration: Set up your `config.py` settings with the necessary API keys and parameters. Optionally, use environment variables for sensitive information like `GOOGLE_APPLICATION_CREDENTIALS` and `OPENAI_API_KEY` (a minimal sketch follows this list). If you did not run the `install_dependencies.sh` script above, be sure to copy the `config.example.py` file to `config.py`.
- Running the Application: Simply execute `python3 osiris.py` after configuration. The system will be ready to start conversations using the wake words defined in `config.py` and will respond naturally using the chosen TTS (text-to-speech) adapter.
- Example Client Application: `osiris.py` can be run on its own, but you can also listen to its broadcaster for conversation parts and structured commands to facilitate robotic and other peripheral IoT controls. An example of this type of implementation can be found in the `examples/` directory. The current example shows how to control a robotic car with natural language by running Project Osiris and listening to it with the `examples/picarx/client.py` script running on the robotic car. The corresponding functions that control the robotic car in Project Osiris can be found in the `tools/picarx` directory; they are decorated with the project's `@openai_function` decorator, described below.
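For reference, a minimal `config.py` that reads these secrets from the environment might look like the sketch below. Everything beyond `OPENAI_API_KEY` and `GOOGLE_APPLICATION_CREDENTIALS` is an illustrative assumption; `config.example.py` in this repository is the authoritative template.

```python
# config.py -- illustrative sketch; config.example.py in this repository is
# the authoritative template. Reading secrets from the environment keeps
# them out of version control.
import os

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

# Path to a Google Cloud service-account JSON file, used by the Google
# TTS and vision adapters.
GOOGLE_APPLICATION_CREDENTIALS = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")

# Hypothetical wake-word setting; the real name and format are defined
# in config.example.py.
WAKE_WORDS = ["osiris", "hey osiris"]
```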
- Allows the user to start conversations with their predefined wake words (configured in `config.py`).
- Handles the natural continuation of the conversation, back and forth, without the need to invoke the wake word again.
- Automatically handles user interruptions of the robot / computer's response.
- Utilizes a database to store and recall all conversation data.
- Builds conversation arrays for OpenAI based on historical data.
- Incorporates vector embeddings for each conversation part for advanced querying and topic-based conversation structuring (e.g., "Do you remember when we discussed XYZ? What did we decide was the best course of action again?"). This functionality is currently in development; a minimal recall sketch follows below.
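The embedding-based recall described above could be queried roughly along these lines. This is a minimal sketch, assuming a hypothetical `conversation_parts` table with a pgvector `embedding` column, a `DATABASE_URL` environment variable, and a particular embedding model; ChatClue's actual schema may differ.

```python
import os

import psycopg2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recall_related_parts(question: str, limit: int = 5) -> list[str]:
    # Embed the question with the same model used for the stored parts
    # (the model name here is an assumption).
    emb = client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding

    # Nearest-neighbor search with pgvector's cosine-distance operator (<=>).
    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT content FROM conversation_parts "
                "ORDER BY embedding <=> %s::vector LIMIT %s",
                (str(emb), limit),
            )
            return [row[0] for row in cur.fetchall()]
```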
- Creates and leverages the `@openai_function` decorator to integrate additional functionalities into OpenAI's tools array (a hypothetical tool is sketched after this list).
- Handles tool calls and responses automatically, based on the `is_conversational` flag.
- Example implementations and syntax can be found in the `tools/` directory.
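As a rough illustration, a tool might look like the following. The import path and the exact decorator arguments are assumptions; the README in `tools/` documents the real syntax.

```python
# Hypothetical tool in the spirit of tools/picarx; the import path and the
# exact @openai_function usage are assumptions -- see the tools/ README.
from decorators import openai_function  # assumed import path

@openai_function
def set_motor_speed(speed: int) -> str:
    """Set the drive motor speed as a percentage from -100 (reverse) to 100."""
    # A real implementation would send this command to the motor controller.
    return f"Motor speed set to {speed}%"
```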
- Incorporates automatic video analysis for enhanced media interaction.
- Image analysis is adapter-based, allowing flexibility and customizability.
- Currently includes built-in adapters for both Google Cloud's Vertex AI API and OpenAI vision models.
- Additional adapters for image analysis can be added at any time in the `video/analysis_adapters` directory for extended functionality (a rough sketch follows this list).
- Configurable via `VISION_ANALYSIS_ADAPTERS`, `GOOGLE_VISION_ANALYSIS_ADAPTER_SETTINGS`, and `OPENAI_SETTINGS['image_model']` in your `config.py` file.
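A new adapter could be sketched roughly as below. The class and method names are assumptions, so mirror the built-in Google and OpenAI adapters in `video/analysis_adapters` when writing a real one.

```python
# video/analysis_adapters/my_adapter.py -- illustrative only; the base-class
# interface and method names below are assumptions.
class MyVisionAnalysisAdapter:
    """Hypothetical adapter that answers a question about a video frame."""

    def analyze(self, frame_bytes: bytes, prompt: str) -> str:
        # A real adapter would send the frame and prompt to a vision model
        # (e.g., a self-hosted one) and return the model's text answer.
        return "A description of the frame would be returned here."
```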
- Features a WebSocket-based broadcaster for sending commands to clients.
- Adaptable for various protocols (e.g., MQTT) by adding custom adapters in `broadcast/adapters`.
- Example usage for robot control is demonstrated in the `examples/` directory; a minimal listener sketch follows this list.
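A minimal client that listens to the broadcaster might look like the following. The URL and JSON message shape are assumptions; see `examples/picarx/client.py` for the real protocol used by the picarx demo.

```python
import asyncio
import json

import websockets  # third-party package: pip install websockets

async def listen() -> None:
    # The host/port are assumptions for illustration.
    async with websockets.connect("ws://localhost:8765") as ws:
        async for message in ws:
            event = json.loads(message)
            # Dispatch structured commands to your hardware handlers here.
            print("received:", event)

asyncio.run(listen())
```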
- Offers TTS functionality using adaptable architecture.
- Includes an offline `pyttsx3` adapter and an adapter for Google Cloud's TTS API (`GTTSAdapter`); a bare-bones adapter sketch follows this list.
- Configurable via `TTS_CONFIG`, `GOOGLE_TTS_CONFIG`, and `PYTTSX3_TTS_CONFIG`.
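To illustrate the adapter idea, a minimal offline adapter built on `pyttsx3` could look like this. The `speak` interface is an assumption, not ChatClue's actual base class.

```python
import pyttsx3

class SimplePyttsx3Adapter:
    """Bare-bones offline TTS adapter; the `speak` interface is assumed."""

    def __init__(self, rate: int = 175):
        self.engine = pyttsx3.init()
        self.engine.setProperty("rate", rate)  # speaking rate, words per minute

    def speak(self, text: str) -> None:
        self.engine.say(text)
        self.engine.runAndWait()  # blocks until playback finishes
```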
- Utilizes OpenAI's Streaming capability for quick and responsive interactions.
- Seamless audio streaming via the TTS adapter, aligned with OpenAI's streaming responses (sketched below).
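The general pattern is sketched below: buffer the streamed deltas and flush complete sentences to the TTS adapter so speech can begin before the full response has been generated. The model name, the naive sentence splitting, and the `speak` method are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def stream_and_speak(messages: list[dict], tts_adapter) -> None:
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=messages,
        stream=True,
    )
    buffer = ""
    for chunk in stream:
        buffer += chunk.choices[0].delta.content or ""
        # Flush each complete sentence to the TTS adapter as it arrives,
        # so playback starts before the whole response is finished.
        while any(p in buffer for p in ".!?"):
            idx = min(buffer.index(p) for p in ".!?" if p in buffer)
            tts_adapter.speak(buffer[: idx + 1].strip())
            buffer = buffer[idx + 1:]
    if buffer.strip():
        tts_adapter.speak(buffer.strip())  # speak any trailing fragment
```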
- Employs Celery as a background processor with Redis as the broker.
- Enhances performance by managing tasks like conversation storage in the background.
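A minimal sketch of this arrangement, with illustrative module, broker URL, and task names:

```python
from celery import Celery

# Broker URL and app name are illustrative; adjust to your Redis setup.
app = Celery("osiris_tasks", broker="redis://localhost:6379/0")

@app.task
def store_conversation_part(role: str, content: str) -> None:
    # A real task would write the utterance (and its embedding) to PostgreSQL
    # here, so the main audio loop never blocks on database I/O.
    print(f"storing {role} message: {content[:50]}")
```

The main process can then call `store_conversation_part.delay(role, content)` and return to listening immediately while the worker persists the data.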
- Includes an `install_dependencies.sh` script for setting up necessary dependencies (tested on Ubuntu 22 for reference).
- Simple configuration through the `config.py` file and environment variables.
Note: This project is in active development.
For more detailed information about tools and examples, please refer to the individual README files in their respective directories.