"Stopping the typing, and starting the interaction."
Aura is a next-generation social AI companion designed to perceive the physical world, recognize identities, and build lasting relationships through social memory. It leverages a hybrid Edge-to-Cloud architecture, pairing fast local perception (YOLO, OpenCV) with Gemini's multimodal reasoning to create immersive, real-time social experiences.
- 👁️ Proactive Vision: Constant person tracking and face recognition using YOLOv8 and SFace.
- 🎙️ Audio-Visual Fusion: Correlates visual mouth movements with voice activity to ensure context-aware addressing.
- 🧠 Context-First Brain: Driven by Gemini 2.0 Flash to understand emotion, spatial context, and intent.
- 💾 Cognitive Memory Architecture (see the sketch after this list):
  - Conversational (Short-Term): Real-time context management for fluid dialogue.
  - Social (Persistent): Remembers personal facts and user preferences via Google Cloud Firestore.
  - Knowledge (Long-Term): Vector-indexed history of all past interactions using ChromaDB.
  - Biometric: Local-only face/voice embeddings (FAISS) for privacy.
- 📍 Active Spatial Awareness: Tracks physical objects over time (e.g., "Where did I leave my keys?").
- 🗣️ Natural Expression: High-quality local voice synthesis via Piper TTS.
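For a taste of the Knowledge tier, here is a minimal sketch of indexing and recalling past interactions with ChromaDB. The collection name, IDs, and metadata fields are illustrative, not Aura's actual schema:

```python
import chromadb

# Persistent local vector store backing the long-term Knowledge tier.
client = chromadb.PersistentClient(path="data/knowledge_db")  # hypothetical path
interactions = client.get_or_create_collection("interactions")

# Index a finished conversation turn, tagged with who said it and when.
interactions.add(
    ids=["turn-0001"],
    documents=["Alice mentioned she is training for a marathon in May."],
    metadatas=[{"speaker": "alice", "timestamp": "2025-01-15T10:32:00Z"}],
)

# Later, recall relevant history by semantic similarity.
results = interactions.query(
    query_texts=["What are Alice's fitness goals?"], n_results=3
)
print(results["documents"][0])
```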
Aura is designed for flexibility. Configuration is managed in `config/settings.yaml` (which should be created from the provided `.example` file).
**Mode 1: Hybrid (Edge-to-Cloud).** Uses local perception for detection and Gemini Pro/Flash for deep reasoning.
- LLM: Gemini API or Groq.
- Vision: Local YOLOv8 + occasional frames to Gemini for scene grounding.
- STT/TTS: Local (Whisper/Piper) for privacy and speed.
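A minimal sketch of this edge-to-cloud split, assuming the `ultralytics` and `google-generativeai` packages; the sampling cadence, prompt, and model file are illustrative:

```python
import os

import cv2
import google.generativeai as genai
from PIL import Image
from ultralytics import YOLO

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumes the key is set
detector = YOLO("yolov8n.pt")                      # fast local person detection
brain = genai.GenerativeModel("gemini-2.0-flash")  # cloud-side deep reasoning

cap = cv2.VideoCapture(0)
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Every frame: cheap local tracking (class 0 = person in COCO).
    tracks = detector.track(frame, persist=True, classes=[0], verbose=False)
    # Occasionally: ship a single frame to Gemini for scene grounding.
    if frame_idx % 150 == 0 and len(tracks[0].boxes) > 0:
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        reply = brain.generate_content(
            ["Describe the social scene in one sentence.", image]
        )
        print(reply.text)
    frame_idx += 1
cap.release()
```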
**Mode 2: Gemini Live.** Full real-time multimodal streaming.
- Engine: Gemini Multimodal Live API.
- Features: Low-latency audio/video streaming, natural interruptions, and visual grounding.
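A minimal text-only sketch of a Live session using the `google-genai` SDK; the model name and session methods follow the early Live API surface and may differ in newer SDK releases:

```python
import asyncio

from google import genai

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

async def main():
    # Model name and config are illustrative; check the current Live API docs.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp",
        config={"response_modalities": ["TEXT"]},
    ) as session:
        await session.send(input="Hello, Aura!", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```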
**Mode 3: Local.** Works completely offline, without cloud dependencies.
- LLM: Ollama / Llama.cpp (running locally).
- Vision: Local person tracking and recognition.
- STT/TTS: Whisper and Piper.
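Once a model is pulled (see Setup below), a fully local brain call is a few lines with the `ollama` Python package:

```python
import ollama

# Runs entirely against the local Ollama server; no cloud round-trip.
response = ollama.chat(
    model="phi3",
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
)
print(response["message"]["content"])
```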
- Python 3.10+
- Hardware: Webcam, Microphone, and Speakers.
- Cloud Setup (only needed for Modes 1 & 2):
  - Gemini API Key.
  - Google Cloud Project with Firestore enabled.
- **Clone and Install Dependencies**

  ```bash
  pip install -r requirements.txt
  ```
- **Download Local Models.** Run the helper script to fetch the YOLO and OpenCV face models:

  ```bash
  python models/download_models.py
  ```
- **External Services**
  - Ollama: If using Local Mode, install Ollama and pull a model:

    ```bash
    ollama pull phi3
    ```

  - Google Cloud: Set `GOOGLE_APPLICATION_CREDENTIALS` if using Firestore.
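  To sanity-check Firestore access, here is a minimal sketch with the `google-cloud-firestore` client; the collection and field names are illustrative, not Aura's schema:

  ```python
  from google.cloud import firestore

  # Uses the service account pointed to by GOOGLE_APPLICATION_CREDENTIALS.
  db = firestore.Client()

  # Persist a small social-memory fact, then read it back.
  doc = db.collection("social_memory").document("alice")
  doc.set({"name": "Alice", "preferences": ["morning chats"]}, merge=True)
  print(doc.get().to_dict())
  ```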
- **Initialize Settings.** Copy the example configuration file to create your active settings:

  ```bash
  cp config/settings.yaml.example config/settings.yaml
  ```
- **Edit Configuration.** Open `config/settings.yaml` and customize your experience:

  ```yaml
  interaction:
    mode: "local"        # options: local, gemini_live
    providers:
      stt: "whisper"
      tts: "piper"
      brain: "gemini"    # use "gemini", "groq", or "ollama"
  gemini:
    api_key: "YOUR_API_KEY"
    model: "gemini-2.0-flash"
  ```
Start the Aura core loop:

```bash
python main.py
```

Flags:

- `--config path/to/config.yaml`: Use a custom settings file.
- `--headless`: Run without the GUI/visual window.
- `--duration N`: Run for N seconds, then shut down.
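For example, a headless two-minute session with your active settings file:

```bash
python main.py --config config/settings.yaml --headless --duration 120
```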
- `core/`: Event bus, interaction strategies, and main agent loop.
- `vision/`: YOLO tracking, face embeddings, and emotion classification.
- `audio/`: STT (Whisper), TTS (Piper), and speaker ID.
- `dialogue/`: Prompt engine and LLM client.
- `memory/`: Identity store, vector DB, and social memory management.
- `config/`: System settings and environment tuning.
- `dashboard/`: Web-based UI (available on `localhost:5050`).
Aura is built with a Consent-First approach. Biometric embeddings are stored locally by default, and the agent explicitly requests permission before persisting social memories.
Created for the Gemini Live Agent Challenge