
✨ Sahayak (Protex: Hack-2-Win) ✨


   


📂 Table of Contents
  1. The Problem: The "Context Gap"
  2. Our Solution: The "External Hippocampus"
  3. Solution Architecture
  4. Key Features
  5. Tech Stack & Dependencies
  6. Installation & Setup
  7. Usage Instructions
  8. Real-World Impact & Use Cases
  9. Future Scope & Roadmap
  10. Meet Team Percevia

📉 The Problem: The "Context Gap"

Alzheimer’s and Dementia are not just about forgetting names; they are about losing the narrative of life.

Patients suffer from Episodic Memory Loss, meaning they forget the context of recent events.

  • "Where did I put my glasses?" (Object Permanence)
  • "Did I already take my medicine?" (Action Verification)
  • "Who is this person standing next to me?" (Social Recognition)

❌ Why Current Tech Fails

Existing solutions address the symptoms, not the root cause.

| Technology | What it does | Why it fails for Dementia |
|---|---|---|
| GPS Trackers | Track the user's location. | Tell where the patient is, but not what they are doing. |
| Smart Speakers | Answer general questions. | Connected to the Internet, not to the user's personal reality. |
| CCTV Cameras | Passive recording. | No intelligence; cannot answer user queries in real time. |
| Reminder Apps | Set alarms. | Passive; require the user to remember to input the data. |

💡 Our Solution: The "External Hippocampus"

Sahayak fills this gap by acting as an artificial extension of the human brain.

It is an autonomous, wearable AI agent that continuously:

  1. Observes the environment.
  2. Understands context (Objects + People + Time).
  3. Logs events into a secure, offline memory bank.
  4. Recalls specific details upon voice command.
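The observe, log, and recall steps above can be sketched with a minimal in-memory event store. All names here are hypothetical stand-ins, not the repository's actual classes:

```python
import time
from collections import deque

class MemoryBank:
    """Minimal in-memory event log (a stand-in for the on-device store)."""

    def __init__(self):
        # Bounded buffer so the sketch never grows without limit.
        self.events = deque(maxlen=1000)

    def log(self, obj, location, timestamp):
        """Observe + understand -> append one contextual event."""
        self.events.append({"object": obj, "location": location, "time": timestamp})

    def recall(self, obj):
        """Return the most recent sighting of the object, if any."""
        for event in reversed(self.events):
            if event["object"] == obj:
                return event
        return None

bank = MemoryBank()
bank.log("glasses", "coffee table", time.time())
hit = bank.recall("glasses")
print(hit["location"])  # coffee table
```

The real system persists these events on disk and attaches richer context (people, confidence scores), but the loop shape is the same.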

🧩 Solution at a Glance

👁️ Contextual Vision

Using YOLOv8 + CLIP, Sahayak identifies objects (Keys, Wallet) and specific people (Family members), linking them to a location.

🧠 Episodic Memory

It doesn't just store video; it stores Events.
"Glass placed on Table at 6 PM" becomes a searchable memory.
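As a rough sketch, such an event could be serialized as a JSON record like the following. The field names are illustrative, not Sahayak's actual schema:

```python
import json
from datetime import datetime

# Illustrative record for "Glass placed on Table at 6 PM".
# Field names are hypothetical, not the project's real schema.
event = {
    "object": "glass",
    "action": "placed",
    "location": "table",
    "timestamp": datetime(2024, 5, 1, 18, 0).isoformat(),
    "confidence": 0.91,
}
print(json.dumps(event, indent=2))
```

Storing events rather than raw video keeps the memory small and makes it directly searchable by object, place, and time.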

🗣️ Natural Interaction

No screens, no typing. The user simply asks, "Where are my glasses?" and gets a voice answer via Bone Conduction Audio.

🛡️ Privacy First

What happens at home, stays at home. All processing happens locally on the Raspberry Pi. No cloud uploads.


📉 The Silent Epidemic

Alzheimer's and Dementia strip away a person's ability to recall the Context of Life.

| 🚫 The Struggle | ❌ Existing "Smart" Tech | ✅ The Sahayak Way |
|---|---|---|
| "Where is my wallet?" | GPS trackers: only show map location. | Visual Memory: "You left it on the kitchen counter." |
| "Who is this person?" | CCTV: passive recording. | Face Recognition: "This is your grandson, Aryan." |
| "Did I take my medicine?" | Alarms: ring blindly. | Action Log: "Yes, you took the blue pill at 2 PM." |

⚙️ Solution Architecture

We have engineered a Modular Agent System that runs entirely offline on the Edge.

```mermaid
graph TD
    %% Styling Definitions
    classDef sensory fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    classDef process fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef memory fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,stroke-dasharray: 5 5;
    classDef action fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px;

    subgraph SENSORY["📷 SENSORY LAYER"]
        Cam("Pi Camera Module 3") -->|Video Stream| Vision[Vision Pipeline]
        Mic("Bone Conduction Mic") -->|Audio Stream| Audio[Audio Pipeline]
    end

    subgraph PROCESSING["🧠 PROCESSING LAYER (RPi 4)"]
        Vision -->|Frames| YOLO["YOLOv8<br>(Object Detection)"]
        Vision -->|Frames| Face["dlib<br>(Face Recognition)"]
        Audio -->|Voice| Whisper["Whisper STT"]

        YOLO -->|Metadata| Context[Context Engine]
        Face -->|Metadata| Context
        Whisper -->|Query| Intent[Intent Classifier]
    end

    subgraph MEMORY["💾 MEMORY LAYER"]
        Context -->|Write Events| SQL[("SQLite Event Log")]
        Intent <-->|Read Context| SQL
    end

    subgraph ACTION["🔊 ACTION LAYER"]
        Intent -->|Response Text| TTS["Coqui TTS"]
        TTS -->|Audio Signal| Speaker("Bone Conduction<br>Transducer")
    end

    %% Apply Styles
    class Cam,Mic,Vision,Audio sensory;
    class YOLO,Face,Whisper,Context,Intent process;
    class SQL memory;
    class TTS,Speaker action;
```

🌟 Key Features: Why Sahayak Stands Out

Sahayak is not just a reminder app; it is a fully autonomous cognitive system.

🧠 Artificial Episodic Memory

Unlike standard assistants that fetch facts from Google, Sahayak builds a personal timeline of your life. It remembers Who, What, Where, and When an event happened using a custom JSON Event Indexer.
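One way such an event indexer could work is to group events by object so that "where is X" becomes a single lookup. This is a hypothetical sketch; the actual indexer's schema is not shown here:

```python
from collections import defaultdict

def build_index(events):
    """Index Who/What/Where/When events by object name for fast lookups.

    Hypothetical sketch of a JSON event indexer; field names are illustrative.
    """
    index = defaultdict(list)
    for event in events:
        index[event["what"]].append(event)
    return index

events = [
    {"who": "user", "what": "keys", "where": "shelf", "when": "2024-05-01T17:45"},
    {"who": "user", "what": "keys", "where": "kitchen counter", "when": "2024-05-01T18:10"},
]
index = build_index(events)

# "Where are my keys?" -> the most recent sighting (ISO timestamps sort lexically).
latest = max(index["keys"], key=lambda e: e["when"])
print(latest["where"])  # kitchen counter
```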

🔒 100% On-Device Privacy

"Your memories stay yours." The entire system runs offline on a Raspberry Pi. No video or audio is ever uploaded to the cloud, ensuring complete data sovereignty for the patient.

👁️ Semantic Vision Intelligence

Powered by YOLOv8 + CLIP, Sahayak doesn't just "detect" objects; it "understands" them. It distinguishes between "Generic Glasses" and "MY Glasses" and uses stability checks to avoid false memories.
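A stability check of this kind can be sketched as a per-label streak counter: an object is committed to memory only after it has been seen in several consecutive frames. This is an assumption about the mechanism, not the project's exact code:

```python
class StabilityFilter:
    """Report an object only after `threshold` consecutive detections,
    filtering out transient glimpses that would create false memories."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streaks = {}  # label -> consecutive-frame count

    def update(self, detections):
        """Feed one frame's detected labels; return labels that just became stable."""
        stable = []
        for label in detections:
            self.streaks[label] = self.streaks.get(label, 0) + 1
            if self.streaks[label] == self.threshold:
                stable.append(label)  # reported exactly once, when stable
        # Reset streaks for labels that vanished this frame.
        for label in list(self.streaks):
            if label not in detections:
                self.streaks[label] = 0
        return stable

f = StabilityFilter(threshold=3)
print(f.update({"cup"}))          # []
print(f.update({"cup"}))          # []
print(f.update({"cup", "keys"}))  # ['cup']
```

Reporting a label exactly once (when the streak hits the threshold) avoids logging the same sighting on every subsequent frame.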

🗣️ Zero-UI Voice Interaction

Designed for the elderly. No screens, no buttons. Just speak naturally. Using OpenAI Whisper (STT) and Edge Neural TTS, the conversation feels human, not robotic.

🤖 Multi-Agent Orchestration

A sophisticated backend where specialized AI Agents (Vision, Memory, Query) talk to each other. This modular design ensures that if one part fails, the system recovers automatically.
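The recovery behaviour can be sketched as a tiny supervisor that retries a failing agent step instead of crashing the whole pipeline. The names and retry policy here are hypothetical:

```python
def run_agent(name, step, max_retries=3):
    """Supervisor sketch: re-run a failing agent step up to `max_retries`
    times; return None if it never succeeds (illustrative only)."""
    for attempt in range(max_retries):
        try:
            return step()
        except Exception as exc:
            print(f"[{name}] step failed ({exc}); retry {attempt + 1}/{max_retries}")
    return None

calls = {"n": 0}
def flaky_vision():
    """Simulated Vision Agent step that fails once, then recovers."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("camera glitch")
    return "frame ok"

print(run_agent("vision", flaky_vision))  # frame ok
```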

⌚ Wearable & Assistive Design

Compact integration with Bone Conduction Audio ensures the user stays aware of their surroundings while receiving private memory cues directly into their ear.


🛠️ Tech Stack & Dependencies

Our system acts as a bridge between Hardware and Advanced AI.

🦾 Hardware Layer (The Body)

| Component | Specification | Functionality |
|---|---|---|
| Compute Unit | Raspberry Pi | The central brain handling all Edge AI processing offline. |
| Vision Sensor | Camera | Captures high-resolution frames for the Vision Agent. |
| Audio Input | Mic | Captures user queries in real time. |
| Audio Output | Speaker | Delivers private, non-intrusive voice responses. |

🧠 AI & Perception Layer (The Mind)

| Technology | Model / Tool | Role in Sahayak |
|---|---|---|
| Object Detection | YOLOv8 | Detects objects (keys, medicine, glasses) instantly. |
| Semantic Understanding | CLIP | Understands context & identifies specific people. |
| Speech-to-Text | Whisper | Converts voice to text locally (offline). |
| Text-to-Speech | EdgeTTS | Generates human-like, natural voice output. |

💻 Core Software & Orchestration

| Component | Tech Stack | Description |
|---|---|---|
| Language | Python | Primary logic and agent orchestration. |
| Vision Lib | OpenCV | Image processing and frame handling. |
| Memory Store | JSON | Stores episodic events (time, location, object). |

⚡ Installation & Setup Guide

Follow these steps to deploy Sahayak on a Raspberry Pi 5 (or 4B).

📋 Prerequisites

  • Device: Raspberry Pi 5 (recommended), Raspberry Pi 4 (8 GB RAM), or ESP32-CAM
  • OS: Raspberry Pi OS (64-bit), or the Arduino IDE for the ESP32-CAM
  • Python: 3.9 or 3.10
  • Internet: required for the initial model downloads

🟢 Step 1: Clone the Repository

Open your terminal on the Raspberry Pi and run:

```bash
git clone https://github.com/YourUsername/Sahayak.git
cd Sahayak
```

🟡 Step 2: Install System Dependencies

We need system-level tools for Audio and Vision.

```bash
sudo apt-get update
sudo apt-get install -y python3-pyaudio portaudio19-dev libcamera-dev ffmpeg
```

🟠 Step 3: Set Up Virtual Environment

Keep dependencies isolated.

```bash
python3 -m venv venv
source venv/bin/activate
```

🔵 Step 4: Install Python Requirements

Install YOLO, Whisper, EdgeTTS, etc.

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Note: First install may take 5–10 minutes (PyTorch).

🟣 Step 5: Hardware Connection

  • Camera: connect the Pi Camera Module 3 to the CSI port
  • Mic: plug in a USB microphone
  • Speaker: 3.5mm jack or Bluetooth bone-conduction transducer

🔴 Step 6: Launch Sahayak

```bash
python main.py
```


🕹️ Usage Instructions

Once the system is running, Sahayak becomes your active memory companion. No buttons needed—just speak.

🟢 1. Positioning the Device

  • Wear the device (or place the camera) such that it has a clear view of your table/room.
  • Ensure the microphone is not covered by clothing.

🔵 2. Voice Commands

Sahayak listens for natural language. You don't need robotic commands.

| Intent | Example User Query | Sahayak's Response |
|---|---|---|
| Locate Object | "Where did I keep my glasses?" | "You left your glasses on the coffee table 10 minutes ago." |
| Identify Person | "Who is standing in front of me?" | "That is your grandson, Aryan." |
| Recall Action | "Did I take my medicine?" | "Yes, I saw you taking the red pill at 2:00 PM." |
| General Context | "What was I doing just now?" | "You were reading a newspaper on the sofa." |
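A minimal keyword-based router in the spirit of these intents might look like the following. This is a sketch only; the real Intent Classifier presumably applies richer NLP to the Whisper transcript:

```python
def classify_intent(query):
    """Route a transcribed query to one of four intents.

    Hypothetical keyword heuristic, not the project's actual classifier.
    Order matters: "Where did I..." should win over the "did i" check.
    """
    q = query.lower()
    if "where" in q:
        return "locate_object"
    if "who" in q:
        return "identify_person"
    if "did i" in q:
        return "recall_action"
    return "general_context"

print(classify_intent("Where did I keep my glasses?"))   # locate_object
print(classify_intent("Who is standing in front of me?")) # identify_person
print(classify_intent("Did I take my medicine?"))         # recall_action
print(classify_intent("What was I doing just now?"))      # general_context
```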

💡 Pro Tips for Best Results

  • Stability Matters: The memory is only formed when an object is stable for 3 seconds. Don't wave objects around quickly.
  • Lighting: Ensure the room is reasonably lit for the Camera to detect objects accurately.

📸 Screenshots & Demo

Here is Sahayak in action, processing the real world in real-time.


  • 👁️ Computer Vision View: YOLOv8 detecting 'cup' and 'keys'
  • 🧠 Terminal / Memory Log: the system creating JSON memory logs

🌍 Real-World Impact & Use Cases

Sahayak is designed specifically to address the "Context Gap" faced by dementia patients.

🏥 Primary Use Case: Alzheimer's Care

| Problem Scenario | Sahayak's Solution | Impact |
|---|---|---|
| The "Lost Item" Anxiety: the patient panics because they can't find their wallet. | Visual Memory Recall: "It is on the bedside table." | Reduces panic attacks and dependency on caregivers. |
| Social Withdrawal: the patient avoids guests because they don't recognize faces. | Face Identity Whisper: "This is Sharma Ji, your neighbor." | Restores social confidence and dignity. |
| Repetitive Questioning: asking "What time is lunch?" 20 times. | Patient Patience: the AI answers calmly every single time, without frustration. | Reduces caregiver burnout. |

🏘️ Secondary Use Cases

  • Visually Impaired Assistance: Helping blind users locate objects in a room.
  • Smart Home Automation: Triggering lights/fans based on user location (Future integration).

🚀 Future Scope & Roadmap

We have a clear vision to evolve Sahayak from a prototype to a medical-grade product.

🗓️ Phase 1: Immediate Enhancements (Next 2 Months)

  • 🔋 Battery Optimization: Implementing "Sleep Mode" when no motion is detected to extend battery life to 12+ hours.
  • 🚨 Fall Detection: Using the camera's pose estimation to detect sudden falls and alert family members instantly.
  • 🗣️ Hindi Language Support: Training a fine-tuned Whisper model for local Indian dialects.

🗓️ Phase 2: Long-Term Vision (6 Months+)

  • 📱 Caregiver Companion App: A mobile dashboard for doctors/family to view memory logs and set safety geofences.
  • ❤️ Emotion Analysis: Analyzing voice tonality to detect if the patient is stressed or confused and calming them down.
  • ☁️ Optional Cloud Sync: Secure, encrypted cloud backup for long-term memory retrieval (e.g., "What did I do last Christmas?").

"Our ultimate goal is to make Sahayak invisible—technology that helps you live, without getting in the way."


⚔️ Meet Team Percevia

| Team Member | Role | Focus Area |
|---|---|---|
| Tanish Aggarwal (LinkedIn) | 👑 Team Lead | Hardware & Edge Privacy |
| Khushi Sharma (LinkedIn) | 🧠 Memory Architect | Episodic Memory & Agents |
| Aayushi Gupta (LinkedIn) | 🗣️ Voice Engineer | NLP & Accessibility |

College: Vivekananda Institute of Professional Studies (VIPS), Delhi 🏛️



🌟 If you find Sahayak interesting, please give it a Star! 🌟

"Preserving memories, one line of code at a time."


Made by Team Percevia
Protex: Hack-2-Win (Round 1)

Development Workflow

All features are developed using feature branches and merged via Pull Requests.

About

Project Sahayak is an AI-powered wearable assistant designed specifically for individuals suffering from early-to-mid-stage Alzheimer’s and Dementia. Unlike standard trackers that only monitor location, Sahayak acts as an "External Hippocampus", helping the user navigate social interactions, identify people and objects, and recall lost items in real time.
