
✨ Sahayak (Protex: Hack-2-Win) ✨


   


📂 Table of Contents
  1. The Problem: The "Context Gap"
  2. Our Solution: The "External Hippocampus"
  3. Solution Architecture
  4. Key Features
  5. Tech Stack & Dependencies
  6. Installation & Setup
  7. Usage Instructions
  8. Real-World Impact & Use Cases
  9. Future Scope & Roadmap
  10. Meet Team Percevia

📉 The Problem: The "Context Gap"

Alzheimer’s and Dementia are not just about forgetting names; they are about losing the narrative of life.

Patients suffer from Episodic Memory Loss, meaning they forget the context of recent events.

  • "Where did I put my glasses?" (Object Permanence)
  • "Did I already take my medicine?" (Action Verification)
  • "Who is this person standing next to me?" (Social Recognition)

❌ Why Current Tech Fails

Existing solutions address the symptoms, not the root cause.

| Technology | What it does | Why it fails for Dementia |
|---|---|---|
| GPS Trackers | Track the user's location. | Tell where the patient is, but not what they are doing. |
| Smart Speakers | Answer general questions. | Connected to the Internet, not to the user's personal reality. |
| CCTV Cameras | Passive recording. | No intelligence; cannot answer user queries in real time. |
| Reminder Apps | Set alarms. | Passive; require the user to remember to input the data. |

💡 Our Solution: The "External Hippocampus"

Sahayak fills this gap by acting as an artificial extension of the human brain.

It is an autonomous, wearable AI agent that continuously:

  1. Observes the environment.
  2. Understands context (Objects + People + Time).
  3. Logs events into a secure, offline memory bank.
  4. Recalls specific details upon voice command.
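The observe, log, and recall steps above can be sketched with a minimal in-memory event store. All names here are hypothetical stand-ins, not the repository's actual classes:

```python
import time
from collections import deque

class MemoryBank:
    """Minimal in-memory event log (a stand-in for the on-device store)."""

    def __init__(self):
        # Bounded buffer so the sketch never grows without limit.
        self.events = deque(maxlen=1000)

    def log(self, obj, location, timestamp):
        """Observe + understand -> append one contextual event."""
        self.events.append({"object": obj, "location": location, "time": timestamp})

    def recall(self, obj):
        """Return the most recent sighting of the object, if any."""
        for event in reversed(self.events):
            if event["object"] == obj:
                return event
        return None

bank = MemoryBank()
bank.log("glasses", "coffee table", time.time())
hit = bank.recall("glasses")
print(hit["location"])  # coffee table
```

The real system persists these events on disk and attaches richer context (people, confidence scores), but the loop shape is the same.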

🧩 Solution at a Glance

👁️ Contextual Vision

Using YOLOv8 + CLIP, Sahayak identifies objects (Keys, Wallet) and specific people (Family members), linking them to a location.

🧠 Episodic Memory

It doesn't just store video; it stores Events.
"Glass placed on Table at 6 PM" becomes a searchable memory.
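As a rough sketch, such an event could be serialized as a JSON record like the following. The field names are illustrative, not Sahayak's actual schema:

```python
import json
from datetime import datetime

# Illustrative record for "Glass placed on Table at 6 PM".
# Field names are hypothetical, not the project's real schema.
event = {
    "object": "glass",
    "action": "placed",
    "location": "table",
    "timestamp": datetime(2024, 5, 1, 18, 0).isoformat(),
    "confidence": 0.91,
}
print(json.dumps(event, indent=2))
```

Storing events rather than raw video keeps the memory small and makes it directly searchable by object, place, and time.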

🗣️ Natural Interaction

No screens, no typing. The user simply asks, "Where are my glasses?" and gets a voice answer via Bone Conduction Audio.

🛡️ Privacy First

What happens at home, stays at home. All processing happens locally on the Raspberry Pi. No cloud uploads.


📉 The Silent Epidemic

Alzheimer's and Dementia strip away a person's ability to recall the Context of Life.

| 🚫 The Struggle | ❌ Existing "Smart" Tech | ✅ The Sahayak Way |
|---|---|---|
| "Where is my wallet?" | GPS trackers: only show map location. | Visual Memory: "You left it on the kitchen counter." |
| "Who is this person?" | CCTV: passive recording. | Face Recognition: "This is your grandson, Aryan." |
| "Did I take my medicine?" | Alarms: ring blindly. | Action Log: "Yes, you took the blue pill at 2 PM." |

⚙️ Solution Architecture

We have engineered a Modular Agent System that runs entirely offline on the Edge.

```mermaid
graph TD
    %% Styling Definitions
    classDef sensory fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    classDef process fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef memory fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,stroke-dasharray: 5 5;
    classDef action fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px;

    subgraph SENSORY["📷 SENSORY LAYER"]
        Cam("Pi Camera Module 3") -->|Video Stream| Vision[Vision Pipeline]
        Mic("Bone Conduction Mic") -->|Audio Stream| Audio[Audio Pipeline]
    end

    subgraph PROCESSING["🧠 PROCESSING LAYER (RPi 4)"]
        Vision -->|Frames| YOLO["YOLOv8<br>(Object Detection)"]
        Vision -->|Frames| Face["dlib<br>(Face Recognition)"]
        Audio -->|Voice| Whisper["Whisper STT"]

        YOLO -->|Metadata| Context[Context Engine]
        Face -->|Metadata| Context
        Whisper -->|Query| Intent[Intent Classifier]
    end

    subgraph MEMORY["💾 MEMORY LAYER"]
        Context -->|Write Events| SQL[("SQLite Event Log")]
        Intent <-->|Read Context| SQL
    end

    subgraph ACTION["🔊 ACTION LAYER"]
        Intent -->|Response Text| TTS["Coqui TTS"]
        TTS -->|Audio Signal| Speaker("Bone Conduction<br>Transducer")
    end

    %% Apply Styles
    class Cam,Mic,Vision,Audio sensory;
    class YOLO,Face,Whisper,Context,Intent process;
    class SQL memory;
    class TTS,Speaker action;
```

🌟 Key Features: Why Sahayak Stands Out

Sahayak is not just a reminder app; it is a fully autonomous cognitive system.

🧠 Artificial Episodic Memory

Unlike standard assistants that fetch facts from Google, Sahayak builds a personal timeline of your life. It remembers Who, What, Where, and When an event happened using a custom JSON Event Indexer.
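One way such an event indexer could work is to group events by object so that "where is X" becomes a single lookup. This is a hypothetical sketch; the actual indexer's schema is not shown here:

```python
from collections import defaultdict

def build_index(events):
    """Index Who/What/Where/When events by object name for fast lookups.

    Hypothetical sketch of a JSON event indexer; field names are illustrative.
    """
    index = defaultdict(list)
    for event in events:
        index[event["what"]].append(event)
    return index

events = [
    {"who": "user", "what": "keys", "where": "shelf", "when": "2024-05-01T17:45"},
    {"who": "user", "what": "keys", "where": "kitchen counter", "when": "2024-05-01T18:10"},
]
index = build_index(events)

# "Where are my keys?" -> the most recent sighting (ISO timestamps sort lexically).
latest = max(index["keys"], key=lambda e: e["when"])
print(latest["where"])  # kitchen counter
```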

🔒 100% On-Device Privacy

"Your memories stay yours." The entire system runs offline on a Raspberry Pi. No video or audio is ever uploaded to the cloud, ensuring complete data sovereignty for the patient.

👁️ Semantic Vision Intelligence

Powered by YOLOv8 + CLIP, Sahayak doesn't just "detect" objects; it "understands" them. It distinguishes between "Generic Glasses" and "MY Glasses" and uses stability checks to avoid false memories.
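A stability check of this kind can be sketched as a per-label streak counter: an object is committed to memory only after it has been seen in several consecutive frames. This is an assumption about the mechanism, not the project's exact code:

```python
class StabilityFilter:
    """Report an object only after `threshold` consecutive detections,
    filtering out transient glimpses that would create false memories."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streaks = {}  # label -> consecutive-frame count

    def update(self, detections):
        """Feed one frame's detected labels; return labels that just became stable."""
        stable = []
        for label in detections:
            self.streaks[label] = self.streaks.get(label, 0) + 1
            if self.streaks[label] == self.threshold:
                stable.append(label)  # reported exactly once, when stable
        # Reset streaks for labels that vanished this frame.
        for label in list(self.streaks):
            if label not in detections:
                self.streaks[label] = 0
        return stable

f = StabilityFilter(threshold=3)
print(f.update({"cup"}))          # []
print(f.update({"cup"}))          # []
print(f.update({"cup", "keys"}))  # ['cup']
```

Reporting a label exactly once (when the streak hits the threshold) avoids logging the same sighting on every subsequent frame.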

🗣️ Zero-UI Voice Interaction

Designed for the elderly. No screens, no buttons. Just speak naturally. Using OpenAI Whisper (STT) and Edge Neural TTS, the conversation feels human, not robotic.

🤖 Multi-Agent Orchestration

A sophisticated backend where specialized AI Agents (Vision, Memory, Query) talk to each other. This modular design ensures that if one part fails, the system recovers automatically.
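The recovery behaviour can be sketched as a tiny supervisor that retries a failing agent step instead of crashing the whole pipeline. The names and retry policy here are hypothetical:

```python
def run_agent(name, step, max_retries=3):
    """Supervisor sketch: re-run a failing agent step up to `max_retries`
    times; return None if it never succeeds (illustrative only)."""
    for attempt in range(max_retries):
        try:
            return step()
        except Exception as exc:
            print(f"[{name}] step failed ({exc}); retry {attempt + 1}/{max_retries}")
    return None

calls = {"n": 0}
def flaky_vision():
    """Simulated Vision Agent step that fails once, then recovers."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("camera glitch")
    return "frame ok"

print(run_agent("vision", flaky_vision))  # frame ok
```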

⌚ Wearable & Assistive Design

Compact integration with Bone Conduction Audio ensures the user stays aware of their surroundings while receiving private memory cues directly into their ear.


🛠️ Tech Stack & Dependencies

Our system acts as a bridge between Hardware and Advanced AI.

🦾 Hardware Layer (The Body)

| Component | Specification | Functionality |
|---|---|---|
| Compute Unit | Raspberry Pi | The central brain handling all Edge AI processing offline. |
| Vision Sensor | Camera | Captures high-resolution frames for the Vision Agent. |
| Audio Input | Mic | Captures user queries in real time. |
| Audio Output | Speaker | Delivers private, non-intrusive voice responses. |

🧠 AI & Perception Layer (The Mind)

| Technology | Model / Tool | Role in Sahayak |
|---|---|---|
| Object Detection | YOLOv8 | Detects objects (keys, medicine, glasses) instantly. |
| Semantic Understanding | CLIP | Understands context & identifies specific people. |
| Speech-to-Text | Whisper | Converts voice to text locally (offline). |
| Text-to-Speech | EdgeTTS | Generates human-like, natural voice output. |

💻 Core Software & Orchestration

| Component | Tech Stack | Description |
|---|---|---|
| Language | Python | Primary logic and agent orchestration. |
| Vision Lib | OpenCV | Image processing and frame handling. |
| Memory Store | JSON | Stores episodic events (time, location, object). |

⚡ Installation & Setup Guide

Follow these steps to deploy Sahayak on a Raspberry Pi 5 (or 4B).

📋 Prerequisites

  • Device: Raspberry Pi 5 (recommended), Raspberry Pi 4 (8 GB RAM), or ESP32-CAM
  • OS: Raspberry Pi OS (64-bit), or the Arduino IDE for the ESP32-CAM
  • Python: 3.9 or 3.10
  • Internet: required for the initial model downloads

🟢 Step 1: Clone the Repository

Open your terminal on the Raspberry Pi and run:

```bash
git clone https://github.com/YourUsername/Sahayak.git
cd Sahayak
```

🟡 Step 2: Install System Dependencies

We need system-level tools for Audio and Vision.

```bash
sudo apt-get update
sudo apt-get install -y python3-pyaudio portaudio19-dev libcamera-dev ffmpeg
```

🟠 Step 3: Set Up Virtual Environment

Keep dependencies isolated.

```bash
python3 -m venv venv
source venv/bin/activate
```

🔵 Step 4: Install Python Requirements

Install YOLO, Whisper, EdgeTTS, etc.

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Note: First install may take 5–10 minutes (PyTorch).

🟣 Step 5: Hardware Connection

  • Camera: connect the Pi Camera Module 3 to the CSI port
  • Mic: plug in a USB microphone
  • Speaker: 3.5mm jack or Bluetooth bone-conduction transducer

🔴 Step 6: Launch Sahayak

```bash
python main.py
```


🕹️ Usage Instructions

Once the system is running, Sahayak becomes your active memory companion. No buttons needed—just speak.

🟢 1. Positioning the Device

  • Wear the device (or place the camera) such that it has a clear view of your table/room.
  • Ensure the microphone is not covered by clothing.

🔵 2. Voice Commands

Sahayak listens for natural language. You don't need robotic commands.

| Intent | Example User Query | Sahayak's Response |
|---|---|---|
| Locate Object | "Where did I keep my glasses?" | "You left your glasses on the coffee table 10 minutes ago." |
| Identify Person | "Who is standing in front of me?" | "That is your grandson, Aryan." |
| Recall Action | "Did I take my medicine?" | "Yes, I saw you taking the red pill at 2:00 PM." |
| General Context | "What was I doing just now?" | "You were reading a newspaper on the sofa." |
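A minimal keyword-based router in the spirit of these intents might look like the following. This is a sketch only; the real Intent Classifier presumably applies richer NLP to the Whisper transcript:

```python
def classify_intent(query):
    """Route a transcribed query to one of four intents.

    Hypothetical keyword heuristic, not the project's actual classifier.
    Order matters: "Where did I..." should win over the "did i" check.
    """
    q = query.lower()
    if "where" in q:
        return "locate_object"
    if "who" in q:
        return "identify_person"
    if "did i" in q:
        return "recall_action"
    return "general_context"

print(classify_intent("Where did I keep my glasses?"))   # locate_object
print(classify_intent("Who is standing in front of me?")) # identify_person
print(classify_intent("Did I take my medicine?"))         # recall_action
print(classify_intent("What was I doing just now?"))      # general_context
```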

💡 Pro Tips for Best Results

  • Stability Matters: The memory is only formed when an object is stable for 3 seconds. Don't wave objects around quickly.
  • Lighting: Ensure the room is reasonably lit for the Camera to detect objects accurately.

📸 Screenshots & Demo

Here is Sahayak in action, processing the real world in real-time.


  • 👁️ Computer Vision View: YOLOv8 detecting 'cup' and 'keys'
  • 🧠 Terminal / Memory Log: the system creating JSON memory logs

🌍 Real-World Impact & Use Cases

Sahayak is designed specifically to address the "Context Gap" faced by dementia patients.

🏥 Primary Use Case: Alzheimer's Care

| Problem Scenario | Sahayak's Solution | Impact |
|---|---|---|
| The "Lost Item" Anxiety: the patient panics because they can't find their wallet. | Visual Memory Recall: "It is on the bedside table." | Reduces panic attacks and dependency on caregivers. |
| Social Withdrawal: the patient avoids guests because they don't recognize faces. | Face Identity Whisper: "This is Sharma Ji, your neighbor." | Restores social confidence and dignity. |
| Repetitive Questioning: asking "What time is lunch?" 20 times. | Patient Patience: the AI answers calmly every single time, without frustration. | Reduces caregiver burnout. |

🏘️ Secondary Use Cases

  • Visually Impaired Assistance: Helping blind users locate objects in a room.
  • Smart Home Automation: Triggering lights/fans based on user location (Future integration).

🚀 Future Scope & Roadmap

We have a clear vision to evolve Sahayak from a prototype to a medical-grade product.

🗓️ Phase 1: Immediate Enhancements (Next 2 Months)

  • 🔋 Battery Optimization: Implementing "Sleep Mode" when no motion is detected to extend battery life to 12+ hours.
  • 🚨 Fall Detection: Using the camera's pose estimation to detect sudden falls and alert family members instantly.
  • 🗣️ Hindi Language Support: Training a fine-tuned Whisper model for local Indian dialects.

🗓️ Phase 2: Long-Term Vision (6 Months+)

  • 📱 Caregiver Companion App: A mobile dashboard for doctors/family to view memory logs and set safety geofences.
  • ❤️ Emotion Analysis: Analyzing voice tonality to detect if the patient is stressed or confused and calming them down.
  • ☁️ Optional Cloud Sync: Secure, encrypted cloud backup for long-term memory retrieval (e.g., "What did I do last Christmas?").

"Our ultimate goal is to make Sahayak invisible—technology that helps you live, without getting in the way."


⚔️ Meet Team Percevia

| Team Member | Role | Focus Area |
|---|---|---|
| Tanish Aggarwal (LinkedIn) | 👑 Team Lead | Hardware & Edge Privacy |
| Khushi Sharma (LinkedIn) | 🧠 Memory Architect | Episodic Memory & Agents |
| Aayushi Gupta (LinkedIn) | 🗣️ Voice Engineer | NLP & Accessibility |

College: Vivekananda Institute of Professional Studies (VIPS), Delhi 🏛️



🌟 If you find Sahayak interesting, please give it a Star! 🌟

"Preserving memories, one line of code at a time."


Made by Team Percevia
Protex: Hack-2-Win (Round 1)

Development Workflow

All features are developed using feature branches and merged via Pull Requests.

About

Project Sahayak is an AI-powered wearable assistant designed specifically for individuals suffering from early-to-mid-stage Alzheimer’s and Dementia. Unlike standard trackers that only monitor location, Sahayak acts as an "External Hippocampus", helping the user navigate social interactions, identify people and objects, and recall lost items in real time.
