KrishMatrix/Spectral_Trainer

Spectral Trainer Demo
"The trainer you always deserved. Now powered by AI."

Pep is an iOS app that combines Amazon Nova Sonic (real-time speech-to-speech AI) with Apple Vision hand pose detection to coach users through physiotherapy exercises via a fully conversational AI companion.


What Pep Does

  • Conversational onboarding — learns your name, injury, and motivation through natural speech. No forms.
  • Real-time voice coaching — talks you through every rep, counts with you, checks your comfort, and adapts if you say something hurts.
  • Live hand tracking — the front camera detects your hand skeleton in real time using Apple's Vision framework and overlays it on screen.
  • Barge-in support — interrupt mid-sentence; Pep stops and listens immediately.
  • Progress tracking — day streak, exercise stats, and a report you can share with your physiotherapist.

Architecture

iPhone App (Swift/SwiftUI)
        │
        │  WebSocket — binary PCM audio + JSON control messages
        │
Python Backend (FastAPI)
        │
        │  HTTP/2 Bidirectional Stream (SigV4 signed)
        │
Amazon Nova Sonic — Bedrock (amazon.nova-sonic-v1:0)

iOS App

pepApp
  └── LandingView
        ├── NovaSonicManager ("onboard" mode)
        │     ├── URLSessionWebSocketTask   — WebSocket to backend
        │     ├── AVAudioEngine.inputNode   — microphone
        │     │     └── AVAudioConverter    — device format → 16kHz Int16 mono
        │     └── AVAudioPlayerNode         — speaker (24kHz Float32 mono)
        │
        └── ExerciseSelectionView
              └── ExerciseView
                    ├── NovaSonicManager ("exercise" mode)
                    └── ExerciseManager
                          ├── AVCaptureSession               — front camera
                          └── VNDetectHumanHandPoseRequest   — hand tracking

NovaSonicManager runs two concurrent tasks:

  • Mic tap → converts audio to 16 kHz Int16 PCM → sends binary frames over WebSocket
  • Receive loop → binary audio from WebSocket → converts Int16→Float32 → plays via AVAudioPlayerNode. Also handles JSON transcript and barge-in messages.

Python Backend

FastAPI  /ws/{mode}
  ├── receive_from_ios()       — puts binary audio into asyncio.Queue
  └── NovaSonicSession.run()
        ├── _send_loop()       — sends Nova Sonic event sequence + audio chunks
        └── _receive_loop()    — routes audioOutput (binary) and textOutput (JSON) back to iOS

The backend is a stateless relay — one WebSocket session per user, two concurrent async tasks bridging iOS and Nova Sonic.
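The relay pattern can be sketched with plain asyncio. This is an illustrative reduction, not the actual main.py: the `bridge` function and the queue-backed transports here are assumptions standing in for the real WebSocket and Bedrock stream objects.

```python
import asyncio

async def bridge(ios_recv, nova_send, nova_recv, ios_send):
    """Bridge one iOS WebSocket session to one Nova Sonic stream.

    ios_recv/nova_recv are async callables returning the next frame
    (or None when the peer closes); nova_send/ios_send forward a frame.
    """
    async def uplink():          # iOS mic audio -> Nova Sonic
        while (frame := await ios_recv()) is not None:
            await nova_send(frame)

    async def downlink():        # Nova Sonic audio/text -> iOS
        while (frame := await nova_recv()) is not None:
            await ios_send(frame)

    # Run both directions concurrently; the session ends when both close.
    await asyncio.gather(uplink(), downlink())

# Demo with in-memory queues standing in for the two transports.
async def demo():
    up, down, to_nova, to_ios = (asyncio.Queue() for _ in range(4))
    for chunk in (b"pcm1", b"pcm2", None):   # two mic frames, then close
        up.put_nowait(chunk)
    for msg in (b"audio", None):             # one reply frame, then close
        down.put_nowait(msg)
    await bridge(up.get, to_nova.put, down.get, to_ios.put)
    return to_nova.qsize(), to_ios.qsize()

print(asyncio.run(demo()))  # (2, 1)
```

Because the relay holds no per-user state beyond the two queues, each WebSocket connection is fully independent, which is what makes the backend a stateless relay.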

Nova Sonic Event Protocol

Client → Nova Sonic              Nova Sonic → Client
──────────────────               ─────────────────────
sessionStart                     completionStart
promptStart                      contentStart (role + generationStage)
contentStart (SYSTEM)            textOutput  — ASR transcript of user
textInput (system prompt)        textOutput  — SPECULATIVE (what it will say)
contentEnd                       audioOutput — base64 PCM chunks (24kHz)
contentStart (AUDIO)             textOutput  — FINAL transcript
audioInput × N  (16kHz PCM)      completionEnd
contentEnd
promptEnd
sessionEnd
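The client-side ordering above can be expressed as a generator. Only the event names come from the protocol table; the nested `{"event": {...}}` envelope and the payload field names used here are assumptions, not the documented Nova Sonic schema.

```python
import base64
import json

def client_events(system_prompt: str, audio_chunks: list):
    """Yield the client -> Nova Sonic event sequence from the table above."""
    def ev(name, payload=None):
        # Envelope shape is an assumption for illustration.
        return json.dumps({"event": {name: payload or {}}})

    yield ev("sessionStart")
    yield ev("promptStart")
    yield ev("contentStart", {"type": "TEXT", "role": "SYSTEM"})
    yield ev("textInput", {"content": system_prompt})
    yield ev("contentEnd")
    yield ev("contentStart", {"type": "AUDIO"})
    for chunk in audio_chunks:               # 16 kHz Int16 mono PCM frames
        yield ev("audioInput",
                 {"content": base64.b64encode(chunk).decode("ascii")})
    yield ev("contentEnd")
    yield ev("promptEnd")
    yield ev("sessionEnd")

events = list(client_events("You are Pep.", [b"\x00\x01" * 160]))
print([next(iter(json.loads(e)["event"])) for e in events])
# ['sessionStart', 'promptStart', 'contentStart', 'textInput', 'contentEnd',
#  'contentStart', 'audioInput', 'contentEnd', 'promptEnd', 'sessionEnd']
```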

Audio Data Flow

User speaks
  → AVAudioEngine tap (device native format, ~100ms chunks)
  → AVAudioConverter → 16kHz, Int16, mono
  → WebSocket binary frame
  → Backend base64-encodes → audioInput event
  → Nova Sonic processes speech

Nova Sonic responds
  → audioOutput event (base64, 24kHz Int16 mono)
  → Backend decodes → WebSocket binary frame
  → iOS: Int16 → Float32
  → AVAudioPlayerNode plays
  → User hears Pep's voice
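The two format conversions in this flow are simple enough to sketch in a few lines. These functions are illustrative (the app does this in Swift with AVAudioConverter, and the names below are made up for the sketch): Int16 PCM scales to Float32 in [-1.0, 1.0] by dividing by 32768, and mic frames are base64-encoded for the audioInput event.

```python
import array
import base64

def int16_to_float32(pcm: bytes):
    """Playback step: Int16 PCM bytes -> Float32 samples in [-1.0, 1.0].
    Uses native byte order, matching PCM produced on the same machine."""
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(pcm)
    return [s / 32768.0 for s in samples]

def encode_audio_input(pcm: bytes) -> str:
    """Backend step: base64-encode a 16 kHz Int16 mic frame for the
    audioInput event payload."""
    return base64.b64encode(pcm).decode("ascii")

frame = array.array("h", [0, 16384, -32768]).tobytes()
print(int16_to_float32(frame))   # [0.0, 0.5, -1.0]
print(encode_audio_input(frame))
```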

User Journey

First Launch — Onboarding

  1. App opens → connects to backend → Nova Sonic session starts
  2. Pep greets the user by voice and collects:
    • Name, age, injured body part, motivation
  3. Live transcripts show the conversation in real time
  4. "Continue to Exercises" button appears when onboarding is done
  5. onboarded = true saved to UserDefaults

Every Launch After — Exercise

  1. App opens → Pep greets the returning user immediately
  2. User taps "Continue to Exercises"
  3. Selects from three difficulty levels:
    • Finger Spreads — Best Match
    • Mobility Touches — Progression
    • Make a Fist — Regression
  4. Front camera opens + hand skeleton overlay appears
  5. Pep coaches live via voice — counts reps, checks comfort, encourages
  6. User can speak freely ("that hurt", "how many more?") — Pep responds
  7. "Complete Exercise" → Exercise Report with stats, streak, and PT report option

Tech Stack

Component        Technology
─────────────    ──────────────────────────────────────────────────
Voice AI         Amazon Nova Sonic via AWS Bedrock
Backend          Python, FastAPI, aws_sdk_bedrock_runtime
iOS UI           SwiftUI
Hand Tracking    Apple Vision (VNDetectHumanHandPoseRequest)
Camera           AVFoundation
Audio            AVAudioEngine, AVAudioPlayerNode, AVAudioConverter
Storage          UserDefaults

Setup & Running

Prerequisites

  • AWS account with Amazon Bedrock access and Nova Sonic enabled in us-east-1
  • Python 3.11+
  • Xcode 16.2+ with iOS 18.2 deployment target

1. Start the Backend

cd backend
pip install -r requirements.txt

export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1

python main.py
# Server starts at http://0.0.0.0:8000

2. Configure iOS Server URL

In pep/pep/Managers/NovaSonicManager.swift, line 13:

// iOS Simulator
let kServerBaseURL = URL(string: "ws://localhost:8000")!

// Physical device — use your Mac's local IP
let kServerBaseURL = URL(string: "ws://192.168.x.x:8000")!

3. Build & Run in Xcode

  • Open pep/pep.xcodeproj
  • Select your simulator or device
  • Build & Run (⌘R)

The app requires microphone and camera permissions — grant both on first launch.


Project Structure

Pep_11labs_hack/
├── backend/
│   ├── main.py                  # FastAPI WebSocket server
│   └── requirements.txt
└── pep/
    └── pep/
        ├── pepApp.swift
        ├── Managers/
        │   ├── NovaSonicManager.swift   # WebSocket + audio (core)
        │   ├── ExerciseManager.swift    # Camera + hand pose detection
        │   └── UserProfileManager.swift # UserDefaults persistence
        └── Views/
            ├── LandingView.swift        # Home + onboarding/check-in
            ├── ExerciseSelectionView.swift
            ├── ExerciseView.swift       # Live camera + coaching
            └── ExerciseReportView.swift
