"The trainer you always deserved. Now powered by AI."
iOS app that combines Amazon Nova Sonic (real-time speech-to-speech AI) with Apple Vision hand pose detection to coach users through physiotherapy exercises with a fully conversational AI companion.
- Conversational onboarding — learns your name, injury, and motivation through natural speech. No forms.
- Real-time voice coaching — talks you through every rep, counts with you, checks your comfort, and adapts if you say something hurts.
- Live hand tracking — Front camera detects your hand skeleton in real-time using Apple's Vision framework and overlays it on screen.
- Barge-in support — Interrupt mid-sentence. It stops and listens immediately.
- Progress tracking — Day streak, exercise stats, and a report you can share with your physiotherapist.
iPhone App (Swift/SwiftUI)
│
│ WebSocket — binary PCM audio + JSON control messages
│
Python Backend (FastAPI)
│
│ HTTP/2 Bidirectional Stream (SigV4 signed)
│
Amazon Nova Sonic — Bedrock (amazon.nova-sonic-v1:0)
pepApp
└── LandingView
├── NovaSonicManager ("onboard" mode)
│ ├── URLSessionWebSocketTask — WebSocket to backend
│ ├── AVAudioEngine.inputNode — microphone
│ │ └── AVAudioConverter — device format → 16kHz Int16 mono
│ └── AVAudioPlayerNode — speaker (24kHz Float32 mono)
│
└── ExerciseSelectionView
└── ExerciseView
├── NovaSonicManager ("exercise" mode)
└── ExerciseManager
├── AVCaptureSession — front camera
└── VNDetectHumanHandPoseRequest — hand tracking
NovaSonicManager runs two concurrent tasks:
- Mic tap → converts audio to 16 kHz Int16 PCM → sends binary frames over WebSocket
- Receive loop → binary audio from WebSocket → converts Int16→Float32 → plays via
AVAudioPlayerNode. Also handles JSON transcript and barge-in messages.
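The sample-format conversion at the heart of both tasks is simple enough to sketch. Here is the Int16 ↔ Float32 mapping in Python for illustration (the app itself does this in Swift via AVAudioConverter and manual buffer conversion; the function names here are mine, not from the codebase):

```python
import struct

def int16_to_float32(pcm: bytes) -> list[float]:
    """Decode little-endian Int16 PCM (as received from the backend)
    into normalized Float32 samples for playback."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return [s / 32768.0 for s in samples]

def float32_to_int16(samples: list[float]) -> bytes:
    """Quantize normalized float samples back to Int16 PCM,
    as the mic path does before sending binary WebSocket frames."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in clamped))
```

The clamp before quantizing guards against out-of-range floats producing a struct overflow; the asymmetric 32768/32767 scale factors are the conventional choice for 16-bit PCM.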
FastAPI /ws/{mode}
├── receive_from_ios() — puts binary audio into asyncio.Queue
└── NovaSonicSession.run()
├── _send_loop() — sends Nova Sonic event sequence + audio chunks
└── _receive_loop() — routes audioOutput (binary) and textOutput (JSON) back to iOS
The backend is a stateless relay — one WebSocket session per user, two concurrent async tasks bridging iOS and Nova Sonic.
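The relay pattern itself is just two queues and two concurrent tasks. A minimal asyncio sketch, stripped of the FastAPI and Bedrock specifics (all names here are illustrative):

```python
import asyncio

async def pump(src: asyncio.Queue, dst: asyncio.Queue, label: str) -> None:
    # Forward frames until a None sentinel, then propagate the sentinel.
    while (frame := await src.get()) is not None:
        await dst.put((label, frame))
    await dst.put(None)

async def relay_session(ios_rx, nova_rx, ios_tx, nova_tx):
    # One session: uplink and downlink bridged concurrently, like
    # _send_loop() / _receive_loop() above.
    await asyncio.gather(
        pump(ios_rx, nova_tx, "audioInput"),   # mic audio up to Nova Sonic
        pump(nova_rx, ios_tx, "audioOutput"),  # synthesized audio back to iOS
    )

async def demo():
    ios_rx, nova_rx = asyncio.Queue(), asyncio.Queue()
    ios_tx, nova_tx = asyncio.Queue(), asyncio.Queue()
    for chunk in (b"\x01", b"\x02", None):     # two mic frames, then end
        ios_rx.put_nowait(chunk)
    nova_rx.put_nowait(b"\xaa")                # one synthesized frame
    nova_rx.put_nowait(None)
    await relay_session(ios_rx, nova_rx, ios_tx, nova_tx)
    uplink = [nova_tx.get_nowait() for _ in range(3)]
    downlink = [ios_tx.get_nowait() for _ in range(2)]
    return uplink, downlink

uplink, downlink = asyncio.run(demo())
```

Because neither direction awaits the other, a barge-in (new mic audio while playback frames are still in flight) never blocks: each pump drains its own queue independently.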
| Client → Nova Sonic | Nova Sonic → Client |
|---|---|
| sessionStart | completionStart |
| promptStart | contentStart (role + generationStage) |
| contentStart (SYSTEM) | textOutput — ASR transcript of user |
| textInput (system prompt) | textOutput — SPECULATIVE (what it will say) |
| contentEnd | audioOutput — base64 PCM chunks (24kHz) |
| contentStart (AUDIO) | textOutput — FINAL transcript |
| audioInput × N (16kHz PCM) | completionEnd |
| contentEnd | |
| promptEnd | |
| sessionEnd | |
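The client-side half of that sequence can be written down as a list of ordered events. A sketch of the envelopes in Python (only the event names and their order come from the table above; the payload fields are simplified assumptions, not the full Bedrock schema):

```python
import base64

def client_events(system_prompt: str, pcm_chunks: list[bytes]) -> list[dict]:
    """Build the ordered Nova Sonic input-event sequence for one prompt.
    Field names inside each event are illustrative."""
    events = [
        {"sessionStart": {}},
        {"promptStart": {}},
        {"contentStart": {"type": "SYSTEM"}},
        {"textInput": {"text": system_prompt}},
        {"contentEnd": {}},
        {"contentStart": {"type": "AUDIO"}},
    ]
    events += [
        {"audioInput": {"audio": base64.b64encode(c).decode()}}  # 16 kHz Int16 PCM
        for c in pcm_chunks
    ]
    events += [{"contentEnd": {}}, {"promptEnd": {}}, {"sessionEnd": {}}]
    return events
```

Note the two content blocks: the system prompt is opened and closed as text before the audio content block opens, and audioInput chunks stream inside the second block until the user stops speaking.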
User speaks
→ AVAudioEngine tap (device native format, ~100ms chunks)
→ AVAudioConverter → 16kHz, Int16, mono
→ WebSocket binary frame
→ Backend base64-encodes → audioInput event
→ Nova Sonic processes speech
Nova Sonic responds
→ audioOutput event (base64, 24kHz Int16 mono)
→ Backend decodes → WebSocket binary frame
→ iOS: Int16 → Float32
→ AVAudioPlayerNode plays
→ User hears Pep's voice
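The backend's role in both pipelines is just base64 framing between binary WebSocket frames and JSON events. A sketch, with assumed field names (`audio`, `content`) standing in for the actual event schema:

```python
import base64
import json

def wrap_uplink(frame: bytes) -> str:
    """Binary WebSocket frame from iOS -> JSON audioInput event for Nova Sonic.
    The 'audio' field name is illustrative."""
    return json.dumps({"audioInput": {"audio": base64.b64encode(frame).decode()}})

def unwrap_downlink(event: str) -> bytes:
    """Nova Sonic audioOutput event -> raw 24 kHz Int16 PCM bytes,
    forwarded to iOS as a binary WebSocket frame.
    The 'content' field name is illustrative."""
    return base64.b64decode(json.loads(event)["audioOutput"]["content"])
```

Keeping raw PCM on the WebSocket leg and base64 only on the Bedrock leg avoids a 33% size overhead between phone and backend, where bandwidth matters most.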
- App opens → connects to backend → Nova Sonic session starts
- Pep greets the user by voice and collects:
  - Name, age, injured body part, motivation
- Transcripts show the conversation in real-time
- "Continue to Exercises" button appears once onboarding is done
- `onboarded = true` is saved to UserDefaults
- App opens → Pep greets returning user immediately
- User taps "Continue to Exercises"
- Selects from three difficulty levels:
- Finger Spreads — Best Match
- Mobility Touches — Progression
- Make a Fist — Regression
- Front camera opens + hand skeleton overlay appears
- Pep coaches live via voice — counts reps, checks comfort, encourages
- User can speak freely ("that hurt", "how many more?") — Pep responds
- "Complete Exercise" → Exercise Report with stats, streak, and PT report option
| Component | Technology |
|---|---|
| Voice AI | Amazon Nova Sonic via AWS Bedrock |
| Backend | Python, FastAPI, aws_sdk_bedrock_runtime |
| iOS UI | SwiftUI |
| Hand Tracking | Apple Vision (VNDetectHumanHandPoseRequest) |
| Camera | AVFoundation |
| Audio | AVAudioEngine, AVAudioPlayerNode, AVAudioConverter |
| Storage | UserDefaults |
- AWS account with Amazon Bedrock access and Nova Sonic enabled in us-east-1
- Python 3.11+
- Xcode 15+ with iOS 18.2 deployment target
cd backend
pip install -r requirements.txt
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1
python main.py
# Server starts at http://0.0.0.0:8000

In pep/pep/Managers/NovaSonicManager.swift, line 13:
// iOS Simulator
let kServerBaseURL = URL(string: "ws://localhost:8000")!
// Physical device — use your Mac's local IP
let kServerBaseURL = URL(string: "ws://192.168.x.x:8000")!

- Open pep/pep.xcodeproj
- Select your simulator or device
- Build & Run (⌘R)
The app requires microphone and camera permissions — grant both on first launch.
Pep_11labs_hack/
├── backend/
│ ├── main.py # FastAPI WebSocket server
│ └── requirements.txt
└── pep/
└── pep/
├── pepApp.swift
├── Managers/
│ ├── NovaSonicManager.swift # WebSocket + audio (core)
│ ├── ExerciseManager.swift # Camera + hand pose detection
│ └── UserProfileManager.swift # UserDefaults persistence
└── Views/
├── LandingView.swift # Home + onboarding/check-in
├── ExerciseSelectionView.swift
├── ExerciseView.swift # Live camera + coaching
└── ExerciseReportView.swift
