"The trainer you always deserved. Now powered by AI."
iOS app that combines Amazon Nova Sonic (real-time speech-to-speech AI) with Apple Vision hand pose detection to coach users through physiotherapy exercises with a fully conversational AI companion.
- Conversational onboarding — learns your name, injury, and motivation through natural speech. No forms.
- Real-time voice coaching — talks you through every rep, counts with you, checks your comfort, and adapts if you say something hurts.
- Live hand tracking — Front camera detects your hand skeleton in real-time using Apple's Vision framework and overlays it on screen.
- Barge-in support — Interrupt mid-sentence. It stops and listens immediately.
- Progress tracking — Day streak, exercise stats, and a report you can share with your physiotherapist.
iPhone App (Swift/SwiftUI)
│
│ WebSocket — binary PCM audio + JSON control messages
│
Python Backend (FastAPI)
│
│ HTTP/2 Bidirectional Stream (SigV4 signed)
│
Amazon Nova Sonic — Bedrock (amazon.nova-sonic-v1:0)
pepApp
└── LandingView
├── NovaSonicManager ("onboard" mode)
│ ├── URLSessionWebSocketTask — WebSocket to backend
│ ├── AVAudioEngine.inputNode — microphone
│ │ └── AVAudioConverter — device format → 16kHz Int16 mono
│ └── AVAudioPlayerNode — speaker (24kHz Float32 mono)
│
└── ExerciseSelectionView
└── ExerciseView
├── NovaSonicManager ("exercise" mode)
└── ExerciseManager
├── AVCaptureSession — front camera
└── VNDetectHumanHandPoseRequest — hand tracking
NovaSonicManager runs two concurrent tasks:
- Mic tap → converts audio to 16 kHz Int16 PCM → sends binary frames over WebSocket
- Receive loop → binary audio from WebSocket → converts Int16→Float32 → plays via
AVAudioPlayerNode. Also handles JSON transcript and barge-in messages.
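The sample-format conversion at the heart of both tasks is simple enough to sketch. Here is the Int16 ↔ Float32 mapping in Python for illustration (the app itself does this in Swift via AVAudioConverter and manual buffer conversion; the function names here are mine, not from the codebase):

```python
import struct

def int16_to_float32(pcm: bytes) -> list[float]:
    """Decode little-endian Int16 PCM (as received from the backend)
    into normalized Float32 samples for playback."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return [s / 32768.0 for s in samples]

def float32_to_int16(samples: list[float]) -> bytes:
    """Quantize normalized float samples back to Int16 PCM,
    as the mic path does before sending binary WebSocket frames."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in clamped))
```

The clamp before quantizing guards against out-of-range floats producing a struct overflow; the asymmetric 32768/32767 scale factors are the conventional choice for 16-bit PCM.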
FastAPI /ws/{mode}
├── receive_from_ios() — puts binary audio into asyncio.Queue
└── NovaSonicSession.run()
├── _send_loop() — sends Nova Sonic event sequence + audio chunks
└── _receive_loop() — routes audioOutput (binary) and textOutput (JSON) back to iOS
The backend is a stateless relay — one WebSocket session per user, two concurrent async tasks bridging iOS and Nova Sonic.
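The relay pattern itself is just two queues and two concurrent tasks. A minimal asyncio sketch, stripped of the FastAPI and Bedrock specifics (all names here are illustrative):

```python
import asyncio

async def pump(src: asyncio.Queue, dst: asyncio.Queue, label: str) -> None:
    # Forward frames until a None sentinel, then propagate the sentinel.
    while (frame := await src.get()) is not None:
        await dst.put((label, frame))
    await dst.put(None)

async def relay_session(ios_rx, nova_rx, ios_tx, nova_tx):
    # One session: uplink and downlink bridged concurrently, like
    # _send_loop() / _receive_loop() above.
    await asyncio.gather(
        pump(ios_rx, nova_tx, "audioInput"),   # mic audio up to Nova Sonic
        pump(nova_rx, ios_tx, "audioOutput"),  # synthesized audio back to iOS
    )

async def demo():
    ios_rx, nova_rx = asyncio.Queue(), asyncio.Queue()
    ios_tx, nova_tx = asyncio.Queue(), asyncio.Queue()
    for chunk in (b"\x01", b"\x02", None):     # two mic frames, then end
        ios_rx.put_nowait(chunk)
    nova_rx.put_nowait(b"\xaa")                # one synthesized frame
    nova_rx.put_nowait(None)
    await relay_session(ios_rx, nova_rx, ios_tx, nova_tx)
    uplink = [nova_tx.get_nowait() for _ in range(3)]
    downlink = [ios_tx.get_nowait() for _ in range(2)]
    return uplink, downlink

uplink, downlink = asyncio.run(demo())
```

Because neither direction awaits the other, a barge-in (new mic audio while playback frames are still in flight) never blocks: each pump drains its own queue independently.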
| Client → Nova Sonic | Nova Sonic → Client |
|---|---|
| sessionStart | completionStart |
| promptStart | contentStart (role + generationStage) |
| contentStart (SYSTEM) | textOutput — ASR transcript of user |
| textInput (system prompt) | textOutput — SPECULATIVE (what it will say) |
| contentEnd | audioOutput — base64 PCM chunks (24kHz) |
| contentStart (AUDIO) | textOutput — FINAL transcript |
| audioInput × N (16kHz PCM) | completionEnd |
| contentEnd | |
| promptEnd | |
| sessionEnd | |
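The client-side half of that sequence can be written down as a list of ordered events. A sketch of the envelopes in Python (only the event names and their order come from the table above; the payload fields are simplified assumptions, not the full Bedrock schema):

```python
import base64

def client_events(system_prompt: str, pcm_chunks: list[bytes]) -> list[dict]:
    """Build the ordered Nova Sonic input-event sequence for one prompt.
    Field names inside each event are illustrative."""
    events = [
        {"sessionStart": {}},
        {"promptStart": {}},
        {"contentStart": {"type": "SYSTEM"}},
        {"textInput": {"text": system_prompt}},
        {"contentEnd": {}},
        {"contentStart": {"type": "AUDIO"}},
    ]
    events += [
        {"audioInput": {"audio": base64.b64encode(c).decode()}}  # 16 kHz Int16 PCM
        for c in pcm_chunks
    ]
    events += [{"contentEnd": {}}, {"promptEnd": {}}, {"sessionEnd": {}}]
    return events
```

Note the two content blocks: the system prompt is opened and closed as text before the audio content block opens, and audioInput chunks stream inside the second block until the user stops speaking.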
User speaks
→ AVAudioEngine tap (device native format, ~100ms chunks)
→ AVAudioConverter → 16kHz, Int16, mono
→ WebSocket binary frame
→ Backend base64-encodes → audioInput event
→ Nova Sonic processes speech
Nova Sonic responds
→ audioOutput event (base64, 24kHz Int16 mono)
→ Backend decodes → WebSocket binary frame
→ iOS: Int16 → Float32
→ AVAudioPlayerNode plays
→ User hears Pep's voice
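The backend's role in both pipelines is just base64 framing between binary WebSocket frames and JSON events. A sketch, with assumed field names (`audio`, `content`) standing in for the actual event schema:

```python
import base64
import json

def wrap_uplink(frame: bytes) -> str:
    """Binary WebSocket frame from iOS -> JSON audioInput event for Nova Sonic.
    The 'audio' field name is illustrative."""
    return json.dumps({"audioInput": {"audio": base64.b64encode(frame).decode()}})

def unwrap_downlink(event: str) -> bytes:
    """Nova Sonic audioOutput event -> raw 24 kHz Int16 PCM bytes,
    forwarded to iOS as a binary WebSocket frame.
    The 'content' field name is illustrative."""
    return base64.b64decode(json.loads(event)["audioOutput"]["content"])
```

Keeping raw PCM on the WebSocket leg and base64 only on the Bedrock leg avoids a 33% size overhead between phone and backend, where bandwidth matters most.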
- App opens → connects to backend → Nova Sonic session starts
- Pep greets the user by voice and collects:
  - Name, age, injured body part, motivation
- Transcripts show the conversation in real-time
- "Continue to Exercises" button appears once onboarding is done
- `onboarded = true` is saved to UserDefaults
- App opens → Pep greets returning user immediately
- User taps "Continue to Exercises"
- Selects from three difficulty levels:
- Finger Spreads — Best Match
- Mobility Touches — Progression
- Make a Fist — Regression
- Front camera opens + hand skeleton overlay appears
- Pep coaches live via voice — counts reps, checks comfort, encourages
- User can speak freely ("that hurt", "how many more?") — Pep responds
- "Complete Exercise" → Exercise Report with stats, streak, and PT report option
| Component | Technology |
|---|---|
| Voice AI | Amazon Nova Sonic via AWS Bedrock |
| Backend | Python, FastAPI, aws_sdk_bedrock_runtime |
| iOS UI | SwiftUI |
| Hand Tracking | Apple Vision (VNDetectHumanHandPoseRequest) |
| Camera | AVFoundation |
| Audio | AVAudioEngine, AVAudioPlayerNode, AVAudioConverter |
| Storage | UserDefaults |
- AWS account with Amazon Bedrock access and Nova Sonic enabled in us-east-1
- Python 3.11+
- Xcode 15+ with iOS 18.2 deployment target
cd backend
pip install -r requirements.txt
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1
python main.py
# Server starts at http://0.0.0.0:8000

In pep/pep/Managers/NovaSonicManager.swift, line 13:
// iOS Simulator
let kServerBaseURL = URL(string: "ws://localhost:8000")!
// Physical device — use your Mac's local IP
let kServerBaseURL = URL(string: "ws://192.168.x.x:8000")!

- Open pep/pep.xcodeproj
- Select your simulator or device
- Build & Run (⌘R)
The app requires microphone and camera permissions — grant both on first launch.
Pep_11labs_hack/
├── backend/
│ ├── main.py # FastAPI WebSocket server
│ └── requirements.txt
└── pep/
└── pep/
├── pepApp.swift
├── Managers/
│ ├── NovaSonicManager.swift # WebSocket + audio (core)
│ ├── ExerciseManager.swift # Camera + hand pose detection
│ └── UserProfileManager.swift # UserDefaults persistence
└── Views/
├── LandingView.swift # Home + onboarding/check-in
├── ExerciseSelectionView.swift
├── ExerciseView.swift # Live camera + coaching
└── ExerciseReportView.swift
