SaatvikPradhan/Align
Align

Native iOS shell: AlignAppShell opens on a home dashboard (AlignHomeView) with live pose (on-device Vision skeleton), Presage-style breathing (a mock session today; swap in SmartSpectra for real data), AI coaching (Gemini on-device or via the backend), and voice (system speech and/or ElevenLabs). The product name is Align (not ReForm).

Presage SmartSpectra (vitals)

The Swift package Presage-Security/SmartSpectra is a dependency of AlignCore (iOS only). The Vitals tab embeds SmartSpectraView() for real pulse and breathing waveforms from the camera.

  1. Get an API key from physiology.presagetech.com.
  2. Set AlignSmartSpectraAPIKey in your app’s Info.plist (see ios/AlignApp/Info-ATS-example.plist).
  3. Run on a physical iPhone — SmartSpectra does not support Simulator.
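Step 2's Info.plist entry looks like this (a minimal sketch; the key name is from this README, the value is a placeholder for your own Presage key):

```xml
<key>AlignSmartSpectraAPIKey</key>
<string>YOUR_PRESAGE_API_KEY</string>
```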

Analyze keeps the Vision pose camera; Vitals runs SmartSpectra’s own session so the two don’t fight over the camera. Optional helpers: MockPresageBreathingSession and PresageBreathingStrip in ios/Align/Sources/AlignCore/Presage/ for UI previews.

Troubleshooting SmartSpectra

  • Usage verification failed / 401 Unauthorized: The Physiology API rejected your key. Re-copy the key from physiology.presagetech.com (no leading/trailing spaces), ensure the key is enabled for SDK usage, and confirm billing/access if your plan requires it. AlignSmartSpectraBootstrap.configureAtLaunch() in App.init() applies the key before UI loads.
  • KalmanFilter / VideoWriter duplicate class: Harmless warning from a clash between Presage’s PresagePreprocessing and Apple private frameworks; contact Presage if you see crashes.
  • Attempt to present … already presenting: Often the SDK’s tutorial/alert overlapping another modal; the Vitals screen delays showing SmartSpectraView briefly to reduce this.

Xcode app target (why you might not see Home)

  1. Add the Align Swift package (ios/Align) to your app target and depend on AlignCore (or Align).
  2. Add ios/AlignApp/AlignApp.swift to the app target — it is the @main entry and shows AlignAppShell() (tabs: Home, Analyze, Voice).
  3. Remove or disable the template *App.swift from Xcode (there must be only one @main), or replace its WindowGroup body with AlignAppShell() instead of ContentView() / LivePoseCameraView().

Without steps 2–3, the simulator may still open whatever root view the template uses, so you will not see the home dashboard.

Do I run the backend separately?

Only if you want it.

  • Pose + skeleton: fully on the phone — no server.
  • Coaching + spoken cues: can run fully on the phone — the developer adds AlignGeminiAPIKey to the app’s Info.plist once, and Speak coaching aloud uses the system voice (AVSpeechSynthesizer). No Python required.
  • Backend: optional — use the FastAPI app if you prefer keys on the Mac (GEMINI_API_KEY / ElevenLabs in .env) and leave the Gemini field empty in Analysis; the app will call POST /api/coach instead. ElevenLabs voice still needs the server unless you only use system speech.

Gemini coaching (gait / form)

Option A — on the iPhone (no backend):

  1. Get a key from Google AI Studio.
  2. As the developer, add AlignGeminiAPIKey (string) to your app target’s Info.plist — merge ios/AlignApp/Info-ATS-example.plist or set it in Xcode (Target → Info). Users never see or type the key.
  3. Get coaching from current pose calls Gemini from Swift. Speak coaching aloud uses the built-in Siri-style voice.
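The Info.plist entry for step 2 is a single key/value pair (a sketch; the key name is from this README, the value is a placeholder for your Google AI Studio key):

```xml
<key>AlignGeminiAPIKey</key>
<string>YOUR_GOOGLE_AI_STUDIO_KEY</string>
```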

Option B — keys on the server:

  1. Set GEMINI_API_KEY in backend/.env and run uvicorn.
  2. Leave the in-app Gemini key empty and set the Voice tab server URL so POST /api/coach is used.

ElevenLabs connection check

Voice → Test connection runs GET /health, then GET /api/tts/verify, which calls ElevenLabs GET /v1/user with your API key (no TTS usage or charge). A green check means the key works with ElevenLabs’ API, not merely that the env vars exist.

ElevenLabs TTS

The ElevenLabs API key stays on the server. The iOS app talks only to your backend.

Backend

cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Set ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID, and (for POST /api/coach) GEMINI_API_KEY in .env
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

From the repo root instead (same app):
uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

  • GET /health — whether ElevenLabs, Gemini (GEMINI_API_KEY), and JWT (JWT_SECRET) are configured.
  • GET /api/tts/verify — live ElevenLabs API check (calls /v1/user).
  • POST /api/tts — JSON {"text":"...","voice_id":null} → JSON with audio_base64 (MP3). Optional voice_id overrides ELEVENLABS_VOICE_ID in .env (the app sends the Profile “ElevenLabs voice id” when set).
  • POST /api/coach — JSON with pose_summary, optional goal, profile_context, user_message, doctor_pdf_text → JSON {"advice":"..."} (Gemini movement coach; key stays on the server).
  • POST /api/voices/clone — multipart form: name (string) + file (audio, e.g. .m4a) → JSON {"voice_id":"..."} (ElevenLabs instant clone; key stays on the server). The iOS Profile screen can record a sample and create voice avatars stored on-device.
  • POST /api/patients/register — JSON with display_name, email, and profile_json → JSON {"share_code":"..."} (patient shares code with doctor).
  • GET /api/doctor/{share_code} — JSON view of patient profile + plan + instructions.
  • POST /api/doctor/{share_code}/instructions — form author, message → adds instruction.
  • PUT /api/doctor/{share_code}/plan — JSON array → updates rehab plan.
  • GET /doctor/{share_code} — simple built-in doctor dashboard (HTML) for demo.
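The audio_base64 field returned by POST /api/tts decodes straight to MP3 bytes. A small Python sketch; the response shape is from this README, while the sample payload below is fabricated stand-in data:

```python
# Sketch: turning a POST /api/tts response ({"audio_base64": "..."}) into an
# .mp3 file. Response shape is from this README; the payload below is fake.
import base64
import json

def save_tts_audio(response_text: str, path: str) -> int:
    """Decode audio_base64 from the backend's JSON response, write MP3 bytes, return size."""
    payload = json.loads(response_text)
    audio = base64.b64decode(payload["audio_base64"])
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)

# Fake stand-in response for illustration (real responses come from the backend):
fake = json.dumps({"audio_base64": base64.b64encode(b"ID3\x03fake-mp3-bytes").decode("ascii")})
n = save_tts_audio(fake, "/tmp/align_coach.mp3")
```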

For microphone recording in the app target, add NSMicrophoneUsageDescription to your Xcode app’s Info.plist (the Swift package cannot declare it).

Auth — Firebase (Spark / free)

The iOS app uses Firebase Authentication (email/password + Sign in with Apple). Typical sign-in is $0 on the Spark plan; avoid SMS phone auth if you want to stay free.

  1. Create a project in Firebase Console (Spark / no billing required for normal auth usage).
  2. Add an iOS app with the same bundle ID as your Xcode app.
  3. Download GoogleService-Info.plist and add it to your Xcode app target (not only the Swift package).
  4. Enable Authentication → Sign-in method: Email/Password and Apple.
  5. In Xcode → Signing & Capabilities → add Sign in with Apple for the app target.

AlignAppShell calls AlignFirebase.configure() on appear (starts Firebase + session listener). No custom server URL is required for login.

Optional: FastAPI JWT auth (legacy / self-hosted)

If you still run the Python API with SQLite users, set JWT_SECRET (and optional APPLE_CLIENT_ID) in .env. Endpoints: POST /api/auth/register, /login, /apple, GET /api/auth/me. The iOS LoginView uses Firebase, not these endpoints, unless you switch the client back.

iOS (AlignCore)

  • LoginView — Firebase email/password, register, and Sign in with Apple (iOS).
  • AuthSessionStore.shared — Firebase User; AlignAppShell shows Login → Onboarding → Tabs.
  • AlignBundledAPIKeysInfo.plist — AlignGeminiAPIKey, AlignElevenLabsAPIKey, AlignElevenLabsVoiceID, optional AlignElevenLabsTTSModel.
  • GeminiDirectCoachClient / GaitCoachClient — pose coaching with profile + user message + extracted doctor-PDF text; the Analysis tab speaks replies with ElevenLabs (backend URL first, then bundled keys, else system voice).
  • ElevenLabsDirectTTSClient — ElevenLabs REST from the app (same API the Python backend uses).
  • ElevenLabsTTSClient — calls POST /api/tts (ElevenLabs voice on the server).
  • OnDeviceSpeechSynthesizer — AVSpeechSynthesizer (no network).
  • TTSPlaygroundView — demo form: base URL, text, Speak with ElevenLabs.


In your app: import AlignCore and use AlignAppShell() as root, or compose your own UI with ElevenLabsTTSClient + TTSAudioPlayer.

Simulator: base URL http://127.0.0.1:8000. Device: use your Mac’s IP and ensure the server binds 0.0.0.0.
