Toltally-suck-at-code/LiveAPI

LiveAPI

Currently testing some Live API stuff.

Running the LiveTest Application

Prerequisites

  • Python 3.8 or higher
  • Gemini API key (get one from Google AI Studio)
  • Working microphone
  • Webcam (for camera mode)
  • Speakers or headphones

Installation Steps

  1. Clone the repository:
git clone https://github.com/Toltally-suck-at-code/LiveAPI.git
cd LiveAPI
  2. Create and activate a virtual environment:

For Mac/Linux:

# Create virtual environment
python -m venv venv

# Activate virtual environment
source venv/bin/activate

For Windows (Command Prompt):

# Create virtual environment
python -m venv venv

# Activate virtual environment
venv\Scripts\activate.bat

For Windows (PowerShell):

# Create virtual environment
python -m venv venv

# Activate virtual environment
.\venv\Scripts\Activate.ps1
  3. Install system dependencies:

For macOS:

# Install PortAudio using Homebrew
brew install portaudio

For Ubuntu/Debian Linux:

sudo apt-get install python3-pyaudio portaudio19-dev

For Windows:

# No additional system dependencies required
  4. Install Python dependencies:
pip install -r requirements.txt
  5. Set up your Gemini API key:

For Mac/Linux:

export GEMINI_API_KEY='your-api-key-here'

For Windows (Command Prompt):

set GEMINI_API_KEY=your-api-key-here

For Windows (PowerShell):

$env:GEMINI_API_KEY='your-api-key-here'
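To confirm the key is visible to Python before starting the app, a check along these lines can help. This is a minimal sketch, not code from this repository: the helper name `load_api_key` is hypothetical, and only the `GEMINI_API_KEY` variable name comes from the steps above.

```python
import os


def load_api_key(env=os.environ):
    """Return the Gemini API key from the environment (hypothetical helper).

    Raises RuntimeError with a hint when the variable is missing or empty.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; set it in this terminal before starting the app."
        )
    return key


if __name__ == "__main__":
    try:
        load_api_key()
        print("GEMINI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Environment variables set with export, set, or $env: last only for the current terminal session, so run the check in the same terminal you will start the app from.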

Running the Application

python app.py

The older LiveTest script is kept as a backup and can be run in three different modes:

  1. Camera mode (default):
python Old/LiveTest.py --mode camera
  2. Screen capture mode:
python Old/LiveTest.py --mode screen
  3. Audio only mode:
python Old/LiveTest.py --mode none
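For readers curious how a flag like --mode is typically handled, here is a minimal sketch using Python's standard argparse module. It is an illustration under assumptions, not the actual contents of Old/LiveTest.py: the function name `parse_args` is hypothetical, and only the three mode values and the camera default come from the list above.

```python
import argparse


def parse_args(argv=None):
    """Parse a --mode flag with the three values described above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of mode selection")
    parser.add_argument(
        "--mode",
        choices=["camera", "screen", "none"],  # none = audio only
        default="camera",                      # camera is the documented default
        help="video source to stream alongside audio",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Selected mode: {args.mode}")
```

With `choices` set, argparse rejects any other value and prints the accepted options, which is a convenient way to catch typos in the mode name.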

Features

  • Real-time audio interaction with Gemini AI
  • Optional video streaming from camera or screen
  • Push-to-talk functionality
  • Voice responses from the AI

Troubleshooting

  • If you encounter audio issues, check your microphone and speaker settings
  • Make sure your webcam is working if using camera mode
  • Verify that your Gemini API key is correctly set in the environment variables
  • If you get dependency errors, try running pip install -r requirements.txt again
  • If you get "command not found" errors, make sure your virtual environment is activated
  • To deactivate the virtual environment when you're done, simply type deactivate in your terminal
  • For macOS users: If you get PortAudio errors, make sure you've installed it using brew install portaudio
  • For Linux users: If you get PortAudio errors, make sure you've installed the required packages using apt-get
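Two of the checks above (virtual environment activated, dependencies installed) can be automated with the standard library. The following is a rough sketch, not part of this repository: the function name `diagnose` is hypothetical, and `pyaudio` is only an assumed example of a module that requirements.txt might install.

```python
import importlib.util
import sys


def diagnose(modules=("pyaudio",)):
    """Return a list of human-readable setup problems (empty if none were found).

    `modules` holds top-level import names to look for; the default is an
    assumption, so adjust it to match requirements.txt.
    """
    problems = []
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    if sys.prefix == sys.base_prefix:
        problems.append("virtual environment is not activated")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    for problem in diagnose() or ["no problems found"]:
        print(problem)
```

A "missing module" result usually means pip install -r requirements.txt was run outside the virtual environment, or that PortAudio was not installed before PyAudio was built.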

Push-to-Talk Feature

Overview

This app now includes a push-to-talk (PTT) feature to prevent the AI from hearing its own responses through your laptop speakers.

How to Use

Mouse/Touch Control

  • Click and hold the microphone button in the UI to activate voice input
  • Release the button to stop transmitting audio

Keyboard Control

  • Press and hold Space bar to activate voice input
  • Release Space bar to stop transmitting audio

Visual Indicators

  1. Microphone Button States:

    • Gray = Idle (not transmitting)
    • Green with pulse effect = Active (transmitting audio)
    • Faded/Disabled = No live session running
  2. Listening Indicator:

    • A red "Listening..." indicator appears in the bottom-right of the video feed when PTT is active
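The indicator rules above reduce to two inputs: whether a live session is running and whether PTT is active. The sketch below expresses that mapping in Python for clarity. It is an illustration only; the real UI logic lives in the browser, and the names `mic_button_state` and `listening_indicator_visible` are hypothetical.

```python
def mic_button_state(session_running: bool, ptt_active: bool) -> str:
    """Map session and PTT status to the button appearance described above."""
    if not session_running:
        return "disabled"  # faded: no live session running
    if ptt_active:
        return "active"    # green with pulse effect: transmitting audio
    return "idle"          # gray: not transmitting


def listening_indicator_visible(session_running: bool, ptt_active: bool) -> bool:
    """The red "Listening..." indicator shows only while PTT is active in a session."""
    return session_running and ptt_active
```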

Important Notes

  • The microphone will only transmit audio while PTT is active
  • PTT is automatically disabled when:
    • No live session is running
    • The browser window loses focus
    • You navigate away from the page
  • The feature works on both desktop and mobile devices
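The rules in these notes can be pictured as a small gate that sits between the microphone and the network. The following is a minimal sketch of that idea, not the app's actual implementation: the class `PushToTalkGate` and all of its method names are hypothetical.

```python
class PushToTalkGate:
    """Forward audio chunks only while push-to-talk is held (illustrative sketch)."""

    def __init__(self):
        self.session_running = False
        self.active = False

    def start_session(self):
        self.session_running = True

    def end_session(self):
        # Ending the session also releases PTT.
        self.session_running = False
        self.active = False

    def press(self):
        # Pressing has no effect unless a live session is running.
        if self.session_running:
            self.active = True

    def release(self):
        self.active = False

    def on_focus_lost(self):
        # Losing window focus or leaving the page behaves like a release.
        self.active = False

    def filter(self, chunk):
        """Return the chunk if it should be transmitted, otherwise None."""
        return chunk if self.active else None


if __name__ == "__main__":
    gate = PushToTalkGate()
    gate.start_session()
    gate.press()
    print(gate.filter(b"audio"))  # forwarded while held
    gate.on_focus_lost()
    print(gate.filter(b"audio"))  # dropped after focus is lost
```

Treating focus loss as a release avoids a stuck "transmitting" state when the key-up or mouse-up event never reaches the page.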

Benefits

  • Prevents audio feedback loops
  • Gives you complete control over when to speak
  • Reduces background noise transmission
  • Saves bandwidth by only sending audio when needed
