Add multi-provider AI support and improve microphone functionality #20
Merged
Adjusted prompts so AI is more likely to act in character and correctly choose a speaker
…vider proxy compatibility (https://github.com/Mirrowel/LLM-API-Key-Proxy)

This commit introduces a major overhaul to the AI and microphone systems, adding a proxy provider for AI requests and a flexible, multi-provider microphone client.

**AI Backend:**
- Adds a new `proxy` AI provider that routes requests to a local LLM proxy server. This enables the use of models like Gemini for dialogue and transcription.
- Introduces a "Reasoning Level" setting in the MCM, allowing users to control the reasoning effort of the proxy model.
- Refactors API key handling to be more robust. Keys are now loaded on-demand from files or environment variables, with improved error messages if a key is not found.

**Microphone Client:**
- Replaces the single-backend microphone script with a new system supporting multiple transcription providers: Gemini (via proxy), local Whisper, and OpenAI Whisper API.
- Adds `launch_mic.bat`, a new launcher that allows users to easily select their preferred transcription provider.
- The Python client (`main.py`) is now modular, dynamically loading the chosen provider at runtime.
- Adds a `silence_grace_period` to the recorder to prevent it from stopping prematurely at the beginning of speech.

**Configuration:**
- Updates the MCM with new options for the AI model (`proxy`), voice provider, and reasoning level.
- Changes default custom models to newer Gemini versions.
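
For readers who want a picture of the on-demand key loading described under **AI Backend**, here is a minimal Python sketch. It is not taken from this PR: the function name `load_api_key`, the exception `ApiKeyNotFoundError`, and the lookup order (environment variable first, then file) are all assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Optional


class ApiKeyNotFoundError(RuntimeError):
    """Hypothetical error type raised when no key can be located."""


def load_api_key(env_var: str, key_file: Optional[Path] = None) -> str:
    """Look up a key at call time (assumed order: environment, then file)."""
    # Reading on demand means a key added after startup is still picked up.
    value = os.environ.get(env_var, "").strip()
    if value:
        return value

    # Fall back to a plain-text key file if one was given and exists.
    if key_file is not None and key_file.is_file():
        value = key_file.read_text(encoding="utf-8").strip()
        if value:
            return value

    # Name every location that was searched so the user knows what to fix.
    searched = f"environment variable {env_var}"
    if key_file is not None:
        searched += f" or file {key_file}"
    raise ApiKeyNotFoundError(f"No API key found in {searched}.")
```

Deferring the lookup until a request is actually made lets the error message point at the specific provider whose key is missing, instead of failing at startup for providers the user never selected.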
…le, update launch script for missing executable, and clean up build scripts
… XML strings for English and add Russian localization
…UTPUT in PowerShell
…w for release upload
… in build workflow
…ian localization files
…ed in the build workflow
…tructions and API key management
…dels and Nvidia API limitations
Introduce support for multiple AI models and transcription providers, including Gemini via proxy, local Whisper, and OpenAI Whisper API. Enhance error handling, configuration options, and user experience with a new microphone launcher. Update documentation for clarity and improved setup instructions.
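
As a rough illustration of choosing a transcription provider by name at runtime, the sketch below uses a simple registry of factories. The provider keys, the `Transcriber` protocol, and the `EchoTranscriber` stand-in are all hypothetical; the actual client in this PR may load its providers differently (for example, by importing modules dynamically).

```python
from typing import Callable, Dict, Protocol


class Transcriber(Protocol):
    """Assumed minimal interface every provider would implement."""

    def transcribe(self, audio: bytes) -> str: ...


class EchoTranscriber:
    """Stand-in provider that only reports how much audio it received."""

    def __init__(self, label: str) -> None:
        self.label = label

    def transcribe(self, audio: bytes) -> str:
        return f"[{self.label}] {len(audio)} bytes"


# Hypothetical provider names mapped to factories; a real backend would be
# constructed here instead of the stand-in.
PROVIDERS: Dict[str, Callable[[], Transcriber]] = {
    "gemini_proxy": lambda: EchoTranscriber("gemini_proxy"),
    "whisper_local": lambda: EchoTranscriber("whisper_local"),
    "whisper_api": lambda: EchoTranscriber("whisper_api"),
}


def get_transcriber(name: str) -> Transcriber:
    """Build the provider selected by name, or fail with a helpful message."""
    try:
        factory = PROVIDERS[name]
    except KeyError:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(
            f"Unknown provider {name!r}; expected one of: {known}"
        ) from None
    # The provider is only constructed once it has actually been chosen.
    return factory()
```

Because each factory runs only when its name is selected, a heavy dependency such as a local Whisper model would not need to be installed or loaded unless the user picks that provider.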
Blah blah, you know what it is: the Mirrowel branch, with the proxy and all the fun. And a lot of fucking guides. Fuck, I am tired of writing this shit.