Discord chat bot powered by your own self-hosted LLM and image generation.
A 7B LLaMA model and Stable Diffusion (512x512) can comfortably fit on a 12GB RTX 3060, using ~9GB of VRAM.
Disclaimer: I'm not the most experienced coder.
https://github.com/oobabooga/text-generation-webui (Recommended WizardLM-7B-Uncensored 4bit)
https://github.com/AUTOMATIC1111/stable-diffusion-webui
https://azure.microsoft.com/en-us/products/cognitive-services/text-to-speech/
Note: Still very much WIP and will break as I make commits.
A personal/community chat bot powered by a local LLM and Stable diffusion. The bot can have a customized personality config and send images.
Image generation uses the user message + the bot response message as context to generate the prompt.
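As a rough illustration, the combined context could be assembled into an instruction for the LLM like this. This is a hypothetical sketch; the actual wording and function name are assumptions, not the bot's real implementation.

```python
# Hypothetical sketch: build the LLM request used to derive an image prompt
# from the triggering user message plus the bot's reply.
def build_image_prompt_request(user_message: str, bot_response: str) -> str:
    return (
        "Describe an image matching this conversation as a short "
        "Stable Diffusion prompt.\n"
        f"User: {user_message}\n"
        f"Bot: {bot_response}\n"
        "Image prompt:"
    )
```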
Now supports voice generation when chatting though text.
To trigger image generation, the user's message must contain one word from the first list and one from the second:
(send|draw|show|display|generate|give)
(image|picture|photo|drawing|art|artwork)
Download the latest working Release here:
https://github.com/Dolyfin/LocalDiscordAIChatBot/releases
Alternatively, clone the repo, which may be broken:
git clone https://github.com/Dolyfin/LocalDiscordAIChatBot
- Run `startbot.bat`
- Add your Discord bot token, API address, and port to the `.env` file.
- Run `startbot.bat` again.
- Use `/editconfig chat_channel [channel id here]` to select a chat channel.
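A `.env` file of that shape might look like the following. The key names here are illustrative assumptions; check the `.env` shipped with the release for the exact names.

```
# Hypothetical .env layout; key names may differ in your release.
DISCORD_TOKEN=your-bot-token-here
TEXTGEN_API_HOST=127.0.0.1
TEXTGEN_API_PORT=5000
SD_API_HOST=127.0.0.1
SD_API_PORT=7860
```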
*Optional for voice: manually download ffmpeg.exe and place it in the root folder to allow voice playback.
- cd to the root folder `*/LocalDiscordAIChatBot/`
- Run `git pull https://github.com/Dolyfin/LocalDiscordAIChatBot`
Available options:
"chat_channel": integer
"persona": string
"chat_enabled": boolean
"message_delay": integer
"message_reply": boolean
"message_reply_mention": boolean
"mention_reply": boolean
"image_enabled": boolean
"filter_enabled": boolean
Clears chat history for current channel.
Connects to the current voice channel.
Disconnects from the voice channel.
`filter.txt`
List of words to filter from the image prompt, one word per line. You can start with: "https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/blob/master/en"
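A minimal sketch of how a one-word-per-line filter file could be loaded and applied to an image prompt before it is sent to Stable Diffusion. Function names are assumptions for illustration.

```python
import re

def load_filter_words(text: str) -> list[str]:
    """Parse filter.txt contents: one word per line, blank lines ignored."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]

def filter_prompt(prompt: str, banned: list[str]) -> str:
    """Remove banned words (whole-word, case-insensitive) from the prompt."""
    for word in banned:
        prompt = re.sub(r"\b" + re.escape(word) + r"\b", "", prompt,
                        flags=re.IGNORECASE)
    # Collapse whitespace left behind by removed words.
    return " ".join(prompt.split())
```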
Copy the `persona/example.json` template.
{
"name": "Alice",
"system_message": "Current time: {{time}}, {{date}}.\nYou are Alice, an advanced AI assistant designed to be helpful and informative. Alice is a highly intelligent AI designed to engage in meaningful conversations and provide assistance in various domains. Alice has a sub system that automatically sends images.\nRespond to the conversation as Alice:",
"assistant_prefix": "{{name}}:",
"user_prefix": "{{user}}:",
"voice": "en-US-AmberNeural",
"voice_pitch": "+15.0%"
}
Create a new `persona_name.json` in the `persona/` folder.
- `{{name}}`: Name of the persona
- `{{user}}`: Discord name of the user
- `{{time}}`: 12-hour time in the format "09:30 PM"
- `{{date}}`: Format: "June 19, 2023"
*Placeholder text is applied everywhere in the chat, including user messages and bot replies, not just in the prompt.
- Chat and response
- Message history
- Personality config
- Should recognize different users in conversation
- Placeholder text for current time and date.
- Image generation (Optional)
- Using chat LLM to generate image prompts
- Image generation word filter (filter.txt)
- Text to speech (Optional)
- Uses non-local MS Azure Speech as a prototype.
- Local TTS options aren't good enough or fast enough yet.
- Implement chat experience when using @ mentions of the bot outside of chat channel.
- Fallback TTS Voice generation without Azure. Using system tts api.
- Speech to text (Whisper?)
- Optional openai api
- Message grouping. Waiting x seconds to respond to all messages in that time instead of every message.
- Casual conversation. A smart way to integrate the bot so it joins and chats when prompted. (Need to figure out how to do this well)
- macOS support. I just got a MacBook.
- Local TTS. Silero seems to be good. Looking to implement https://github.com/ouoertheo/silero-api-server
- More stuff I didn't write down
Models like WizardLM, or models using the Vicuna 1.1 style, seem best for chat generation. I originally used TheBloke/WizardLM-7B-uncensored-GPTQ (4-bit) with good success with the included example.json persona.
Tested models that should work: WizardLM-7B-uncensored
WizardLM-7B-V1.0
Wizard-Vicuna-13B
WizardLM-13B-Uncensored
A 13B LLaMA model + Stable Diffusion (512x512) just barely fits in 12GB of VRAM.
7B: 9.5 tokens/s
13B: 5.5 tokens/s
*Since switching to exllama, speed has been 2x+ faster.
(Image Prompt: GreenHaus - A logotype featuring a leaf or plant design with the name "GreenHaus" written beneath in a clean, contemporary font.)