Gen-AI-Chatbot

A voice assistant with AI text and image generation. It features speech recognition, Gemini AI integration, Stable Diffusion image creation, voice output, and MongoDB conversation storage, with a Streamlit interface for both voice and text interaction.

Voice Assistant

A powerful, feature-rich AI voice assistant with text and image generation capabilities built using Streamlit, Google's Gemini API, and Stable Diffusion.

✨ Features

- Voice Interaction: Speak commands and receive spoken responses
- Text Generation: Powered by Google's Gemini 1.5 Flash for intelligent conversations
- Image Generation: Create images from text descriptions using Stable Diffusion
- Conversation Memory: Save and retrieve past conversations with MongoDB integration
- Customizable Voice: Adjust speech rate and volume, and choose among the available voices
- Web Search: Look up information on Wikipedia and open websites such as YouTube or Google
- Multi-Modal: Type your queries or use voice commands interchangeably

🖼️ Screenshots

🚀 Getting Started

Prerequisites

- Python 3.8 or higher
- Google Gemini API key
- [Optional] Stable Diffusion API key for image generation
- [Optional] MongoDB connection string for conversation storage

Installation

Clone the repository:

```
git clone https://github.com/yourusername/voice-assistant.git
cd voice-assistant
```

Install required packages:
```
pip install -r requirements.txt
```

Create a .env file for your API keys (optional):

```
GEMINI_API_KEY=your_gemini_api_key
STABLE_DIFFUSION_API_KEY=your_stable_diffusion_api_key
MONGODB_CONNECTION_STRING=your_mongodb_connection_string
```
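
The README does not say how the application reads this file, so the following is only a minimal sketch, assuming the keys are loaded into the process environment at startup. `load_env_file` is a hypothetical helper written with the standard library; the project may instead use a package such as python-dotenv, or rely only on the sidebar inputs.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file and return them as a dict.

    Hypothetical helper: blank lines and lines starting with '#' are skipped,
    and variables already set in the environment are not overwritten.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values  # the .env file is optional
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()`, a value such as `os.environ.get("GEMINI_API_KEY")` could be used to pre-fill the corresponding sidebar field.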

Running the Application

Start the application with:

```
streamlit run genai_chatbot.py
```

📋 Usage

- Enter your Google Gemini API key in the sidebar
- [Optional] Add Stable Diffusion API key for image generation
- [Optional] Connect to MongoDB to save conversations
- Use the microphone button or type your query
- Try commands like:
  - "What's the time?"
  - "Open YouTube"
  - "Tell me about pandas"
  - "Generate image of a sunset over mountains"
  - "Draw a cat playing piano"
- Use audio controls to pause/resume/stop speech
- Click 🔊 next to any message to hear it again

🧠 API Integration

Google Gemini

This project uses the Google Generative AI API (Gemini 1.5 Flash model) to generate intelligent responses. You need to:

- Get an API key from Google AI for Developers (ai.google.dev)
- Enter the key in the sidebar of the application
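
As a rough illustration of what the key is used for, here is a minimal sketch of a Gemini call through the `google.generativeai` package listed under Dependencies. The helper names `create_gemini_model` and `ask_gemini` are hypothetical and are not taken from `genai_chatbot.py`; the fallback message is likewise an assumption.

```python
def create_gemini_model(api_key, model_name="gemini-1.5-flash"):
    """Configure the SDK and return a model object.

    Requires the google-generativeai package; it is imported lazily so that
    ask_gemini below can be exercised without it.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)


def ask_gemini(model, prompt):
    """Send one prompt and return the reply text, or a fallback message."""
    try:
        response = model.generate_content(prompt)
        return response.text.strip()
    except Exception as exc:  # network errors, quota errors, blocked responses
        return f"Sorry, I could not generate a response ({exc})."
```

With a valid key, `ask_gemini(create_gemini_model(api_key), "Tell me a joke")` would return the model's reply as plain text, ready to be displayed or passed to text-to-speech.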

Stable Diffusion

For image generation capabilities:

- Get an API key from Stability AI
- Enter the key in the sidebar under "Image Generation Setup"
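
The README does not state which Stability AI endpoint or model the application calls. The sketch below assumes the v1 text-to-image REST endpoint with an SDXL engine ID and the `requests` package from Dependencies; the URL, engine ID, parameter values, and function names are all assumptions to verify against the current Stability AI documentation.

```python
import base64

# Assumed endpoint and engine ID for Stability AI's v1 REST API; check the
# current Stability AI documentation before relying on either value.
STABILITY_URL = (
    "https://api.stability.ai/v1/generation/"
    "stable-diffusion-xl-1024-v1-0/text-to-image"
)


def build_image_request(prompt, width=1024, height=1024, steps=30):
    """Return the JSON body for a single-image text-to-image request."""
    return {
        "text_prompts": [{"text": prompt}],
        "cfg_scale": 7,
        "width": width,
        "height": height,
        "samples": 1,
        "steps": steps,
    }


def decode_first_image(response_json):
    """Extract the first base64-encoded image from the response as raw bytes."""
    artifacts = response_json.get("artifacts", [])
    if not artifacts:
        raise ValueError("No image returned by the API")
    return base64.b64decode(artifacts[0]["base64"])


def generate_image(prompt, api_key, timeout=60):
    """Call the API and return image bytes (requires the requests package)."""
    import requests  # lazy import: only needed when a request is actually sent

    response = requests.post(
        STABILITY_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        json=build_image_request(prompt),
        timeout=timeout,
    )
    response.raise_for_status()
    return decode_first_image(response.json())
```

The returned bytes could then be opened with Pillow via `Image.open(io.BytesIO(data))` and displayed with Streamlit's `st.image`.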

MongoDB (Optional)

To enable conversation history:

- Set up a MongoDB database
- Enter your MongoDB connection string in the sidebar
- Use the session management features to start new or load previous conversations
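
The storage schema is not documented here, so the following is a sketch under assumptions: one document per message, keyed by a `session_id` field, in a database and collection whose names (`voice_assistant`, `conversations`) are invented for illustration. It uses `pymongo`, which is listed under Dependencies.

```python
from datetime import datetime, timezone


def get_collection(connection_string, db_name="voice_assistant",
                   coll_name="conversations"):
    """Open the collection (requires pymongo). Names are assumptions."""
    from pymongo import MongoClient  # lazy import: only needed for a real DB

    client = MongoClient(connection_string, serverSelectionTimeoutMS=5000)
    return client[db_name][coll_name]


def save_message(collection, session_id, role, content):
    """Insert one chat message and return the stored document."""
    doc = {
        "session_id": session_id,
        "role": role,  # e.g. "user" or "assistant"
        "content": content,
        "timestamp": datetime.now(timezone.utc),
    }
    collection.insert_one(doc)
    return doc


def load_session(collection, session_id):
    """Return (role, content) pairs for one session, oldest first."""
    cursor = collection.find({"session_id": session_id}).sort("timestamp", 1)
    return [(doc["role"], doc["content"]) for doc in cursor]
```

Keeping the collection as a parameter means the same functions work against a local MongoDB instance or a hosted cluster, depending only on the connection string entered in the sidebar.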

📜 Available Commands

- General Queries: Ask any question to get an AI-generated response
- Time: "What's the time?" returns the current time
- Web Navigation: "Open YouTube/Google/StackOverflow"
- Information: "Tell me about [topic]" searches Wikipedia
- Image Generation: "Generate image of [description]" or "Draw [description]"
- System: "Goodbye" or "Exit" to end the session
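
One way the commands above could be told apart is with a small keyword router that runs before a query is sent to Gemini. This is a sketch, not the logic in `genai_chatbot.py`: the function name `route_command`, the matching rules, and the site URLs are assumptions.

```python
import re
from datetime import datetime

# Sites named in the README; the URLs are assumptions for illustration.
SITES = {
    "youtube": "https://www.youtube.com",
    "google": "https://www.google.com",
    "stackoverflow": "https://stackoverflow.com",
}


def route_command(query):
    """Classify a query and return an (action, argument) tuple."""
    text = query.strip().rstrip("?.!")
    if text.lower() in ("goodbye", "exit"):
        return ("exit", None)
    # "Generate image of ..." or "Draw ..." triggers image generation.
    match = re.match(r"(?:generate (?:an )?image of|draw)\s+(.+)", text, re.I)
    if match:
        return ("image", match.group(1))
    # "Tell me about ..." triggers a Wikipedia lookup.
    match = re.match(r"tell me about\s+(.+)", text, re.I)
    if match:
        return ("wikipedia", match.group(1))
    # "Open <site>" is handled only for the known sites above.
    match = re.match(r"open\s+(.+)", text, re.I)
    if match:
        site = match.group(1).lower().replace(" ", "")
        if site in SITES:
            return ("open", SITES[site])
    if re.search(r"\bwhat(?:'s| is) the time\b", text, re.I):
        return ("time", datetime.now().strftime("%H:%M"))
    # Anything else falls through to the language model.
    return ("chat", query)
```

The caller would then act on the result, for example passing an `"open"` URL to `webbrowser.open`, a `"wikipedia"` topic to `wikipedia.summary`, and a `"chat"` query to Gemini.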

🔧 Customization

Voice Settings

Adjust the following in the sidebar:

- Voice selection (depends on available system voices)
- Speech rate (100-250 words per minute)
- Volume (0.0-1.0)
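
These settings correspond to the `rate`, `volume`, and `voice` properties of a `pyttsx3` engine. The sketch below shows how sidebar values might be applied, clamped to the ranges listed above; `apply_voice_settings` and its defaults are hypothetical rather than taken from the application.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def apply_voice_settings(engine, rate=175, volume=1.0, voice_index=0):
    """Apply rate, volume, and voice choice to a pyttsx3 engine."""
    engine.setProperty("rate", int(clamp(rate, 100, 250)))
    engine.setProperty("volume", float(clamp(volume, 0.0, 1.0)))
    voices = engine.getProperty("voices")  # installed system voices
    if voices:
        index = clamp(voice_index, 0, len(voices) - 1)
        engine.setProperty("voice", voices[index].id)
    return engine
```

A typical call would be `engine = apply_voice_settings(pyttsx3.init(), rate=150, volume=0.8)`, followed by `engine.say("Hello")` and `engine.runAndWait()`.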

Session Management

With MongoDB connected:

- Create new conversation sessions
- Load previous conversation sessions
- View current session ID
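
How session IDs are generated and listed is not described, so this is a sketch under assumptions: IDs are random UUIDs, and each stored message carries a `session_id` field that can be enumerated with pymongo's `distinct`. Both function names are hypothetical.

```python
import uuid


def new_session_id():
    """Return a fresh random identifier for a new conversation session."""
    return uuid.uuid4().hex


def list_session_ids(collection):
    """Return the distinct session IDs stored in a pymongo collection."""
    return sorted(collection.distinct("session_id"))
```

In a Streamlit app, the current ID would typically be kept in `st.session_state` so that it survives reruns, and the list of stored IDs could populate a sidebar selector for loading earlier conversations.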

📦 Dependencies

- streamlit: Web application framework
- pyttsx3: Text-to-speech conversion
- speech_recognition (PyPI package SpeechRecognition): Speech-to-text conversion
- google.generativeai (PyPI package google-generativeai): Gemini AI interface
- requests: API interaction for image generation
- pymongo: MongoDB database interaction
- Pillow: Image processing
- wikipedia: Information retrieval from Wikipedia

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request

🙏 Acknowledgments

- Google for the Gemini API
- Stability AI for Stable Diffusion
- Streamlit for the awesome web framework
- All open-source contributors whose libraries make this project possible
