multimodal-ai

Here are 27 public repositories matching this topic...

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

  • Updated Jun 16, 2025
  • Python

#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2024! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.

  • Updated Feb 16, 2025
  • JavaScript

This repository contains code for fine-tuning Google's PaliGemma vision-language model on the Flickr8k dataset for image captioning.

  • Updated May 25, 2025
  • Jupyter Notebook

Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.

  • Updated Jun 12, 2025
  • Jupyter Notebook

Teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.

  • Updated Sep 2, 2024

🤖🤖 Gemini-Powered AI Chatbot 🤖🤖 This is a Streamlit-based AI chatbot powered by Google Gemini models (1.5 Pro & 1.5 Flash). The chatbot supports both text and image input, making it capable of handling multimodal queries. It's perfect for experimenting with Google's generative AI capabilities through a clean, interactive web interface.

  • Updated Apr 18, 2025
  • Jupyter Notebook
