multimodal-ai

Here are 37 public repositories matching this topic...

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

  • Updated Jul 7, 2025
  • Python

#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2024! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.

  • Updated Feb 16, 2025
  • JavaScript

Mai is an emotionally intelligent, voice-enabled AI assistant built with FastAPI, Together.ai LLMs, memory persistence via ChromaDB, and real-time sentiment analysis. Designed to feel alive, empathetic, and human-like, Mai blends the charm of a flirty cyberpunk companion with the power of modern multimodal AI.

  • Updated Jun 28, 2025
  • Python

Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.

  • Updated Jun 25, 2025
  • Jupyter Notebook

This repository contains code for fine-tuning Google's PaliGemma vision-language model on the Flickr8k dataset for image captioning tasks.

  • Updated May 25, 2025
  • Jupyter Notebook
