Learn how multimodal AI merges text, image, and audio for smarter models
Updated Jan 21, 2025 - Jupyter Notebook
Neocortex Unity SDK for Smart NPCs and Virtual Assistants
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.
Enterprise-ready solution leveraging multimodal Generative AI (Gen AI) to enhance existing or new applications beyond text—implementing RAG, image classification, video analysis, and advanced image embeddings.
#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2024! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.
VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.
🔥 The first survey on bridging VLMs and synthetic data: 125 papers read and the full survey written in just 10 days.
A hands-on collection of experimental AI mini-projects exploring large language models, multimodal reasoning, retrieval-augmented generation (RAG), reinforcement learning, and real-world applications in finance, eKYC, and voice interfaces.
A demo multimodal AI chat application built with Streamlit and Google's Gemini model. Features include: secure Google OAuth, persistent data storage with Cloud SQL (PostgreSQL), and intelligent function calling. Includes a persona-based newsletter engine to deliver personalized insights.
Gallery showcasing AI-generated images and videos created using the Nova model
This repository contains code for fine-tuning Google's PaliGemma vision-language model on the Flickr8k dataset for image captioning tasks
Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.
Lab website
AI F&B Service & Menu Training Assistant powered by Gemini & Google Cloud
Apsara 2.5: Evolution from Langchain to Google Gemini API with multimodal capabilities, URL context analysis, and integrated tools for chat, voice, and visual interactions.
This course teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.
Multi-modal AI system for diagnosing respiratory diseases using Vision Transformers and BERT.
AI-based product condition detection using BLIP-2 + FastAPI + Phi-4 (Ollama)
🤖🤖 Gemini-Powered AI Chatbot 🤖🤖 This is a Streamlit-based AI chatbot powered by Google Gemini models (1.5 Pro & 1.5 Flash). The chatbot supports both text and image input, making it capable of handling multimodal queries. It's perfect for experimenting with Google's generative AI capabilities through a clean, interactive web interface.