Autoshow

autoshow logo

End-to-end scripting workflow to automatically generate show notes from audio/video transcripts with Whisper.cpp, Llama.cpp, yt-dlp, and Commander.js.

Outline

  • Project Overview
  • Setup
  • Run Autoshow Node Scripts
  • Project Structure

Project Overview

Autoshow automates the processing of audio and video content from various sources, including YouTube videos, playlists, podcast RSS feeds, and local media files. It performs transcription, summarization, and chapter generation using different language models (LLMs) and transcription services.

The Autoshow workflow includes the following steps:

  1. The user provides input (video URL, playlist, RSS feed, or local file).
  2. The system downloads the audio (if necessary).
  3. Transcription is performed using the selected service.
  4. The transcript is processed by the chosen LLM to generate a summary and chapters.
  5. Results are saved in markdown format with front matter.
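
In code, these stages chain together in sequence. The following is a minimal sketch of that flow; the module names mirror the files in src/utils described under Project Structure, but the signatures and return values are assumptions for illustration, not the actual implementation:

import { generateMarkdown } from './utils/generateMarkdown.js'
import { downloadAudio } from './utils/downloadAudio.js'
import { runTranscription } from './utils/runTranscription.js'
import { runLLM } from './utils/runLLM.js'
import { cleanUpFiles } from './utils/cleanUpFiles.js'

// Illustrative pipeline only: these modules exist in src/utils, but the
// argument shapes and return values below are assumed for the sketch.
async function processInput(url, transcriptOption, llmOption) {
  const frontMatter = await generateMarkdown(url)        // initial markdown + metadata
  const audioPath = await downloadAudio(url)             // yt-dlp + ffmpeg
  const transcript = await runTranscription(audioPath, transcriptOption)
  await runLLM(frontMatter, transcript, llmOption)       // summary + chapters
  await cleanUpFiles(audioPath)                          // remove temporary files
}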

Key Features

  • Support for multiple input types (YouTube links, RSS feeds, local video and audio files)
  • Integration with various LLMs (ChatGPT, Claude, Cohere, Mistral) and transcription services (Whisper.cpp, Deepgram, AssemblyAI)
  • Local LLM support (Llama 3.1, Phi 3, Qwen 2, Mistral)
  • Customizable prompts for generating titles, summaries, chapter titles/descriptions, key takeaways, and questions to test comprehension
  • Markdown output with metadata and formatted content (a representative example follows this list)
  • Command-line interface for easy usage
  • WIP: Node.js server and React frontend
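
Each output file begins with YAML front matter followed by the generated sections. The field names below are representative placeholders rather than the actual output of generateMarkdown.js:

---
showLink: "https://www.youtube.com/watch?v=jKB0EltG9Jo"
title: "Example Episode Title"
publishDate: "2024-01-01"
---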

See docs/roadmap.md for details about current development work and potential future capabilities.

Setup

Copy Environment Variable File

npm run autoshow expects a .env file even for commands that don't require API keys. You can create a blank .env file or use the default provided:

cp .env.example .env

This sets a default model for Llama.cpp, which ensures --llama doesn't fail if you haven't downloaded a model yet. Before running local LLM inference with Llama.cpp, the callLlama function checks for a model and downloads one if none is detected.
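
A rough sketch of that guard in Node.js; every name here (the LLAMA_MODEL env variable, the models directory, the download URL) is a placeholder for illustration, not autoshow's actual callLlama implementation:

import { existsSync } from 'node:fs'
import { exec } from 'node:child_process'
import { promisify } from 'node:util'

const execAsync = promisify(exec)

// Placeholder names throughout: the env variable, path, and URL are
// assumptions, not the project's actual configuration.
async function ensureLlamaModel() {
  const model = process.env.LLAMA_MODEL // default provided by .env.example
  const modelPath = `./models/${model}`
  if (!existsSync(modelPath)) {
    // Fetch a model if none is present locally (URL is a stand-in).
    await execAsync(`curl -L -o "${modelPath}" "https://example.com/models/${model}"`)
  }
  return modelPath
}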

Install Local Dependencies

Install yt-dlp, ffmpeg, and llama.cpp, then run npm i to install the Node.js dependencies.

brew install yt-dlp ffmpeg llama.cpp
npm i

Clone Whisper Repo

Run the following commands to clone whisper.cpp, download the base model, and compile the project:

git clone https://github.com/ggerganov/whisper.cpp.git && \
  bash ./whisper.cpp/models/download-ggml-model.sh base && \
  make -C whisper.cpp

Replace base with large-v2 for the largest model, medium for a medium-sized model, or tiny for the smallest model.
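
For example, to fetch the large model with the same download script used above:

bash ./whisper.cpp/models/download-ggml-model.sh large-v2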

Run Autoshow Node Scripts

Run on a single YouTube video:

npm run autoshow -- --video "https://www.youtube.com/watch?v=jKB0EltG9Jo"
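
Transcription and LLM flags are meant to compose with the input flag. For instance, to summarize the same video with local Llama.cpp inference (assuming --llama combines with --video this way; docs/examples.md is the authoritative reference):

npm run autoshow -- --video "https://www.youtube.com/watch?v=jKB0EltG9Jo" --llama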

Example commands for all available CLI options can be found in docs/examples.md.

Project Structure

  • Main Entry Point (src/autoshow.js)

    • Defines the command-line interface using Commander.js (a minimal sketch follows this list)
    • Handles various input options (video, playlist, URLs, file, RSS)
    • Manages LLM and transcription options
  • Command Processors (src/commands)

    • processVideo.js: Handles single YouTube video processing
    • processPlaylist.js: Processes all videos in a YouTube playlist
    • processURLs.js: Processes videos from a list of URLs in a file
    • processFile.js: Handles local audio/video file processing
    • processRSS.js: Processes podcast RSS feeds
  • Utility Functions (src/utils)

    • downloadAudio.js: Downloads audio from YouTube videos
    • runTranscription.js: Manages the transcription process
    • runLLM.js: Handles LLM processing for summarization and chapter generation
    • generateMarkdown.js: Creates initial markdown files with metadata
    • cleanUpFiles.js: Removes temporary files after processing
  • Transcription Services (src/transcription)

    • whisper.js: Uses Whisper.cpp for transcription
    • deepgram.js: Integrates Deepgram transcription service
    • assembly.js: Integrates AssemblyAI transcription service
  • Language Models (src/llms)

    • chatgpt.js: Integrates OpenAI's GPT models
    • claude.js: Integrates Anthropic's Claude models
    • cohere.js: Integrates Cohere's language models
    • mistral.js: Integrates Mistral AI's language models
    • octo.js: Integrates OctoAI's language models
    • llama.js: Integrates Llama models (local inference)
    • ollama.js: Integrates Ollama for local model inference
    • prompt.js: Defines the prompt structure for summarization and chapter generation
  • Web Interface (web) and Server (server)

    • Web interface built with React and Vite
    • Node.js server that handles backend operations for the web interface
    • Note: currently a proof of concept with very little functionality. Expect these to catch up with the CLI starting in Q4 2024
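
As referenced above, src/autoshow.js wires these pieces together with Commander.js. Here is a minimal sketch of how that wiring might look; --video is the only option confirmed in this README, so the remaining option names are assumptions modeled on the file names in src/commands:

import { Command } from 'commander'
import { processVideo } from './commands/processVideo.js'

const program = new Command()

program
  .name('autoshow')
  .description('Generate show notes from audio/video transcripts')
  // Only --video is confirmed above; the rest are illustrative guesses.
  .option('--video <url>', 'process a single YouTube video')
  .option('--playlist <url>', 'process every video in a YouTube playlist')
  .option('--urls <file>', 'process a list of URLs from a file')
  .option('--file <path>', 'process a local audio/video file')
  .option('--rss <url>', 'process a podcast RSS feed')

program.parse(process.argv)
const options = program.opts()

if (options.video) {
  await processVideo(options.video)
}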
