Skip to content

🎥Extract complete transcripts with precise timestamps ⏱️ and comprehensive video metadata from any YouTube video —> ⚡Fast, reliable, and ready to use.

Notifications You must be signed in to change notification settings

FlowExtractAPI/youtube-transcript-metadata-extractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 

Repository files navigation

🎬 YouTube Transcript & Metadata Extractor

Extract YouTube transcripts, subtitles, and complete video metadata in seconds - no manual work, no copy-paste, just pure automated data extraction.

🧾 YouTube Extractor (Transcripts + Metadata)

📺Watch Demo


Extract transcripts from any YouTube video in 3 clicks

Paste URL → Click Start → Download data. That's it.


Why 1,000+ marketers, researchers, and developers choose this tool

What You Get Why It Matters
5-second extraction Process 100 videos while your coffee brews
🎯 100% accurate transcripts Official YouTube data, not AI guesses
📊 Complete metadata Views, likes, channel info, thumbnails - everything
💰 Free tier available Test with 10 videos before paying anything
🔄 Never extract twice Smart caching saves time and money
📥 Export anywhere JSON, CSV, Excel, or direct API integration

How It Works

Step 1: Paste Your URLs

{
  "youtubeUrl": [
    { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
  ]
}

Step 2: Configure (Optional)

  • Clean transcript: Remove "um", "uh", filler words
  • Include timestamps: Get second-by-second text timing
  • Extract comments: Pull top comments with replies (NEW!)

Step 3: Get Your Data

{
  "videoId": "dQw4w9WgXcQ",
  "Video_title": "Rick Astley - Never Gonna Give You Up",
  "Views": "1,234,567,890 views",
  "transcriptText": "We're no strangers to love...",
  "channel": {
    "name": "Rick Astley",
    "subscribers": "3.2M subscribers"
  }
}

What You Can Do With This Data

📝 Content Creators & Marketers

  • Repurpose videos into blogs - Extract transcript, feed to AI, generate 5 articles
  • Create social media posts - Pull key quotes and timestamps
  • Generate SEO-optimized content - Turn video content into searchable text
  • Subtitle generation - Export timestamps for perfect captions

Real example: Marketing agency extracts 100 competitor videos weekly, identifies trending topics, creates counter-content. Result: 40% traffic increase in 3 months.

🔬 Researchers & Academics

  • Analyze lecture content at scale - Process entire course catalogs
  • Study speaking patterns - Extract timestamps for linguistic analysis
  • Sentiment analysis - Feed transcripts into NLP models
  • Citation extraction - Find and verify sources mentioned in videos

Real example: PhD student analyzed 500 TED Talks in 2 days instead of 6 months, discovered key patterns in successful presentations.

🤖 Developers & AI Teams

  • Train chatbots - Use video transcripts as training data
  • Build recommendation engines - Analyze content similarity
  • Automated workflows - Trigger actions based on new videos
  • Knowledge base creation - Convert video libraries into searchable databases

Real example: SaaS company built AI support bot using 1,000 tutorial video transcripts. Result: 60% reduction in support tickets.

📊 Business Intelligence

  • Competitor monitoring - Track what competitors are saying
  • Brand sentiment tracking - Analyze mentions across video content
  • Market research - Extract insights from industry thought leaders
  • Product feedback analysis - Process customer testimonial videos

Real example: E-commerce brand analyzes 200 review videos monthly, identifies pain points, improves product design. Result: 25% reduction in returns.


Complete Feature List

Core Extraction

  • ✅ Full video transcripts with cleaned text
  • ✅ Precise second-by-second timestamps
  • ✅ Video metadata (title, views, likes, date)
  • ✅ Channel information (name, subscribers, verification)
  • ✅ Thumbnail URLs (multiple resolutions)
  • ✅ Video descriptions and tags
  • ✅ Duration and word count analytics

Smart Processing

  • 3 cleaning levels: None (raw), Mild (remove "um"/"uh"), Aggressive (conversational cleanup)
  • Automatic deduplication: Never process the same video twice
  • Batch processing: Handle 1 to 1000+ videos
  • Error recovery: Auto-retry on temporary failures
  • Multiple formats: All YouTube URL types (standard, shorts, live, youtu.be)

NEW: Comment Extraction

  • ✅ Top comments with like counts
  • ✅ Comment replies (configurable depth)
  • ✅ Sort by "top" or "newest"
  • ✅ Real-time streaming extraction
  • ✅ Automatic resumption if interrupted

Export & Integration

  • ✅ JSON, CSV, Excel download
  • ✅ Direct API access
  • ✅ Webhook integration
  • ✅ Pre-configured data views
  • ✅ Apify platform integration

Input Configuration

Basic Example (Just Transcripts)

{
  "youtubeUrl": [
    { "url": "https://www.youtube.com/watch?v=VIDEO_ID" }
  ],
  "cleaningLevel": "mild",
  "includeTimestamps": true
}

Advanced Example (With Comments)

{
  "youtubeUrl": [
    { "url": "https://www.youtube.com/watch?v=VIDEO_ID" }
  ],
  "cleaningLevel": "aggressive",
  "includeTimestamps": true,
  "extractcomments": true,
  "sortBy": "top",
  "maxComments": 50,
  "maxRepliesPerComment": 5
}

All Parameters

Parameter Type Default What It Does
youtubeUrl array Required List of YouTube video URLs (any format)
cleaningLevel string "mild" "none" (raw), "mild" (remove filler), "aggressive" (clean conversations)
includeTimestamps boolean true Include precise timing for each text segment
extractcomments boolean false Enable comment extraction (adds 10-40s per video)
sortBy string "top" Comment sort: "top" (most relevant) or "newest" (chronological)
maxComments integer 10 Max top-level comments (10-100,000)
maxRepliesPerComment integer 0 Max replies per comment. 0 = no replies (10x faster)

💡 Pro tip: Start with maxRepliesPerComment: 0 for 10x faster extraction if you don't need reply threads.


Output Structure

What You Get for Every Video

{
  "videoId": "1TThGG6guf0",
  "VideoURL": "https://youtu.be/1TThGG6guf0",
  "Video_title": "WordPress Custom Widget Development Tutorial",
  "published_Date": "Aug 12, 2020",
  "Views": "5,067 views",
  "likes": "122",
  
  "channel": {
    "name": "Codeytek Academy",
    "id": "UC0SDxbLAqoKLACyEPz2wXAg",
    "subscribers": "33.1K subscribers",
    "verified": false
  },
  
  "thumbnail": "https://i.ytimg.com/vi/1TThGG6guf0/maxresdefault.jpg",
  "Description": "Learn how to create custom WordPress widgets...",
  
  "hasTranscript": true,
  "transcriptText": "Hello and welcome everyone to another episode of advanced WordPress theme development. Today we're going to learn how to create custom widgets...",
  
  "timestamps": [
    { "time": "0:08", "text": "hello and welcome everyone to another" },
    { "time": "0:10", "text": "episode of advanced wordpress theme" },
    { "time": "0:12", "text": "development today we're going to learn" }
  ],
  
  "wordCount": 2847,
  "estimatedDuration": "11:23"
}

With Comments Enabled

{
  "videoId": "dQw4w9WgXcQ",
  "Video_title": "Never Gonna Give You Up",
  "transcriptText": "We're no strangers to love...",
  
  "commentsExtracted": true,
  "commentCount": 50,
  "comments": [
    {
      "commentId": "UgxQe-6VK3h-LZaul6x4AaABAg",
      "authorName": "@musiclover2024",
      "text": "Still the best song after all these years!",
      "likeCount": "1,543",
      "replyCount": 12,
      "publishedTime": "2 days ago",
      "replies": [
        {
          "commentId": "UgxQe-6VK3h-LZaul6x4AaABAg.9kF7...",
          "authorName": "@throwback90s",
          "text": "Facts! Never gets old.",
          "likeCount": "234",
          "publishedTime": "1 day ago"
        }
      ]
    }
  ]
}

Pricing & Performance

Transcript Extraction

  • Free mode: 5-10 seconds per video
  • Paid mode: 3-5 seconds per video (faster infrastructure)
  • Cost: ~$0.001-0.006 per video (depending on length)

Comment Extraction (Optional Add-on)

Uses YouTube Comments Scraper in Standby Mode.

Pricing:

  • Actor start: $0.001 (once per run)
  • Parent comments: $0.003 each
  • Replies: $0.0015 each

Example cost for 50 comments + 100 replies:

  • Start: $0.01
  • Comments: 50 × $0.003 = $0.15
  • Replies: 100 × $0.0015 = $0.15
  • Total: $0.31

With Apify subscription discounts:

  • Bronze: 50% off → $0.17 total
  • Silver: 67% off → $0.13 total
  • Gold: 73% off → $0.11 total

Speed impact:

  • Without replies: +5-10 seconds per video
  • With replies (10 per comment): +20-40 seconds per video

💡 Cost optimization tip: Set maxRepliesPerComment: 0 if you don't need reply threads - you'll get 10x faster extraction and cut costs in half.


Supported YouTube URL Formats

We handle ALL YouTube URL types:

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/live/VIDEO_ID
https://youtube.com/watch?v=VIDEO_ID (no www)
https://m.youtube.com/watch?v=VIDEO_ID (mobile)

Just paste any YouTube link - we'll figure it out.


Pre-Configured Data Views

Save time with our built-in export templates:

1. 📊 Full Dataset

Everything - Complete metadata, transcripts, timestamps, analytics
Use for: Comprehensive analysis, data warehousing

2. 📝 Transcripts Only

Focus: Transcript text, timestamps, word count, duration
Use for: Content repurposing, subtitle generation

3. 📺 Channel Analytics

Focus: Channel info, subscribers, verification, video list
Use for: Influencer research, competitor analysis


Quick Start Examples

Use Case 1: Content Repurposing

Goal: Turn video into blog post

// 1. Extract transcript
const input = {
  youtubeUrl: [{ url: "YOUR_VIDEO_URL" }],
  cleaningLevel: "aggressive",
  includeTimestamps: false
};

// 2. Run actor
// 3. Get output: feed transcriptText to ChatGPT/Claude
// 4. Generate 5 blog posts in 2 minutes

Use Case 2: Competitor Analysis

Goal: Analyze 100 competitor videos

const input = {
  youtubeUrl: [
    { url: "competitor_video_1" },
    { url: "competitor_video_2" },
    // ... paste 100 URLs
  ],
  cleaningLevel: "mild",
  extractcomments: true,
  maxComments: 20
};

// Export to Excel → Analyze trends → Find content gaps

Use Case 3: AI Training Data

Goal: Build chatbot training dataset

const input = {
  youtubeUrl: YOUR_PLAYLIST_URLS, // from our Playlist Extractor
  cleaningLevel: "none", // keep raw data for AI
  includeTimestamps: true
};

// Process 1000 videos → Clean dataset → Train model

Integration Examples

JavaScript/Node.js

const ApifyClient = require('apify-client');

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
  youtubeUrl: [
    { url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
  ],
  cleaningLevel: "mild"
};

const run = await client.actor("dz_omar/youtube-transcript-metadata-extractor").call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();

console.log(items[0].transcriptText);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run_input = {
    "youtubeUrl": [
        { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
    ],
    "cleaningLevel": "mild"
}

run = client.actor("dz_omar/youtube-transcript-metadata-extractor").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item['transcriptText'])

cURL (Direct API)

curl -X POST https://api.apify.com/v2/acts/dz_omar~youtube-transcript-metadata-extractor/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "youtubeUrl": [
      { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
    ]
  }'

→ View full API documentation


Frequently Asked Questions

❓ What if a video doesn't have a transcript?

The actor will return "hasTranscript": false and still provide all other metadata (title, views, channel info, etc.). YouTube auto-generates transcripts for most videos, but some may not have them available.

❓ Can I extract from private or age-restricted videos?

No. This actor only works with publicly available videos. Private, members-only, or age-restricted content cannot be accessed.

❓ How accurate are the transcripts?

We extract the official transcripts provided by YouTube. Accuracy depends on whether the creator uploaded manual captions (99% accurate) or YouTube auto-generated them (85-95% accurate). We don't modify or AI-generate transcripts - you get exactly what YouTube provides.

❓ Can I process entire playlists or channels?

This actor processes individual videos. For bulk extraction, use our YouTube Playlist Extractor to get all video URLs, then feed them to this actor.

❓ What's the difference between cleaning levels?
  • None: Raw transcript as YouTube provides it
  • Mild: Removes filler words ("um", "uh", "you know")
  • Aggressive: Removes filler words + conversational redundancy ("so basically", "I mean", etc.)

Most users choose "mild" for the best balance.

❓ Do I need comments? Should I enable replies?

Enable comments if: You need engagement insights, sentiment analysis, or customer feedback.

Enable replies if: You need full conversation threads. Warning: Replies significantly increase runtime and cost.

Fastest option: extractcomments: false - just transcripts in 5-10 seconds.

❓ Is this legal? Will I get in trouble?

This actor extracts publicly available data that anyone can see on YouTube. It's equivalent to manually copying text. However:

  • ✅ Extracting public transcripts: Generally okay
  • ✅ Using for research, analysis, personal use: Generally okay
  • ⚠️ Republishing copyrighted content: Check copyright laws
  • ⚠️ Commercial use at scale: Review YouTube's Terms of Service

We are not lawyers. Consult legal counsel for your specific use case.

❓ Why did my run fail?

Common reasons:

  • Video is private, deleted, or age-restricted
  • Video has no transcript available
  • Invalid URL format
  • Temporary YouTube API issues (auto-retry handles this)

Check the run log for specific error messages.

❓ How many videos can I process at once?

Technical limit: 10,000 URLs per run
Recommended: Start with 10-50 to test, then scale up
Performance: Free mode handles 100 videos in ~10 minutes. Paid mode is 2x faster.


Technical Details

🔧 For Developers: Architecture & Performance

Tech Stack

  • Crawler: CheerioCrawler (optimized for speed)
  • Language: Node.js 20
  • Memory: 128MB-512MB (auto-scales)
  • Storage: Persistent caching with key-value store
  • API Format: RESTful JSON endpoint

Performance Specs

  • Throughput: 500-1000 videos/hour (paid mode)
  • Latency: 5-10s per video (free), 3-5s (paid)
  • Concurrency: 10 parallel requests
  • Retry Logic: 3 attempts with exponential backoff
  • Cache Hit Rate: ~40% (typical usage)

Comment Integration

  • Method: Standby Mode API call to YouTube Comments Scraper
  • Protocol: NDJSON streaming over HTTP
  • Resumption: Automatic from last successful comment
  • Timeout: 5 minutes per video (configurable)

Data Processing

  • Transcript cleaning: Regex + NLP tokenization
  • Timestamp parsing: ISO 8601 → human-readable
  • Deduplication: SHA-256 hash of video ID
  • Output format: Minified JSON (reduce bandwidth)

Rate Limiting

  • YouTube API: Respects official rate limits
  • Apify Platform: Auto-throttles to prevent blocking
  • Proxy support: Automatic rotation (paid tier)

Error Handling

// Auto-retry logic
const MAX_RETRIES = 3;
const BACKOFF_MS = [1000, 3000, 9000];

try {
  await extractTranscript(videoId);
} catch (error) {
  if (retries < MAX_RETRIES) {
    await sleep(BACKOFF_MS[retries]);
    // retry...
  }
}

Security

  • ✅ No authentication required (public data only)
  • ✅ HTTPS-only API calls
  • ✅ Input sanitization (XSS prevention)
  • ✅ No data retention beyond run duration

Comparison: Why Choose This Actor?

Feature This Actor Manual Copy-Paste Other Tools
Speed 5-10 sec/video 5-10 min/video 30-60 sec/video
Accuracy 100% (official data) 100% 70-90% (AI guessing)
Batch processing ✅ 1000+ videos ❌ One at a time ⚠️ Limited (10-50)
Timestamps ✅ Second-precise ❌ Manual work ⚠️ Minute-level only
Metadata ✅ Everything ❌ Manual scraping ⚠️ Basic only
Comments ✅ With replies ❌ Screenshot only ❌ Not available
API access ✅ Full REST API ❌ None ⚠️ Limited
Cost $0.001-0.006/video $5-10/video (labor) $0.10-0.50/video
Setup time 1 minutes N/A 30-60 minutes

Trust & Reliability

Platform Performance

  • Actor running smoothly - 100% success rate
  • Regular updates - Maintained actively

Legal & Compliance

What You Should Know

What's allowed:

  • Extracting public transcripts for personal use
  • Research and academic analysis
  • Business intelligence and market research
  • Content repurposing with proper attribution

⚠️ What requires caution:

  • Large-scale commercial redistribution
  • Republishing copyrighted video content
  • Using data in ways that violate YouTube's ToS
  • Scraping private or restricted content

🔒 Privacy & Data:

  • We only extract publicly visible data
  • No authentication or login required
  • No data stored beyond your run duration
  • GDPR compliant (EU users)

Disclaimer: This tool extracts publicly available data. Users are responsible for ensuring their usage complies with YouTube's Terms of Service, copyright laws, and applicable regulations. We are not lawyers - consult legal counsel for commercial use cases.


Related Tools from FlowExtract API

Build your complete YouTube data pipeline with our specialized actors:

🎬 Video & Content Tools

YouTube Playlist Extractor
Extract all videos from playlists in seconds. Get video URLs, titles, durations. Perfect for feeding into this transcript extractor.
Use together: Playlist → Transcript Extractor = Full channel analysis

YouTube Channel Scraper Pro
Complete channel extraction: videos, shorts, live streams, playlists. Comprehensive creator analytics.
Use together: Channel Scraper → Transcript Extractor = Creator deep-dive

YouTube Comments Scraper
Standalone comment extraction with advanced filtering. Perfect for sentiment analysis.
Integrated in this actor - enable with extractcomments: true

📹 Video Platform Tools

Zoom Scraper | Downloader & Transcript
Extract Zoom meeting recordings and transcripts. Perfect for meeting analysis.

Loom Scraper | Downloader & Transcript
Download Loom videos and extract transcripts. Ideal for training content.

🏠 Real Estate Data

Idealista Scraper API
Spanish real estate listings with API access. Property data at scale.

🛠️ Developer Tools

Screenshot | Ultimate Screenshot
Webpage screenshots with custom options. Perfect for monitoring and documentation.

Network Security Scanner
Website security vulnerability scanning. Comprehensive security reports.

→ View all FlowExtract API tools


Get Started Now

Free Trial Available

No credit card required. Test with 5$ to see the quality yourself.

💬 Get Support

Need Help?

📧 Email: flowextractapi@outlook.com
🌐 Website: flowextractapi.com
🐦 Twitter: @FlowExtractAPI
💼 LinkedIn: flowextract-api

Response time: Within 24 hours (usually much faster)

About

🎥Extract complete transcripts with precise timestamps ⏱️ and comprehensive video metadata from any YouTube video —> ⚡Fast, reliable, and ready to use.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published