YouTube Transcript Project Setup Guide

This guide will walk you through setting up a project that extracts transcripts from YouTube videos using FastAPI (backend) and displays them in a Next.js (frontend) application.


Project Structure

Here’s the folder structure for the project:

youtube-project/
├── backend/
│   ├── env/                   # Python virtual environment
│   ├── __pycache__/           # Python cache files
│   ├── main.py                # FastAPI entry point
│   ├── routes.py              # API routes
│   └── requirements.txt       # Python dependencies
├── frontend/
│   ├── app/                   # Next.js app directory
│   ├── node_modules/          # Node.js dependencies
│   ├── public/                # Static files
│   ├── package.json           # Node.js dependencies
│   └── package-lock.json      # Node.js lock file
├── youtube_ai/
│   ├── mp3/                   # Downloaded audio files
│   ├── audio.json             # Transcript in JSON format
│   ├── audio.txt              # Transcript in text format
│   ├── extract_code.py        # Script to extract transcripts
│   └── requirements.txt       # Python dependencies for transcript extraction
└── README.md                  # Project documentation

Step 1: Set Up the Backend (FastAPI)

1.1 Create the Backend Folder

  1. Open your terminal and navigate to your project directory:
    cd C:\Users\Aamir\OneDrive\Desktop\youtube-project
  2. Create the backend folder:
    mkdir backend
    cd backend

1.2 Set Up a Python Virtual Environment

  1. Create a virtual environment:
    python -m venv env
  2. Activate the virtual environment:
    • On Windows:
      .\env\Scripts\activate
    • On macOS/Linux:
      source env/bin/activate

1.3 Install FastAPI and Dependencies

  1. Install FastAPI and Uvicorn:
    pip install fastapi uvicorn
  2. Save the dependencies to requirements.txt:
    pip freeze > requirements.txt

1.4 Create the FastAPI Application

  1. Create main.py:

    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware
    from routes import router  # Import the router
    
    app = FastAPI()
    
    # Allow the Next.js dev server (http://localhost:3000) to call this API
    # from the browser
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["http://localhost:3000"],
        allow_methods=["*"],
        allow_headers=["*"],
    )
    
    # Include the router
    app.include_router(router)
    
    if __name__ == "__main__":
        import uvicorn
        uvicorn.run(app, host="127.0.0.1", port=8000)
  2. Create routes.py:

    from fastapi import APIRouter, HTTPException
    from pydantic import BaseModel
    from typing import Any
    import json
    from pathlib import Path
    
    # Create a router
    router = APIRouter()
    
    class Transcript(BaseModel):
        message: str
        # audio.json holds a list of caption segments (or a Whisper result
        # dict), so this field cannot be typed as a plain str
        transcription: Any
    
    @router.get("/transcript", response_model=Transcript)
    async def get_transcript():
        # Define the path to the audio.json file
        audio_json_path = Path("C:/Users/Aamir/OneDrive/Desktop/youtube-project/youtube_ai/audio.json")
        
        # Check if the file exists
        if not audio_json_path.exists():
            raise HTTPException(status_code=404, detail="Transcript file not found")
        
        # Read the transcript data from the file
        try:
            with open(audio_json_path, "r", encoding="utf-8") as file:
                transcript_data = json.load(file)
        except Exception as e:
            raise HTTPException(status_code=500, detail=f"Error reading transcript file: {str(e)}")
        
        return {"message": "Transcript fetched successfully!", "transcription": transcript_data}
  3. Run the FastAPI server:

    uvicorn main:app --reload
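The file-reading logic inside `get_transcript` can be exercised without starting the server. Here is a minimal standalone sketch of that logic (the helper name `load_transcript` is ours for illustration and is not part of the project files):

```python
import json
import tempfile
from pathlib import Path

def load_transcript(audio_json_path: Path) -> dict:
    """Mirror of the endpoint's logic, without FastAPI: a missing file
    raises FileNotFoundError (the 404 case); invalid JSON raises ValueError
    (the 500 case)."""
    if not audio_json_path.exists():
        raise FileNotFoundError("Transcript file not found")
    with open(audio_json_path, "r", encoding="utf-8") as file:
        transcript_data = json.load(file)
    return {"message": "Transcript fetched successfully!", "transcription": transcript_data}

# Demo with a temporary audio.json containing one caption segment
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "audio.json"
    path.write_text(json.dumps([{"text": "hello", "start": 0.0, "duration": 1.5}]))
    payload = load_transcript(path)
    print(payload["message"])
```

This separates "file problems" from "HTTP problems", which makes the two `HTTPException` branches in `routes.py` easier to reason about and test.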

Step 2: Set Up the Frontend (Next.js)

2.1 Create the Frontend Folder

  1. Navigate to your project directory:
    cd C:\Users\Aamir\OneDrive\Desktop\youtube-project
  2. Create the frontend folder:
    mkdir frontend
    cd frontend

2.2 Set Up a Next.js App

  1. Initialize a Next.js app:
    npx create-next-app .
  2. Install dependencies (create-next-app normally installs them for you; run this only if node_modules is missing):
    npm install

2.3 Create the Transcript Page

  1. Open app/page.js and replace its content with:

    'use client'; // This component uses React hooks, so it must be a Client Component
    
    import { useEffect, useState } from 'react';
    
    export default function Home() {
      const [transcript, setTranscript] = useState(null);
      const [loading, setLoading] = useState(true);
      const [error, setError] = useState(null);
    
      useEffect(() => {
        const fetchTranscript = async () => {
          try {
            const response = await fetch('http://127.0.0.1:8000/transcript');
            if (!response.ok) {
              throw new Error('Failed to fetch transcript');
            }
            const data = await response.json();
            setTranscript(data.transcription);
          } catch (error) {
            setError(error.message);
          } finally {
            setLoading(false);
          }
        };
    
        fetchTranscript();
      }, []);
    
      if (loading) {
        return <p>Loading transcript...</p>;
      }
    
      if (error) {
        return <p>Error: {error}</p>;
      }
    
      return (
        <div>
          <h1>Transcript</h1>
          <pre>{JSON.stringify(transcript, null, 2)}</pre>
        </div>
      );
    }
  2. Run the Next.js app:

    npm run dev

Step 3: Set Up the YouTube Transcript Extraction

3.1 Create the youtube_ai Folder

  1. Navigate to your project directory:
    cd C:\Users\Aamir\OneDrive\Desktop\youtube-project
  2. Create the youtube_ai folder:
    mkdir youtube_ai
    cd youtube_ai

3.2 Install Dependencies

  1. Create a virtual environment:
    python -m venv env
  2. Activate the virtual environment:
    • On Windows:
      .\env\Scripts\activate
    • On macOS/Linux:
      source env/bin/activate
  3. Install dependencies (Whisper also requires FFmpeg to be installed and available on your PATH):
    pip install youtube-transcript-api pytube openai-whisper
  4. Save the dependencies to requirements.txt:
    pip freeze > requirements.txt

3.3 Create the Transcript Extraction Script

  1. Create extract_code.py:

    from youtube_transcript_api import YouTubeTranscriptApi
    from pytube import YouTube
    import whisper
    import json
    import os
    
    def extract_transcript(video_url):
        try:
            # Extract video ID from URL (drop any query parameters after it)
            video_id = video_url.split("v=")[1].split("&")[0]
            
            # Try fetching the transcript
            transcript = YouTubeTranscriptApi.get_transcript(video_id)
            with open("audio.json", "w") as file:
                json.dump(transcript, file)
            print("Transcript saved to audio.json")
        except Exception as e:
            print(f"Error fetching transcript: {e}")
            print("Using Whisper AI to generate transcript...")
            
            # Download audio using pytube
            yt = YouTube(video_url)
            audio_stream = yt.streams.filter(only_audio=True).first()
            audio_stream.download(output_path="mp3", filename="audio.mp3")
            
            # Transcribe audio using Whisper
            model = whisper.load_model("base")
            result = model.transcribe("mp3/audio.mp3")
            with open("audio.json", "w") as file:
                json.dump(result, file)
            print("Transcript saved to audio.json")
    
    if __name__ == "__main__":
        video_url = input("Enter YouTube Video URL: ")
        extract_transcript(video_url)
  2. Run the script:

    python extract_code.py
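Splitting on "v=" only works for full watch URLs; it fails on youtu.be short links. A more robust sketch using the standard library is shown below, together with a small helper that flattens caption segments into the plain text stored as audio.txt in the project tree. Both helper names (`get_video_id`, `segments_to_text`) are ours for illustration:

```python
from urllib.parse import urlparse, parse_qs

def get_video_id(video_url: str) -> str:
    """Extract the video ID from common YouTube URL shapes."""
    parsed = urlparse(video_url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    query = parse_qs(parsed.query)
    if "v" in query:
        # Watch URLs carry the ID in the ?v= query parameter
        return query["v"][0]
    raise ValueError(f"Could not find a video ID in: {video_url}")

def segments_to_text(segments: list) -> str:
    """Join youtube-transcript-api segments into one plain-text string."""
    return " ".join(seg["text"].strip() for seg in segments)

print(get_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s"))  # dQw4w9WgXcQ
print(get_video_id("https://youtu.be/dQw4w9WgXcQ"))                       # dQw4w9WgXcQ
```

Using `urlparse`/`parse_qs` instead of string splitting also tolerates reordered query parameters (e.g. `?t=42s&v=...`), which the naive split does not.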

Step 4: Run the Project

  1. Start the FastAPI backend:

    cd C:\Users\Aamir\OneDrive\Desktop\youtube-project\backend
    uvicorn main:app --reload
  2. Start the Next.js frontend:

    cd C:\Users\Aamir\OneDrive\Desktop\youtube-project\frontend
    npm run dev
  3. Open your browser and navigate to:

    http://localhost:3000
    

Troubleshooting

  • CORS Issues: Ensure CORS is enabled in the FastAPI backend (see Step 1.4).
  • File Not Found: Ensure the audio.json file exists in the youtube_ai folder.
  • Dependency Errors: Reinstall dependencies using pip install -r requirements.txt or npm install.
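For the File Not Found case, a quick diagnostic can distinguish a missing audio.json from a corrupt one before the backend turns both into HTTP errors. A minimal sketch (the helper name `check_transcript_file` is ours, not part of the project):

```python
import json
from pathlib import Path

def check_transcript_file(path_str: str) -> str:
    """Return a short diagnosis of the audio.json file."""
    path = Path(path_str)
    if not path.exists():
        return "missing: run extract_code.py first"
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except ValueError:
        return "exists but is not valid JSON"
    return "ok"

print(check_transcript_file("youtube_ai/audio.json"))
```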

This guide walks you through the full setup; if something still fails, revisit the Troubleshooting section above.
