# Groq Whisper Instagram Reel Subtitler
This guide will walk you through creating an automated subtitle generator for Instagram Reels using Groq Whisper. The script extracts audio from a video, transcribes it using Groq's Whisper API, and overlays subtitles onto the video.

## How It Works

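The script runs in three stages: it extracts the audio track from the video, transcribes that audio with Whisper on Groq, and overlays each transcribed segment on the video as a timed caption.

Technologies used:
- Groq Whisper: AI-powered speech-to-text transcription.
- MoviePy: Handles audio extraction, subtitle overlays, and video rendering.
- Python `os` module: Reads the API key from the environment.
- TextClip: Generates the subtitle overlays (part of the MoviePy library).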

# Step 1: Install Dependencies
Ensure you have the necessary Python packages installed:
```bash
pip install "moviepy>=2.0" groq python-dotenv
```

Note: MoviePy requires FFmpeg, an open-source program for processing audio and video. You can download it from https://ffmpeg.org/download.html, or, if you have Homebrew installed on a Mac, run `brew install ffmpeg` in your terminal.
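To confirm that FFmpeg is reachable before you run the script, the sketch below looks it up on your `PATH` with `shutil.which` and returns its version banner. The function name `ffmpeg_version` is hypothetical. The check assumes a system-wide FFmpeg install like the one described above.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Ask FFmpeg for its version banner without raising on a non-zero exit code
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

For example, `print(ffmpeg_version() or "FFmpeg not found on PATH")`.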

# Step 2: Set Up the API Key
Create a GroqCloud account and generate an API key:

1. Sign up at [GroqCloud](https://console.groq.com/).
2. Navigate to `API Keys` and click `Generate API Key`.

Store the key securely in a `.env` file:

```
GROQ_API_KEY=your_groq_api_key
```

Then, in a file named `captioner.py`, import the packages and load the API key:

```python
import os

from dotenv import load_dotenv
from groq import Groq
from moviepy import CompositeVideoClip, TextClip, VideoFileClip

# Read GROQ_API_KEY from the .env file into the environment so os.getenv can find it
load_dotenv()

client = Groq(api_key=os.getenv("GROQ_API_KEY"))
```
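If `GROQ_API_KEY` is missing, it helps to fail with a clear message before any video processing starts. The helpers below are a minimal sketch of that check. `get_api_key` and `require_api_key` are hypothetical names and are not part of the Groq SDK. With them, you could create the client with `Groq(api_key=require_api_key())`.

```python
import os
from typing import Mapping, Optional


def get_api_key(name: str = "GROQ_API_KEY", environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the stripped value of `name`, or None if it is missing or blank."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    return value or None


def require_api_key(name: str = "GROQ_API_KEY") -> str:
    """Like get_api_key, but stop the script with a readable message when the key is absent."""
    api_key = get_api_key(name)
    if api_key is None:
        raise SystemExit(f"{name} is missing. Add it to your .env file and try again.")
    return api_key
```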

# Step 3: Convert MP4 to MP3
Before transcribing, we must extract audio from the video.

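```python
def convert_mp4_to_mp3(mp4_filepath, mp3_filepath):
    """
    Extracts the audio track from an MP4 file and saves it as an MP3.

    Args:
        mp4_filepath (str): Path to the input MP4 file.
        mp3_filepath (str): Path to save the output MP3 file.
    """
    video_clip = VideoFileClip(mp4_filepath)
    try:
        if video_clip.audio is None:
            raise ValueError(f"{mp4_filepath} has no audio track to transcribe.")
        # Extract the audio track and write it out as MP3
        video_clip.audio.write_audiofile(mp3_filepath)
        print("Audio extracted to:", mp3_filepath)
    finally:
        # Release the file handles MoviePy opened
        video_clip.close()
```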
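MoviePy drives FFmpeg for you. If you would rather extract the audio with FFmpeg directly, a command along the following lines does the same job. This is a sketch that assumes `ffmpeg` is on your `PATH` and was built with the `libmp3lame` encoder. `ffmpeg_extract_audio_cmd` is a hypothetical helper that only builds the argument list.

```python
def ffmpeg_extract_audio_cmd(mp4_filepath: str, mp3_filepath: str) -> list:
    """Build an FFmpeg command that drops the video stream and encodes the audio as MP3."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", mp4_filepath,       # input video
        "-vn",                    # leave the video stream out of the output
        "-acodec", "libmp3lame",  # MP3 encoder
        "-q:a", "2",              # variable-bitrate quality (lower number = higher quality)
        mp3_filepath,
    ]
```

To run it, pass the list to `subprocess.run(ffmpeg_extract_audio_cmd("input.mp4", "output.mp3"), check=True)`.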

# Step 4: Transcribe Audio Using Groq Whisper
Now that we have the MP3 file from the function above, we send it to Whisper hosted on Groq for fast transcription.

We request the `verbose_json` response format so that Whisper returns timestamped segments. These are short phrases, each with `start` and `end` times in seconds, and they tell us when to show each caption on the video.

```python
def transcribe_audio(mp3_file):
    """
    Transcribes an audio file using Groq Whisper API.

    Args:
        mp3_file (str): Path to the MP3 file.

    Returns:
        list: Transcribed text segments with timestamps.
    """
    with open(mp3_file, "rb") as file:
        transcription = client.audio.transcriptions.create(
            file=(mp3_file, file.read()),
            model="whisper-large-v3-turbo",
            response_format="verbose_json",
            language="en",
            temperature=0.0
        )

    print("Transcription completed.")
    return transcription.segments
```
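Each segment in the `verbose_json` response carries a `start` time, an `end` time (both in seconds), and the `text` spoken in between. These are the three fields that `add_subtitles` reads. The sketch below shows how to normalize them into `(start, end, text)` tuples and skip segments that contain no text. It uses a hypothetical `segments_to_cues` helper and made-up sample data.

```python
from typing import Any, Dict, List, Tuple


def segments_to_cues(segments: List[Dict[str, Any]]) -> List[Tuple[float, float, str]]:
    """Convert verbose_json segments into (start, end, text) tuples, dropping empty text."""
    cues = []
    for segment in segments:
        text = segment["text"].strip()  # Whisper segment text often starts with a space
        if not text:
            continue
        cues.append((float(segment["start"]), float(segment["end"]), text))
    return cues


# Illustrative input only: the keys mirror the ones add_subtitles reads
sample = [
    {"start": 0.0, "end": 2.4, "text": " Welcome back to the channel."},
    {"start": 2.4, "end": 3.0, "text": " "},
    {"start": 3.0, "end": 5.5, "text": " Today we're making subtitles."},
]
print(segments_to_cues(sample))
```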


# Step 5: Overlay Subtitle Clips
The previous function returns a list of timestamped segments. We loop through them and create a `TextClip` for each one, timed to appear at the segment's start and disappear at its end.

```python
def add_subtitles(transcript_segments, width, fontsize=40):
    """
    Generates a list of subtitle clips from transcription segments.

    Args:
        transcript_segments (list): Segments returned by transcribe_audio.
        width (int): Video width for text alignment.
        fontsize (int): Subtitle font size.

    Returns:
        list: Subtitle TextClip objects.
    """
    text_clips = []

    for segment in transcript_segments:
        text_clips.append(
            TextClip(
                text=segment["text"],
                font_size=fontsize,
                stroke_width=5,
                stroke_color="black",
                # This font file must exist next to the script; point it at any .otf/.ttf you have
                font="./Roboto-Condensed-Bold.otf",
                color="white",
                size=(width, None),
                method="caption",  # wrap text to fit the given width
                text_align="center",
                margin=(30, 0)
            )
            .with_start(segment["start"])  # show the caption when the segment begins
            .with_end(segment["end"])      # hide it when the segment ends
            .with_position("center")
        )

    return text_clips
```
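Whisper segments can run to a full sentence, which may be too much text for a single Reel caption. One option is to split each segment into chunks of a few words and divide the segment's duration between the chunks in proportion to their word count. The sketch below assumes you only have segment-level timestamps, so the chunk timing is an estimate rather than true word alignment. `split_segment` is a hypothetical helper.

```python
def split_segment(segment: dict, max_words: int = 4) -> list:
    """Split one segment into chunks of at most max_words words with estimated timings.

    Assumption: words are spoken at an even pace, so each chunk gets a share of the
    segment's duration proportional to its word count.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = segment["text"].split()
    if not words:
        return []
    start, end = float(segment["start"]), float(segment["end"])
    seconds_per_word = (end - start) / len(words)
    chunks = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        chunk_start = start + i * seconds_per_word
        is_last = i + max_words >= len(words)
        # Pin the last chunk to the segment's real end to avoid rounding drift
        chunk_end = end if is_last else chunk_start + len(group) * seconds_per_word
        chunks.append({"start": chunk_start, "end": chunk_end, "text": " ".join(group)})
    return chunks
```

Each chunk keeps the `start`, `end`, and `text` keys, so you can pass the result straight to `add_subtitles`, for example `chunks = [c for s in segments for c in split_segment(s)]`.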


# Step 6: Call the functions
Now that we've defined the functions, we need to create the appropriate variables and call the functions in the correct order.

```python
# Path to the video you want to subtitle
video_file = "input.mp4"

# Path for the finished video with subtitles
output_file = "output_with_subtitles.mp4"

# Where the audio extracted from the video will be saved
mp3_file = "output.mp3"

# Load the video; its width keeps the subtitles from overflowing the frame
original_clip = VideoFileClip(video_file)
width = original_clip.w

# Extract the audio track as an MP3
convert_mp4_to_mp3(video_file, mp3_file)

# Call Whisper hosted on Groq to get the timestamped segments
segments = transcribe_audio(mp3_file)

# Create a list of text clips from the segments
text_clip_list = add_subtitles(segments, width, fontsize=40)

# Layer the text clips on top of the original video
final_clip = CompositeVideoClip([original_clip] + text_clip_list)

# Render the final video with the subtitles burned in
final_clip.write_videofile(output_file, codec="libx264")
print("Subtitled video saved as:", output_file)

# Release the file handles MoviePy opened
final_clip.close()
original_clip.close()
```
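You may also want the captions as a standalone subtitle file, for example to check the timings or to reuse them in another editor. The segments map directly onto the SubRip (`.srt`) format. Each entry consists of a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line. The helpers below are a hypothetical sketch and are not part of MoviePy or the Groq SDK.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as the contents of an .srt file."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}\n"
            f"{segment['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle entries
    return "\n".join(blocks)
```

To save the file, run `with open("output.srt", "w", encoding="utf-8") as f: f.write(segments_to_srt(segments))`.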

# Step 7: Run the Python script
Run the script from your terminal. Replace `captioner.py` with your file's name if it is different.
```bash
python3 captioner.py
```
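Editing `video_file` and `output_file` by hand works, but you could instead pass the paths on the command line. The sketch below uses Python's `argparse` module. The argument names are hypothetical choices. You would use `args.input`, `args.output`, `args.mp3`, and `args.fontsize` in place of the hard-coded values in Step 6.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means 'read from sys.argv'."""
    parser = argparse.ArgumentParser(description="Burn Groq Whisper subtitles into a video.")
    parser.add_argument("input", help="path to the input MP4 video")
    parser.add_argument("-o", "--output", default="output_with_subtitles.mp4",
                        help="path for the subtitled video")
    parser.add_argument("--mp3", default="output.mp3",
                        help="where to save the extracted audio")
    parser.add_argument("--fontsize", type=int, default=40, help="subtitle font size")
    return parser.parse_args(argv)
```

With this in place, the run command would become something like `python3 captioner.py input.mp4 -o reel_with_subtitles.mp4`.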

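## Troubleshooting
- Make sure you have a video file ready before running the script.
- Make sure `video_file` points to the correct path for that file.
- Make sure the font file used in `add_subtitles` (`./Roboto-Condensed-Bold.otf`) exists, or change `font=` to the path of a font file on your system.
- If the script reports a missing API key, check that your `.env` file contains `GROQ_API_KEY` and sits in the same folder as `captioner.py`.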
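To catch missing-file problems before MoviePy raises an error, you can check up front that every input file exists. `missing_files` is a hypothetical helper. Pass it the video path and the font file used in `add_subtitles`.

```python
from pathlib import Path


def missing_files(*paths: str) -> list:
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Example usage near the top of captioner.py (paths taken from Steps 5 and 6):
#   problems = missing_files(video_file, "./Roboto-Condensed-Bold.otf")
#   if problems:
#       raise SystemExit(f"Missing files: {problems}")
```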