# Reel-Agent for Addictive Learning with LangChain

# Overview 🔎

This tutorial demonstrates how to generate short, attention-grabbing videos in the style of Instagram Reels, using LangChain for script generation, ElevenLabs for text-to-speech, AssemblyAI for subtitles, and MoviePy to combine the speech and subtitles with a background video.

## Motivation

Modern apps like YouTube and Instagram have shortened attention spans, making it harder for people to engage with text-heavy content like PDFs. This project uses AI to extract interesting facts from PDFs and presents them in short, engaging videos, such as Minecraft parkour, making information more accessible and captivating.

## Key Components
 1. **[PDF Text Extraction](#pdftextextract)**: The project begins by extracting text from the provided PDF using the pypdf library. This ensures all textual content is consolidated into a single string for further processing.

2. **[AI-Powered Fact Generation](#aipoweredfactgeneration)**: OpenAI's GPT model is used to extract interesting and specific facts from the PDF content. A structured prompt ensures the generated output aligns with the project's goals.

3. **[Script Generation](#scriptgeneration)**: Using the extracted facts, the system creates engaging scripts for short videos. These scripts are designed to capture attention quickly, such as by using hooks or interesting questions.

4. **[Audio Generation](#audiogeneration)**: Generate engaging voiceovers from the script with a text-to-speech API to keep the viewer engaged.

5. **[Subtitle Generation](#subtitlesgenearation)**: Generate an `.srt` subtitle file from the audio file so the subtitles can be added to the video.

6. **[Video Creation](#videocreation)**: The moviepy library combines the audio, music, and subtitles with a video template (e.g., Minecraft parkour visuals), so the videos are both engaging and educational.


## Implementation
- OpenAI's GPT: extracts facts from the PDF and generates the script for the video
- ElevenLabs: generates audio from the script using text-to-speech
- AssemblyAI: creates subtitles from the audio
- MoviePy: integrates the audio and subtitles with a base background video


## Conclusion

This project demonstrates the use of AI to transform dense and complex PDF content into engaging, bite-sized video formats. By leveraging modern tools like OpenAI's language models and creative visuals, it bridges the gap between traditional text-based information and the fast-paced, visually-driven preferences of today's audiences. The result is a powerful tool that not only makes information more accessible but also promotes learning in a way that's fun and aligned with modern attention spans.

### Future improvements
Possible improvements include customization options for video themes and support for more content types, so the tool can keep evolving with user needs.<br>
Images could also be added to the video using APIs or web scraping.


[![](https://mermaid.ink/img/pako:eNpdkc1uwyAMx18Fceqk9AVy2GXpdq3UaoeRHlxwEqQAETHZpqrvPppA1pYDYPvvjx9cuHQKecmb3n3LDjyxY1VbFteBorUR83F6WXy7H_IgSaST7av3U1JLrwcSH2jRA2GyU_DTaYluQv8fX105P5xJU4_jXYnsSpI3Z87aotgHYpNW6Ao23aoUzIRRy4KBVWzMSYxci9StDXZWbUTcMsrMxbbb1wz1QDgHFoZ7vtm9jv4Et-Tk_k9YczARPOAsI1jFC27QG9Aq_sblJql5HN9gzct4VdhA6Knmtb1GKQRyh18reUk-YMG9C23Hywb6MVphUPEBKw2tB5MlA9gv50wSXf8AHlyp0A?type=png)](https://mermaid.live/edit#pako:eNpdkc1uwyAMx18Fceqk9AVy2GXpdq3UaoeRHlxwEqQAETHZpqrvPppA1pYDYPvvjx9cuHQKecmb3n3LDjyxY1VbFteBorUR83F6WXy7H_IgSaST7av3U1JLrwcSH2jRA2GyU_DTaYluQv8fX105P5xJU4_jXYnsSpI3Z87aotgHYpNW6Ao23aoUzIRRy4KBVWzMSYxci9StDXZWbUTcMsrMxbbb1wz1QDgHFoZ7vtm9jv4Et-Tk_k9YczARPOAsI1jFC27QG9Aq_sblJql5HN9gzct4VdhA6Knmtb1GKQRyh18reUk-YMG9C23Hywb6MVphUPEBKw2tB5MlA9gv50wSXf8AHlyp0A)

## Install and import required libraries

In [None]:
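# moviepy is pinned below 2.0 because the imports below rely on moviepy.editor, which 2.x removed
!pip install langchain langchain-openai langchain_core pypdf "moviepy<2" assemblyai requests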



In [None]:
import os
import requests
import urllib
import time
from pypdf import PdfReader
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser
import assemblyai as aai
from moviepy.editor import *
from moviepy.editor import VideoFileClip
from moviepy.video.fx import crop
from moviepy.video.tools.subtitles import SubtitlesClip
from moviepy.config import change_settings
from google.colab import userdata

Set up your OpenAI API key




In [None]:
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
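
The cell above only works inside Colab, because `userdata` comes from `google.colab`. As a minimal sketch for running the notebook elsewhere, the helper below tries Colab first and then falls back to an environment variable. `get_secret` is a hypothetical name introduced here, not part of any library.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata if available, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime (assumption: any
        # failure here means we should fall back to the environment).
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} is not set")
    return value
```

With this in place, `os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")` would behave the same in Colab and in a local Jupyter session where the variable is already exported.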

## PDF Text Extraction
<a name="pdftextextract"></a>


Download the PDF that you want to use

In [None]:
!wget https://arxiv.org/pdf/1706.03762
!mv 1706.03762 attention_is_all_you_need.pdf

--2024-11-26 06:13:25--  https://arxiv.org/pdf/1706.03762
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.131.42, 151.101.195.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2215244 (2.1M) [application/pdf]
Saving to: ‘1706.03762’


2024-11-26 06:13:25 (38.1 MB/s) - ‘1706.03762’ saved [2215244/2215244]



In [None]:
# Extract the text content of every page of the PDF into one string
def read_pdf():
    reader = PdfReader("attention_is_all_you_need.pdf")
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    return text
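
Text extracted from a PDF often keeps the line breaks and end-of-line hyphenation of the page layout. The sketch below shows one way to tidy the string before sending it to the model. `normalize_pdf_text` is a hypothetical helper, and the two regular expressions are a simple heuristic rather than a complete cleanup.

```python
import re


def normalize_pdf_text(text: str) -> str:
    """Rejoin hyphenated line breaks and collapse whitespace."""
    # Rejoin words split by a hyphen at a line break: "atten-\ntion" -> "attention".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse every run of whitespace (including newlines) into one space.
    return re.sub(r"\s+", " ", text).strip()
```

Note that the first rule also joins genuine compounds that happen to break at a hyphen (for example `self-` / `attention`), so it trades a few lost hyphens for cleaner running text.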

## AI-Powered Fact Generation
<a name="aipoweredfactgeneration"></a>


In [None]:
# Ask the model for interesting, specific facts and insights from the extracted text
def get_facts(text):
    facts_prompt = ChatPromptTemplate.from_template("""
        You are a research agent. You are given this information {information}. Your goal is to boil down to interesting and specific insights from this information.

        1. Interesting: Insights that people will find surprising or non-obvious.

        2. Specific: Insights that avoid generalities and include specific examples from the expert. Here is your topic of focus and set of goals.

        3. Provide your answer in points

        4. Do not make up your answer on your own and use the information that is provided to you.
    """)

    llm = ChatOpenAI(
        model="gpt-4o-mini",
    )

    chain = facts_prompt | llm
    return chain.invoke({"information": text}).content
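
`get_facts` returns the model's answer as a single string, even though the prompt asks for points. If the individual facts are needed as a list, a small parser can split them. This is a sketch that assumes the model uses `-`, `*`, `•` or numbered markers; `split_points` is a hypothetical name.

```python
import re


def split_points(response: str) -> list[str]:
    """Split a bulleted or numbered answer into a list of plain strings."""
    points = []
    for line in response.splitlines():
        # Strip a leading bullet ("-", "*", "•") or number ("1.", "2)") marker.
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s+", "", line).strip()
        if cleaned:
            points.append(cleaned)
    return points
```

Lines without a marker (headings or wrapped text) are kept as separate items, so the result may need a quick manual check.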


## Script Generation
<a name="scriptgeneration"></a>


Create a `Script` class to give the output a clear structure, then generate the scripts using OpenAI

In [None]:
class Script(BaseModel):
    script: str = Field(description="script for the video")
    title: str = Field(description="title of the video")
    description: str = Field(description="description of the video")
    keywords: list[str] = Field(description="keywords for the video")

class Scripts(BaseModel):
    scripts: list[Script]

parser = JsonOutputParser(pydantic_object=Scripts)

# create scripts for the reel using the facts
def create_scripts(facts):
    script_prompt = PromptTemplate(template = """
    .\n{format_instructions}\
    You are an expert script writer. You are tasked with writing scripts for 20-second video that plays on YouTube. Given these facts {facts} you need to write five engaging scripts keeping these facts in the context of the script.
    keep in mind

    1. Your scripts should not sound monotonous.

    2. Each script should start with an engaging pitch that hooks viewers to watch the entire video. for example, you can use a fact or a question at the start of the video.
   """,
    input_variables=["facts"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
    )

    llm = ChatOpenAI(
        model="gpt-4o-mini",
    )
    chain = script_prompt | llm | parser

    return chain.invoke({"facts": facts})
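
The prompt asks for 20-second scripts, but nothing checks the length of what comes back. A rough word-count estimate can flag scripts that are likely to run long. The speaking rate of 150 words per minute is an assumption, not a measured value for any particular voice, and both function names are hypothetical.

```python
def estimated_seconds(script: str, words_per_minute: float = 150.0) -> float:
    """Estimate spoken duration from the word count (assumed speaking rate)."""
    words = len(script.split())
    return words * 60.0 / words_per_minute


def fits_target(script: str, target_seconds: float = 20.0, tolerance: float = 0.25) -> bool:
    """Return True if the estimate is within the target plus a tolerance margin."""
    return estimated_seconds(script) <= target_seconds * (1 + tolerance)
```

For example, `[s for s in scripts["scripts"] if fits_target(s["script"])]` would keep only the scripts expected to fit, assuming the parser output has the shape used later in this notebook.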

In [None]:
# read text from pdf
text = read_pdf()

# find facts from the pdf
facts = get_facts(text)

# Generate scripts for the reel using the facts
scripts = create_scripts(facts)

## Audio Generation
<a name="audiogeneration"></a>

### Create the audio from the script using text-to-speech API<br>
We will be using ElevenLabs. You can get your API key from [ElevenLabs](https://elevenlabs.io/).

Use an appropriate voice and voice settings to make it more engaging<br>

Refer to [ElevenLabs' Voice Selection](https://elevenlabs.io/docs/product/speech-synthesis/voice-selection) for voice selection<br>


Refer to [ElevenLabs' Voice Settings](https://elevenlabs.io/docs/product/speech-synthesis/voice-settings) for voice settings<br>

In [None]:
voices_data = {
      "name": "Rachel",
      "description": "A smooth and natural voice ideal for conversational and professional use cases.",
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "voice_settings": {
        "pitch": 1.0,
        "speed": 1.3,
        "intonation": "balanced",
        "clarity": "high",
        "volume": "normal"
  }
}

Set up your ElevenLabs API key

In [None]:
XI_API_KEY = userdata.get('XI_API_KEY')

Text to Speech: see the [ElevenLabs API Reference](https://elevenlabs.io/docs/api-reference/text-to-speech)

In [None]:

CHUNK_SIZE = 1024

VOICE_ID = voices_data["voice_id"]
TEXT_TO_SPEAK = scripts["scripts"][0]["script"]
voice_settings = voices_data["voice_settings"]

tts_url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

data = {
    "text": TEXT_TO_SPEAK,
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.8,
        "style": 0.0,
        "use_speaker_boost": True,
        **voice_settings
    }
}

response = requests.post(tts_url, headers=headers, json=data, stream=True)

# Handle response

if response.ok:
    output_path = "output.mp3"
    with open(output_path, "wb") as audio_file:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            audio_file.write(chunk)
    print(f"Audio stream saved successfully to {output_path}.")
else:
    print(f"Error {response.status_code}: {response.text}")

Audio stream saved successfully to output.mp3.
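
The request above merges everything in `voices_data["voice_settings"]` into the payload, including keys such as `pitch` and `intonation`. Assuming the endpoint only acts on the four keys that the request sets explicitly (an assumption worth checking against the API reference linked above), a small filter keeps the payload limited to them. `KNOWN_SETTINGS`, `DEFAULT_SETTINGS` and `build_voice_settings` are hypothetical names.

```python
# Assumption: only these keys are meaningful to the endpoint; they are the
# ones the request in this notebook already sets explicitly.
KNOWN_SETTINGS = {"stability", "similarity_boost", "style", "use_speaker_boost"}

# Defaults copied from the request in this notebook.
DEFAULT_SETTINGS = {
    "stability": 0.5,
    "similarity_boost": 0.8,
    "style": 0.0,
    "use_speaker_boost": True,
}


def build_voice_settings(overrides: dict) -> dict:
    """Merge overrides into the defaults, dropping keys outside KNOWN_SETTINGS."""
    settings = dict(DEFAULT_SETTINGS)
    settings.update({k: v for k, v in overrides.items() if k in KNOWN_SETTINGS})
    return settings
```

Using `"voice_settings": build_voice_settings(voice_settings)` in the payload would then send only the filtered keys.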


## Subtitle generation
<a name="subtitlesgenearation"></a>




### Generate subtitles

We will be using [AssemblyAI](https://www.assemblyai.com) to generate our subtitles.

Set your AssemblyAI key

In [None]:
aai.settings.api_key =  userdata.get("ASSEMBLYAI_API_KEY")

In [None]:
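FILE_URL = "./output.mp3"
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(FILE_URL)

# Check for a failed transcription before trying to export subtitles
if transcript.status == aai.TranscriptStatus.error:
    print(transcript.error)
else:
    # Limit each caption to 20 characters so it stays short on screen
    subtitles = transcript.export_subtitles_srt(20)
    # Opening in write mode creates video.srt, so no separate `touch` is needed
    with open("video.srt", "w") as f:
        f.write(subtitles)
    print(subtitles)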

1
00:00:00,320 --> 00:00:00,808
Did you know that

2
00:00:01,308 --> 00:00:01,496
the transformer

3
00:00:01,537 --> 00:00:02,593
architecture can cut

4
00:00:02,649 --> 00:00:03,393
training time by

5
00:00:03,409 --> 00:00:04,033
more than half

6
00:00:04,089 --> 00:00:04,801
compared to older

7
00:00:04,833 --> 00:00:07,089
models? In just 3.5

8
00:00:07,137 --> 00:00:08,625
days on eight GPUs,

9
00:00:08,745 --> 00:00:09,729
it achieved a bleu

10
00:00:09,777 --> 00:00:11,481
score of 28.4 for

11
00:00:11,513 --> 00:00:12,297
English to German

12
00:00:12,361 --> 00:00:14,137
translation that's

13
00:00:14,161 --> 00:00:15,073
revolutionary in the

14
00:00:15,089 --> 00:00:15,705
world of machine
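
The SRT output above has a simple layout: a cue index, a `start --> end` line, and the caption text, with blank lines between cues. As a sketch for checking timings (for instance, that the last cue ends before the audio does), the parser below turns that text into tuples of seconds. `to_seconds` and `parse_srt` are hypothetical helpers, not part of AssemblyAI or moviepy.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d+):(\d+),(\d+)")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:01,308 to seconds."""
    hours, minutes, seconds, millis = map(int, TIMESTAMP.fullmatch(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for every well-formed cue in the SRT text."""
    cues = []
    # Cues are separated by blank lines: index, "start --> end", then text.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that is not a complete cue
        start, end = lines[1].split("-->")
        cues.append((to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues
```

Calling `parse_srt(open("video.srt").read())[-1][1]` would give the end time of the final caption in seconds.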




## Video Creation
<a name = "videocreation"></a>

### Integrate audio with video using moviepy

Add the audio to any attention-grabbing background video using moviepy. We are using this [video](https://drive.google.com/file/d/14kiCtrgoCwJzcdYFmZP3FKhoiUBFgVrv/view?usp=sharing)<br>
Also add background [music](https://drive.google.com/file/d/1mWBAD2b-vj3HZayAVeVk2PGsAYZDJLpf/view?usp=sharing) to make it more engaging



In [None]:
!wget https://github.com/Dark-Knight499/AgentCraftHAckathon-ReelAgent/raw/refs/heads/main/videoplayback.mp4
!wget https://github.com/Dark-Knight499/AgentCraftHAckathon-ReelAgent/raw/refs/heads/main/music.mp3

--2024-11-26 06:14:14--  https://github.com/Dark-Knight499/AgentCraftHAckathon-ReelAgent/raw/refs/heads/main/videoplayback.mp4
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Dark-Knight499/AgentCraftHAckathon-ReelAgent/refs/heads/main/videoplayback.mp4 [following]
--2024-11-26 06:14:15--  https://raw.githubusercontent.com/Dark-Knight499/AgentCraftHAckathon-ReelAgent/refs/heads/main/videoplayback.mp4
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14392674 (14M) [application/octet-stream]
Saving to: ‘videoplayback.mp4.1’


2024-11-26 06:14:15 (159 MB/s) - ‘videoplayback.mp4.1’ saved [14392674/14

In [None]:
from moviepy.editor import VideoFileClip, AudioFileClip, concatenate_videoclips,CompositeAudioClip
from moviepy.audio.fx.all import audio_loop

video = VideoFileClip("./videoplayback.mp4")
audio = AudioFileClip("./output.mp3")
music = AudioFileClip("./music.mp3")
video_loops = int(audio.duration // video.duration) + 1
video = concatenate_videoclips([video] * video_loops).subclip(0, audio.duration)
music = audio_loop(music, duration=video.duration)
music = music.volumex(0.1)
audio_music = CompositeAudioClip([music,audio])
final_video = video.set_audio(audio_music)
final_video.write_videofile("reel.mp4", codec="libx264", audio_codec="aac")


Moviepy - Building video reel.mp4.
MoviePy - Writing audio in reelTEMP_MPY_wvf_snd.mp4




MoviePy - Done.
Moviepy - Writing video reel.mp4





Moviepy - Done !
Moviepy - video ready reel.mp4
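
The cell above repeats the background clip `int(audio.duration // video.duration) + 1` times and then trims it to the audio length. That always gives enough footage, with one spare copy when the audio length is an exact multiple of the clip length. The sketch below shows the same arithmetic with a ceiling instead. `loops_needed` is a hypothetical name.

```python
import math


def loops_needed(audio_seconds: float, video_seconds: float) -> int:
    """Smallest number of clip repetitions that covers the audio."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    # At least one copy is needed even when the audio is shorter than the clip.
    return max(1, math.ceil(audio_seconds / video_seconds))
```

Because the concatenated clip is trimmed with `subclip(0, audio.duration)` anyway, both formulas produce the same final video; the ceiling version only avoids building an unused copy.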


### Add subtitles to our video


Install ImageMagick and ffmpeg to add subtitles to the video

In [None]:
!apt update &> /dev/null
!apt install imagemagick &> /dev/null
!apt install ffmpeg &> /dev/null
!pip3 install moviepy[optional] &> /dev/null
!sed -i '/<policy domain="path" rights="none" pattern="@\*"/d' /etc/ImageMagick-6/policy.xml

Change the ImageMagick policy so that subtitles can be added

In [None]:
!wget https://gist.githubusercontent.com/Kaif987/38fca3821fbbcbd7b60cb54df348c2e8/raw/7745747309ffb1982467b138d07f6f2405a5da34/policy.xml
!mv policy.xml /etc/ImageMagick-6/policy.xml

--2024-11-26 06:16:03--  https://gist.githubusercontent.com/Kaif987/38fca3821fbbcbd7b60cb54df348c2e8/raw/7745747309ffb1982467b138d07f6f2405a5da34/policy.xml
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7947 (7.8K) [text/plain]
Saving to: ‘policy.xml’


2024-11-26 06:16:03 (84.5 MB/s) - ‘policy.xml’ saved [7947/7947]
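
Before pointing moviepy at ImageMagick, it can help to confirm that the binary is actually on the `PATH`. The sketch below uses the standard library's `shutil.which`. ImageMagick 6 installs a `convert` command, while ImageMagick 7 uses `magick`. `find_binary` is a hypothetical helper.

```python
import shutil
from typing import Optional, Sequence


def find_binary(candidates: Sequence[str]) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


# Prints a path such as /usr/bin/convert, or None if ImageMagick is missing.
print(find_binary(["convert", "magick"]))
```

If this prints `None`, the install step above did not succeed and subtitle rendering is likely to fail later.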



Install the font you would like to use for the subtitles, or use one of the default fonts.<br>
We will be using the Impact font.

In [None]:
# Use the raw file URL; the /blob/ page is an HTML viewer, not the font itself
!wget -O Impact.ttf "https://github.com/sophilabs/macgifer/raw/master/static/font/impact.ttf"
!mkdir -p ~/.fonts
!mv Impact.ttf ~/.fonts
!fc-cache -f -v

--2024-11-26 06:16:03--  https://github.com/sophilabs/macgifer/blob/master/static/font/impact.ttf
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘Impact.ttf’

Impact.ttf              [  <=>               ] 286.10K   942KB/s    in 0.3s    

2024-11-26 06:16:04 (942 KB/s) - ‘Impact.ttf’ saved [292966]

/usr/share/fonts: caching, new cache contents: 0 fonts, 5 dirs
/usr/share/fonts/cMap: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap: caching, new cache contents: 0 fonts, 5 dirs
/usr/share/fonts/cmap/adobe-cns1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-gb1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-japan1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-japan2: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/
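
A download can silently save an HTML page under a `.ttf` name; the `wget` log above reports the content type as `text/html`. A quick check of the first four bytes tells a real font from a web page. TrueType files start with `00 01 00 00` (or `true`), OpenType/CFF files with `OTTO`, and font collections with `ttcf`. Both function names are hypothetical.

```python
FONT_SIGNATURES = (b"\x00\x01\x00\x00", b"true", b"OTTO", b"ttcf")


def has_font_signature(head: bytes) -> bool:
    """Return True if the first four bytes match a known font signature."""
    return head[:4] in FONT_SIGNATURES


def is_font_file(path: str) -> bool:
    """Read the first four bytes of a file and check them against the signatures."""
    with open(path, "rb") as handle:
        return has_font_signature(handle.read(4))
```

Running `is_font_file(os.path.expanduser("~/.fonts/Impact.ttf"))` after the download shows whether the file is usable. If it is not a font, ImageMagick may fall back to a default font for the subtitles.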

Finally, we add the subtitles to our video using moviepy

In [None]:
# Point moviepy at the ImageMagick binary (plain Python, without the `!` shell prefix)
os.environ["IMAGEMAGICK_BINARY"] = "/usr/bin/convert"
change_settings({"IMAGEMAGICK_BINARY": r"/usr/bin/convert"})

# Render each subtitle as white Impact text with a thin black outline
def generator(txt):
    return TextClip(txt, font='Impact', fontsize=50, color='white', stroke_color="black", stroke_width=1)

subtitles = SubtitlesClip("video.srt", generator)
video = VideoFileClip("reel.mp4")
result = CompositeVideoClip([video, subtitles.set_pos('center')])
result.write_videofile("final_reel.mp4", fps=video.fps, remove_temp=True, codec="libx264", audio_codec="aac")

Moviepy - Building video final_reel.mp4.
MoviePy - Writing audio in final_reelTEMP_MPY_wvf_snd.mp4




MoviePy - Done.
Moviepy - Writing video final_reel.mp4






Moviepy - Done !
Moviepy - video ready final_reel.mp4
