# Addictive Learning with LangChain

## Overview 🔎

This tutorial demonstrates how to generate short, attention-grabbing videos in the style of Instagram Reels. It uses LangChain for script generation, ElevenLabs for text-to-speech, AssemblyAI for subtitle generation, and MoviePy to add the speech and subtitles to a background video.

## Motivation

Modern apps like YouTube and Instagram have shortened attention spans, making it harder for people to engage with text-heavy content like PDFs. This project uses AI to extract interesting facts from PDFs and presents them in short, engaging videos, such as Minecraft parkour, making information more accessible and captivating.


## Key Components
- OpenAI's GPT: Extracts facts from the PDF and generates the script for the video
- ElevenLabs: Generates audio from the script using text-to-speech
- AssemblyAI: Creates subtitles from the audio
- MoviePy: Integrates the audio and subtitles with a background video

## Implementation
 1. **PDF Text Extraction**: The project begins by extracting text from the provided PDF using the pypdf library. This consolidates all textual content into a single string for further processing.
 2. **AI-Powered Fact Generation**: OpenAI's GPT model is used to extract interesting and specific facts from the PDF content. A structured prompt keeps the generated output aligned with the project's goals.
 3. **Video Script Generation**: Using the extracted facts, the system creates engaging scripts for short videos. These scripts are designed to capture attention quickly, for example by opening with a hook or an interesting question.
 4. **Audio Generation**: ElevenLabs is used to generate audio from the script through its text-to-speech API.
 5. **Video Creation**: The moviepy library is used to combine the narration with background visuals such as Minecraft parkour, so that the videos are both engaging and educational.
 6. **Subtitle Generation**: AssemblyAI is used to generate subtitles from the audio, and moviepy overlays them on the video.

## Conclusion

This project demonstrates the use of AI to transform dense and complex PDF content into engaging, bite-sized video formats. By leveraging modern tools like OpenAI's language models and creative visuals, it bridges the gap between traditional text-based information and the fast-paced, visually-driven preferences of today's audiences. The result is a powerful tool that not only makes information more accessible but also promotes learning in a way that's fun and aligned with modern attention spans.

Future improvements could include adding customization options for video themes or expanding support for more content types, ensuring the tool continues to evolve with user needs.

![Reel Agent](../images/reel_agent.svg)

### Install and import the necessary libraries


In [None]:
# The code below uses the moviepy 1.x API (moviepy.editor), which was removed in moviepy 2.0
!pip install langchain langchain_openai langchain_core pypdf "moviepy<2" assemblyai

In [None]:
import os
from pypdf import PdfReader
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser
import requests
import urllib
import time
import assemblyai as aai
from moviepy.editor import *
from moviepy.video.fx import crop
from moviepy.video.tools.subtitles import SubtitlesClip
from moviepy.config import change_settings
from google.colab import userdata
import random

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
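
The cell above reads the key from Colab's secret store, which is only available inside Google Colab. As a minimal sketch for running the notebook elsewhere, the hypothetical helper `get_secret` below (not part of the original notebook) tries Colab first and falls back to an environment variable of the same name.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata if available, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or access not granted; fall through to the environment
            pass

    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} not found in Colab userdata or environment")
    return value
```

With this helper, `os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")` behaves the same in Colab and also works locally once the variable is exported in the shell.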

### Querying PDF

We download a PDF and use OpenAI to extract interesting facts from it. You can use any PDF you want.

In [None]:

!wget https://arxiv.org/pdf/1706.03762
!mv 1706.03762 attention_is_all_you_need.pdf

--2024-11-18 10:26:54--  https://arxiv.org/pdf/1706.03762
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.195.42, 151.101.67.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2215244 (2.1M) [application/pdf]
Saving to: ‘1706.03762’


2024-11-18 10:26:54 (27.9 MB/s) - ‘1706.03762’ saved [2215244/2215244]



### Extract Facts and Generate script for video

Now we extract interesting facts from the PDF that the user provides, using OpenAI.



In [None]:

# Read text from the pdf
def read_pdf():
    reader = PdfReader("attention_is_all_you_need.pdf")
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    return text

# get facts from the text using OpenAI
def get_facts(text):
    facts_prompt = ChatPromptTemplate.from_template("""
        You are a research agent. You are given this information {information}. Your goal is to boil down to interesting and specific insights from this information.

        1. Interesting: Insights that people will find surprising or non-obvious.

        2. Specific: Insights that avoid generalities and include specific examples from the expert. Here is your topic of focus and set of goals.

        3. Provide your answer in points

        4. Do not make up your answer on your own and use the information that is provided to you.
    """)

    llm = ChatOpenAI(
        model="gpt-4o",
    )

    chain = facts_prompt | llm
    return chain.invoke({"information": text}).content
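
`get_facts` sends the whole PDF text in a single prompt. That works for a paper of this size, but a longer PDF may not fit into the model's context window. The sketch below shows one possible workaround: splitting the text into overlapping character chunks. The function name `chunk_text` and the default values of `max_chars` and `overlap` are assumptions for illustration, not part of the original notebook.

```python
def chunk_text(text: str, max_chars: int = 12000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in one of the two chunks.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    if not text:
        return []

    chunks = []
    start = 0
    step = max_chars - overlap
    while True:
        chunks.append(text[start:start + max_chars])
        # Stop once the current chunk reaches the end of the text
        if start + max_chars >= len(text):
            break
        start += step
    return chunks
```

Each chunk could then be passed to `get_facts` separately and the resulting fact lists joined before generating scripts.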




### Creating Script For the Video

We then generate the scripts for the audio using the facts that we extracted.

In [None]:
class Script(BaseModel):
    script: str = Field(description="script for the video")
    title: str = Field(description="title of the video")
    description: str = Field(description="description of the video")
    keywords: list[str] = Field(description="keywords for the video")

class Scripts(BaseModel):
    scripts: list[Script]

parser = JsonOutputParser(pydantic_object=Scripts)

# create scripts for the reel using the facts
def create_scripts(facts):
    script_prompt = PromptTemplate(template = """
    .\n{format_instructions}\
    You are an expert script writer. You are tasked with writing scripts for 20-second video that plays on YouTube. Given these facts {facts} you need to write five engaging scripts keeping these facts in the context of the script.
    keep in mind

    1. Your scripts should not sound monotonous.

    2. Each script should start with an engaging pitch that hooks viewers to watch the entire video. for example, you can use a fact or a question at the start of the video.
   """,
    input_variables=["facts"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
    )

    llm = ChatOpenAI(
        model="gpt-4o",
    )
    chain = script_prompt | llm | parser

    return chain.invoke({"facts": facts})

In [None]:
# read text from pdf
text = read_pdf()

# find facts from the pdf
facts = get_facts(text)

# Generate scripts for the reel using the facts
scripts = create_scripts(facts)
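
The prompt asks for 20-second scripts, but the model has no way to measure speaking time. A rough check is to count words and assume a speaking rate. The sketch below does that; the helper names and the default of 150 words per minute are assumptions, and the real duration depends on the voice and its settings.

```python
def estimate_speech_seconds(script: str, words_per_minute: float = 150.0) -> float:
    """Estimate how long a script takes to speak at the given rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(script.split())
    return word_count * 60.0 / words_per_minute


def fits_target(script: str, target_seconds: float = 20.0,
                tolerance: float = 0.25, words_per_minute: float = 150.0) -> bool:
    """Return True if the estimated duration is within +/- tolerance of the target."""
    estimate = estimate_speech_seconds(script, words_per_minute)
    return abs(estimate - target_seconds) <= tolerance * target_seconds
```

For example, `[s for s in scripts["scripts"] if fits_target(s["script"])]` would keep only the scripts whose estimated length is close to 20 seconds.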


### Create audio file

We will now create the audio from the script using the text-to-speech API provided by ElevenLabs. You can get your API key from [ElevenLabs](https://elevenlabs.io/).


In [None]:
voices_data = {
      "name": "Rachel",
      "description": "A smooth and natural voice ideal for conversational and professional use cases.",
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "voice_settings": {
        "pitch": 1.0,
        "speed": 1.3,
        "intonation": "balanced",
        "clarity": "high",
        "volume": "normal"
  }
}


In [None]:

CHUNK_SIZE = 1024
XI_API_KEY = userdata.get('XI_API_KEY')

VOICE_ID = voices_data["voice_id"]
TEXT_TO_SPEAK = scripts["scripts"][0]["script"]
voice_settings = voices_data["voice_settings"]

tts_url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

data = {
    "text": TEXT_TO_SPEAK,
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.8,
        "style": 0.0,
        "use_speaker_boost": True,
        **voice_settings
    }
}

response = requests.post(tts_url, headers=headers, json=data, stream=True)

# Handle response

if response.ok:
    output_path = "output.mp3"
    with open(output_path, "wb") as audio_file:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            audio_file.write(chunk)
    print(f"Audio stream saved successfully to {output_path}.")
else:
    print(f"Error {response.status_code}: {response.text}")

Audio stream saved successfully to output.mp3.
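
To generate audio for more than one script, it helps to wrap the request construction in a function. The sketch below builds the same URL, headers, and JSON body as the cell above without sending anything. The name `build_tts_request` is hypothetical, and the sketch keeps only the four voice settings written out explicitly in that request (`stability`, `similarity_boost`, `style`, `use_speaker_boost`).

```python
def build_tts_request(voice_id: str, text: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2"):
    """Return (url, headers, payload) for a streaming text-to-speech request."""
    if not text.strip():
        raise ValueError("text must not be empty")

    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream"
    headers = {
        "Accept": "application/json",
        "xi-api-key": api_key,
    }
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.8,
            "style": 0.0,
            "use_speaker_boost": True,
        },
    }
    return url, headers, payload
```

The result can be sent with `requests.post(url, headers=headers, json=payload, stream=True)` and saved chunk by chunk exactly as in the cell above.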


### Integrate audio with video using moviepy

We will add the audio to an attention-grabbing background video using MoviePy. I am using this [video](https://drive.google.com/file/d/14kiCtrgoCwJzcdYFmZP3FKhoiUBFgVrv/view?usp=sharing), and we will also add quiet background [music](https://drive.google.com/file/d/1mWBAD2b-vj3HZayAVeVk2PGsAYZDJLpf/view?usp=sharing).



In [None]:
from moviepy.editor import VideoFileClip, AudioFileClip, concatenate_videoclips, CompositeAudioClip
from moviepy.audio.fx.all import audio_loop

# Load the background video, the narration, and the background music
video = VideoFileClip("./videos/videoplayback.mp4")
audio = AudioFileClip("./output.mp3")
music = AudioFileClip("./music/music1.mp3")

# Repeat the video enough times to cover the narration, then trim it to the narration length
video_loops = int(audio.duration // video.duration) + 1
video = concatenate_videoclips([video] * video_loops).subclip(0, audio.duration)

# Loop the music to the video length and lower it to 10% volume
music = audio_loop(music, duration=video.duration)
music = music.volumex(0.1)

# Mix the music with the narration and attach the result to the video
audio_music = CompositeAudioClip([music, audio])
final_video = video.set_audio(audio_music)
final_video.write_videofile("movie.mp4", codec="libx264", audio_codec="aac")



Moviepy - Building video movie.mp4.
MoviePy - Writing audio in movieTEMP_MPY_wvf_snd.mp4




MoviePy - Done.
Moviepy - Writing video movie.mp4





Moviepy - Done !
Moviepy - video ready movie.mp4
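
The loop count in the cell above is `int(audio.duration // video.duration) + 1`, which always yields enough copies of the background video and sometimes one more than needed (the extra copy is cut off by `subclip`). The small sketch below isolates that arithmetic and compares it with a ceiling-based alternative; the function names and the durations in the comments are made-up examples.

```python
import math


def loops_floor_plus_one(audio_seconds: float, video_seconds: float) -> int:
    """Loop count as computed in the notebook: floor division plus one."""
    return int(audio_seconds // video_seconds) + 1


def loops_ceil(audio_seconds: float, video_seconds: float) -> int:
    """Smallest number of whole copies whose total length covers the audio."""
    return max(1, math.ceil(audio_seconds / video_seconds))


# A 14 s narration over a 10 s clip needs 2 copies with either formula.
# A 20 s narration over a 10 s clip gives 3 copies with the notebook formula
# and 2 with the ceiling; both work because the result is trimmed afterwards.
```

Either version is safe here; the ceiling form only avoids concatenating one extra copy when the audio length is an exact multiple of the video length.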


### Generate subtitles using AssemblyAI

We will also add subtitles to the video, using [AssemblyAI](https://www.assemblyai.com) to generate them. You will need an API key, which you can get by registering on the website.

In [None]:
aai.settings.api_key = userdata.get("ASSEMBLYAI_API_KEY")

# Create subtitles from the audio file using AssemblyAI
FILE_URL = "./output.mp3"
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(FILE_URL)

# Check for a failed transcription before exporting the subtitles
if transcript.status == aai.TranscriptStatus.error:
    print(transcript.error)
else:
    # 20 is the maximum number of characters per caption
    subtitles = transcript.export_subtitles_srt(20)
    with open("video.srt", "w") as f:
        f.write(subtitles)
    print(subtitles)

1
00:00:00,360 --> 00:00:01,016
Ever wondered how

2
00:00:01,516 --> 00:00:01,721
Transformers beat

3
00:00:01,753 --> 00:00:02,825
RNNs at their own

4
00:00:02,865 --> 00:00:04,369
game? By ditching

5
00:00:04,417 --> 00:00:04,889
sequential

6
00:00:04,937 --> 00:00:05,873
computation for a

7
00:00:05,889 --> 00:00:06,417
cutting edge

8
00:00:06,441 --> 00:00:07,585
attention mechanism,

9
00:00:07,745 --> 00:00:08,817
Transformers train

10
00:00:08,881 --> 00:00:10,340
faster. Just 12

11
00:00:10,445 --> 00:00:12,505
hours on 8B100 GPUs

12
00:00:12,665 --> 00:00:13,465
ready for a deep

13
00:00:13,505 --> 00:00:13,985
learning upgrade.
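
The SRT output above is plain text: each cue has an index line, a `start --> end` timestamp line, and one or more text lines, with blank lines between cues. As a minimal sketch for inspecting or adjusting the timings before rendering, the hypothetical `parse_srt` helper below turns that text into a list of cues with times in seconds. It assumes well-formed input like the output shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:01,516 to seconds."""
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"Invalid SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT text into cues; blocks with fewer than three lines are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), _to_seconds(start), _to_seconds(end),
                        " ".join(lines[2:])))
    return cues
```

Calling `parse_srt(subtitles)` on the exported string makes it easy to spot very short cues or to compare the last cue's end time with the audio duration.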




### Add subtitles to our video
Once the subtitles are generated, we add them to the video using MoviePy.

MoviePy relies on ImageMagick to render the subtitle text, so we need to install it first.

In [None]:
!apt update &> /dev/null
!apt install imagemagick &> /dev/null
!apt install ffmpeg &> /dev/null
!pip3 install moviepy[optional] &> /dev/null
!sed -i '/<policy domain="path" rights="none" pattern="@\*"/d' /etc/ImageMagick-6/policy.xml


In [None]:
!wget https://gist.githubusercontent.com/Kaif987/38fca3821fbbcbd7b60cb54df348c2e8/raw/7745747309ffb1982467b138d07f6f2405a5da34/policy.xml
!mv policy.xml /etc/ImageMagick-6/policy.xml

--2024-11-18 10:29:49--  https://gist.githubusercontent.com/Kaif987/38fca3821fbbcbd7b60cb54df348c2e8/raw/7745747309ffb1982467b138d07f6f2405a5da34/policy.xml
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7947 (7.8K) [text/plain]
Saving to: ‘policy.xml’


2024-11-18 10:29:49 (80.8 MB/s) - ‘policy.xml’ saved [7947/7947]



Here we install the Impact font that will be used for the subtitles.

In [None]:
# Use the raw file URL; the /blob/ URL returns an HTML page instead of the font file
!wget -O Impact.ttf "https://github.com/sophilabs/macgifer/raw/master/static/font/impact.ttf"
!mkdir -p ~/.fonts
!mv Impact.ttf ~/.fonts/
!fc-cache -f -v


/usr/share/fonts: caching, new cache contents: 0 fonts, 5 dirs
/usr/share/fonts/cMap: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap: caching, new cache contents: 0 fonts, 5 dirs
/usr/share/fonts/cmap/adobe-cns1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-gb1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-japan1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-japan2: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts
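
A font download can silently save an HTML page instead of a font file, for example when the URL points at a web page for the file rather than at the file itself. ImageMagick then falls back to a default font without raising an error. The sketch below is a quick sanity check based on the first four bytes of the file; the helper names are hypothetical, and the list of signatures covers common TrueType and OpenType files, not every font format.

```python
# Leading bytes of common font containers: TrueType, OpenType (CFF),
# Apple TrueType, and TrueType collections
FONT_SIGNATURES = (b"\x00\x01\x00\x00", b"OTTO", b"true", b"ttcf")


def looks_like_font(data: bytes) -> bool:
    """Return True if the data starts with a known font signature."""
    return data[:4] in FONT_SIGNATURES


def check_font_file(path: str) -> bool:
    """Read the first four bytes of a file and test them against the signatures."""
    with open(path, "rb") as font_file:
        return looks_like_font(font_file.read(4))
```

Running `check_font_file(os.path.expanduser("~/.fonts/Impact.ttf"))` after the download gives an early warning if the saved file is not actually a font.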

Finally, we add the subtitles to our video using MoviePy.

In [None]:
# Point MoviePy at the ImageMagick binary (plain Python, not a shell command)
os.environ["IMAGEMAGICK_BINARY"] = "/usr/bin/convert"
change_settings({"IMAGEMAGICK_BINARY": r"/usr/bin/convert"})

# Render each subtitle line as white Impact text with a black outline
generator = lambda txt: TextClip(txt, font='Impact', fontsize=50, color='white', stroke_color="black", stroke_width=1)
subtitles = SubtitlesClip("video.srt", generator)
video = VideoFileClip("movie.mp4")

# Overlay the centered subtitles on the video and write the final file
result = CompositeVideoClip([video, subtitles.set_pos('center')])
result.write_videofile("final.mp4", fps=video.fps, remove_temp=True, codec="libx264", audio_codec="aac")

Moviepy - Building video final.mp4.
MoviePy - Writing audio in finalTEMP_MPY_wvf_snd.mp4




MoviePy - Done.
Moviepy - Writing video final.mp4






Moviepy - Done !
Moviepy - video ready final.mp4
