# Building Multimodal AI Applications with LangChain & the OpenAI API

## Goals
- Videos can be full of useful information, but extracting it is slow: you either watch the whole thing or skip through it hoping to land on the right part. Asking a bot questions about the transcript can be much faster.
- In this project, a video will be downloaded from YouTube, the audio will be transcribed, and a simple Q&A bot will be created to ask questions about the content.

## Objectives
- Understanding the building blocks of working with Multimodal AI projects
- Working with some of the fundamental concepts of LangChain
- Learning how to use the Whisper API to transcribe audio to text
- Understanding how to combine LangChain and the Whisper API to ask questions about any YouTube video




##  Setup

The project requires several packages that need to be installed into Workspace.

- `langchain` is a framework for developing generative AI applications.
- `yt_dlp` lets you download YouTube videos.
- `tiktoken` converts text into tokens.
- `docarray` makes it easier to work with multimodal data (in this case mixing audio and text).
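
Before running the install cells, it can help to see which of these packages are already present. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and it assumes the names passed in match the distribution names that `pip` installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names taken from the list above
print(installed_versions(["langchain", "yt_dlp", "tiktoken", "docarray"]))
```

Any package reported as `None` still needs to be installed.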

## Installing libraries

In [1]:
# Install langchain
!pip install --upgrade langchain

Collecting langchain
  Downloading langchain-0.1.8-py3-none-any.whl (816 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.21 (from langchain)
  Downloading langchain_community-0.0.21-py3-none-any.whl (1.7 MB)
Collecting langchain-core<0.2,>=0.1.24 (from langchain)
  Downloading langchain_core-0.1.25-py3-none-any.whl (242 kB)
Collecting langsmith<0.2.0,>=0.1.0 (from langchain)
  Downloading langsmith

In [2]:
# Install yt_dlp
!pip install --upgrade yt_dlp

Collecting yt_dlp
  Downloading yt_dlp-2023.12.30-py2.py3-none-any.whl (3.0 MB)
Collecting mutagen (from yt_dlp)
  Downloading mutagen-1.47.0-py3-none-any.whl (194 kB)
Collecting pycryptodomex (from yt_dlp)
  Downloading pycryptodomex-3.20.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting websockets>=12.0 (from yt_dlp)
  Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (130 kB)
Collecting brot

In [3]:
!pip install --upgrade tiktoken

Collecting tiktoken
  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.6.0


In [4]:
!pip install --upgrade docarray

Collecting docarray
  Downloading docarray-0.40.0-py3-none-any.whl (270 kB)
Collecting orjson>=3.8.2 (from docarray)
  Downloading orjson-3.9.14-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (138 kB)
Collecting types-requests>=2.28.11.6 (from docarray)
  Downloading types_requests-2.31.0.20240218-py3-none-any.whl (14 kB)
Installing collected packages: types-requests, orjson, docarray
Successfully installed docarray-0.40.0 orjson-3.9.14 types-requests-2.31.0.20240218


### Instructions

##  Import The Required Libraries

For this project, the `os` and `yt_dlp` packages are required to download the YouTube video of our choosing, convert it to an `.mp3`, and save the file. Additionally, the `openai` package will be used to make easy calls to the OpenAI models we will utilize.


In [9]:
!pip install openai

Collecting openai
  Downloading openai-1.12.0-py3-none-any.whl (226 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.3-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.3 h

In [10]:

# Import the os package
import os

# Import the glob package
import glob

# Import the openai package
import openai

# Import the yt_dlp package as youtube_dl
import yt_dlp as youtube_dl

# Import DownloadError from yt_dlp
from yt_dlp import DownloadError

# Import DocArray
import docarray


In [28]:
## Setting up the openai api key

from getpass import getpass

OPENAI_API_KEY = getpass()

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY # give the openai api key

··········


In [None]:
# Read the key back from the environment variable set above
openai_api_key = os.getenv("OPENAI_API_KEY")
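
To avoid typing the key on every run, the environment can be checked first and the prompt used only as a fallback. This is a minimal sketch; the helper name `get_api_key` and its `prompt` parameter are hypothetical and not part of the `openai` package.

```python
import os
from getpass import getpass


def get_api_key(env_var="OPENAI_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        key = prompt(f"Enter {env_var}: ")
        os.environ[env_var] = key  # make it visible to libraries that read the env
    return key
```

In the notebook, this would be called as `OPENAI_API_KEY = get_api_key()`.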

##  Download the YouTube Video

After setting up the environment, the first step is to download the video from YouTube and convert it to an audio file (.mp3).

We'll download a documentary on Bangladesh's economy from the Econ channel (https://www.youtube.com/watch?v=cI0Cx5dfuVI&ab_channel=Econ).

We will do this by setting variables to store the `youtube_url` and the `output_dir` where we want the file to be stored.

The `yt_dlp` package allows us to download and convert the video in a few steps, but it does require some configuration.

Lastly, we'll search the `output_dir` for any .mp3 files using the `glob` module. The matching paths will be stored in a list called `audio_files`, which will be used later for transcription with the Whisper model.


Create the following:
- Two variables - `youtube_url` to store the Video URL and `output_dir` that will be the directory where the audio files will be saved.


In [62]:

youtube_url = "https://www.youtube.com/watch?v=cI0Cx5dfuVI&ab_channel=Econ"
# Directory to store the downloaded video
output_dir = "files/audio/"

# Config for youtube-dl
ydl_config = {
    "format": "bestaudio/best",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }
    ],
    "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
    "verbose": True
}

# Check if the output directory exists, if not create it
if not os.path.exists(output_dir):
    os.makedirs(output_dir)


# Print a message indicating which video is being downloaded

print(f"Downloading video from {youtube_url}")


# Attempt to download the video using the specified configuration
# If a DownloadError occurs, attempt to download the video again

try:
    with youtube_dl.YoutubeDL(ydl_config) as ydl:
        ydl.download([youtube_url])
except DownloadError:
    with youtube_dl.YoutubeDL(ydl_config) as ydl:
        ydl.download([youtube_url])



[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2023.12.30 from yt-dlp/yt-dlp [f10589e34] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'outtmpl': 'files/audio/%(title)s.%(ext)s', 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.58+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, mutagen-

Downloading video from https://www.youtube.com/watch?v=cI0Cx5dfuVI&ab_channel=Econ
[youtube] Extracting URL: https://www.youtube.com/watch?v=cI0Cx5dfuVI&ab_channel=Econ
[youtube] cI0Cx5dfuVI: Downloading webpage
[youtube] cI0Cx5dfuVI: Downloading ios player API JSON
[youtube] cI0Cx5dfuVI: Downloading android player API JSON
[youtube] cI0Cx5dfuVI: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] cI0Cx5dfuVI: Downloading 1 format(s): 251


[debug] Invoking http downloader on "https://rr3---sn-qxo7rn7r.googlevideo.com/videoplayback?expire=1708513318&ei=xoPVZdy8MdWAir4PvrmKmA8&ip=34.173.139.183&id=o-AFLk2001Kuazy_zWAAhcYj2hrOV1CfRqB-bggCur4UCo&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=M5&mm=31%2C29&mn=sn-qxo7rn7r%2Csn-qxoedn7k&ms=au%2Crdu&mv=m&mvi=3&pl=17&initcwndbps=5591250&spc=UWF9f8vubpQ-ScSq08seuH_AiLivgxvVey7QqFDAzkRb4Jg&vprv=1&svpuc=1&mime=audio%2Fwebm&gir=yes&clen=13726412&dur=822.021&lmt=1704833614198901&mt=1708491404&fvip=1&keepalive=yes&fexp=24007246&c=ANDROID&txp=5532434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cspc%2Cvprv%2Csvpuc%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRAIgPsZcdtgUCWwCzlwQu9fkspB3Z8TbJHG9OGLtBvQF3cQCIGUTydxmj0qNIMydnhKthwpO7YDWLabr8BMBqbIiZieG&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=APTiJQcwRAIgctDReoRNkDqaRQWYmkne2reFEIG2dEF5txMTPSUtGSwCIByjffXbMF_x4_x9DlBVJYsY-rUQZQfAX6ly63EKRvoR"


[download] Destination: files/audio/Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.webm
[download] 100% of   13.09MiB in 00:00:00 at 25.47MiB/s  


[debug] ffmpeg command line: ffprobe -show_streams 'file:files/audio/Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.webm'


[ExtractAudio] Destination: files/audio/Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.mp3


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:files/audio/Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.webm' -vn -acodec libmp3lame -b:a 192.0k -movflags +faststart 'file:files/audio/Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.mp3'


Deleting original file files/audio/Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.webm (pass -k to keep)
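
The download cell above retries exactly once when a `DownloadError` occurs. A more general retry with a bounded number of attempts can be sketched as follows. The `retry` helper is hypothetical and not part of `yt_dlp`, and the growing pause between attempts is an assumption made for illustration.

```python
import time


def retry(func, attempts=3, exceptions=(Exception,), delay_seconds=1.0):
    """Call func() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds * attempt)  # wait a little longer each time
```

With the notebook's names, the call inside the `with` block would look like `retry(lambda: ydl.download([youtube_url]), exceptions=(DownloadError,))`.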


To find the audio files, we will use the `glob` module, which searches the `output_dir` for any .mp3 files and returns the matching paths as a list called `audio_files`. This list will be used later to send each file to the Whisper model for transcription.

Verifying the file name:

In [63]:
# Find all the audio files in the output directory
audio_files = glob.glob(os.path.join(output_dir, "*.mp3"))


# Select the first audio file in the list
audio_filename = audio_files[0]

# Print the name of the selected audio file
print(audio_filename)

files/audio/Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.mp3
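
Indexing with `audio_files[0]` raises an `IndexError` when the directory holds no .mp3 files, and the first match is not necessarily the newest download. A minimal sketch of a more defensive selection follows; the helper name `pick_audio_file` is hypothetical.

```python
import glob
import os


def pick_audio_file(directory, pattern="*.mp3"):
    """Return the most recently modified file matching pattern in directory."""
    matches = glob.glob(os.path.join(directory, pattern))
    if not matches:
        raise FileNotFoundError(f"No files matching {pattern} in {directory}")
    return max(matches, key=os.path.getmtime)
```

In the notebook, `audio_filename = pick_audio_file(output_dir)` would replace the indexing step.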


##  Transcribe the Video using Whisper

In this step we will send the downloaded and converted YouTube audio to the Whisper model for transcription. To do this we will create variables for the audio directory, the list of audio files, and the model.

Using these variables we will:
- Create a list to store the transcripts
- Read the audio file
- Send the file to the Whisper model using the OpenAI package
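
Before sending a file, it may be worth checking its size, since the transcription endpoint rejects uploads that are too large. The sketch below assumes a 25 MB limit, which is the figure given in OpenAI's speech-to-text guide at the time of writing and should be checked against the current documentation. The helper name `fits_upload_limit` is hypothetical.

```python
import os

# Assumption: 25 MB upload limit, taken from OpenAI's speech-to-text guide
# at the time of writing; verify against the current documentation.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path, limit_bytes=MAX_UPLOAD_BYTES):
    """Return True if the file at path is no larger than limit_bytes."""
    return os.path.getsize(path) <= limit_bytes
```

A file that fails this check would need to be split or re-encoded at a lower bitrate before transcription.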

In [64]:
from openai import OpenAI
# With no api_key argument, the client reads the OPENAI_API_KEY environment variable
client = OpenAI()


In [77]:
# Define the directory containing the audio files
audio_directory = "/content/files/audio/"

# List all files in the directory
audio_files = [file for file in os.listdir(audio_directory) if file.endswith(".mp3")]

# Define the model to use for transcription
model = "whisper-1"

# Iterate over each audio file
for filename in audio_files:
    # Construct the full path of the audio file
    audio_file_path = os.path.join(audio_directory, filename)

    # Open the audio file
    with open(audio_file_path, "rb") as audio_file:
        # Transcribe the audio file to text using OpenAI API
        print(f"Transcribing {filename}...")
        transcription = client.audio.transcriptions.create(
            model=model,
            file=audio_file
        )
        # Print the returned Transcription object for each audio file;
        # the plain text is available as transcription.text
        print(f"Transcript for {filename}: {transcription}")

Transcribing Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.mp3...
Transcript for Bangladesh Economy is Getting Rich but It Is at Risk ｜ Bangladesh Economy ｜ Econ.mp3: Transcription(text="This is Bangladesh, which has emerged as one of Asia's most remarkable and unexpected success stories in recent years. In colonial times, the eastern half of Bengal was one of the poorest parts of British India. However, in the 18th century, before the British Raj, it stood as the richest region in India. Bengal operated as a centre for the worldwide muslin, silk and pearl trades. It exported saltpetre to Europe, sold opium in Indonesia, sent raw silk to Japan and the Netherlands, and manufactured cotton and silk textiles for global export. Real wages and living standards in 18th century Bengal were comparable to Britain, which, in turn, had the highest living standards in Europe. After phases of economic exploitation in India, gaining independence and experiencing pa

In [78]:
## see the text data
transcription.text

"This is Bangladesh, which has emerged as one of Asia's most remarkable and unexpected success stories in recent years. In colonial times, the eastern half of Bengal was one of the poorest parts of British India. However, in the 18th century, before the British Raj, it stood as the richest region in India. Bengal operated as a centre for the worldwide muslin, silk and pearl trades. It exported saltpetre to Europe, sold opium in Indonesia, sent raw silk to Japan and the Netherlands, and manufactured cotton and silk textiles for global export. Real wages and living standards in 18th century Bengal were comparable to Britain, which, in turn, had the highest living standards in Europe. After phases of economic exploitation in India, gaining independence and experiencing partition in 1947, East Bengal, or East Pakistan, became one of the poorest countries in the Indian subcontinent. After declaring itself an independent country, Bangladesh, in 1971, became even poorer as the rump of Pakista

To save the transcript to a text file, we will use the code below. The .txt file will then be loaded into LangChain for querying.

In [79]:
import os

# Define the directory where you want to save the transcript file
output_directory = "/content/files/transcripts/"

# Create the directory if it doesn't exist
os.makedirs(output_directory, exist_ok=True)

# Specify the name of the output file within the directory
output_file = os.path.join(output_directory, "BD_transcript.txt")

# Placeholder text used in this run; assign transcription.text here
# instead to save the real transcript
transcript = "This is the transcript content."

# Write the transcript to the output file
with open(output_file, 'w') as file:
    file.write(transcript)

print(f"Transcript saved to {output_file}")


Transcript saved to /content/files/transcripts/BD_transcript.txt
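
The cell above writes a fixed placeholder string. A helper that saves whatever text it is given, such as `transcription.text`, might look like the following sketch. The name `save_transcript` and the default filename are hypothetical.

```python
from pathlib import Path


def save_transcript(text, directory, filename="transcript.txt"):
    """Write text to directory/filename as UTF-8 and return the path."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(text, encoding="utf-8")
    return target
```

Writing with an explicit UTF-8 encoding keeps non-ASCII characters in the transcript intact regardless of the platform default.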


## Create a TextLoader using LangChain

In order to use text or other types of data with LangChain we must first convert that data into Documents. This is done by using loaders. Here, we will use the `TextLoader` that will take the text from our transcript and load it into a document.

In [81]:
# Import the TextLoader class from the langchain.document_loaders module

from langchain.document_loaders import TextLoader

# Create a new instance of the TextLoader class, specifying the path to the transcript file

loader = TextLoader("/content/files/transcripts/BD_transcript.txt")

# Load the file into a list of documents using the TextLoader instance

docs = loader.load()

In [82]:
docs

[Document(page_content='This is the transcript content.', metadata={'source': '/content/files/transcripts/BD_transcript.txt'})]
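
As the output shows, a loaded document pairs the text (`page_content`) with `metadata` such as the source path. The sketch below illustrates that shape with a simplified stand-in. `SimpleDocument` and `load_text` are hypothetical names, and this is not how LangChain implements `TextLoader`.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Simplified stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path):
    """Read one text file and wrap it in a single-element list of documents."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]
```

Returning a list even for a single file mirrors the output above, where `docs` is a list containing one `Document`.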

We can also load several documents. In that case, we run a for loop over the files and collect all the loaded documents in a list for further analysis. When working with many files, such as PDFs, it is preferable to use a vector database.

In [None]:
## Example of loading multiple files. Be aware of the token count!

from langchain.document_loaders import TextLoader
import os

# Specify the directory containing the text files
directory_path = "./files/transcripts/"

# List all files in the directory
all_files = os.listdir(directory_path)

# Filter out only the .txt files
txt_files = [file for file in all_files if file.endswith(".txt")]

# Create a new instance of the TextLoader class for each text file
docs2 = []
for txt_file in txt_files:
    # Construct the full path of the text file
    txt_file_path = os.path.join(directory_path, txt_file)
    # Create a TextLoader instance for the text file
    loader = TextLoader(txt_file_path)
    # Load the document from the text file
    doc = loader.load()
    # Append the loaded document to the list of documents
    docs2.append(doc)

In [None]:
# Show docs2 to verify the files have been loaded
docs2

[[Document(page_content="Machine learning. Teach a computer how to perform a task without explicitly programming it to perform said task. Instead, feed data into an algorithm to gradually improve outcomes with experience, similar to how organic life learns. The term was coined in 1959 by Arthur Samuel at IBM, who was developing artificial intelligence that could play checkers. Half a century later, and predictive models are embedded in many of the products we use every day, which perform two fundamental jobs. One is to classify data, like is there another car on the road, or does this patient have cancer? The other is to make predictions about future outcomes, like will the stock go up, or which YouTube video do you want to watch next? The first step in the process is to acquire and clean up data. Lots and lots of data. The better the data represents the problem, the better the results. Garbage in, garbage out. The data needs to have some kind of signal to be valuable to the algorithm 
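
Because `loader.load()` returns a list, appending each result makes `docs2` a list of lists, as the `[[Document(` in the output shows. Components that expect a flat list of documents need it flattened first. A minimal sketch follows, with plain strings standing in for the documents.

```python
from itertools import chain


def flatten(nested):
    """Turn a list of lists into one flat list, preserving order."""
    return list(chain.from_iterable(nested))


# Each inner list stands in for the result of one loader.load() call
nested_docs = [["doc_a"], ["doc_b", "doc_c"]]
flat_docs = flatten(nested_docs)
print(flat_docs)  # ['doc_a', 'doc_b', 'doc_c']
```

Using `docs2.extend(doc)` inside the loop instead of `docs2.append(doc)` gives the same flat result without a separate step.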

## Creating a Vector Store

Now that we've transcribed the video content into documents, we'll organize these documents within a vector store. A vector store holds an embedding (a numeric vector) for each piece of text, so the passages most relevant to a query can be found by measuring how close their vectors are to the query's vector.
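
To illustrate what "close" means here, the sketch below computes cosine similarity, one common way of comparing embeddings. The metric a particular vector store actually uses may differ, so this is an illustration rather than a description of `DocArrayInMemorySearch`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0, and vectors at right angles score 0.0.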


In [83]:
# Import the tiktoken package
import tiktoken

##  Create the Document Search

In [24]:
# Import the RetrievalQA class from the langchain.chains module
from langchain.chains import RetrievalQA

# Import the ChatOpenAI class from the langchain.chat_models module
from langchain.chat_models import ChatOpenAI

# Import the DocArrayInMemorySearch class from the langchain.vectorstores module
from langchain.vectorstores import DocArrayInMemorySearch

# Import the OpenAIEmbeddings class from the langchain.embeddings module
from langchain.embeddings import OpenAIEmbeddings

In [25]:
from langchain_core.vectorstores import VectorStoreRetriever

Now we will create a vector store using `DocArrayInMemorySearch`, which searches through the embeddings created by the OpenAI embeddings function.

To complete this step:
- Create a variable called `db`
- Assign the `db` variable to store the result of the method `DocArrayInMemorySearch.from_documents`
- In the `DocArrayInMemorySearch.from_documents` call, pass in `docs` and a call to `OpenAIEmbeddings()`

In [34]:
import os
from getpass import getpass

OPENAI_API_KEY = getpass()

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

··········


In [31]:
pip install -U langchain-openai

Collecting langchain-openai
  Downloading langchain_openai-0.0.6-py3-none-any.whl (29 kB)
Installing collected packages: langchain-openai
Successfully installed langchain-openai-0.0.6


In [138]:
from langchain_openai import OpenAIEmbeddings

In [84]:
# Create a new DocArrayInMemorySearch instance from the specified documents and embeddings
db = DocArrayInMemorySearch.from_documents(
    docs,
    OpenAIEmbeddings()
)

We will now create a retriever from the `db` we created in the last step. This enables the retrieval of the stored embeddings. Since we are also using OpenAI's `ChatOpenAI` model, we will assign it as our LLM.
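
Conceptually, a retriever ranks the stored texts by similarity to the query and returns the best matches. The toy sketch below shows that idea using word counts in place of real embeddings. The functions `embed`, `similarity` and `retrieve` are hypothetical, and a real embedding model returns dense float vectors rather than word counts.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a real model returns dense float vectors)."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, texts, k=2):
    """Return the k texts most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: similarity(query_vec, embed(t)), reverse=True)
    return ranked[:k]
```

The retrieved texts are what the LLM later receives as context for answering the question.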

In [139]:
# Convert the DocArrayInMemorySearch instance to a retriever
retriever = db.as_retriever()

# Create a new ChatOpenAI instance with a temperature of 0.0
llm = ChatOpenAI(temperature = 0.0)

## Create the 'qa chain' for query

Now we are ready to create queries about the YouTube video and read the responses from the LLM. This is done by first creating a query and then running the RetrievalQA chain we set up, passing it the query.

In [86]:
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

In [140]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
splits = text_splitter.split_text(transcription.text)
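
The `chunk_size` and `chunk_overlap` parameters control how long each piece is and how much neighbouring pieces share. The sketch below shows the overlap idea with a plain fixed-size character window. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break at natural separators such as paragraph and line breaks, which this version does not do. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=150):
    """Cut text into fixed-size character windows that overlap their neighbours."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.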

In [141]:
# Build an index
embeddings = OpenAIEmbeddings()
vectordb = FAISS.from_texts(splits, embeddings) # install faiss-cpu

In [142]:
retriever = VectorStoreRetriever(vectorstore=db) # selecting db as our vectorstore
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)

In [53]:
#pip install faiss-cpu

In [143]:
# Build a QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=vectordb.as_retriever(),
)
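
With `chain_type="stuff"`, all retrieved chunks are placed ("stuffed") into a single prompt together with the question. The sketch below shows that idea only. The function `build_stuff_prompt` and the prompt wording are hypothetical and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, chunks):
    """Place every retrieved chunk into one prompt, followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because every chunk goes into one prompt, this approach only works while the combined text fits within the model's context window.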

## Create & Analyze the Queries

In [130]:
query = "summarize the text"
display(qa_chain.run(query))

"The text discusses the transformation of Bangladesh from a country devastated by war and natural disasters to a middle-income country today. It highlights the country's achievements in areas like education, women's empowerment, and financial inclusion. Despite progress in reducing poverty and promoting economic growth, Bangladesh faces challenges such as over-reliance on garment exports and the need for economic diversification. The text also mentions the importance of addressing infrastructure bottlenecks and strengthening the banking sector to sustain growth."

In [131]:
query = "why the economy of bangladesh is blooming?"
display(qa_chain.run(query))

"The economy of Bangladesh has seen growth primarily due to its success in exporting ready-made garments. The ready-made garment sector accounts for a significant portion of the country's exports and GDP, employing a large number of workers, especially women. Additionally, remittances from Bangladeshis working overseas contribute significantly to the national income. The country has also focused on improving financial inclusion, especially in rural areas, which has helped boost entrepreneurship and access to credit. However, challenges such as low productivity, dependence on a single category of exports, and the need for infrastructure development and banking sector strengthening remain."

In [132]:
query = "compare bd with ind and pak"
display(qa_chain.run(query))

"Bangladesh has made significant progress in reducing poverty and promoting economic growth, surpassing both Pakistan and India in terms of income per person at current prices. Before the pandemic, Bangladesh's economic growth exceeded 7%, outpacing not just Pakistan and India, but even China. Bangladeshis are wealthier, healthier, and better educated compared to the past. However, Bangladesh's economy is heavily dependent on a single category of exports, apparel and clothing, which raises questions about the sustainability of its economic record. In terms of the Human Development Index (HDI) ranking, Bangladesh is now ahead of both Pakistan and India. Despite its progress, Bangladesh faces challenges such as the potential loss of preferential trade agreements due to its graduation from the least developed country status by 2026. Bangladesh has been a pioneer in financial inclusion, with initiatives like microfinance and mobile financial services."

In [133]:
query = "tell me about the historical events of bd in chorological way"
display(qa_chain.run(query))

"1. In 1947, the partition of India led to the creation of East Pakistan, which included modern-day Bangladesh.\n2. Over two decades, East Pakistan faced economic deprivation under the central government based in West Pakistan.\n3. In 1971, Bangladesh declared independence from Pakistan after a devastating War of Independence.\n4. Following independence, Bangladesh faced challenges such as military coups in 1975, 1982, and 2007, as well as natural disasters.\n5. Despite these challenges, Bangladesh has made significant progress in reducing poverty and promoting economic growth.\n6. Bangladesh has become a middle-income country and has overtaken India's economic standing on a per capita basis.\n7. However, there are concerns about the country's heavy dependence on apparel and clothing exports and the need for diversification in its export sector."

1. In 1947, the partition of India led to the creation of East Pakistan, which included modern-day Bangladesh.
2. Over two decades, East Pakistan faced economic disparities and grievances under the central government based in West Pakistan.
3. In 1971, Bangladesh declared independence from Pakistan after a devastating War of Independence.
4. Following independence, Bangladesh faced challenges and was initially dismissed as an international basket case.
5. Despite dire expectations, Bangladesh made significant progress in reducing poverty and promoting economic growth.
6. Bangladesh has become a middle-income country and has overtaken India's economic standing on a per capita basis.
7. However, concerns exist about Bangladesh's heavy dependence on apparel and clothing exports and the need for diversification in its export sector.

In [134]:
query = "why the economy is at risk?"
display(qa_chain.run(query))

"The economy of Bangladesh is at risk due to several factors. Some of the key reasons include:\n\n1. Weak private investment in new industries: There is a weakness in private investment in new industries, which is exacerbated by constraints on the availability of credit, especially for small and medium enterprises.\n\n2. Deterioration of the business environment: The entry and growth of firms are hampered by the deterioration of the business environment in Bangladesh. The country's ranking in the World Bank's Doing Business Index has fallen significantly over the years.\n\n3. Low foreign direct investment (FDI): Despite steady economic growth, FDI inflow in Bangladesh is comparatively low, less than 1% of GDP, which is one of the lowest rates in Asia. This low FDI inflow hinders economic growth and development.\n\n4. Bureaucracy and corruption: Time-consuming bureaucracy, inadequate physical infrastructure, unreliable energy supply, low labor productivity, high cost of doing business, 

The economy of Bangladesh is at risk due to several factors. Some of the key reasons include:

1. Weak private investment in new industries: There is a weakness in private investment in new industries, which is exacerbated by constraints on the availability of credit, especially for small and medium enterprises.

2. Deterioration of the business environment: The entry and growth of firms are hampered by the deterioration of the business environment in Bangladesh. The country's ranking in the World Bank's Doing Business Index has fallen significantly over the years.

3. Low foreign direct investment (FDI): Despite steady economic growth, FDI inflow in Bangladesh is comparatively low, less than 1% of GDP, which is one of the lowest rates in Asia. This low FDI inflow hinders economic growth and development.

4. Bureaucracy and corruption: Time-consuming bureaucracy, inadequate physical infrastructure, unreliable energy supply, low labor productivity, high cost of doing business, and pervasive corruption are factors that discourage foreign investors from investing in Bangladesh.

5. Dependence on a single category of exports: Bangladesh's extreme dependence on the ready-made garment (RMG) sector for exports poses a risk to the economy. Diversification of exports is essential for long-term economic stability and growth.

Overall, addressing these challenges and implementing reforms to improve the business environment, attract more investment, and diversify the economy will be crucial in mitigating the risks to Bangladesh's economy.

In [120]:
query = "bangladesh education and economy"
qa_chain.run(query)

"Bangladesh has made significant progress in education and the economy over the years. The country has seen improvements in literacy rates, school completion rates, and overall educational attainment. In terms of the economy, Bangladesh has experienced strong economic growth, surpassing both Pakistan and India in terms of income per person. However, there are concerns about the sustainability of the economy, particularly due to its heavy reliance on a single category of exports, such as apparel and clothing. Additionally, there are challenges related to diversifying the economy and addressing infrastructure bottlenecks to support continued growth. Bangladesh's success in social achievements has been greater than its economic ones, highlighting the need for balanced growth and addressing policy challenges to ensure long-term sustainability."


In [135]:
query = "what are the challenges?"
display(qa_chain.run(query))

'Some of the challenges facing Bangladesh include inadequate and incorrect policies, poor policy implementation, inherent structural weaknesses, lack of good governance, and an absence of reform initiatives. Additionally, there are constraints on the availability of credit, especially to small and medium enterprises, deterioration of the business environment, low foreign direct investment (FDI) compared to regional peers, time-consuming bureaucracy, inadequate physical infrastructure, unreliable energy supply, low labor productivity, high cost of doing business, and pervasive corruption.'

In [122]:
query = "how to mitigate it?"
qa_chain.run(query)

To mitigate the economic challenges faced by Bangladesh, several strategies can be considered:

1. **Diversification of exports**: Encouraging and supporting the growth of sectors beyond ready-made garments, such as light engineering, plastics, leather, and footwear, can help diversify the export basket. This can reduce the country's dependence on a single sector and increase economic resilience.

2. **Improving competitiveness**: Enhancing the competitiveness of Bangladeshi companies by ensuring they meet international standards can help them compete on a global scale. This may involve investing in technology, innovation, and skills development.

3. **Strengthening private investment**: Addressing constraints on the availability of credit to new industries can help stimulate private investment. This may involve improving access to finance for businesses, streamlining regulatory processes, and creating a conducive business environment.

4. **Enhancing infrastructure**: Addressing infrastructure bottlenecks can help boost productive investment in the country. Improving transportation networks, energy supply, and digital infrastructure can make Bangladesh more attractive for investors and businesses.

5. **Policy reforms**: Implementing effective policies that promote economic diversification, innovation, and competitiveness is crucial. This may involve regulatory reforms, trade facilitation measures, and investment incentives to attract both domestic and foreign investment.

By implementing these strategies and addressing the underlying challenges, Bangladesh can work towards sustaining economic growth and achieving greater prosperity for its people.

In [123]:
query = "are bangladeshi people lazy?"
qa_chain.run(query)

"I don't know."

In [124]:
query = "tell me about shundarban"
qa_chain.run(query)

'The Sundarbans is a vast mangrove forest in the coastal region of the Bay of Bengal, spread across Bangladesh and the Indian state of West Bengal. It is one of the largest mangrove forests in the world and is known for its unique ecosystem and biodiversity. The Sundarbans is home to the Bengal tiger and various other species of wildlife, including crocodiles, snakes, and a variety of bird species. The mangrove forest also serves as a natural barrier against cyclones and storm surges, protecting the coastal areas from natural disasters. The Sundarbans is a UNESCO World Heritage Site and is of significant ecological importance.'

In [125]:
query = "how can shundarban play vital role in economy?"
qa_chain.run(query)

'The Sundarbans, a mangrove forest in Bangladesh, can play a vital role in the economy through various ways. One significant aspect is ecotourism, as the Sundarbans is a unique and biodiverse ecosystem that attracts tourists. This can generate income and employment opportunities for local communities. Additionally, the Sundarbans provide natural resources like timber, honey, and fish, which can contribute to the economy through sustainable harvesting practices. Furthermore, the mangrove forest acts as a natural barrier against cyclones and storm surges, protecting coastal areas and infrastructure, which in turn can save costs related to disaster recovery and rebuilding. Overall, sustainable management and utilization of the Sundarbans can have positive economic impacts on Bangladesh.'

In [127]:
query = "is dhaka the economical hub of bangladesh?"
qa_chain.run(query)

'Yes, Dhaka is considered the economic hub of Bangladesh. It is the capital city and the largest economic center in the country, housing many important industries, businesses, and financial institutions.'

## Querying multiple questions

In [136]:
# List of queries
queries = [
    "who is Sheikh Mujibur Rahman",
    "what are the major industries in Bangladesh?",
    "how does the export sector contribute to Bangladesh's economy?",
]

# Results dictionary to store responses
results = {}

# Iterate over each query
for query in queries:
    # Run the QA chain for the current query
    response = qa_chain.run(query)
    # Store the response in the results dictionary
    results[query] = response

# Print or process the results as needed
print(results)


{'who is Sheikh Mujibur Rahman': "Sheikh Mujibur Rahman, also known as Bangabandhu (Friend of Bengal), was a prominent political leader in Bangladesh. He played a crucial role in the country's independence movement and became the founding father of Bangladesh after it gained independence from Pakistan in 1971. Sheikh Mujibur Rahman served as the first President of Bangladesh and later as its Prime Minister. He is highly regarded for his efforts in leading the nation towards independence and is considered a national hero in Bangladesh.", 'what are the major industries in Bangladesh?': "The major industry in Bangladesh is the ready-made garment (RMG) sector, which accounts for 85% of the country's exports and contributes 9% to GDP. Additionally, Bangladesh has been striving to diversify into industries such as light engineering, plastics, leather, footwear, and pharmaceuticals. However, the RMG sector remains the dominant industry in the country.", "how does the export sector contribute 
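
Printing the raw dictionary, as above, puts every answer on one long line. The sketch below is one possible way to collect answers for several queries and print them as wrapped question/answer blocks. The helpers `ask_all` and `format_results` are hypothetical names introduced here, not part of LangChain, and `fake_ask` is a stand-in for `qa_chain.run` so that the example runs without an API key.

```python
import textwrap
from typing import Callable, Dict, Iterable


def ask_all(ask: Callable[[str], str], queries: Iterable[str]) -> Dict[str, str]:
    """Send each query to `ask` and collect the answers, keyed by query."""
    return {query: ask(query) for query in queries}


def format_results(results: Dict[str, str], width: int = 80) -> str:
    """Render each question and its answer as a wrapped, readable block."""
    blocks = []
    for query, answer in results.items():
        # Wrap the answer and indent continuation lines under the "A: " prefix.
        wrapped = textwrap.fill(
            answer, width=width, initial_indent="A: ", subsequent_indent="   "
        )
        blocks.append(f"Q: {query}\n{wrapped}")
    return "\n\n".join(blocks)


# Hypothetical stand-in for `qa_chain.run`; it only echoes the query back.
def fake_ask(query: str) -> str:
    return f"(placeholder answer to: {query})"


queries = [
    "who is Sheikh Mujibur Rahman",
    "what are the major industries in Bangladesh?",
]
results = ask_all(fake_ask, queries)
print(format_results(results))
```

In the notebook itself, the same helpers could presumably be called as `ask_all(qa_chain.run, queries)`, assuming `qa_chain.run` accepts a query string and returns the answer as a string, as it does in the cells above.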

In [137]:
display(results)

{'who is Sheikh Mujibur Rahman': "Sheikh Mujibur Rahman, also known as Bangabandhu (Friend of Bengal), was a prominent political leader in Bangladesh. He played a crucial role in the country's independence movement and became the founding father of Bangladesh after it gained independence from Pakistan in 1971. Sheikh Mujibur Rahman served as the first President of Bangladesh and later as its Prime Minister. He is highly regarded for his efforts in leading the nation towards independence and is considered a national hero in Bangladesh.",
 'what are the major industries in Bangladesh?': "The major industry in Bangladesh is the ready-made garment (RMG) sector, which accounts for 85% of the country's exports and contributes 9% to GDP. Additionally, Bangladesh has been striving to diversify into industries such as light engineering, plastics, leather, footwear, and pharmaceuticals. However, the RMG sector remains the dominant industry in the country.",
 "how does the export sector contribut