## Build a Q&A chatbot for YouTube videos with LlamaIndex, Astra DB (Apache Cassandra), and Gradient's open-source models such as Llama 2



<a href="https://colab.research.google.com/github/bhattbhavesh91//youtube-q-a-gradient-astradb/blob/main/youtube-q-a-notebook.ipynb" target="_blank"><img height="40" alt="Run your own notebook in Colab" src = "https://colab.research.google.com/assets/colab-badge.svg"></a>

# Installation

In [1]:
!pip install -q cassandra-driver
!pip install -q "cassio>=0.1.1"
!pip install -q gradientai --upgrade
!pip install -q llama-index
!pip install -q tiktoken==0.4.0
!pip install -Uq openai-whisper
!pip install -Uq yt-dlp


# Imports

In [2]:
import json
import os
import re
import time
import whisper
import yt_dlp
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster
from llama_index import ServiceContext
from llama_index import set_global_service_context
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.embeddings import GradientEmbedding
from llama_index.llms import GradientBaseModelLLM
from llama_index.vector_stores import CassandraVectorStore

# Function to Download Audio from a YouTube Video

In [3]:
def download_audio(link):
    """Download the best available audio stream and return the video title."""
    options = {'extract_audio': True,
               'format': 'bestaudio',
               'outtmpl': '%(title)s.mp3'}
    with yt_dlp.YoutubeDL(options) as video:
        # extract_info(..., download=True) already downloads the file,
        # so no separate video.download() call is needed.
        info_dict = video.extract_info(link, download=True)
    return info_dict['title']

# Example: Extract Audio

In [4]:
youtube_video_url = "https://www.youtube.com/watch?v=Tt0arZN6EBM"

In [5]:
download_audio(youtube_video_url)

[youtube] Extracting URL: https://www.youtube.com/watch?v=Tt0arZN6EBM
[youtube] Tt0arZN6EBM: Downloading webpage
[youtube] Tt0arZN6EBM: Downloading ios player API JSON
[youtube] Tt0arZN6EBM: Downloading android player API JSON
[youtube] Tt0arZN6EBM: Downloading m3u8 information
[info] Tt0arZN6EBM: Downloading 1 format(s): 251
[download] Destination: Why Change Is So Scary -- and How to Unlock Its Potential ｜ Maya Shankar ｜ TED.mp3
[download] 100% of   11.35MiB in 00:00:00 at 15.23MiB/s  


'Why Change Is So Scary -- and How to Unlock Its Potential | Maya Shankar | TED'
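
The function returns the video title, but the download log shows the file on disk written with `｜` where the title has `|`, so the returned title does not always match the filename. Below is a minimal sketch of one way to locate the downloaded file, assuming the audio was saved in the current directory with an `.mp3` suffix. The helper name `latest_audio_file` is hypothetical and not part of yt-dlp.

```python
from pathlib import Path


def latest_audio_file(directory=".", suffix=".mp3"):
    """Return the most recently modified file with the given suffix, or None."""
    candidates = [p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix == suffix]
    if not candidates:
        return None
    # Assumption: the newest matching file is the one that was just downloaded.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The result could then be passed to the transcription step, for example `transcribe(model, str(latest_audio_file()))`, instead of typing the filename by hand.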

# Transcribe Audio from mp3 file

In [6]:
os.makedirs("text_files", exist_ok=True)

In [7]:
def transcribe(model, audio):
    """Transcribe an audio file with Whisper and save the text under text_files/."""
    result = model.transcribe(audio)
    with open("text_files/transcription.txt", 'w', encoding="utf-8") as f:
        f.write(result["text"])
    return 1
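
`transcribe` always writes to `text_files/transcription.txt`, so transcribing a second video overwrites the first transcript. The following is a hedged sketch of one way to keep a separate file per video. `save_transcript` and its slug rule are illustrative assumptions, not part of Whisper or the original notebook.

```python
import re
from pathlib import Path


def save_transcript(text, title, out_dir="text_files"):
    """Write a transcript to <out_dir>/<slug>.txt and return the path."""
    # Keep letters, digits, and hyphens; collapse every other run to "_".
    slug = re.sub(r"[^A-Za-z0-9-]+", "_", title).strip("_") or "transcription"
    out_path = Path(out_dir) / f"{slug}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Inside `transcribe`, this could be called as `save_transcript(result["text"], video_title)`, where `video_title` is the value returned by `download_audio`.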

In [11]:
! pip install git+https://github.com/openai/whisper.git -q


In [8]:
import whisper

model = whisper.load_model("small")


In [10]:
transcribe(model, "Why Change Is So Scary -- and How to Unlock Its Potential ｜ Maya Shankar ｜ TED.mp3")

1

# Set Up the DataStax Vector DB Connection

In [11]:
cloud_config = {
    'secure_connect_bundle': 'secure-connect-ullas-astra-test.zip'
}

with open("ullas_astra_test-token.json") as f:
    secrets = json.load(f)

CLIENT_ID = secrets["clientId"]
CLIENT_SECRET = secrets["secret"]

auth_provider = PlainTextAuthProvider(CLIENT_ID, CLIENT_SECRET)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect()

row = session.execute("select release_version from system.local").one()
if row:
    print(row[0])
else:
    print("An error occurred.")

ERROR:cassandra.connection:Closing connection <AsyncoreConnection(132771642303312) 1b5e0326-f7bd-40fc-beaa-66850c904d73-us-east1.db.astra.datastax.com:29042:611ae092-da92-4dd6-a840-d493f50e4521> due to protocol error: Error from server: code=000a [Protocol error] message="Beta version of the protocol used (5/v5-beta), but USE_BETA flag is unset"


4.0.11-b86be92b8b5f


# Environment Variables

In [12]:
# Replace the placeholders with your own Gradient credentials; do not commit real tokens.
os.environ['GRADIENT_ACCESS_TOKEN'] = '<your-gradient-access-token>'
os.environ['GRADIENT_WORKSPACE_ID'] = '<your-gradient-workspace-id>'
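
Credentials typed directly into a notebook cell are easy to leak when the notebook is shared. As a minimal sketch, the variables could instead be set outside the notebook and only validated here. `require_env` is a hypothetical helper, not part of the Gradient SDK.

```python
import os


def require_env(*names):
    """Return the values of the named environment variables, or raise if any is unset."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return [os.environ[name] for name in names]
```

For example, `access_token, workspace_id = require_env("GRADIENT_ACCESS_TOKEN", "GRADIENT_WORKSPACE_ID")` fails early with a clear message if either value has not been provided.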

# Define Gradient's Model Adapter for Llama 2

In [13]:
llm = GradientBaseModelLLM(
    base_model_slug = "llama2-7b-chat",
    max_tokens = 400,
)

# Configure Gradient embeddings

In [14]:
embed_model = GradientEmbedding(
    gradient_access_token = os.environ["GRADIENT_ACCESS_TOKEN"],
    gradient_workspace_id = os.environ["GRADIENT_WORKSPACE_ID"],
    gradient_model_slug = "bge-large",
)

# Set Up the LlamaIndex Service Context

In [15]:
service_context = ServiceContext.from_defaults(
    llm = llm,
    embed_model = embed_model,
    chunk_size = 256,
)

set_global_service_context(service_context)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


# Load the Documents

In [16]:
documents = SimpleDirectoryReader("/content/text_files").load_data()
print(f"Loaded {len(documents)} document(s).")

Loaded 1 document(s).


# Set Up and Query the Index

In [18]:
index = VectorStoreIndex.from_documents(documents,
                                        service_context = service_context)
query_engine = index.as_query_engine()
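
The index in this cell is built in memory: the Cassandra `session` opened earlier and the imported `CassandraVectorStore` are not used by it. Below is a sketch of how the index could be backed by Astra DB instead, assuming the legacy `llama_index` import paths used in this notebook. The `keyspace` value and the default table name are placeholders you must supply, and `embedding_dimension=1024` assumes that `bge-large` produces 1024-dimensional vectors.

```python
def build_astra_index(documents, session, keyspace,
                      table="youtube_transcripts", embedding_dimension=1024):
    """Build a VectorStoreIndex whose embeddings are stored in Cassandra / Astra DB."""
    # Imported lazily so the helper can be defined before llama-index is installed.
    from llama_index import StorageContext, VectorStoreIndex
    from llama_index.vector_stores import CassandraVectorStore

    vector_store = CassandraVectorStore(
        session=session,                          # the connected Cassandra session
        keyspace=keyspace,                        # placeholder: your Astra DB keyspace
        table=table,                              # hypothetical table name
        embedding_dimension=embedding_dimension,  # assumed size of bge-large vectors
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    return VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

With a real keyspace name, `index = build_astra_index(documents, session, keyspace="...")` followed by `index.as_query_engine()` would query vectors stored in the database rather than in local memory.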

In [22]:
response_out = query_engine.query("What is used to convert speech to text in the text file?")
print(response_out.response)


The query is asking about the tool or method used to convert speech to text in the text file provided in the context information. Based on the information provided, the answer is "their breath." The speaker mentions that they use their breath to convert speech to text in the text file, suggesting that their breath is the tool or method used for this purpose.


In [None]:
response_out = query_engine.query("Does this require an API key?")
print(response_out.response)

Yes.

Explanation: In the video, the speaker mentions that in order to use the Google Cloud Speech to Text API, you will require the credentials of the API. The speaker then goes on to explain how to create a JSON file containing the credentials and how to save it to an environment variable called "Google underscore application underscore credentials". This indicates that an API key is required to use the Google Cloud Speech to Text API.
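
These two answers do not obviously match the transcribed TED talk, which suggests the model may be answering from weak or mismatched context. One way to check is to look at the chunks the query engine retrieved. The sketch below assumes the response object exposes `source_nodes` whose items have a `score` and a `node.get_content()` method, as in the legacy `llama_index` API. `summarize_sources` is a hypothetical helper.

```python
def summarize_sources(response, max_chars=80):
    """Return one line per retrieved chunk: similarity score plus a text preview."""
    lines = []
    for item in response.source_nodes:
        # Collapse newlines and repeated spaces so each preview fits on one line.
        text = " ".join(item.node.get_content().split())
        preview = text[:max_chars]
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"[{score}] {preview}")
    return lines
```

Printing each line of `summarize_sources(response_out)` next to the answer makes it easier to judge whether the response is grounded in the transcript.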


In [19]:
response_out = query_engine.query("What is a big or small change that you made in your life? How did you overcome the fear of change?")
print(response_out.response)


A big change I made in my life was quitting my job to pursue my passion for cognitive science. I had been working in the same job for several years, and while it was stable and secure, I found myself feeling unfulfilled and restless. I knew that I needed to make a change, but the fear of the unknown was holding me back.

To overcome this fear, I took small steps towards my goal. I started by taking courses in cognitive science and attending seminars related to the field. I also began networking with professionals in the field, which helped me build a support system and gain a better understanding of what was possible.

As I gained more confidence in my decision, I began to take bigger steps towards my goal. I quit my job and started my own business, which allowed me to pursue my passion full-time. It was a scary and uncertain time, but I knew that I had to take the leap of faith in order to truly fulfill my potential.

Looking back, I realize that the fear of change was holding me bac