# RAG PIPELINE PROJECT

# Installing required libraries

In [1]:
!pip install beyondllm youtube_transcript_api llama-index-readers-youtube-transcript llama-index-embeddings-huggingface



### 1- Security Considerations:

Avoid hard-coding: API keys are sensitive. Reading them with `getpass` keeps them out of the notebook source and hides them as they are typed, which reduces the risk of accidental exposure (for example, when the notebook is shared or committed).

### 2- Environment Variables:

Why environment variables? They keep configuration settings and secrets such as API keys out of the codebase, so the same code can run unchanged in different environments, each supplying its own keys.

### 3- User Prompt:

User interaction: `getpass` prompts for each key without echoing the input, so the key never appears in plaintext in the cell output.

In [2]:
# Python code: Retrieve API keys and set environment variables
from getpass import getpass
import os

# Securely retrieve the Hugging Face and Google API keys (input is not echoed)
hf_token = getpass('Enter your Hugging Face Hub token: ')
google_api_key = getpass('Enter your Google API key: ')

# Set as environment variables
os.environ['HF_TOKEN'] = hf_token
os.environ['GOOGLE_API_KEY'] = google_api_key
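
When a key is already present in the environment (for example, on a server or in CI), prompting again is unnecessary. The sketch below shows one way to combine both approaches: read the variable if it exists and fall back to `getpass` otherwise. The helper name `get_secret` is hypothetical and not part of BeyondLLM.

```python
import os
from getpass import getpass


def get_secret(name: str, prompt: str) -> str:
    """Return the secret stored in environment variable `name`.

    If the variable is missing or empty, prompt for it without echoing
    the input and store it in the environment for the current process.
    """
    value = os.environ.get(name)
    if not value:
        value = getpass(prompt)
        os.environ[name] = value
    return value
```

A call such as `get_secret("HF_TOKEN", "Enter your Hugging Face Hub token: ")` would then only prompt when `HF_TOKEN` has not been set beforehand.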


1. **Data Loading**:
   - This step extracts the content of a YouTube video using BeyondLLM's `source.fit` method.
   - The extracted text is split into chunks of size 1024 (`chunk_size=1024`; whether this counts characters or tokens depends on the underlying text splitter), making it easier to process and analyze.

2. **Parameter Explanation**:
   - **path**: The URL of the video from which the content will be extracted.
   - **dtype**: The type of data source, in this case, "youtube", indicates that the input is a YouTube video.
   - **chunk_size**: Controls how much text is included in each chunk. Larger chunks might include more context but are also more computationally intensive.
   - **chunk_overlap**: The amount of text shared between consecutive chunks. It is set to 0 here, so no text is repeated across chunks; each chunk can be processed independently, but content split at a chunk boundary loses its neighbouring context.

The processed data will then be used for embedding, retrieval, and further analysis in subsequent steps of the pipeline.
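
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal character-based sketch. It is an illustration only: the function `chunk_text` is hypothetical, and the splitter used inside `source.fit` may count tokens or respect sentence boundaries rather than slicing characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4, chunk_overlap=0))
print(chunk_text(sample, chunk_size=4, chunk_overlap=2))
```

With an overlap of 0 every character appears in exactly one chunk; with a positive overlap the end of one chunk is repeated at the start of the next, which yields more chunks for the same text.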

In [4]:
from beyondllm import source, embeddings, retrieve, llms, generator

# Load data from a YouTube video
data = source.fit(
    path="https://www.youtube.com/watch?v=ZM1bdh2mDJQ",  # Video link
    dtype="youtube",  # Type of data source
    chunk_size=1024,  # Size of data chunks
    chunk_overlap=0   # Overlap between chunks
)


['https://www.youtube.com/watch?v=ZM1bdh2mDJQ']


- Select the embedding model that converts the data into vectors.
- Embeddings transform text into numerical vectors, so that chunks and queries with similar meaning end up close together and can be compared numerically.
- In this case, we use a pre-trained embedding model from Hugging Face's model repository.
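
How "closeness" between vectors is measured can be sketched with cosine similarity, a common choice for comparing embeddings. The three-dimensional vectors below are made up for illustration; real embedding models produce vectors with hundreds of dimensions, and this sketch does not call the actual model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (assumed values, not model output).
query_vec = [1.0, 0.0, 1.0]
similar_vec = [0.9, 0.1, 0.8]
unrelated_vec = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, similar_vec))
print(cosine_similarity(query_vec, unrelated_vec))
```

The first pair points in nearly the same direction and scores close to 1, while the second pair is orthogonal and scores 0.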


In [5]:
# Select the embedding model to convert the data into vectors
model_name = 'BAAI/bge-small-en-v1.5'  # A model available on Hugging Face
embed_model = embeddings.HuggingFaceEmbeddings(
    model_name=model_name
)


- Configure the advanced retriever.
- The retriever is a critical component of the RAG pipeline, responsible for fetching the most relevant data chunks in response to a query.
- Here, the `cross-rerank` retriever type is configured with `top_k=2`, so only the two highest-ranked chunks are returned for each query.
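
The general idea behind a reranking retriever is a two-stage search: a fast first pass selects a handful of candidates, and a second, more careful scorer reorders them before the top results are returned. The sketch below illustrates that flow with toy scoring functions. Everything in it is an assumption for illustration: `word_overlap` stands in for embedding similarity, `jaccard` stands in for a cross-encoder, and none of it reflects how `auto_retriever` is implemented internally.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def word_overlap(query: str, passage: str) -> float:
    """Toy first-stage score: number of words shared with the query."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def jaccard(query: str, passage: str) -> float:
    """Toy second-stage score: shared words divided by all distinct words."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def retrieve_then_rerank(query: str, passages: List[str], first_stage: Scorer,
                         reranker: Scorer, fetch_k: int = 3, top_k: int = 2) -> List[str]:
    """Keep the best `fetch_k` passages by `first_stage`, then reorder them
    with `reranker` and return the best `top_k`."""
    candidates = sorted(passages, key=lambda p: first_stage(query, p), reverse=True)[:fetch_k]
    reranked = sorted(candidates, key=lambda p: reranker(query, p), reverse=True)
    return reranked[:top_k]


demo_passages = [
    "the video builds a csv agent",
    "a vector store keeps embeddings",
    "the agent answers questions about the csv file in the video",
    "unrelated text about cooking",
]
demo_query = "csv agent in the video"
print(retrieve_then_rerank(demo_query, demo_passages, word_overlap, jaccard))
```

In this toy run the longer passage wins the first stage on raw overlap, but the reranker prefers the shorter, more focused passage, which is the kind of reordering a reranking step is meant to provide.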


In [6]:
# Configure the advanced retriever
retriever = retrieve.auto_retriever(
    data=data,
    embed_model=embed_model,
    type="cross-rerank",  # Advanced retriever type
    mode="OR",            # Operates in 'OR' mode
    top_k=2               # Retrieve the top two matches
)


- Retrieve data with a sample query and display the results.
- This section uses the configured retriever to fetch the chunks most relevant to a specific query.
- The retrieved nodes are then printed, showing each chunk's text and metadata as returned by the retriever.


In [7]:
# Retrieve data with a sample query and display the results
query = "Which tool is mentioned in the video?"
retrieved_nodes = retriever.retrieve(query)
print("Data retrieved from the query:", retrieved_nodes)


Data retrieved from the query: [NodeWithScore(node=TextNode(id_='67d5d4af-a155-4bc8-ac77-a18084b72473', embedding=None, metadata={'video_id': 'ZM1bdh2mDJQ'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='ZM1bdh2mDJQ', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'video_id': 'ZM1bdh2mDJQ'}, hash='615e7d33180855d5371f6b37c8848fe6cb5ae545e1ba29af48aae860c75a6437')}, text="hi everyone welcome to the part seven\nvideo of building llm applications using\ngen stack in this video we are going to\nbuild a CSV agent using this tack so the\nCSV file will be the data source for the\nentire R pipeline so if you do not know\nwhat rag is then in short it's retrieval\naugmented generation it streamlines the\nprocess of retrieving data from a data\nsource then pre-processing it and\nstoring it in a certain way in a vector\nstore and then retri it when a query is\npassed and then presenting it with the\nhelp of 
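
The raw list printed above is hard to read. A small formatting helper can show one line per retrieved node instead. This is a sketch under the assumption that each item exposes a `score` attribute and a `get_content()` method, as the `NodeWithScore` objects in the output appear to; `StubNode` and `summarize_nodes` are hypothetical names used only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubNode:
    """Hypothetical stand-in for a retrieved node, so the sketch runs on its own."""
    text: str
    score: Optional[float]

    def get_content(self) -> str:
        return self.text


def summarize_nodes(nodes, max_chars: int = 80) -> list[str]:
    """Return one short line per node: rank, score, and a text snippet."""
    lines = []
    for rank, node in enumerate(nodes, start=1):
        # Collapse newlines and repeated spaces before truncating the text.
        snippet = " ".join(node.get_content().split())[:max_chars]
        score = "n/a" if node.score is None else f"{node.score:.3f}"
        lines.append(f"{rank}. score={score} | {snippet}")
    return lines


for line in summarize_nodes([StubNode("hi everyone\nwelcome", 0.5)]):
    print(line)
```

In the notebook, the same helper could be applied to `retrieved_nodes` in place of the stub list, provided the assumption about the node interface holds.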

- Configure a language model from Hugging Face.
- This step sets up the Large Language Model (LLM) that generates responses from the chunks returned by the retriever.
- The model selected is hosted on Hugging Face's model hub and requires an API token for access.


In [8]:
# Configure a language model from Hugging Face
llm = llms.HuggingFaceHubModel(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # An LLM from Hugging Face
    token=os.environ.get('HF_TOKEN')            # Using the Hugging Face API token
)


In [10]:
# Adding system prompt
system_prompt = """
<s>[INST]
You are an AI Assistant.
Please provide direct answers to questions.
[/INST]
</s>
"""

- Create a query and get a response from the pipeline.
- This step ties together all the components of the RAG pipeline: the user query is processed, relevant data is retrieved, and a response is generated.
- The response is produced by the language model, guided by the system prompt, which here asks for direct answers.
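
Conceptually, the generation step combines three pieces of text into a single prompt for the LLM: the system prompt, the retrieved chunks, and the user's question. The sketch below shows one possible layout. The function `build_prompt` and the exact template are assumptions for illustration; the template BeyondLLM uses internally may differ.

```python
def build_prompt(system_prompt: str, context_chunks: list[str], question: str) -> str:
    """Assemble a single prompt string from instructions, context, and question."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        f"{system_prompt.strip()}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


demo_prompt = build_prompt(
    "You are an AI Assistant.\nPlease provide direct answers to questions.",
    ["first retrieved chunk ", " second retrieved chunk"],
    "Which tool is mentioned in the video?",
)
print(demo_prompt)
```

Because the model only sees what is in this combined prompt, the quality of the retrieved chunks directly limits how well grounded the answer can be.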


In [12]:
# Create a query and get a response from the pipeline
pipeline = generator.Generate(
    question=query,            # Using the "query" variable as the question
    retriever=retriever,
    system_prompt=system_prompt,  # Adding the system prompt
    llm=llm
)


In [13]:
# Get the response and display it
response = pipeline.call()
print("Model response:", response)


Model response: 
        ANSWER: The tools mentioned in the video are Titanic dataset (for reference data), Turing Tame Machine (TTM) for text splitting, Hugging Face Inference API for embedding model, Chroma Vector Store, Azure Chat Openi for LLM, and Rack for building the pipeline.


In [14]:
# Retrieve and display RAG Triad evaluation metrics
rag_evals = pipeline.get_rag_triad_evals()
print("RAG Triad Evaluation:", rag_evals)


Executing RAG Triad Evaluations...
RAG Triad Evaluation: Context relevancy Score: 10.0
This response meets the evaluation threshold. It demonstrates strong comprehension and coherence.
Answer relevancy Score: 10.0
This response meets the evaluation threshold. It demonstrates strong comprehension and coherence.
Groundness score: 7.0
This response does not meet the evaluation threshold. Consider refining the structure and content for better clarity and effectiveness.
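
To compare runs or log results, it can help to pull the numeric scores out of the evaluation text. The sketch below parses a report in the format printed above, assuming the evaluation result is available as a plain string with one `<name> Score: <value>` line per metric. The function `parse_triad_scores` is a hypothetical helper, not part of BeyondLLM.

```python
import re

# Matches lines such as "Context relevancy Score: 10.0" or "Groundness score: 7.0".
SCORE_LINE = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z ]*?)[ \t]+score:[ \t]*(\d+(?:\.\d+)?)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_triad_scores(report: str) -> dict[str, float]:
    """Extract metric names and numeric scores from an evaluation report."""
    return {name.strip(): float(value) for name, value in SCORE_LINE.findall(report)}


sample_report = (
    "Context relevancy Score: 10.0\n"
    "This response meets the evaluation threshold. It demonstrates strong comprehension and coherence.\n"
    "Answer relevancy Score: 10.0\n"
    "This response meets the evaluation threshold. It demonstrates strong comprehension and coherence.\n"
    "Groundness score: 7.0\n"
    "This response does not meet the evaluation threshold. Consider refining the structure and content "
    "for better clarity and effectiveness.\n"
)
print(parse_triad_scores(sample_report))
```

The metric names are taken verbatim from the report, so the keys match whatever labels the evaluation prints.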
