# RAG App Using the LlamaIndex Library with Llama 3 as the LLM and nomic-embed-text as the Embedding Model
The main idea is to chat with multiple PDFs while both the LLM and the embedding model are open source and run locally, so the app is suitable for chatting over private data.

In [48]:
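import os
from dotenv import load_dotenv

# Load environment variables from the project's .env file (the cell shows True when at least one variable was loaded)
load_dotenv("/home/akhil/personalProjects/resAssist/.env")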

True

In [64]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

In [66]:
# Setup data directory for RAG with PDFs
DATA_DIR = "/home/akhil/personalProjects/resAssist/data_for_RAG/research_papers"

# load documents
documents = SimpleDirectoryReader(DATA_DIR).load_data()

# setup embedding model
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# setup llm model
Settings.llm = Ollama(model="llama3", request_timeout=360.0)

# Build a vector index over the documents; embeddings come from the nomic-embed-text model set in Settings.embed_model
index = VectorStoreIndex.from_documents(documents, show_progress=True)

Generating embeddings:  75%|███████▌  | 9/12 [07:54<02:38, 52.72s/it]
Parsing nodes: 100%|██████████| 39/39 [00:00<00:00, 772.79it/s]
Generating embeddings: 100%|██████████| 64/64 [00:08<00:00,  7.32it/s]
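The progress bars above show the two stages of `VectorStoreIndex.from_documents`: parsing the loaded documents into nodes (chunks), then embedding each node (here the 39 parsed documents appear to yield 64 embedded nodes). Below is a minimal pure-Python sketch of that idea, not LlamaIndex's implementation. It assumes a simple word-window splitter with overlap, and a hypothetical hashed bag-of-words `toy_embed` standing in for nomic-embed-text; the chunk sizes are kept tiny for readability.

```python
import hashlib
import math


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for nomic-embed-text: a hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # md5 gives the same bucket on every run (Python's built-in hash() is salted per process)
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(documents: list[str]) -> list[dict]:
    """Chunk every document and store one node (text + embedding) per chunk."""
    index = []
    for doc_id, doc in enumerate(documents):
        for chunk in chunk_words(doc):
            index.append({"doc_id": doc_id, "text": chunk, "embedding": toy_embed(chunk)})
    return index


if __name__ == "__main__":
    pages = ["Automated audio captioning generates a text description for an audio clip.",
             "Captioning systems are trained on paired audio and caption datasets."]
    nodes = build_index(pages)
    print(len(pages), "documents ->", len(nodes), "nodes")
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbouring chunks, which helps retrieval when the relevant passage is split.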


In [67]:
# Build a query engine rather than a chat engine, since this is a single-turn Q/A bot (no conversation history needed)
query_engine = index.as_query_engine()
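A query engine combines two steps: retrieve the chunks most similar to the question, then have the LLM (here Llama 3) write an answer from them, which is likely why the responses below open with "Based on the provided context". A minimal sketch of the second step follows, assuming a hypothetical prompt wording (LlamaIndex ships its own templates) and plain callables standing in for the retriever and the LLM.

```python
from typing import Callable


def build_qa_prompt(context_chunks: list[str], question: str) -> str:
    """Place the retrieved chunks in a prompt that asks the LLM to answer from them only."""
    # Hypothetical template wording; LlamaIndex's built-in QA prompt differs in detail.
    context = "\n---\n".join(context_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Answer the question using only the context above.\n"
        f"Question: {question}\n"
        "Answer: "
    )


def answer(question: str,
           retrieve: Callable[[str], list[str]],
           llm: Callable[[str], str]) -> str:
    """Retrieve context for the question, then let the LLM answer from that context."""
    chunks = retrieve(question)
    return llm(build_qa_prompt(chunks, question))


if __name__ == "__main__":
    def fake_retrieve(question: str) -> list[str]:
        # Stand-in for the vector retriever.
        return ["AAC generates text descriptions of audio clips."]

    def echo_llm(prompt: str) -> str:
        # Stand-in for Llama 3: returns the first context line of the prompt.
        return prompt.splitlines()[1]

    print(answer("What is AAC?", fake_retrieve, echo_llm))
```

Because the prompt restricts the model to the retrieved context, the answer quality depends directly on which chunks the retriever returns.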

## Queries

In [69]:
response = query_engine.query(
    """
    What are the key challenges reported in research for the task of Automated Audio Captioning? 
    Also let me know how to improve such captioning systems.
    """
)
print(response)

Based on the provided context, it seems that there is no explicit discussion about the key challenges reported in research for the task of Automated Audio Captioning. However, some potential challenges that can be inferred from the text are:

1. Balancing data: The authors mention the importance of balancing data when combining different datasets. This suggests that one challenge might be ensuring that each dataset contributes equally to the overall performance of the captioning system.
2. Handling variability in audio and captions: The text mentions the use of different speech recognition systems, such as Whisper, which could introduce variability in the output. Similarly, the authors mention using different datasets with varying characteristics, which could also affect the performance of the captioning system.
3. Overcoming limitations of language models: The authors discuss the use of pre-trained language models to improve the quality of captions. However, this might not be sufficie

## Retrieving Multiple Source Nodes

In [75]:
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.response.pprint_utils import pprint_response

# Retrieve the 4 most similar chunks for each query
retriever = VectorIndexRetriever(index=index, similarity_top_k=4)
query_engine = RetrieverQueryEngine(retriever=retriever)

response = query_engine.query(
    """
    Explain me the task of Automated Audio Captioning briefly.
    """
)
pprint_response(response, show_source=True)

Final Response: Based on the provided context information, here is a
brief explanation of the task of Automated Audio Captioning:
Automated Audio Captioning (AAC) involves generating natural language
descriptions or captions that accurately summarize and describe the
content of an audio file. The goal is to develop a system that can
automatically generate relevant and coherent text based on the audio
input, which can be used for various applications such as search,
retrieval, and indexing.
______________________________________________________________________
Source Node 1/4
Node ID: f35136db-a7fd-45b8-bbb9-04c7ffe25170
Similarity: 0.5272697155266008
Text: Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer
learning with a unified text-to-text transformer,” J. Mach. Learn. Res.,
vol. 21, no. 1, jan 2020. [33] A. van den Oord, Y. Li, and O.
Vinyals, “Representation learning with contrastive predictive coding,”
2018. [34] T. Pellegrini, I. Khalfaoui-Hassani, E. Labbé, and T.
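`VectorIndexRetriever` with `similarity_top_k=4` returns the four stored nodes whose embeddings score highest against the query embedding, and the `Similarity` value printed for each source node is that score. A minimal sketch of top-k retrieval, assuming cosine similarity as the scoring function and plain Python lists as embeddings:

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec: list[float],
                   nodes: list[tuple[str, list[float]]],
                   k: int = 4) -> list[tuple[float, str]]:
    """Score every (text, embedding) node against the query and keep the k best, highest first."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in nodes]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    nodes = [("about captioning", [1.0, 0.0]), ("about music", [0.0, 1.0]), ("mixed", [1.0, 1.0])]
    for score, text in retrieve_top_k([1.0, 0.0], nodes, k=2):
        print(f"{score:.3f}  {text}")
```

A brute-force scan like this is adequate for a few dozen nodes such as the 64 here; dedicated vector databases use approximate nearest-neighbour indexes for much larger collections.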

In [77]:
# Apply a similarity threshold: drop retrieved nodes scoring below 0.5
retriever = VectorIndexRetriever(index=index, similarity_top_k=4)
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.5)
query_engine = RetrieverQueryEngine(retriever=retriever, node_postprocessors=[postprocessor])

response = query_engine.query(
    """
    Explain me the task of Automated Audio Captioning briefly.
    """
)
pprint_response(response, show_source=True)

Final Response: Based on the provided context, Automated Audio
Captioning is a task that involves generating natural language
captions to describe audio content, such as music or spoken words. The
goal is to automatically generate accurate and informative captions
for audio files, allowing users to easily identify and understand the
content of the audio without having to listen to it in its entirety.
______________________________________________________________________
Source Node 1/2
Node ID: f35136db-a7fd-45b8-bbb9-04c7ffe25170
Similarity: 0.5272697155266008
Text: Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer
learning with a unified text-to-text transformer,” J. Mach. Learn. Res.,
vol. 21, no. 1, jan 2020. [33] A. van den Oord, Y. Li, and O.
Vinyals, “Representation learning with contrastive predictive coding,”
2018. [34] T. Pellegrini, I. Khalfaoui-Hassani, E. Labbé, and T.
Masquelier,...
______________________________________________________________________
Sour
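`SimilarityPostprocessor(similarity_cutoff=0.5)` runs after retrieval and discards nodes scoring below the threshold, which is consistent with the output above listing 2 source nodes instead of 4. A minimal sketch of that filter, assuming nodes are kept when their score is at least the cutoff and that nodes without a score are dropped:

```python
from typing import Optional


def apply_similarity_cutoff(scored_nodes: list[tuple[Optional[float], str]],
                            cutoff: float = 0.5) -> list[tuple[float, str]]:
    """Keep (score, text) pairs whose score is at least `cutoff`, preserving their order."""
    kept = []
    for score, text in scored_nodes:
        if score is None:
            continue  # an unscored node cannot be compared with the threshold
        if score >= cutoff:
            kept.append((score, text))
    return kept


if __name__ == "__main__":
    # Hypothetical scores for four retrieved nodes
    retrieved = [(0.53, "node A"), (0.51, "node B"), (0.48, "node C"), (0.45, "node D")]
    print(apply_similarity_cutoff(retrieved, cutoff=0.5))
```

Raising the cutoff means fewer, more relevant chunks reach the LLM; set it too high and no chunk may pass at all, leaving the model without context.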

## Persisting Indexes
Store the generated index persistently on disk instead of rebuilding it in RAM on every run, then apply RAG on top of it.

In [78]:
from llama_index.core import load_index_from_storage, StorageContext

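# The index is stored in a subfolder of DATA_DIR; SimpleDirectoryReader does not descend
# into subfolders unless recursive=True, so these files are not re-ingested as data
PERSIST_DIR = os.path.join(DATA_DIR, "RAGIndexesStorage")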

if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader(DATA_DIR).load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is Audio Captioning and What type of neural network architectures are used?")
print(response)

Based on the provided context, audio captioning refers to the task of generating a natural language description for an audio signal or a segment of an audio file. This task involves recognizing and describing the content, events, or actions present in the audio signal.

The neural network architectures used for audio captioning are not explicitly stated in the given context, but based on the papers cited (e.g., [34], [36], [45]), it can be inferred that various types of neural networks have been employed for this task. Some possible architectures include:

1. Convolutional Neural Networks (CNNs) or ConvNeXt models for audio classification and feature extraction.
2. Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks for sequence modeling and processing.
3. Transformer-based models, such as the unified text-to-text transformer mentioned in [33], which could be used for audio captioning.
4. MobileNetV2 architectures with inverted residuals and linear bottlenecks, a
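The cell above builds the index once, persists it to `PERSIST_DIR`, and reloads it on later runs so the slow embedding step is skipped. The same persist-or-load pattern in plain Python, assuming a hypothetical single JSON file (`index.json`) rather than the set of store files LlamaIndex actually writes, and a caller-supplied `build` function:

```python
import json
import os
import tempfile
from typing import Callable


def load_or_build_index(persist_dir: str,
                        build: Callable[[], list[dict]]) -> list[dict]:
    """Load a persisted index from disk, or build it once and save it for later runs."""
    index_path = os.path.join(persist_dir, "index.json")  # hypothetical file name
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            return json.load(f)
    index = build()  # the expensive step (embedding) only runs when nothing is stored yet
    os.makedirs(persist_dir, exist_ok=True)
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index


if __name__ == "__main__":
    calls = []

    def build() -> list[dict]:
        calls.append(1)
        return [{"text": "chunk", "embedding": [0.1, 0.2]}]

    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "RAGIndexesStorage")
        first = load_or_build_index(store, build)
        second = load_or_build_index(store, build)
        print(first == second, len(calls))
```

Checking for the index file rather than only the directory avoids trying to load from an empty folder that an interrupted run could leave behind.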