# Augmented RAG Plugins Exploration

This notebook demonstrates the functionality of each plugin in the augmented RAG system. We'll explore each plugin separately, explain its purpose, and show how to use it.

In [None]:
import sys
import os
from dotenv import load_dotenv

src_path = os.path.abspath(os.path.join(os.getcwd(), "..", "src"))
print("Adding src folder to path:", src_path)
sys.path.insert(0, src_path)

load_dotenv(f"{src_path}/.env")

import jinja2

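# __file__ is undefined in a notebook kernel, so resolve the templates folder
# relative to the working directory (assumed to be the notebook's folder, as for src_path above)
TEMPLATES_DIR = os.path.join(os.getcwd(), "templates")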

# Import plugins
from app.plugins import *
from app.plugins.audio import AudioProcessor, AudioEmbedder
from app.plugins.image import ImageProcessor, ImageEmbedder
from app.plugins.text import TextProcessor, TextEmbedder
from app.plugins.video import VideoProcessor, VideoEmbedder
from app.plugins.retrieval import NLToSQL, NLToNoSQL, LocalFileRetriever
from app.plugins.statistical import StatisticalAnalysisPlugin

Adding src folder to path: c:\Users\ricar\Github\augumented-rag\src


## 1. Audio Processing Plugins

Audio plugins are responsible for processing, embedding, and answering questions about audio content.

In [2]:
# Process audio file
audio_processor = AudioProcessor()
processed_audio = audio_processor.process_transcription('C:\\Users\\ricar\\Github\\augumented-rag\\notebook\\data\\audios\\azure-podcast-1.wav')
print("Processed audio content")
processed_audio

Processed audio content


"Welcome to The Azure Podcast, a weekly podcast to keep you up to date on what's new on our cloud platform, Microsoft Azure. Your hosts Cynthia Krang, Evan Basilic, Sujit Dimelo, Kendall Rhoden, Kale Teeter, and Russell Young discuss a different service or solution on each show, with subject matter experts to explain how to get started, how different services work, and how to make decisions in tricky scenarios."

In [6]:
# Generate diarization
audio_processor = AudioProcessor()
answer = audio_processor.perform_diarization('C:\\Users\\ricar\\Github\\augumented-rag\\notebook\\data\\audios\\azure-podcast-1.wav')
print(f"Audio answer: {answer}")

Audio answer: [{'speaker': 'Guest-1', 'text': "Welcome to the Azure Podcast, a weekly podcast to keep you up to date on what's new on our cloud platform, Microsoft Azure. Your hosts Cynthia Crang, Evan Basilic, Sujit Dimelo, Kendall Rhoden, Kale Teeter, and Russell Young discuss a different service or solution on each show, with subject matter experts to explain how to get started, how different services work, and how to make decisions in tricky scenarios.", 'timestamp': 2.1}, {'speaker': 'Guest-1', 'text': 'You can find out more about our podcast at azpodcast.com.', 'timestamp': 27.96}]
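The diarization result is a list of segments, each with a `speaker`, `text`, and `timestamp`. As a minimal sketch of how such a list could be turned into a per-speaker transcript, the snippet below groups segments by speaker. The helper name `group_by_speaker` is hypothetical and not part of the plugin; the segment text is shortened from the output above.

```python
from collections import defaultdict

# Segments shaped like the perform_diarization output above (text shortened).
segments = [
    {"speaker": "Guest-1", "text": "Welcome to the Azure Podcast.", "timestamp": 2.1},
    {"speaker": "Guest-1", "text": "You can find out more about our podcast at azpodcast.com.", "timestamp": 27.96},
]

def group_by_speaker(segments):
    """Return {speaker: [(timestamp, text), ...]} with turns ordered by timestamp."""
    grouped = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s["timestamp"]):
        grouped[seg["speaker"]].append((seg["timestamp"], seg["text"]))
    return dict(grouped)

transcript = group_by_speaker(segments)
for speaker, turns in transcript.items():
    for timestamp, text in turns:
        print(f"[{timestamp:7.2f}s] {speaker}: {text}")
```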


In [7]:
# Create embeddings from audio
audio_embedder = AudioEmbedder()
audio_embeddings = await audio_embedder.embed_audio('C:\\Users\\ricar\\Github\\augumented-rag\\notebook\\data\\audios\\azure-podcast-1.wav')
for embedding in audio_embeddings:
    print(f"Audio embeddings: {embedding}")

Audio embeddings: [0.00021542949, -0.0033547096, -0.022241848, 0.012816711, -0.0038124493, -0.02804193, -0.03504627, 0.034284394, 0.013812064, -0.012951883, 0.031630117, -0.019931644, -0.0009991935, -0.00282324, -0.010912023, -0.008399062, -0.021922352, -0.009038054, 0.02804193, -0.0111455005, 0.022032946, -0.012546368, -0.028656347, 0.006076571, -0.0033024843, 0.006709419, 0.02120963, 0.016441762, -0.02258592, 0.013615451, -0.0017157558, 0.0012618562, 0.004540532, -0.0064206435, -0.026198683, 0.0028846816, 0.036447138, 0.028238544, -0.025682574, -0.023863904, 0.012036403, 0.013578586, -0.015606158, 0.044631153, -0.009535731, 0.01515149, 0.019071462, 0.021393953, -0.010217733, 0.023151182, -0.022266423, -0.020275718, 0.027353786, 6.384163e-05, 0.005290119, -0.01442648, 0.035439495, -0.0434269, -0.009830651, -0.041780267, -0.01246035, 0.024171112, -0.011981105, 0.0036588453, 0.0042118193, -0.0064206435, -0.02659191, 0.019833338, -0.006543527, 0.011206942, -0.0039384044, 0.020963863, -0.

## 2. Image Processing Plugins

Image plugins handle processing, embedding, and answering questions about visual content.

In [None]:
import os
from pathlib import Path

image_processor = ImageProcessor()
images_folder = 'C:\\Users\\ricar\\Github\\augumented-rag\\notebook\\data\\images'
image_files = [os.path.join(images_folder, file) for file in os.listdir(images_folder) if file.endswith(('.png', '.jpg', '.jpeg'))]

# Caption every image; Path.read_bytes() closes each file handle after reading
processed_image = await image_processor.add_captions([Path(image_file).read_bytes() for image_file in image_files])
print("Processed image content")
print(processed_image)

Processed image content
[{'caption': 'a diagram of a software project', 'confidence': 0.5965802669525146}, {'caption': 'a diagram of a blockchain', 'confidence': 0.7268205881118774}, {'caption': 'a diagram of a software application', 'confidence': 0.630570650100708}, {'caption': 'a diagram of a software application', 'confidence': 0.6310561895370483}]
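Each caption comes with a confidence score, so a downstream step may want to drop low-confidence captions before indexing them. The following is only a sketch: the threshold `MIN_CONFIDENCE` and the helper `confident_captions` are assumptions for illustration, not part of `ImageProcessor`.

```python
# Caption results copied from the add_captions output above.
captions = [
    {"caption": "a diagram of a software project", "confidence": 0.5965802669525146},
    {"caption": "a diagram of a blockchain", "confidence": 0.7268205881118774},
    {"caption": "a diagram of a software application", "confidence": 0.630570650100708},
    {"caption": "a diagram of a software application", "confidence": 0.6310561895370483},
]

MIN_CONFIDENCE = 0.6  # hypothetical cut-off; tune for your own data

def confident_captions(captions, min_confidence):
    """Keep only the caption strings whose confidence meets the threshold."""
    return [c["caption"] for c in captions if c["confidence"] >= min_confidence]

kept = confident_captions(captions, MIN_CONFIDENCE)
print(kept)
```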


In [2]:
import os
from pathlib import Path

image_processor = ImageProcessor()
images_folder = 'C:\\Users\\ricar\\Github\\augumented-rag\\notebook\\data\\images'
image_files = [os.path.join(images_folder, file) for file in os.listdir(images_folder) if file.endswith(('.png', '.jpg', '.jpeg'))]

# Extract tags for every image; Path.read_bytes() closes each file handle after reading
processed_image = await image_processor.extract_tags([Path(image_file).read_bytes() for image_file in image_files])
print("Processed image content")
print(processed_image)

Processed image content
[{'tags': [{'name': 'text', 'confidence': 0.9869052171707153}, {'name': 'screenshot', 'confidence': 0.9573432207107544}, {'name': 'diagram', 'confidence': 0.9291446208953857}, {'name': 'line', 'confidence': 0.8401660919189453}]}, {'tags': [{'name': 'screenshot', 'confidence': 0.9786124229431152}, {'name': 'text', 'confidence': 0.9722787737846375}]}, {'tags': [{'name': 'text', 'confidence': 0.9987995624542236}, {'name': 'screenshot', 'confidence': 0.970331072807312}, {'name': 'diagram', 'confidence': 0.9554139375686646}, {'name': 'plot', 'confidence': 0.8843179941177368}, {'name': 'line', 'confidence': 0.8638424873352051}, {'name': 'number', 'confidence': 0.8533275127410889}]}, {'tags': [{'name': 'text', 'confidence': 0.9992938041687012}, {'name': 'screenshot', 'confidence': 0.9672480225563049}, {'name': 'diagram', 'confidence': 0.9545629024505615}, {'name': 'font', 'confidence': 0.87006676197052}, {'name': 'design', 'confidence': 0.4075535237789154}]}]
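The tag output is a list with one entry per image, each holding `name`/`confidence` pairs. One possible way to summarize it, sketched below, is to count in how many images each tag appears above a confidence threshold. The helper `count_confident_tags` and the 0.8 threshold are assumptions; the confidences are rounded from the output above.

```python
from collections import Counter

# One entry per image, mirroring the extract_tags output (confidences rounded).
tag_results = [
    {"tags": [{"name": "text", "confidence": 0.987}, {"name": "screenshot", "confidence": 0.957},
              {"name": "diagram", "confidence": 0.929}, {"name": "line", "confidence": 0.840}]},
    {"tags": [{"name": "screenshot", "confidence": 0.979}, {"name": "text", "confidence": 0.972}]},
    {"tags": [{"name": "text", "confidence": 0.999}, {"name": "screenshot", "confidence": 0.970},
              {"name": "diagram", "confidence": 0.955}, {"name": "plot", "confidence": 0.884},
              {"name": "line", "confidence": 0.864}, {"name": "number", "confidence": 0.853}]},
    {"tags": [{"name": "text", "confidence": 0.999}, {"name": "screenshot", "confidence": 0.967},
              {"name": "diagram", "confidence": 0.955}, {"name": "font", "confidence": 0.870},
              {"name": "design", "confidence": 0.408}]},
]

def count_confident_tags(results, min_confidence):
    """Count, per tag name, the number of images carrying it at or above min_confidence."""
    counts = Counter()
    for image in results:
        # A set ensures a tag is counted at most once per image.
        names = {t["name"] for t in image["tags"] if t["confidence"] >= min_confidence}
        counts.update(names)
    return counts

tag_counts = count_confident_tags(tag_results, min_confidence=0.8)
for name, count in tag_counts.most_common():
    print(f"{name}: {count} image(s)")
```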


In [None]:
# Create embeddings from image
image_embedder = ImageEmbedder()
image_embeddings = image_embedder.embed_image(processed_image)
print(f"Generated image embeddings")

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000001FFBEF58920>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x000001FFBEF17DD0>, 352715.406)])']
connector: <aiohttp.connector.TCPConnector object at 0x000001FFBEF58680>
Error saving to CosmosDB or Azure Search: (Unauthorized) Local Authorization is disabled. Use an AAD token to authorize all requests.
ActivityId: 021a4096-4665-43d7-917d-3b1577f20c65, Microsoft.Azure.Documents.Common/2.14.0
Code: Unauthorized
Message: Local Authorization is disabled. Use an AAD token to authorize all requests.
ActivityId: 021a4096-4665-43d7-917d-3b1577f20c65, Microsoft.Azure.Documents.Common/2.14.0


Generated image embeddings
[-0.03520571, -0.018253172, -0.012400333, -0.017972354, 0.018179271, 0.0077077155, -0.0024719376, 0.05929694, -0.009422183, -0.017366378, 0.01136574, 0.03529439, 0.001602695, -0.024253808, -0.0004494012, 0.009244825, -0.013523605, 0.022214184, 0.013213227, -0.042034023, -0.01279939, 0.01981984, -0.047561705, -0.0017514176, 0.011247501, 0.008860547, 0.0046039373, 0.009414794, -0.0025975667, -0.005065809, -0.016331784, -0.00566809, 0.020174557, -0.02707677, 0.030505704, 0.019376444, -0.0424183, 0.021489969, -0.013154107, -0.022687139, -0.020824874, -0.0124816215, -0.019627701, -0.0006844939, -0.0035674972, -0.031658538, -0.010693255, -0.018253172, 0.0031961524, 0.008934447, -0.019346884, -0.041974902, 0.020883992, -0.0044635283, 0.012954579, -0.004651972, -0.007112825, 0.018637449, 0.02834784, -0.047561705, -0.03272269, 0.008838378, -0.0072827935, 0.00849844, -0.00707218, -0.02540664, 0.013826592, 0.03561955, -0.029589351, 0.004090336, 0.0034548007, 0.029397212

## 3. Text Processing Plugins

Text plugins handle processing, embedding, and answering questions about text content.

In [15]:
# Process text file
text_processor = TextProcessor()
processed_text = await text_processor.extract_layout_entities('C:\\Users\\ricar\\Github\\augumented-rag\\notebook\\data\\documents\\aoai.pdf')
print(f"Processed text content")
print(processed_text)

Processed text content
[{'text': 'Azure OpenAI Service models', 'category': 'line', 'polygon': [0.75, 0.5607, 4.6861, 0.5532, 4.6871, 0.8699, 0.751, 0.8783]}, {'text': 'Article · 03/25/2025', 'category': 'line', 'polygon': [0.7458, 0.9367, 1.7245, 0.9333, 1.725, 1.0612, 0.7463, 1.0646]}, {'text': 'Azure OpenAI Service is powered by a diverse set of models with different capabilities and price points. Model availability', 'category': 'line', 'polygon': [0.7501, 1.2376, 7.2654, 1.24, 7.2653, 1.4038, 0.75, 1.3997]}, {'text': 'varies by region and cloud. For Azure Government model availability, please refer to Azure Government OpenAI Service.', 'category': 'line', 'polygon': [0.7432, 1.4399, 7.1945, 1.4367, 7.1946, 1.5959, 0.7433, 1.5992]}, {'text': '[] Expand table', 'category': 'line', 'polygon': [6.5808, 1.8386, 7.4232, 1.8437, 7.4224, 1.9848, 6.5799, 1.9816]}, {'text': 'Models', 'category': 'line', 'polygon': [0.8155, 2.1816, 1.2031, 2.1817, 1.203, 2.2975, 0.8154, 2.2974]}, {'text': 'D
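Each layout entity carries a `polygon` of eight numbers. Assuming these are four `(x, y)` corner points flattened into one list (an assumption based on the values shown, not something the notebook states), a bounding box and a simple top-to-bottom reading order can be derived as sketched below. Both helper names are hypothetical.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a flat [x1, y1, x2, y2, ...] polygon."""
    xs = polygon[0::2]  # even positions are assumed to be x coordinates
    ys = polygon[1::2]  # odd positions are assumed to be y coordinates
    return (min(xs), min(ys), max(xs), max(ys))

# Two entities copied from the output above, deliberately listed out of order.
entities = [
    {"text": "Article · 03/25/2025",
     "polygon": [0.7458, 0.9367, 1.7245, 0.9333, 1.725, 1.0612, 0.7463, 1.0646]},
    {"text": "Azure OpenAI Service models",
     "polygon": [0.75, 0.5607, 4.6861, 0.5532, 4.6871, 0.8699, 0.751, 0.8783]},
]

def reading_order(entities):
    """Sort entities by the top edge of their bounding box, then by the left edge."""
    def sort_key(entity):
        min_x, min_y, _, _ = bounding_box(entity["polygon"])
        return (min_y, min_x)
    return sorted(entities, key=sort_key)

ordered = reading_order(entities)
for entity in ordered:
    print(bounding_box(entity["polygon"]), entity["text"])
```

Sorting by top edge alone is a simplification; multi-column pages would need a column-aware ordering.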

In [16]:
# Create embeddings from text
text_embedder = TextEmbedder()
# Collect the text of every layout entity, then embed the joined document
original_text_list = [entity['text'] for entity in processed_text]
text_embeddings = await text_embedder.embed_text(" ".join(original_text_list), {})
print("Generated text embeddings")
print(text_embeddings)

Error saving to CosmosDB or Azure Search: (Unauthorized) Local Authorization is disabled. Use an AAD token to authorize all requests.
ActivityId: 29075837-2533-44f8-be15-aa7a7b21bbb7, Microsoft.Azure.Documents.Common/2.14.0
Code: Unauthorized
Message: Local Authorization is disabled. Use an AAD token to authorize all requests.
ActivityId: 29075837-2533-44f8-be15-aa7a7b21bbb7, Microsoft.Azure.Documents.Common/2.14.0


Generated text embeddings
[-0.031260144, 0.001292746, -0.0062002125, 0.024389159, 0.004407936, -0.01587141, -0.021919012, 0.036938645, 0.0028711918, 0.043639276, 0.012549486, -0.0098096095, 0.012556584, -0.03154407, 0.008411279, -0.0025996885, -0.013465144, 0.006846142, 0.031004611, -0.031600855, -0.01605596, 0.024474336, -0.0141465645, -0.020343227, -0.0016361179, 4.8522343e-05, 0.00045339277, 0.012123599, -0.03418457, 0.010015455, 0.008134452, -0.00780084, -0.005611068, -0.026901895, 0.007829232, -0.025794588, -0.0036111714, 0.032850124, -0.031402107, 0.004766391, 0.0035579354, -0.00746013, 0.0062640957, -0.0151474, 0.0017674332, -0.05493949, -0.022898553, -0.0049012555, -0.008893951, 0.0014648756, -0.012428817, -0.01839834, 0.005948229, -0.036569543, 0.0057317363, -0.021237591, 0.009518586, 0.04957331, 0.020414209, -0.048409216, -0.031402107, 0.007531111, 0.0052845543, 0.0036235931, -0.01223007, -0.010228399, 0.00905011, 0.032537807, -0.028648034, 0.0017257318, -0.03461046, 0.014338
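Embedding vectors like the one above are typically compared with cosine similarity at retrieval time. The notebook does not show how the system ranks results, so the following is only an illustrative sketch with toy two-dimensional vectors; `cosine_similarity` and `rank_by_similarity` are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, documents):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings.
query_vector = [1.0, 0.0]
document_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

ranked = rank_by_similarity(query_vector, document_vectors)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.4f}")
```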

## 4. Video Processing Plugins

Video plugins handle processing, embedding, and answering questions about video content.

In [None]:
# Example of using video plugins

# Process video file
video_processor = VideoProcessor()
processed_video = video_processor.process_video('sample_data/video.mp4')
print(f"Processed video content")

In [None]:
# Analyze the frames of the video file
video_processor = VideoProcessor()
processed_video = video_processor.analyze_frames('sample_data/video.mp4')
print("Processed video content")

In [None]:
# Create embeddings from video
video_embedder = VideoEmbedder()
video_embeddings = video_embedder.embed_video(processed_video)
print(f"Generated video embeddings")

## 5. Retrieval Plugins

Retrieval plugins enable natural language querying of structured and unstructured data sources.

In [None]:
# Example of using retrieval plugins
JINJA_ENV = jinja2.Environment(loader=jinja2.FileSystemLoader(TEMPLATES_DIR))
query_template = JINJA_ENV.get_template("select_template.jinja")

# SQL query generation from natural language
nl_to_sql = NLToSQL()

query_data = {
    "table": "video_data",
    "columns": ["title", "description", "tags"],
    "conditions": {
        "duration": "> 10",
        "resolution": "= '1080p'"
    }
}

natural_query = query_template.render(query_data)
sql_query = nl_to_sql.convert(natural_query, query_data)
print(f"Generated SQL query: {sql_query}")
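The contents of `select_template.jinja` are not shown in this notebook. To make the shape of `query_data` concrete, here is a plain-Python sketch of how such a dictionary could be turned into a SELECT statement. The function `build_select` is hypothetical and is not claimed to match what the template or `NLToSQL.convert` produces.

```python
# Same structure as the query_data dictionary in the cell above.
query_data = {
    "table": "video_data",
    "columns": ["title", "description", "tags"],
    "conditions": {
        "duration": "> 10",
        "resolution": "= '1080p'",
    },
}

def build_select(table, columns, conditions):
    """Assemble a SELECT statement; conditions map a column to an 'operator value' string."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if conditions:
        where = " AND ".join(f"{column} {expr}" for column, expr in conditions.items())
        sql += f" WHERE {where}"
    return sql

sql = build_select(**query_data)
print(sql)
```

String interpolation like this is only safe for trusted, hard-coded input. For user-supplied values, parameterized queries are the usual choice.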

In [None]:
# NoSQL query generation from natural language
nl_to_nosql = NLToNoSQL()
nosql_query = nl_to_nosql.convert("Get users who signed up after January 2023")
print(f"Generated NoSQL query: {nosql_query}")

In [11]:
# Local file retrieval
file_retriever = LocalFileRetriever("notebook/data")
files = file_retriever.load_all()
print(f"Retrieved {len(files)} files from local storage")

Retrieved 4 files from local storage


## 6. Statistical Analysis Plugin

The statistical analysis plugin provides data analysis capabilities on structured data.

In [None]:
# Example of using statistical analysis plugin
import pandas as pd

# Create sample data
data = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    'category': ['A', 'B', 'A', 'B', 'C', 'C', 'D', 'D', 'E', 'E']
})

# Perform statistical analysis
stats_plugin = StatisticalAnalysisPlugin()
analysis = stats_plugin.kolmogorov_smirnov_test(data)
print("Statistical Analysis Results:")
print(analysis)
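The notebook does not show which reference distribution `kolmogorov_smirnov_test` compares against. As a sketch of the underlying idea, the snippet below computes the one-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF and a reference CDF, against a normal distribution fitted to the sample. The choice of a normal reference and the helper names are assumptions.

```python
import math
import statistics

def normal_cdf(x, mean, std):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample KS statistic: max distance between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# The 'value' column from the sample DataFrame above.
values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
mean = statistics.mean(values)
std = statistics.stdev(values)

d = ks_statistic(values, lambda x: normal_cdf(x, mean, std))
print(f"KS statistic against a fitted normal: {d:.4f}")
```

Because the mean and standard deviation are estimated from the same sample, standard KS critical values would be too lenient here; the statistic is shown only to illustrate the computation.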

## 7. Using Plugin Collections

The system provides pre-configured plugin collections for different types of content.

In [12]:
# Example of using plugin collections
from app.plugins import AUDIO_PLUGINS, IMAGE_PLUGINS, TEXT_PLUGINS, VIDEO_PLUGINS

print(f"Audio plugins available: {len(AUDIO_PLUGINS)}")
print(f"Image plugins available: {len(IMAGE_PLUGINS)}")
print(f"Text plugins available: {len(TEXT_PLUGINS)}")
print(f"Video plugins available: {len(VIDEO_PLUGINS)}")

# Display available plugins in the TEXT_PLUGINS collection
print("\nText plugins:")
for plugin in TEXT_PLUGINS:
    print(f"- {plugin.__class__.__name__}")

Audio plugins available: 7
Image plugins available: 7
Text plugins available: 7
Video plugins available: 7

Text plugins:
- TextProcessor
- TextEmbedder
- TextAnswer
- NLToSQL
- NLToNoSQL
- LocalFileRetriever
- StatisticalAnalysisPlugin


## Conclusion

In this notebook, we've explored all the major components of the augmented RAG system:

1. **Audio Processing Plugins** for handling audio content
2. **Image Processing Plugins** for working with visual data
3. **Text Processing Plugins** for processing text documents
4. **Video Processing Plugins** for analyzing video content
5. **Retrieval Plugins** for querying various data sources
6. **Statistical Analysis Plugin** for data analysis
7. **Plugin Collections** for content-specific processing

Together, these plugins let the augmented RAG system process, embed, and query content across audio, image, text, video, and structured data sources.