<a href="https://colab.research.google.com/github/dev-nileshpawar/python-aiml/blob/main/multi_modal_embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install tabula
!pip install boto3
!pip install faiss-cpu
!pip install pymupdf
!pip install langchain_text_splitters
!pip install langchain
!pip install tabula-py


In [1]:
import boto3
import tabula
import faiss
import json
import base64
import pymupdf
import requests
import os
import logging
import numpy as np
# import warning
from tqdm import tqdm
from botocore.exceptions import ClientError
from langchain_text_splitters import RecursiveCharacterTextSplitter
from IPython import display

# logger = logging.getLogger(__name__)
# logger.setLevel(logging.DEBUG)

# warnings.filterwarnings("ignore")

In [2]:
url = "https://arxiv.org/pdf/1706.03760"
response = requests.get(url)
filename = "attention_paper.pdf"
filepath = os.path.join("data", filename)

os.makedirs("data", exist_ok=True)

with open(filepath, "wb") as f:
  if response.status_code==200:
    f.write(response.content)
    print(f"File downloaded successfully: {filepath}")
  else:
    print(f"Failed to download file. Status code: {response.status_code}")

File downloaded successfully: data/attention_paper.pdf


## Data extractio

<h1 align="center">Multi modal RAG with Amazon Bedrock, Amazon Nova and LangChain</h1>


<h1 align="center"><b>Customize a Foundation Model</b></h1>
### 1. Instruction-based Fine-Tuning



```
  +------------------------------+
  | Task-Specific Labeled Data   |
  +------------------------------+
                |
                v   Fine-Tuning
                |
                v
  +------------------------------+
  |            LLM               | <---------------------- User/System Prompt
  +------------------------------+
```


### 2. Domain adoption
```
  +--------------------------------+
  | Domain specific unlebeled data |
  +--------------------------------+
                   |
                   v  Continious pre training
                   |
                   v
  +--------------------------------+
  |            LLM                 |<----------------user/system prompts
  +--------------------------------+
```


### 3. Informative Retrieval
1. convert knowledge data (Audio, video, Image, Text) into embeddings and store vectors into vector DB
2. whenever user sends a query then we search into vector DB for relevant info (data chunk)
3. we call LLM by passing relavant info chunk and user query and extract final answer.

```
    +-----------------------------------+
    |  Domain specific unlabeled data   |
    +-----------------------------------+
                     |   Embeddings
                     V   
                     |   prompt
                     v
                     |   Prompt with context
                     v
       +---------------------------+
       |           LLM             |
       +---------------------------+
```


In [3]:
# wrapper function to extract the data from file

def create_directories(base_path):
  directories = ["images", "text", "tables", "page_images"]
  for dir in directories:
    if not os.path.exists(os.path.join(base_path, dir)):
      os.makedirs(os.path.join(base_path, dir))

def process_tables(doc, page_num, base_dir, items):
  try:
    tables = tabula.read_pdf(filepath, pages = page_num+1, multiple_tables=True)
    # print(tables)
    for table_id, table in enumerate(tables):
      table_text = "\n".join([" | ".join(map(str, row)) for row in table.values])
      table_file_name = f"{base_dir}/tables/{os.path.basename(filepath)}_table_{page_num}_{table_id}.txt"

      with open(table_file_name, "w") as f:
        f.write(table_text)
        items.append({"page": page_num, "type":"table", "text": table_text, "path": table_file_name})
  except Exception as e:
    print(e)
    return
def process_text_chunks(text, text_splitter, page_num, base_dir, items):
  text_chunks = text_splitter.split_text(text)
  for chunk_id, chunk in enumerate(text_chunks):
    text_file_name = f"{base_dir}/text/{os.path.basename(filepath)}_text_{page_num}_{chunk_id}.txt"
    with open(text_file_name, "w") as f:
      f.write(chunk)
      items.append({"page": page_num, "type": "text", "text": chunk, "path": text_file_name})
  return
def process_images(page, page_num, base_dir, items):
  image_list = page.get_images()
  for image_id, image in enumerate(image_list):
    xref = image[0]
    pix = pymupdf.Pixmap(doc, xref)
    image_file_name = f"{base_dir}/images/{os.path.basename(filepath)}_image_{page_num}_{image_id}_{xref}.png"
    pix.save(image_file_name)
    with open(image_file_name, "rb") as f:
      image_bytes = base64.b64encode(f.read()).decode("utf-8")
    items.append({"page": page_num, "type": "image", "path":image_id, "image": image_bytes})
  return
def process_page_images(page, page_num, base_dir, items):
  return

In [None]:
doc = pymupdf.open(filepath)
num_pages = len(doc)
base_dir = "data"

create_directories(base_dir)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, length_function=len)
items = []

# process each page of the pdf
for page_num in tqdm(range(num_pages)):
  page = doc.load_page(page_num)
  text = page.get_text("text")
  process_tables(doc, page_num, base_dir, items)
  process_text_chunks(text, text_splitter, page_num, base_dir, items)
  process_images(page, page_num, base_dir, items)
  process_page_images(page, page_num, base_dir, items)


In [5]:
def generate_multimodal_embedding(prompt=None, image=None, output_embedding_length=384):
  if not prompt and not image:
    raise ValueError("prompt or image must be provided")

  model_id = "amazon.titan-embed-image-v1"
  body = {"embeddingConfig": {"outputEmbeddingLength": output_embedding_length}}

  if prompt:
    body["inputText"] = prompt
  if image:
    body["inputImage"] = image

  try:
    response = client.invoke_model(
        body=json.dumps(body),
        modelId=model_id,
        accept="application/json",
        contentType="application/json"
    )
    response_body = json.loads(response.get("body").read())
    embedding = response_body["embedding"]
    return embedding
  except Exception as e:
    print(e)
    return None


In [6]:
from google.colab import userdata

PRO_MODEL_ID = "amazon.nova-pro-v1:0"
LITE_MODEL_ID = "amazon.nova-lite-v1:0"
MACRO_MODEL_ID = "amazon.nova-macro-v1:0"

YOUR_ACCESS_KEY = userdata.get('YOUR_ACCESS_KEY')
YOUR_SECRET_KEY = userdata.get("YOUR_SECRET_KEY")
print("--------", YOUR_ACCESS_KEY)
print("--------", YOUR_SECRET_KEY)
client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1",
    aws_access_key_id=YOUR_ACCESS_KEY,
    aws_secret_access_key=YOUR_SECRET_KEY,
)



In [7]:
embedding_vector_dimension = 384

item_counts = {
  "text": sum(1 for item in items if item["type"] == "text"),
  "table": sum(1 for item in items if item["type"] == "table"),
  "image": sum(1 for item in items if item["type"] == "image"),
  "page": sum(1 for item in items if item["type"] == "page")
}

counters = dict.fromkeys(item_counts.keys(), 0)
bar_format="{l_bar}/{bar}| {n_fmt}/{total_fmt} [{elapsed} < {remaining}, {rate_fmt}{postfix}]"

print("---", item_counts)
with tqdm(
    total=len(items),
    desc="Generating embeddings",
    bar_format=bar_format
)as pbar:
  for item in items:
    item_type = item["type"]
    counters[item_type]+=1
    if item_type in ["text", "table"]:
      item["embedding"] = generate_multimodal_embedding(prompt=item["text"], output_embedding_length=embedding_vector_dimension)
    elif item_type in ["image"]:
      item["embedding"] = generate_multimodal_embedding(image=item["image"], output_embedding_length=embedding_vector_dimension)

  pbar.set_postfix_str(f"Text: {counters['text']}/{item_counts['text']}, Table: {counters['table']}/{item_counts['table']}, Image: {counters['image']}/{item_counts['image']}")

  pbar.update(1)

--- {'text': 52, 'table': 1, 'image': 26, 'page': 0}


Generating embeddings:   1%|/▏         | 1/79 [00:18 < 23:28, 18.06s/it, Text: 52/52, Table: 1/1, Image: 26/26]


In [16]:
items[0]["text"]

'Operational quasiprobabilities for continuous variables\nJeongwoo Jae,1 Junghee Ryu,2, ∗and Jinhyoung Lee1, †\n1Department of Physics, Hanyang University, Seoul, 133-791, Republic of Korea\n2Centre for Quantum Technologies, National University of Singapore, 3 science Drive 2, 117543 Singapore, Singapore\nWe generalize the operational quasiprobability involving sequential measurements proposed by\nRyu et al. [Phys. Rev. A 88, 052123] to a continuous-variable system. The quasiprobabilities in\nquantum optics are incommensurate, i.e., they represent a given physical observation in diﬀerent\nmathematical forms from their classical counterparts, making it diﬃcult to operationally interpret\ntheir negative values. Our operational quasiprobability is commensurate, enabling one to compare\nquantum and classical statistics on the same footing. We show that the operational quasiprobability\ncan be negative against the hypothesis of macrorealism for various states of light.\nQuadrature'

In [9]:
all_embeddings = all_embeddings = np.array(
    [item["embedding"] for item in items if item.get("embedding") is not None],
    dtype="float32"
)

index = faiss.IndexFlatL2(embedding_vector_dimension)
index.reset()

index.add(np.array(all_embeddings, dtype=np.float32))

In [10]:
# !pip install langchain_aws

In [11]:
from langchain_aws import ChatBedrock

def invoke_nova_multimodal(prompt, matched_items):
  system_message = [{
      "text": """You are an helpful assistant for question asnwering,
        The text context is relavant information retrieved.
        The provided image(s) are relavant information retrieved.
        Answer if answer is available in provided context otherwise return \"answer not found\" reply
      """
  }]

  message_content = []
  for item in matched_items:
    if item["type"] == "text" or item["type"] =="table":
      message_content.append({"text": item["text"]})
    else:
      message_content.append({"image": item["image"]})

    inf_params = {
        "max_new_tokens" : 300,
        "top_p":0.9,
        "top_k":30,
    }

    message_list = [{
      "role":"user", "content" : message_content
    }]

    native_request = {
        "message": message_list,
        "system": system_message,
        "inferenceConfig": inf_params
    }

    model_id = "amazon.nova-pro-v1:0"
    client = ChatBedrock(
        model_id=model_id,
        aws_access_key_id=YOUR_ACCESS_KEY,
        aws_secret_access_key=YOUR_SECRET_KEY,
        region_name="us-east-1",
        )

    response = client.invoke(json.dumps(native_request))

    model_response = response.content
    return model_response



In [21]:
query = "how long were the base and bigg models trained"
query_embedding = generate_multimodal_embedding(prompt=query, output_embedding_length=embedding_vector_dimension)


result = index.search(np.array(query_embedding, dtype=np.float32).reshape(1, -1), k=5)

print('====',result)

==== (array([[1.3791456, 1.3848724, 1.3874055, 1.411334 , 1.411334 ]],
      dtype=float32), array([[19,  5, 40, 13, 14]]))


In [22]:
D, I = result

matched_items = [
    {k: v for k, v in items[idx].items() if k != "embedding"}
    for idx in I[0]
]
print(matched_items)
response = invoke_nova_multimodal(query, matched_items)

display.Markdown(response)

[{'page': 2, 'type': 'text', 'text': 'consider a classical model assuming realism and nonin-\nvasive measurability. Classical physics has been consid-\nered as the realistic theory which assumes predetermined\nphysical quantities before the actual measurements. This\nimplies the existence of an underlying joint probability\ndistribution for the outcomes of all possible measure-\nments.\nIn a temporal scenario, Leggett and Garg examined\nnoninvasive measurability at the macroscopic level. One\ncan measure a physical quantity of a macroscopic object\nwithout disturbing it. This hypothesis together with re-\nalism, called macrorealism (MR), leads the Leggett-Garg\ninequality involving temporal correlations [11]. It shows\nthat quantum prediction is incompatible with the clas-\nsical one. More precisely, MR is deﬁned by the follow-\ning three hypotheses [29, 30]: “Macrorealism per se. A\nmacroscopic object which has available to it two or more\nmacroscopically distinct states is at any giv

Based on the provided context, the classical model under consideration assumes realism and non-invasive measurability. Classical physics is viewed as a realistic theory where physical quantities are predetermined before measurements, implying an underlying joint probability distribution for all possible measurement outcomes.

In a temporal scenario, Leggett and Garg investigated non-invasive measurability at the macroscopic level, proposing that one can measure a physical quantity of a macroscopic object without disturbing it. This, combined with realism, forms the concept of macrorealism (MR). MR leads to the Leggett-Garg inequality, which involves temporal correlations and demonstrates that quantum predictions are incompatible with classical ones.

Macrorealism is defined by three hypotheses:
1. **Macrorealism per se**: A macroscopic object that can exist in two or more macroscopically distinct states is, at any given time, in a definite state.

No further answer can be derived from the provided context.