## Quick Tour

The following examples show how to get started with the `unstructured` library. See
our [documentation page](https://unstructured-io.github.io/unstructured) for a full description
of the features in the library.

Another way to try out the `unstructured` library is to run it in a Docker container, which works on both Intel/AMD and Apple Silicon machines. Check out the [instructions for using the docker image](https://github.com/Unstructured-IO/unstructured#dizzy-instructions-for-using-the-docker-image).

In [1]:
# Install Requirements
!apt-get -qq install poppler-utils tesseract-ocr
# Upgrade Pillow to latest version
%pip install -q --user --upgrade pillow
# Install Python Packages
%pip install -q unstructured["all-docs"]==0.12.5
# NOTE: you may also upgrade to the latest version with the command below,
#       though a more recent version of unstructured will not have been tested with this notebook
# %pip install -q --upgrade unstructured


See our [example docs page](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs) to find the example docs used in this tutorial. You can also upload your own files: click “Choose Files” in the left panel, then select the file and upload it to Colab.

In [None]:
!mkdir -p example-docs
# Download example-10k.html and layout-parser-paper-fast.pdf
!wget  https://raw.githubusercontent.com/Unstructured-IO/unstructured/main/example-docs/example-10k.html -P example-docs
!wget  https://raw.githubusercontent.com/Unstructured-IO/unstructured/main/example-docs/layout-parser-paper-fast.pdf -P example-docs

--2024-08-05 08:20:10--  https://raw.githubusercontent.com/Unstructured-IO/unstructured/main/example-docs/example-10k.html
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2456707 (2.3M) [text/plain]
Saving to: ‘example-docs/example-10k.html’


2024-08-05 08:20:10 (55.8 MB/s) - ‘example-docs/example-10k.html’ saved [2456707/2456707]

--2024-08-05 08:20:10--  https://raw.githubusercontent.com/Unstructured-IO/unstructured/main/example-docs/layout-parser-paper-fast.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-08-05 08:20:10

In [2]:
# Install NLTK Data
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

### PDF Parsing

There are two strategies available for parsing PDF documents: "fast" and "hi_res". The default strategy is "hi_res".

If your main objective is extracting text from a "clean" PDF (i.e., one that does not include text in images that requires OCR), go with the "fast" option.

Otherwise, if your PDF may contain images with text to extract, or you prefer better-structured Elements that more accurately characterize the text items within the document, go with the "hi_res" option.

Naturally, "fast" is faster than "hi_res" -- by an order of magnitude!
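
The decision rule above can be written down as a small helper. This is only a sketch: `choose_pdf_strategy` and its two flags are hypothetical names introduced for illustration. The only library detail assumed is that `partition_pdf` accepts a `strategy` keyword taking the two values named above.

```python
def choose_pdf_strategy(has_text_in_images: bool, want_richer_elements: bool = False) -> str:
    """Pick a partitioning strategy following the guidance above (hypothetical helper)."""
    # Text inside images needs OCR, and richer element structure needs layout analysis;
    # both are what the "hi_res" strategy is for.
    if has_text_in_images or want_richer_elements:
        return "hi_res"
    # A clean, text-only PDF can use the much quicker "fast" strategy.
    return "fast"


strategy = choose_pdf_strategy(has_text_in_images=False)
print(strategy)

# The result would then be passed to the partitioner, for example:
# elements = partition_pdf("/content/drive/My Drive/UET_Lahore.pdf", strategy=strategy)
```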

**Connecting to Google Drive**

In [4]:
from google.colab import drive
# Mount your Google Drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [7]:
from unstructured.partition.pdf import partition_pdf


elements = partition_pdf(
    "/content/drive/My Drive/UET_Lahore.pdf",
    chunking_strategy="by_title",
    extract_images_in_pdf=True,
    infer_table_structure=True,
    max_characters=3000,
    new_after_n_chars=2800,
    combine_text_under_n_chars=2000,
    combine_similar_elements=True,
    image_output_dir_path='/content'
)



Some weights of the model checkpoint at microsoft/table-transformer-structure-recognition were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Because `chunking_strategy="by_title"` was used, the result is a list of chunked elements. Let's separate them into text chunks and tables:

In [8]:
# Categorize elements by type
def categorize_elements(raw_pdf_elements):
    text_elements = []
    table_elements = []
    for element in raw_pdf_elements:
        if 'CompositeElement' in str(type(element)):
            text_elements.append(str(element))
        elif 'Table' in str(type(element)):
            table_elements.append(str(element))
    return text_elements, table_elements
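
To see how the name-based check behaves, here is a small run on stand-in classes. `CompositeElement`, `Table`, `TableChunk`, and `Image` below are empty placeholders that only share names with element classes; they are defined here so the snippet runs without `unstructured` installed. One thing the run makes visible: the substring test `'Table' in str(type(element))` also matches a class named `TableChunk`.

```python
class _Element:
    """Placeholder base class: stores text and returns it from str()."""
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


# Stand-ins that only mimic the class *names* checked by categorize_elements.
class CompositeElement(_Element): pass
class Table(_Element): pass
class TableChunk(_Element): pass
class Image(_Element): pass


def categorize_elements(raw_pdf_elements):
    text_elements, table_elements = [], []
    for element in raw_pdf_elements:
        type_name = str(type(element))
        if 'CompositeElement' in type_name:
            text_elements.append(str(element))
        elif 'Table' in type_name:
            table_elements.append(str(element))
    return text_elements, table_elements


sample = [CompositeElement("intro"), Table("a | b"), TableChunk("c | d"), Image("logo")]
demo_texts, demo_tables = categorize_elements(sample)
print(demo_texts, demo_tables)
```

Elements whose type name matches neither substring (the `Image` stand-in here) are silently dropped.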


Let's count how many text and table elements were extracted from the document:

In [9]:
# extract tables and texts
texts, tables = categorize_elements(elements)

# number of text elements
print(len(texts))

# number of table elements
print(len(tables))

199
148


### Generate text, table, and image summaries

In [10]:
%%capture
!pip install -U google-generativeai langchain langchain-google-genai python-dotenv

In [None]:
import os
import google.generativeai as genai

# Replace the placeholder strings below with your own keys.
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = 'LANGCHAIN_API_KEY'
os.environ['GOOGLE_API_KEY'] = 'GOOGLE_API_KEY'

# Read the key back from the environment and configure the Gemini client.
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

In [37]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda
from langchain.prompts import PromptTemplate

model = ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0, max_tokens=1024)
model_vision = ChatGoogleGenerativeAI(model="gemini-1.5-flash",temperature=0, max_tokens=1024)

In [38]:
# texts[10]
tables[10]


'from a HEC Degree Title M.Sc. Electrical Engineering M.Sc. Artificial Intelligence M.Sc. Telecommunication Networks M.Sc. Computer Engineering M.Sc. Computer Science M.Sc. Thermal Power Engineering M.Sc. Mechanical Design Engineering M.Sc. Automotive Engineering M.Sc. Thermo-fluid Engineering M.Sc. Railway Engineering M.Sc. Renewable Energy Systems Engineering M.Sc. Mechatronics Engineering M.Sc. Engineering Management M.Sc. Environmental Engineering'

Text and table summaries

In [41]:
from tenacity import retry, stop_after_attempt, wait_exponential

# Generate summaries of text elements
def generate_text_summaries(texts, tables, summarize_texts=False):
    """
    Summarize text elements
    texts: List of str
    tables: List of str
    summarize_texts: Bool to summarize texts
    """

    # Prompt
    prompt_text = """You are an assistant tasked with summarizing tables and text for retrieval. \
    These summaries will be embedded and used to retrieve the raw text or table elements. \
    Give a concise summary of the table or text that is well-optimized for retrieval. Table \
    or text: {element} """
    prompt = PromptTemplate.from_template(prompt_text)
    # Summary chain: pass the raw element into the prompt, call the model, parse to a string
    summarize_chain = {"element": lambda x: x} | prompt | model | StrOutputParser()

    # Initialize empty summaries
    text_summaries = []
    table_summaries = []

    @retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=4, max=60))
    def batch_process(elements):
        return summarize_chain.batch(elements, {"max_concurrency": 1})

    # Summarize texts only if requested; otherwise use the raw texts as their own summaries
    if texts and summarize_texts:
        for i in range(0, len(texts), 1):  # One element per request
            text_summaries.extend(batch_process(texts[i:i+1]))
    elif texts:
        text_summaries = texts

    # Tables are always summarized when provided
    if tables:
        for i in range(0, len(tables), 1):  # One element per request
            table_summaries.extend(batch_process(tables[i:i+1]))

    return text_summaries, table_summaries


# Get text & table summaries
text_summaries, table_summaries = generate_text_summaries(texts[0:5], tables[0:5], summarize_texts=True)
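
The `@retry` decorator above re-runs a failed request with growing pauses between attempts. The sketch below shows the same idea in plain Python so the control flow is visible. It is not tenacity's implementation and does not reproduce its exact wait schedule; `retry_with_backoff` and its parameters are hypothetical names, and the `sleep` argument exists only so the demo can record waits instead of actually sleeping.

```python
import time


def retry_with_backoff(func, attempts=5, min_wait=4.0, max_wait=60.0, sleep=time.sleep):
    """Call func() until it succeeds, doubling the pause after each failure."""
    wait = min_wait
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error
            sleep(min(wait, max_wait))  # Never wait longer than max_wait
            wait *= 2


# Demo: a call that fails twice (e.g. a rate-limit error) and then succeeds.
calls = {"n": 0}

def flaky_summarize():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "summary"

waits = []
result = retry_with_backoff(flaky_summarize, sleep=waits.append)  # record waits, do not sleep
print(result, waits)
```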

In [42]:
len(text_summaries)

5

In [44]:
len(table_summaries)

5

Image summaries

In [45]:
import os
import base64
# encode image
def encode_image(image_path):
    """Getting the base64 string"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
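
As a quick sanity check, the encoding can be verified with a round trip: write some bytes to a temporary file, encode the file, and decode the string back. The payload below is made-up demo data (a PNG signature followed by arbitrary bytes), not a real image.

```python
import base64
import os
import tempfile


def encode_image(image_path):
    """Return the file contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Made-up payload: PNG signature plus arbitrary bytes (not a valid image).
payload = b"\x89PNG\r\n\x1a\n" + b"demo-bytes"

with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

try:
    encoded = encode_image(tmp_path)
finally:
    os.remove(tmp_path)  # Clean up the temporary file

decoded = base64.b64decode(encoded)
print(decoded == payload)
```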

In [46]:
def image_summarize(img_base64, prompt):
    """Make image summary"""
    msg = model_vision.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
                    },
                ]
            )
        ]
    )
    return msg.content
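
The message content above always labels the image as `image/jpeg`, although the extracted files may also be PNGs. One possible variation, shown here as a sketch, derives the MIME type from the file name. `to_data_url` and `build_image_content` are hypothetical helpers, not part of LangChain; they only assemble the same list-of-dicts structure that the `HumanMessage` above receives.

```python
import mimetypes


def to_data_url(img_base64: str, filename: str) -> str:
    """Build a data URL, guessing the MIME type from the file extension."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"  # Fall back to the type used in the cell above
    return f"data:{mime};base64,{img_base64}"


def build_image_content(prompt: str, img_base64: str, filename: str) -> list:
    """Assemble the text + image parts in the shape used by image_summarize."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": to_data_url(img_base64, filename)}},
    ]


content = build_image_content("Describe the image.", "QUJD", "figure-1.png")
print(content[1]["image_url"]["url"])
```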


In [50]:
def generate_img_summaries(path):
    """
    Generate summaries and base64 encoded strings for images
    path: Directory containing the .png/.jpg/.jpeg files extracted by Unstructured
    Only the first five images (in sorted order) are processed.
    """
    # Store base64 encoded images
    img_base64_list = []

    # Store image summaries
    image_summaries = []

    # Prompt
    prompt = """You are an assistant tasked with summarizing images for retrieval. \
    These summaries will be embedded and used to retrieve the raw image. \
    Give a concise summary of the image that is well optimized for retrieval."""
    image_files = sorted([f for f in os.listdir(path) if f.endswith(('.png', '.jpg', '.jpeg'))])
    image_files = image_files[:5]
    for img_file in image_files:
        img_path = os.path.join(path, img_file)
        base64_image = encode_image(img_path)
        img_base64_list.append(base64_image)
        image_summaries.append(image_summarize(base64_image, prompt))
    return img_base64_list, image_summaries


In [51]:
fpath = "/content/figures"
# Image summaries
img_base64_list, image_summaries = generate_img_summaries(fpath)

In [54]:
image_summaries[2]

'A microscope connected to a computer monitor displaying a magnified image of a green leaf.'

## Multi-vector retriever

In [56]:
%%capture
!pip install langchain-community

In [57]:
import uuid
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_core.documents import Document
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma

In [58]:
def create_multi_vector_retriever(vectorstore, text_summaries, texts, table_summaries, tables, image_summaries, images):
    """
    Create retriever that indexes summaries, but returns raw images or texts
    """
    # Initialize the storage layer
    store = InMemoryStore()
    id_key = "doc_id"

    # Create the multi-vector retriever
    retriever = MultiVectorRetriever(
        vectorstore=vectorstore,
        docstore=store,
        id_key=id_key,
    )

    # Helper function to add documents to the vectorstore and docstore
    def add_documents(retriever, doc_summaries, doc_contents):
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        summary_docs = [
            Document(page_content=s, metadata={id_key: doc_ids[i]})
            for i, s in enumerate(doc_summaries)
        ]
        retriever.vectorstore.add_documents(summary_docs)
        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

    # Add texts, tables, and images
    # Check that text_summaries is not empty before adding
    if text_summaries:
        add_documents(retriever, text_summaries, texts)
    # Check that table_summaries is not empty before adding
    if table_summaries:
        add_documents(retriever, table_summaries, tables)
    # Check that image_summaries is not empty before adding
    if image_summaries:
        add_documents(retriever, image_summaries, images)

    return retriever
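
The idea behind this retriever is that the search runs over the summaries, while the lookup returns the raw content linked to the matching summary through a shared ID. The toy version below illustrates only that linkage. It uses a plain list and dict in place of the vector store and docstore, and word overlap in place of embedding similarity; `build_index` and `retrieve` are hypothetical names, not LangChain APIs.

```python
import uuid


def build_index(summaries, contents):
    """Index summaries for search and store raw contents under the same IDs."""
    if len(summaries) != len(contents):
        raise ValueError("each summary needs exactly one raw content item")
    summary_index = []  # Stands in for the vector store: (doc_id, summary)
    docstore = {}       # Stands in for the docstore: doc_id -> raw content
    for summary, content in zip(summaries, contents):
        doc_id = str(uuid.uuid4())
        summary_index.append((doc_id, summary))
        docstore[doc_id] = content
    return summary_index, docstore


def retrieve(query, summary_index, docstore):
    """Match the query against summaries, then return the linked raw content."""
    query_words = set(query.lower().split())
    # Word overlap is a crude stand-in for embedding similarity.
    best_id, _ = max(
        summary_index,
        key=lambda item: len(query_words & set(item[1].lower().split())),
    )
    return docstore[best_id]


index, store = build_index(
    ["list of msc degree programs", "message from the vice chancellor"],
    ["RAW TABLE: M.Sc. programs ...", "RAW TEXT: VICE CHANCELLOR'S MESSAGE ..."],
)
hit = retrieve("vice chancellor message", index, store)
print(hit)
```

The length check reflects the pairing assumption: each summary is matched to raw content by position, so the two lists are expected to line up one to one.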



In [59]:
%%capture
!pip install chromadb

In [60]:
# The vectorstore to use to index the summaries
vectorstore = Chroma(
    collection_name="mm_rag_gemini",
    embedding_function=GoogleGenerativeAIEmbeddings(model="models/embedding-001"), # embedding model
)

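# Create retriever
# Summaries were generated for the first five texts and tables only,
# so pass the matching slices to keep summaries and raw contents aligned.
retriever_multi_vector_img = create_multi_vector_retriever(
    vectorstore,
    text_summaries,
    texts[0:5],
    table_summaries,
    tables[0:5],
    image_summaries,
    img_base64_list,
)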



## RAG Pipeline

In [61]:
import io
import re

from IPython.display import HTML, display
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from PIL import Image


def plt_img_base64(img_base64):
    """Display base64 encoded string as image"""
    # Create an HTML img tag with the base64 string as the source
    image_html = f'<img src="data:image/jpeg;base64,{img_base64}" />'
    # Display the image by rendering the HTML
    display(HTML(image_html))

def looks_like_base64(sb):
    """Check if the string looks like base64"""
    return re.match("^[A-Za-z0-9+/]+[=]{0,2}$", sb) is not None


def is_image_data(b64data):
    """
    Check if the base64 data is an image by looking at the start of the data
    """
    image_signatures = {
        b"\xFF\xD8\xFF": "jpg",
        b"\x89\x50\x4E\x47\x0D\x0A\x1A\x0A": "png",
        b"\x47\x49\x46\x38": "gif",
        b"\x52\x49\x46\x46": "webp",
    }
    try:
        header = base64.b64decode(b64data)[:8]  # Decode and get the first 8 bytes
        for sig, format in image_signatures.items():
            if header.startswith(sig):
                return True
        return False
    except Exception:
        return False

def resize_base64_image(base64_string, size=(128, 128)):
    """
    Resize an image encoded as a Base64 string
    """
    # Decode the Base64 string
    img_data = base64.b64decode(base64_string)
    img = Image.open(io.BytesIO(img_data))

    # Resize the image
    resized_img = img.resize(size, Image.LANCZOS)

    # Save the resized image to a bytes buffer
    buffered = io.BytesIO()
    resized_img.save(buffered, format=img.format)

    # Encode the resized image to Base64
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

def split_image_text_types(docs):
    """
    Split base64-encoded images and texts
    """
    b64_images = []
    texts = []
    for doc in docs:
        # Check if the document is of type Document and extract page_content if so
        if isinstance(doc, Document):
            doc = doc.page_content
        if looks_like_base64(doc) and is_image_data(doc):
            doc = resize_base64_image(doc, size=(1300, 600))
            b64_images.append(doc)
        else:
            texts.append(doc)
    if len(b64_images) > 0:
        return {"images": b64_images[:1], "texts": []}
    return {"images": b64_images, "texts": texts}
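
The check in `is_image_data` relies on file signatures ("magic bytes"): the first few decoded bytes identify the format. The sketch below shows that check in isolation and returns the format name instead of a boolean. `sniff_image_format` is a hypothetical helper, and the sample inputs are signatures followed by filler bytes, not real images.

```python
import base64
from typing import Optional

# Same signatures as in is_image_data above.
IMAGE_SIGNATURES = {
    b"\xFF\xD8\xFF": "jpg",
    b"\x89\x50\x4E\x47\x0D\x0A\x1A\x0A": "png",
    b"\x47\x49\x46\x38": "gif",
    b"\x52\x49\x46\x46": "webp",
}


def sniff_image_format(b64data: str) -> Optional[str]:
    """Return the image format suggested by the leading bytes, or None."""
    try:
        header = base64.b64decode(b64data)[:8]
    except ValueError:  # binascii.Error (bad padding etc.) is a ValueError subclass
        return None
    for signature, name in IMAGE_SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


png_like = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"filler").decode("utf-8")
jpg_like = base64.b64encode(b"\xff\xd8\xff\xe0" + b"filler").decode("utf-8")
plain_text = base64.b64encode(b"hello world").decode("utf-8")

print(sniff_image_format(png_like), sniff_image_format(jpg_like), sniff_image_format(plain_text))
```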



In [62]:
def img_prompt_func(data_dict):
    """
    Join the context into a single string
    """
    formatted_texts = "\n".join(data_dict["context"]["texts"])
    messages = []

    # Adding the text for analysis
    text_message = {
        "type": "text",
        "text": (
            "You are an AI scientist tasked with providing factual answers from Context.\n"
            "You will be given a mix of text, tables, and image(s), usually of charts or graphs.\n"
            "Use this information to provide answers related to the user question. \n"
            f"User-provided question: {data_dict['question']}\n\n"
            "Text and / or tables:\n"
            f"{formatted_texts}"
        ),
    }
    messages.append(text_message)
    # Adding image(s) to the messages if present
    if data_dict["context"]["images"]:
        for image in data_dict["context"]["images"]:
            image_message = {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image}"},
            }
            messages.append(image_message)
    return [HumanMessage(content=messages)]

def multi_modal_rag_chain(retriever):
    """
    Multi-modal RAG chain
    """

    # RAG pipeline
    chain = (
        {
            "context": retriever | RunnableLambda(split_image_text_types),
            "question": RunnablePassthrough(),
        }
        | RunnableLambda(img_prompt_func)
        | model_vision  # MM_LLM
        | StrOutputParser()
    )
    return chain
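
The chain above is a pipeline: the question is fanned out into a `context` branch (retrieve, then split) and a `question` branch (passed through unchanged), and the resulting dict flows through the prompt builder, the model, and the output parser. The plain-Python sketch below mirrors that data flow with stub functions. Everything in it (`make_chain` and the five stubs) is hypothetical and stands in for the retriever, `split_image_text_types`, `img_prompt_func`, the model, and `StrOutputParser`.

```python
def make_chain(retrieve, split, build_prompt, llm, parse):
    """Compose the five stages in the same order as the LCEL chain above."""
    def run(question):
        data = {
            "context": split(retrieve(question)),  # retriever | split step
            "question": question,                   # passthrough branch
        }
        return parse(llm(build_prompt(data)))
    return run


# Stubs standing in for the real components.
def fake_retrieve(question):
    return ["doc about " + question]

def fake_split(docs):
    return {"images": [], "texts": docs}

def fake_prompt(data):
    return f"Q: {data['question']}\nContext: {' '.join(data['context']['texts'])}"

def fake_llm(prompt):
    return {"content": prompt.upper()}  # Pretend model: echoes the prompt in upper case

def fake_parse(message):
    return message["content"]


toy_chain = make_chain(fake_retrieve, fake_split, fake_prompt, fake_llm, fake_parse)
answer = toy_chain("fees")
print(answer)
```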

In [63]:
# Create RAG chain
chain_multimodal_rag = multi_modal_rag_chain(retriever_multi_vector_img)

In [96]:
query = """What is the VICE CHANCELLOR'S MESSAGE  Give me One Paragraph on it?"""
# Retrieve the raw documents whose summaries best match the query
docs = retriever_multi_vector_img.invoke(query)

In [97]:
print(docs)

["VICE CHANCELLOR'S MESSAGE\n\nDespite challenges and difficulties being faced by the administration, a concerted effort, with the help of faculty and staff, is being made to achieve the milestones set for teaching, research, commercialization, entrepreneurship and better learning outcomes in all programs. These efforts have led to improvement in quality of education, services as well as national and international ranking of the University. Moreover, stronger linkages with alumni, industry, Government and international partners are being pursued.\n\n6\n\n7\n\n8\n\n9\n\n10\n\n11\n\n12\n\nIt is a great honor for me to serve my alma mater, UET, which last year celebrated hundred years of excellence in engineering education. The realignment of institute’s vision and mission has led to a rapid growth in research, innovation as well as quality education, which are necessary for technological development in the country and ultimately, financial independence.\n\n13\n\n14\n\n15\n\n16\n\n17\n\nI

In [98]:
split_image_text_types(docs)

{'images': [],
 'texts': ["VICE CHANCELLOR'S MESSAGE\n\nDespite challenges and difficulties being faced by the administration, a concerted effort, with the help of faculty and staff, is being made to achieve the milestones set for teaching, research, commercialization, entrepreneurship and better learning outcomes in all programs. These efforts have led to improvement in quality of education, services as well as national and international ranking of the University. Moreover, stronger linkages with alumni, industry, Government and international partners are being pursued.\n\n6\n\n7\n\n8\n\n9\n\n10\n\n11\n\n12\n\nIt is a great honor for me to serve my alma mater, UET, which last year celebrated hundred years of excellence in engineering education. The realignment of institute’s vision and mission has led to a rapid growth in research, innovation as well as quality education, which are necessary for technological development in the country and ultimately, financial independence.\n\n13\n\n

In [90]:
docs[0]

"VICE CHANCELLOR'S MESSAGE\n\nDespite challenges and difficulties being faced by the administration, a concerted effort, with the help of faculty and staff, is being made to achieve the milestones set for teaching, research, commercialization, entrepreneurship and better learning outcomes in all programs. These efforts have led to improvement in quality of education, services as well as national and international ranking of the University. Moreover, stronger linkages with alumni, industry, Government and international partners are being pursued.\n\n6\n\n7\n\n8\n\n9\n\n10\n\n11\n\n12\n\nIt is a great honor for me to serve my alma mater, UET, which last year celebrated hundred years of excellence in engineering education. The realignment of institute’s vision and mission has led to a rapid growth in research, innovation as well as quality education, which are necessary for technological development in the country and ultimately, financial independence.\n\n13\n\n14\n\n15\n\n16\n\n17\n\nI 

In [91]:
docs[1]

"Postgraduate Prospectus 2023         www.uet.edu.pk       1   \n\nPostgraduate Prospectus 2023\n\n1\n\n1\n\nPostgraduate Prospectus 2023\n\nwww.uet.edu.pk\n\n1 2 3 4 5 6 7 8 9 10 11\n\nVISION\n\n12\n\nTo generate knowledge for global competitive advantage and become\n\n13 14 15 16 17 18 19 20 21 22\n\nA leading world class research university.\n\n23 24 25 26\n\nMISSION\n\n27\n\nTo play a leading role as a university of engineering and technology, in teaching, Innovation and commercialization that is internationally relevant and has a direct bearing on national industrial, technological and socio-economic development.\n\n28\n\n29 30 31 32 33\n\n2\n\nPostgraduate Prospectus 2023\n\nwww.uet.edu.pk\n\n1 2 3 4 5 6 7\n\nCHANCELLOR'S MESSAGE\n\nThe University of Engineering and Technology (UET) Lahore holds a place of eminence among the prestigious engineering universities of the world. Being a pioneering institution of engineering and technology in Pakistan, UET has unlocked all its potenti

In [92]:
# Inspect the third retrieved document (text in this run, not an image)
docs[2]

'CHAIRPERSONS/ DIRECTORS OF TEACHING DEPARTMENTS/ INSTITUTES\n\nElectrical Engineering PROF. DR. MUHAMMAD TAHIR Computer Science PROF. DR. MUHAMMAD USMAN GHANI KHAN Computer Engineering PROF. DR. ALI HAMMAD AKBAR Mechanical Engineering PROF. DR. NASIR HAYAT Industrial & Manufacturing Engineering PROF. DR. QAISER SALEEM Mechatronics & Control Engineering DR. ALI RAZA Civil Engineering PROF. DR. KHALID FAROOQ Institute of Environmental Engineering & Research PROF. DR. SAJJAD H. SHEIKH Architectural Engineering & Design PROF. DR. SAJJAD MUBIN Transportation Engineering & Management PROF. DR. AMMAD HASSAN KHAN Chemical Engineering PROF. DR. SAIMA YASIN Polymer & Process Engineering PROF. DR. ASIF ALI QAISER\n\n1\n\nDepartment of Geological Engineering DR. MUHAMMAD FAROOQ AHMED Petroleum and Gas Engineering PROF. DR. MUHAMMAD KHURRAM ZAHOOR Metallurgical & Materials Engineering PROF. DR-ING. FURQAN AHMED School of Architecture & Design PROF. DR. RIZWAN HAMEED Architecture DR. MUNAZZA AKHTAR

In [93]:
docs[3]

'Director Students Financial Aid & Career Services PROF.DR. NOOR KHAN\n\nDirector, Al-Khawarizmi Institute of Computer Sciences PROF. DR. WAQAR MAHMOOD\n\n2\n\n3\n\n8\n\nPostgraduate Prospectus 2023\n\nwww.uet.edu.pk\n\n1\n\nACADEMIC CALENDAR (2023-2024)\n\n2\n\nFall Semester'

In [99]:
# Render images inline and print everything else as text
for doc in docs:
    if looks_like_base64(doc) and is_image_data(doc):
        plt_img_base64(resize_base64_image(doc))
    else:
        print(doc)

VICE CHANCELLOR'S MESSAGE

Despite challenges and difficulties being faced by the administration, a concerted effort, with the help of faculty and staff, is being made to achieve the milestones set for teaching, research, commercialization, entrepreneurship and better learning outcomes in all programs. These efforts have led to improvement in quality of education, services as well as national and international ranking of the University. Moreover, stronger linkages with alumni, industry, Government and international partners are being pursued.

6

7

8

9

10

11

12

It is a great honor for me to serve my alma mater, UET, which last year celebrated hundred years of excellence in engineering education. The realignment of institute’s vision and mission has led to a rapid growth in research, innovation as well as quality education, which are necessary for technological development in the country and ultimately, financial independence.

13

14

15

16

17

I congratulate you for choosing U

In [100]:
chain_multimodal_rag.invoke(query)

"The Vice Chancellor's message highlights the University of Engineering and Technology, Lahore's (UET) commitment to achieving its milestones in teaching, research, commercialization, entrepreneurship, and improving learning outcomes.  The message emphasizes the university's efforts to enhance the quality of education and services, leading to improved national and international rankings.  The Vice Chancellor also emphasizes the importance of strengthening relationships with alumni, industry, government, and international partners. \n"