<a href="https://colab.research.google.com/github/Alfred9/Exploring-LLMs/blob/main/Generative-AI-Intensive/Document_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install -U -q "google-generativeai>=0.8.3" chromadb
%pip install PyPDF2



In [2]:
import google.generativeai as genai
from IPython.display import Markdown

In [3]:
from google.colab import auth, userdata
import google.generativeai as genai

auth.authenticate_user()
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
print("Google GenAI API configured.")

Google GenAI API configured.


In [4]:
for m in genai.list_models():
    if "embedContent" in m.supported_generation_methods:
        print(m.name)

models/embedding-001
models/text-embedding-004


In [5]:
from google.colab import files
from PyPDF2 import PdfReader

# Upload the file interactively
uploaded = files.upload()

# Use the first uploaded file
file_name = list(uploaded.keys())[0]

# Read the uploaded PDF and concatenate the text of every page
reader = PdfReader(file_name)
pdf_text = ""
for page in reader.pages:
    # Guard against pages that yield no text (e.g. scanned images)
    pdf_text += (page.extract_text() or "") + "\n"
print(f"Content of {file_name}:\n")
print(pdf_text)

Saving Foundational Large Language models _ text generation.pdf to Foundational Large Language models _ text generation (3).pdf
Content of Foundational Large Language models _ text generation (3).pdf:

Foundational 
Large Language 
Models & 
Text Generation
Authors: Mohammadamin Barektain,  
Anant Nawalgaria, Daniel J. Mankowitz,  
Majd Al Merey, Yaniv Leviathan, Massimo Mascaro,  
Matan Kalman, Elena Buchatskaya,                                     
Aliaksei Severyn, and Antonio Gulli
Foundational Large Language Models & Text Generation2
September 2024Acknowledgements
Reviewers and Contributors
Adam Sadvovsky
Yonghui Wu
Andrew Dai
Efi Kokiopolou
Chuck Sugnet
Aleksey Vlasenko
Erwin Huizenga
Curators and Editors
Antonio Gulli
Anant Nawalgaria
Grace Mollison 
Technical Writer
Mark Iverson
Designer
Michael Lanning 

Introduction  6
Why language models are important 7
Large language models 8
 Transformer 9
  Input preparation and embedding 11
  Multi-head attention  12
   Understanding sel

In [6]:
documents = [pdf_text]
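
Storing the whole PDF as a single document means one embedding has to represent the entire whitepaper, and embedding models accept only a limited amount of text per input. A common alternative is to split the text into overlapping chunks before indexing. The sketch below is one minimal way to do that; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and values chosen for illustration, not part of the original notebook.

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Stand-in for the extracted PDF text so the sketch runs on its own.
sample_text = "word " * 1000
sample_chunks = chunk_text(sample_text, chunk_size=500, overlap=50)
print(len(sample_chunks), "chunks")
```

In the notebook, `documents = chunk_text(pdf_text)` could replace the single-element list; the later `db.add(...)` call would then index one entry per chunk.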

In [9]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

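        # Retry automatically on transient API errors, such as rate limiting
        # or a temporarily unavailable service
        retry_policy = {"retry": retry.Retry(predicate=retry.if_transient_error)}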

        response = genai.embed_content(
            model="models/text-embedding-004",
            content=input,
            task_type=embedding_task,
            request_options=retry_policy,
        )
        return response["embedding"]

In [10]:
import chromadb

DB_NAME = "googlecardb"
embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documents, ids=[str(i) for i in range(len(documents))])

In [11]:
db.count()


1

In [12]:
# You can peek at the data too.
#db.peek(1)

In [13]:
from IPython.display import Markdown

def generate_answer_from_query(db, embed_fn, query):
    """
    Generates an answer from a query using Chroma DB and a generative model.

    Args:
        db: The Chroma database object to query.
        embed_fn: The embedding function for adjusting document mode.
        query: The user-provided question (string).

    Returns:
        IPython.display.Markdown: The generated answer, ready to render in a notebook.
    """

    # Switch to query mode when generating embeddings
    embed_fn.document_mode = False

    # Search the Chroma DB using the specified query
    result = db.query(query_texts=[query], n_results=1)
    # One query with n_results=1 returns a single nested passage, so unpack it directly
    [[passage]] = result["documents"]

    # Format the query and passage for the prompt
    passage_oneline = passage.replace("\n", " ")
    query_oneline = query.replace("\n", " ")

    # Create the prompt for the generative model
    prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below.
    Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.
    However, you are talking to a non-technical audience, so be sure to break down complicated concepts and
    strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

    QUESTION: {query_oneline}
    PASSAGE: {passage_oneline}
    """

    # Initialize the generative model (swap in whichever Gemini model you have access to)
    model = genai.GenerativeModel("gemini-1.5-flash-latest")

    # Generate the answer using the generative model
    answer = model.generate_content(prompt)

    # Return the answer in Markdown format
    return Markdown(answer.text)
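
The function above passes a single retrieved passage to the model. If the collection holds several chunks, one might request more results and merge them into the prompt. The following is a minimal sketch of that idea, assuming a hypothetical helper `build_prompt` that is not in the original notebook; the prompt wording is shortened for illustration.

```python
def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a grounded prompt from a question and several retrieved passages."""
    query_oneline = query.replace("\n", " ")
    lines = [
        "You are a helpful and informative bot that answers questions using "
        "text from the reference passages included below.",
        "If the passages are irrelevant to the answer, you may ignore them.",
        "",
        f"QUESTION: {query_oneline}",
    ]
    for passage in passages:
        # Flatten each passage to one line, as the original function does.
        passage_oneline = passage.replace("\n", " ")
        lines.append(f"PASSAGE: {passage_oneline}")
    return "\n".join(lines)


# Hand-written passages standing in for real retrieval results.
example_prompt = build_prompt(
    "What is PaLM?",
    ["PaLM is a large\nlanguage model.", "It has 540 billion parameters."],
)
print(example_prompt)
```

With Chroma, the passages would come from `db.query(query_texts=[query], n_results=3)["documents"][0]`, where the value 3 is only an example.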

In [14]:
# Assuming `db` and `embed_fn` are initialized as per your setup:
query = "What can you tell me about the PaLM model?"
answer = generate_answer_from_query(db, embed_fn, query)

# Display the result (Markdown formatting)
display(answer)

The Pathways Language Model (PaLM) is a large language model created by Google AI.  It boasts 540 billion parameters and was trained on a massive dataset of text and code.  At the time of its release, PaLM achieved state-of-the-art results on many language benchmarks, demonstrating impressive capabilities in areas like common sense reasoning, arithmetic, joke explanation, code generation, and translation.


In [15]:
query = "What are some of the applications of Foundational models in Healthcare?"
answer = generate_answer_from_query(db, embed_fn, query)

# Display the result (Markdown formatting)
display(answer)

The provided text focuses on the technical aspects of foundational large language models (LLMs) and doesn't offer specific applications of these models in healthcare.  Therefore, I cannot answer your question using the given text.
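
When the model reports that the passage does not cover the question, it can help to check how close the retrieved passage actually was. By default, the result of Chroma's `query` includes a `distances` list alongside `documents`. The sketch below filters passages by distance; `filter_by_distance` and the threshold value are hypothetical, and the sample dictionary only mimics the shape of a Chroma result rather than showing real retrieval output.

```python
def filter_by_distance(result: dict, max_distance: float) -> list[str]:
    """Keep only passages whose distance to the query is at most `max_distance`."""
    # Chroma nests results per query; index 0 selects the first (only) query.
    documents = result["documents"][0]
    distances = result["distances"][0]
    return [doc for doc, dist in zip(documents, distances) if dist <= max_distance]


# Made-up values shaped like a Chroma query result, for illustration only.
fake_result = {
    "documents": [["PaLM is a large language model.", "Acknowledgements and reviewers."]],
    "distances": [[0.42, 1.31]],
}
relevant = filter_by_distance(fake_result, max_distance=1.0)
print(relevant)
```

A suitable threshold depends on the embedding model and the collection's distance metric, so it would need to be tuned on real queries.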


In [17]:
query = "What can you tell me about the evolution of transformers in 300 words?"
answer = generate_answer_from_query(db, embed_fn, query)

# Display the result (Markdown formatting)
display(answer)

The evolution of transformers in large language models (LLMs) has been remarkable.  Initially, the transformer architecture, introduced in the "Attention is all you need" paper in 2017, was a sequence-to-sequence model with an encoder and decoder.  This design was used in GPT-1 (2018), which pioneered the use of unsupervised pre-training on a massive text dataset and demonstrated the potential of transformers for various tasks like text generation and translation. BERT (2018), an encoder-only model, focused on deep contextual understanding through masked language modeling.  

Subsequent models like GPT-2 (2019) and GPT-3/3.5/4 (2020-2024) significantly increased in size (parameter count and training data), leading to more coherent and versatile text generation capabilities.  Google's LaMDA (2021) specialized in dialogue, while other models like Gopher and Chinchilla refined training techniques and scaling laws, focusing on dataset quality and compute efficiency.  PaLM (2022) and PaLM 2 (2023) highlighted efficient scaling using Google's Pathways system.  Finally, Google's Gemini (2023) represents the current state-of-the-art, introducing multimodality (processing various data types such as text, images, and video).  Open-source models like LLaMA and Mixtral further contributed to the field's advancement by providing accessible alternatives.  This journey showcases a continuous increase in model size and capability, coupled with advancements in training methodologies and efficiency improvements.


In [18]:
query = "What can you tell me about prompt engineering techniques?"
answer = generate_answer_from_query(db, embed_fn, query)

# Display the result (Markdown formatting)
display(answer)

Prompt engineering is the art and science of crafting effective prompts to get the desired response from a large language model (LLM).  This involves techniques like providing clear instructions, offering examples (few-shot prompting), or even giving a step-by-step reasoning process (chain-of-thought prompting).  There's also zero-shot prompting, where the LLM is given only the instructions and relies on its existing knowledge.  The goal is to guide the LLM's behavior and elicit the most relevant and useful output.
