## RAG

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

# Create a .env file containing these keys, or assign the values directly here
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")



### PDF Preprocessing 

##### Why use LlamaParse
LlamaParse vs. other PDF loaders on the Apple 10-K filing.\
Green marks answers that were retrieved correctly; red marks answers that were not.
![LlamaParse](images/h4r.drawio.png) 

In [11]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("HAI_AI-Index-Report-20242.pdf")
pages = loader.load_and_split()

In [12]:
pages[49].page_content

'50\nArtificial Intelligence\nIndex Report 2024 Chapter 1 Preview Table of ContentsCompute Trends\nThe term “compute” in AI models denotes the \ncomputational resources required to train and operate \na machine learning model. Generally, the complexity \nof the model and the size of the training dataset \ndirectly influence the amount of compute needed. \nThe more complex a model is, and the larger the \nunderlying training data, the greater the amount of \ncompute required for training.\nFigure 1.3.6 visualizes the training compute required for notable machine learning models in the last \n20 years. Recently, the compute usage of notable \nAI models has increased exponentially.6 This \ntrend has been especially pronounced in the last \nfive years. This rapid rise in compute demand \nhas critical implications. For instance, models \nrequiring more computation often have larger \nenvironmental footprints, and companies typically \nhave more access to computational resources \nthan acade

In [4]:
for i in range(3):
    print(pages[i].metadata)

{'source': 'HAI_AI-Index-Report-20242.pdf', 'page': 0}
{'source': 'HAI_AI-Index-Report-20242.pdf', 'page': 1}
{'source': 'HAI_AI-Index-Report-20242.pdf', 'page': 2}


In [3]:
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse  

parser = LlamaParse(
    api_key=LLAMA_CLOUD_API_KEY,  # optional if LLAMA_CLOUD_API_KEY is already set in the environment
    result_type="markdown",  # or "text"
    language="en",
)



In [4]:
# Reuse the parser configured above instead of building a second one
documents = parser.load_data("./HAI_AI-Index-Report-20242.pdf")

Started parsing the file under job_id 14def8fa-9195-4e3d-ad12-1d2f07230b66
.

In [7]:
documents[0]



In [8]:
print(documents[0].text[0:2000])

     Artificial
 Intelligence
Index Report
          2024
---
## Artificial Intelligence Index Report 2024

Introduction to the AI Index Report 2024

Welcome to the seventh edition of the AI Index report. The 2024 Index is our most comprehensive to date and arrives at an important moment when AI’s influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development. Featuring more original data than ever before, this edition introduces new estimates on AI training costs, detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI’s impact on science and medicine.

The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order 

In [9]:
len(documents[0].text)

692460

#### Chunking & Vector DB

Semantic chunking: at a high level, the text is split into sentences, the sentences are grouped into windows of three, and neighbouring groups that are close in embedding space are merged into one chunk.
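
As a rough illustration of that idea, here is a minimal sketch in plain Python. It assumes a toy bag-of-words "embedding" and a fixed similarity threshold, so it is not what `SemanticChunker` does internally; `toy_embed`, `cosine` and `semantic_chunks_sketch` are hypothetical names used only for this example.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks_sketch(text, threshold=0.2):
    # 1. Split the text into sentences.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    # 2. Embed each sentence together with its neighbours (a window of up to 3 sentences).
    windows = [" ".join(sentences[max(0, i - 1): i + 2]) for i in range(len(sentences))]
    vectors = [toy_embed(w) for w in windows]
    # 3. Keep extending the current chunk while adjacent windows stay similar;
    #    start a new chunk where the similarity drops below the threshold.
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

A threshold of `0.0` never splits and a threshold above `1.0` always splits, which makes the two extremes easy to check before trying values in between.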

In [13]:
# Semantic chunks built from the PyPDFLoader pages (for comparison only; not used below)

from langchain_experimental.text_splitter import SemanticChunker

# Semantic chunking tends to give better-quality chunks than fixed-size splitting
semantic_chunker2 = SemanticChunker(gpt4all_embd)
semantic_chunks2 = semantic_chunker2.create_documents([d.page_content for d in pages])

In [14]:
from langchain_community.vectorstores import Chroma

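# Use a distinct collection name: reusing "ragAI" for both stores in one session
# can put the PyPDF and LlamaParse chunks into the same Chroma collection.
vectorstoreS2 = Chroma.from_documents(semantic_chunks2,
                                      collection_name="ragAI_pypdf",
                                      embedding=gpt4all_embd)
retriever2 = vectorstoreS2.as_retriever()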

In [37]:
len(semantic_chunks2)

968

In [67]:
semantic_chunks2[10].page_content

'The data is in: AI makes workers more productive and leads to higher quality work. In \n2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more \nquickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge \nthe skill gap between low- and high-skilled workers. Still, other studies caution that using AI without proper \noversight can lead to diminished performance. 5'

In [2]:
from langchain_community.embeddings import GPT4AllEmbeddings
gpt4all_embd = GPT4AllEmbeddings()

In [5]:
from langchain_experimental.text_splitter import SemanticChunker
# Semantic chunking of the LlamaParse output (these chunks are used for retrieval)

semantic_chunker = SemanticChunker(gpt4all_embd)
semantic_chunks = semantic_chunker.create_documents([d.text for d in documents])

In [12]:
len(semantic_chunks)

197

In [13]:
semantic_chunks[196].page_content

'(2023). “Learning From Prepandemic Data to Forecast Viral Escape.” Nature 622: 818–25. https://doi.org/10.1038/s41586-023-06617-0\n\n## Table of Contents\n\n## Appendix\n\n490\n---\n## Chapter 6: Education\n\nCode.org\n\nState-Level Data\n\nThe following link includes a full description of the methodology used by Code.org to collect its data. The staff at Code.org also maintains a database of the state of American K–12 education and, in this policy primer, provides a greater amount of detail on the state of American K–12 education in each state. AP Computer Science Data\n\nThe AP Computer Science data is provided to Code.org as per an agreement the College Board maintains with Code.org. The AP Computer Science data comes from the college board’s national and state summary reports. Access to Computer Science Education\n\nData on access to computer science education was drawn from Code.org’s State of Computer Science Education 2023 report. Computing Research Association (CRA Taulbee Sur

In [17]:
from langchain_community.vectorstores import Chroma

vectorstoreS3 = Chroma.from_documents(semantic_chunks, 
                                     collection_name="ragAI",
                                     embedding=gpt4all_embd)
retriever = vectorstoreS3.as_retriever() 

In [71]:
# vectorstoreS3 is the store built from the LlamaParse chunks above
retriever = vectorstoreS3.as_retriever(search_kwargs={"k": 2})  # return the 2 nearest chunks
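
The `k` in `search_kwargs` controls how many chunks the retriever returns for a query. The sketch below shows the idea with toy vectors in plain Python. It assumes cosine similarity for illustration (the metric actually used depends on how the Chroma collection is configured), and `top_k` is a hypothetical helper, not part of LangChain or Chroma.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query (hypothetical helper)."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D "embeddings" for three chunks; real embeddings have hundreds of dimensions.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
nearest = top_k([1.0, 0.0], chunk_vecs, k=2)
```

A smaller `k` keeps the prompt short but risks missing the relevant chunk; a larger `k` does the opposite.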

In [18]:
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_groq import ChatGroq



# Replace GROQ_API_KEY with your own key, or reuse the one loaded above
llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768", groq_api_key=GROQ_API_KEY)


### Contextualize question ###
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)


### Answer question ###
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)


### Statefully manage chat history ###
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
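
To see why the store is keyed by `session_id`, here is a plain-Python stand-in for the same pattern. `ToyHistory`, `toy_store` and `get_toy_history` are hypothetical names for this sketch; the real chain uses `ChatMessageHistory` and `RunnableWithMessageHistory` as defined above.

```python
class ToyHistory:
    """Hypothetical stand-in for ChatMessageHistory: a list of (role, text) pairs."""

    def __init__(self):
        self.messages = []

    def add(self, role, text):
        self.messages.append((role, text))


toy_store = {}


def get_toy_history(session_id):
    # Create the history lazily on first use, then keep returning the same object.
    if session_id not in toy_store:
        toy_store[session_id] = ToyHistory()
    return toy_store[session_id]


# Two turns in session "123" accumulate in one history ...
get_toy_history("123").add("human", "first question")
get_toy_history("123").add("ai", "first answer")
# ... while a different session id starts from an empty history.
get_toy_history("456").add("human", "unrelated question")
```

Because every call with the same `session_id` gets the same history object, a follow-up question in that session can be rewritten against the earlier turns, while other sessions stay isolated.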

In [19]:
conversational_rag_chain.invoke(
    {"input": "What is the most expensive AI model to train?"},
    config={
        "configurable": {"session_id": "123"}
    },  
)["answer"]

"Based on the provided context, Google's Gemini Ultra is the most expensive AI model to train, having cost an estimated $191 million for compute."

In [21]:
conversational_rag_chain.invoke(
    {"input": "What is this about ?"},
    config={
        "configurable": {"session_id": "123"}
    }, 
)["answer"]

'The provided context is about the escalating costs of training state-of-the-art AI models. It highlights that the training costs for models like OpenAI’s GPT-4 and Google’s Gemini Ultra have reached unprecedented levels, with the latter costing an estimated $191 million for compute. This trend has significant implications for AI research, as it has become increasingly expensive for institutions like universities to develop their own leading-edge foundation models.'

### LangGraph

Not used in the pipeline above and not finished.

In [83]:
# vectorstoreS3 is the store built from the LlamaParse chunks; pass search_kwargs={"k": 1} to limit results
retriever = vectorstoreS3.as_retriever()

In [22]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain import hub
from langchain_core.output_parsers import StrOutputParser


# Replace GROQ_API_KEY with your own key, or reuse the one loaded above
llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768", groq_api_key=GROQ_API_KEY)


## grader
prompt = PromptTemplate(
    template="""You are a grader assessing relevance of a retrieved document to a user question. \n 
    Here is the retrieved document: \n\n {document} \n\n
    Here is the user question: {question} \n
    If the document contains keywords related to the user question, grade it as relevant. \n
    It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question. \n
    Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.""",
    input_variables=["question", "document"],
)

retrieval_grader = prompt | llm | JsonOutputParser()



### Generate
# Prompt
prompt = hub.pull("rlm/rag-prompt")

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
# Chain
rag_chain = prompt | llm | StrOutputParser()

In [23]:
from typing_extensions import TypedDict
from typing import List
import json


class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        documents: list of documents
    """

    question: str
    generation: str
    documents: List[str]


from langchain.schema import Document


def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval (invoke() supersedes the deprecated get_relevant_documents())
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}


def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # RAG generation: join the chunk texts so the prompt receives plain text, not Document reprs
    generation = rag_chain.invoke({"context": format_docs(documents), "question": question})
    return {"documents": documents, "question": question, "generation": generation}


def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with only filtered relevant documents
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    # Score each doc
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        print(f"Score for document: {score}")
        try:
            grade = score["score"]  # expected shape: {"score": "yes" | "no"}
        except (KeyError, TypeError):
            print("Error: grader output is not a JSON object with a 'score' key")
            grade = None

        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")

    return {"documents": filtered_docs, "question": question}
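
The grader is an LLM, so its output may not always be exactly `{'score': 'yes'}`. The sketch below shows one defensive way to normalize it in plain Python. `normalize_grade` and `filter_relevant` are hypothetical helpers for illustration, and treating every unexpected output as "not relevant" is an assumption rather than something the notebook prescribes.

```python
def normalize_grade(score):
    """Map a grader output to True/False, treating anything unexpected as not relevant."""
    if not isinstance(score, dict):
        return False
    value = score.get("score")
    if not isinstance(value, str):
        return False
    # Tolerate stray whitespace and capitalization such as " Yes ".
    return value.strip().lower() == "yes"


def filter_relevant(documents, scores):
    """Keep only the documents whose paired grade normalizes to True."""
    return [doc for doc, score in zip(documents, scores) if normalize_grade(score)]
```

The opposite default (keep a document when the grade cannot be parsed) is equally defensible if recall matters more than precision.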




In [24]:
from langgraph.graph import END, StateGraph

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate

# Build graph
workflow.set_entry_point("retrieve")

workflow.add_edge("retrieve", "grade_documents")
workflow.add_edge("grade_documents", "generate")  # New edge
workflow.add_edge("generate", END)

# Compile
app = workflow.compile()
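
The compiled graph runs the three nodes in a fixed order, and each node's returned dict updates the shared state. A rough plain-Python picture of that flow follows. The stub nodes and `run_linear_graph` are hypothetical, and the merge is deliberately simplified: it ignores LangGraph features such as reducers, conditional edges and streaming.

```python
def run_linear_graph(nodes, state):
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(state)
    for _name, node in nodes:
        update = node(state)
        state.update(update)  # later nodes see keys written by earlier ones
    return state


# Stub nodes standing in for retrieve / grade_documents / generate.
def stub_retrieve(state):
    return {"documents": ["compute trends", "unrelated footnote"]}


def stub_grade(state):
    return {"documents": [d for d in state["documents"] if "compute" in d]}


def stub_generate(state):
    return {"generation": f"Answer drawn from {len(state['documents'])} document(s)."}


final_state = run_linear_graph(
    [("retrieve", stub_retrieve), ("grade_documents", stub_grade), ("generate", stub_generate)],
    {"question": "Which AI model took the most time to train?"},
)
```

With this wiring, `generate` still runs even if grading filters out every document; handling that case would need a conditional edge rather than the fixed `grade_documents` to `generate` edge.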

In [25]:
from pprint import pprint

# Run
inputs = {"question": "Which AI model took the most time to train?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print the state update returned by each node
        # pprint(value, indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])

---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
Score for document: {'score': 'no'}
---GRADE: DOCUMENT NOT RELEVANT---
Score for document: {'score': 'no'}
---GRADE: DOCUMENT NOT RELEVANT---
Score for document: {'score': 'no'}
---GRADE: DOCUMENT NOT RELEVANT---
Score for document: {'score': 'yes'}
---GRADE: DOCUMENT RELEVANT---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
"Node 'generate':"
'\n---\n'
('Based on the provided context, the document discusses the relationship '
 'between the complexity of AI models, the size of training datasets, and the '
 'compute required for training. However, it does not provide specific '
 'information about which AI model took the most time to train.')
