### Task 1: Dealing with the Data
Your boss, the SVP of Technology, greenlit this project to drive the adoption of AI throughout the enterprise. It will be a nice showpiece for the upcoming conference and the big AI initiative announcement the CEO is planning.

You identify the following important documents that you believe will help people understand what's happening now if they are used as context:

https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf

https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf


In [7]:
%pip install -qU langgraph langchain langchain_openai langchain_experimental

Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install -qU --disable-pip-version-check qdrant-client pymupdf tiktoken

Note: you may need to restart the kernel to use updated packages.


### RAGAS DEPENDENCIES

In [9]:
%pip install -qU langsmith langchain-qdrant ragas
%pip install -qU "langchain-community>=0.3.0,<0.4.0"
%pip install -qU "langchain-core>=0.3.0,<0.4.0"

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
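Unquoted version specifiers such as `langchain-community>=0.3.0,<0.4.0` are parsed by the shell that `%pip` runs in: `<0.4.0` is treated as an input redirection from a file named `0.4.0`, which produces a `/bin/bash: 0.4.0: No such file or directory` error and stops pip from running at all. That would explain pip's complaint that `langchain-community 0.2.17` and `langchain-core 0.2.41` are still installed. A quick way to confirm the pins took effect is to compare installed versions against the intended range. The sketch below uses only the standard library; `parse_version`, `in_range`, and `PINS` are hypothetical helpers that handle plain numeric versions like `0.3.1` only (no pre-release or post-release tags), so treat it as a rough check rather than a replacement for a real specifier parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain numeric version like '0.3.1' into a tuple (0, 3, 1)."""
    return tuple(int(part) for part in text.split("."))


def in_range(installed, lower, upper):
    """True if lower <= installed < upper (plain numeric versions only)."""
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)


# Hypothetical check list: package name -> (inclusive lower bound, exclusive upper bound)
PINS = {
    "langchain-community": ("0.3.0", "0.4.0"),
    "langchain-core": ("0.3.0", "0.4.0"),
}

if __name__ == "__main__":
    for name, (lower, upper) in PINS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "OK" if in_range(installed, lower, upper) else "OUT OF RANGE"
        print(f"{name} {installed}: {status} (wanted >={lower},<{upper})")
```

Restart the kernel after reinstalling before running the check, since already-imported modules keep their old versions.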


In [5]:
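import os
import getpass
from dotenv import load_dotenv

# Load variables such as OPENAI_API_KEY from a local .env file into os.environ
load_dotenv()

# Prompt for the key only if it was not found in the environment or .env file
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]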

In [9]:
from langchain_community.document_loaders import PyMuPDFLoader

# PyMuPDFLoader returns one Document per PDF page
docB = PyMuPDFLoader("https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf").load()
docN = PyMuPDFLoader("https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf").load()

documents = docB + docN
print(f"Loaded {len(documents)} pages")
print(f"First page: {documents[:1]}")

Loaded 137 pages
First page: [Document(metadata={'source': 'https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf', 'file_path': 'https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf', 'page': 0, 'total_pages': 73, 'format': 'PDF 1.6', 'title': 'Blueprint for an AI Bill of Rights', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Adobe Illustrator 26.3 (Macintosh)', 'producer': 'iLovePDF', 'creationDate': "D:20220920133035-04'00'", 'modDate': "D:20221003104118-04'00'", 'trapped': ''}, page_content=' \n \n \n \n \n \n \n \n \n \nBLUEPRINT FOR AN \nAI BILL OF \nRIGHTS \nMAKING AUTOMATED \nSYSTEMS WORK FOR \nTHE AMERICAN PEOPLE \nOCTOBER 2022 \n')]


In [11]:
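import tiktoken
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Build the gpt-4o-mini tokenizer once instead of on every call
encoding = tiktoken.encoding_for_model("gpt-4o-mini")

def tiktoken_len(text):
    """Return the number of gpt-4o-mini tokens in `text`."""
    return len(encoding.encode(text))

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,               # maximum chunk length, measured in tokens
    chunk_overlap=0,              # no overlap between consecutive chunks
    length_function=tiktoken_len,
)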

split_chunks = text_splitter.split_documents(documents)

print(f"Split {len(split_chunks)} chunks")

Split 363 chunks
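`RecursiveCharacterTextSplitter` tries its separators in order (by default paragraph breaks, then line breaks, then spaces) and merges the resulting pieces until adding the next one would push a chunk past `chunk_size`, as measured by `length_function`. The sketch below illustrates only that greedy merge step under simplifying assumptions: it splits on whitespace alone and counts words as a stand-in for `tiktoken_len`, so its output will not match the 363 chunks above. `greedy_chunks` and `word_len` are hypothetical helpers, not LangChain functions.

```python
def word_len(text):
    """Stand-in for tiktoken_len: count whitespace-separated words."""
    return len(text.split())


def greedy_chunks(text, chunk_size, length_function=word_len):
    """Pack words into chunks whose measured length stays <= chunk_size.

    Simplified illustration: splits on whitespace only, uses no overlap,
    and a single word longer than chunk_size becomes its own chunk.
    """
    chunks, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and length_function(candidate) > chunk_size:
            chunks.append(" ".join(current))  # close the full chunk
            current = [word]                  # start a new chunk with this word
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven"
    print(greedy_chunks(sample, chunk_size=3))
    # ['one two three', 'four five six', 'seven']
```

Passing a token counter as `length_function` is what makes `chunk_size=300` mean 300 tokens rather than 300 characters.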


In [12]:
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

In [19]:
from langchain_qdrant import QdrantVectorStore

# Embed each chunk and index it in an in-memory Qdrant collection
qdrant_vectorstore = QdrantVectorStore.from_documents(
    split_chunks,
    embedding_model,
    location=":memory:",
    collection_name="ai_policy_docs",
)

In [23]:
qdrant_retriever = qdrant_vectorstore.as_retriever()
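Conceptually, the retriever embeds the query with the same model used for the chunks and returns the nearest chunks; LangChain vector-store retrievers return 4 documents by default, and `as_retriever(search_kwargs={"k": ...})` changes that. The sketch below assumes cosine similarity as the ranking metric (the usual default for the LangChain Qdrant integration) and uses tiny hand-made vectors in place of `text-embedding-3-small` embeddings. `cosine` and `top_k` are hypothetical helpers; Qdrant itself uses an index rather than this brute-force scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=4):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d "embeddings" standing in for real chunk vectors
    chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
    query = [1.0, 0.1, 0.0]
    print(top_k(query, chunks, k=2))
```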

In [31]:
from langchain_core.prompts import ChatPromptTemplate

RAG_PROMPT = """
CONTEXT:
{context}

QUERY:
{question}

You are a helpful assistant. Answer the question using only the provided context. If the answer is not in the context, say 'I don't know brah!'.
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
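In the chain, `{context}` receives whatever the retriever returns, which is a list of `Document` objects, so the prompt sees the Python representation of that list, metadata included. A common alternative is to join just the page texts before filling the template. The sketch below shows that formatting step with plain `str.format` on an abbreviated copy of the template; `format_docs` is a hypothetical helper (in the real chain you would pass `doc.page_content` for each `Document`), and the sample strings are made up for illustration.

```python
# Abbreviated copy of RAG_PROMPT (instruction line omitted for brevity)
TEMPLATE = """
CONTEXT:
{context}

QUERY:
{question}
"""


def format_docs(page_contents):
    """Hypothetical helper: join retrieved page texts with a blank line between them."""
    return "\n\n".join(text.strip() for text in page_contents)


if __name__ == "__main__":
    retrieved = [  # made-up stand-ins for retrieved chunk texts
        " First retrieved chunk. ",
        "Second retrieved chunk.",
    ]
    print(TEMPLATE.format(context=format_docs(retrieved), question="What is CBRN?"))
```

Stripping metadata this way keeps the prompt shorter, at the cost of losing source and page information unless you add it back explicitly.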

In [27]:
from langchain_openai import ChatOpenAI

openai_chat_model = ChatOpenAI(model="gpt-4o-mini")

In [33]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

# Retrieve context for the question, fill the prompt, call the model, and return plain text
rag_chain = (
    {"context": itemgetter("question") | qdrant_retriever, "question": itemgetter("question")}
    | rag_prompt
    | openai_chat_model
    | StrOutputParser()
)
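The dict at the head of `rag_chain` is coerced by LCEL into a `RunnableParallel`: every value receives the same input dict, and their results form the dict passed on to `rag_prompt`. The plain-Python sketch below mimics that data flow with stand-in functions (`fake_retriever`, `fake_prompt`, `fake_model`, and `run_chain` are all hypothetical) so the shape of the data at each step is visible without API calls. It is an illustration of the flow only, not how LangChain implements runnables.

```python
from operator import itemgetter


def fake_retriever(question):
    """Stand-in for qdrant_retriever: return canned 'documents' for a question."""
    return [f"chunk about {question}"]


def fake_prompt(inputs):
    """Stand-in for rag_prompt: build the prompt text from context and question."""
    return f"CONTEXT:\n{inputs['context']}\n\nQUERY:\n{inputs['question']}"


def fake_model(prompt_text):
    """Stand-in for openai_chat_model + StrOutputParser: return a string answer."""
    return f"answer based on {len(prompt_text)} characters of prompt"


def run_chain(inputs):
    # Step 1: the parallel map; every branch sees the same input dict
    get_question = itemgetter("question")
    mapped = {
        "context": fake_retriever(get_question(inputs)),
        "question": get_question(inputs),
    }
    # Steps 2-3: prompt, then model and parser, applied in sequence
    return fake_model(fake_prompt(mapped))


if __name__ == "__main__":
    print(run_chain({"question": "CBRN information"}))
```

This is why the chain is invoked with a dict containing a `"question"` key rather than a bare string.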

In [37]:
rag_chain.invoke({"question" : "tell about CBRN Information or Capabilities?"})

'CBRN Information or Capabilities refer to information or capabilities related to Chemical, Biological, Radiological, and Nuclear (CBRN) threats. It is important to periodically evaluate the potential misuse of such information or capabilities, as well as offensive cyber capabilities. Establishing policies, procedures, and processes for risk measurement in the context of CBRN information is essential, which includes standardized measurement protocols and structured public feedback exercises, such as AI red-teaming or independent external evaluations. Furthermore, it encompasses governance and oversight concerning dangerous, violent, or hateful content, as well as obscene, degrading, and/or abusive content, and harmful bias and homogenization.'

### TASK 3: RAGAS FRAMEWORK - GENERATING SYNTHETIC DATA


In [10]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

generator_llm = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Fraction of synthetic questions per evolution type (the weights add up to 1.0)
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-experimental 0.3.0 requires langchain-community<0.4.0,>=0.3.0, but you have langchain-community 0.2.17 which is incompatible.
langchain-experimental 0.3.0 requires langchain-core<0.4.0,>=0.3.0, but you have langchain-core 0.2.41 which is incompatible.
Note: you may need to restart the kernel to use updated packages.
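The `distributions` dict gives the fraction of generated questions for each evolution type. To see what these weights mean for a concrete test-set size, the sketch below turns them into whole-number counts using largest-remainder rounding. That rounding rule is an assumption for illustration; ragas may allocate and round differently internally. String keys stand in for the ragas `simple`, `multi_context`, and `reasoning` objects so the snippet runs without ragas, and `allocate` is a hypothetical helper.

```python
import math


def allocate(distributions, test_size):
    """Split test_size questions across types in proportion to their weights.

    Uses largest-remainder rounding so the counts always sum to test_size.
    """
    total = sum(distributions.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"weights should sum to 1.0, got {total}")
    raw = {name: w * test_size for name, w in distributions.items()}
    counts = {name: math.floor(r) for name, r in raw.items()}
    leftover = test_size - sum(counts.values())
    # Give the remaining questions to the types with the largest fractional parts
    by_remainder = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    weights = {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1}
    print(allocate(weights, test_size=10))
    print(allocate(weights, test_size=7))
```

With small test sets the rarer types (here `reasoning` at 0.1) can end up with zero or one question, so a larger test size gives a more faithful mix.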
