### Loading the CTSE Lecture Notes

### Code Explanation

The following code snippet is used to load all PDF files from a specific folder (`./ctse_lectures`) into a list called `docs`.

1. **Importing Libraries:**
    - `os`: A module that provides a way to interact with the operating system. Here, it is used to list the files in the given directory.
    - `PyPDFLoader`: A class from the `langchain_community.document_loaders` module that is used to load PDF documents.

2. **Setting Folder Path:**
    - The folder containing the PDF files is specified as `./ctse_lectures`.

3. **Initialize List `docs`:**
    - An empty list `docs` is initialized to store the loaded documents.

4. **Looping through Files in the Folder:**
    - The code loops through each file in the directory `./ctse_lectures` using `os.listdir()`.
    - It checks whether the filename ends with the `.pdf` extension using the `endswith()` method. The match is case-sensitive, so a file named `notes.PDF` would be skipped.

5. **Loading PDFs:**
    - If the file is a PDF, `PyPDFLoader` loads it and returns a list of `Document` objects, one per page. These are appended to `docs` with `extend()`, so `docs` stays a flat list.

---

### Final Outcome
After running this code, `docs` holds the pages of every PDF in the folder as `Document` objects, ready for further processing or analysis.


In [1]:
import os
from langchain_community.document_loaders import PyPDFLoader

# Load all PDFs from the folder
folder_path = "./ctse_lectures"
docs = []

for filename in os.listdir(folder_path):
    if filename.endswith(".pdf"):
        loader = PyPDFLoader(os.path.join(folder_path, filename))
        docs.extend(loader.load())
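
The loop above matches only a lowercase `.pdf` suffix and processes files in whatever order `os.listdir()` returns them. As a minimal sketch, assuming a case-insensitive match and a stable order are wanted, the file discovery step could be written with `pathlib` instead. The helper name `find_pdfs` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by path.

    Hypothetical helper: the suffix check is case-insensitive, so
    `notes.PDF` is included alongside `notes.pdf`.
    """
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

Each returned path could then be passed to the loader as `PyPDFLoader(str(path))`, exactly as the cell above does with `os.path.join(...)`.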


### Creating a Vector Store

#### 1. **Updated Imports**
The code imports the necessary libraries:
- `FAISS` from `langchain_community.vectorstores`: Used for creating a vector store.
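- `OpenAIEmbeddings` from `langchain_community.embeddings`: Used to generate embeddings with OpenAI. Recent LangChain releases deprecate this import path in favor of the `langchain_openai` package, so the cell may emit a deprecation warning.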
- `RecursiveCharacterTextSplitter` from `langchain_text_splitters`: Used to split documents into chunks.
- `load_dotenv` from `dotenv`: Used to load environment variables from a `.env` file.
- `os`: Used to interact with the operating system, particularly to fetch environment variables.

#### 2. **Loading Environment Variables from the `.env` file**
The `load_dotenv()` function is called to load environment variables, specifically the OpenAI API key, from the `.env` file. Storing the key there keeps it out of the notebook's source, provided the `.env` file itself is not committed to version control.

#### 3. **Retrieve OpenAI API Key**
The OpenAI API key is fetched with `os.getenv("OPENAI_API_KEY")`, so the key is never hard-coded into the script. Note that `os.getenv()` returns `None` when the variable is not set.
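
Because a missing variable only surfaces later as an authentication error, it can help to fail early. The following is a minimal sketch of such a check; the helper name `require_env` is hypothetical and does not appear in the original notebook.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper, shown only to illustrate an early check.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value
```

In the notebook this could replace the bare lookup, for example `openai_api_key = require_env("OPENAI_API_KEY")`, called after `load_dotenv()`.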

#### 4. **Splitting Documents into Chunks**
The `RecursiveCharacterTextSplitter` is used to break documents into smaller chunks. The `chunk_size` is set to 1000 characters, which caps the length of each chunk, and the `chunk_overlap` is set to 100 characters, so neighboring chunks share up to 100 characters and text near a boundary is not cut off from its context.
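
To make the two parameters concrete, here is a deliberately simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraph and line breaks); the function `chunk_text` is a hypothetical illustration of `chunk_size` and `chunk_overlap` only.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split `text` into fixed-size pieces where neighbors share `chunk_overlap` characters.

    Simplified illustration: it cuts at fixed offsets and ignores
    sentence or paragraph boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

With the notebook's settings, a 2500-character string yields three chunks starting at offsets 0, 900, and 1800.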

#### 5. **Creating the Vector Store**
The `OpenAIEmbeddings` class is used to create embeddings from the split documents. These embeddings are then stored in a **FAISS** vector store, allowing for efficient retrieval and similarity searches.
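
Conceptually, a similarity search compares the query's embedding with every stored embedding and returns the closest matches. The toy sketch below illustrates that idea in plain Python with hand-made three-dimensional vectors and cosine similarity. The vectors, texts, and function names are all made up for illustration; FAISS uses optimized indexes, and the metric here is not necessarily the one the LangChain wrapper uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the `k` entries whose vectors are most similar to `query_vec`.

    `indexed` is a list of (text, vector) pairs standing in for a vector store.
    """
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings; real OpenAI embeddings have far more dimensions.
toy_index = [
    ("CAP theorem overview", [0.9, 0.1, 0.0]),
    ("Docker basics", [0.1, 0.9, 0.1]),
    ("Consistency models", [0.8, 0.2, 0.1]),
]

print(top_k([1.0, 0.0, 0.0], toy_index))
```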

---

### Final Outcome
After running this code, the documents are split into chunks, embeddings are generated, and they are stored in the FAISS vector store. You can now use this vector store to perform similarity searches and retrieve relevant information.


In [2]:
# Updated imports
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv  
import os

# Load environment variables from .env file
load_dotenv()

# Retrieve OpenAI API Key from environment variable
openai_api_key = os.getenv("OPENAI_API_KEY")

# Split documents into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_docs = splitter.split_documents(docs)

# Create vector store
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
vectorstore = FAISS.from_documents(split_docs, embeddings)


### Querying the Vector Store with OpenAI API using RetrievalQA

#### 1. **Updated Imports**
The following libraries are imported:
- **RetrievalQA** from `langchain.chains`: This is used to create a Question-Answering (QA) system based on a retrieval model.
- **ChatOpenAI** from `langchain_openai`: This is used to create the OpenAI model that generates responses using the OpenAI API.
- **textwrap**: This standard Python module is used to wrap the response text to a fixed line width before printing.
- **load_dotenv** from `dotenv`: This is used to load environment variables from a `.env` file, which helps keep sensitive information like API keys out of the code.
- **os**: This standard Python module is used to interact with the operating system, specifically to fetch the environment variables.

#### 2. **Loading Environment Variables from `.env` File**
The `load_dotenv()` function loads environment variables from a `.env` file. This ensures that the API key is stored securely and not hardcoded into the script.

#### 3. **Retrieving the OpenAI API Key**
The OpenAI API key is retrieved from the environment variables using `os.getenv()`. The key is stored in the variable `openai_api_key`, which is passed to the `ChatOpenAI` class.

#### 4. **Setting Up the OpenAI Model and Retriever**
The `ChatOpenAI` class is initialized with the API key to create an LLM (large language model). The retriever is created from `vectorstore` with `as_retriever()` and is used to fetch the chunks most relevant to each query.

#### 5. **Setting Up the RetrievalQA Chain**
The `RetrievalQA` chain is created by combining the LLM and the retriever. This allows the system to perform question answering based on the retrieved documents.
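
The overall flow can be pictured as retrieve, build a prompt, then generate. The sketch below is a conceptual stand-in written in plain Python, not LangChain's implementation: `answer_with_context`, `fake_retrieve`, and `fake_generate` are hypothetical names, the prompt wording is invented, and the stubs take the place of the real retriever and model.

```python
def answer_with_context(question, retrieve, generate, k=4):
    """Fetch up to `k` passages, place them in one prompt, and ask the model.

    `retrieve` maps a question to a list of passage strings;
    `generate` maps a prompt string to an answer string.
    """
    passages = retrieve(question)[:k]
    context = "\n\n".join(passages)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return generate(prompt)


def fake_retrieve(question):
    # Stub standing in for a vector store lookup.
    return ["CAP stands for consistency, availability, and partition tolerance."]


def fake_generate(prompt):
    # Stub standing in for an LLM call; it only reports the prompt size.
    return f"(model reply to a {len(prompt)}-character prompt)"


print(answer_with_context("What is the CAP theorem?", fake_retrieve, fake_generate))
```

Passing the retriever and model in as plain callables keeps the sketch independent of any library and makes each stage easy to swap out or test on its own.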

#### 6. **Example Query**
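The code passes an example query, `"What is the CAP THEOREM?"`, to the `qa.run()` method, which uses the LLM and the retriever to generate a response. In recent LangChain releases `run()` is deprecated in favor of `invoke()`, so this call may emit a deprecation warning.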

#### 7. **Formatting the Response**
Finally, the response is formatted using `textwrap.fill()` to ensure that the output text is displayed neatly, with a specified width of 100 characters.
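
As a small standalone illustration of the wrapping step, the snippet below wraps a made-up sentence at 40 characters rather than 100 so the effect is easy to see. The sample text is invented for the example.

```python
import textwrap

# Made-up sample text, repeated so that it spans several lines.
long_text = (
    "The CAP theorem concerns consistency, availability, "
    "and partition tolerance. "
) * 3

# fill() breaks the text at whitespace so no line exceeds `width`
# and joins the lines with newlines.
wrapped = textwrap.fill(long_text, width=40)
print(wrapped)
```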

---

### Final Outcome
After executing this code, the system will return an answer to the query "What is the CAP THEOREM?" by retrieving relevant documents, processing them with the OpenAI model, and displaying the result in a formatted manner.


In [3]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
import textwrap
from dotenv import load_dotenv  # Import dotenv to load environment variables
import os

# Load environment variables from .env file
load_dotenv()

# Retrieve OpenAI API Key from environment variable
openai_api_key = os.getenv("OPENAI_API_KEY")

llm = ChatOpenAI(openai_api_key=openai_api_key)
retriever = vectorstore.as_retriever()

qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

# Example query
response = qa.run("What is the CAP THEOREM?")
print(textwrap.fill(response, width=100))



The CAP Theorem, also known as Brewer's Theorem, is a fundamental concept in distributed systems. It
states that in a distributed system, it is impossible to simultaneously achieve all three of the
following properties: consistency, availability, and partition tolerance. Instead, a system can only
have at most two out of these three properties at any given time.
