1. PromptTemplate

The PromptTemplate class in LangChain is used to create custom prompts for your language model. A prompt template allows you to define a general structure for the questions or tasks you're going to ask the model, with placeholders for variables. This is particularly useful when you want to ensure consistency in how questions are posed to the model.

In [1]:
from langchain.prompts import PromptTemplate

template = """Given the following context, answer the question:

Context: {context}
Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
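
To make the placeholder mechanism concrete, here is a minimal plain-Python sketch of the substitution step. It is not LangChain's implementation; the helper `fill_prompt` and the sample context and question are made up for illustration, and it assumes the template uses Python's `{name}` placeholder style.

```python
# Plain-Python sketch of placeholder substitution (not LangChain's own code).
template = """Given the following context, answer the question:

Context: {context}
Question: {question}
Answer:"""


def fill_prompt(template: str, **variables: str) -> str:
    """Replace each {placeholder} in the template with the supplied value."""
    return template.format(**variables)


# Hypothetical values, chosen only to show the filled-in result.
filled = fill_prompt(
    template,
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)
print(filled)
```

With the PromptTemplate object defined above, the equivalent call would be prompt.format(context=..., question=...).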


2. RetrievalQA

RetrievalQA is a LangChain chain for question answering over your own documents. It first retrieves the documents most relevant to the query from a vector store, then passes them to the language model as context for the answer, so retrieval and generation run as a single call.

    Steps Involved:
        Document Retrieval: The system searches for relevant documents in a vector store (e.g., Pinecone) based on the query.
        Answer Generation: The retrieved documents are then used as context for the language model to generate an answer.
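
The two steps above can be sketched without any external service. This is a toy illustration under stated assumptions, not how RetrievalQA is implemented: retrieval here ranks by shared words instead of vector similarity, and the helpers `tokenize`, `retrieve`, and `build_stuffed_prompt` are hypothetical names.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Step 1: rank documents by words shared with the query, keep the top k."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_stuffed_prompt(query, retrieved):
    """Step 2: place all retrieved documents into one prompt as context."""
    context = "\n\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Pacific is the largest ocean.",
]
question = "What is the capital of France?"
top_docs = retrieve(question, docs, k=2)
stuffed_prompt = build_stuffed_prompt(question, top_docs)
print(stuffed_prompt)
```

In the real chain, the resulting prompt would then be sent to the language model; that call is left out here so the sketch runs on its own.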

In [2]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=my_language_model, 
    retriever=my_vector_store.as_retriever(),
    chain_type="stuff"  # "stuff" places all retrieved documents into a single prompt
)
response = qa.run("What is the capital of France?")


NameError: name 'my_language_model' is not defined

3. HuggingFaceEmbeddings

HuggingFaceEmbeddings is used to convert text into dense vector embeddings using models from Hugging Face. These embeddings are essential for tasks like document retrieval, where similar documents are identified based on their vector representations.

In [3]:
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector = embeddings.embed_query("This is a sample sentence.")  # embed_query embeds one string; embed_documents embeds a list


ModuleNotFoundError: Module langchain_community.embeddings not found. Please install langchain-community to access this module. You can install it using `pip install -U langchain-community`
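
To show how vector representations identify similar documents, here is a small sketch of cosine similarity, a common way to compare embeddings. The three-dimensional vectors are invented stand-ins; real embeddings from all-MiniLM-L6-v2 have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for the embeddings of three texts.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.2, 0.9]

sim_close = cosine_similarity(cat, kitten)
sim_far = cosine_similarity(cat, invoice)
print(sim_close > sim_far)
```

Texts with related meaning should end up with vectors pointing in similar directions, which is why the first pair scores higher than the second.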

4. Pinecone Vector Store

Pinecone is a scalable vector database that allows you to store and search large numbers of vector embeddings efficiently. You can use it to store document embeddings and retrieve similar documents based on a query.

In [1]:
#setup
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
index = pinecone.Index("my-index")


ModuleNotFoundError: No module named 'pinecone'

In [5]:
# storing vectors
from langchain.vectorstores import Pinecone

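# Connect to the existing index; the store calls the embeddings object for both documents and queries
vector_store = Pinecone.from_existing_index(index_name="my-index", embedding=embeddings)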
vector_store.add_texts(["Document 1", "Document 2", ...])


ModuleNotFoundError: Module langchain_community.vectorstores not found. Please install langchain-community to access this module. You can install it using `pip install -U langchain-community`

In [2]:
# retrieving vectors
similar_documents = vector_store.similarity_search("What is AI?")


NameError: name 'vector_store' is not defined
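
For intuition about what storing and searching vectors involves, here is a hypothetical in-memory stand-in. `TinyVectorIndex` is not the Pinecone API, and it does a brute-force scan with dot-product scoring; the vectors and texts are invented.

```python
class TinyVectorIndex:
    """Hypothetical in-memory stand-in for a vector index (not the Pinecone API)."""

    def __init__(self):
        self._items = {}  # item id -> (vector, text)

    def upsert(self, item_id, vector, text):
        """Insert a vector, or overwrite the one already stored under item_id."""
        self._items[item_id] = (vector, text)

    def query(self, vector, top_k=1):
        """Return the texts of the top_k stored vectors by dot-product score."""
        def score(entry):
            stored_vector, _ = entry
            return sum(x * y for x, y in zip(vector, stored_vector))

        ranked = sorted(self._items.values(), key=score, reverse=True)
        return [text for _, text in ranked[:top_k]]


toy_index = TinyVectorIndex()
toy_index.upsert("doc-1", [1.0, 0.0], "Document about AI")
toy_index.upsert("doc-2", [0.0, 1.0], "Document about cooking")
toy_index.upsert("doc-3", [0.9, 0.1], "Document about machine learning")

results = toy_index.query([1.0, 0.0], top_k=2)
print(results)
```

A scan like this compares the query against every stored vector, which is the part a dedicated vector database is meant to make efficient at scale.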

5. Document Loaders (PyPDFLoader, DirectoryLoader)

LangChain provides various document loaders to ingest documents into your system. PyPDFLoader is used to load PDF files, and DirectoryLoader can be used to load all files in a directory.

In [3]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("example.pdf")
documents = loader.load()


ModuleNotFoundError: Module langchain_community.document_loaders not found. Please install langchain-community to access this module. You can install it using `pip install -U langchain-community`

In [4]:
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader("path/to/directory", glob="**/*.pdf")
documents = loader.load()


ModuleNotFoundError: Module langchain_community.document_loaders not found. Please install langchain-community to access this module. You can install it using `pip install -U langchain-community`
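
The glob pattern decides which files are picked up. The sketch below uses only the standard library to show what "**/*.pdf" matches, assuming the loader's pattern behaves like pathlib's glob. The directory layout and file names are invented and created in a temporary folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Build a small, made-up directory tree with PDFs at several depths.
    (root / "reports" / "2023").mkdir(parents=True)
    (root / "top.pdf").touch()
    (root / "reports" / "summary.pdf").touch()
    (root / "reports" / "2023" / "q1.pdf").touch()
    (root / "notes.txt").touch()

    # "**" matches the directory itself and every subdirectory below it.
    matched = sorted(
        path.relative_to(root).as_posix() for path in root.glob("**/*.pdf")
    )

print(matched)
```

The text file is skipped, while PDFs in the top-level directory and in nested folders all match.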

6. Text Splitter (RecursiveCharacterTextSplitter)

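Long documents usually need to be split into smaller chunks before embedding, because embedding models and language models accept only a limited amount of text at once. The RecursiveCharacterTextSplitter tries a list of separators in order (by default paragraph breaks, then line breaks, then spaces, then individual characters), so it keeps paragraphs and sentences together where it can while holding each chunk to at most chunk_size characters. The chunk_overlap setting repeats some text between neighboring chunks so that context is not lost at the boundaries.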

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)


NameError: name 'documents' is not defined
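
To illustrate how chunk_size and chunk_overlap interact, here is a simplified fixed-window splitter. It is deliberately not the recursive algorithm: it cuts at fixed character offsets and ignores separators. The function name `split_with_overlap` and the sample text are made up for this sketch.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is the effect chunk_overlap=50 has at a larger scale in the cell above.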

7. CTransformers (Language Model)

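CTransformers is LangChain's wrapper around the ctransformers library, which provides Python bindings for transformer models implemented in C/C++ with GGML. It loads GGML-format model files, either from a local path or from a Hugging Face Hub repository, and runs them locally.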

In [6]:
from langchain.llms import CTransformers

# model is a local GGML file path or a Hugging Face Hub repo id; model_type names the architecture
model = CTransformers(model="marella/gpt-2-ggml", model_type="gpt2")
response = model.invoke("Tell me a joke.")


ModuleNotFoundError: No module named 'langchain_community'

In [7]:
#end to end implementation
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import CTransformers
import pinecone

# Initialize Pinecone
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
index = pinecone.Index("my-index")

# Load documents from a directory
loader = DirectoryLoader("path/to/documents", glob="**/*.pdf")
documents = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

# Initialize embeddings and vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = Pinecone.from_existing_index(index_name="my-index", embedding=embeddings)

# Add chunks to vector store
vector_store.add_documents(chunks)

# Define LLM
model = CTransformers(model="marella/gpt-2-ggml", model_type="gpt2")

# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=model, 
    retriever=vector_store.as_retriever(), 
    chain_type="stuff"
)

# Ask a question
response = qa_chain.run("What is the main theme of the document?")
print(response)


ModuleNotFoundError: Module langchain_community.embeddings not found. Please install langchain-community to access this module. You can install it using `pip install -U langchain-community`