## **Step 1: Set Up the Environment**

In [1]:
!pip install langchain
!pip install transformers
!pip install faiss-cpu  # For similarity search
!pip install datasets

Collecting langchain
  Downloading langchain-0.3.3-py3-none-any.whl.metadata (7.1 kB)

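Some integrations used below (the Hugging Face embeddings, the FAISS vector store and the Hugging Face pipeline wrapper) are not part of the core `langchain` package. They live in `langchain-community`, so install it as well: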

In [11]:
!pip install -U langchain-community


Collecting langchain-community
  Downloading langchain_community-0.3.2-py3-none-any.whl.metadata (2.8 kB)

`HuggingFaceEmbeddings` relies on the `sentence_transformers` library to compute the embeddings, so install it too:

In [15]:
!pip install sentence_transformers

Collecting sentence_transformers
  Downloading sentence_transformers-3.1.1-py3-none-any.whl.metadata (10 kB)
Downloading sentence_transformers-3.1.1-py3-none-any.whl (245 kB)
Installing collected packages: sentence_transformers
Successfully installed sentence_transformers-3.1.1
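A package that failed to install usually only surfaces later as an `ImportError`, so it can help to confirm everything is in place before moving on. Below is a minimal sketch using the standard library's `importlib.metadata`; the `REQUIRED` list and the `installed_versions` helper are our own illustration, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (pip treats "_" and "-" alike).
REQUIRED = [
    "langchain",
    "langchain-community",
    "transformers",
    "faiss-cpu",
    "datasets",
    "sentence-transformers",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a small report; anything marked NOT INSTALLED needs another pip install.
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:24} {ver or 'NOT INSTALLED'}")
```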


## **Step 2: Import Libraries**

Import the necessary libraries into your notebook:

In [12]:
from langchain.chains import RetrievalQA
# In LangChain 0.3 the embeddings, vector store and LLM wrappers live in langchain-community
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import HuggingFacePipeline
from datasets import load_dataset
from transformers import pipeline


## **Step 3: Load the Dataset**

Load the dataset from Hugging Face using the datasets library:

In [17]:
from langchain_core.documents import Document  # Import Document class

# Load the dataset from Hugging Face
dataset = load_dataset("parsi-ai-nlpclass/Psychology_RAG")

# Convert to a list of documents using the Document class
documents = [
    Document(page_content=item["question"], metadata={"answer": item["answer"]})
    for item in dataset["train"]
]
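With the `"stuff"` chain used in Step 6, only each document's `page_content` is inserted into the prompt; `metadata` is not. Because the answers here sit in `metadata`, the model never sees them, which is why the Step 7 output contains only case descriptions as context. One option is to fold the answer into `page_content`. A minimal sketch, assuming each row has string `question` and `answer` fields as used above; `build_records` is a hypothetical helper. It also skips exact duplicate rows, since the Step 7 output shows the same passage retrieved twice, which suggests (but does not prove) repeated rows in the dataset.

```python
def build_records(rows):
    """Hypothetical helper: return (page_content, metadata) pairs for Documents.

    The answer is folded into page_content so the "stuff" chain can pass it
    to the LLM; exact duplicate (question, answer) rows are skipped.
    """
    seen = set()
    records = []
    for row in rows:
        question = row["question"].strip()
        answer = row["answer"].strip()
        key = (question, answer)
        if key in seen:  # drop repeated rows so retrieval doesn't return copies
            continue
        seen.add(key)
        page_content = f"Question: {question}\nAnswer: {answer}"
        records.append((page_content, {"question": question, "answer": answer}))
    return records


# Toy rows standing in for dataset["train"].
rows = [
    {"question": "Q1", "answer": "A1"},
    {"question": "Q1", "answer": "A1"},
    {"question": "Q2", "answer": "A2"},
]
print(len(build_records(rows)))  # → 2
```

In the notebook this would become `documents = [Document(page_content=text, metadata=meta) for text, meta in build_records(dataset["train"])]`. The trade-off is that the answer text now also shapes each embedding, so retrieval results may shift.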


## **Step 4: Create the Vector Store**

Create a vector store using FAISS with Hugging Face embeddings:

In [18]:
# Initialize the Hugging Face embeddings model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create the FAISS vector store
vector_store = FAISS.from_documents(documents, embeddings)
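`FAISS.from_documents` embeds every document and stores the vectors in an index; `as_retriever()` later returns the documents closest to the embedded query (four by default, matching the four passages in the Step 7 output). To our understanding, the LangChain wrapper builds an exact, flat L2 index unless another distance strategy is requested. The pure-Python sketch below shows that exact search on toy 2-D vectors; `top_k` is an illustrative name, and real `all-MiniLM-L6-v2` embeddings have 384 dimensions, which FAISS searches far faster.

```python
import math


def top_k(query_vec, stored_vecs, k=4):
    """Exact nearest-neighbour search: score every stored vector by L2 distance."""
    scored = [(i, math.dist(query_vec, vec)) for i, vec in enumerate(stored_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-D "embeddings" standing in for real sentence vectors.
stored = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
nearest = top_k([1.0, 0.0], stored, k=2)
print([i for i, _ in nearest])  # → [0, 2]
```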




## **Step 5: Set Up the Language Model**

Instead of OpenAI, use a Hugging Face model to generate responses. Here we use `distilgpt2`, a small text-generation model that runs locally:

In [23]:
# Initialize the Hugging Face text generation pipeline
generator = pipeline("text-generation", model="distilgpt2", max_new_tokens=50)

# Create a custom HuggingFacePipeline for LangChain
llm = HuggingFacePipeline(pipeline=generator)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


## **Step 6: Set Up the RetrievalQA Chain**

Set up the RetrievalQA chain using the vector store and the language model:

In [24]:
# Create the RetrievalQA chain
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(),
)
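`chain_type="stuff"` means all retrieved documents are concatenated into a single prompt and sent to the LLM in one call. The sketch below reproduces that assembly in plain Python. The template wording is copied from the prompt echoed in the Step 7 output; `STUFF_TEMPLATE` and `stuff_prompt` are illustrative names, not LangChain's API.

```python
# Prompt wording copied from the response echoed in Step 7.
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(passages, question):
    """Concatenate every retrieved passage into one prompt string."""
    context = "\n\n".join(passages)  # passages separated by blank lines
    return STUFF_TEMPLATE.format(context=context, question=question)


print(stuff_prompt(["Passage one.", "Passage two."], "What is described?"))
```

Because everything goes into one prompt, the stuffed text has to fit the model's context window; for `distilgpt2` that is 1024 tokens, shared between the prompt and the 50 new tokens it generates.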

## **Step 7: Query the Model**

Now you can use the model to answer questions based on the dataset:

In [26]:
# Ask a question
query = "A 6-year-old girl has difficulty making eye contact and often repeats phrases she hears from her favorite TV show."
response = retrieval_qa.invoke({"query": query})["result"]  # .run() is deprecated in LangChain 0.3

print("Response:", response)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Response: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

A 6-year-old girl has difficulty making eye contact and often repeats phrases she hears from her favorite TV show. She becomes extremely distressed when her daily routine is disrupted.

A 6-year-old girl has difficulty making eye contact and often repeats phrases she hears from her favorite TV show. She becomes extremely distressed when her daily routine is disrupted.

A 7-year-old girl has significant delays in language development and prefers to play alone, often spinning objects repeatedly.

A 7-year-old girl has significant delays in language development and prefers to play alone, often spinning objects repeatedly.

Question: A 6-year-old girl has difficulty making eye contact and often repeats phrases she hears from her favorite TV show.
Helpful Answer: A 7-year-old girl currently needs medication to help trea
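The response starts with the whole stuffed prompt because the Hugging Face text-generation pipeline returns the prompt plus the continuation by default; passing `return_full_text=False` to `pipeline(...)` in Step 5 should return only the new text. With the current setup, a small post-processing step can cut everything up to the `Helpful Answer:` marker. A minimal sketch; `extract_answer` is a hypothetical helper. Keep in mind that `distilgpt2` is a small base model not trained to follow instructions, so the extracted text is a continuation rather than a grounded answer.

```python
def extract_answer(full_text, marker="Helpful Answer:"):
    """Return the text after the first occurrence of marker, or all text if absent."""
    _, sep, tail = full_text.partition(marker)
    return tail.strip() if sep else full_text.strip()


print(extract_answer("Context...\nQuestion: q\nHelpful Answer: It depends."))  # → It depends.
```

In the notebook: `print("Answer:", extract_answer(response))`.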