
In [20]:
!pip install sentence-transformers faiss-cpu transformers datasets




In [1]:
mental_health_docs = [
    "Depression is a mental disorder characterized by a persistent feeling of sadness and loss of interest.",
    "Cognitive Behavioral Therapy (CBT) helps people manage their problems by changing negative thought patterns.",
    "Anxiety disorders involve excessive fear or worry and include panic disorder, social anxiety, and phobias.",
    "Mindfulness and meditation can help reduce symptoms of stress and improve overall mental well-being.",
    "Sleep hygiene involves habits that help you get a good night's sleep, which is crucial for mental health."
]


In [2]:
from sentence_transformers import SentenceTransformer

embed_model = SentenceTransformer('all-MiniLM-L6-v2')
doc_embeddings = embed_model.encode(mental_health_docs)



In [3]:
import faiss
import numpy as np

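# Exact (brute-force) L2 index; its dimension must match the embedding size
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
# FAISS expects float32 input
index.add(np.asarray(doc_embeddings, dtype="float32"))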
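
As an illustration of what an exact flat L2 index computes, the sketch below ranks stored vectors by squared Euclidean distance to a query. It is written in plain Python so it runs without FAISS installed; the helper name `l2_search` and the toy vectors are hypothetical and not part of the notebook.

```python
def l2_search(vectors, query, k):
    """Return (squared_distances, indices) of the k stored vectors nearest to query."""
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared L2 distance, the quantity a flat L2 index reports
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # nearest first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-D "embeddings" standing in for doc_embeddings
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = l2_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)  # prints [1, 0]
```

Every query is compared against every stored vector, which is fine for five documents but scales linearly with collection size.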


In [4]:
query = "How can I manage anxiety symptoms?"
query_embedding = embed_model.encode([query])

# Get top 3 most relevant documents
D, I = index.search(np.array(query_embedding), k=3)
retrieved_docs = [mental_health_docs[i] for i in I[0]]
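
`index.search` returns two arrays of shape `(n_queries, k)`: distances `D` and document positions `I`. The sketch below pairs each retrieved document with its distance and skips `-1` positions, which FAISS uses as a placeholder when fewer than `k` results are available. The helper `pair_results` and the toy values are hypothetical, and plain lists stand in for the NumPy arrays.

```python
def pair_results(docs, distances, indices):
    """Pair each retrieved document with its distance, skipping -1 placeholders."""
    return [(docs[i], float(d)) for d, i in zip(distances, indices) if i != -1]


toy_docs = ["doc A", "doc B"]
# Mimics one query with k=3 against a 2-document index:
# the third slot has no result, so its position is -1.
toy_D = [[0.4, 1.3, 3.4e38]]
toy_I = [[1, 0, -1]]

results = pair_results(toy_docs, toy_D[0], toy_I[0])
for text, dist in results:
    print(f"{dist:.2f}  {text}")
```

With the five documents and `k=3` used in this notebook the guard never triggers, but it avoids silently indexing `docs[-1]` on smaller collections.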


In [5]:
from transformers import pipeline

generator = pipeline("text2text-generation", model="facebook/bart-large")

context = " ".join(retrieved_docs)
prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"

response = generator(prompt, max_length=100, do_sample=True)[0]['generated_text']
print("💬 Chatbot Response:", response)




💬 Chatbot Response: Context: Anxiety disorders involve excessive fear or worry and include panic disorder, social anxiety, and phobias. Cognitive Behavioral Therapy (CBT) helps people manage their problems by changing negative thought patterns. Mindfulness and meditation can help reduce symptoms of stress and improve overall mental well-being.Question: How can I manage anxiety symptoms?Answer:
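
The response above repeats the prompt and leaves the answer empty. One plausible reason is that `facebook/bart-large` is a pretrained checkpoint that has not been fine-tuned to follow instructions. The sketch below is an assumption, not the notebook's method: it swaps in an instruction-tuned seq2seq checkpoint, `google/flan-t5-base`, and the helper names `build_prompt` and `answer` are hypothetical.

```python
def build_prompt(context_docs, question):
    """Join the retrieved documents and the question into a single instruction prompt."""
    context = " ".join(context_docs)
    return (
        "Answer the question using only the context.\n\n"
        f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    )


def answer(context_docs, question, model_name="google/flan-t5-base"):
    """Generate an answer; model_name is an assumed substitute for facebook/bart-large."""
    # Imported lazily so build_prompt can be used without transformers installed
    from transformers import pipeline

    generator = pipeline("text2text-generation", model=model_name)
    output = generator(build_prompt(context_docs, question), max_new_tokens=100)
    return output[0]["generated_text"]
```

Calling `answer(retrieved_docs, query)` would download the model on first use; output quality is not guaranteed and should be checked before relying on it.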


**Using LangChain**

In [21]:
!pip install -U langchain-community





In [22]:
mental_health_docs = [
    "Depression is a common mental disorder that affects mood, thoughts, and daily functioning.",
    "Cognitive Behavioral Therapy (CBT) is effective for treating anxiety and depression.",
    "Symptoms of anxiety include restlessness, increased heart rate, and excessive worry.",
    "Talking to a mental health professional can provide support and strategies for recovery.",
    "Mindfulness and breathing exercises can help reduce symptoms of stress and anxiety."
]


In [23]:
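# Integrations live in langchain-community; the langchain.* import paths are deprecated
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document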

# Split and prepare docs
docs = [Document(page_content=doc) for doc in mental_health_docs]
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
split_docs = text_splitter.split_documents(docs)

# Embed and store in FAISS
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(split_docs, embedding_model)
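
To show what the splitting step does to inputs like these, here is a simplified sketch of separator-based chunking: split on a separator, then greedily merge pieces until `chunk_size` would be exceeded. It is a hypothetical stand-in written for illustration, not `CharacterTextSplitter`'s actual implementation; the `"\n\n"` separator is assumed to match the library default.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Split on separator, then merge adjacent pieces while they fit in chunk_size."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # current chunk is full; start a new one
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sentence = "Cognitive Behavioral Therapy (CBT) is effective for treating anxiety and depression."
print(split_on_separator(sentence, chunk_size=200))  # one chunk: the sentence itself
```

Each document in this notebook is a single sentence shorter than 200 characters with no blank lines, so under these assumptions splitting leaves the documents unchanged.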


In [24]:
from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline

# Text generation pipeline (lightweight model for now)
generator = pipeline("text-generation", model="distilgpt2")
llm = HuggingFacePipeline(pipeline=generator)


Device set to use cpu


In [25]:
from langchain.chains import RetrievalQA

retriever = vectorstore.as_retriever()
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
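
`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved documents into one prompt ahead of the question. The sketch below illustrates that idea in plain Python. The template text and the helper `stuff_prompt` are hypothetical approximations, not LangChain's exact prompt.

```python
# Hypothetical template, loosely modeled on a "stuff"-style QA prompt
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\nQuestion: {question}\nHelpful Answer:"
)


def stuff_prompt(retrieved_texts, question):
    """Concatenate every retrieved text into a single prompt for the LLM."""
    context = "\n\n".join(retrieved_texts)
    return STUFF_TEMPLATE.format(context=context, question=question)


example = stuff_prompt(
    [
        "Cognitive Behavioral Therapy (CBT) is effective for treating anxiety and depression.",
        "Talking to a mental health professional can provide support and strategies for recovery.",
    ],
    "Is CBT helpful?",
)
print(example)
```

Because every retrieved document is included, the prompt grows with the number of documents the retriever returns, which matters for models with small generation limits.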


In [26]:
while True:
    query = input("You: ")
    if query.lower() in ["exit", "quit"]:
        break
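    # .run() is deprecated; .invoke() returns a dict with the answer under "result"
    answer = qa_chain.invoke({"query": query})["result"]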
    print("Bot:", answer.strip())


You: Is CBT helpful?



ValueError: Input length of input_ids is 121, but `max_length` is set to 50. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
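
The error arises because `max_length` counts prompt tokens plus generated tokens, and the prompt built from the retrieved context (121 tokens) is already longer than the limit of 50. A minimal sketch of one fix, following the error message's own suggestion, is to pass `max_new_tokens` when building the pipeline. The helper names `tokens_left` and `build_llm` and the value 64 are assumptions chosen for illustration.

```python
def tokens_left(prompt_tokens, max_length):
    """Tokens available for generation when max_length bounds prompt plus output."""
    return max(0, max_length - prompt_tokens)


print(tokens_left(121, 50))  # prints 0: no room left to generate


def build_llm(max_new_tokens=64):
    """Build the LangChain LLM with a budget that applies to new tokens only."""
    # Imported lazily so tokens_left can run without these packages installed
    from transformers import pipeline
    from langchain_community.llms import HuggingFacePipeline

    generator = pipeline(
        "text-generation",
        model="distilgpt2",
        max_new_tokens=max_new_tokens,  # limits generated tokens, not the prompt
    )
    return HuggingFacePipeline(pipeline=generator)
```

Passing `llm=build_llm()` to `RetrievalQA.from_chain_type` should avoid this particular error, though `distilgpt2` is a small model and its answers may still be of limited quality.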