In [19]:
from datasets import load_dataset

# Load the dataset
ds = load_dataset("ShenLab/MentalChat16K")

# Preview
print(ds)
print(ds['train'][0])  # View a sample

DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 16084
    })
})
{'instruction': "You are a helpful mental health counselling assistant, please answer the mental health questions based on the patient's description. \nThe assistant gives helpful, comprehensive, and appropriate answers to the user's questions. ", 'input': "I've been struggling with my mental health for a while now, and I can't seem to find a way to cope with it. I've tried visualization, positive thinking, and even medication, but nothing seems to work. I've been feeling lost and helpless, and I don't know what to do next. My mind is a whirlwind of thoughts and emotions, and I can't seem to make sense of it all. I feel like I'm drowning in a sea of confusion, and I can't seem to find my way out.", 'output': "I understand that you've been dealing with a sense of confusion and chaos in your thoughts and emotions for some time now. It's been a challenging journey, an

In [20]:
from langchain.schema import Document

docs = []

if 'ds' in globals():
    for row in ds["train"]:
        user_input = row['input']
        bot_reply = row['output']
        
        docs.append(Document(
            page_content=f"Q: {user_input}\nA: {bot_reply}",
            metadata={"source": "MentalChat16K"}
        ))
else:
    print("Please run the cell that loads the dataset into 'ds' first.")


In [21]:
print(len(docs))
print(docs[1].page_content)


16084
Q: I've been feeling overwhelmed with my caregiving responsibilities, and it's been a struggle to balance these duties with my personal relationships. I've tried to communicate my limitations to my friends and church members, but they don't seem to understand or respect my boundaries. I've been dealing with high anxiety levels, which makes it even harder for me to focus on my own needs. I've tried to take care of myself, but it feels like an insurmountable task.
A: Your situation is complex, and it's important to acknowledge the challenges you're facing. Balancing caregiving responsibilities with personal relationships can be a delicate dance, and it's common to encounter resistance when setting boundaries. I want to help you explore strategies for communicating your needs more effectively and setting clearer boundaries. Additionally, I see that your anxiety levels are significantly impacting your ability to focus on self-care. We can work together to identify the root causes of 

In [22]:
from langchain.schema import Document

# Rebuild the list of Q&A documents (same content as the loop in In [20])
docs = [
    Document(
        page_content=f"Q: {row['input']}\nA: {row['output']}",
        metadata={"source": "MentalChat16K"}
    )
    for row in ds["train"]
]


In [23]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,       # target size in characters (~100–150 tokens); a piece with no "\n" is kept whole even if longer
    chunk_overlap=50,     # applies only when several short pieces are merged into one chunk
    length_function=len,
)

chunked_docs = text_splitter.split_documents(docs)
chunked_docs[:5]

Created a chunk of size 818, which is longer than the specified 500
Created a chunk of size 565, which is longer than the specified 500
Created a chunk of size 656, which is longer than the specified 500
Created a chunk of size 505, which is longer than the specified 500
Created a chunk of size 571, which is longer than the specified 500
Created a chunk of size 537, which is longer than the specified 500
Created a chunk of size 628, which is longer than the specified 500
Created a chunk of size 920, which is longer than the specified 500
Created a chunk of size 530, which is longer than the specified 500
Created a chunk of size 693, which is longer than the specified 500
Created a chunk of size 517, which is longer than the specified 500
Created a chunk of size 723, which is longer than the specified 500
Created a chunk of size 533, which is longer than the specified 500
Created a chunk of size 562, which is longer than the specified 500
Created a chunk of size 556, which is longer tha

[Document(metadata={'source': 'MentalChat16K'}, page_content="Q: I've been struggling with my mental health for a while now, and I can't seem to find a way to cope with it. I've tried visualization, positive thinking, and even medication, but nothing seems to work. I've been feeling lost and helpless, and I don't know what to do next. My mind is a whirlwind of thoughts and emotions, and I can't seem to make sense of it all. I feel like I'm drowning in a sea of confusion, and I can't seem to find my way out."),
 Document(metadata={'source': 'MentalChat16K'}, page_content="A: I understand that you've been dealing with a sense of confusion and chaos in your thoughts and emotions for some time now. It's been a challenging journey, and it's commendable that you've tried various approaches like visualization, positive thinking, and medication to manage your symptoms. However, it's clear that these methods haven't been effective for you. It's essential to acknowledge that mental health issues
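
The "longer than the specified 500" warnings appear because a splitter with a single separator can only cut at that separator, so any question or answer without a newline stays in one piece. The sketch below illustrates the idea in plain Python. It is not LangChain's implementation: `split_single` and `split_recursive` are hypothetical helpers, the sample text is made up, and the merging of short pieces that the real single-separator splitter performs is omitted.

```python
def split_single(text, separator="\n"):
    """Cut only at one separator; oversized pieces are kept whole."""
    return [piece for piece in text.split(separator) if piece]


def split_recursive(text, separators=("\n", ". ", " "), chunk_size=500):
    """Fall back to finer separators until every chunk fits chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: hard-cut into fixed-size windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(head):
        candidate = piece if not current else current + head + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry with the finer separators.
            chunks.extend(split_recursive(piece, rest, chunk_size))
    if current:
        chunks.append(current)
    return chunks


# Made-up sample shaped like the "Q: ...\nA: ..." documents above.
sample = "Q: " + "word " * 150 + "\nA: " + "reply. " * 120
single = split_single(sample)
recursive = split_recursive(sample)
print([len(chunk) for chunk in single])
print(max(len(chunk) for chunk in recursive))
```

If chunks must stay under the limit, the `RecursiveCharacterTextSplitter` imported in the next cell follows a similar fallback strategy and could be tried in place of `CharacterTextSplitter`.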

In [24]:
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA



In [27]:
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""
    You are a helpful and empathetic mental health assistant.

    Your task is to ONLY answer the user's question using the context provided below.
    You are NOT allowed to use prior knowledge, external facts, or make assumptions.
    If the context is missing or irrelevant to the question, respond strictly with:
    "I'm sorry, I couldn't find information to help with that."

    Always respond in English.

    Context:
    {context}

    Question:
    {question}

    Answer:""".strip()
    )
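
As a rough illustration of what happens to this template at query time: a "stuff"-style chain concatenates the retrieved chunks into one `{context}` string and substitutes the user's query for `{question}`. The sketch below uses plain `str.format` rather than LangChain, with a shortened copy of the template. The `fill_prompt` helper, the blank-line separator, and the example chunks are assumptions made for illustration.

```python
# Shortened stand-in for the template above (same two placeholders).
template = (
    "Context:\n{context}\n\n"
    "Question:\n{question}\n\n"
    "Answer:"
)


def fill_prompt(template, retrieved_chunks, question, separator="\n\n"):
    """Join retrieved chunks into one context block and fill both placeholders."""
    context = separator.join(retrieved_chunks)
    return template.format(context=context, question=question)


# Made-up chunks standing in for retriever results.
example_chunks = [
    "Q: placeholder question one\nA: placeholder answer one",
    "Q: placeholder question two\nA: placeholder answer two",
]
filled = fill_prompt(template, example_chunks, "How can I cope with stress?")
print(filled)
```

Printing the filled prompt like this is a cheap way to check that the context really contains retrieved text before blaming the model for a poor answer.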

In [28]:
# Create embeddings with HuggingFaceEmbeddings (all-MiniLM-L6-v2 sentence-transformers model)
huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",      
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

In [29]:
import numpy as np

# Embed one chunk once and reuse the vector for both prints
sample_vector = np.array(huggingface_embeddings.embed_query(chunked_docs[0].page_content))
print(sample_vector)
print(sample_vector.shape)

[ 1.00227945e-01 -4.58985902e-02 -1.32571990e-02  1.67972464e-02
  6.06495626e-02  6.65660650e-02  1.67618468e-02  3.89710926e-02
  3.90161015e-02 -8.79539922e-02 -8.89472589e-02 -5.11576384e-02
 -1.75623950e-02 -1.07180374e-02  2.54784822e-02  5.85092138e-03
 -9.63734612e-02  1.73081588e-02 -4.11460884e-02  7.88984522e-02
 -8.08711648e-02  6.51032925e-02 -5.90799823e-02  1.20766293e-02
 -6.76950961e-02  1.32015005e-01 -8.15813318e-02 -4.25389744e-02
 -4.89839725e-02 -3.44678611e-02  1.03357740e-01 -6.29847944e-02
 -8.28355029e-02  3.73950936e-02  5.79311848e-02  3.87340635e-02
 -9.92183313e-02  6.07130490e-02  2.47510653e-02  2.03340556e-02
 -1.43517144e-02  4.69626784e-02  1.57353580e-02  4.45137471e-02
  3.43918204e-02 -6.38742894e-02 -6.22050427e-02  7.69000426e-02
  4.95540537e-02 -1.10678524e-01 -8.88385326e-02  2.15603132e-02
  1.31380502e-02  2.21031215e-02  3.21648014e-03  6.72739372e-02
  9.97070745e-02  1.46289002e-02 -4.40492816e-02  6.29532635e-02
  5.15529402e-02 -8.23065
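
Because the embeddings were requested with `normalize_embeddings=True`, each vector should have unit length, in which case cosine similarity reduces to a plain dot product. The following is a minimal sketch of that relationship using toy three-dimensional vectors (made up for illustration, not real model output) and hypothetical helper names.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for two embeddings.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

print(math.sqrt(dot(a_unit, a_unit)))       # length after normalization
print(cosine(a, b), dot(a_unit, b_unit))    # the two values should agree
```

The same check can be run on a real embedding by computing `np.linalg.norm` of the vector printed above; a value close to 1 would confirm that normalization is applied.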

In [30]:
# Create a Chroma vector store and index the chunked documents
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(
    documents=chunked_docs,
    embedding=huggingface_embeddings,
    collection_name="example_collection",
    persist_directory="./chroma_db",  # Where to save data locally, remove if not necessary
)
db.persist()  # only needed on chromadb < 0.4; newer versions persist automatically

In [31]:
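# Recreate the embedding model with the same settings used for indexing
embedding = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

# Reload vector store from disk; the collection name must match the one used when saving
db = Chroma(
    collection_name="example_collection",
    persist_directory="./chroma_db",
    embedding_function=embedding
)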


In [32]:
retriever = db.as_retriever()


In [37]:
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

llm = Ollama(model="phi3:3.8b")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,  # retriever is defined in the cell above (In [32])
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt_template}  # the PromptTemplate built in In [27]
)


In [36]:
query = "I'm having frequent panic attacks. How can I cope?"

result = qa_chain.invoke({"query": query})
print(result['result'])


To help manage your panic attacks, consider implementing these strategies into your routine for coping:

1. **Breathing Exercises** - Practice deep breathing techniques like the 4-7-8 method to calm your nervous system during an attack. Find a quiet space and breathe in slowly through your nose while counting to four, hold it gently for seven seconds at the peak of your chest or belly expansion without holding air tight inside your lungs, then exhale completely through your mouth making a whoosh sound as you count back to eight.

2. **Grounding Techniques** - Use grounding techniques such as focusing on physical sensations around you like the feel of clothing against skin or sounds in the room that can bring awareness away from fear and into the present moment, which may help reduce panic intensity.

3. **Mindfulness Meditation** – Engage in mindfulness meditation to train your attention on the here-and-now experience without judgment. This practice might improve how you respond emotio