<a href="https://colab.research.google.com/github/Samiya-AW/LLM-Based-Cognitive-Reframing-for-Negative-Thoughts/blob/main/LLM_based_Cognitive_Reframing_for_Negative_Thoughts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Retrieval-Based Cognitive Reframing System**

In [1]:
# Install Sentence Transformers (for creating text embeddings) and its Hugging Face dependencies
!pip install sentence-transformers transformers accelerate



In [2]:
# Load the cognitive reframing dataset
# Build the retrieval text by concatenating each situation with its thought

import pandas as pd

url = "https://raw.githubusercontent.com/behavioral-data/Cognitive-Reframing/refs/heads/main/data/reframing_dataset.csv"
df = pd.read_csv(url)

df["combined_text"] = df["situation"] + " " + df["thought"]
df.head()


Unnamed: 0,situation,thought,reframe,thinking_traps_addressed,combined_text
0,A Roomate of mine stole my comptuer,Someone I trusted stole something valuable of ...,"My roommate stole something of mine, and I wil...",emotional reasoning,A Roomate of mine stole my comptuer Someone I ...
1,A Roomate of mine stole my comptuer,Someone I trusted stole something valuable of ...,While I would like there to be consequences fo...,emotional reasoning,A Roomate of mine stole my comptuer Someone I ...
2,"A few days ago, I got angry at my husband's gr...",She doesn't respect me.,She is older and may have been tired,overgeneralizing,"A few days ago, I got angry at my husband's gr..."
3,"A few days ago, I got angry at my husband's gr...",She doesn't respect me.,"I felt disrespected by her actions, but that d...",overgeneralizing,"A few days ago, I got angry at my husband's gr..."
4,A friend who is a recent widower has started d...,My friend is ignoring his recently-deceased wife.,Maybe my friend is in a healthy spot to date n...,disqualifying the positive,A friend who is a recent widower has started d...
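Rows 0 and 1 in the preview share the same situation and thought but carry different reframes, so a top-1 lookup by row surfaces only one of them. The sketch below shows one possible way to collect every reframe for a given (situation, thought) pair. The `rows` list and the `group_reframes` helper are hypothetical stand-ins that only mirror the column names shown above; they are not part of the notebook or the dataset.

```python
from collections import defaultdict

# Hypothetical rows that mirror the dataset's column names (not real data).
rows = [
    {"situation": "S1", "thought": "T1", "reframe": "R1a"},
    {"situation": "S1", "thought": "T1", "reframe": "R1b"},
    {"situation": "S2", "thought": "T2", "reframe": "R2"},
]

def group_reframes(rows):
    """Map each (situation, thought) pair to the list of its reframes."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["situation"], row["thought"])
        grouped[key].append(row["reframe"])
    return dict(grouped)

grouped = group_reframes(rows)
print(len(grouped))            # number of distinct (situation, thought) pairs
print(grouped[("S1", "T1")])   # all reframes stored for the first pair
```

With the real DataFrame, the same idea could presumably be expressed with a pandas `groupby` on the `situation` and `thought` columns.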


In [3]:
# Load embedding model

from sentence_transformers import SentenceTransformer, util
import torch

embed_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Compute embeddings for every combined situation + thought text, i.e. vectorize the dataset

corpus_embeddings = embed_model.encode(
    df["combined_text"].tolist(),
    convert_to_tensor=True
)


In [5]:
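# Retrieve the most similar dataset entry, along with its cosine similarity score

def get_best_match(user_situation, user_thought):
    """Return (row index, cosine similarity) of the corpus entry closest to the query."""
    # Build the query the same way as the corpus text: situation + " " + thought
    query = user_situation + " " + user_thought
    query_embedding = embed_model.encode(query, convert_to_tensor=True)

    # cos_sim returns a 1 x N matrix; row 0 holds the scores for the single query
    scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    best_score, best_idx = torch.max(scores, dim=0)

    return int(best_idx), float(best_score)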
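To make the scoring step concrete, here is a minimal pure-Python sketch of cosine similarity and of returning the top k matches instead of only the single best one. It uses tiny hand-written vectors in place of real embeddings, and the helper names `cosine_similarity` and `top_k_matches` are illustrative assumptions, not functions from sentence-transformers. It also assumes no vector is all zeros.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_matches(query, corpus, k=3):
    """Return the k (index, score) pairs with the highest cosine similarity."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-D "embeddings" standing in for the real corpus vectors.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # same direction as corpus[0], different length

matches = top_k_matches(query, corpus, k=2)
print(matches)
```

Because cosine similarity ignores vector length, the query matches `corpus[0]` perfectly even though its magnitude differs.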

In [6]:
# Look up the stored reframe for the best match (retrieval only; no LLM generation happens here)

def retrieve_reframe(user_situation, user_thought):
    idx, score = get_best_match(user_situation, user_thought)
    matched_row = df.iloc[idx]

    return {
        "matched_thought": matched_row["thought"],
        "retrieved_reframe": matched_row["reframe"],
        "similarity_score": score
    }

In [7]:
# Test retrieval on a sample situation and thought

situation = "I failed my exam."
thought = "I am a complete failure."

result = retrieve_reframe(situation, thought)

print("Most Similar Thought:", result["matched_thought"])
print("Retrieved Rational Reframe:", result["retrieved_reframe"])
print("Similarity Score:", result["similarity_score"])

Most Similar Thought: I'll never be able to do anything
Retrieved Rational Reframe: Failure is a part of the process. I can learn and grow from this. I still have time to succeed.
Similarity Score: 0.6181330680847168
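A score of about 0.62 indicates a related but not identical thought, so a caller may want to reject weak matches rather than always return the nearest row. The sketch below shows one possible guard. The `accept_match` helper and the default cutoff of 0.5 are assumptions chosen for illustration, not values from the notebook; a suitable cutoff would need to be tuned on held-out data.

```python
def accept_match(result, min_score=0.5):
    """Return the result only if its similarity clears the cutoff, else None.

    `min_score` is a hypothetical threshold, not a value from the notebook.
    """
    if result["similarity_score"] >= min_score:
        return result
    return None

# Result dictionary shaped like the output of retrieve_reframe above.
example = {
    "matched_thought": "I'll never be able to do anything",
    "retrieved_reframe": "Failure is a part of the process. I can learn and "
                         "grow from this. I still have time to succeed.",
    "similarity_score": 0.6181330680847168,
}

print(accept_match(example) is not None)         # passes the default cutoff
print(accept_match(example, min_score=0.7))      # rejected by a stricter cutoff
```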


In [8]:
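# Randomly sample 20 rows from the corpus itself for a sanity check
test_df = df.sample(20, random_state=42).reset_index(drop=True)

# Score each sampled row against the full corpus. Because every sampled row is
# still in the corpus, each query retrieves itself and scores ~1.0, so this
# checks the pipeline rather than measuring retrieval quality.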
def evaluate_retrieval(test_dataframe):
    similarity_scores = []

    for _, row in test_dataframe.iterrows():
        situation = row["situation"]
        thought = row["thought"]

        _, score = get_best_match(situation, thought)
        similarity_scores.append(score)

    return similarity_scores

# Run evaluation

scores = evaluate_retrieval(test_df)

print("Average Similarity:", sum(scores)/len(scores))
print("Max Similarity:", max(scores))
print("Min Similarity:", min(scores))

Average Similarity: 1.000000062584877
Max Similarity: 1.0000001192092896
Min Similarity: 0.9999999403953552
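The maximum above is slightly greater than 1, which a cosine similarity cannot exceed mathematically. A likely explanation, assuming the embeddings are stored as float32 (the notebook does not print the dtype), is rounding: the reported extremes coincide with the float32 values immediately above and below 1.0. The sketch below reproduces them with the standard library; `to_float32` and `clamp_similarity` are illustrative helper names.

```python
import struct

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def clamp_similarity(score):
    """Clip a similarity score into the mathematically valid range [-1, 1]."""
    return max(-1.0, min(1.0, score))

eps = 2.0 ** -23                         # float32 machine epsilon
above_one = to_float32(1.0 + eps)        # smallest float32 greater than 1.0
below_one = to_float32(1.0 - eps / 2)    # largest float32 less than 1.0

print(above_one)                    # 1.0000001192092896
print(below_one)                    # 0.9999999403953552
print(clamp_similarity(above_one))  # 1.0
```

Clamping is optional; it only matters if downstream code assumes scores never exceed 1.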


In [9]:
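# Count how often the top match has the same thought text as the query.
# Test rows come from the corpus, so each query can match itself and 1.0 is expected.
exact_matches = 0

for _, row in test_df.iterrows():
    idx, _ = get_best_match(row["situation"], row["thought"])
    if df.iloc[idx]["thought"] == row["thought"]:
        exact_matches += 1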

print("Exact Thought Retrieval Accuracy:", exact_matches / len(test_df))

Exact Thought Retrieval Accuracy: 1.0
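Since the test rows are sampled from the corpus, every query can retrieve itself, which makes both the similarity of about 1.0 and the accuracy of 1.0 uninformative. One common alternative is a leave-one-out check: for each query, skip its own row and any row with identical text, then look at the best remaining match. The sketch below illustrates the idea on toy vectors. The `leave_one_out_best` helper and the sample data are hypothetical, and this is not the notebook's evaluation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def leave_one_out_best(index, texts, vectors):
    """Best match for item `index`, skipping itself and rows with identical text."""
    best_idx, best_score = None, -math.inf
    for j, vec in enumerate(vectors):
        if j == index or texts[j] == texts[index]:
            continue  # exclude the query row and its exact duplicates
        score = cosine_similarity(vectors[index], vec)
        if score > best_score:
            best_idx, best_score = j, score
    return best_idx, best_score

# Toy data: items 0 and 1 are duplicates, like the repeated rows in the dataset.
texts = ["a", "a", "b", "c"]
vectors = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

idx, score = leave_one_out_best(0, texts, vectors)
print(idx, round(score, 4))
```

With self-matches and duplicates excluded, the best score drops below 1.0 and starts to reflect how close the nearest genuinely different entry is.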
