# Semantic Mapping of Mental Health Survey Questions

## Overview

This notebook implements **Method 1 (Baseline): Semantic Mapping** for the mental health dimension reduction project.

The goal of this method is to map diverse mental health survey questions onto a shared set of **eight high-level wellness dimensions** (Emotional, Social, Physical, Occupational, Intellectual, Spiritual, Environmental, Financial) using a **concept-driven, label-free approach**.

Instead of learning dimensions from data, this method defines each wellness dimension through a short natural-language description, which acts as a **semantic prototype (anchor)**. Both survey questions and dimension prototypes are embedded into the same semantic space using a pretrained language model. Each question is then assigned to the most semantically similar dimension(s) based on cosine similarity.


Semantic mapping operationalizes theoretical wellness constructs directly in embedding space, forming a principled bridge between psychological theory and data-driven modeling.

## 2. Data

Each data point corresponds to a **single survey question**, represented with:
- `qid`: the original item identifier (e.g., PSQI_5_3, PSS_12)
- `dataset`: the source questionnaire or scale
- `text`: the question text

All questions are consolidated into a canonical dataset (`questions_master`) during preprocessing to ensure:
- No modification of raw source files
- Consistent identifiers across methods
- Reproducibility across models and experiments


In [1]:
from utils import load_questions
import pandas as pd
import numpy as np
from pathlib import Path

df = load_questions()
df.head()

Unnamed: 0,qid,dataset,text
0,CD_RISC_1,CD-RISC,I am able to adapt when changes occur.
1,CD_RISC_2,CD-RISC,I have one close and secure relationship.
2,CD_RISC_3,CD-RISC,Sometimes fate or God helps me.
3,CD_RISC_4,CD-RISC,I can deal with whatever comes my way.
4,CD_RISC_5,CD-RISC,Past successes give me confidence.


In [2]:
print("Total questions:", len(df))
df["dataset"].value_counts()

Total questions: 145


dataset
PWS        36
CD-RISC    25
PERMA      23
PSS        23
UCLA       20
PWB        18
Name: count, dtype: int64

## 3. Approach (Baseline)
Each wellness dimension is represented by a short natural-language description, which serves as a semantic prototype.

Both survey questions and dimension prototypes are embedded into a shared semantic space using a pretrained sentence encoder.  
Questions are mapped to the closest dimension prototypes based on cosine similarity, with a margin-based rule allowing multi-dimensional assignments when concepts overlap.

This method requires no labeled data and provides an interpretable, theory-aligned baseline for comparison with clustering and supervised approaches.

In [3]:
dimensions = [
    "Emotional: Coping effectively with life stressors, maintaining self-esteem, expressing optimism, and being aware of, accepting, and appropriately expressing a full range of emotions in oneself and others.",

    "Environmental: Honoring the dynamic relationship with social, natural, built, and digital environments, and engaging with spaces that are safe, nurturing, stimulating, and sustainable.",

    "Financial: Meeting basic needs, managing financial resources responsibly, making informed financial decisions, setting realistic financial goals, and preparing for short- and long-term needs or emergencies.",

    "Intellectual: Engaging in lifelong learning, expanding knowledge and skills, interacting with the world through curiosity and problem-solving, and thinking critically while exploring new ideas.",

    "Occupational: Deriving personal satisfaction and enrichment from work, study, hobbies, or volunteer activities that align with oneâ€™s values, goals, and lifestyle, and taking a proactive approach to career development.",

    "Physical: Supporting physical health through physical activity, sleep, nutrition, preventive care, and low-risk behaviors related to substance use and overall health maintenance.",

    "Social: Connecting with others and communities in meaningful ways, maintaining a strong support system, engaging in constructive dialogue, and fostering a sense of belonging, inclusion, and mattering.",

    "Spiritual: Seeking purpose and meaning in life, practicing self-reflection and gratitude, extending compassion toward others, and cultivating harmony with personal values and the broader world."
]

In [4]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(df["text"].tolist())
embeddings.shape
dim_embeddings = model.encode(dimensions)

  from .autonotebook import tqdm as notebook_tqdm


In [None]:
import numpy as np

# 1) similarity matrix (N, 8)
S = model.similarity(embeddings, dim_embeddings)
S = np.asarray(S)

print(S.shape)

# 2) dimension names
dim_names = [d.split(":", 1)[0].strip() for d in dimensions]

sorted_idx = np.argsort(S, axis=1)[:, ::-1]

top1 = sorted_idx[:, 0]
top2 = sorted_idx[:, 1]

df["dim_top1"] = [dim_names[i] for i in top1]
df["dim_top1_score"] = S[np.arange(len(df)), top1]

# 4) margin-based multi-label
margin = 0.05   

dim_multi = []
dim_multi_scores = []

for i in range(len(df)):
    s1 = S[i, top1[i]]
    s2 = S[i, top2[i]]

    labels = [dim_names[top1[i]]]
    scores = [float(s1)]

    if s1 - s2 <= margin:
        labels.append(dim_names[top2[i]])
        scores.append(float(s2))

    dim_multi.append(labels)
    dim_multi_scores.append(scores)

df["dim_multi"] = dim_multi
df["dim_multi_scores"] = dim_multi_scores

df[["text", "dim_top1", "dim_top1_score", "dim_multi", "dim_multi_scores"]].head(10)

(145, 8)


Unnamed: 0,text,dim_top1,dim_top1_score,dim_multi,dim_multi_scores
0,I am able to adapt when changes occur.,Intellectual,0.16211,"[Intellectual, Occupational]","[0.16210977733135223, 0.13327734172344208]"
1,I have one close and secure relationship.,Social,0.246298,[Social],[0.24629847705364227]
2,Sometimes fate or God helps me.,Spiritual,0.334456,[Spiritual],[0.33445626497268677]
3,I can deal with whatever comes my way.,Social,0.253442,"[Social, Environmental]","[0.25344187021255493, 0.21439789235591888]"
4,Past successes give me confidence.,Occupational,0.222193,[Occupational],[0.22219298779964447]
5,I try to see the humorous side of things when ...,Intellectual,0.195248,"[Intellectual, Emotional]","[0.19524841010570526, 0.18635015189647675]"
6,Having to cope with stress can make me stronger.,Emotional,0.341348,[Emotional],[0.3413480520248413]
7,"I tend to bounce back after illness, injury, o...",Physical,0.268132,[Physical],[0.268131822347641]
8,I believe most things happen for a reason.,Spiritual,0.153251,"[Spiritual, Emotional]","[0.1532508134841919, 0.14529845118522644]"
9,"I make my best effort, no matter what.",Occupational,0.238592,"[Occupational, Intellectual]","[0.238592267036438, 0.21458716690540314]"


In [10]:
df["dim_top1"].value_counts()

dim_top1
Emotional        35
Social           31
Occupational     31
Physical         22
Spiritual        13
Intellectual     11
Environmental     1
Financial         1
Name: count, dtype: int64

4. Analysis

In [11]:
from pathlib import Path

OUT = Path("../../results/mapping")
OUT.mkdir(parents=True, exist_ok=True)

df.to_csv(OUT / "v1_allMiniLM_dimdesc_top1.csv", index=False)

In [14]:
for dim in df["dim_top1"].unique():
    print(f"\n=== {dim} ===")
    sub = df[df["dim_top1"] == dim].sort_values(
        by="dim_top1_score", ascending=False
    )
    for _, row in sub.head(5).iterrows():
        print(f"- ({row['dim_top1_score']:.3f}) {row['text']}")


=== Intellectual ===
- (0.564) In the past, I have generally found intellectual challenges to be vital to my overall well-being.
- (0.471) Generally, I feel pleased with the amount of intellectual stimulation I receive in my daily life.
- (0.431) I prefer to take the lead in problem-solving.
- (0.379) For me, life has been a continuous process of learning, changing, and growth.
- (0.350) My interests and ideas are not shared by those around me

=== Social ===
- (0.456) To what extent do you receive help and support from others when you need it?
- (0.408) I feel isolated from others
- (0.365) My social relationships are superficial
- (0.342) I feel shut out and excluded by others
- (0.329) Members of my family come to me for support.

=== Spiritual ===
- (0.548) To what extent do you lead a purposeful and meaningful life?
- (0.428) I believe there is a real purpose for my life.
- (0.420) I have a strong sense of purpose in life.
- (0.378) I think it is important to have new experiences