#### Vishnu_Kadam_MANASHI

## Part A

#### Why are general-purpose LLMs insufficient?
General-purpose LLMs are trained with specific benchmarks in mind. AI labs focus on popular, advertisable metrics for question answering, coding, maths, and similar tasks; mental health question answering is not a major priority. This is evident in the technical reports of popular open models such as DeepSeek-V2 [1] and Kimi K2 [2]. When these general-purpose models are used, their training pushes them to answer questions and keep the user engaged in a question-answer loop rather than to prioritize the user's mental health. This can produce vague or misleading answers, which can be harmful when the model interacts with vulnerable individuals. Such incidents have been documented [3].

---

#### Loss
To reduce the chances of the model ignoring critical clinical mental health facts, we use the following loss formulation during training.

We assume there are $N$ clinical categories indexed by $i \in \{1,\dots,N\}$, where the index doubles as the category's importance rank ($i = 1$ is least important, $i = N$ is most important). Let $p \in \mathbb{N}$ be a hyperparameter controlling the strength of prioritization: larger $p$ shifts more weight toward the highest-ranked categories.

The unnormalized weight for category $i$ is
$$
w_i = i^{p}.
$$

The normalized weights are
$$
\tilde{w}_i = \frac{i^{p}}{\sum_{j=1}^{N} j^{p}}.
$$

If $\ell_i$ denotes the loss associated with category $i$, the final clinical risk-weighted loss is
$$
\mathcal{L}_{\text{clinical}}
=
\sum_{i=1}^{N}
\tilde{w}_i \, \ell_i
=
\sum_{i=1}^{N}
\left(
\frac{i^{p}}{\sum_{j=1}^{N} j^{p}}
\right)
\ell_i.
$$
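The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not the training code: the function names `rank_weights` and `clinical_loss` and the example loss values are assumptions made for the sketch.

```python
def rank_weights(num_categories, p):
    """Normalized weights i^p / sum_j j^p for ranks i = 1..N."""
    raw = [i ** p for i in range(1, num_categories + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def clinical_loss(per_category_losses, p):
    """Weighted sum of per-category losses; index 0 holds rank 1 (least important)."""
    weights = rank_weights(len(per_category_losses), p)
    return sum(w * loss for w, loss in zip(weights, per_category_losses))


# Three categories with p = 2: raw weights 1, 4, 9 are divided by their sum, 14.
weights = rank_weights(3, p=2)
# Hypothetical per-category losses; the rank-3 category dominates the total.
loss = clinical_loss([0.5, 0.5, 2.0], p=2)
print(weights, loss)
```

With $p = 0$ every category receives the same weight $1/N$, so the formulation reduces to a plain average.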


---

#### Uncertainty Visualisation
The model uncertainty can be integrated directly as an additional UI layer that is available only to clinicians. It works as follows.

Once the text is generated, entropy will be calculated for the output tokens using methods similar to those in Part C, and depending on its value each token will be colored <span style="background: red;">red </span> (high entropy), <span style="background: yellow;">yellow </span> (medium), or <span style="background: green;">green </span> (low). Showing a separate color for each token would lead to a lot of visual clutter, so the overlay can be optimized further: we calculate the frequency of high-entropy tokens per paragraph or sentence and color the <span style="background: red;">entire paragraph/sentence </span> to show that the section needs more consideration. This way the user can interact with the chat interface as with any other chat-based LLM, while clinicians can quickly glance at the underlying uncertainty in the generated text.
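A rough sketch of the colouring logic follows. It assumes the next-token probability distribution is available for every generated token; the thresholds `low`, `high`, and `max_red_fraction` are illustrative values chosen for the example, not tuned settings.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_colour(entropy, low=0.5, high=1.5):
    """Map an entropy value to a traffic-light colour (thresholds are assumptions)."""
    if entropy >= high:
        return "red"
    if entropy >= low:
        return "yellow"
    return "green"


def sentence_colour(token_entropies, high=1.5, max_red_fraction=0.3):
    """Flag a whole sentence when too many of its tokens are high-entropy."""
    if not token_entropies:
        return "green"
    red_fraction = sum(e >= high for e in token_entropies) / len(token_entropies)
    return "red" if red_fraction >= max_red_fraction else "green"


confident = [0.97, 0.01, 0.01, 0.01]   # one clearly preferred token
uncertain = [0.125] * 8                # eight equally likely tokens
print(entropy_colour(token_entropy(confident)))   # low entropy
print(entropy_colour(token_entropy(uncertain)))   # high entropy
```

Aggregating per sentence trades resolution for readability: a single uncertain token no longer stands out, but the clinician sees at most one highlight per sentence.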

## Part B

Ambiguous speakers are handled by assuming that most questions are asked by the therapist: if a turn contains a "?" and the speaker is unknown, it is attributed to the therapist; otherwise it is attributed to the client. Overlapping speech is handled by sorting the transcript in ascending time order.
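The two rules can be sketched in isolation as below. The helper name `normalise_turns` is hypothetical, and parsing the timestamp as `HH:MM` is an assumption about the input format; parsing rather than comparing strings keeps "9:05" ahead of "10:00" when times are not zero-padded.

```python
from datetime import datetime


def normalise_turns(transcript):
    """Return a time-ordered copy of the transcript with unknown speakers resolved."""
    def parse(turn):
        # Turns without a timestamp are placed first.
        return datetime.strptime(turn.get("time") or "00:00", "%H:%M")

    ordered = sorted(transcript, key=parse)   # stable: ties keep their input order
    resolved = []
    for turn in ordered:
        speaker = turn.get("speaker") or "Unknown"
        if speaker == "Unknown":
            # Heuristic from the text: questions are attributed to the therapist.
            speaker = "Therapist" if "?" in turn.get("text", "") else "Client"
        resolved.append({**turn, "speaker": speaker})
    return resolved


demo = [
    {"time": "10:00", "speaker": "Unknown", "text": "How did you sleep?"},
    {"time": "9:05", "text": "Badly."},
]
for turn in normalise_turns(demo):
    print(turn["time"], turn["speaker"], turn["text"])
```

Because the sort is stable, two turns with the same timestamp keep the order in which they were transcribed.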

In [None]:
## demo transcripts
# 1. Depression Transcript
transcript_depression = [
    {"time": "09:00", "speaker": "Therapist", "text": "How have things been since we last met?"},
    {"time": "09:01", "speaker": "Client", "text": "I've been feeling so heavy. I haven't slept in 4 days."},
    {"time": "09:02", "speaker": "Therapist", "text": "That's a long time to go without rest. Are you having dark thoughts?"},
    {"time": "09:04", "speaker": "Client", "text": "Sometimes I just want to give up and end it."},
    {"time": "09:05", "speaker": "Therapist", "text": "I'm glad you shared that with me. Let's create a safety plan. We will review your emergency contacts next session."}
]

# 2. Anxiety Transcript
transcript_anxiety = [
    {"time": "14:15", "speaker": "Client", "text": "My heart keeps racing every time I leave the house. I feel anxious constantly."},
    {"time": "14:16", "speaker": "Therapist", "text": "That sounds incredibly exhausting. When did this spike start?"},
    {"time": "14:18", "speaker": "Client", "text": "It's been constant for 3 weeks now. I can barely focus."},
    {"time": "14:20", "speaker": "Therapist", "text": "What specific thoughts go through your mind when the racing starts?"}
]

# 3. Schizophrenia Transcript
transcript_schizophrenia = [
    {"time": "11:30", "speaker": "Client", "text": "The voices have been very loud for 2 days. They won't stop."},
    {"time": "11:31", "speaker": "Therapist", "text": "Are they telling you to hurt yourself?"},
    {"time": "11:32", "speaker": "Client", "text": "No, just whispering criticisms. But it's exhausting to listen to."},
    {"time": "11:35", "speaker": "Therapist", "text": "We'll discuss adjusting your antipsychotic dosage next time."}
]

# 4. Bipolar Disorder Transcript
transcript_bipolar = [
    {"time": "16:00", "speaker": "Client", "text": "I feel amazing! I've been awake for 48 hours working on a new business plan."},
    {"time": "16:01", "speaker": "Therapist", "text": "Being awake that long without feeling tired is something we need to monitor."},
    {"time": "16:02", "speaker": "Client", "text": "Why? I feel better than ever, so calm and powerful."},
    {"time": "16:05", "speaker": "Therapist", "text": "How have you been managing your finances during this burst of energy?"}
]

# 5. OCD Transcript
transcript_ocd = [
    {"time": "10:00", "speaker": "Client", "text": "I spent 5 hours checking the locks on my doors yesterday."},
    {"time": "10:02", "speaker": "Therapist", "text": "That sounds exhausting. What happens if you try to stop checking?"},
    {"time": "10:03", "speaker": "Client", "text": "I get physically sick. I feel terribly anxious until I do it again."},
    {"time": "10:08", "speaker": "Therapist", "text": "Let's practice some grounding and exposure techniques next session."}
]



In [2]:
import json
import re

def process_session(transcript):
    output = {
        "observations" : [],
        "objective_facts" : [],
        "emotional_trajectory": [],
        "risk_flags": [],
        "open_loops_for_next_session": []
    }
    objective_patterns = [r'\b(\d+)\s*(days|weeks|months|years|hours)\b', r'\b(slept|awake)\b']
    risk_keywords = ['suicide', 'end it', 'kill myself', 'give up', 'hurt myself']
    emotion_lexicon = {'anxious': -1, 'racing': -1, 'exhausting': -1, 'better': 1, 'calm': 1}
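    # Sort a copy by timestamp so the caller's list is not mutated; the sort is
    # stable, so turns with the same timestamp (overlapping speech) keep their order
    transcript = sorted(transcript, key=lambda x: x.get('time', '00:00'))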
    
    current_emotion_score = 0
    
    for i, turn in enumerate(transcript):
        speaker = turn.get("speaker", "Unknown")
        text = turn.get("text", "").lower()
        time = turn.get("time", "")
        
        # Resolve ambiguous speakers by context (if it's a question, likely therapist)
        if speaker == "Unknown":
            speaker = "Therapist" if "?" in text else "Client"

        if speaker == "Client":
            # 1. Subjective Observations & 2. Objective Facts
            has_objective = False
            for pattern in objective_patterns:
                if re.search(pattern, text):
                    output["objective_facts"].append(f"[{time}] {turn['text']}")
                    has_objective = True
                    break
            
            if not has_objective:
                output["observations"].append(f"[{time}] {turn['text']}")

            # 3. Risk Flags
            if any(risk in text for risk in risk_keywords):
                output["risk_flags"].append(f"URGENT [{time}]: {turn['text']}")
                
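            # Track Emotional Trajectory: at most one entry per client turn
            matched_scores = [score for word, score in emotion_lexicon.items() if word in text]
            if matched_scores:
                current_emotion_score += sum(matched_scores)
                if current_emotion_score < 0:
                    state = "Negative"
                elif current_emotion_score > 0:
                    state = "Positive"
                else:
                    state = "Neutral"
                output["emotional_trajectory"].append({"time": time, "state": state})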

        elif speaker == "Therapist":
            # 4. Open Loops (a question in the final turn, or an explicit forward-looking statement)
            is_final_question = "?" in text and i == len(transcript) - 1
            mentions_next = "next time" in text or "next session" in text
            # A single check avoids adding the same turn twice
            if is_final_question or mentions_next:
                output["open_loops_for_next_session"].append(turn['text'])

    return json.dumps(output, indent=4)

# Test with the provided input example
transcript_input = [
    {"time": "10:00", "speaker": "Client", "text": "I haven't slept in three days."},
    {"time": "10:01", "speaker": "Therapist", "text": "That sounds exhausting. Are you feeling anxious?"},
    {"time": "10:01", "speaker": "Client", "text": "Yes, my heart keeps racing."}
]

print(process_session(transcript_input))

{
    "observations": [
        "[10:01] Yes, my heart keeps racing."
    ],
    "objective_facts": [
        "[10:00] I haven't slept in three days."
    ],
    "emotional_trajectory": [
        {
            "time": "10:01",
            "state": "Negative"
        }
    ],
    "risk_flags": [],
    "open_loops_for_next_session": []
}


## Part C
### C1
Information flow through the LLaMA-3.2-1B architecture during a forward pass.
The model has a vocab size of 128256, i.e. it recognizes about 128k different tokens, and each token is embedded as a 2048-dimensional vector.

Let's say we input 4 tokens: [101, 102, 104, 108], shape = [4].

##### Step 1 : Embedding lookup table
Here, we use the indices of the 4 input tokens to look up their respective embeddings, which in the case of Llama-3.2-1B are 2048-dimensional vectors.

Current size: [4 x 2048]

##### Step 2: Transformer layers

Here, first the input goes through a normalization layer (RMSNorm). Size doesn't change.

Llama uses grouped query attention (GQA). In GQA, the key and value projections are narrower than the query projection. Here the dimensions of the q_proj, k_proj, and v_proj matrices are 2048 x 2048, 2048 x 512, and 2048 x 512 respectively.

So after the matrix multiplications, q_proj_out is 4 x 2048, and k_proj_out and v_proj_out are each 4 x 512. RoPE is then applied to the queries and keys; it rotates the vectors according to token position rather than adding to them, so the shapes are unchanged.

Llama uses multi-head attention, where attention is calculated in 32 separate heads for the queries and 8 heads for the keys and values.

Note: 32 = 8 * 4, i.e. 4 query heads share one key-value head (each K-V head is duplicated 4 times). This is why this type of attention is called grouped query attention.

Note: q_proj_out has dimension 4 x 2048. In order to create 32 heads from this matrix, we split the last dimension into 32 heads of 64 and move the head axis to the front, giving size 32 x 4 x 64. Now we have 32 heads, 4 tokens, and a 64-dimensional embedding per head. For K and V the size is 8 x 4 x 64.

The attention scores are calculated as $Q K^{T}$, scaled by $1/\sqrt{64}$, causally masked, and passed through a softmax, so their dimension is 32 x 4 x 4. After multiplying by the V matrix, the output of the grouped attention is 32 x 4 x 64.

The outputs of the 32 heads are flattened to 4 x 2048 and then multiplied by an output projection matrix (named o_proj in the Hugging Face implementation), which mixes the information across heads. This matrix has size 2048 x 2048. We add the residual connection from the input. The final dimension is still 4 x 2048.
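The shape bookkeeping for the attention block can be checked with a small helper. This is only a sketch that tracks shapes, not an attention implementation; the function names and dictionary keys are illustrative, and the default sizes are the ones quoted in this section.

```python
def attention_shapes(seq_len=4, hidden=2048, n_q_heads=32, n_kv_heads=8):
    """Trace tensor shapes through one grouped-query attention block."""
    head_dim = hidden // n_q_heads          # 2048 / 32 = 64
    group_size = n_q_heads // n_kv_heads    # 4 query heads per key-value head
    return {
        "q_proj_out": (seq_len, n_q_heads * head_dim),
        "k_proj_out": (seq_len, n_kv_heads * head_dim),
        "v_proj_out": (seq_len, n_kv_heads * head_dim),
        "q_heads": (n_q_heads, seq_len, head_dim),
        "kv_heads": (n_kv_heads, seq_len, head_dim),
        "kv_heads_repeated": (n_kv_heads * group_size, seq_len, head_dim),
        "scores": (n_q_heads, seq_len, seq_len),
        "per_head_out": (n_q_heads, seq_len, head_dim),
        "merged_out": (seq_len, n_q_heads * head_dim),
    }


def kv_head_for(query_head, n_q_heads=32, n_kv_heads=8):
    """Index of the key-value head shared by a given query head."""
    return query_head // (n_q_heads // n_kv_heads)


shapes = attention_shapes()
for name, shape in shapes.items():
    print(f"{name:18s} {shape}")
```

Query heads 0 to 3 read from key-value head 0, heads 4 to 7 from key-value head 1, and so on, which is where the 4x saving in K and V projection width comes from.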

The outputs are then normalized using RMSNorm, followed by the MLP block containing the gate, up, and down projection matrices.

The normalized activations are multiplied by the gate (2048 x 8192) and up (2048 x 8192) matrices in parallel. The SiLU activation function is applied to the gate output, which is then multiplied element by element with the up output. The dimension after this is 4 x 8192. A down projection matrix (8192 x 2048) then reduces the dimension back to the original 4 x 2048. We add another residual connection from the input to the MLP block, taken before the normalization.
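The gated MLP can be written out for a single token vector. This is a toy sketch in plain Python, assuming the usual Llama formulation `down(silu(gate(x)) * up(x))`; the tiny weight matrices are made up for illustration and are stored as lists of output rows.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gated_mlp(x, w_gate, w_up, w_down):
    """down( silu(gate(x)) * up(x) ) for a single token vector x."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]   # element-wise product, not a sum
    return matvec(w_down, hidden)


# Toy sizes: hidden size 2, intermediate size 3 (the real model uses 2048 and 8192).
x = [1.0, -1.0]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
y = gated_mlp(x, w_gate, w_up, w_down)
print(y)
```

The output has the same length as the input, which is what allows the residual connection to be added directly afterwards.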


This transformer layer is repeated 16 times, and the output of the final layer is normalized. The output dimension is still 4 x 2048. We multiply this by a final lm_head matrix of size 2048 x 128256, which gives logits of size 4 x 128256 over our vocab. We apply a final softmax to the logits and choose the relevant tokens depending on our generation strategy.
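The last step, turning logits into a next token, can be sketched as follows. Greedy decoding is used here only as the simplest generation strategy; the toy logits and the function names are assumptions for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(logits_per_position):
    """Pick the most likely next token from the logits of the last position."""
    probs = softmax(logits_per_position[-1])
    token_id = max(range(len(probs)), key=probs.__getitem__)
    return token_id, probs[token_id]


# Toy logits: 4 positions over a 5-token vocabulary (the real model has 128256).
toy_logits = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 1.0],
]
token_id, prob = greedy_next_token(toy_logits)
print(token_id, round(prob, 3))
```

Only the last row of logits is needed to predict the next token; the earlier rows are the predictions for positions that are already known.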





### C2

I've chosen the final MLP layer. The code below uses two parameters, a threshold and a suppression factor. The core idea is to check whether the absolute value of each MLP output is greater than the fixed threshold. If so, we multiply that output by a small suppression factor < 1. This dampens highly confident model outputs, since these large values lead to high probabilities once they pass through the softmax.
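The gating rule itself can be illustrated on plain numbers before it is applied to tensors. This sketch mirrors the rule described above; `fraction_gated` is a hypothetical diagnostic added for illustration, and the example values are made up.

```python
def gate_outliers(values, threshold=5.0, suppression_factor=0.2):
    """Scale down every value whose magnitude exceeds the threshold."""
    return [v * suppression_factor if abs(v) > threshold else v for v in values]


def fraction_gated(values, threshold=5.0):
    """Share of values above the threshold; near 1.0 means almost everything is suppressed."""
    return sum(abs(v) > threshold for v in values) / len(values)


activations = [0.5, -7.0, 5.0, 12.0]
gated = gate_outliers(activations)
print(gated)
print(fraction_gated(activations))
```

One consequence of a hard threshold is that the rule is not monotonic: 12.0 is scaled to roughly 2.4, which is now smaller than the untouched 5.0. Checking the gated fraction on real activations is a quick way to see whether the threshold is too low.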

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class ClinicalSafetyMLPHook:
    def __init__(self, threshold=5.0, suppression_factor=0.2):
        """
        threshold: The magnitude at which an activation is considered an "outlier" (overconfident).
        suppression_factor: How much to scale down the outlier activation (0.2 = reduce to 20%).
        """
        self.threshold = threshold
        self.suppression_factor = suppression_factor

    def __call__(self, module, input, output):
        # Some modules return a tuple; the hidden states are the first element
        activation = output[0] if isinstance(output, tuple) else output

        # Scale down only the entries whose magnitude exceeds the threshold
        modified_activation = torch.where(
            torch.abs(activation) > self.threshold,
            activation * self.suppression_factor,
            activation,
        )

        # Returning a value from a forward hook replaces the module's output
        if isinstance(output, tuple):
            return (modified_activation,) + output[1:]
        return modified_activation


# -------------------------
# 1. Load Model
# -------------------------
model_id = "meta-llama/Llama-3.2-1B-Instruct"
# Load on CPU to ensure it runs easily in a standard notebook environment
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# -------------------------
# 2. Register the Hook
# -------------------------
# Target the final MLP layer as requested
target_layer = model.model.layers[-1].mlp

handle = target_layer.register_forward_hook(
    ClinicalSafetyMLPHook(threshold=5.0, suppression_factor=0.2)
)

# -------------------------
# 3. Run Forward Pass
# -------------------------
dummy_input = torch.tensor([[101, 202, 303]])

with torch.no_grad():
    output = model(dummy_input)

print("Forward pass completed. Outlier MLP activations successfully gated.")

# Clean up the hook
handle.remove()

Q. Which part of this assignment challenged your understanding the most?
-> Part C, as it made me look into the Llama architecture in greater detail than I have ever done and figure out the ins and outs of the model. I learned a lot about GQA and the LlamaMLP (gate, up, and down projection) layers from this.

Q. If you had one more week to work on this, how would you improve your solutions?
-> The filtering used in Part C is very primitive. I've checked that it does have some effect on the model outputs, but given more time I would turn my attention to the GQA layer. The idea would be to first get the attention weights (softmax probabilities) for each head and filter the high-entropy tokens there. This could be better than the current method because, even though I use thresholding and Llama normalizes with RMSNorm, it could be that all outputs of the final MLP layer are above my threshold, in which case I am always suppressing the output. Working with softmax probabilities makes choosing a suitable threshold easier.


Q. In your opinion, what is one aspect of the therapeutic relationship that can never be automated?
-> Humans are social creatures. Even if the job of a clinical therapist can be automated, I believe that the act of opening up to a person and discussing your issues cannot be abstracted away through automation.

###### REFERENCES
1. https://arxiv.org/pdf/2501.12948
2. https://arxiv.org/pdf/2507.20534
3. https://www.bbc.com/news/articles/cgerwp7rdlvo
4. https://chatgpt.com/share/6999d505-d150-8001-93c2-42ced78e908a

###### AI disclosure
In Part A, ChatGPT was used only for LaTeX formatting.

In Part B, Gemini was used to create the demo data and code. I have manually checked each line and run the code to ensure its correctness.

In Part C, Gemini was used to learn about the Llama architecture and to research papers related to hallucination dampening. I've checked the generated code by running it in Google Colab to verify that thresholding does indeed change the model-generated text.