<a href="https://www.kaggle.com/code/gouthamvarmaindukuri/mind-companion?scriptVersionId=208276006" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [9]:
# Install dependencies
!pip install -q transformers accelerate bitsandbytes peft
!pip install -q huggingface_hub

In [10]:
# Download dataset directly
!wget https://huggingface.co/datasets/Amod/mental_health_counseling_conversations/resolve/main/combined_dataset.json

--2024-11-18 22:30:43--  https://huggingface.co/datasets/Amod/mental_health_counseling_conversations/resolve/main/combined_dataset.json
Resolving huggingface.co (huggingface.co)... 3.165.160.59, 3.165.160.11, 3.165.160.61, ...
Connecting to huggingface.co (huggingface.co)|3.165.160.59|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4790520 (4.6M) [text/plain]
Saving to: 'combined_dataset.json.1'


2024-11-18 22:30:43 (17.5 MB/s) - 'combined_dataset.json.1' saved [4790520/4790520]



In [11]:
# Add at the start of your code
import os
import warnings
warnings.filterwarnings("ignore")

# Set environment variables
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["JAX_DISABLE_FORK"] = "1"

# Update torch amp settings
import torch
torch.amp.GradScaler = lambda *args, **kwargs: torch.amp.GradScaler("cuda", *args, **kwargs)

In [4]:
import os
import json
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM, 
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch
from huggingface_hub import login

In [5]:
# Print GPU info
!nvidia-smi

Mon Nov 18 22:30:24 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   34C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00

In [6]:
# Clear any existing memory
import gc
import torch
gc.collect()
torch.cuda.empty_cache()

In [7]:
# Load JSONL file
data = []
with open('combined_dataset.json', 'r') as f:
    for line in f:
        try:
            data.append(json.loads(line.strip()))
        except json.JSONDecodeError:
            continue

# Convert to pandas DataFrame
df = pd.DataFrame(data)

In [8]:
# Print sample to verify data
print("Sample data:")
print(df.head())
print("\nColumns:", df.columns.tolist())

Sample data:
                                             Context  \
0  I'm going through some things with my feelings...   
1  I'm going through some things with my feelings...   
2  I'm going through some things with my feelings...   
3  I'm going through some things with my feelings...   
4  I'm going through some things with my feelings...   

                                            Response  
0  If everyone thinks you're worthless, then mayb...  
1  Hello, and thank you for your question and see...  
2  First thing I'd suggest is getting the sleep y...  
3  Therapy is essential for those that are feelin...  
4  I first want to let you know that you are not ...  

Columns: ['Context', 'Response']


In [12]:
# Split into train/val
train_df = df.sample(frac=0.8, random_state=42)
val_df = df.drop(train_df.index)

# Convert to HF datasets
train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)

In [13]:
print(f"\nTraining examples: {len(train_dataset)}")
print(f"Validation examples: {len(val_dataset)}")


Training examples: 2810
Validation examples: 702


In [14]:
# Set your Hugging Face token
HF_TOKEN = "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"  # Replace with your token
login(token=HF_TOKEN)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [15]:
# Format conversations
def format_conversation(example):
    return {
        'text': f"User: {example['Context']}\nAssistant: {example['Response']}"
    }

train_dataset = train_dataset.map(format_conversation)
val_dataset = val_dataset.map(format_conversation)

print("\nSample formatted conversation:")
print(train_dataset[0]['text'])

Map:   0%|          | 0/2810 [00:00<?, ? examples/s]

Map:   0%|          | 0/702 [00:00<?, ? examples/s]


Sample formatted conversation:
User: I've hit my head on walls and floors ever since I was young. I sometimes still do it but I don't exactly know why,    I have anxiety and I had a rough childhood but now I'll start to hit my head and sometimes not realize it but I don't know how to stop or even why I'm doing it.    How can I help myself to change my behavior?
Assistant: The best way to handle anxiety of this level is with a combination of appropriate medication given to you by a medical doctor, and therapy to help you understand the thoughts, feelings, and behaviors that are causing the anxiety. This is not something that anyone should just “white knuckle” and try to get through on their own with no help. Cognitive Behavioral Therapy is a technique that has been proven helpful for depression and anxiety. This takes a therapist trained in CBT. You will learn to recognize when and why you perform the behavior of hitting your head, help you deal with the underlying cause of this, and r

In [16]:
# Get current device
device = torch.cuda.current_device()

In [17]:
# Configure 4-bit quantization with maximum memory savings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

In [18]:
# Initialize model and tokenizer
print("\nInitializing model and tokenizer...")
model_name = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(
    model_name, 
    token=HF_TOKEN,
    trust_remote_code=True
)


Initializing model and tokenizer...


tokenizer_config.json:   0%|          | 0.00/34.2k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

In [19]:
# Model loading with different memory settings
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=HF_TOKEN,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    use_cache=False
)

config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]

`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

In [20]:
# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Get PEFT model
model = get_peft_model(model, lora_config)
print("\nTrainable parameters:")
model.print_trainable_parameters()


Trainable parameters:
trainable params: 3,686,400 || all params: 2,509,858,816 || trainable%: 0.1469


In [21]:
# Tokenize datasets
def tokenize(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=512
    )

print("\nTokenizing datasets...")
tokenized_train = train_dataset.map(tokenize, batched=True, remove_columns=train_dataset.column_names)
tokenized_val = val_dataset.map(tokenize, batched=True, remove_columns=val_dataset.column_names)


Tokenizing datasets...


Map:   0%|          | 0/2810 [00:00<?, ? examples/s]

Map:   0%|          | 0/702 [00:00<?, ? examples/s]

In [22]:
# Memory optimization environment variables
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # Simplified memory config

In [23]:
# Training arguments - balanced optimization
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=1,        # Minimal batch size
    per_device_eval_batch_size=1,         
    gradient_accumulation_steps=32,       # Increased to compensate for small batch
    warmup_steps=50,                    
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=20,                   
    eval_strategy="epoch",              # Keep evaluation, but only per epoch
    save_strategy="epoch",             
    load_best_model_at_end=True,       # Keep this for best model
    gradient_checkpointing=True,
    report_to="tensorboard",           # Keep tensorboard reporting
    remove_unused_columns=False,
    learning_rate=3e-4,                
    fp16=True,                         
    max_grad_norm=0.3,                 
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",        
    dataloader_num_workers=0,
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
# Additional model loading parameters
model_kwargs = {
    "device_map": "auto",
    "max_memory": {0: "10GB"},  # Limit memory usage
    "torch_dtype": torch.float16
}

In [24]:
# Initialize trainer
print("\nInitializing trainer...")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)

# Train with error handling
print("\nStarting training...")
try:
    trainer.train()
except Exception as e:
    print(f"Error during training: {str(e)}")
    # Free up memory
    torch.cuda.empty_cache()
    raise e

# Save trained model
print("\nSaving model...")
trainer.model.save_pretrained("./final_model_lora")


Initializing trainer...

Starting training...


Epoch,Training Loss,Validation Loss
0,2.4887,2.445267
1,2.2786,2.308133
2,2.1453,2.282104



Saving model...


In [25]:
def generate_response(prompt, max_length=256):
    try:
        formatted_prompt = (
            f"User: {prompt}\n"
            "Assistant: I hear you, and what you're feeling is valid. You're not alone in this, and there are ways to help. "
            "Let me share some supportive suggestions that might help you feel better. "
        )
        
        inputs = tokenizer(formatted_prompt, return_tensors="pt", truncation=True, max_length=max_length).to(model.device)
        
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            num_return_sequences=1,
            temperature=0.6,
            do_sample=True,
            top_p=0.85,
            top_k=40,
            no_repeat_ngram_size=3,
            repetition_penalty=1.3,
            length_penalty=1.1
        )
        
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = response.replace(formatted_prompt, "")
        
        # Combined list of patterns to remove
        patterns_to_remove = [
            # Endings/Signatures
            "Please contact", "Best,", "Best regards", "Sincerely",
            "Dr.", "Licensed", "Certified", "Therapist", "Counselor",
            "I hope this helps", "Remember,", "reach out", ":)", "💫",
            "Best wishes", "Take care", "Warm regards", "Contact me",
            "For more information", "Feel free to",
            
            # Assumptions/References
            "you mentioned", "you said", "already", "as we discussed",
            "years in", "my suggestion", "I am", "my experience",
            "If you are in", "please contact", "call", "website",
            "helpline", "1-800", "1-", "800-", "www.", "http"
        ]
        
        for pattern in patterns_to_remove:
            if pattern.lower() in response.lower():
                response = response.split(pattern)[0]
        
        return response.strip()
        
    except Exception as e:
        return f"Error generating response: {str(e)}"
        
# Test examples
test_prompts = [
    "I've been feeling really anxious lately about work.",
    "I can't sleep at night because of stress.",
    "I feel lonely and isolated."
]

print("\nTesting model with example prompts:")
for prompt in test_prompts:
    response = generate_response(prompt)
    print(f"\nUser: {prompt}")
    print(f"Assistant: {response}")

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)



Testing model with example prompts:

User: I've been feeling really anxious lately about work.
Assistant: The first thing that comes to mind is that you may be experiencing a lot of stress at work. The workplace can be very stressful, especially if you have a demanding job. If you find yourself working late or taking breaks often, it's possible that you've reached your limit and need to take a break from the workplace. If you do find yourself needing a break, try to avoid caffeine and alcohol. Both of these substances can worsen anxiety.  Another suggestion is to practice relaxation techniques. This could include yoga, meditation, or progressive muscle relaxation. These practices can help to calm your nerves and lower your heart rate.  Finally, if you'd like to talk about your concerns, I would encourage you to see a therapist. A therapist can help you to identify the source of your anxiety and develop coping mechanisms.  Best of luck! Robin, LPC/IST/RPT/CEDS/CP/BCTCP/CBT-LP/

User: I

In [26]:
# save both model and tokenizer
output_dir = "./supportive-ai-model"

# Save model
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"Model and tokenizer saved to {output_dir}")

Model and tokenizer saved to ./supportive-ai-model


In [27]:
!pip install gradio --quiet

In [29]:
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the saved model and tokenizer
model_path = "./supportive-ai-model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

def chat_response(message, history):
    # Format the prompt similar to training data
    formatted_prompt = f"User: {message}\nAssistant: "
    
    inputs = tokenizer(formatted_prompt, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    # Generate response
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        top_p=0.85,
        top_k=40,
        no_repeat_ngram_size=3,
        repetition_penalty=1.3,
        pad_token_id=tokenizer.eos_token_id
    )
    
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract only the assistant's response
    response = response.split("Assistant: ")[-1].strip()
    
    return response

# Create Gradio Interface
demo = gr.ChatInterface(
    fn=chat_response,
    title="Mind Companion",
    description="A supportive AI assistant trained to provide empathetic responses to mental health concerns. Please note: This is not a replacement for professional mental health support.",
    theme="soft",
    examples=[
        "I've been feeling really anxious lately about work.",
        "I can't sleep at night because of stress.",
        "I feel lonely and isolated."
    ]
)

# Launch the interface
if __name__ == "__main__":
    demo.launch(share=True)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

* Running on local URL:  http://127.0.0.1:7861
* Running on public URL: https://c713cabe3fa36e2e7d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
