This project analyzes conversations between users and psychologists related to mental health topics using Natural Language Processing (NLP) techniques. The analysis aims to identify patterns, extract insights, and develop models that could potentially support automated mental health guidance systems.
The dataset used in this project is sourced from the Mental Health Counseling Conversations dataset on Hugging Face. This dataset contains:
- 3,512 conversations between users and psychologists
- Questions covering a wide range of mental health topics
- Professional responses from qualified psychologists
- All data is anonymized and contains no personally identifiable information
The dataset is particularly valuable for:
- Training and fine-tuning language models for mental health advice
- Analyzing patterns in mental health conversations
- Developing automated mental health guidance systems
The project applies the following NLP techniques:

- Text Preprocessing: Cleaning, tokenization, and lemmatization of conversation data
- Sentiment Analysis: Analysis of emotional content in both user questions and professional responses
- Topic Modeling: Latent Dirichlet Allocation (LDA) to identify key topics in conversations
- Text Embeddings: Multiple embedding approaches, including:
  - Word2Vec
  - LDA-based embeddings
  - BERT embeddings (both pre-trained and fine-tuned)
- Document Similarity: Comparison of different similarity models
- Response Generation: Framework for generating response recommendations based on similar conversations
Install the required R packages:

```r
install.packages(c(
  "jsonlite",
  "tidyverse",
  "tidytext",
  "wordcloud",
  "tm",
  "topicmodels",
  "text2vec",
  "sentimentr",
  "ggplot2",
  "textdata",
  "bit",
  "reticulate"
))
```

The BERT components run through Python (via reticulate) and require these Python packages:

```bash
pip install sentence-transformers torch pandas scikit-learn datasets numpy tqdm
```

The dataset contains two main columns:
- Context: Questions or concerns expressed by users about mental health issues
- Response: Professional responses from psychologists
The analysis covers the following stages; minimal sketches of the first three follow the list.

- Text Statistics
  - Word count distribution
  - Text length analysis
  - Comparison of question vs. response lengths
- Sentiment Analysis
  - Basic sentiment scoring
  - Emotional dimensions analysis
  - Comparison of sentiment between questions and responses
- Topic Modeling
  - Optimal topic number determination
  - Topic distribution analysis
  - Key term extraction for each topic
- Text Embeddings
  - Word2Vec implementation
  - LDA-based embeddings
  - BERT embeddings (with fine-tuning capability)
- Model Comparison
  - Performance evaluation of different similarity models
  - Runtime comparison
  - Success rate analysis
- Response Generation
  - Similar conversation identification
  - Response recommendation system
  - Multiple model support
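To make these stages concrete, here are a few minimal sketches. They assume the `mental_health_df` data frame built in the usage examples below. First, basic text statistics with `stringr` (installed as part of the tidyverse); the derived column names are illustrative:

```r
library(stringr)

# Word counts for user questions and professional responses
mental_health_df$context_words  <- str_count(mental_health_df$Context, "\\S+")
mental_health_df$response_words <- str_count(mental_health_df$Response, "\\S+")

# Compare the two length distributions
summary(mental_health_df$context_words)
summary(mental_health_df$response_words)
```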
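The sentiment stage can be sketched with `sentimentr` from the install list; `sentiment_by()` averages sentence-level scores per conversation:

```r
library(sentimentr)

# Score sentiment at the sentence level, then average per conversation
context_sent  <- sentiment_by(get_sentences(mental_health_df$Context))
response_sent <- sentiment_by(get_sentences(mental_health_df$Response))

# Compare average sentiment of user questions vs. professional responses
mean(context_sent$ave_sentiment)
mean(response_sent$ave_sentiment)
```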
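And the topic-modeling stage with `tm` plus `topicmodels`; `k = 10` is a placeholder here, since the project determines the topic count empirically (`clean_context` is created in the preprocessing step below):

```r
library(tm)
library(topicmodels)

# Build a document-term matrix from the cleaned user questions
corpus <- VCorpus(VectorSource(mental_health_df$clean_context))
dtm    <- DocumentTermMatrix(corpus)

# Drop documents left empty after cleaning (slam is installed with tm)
dtm <- dtm[slam::row_sums(dtm) > 0, ]

# Fit LDA and inspect the key terms per topic
lda_model <- LDA(dtm, k = 10, control = list(seed = 42))
terms(lda_model, 5)  # five top terms for each topic
```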
Typical usage proceeds in three steps:

- Data Loading

```r
# Load the line-delimited JSON dataset into a data frame
library(jsonlite)
json_data <- stream_in(file("combined_dataset.json"))
mental_health_df <- as.data.frame(json_data)
```

- Text Preprocessing
```r
# Clean and tokenize text
mental_health_df$clean_context  <- sapply(mental_health_df$Context, clean_text)
mental_health_df$clean_response <- sapply(mental_health_df$Response, clean_text)
```
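`clean_text()` is defined in the project code rather than shown here. As a minimal sketch of what such a helper might look like, using the `tm` package from the install list (the project's actual implementation may also lemmatize):

```r
library(tm)

# Hypothetical sketch of a basic cleaning helper; the project's actual
# clean_text() may differ (e.g., by adding lemmatization)
clean_text <- function(text) {
  text <- tolower(text)                       # normalize case
  text <- removePunctuation(text)             # strip punctuation
  text <- removeNumbers(text)                 # strip digits
  text <- removeWords(text, stopwords("en"))  # drop common stop words
  stripWhitespace(trimws(text))               # collapse extra whitespace
}
```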
- Response Generation

```r
# Generate response recommendations for a new user query
query <- "I feel anxious all the time and can't focus on my work. What should I do?"
recommendations <- generate_response_recommendations(query)
```
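`generate_response_recommendations()` is the project's own wrapper. As a rough sketch of the underlying retrieval idea (not the project's actual implementation), assuming a precomputed embedding matrix `context_embeddings` and an `embed_query()` helper, both hypothetical names:

```r
# Hypothetical sketch: recommend responses from the most similar stored questions
generate_response_recommendations <- function(query, top_n = 3) {
  query_vec <- embed_query(query)  # hypothetical: embeds with the selected model

  # Cosine similarity between the query and every stored question embedding
  sims <- as.vector(context_embeddings %*% query_vec) /
    (sqrt(rowSums(context_embeddings^2)) * sqrt(sum(query_vec^2)))

  best <- order(sims, decreasing = TRUE)[seq_len(top_n)]
  data.frame(
    similarity = sims[best],
    response   = mental_health_df$Response[best]
  )
}
```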
The project compares three main similarity models:

- Word2Vec
- LDA-based
- BERT (pre-trained or fine-tuned)
Each model is evaluated based on:
- Average runtime
- Top similarity scores
- Mean similarity scores
- Success rate
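A minimal sketch of what such an evaluation loop might look like, assuming each model exposes a `find_similar(query)` function that returns a similarity score per stored conversation (a hypothetical interface), with "success" arbitrarily defined as a top score above a threshold:

```r
# Hypothetical evaluation harness comparing similarity models
evaluate_model <- function(find_similar, queries, success_threshold = 0.5) {
  runtimes  <- numeric(length(queries))
  top_sims  <- numeric(length(queries))
  mean_sims <- numeric(length(queries))

  for (i in seq_along(queries)) {
    runtimes[i]  <- system.time(sims <- find_similar(queries[i]))["elapsed"]
    top_sims[i]  <- max(sims)
    mean_sims[i] <- mean(sims)
  }

  data.frame(
    avg_runtime  = mean(runtimes),
    top_sim      = mean(top_sims),
    mean_sim     = mean(mean_sims),
    success_rate = mean(top_sims > success_threshold)
  )
}
```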
Additional notes:

- The BERT model can be fine-tuned on the specific mental health conversation dataset
- The system automatically selects the best performing model for response generation
- All models include duplicate detection to ensure diverse recommendations