## HIV & Mental Health Risk Assessment System
### Analysis of Synthetic Healthcare Conversations

Time spent: 1 h 15 min

-   Setup and data exploration: 5 mins

-   Rule-based risk scorer: 15 mins

-   LLM integration: 20 mins

-   RAG system: 25 mins

-   Analysis and documentation: 10 mins


### Approach

A hybrid system combining:

-   Rule-based scoring: keyword matching for baseline risk detection
-   LLM analysis: GPT-4 for nuanced, contextual assessment
-   RAG recommendations: vector search over the South African National Department of Health (SA NDOH) guidelines to produce evidence-based recommendations
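The RAG step ranks guideline passages by vector similarity to the conversation before the LLM writes a recommendation. The sketch below illustrates only that retrieval step, under stated assumptions: it swaps the embedding model for simple word-count vectors so it runs with the standard library alone, and the guideline chunks are hypothetical placeholders, not SA NDOH text. `vectorize`, `cosine`, and `retrieve` are illustrative names, not functions from this project's `src` modules.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the top k, best first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical placeholder chunks, NOT real SA NDOH guideline text.
chunks = [
    "placeholder passage on hiv testing and partner notification",
    "placeholder passage on depression screening and referral",
    "placeholder passage on adherence counselling",
]

for score, chunk in retrieve("worried about hiv testing for my partner", chunks):
    print(f"{score:.2f}  {chunk}")
```

In a real pipeline, `vectorize` would presumably be replaced by the embedding model and the linear scan by a vector index; the ranking idea (cosine similarity, keep the top k) stays the same, and the retrieved passages are typically passed to the LLM as context.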

```python
import sys
sys.path.append('../src')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from analyzer import ConversationAnalyzer
from data_loader import load_conversations
from config import CONVERSATIONS_FILE

# Styling
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
```

### 1. Load and Explore Data

```python
conversations = load_conversations(CONVERSATIONS_FILE)

print(f"---- Total conversations loaded: {len(conversations)}")
print(f"-> Average messages per conversation: {np.mean([c['message_count'] for c in conversations]):.1f}")
print("\n---- Sample conversation preview:")
print(conversations[0]['full_text'][:300] + "...")
```

```text
---- Total conversations loaded: 100
-> Average messages per conversation: 25.3

---- Sample conversation preview:
Hi, I need help about something sensitive. I'm here with you. Tell me what's going on. I'm not sure if what I'm feeling is normal. Thanks for sharing that. Symptoms can have many causes. It started a few days ago. How intense is it and has anything changed over time? I thought it would go away but i...
```


### 2. Initialize Analyzer and Run Assessment

The analyzer uses a hybrid approach:

-   Rule-based: Keyword matching against predefined risk vocabularies
-   LLM-based: GPT-4 for contextual understanding
-   Final score: weighted combination of the two scores (40% rule-based, 60% LLM)
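The hybrid score can be sketched as follows. The 40/60 weights come from the list above, and they agree with the first conversation in section 3, where a rule-based score of 0.8 and an LLM score of 0.2 give 0.4 × 0.8 + 0.6 × 0.2 = 0.44. Everything else is an assumption for illustration: the keyword lists, the `rule_score` rule, and the category cut-offs (chosen only so that 0.44 and 0.5 map to MEDIUM, 0.32 to LOW, and 0.8 to HIGH, as printed in section 3) are hypothetical and not taken from the `analyzer` module. The matcher uses word boundaries on purpose: in section 3 the rule-based scorer flags `sti` as a high-risk match while the LLM finds no specific HIV risk factors, and if the analyzer matches plain substrings (not verified here), words such as "still" or "question" would trigger that term.

```python
import re

# Hypothetical vocabularies: the analyzer's real keyword lists are not shown here.
HIGH_RISK_TERMS = ["sti", "unprotected", "needle"]
MEDIUM_RISK_TERMS = ["partner", "test", "worried"]

RULE_WEIGHT = 0.4  # 40% rule-based, as stated above
LLM_WEIGHT = 0.6   # 60% LLM, as stated above


def keyword_matches(text: str, terms: list[str]) -> list[str]:
    """Return the terms that occur as whole words in text (case-insensitive)."""
    lowered = text.lower()
    # \b word boundaries stop "sti" from matching inside "still" or "question"
    return [t for t in terms if re.search(rf"\b{re.escape(t)}\b", lowered)]


def rule_score(text: str) -> float:
    """Hypothetical rule: 0.3 per high-risk term, 0.1 per medium-risk term, capped at 1.0."""
    high = len(keyword_matches(text, HIGH_RISK_TERMS))
    medium = len(keyword_matches(text, MEDIUM_RISK_TERMS))
    return min(1.0, 0.3 * high + 0.1 * medium)


def combine(rule: float, llm: float) -> float:
    """Weighted blend of the rule-based and LLM scores, rounded to two decimals."""
    return round(RULE_WEIGHT * rule + LLM_WEIGHT * llm, 2)


def category(score: float) -> str:
    """Hypothetical cut-offs chosen to agree with the categories printed in section 3."""
    if score >= 0.7:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    return "LOW"


final = combine(0.8, 0.2)
print(final, category(final))  # 0.44 MEDIUM
print(keyword_matches("I still have a question", HIGH_RISK_TERMS))  # []
print("sti" in "i still have a question")  # True: a naive substring check would flag it
```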

```python
analyzer = ConversationAnalyzer()

# Analyze the first 5 conversations as a demonstration
results = analyzer.analyze_dataset(limit=5)
print(f"\n Analyzed {len(results)} conversations")
```

```text
Analyzing conversation 1/5...
Analyzing conversation 2/5...
Analyzing conversation 3/5...
Analyzing conversation 4/5...
Analyzing conversation 5/5...

 Analyzed 5 conversations
```
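With several conversations analyzed, it can help to collect the final scores into a DataFrame (pandas is already imported above) for sorting, counting categories, or plotting with seaborn. This is a sketch assuming each result carries the nested keys printed in section 3; `results_to_frame` is a hypothetical helper, and the one-row demo input reuses only the first conversation's printed final scores.

```python
import pandas as pd


def results_to_frame(results: list[dict]) -> pd.DataFrame:
    """Flatten analyzer results into one row per conversation."""
    rows = []
    for r in results:
        ra = r["risk_assessment"]
        rows.append({
            "conversation_id": r["conversation_id"],
            "hiv_score": ra["hiv_risk"]["final_score"],
            "hiv_category": ra["hiv_risk"]["final_category"],
            "mh_score": ra["mental_health_risk"]["final_score"],
            "mh_category": ra["mental_health_risk"]["final_category"],
        })
    return pd.DataFrame(rows)


# Demo input shaped like one analyzer result, using conversation 1's printed scores.
demo = [{
    "conversation_id": 1,
    "risk_assessment": {
        "hiv_risk": {"final_score": 0.44, "final_category": "MEDIUM"},
        "mental_health_risk": {"final_score": 0.32, "final_category": "LOW"},
    },
}]

df = results_to_frame(demo)
print(df)
```

In the notebook, `results_to_frame(results)` would produce the same columns for all analyzed conversations, and `df["hiv_category"].value_counts()` then gives the category distribution.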


### 3. Full Conversation Analysis: First Conversation

```python
# Display detailed results for first conversation
example = results[0]

print("="*80)
print("CONVERSATION ANALYSIS - First Conversation detailed view")
print("="*80)

print("\n-> CONVERSATION METADATA")
print(f"ID: {example['conversation_id']}")
print(f"Messages: {example['conversation']['message_count']}")
print(f"Preview: {example['conversation']['text_preview']}")

print("\n-> HIV RISK ASSESSMENT")
hiv = example['risk_assessment']['hiv_risk']
print(f"Final Score: {hiv['final_score']} ({hiv['final_category']})")
print(f"  - Rule-based: {hiv['rule_based']['score']} ({hiv['rule_based']['category']})")
print(f"  - LLM-based: {hiv['llm_based'].get('score', 'N/A')}")
print(f"\nLLM Reasoning: {hiv['llm_based'].get('reasoning', 'N/A')}")
print("\nRisk Factors Identified:")
for factor in hiv['llm_based'].get('risk_factors', []):
    print(f"  • {factor}")
print("\n-> Rule-based Keyword Matches:")
print(f"  High: {hiv['rule_based']['details']['high_risk_matches']}")
print(f"  Medium: {hiv['rule_based']['details']['medium_risk_matches']}")
print(f"  Symptoms: {hiv['rule_based']['details']['symptom_matches']}")

print("\n-> MENTAL HEALTH RISK ASSESSMENT")
mh = example['risk_assessment']['mental_health_risk']
print(f"Final Score: {mh['final_score']} ({mh['final_category']})")
print(f"  - Rule-based: {mh['rule_based']['score']} ({mh['rule_based']['category']})")
print(f"  - LLM-based: {mh['llm_based'].get('score', 'N/A')}")
print(f"\nLLM Reasoning: {mh['llm_based'].get('reasoning', 'N/A')}")
print("\nRisk Factors Identified:")
for factor in mh['llm_based'].get('risk_factors', []):
    print(f"  • {factor}")

print("\n-> RECOMMENDATIONS (SA NDOH Guidelines-based)")
recs = example['recommendations']
print(f"\nHIV Recommendation:\n{recs.get('hiv_recommendation', 'N/A')}")
print(f"\nMental Health Recommendation:\n{recs.get('mh_recommendation', 'N/A')}")
print(f"\nIntegrated Treatment Plan:\n{recs.get('integrated_plan', 'N/A')}")

print("\n" + "="*80)
```

CONVERSATION ANALYSIS - First Conversation detailed view

-> CONVERSATION METADATA
ID: 1
Messages: 24
Preview: Hi, I need help about something sensitive. I'm here with you. Tell me what's going on. I'm not sure if what I'm feeling is normal. Thanks for sharing that. Symptoms can have many causes. It started a ...

-> HIV RISK ASSESSMENT
Final Score: 0.44 (MEDIUM)
  - Rule-based: 0.8 (HIGH)
  - LLM-based: 0.2

LLM Reasoning: The conversation does not mention any specific HIV risk factors such as unprotected sex, needle sharing, or known exposure. The client expresses general health anxiety and concern for their partner, but no details about sexual behavior or acute HIV symptoms are provided.

Risk Factors Identified:
  • Unspecified symptoms
  • Concern for partner's health

-> Rule-based Keyword Matches:
  High: ['sti']
  Medium: ['partner', 'test', 'worried', 'concerned', 'symptoms']
  Symptoms: []

-> MENTAL HEALTH RISK ASSESSMENT
Final Score: 0.32 (LOW)
  - Rule-based: 0.5 (MEDIUM)
