# Healthcare Risk Assessment System
## HIV Acquisition & Mental Health Disorder Risk Scoring
### Based on South African NDOH Guidelines


## Executive Summary

This notebook presents a comprehensive risk assessment system for:
1. **HIV Acquisition Risk** - Based on SA NDOH HIV Testing Services Policy 2016
2. **Mental Health Disorders** - Based on SA NDOH Mental Health Policy Framework 2023-2030

### Key Features:
- Keyword-based text analysis of healthcare conversations
- Evidence-based risk scoring algorithms (0-100 scale)
- Treatment recommendations aligned with SA NDOH guidelines
- Integrated care approach for comorbid conditions
- Comprehensive visualization and reporting

### Clinical Guidelines Referenced:
- SA NDOH HIV Testing Services Policy (2016)
- SA NDOH Mental Health Policy Framework (2023-2030)
- Primary Care 101 (PC101) Guidelines
- Mental Health Care Act 17 of 2002

## Section 1: Setup and Imports

In [None]:
# Core imports
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from collections import defaultdict, Counter
import warnings
warnings.filterwarnings('ignore')

# Visualization settings
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 8)

print("✓ All libraries imported successfully")

## Section 2: Data Loading

The dataset contains 100 synthetic conversations between users and an AI healthcare chatbot in WhatsApp format.
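
As a minimal sketch of the line format the loader in the next cell expects, the snippet below applies the same regular expression to a single message. The sample line and its timestamp layout are hypothetical; the real dataset may format timestamps differently.

```python
import re

# Hypothetical sample line; the timestamp layout in the real file may differ.
sample = "[2024-01-15 09:30] User: I have had a fever for three days"

# Same pattern as load_conversations: [timestamp] sender: message
pattern = re.compile(r'\[(.*?)\] (User|AI): (.*)')

match = pattern.match(sample)
if match:
    timestamp, sender, message = match.groups()
    parsed = {'timestamp': timestamp, 'sender': sender, 'message': message}
else:
    # Lines that do not fit the pattern are skipped by the loader
    parsed = None

print(parsed)
```

Lines that do not match (blank lines, or continuation lines of a multi-line message) are silently dropped, so only text on the first line of each message reaches the scorers.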

In [None]:
def load_conversations(file_path):
    """
    Load and parse WhatsApp-style healthcare conversations
    
    Parameters:
    -----------
    file_path : str
        Path to the conversation file
        
    Returns:
    --------
    list : List of conversation dictionaries
    """
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    
    conversations = content.split("========== Conversation ==========")
    parsed_conversations = []
    
    for idx, conv in enumerate(conversations):
        if not conv.strip():
            continue
            
        messages = []
        lines = conv.strip().split('\n')
        
        for line in lines:
            match = re.match(r'\[(.*?)\] (User|AI): (.*)', line)
            if match:
                timestamp, sender, message = match.groups()
                messages.append({
                    'timestamp': timestamp,
                    'sender': sender,
                    'message': message
                })
        
        if messages:
            parsed_conversations.append({
                'conversation_id': len(parsed_conversations) + 1,  # sequential, unaffected by skipped empty chunks
                'messages': messages,
                'user_messages': [m['message'] for m in messages if m['sender'] == 'User'],
                'full_text': ' '.join([m['message'] for m in messages if m['sender'] == 'User'])
            })
    
    return parsed_conversations

# Load the data
conversations = load_conversations('/mnt/user-data/uploads/health_ai_whatsapp_100_conversations_long.txt')
print(f"✓ Loaded {len(conversations)} conversations")
print(f"✓ Average messages per conversation: {np.mean([len(c['messages']) for c in conversations]):.1f}")

# Display sample conversation
print("\n" + "="*80)
print("SAMPLE CONVERSATION #1")
print("="*80)
for msg in conversations[0]['messages'][:5]:
    print(f"{msg['sender']:6s}: {msg['message']}")

## Section 3: HIV Risk Assessment Algorithm

### Methodology:
Based on SA NDOH HIV Testing Services Policy 2016, the algorithm identifies:

**High Risk Indicators (3x weight):**
- Unprotected sexual contact
- Partner HIV status concerns
- Recent exposure events
- STI symptoms
- Pregnancy/breastfeeding

**Medium Risk Indicators (2x weight):**
- Testing history
- Partner concerns
- HIV-related symptoms
- High-risk behaviors

**Low Risk Indicators (1x weight):**
- General health concerns
- Prevention queries
- Information seeking

### Risk Categories:
- **HIGH (≥60)**: Urgent testing within 24 hours, PEP/PrEP assessment
- **MEDIUM (30-59)**: Testing within 1 week, counseling services
- **LOW (<30)**: Routine testing, prevention education
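
The weights and thresholds above combine with a normalization step in the `HIVRiskAssessor` class defined in the next cell. The sketch below reproduces only that arithmetic so it can be checked by hand. It takes hit counts per tier as input instead of text; the function names `hiv_score` and `hiv_category` are illustrative and not part of the notebook.

```python
# Weights and keyword-list sizes per tier, counted from HIVRiskAssessor
WEIGHTS = {'high': 3.0, 'medium': 2.0, 'low': 1.0}
KEYWORD_COUNTS = {'high': 23, 'medium': 19, 'low': 10}

# Denominator: the score if every keyword in every tier matched
MAX_SCORE = sum(WEIGHTS[t] * KEYWORD_COUNTS[t] for t in WEIGHTS)  # 117.0

def hiv_score(hits, urgency_hits=0):
    """hits maps tier name -> number of matched keywords in that tier."""
    raw = sum(WEIGHTS[t] * hits.get(t, 0) for t in WEIGHTS)
    normalized = min(100, raw / MAX_SCORE * 100)
    # Each urgency keyword adds 5 points, capped at 100
    return round(min(100, normalized + 5 * urgency_hits), 2)

def hiv_category(score):
    if score >= 60:
        return 'HIGH'
    if score >= 30:
        return 'MEDIUM'
    return 'LOW'

# 4 high, 2 medium, 1 low keyword plus 3 urgency words
example = hiv_score({'high': 4, 'medium': 2, 'low': 1}, urgency_hits=3)
# Every high-tier keyword, nothing else
all_high = hiv_score({'high': 23})

print(example, hiv_category(example))    # 29.53 LOW
print(all_high, hiv_category(all_high))  # 58.97 MEDIUM
```

Because the denominator is the score of all 52 keywords at once, normalized scores stay low: matching every high-tier keyword alone gives 69/117 ≈ 59, still below the HIGH cutoff. In practice the HIGH category is reachable mainly through the urgency bonus.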

In [None]:
class HIVRiskAssessor:
    def __init__(self):
        self.risk_keywords = {
            'high': {
                'unprotected': ['unprotected', 'no condom', 'without protection', 'unsafe sex'],
                'partner_risk': ['partner hiv', 'partner positive', 'partner infected', 
                               'multiple partners', 'new partner'],
                'exposure': ['exposure', 'exposed to', 'came in contact', 'needlestick'],
                'sti_symptoms': ['discharge', 'sores', 'genital pain', 'burning urination',
                               'genital ulcer', 'painful urination'],
                'pregnancy': ['pregnant', 'pregnancy', 'breastfeeding', 'antenatal']
            },
            'medium': {
                'testing': ['never tested', 'long time since test', 'should i test',
                          'need test', 'get tested'],
                'partner_concern': ['partner', 'worried about partner', 'concerned about partner',
                                  'should i tell partner'],
                'symptoms': ['fever', 'night sweats', 'weight loss', 'swollen glands',
                           'persistent cough', 'diarrhea', 'rash'],
                'behavioral': ['sex worker', 'inject drugs', 'sharing needles']
            },
            'low': {
                'general_health': ['feeling sick', 'not well', 'worried', 'stressed'],
                'prevention': ['how to prevent', 'stay safe', 'protect myself'],
                'information': ['what is hiv', 'how does', 'can i get']
            }
        }
        self.risk_weights = {'high': 3.0, 'medium': 2.0, 'low': 1.0}
    
    def calculate_risk_score(self, conversation_text):
        text_lower = conversation_text.lower()
        risk_factors = []
        total_score = 0
        max_possible_score = 0
        
        for risk_level, categories in self.risk_keywords.items():
            weight = self.risk_weights[risk_level]
            for category, keywords in categories.items():
                found_keywords = [kw for kw in keywords if kw in text_lower]
                if found_keywords:
                    category_score = weight * len(found_keywords)
                    total_score += category_score
                    risk_factors.append({
                        'category': category,
                        'risk_level': risk_level,
                        'keywords_found': found_keywords,
                        'score': category_score
                    })
                max_possible_score += weight * len(keywords)
        
        normalized_score = min(100, (total_score / max_possible_score) * 100) if max_possible_score > 0 else 0
        
        # Urgency adjustment
        urgency_keywords = ['urgent', 'emergency', 'severe', 'very worried', 'scared']
        urgency_count = sum(1 for word in urgency_keywords if word in text_lower)
        normalized_score = min(100, normalized_score + urgency_count * 5)
        
        return {
            'risk_score': round(normalized_score, 2),
            'risk_category': self._categorize_risk(normalized_score),
            'risk_factors': risk_factors,
            'total_indicators': len(risk_factors)
        }
    
    def _categorize_risk(self, score):
        if score >= 60:
            return 'HIGH'
        elif score >= 30:
            return 'MEDIUM'
        else:
            return 'LOW'

print("✓ HIV Risk Assessment algorithm loaded")

## Section 4: Mental Health Risk Assessment Algorithm

### Methodology:
Based on SA NDOH Mental Health Policy Framework 2023-2030 and PC101 guidelines:

**Depression Indicators (PHQ-9 based):**
- Depressed mood
- Loss of interest/pleasure
- Sleep disturbances
- Appetite changes
- Concentration problems

**Anxiety Indicators (GAD-7 based):**
- Excessive worry
- Physical symptoms
- Avoidance behaviors
- Restlessness

**Severe Symptoms (Priority):**
- Suicidal ideation
- Psychotic symptoms
- Substance abuse

**Functional Impairment:**
- Work/school disruption
- Relationship problems
- Daily living difficulties

### Risk Categories:
- **SEVERE (≥70 or any severe symptom: suicidal, psychotic, or substance-related)**: Emergency psychiatric evaluation
- **MODERATE (45-69)**: 1-2 week assessment, psychotherapy + medication
- **MILD (20-44)**: Within 1 month, counseling and monitoring
- **MINIMAL (<20)**: Routine wellness, prevention
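
The scoring formula behind these categories, as implemented in the `MentalHealthAssessor` class in the next cell, can be sketched on its own. The helper names `mh_score` and `mh_category` are illustrative; the inputs are the number of matched keywords per domain.

```python
def mh_score(depression, anxiety, impairment, stress, severe=False):
    """Weighted sum of keyword counts, doubled if any severe symptom is present."""
    multiplier = 2.0 if severe else 1.0
    base = (depression * 3 + anxiety * 2.5 + impairment * 2 + stress * 1.5) * multiplier
    # The base score is scaled by 2 and capped at 100
    return min(100, base * 2)

def mh_category(score, severe=False):
    # Any severe symptom forces SEVERE regardless of the numeric score
    if severe or score >= 70:
        return 'SEVERE'
    if score >= 45:
        return 'MODERATE'
    if score >= 20:
        return 'MILD'
    return 'MINIMAL'

mild = mh_score(3, 2, 1, 1)                      # (9 + 5 + 2 + 1.5) * 2 = 35.0
minimal = mh_score(2, 1, 0, 0)                   # (6 + 2.5) * 2 = 17.0
flagged = mh_score(0, 0, 0, 0, severe=True)      # 0, but categorized SEVERE

print(mild, mh_category(mild))
print(minimal, mh_category(minimal))
print(flagged, mh_category(flagged, severe=True))
```

Note that a single severe-symptom keyword yields SEVERE even when the numeric score is 0, so the category and the score can disagree sharply.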

In [None]:
class MentalHealthAssessor:
    def __init__(self):
        self.mental_health_indicators = {
            'depression': {
                'mood': ['sad', 'depressed', 'hopeless', 'worthless', 'empty', 'down', 'crying', 'tearful'],
                'interest': ['no interest', 'no motivation', 'dont care', 'nothing matters', 'lost interest'],
                'sleep': ['cant sleep', 'insomnia', 'sleep too much', 'oversleeping', 'nightmares'],
                'appetite': ['no appetite', 'not eating', 'lost weight', 'overeating', 'gained weight'],
                'concentration': ['cant focus', 'cant concentrate', 'memory problems', 'cant think']
            },
            'anxiety': {
                'worry': ['worried', 'anxious', 'nervous', 'panic', 'fear', 'scared', 'frightened'],
                'physical': ['heart racing', 'sweating', 'trembling', 'shaking', 'chest pain', 'dizzy'],
                'avoidance': ['avoiding', 'cant leave', 'stay home', 'hiding'],
                'restlessness': ['restless', 'on edge', 'tense', 'keyed up']
            },
            'severe_symptoms': {
                'suicidal': ['suicide', 'kill myself', 'end it', 'better off dead', 'want to die', 'self harm'],
                'psychotic': ['hearing voices', 'seeing things', 'paranoid', 'people watching'],
                'substance': ['drinking', 'alcohol', 'drugs', 'high', 'using']
            },
            'functional_impairment': {
                'work': ['cant work', 'lost job', 'fired', 'quit job', 'missing work'],
                'relationships': ['isolated', 'alone', 'no friends', 'family problems'],
                'daily_living': ['cant get up', 'stay in bed', 'dont shower', 'not taking care']
            },
            'stress': {
                'general': ['stressed', 'overwhelmed', 'pressure', 'burden', 'too much', 'cant cope'],
                'life_events': ['death', 'divorce', 'lost', 'trauma', 'abuse', 'violence']
            }
        }
    
    def calculate_risk_score(self, conversation_text):
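        # Strip apostrophes so that "can't sleep" matches the keyword 'cant sleep'
        text_lower = conversation_text.lower().replace("'", "").replace("’", "")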
        symptoms_found = defaultdict(list)
        severity_multiplier = 1.0
        
        # Check severe symptoms
        severe_found = False
        for category, keywords in self.mental_health_indicators['severe_symptoms'].items():
            found = [kw for kw in keywords if kw in text_lower]
            if found:
                symptoms_found[f'severe_{category}'].extend(found)
                severe_found = True
                severity_multiplier = 2.0
        
        # Depression, anxiety, impairment, stress scores
        depression_score = self._count_symptoms(text_lower, self.mental_health_indicators['depression'], symptoms_found, 'depression')
        anxiety_score = self._count_symptoms(text_lower, self.mental_health_indicators['anxiety'], symptoms_found, 'anxiety')
        impairment_score = self._count_symptoms(text_lower, self.mental_health_indicators['functional_impairment'], symptoms_found, 'impairment')
        stress_score = self._count_symptoms(text_lower, self.mental_health_indicators['stress'], symptoms_found, 'stress')
        
        base_score = (depression_score * 3 + anxiety_score * 2.5 + impairment_score * 2 + stress_score * 1.5) * severity_multiplier
        normalized_score = min(100, base_score * 2)
        
        return {
            'risk_score': round(normalized_score, 2),
            'risk_category': self._categorize_risk(normalized_score, severe_found),
            'symptoms_found': dict(symptoms_found),
            'depression_indicators': depression_score,
            'anxiety_indicators': anxiety_score,
            'impairment_indicators': impairment_score,
            'severe_symptoms': severe_found,
            'total_indicators': len(symptoms_found)
        }
    
    def _count_symptoms(self, text, indicators, symptoms_dict, prefix):
        score = 0
        for category, keywords in indicators.items():
            found = [kw for kw in keywords if kw in text]
            if found:
                symptoms_dict[f'{prefix}_{category}'].extend(found)
                score += len(found)
        return score
    
    def _categorize_risk(self, score, severe_symptoms):
        if severe_symptoms or score >= 70:
            return 'SEVERE'
        elif score >= 45:
            return 'MODERATE'
        elif score >= 20:
            return 'MILD'
        else:
            return 'MINIMAL'

print("✓ Mental Health Assessment algorithm loaded")

## Section 5: Treatment Recommendations (SA NDOH Guidelines)

### Key Contact Numbers:
- **National AIDS Helpline:** 0800 012 322
- **SADAG Mental Health Crisis Line:** 0800 567 567 / 011 262 6396
- **Suicide Crisis Line:** 0800 567 567

### Treatment Framework:
1. Risk stratification
2. Urgency determination
3. Testing/assessment protocols
4. Counseling requirements
5. Referral pathways
6. Treatment initiation
7. Integrated care for comorbidities
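
The `TreatmentRecommender` class itself lives in the main script and is not reproduced in this notebook. As a rough illustration of steps 1 and 2 only, the sketch below maps risk categories to the urgency text given in the Risk Categories lists of Sections 3 and 4. The `stratify` helper and its integrated-care rule are assumptions made for this example, not the author's implementation.

```python
# Urgency text copied from the Risk Categories lists in Sections 3 and 4
HIV_URGENCY = {
    'HIGH': 'Urgent testing within 24 hours, PEP/PrEP assessment',
    'MEDIUM': 'Testing within 1 week, counseling services',
    'LOW': 'Routine testing, prevention education',
}
MH_URGENCY = {
    'SEVERE': 'Emergency psychiatric evaluation',
    'MODERATE': '1-2 week assessment, psychotherapy + medication',
    'MILD': 'Within 1 month, counseling and monitoring',
    'MINIMAL': 'Routine wellness, prevention',
}

def stratify(hiv_category, mh_category):
    """Hypothetical helper covering risk stratification and urgency only."""
    return {
        'hiv_urgency': HIV_URGENCY[hiv_category],
        'mh_urgency': MH_URGENCY[mh_category],
        # Assumption: flag integrated care when both risks exceed the lowest tier
        'integrated_care': hiv_category != 'LOW' and mh_category != 'MINIMAL',
    }

plan = stratify('MEDIUM', 'MODERATE')
print(plan['hiv_urgency'])
print(plan['mh_urgency'])
print(plan['integrated_care'])
```

Keeping the urgency text in lookup tables makes it straightforward to audit against the guideline wording, whatever rule the full recommender uses for the later steps.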

In [None]:
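# Load the TreatmentRecommender class from the main script.
# split() drops the 'class TreatmentRecommender:' header, so restore it before exec.
with open('/home/tendai/healthcare_risk_assessment.py', encoding='utf-8') as f:
    source = f.read()
class_body = source.split('class TreatmentRecommender:')[1].split('# ============================================================================')[0]
exec('class TreatmentRecommender:' + class_body)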

print("✓ Treatment recommendation system loaded")

## Section 6: Analysis of All Conversations

In [None]:
# Initialize assessors
hiv_assessor = HIVRiskAssessor()
mh_assessor = MentalHealthAssessor()
recommender = TreatmentRecommender()

# Analyze all conversations
results = []
for conv in conversations:
    text = conv['full_text']
    hiv_assessment = hiv_assessor.calculate_risk_score(text)
    mh_assessment = mh_assessor.calculate_risk_score(text)
    recommendations = recommender.generate_recommendations(hiv_assessment, mh_assessment)
    
    results.append({
        'conversation_id': conv['conversation_id'],
        'hiv_risk_score': hiv_assessment['risk_score'],
        'hiv_risk_category': hiv_assessment['risk_category'],
        'mh_risk_score': mh_assessment['risk_score'],
        'mh_risk_category': mh_assessment['risk_category'],
        'hiv_assessment': hiv_assessment,
        'mh_assessment': mh_assessment,
        'recommendations': recommendations
    })

df_results = pd.DataFrame(results)
print(f"✓ Analyzed {len(df_results)} conversations")
print("\nFirst 5 results:")
df_results[['conversation_id', 'hiv_risk_score', 'hiv_risk_category', 'mh_risk_score', 'mh_risk_category']].head()

## Section 7: Summary Statistics

In [None]:
print("="*80)
print("COMPREHENSIVE RISK ASSESSMENT SUMMARY")
print("="*80)
print()

print("HIV RISK DISTRIBUTION:")
print("-"*40)
hiv_dist = df_results['hiv_risk_category'].value_counts()
for category, count in hiv_dist.items():
    print(f"  {category:10s}: {count:3d} ({(count/len(df_results)*100):5.1f}%)")

print("\nMENTAL HEALTH RISK DISTRIBUTION:")
print("-"*40)
mh_dist = df_results['mh_risk_category'].value_counts()
for category, count in mh_dist.items():
    print(f"  {category:10s}: {count:3d} ({(count/len(df_results)*100):5.1f}%)")

print("\nRISK SCORES:")
print("-"*40)
print(df_results[['hiv_risk_score', 'mh_risk_score']].describe())

## Section 8: Visualizations

In [None]:
# Create comprehensive visualization
fig = plt.figure(figsize=(16, 10))

# Risk distributions
ax1 = plt.subplot(2, 3, 1)
hiv_counts = df_results['hiv_risk_category'].value_counts()
colors = {'HIGH': '#d62728', 'MEDIUM': '#ff7f0e', 'LOW': '#2ca02c'}
ax1.bar(range(len(hiv_counts)), hiv_counts.values, color=[colors.get(x, '#1f77b4') for x in hiv_counts.index])
ax1.set_xticks(range(len(hiv_counts)))
ax1.set_xticklabels(hiv_counts.index)
ax1.set_title('HIV Risk Distribution', fontweight='bold')
ax1.set_ylabel('Count')
for i, v in enumerate(hiv_counts.values):
    ax1.text(i, v+1, str(v), ha='center', fontweight='bold')

ax2 = plt.subplot(2, 3, 2)
mh_counts = df_results['mh_risk_category'].value_counts()
mh_colors = {'SEVERE': '#d62728', 'MODERATE': '#ff7f0e', 'MILD': '#ffdd57', 'MINIMAL': '#2ca02c'}
ax2.bar(range(len(mh_counts)), mh_counts.values, color=[mh_colors.get(x, '#1f77b4') for x in mh_counts.index])
ax2.set_xticks(range(len(mh_counts)))
ax2.set_xticklabels(mh_counts.index, rotation=45)
ax2.set_title('Mental Health Risk Distribution', fontweight='bold')
ax2.set_ylabel('Count')
for i, v in enumerate(mh_counts.values):
    ax2.text(i, v+1, str(v), ha='center', fontweight='bold')

ax3 = plt.subplot(2, 3, 3)
ax3.scatter(df_results['hiv_risk_score'], df_results['mh_risk_score'], alpha=0.5)
ax3.set_xlabel('HIV Risk Score')
ax3.set_ylabel('Mental Health Risk Score')
ax3.set_title('HIV vs Mental Health Risk', fontweight='bold')
ax3.axhline(45, color='r', linestyle='--', alpha=0.3, label='MH Moderate')
ax3.axvline(30, color='r', linestyle='--', alpha=0.3, label='HIV Medium')
ax3.legend()

plt.tight_layout()
plt.savefig('/mnt/user-data/outputs/notebook_visualizations.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualizations created")

## Section 9: Demonstration Cases

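Because the dataset yields mostly low scores, the hand-written cases below exercise the scorers with different risk profiles. Each case name describes the intended profile; the printed category is whatever the scorer actually assigns: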

In [None]:
# Create demonstration cases
demo_cases = [
    {
        'name': 'HIGH HIV Risk Case',
        'text': """I had unprotected sex with a new partner last week. Now I have discharge and 
                   burning urination. My partner told me they are HIV positive. I'm very worried 
                   and scared. Should I get tested? This is urgent."""
    },
    {
        'name': 'SEVERE Mental Health Case',
        'text': """I can't sleep, I'm sad all the time. I've lost interest in everything. I can't 
                   get out of bed. I'm feeling hopeless and worthless. Sometimes I think about ending 
                   it all. I can't concentrate at work and I'm isolated from friends and family. 
                   I'm drinking to cope."""
    },
    {
        'name': 'Integrated Care Case',
        'text': """I'm pregnant and worried. I had unprotected sex and need testing. I'm also feeling 
                   very anxious and depressed. I'm scared and overwhelmed. Can't sleep, crying all the time. 
                   Worried about my partner and the baby."""
    }
]

print("="*80)
print("DEMONSTRATION CASES - System Capability")
print("="*80)
print()

for demo in demo_cases:
    print("-"*80)
    print(f"CASE: {demo['name']}")
    print("-"*80)
    
    hiv_result = hiv_assessor.calculate_risk_score(demo['text'])
    mh_result = mh_assessor.calculate_risk_score(demo['text'])
    recs = recommender.generate_recommendations(hiv_result, mh_result)
    
    # Collapse the newlines and indentation left by the triple-quoted strings
    print(f"\nPatient Statement: {' '.join(demo['text'].split())[:150]}...")
    print(f"\nHIV Risk: {hiv_result['risk_category']} (Score: {hiv_result['risk_score']}/100)")
    print(f"Mental Health: {mh_result['risk_category']} (Score: {mh_result['risk_score']}/100)")
    print(f"\nUrgency - HIV: {recs['hiv_plan']['urgency']}")
    print(f"Urgency - MH: {recs['mental_health_plan']['urgency']}")
    
    if recs['integrated_considerations']:
        print("\n** INTEGRATED CARE REQUIRED **")
    print()

## Section 10: Detailed Treatment Plan Example

In [None]:
# Show detailed treatment plan for high-risk case
high_risk_text = demo_cases[0]['text']
hiv_assess = hiv_assessor.calculate_risk_score(high_risk_text)
mh_assess = mh_assessor.calculate_risk_score(high_risk_text)
treatment = recommender.generate_recommendations(hiv_assess, mh_assess)

print("="*80)
print("DETAILED TREATMENT PLAN - HIGH RISK CASE")
print("="*80)
print()
print("RISK SUMMARY:")
print(f"  HIV Risk: {treatment['summary']['hiv_risk']} ({treatment['summary']['hiv_score']}/100)")
print(f"  Mental Health: {treatment['summary']['mental_health_risk']} ({treatment['summary']['mental_health_score']}/100)")
print()

print("HIV MANAGEMENT PLAN:")
print("-"*80)
print(f"Urgency: {treatment['hiv_plan']['urgency']}")
print("\nTesting:")
for item in treatment['hiv_plan']['testing']:
    print(f"  • {item}")
print("\nCounseling:")
for item in treatment['hiv_plan']['counseling']:
    print(f"  • {item}")
print("\nReferral:")
for item in treatment['hiv_plan']['referral']:
    print(f"  • {item}")
print("\nPrevention:")
for item in treatment['hiv_plan']['prevention']:
    print(f"  • {item}")

print("\n" + "="*80)
print("Note: Treatment plans are tailored to SA NDOH Guidelines")
print("="*80)

## Section 11: Export Results

In [None]:
# Export results
export_df = df_results[['conversation_id', 'hiv_risk_score', 'hiv_risk_category',
                         'mh_risk_score', 'mh_risk_category']].copy()
export_df.to_csv('/mnt/user-data/outputs/assessment_results.csv', index=False)

print("✓ Results exported to: /mnt/user-data/outputs/assessment_results.csv")
print("\nExport preview:")
export_df.head(10)

## Section 12: Conclusions and Recommendations

### Key Findings:
1. **System Capability**: The risk assessment system scores both HIV and mental health risk from conversation text, using keyword indicators drawn from SA NDOH guidelines

2. **Dataset Characteristics**: The provided synthetic dataset contains generic conversations with minimal specific risk indicators, resulting in predominantly low-risk scores

3. **Treatment Integration**: The system implements integrated care recommendations, recognizing the strong bidirectional relationship between HIV and mental health (as per SA NDOH Mental Health Policy Framework)

### Clinical Value:
- **Early Detection**: Automated screening can identify high-risk individuals requiring urgent intervention
- **Resource Allocation**: Risk stratification enables appropriate triaging of limited healthcare resources
- **Guideline Adherence**: Ensures recommendations align with SA NDOH evidence-based protocols
- **Integrated Care**: Addresses comorbidity between HIV and mental health conditions

### System Strengths:
1. Evidence-based scoring algorithms
2. Multi-dimensional risk assessment (HIV + Mental Health)
3. Urgency classification for prioritization
4. Specific, actionable treatment recommendations
5. References to SA public health resources (national helplines, referral pathways)

### Limitations and Future Enhancements:
1. **Keyword-based Approach**: Current system uses substring keyword matching, with no handling of context or negation; could be enhanced with:
   - Machine learning models (NLP transformers)
   - Contextual understanding (BERT, GPT)
   - Sentiment analysis

2. **Validation Needs**:
   - Clinical validation with real patient data
   - Sensitivity/specificity analysis
   - Comparison with clinician assessments

3. **Additional Features**:
   - Temporal tracking (symptom progression)
   - Risk factor interactions
   - Demographic considerations
   - Treatment adherence monitoring
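
Short of the machine learning approaches listed under the keyword-based limitation, one smaller step is to match keywords on word boundaries instead of as raw substrings. This is not one of the enhancements named above; it is a simpler swap-in, sketched here with a hypothetical `find_keywords` helper.

```python
import re

def find_keywords(text, keywords):
    """Return the keywords that occur as whole words or phrases in text."""
    text_lower = text.lower()
    return [kw for kw in keywords
            if re.search(r'\b' + re.escape(kw) + r'\b', text_lower)]

text = "The medication is causing a rash and I feel sad"
keywords = ['using', 'sad', 'rash']

# Substring matching, as in the current scorers: 'using' fires inside 'causing'
substring_hits = [kw for kw in keywords if kw in text.lower()]
# Word-boundary matching drops that false positive
boundary_hits = find_keywords(text, keywords)

print(substring_hits)  # ['using', 'sad', 'rash']
print(boundary_hits)   # ['sad', 'rash']
```

Word boundaries do not address negation or context ("I am not sad" still matches), so this reduces one class of false positives without replacing the enhancements listed above.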

### Deployment Considerations:
1. **Privacy & Security**: POPIA compliance, data encryption, secure storage
2. **Clinical Oversight**: System as decision support, not replacement for clinical judgment
3. **Quality Assurance**: Regular audits, feedback loops, continuous improvement
4. **Integration**: EMR integration, referral system connectivity

### References:
- SA NDOH HIV Testing Services Policy (2016)
- SA NDOH Mental Health Policy Framework and Strategic Plan (2023-2030)
- Mental Health Care Act 17 of 2002
- Primary Care 101 Guidelines
- SA National Strategic Plan for HIV, TB and STIs (2017-2022)

---

**For questions or feedback, please contact the development team.**