# Regulatory-Compliant Drug Causality Assessment

**Pharmacovigilance Analysis per FDA/EMA Guidelines**

This tool assists in:
- Adverse event detection
- Pharmacovigilance screening  
- Clinical report analysis
- Regulatory compliance (FDA/EMA)

## Features
- WHO-UMC Causality Assessment
- Naranjo ADR Probability Scale
- Section-wise analysis (Abstract, Methods, Results, Discussion, Conclusion)
- Drug-specific causality reports
- Comprehensive Word document generation

## Model Performance
- F1 Score: 0.9759
- Accuracy: 0.9759
- Sensitivity: 0.9868
- Specificity: 0.9650
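
For reference, these four metrics are conventionally derived from a binary confusion matrix. The sketch below shows the standard formulas only. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not the evaluation data behind the figures above. The notebook does not say whether the reported F1 is for the positive class or an average over classes; this sketch computes the positive-class F1.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and at
    least one positive prediction).
    """
    sensitivity = tp / (tp + fn)   # recall on the positive ("related") class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        'f1': f1,
        'accuracy': accuracy,
        'sensitivity': sensitivity,
        'specificity': specificity,
    }


# Illustrative counts only (hypothetical, not the model's real test set)
for name, value in binary_metrics(tp=90, fp=5, tn=95, fn=10).items():
    print(f'{name}: {value:.4f}')
```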

## 1. Install Dependencies

In [1]:
# Install required packages
!pip install torch transformers pandas numpy scikit-learn nltk PyPDF2 safetensors ipywidgets python-docx -q
print('✓ All packages installed!')

✓ All packages installed!


## 2. Import Libraries

In [2]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd() / 'src'))

import os
import json
from datetime import datetime
from regulatory_causality_report import create_regulatory_report

print('✓ Libraries imported successfully!')

✓ punkt already available
✓ punkt_tab already available
✓ Libraries imported successfully!


## 3. Upload PDF and Generate Regulatory Report

Upload your PDF document to generate a comprehensive regulatory-compliant causality assessment report.

In [3]:
from ipywidgets import FileUpload, Button, VBox, HTML, Output
from IPython.display import display, clear_output

# Create upload widget
upload_widget = FileUpload(accept='.pdf', multiple=False)
analyze_button = Button(
    description='Generate Regulatory Report',
    button_style='success',
    disabled=True,
    tooltip='Upload a PDF first'
)
output_area = Output()

uploaded_file_path = None

def save_uploaded_file(change):
    global uploaded_file_path
    if upload_widget.value:
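        # ipywidgets 7 exposes `value` as a dict keyed by filename;
        # ipywidgets 8 exposes it as a tuple of per-file dicts.
        if isinstance(upload_widget.value, dict):
            uploaded_file = list(upload_widget.value.values())[0]
            filename = uploaded_file['metadata']['name']
        else:
            uploaded_file = upload_widget.value[0]
            filename = uploaded_file['name']
        filename = Path(filename).name  # drop any directory components
        content = uploaded_file['content']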
        
        # Save to data/raw directory
        os.makedirs('./data/raw', exist_ok=True)
        uploaded_file_path = f'./data/raw/{filename}'
        
        with open(uploaded_file_path, 'wb') as f:
            f.write(content)
        
        analyze_button.disabled = False
        
        with output_area:
            clear_output()
            print(f'✓ File uploaded: {filename}')
            print(f'✓ Saved to: {uploaded_file_path}')
            print('\nClick "Generate Regulatory Report" to process.')
            print('\nThis will create:')
            print('  - Comprehensive Word document with:')
            print('    • Drug-specific causality analysis')
            print('    • Section-wise breakdown (Abstract, Methods, Results, etc.)')
            print('    • WHO-UMC Causality Categories')
            print('    • Naranjo ADR Probability Scores')
            print('    • FDA/EMA regulatory context')
            print('    • Clinical significance assessments')
            print('  - JSON summary with statistics')

def analyze_pdf(b):
    global uploaded_file_path
    with output_area:
        if not uploaded_file_path or not os.path.exists(uploaded_file_path):
            print('⚠ Please upload a PDF file first!')
            return
        
        clear_output()
        print(f'Generating regulatory report for: {uploaded_file_path}\n')
        print('=' * 80)
        print('This may take a few minutes...')
        print('=' * 80)
        
        try:
            # Generate regulatory report
            doc_path, json_path = create_regulatory_report(uploaded_file_path)
            
            # Display results
            print('\n' + '=' * 80)
            print('✓ REGULATORY REPORT GENERATED SUCCESSFULLY')
            print('=' * 80)
            
            # Read and display summary
            with open(json_path, 'r') as f:
                summary = json.load(f)
            
            print(f"\nDocument: {summary['pdf_file']}")
            print(f"Analysis Date: {summary['analysis_date']}")
            print(f"\nStatistics:")
            print(f"  - Total Sentences: {summary['total_sentences']}")
            print(f"  - Unique Drugs: {summary['total_drugs']}")
            print(f"  - Unique Events: {summary['total_events']}")
            
            print(f"\nKey Drugs Identified:")
            for drug_stat in summary['drug_statistics'][:10]:
                print(f"  • {drug_stat['drug']}: {drug_stat['related_count']} related sentences ({drug_stat['max_confidence']*100:.2f}% confidence)")
            
            print(f"\nFiles Generated:")
            print(f"  📄 Word Report: {doc_path}")
            print(f"  📊 JSON Summary: {json_path}")
            
            print(f"\n{'=' * 80}")
            print('The Word document contains:')
            print('  ✓ Executive Summary')
            print('  ✓ Key Drugs Identified with confidence scores')
            print('  ✓ Quality Metrics')
            print('  ✓ Detailed Drug-Event Analysis (organized by drug)')
            print('  ✓ Section-wise causality statements')
            print('  ✓ WHO-UMC Causality Categories')
            print('  ✓ Naranjo ADR Probability Scores')
            print('  ✓ FDA/EMA Regulatory Context')
            print('  ✓ Clinical Significance Assessments')
            print('=' * 80)
            
        except Exception as e:
            print(f'❌ Error: {str(e)}')
            import traceback
            traceback.print_exc()

upload_widget.observe(save_uploaded_file, names='value')
analyze_button.on_click(analyze_pdf)

display(VBox([
    HTML('<h3>📄 Upload PDF for Regulatory Causality Assessment</h3>'),
    HTML('<p><b>Generates comprehensive report with WHO-UMC and Naranjo assessments</b></p>'),
    upload_widget,
    analyze_button,
    output_area
]))


## 4. View Generated Reports

List all generated regulatory reports.

In [4]:
import glob
from pathlib import Path

# Find all regulatory reports
word_reports = glob.glob('./results/*_regulatory_report_*.docx')
json_reports = glob.glob('./results/*_regulatory_summary_*.json')

print('=' * 80)
print('GENERATED REGULATORY REPORTS')
print('=' * 80)

if word_reports:
    print(f'\nWord Reports ({len(word_reports)}):') 
    for report in sorted(word_reports, reverse=True):
        size = Path(report).stat().st_size / 1024
        print(f'  📄 {Path(report).name} ({size:.1f} KB)')
else:
    print('\nNo Word reports found yet.')

if json_reports:
    print(f'\nJSON Summaries ({len(json_reports)}):')
    for report in sorted(json_reports, reverse=True):
        size = Path(report).stat().st_size / 1024
        print(f'  📊 {Path(report).name} ({size:.1f} KB)')
else:
    print('\nNo JSON summaries found yet.')

print('\n' + '=' * 80)

GENERATED REGULATORY REPORTS

Word Reports (2):
  📄 zh801708001593_regulatory_report_20251102_000636.docx (37.7 KB)
  📄 fphar-16-1498191_regulatory_report_20251102_000618.docx (47.7 KB)

JSON Summaries (2):
  📊 zh801708001593_regulatory_summary_20251102_000636.json (0.5 KB)
  📊 fphar-16-1498191_regulatory_summary_20251102_000618.json (3.1 KB)
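
The JSON summaries can also be inspected programmatically, for example to rank drugs by confidence. The sketch below assumes the keys read in section 3 (`drug_statistics`, with `drug`, `related_count`, and `max_confidence` per entry) and that the files are UTF-8 encoded. The helper names `load_summary` and `top_drugs` and the sample values are hypothetical.

```python
import json


def load_summary(path) -> dict:
    """Load one *_regulatory_summary_*.json file (UTF-8 assumed)."""
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


def top_drugs(summary: dict, n: int = 5) -> list:
    """Return the n drug entries with the highest max_confidence."""
    stats = summary.get('drug_statistics', [])
    return sorted(stats, key=lambda d: d['max_confidence'], reverse=True)[:n]


# Hypothetical summary with made-up values, shaped like the keys used above
example = {
    'drug_statistics': [
        {'drug': 'drug_a', 'related_count': 3, 'max_confidence': 0.91},
        {'drug': 'drug_b', 'related_count': 7, 'max_confidence': 0.98},
        {'drug': 'drug_c', 'related_count': 1, 'max_confidence': 0.75},
    ]
}
for entry in top_drugs(example, n=2):
    print(f"{entry['drug']}: {entry['related_count']} related sentences")
```

With real output, something like `top_drugs(load_summary(path))` could be called for each path in `json_reports`.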



## 5. Batch Process Multiple PDFs

Process all PDF files in the data/raw directory.

In [5]:
import glob

# Find all PDF files
pdf_files = glob.glob('./data/raw/*.pdf')

if pdf_files:
    print(f'Found {len(pdf_files)} PDF files\n')
    print('=' * 80)
    
    for i, pdf_path in enumerate(pdf_files, 1):
        print(f'\nProcessing {i}/{len(pdf_files)}: {Path(pdf_path).name}')
        print('-' * 80)
        
        try:
            doc_path, json_path = create_regulatory_report(pdf_path)
            print(f'✓ Report generated: {Path(doc_path).name}')
        except Exception as e:
            print(f'✗ Error: {e}')
    
    print('\n' + '=' * 80)
    print('✓ BATCH PROCESSING COMPLETE')
    print('=' * 80)
else:
    print('No PDF files found in ./data/raw/ directory')

Found 1 PDF files


Processing 1/1: fphar-16-1498191.pdf
--------------------------------------------------------------------------------

GENERATING REGULATORY CAUSALITY REPORT
PDF: fphar-16-1498191.pdf

Loading BioBERT model...
✓ Model loaded

Extracting text from PDF...
✓ Extracted 47089 characters

Tokenizing sentences...
✓ Found 415 sentences

Analyzing causality for each sentence...
  Processed 50/415 sentences...
  Processed 100/415 sentences...
  Processed 150/415 sentences...
  Processed 200/415 sentences...
  Processed 250/415 sentences...
  Processed 300/415 sentences...
  Processed 350/415 sentences...
  Processed 400/415 sentences...

✓ Analysis complete
  - Identified 32 unique drugs
  - Identified 8 unique events

Generating Word report...

✓ Report saved: results\fphar-16-1498191_regulatory_report_20251102_002149.docx
✓ JSON summary saved: results\fphar-16-1498191_regulatory_summary_20251102_002149.json
✓ Report generated: fphar-16-1498191_regulatory_repor

## 6. Understanding the Report

### Report Structure

The generated Word document contains:

#### 1. Executive Summary
- Total sentences analyzed
- Drug-event sentences identified
- Causality-related sentences
- Unique drugs and events

#### 2. Key Drugs Identified
- List of all drugs with causality signals
- Associated adverse events
- Confidence scores
- Number of related sentences

#### 3. Quality Metrics
- Model performance statistics
- Confidence thresholds
- Average confidence scores

#### 4. Detailed Drug Analysis
For each drug:
- **Section-wise breakdown** (Abstract, Methods, Results, Discussion, Conclusion)
- **Causality sentences** from each section
- **Classification** (Related/Not Related)
- **Confidence scores**
- **WHO-UMC Causality Category**
  - Certain/Definite
  - Probable/Likely
  - Possible
  - Unlikely
  - Conditional/Unclassified
  - Unassessable/Unclassifiable
- **Naranjo ADR Probability Score** (-4 to +13 scale)
  - Definite (≥9)
  - Probable (5-8)
  - Possible (1-4)
  - Doubtful (≤0)

#### 5. Regulatory Assessment
For each drug:
- **FDA/EMA Guidelines** context
- **Clinical Significance** explanation
- **Recommended Actions**
  - Risk Management Plan (RMP)
  - Periodic Safety Update Report (PSUR)
  - Label updates
  - Post-marketing surveillance

### Causality Assessment Scales

#### WHO-UMC Causality Categories
1. **Certain**: Clear temporal relationship, no alternative explanation
2. **Probable/Likely**: Reasonable time relationship, unlikely other causes
3. **Possible**: Reasonable time relationship, other factors possible
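4. **Unlikely**: Time relationship makes causality improbable (but not impossible); other factors provide a plausible explanation
5. **Conditional/Unclassified**: More data needed
6. **Unassessable/Unclassifiable**: Cannot be judged because information is insufficient or contradictory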

#### Naranjo ADR Probability Scale
- Score range: -4 to +13
- Based on 10 questions about the adverse drug reaction
- Categories:
  - **Definite**: ≥9 points
  - **Probable**: 5-8 points
  - **Possible**: 1-4 points
  - **Doubtful**: ≤0 points
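
The score-to-category mapping above can be sketched as a small function. This is a minimal illustration of the thresholds listed here, not the code used inside `regulatory_causality_report`; the name `naranjo_category` is hypothetical.

```python
def naranjo_category(score: int) -> str:
    """Map a total Naranjo score (-4 to +13) to its probability category."""
    if not -4 <= score <= 13:
        raise ValueError(f'Naranjo score out of range: {score}')
    if score >= 9:
        return 'Definite'
    if score >= 5:
        return 'Probable'
    if score >= 1:
        return 'Possible'
    return 'Doubtful'  # score <= 0


# Print the category at each threshold boundary
for s in (-4, 0, 1, 4, 5, 8, 9, 13):
    print(s, naranjo_category(s))
```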