# Hepatitis C Prediction - Making Predictions

This notebook demonstrates how to use the trained model to make predictions on new patient data.

## Features Required for Prediction

The model expects 12 numerical features, in this order:
1. **Age**: Patient age
2. **Sex**: 0=Female, 1=Male
3. **ALB**: Albumin level
4. **ALP**: Alkaline phosphatase level
5. **ALT**: Alanine aminotransferase level
6. **AST**: Aspartate aminotransferase level
7. **BIL**: Bilirubin level
8. **CHE**: Cholinesterase level
9. **CHOL**: Cholesterol level
10. **CREA**: Creatinine level
11. **GGT**: Gamma-glutamyl transferase level
12. **PROT**: Protein level
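The scaler and model receive only a bare array, so the values must follow the order of the list above. Here is a minimal sketch that builds that array from a dictionary by feature name rather than by key insertion order. `FEATURE_ORDER` and `to_feature_vector` are hypothetical names introduced for illustration. They are not part of the project's `src` package.

```python
# Hypothetical helper: order a patient's values by feature name, not by dict key order.
FEATURE_ORDER = ['Age', 'Sex', 'ALB', 'ALP', 'ALT', 'AST',
                 'BIL', 'CHE', 'CHOL', 'CREA', 'GGT', 'PROT']


def to_feature_vector(patient: dict):
    """Return the 12 values in FEATURE_ORDER, rejecting missing or unknown keys."""
    missing = [name for name in FEATURE_ORDER if name not in patient]
    unknown = [key for key in patient if key not in FEATURE_ORDER]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return [float(patient[name]) for name in FEATURE_ORDER]


# Keys deliberately listed in a different order from FEATURE_ORDER.
shuffled = {'PROT': 69.0, 'Age': 45, 'Sex': 1, 'ALB': 38.5, 'ALP': 52.5,
            'ALT': 7.7, 'AST': 22.1, 'BIL': 7.5, 'CHE': 6.93, 'CHOL': 3.23,
            'CREA': 106.0, 'GGT': 12.1}
print(to_feature_vector(shuffled))
```

Raising on unknown keys also catches typos such as `'Alb'`, which would otherwise go unnoticed.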

## 1. Import Libraries and Load Model

In [3]:
import sys
import os
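sys.path.append('../src')  # make the project's src/ modules importable

import torch
import numpy as np
import pandas as pd
import pickle
from models import HepatitisNet, load_model

# Use the GPU when available, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_path = '../models/hepatitis_model.pth'
# load_model (from src/models) returns a pair; only the model is needed here
model, _ = load_model(model_path, HepatitisNet, input_size=12)
model = model.to(device)
print(f"Model loaded from: {model_path}")

# Preprocessing info saved during training, including the fitted scaler.
# pickle.load can run arbitrary code, so only load files you trust.
preprocessing_path = '../data/processed/preprocessing_info.pkl'
with open(preprocessing_path, 'rb') as f:
    preprocessing_info = pickle.load(f)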

scaler = preprocessing_info['scaler']
print(f"Scaler loaded from: {preprocessing_path}")

print(f"Using device: {device}")

Model loaded from: ../models/hepatitis_model.pth
Scaler loaded from: ../data/processed/preprocessing_info.pkl
Using device: cpu
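`predict_hepatitis` applies the scaler loaded above to every input. This notebook does not show the pickle's contents. Assuming `preprocessing_info['scaler']` is a scikit-learn `StandardScaler`, `transform` maps each feature to z = (x - mean) / scale, using the per-feature mean and population standard deviation learned on the training split. The sketch below reproduces that arithmetic in plain Python. The two-feature training rows are invented for illustration.

```python
import statistics


def fit_standardizer(rows):
    """Per-column mean and population standard deviation (ddof=0, as StandardScaler uses)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A zero-variance column would divide by zero; use 1.0 instead.
    scales = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, scales


def standardize(row, means, scales):
    """z = (x - mean) / scale, feature by feature."""
    return [(x - m) / s for x, m, s in zip(row, means, scales)]


# Toy "training" rows (Age, ALB), invented for illustration only.
train = [[30.0, 40.0], [50.0, 36.0], [40.0, 38.0]]
means, scales = fit_standardizer(train)
print(standardize([40.0, 38.0], means, scales))  # the training mean maps to zeros
```

The statistics come from the training data only. This is why new patients must go through the saved scaler rather than a scaler refitted on the new data.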


## 2. Create Prediction Function

In [4]:
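feature_names = ['Age', 'Sex', 'ALB', 'ALP', 'ALT', 'AST', 'BIL', 'CHE', 'CHOL', 'CREA', 'GGT', 'PROT']

def predict_hepatitis(patient_data, model, scaler, device):
    """Predict for one patient; returns (class index, softmax probability of that class)."""
    # Dict values are looked up by feature name, so key order does not matter;
    # lists and arrays must already follow feature_names.
    if isinstance(patient_data, dict):
        features = [patient_data[name] for name in feature_names]
    else:
        features = patient_data

    # Shape (1, 12): a single sample for the scaler and the model
    features = np.asarray(features, dtype=np.float64).reshape(1, -1)

    # Apply the scaler fitted on the training data
    features_scaled = scaler.transform(features)

    features_tensor = torch.tensor(features_scaled, dtype=torch.float32, device=device)

    model.eval()
    with torch.no_grad():
        output = model(features_tensor)
        probabilities = torch.softmax(output, dim=1)
        prediction = output.argmax(dim=1).item()
        confidence = probabilities[0, prediction].item()

    return prediction, confidence

print("Feature order:", feature_names)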

Feature order: ['Age', 'Sex', 'ALB', 'ALP', 'ALT', 'AST', 'BIL', 'CHE', 'CHOL', 'CREA', 'GGT', 'PROT']
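The confidence that `predict_hepatitis` reports is the softmax probability of the predicted class. This notebook uses 0/1 labels, which implies two output logits. With two logits, that probability is always at least 0.5. The sketch below shows the same arithmetic in plain Python on made-up logits. `softmax` and `predict_from_logits` are illustrative helpers, not project code.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(logits):
    """Mirror of the argmax/confidence step: (class index, probability of that class)."""
    probs = softmax(logits)
    prediction = max(range(len(probs)), key=probs.__getitem__)
    return prediction, probs[prediction]


# Illustrative logits for [No Hepatitis C, Hepatitis C].
print(predict_from_logits([2.0, -1.0]))
```

Softmax is monotonic, so taking the argmax of the raw logits (as `predict_hepatitis` does) picks the same class as taking the argmax of the probabilities.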


## 3. Example Predictions

In [5]:
# Patient 1 as a dict of feature values
patient_1 = {
    'Age': 45,
    'Sex': 1, 
    'ALB': 38.5,
    'ALP': 52.5,
    'ALT': 7.7,
    'AST': 22.1,
    'BIL': 7.5,
    'CHE': 6.93,
    'CHOL': 3.23,
    'CREA': 106.0,
    'GGT': 12.1,
    'PROT': 69.0
}

prediction_1, confidence_1 = predict_hepatitis(patient_1, model, scaler, device)
result_1 = "Hepatitis C" if prediction_1 == 1 else "No Hepatitis C"
print(f"Patient 1 Prediction: {result_1} (Confidence: {confidence_1:.3f})")

# Patients 2 and 3 as plain lists, in feature_names order
patient_2_data = [32, 0, 40.2, 74.0, 25.3, 31.2, 6.8, 7.2, 4.1, 85.0, 18.4, 72.5]
prediction_2, confidence_2 = predict_hepatitis(patient_2_data, model, scaler, device)
result_2 = "Hepatitis C" if prediction_2 == 1 else "No Hepatitis C"
print(f"Patient 2 Prediction: {result_2} (Confidence: {confidence_2:.3f})")

patient_3_data = [55, 1, 35.1, 95.2, 45.8, 62.3, 12.4, 5.8, 2.9, 125.0, 87.6, 68.2]
prediction_3, confidence_3 = predict_hepatitis(patient_3_data, model, scaler, device)
result_3 = "Hepatitis C" if prediction_3 == 1 else "No Hepatitis C"
print(f"Patient 3 Prediction: {result_3} (Confidence: {confidence_3:.3f})")

Patient 1 Prediction: No Hepatitis C (Confidence: 0.995)
Patient 2 Prediction: No Hepatitis C (Confidence: 0.989)
Patient 3 Prediction: No Hepatitis C (Confidence: 0.928)




## 4. Batch Predictions from CSV

In [6]:
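def predict_batch_from_csv(csv_path, model, scaler, device, save_results=True):
    """Predict for every row of a CSV file that contains the 12 feature columns."""
    df = pd.read_csv(csv_path)

    feature_cols = ['Age', 'Sex', 'ALB', 'ALP', 'ALT', 'AST', 'BIL', 'CHE', 'CHOL', 'CREA', 'GGT', 'PROT']
    missing = [col for col in feature_cols if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")

    predictions = []
    confidences = []

    # Columns are selected by name, so their position in the file does not matter
    for _, row in df[feature_cols].iterrows():
        pred, conf = predict_hepatitis(row.to_numpy(dtype=float), model, scaler, device)
        predictions.append(pred)
        confidences.append(conf)

    df['Prediction'] = predictions
    df['Confidence'] = confidences
    df['Result'] = df['Prediction'].map({0: 'No Hepatitis C', 1: 'Hepatitis C'})

    if save_results:
        # e.g. patients.csv -> patients_predictions.csv, next to the input file
        root, ext = os.path.splitext(csv_path)
        output_path = f"{root}_predictions{ext}"
        df.to_csv(output_path, index=False)
        print(f"Results saved to: {output_path}")

    return df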

test_data_path = '../data/processed/X_test.csv'
if os.path.exists(test_data_path):
    print("Making predictions on test data...")
    # Load test features. predict_hepatitis scales its input, so this file must hold
    # unscaled values; if X_test.csv was saved after scaling, skip the scaler instead.
    X_test = pd.read_csv(test_data_path)

    predictions = []
    confidences = []

    for _, row in X_test.iterrows():
        pred, conf = predict_hepatitis(row.values, model, scaler, device)
        predictions.append(pred)
        confidences.append(conf)

    # Build the full table so every column has the same length, then show the first 10 rows
    results_df = pd.DataFrame({
        'Patient_ID': range(len(predictions)),
        'Prediction': predictions,
        'Confidence': confidences,
        'Result': ['Hepatitis C' if p == 1 else 'No Hepatitis C' for p in predictions]
    })

    print("\nFirst 10 predictions:")
    print(results_df.head(10))
else:
    print("Test data not found. You can create a CSV file with patient data to make batch predictions.")

Test data not found. You can create a CSV file with patient data to make batch predictions.
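`predict_batch_from_csv` selects the 12 columns by name, so the input file needs a header that contains all of them. Extra columns, such as a patient ID, are carried through to the output. The sketch below uses only the standard library. It checks the header and extracts each row in model order before any scoring. `read_feature_rows` is a hypothetical helper, and the sample CSV text is invented.

```python
import csv
import io

FEATURE_COLS = ['Age', 'Sex', 'ALB', 'ALP', 'ALT', 'AST',
                'BIL', 'CHE', 'CHOL', 'CREA', 'GGT', 'PROT']


def read_feature_rows(handle):
    """Yield one list of 12 floats per CSV row, in FEATURE_COLS order."""
    reader = csv.DictReader(handle)
    missing = [c for c in FEATURE_COLS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    for row_no, record in enumerate(reader, start=1):
        try:
            # Short rows give None for absent fields (TypeError); blanks give ValueError
            yield [float(record[c]) for c in FEATURE_COLS]
        except (TypeError, ValueError) as err:
            raise ValueError(f"data row {row_no}: non-numeric value ({err})") from err


sample = io.StringIO(
    "Patient_ID,Age,Sex,ALB,ALP,ALT,AST,BIL,CHE,CHOL,CREA,GGT,PROT\n"
    "p1,45,1,38.5,52.5,7.7,22.1,7.5,6.93,3.23,106.0,12.1,69.0\n"
)
rows = list(read_feature_rows(sample))
print(rows)
```

This is a generator, so the checks run when iteration starts. Wrapping the call in `list(...)` surfaces any problem before the model is called.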


## 5. Interactive Prediction Interface

In [7]:
def interactive_prediction():
    """Prompt for the 12 features in model order and print the prediction."""
    print("=== Hepatitis C Prediction Interface ===")
    print("Please enter the following patient information:")
    print()

    try:
        age = float(input("Age: "))
        sex = int(input("Sex (0=Female, 1=Male): "))
        if sex not in (0, 1):
            raise ValueError("Sex must be 0 or 1")
        alb = float(input("ALB (Albumin): "))
        alp = float(input("ALP (Alkaline phosphatase): "))
        alt = float(input("ALT (Alanine aminotransferase): "))
        ast = float(input("AST (Aspartate aminotransferase): "))
        bil = float(input("BIL (Bilirubin): "))
        che = float(input("CHE (Cholinesterase): "))
        chol = float(input("CHOL (Cholesterol): "))
        crea = float(input("CREA (Creatinine): "))
        ggt = float(input("GGT (Gamma-glutamyl transferase): "))
        prot = float(input("PROT (Protein): "))

        # Same order as feature_names
        patient_data = [age, sex, alb, alp, alt, ast, bil, che, chol, crea, ggt, prot]
        prediction, confidence = predict_hepatitis(patient_data, model, scaler, device)

        result = "Hepatitis C" if prediction == 1 else "No Hepatitis C"
        print("\n=== PREDICTION RESULT ===")
        print(f"Prediction: {result}")
        print(f"Confidence: {confidence:.3f} ({confidence*100:.1f}%)")

        # Heuristic thresholds on the softmax probability
        if confidence < 0.7:
            print("Low confidence - consider additional testing")
        elif confidence > 0.9:
            print("High confidence prediction")
        else:
            print("Moderate confidence prediction")

    except ValueError:
        print("Invalid input. Enter numeric values only; Sex must be 0 or 1.")
    except KeyboardInterrupt:
        print("\nPrediction cancelled.")

print("To use the interactive prediction interface, uncomment the last line and run this cell.")
# interactive_prediction()

To use the interactive prediction interface, uncomment the last line and run this cell.
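`interactive_prediction` relies on `float()` and `int()` raising `ValueError`. Those calls still accept some inputs that make no sense here: `float("nan")` succeeds, and `int("2")` yields an invalid Sex code. Here is a sketch of stricter parsing that can be tested without `input()`. `parse_responses` is a hypothetical helper that takes the typed strings as a dict keyed by feature name.

```python
import math

FEATURE_ORDER = ['Age', 'Sex', 'ALB', 'ALP', 'ALT', 'AST',
                 'BIL', 'CHE', 'CHOL', 'CREA', 'GGT', 'PROT']


def parse_responses(responses):
    """Convert raw text answers to 12 floats in model order, with basic sanity checks."""
    values = []
    for name in FEATURE_ORDER:
        text = responses.get(name, '').strip()
        try:
            value = float(text)
        except ValueError:
            raise ValueError(f"{name}: {text!r} is not a number") from None
        if not math.isfinite(value):
            raise ValueError(f"{name}: value must be finite")
        if name == 'Sex' and value not in (0.0, 1.0):
            raise ValueError("Sex must be 0 (Female) or 1 (Male)")
        if name != 'Sex' and value < 0:
            # Ages and lab concentrations cannot be negative
            raise ValueError(f"{name}: negative values are not meaningful")
        values.append(value)
    return values


answers = {'Age': '45', 'Sex': '1', 'ALB': '38.5', 'ALP': '52.5', 'ALT': '7.7',
           'AST': '22.1', 'BIL': '7.5', 'CHE': '6.93', 'CHOL': '3.23',
           'CREA': '106', 'GGT': '12.1', 'PROT': '69'}
print(parse_responses(answers))
```

Each error message names the offending field. An interactive loop could then re-prompt for just that value instead of discarding all twelve answers.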


## Summary

This notebook covers the steps needed to make predictions with the trained Hepatitis C model:

1. **Model Loading**: Load the trained model and the preprocessing scaler
2. **Prediction Function**: `predict_hepatitis` predicts for a single patient
3. **Example Predictions**: Run the function on three sample patients
4. **Batch Processing**: `predict_batch_from_csv` scores every row of a CSV file
5. **Interactive Interface**: Command-line prompts for entering one patient's values

### Usage Notes:
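- All 12 features must be provided in the order given by `feature_names`
- Features are scaled automatically with the scaler fitted during training
- Each prediction returns the class (0 = No Hepatitis C, 1 = Hepatitis C) and a confidence score, the softmax probability of that class
- The >90% and <70% thresholds are heuristics: softmax confidence is not necessarily calibrated, so a high score alone does not guarantee a correct prediction
- Low-confidence (<70%) predictions may need additional medical evaluation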
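You can check whether high-confidence predictions really are more reliable on held-out labeled data. The sketch below assumes that true labels, predicted labels and confidences are available as parallel lists. The example numbers are invented. `confidence_band` and `accuracy_by_band` are hypothetical helpers, and they reuse the notebook's 0.7/0.9 cut-offs.

```python
from collections import defaultdict


def confidence_band(confidence):
    """Same thresholds as interactive_prediction: <0.7 low, >0.9 high, otherwise moderate."""
    if confidence < 0.7:
        return 'low'
    if confidence > 0.9:
        return 'high'
    return 'moderate'


def accuracy_by_band(y_true, y_pred, confidences):
    """Return {band: (number of predictions, accuracy)} for each band that occurs."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, conf in zip(y_true, y_pred, confidences):
        band = confidence_band(conf)
        counts[band] += 1
        hits[band] += int(truth == pred)
    return {band: (counts[band], hits[band] / counts[band]) for band in counts}


# Invented values, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 1]
conf = [0.99, 0.95, 0.93, 0.65, 0.80, 0.75]
print(accuracy_by_band(y_true, y_pred, conf))
```

If accuracy in the high band is not clearly higher than in the other bands on real test data, the thresholds should not be read as reliability guarantees.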

### Feature Reference:
- **Age**: Patient age in years
- **Sex**: 0=Female, 1=Male
- **ALB**: Albumin level (g/L)
- **ALP**: Alkaline phosphatase (U/L)
- **ALT**: Alanine aminotransferase (U/L)
- **AST**: Aspartate aminotransferase (U/L)
- **BIL**: Bilirubin (μmol/L)
- **CHE**: Cholinesterase (kU/L)
- **CHOL**: Cholesterol (mmol/L)
- **CREA**: Creatinine (μmol/L)
- **GGT**: Gamma-glutamyl transferase (U/L)
- **PROT**: Protein (g/L)
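The model expects values in the units listed above. Lab reports in conventional (US) units therefore need converting first. The sketch below uses the standard factors: bilirubin 1 mg/dL = 17.1 μmol/L, creatinine 1 mg/dL = 88.4 μmol/L, cholesterol 1 mg/dL = 1/38.67 mmol/L, 1 g/dL = 10 g/L and 1 kU/L = 1000 U/L. The `MODEL_UNITS` and `CONVERSIONS` tables and the `to_model_units` helper are hypothetical. Check the factors against the reporting laboratory's conventions.

```python
# Hypothetical tables. Units the model expects (Feature Reference above, lab values only):
MODEL_UNITS = {'ALB': 'g/L', 'ALP': 'U/L', 'ALT': 'U/L', 'AST': 'U/L', 'BIL': 'μmol/L',
               'CHE': 'kU/L', 'CHOL': 'mmol/L', 'CREA': 'μmol/L', 'GGT': 'U/L',
               'PROT': 'g/L'}

# (feature, unit on the report) -> multiplier into the model's unit
CONVERSIONS = {
    ('ALB', 'g/dL'): 10.0,          # g/dL -> g/L
    ('PROT', 'g/dL'): 10.0,         # g/dL -> g/L
    ('BIL', 'mg/dL'): 17.1,         # mg/dL -> μmol/L
    ('CREA', 'mg/dL'): 88.4,        # mg/dL -> μmol/L
    ('CHOL', 'mg/dL'): 1 / 38.67,   # mg/dL -> mmol/L
    ('CHE', 'U/L'): 1 / 1000,       # U/L -> kU/L
}


def to_model_units(feature, value, unit):
    """Convert one lab value; raise rather than guess when no conversion is known."""
    if unit == MODEL_UNITS.get(feature):
        return value
    try:
        return value * CONVERSIONS[(feature, unit)]
    except KeyError:
        raise ValueError(f"no conversion for {feature} from {unit}") from None


print(to_model_units('BIL', 0.5, 'mg/dL'))  # bilirubin reported in mg/dL
print(to_model_units('ALT', 7.7, 'U/L'))    # already in the model's unit
```

An unknown unit raises an error instead of passing through. Otherwise a value in the wrong unit would silently reach the scaler.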