<a href="https://colab.research.google.com/github/Jules-gatete/Mission_Capstone/blob/main/UmutiSafe_ML_Model_Development_Notebook2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Complete structure for OCR + Rule-Based Disposal Classification System**

---



# UmutiSafe: Intelligent Medicine Disposal Guidance System


- **Author**: [Your Name]
- **Institution**: [Your Institution]
- **Date**: [Date]

# RESEARCH CONTRIBUTION
This project demonstrates how pre-trained state-of-the-art models can be effectively
integrated with domain-specific knowledge for healthcare applications, achieving high
accuracy without requiring massive training datasets.

# **SECTION 1: PROJECT OVERVIEW & METHODOLOGY**


# UmutiSafe: Transfer Learning Approach to Medicine Disposal

## Problem Statement
90%+ of Rwandan households dispose of medicines improperly, causing:
- Environmental contamination
- Antimicrobial resistance
- Accidental poisoning
- Public health risks

## Research Approach: Why Transfer Learning?

### Traditional Approach (Not Feasible):
- Collect 10,000+ labeled medicine images
- Train CNN from scratch
- Requires: Months of data collection, GPU resources, expertise
- Result: Likely lower accuracy due to limited training data

### Our Approach (Transfer Learning + System Integration):
- Leverage pre-trained OCR models (trained on millions of images)
- Integrate with Rwanda FDA medicine database
- Combine with expert-designed disposal rules
- Result: Higher accuracy, faster deployment, explainable decisions

## System Components

### ML Component 1: Transfer Learning for Text Extraction
- **Base Model**: EasyOCR (CRAFT + CRNN)
- **Pre-trained on**: SynthText dataset (800K images), ICDAR datasets
- **Transfer Learning**: Use frozen weights, adapt to medicine packages
- **Our Contribution**: Preprocessing pipeline, confidence thresholding
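
Confidence thresholding can be sketched as follows. EasyOCR's `readtext` returns a list of `(bounding_box, text, confidence)` tuples; the 0.5 threshold and the helper name `filter_ocr_results` are illustrative assumptions, not values taken from this project. The detections are hard-coded so the snippet runs without an image or the OCR library.

```python
from typing import List, Sequence, Tuple

# Shape of one detection, mirroring EasyOCR's readtext() output:
# (bounding box corner points, recognised text, confidence in [0, 1])
Detection = Tuple[Sequence[Sequence[float]], str, float]


def filter_ocr_results(detections: List[Detection], min_confidence: float = 0.5) -> List[str]:
    """Keep non-empty text fragments whose confidence meets the threshold (hypothetical helper)."""
    return [text.strip() for _bbox, text, conf in detections
            if conf >= min_confidence and text.strip()]


# Made-up sample detections, for illustration only
sample = [
    ([[0, 0], [90, 0], [90, 20], [0, 20]], "DIAMICRON MR", 0.93),
    ([[0, 25], [60, 25], [60, 40], [0, 40]], "60mg", 0.81),
    ([[0, 45], [40, 45], [40, 55], [0, 55]], "l0t#", 0.22),
]

kept = filter_ocr_results(sample, min_confidence=0.5)
print(kept)
```

A higher threshold trades recall for precision: fewer noisy fragments reach the matching stage, but faint text on a worn package may be dropped.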

### ML Component 2: Feature-Based Matching
- **Technique**: Fuzzy string matching with fixed (non-learned) string-distance metrics
- **Algorithm**: Levenshtein distance + Token-based similarity
- **Our Contribution**: Search optimization, confidence calibration
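
As a rough illustration of the matching idea, the sketch below combines a normalised Levenshtein similarity with a token-overlap (Jaccard) score and returns the top-k candidates. The equal weighting, the function names, and the short list of brand names are assumptions made for this example; the project's actual matcher may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def match_score(query: str, candidate: str, edit_weight: float = 0.5) -> float:
    """Weighted blend of the two similarities (the 0.5 weight is an assumption)."""
    q, c = query.lower().strip(), candidate.lower().strip()
    return edit_weight * edit_similarity(q, c) + (1 - edit_weight) * token_similarity(q, c)


def top_matches(query, names, k=3):
    """Return the k candidate names with the highest match score."""
    return sorted(names, key=lambda name: match_score(query, name), reverse=True)[:k]


brand_names = ["DIAMICRON MR", "BI-PRETERAX", "FLOXSAFE-400", "EYLEA", "ILET B2"]
# "0" instead of "O" imitates a typical OCR misread
print(top_matches("DIAMICR0N MR", brand_names, k=3))
```

Returning the top three candidates rather than a single best match lines up with the top-3 accuracy target listed under Success Metrics.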

### Component 3: Knowledge-Based Classification
- **Type**: Rule-based expert system (not neural)
- **Knowledge Source**: Rwanda FDA disposal guidelines
- **Our Contribution**: 11 disposal rules encoding domain expertise

## Success Metrics
- OCR accuracy: >85% (measured on Rwanda medicine photos)
- Matching accuracy: >90% (top-3 accuracy)
- Classification accuracy: >95% (rule-based, deterministic)
- System usability: >80% user satisfaction

# GOOGLE COLAB SETUP



# Pharmaceutical Waste Disposal Classification Model
## Machine Learning Model for Automated Medicine Disposal Guidelines

This notebook creates a model that:
1. Classifies medicines into disposal categories
2. Assesses risk levels
3. Generates appropriate disposal guidelines
4. Can be deployed for real-time classification


# Cell 1: Install & Import Required Libraries


In [1]:
# Install required libraries
!pip install scikit-learn pandas numpy

print("✓ All libraries installed successfully!")

✓ All libraries installed successfully!


In [None]:
## Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import re
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import pickle
import json
import warnings
warnings.filterwarnings('ignore')

print("✓ All libraries imported successfully!")


✓ All libraries imported successfully!


# Load and prepare data


In [None]:
## Step 2: Load and Prepare Data
# Load the medicines data from CSV
# Replace the path below with the location of your copy of the file
df = pd.read_csv('/content/drive/MyDrive/UmutiSafe/data/raw/rwanda_fda_medicines_fixed.csv')

# Display first few rows
print("Dataset shape:", df.shape)
print("\nFirst 5 rows:")
df.head()


Dataset shape: (2322, 16)

First 5 rows:


Unnamed: 0,No,Registration_No,Brand_Name,Generic_Name,Dosage_Strength,Dosage_Form,Pack_Size,Packaging_Type,Shelf_Life,Manufacturer_Name,Manufacturer_Address,Manufacturer_Country,Marketing_Authorization_Holder,Local_Technical_Representative,Registration_Date,Expiry_Date
0,1,Rwanda FDA-HMP-MA-0033,ILET B2,"Glimepiride, Metformin HCl","2mg, 500mg",Tablets,"Box Of 10, Box Of 30",ALU-PVC/PVDC BLISTER PACK,24 Months,MSN LABORATORIES PRIVATE LIMITED (Formulations...,"Plot No. 42, Anrich Industrial Estate, Bollara...",INDIA,MSN LABORATORIES PRIVATE LIMITED,ABACUS PHARMA (A) LTD,2020-09-07,2025-09-07
1,2,Rwanda FDA-HMP-MA-0021,BI-PRETERAX,"Peridopril Arginine, Indapamide","5mg, 1.25mg",Tablets,30 Film Coated Tablets,POLYPROPYLENE CONTAINER,36 Months,LES LABORATOIRES SERVIER,"905 ROUTE DE SARAN, 45520 GIDY, FRANCE",FRANCE,LES LABORATOIRES SERVIER,KIPHARMA LTD,2020-09-15,2025-09-15
2,3,Rwanda FDA-HMP-MA-0022,DIAMICRON MR,Gliclazide,60mg,Tablets,30 Film Coated Tablets,ALU-ALU BLISTER PACK,24 Months,LES LABORATOIRES SERVIER,"905 ROUTE DE SARAN, 45520 GIDY, FRANCE",FRANCE,LES LABORATOIRES SERVIER,KIPHARMA LTD,2020-09-15,2025-09-15
3,4,Rwanda FDA-HMP-MA-0023,EYLEA,Aflibercept,40mg/ml,Solution For Injection,1 Vial*2ml,TYPE 1 GLASS VIAL,24 Months,"RAGENERON PHARMACEUTICALS, VETTER PHARMA-FERTI...","Inc.81 Columbia Turnpike, Rensselaer, New York...","USA, GERMANY",BAYER EAST AFRICA LIMITED,SURGIPHARM (RWANDA) LTD,2020-09-15,2025-09-15
4,5,Rwanda FDA-HMP-MA-0024,FLOXSAFE-400,Moxifloxacin,400mg,Tablets,3 Blisters Of 5 Tablets,PVC /PVDC BLISTER PACK,24 Months,MSN LABORATORIES PRIVATE LIMITED (Formulations...,"Plot No. 42, Anrich Industrial Estate, Bollara...",INDIA,MSN LABORATORIES PRIVATE LIMITED,ABACUS PHARMA (A) LTD,2020-09-15,2025-09-15


In [None]:
# Check column names and data types
print("Column names:")
print(df.columns.tolist())
print("\nData types:")
print(df.dtypes)
print("\nMissing values:")
print(df.isnull().sum())


Column names:
['No', 'Registration_No', 'Brand_Name', 'Generic_Name', 'Dosage_Strength', 'Dosage_Form', 'Pack_Size', 'Packaging_Type', 'Shelf_Life', 'Manufacturer_Name', 'Manufacturer_Address', 'Manufacturer_Country', 'Marketing_Authorization_Holder', 'Local_Technical_Representative', 'Registration_Date', 'Expiry_Date']

Data types:
No                                object
Registration_No                   object
Brand_Name                        object
Generic_Name                      object
Dosage_Strength                   object
Dosage_Form                       object
Pack_Size                         object
Packaging_Type                    object
Shelf_Life                        object
Manufacturer_Name                 object
Manufacturer_Address              object
Manufacturer_Country              object
Marketing_Authorization_Holder    object
Local_Technical_Representative    object
Registration_Date                 object
Expiry_Date                       object
dtype: object

# Feature Engineering Functions

In [None]:
def classify_dosage_form(dosage_form):
    """
    Classify medicine into disposal categories based on dosage form
    Category 1: Solids (Tablets, Capsules, Powders)
    Category 2: Liquids (Solutions, Injections, Syrups)
    Category 3: Semisolids (Creams, Ointments, Gels)
    Category 4: Aerosols and Inhalers
    Category 5: Biological Waste
    """
    form = str(dosage_form).lower()

    if any(word in form for word in ['tablet', 'capsule', 'powder', 'granule']):
        return 1  # Solids
    elif any(word in form for word in ['solution', 'injection', 'liquid', 'syrup', 'suspension', 'infusion']):
        return 2  # Liquids
    elif any(word in form for word in ['cream', 'ointment', 'gel', 'paste', 'lotion']):
        return 3  # Semisolids
    elif any(word in form for word in ['aerosol', 'inhaler', 'spray']):
        return 4  # Aerosols
    elif any(word in form for word in ['vaccine', 'serum', 'blood']):
        return 5  # Biological
    else:
        return 1  # Default to solids

def assess_risk_level(ingredients):
    """
    Assess risk level based on active ingredients
    HIGH: Antineoplastic, Cytotoxic drugs
    MEDIUM: Antibiotics, Controlled substances
    LOW: General medicines
    """
    ingredients = str(ingredients).lower()

    # High-risk drugs
    high_risk = ['methotrexate', 'doxorubicin', 'cisplatin', 'fluorouracil',
                 'cyclophosphamide', 'vincristine', 'paclitaxel']

    # Medium-risk drugs
    medium_risk = ['moxifloxacin', 'ciprofloxacin', 'penicillin', 'cephalosporin',
                   'amoxicillin', 'morphine', 'fentanyl', 'oxycodone', 'diazepam', 'warfarin']

    if any(drug in ingredients for drug in high_risk):
        return 'HIGH'
    elif any(drug in ingredients for drug in medium_risk):
        return 'MEDIUM'
    else:
        return 'LOW'

def check_biodegradable(ingredients):
    """
    Check if liquid medicine is biodegradable
    """
    ingredients = str(ingredients).lower()
    biodegradable_substances = ['vitamin', 'glucose', 'saline', 'amino acid',
                                'sodium chloride', 'dextrose', 'cholecalciferol']

    return any(substance in ingredients for substance in biodegradable_substances)

print("✓ Feature engineering functions created!")

✓ Feature engineering functions created!
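
Because `classify_dosage_form` returns on the first keyword match, the order of the checks matters: a form string containing both "vaccine" and "suspension" is returned as category 2, since the liquid keywords are tested before the biological ones. The sketch below is one possible table-driven variant that tests the biological keywords first and otherwise keeps the original order. The reordering and the name `classify_by_rules` are assumptions for illustration, not the notebook's method, and the sample form strings are made up.

```python
# Ordered (category, keywords) rules; earlier entries take priority.
# Moving category 5 to the front is an assumption made for this sketch.
CATEGORY_RULES = [
    (5, ("vaccine", "serum", "blood")),                       # Biological
    (1, ("tablet", "capsule", "powder", "granule")),          # Solids
    (2, ("solution", "injection", "liquid", "syrup",
         "suspension", "infusion")),                          # Liquids
    (3, ("cream", "ointment", "gel", "paste", "lotion")),     # Semisolids
    (4, ("aerosol", "inhaler", "spray")),                     # Aerosols
]


def classify_by_rules(dosage_form, default=1):
    """Return the category of the first rule whose keyword occurs in the form."""
    form = str(dosage_form).lower()
    for category, keywords in CATEGORY_RULES:
        if any(word in form for word in keywords):
            return category
    return default  # same fallback as the original function: solids


for form in ["Tablets", "Solution For Injection",
             "Vaccine suspension for injection", "Nasal spray"]:
    print(form, "->", classify_by_rules(form))
```

Keeping the rules in a list also makes the priority explicit and easy to review with a domain expert, since substring checks such as "gel" inside "gelatin" depend on it.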


# Apply Feature Engineering

In [None]:
# Create features
df['disposal_category'] = df['Dosage_Form'].apply(classify_dosage_form)
df['risk_level'] = df['Generic_Name'].apply(assess_risk_level)
df['is_biodegradable'] = df['Generic_Name'].apply(check_biodegradable)

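# Combine text features (fill missing values first, otherwise one NaN
# makes the whole combined string NaN and TF-IDF fails on it)
df['combined_text'] = (df['Brand_Name'].fillna('') + ' ' +
                       df['Generic_Name'].fillna('') + ' ' +
                       df['Dosage_Form'].fillna('') + ' ' +
                       df['Packaging_Type'].fillna(''))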

print("Feature engineering complete!")
print("\nDisposal category distribution:")
print(df['disposal_category'].value_counts())
print("\nRisk level distribution:")
print(df['risk_level'].value_counts())
print("\nBiodegradable medicines:")
print(df['is_biodegradable'].value_counts())

Feature engineering complete!

Disposal category distribution:
disposal_category
1    1602
2     603
3     111
4       6
Name: count, dtype: int64

Risk level distribution:
risk_level
LOW       2250
MEDIUM      61
HIGH        11
Name: count, dtype: int64

Biodegradable medicines:
is_biodegradable
False    2246
True       76
Name: count, dtype: int64


# Create TF-IDF Features

In [None]:
# Create TF-IDF features
tfidf = TfidfVectorizer(max_features=100, ngram_range=(1, 2))
X_text = tfidf.fit_transform(df['combined_text'])

print(f"✓ TF-IDF feature matrix shape: {X_text.shape}")
print("\nFirst 20 features (alphabetical order):")
print(tfidf.get_feature_names_out()[:20])
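
For intuition about what the TF-IDF weights mean, here is a hand-rolled sketch of the weighting that scikit-learn documents for its default settings: raw term counts multiplied by a smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation of each row. It is a simplification: tokens are split on whitespace and only unigrams are used, whereas the cell above also includes bigrams and caps the vocabulary at 100 features. The three sample documents are made up.

```python
import math
from collections import Counter


def tfidf_rows(documents):
    """TF-IDF with smoothed IDF and L2 normalisation, unigram tokens only."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = {term: count * idf[term] for term, count in counts.items()}
        # Guard against empty documents, whose norm would be zero
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({term: w / norm for term, w in weights.items()})
    return rows


docs = [
    "diamicron gliclazide tablets blister",
    "eylea aflibercept solution vial",
    "floxsafe moxifloxacin tablets blister",
]
rows = tfidf_rows(docs)
for term, weight in sorted(rows[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:12s} {weight:.3f}")
```

Terms that occur in a single document, such as a brand name, receive a higher weight than terms shared across documents, such as "tablets".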

# Prepare Training Data

In [None]:
# Prepare features and target
X = X_text.toarray()
y_category = df['disposal_category']
y_risk = df['risk_level']

# Split data (using stratify to maintain class distribution)
X_train, X_test, y_cat_train, y_cat_test, y_risk_train, y_risk_test = train_test_split(
    X, y_category, y_risk, test_size=0.2, random_state=42, stratify=y_category
)

print(f"✓ Training set size: {X_train.shape[0]}")
print(f"✓ Testing set size: {X_test.shape[0]}")

# Train Disposal Category Model

In [None]:
# Model 1: Disposal Category Classifier
category_model = RandomForestClassifier(n_estimators=100, random_state=42)
category_model.fit(X_train, y_cat_train)

# Predictions
y_cat_pred = category_model.predict(X_test)

# Evaluation
print("=" * 60)
print("DISPOSAL CATEGORY CLASSIFICATION RESULTS")
print("=" * 60)
print(f"Accuracy: {accuracy_score(y_cat_test, y_cat_pred):.2%}")
print("\nClassification Report:")
print(classification_report(y_cat_test, y_cat_pred))
print("\nConfusion Matrix:")
print(confusion_matrix(y_cat_test, y_cat_pred))

# Train Risk Level Model

In [None]:
# Model 2: Risk Level Classifier
risk_model = RandomForestClassifier(n_estimators=100, random_state=42)
risk_model.fit(X_train, y_risk_train)

# Predictions
y_risk_pred = risk_model.predict(X_test)

# Evaluation
print("=" * 60)
print("RISK LEVEL CLASSIFICATION RESULTS")
print("=" * 60)
print(f"Accuracy: {accuracy_score(y_risk_test, y_risk_pred):.2%}")
print("\nClassification Report:")
print(classification_report(y_risk_test, y_risk_pred))
print("\nConfusion Matrix:")
print(confusion_matrix(y_risk_test, y_risk_pred))
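
Accuracy alone can flatter a classifier when the classes are this imbalanced. As a point of reference, the sketch below computes the accuracy of always predicting the most frequent class, using the class counts printed after feature engineering. These counts describe the full dataset rather than the test split, so the figures are only approximate baselines.

```python
# Class counts copied from the feature engineering output
category_counts = {1: 1602, 2: 603, 3: 111, 4: 6}
risk_counts = {"LOW": 2250, "MEDIUM": 61, "HIGH": 11}


def majority_baseline(counts):
    """Accuracy obtained by always predicting the most common class."""
    total = sum(counts.values())
    label, freq = max(counts.items(), key=lambda kv: kv[1])
    return label, freq / total


for name, counts in [("Disposal category", category_counts),
                     ("Risk level", risk_counts)]:
    label, acc = majority_baseline(counts)
    print(f"{name}: always predicting {label!r} scores {acc:.2%}")
# Disposal category: always predicting 1 scores 68.99%
# Risk level: always predicting 'LOW' scores 96.90%
```

A risk-level accuracy close to 97% would therefore be near this baseline, so the per-class recall for MEDIUM and HIGH in the classification report is the more informative figure.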

# Feature Importance

In [None]:
# Get feature importance for category model
feature_importance = pd.DataFrame({
    'feature': tfidf.get_feature_names_out(),
    'importance': category_model.feature_importances_
}).sort_values('importance', ascending=False)

print("Top 15 Most Important Features for Disposal Category:")
print(feature_importance.head(15))

# Create Disposal Guidelines Database

In [None]:
# Disposal guidelines database (covers categories 1 and 2; other categories fall back to 'Unknown')
disposal_guidelines = {
    1: {  # Solids
        'category_name': 'Solids/Semisolids/Powders',
        'steps': [
            {
                'step': 1,
                'title': 'Preparation and Sorting',
                'actions': [
                    'Wear appropriate Personal Protective Equipment (PPE): gloves, mask, and protective gown',
                    'Sort expired or unused tablets separately from regular waste',
                    'Check expiry dates and verify medicine names',
                    'Document the quantity and type of medicine being disposed'
                ]
            },
            {
                'step': 2,
                'title': 'Packaging Removal',
                'actions': [
                    'Remove medicines from outer cardboard or paper packaging',
                    'Keep medicines in their original blister packs or inner packaging',
                    'Do NOT remove individual tablets from blister packs',
                    'Separate packaging materials for recycling if possible'
                ]
            },
            {
                'step': 3,
                'title': 'Container Preparation',
                'actions': [
                    'Use clean plastic drums or steel drums with secure lids',
                    'Label drums clearly: PHARMACEUTICAL WASTE - SOLIDS',
                    'Add date of collection on the label',
                    'If handling large quantities of one drug, mix with other medicines'
                ]
            },
            {
                'step': 4,
                'title': 'Disposal Method',
                'options': [
                    'Encapsulation (preferred for small-medium quantities)',
                    'Inertization (for medium quantities)',
                    'High-temperature incineration (for large quantities or high-risk drugs)',
                    'Landfill (only for properly encapsulated waste)'
                ]
            },
            {
                'step': 5,
                'title': 'Documentation',
                'actions': [
                    'Seal drums securely before transport',
                    'Complete waste transfer documentation',
                    'Transport to authorized disposal facility',
                    'Keep records for minimum 3 years'
                ]
            }
        ],
        'prohibitions': [
            'Do NOT flush tablets down the toilet or sink unless they are on the flush list',
            'Do NOT burn in open air',
            'Do NOT dispose in regular household waste without treatment',
            'Do NOT mix with infectious or sharp waste'
        ],
        'safety': [
            'Always wear protective equipment',
            'Work in well-ventilated areas',
            'Wash hands thoroughly after handling',
            'Keep medicines away from children during disposal'
        ]
    },
    2: {  # Liquids
        'category_name': 'Liquids',
        'steps': [
            {
                'step': 1,
                'title': 'Initial Assessment',
                'actions': [
                    'Identify if liquid is biodegradable or non-biodegradable',
                    'Check if sewage treatment plant is available',
                    'Wear appropriate PPE: gloves, goggles, mask',
                    'Document quantity in liters or milliliters'
                ]
            },
            {
                'step': 2,
                'title': 'Segregation',
                'actions': [
                    'Separate biodegradable liquids (vitamins, glucose, saline)',
                    'Keep antineoplastic drugs completely separate',
                    'Check container integrity for leaks'
                ]
            },
            {
                'step': 3,
                'title': 'Disposal Method Selection',
                'options': [
                    'Sewer disposal (ONLY for biodegradable liquids with dilution)',
                    'Pit disposal (if no sewer available)',
                    'High-temperature incineration (for non-biodegradable or high-risk)'
                ]
            },
            {
                'step': 4,
                'title': 'Execution',
                'actions': [
                    'For sewer: Dilute with at least 10 parts water',
                    'Pour slowly to avoid splashing',
                    'Flush with additional water',
                    'Rinse containers three times'
                ]
            },
            {
                'step': 5,
                'title': 'Documentation',
                'actions': [
                    'Record type and quantity disposed',
                    'Document disposal method used',
                    'Keep records for 3 years minimum'
                ]
            }
        ],
        'prohibitions': [
            'NEVER dispose antineoplastic drugs in sewer',
            'NEVER dispose antibiotics in sewer',
            'Do NOT pour concentrated medicines directly into sewers',
            'Do NOT reuse medicine containers'
        ],
        'safety': [
            'Prevent splashing during dilution',
            'Ensure adequate ventilation',
            'Have spill kit readily available',
            'Clean spills immediately'
        ]
    }
}

print("✓ Disposal guidelines database created!")
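
The liquid guidelines combine three inputs: risk level, biodegradability, and whether a sewer is available. One way to express the Step 3 selection as code is sketched below. The function name, the choice to send every MEDIUM-risk liquid to incineration (antibiotics sit in the medium-risk list and must not reach the sewer), and the use of pit disposal only as the fallback for low-risk biodegradable liquids are assumptions made for this example, not rules stated in the guidelines.

```python
def select_liquid_disposal(risk_level, is_biodegradable, sewer_available):
    """Pick a disposal option for a liquid medicine (illustrative reading of the guidelines)."""
    # Antineoplastics (HIGH) and antibiotics (grouped under MEDIUM here)
    # must never be poured into the sewer, and neither must
    # non-biodegradable liquids.
    if risk_level in ("HIGH", "MEDIUM") or not is_biodegradable:
        return "High-temperature incineration"
    if sewer_available:
        return "Sewer disposal (dilute with at least 10 parts water)"
    return "Pit disposal"


print(select_liquid_disposal("LOW", True, True))
print(select_liquid_disposal("LOW", True, False))
print(select_liquid_disposal("MEDIUM", True, True))
```

Checking the restrictive conditions first means a liquid can only reach the sewer branch after both the risk and the biodegradability tests have passed.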

# Create Predictor Class

In [None]:
class MedicineDisposalPredictor:
    def __init__(self, category_model, risk_model, tfidf, guidelines):
        self.category_model = category_model
        self.risk_model = risk_model
        self.tfidf = tfidf
        self.guidelines = guidelines

    def predict(self, product_name, ingredients, dosage_form, packaging):
        """
        Predict disposal category and generate guidelines for a medicine
        """
        # Combine text features
        combined_text = f"{product_name} {ingredients} {dosage_form} {packaging}"

        # Transform text
        X_text = self.tfidf.transform([combined_text]).toarray()

        # Predict category and risk
        category = self.category_model.predict(X_text)[0]
        risk = self.risk_model.predict(X_text)[0]

        # Get guidelines
        guidelines = self.guidelines.get(category, {})

        # Check if biodegradable
        is_biodegradable = check_biodegradable(ingredients)

        result = {
            'medicine_info': {
                'product_name': product_name,
                'ingredients': ingredients,
                'dosage_form': dosage_form,
                'packaging': packaging
            },
            'classification': {
                'disposal_category': int(category),
                'category_name': guidelines.get('category_name', 'Unknown'),
                'risk_level': risk,
                'is_biodegradable': is_biodegradable
            },
            'guidelines': guidelines
        }

        return result

    def print_guidelines(self, result):
        """
        Print formatted disposal guidelines
        """
        print("\n" + "=" * 70)
        print("PHARMACEUTICAL WASTE DISPOSAL GUIDELINES")
        print("=" * 70)

        # Medicine Info
        info = result['medicine_info']
        print(f"\n📦 Medicine: {info['product_name']}")
        print(f"💊 Ingredients: {info['ingredients']}")
        print(f"📋 Dosage Form: {info['dosage_form']}")

        # Classification
        classification = result['classification']
        print(f"\n🏷️ Disposal Category: {classification['disposal_category']} - {classification['category_name']}")
        print(f"⚠️ Risk Level: {classification['risk_level']}")
        print(f"♻️ Biodegradable: {'Yes' if classification['is_biodegradable'] else 'No'}")

        # Guidelines
        guidelines = result['guidelines']

        print("\n" + "-" * 70)
        print("📝 STEP-BY-STEP DISPOSAL PROCEDURE")
        print("-" * 70)

        for step in guidelines.get('steps', []):
            print(f"\n▶️ STEP {step['step']}: {step['title']}")
            if 'actions' in step:
                for action in step['actions']:
                    print(f"  ✓ {action}")
            if 'options' in step:
                print("  Options:")
                for option in step['options']:
                    print(f"    • {option}")

        print("\n" + "-" * 70)
        print("🚫 PROHIBITIONS")
        print("-" * 70)
        for prohibition in guidelines.get('prohibitions', []):
            print(f"  ✗ {prohibition}")

        print("\n" + "-" * 70)
        print("🛡️ SAFETY PRECAUTIONS")
        print("-" * 70)
        for safety in guidelines.get('safety', []):
            print(f"  ⚠️ {safety}")

        print("\n" + "=" * 70)

# Create predictor instance
predictor = MedicineDisposalPredictor(category_model, risk_model, tfidf, disposal_guidelines)

print("✓ Predictor created successfully!")

# Test the Model

In [None]:
# Test with example medicines
test_medicines = [
    {
        'product_name': 'ILET B2',
        'ingredients': 'Glimepiride, Metformin HCl',
        'dosage_form': 'Tablets',
        'packaging': 'ALU-PVC/PVDC BLISTER PACK'
    },
    {
        'product_name': 'EYLEA',
        'ingredients': 'Aflibercept',
        'dosage_form': 'Solution For Injection',
        'packaging': 'TYPE 1 GLASS VIAL'
    },
    {
        'product_name': 'FLOXSAFE-400',
        'ingredients': 'Moxifloxacin',
        'dosage_form': 'Tablets',
        'packaging': 'PVC /PVDC BLISTER PACK'
    }
]

for med in test_medicines:
    result = predictor.predict(
        med['product_name'],
        med['ingredients'],
        med['dosage_form'],
        med['packaging']
    )
    predictor.print_guidelines(result)
    print("\n\n")

# Save Models

In [None]:
# Save all models and components
model_package = {
    'category_model': category_model,
    'risk_model': risk_model,
    'tfidf': tfidf,
    'guidelines': disposal_guidelines,
}

# Save to pickle file
with open('medicine_disposal_model.pkl', 'wb') as f:
    pickle.dump(model_package, f)

print("✓ Models saved successfully to 'medicine_disposal_model.pkl'")

# Save guidelines as JSON for easy access
with open('disposal_guidelines.json', 'w') as f:
    json.dump(disposal_guidelines, f, indent=2)

print("✓ Guidelines saved to 'disposal_guidelines.json'")

# Download files to your computer
from google.colab import files
files.download('medicine_disposal_model.pkl')
files.download('disposal_guidelines.json')

print("\n✅ Files ready for download!")
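
One detail to keep in mind when reading `disposal_guidelines.json` back: JSON object keys are always strings, so the integer category keys `1` and `2` come back as `"1"` and `"2"`, and a lookup with an integer category would miss. The pickle file is not affected. A minimal sketch of restoring the integer keys follows; the helper name `restore_int_keys` is hypothetical, and a small stand-in dictionary replaces the real file.

```python
import json


def restore_int_keys(raw):
    """Convert top-level JSON string keys back to integer category ids."""
    return {int(key): value for key, value in raw.items()}


# Stand-in for the real guidelines dictionary
guidelines = {1: {"category_name": "Solids"}, 2: {"category_name": "Liquids"}}
round_tripped = json.loads(json.dumps(guidelines))

print(list(round_tripped.keys()))                    # ['1', '2']
print(list(restore_int_keys(round_tripped).keys()))  # [1, 2]
```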

# Load Saved Model

In [None]:
# Function to load the saved model
def load_model(model_path='medicine_disposal_model.pkl'):
    """
    Load the saved model package
    """
    with open(model_path, 'rb') as f:
        model_package = pickle.load(f)

    predictor = MedicineDisposalPredictor(
        model_package['category_model'],
        model_package['risk_model'],
        model_package['tfidf'],
        model_package['guidelines']
    )

    return predictor

# Example of loading and using the model
loaded_predictor = load_model()
print("✓ Model loaded successfully!")

# Test the loaded model
result = loaded_predictor.predict(
    'PARACETAMOL',
    'Paracetamol',
    'Tablets',
    'BLISTER PACK'
)
loaded_predictor.print_guidelines(result)

# Batch Processing

In [None]:
def batch_process_medicines(dataframe):
    """
    Process multiple medicines from a dataframe and generate guidelines
    """
    results = []

    for idx, row in dataframe.iterrows():
        # Column names follow the Rwanda FDA CSV loaded earlier
        name = row['Brand_Name']
        try:
            result = predictor.predict(
                name,
                row['Generic_Name'],
                row['Dosage_Form'],
                row['Packaging_Type']
            )

            results.append({
                'product_name': name,
                'ingredients': row['Generic_Name'],
                'dosage_form': row['Dosage_Form'],
                'disposal_category': result['classification']['disposal_category'],
                'category_name': result['classification']['category_name'],
                'risk_level': result['classification']['risk_level'],
                'is_biodegradable': result['classification']['is_biodegradable']
            })

        except Exception as e:
            print(f"❌ Error processing {name}: {str(e)}")
            continue

    results_df = pd.DataFrame(results)

    print(f"\n✅ Batch processing complete!")
    print(f"📊 Processed {len(results)} medicines")

    return results_df

# Run batch processing
batch_results = batch_process_medicines(df)
print("\nBatch Results:")
print(batch_results)

# Save results
batch_results.to_csv('disposal_guidelines_batch.csv', index=False)
print("\n✓ Results saved to 'disposal_guidelines_batch.csv'")

# Download
from google.colab import files
files.download('disposal_guidelines_batch.csv')

# Create JSON API Output

In [None]:
def get_disposal_guidelines_json(product_name, ingredients, dosage_form, packaging):
    """
    Get disposal guidelines in JSON format (API-ready)
    """
    result = predictor.predict(product_name, ingredients, dosage_form, packaging)

    # Convert to JSON-friendly format
    json_result = {
        'medicine_info': result['medicine_info'],
        'classification': result['classification'],
        'guidelines': result['guidelines']
    }

    return json.dumps(json_result, indent=2)

# Test the JSON function
json_output = get_disposal_guidelines_json(
    'ILET B2',
    'Glimepiride, Metformin HCl',
    'Tablets',
    'ALU-PVC/PVDC BLISTER PACK'
)

print("API-Ready JSON Output:")
print(json_output[:1000] + "...")

# Model Evaluation Report

In [None]:
# Create comprehensive evaluation report
evaluation_report = f"""
{'=' * 70}
PHARMACEUTICAL DISPOSAL CLASSIFICATION MODEL - EVALUATION REPORT
{'=' * 70}

Dataset Information:
- Total medicines: {len(df)}
- Training set: {len(X_train)}
- Testing set: {len(X_test)}

Disposal Category Distribution:
{df['disposal_category'].value_counts().to_string()}

Risk Level Distribution:
{df['risk_level'].value_counts().to_string()}

Model Performance:
- Category Classification Accuracy: {accuracy_score(y_cat_test, y_cat_pred):.2%}
- Risk Level Classification Accuracy: {accuracy_score(y_risk_test, y_risk_pred):.2%}

Model Features:
✓ Automatic disposal category classification
✓ Risk level assessment (HIGH/MEDIUM/LOW)
✓ Biodegradability detection
✓ Step-by-step disposal guidelines generation
✓ Safety precautions and prohibitions
✓ Batch processing capability
✓ JSON API-ready output

{'=' * 70}
"""

print(evaluation_report)

# Save report
with open('model_evaluation_report.txt', 'w') as f:
    f.write(evaluation_report)

print("\n✓ Evaluation report saved!")

# Download
from google.colab import files
files.download('model_evaluation_report.txt')