### Healthcare – Patient Data Accuracy

**Task 1**: Patient Record Accuracy Assessment

**Objective**: Detect inaccurate or incomplete patient records and define checks that prevent such errors at the point of entry.

**Steps**:
1. Examine a sample patient dataset for common inaccuracies.
2. Identify at least three common issues, such as medication errors or misdiagnoses.
3. Propose validation measures to ensure data accuracy at the point of entry.

In [1]:
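# Task 1: flag common data-quality issues in a sample of patient records
import pandas as pd

# Sample patient data with deliberate errors: a missing ID, two impossible dates
# (February 30 and month 13), a missing diagnosis, an empty medication field,
# and a non-standard gender value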
data = {
    'Patient_ID': ['P001', 'P002', 'P003', 'P004', 'P005', None],
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eve', 'Frank'],
    'DOB': ['1985-05-01', '1992-07-15', '2001-02-30', '1988-13-01', '1970-10-20', '1990-06-10'],
    'Diagnosis': ['Diabetes', 'Hypertension', 'Hypertension', None, 'Asthma', 'Diabetes'],
    'Medications': ['Metformin', '', 'Lisinopril', 'Lisinopril', 'Ventolin', 'Insulin'],
    'Gender': ['F', 'M', 'M', 'F', 'Unknown', 'M']
}

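df = pd.DataFrame(data)


def find_patient_record_issues(df):
    """Return a dict mapping each check name to the rows that fail it."""
    # Impossible dates (e.g. February 30, month 13) cannot be parsed and become NaT
    parsed_dob = pd.to_datetime(df['DOB'], errors='coerce')
    issues = {
        'missing_patient_ids': df[df['Patient_ID'].isnull()],
        'invalid_dates': df[parsed_dob.isnull()],
        'missing_diagnosis': df[df['Diagnosis'].isnull()],
        'empty_medications': df[df['Medications'].isnull() | (df['Medications'] == '')],
        'invalid_gender': df[~df['Gender'].isin(['M', 'F'])],
        # keep=False marks every member of a duplicate group, not just the repeats
        'duplicate_records': df[df.duplicated(subset=['Name', 'DOB'], keep=False)]
    }
    return issues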

issues = find_patient_record_issues(df)
for key, problem in issues.items():
    print(f"\n❌ {key.upper()}:")
    print(problem.to_string(index=False) if not problem.empty else "✅ No issues found.")



❌ MISSING_PATIENT_IDS:
Patient_ID  Name        DOB Diagnosis Medications Gender
      None Frank 1990-06-10  Diabetes     Insulin      M

❌ INVALID_DATES:
Patient_ID    Name        DOB    Diagnosis Medications Gender
      P003 Charlie 2001-02-30 Hypertension  Lisinopril      M
      P004    Dana 1988-13-01         None  Lisinopril      F

❌ MISSING_DIAGNOSIS:
Patient_ID Name        DOB Diagnosis Medications Gender
      P004 Dana 1988-13-01      None  Lisinopril      F

❌ EMPTY_MEDICATIONS:
Patient_ID Name        DOB    Diagnosis Medications Gender
      P002  Bob 1992-07-15 Hypertension                  M

❌ INVALID_GENDER:
Patient_ID Name        DOB Diagnosis Medications  Gender
      P005  Eve 1970-10-20    Asthma    Ventolin Unknown

❌ DUPLICATE_RECORDS:
✅ No issues found.
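
Step 3 asks for validation at the point of entry, which the batch check above does not cover. The following is a minimal sketch of one possible approach, assuming records arrive one at a time as plain dictionaries with the same field names as the sample data. `validate_at_entry`, `REQUIRED_FIELDS`, and `ALLOWED_GENDERS` are hypothetical names introduced here for illustration, and the rule set is only an example, not a clinical standard.

```python
from datetime import date, datetime

# Hypothetical rule set for illustration; adapt to local coding standards.
ALLOWED_GENDERS = {"M", "F"}
REQUIRED_FIELDS = ("Patient_ID", "Name", "DOB", "Diagnosis", "Medications", "Gender")


def validate_at_entry(record, today=None):
    """Return a list of error messages for one record; an empty list means it passed."""
    today = today or date.today()
    errors = []

    # Treat None, empty strings, and whitespace-only strings as missing.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            errors.append(f"{field} is missing")

    # strptime rejects impossible dates such as 2001-02-30 or 1988-13-01.
    dob_text = record.get("DOB")
    if dob_text:
        try:
            dob = datetime.strptime(dob_text, "%Y-%m-%d").date()
        except (TypeError, ValueError):
            errors.append("DOB is not a valid YYYY-MM-DD date")
        else:
            if dob > today:
                errors.append("DOB is in the future")

    gender = record.get("Gender")
    if gender and gender not in ALLOWED_GENDERS:
        errors.append(f"Gender must be one of {sorted(ALLOWED_GENDERS)}")

    return errors


# Row P004 from the sample data: month 13 and no diagnosis.
dana = {"Patient_ID": "P004", "Name": "Dana", "DOB": "1988-13-01",
        "Diagnosis": None, "Medications": "Lisinopril", "Gender": "F"}
dana_errors = validate_at_entry(dana)
for message in dana_errors:
    print(message)
```

Rejecting the record before it is saved, with one message per failed rule, lets the person entering the data correct it immediately instead of leaving the error for a later batch audit.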


**Task 2**: Implement Healthcare Data Quality Checks

**Objective**: Keep health records accurate by validating patient data automatically, both in batches and as each record is entered.

**Steps**:
1. Develop a validation workflow for patient data.
2. Use appropriate software (here, pandas for batch checks and a Flask endpoint for real-time checks) to automate checks for common errors.

In [2]:
# Task 2: batch validation with pandas plus a real-time validation endpoint with Flask
import pandas as pd
from datetime import datetime
from flask import Flask, request, jsonify
import threading

# ------------------------
# Validation Logic
# ------------------------

def validate_patient_records(df):
    """Return a dict mapping each check name to the rows of df that fail it."""
    required_fields = ['Patient_ID', 'Name', 'DOB', 'Diagnosis', 'Medications', 'Gender']
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    today = pd.Timestamp.today().normalize()

    # Convert DOB to datetime; unparseable or impossible dates become NaT
    df['DOB'] = pd.to_datetime(df['DOB'], errors='coerce')

    issues = {
        # DOBs that failed to parse are NaT at this point, so they are reported here as well
        'missing_fields': df[df[required_fields].isnull().any(axis=1)],
        'invalid_dob': df[df['DOB'].isnull() | (df['DOB'] > today)],
        'invalid_gender': df[~df['Gender'].isin(['M', 'F'])],
        'missing_medications': df[df['Medications'].isnull() | (df['Medications'].str.strip() == '')],
        # Despite the key name, metformin also satisfies this rule
        'diabetes_without_insulin': df[
            df['Diagnosis'].str.contains('diabetes', case=False, na=False) &
            ~df['Medications'].str.contains('insulin|metformin', case=False, na=False)
        ],
        'duplicate_patients': df[df.duplicated(subset=['Name', 'DOB'], keep=False)],
    }

    return issues

# ------------------------
# Sample Data
# ------------------------

def run_batch_check():
    sample = pd.DataFrame({
        'Patient_ID': ['P001', 'P002', 'P003', 'P003', None],
        'Name': ['Alice', 'Bob', 'Charlie', 'Charlie', 'Eve'],
        'DOB': ['1985-05-01', '1992-07-15', '2001-02-30', '2001-02-30', '2026-01-01'],
        'Diagnosis': ['Diabetes', 'Hypertension', 'Diabetes', 'Diabetes', 'Asthma'],
        'Medications': ['Metformin', '', 'None', 'Insulin', 'Ventolin'],
        'Gender': ['F', 'M', 'Unknown', 'M', 'F']
    })

    print("\n🔍 Running Batch Patient Data Quality Checks...\n")
    issues = validate_patient_records(sample)
    # 'flagged' avoids reusing the name 'df' for each check's failing rows
    for check, flagged in issues.items():
        print(f"\n❌ {check.upper()}:")
        print(flagged.to_string(index=False) if not flagged.empty else "✅ Passed")

# ------------------------
# Real-time Validation API
# ------------------------

app = Flask(__name__)

@app.route('/validate_patient', methods=['POST'])
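def validate_single_patient():
    # silent=True returns None instead of raising on a missing or malformed JSON body
    record = request.get_json(silent=True)
    if not isinstance(record, dict):
        return jsonify({'valid': False, 'errors': ["Request body must be a JSON object."]}), 400

    errors = []

    # Required fields
    required = ['Patient_ID', 'Name', 'DOB', 'Diagnosis', 'Medications', 'Gender']
    for field in required:
        if not record.get(field):
            errors.append(f"{field} is required.")

    # DOB check (skipped when DOB is missing, which is already reported above)
    dob_raw = record.get('DOB')
    if dob_raw:
        try:
            dob = datetime.strptime(dob_raw, "%Y-%m-%d").date()
            if dob > datetime.today().date():
                errors.append("DOB cannot be in the future.")
        except (TypeError, ValueError):
            errors.append("DOB must be a valid date in YYYY-MM-DD format.")

    # Gender check
    if record.get('Gender') not in ['M', 'F']:
        errors.append("Gender must be 'M' or 'F'.")

    # Diabetes medication rule; `or ''` guards against JSON null values,
    # and the substring match mirrors the batch check
    diagnosis = str(record.get('Diagnosis') or '').lower()
    if 'diabetes' in diagnosis:
        meds = str(record.get('Medications') or '').lower()
        if 'insulin' not in meds and 'metformin' not in meds:
            errors.append("Diabetic patients must have insulin or metformin.")

    return jsonify({'valid': not errors, 'errors': errors})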

# ------------------------
# Main Execution
# ------------------------

if __name__ == "__main__":
    # Run batch validation
    threading.Thread(target=run_batch_check).start()

    # Start real-time validation API
    print("\n🚑 Real-time Patient Validation API running at http://localhost:5000/validate_patient")
    app.run(debug=False)




🔍 Running Batch Patient Data Quality Checks...


🚑 Real-time Patient Validation API running at http://localhost:5000/validate_patient

❌ MISSING_FIELDS:
Patient_ID    Name        DOB Diagnosis Medications  Gender
      P003 Charlie        NaT  Diabetes        None Unknown
      P003 Charlie        NaT  Diabetes     Insulin       M
      None     Eve 2026-01-01    Asthma    Ventolin       F

❌ INVALID_DOB:
Patient_ID    Name        DOB Diagnosis Medications  Gender
      P003 Charlie        NaT  Diabetes        None Unknown
      P003 Charlie        NaT  Diabetes     Insulin       M
      None     Eve 2026-01-01    Asthma    Ventolin       F

❌ INVALID_GENDER:
Patient_ID    Name DOB Diagnosis Medications  Gender
      P003 Charlie NaT  Diabetes        None Unknown

❌ MISSING_MEDICATIONS:
Patient_ID Name        DOB    Diagnosis Medications Gender
      P002  Bob 1992-07-15 Hypertension                  M

❌ DIABETES_WITHOUT_INSULIN:
Patient_ID    Name DOB Diagnosis Medications  Gender
 

 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [17/May/2025 13:36:56] "GET / HTTP/1.1" 404 -
127.0.0.1 - - [17/May/2025 13:36:56] "GET /favicon.ico HTTP/1.1" 404 -
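
The batch function and the API endpoint above each spell out the validation rules separately, so the two can drift apart as rules are added. One possible way to keep them in step is to define each rule once and run the same list from both paths. The sketch below illustrates that idea with plain dictionaries and the standard library only; `RULES`, `rule`, and `run_rules` are hypothetical names introduced here and are not part of the notebook code above.

```python
from datetime import date, datetime

RULES = []  # (name, check) pairs; each check returns True when the record passes


def rule(name):
    """Decorator that registers a check function under a readable name."""
    def register(check):
        RULES.append((name, check))
        return check
    return register


@rule("patient_id_present")
def has_patient_id(record):
    return bool(record.get("Patient_ID"))


@rule("dob_valid_and_not_future")
def dob_is_plausible(record):
    try:
        dob = datetime.strptime(record.get("DOB") or "", "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return False
    return dob <= date.today()


@rule("gender_in_allowed_set")
def gender_is_allowed(record):
    return record.get("Gender") in {"M", "F"}


@rule("diabetes_has_insulin_or_metformin")
def diabetes_is_medicated(record):
    if "diabetes" not in str(record.get("Diagnosis") or "").lower():
        return True  # the rule only applies to diabetic patients
    meds = str(record.get("Medications") or "").lower()
    return "insulin" in meds or "metformin" in meds


def run_rules(record):
    """Return the names of all rules the record fails, in registration order."""
    return [name for name, check in RULES if not check(record)]


# Third row of the Task 2 sample: impossible date, 'None' medication, 'Unknown' gender.
charlie = {"Patient_ID": "P003", "Name": "Charlie", "DOB": "2001-02-30",
           "Diagnosis": "Diabetes", "Medications": "None", "Gender": "Unknown"}
failed = run_rules(charlie)
print(failed)
```

With this layout, a batch loop and a request handler could both call `run_rules`, so a rule added to the registry would apply to both paths without editing either caller.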
