### Healthcare – Patient Data Accuracy

**Task 1**: Patient Record Accuracy Assessment

**Objective**: Assess a sample of patient records for invalid, implausible, or missing values.

**Steps**:
1. Examine a sample patient dataset for common inaccuracies.
2. Identify at least three common issues, such as out-of-range ages, unrecognized diagnosis codes, or missing medication entries.
3. Propose validation measures to ensure data accuracy at the point of entry.
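
Before writing targeted checks, a quick profiling pass can help with step 1 by showing where the problems are likely to be. The following is a minimal sketch, assuming the same column names as the sample dataset in this section; the variable names `sample`, `missing_counts`, `numeric_range`, and `code_counts` are illustrative only.

```python
import pandas as pd

# Hypothetical mini-sample reusing the column names from this section
sample = pd.DataFrame({
    'age': [25, 130, 40, -1, 60],
    'diagnosis_code': ['D01', 'D99', 'X15', 'D03', None],
    'lab_result': [4.5, 7.1, 8.2, 3.0, -1],
})

# Number of missing values in each column
missing_counts = sample.isna().sum()

# Minimum and maximum of the numeric columns expose out-of-range values
numeric_range = sample[['age', 'lab_result']].agg(['min', 'max'])

# Frequency of each diagnosis code, counting missing entries as well
code_counts = sample['diagnosis_code'].value_counts(dropna=False)

print(missing_counts)
print(numeric_range)
print(code_counts)
```

Extreme minimum and maximum values, unexpected category labels, and non-zero missing counts are the usual starting points for deciding which validation rules to write.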

In [1]:
import pandas as pd

# Sample patient dataset simulation
data = {
    'patient_id': [101, 102, 103, 104, 105],
    'age': [25, 130, 40, -1, 60],  # 130 and -1 are invalid ages
    'diagnosis_code': ['D01', 'D99', 'X15', 'D03', None],  # 'X15' invalid, None missing
    'medication': ['MedA', 'MedB', 'MedC', None, 'MedE'],  # None = missing medication
    'lab_result': [4.5, 7.1, 8.2, 3.0, -1],  # -1 invalid lab result
}

df = pd.DataFrame(data)

# Valid diagnosis codes (simplified)
valid_diagnosis_codes = {'D01', 'D02', 'D03', 'D04', 'D99'}

# Validation checks
def check_age(age):
    return 0 <= age <= 120

def check_diagnosis_code(code):
    return code in valid_diagnosis_codes

def check_medication(med):
    return med is not None and med != ''

def check_lab_result(result):
    return 0 <= result <= 10  # assuming lab result range

# Identify inaccuracies
df['age_valid'] = df['age'].apply(check_age)
# None is never in the lookup set, so missing codes are flagged without a separate check
df['diagnosis_valid'] = df['diagnosis_code'].apply(check_diagnosis_code)
df['medication_valid'] = df['medication'].apply(check_medication)
df['lab_result_valid'] = df['lab_result'].apply(check_lab_result)

# Summary of issues
print("Patient Record Accuracy Assessment:")
print(df[['patient_id', 'age_valid', 'diagnosis_valid', 'medication_valid', 'lab_result_valid']])

# Common issues found
print("\nCommon Issues:")
print(f"Invalid ages: {df[~df['age_valid']]['patient_id'].tolist()}")
print(f"Invalid diagnosis codes: {df[~df['diagnosis_valid']]['patient_id'].tolist()}")
print(f"Missing medications: {df[~df['medication_valid']]['patient_id'].tolist()}")
print(f"Invalid lab results: {df[~df['lab_result_valid']]['patient_id'].tolist()}")

# Validation measures to ensure accuracy at the point of entry:
# 1. Validate that age falls within a realistic human range (0-120).
# 2. Select diagnosis codes from a controlled vocabulary or lookup table.
# 3. Make medication a mandatory field; reject empty entries.
# 4. Check lab results against plausible value ranges and cross-check them.
# 5. Use dropdowns and other controlled inputs in the UI to reduce manual entry errors.
# 6. Validate in real time and alert on invalid input.

Patient Record Accuracy Assessment:
   patient_id  age_valid  diagnosis_valid  medication_valid  lab_result_valid
0         101       True             True              True              True
1         102      False             True              True              True
2         103       True            False              True              True
3         104      False             True             False              True
4         105       True            False              True             False

Common Issues:
Invalid ages: [102, 104]
Invalid diagnosis codes: [103, 105]
Missing medications: [104]
Invalid lab results: [105]
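
The checks above run after the data has already been stored. Measures 1 to 4 can also be enforced at the point of entry by rejecting a record before it is saved. The following is a minimal sketch of that idea, not a production design: `validate_entry`, `save_record`, and the in-memory `store` list are hypothetical names, and the ranges and code set are the simplified assumptions used in this section.

```python
# Assumed reference data, copied from the simplified example in this section
VALID_DIAGNOSIS_CODES = {'D01', 'D02', 'D03', 'D04', 'D99'}


def validate_entry(record):
    """Return a list of problems; an empty list means the record may be saved."""
    problems = []
    age = record.get('age')
    if not isinstance(age, (int, float)) or not 0 <= age <= 120:
        problems.append('age must be a number between 0 and 120')
    if record.get('diagnosis_code') not in VALID_DIAGNOSIS_CODES:
        problems.append('diagnosis_code must come from the controlled vocabulary')
    if not record.get('medication'):
        problems.append('medication is required')
    lab = record.get('lab_result')
    if not isinstance(lab, (int, float)) or not 0 <= lab <= 10:
        problems.append('lab_result must be a number between 0 and 10')
    return problems


def save_record(store, record):
    """Append the record to the (hypothetical) store only if it passes validation."""
    problems = validate_entry(record)
    if problems:
        raise ValueError('; '.join(problems))
    store.append(record)


store = []
save_record(store, {'age': 25, 'diagnosis_code': 'D01',
                    'medication': 'MedA', 'lab_result': 4.5})
try:
    save_record(store, {'age': 130, 'diagnosis_code': 'X15',
                        'medication': '', 'lab_result': 4.5})
except ValueError as exc:
    print(f'Rejected: {exc}')
print(f'Records stored: {len(store)}')
```

Raising an error at save time means the person entering the data sees the problem immediately, which corresponds to measure 6 (real-time validation and alerts).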

**Task 2**: Implement Healthcare Data Quality Checks

**Objective**: Maintain accurate health records within a healthcare system.

**Steps**:
1. Develop a validation workflow for patient data.
2. Automate checks for common errors in code (pandas is used here) and report the failures for each record.

In [2]:
import pandas as pd

# Sample patient dataset (with some errors)
data = {
    'patient_id': [101, 102, 103, 104, 105],
    'age': [25, 130, 40, -1, 60],  # 130 and -1 invalid
    'diagnosis_code': ['D01', 'D99', 'X15', 'D03', None],  # 'X15' invalid, None missing
    'medication': ['MedA', 'MedB', 'MedC', None, 'MedE'],  # None missing
    'lab_result': [4.5, 7.1, 8.2, 3.0, -1],  # -1 invalid
}

df = pd.DataFrame(data)

# Controlled vocabularies and valid ranges
valid_diagnosis_codes = {'D01', 'D02', 'D03', 'D04', 'D99'}
min_age, max_age = 0, 120
lab_result_min, lab_result_max = 0, 10

def validate_patient_record(row):
    errors = []
    
    if not (min_age <= row['age'] <= max_age):
        errors.append(f"Invalid age: {row['age']}")
    
    if pd.isna(row['diagnosis_code']) or row['diagnosis_code'] not in valid_diagnosis_codes:
        errors.append(f"Invalid or missing diagnosis code: {row['diagnosis_code']}")
    
    if pd.isna(row['medication']) or row['medication'] == '':
        errors.append("Missing medication information")
    
    if not (lab_result_min <= row['lab_result'] <= lab_result_max):
        errors.append(f"Invalid lab result: {row['lab_result']}")
    
    return errors

# Apply validation workflow
df['validation_errors'] = df.apply(validate_patient_record, axis=1)

# Extract records with errors
invalid_records = df[df['validation_errors'].map(len) > 0]

print("Healthcare Data Quality Validation Report:\n")
if invalid_records.empty:
    print("All patient records are valid.")
else:
    for idx, row in invalid_records.iterrows():
        print(f"Patient ID: {row['patient_id']}")
        for err in row['validation_errors']:
            print(f" - {err}")
        print()

# Summary of issues count
error_types = ['Invalid age', 'Invalid or missing diagnosis code', 'Missing medication', 'Invalid lab result']
error_counts = {key: 0 for key in error_types}

for errors in df['validation_errors']:
    for e in errors:
        # Match on the message prefix; a substring test such as "age" in e
        # would misfire on any message that happens to contain those letters
        for error_type in error_types:
            if e.startswith(error_type):
                error_counts[error_type] += 1
                break

print("Summary of Issues Found:")
for issue, count in error_counts.items():
    print(f"{issue}: {count}")

Healthcare Data Quality Validation Report:

Patient ID: 102
 - Invalid age: 130

Patient ID: 103
 - Invalid or missing diagnosis code: X15

Patient ID: 104
 - Invalid age: -1
 - Missing medication information

Patient ID: 105
 - Invalid or missing diagnosis code: None
 - Invalid lab result: -1.0

Summary of Issues Found:
Invalid age: 2
Invalid or missing diagnosis code: 2
Missing medication: 1
Invalid lab result: 1
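
As the number of checks grows, the workflow can be expressed as a table of rules rather than a chain of `if` statements. The following is a minimal sketch under the same assumptions as the cell above (same sample data, ranges, and code set); the names `rules`, `failures`, `issue_counts`, and `flagged_ids` are illustrative. Each rule is a boolean mask that is `True` where a record fails.

```python
import pandas as pd

df = pd.DataFrame({
    'patient_id': [101, 102, 103, 104, 105],
    'age': [25, 130, 40, -1, 60],
    'diagnosis_code': ['D01', 'D99', 'X15', 'D03', None],
    'medication': ['MedA', 'MedB', 'MedC', None, 'MedE'],
    'lab_result': [4.5, 7.1, 8.2, 3.0, -1],
})
valid_diagnosis_codes = {'D01', 'D02', 'D03', 'D04', 'D99'}

# Issue label -> boolean Series that is True where the record FAILS the rule
rules = {
    'Invalid age': ~df['age'].between(0, 120),
    'Invalid or missing diagnosis code': ~df['diagnosis_code'].isin(valid_diagnosis_codes),
    'Missing medication': df['medication'].isna() | (df['medication'] == ''),
    'Invalid lab result': ~df['lab_result'].between(0, 10),
}

# One column per rule, one row per patient record
failures = pd.DataFrame(rules)

# Number of failing records per rule
issue_counts = failures.sum()
print(issue_counts)

# Patient IDs that fail at least one rule
flagged_ids = df.loc[failures.any(axis=1), 'patient_id'].tolist()
print(flagged_ids)
```

Adding a check then means adding one entry to `rules`, and the summary counts come directly from the masks instead of from parsing error messages.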
