### Healthcare – Patient Data Accuracy

**Task 1**: Patient Record Accuracy Assessment

**Objective**: Assess the accuracy of patient records by detecting invalid, missing, and empty values in a sample dataset.

**Steps**:
1. Examine a sample patient dataset for common inaccuracies.
2. Identify at least three common issues, such as impossible ages, missing medication entries, or empty diagnoses.
3. Propose validation measures to ensure data accuracy at the point of entry.

In [1]:
import pandas as pd

# Sample patient dataset
data = {
    'PatientID': [1, 2, 3, 4],
    'Name': ['John Doe', 'Jane Smith', 'Alice Brown', 'Bob White'],
    'Age': [29, 34, -1, 45],  # Negative age is an issue
    'Medication': ['DrugA', 'DrugB', 'DrugC', None],  # Missing medication is an issue
    'Diagnosis': ['Flu', 'Cold', 'Flu', '']  # Empty diagnosis is an issue
}

# Create a DataFrame
patient_df = pd.DataFrame(data)

# Display the dataset
print("Sample Patient Dataset:")
print(patient_df)

# Identify common issues
print("\nIdentifying Common Issues:")
if (patient_df['Age'] < 0).any():
    print("- Negative age values found.")
if patient_df['Medication'].isnull().any():
    print("- Missing medication entries found.")
if patient_df['Diagnosis'].str.strip().eq('').any():
    print("- Empty diagnosis entries found.")

Sample Patient Dataset:
   PatientID         Name  Age Medication Diagnosis
0          1     John Doe   29      DrugA       Flu
1          2   Jane Smith   34      DrugB      Cold
2          3  Alice Brown   -1      DrugC       Flu
3          4    Bob White   45       None          

Identifying Common Issues:
- Negative age values found.
- Missing medication entries found.
- Empty diagnosis entries found.
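
Step 3 asks for validation at the point of entry, which the cell above does not cover. A minimal sketch of one way to do it follows, assuming each incoming record arrives as a plain Python dictionary with the same fields as the sample dataset. The function name `validate_patient_record`, the 0-120 age range, and the two example records are hypothetical and chosen only for illustration.

```python
import math


def validate_patient_record(record):
    """Return a list of problems found in one record (an empty list means valid)."""
    errors = []

    age = record.get('Age')
    # Age must be a real number; the 0-120 bound is an assumed plausibility range.
    if (age is None or isinstance(age, bool)
            or not isinstance(age, (int, float)) or math.isnan(age)):
        errors.append('Age is missing or not numeric.')
    elif not 0 <= age <= 120:
        errors.append(f'Age {age} is outside the range 0-120.')

    # Medication and Diagnosis must be non-empty strings after trimming whitespace.
    for field in ('Medication', 'Diagnosis'):
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f'{field} is missing or empty.')

    return errors


# Two hypothetical incoming records, checked before they are stored.
good = {'PatientID': 5, 'Name': 'Carol Green', 'Age': 52,
        'Medication': 'DrugA', 'Diagnosis': 'Flu'}
bad = {'PatientID': 6, 'Name': 'Dan Black', 'Age': -1,
       'Medication': None, 'Diagnosis': '  '}

print(validate_patient_record(good))
print(validate_patient_record(bad))
```

Rejecting or flagging a record when the returned list is non-empty keeps bad values out of the dataset, so later cleaning steps have less to repair.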


**Task 2**: Implement Healthcare Data Quality Checks

**Objective**: Keep patient records accurate by correcting known errors and re-validating the data afterwards.

**Steps**:
1. Develop a validation workflow for patient data.
2. Automate the checks for common errors in software (pandas is used here).

In [2]:
# Data Quality Checks and Fixes

# Replace negative ages with NaN so they are treated as missing, not as valid values
patient_df['Age'] = patient_df['Age'].where(patient_df['Age'] >= 0)

# Fill missing medication entries with a placeholder value
patient_df['Medication'] = patient_df['Medication'].fillna('Unknown')

# Replace empty or whitespace-only diagnosis entries with a placeholder value
patient_df['Diagnosis'] = patient_df['Diagnosis'].str.strip().replace('', 'Unknown')

# Display the cleaned dataset
print("Cleaned Patient Dataset:")
print(patient_df)

# Validate the dataset
print("\nValidation Checks:")
if (patient_df['Age'] < 0).any():
    print("- Negative age values still exist.")
else:
    print("- No negative age values found.")

if patient_df['Medication'].isnull().any():
    print("- Missing medication entries still exist.")
else:
    print("- No missing medication entries found.")

if patient_df['Diagnosis'].str.strip().eq('').any():
    print("- Empty diagnosis entries still exist.")
else:
    print("- No empty diagnosis entries found.")

Cleaned Patient Dataset:
   PatientID         Name   Age Medication Diagnosis
0          1     John Doe  29.0      DrugA       Flu
1          2   Jane Smith  34.0      DrugB      Cold
2          3  Alice Brown   NaN      DrugC       Flu
3          4    Bob White  45.0    Unknown   Unknown

Validation Checks:
- No negative age values found.
- No missing medication entries found.
- No empty diagnosis entries found.
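
The checks above are repeated by hand in both tasks, and after cleaning they no longer notice that Alice Brown's age is now missing (NaN) rather than negative. A rough sketch of a reusable, automated check is shown below, assuming the same column names as the sample dataset. The function name `run_quality_checks` and the check labels are hypothetical.

```python
import pandas as pd


def run_quality_checks(df):
    """Count the rows that fail each data quality check."""
    return {
        'negative_age': int((df['Age'] < 0).sum()),
        'missing_age': int(df['Age'].isna().sum()),
        'missing_medication': int(df['Medication'].isna().sum()),
        # Treat missing, empty, and whitespace-only diagnoses the same way.
        'empty_diagnosis': int(df['Diagnosis'].fillna('').str.strip().eq('').sum()),
    }


# The same sample data as in Task 1, rebuilt so this cell runs on its own.
raw_df = pd.DataFrame({
    'PatientID': [1, 2, 3, 4],
    'Name': ['John Doe', 'Jane Smith', 'Alice Brown', 'Bob White'],
    'Age': [29, 34, -1, 45],
    'Medication': ['DrugA', 'DrugB', 'DrugC', None],
    'Diagnosis': ['Flu', 'Cold', 'Flu', ''],
})

report = run_quality_checks(raw_df)
for check, count in report.items():
    status = 'OK' if count == 0 else f'{count} row(s) failed'
    print(f'{check}: {status}')
```

Running the same function before and after cleaning gives a comparable report for both states, and the `missing_age` count makes it visible when an invalid age has been converted into a missing one instead of being corrected.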
