### Healthcare – Patient Data Accuracy

**Task 1**: Patient Record Accuracy Assessment

**Objective**: Identify and prevent inaccuracies in patient records.

**Steps**:
1. Examine a sample patient dataset for common inaccuracies.
2. Identify at least three common issues, such as medication errors or misdiagnoses.
3. Propose validation measures to ensure data accuracy at the point of entry.
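
For step 3, point-of-entry checks work on one record at a time, before it is saved. The sketch below is a minimal, library-free illustration under stated assumptions, not a production design. `validate_record`, `DOB_PATTERN` and the `VALID_*` reference sets are hypothetical names, and the allowed values simply reuse this notebook's example codes. It parses the date of birth instead of relying on a regular expression alone, so a well-formed but impossible date such as `2023-02-30` is rejected.

```python
import re
from datetime import date

# Hypothetical reference sets; they reuse the example values from this notebook.
VALID_GENDERS = {"Male", "Female", "Other"}
VALID_DIAGNOSIS_CODES = {"A01", "B02", "C03", "D04"}
VALID_MEDICATIONS = {"DrugA", "DrugB", "DrugC"}
DOB_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_record(record: dict, existing_ids: set) -> list:
    """Return the problems found in one incoming record; an empty list means accept."""
    errors = []

    # Identifier: required and not already used by another patient
    pid = record.get("patient_id")
    if pid is None:
        errors.append("patient_id is missing")
    elif pid in existing_ids:
        errors.append(f"patient_id {pid} already exists")

    # Date of birth: required, YYYY-MM-DD, a real calendar date, not in the future
    dob = record.get("date_of_birth")
    if dob is None:
        errors.append("date_of_birth is missing")
    elif not isinstance(dob, str) or not DOB_PATTERN.fullmatch(dob):
        errors.append(f"date_of_birth {dob!r} is not in YYYY-MM-DD format")
    else:
        try:
            if date.fromisoformat(dob) > date.today():
                errors.append(f"date_of_birth {dob!r} is in the future")
        except ValueError:  # e.g. 2023-02-30 matches the pattern but does not exist
            errors.append(f"date_of_birth {dob!r} is not a real calendar date")

    # Coded fields: must come from the reference lists
    if record.get("gender") not in VALID_GENDERS:
        errors.append(f"gender {record.get('gender')!r} is not an allowed value")
    if record.get("diagnosis_code") not in VALID_DIAGNOSIS_CODES:
        errors.append(f"diagnosis_code {record.get('diagnosis_code')!r} is not recognized")
    if record.get("medication") not in VALID_MEDICATIONS:
        errors.append(f"medication {record.get('medication')!r} is missing or not recognized")

    return errors


# Example: an entry form would show these messages instead of saving the record
problems = validate_record(
    {"patient_id": 3, "date_of_birth": "2023-02-30", "gender": "Unknown",
     "diagnosis_code": "A01", "medication": "DrugA"},
    existing_ids={1, 2},
)
for message in problems:
    print(message)
```

Returning a list of messages rather than raising on the first problem lets an entry form report every issue with a record at once.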

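```python
# Requires an older Great Expectations release that still ships the legacy
# great_expectations.dataset API (it is not available in GE 1.x).
import pandas as pd
from great_expectations.dataset import PandasDataset


class PatientDataset(PandasDataset):
    def validate_patient_data(self):
        # Identifier: every record needs a unique patient_id
        self.expect_column_values_to_not_be_null("patient_id")
        self.expect_column_values_to_be_unique("patient_id")
        # Demographics: date of birth present and strictly YYYY-MM-DD (anchored regex)
        self.expect_column_values_to_not_be_null("date_of_birth")
        self.expect_column_values_to_match_regex("date_of_birth", r"^\d{4}-\d{2}-\d{2}$")
        self.expect_column_values_to_be_in_set("gender", ["Male", "Female", "Other"])
        # Clinical fields: present and drawn from reference lists
        self.expect_column_values_to_not_be_null("diagnosis_code")
        self.expect_column_values_to_be_in_set("diagnosis_code", ["A01", "B02", "C03", "D04"])  # Example ICD codes
        self.expect_column_values_to_not_be_null("medication")
        self.expect_column_values_to_be_in_set("medication", ["DrugA", "DrugB", "DrugC"])  # Example meds
        # Each expect_* call above is recorded in the dataset's expectation suite;
        # validate() re-runs the whole suite and returns the combined result.
        return self.validate()


# Sample patient data, deliberately seeded with errors: a duplicate patient_id (3),
# a missing date_of_birth, an invalid gender ("Unknown"), an unknown diagnosis
# code ("X99"), an unknown medication ("DrugX") and a missing medication
data = {
    "patient_id": [1, 2, 3, 3],
    "date_of_birth": ["1980-05-01", "1990-07-15", None, "1975-02-28"],
    "gender": ["Male", "Female", "Other", "Unknown"],
    "diagnosis_code": ["A01", "B02", "X99", "C03"],
    "medication": ["DrugA", "DrugB", "DrugX", None]
}

df = PatientDataset(pd.DataFrame(data))

results = df.validate_patient_data()
results
```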
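
Step 2 asks for the most common issues. As a cross-check that does not depend on the installed Great Expectations version, the same seeded errors can be counted with plain pandas. This is a sketch: the issue labels and the `VALID_*` sets are illustrative and reuse the example values from the cell above.

```python
import pandas as pd

# Same sample data as the cell above, seeded with errors
data = {
    "patient_id": [1, 2, 3, 3],
    "date_of_birth": ["1980-05-01", "1990-07-15", None, "1975-02-28"],
    "gender": ["Male", "Female", "Other", "Unknown"],
    "diagnosis_code": ["A01", "B02", "X99", "C03"],
    "medication": ["DrugA", "DrugB", "DrugX", None],
}
df = pd.DataFrame(data)

# Illustrative reference sets (same example values as the expectations above)
VALID_GENDERS = {"Male", "Female", "Other"}
VALID_DIAGNOSIS_CODES = {"A01", "B02", "C03", "D04"}
VALID_MEDICATIONS = {"DrugA", "DrugB", "DrugC"}

# Number of affected rows per issue type
issues = {
    "duplicate patient_id": int(df["patient_id"].duplicated(keep=False).sum()),
    "missing date_of_birth": int(df["date_of_birth"].isna().sum()),
    "invalid gender": int((~df["gender"].isin(VALID_GENDERS)).sum()),
    "unknown diagnosis_code": int((~df["diagnosis_code"].isin(VALID_DIAGNOSIS_CODES)).sum()),
    "missing medication": int(df["medication"].isna().sum()),
    # Only non-null values can be "unknown"; nulls are counted separately above
    "unknown medication": int((df["medication"].notna() & ~df["medication"].isin(VALID_MEDICATIONS)).sum()),
}

# Report issue types from most to least frequent
for name, count in sorted(issues.items(), key=lambda item: item[1], reverse=True):
    if count:
        print(f"{name}: {count} row(s)")
```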

**Task 2**: Implement Healthcare Data Quality Checks

**Objective**: Maintain accurate health records within a healthcare system.

**Steps**:
1. Develop a validation workflow for patient data.
2. Use a data-validation library, such as Great Expectations, to automate checks for common errors.
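
One way to structure such a workflow, sketched here with plain pandas rather than a specific validation library, is to keep all rules in a single registry, evaluate them together, and split the data into accepted rows, quarantined rows and a summary report. `RULES` and `run_workflow` are hypothetical names, and the rule set mirrors the example checks used in this notebook.

```python
import pandas as pd

# Hypothetical rule registry: each rule returns a boolean Series that is
# True for the rows that PASS it.
RULES = {
    "patient_id present": lambda df: df["patient_id"].notna(),
    "patient_id unique": lambda df: ~df["patient_id"].duplicated(keep=False),
    "date_of_birth is YYYY-MM-DD": lambda df: (
        df["date_of_birth"].astype("string")
        .str.fullmatch(r"\d{4}-\d{2}-\d{2}")
        .fillna(False)  # missing dates fail the rule
        .astype(bool)
    ),
    "gender allowed": lambda df: df["gender"].isin({"Male", "Female", "Other"}),
    "diagnosis_code known": lambda df: df["diagnosis_code"].isin({"A01", "B02", "C03", "D04"}),
    # isin() is False for nulls, so this rule also catches missing medications
    "medication present and known": lambda df: df["medication"].isin({"DrugA", "DrugB", "DrugC"}),
}


def run_workflow(df: pd.DataFrame):
    """Apply every rule; return (accepted rows, quarantined rows, summary report)."""
    passes = pd.DataFrame({name: rule(df) for name, rule in RULES.items()}, index=df.index)
    row_ok = passes.all(axis=1)
    report = {
        "success": bool(row_ok.all()),
        # Number of failing rows for each rule that failed at least once
        "failures": {name: int((~passes[name]).sum()) for name in RULES if not passes[name].all()},
    }
    return df[row_ok], df[~row_ok], report


# Example run on the Task 2 sample data
data = {
    "patient_id": [1, 2, 3, 4],
    "date_of_birth": ["1985-01-01", "1990-12-12", "1975-07-07", "invalid-date"],
    "gender": ["Male", "Female", "Other", "Unknown"],
    "diagnosis_code": ["A01", "B02", "C03", "X99"],
    "medication": ["DrugA", None, "DrugB", "DrugX"],
}
accepted, quarantined, report = run_workflow(pd.DataFrame(data))
print(report)
```

Quarantining failed rows, rather than rejecting the whole batch, lets valid records flow through while the flagged ones are sent back for review.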

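```python
# Requires an older Great Expectations release that still ships the legacy
# ge.from_pandas() / PandasDataset API (it is not available in GE 1.x).
import pandas as pd
import great_expectations as ge


def healthcare_data_quality_checks(df: pd.DataFrame):
    """Run the patient-data expectations against df and return the validation result."""
    dataset = ge.from_pandas(df)  # wrap the DataFrame so expect_* methods are available
    # Identifier: present and unique
    dataset.expect_column_values_to_not_be_null("patient_id")
    dataset.expect_column_values_to_be_unique("patient_id")
    # Date of birth: present and strictly YYYY-MM-DD (anchored regex)
    dataset.expect_column_values_to_not_be_null("date_of_birth")
    dataset.expect_column_values_to_match_regex("date_of_birth", r"^\d{4}-\d{2}-\d{2}$")
    dataset.expect_column_values_to_be_in_set("gender", ["Male", "Female", "Other"])
    # Clinical fields: present and drawn from reference lists
    dataset.expect_column_values_to_not_be_null("diagnosis_code")
    dataset.expect_column_values_to_be_in_set("diagnosis_code", ["A01", "B02", "C03", "D04"])  # example valid codes
    dataset.expect_column_values_to_not_be_null("medication")
    dataset.expect_column_values_to_be_in_set("medication", ["DrugA", "DrugB", "DrugC"])  # example meds
    # Re-run every expectation recorded above and collect the combined result
    return dataset.validate()


# Sample data seeded with errors: a malformed date, an invalid gender,
# an unknown diagnosis code, a missing medication and an unknown medication
data = {
    "patient_id": [1, 2, 3, 4],
    "date_of_birth": ["1985-01-01", "1990-12-12", "1975-07-07", "invalid-date"],
    "gender": ["Male", "Female", "Other", "Unknown"],
    "diagnosis_code": ["A01", "B02", "C03", "X99"],
    "medication": ["DrugA", None, "DrugB", "DrugX"]
}

df = pd.DataFrame(data)
result = healthcare_data_quality_checks(df)
print(result)
```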


{
  "success": false,
  "results": [
    {
      "success": true,
      "expectation_config": {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {
          "column": "patient_id",
          "result_format": "BASIC"
        },
        "meta": {}
      },
      "result": {
        "element_count": 4,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "partial_unexpected_list": []
      },
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_message": null,
        "exception_traceback": null
      }
    },
    {
      "success": true,
      "expectation_config": {
        "expectation_type": "expect_column_values_to_be_unique",
        "kwargs": {
          "column": "patient_id",
          "result_format": "BASIC"
        },
        "meta": {}
      },
      "result": {
        "element_count": 4,
        "missing_count": 0,
        "missing_p