In [None]:
%load_ext autoreload
%autoreload 2

# HDRUK Acute Admissions Feature Engineering Part 1

This is an example using code available at https://github.com/lthtr-dst/hdruk_avoidable_admissions

This code is under development, so expect breaking changes and bugs.

Please check the commit history, make sure you have the latest clone of the repo, and update your conda environment using the environment.yaml file.

Please raise an issue if you find a bug or have a question.

The synthetic data was generated by @vvcb using the SDV module, Synthetic Data Vault (https://github.com/sdv-dev/SDV).

## Admitted Care

In [None]:
# Import Packages

import numpy as np
import pandas as pd

import avoidable_admissions as aa
from pandas_profiling import ProfileReport

In [None]:
# Import data. By convention, df is used as the name for a DataFrame.
# Passing dtype=str to read_csv would coerce all columns to strings; it is omitted here.

df = pd.read_csv("synthetic_data/sdv_hdruk_admitted_care_synthetic_data.csv")
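
The effect of the optional `dtype` argument mentioned above can be seen on a tiny in-memory CSV. This is a minimal sketch, not part of the pipeline: the column names and values are made up for illustration and do not come from the synthetic data file.

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the column names are illustrative only.
csv_text = "patient_id,admission_code\n001,21\n002,2A\n"

# Default behaviour: pandas infers a type for each column.
inferred = pd.read_csv(io.StringIO(csv_text))

# dtype=str: every column is read as text, so codes keep their exact form.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["patient_id"].tolist())  # parsed as integers, leading zeros lost
print(as_text["patient_id"].tolist())   # kept as the original strings
```

Reading everything as strings is often convenient for coded fields, since type inference can silently change identifiers that look numeric.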

In [None]:
# Check the column names that were read in

df.columns

In [None]:
# Let's create a copy so the original dataframe stays untouched

dfa = df.copy()

## First Validation

In [None]:
# And run the first validation

good, bad = aa.data.validate.validate_admitted_care_data(dfa)

In [None]:
# Now we can print the results

print(f"""
Total number of rows in input data   : {dfa.shape[0]}
Number of rows that passed validation: {good.shape[0]}
Number of rows that failed validation: {bad.shape[0]}
""")

In [None]:
# Here are our failure cases

bad[["schema_context", "column", "check", "check_number", "failure_case", "index"]]
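
When there are many failure cases, a grouped count is usually easier to read than the raw table. The sketch below assumes a dataframe with the same columns as those selected above; `bad_example` and its contents are hypothetical stand-ins, not real validation output.

```python
import pandas as pd

# Hypothetical failure cases using the column names selected above.
bad_example = pd.DataFrame(
    {
        "schema_context": ["Column", "Column", "Column"],
        "column": ["age", "age", "sex"],
        "check": ["in_range", "in_range", "isin"],
        "check_number": [0, 0, 0],
        "failure_case": ["12", "150", "X"],
        "index": [3, 7, 7],
    }
)

# Count failures per column and check, most frequent first.
summary = (
    bad_example.groupby(["column", "check"])
    .size()
    .sort_values(ascending=False)
    .rename("n_failures")
    .reset_index()
)
print(summary)

# Number of distinct input rows with at least one failure.
n_bad_rows = bad_example["index"].nunique()
print(n_bad_rows)
```

In the notebook, the same grouping could be applied to `bad` directly, assuming it carries these columns.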

## Feature Engineering

In [None]:
# Now we can try to engineer the features

dfa_features = aa.features.build_features.build_admitted_care_features(good.copy())
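
One quick way to see what the feature step produced is to compare column names before and after. This is a minimal sketch with hypothetical stand-in dataframes; in the notebook the comparison would be between `good` and `dfa_features`, made before `good` is reassigned by the second validation.

```python
import pandas as pd

# Hypothetical stand-ins for the validated input and the feature output.
before = pd.DataFrame({"age": [34, 71]})
after = before.assign(age_band=["18-49", "65+"])

# Columns present after feature engineering but not before, in output order.
new_columns = [c for c in after.columns if c not in before.columns]
print(new_columns)
```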

## Second validation

In [None]:
# Run the second validation, this time on the engineered features

good, bad = aa.data.validate.validate_admitted_care_features(dfa_features)
print(f"""
Total number of rows in input data   : {dfa_features.shape[0]}
Number of rows that passed validation: {good.shape[0]}
Number of rows that failed validation: {bad.shape[0]}
""")

## Pandas Profiling

We can now generate a pandas profiling report, which lets us visually assess the quality of the dataset before attempting any analytical transformation.

In [None]:
# Now we have our good data we can build a profile report from it
profile = ProfileReport(
    good, title="Pandas Profiling Report"
)

In [None]:
# We can then display that report inline within the browser

profile

# Harmonization Phase 2

We are now in a good position to start work on the ED dataset.