### updateECRDatastoreIncidentID
This is the third and final step in updating the ECR datastore after receiving new MPI data from LAC. Run it after the `updateECRDatastoreIrisID` notebook has updated the `iris_id`.

This notebook syncs `incident_id` values between the Master Incident Index (MII) and the ECR datastore. As new MII data becomes available through the `updateMII` Synapse job, the ECR datastore needs to be updated as well. This notebook updates the `incident_id` column in the ECR datastore when the MII contains an entry with a corresponding `person_id` and that entry has a positive COVID test within 90 days of the ECR datastore's COVID specimen collection date.


Set up and prepare the data. Load the ECR datastore (`ecr`) and MII (`mii`) delta tables, then load the configuration needed to identify positive COVID tests (`covid_test_type_codes`, `covid_positive_results`).

In [None]:
from pyspark.sql import SparkSession
from delta.tables import *
from pyspark.sql.functions import *

account_name = "$STORAGE_ACCOUNT"
ECR_DELTA_TABLE_FILE_PATH = f"abfss://delta-tables@{account_name}.dfs.core.windows.net/ecr-datastore"
MII_DELTA_TABLE_FILE_PATH = f"abfss://patient_data@{account_name}.dfs.core.windows.net/MII.parquet"
COVID_IDENTIFICATION_CONFIG_FILE_PATH = f"abfss://delta-tables@{account_name}.dfs.core.windows.net/covid_identification_config.json"

spark = SparkSession.builder.getOrCreate()

# Read in data
ecr = spark.read.format("delta").load(ECR_DELTA_TABLE_FILE_PATH)
# Keep only the MII columns needed for the join; the `_mii` suffix avoids name clashes with ECR columns
mii = (
    spark.read.format("delta").load(MII_DELTA_TABLE_FILE_PATH)
    .select("incident_id", "person_id", "specimen_collection_date")
    .withColumnRenamed("incident_id", "incident_id_mii")
    .withColumnRenamed("person_id", "person_id_mii")
    .withColumnRenamed("specimen_collection_date", "specimen_collection_date_mii")
)

# Covid identification data
df = spark.read.json(COVID_IDENTIFICATION_CONFIG_FILE_PATH, multiLine=True)
covid_test_type_codes = df.select('covid_test_type_codes').rdd.flatMap(lambda x: x).collect()[0]
covid_positive_results = df.select('covid_positive_results').rdd.flatMap(lambda x: x).collect()[0]
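
The two `select` calls above imply that the config file is a single JSON object with two array fields. The sketch below shows that assumed shape in plain Python. The codes and result strings are hypothetical placeholders, not the real configuration. Because the next cell lowercases the ECR values before calling `isin`, list entries only match if they are lowercase too, so the sketch normalizes them.

```python
import json

# Hypothetical file contents, for illustration only; the real codes and
# result strings are not shown in this notebook.
raw = """
{
  "covid_test_type_codes": ["CODE-A", "code-b"],
  "covid_positive_results": ["Detected", "positive"]
}
"""

config = json.loads(raw)

# Lowercase every entry so that a comparison against lowercased column values can match
covid_test_type_codes = [c.lower() for c in config["covid_test_type_codes"]]
covid_positive_results = [r.lower() for r in config["covid_positive_results"]]

print(covid_test_type_codes)
print(covid_positive_results)
```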

Create a `comparison_date` column in the ECR datastore. The ECR datastore contains 20 tests and their associated specimen collection dates. When updating the `incident_id`, we are only concerned with positive COVID tests and thus want to use the specimen collection date associated with a positive COVID test only. This block checks each test in order to see whether it is a COVID test (i.e., `test_type_code` is in the list of `covid_test_type_codes`) and whether it is positive (i.e., `test_result` is in the list of `covid_positive_results`). The first test that meets both conditions supplies the `comparison_date`; if no test qualifies, the column is null.

In [None]:
# Add `comparison_date` column to ecr data ahead of join with mii to find positive covid tests
ecr = ecr.withColumn("comparison_date",
    when((lower(ecr.test_type_code_1).isin(covid_test_type_codes) & lower(ecr.test_result_1).isin(covid_positive_results)), ecr.specimen_collection_date_1)
    .when((lower(ecr.test_type_code_2).isin(covid_test_type_codes) & lower(ecr.test_result_2).isin(covid_positive_results)), ecr.specimen_collection_date_2)
    .when((lower(ecr.test_type_code_3).isin(covid_test_type_codes) & lower(ecr.test_result_3).isin(covid_positive_results)), ecr.specimen_collection_date_3)
    .when((lower(ecr.test_type_code_4).isin(covid_test_type_codes) & lower(ecr.test_result_4).isin(covid_positive_results)), ecr.specimen_collection_date_4)
    .when((lower(ecr.test_type_code_5).isin(covid_test_type_codes) & lower(ecr.test_result_5).isin(covid_positive_results)), ecr.specimen_collection_date_5)
    .when((lower(ecr.test_type_code_6).isin(covid_test_type_codes) & lower(ecr.test_result_6).isin(covid_positive_results)), ecr.specimen_collection_date_6)
    .when((lower(ecr.test_type_code_7).isin(covid_test_type_codes) & lower(ecr.test_result_7).isin(covid_positive_results)), ecr.specimen_collection_date_7)
    .when((lower(ecr.test_type_code_8).isin(covid_test_type_codes) & lower(ecr.test_result_8).isin(covid_positive_results)), ecr.specimen_collection_date_8)
    .when((lower(ecr.test_type_code_9).isin(covid_test_type_codes) & lower(ecr.test_result_9).isin(covid_positive_results)), ecr.specimen_collection_date_9)
    .when((lower(ecr.test_type_code_10).isin(covid_test_type_codes) & lower(ecr.test_result_10).isin(covid_positive_results)), ecr.specimen_collection_date_10)
    .when((lower(ecr.test_type_code_11).isin(covid_test_type_codes) & lower(ecr.test_result_11).isin(covid_positive_results)), ecr.specimen_collection_date_11)
    .when((lower(ecr.test_type_code_12).isin(covid_test_type_codes) & lower(ecr.test_result_12).isin(covid_positive_results)), ecr.specimen_collection_date_12)
    .when((lower(ecr.test_type_code_13).isin(covid_test_type_codes) & lower(ecr.test_result_13).isin(covid_positive_results)), ecr.specimen_collection_date_13)
    .when((lower(ecr.test_type_code_14).isin(covid_test_type_codes) & lower(ecr.test_result_14).isin(covid_positive_results)), ecr.specimen_collection_date_14)
    .when((lower(ecr.test_type_code_15).isin(covid_test_type_codes) & lower(ecr.test_result_15).isin(covid_positive_results)), ecr.specimen_collection_date_15)
    .when((lower(ecr.test_type_code_16).isin(covid_test_type_codes) & lower(ecr.test_result_16).isin(covid_positive_results)), ecr.specimen_collection_date_16)
    .when((lower(ecr.test_type_code_17).isin(covid_test_type_codes) & lower(ecr.test_result_17).isin(covid_positive_results)), ecr.specimen_collection_date_17)
    .when((lower(ecr.test_type_code_18).isin(covid_test_type_codes) & lower(ecr.test_result_18).isin(covid_positive_results)), ecr.specimen_collection_date_18)
    .when((lower(ecr.test_type_code_19).isin(covid_test_type_codes) & lower(ecr.test_result_19).isin(covid_positive_results)), ecr.specimen_collection_date_19)
    .when((lower(ecr.test_type_code_20).isin(covid_test_type_codes) & lower(ecr.test_result_20).isin(covid_positive_results)), ecr.specimen_collection_date_20)
    .otherwise(lit(None))
)
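
The selection rule in the cell above can be sketched in plain Python for a single record. This is an illustration of the logic, not the notebook's implementation: `pick_comparison_date` is a hypothetical helper, the record is a plain dict keyed by the ECR column names, and the sample codes and results are placeholders.

```python
from datetime import date


def pick_comparison_date(record, covid_test_type_codes, covid_positive_results, n_tests=20):
    """Return the specimen collection date of the first positive COVID test, or None."""
    for i in range(1, n_tests + 1):
        code = record.get(f"test_type_code_{i}")
        result = record.get(f"test_result_{i}")
        if code is None or result is None:
            continue  # a missing value never satisfies the condition
        if code.lower() in covid_test_type_codes and result.lower() in covid_positive_results:
            return record.get(f"specimen_collection_date_{i}")
    return None  # corresponds to the `.otherwise(lit(None))` branch


# Hypothetical sample values, for illustration only
codes = ["code-a"]
positives = ["detected"]
record = {
    # Test 1 is positive but is not a COVID test, so it is skipped
    "test_type_code_1": "other-code",
    "test_result_1": "Detected",
    "specimen_collection_date_1": date(2022, 1, 1),
    # Test 2 is a positive COVID test, so its date is selected
    "test_type_code_2": "CODE-A",
    "test_result_2": "Detected",
    "specimen_collection_date_2": date(2022, 1, 5),
}

print(pick_comparison_date(record, codes, positives))
```

Writing the rule as a loop over the test index also avoids the copy-and-paste slips that a hand-written chain of 20 near-identical branches invites.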

Join the MII and ECR datastore where the IDs match and the MII specimen collection date is within 90 days of the ECR `comparison_date` selected in the previous cell. The result is the set of updates for the ECR datastore.

In [None]:
# Join MII and ECR to get ecr updates (positive covid tests); between(-90, 90) keeps the 90-day window two-sided
in_window = datediff(ecr.comparison_date, mii.specimen_collection_date_mii).between(-90, 90)
ecr_updates = ecr.join(mii, (ecr.iris_id == mii.person_id_mii) & in_window, "inner").select("iris_id", "incident_id_mii")
ecr_updates = ecr_updates.toDF("iris_id","incident_id_mii")
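
To make "within 90 days" concrete, the sketch below compares a signed day difference with an absolute one using the standard library. `within_window` is a hypothetical helper and the dates are made up. It assumes the intended window is symmetric, i.e. the MII date may fall up to 90 days before or after the ECR date.

```python
from datetime import date


def within_window(ecr_date, mii_date, days=90):
    """True when both dates are present and at most `days` apart in either direction."""
    if ecr_date is None or mii_date is None:
        return False  # a null comparison_date never satisfies an inner-join condition
    return abs((ecr_date - mii_date).days) <= days


ecr_date = date(2022, 6, 1)

print(within_window(ecr_date, date(2022, 4, 1)))  # MII test 61 days earlier
print(within_window(ecr_date, date(2023, 6, 1)))  # MII test 365 days later

# A signed check accepts the second pair, because -365 <= 90
print((ecr_date - date(2023, 6, 1)).days <= 90)
```

The last line shows why a one-sided comparison is not enough on its own: any MII date later than the ECR date yields a negative difference and passes.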


Load the ECR datastore (`ecr_main`) and merge in the updates (`ecr_updates`) such that when a match is found (e.g., a new positive COVID result within 90 days), the `incident_id` column in the ECR datastore is updated.

In [None]:
# Load ecr delta table
ecr_main = DeltaTable.forPath(spark,ECR_DELTA_TABLE_FILE_PATH)

# Merge in ecr updates such that the incident_id is updated
ecr_main.alias("ecr") \
  .merge(
    ecr_updates.alias("ecr_updates"),
    "ecr.person_id = ecr_updates.iris_id") \
  .whenMatchedUpdate(set = {"incident_id": "ecr_updates.incident_id_mii","incident_id_date_added": date_format(current_timestamp(), 'yyyy-MM-dd') }) \
  .execute()
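
The effect of the matched-update merge can be sketched with plain dictionaries. This only illustrates the semantics and is not how Delta Lake executes a merge. `apply_incident_updates` is a hypothetical helper, the rows are made-up examples, and the `today` parameter exists only to keep the example deterministic.

```python
from datetime import date


def apply_incident_updates(ecr_rows, updates, today=None):
    """Set incident_id on rows whose person_id appears in updates; leave other rows unchanged."""
    stamp = (today or date.today()).isoformat()  # yyyy-MM-dd, the same shape the notebook writes

    # Assumption: one update per key. Here a later duplicate silently wins, whereas
    # a Delta Lake merge can fail when several source rows match the same target row.
    by_key = {u["iris_id"]: u["incident_id_mii"] for u in updates}

    result = []
    for row in ecr_rows:
        row = dict(row)  # copy so the input is not mutated
        if row["person_id"] in by_key:
            row["incident_id"] = by_key[row["person_id"]]
            row["incident_id_date_added"] = stamp
        result.append(row)
    return result


# Made-up rows, for illustration only
ecr_rows = [
    {"person_id": "p1", "incident_id": None, "incident_id_date_added": None},
    {"person_id": "p2", "incident_id": None, "incident_id_date_added": None},
]
updates = [{"iris_id": "p1", "incident_id_mii": "inc-100"}]

merged = apply_incident_updates(ecr_rows, updates, today=date(2023, 1, 15))
print(merged)
```

If `ecr_updates` can contain more than one row per `iris_id`, deduplicating it before the merge is one way to keep the update unambiguous.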
