# Memberships silver

Checks to be performed:
- Primary key is unique
- Foreign key refers to existing team
- Time information can be correctly converted to Timestamp type

Enrichment:
- Date information extracted from the joined_at value
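
The timestamp check and the date enrichment can be illustrated with a minimal sketch in plain Python (not Spark, so it runs anywhere). It assumes the timestamp arrives as an ISO-8601 string and that the date key is an integer of the form yyyymmdd; both are assumptions, and `to_integer_datekey` is a hypothetical helper, not the `create_integer_datekeys` function imported from `modules.enrichment`.

```python
from datetime import datetime
from typing import Optional


def to_integer_datekey(raw_timestamp: str) -> Optional[int]:
    """Return a yyyymmdd integer for an ISO-8601 string, or None if it cannot be parsed."""
    try:
        parsed = datetime.fromisoformat(raw_timestamp)
    except (TypeError, ValueError):
        # The value is missing or not a valid timestamp: flag it instead of failing.
        return None
    return parsed.year * 10000 + parsed.month * 100 + parsed.day


# Hypothetical sample rows, for illustration only.
rows = [
    {"membership_id": "m1", "joined_at": "2023-04-17T09:30:00"},
    {"membership_id": "m2", "joined_at": "not a timestamp"},
]
datekeys = {row["membership_id"]: to_integer_datekey(row["joined_at"]) for row in rows}
```

Rows whose key comes back as `None` are the ones a timestamp check would set aside as badly formed.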

Data is stored in an SCD2 table, so that it is possible to track people joining and leaving a team and to identify whether a subscription is the first one or not. For this table, the assumption is that a membership missing from the source file means that the person has left the team.
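
The SCD2 logic described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the column names `valid_from`, `valid_to` and `is_active`, and the functions `plan_scd2_changes` and `apply_scd2_changes`, are hypothetical and do not come from `modules.write`. The sketch covers only memberships that appear or disappear; it does not detect changed attributes.

```python
def plan_scd2_changes(source_rows, target_rows, key="membership_id"):
    """Compare a source snapshot with an SCD2 target.

    Returns the source rows to append as new versions and the keys of
    active target rows that are missing from the source.
    """
    source_keys = {row[key] for row in source_rows}
    active_keys = {row[key] for row in target_rows if row["is_active"]}
    to_append = [row for row in source_rows if row[key] not in active_keys]
    to_deactivate = sorted(active_keys - source_keys)
    return to_append, to_deactivate


def apply_scd2_changes(target_rows, to_append, to_deactivate, load_time, key="membership_id"):
    """Close the deactivated versions and append the new ones; returns a new list."""
    result = []
    for row in target_rows:
        if row["is_active"] and row[key] in to_deactivate:
            # Missing from the source: the person is assumed to have left the team.
            row = {**row, "is_active": False, "valid_to": load_time}
        result.append(row)
    for row in to_append:
        result.append({**row, "valid_from": row["joined_at"], "valid_to": None, "is_active": True})
    return result


# Hypothetical data: m2 disappears from the source, m3 is new.
target_rows = [
    {"membership_id": "m1", "joined_at": "2023-01-10", "valid_from": "2023-01-10", "valid_to": None, "is_active": True},
    {"membership_id": "m2", "joined_at": "2023-02-01", "valid_from": "2023-02-01", "valid_to": None, "is_active": True},
]
source_rows = [
    {"membership_id": "m1", "joined_at": "2023-01-10"},
    {"membership_id": "m3", "joined_at": "2023-03-05"},
]
to_append, to_deactivate = plan_scd2_changes(source_rows, target_rows)
updated_rows = apply_scd2_changes(target_rows, to_append, to_deactivate, load_time="2023-03-06")
```

Because closed versions are kept rather than overwritten, the number of versions stored for a key shows whether a later membership is a first one or a return.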

N.B. In UC, primary and foreign keys can also be enforced directly on the tables, but then the write operation fails or succeeds as a whole. With this approach it is possible to identify the single rows that don't satisfy the conditions; the natural evolution of this could be a DLT implementation. As mentioned in the data quality module, the next improvement would be to use DBX programmatically.
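
To make the row-level idea concrete, here is a minimal sketch in plain Python of checks that split rows into valid and rejected sets instead of failing the whole write. The helpers `split_by_unique_key` and `split_by_foreign_key` and the sample data are hypothetical; they are not the functions from `modules.data_quality`, which work on Spark DataFrames.

```python
from collections import Counter


def split_by_unique_key(rows, key):
    """Return (valid, rejected): rows whose key is missing or duplicated are rejected."""
    counts = Counter(row.get(key) for row in rows)
    valid, rejected = [], []
    for row in rows:
        value = row.get(key)
        if value is not None and counts[value] == 1:
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


def split_by_foreign_key(rows, foreign_key, known_keys):
    """Return (valid, rejected): rows pointing at an unknown parent key are rejected."""
    valid = [row for row in rows if row.get(foreign_key) in known_keys]
    rejected = [row for row in rows if row.get(foreign_key) not in known_keys]
    return valid, rejected


# Hypothetical data: m2 is duplicated, m3 refers to a team that does not exist.
memberships = [
    {"membership_id": "m1", "group_id": "t1"},
    {"membership_id": "m2", "group_id": "t1"},
    {"membership_id": "m2", "group_id": "t2"},
    {"membership_id": "m3", "group_id": "t9"},
]
team_ids = {"t1", "t2"}

unique_rows, duplicate_rows = split_by_unique_key(memberships, "membership_id")
clean_rows, orphan_rows = split_by_foreign_key(unique_rows, "group_id", team_ids)
```

Each check hands its valid rows to the next one, so the rejected rows can be inspected or logged per rule while the clean rows continue through the pipeline.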

In [0]:

from modules.data_quality import (check_data_quality_id,
                                  check_data_quality_foreign_keys,
                                  check_data_quality_timestamps,
                                  check_data_quality_memberships_table)
from modules.enrichment import create_integer_datekeys
from modules.write import (add_scd2_columns,
                           identify_lines_SCD2,
                           deactivate_rows_SCD2,
                           append_df_to_table)


In [0]:
source_catalog = "hive_metastore"
source_schema = "default"
source_table_name = "memberships"

target_catalog = "hive_metastore"
target_schema = "default"
target_table_name = "memberships_silver"
cross_check_table_name = "teams_silver"

source_table_reference = source_catalog + "." + source_schema + "." + source_table_name

cross_check_teams_reference = target_catalog + "." + target_schema + "." + cross_check_table_name
target_table_reference = target_catalog + "." + target_schema + "." + target_table_name

In [0]:
memberships_df = spark.table(source_table_reference)
teams_df = spark.table(cross_check_teams_reference)
target_table_df = spark.table(target_table_reference)

In [0]:
memberships_df, bad_formed_df = check_data_quality_id(memberships_df,"membership_id")

In [0]:
check_list = [{"foreign_key_column":"group_id",
              "cross_check_table": teams_df,
              "cross_check_primary_key_column": "team_id"}]
memberships_df, bad_formed_df = check_data_quality_foreign_keys(memberships_df, check_list)

In [0]:
memberships_df, bad_formed_df = check_data_quality_timestamps(memberships_df,["joined_at"],"membership_id")

In [0]:
memberships_df, bad_formed_df = check_data_quality_memberships_table(memberships_df)

In [0]:
memberships_df = create_integer_datekeys(memberships_df,["joined_at"])

In [0]:
memberships_df = add_scd2_columns(memberships_df, "joined_at")

In [0]:
new_and_updated_df, to_be_deactivated_df = identify_lines_SCD2(memberships_df,target_table_df,["membership_id"])


In [0]:
deactivate_rows_SCD2(spark,new_and_updated_df, to_be_deactivated_df, target_table_reference, ["membership_id"])

In [0]:
append_df_to_table(new_and_updated_df, target_table_reference)