#### Change Data Capture Template
- This folder (CDC_SQL_TEMPLATE) contains notebooks and files that work together to load tables into the standardized zone.
- The process extracts data from source tables, runs data quality checks and validations on it, and, if the data is determined to be valid, loads it into a new table in the standardized zone.
- The files included in the CDC_SQL_TEMPLATE folder are as follows:
  - **cdc_sql_template**: This is the main driver notebook for this process. It uses the fields entered by the user to build the new table's schema, and a query provided by the user to source the data to be loaded into the table. It also performs data quality checks and validations to verify the data's integrity before loading.
    - **NOTE: DO NOT MAKE ANY CHANGES TO THE cdc_sql_template NOTEBOOK**
  - **index.yaml**: This file is where the user declares the new table's definition: the schema where the table will reside, the table's name, comments, column names, primary keys, and null constraints.
  - **extract_config.yaml**: This file is where the user will enter the query used to obtain the source data that will be loaded into the new table.
  - **great_expectations_config**: This notebook is used to add expectations to your expectation suite using Great Expectations. These expectations verify that the source data looks the way you expect.
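
The extract, validate, and load flow described above can be illustrated with a minimal sketch in plain Python. This is not the code in the cdc_sql_template notebook: the function names `validate_rows` and `load_if_valid`, the in-memory list standing in for the target table, and the two checks shown (non-null columns and unique primary keys) are all assumptions chosen for illustration.

```python
from typing import Any

# Hypothetical row type: one source record as a column-name -> value mapping.
Row = dict[str, Any]


def validate_rows(
    rows: list[Row], primary_keys: list[str], not_null_columns: list[str]
) -> list[str]:
    """Return a list of problems found; an empty list means the data passed."""
    problems: list[str] = []
    seen_keys: set[tuple] = set()
    for position, row in enumerate(rows):
        # Null-constraint check: every listed column must have a value.
        for column in not_null_columns:
            if row.get(column) is None:
                problems.append(f"row {position}: column '{column}' is null")
        # Primary-key check: the combination of key columns must be unique.
        key = tuple(row.get(column) for column in primary_keys)
        if key in seen_keys:
            problems.append(f"row {position}: duplicate primary key {key}")
        seen_keys.add(key)
    return problems


def load_if_valid(
    rows: list[Row],
    primary_keys: list[str],
    not_null_columns: list[str],
    target: list[Row],
) -> list[str]:
    """Append rows to the target only when validation finds no problems."""
    problems = validate_rows(rows, primary_keys, not_null_columns)
    if not problems:
        target.extend(rows)
    return problems
```

Calling `load_if_valid(source_rows, ["id"], ["id"], target_table)` leaves `target_table` untouched and returns the list of problems when any check fails, which mirrors the "load only if valid" behaviour described in the overview.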

#### Instructions for use
(Refer to [writing_messy_code](https://confluence.healthpartners.com/confluence/pages/viewpage.action?pageId=297468803) if you are unfamiliar with using config and YAML files in Databricks.)
1. Before you begin, verify you are in the dev environment (you should see dbw-cus-dev-dlh-hpintegrationworkspace...)
1. Create a new branch (branch off of main) in your own repo where you will create the new table.
1. Clone the **CDC_SQL_TEMPLATE** [folder](https://adb-1501466301957626.6.azuredatabricks.net/browse/folders/1188583638727940?o=1501466301957626#) into the folder of the schema where the table will land. (NOTE: While notebooks and folders are copied when you clone, files are not. Additional files will need to be exported and imported in later steps.)
2. Rename the cloned folder to the name of your new table.
3. Switch back to the config [folder](https://adb-1501466301957626.6.azuredatabricks.net/browse/folders/2445972269864874?o=1501466301957626) and export both index.yaml and extract_config.yaml.
4. Next, import both the **index.yaml** and **extract_config.yaml** files into the new config folder within the folder you cloned.
  - Make sure the files are named **index.yaml** and **extract_config.yaml**, and rename them if necessary (the export sometimes adds trailing underscores or similar characters).
5. Switch to index.yaml and fill in your table attributes according to the instructions in the file. These attributes will be used to create your new target table and subsequent load/merge process.
6. Use the SQL Editor to create a SQL SELECT query that returns the desired data. The results of this query will be merged into your new target table. Verify that the query executes without errors and returns the expected results.
  *NOTE: THIS QUERY MUST SOURCE DATA FROM THE CLEANSED CATALOG (PROD)*
7. Copy and paste your query into extract_config.yaml according to the instructions in the file.
8. Switch to the cdc_sql_template notebook, click the schedule button at the top right of the page, and then click add a schedule to create a new workflow job.
  - Name the job after the table.
  - Select manual if you want to run the job manually, or scheduled if you want it to run periodically on its own.
  - Select compute cluster 13.3LTS
  - You will need to add two parameters: 
    - one with key: source_environment, value: development (TODO: Switch to production)
    - one with key: target_environment and either value: dev (if your target table is in the development zone) OR value: prod (if your target table is to land in production)
12. Kick off the workflow job to create the new table. Upon success, the new table will be loaded into the designated schema.
13. Push your changes to main on Databricks and create a pull request on GitHub to move your changes into main.
14. Once your changes are in main, you can schedule your workflow job to run periodically if desired.
15. TODO: Add link to instructions on how to push to main
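
The two job parameters from the scheduling step can be sanity-checked before a run with a small helper like the one below. This is a hedged sketch and not part of the template: the function name `check_job_parameters` and the `ALLOWED_VALUES` table are hypothetical, and the allowed values are simply the ones listed in the instructions above (`development` for `source_environment`; `dev` or `prod` for `target_environment`).

```python
# Allowed values as listed in the instructions; adjust if the template changes.
ALLOWED_VALUES: dict[str, set[str]] = {
    "source_environment": {"development"},
    "target_environment": {"dev", "prod"},
}


def check_job_parameters(parameters: dict[str, str]) -> list[str]:
    """Return a list of error messages; an empty list means both parameters look valid."""
    errors: list[str] = []
    for key, allowed in ALLOWED_VALUES.items():
        value = parameters.get(key)
        if value is None:
            errors.append(f"missing parameter: {key}")
        elif value not in allowed:
            errors.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return errors
```

For example, passing `{"source_environment": "development", "target_environment": "dev"}` returns an empty list, while a missing or misspelled value is reported by name so it can be corrected in the job configuration.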