# Create Canvas Data Tables

*Authors: Department of Education, Tasmania.*

This notebook creates Canvas Data tables in the specified database if they do not already exist, and optionally refreshes their metadata. It reads from stage2np data.

Parameters: 
- **db_name**: Name of the database where tables are to be created. The DB will also be created if required.
- **table_prefix**: Prefix to add to each table name.
- **base_data_path**: Base path where CanvasData stage2 files are located. Each subfolder will be considered a table.
- **refresh_tables**: Boolean to indicate whether to refresh metadata or not. Set to true if the schema has changed.
- **storage_account**: Name of your OEA storage account.
- **instrumentation_key**: AppInsights instrumentation key (for logging).
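
As a rough illustration of how these parameters combine, the sketch below builds the kind of statement the final cell runs for each subfolder. The helper name `build_create_table_sql`, the `canvas_` prefix, the `user_dim` folder and the `<stage2np>` placeholder path are all hypothetical and used only for illustration; the real stage2np root is supplied by the OEA framework.

```python
def build_create_table_sql(db_name, table_prefix, search_path, folder):
    """Return the CREATE TABLE statement for one subfolder (illustrative helper only)."""
    # The table name is the prefix followed by the subfolder name.
    table_name = f"{db_name}.{table_prefix}{folder}"
    # The table points at the Parquet files inside that subfolder.
    location = f"{search_path}/{folder}"
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING PARQUET LOCATION '{location}'"


# Hypothetical values: a "canvas_" prefix and a placeholder stage2np path.
example_sql = build_create_table_sql(
    "CanvasData", "canvas_", "<stage2np>/CanvasData", "user_dim"
)
print(example_sql)
```

With an empty `table_prefix`, as in the defaults below, the table name is simply the subfolder name.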

In [1]:
db_name = "CanvasData"
table_prefix = ""
base_data_path = "CanvasData"
refresh_tables = False

storage_account = ""
instrumentation_key = "375cef06-9584-49fe-a4a7-74e590a6c05f"

In [2]:
%run /oea_framework/OEA_py

In [3]:
oea = OEA(storage_account, instrumentation_key, "") # No salt required; we're not pseudonymising data here.

In [4]:
spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")

In [12]:
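# Imported explicitly in case the %run above does not bring it into scope.
from pyspark.sql.utils import AnalysisException

search_path = f"{oea.stage2np}/{base_data_path}"

# Each subfolder under search_path is registered as one table.
for f in oea.get_folders(search_path):
    table_name = f"{db_name}.{table_prefix + f}"
    try:
        sql = f"CREATE TABLE IF NOT EXISTS {table_name} USING PARQUET LOCATION '{search_path}/{f}'"
        print(sql)
        spark.sql(sql)
        if refresh_tables:
            # Invalidate cached metadata so schema changes are picked up.
            spark.sql(f"REFRESH TABLE {table_name}")
    except AnalysisException as e:
        print(f"Failed to create {table_name}. Often caused by inability to infer schema. Does the folder contain any data? ({e})")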
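
The loop above uses each folder name directly as a table name, so a folder containing characters such as hyphens or spaces would not form a valid unquoted identifier. One possible guard, sketched here on the assumption that Spark SQL's backtick quoting applies (a literal backtick inside a quoted identifier is written as two backticks), is to quote each part of the name. `quote_identifier` and `qualified_table_name` are hypothetical helpers, not part of the OEA framework, and `course-dim` is a made-up folder name.

```python
def quote_identifier(name):
    """Wrap a name in backticks, doubling any backtick it already contains."""
    return "`" + name.replace("`", "``") + "`"


def qualified_table_name(db_name, table_prefix, folder):
    """Build a quoted database.table name from the notebook's parameters."""
    return f"{quote_identifier(db_name)}.{quote_identifier(table_prefix + folder)}"


# Hypothetical folder name containing a hyphen.
quoted_name = qualified_table_name("CanvasData", "", "course-dim")
print(quoted_name)
```

The quoted name could then replace `{db_name}.{table_prefix + f}` in both the `CREATE TABLE` and `REFRESH TABLE` statements.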