# Requirements
## Databricks
* A Databricks Workspace & Workspace Access Token
* At least one runnable cluster within the workspace
* Workspace attached to a metastore for Delta Lake
## Packages
`pandas` for data manipulation and `pydantic` for data modeling.

* `pandas < 2.0`
* `pydantic < 1.11`

## Delta table
The table whose column descriptions and tags you want to write or update must already exist in your Delta Lake.
## Infra
A cluster must be running in the Databricks workspace from which Delta Lake will be accessed. This cluster acts as an intermediary: it accepts connections and data from outside Databricks and writes the data to Delta Lake.

To add data to Unity Catalog, the cluster must be configured for Unity Catalog access.

## Import necessary libraries

In [None]:
from pydantic import BaseModel
import pandas as pd


## Initialize the catalog, schema and table

In [None]:
dbutils.widgets.removeAll()

dbutils.widgets.text("catalog", "")
catalog: str = getArgument("catalog")

dbutils.widgets.text("schema", "")
schema: str = getArgument("schema")

dbutils.widgets.text("table", "")
table: str = getArgument("table")
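Because each widget defaults to an empty string, it can help to check the values before running the later cells. The following is a minimal sketch in plain Python; `find_missing_parameters` and `example_parameters` are hypothetical names used only for illustration, and the placeholder values stand in for whatever the widgets return.

```python
def find_missing_parameters(parameters: dict) -> list:
    """Return the names of parameters whose value is empty or only whitespace."""
    return [name for name, value in parameters.items() if not str(value).strip()]


# Placeholder values (assumptions); in the notebook these would be the
# catalog, schema and table values read from the widgets.
example_parameters = {"catalog": "main", "schema": "sales", "table": ""}

missing = find_missing_parameters(example_parameters)
if missing:
    print(f"Please fill in these widgets before continuing: {', '.join(missing)}")
```

In the notebook, the dictionary could be built from the `catalog`, `schema` and `table` variables instead of the placeholder values.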





## Steps 📊

### 1. Input Pydantic Data Model 📝

Define a data model class that inherits from pydantic's `BaseModel` and declares the description and tags for every column.

### 2. Convert the Pydantic data model to a dataframe 🚀

Next, convert the data model into a pandas DataFrame with one row per field, which makes the descriptions and tags easy to retrieve.


### 3. Update Delta Lake Table 🔄

Once you are satisfied with the metadata in the DataFrame, apply the updates to your Delta Lake table to enrich its columns with the new descriptions and tags.

## Create your pydantic data model class

In [None]:
# Define your pydantic data model class (inheriting from BaseModel) and assign the class to pydantic_data_model.
pydantic_data_model = None
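As an illustration of what such a class might look like, here is a minimal sketch written for pydantic 1.x, the version range listed in the requirements. The `OrderModel` class, its fields, and the tag values are hypothetical. Passing `tags` as an extra keyword argument to `Field` is an assumption about how the tags are declared; it matches the way the conversion function later reads them from `field_info.extra`.

```python
from typing import Optional

from pydantic import BaseModel, Field


class OrderModel(BaseModel):
    """Hypothetical model; each field name should match a column in the target table."""

    order_id: int = Field(
        ...,
        title="Order ID",
        description="Unique identifier of the order.",
        tags=["key"],  # extra keyword argument, kept by pydantic 1.x in field_info.extra
    )
    customer_email: Optional[str] = Field(
        None,
        title="Customer email",
        description="Contact email address of the customer.",
        tags=["pii", "contact"],
    )
    amount: float = Field(
        ...,
        title="Amount",
        description="Order total in the account currency.",
    )  # no tags declared, so only a comment would be written for this column


# Assign the class itself (not an instance) to the variable used by the later cells.
pydantic_data_model = OrderModel
```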

## Convert the data model to a dataframe containing the needed info

In [None]:
def create_data_dictionary(model: type[BaseModel]) -> pd.DataFrame:
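    """Describe the fields of a pydantic model as a pandas DataFrame.

    Relies on the pydantic 1.x field API (`__fields__`, `field_info`, `type_`).

    Args:
        model (type[BaseModel]): A pydantic model class (not an instance).

    Returns:
        pd.DataFrame: One row per field with its name, title, Python type,
            nullability, description and tags.
    """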
    return pd.DataFrame(
        [
            {
                "field_name": field,
                "field_title": field_mf.field_info.title,
                "python_type": field_type
                if "<class '" not in (field_type := str(field_mf.type_))
                else field_type.split("'")[1],
                "nullable": field_mf.allow_none,
                "description": field_mf.field_info.description,
                "tags": field_mf.field_info.extra.get("tags"),
            }
            for field, field_mf in model.__fields__.items()
        ],
    )



df = create_data_dictionary(pydantic_data_model)
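The `python_type` expression in `create_data_dictionary` turns a type into a readable string: plain classes print as `<class 'int'>`, so the name between the quotes is extracted, while other annotations are kept as printed. The standalone sketch below reproduces only that step with the standard library; `readable_type_name` is a hypothetical helper name, not part of the notebook.

```python
import datetime
from typing import Any, List


def readable_type_name(annotation: Any) -> str:
    """Return a short, readable name for a type annotation."""
    text = str(annotation)
    if "<class '" in text:
        # "<class 'datetime.date'>" -> "datetime.date"
        return text.split("'")[1]
    # Generic annotations such as typing.List[int] are already readable.
    return text


print(readable_type_name(int))            # int
print(readable_type_name(datetime.date))  # datetime.date
print(readable_type_name(List[int]))      # typing.List[int]
```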

## Write the column descriptions and tags to your table columns in Delta Lake

In [None]:

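# Iterate over the data dictionary one column at a time.
for column_name, column_description, tags in zip(
    df["field_name"].values,
    df["description"].values,
    df["tags"].values,
):
    # Only write a comment when the model declares a description.
    if column_description:
        # Escape backslashes and double quotes so the text cannot break the SQL string literal.
        comment = str(column_description).replace("\\", "\\\\").replace('"', '\\"')
        spark.sql(
            f'ALTER TABLE {catalog}.{schema}.{table} ALTER COLUMN {column_name} COMMENT "{comment}"'
        )
    # Only set tags when the model declares at least one.
    if tags:
        tags_list = ",".join([f'"{tag}"' for tag in tags])
        spark.sql(
            f"ALTER TABLE {catalog}.{schema}.{table} ALTER COLUMN {column_name} SET TAGS ({tags_list})"
        )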
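To review the comment statement before it is sent to the cluster, the SQL text can be built by a small pure-Python helper and printed. This is a minimal sketch, not the notebook's own method: `quote_identifier`, `escape_string_literal` and `build_comment_statement` are hypothetical helpers, the example names and description are placeholders, and the backslash escaping assumes Spark's default string-literal parsing.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def escape_string_literal(text: str) -> str:
    """Escape backslashes and double quotes for a double-quoted SQL string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_comment_statement(
    catalog: str, schema: str, table: str, column: str, description: str
) -> str:
    """Build the ALTER TABLE statement that sets a column comment."""
    full_name = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    return (
        f"ALTER TABLE {full_name} ALTER COLUMN {quote_identifier(column)} "
        f'COMMENT "{escape_string_literal(description)}"'
    )


# Placeholder names and description, chosen only for illustration.
statement = build_comment_statement(
    "main", "sales", "orders", "order_id", 'The "primary" key'
)
print(statement)
```

Printing the statement first makes it possible to check the quoting by eye; the resulting string could then be passed to `spark.sql` in place of the inline f-string.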
