# SST Register Model and Create Model Card: [SCHOOL]

Fifth step in the process of transforming raw (PDP) data into actionable, data-driven insights for advisors: finalize the model by registering it in Unity Catalog, then generate a model card for transparency and reporting.

### References

- [Data science product components (Confluence doc)](https://datakind.atlassian.net/wiki/spaces/TT/pages/237862913/Data+science+product+components+the+modeling+process)
- [Databricks runtimes release notes](https://docs.databricks.com/en/release-notes/runtime/index.html)
- [SCHOOL WEBSITE](https://example.com)

In [0]:
%sh python --version

In [0]:
# WARNING: AutoML/mlflow expect particular packages within certain version constraints
# overriding existing installs can result in errors and inability to load trained models
# %pip install "student-success-tool==0.1.1" --no-deps
# %pip install "git+https://github.com/datakind/student-success-tool.git@develop" --no-deps
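
Before uncommenting either install line, it can help to see which versions are already present in the runtime. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` and the package names in the loop are illustrative assumptions, not part of `student-success-tool`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names here are illustrative; check whichever packages the runtime pins.
for name in ("mlflow", "student-success-tool"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Recording these versions before and after an install makes it easier to spot an accidental override of a package that AutoML or mlflow depends on.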

In [0]:
%restart_python

In [0]:
import os
import mlflow
import logging
from databricks.connect import DatabricksSession

from student_success_tool import dataio, configs, modeling
from student_success_tool.reporting.model_card.pdp import PDPModelCard

In [0]:
logging.basicConfig(level=logging.INFO, force=True)
logging.getLogger("py4j").setLevel(logging.WARNING)  # ignore databricks logger

try:
    spark = DatabricksSession.builder.getOrCreate()
except Exception:
    logging.warning("unable to create spark session; are you in a Databricks runtime?")

# HACK: hardcode uc base path and mlflow client
catalog = "sst_dev"
client = mlflow.tracking.MlflowClient()

# HACK: We need to disable the mlflow widget template loading for MC output
# Retrieved from DB office hours, otherwise 10+ DB widgets try to load and
# fail when pulling from ML artifacts (it's annoying)
os.environ["MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR"] = "false"

## import school-specific code

In [0]:
# project configuration should be stored in a config file in TOML format
# it'll start out with just basic info: institution_id, institution_name
# but as each step of the pipeline gets built, more parameters will be moved
# from hard-coded notebook variables to shareable, persistent config fields
cfg = dataio.read_config("./config-TEMPLATE.toml", schema=configs.pdp.PDPProjectConfig)
cfg

# register model

In [0]:
model_name = modeling.registration.get_model_name(
    institution_id=cfg.institution_id,
    target=cfg.preprocessing.target.name,
    checkpoint=cfg.preprocessing.checkpoint.name,
)
model_name

In [0]:
mlflow.set_registry_uri("databricks-uc")
modeling.registration.register_mlflow_model(
    model_name,
    cfg.institution_id,
    run_id=cfg.model.run_id,
    catalog=catalog,
    registry_uri="databricks-uc",
    mlflow_client=client,
)
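
With the registry URI set to `databricks-uc`, registered models are addressed by a three-level name of the form `catalog.schema.model`. The sketch below shows how such a name is assembled. The helper `uc_model_path` and the schema and model values are hypothetical; which schema `register_mlflow_model` actually derives from `institution_id` is not shown in this notebook.

```python
def uc_model_path(catalog: str, schema: str, model_name: str) -> str:
    """Join the three Unity Catalog name levels into 'catalog.schema.model'."""
    parts = (catalog, schema, model_name)
    for part in parts:
        # A dot inside any level would be ambiguous with the level separator.
        if not part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    return ".".join(parts)


# "example_schema" and "example_model" are placeholders, not real objects.
print(uc_model_path("sst_dev", "example_schema", "example_model"))
# prints sst_dev.example_schema.example_model
```

The full three-level name is what you would look up in the Catalog Explorer to confirm that the new model version was created.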

# generate model card

In [0]:
# Initialize card
card = PDPModelCard(config=cfg, catalog=catalog, model_name=model_name)

In [0]:
# Build context and download artifacts
card.build()