# Welcome to the Notebooks Container Runtime!

Before running this Notebook, complete all of the setup instructions in the [README]() file.

- Have you uploaded the data?
- Have you configured the External Access Integration (EAI)?

If so, proceed!

In [None]:
import warnings
warnings.filterwarnings("ignore")

from snowflake.snowpark.context import get_active_session
session = get_active_session()
# Add a query tag to the session. This helps with troubleshooting and performance monitoring.
session.query_tag = {"origin":"sf_sit-is", 
                    "name":"aiml_notebooks_xgboost_on_gpu", 
                    "version":{"major":1, "minor":0},
                    "attributes":{"is_quickstart":1, "source":"notebook"}}

In [None]:
!pip freeze

Notebooks Container Runtime, together with External Access Integrations, gives us the flexibility to `pip install` packages from anywhere, including popular package repositories such as PyPI. You can install any package you need by running `!pip install <package_name>` directly in the Notebook.

In [None]:
!pip install seaborn
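
Before installing, it can be handy to check whether a package is already present in the runtime. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of any Snowflake API.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: decide whether `!pip install seaborn` is still needed.
for name in ["seaborn", "xgboost"]:
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

If the helper returns `None`, running `!pip install <package_name>` as shown above installs the package.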

As with Notebooks on the Warehouse Runtime, we can mix SQL and Python cells:

In [None]:
show tables;

Let's visualize some of our data using the `seaborn` package that we installed above:

In [None]:
diamonds_df = session.table("DIAMONDS")
diamonds_df.show()

In [None]:
df = diamonds_df.to_pandas()

import seaborn as sns

# Create a visualization
sns.histplot(
    data=df,
    x="PRICE"
)

Now, let's train a basic `XGBRegressor` machine learning model. The ML Container Runtime for Snowflake Notebooks comes with common machine learning packages pre-installed, including Snowpark ML and other OSS packages.

In [None]:
import time
from snowflake.ml.modeling.xgboost import XGBRegressor

CATEGORICAL_COLUMNS = ["CUT", "COLOR", "CLARITY"]  # not used by this basic model
NUMERICAL_COLUMNS = ["CARAT", "DEPTH", "X", "Y", "Z"]
LABEL_COLUMNS = ["PRICE"]
diamonds_df = session.table("DIAMONDS")

# Train on the numerical features only.
model = XGBRegressor(max_depth=400, input_cols=NUMERICAL_COLUMNS, label_cols=LABEL_COLUMNS)

# Time the training run.
t0 = time.time()
model.fit(diamonds_df)
t1 = time.time()

print(f"Fit in {t1 - t0:.2f} seconds.")
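
The timing pattern above can be wrapped in a small reusable helper. This is only a sketch: the `timed` context manager is a hypothetical name introduced here, and `time.sleep` stands in for the real `model.fit(diamonds_df)` call so that the snippet runs anywhere.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, using a monotonic clock."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label} in {elapsed:.2f} seconds.")


# Stand-in workload; in the notebook this would be `model.fit(diamonds_df)`.
with timed("Fit"):
    time.sleep(0.1)
```

`time.perf_counter` is used here instead of `time.time` because it is monotonic, so the measurement is not affected by system clock adjustments.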

Snowpark ML on the Container Runtime automatically captures logs and metrics for your training job. A few short helper functions let us fetch, print, or even visualize those metrics:

In [None]:
# Utilities for reading Container Runtime logs and metrics
import requests

def fetch_log(log_type):
    """Return the full contents of the mlrs log file for the given log type."""
    file_path = f'/var/log/managedservices/{log_type}/mlrs/logs-mlrs.log'
    with open(file_path, 'r') as file:
        return file.read()

def fetch_metrics(port):
    """Return the raw text served by the local metrics endpoint on the given port."""
    metrics_url = f"http://localhost:{port}/metrics"
    # A timeout keeps the cell from hanging if the endpoint does not respond.
    response = requests.get(metrics_url, timeout=10)
    return response.text

def list_mlrs_metrics():
    """Parse the metrics text into a {name: value} dict (values stay strings)."""
    txt = fetch_metrics(11501)
    metrics_name_and_value = {}
    for line in txt.split("\n"):
        # Skip blank lines and comment lines (those starting with "#").
        if not line.strip() or line.startswith("#"):
            continue
        tokens = line.split(" ")
        if len(tokens) < 2:
            continue
        metrics_name_and_value[tokens[0]] = tokens[1]
    return metrics_name_and_value

In [None]:
print("train attempt", list_mlrs_metrics()['train_attempts_total'])
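
The parsing step inside `list_mlrs_metrics` can be tried without a running metrics endpoint. The sketch below applies the same line-by-line logic to a made-up sample; both the sample text and the `parse_metrics_text` name are illustrative assumptions, not real output from the runtime.

```python
def parse_metrics_text(txt):
    """Parse metrics text of the form `name value` into a {name: value} dict."""
    metrics = {}
    for line in txt.split("\n"):
        # Ignore blank lines and comment lines such as "# HELP" or "# TYPE".
        if not line.strip() or line.startswith("#"):
            continue
        tokens = line.split(" ")
        if len(tokens) < 2:
            continue
        metrics[tokens[0]] = tokens[1]
    return metrics


# Hypothetical sample text, invented for illustration only.
SAMPLE = """# HELP train_attempts_total Example counter
# TYPE train_attempts_total counter
train_attempts_total 1.0
"""

print(parse_metrics_text(SAMPLE))
```

The values are kept as strings, so wrap them in `float(...)` before doing arithmetic or plotting.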