<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Energy Consumption Forecasting using AzureML</b>
</header>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Introduction:</b></p>

<p style = 'font-size:16px;font-family:Arial'>In this business use case, we leverage the power of AzureML and Teradata Vantage to enhance our machine learning capabilities and enable scalable model scoring. Our goal is to efficiently utilize the strengths of both platforms to streamline our data analysis and decision-making processes.
<br>
<!-- <img src="images/microsoft-global-partnership-with-teradata.jpg" alt="Microsoft X Teradata"> -->
<br>
<strong>Azure Machine Learning (AzureML):</strong> AzureML is a cloud-based platform provided by Microsoft, designed to simplify and accelerate the end-to-end machine learning workflow. It enables data scientists and developers to collaborate on data preparation, model training, and model deployment with ease, utilizing various frameworks and libraries for building intelligent applications.</p>

<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Key Highlights of the Demo:</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li><strong>Data Preparation and Exploration:</strong> We will explore the data in Teradata Vantage and get it ready for training our model.</li>
    <li><strong>Model Training and Evaluation:</strong> Using AzureML, we'll create a tailored machine learning model for our use case.</li>
    <li><strong>Inference using Teradata Vantage:</strong> Finally, we'll show how Teradata Vantage can run the AzureML model we trained. This lets us make predictions quickly and efficiently, right from the Vantage platform using BYOM (Bring Your Own Model).</li>
</ol>
<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Dataset:</b></p>
<p style = 'font-size:16px;font-family:Arial'>The dataset used in this demo represents electricity consumption in Norway from the 1st of January 2016 to the 31st of August 2019. Each row holds the consumption for one hour. Alongside consumption, the dataset includes weather data from multiple sources, daylight information, and the labour calendar. All data was collected from open data sources.</p>
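<p style = 'font-size:16px;font-family:Arial'>As a rough sanity check on the date range above, the sketch below counts how many hourly rows a gap-free series would contain. The exact first and last timestamps are assumptions, so the real table may differ slightly from this figure.</p>

```python
from datetime import datetime, timedelta

# Assumption: one row per hour with no gaps or duplicates, starting at
# 2016-01-01 00:00 and ending at 2019-08-31 23:00.
first_hour = datetime(2016, 1, 1, 0)
last_hour = datetime(2019, 8, 31, 23)

# Number of whole hours between the two timestamps, counting both ends.
expected_rows = int((last_hour - first_hour) / timedelta(hours=1)) + 1
print(expected_rows)
```

<p style = 'font-size:16px;font-family:Arial'>Comparing this figure with the row count of the table in Vantage is a quick way to spot missing or duplicated hours.</p>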

<p style = 'font-size:16px;font-family:Arial'><b>But what if I don't have AzureML?</b> Don't worry. We will run the steps that come before AzureML, show screenshots of what an AzureML user would do, and provide the completed model for you to import into Vantage for scoring.</p>

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>1. Initial setup</b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>1.1 Downloading and installing additional software needed</b></p>

In [None]:
%%capture
!pip install azureml-core azureml

<p style = 'font-size:16px;font-family:Arial'><i>The install above is only needed on a platform that does not already have these libraries installed, such as an environment other than ClearScape Analytics Experience. After running the install, be sure to restart the kernel to bring the installed libraries into memory. The simplest way to restart the kernel is by typing zero zero: <b> 0 0</b></i></p>

<hr>
<a id="anchor"></a>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>1.2 Setting up Azure credentials</b></p>

<p style="font-size: 16px; font-family: Arial;color:#E37C4D"><b>Required Azure Credentials:</b></p>
<ul style="font-size: 16px; font-family: Arial;">
    <li><strong>Tenant ID:</strong> This is a unique identifier for the Azure Active Directory (AAD) tenant associated with the Azure subscription.</li>
    <li><strong>Subscription ID:</strong> It is a unique identifier for the Azure subscription, which represents the purchased plan and services.</li>
    <li><strong>Resource Group</strong>: Azure organizes resources into resource groups, which help manage and monitor related resources as a single unit.</li>
    <li><strong>Workspace Name:</strong> This is the name of the Azure Machine Learning Workspace, which provides a centralized location to work with machine learning resources.</li>
    <li><strong>ClearScape Host:</strong> This is the hostname of your ClearScape machine.</li>
</ul>

<p style="font-size: 16px; font-family: Arial;color:#E37C4D"><b>How to Get These Inputs:</b></p>
<ol style="font-size: 16px; font-family: Arial;">
    <li><strong>Tenant ID, Subscription ID, and Resource Group:</strong> These credentials are related to your Azure account and subscription. If you already have an Azure account and an active subscription, you can find these credentials in the Azure portal. Here's how:
        <ul style="font-size: 16px; font-family: Arial;">
            <li><a href="https://docs.microsoft.com/azure/active-directory/fundamentals/active-directory-how-to-find-tenant">Find your tenant ID</a></li>
            <li><a href="https://learn.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id">Find your subscription ID</a></li>
            <li><a href="https://docs.microsoft.com/azure/azure-resource-manager/management/manage-resource-groups-portal">Create and manage Azure resource groups</a></li>
        </ul>
    </li>
    <li><strong>Workspace Name:</strong> If you have already set up an Azure Machine Learning Workspace, you can use the name of the workspace you created. If not, you can create one by following the steps in the Azure Machine Learning documentation:
        <ul style="font-size: 16px; font-family: Arial;">
            <li><a href="https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace#create-a-workspace">Create an Azure Machine Learning Workspace</a></li>
        </ul>
    </li>
    <li><strong>ClearScape Host:</strong> The ClearScape host is shown on the <a href = "https://clearscape.teradata.com/dashboard">ClearScape dashboard</a> for this machine.</li>
</ol>

<p style="font-size: 16px; font-family: Arial;color:#E37C4D"><b>No Azure Credentials:</b></p>
<p style="font-size: 16px; font-family: Arial;">
If you don't have the required Azure credentials, the <a href="./Getting Started with Azure.ipynb">Getting Started with Azure</a> notebook will guide you through the process of setting up an Azure account and acquiring the necessary credentials.
</p>
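<p style="font-size: 16px; font-family: Arial;">Azure tenant and subscription IDs are GUIDs, so a quick format check can catch copy-and-paste mistakes before they reach Azure. The sketch below is only an illustration; <code>looks_like_guid</code> is a hypothetical helper and is not used elsewhere in this notebook.</p>

```python
import uuid

def looks_like_guid(value):
    """Return True if `value` parses as a GUID/UUID string (hypothetical helper)."""
    try:
        uuid.UUID(str(value).strip())
        return True
    except ValueError:
        return False

# An all-zero placeholder GUID passes the check; a free-text name does not.
print(looks_like_guid('00000000-0000-0000-0000-000000000000'))
print(looks_like_guid('my-tenant'))
```

<p style="font-size: 16px; font-family: Arial;">The check only validates the format. It cannot tell whether the ID belongs to a real tenant or subscription.</p>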

In [None]:
from IPython.display import display, HTML

def get_yes_no_input(prompt):
    while True:
        user_input = input(prompt).strip().lower()
        if user_input == 'yes' or user_input == 'no':
            return user_input
        else:
            print("\033[1mInvalid input. Please enter 'yes' or 'no'.\033[0m")

user_choice = get_yes_no_input('''Do you have the following Azure credentials? (yes/no):

- Tenant ID
- Subscription ID
- Resource Group
- Workspace Name
- ClearScape Host

Enter 'yes' or 'no': ''')

if user_choice == 'yes':
    print("\033[1mPlease enter the credentials:\033[0m")
    tenant_id = input('Tenant ID:')
    subscription_id = input('Subscription ID:')
    resource_group = input('Resource Group:')
    workspace_name = input('Workspace Name:')
    host = input('ClearScape Host:')
elif user_choice == 'no':
    display(HTML(f'''<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><i><b>Note</b>: If you do not have an Azure subscription, please set up an Azure account. Click on the link below to get started with Azure:</i></p>
    <a href = './Getting Started with Azure.ipynb'>Getting Started with Azure</a>
</div>'''))

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>1.3 Importing libraries</b></p>
<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import os
import getpass
import sys
import warnings

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

from jdk4py import JAVA, JAVA_HOME, JAVA_VERSION

from teradataml.analytics.valib import *
from teradataml.analytics.Transformations import *
from teradataml.dataframe.copy_to import copy_to_sql
from teradataml.dataframe.dataframe import DataFrame, in_schema
from teradataml.context.context import create_context, remove_context
from teradataml import save_byom, delete_byom, list_byom, retrieve_byom, PMMLPredict, db_drop_table, db_drop_view, display

from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
import seaborn as sns

from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.core.authentication import InteractiveLoginAuthentication

# Suppress warnings
warnings.filterwarnings('ignore')
display.max_rows = 5

# Modify the following to match the specific client environment settings
configure.val_install_location = 'val'
configure.byom_install_location = 'mldb'
os.environ['PATH'] = os.pathsep.join([os.environ['PATH'], str(JAVA_HOME), str(JAVA)[:-5]])

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>2. Initiate a connection to Vantage</b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.1 Let's start by connecting to the Teradata system </b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
eng.execute('''SET query_band='DEMO=Energy_Consumption_Forecasting_AzureML.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#E37C4D'><b>2.2 Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but consumes available storage. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker to the other statement.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_Energy_cloud');"        # Takes 1 minute
%run -i ../run_procedure.py "call get_data('DEMO_Energy_local');"        # Takes 2 minutes

<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>3. Data Exploration</b>

<table style = 'width:100%;table-layout:fixed;'>
<tr>
    <td style = 'vertical-align:middle' width = '50%'>
        <p style = 'font-size:16px;font-family:Arial'>Users can access large volumes of data by connecting remotely using the teradataml client connection library. Python methods are translated to SQL and run remotely on the Vantage system. Only the minimal amount of data required is copied to the client, allowing users to interact with data sets of any size and scale.</p>
    </td>
    <td><img src = 'images/connect_and_discover.png' width = '400'></td>
</tr>
</table>

<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage</p>

In [None]:
df = DataFrame(in_schema("DEMO_Energy", "consumption"))
print(df.shape)

<p style = 'font-size:16px;font-family:Arial'>Let's investigate the data by looking at a sample.</p>

In [None]:
df

<p style = 'font-size:16px;font-family:Arial'>The dataset contains hourly energy consumption data along with various related features. These features include:</p>

<ul style = 'font-size:16px;font-family:Arial'>
    <li><b>TD_TIMECODE:</b> Timestamp of the hourly observation.</li>
    <li><b>consumption:</b> Hourly energy consumption values.</li>
    <li><b>y, m, d, h:</b> Year, month, day, and hour components of the timestamp.</li>
    <li><b>weekday:</b> Day of the week as a number (1 = Monday through 7 = Sunday).</li>
    <li><b>nasa_temp:</b> Temperature readings from NASA.</li>
    <li><b>cap_air_temperature:</b> Ambient air temperature measurements.</li>
    <li><b>cap_cloud_area_fraction:</b> Cloud cover percentage.</li>
    <li><b>cap_precipitation_amount:</b> Amount of precipitation.</li>
    <li><b>is_dark:</b> Flag indicating if it is dark.</li>
    <li><b>is_light:</b> Flag indicating if it is light.</li>
    <li><b>is_from_light_to_dark:</b> Flag indicating the transition from light to dark.</li>
    <li><b>is_from_dark_to_light:</b> Flag indicating the transition from dark to light.</li>
    <li><b>is_holiday:</b> Flag indicating if it is a holiday.</li>
    <li><b>is_pre_holiday:</b> Flag indicating if it is a day before a holiday.</li>
</ul>

<p style = 'font-size:16px;font-family:Arial'>
    These columns provide valuable insights into energy consumption patterns and the factors that might influence it, such as weather conditions, time of day, and holidays.
</p>
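<p style = 'font-size:16px;font-family:Arial'>To make the calendar columns concrete, the sketch below derives <code>y</code>, <code>m</code>, <code>d</code>, <code>h</code>, and <code>weekday</code> from a timestamp using pandas on a two-row toy frame. The 1 = Monday numbering is an assumption taken from the weekday mapping used later in this notebook; how the columns were actually produced for the demo table is not shown here.</p>

```python
import pandas as pd

# Toy data: two hand-picked timestamps, not rows from the demo table.
toy = pd.DataFrame({
    'TD_TIMECODE': pd.to_datetime(['2016-01-01 00:00:00', '2016-01-03 13:00:00'])
})

ts = toy['TD_TIMECODE'].dt
toy['y'] = ts.year
toy['m'] = ts.month
toy['d'] = ts.day
toy['h'] = ts.hour
# pandas numbers Monday as 0, so add 1 to get Monday = 1 ... Sunday = 7.
toy['weekday'] = ts.dayofweek + 1

print(toy)
```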

In [None]:
# Convert to pandas dataframe
pd_df = df.to_pandas(all_rows=True)

# Set the size of the plot
plt.figure(figsize=(12, 6))

# Create line plot using seaborn
sns.set_palette(['#add8e6', '#90ee90', '#00bfff'])
sns.lineplot(data=pd_df, x='TD_TIMECODE', y='consumption')

# Add x label
plt.xlabel('Date', fontsize=12)

# Add y label
plt.ylabel('Energy Units', fontsize=12)

# Add title
plt.title('Energy Demand', fontsize=16)

# Add legend
plt.legend(labels=['Energy'], fontsize=12)

# Add grid lines
plt.grid(axis='y', alpha=0.5)

# Remove spines
sns.despine()

# Show the plot
plt.show()

In [None]:
# Normalize the data
scaler = MinMaxScaler()
cols = ['cap_air_temperature', 'cap_cloud_area_fraction', 'cap_precipitation_amount']
pd_df[cols] = scaler.fit_transform(pd_df[cols])

# Create three subplots
fig, axs = plt.subplots(nrows=3, ncols=1, figsize=(12, 9))
sns.set_palette(['#add8e6', '#90ee90', '#00bfff'])
sns.lineplot(x='TD_TIMECODE', y='cap_air_temperature', data=pd_df, ax=axs[0])
sns.lineplot(x='TD_TIMECODE', y='cap_cloud_area_fraction', data=pd_df, ax=axs[1])
sns.lineplot(x='TD_TIMECODE', y='cap_precipitation_amount', data=pd_df, ax=axs[2])

# Set the labels, titles, and other properties for each subplot
for col, ax in zip(cols, axs):
    ax.set_ylabel('Normalized Values')
    ax.set_title(col)
    ax.set_xlabel('Date')
    ax.grid()
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# Rotate the date labels once, after all subplots are configured
fig.autofmt_xdate(rotation=45)

plt.tight_layout()
plt.show()

<p style = 'font-size:16px;font-family:Arial'>Comparing the cap_air_temperature graph with the energy demand graph suggests an inverse relationship between air temperature and energy consumption. In a cold climate such as Norway's, electricity usage tends to rise as the temperature drops, likely due to increased demand for heating, and to fall as the temperature rises.</p> 
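<p style = 'font-size:16px;font-family:Arial'>An inverse relationship like this can be quantified with a correlation coefficient. The sketch below uses synthetic data, not the demo table, in which consumption is constructed to fall as temperature rises, and shows that the Pearson correlation comes out negative. The slope and noise level are arbitrary assumptions chosen for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: temperatures in degrees Celsius and a consumption series
# that decreases linearly with temperature, plus random noise.
temperature = rng.uniform(-15, 25, size=500)
consumption = 100 - 2.0 * temperature + rng.normal(0, 5, size=500)

# Pearson correlation; a value close to -1 indicates a strong inverse relationship.
r = np.corrcoef(temperature, consumption)[0, 1]
print(round(r, 3))
```

<p style = 'font-size:16px;font-family:Arial'>The same <code>np.corrcoef</code> call could be applied to the <code>cap_air_temperature</code> and <code>consumption</code> columns of <code>pd_df</code> to measure the relationship in the real data.</p>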

In [None]:
pd_df['quarter'] = pd_df['TD_TIMECODE'].dt.quarter
# Create a box plot of consumption for each quarter
sns.boxplot(x='quarter', y='consumption', data=pd_df, palette='pastel')

<p style = 'font-size:16px;font-family:Arial'>The box plot above shows the distribution of energy consumption by quarter. Consumption is highest in the 1st and 4th quarters, which coincide with cold weather, and lowest in the 3rd quarter, which covers the summer season.</p> 

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>4. Data Preparation</b>

<table style = 'width:100%;table-layout:fixed;'>
<tr>
    <td style = 'vertical-align:top' width = '50%'>
        <p style = 'font-size:16px;font-family:Arial'>The Vantage Analytic Library is a suite of powerful functions that allows for whole-data-set descriptive analysis, data transformation, hypothesis testing, and algorithmic functions at an extreme scale. As with all Vantage capabilities, these functions run in parallel at the source of the data.</p>
        <ol style = 'font-size:16px;font-family:Arial'>
            <li>Create Feature Transformation objects</li>
            <br>
            <li>Define the columns to be retained in the analytic data set</li>
            <br>
            <li>Push the transformations to the data in Vantage</li>
            <br>
            <li>Inspect the results</li>
        </ol>
    </td>
    <td><img src = 'images/VAL_transformation.png' width = '400'></td>
</tr>
</table>

In [None]:
weekday_mapping = {1:'monday', 2:'tuesday', 3:'wednesday', 4:'thursday', 5:'friday', 6:'saturday', 7:'sunday'}
weekday_t = OneHotEncoder(values = weekday_mapping, columns = 'weekday')

hour_t = OneHotEncoder(values = [x for x in range(0,24)],  columns = 'h')

rs = MinMaxScalar(columns = ['nasa_temp','cap_air_temperature', 'cap_cloud_area_fraction', 'cap_precipitation_amount'])

rt = Retain(columns = ['consumption',
                       'is_dark', 'is_light', 'is_from_light_to_dark', 'is_from_dark_to_light', 
                       'is_holiday', 'is_pre_holiday'])

<p style = 'font-size:16px;font-family:Arial'>The transformation objects created in the previous step will be used to prepare the data for modeling. Specifically, weekday_t and hour_t will convert the weekday and hour columns from numeric values to one-hot encoded columns. rs will rescale nasa_temp, cap_air_temperature, cap_cloud_area_fraction, and cap_precipitation_amount using MinMaxScalar, and rt will retain the specified columns unchanged. These transformations put the data in a form that a machine learning model can use effectively.</p>
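<p style = 'font-size:16px;font-family:Arial'>For intuition, the sketch below reproduces the two kinds of transformation in plain pandas on a three-row toy frame: one-hot encoding of <code>weekday</code>, and min-max scaling computed as (x - min) / (max - min). It only illustrates the arithmetic. The Vantage Analytic Library runs its own implementation in-database, and the column names it produces may differ from the ones assumed here.</p>

```python
import pandas as pd

# Toy data, not rows from the demo table.
toy = pd.DataFrame({'weekday': [1, 2, 7], 'nasa_temp': [-10.0, 0.0, 10.0]})

weekday_names = {1: 'monday', 2: 'tuesday', 3: 'wednesday', 4: 'thursday',
                 5: 'friday', 6: 'saturday', 7: 'sunday'}

# One-hot encoding: one 0/1 column per weekday value.
for code, name in weekday_names.items():
    toy[name] = (toy['weekday'] == code).astype(int)

# Min-max scaling: map the smallest value to 0 and the largest to 1.
col = toy['nasa_temp']
toy['nasa_temp_scaled'] = (col - col.min()) / (col.max() - col.min())

print(toy[['weekday', 'monday', 'tuesday', 'sunday', 'nasa_temp_scaled']])
```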

In [None]:
t_output = valib.Transform(data = df,
                           one_hot_encode = [weekday_t, hour_t], 
                           rescale = [rs], 
                           index_columns = 'TD_TIMECODE',
                           retain = [rt])

copy_to_sql(t_output.result,
            table_name = 'output',
            if_exists = 'replace')

In [None]:
t_output.result

<p style = 'font-size:16px;font-family:Arial'>Please scroll to the right and observe that we now have columns named <b>monday-sunday</b> and <b>0_h - 23_h</b>. Also, nasa_temp and the three cap_* weather columns have been rescaled.
<br>
<br>
The following cell splits the data into training and test sets. The last week of data (24 * 7 = 168 hours) is kept for testing, and the remaining data is used for training.
</p>
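<p style = 'font-size:16px;font-family:Arial'>The same time-ordered split can be sketched in pandas on a toy hourly series: sort by timestamp, keep the most recent 168 rows for testing, and use everything earlier for training. The toy frame below is synthetic. The notebook itself performs the split in Vantage with SQL views.</p>

```python
import numpy as np
import pandas as pd

# Toy hourly series of 240 rows (10 days); values are placeholders.
timestamps = pd.Timestamp('2019-08-01') + pd.to_timedelta(np.arange(240), unit='h')
toy = pd.DataFrame({'TD_TIMECODE': timestamps, 'consumption': np.arange(240.0)})

toy = toy.sort_values('TD_TIMECODE')
test = toy.tail(168)       # last week: 24 * 7 = 168 hours
train = toy.iloc[:-168]    # everything before the last week

print(len(train), len(test))
```

<p style = 'font-size:16px;font-family:Arial'>Splitting by time rather than at random keeps every test row later than every training row, which mirrors how a forecast is used in practice.</p>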

In [None]:
eng.execute('''
    REPLACE VIEW test_df AS
    SELECT * FROM output
    QUALIFY row_number() OVER (order by TD_TIMECODE DESC) <= 168
''')

eng.execute('''
    REPLACE VIEW train_df AS
    SELECT * FROM output
    QUALIFY row_number() OVER (order by TD_TIMECODE DESC) > 168
''')

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>5. AzureML</b>
<!-- <ul style = 'font-size:16px;font-family:Arial'>
<li>
        <strong>Importing Required Libraries:</strong>
        <p>
            These typically include <code>azureml.core.authentication.InteractiveLoginAuthentication</code> and <code>azureml.core.Workspace</code>.
        </p>
    </li>
</ul> -->
<p style="font-size: 16px; font-family: Arial;color:#E37C4D"><b>Overview:</b></p>

<ol style="font-size: 16px; font-family: Arial;">
    <li>
        <b>Checking for Required Variables:</b>
        <p style="font-size: 16px; font-family: Arial;">
            This section checks if all the credentials are defined.
            If any required credentials are missing, it shows a message with the names of the missing variables and a link for more information.
        </p>
    </li>
    <li>
        <b>Azure Machine Learning Workspace Setup:</b>
        <p style="font-size: 16px; font-family: Arial;">
            The code sets up a workspace for Azure Machine Learning using specific credentials.
            This workspace allows running machine learning experiments.
        </p>
    </li>
    <li>
        <b>File Operation - Replacing a Line in a Python File:</b>
        <p style="font-size: 16px; font-family: Arial;">
            The code replaces the default connection string with the client's connection string.
        </p>
    </li>
    <li>
        <b>Running a Python Script as an Experiment:</b>
        <p style="font-size: 16px; font-family: Arial;">
            The code creates an experiment for running a Python script.
            It submits the script to the Azure Machine Learning workspace for execution.
            The script's execution is monitored until completion.
        </p>
    </li>
</ol>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>5.1 Checking for Required Variables</b></p>
<p style="font-size: 16px; font-family: Arial;">The function <code>check_variables(variable_names)</code> checks whether each variable named in the <code>required_variables</code> list has been defined in the notebook.
<br>
If any of the required variables are missing, it prints their names and displays a message with a link back to the credentials section.</p>

In [None]:
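# teradataml's `display` (imported in section 1.3) shadows IPython's, so import it under an alias
from IPython.display import HTML, display as ipy_display

def check_variables(variable_names):
    # The credentials are assigned at notebook (global) scope, so look them up in globals();
    # locals() inside this function would never contain them.
    missing_variables = [var for var in variable_names if var not in globals()]
    
    if missing_variables:
        print("The following variables are missing:")
        for var in missing_variables:
            print(f" - {var}")
        ipy_display(HTML('''
            <p>Please ensure all the required credentials are defined.</p>
            <p>For more information, please go to <a href="#anchor">this section</a>.</p>
        '''))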
    else:
        print("All required credentials are present.")

required_variables = ['tenant_id', 'subscription_id', 'resource_group', 'workspace_name', 'host']
check_variables(required_variables)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>5.2 Azure Machine Learning Workspace Setup</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
<li>
        <strong>Authentication:</strong>
        <p>
        The first line of code creates an instance of <code>InteractiveLoginAuthentication</code>. This class is used to authenticate and establish a connection to Azure services interactively. It allows you to log in to your Azure account using an interactive login prompt.
        </p>
        <ul>
            <li>
                <code>tenant_id:</code> This parameter is required and should be set to your Azure Active Directory (Azure AD) tenant ID. The tenant ID identifies the organization or tenant associated with your Azure subscription.
            </li>
        </ul>
    </li>
    <li>
        <strong>Creating the Azure Machine Learning Workspace:</strong>
        <p>
        The second part of the code creates an instance of the <code>Workspace</code> class, which represents the Azure Machine Learning Workspace. This is the primary entry point for interacting with Azure Machine Learning resources.
        </p>
        <ul>
            <li>
                <code>subscription_id:</code> The subscription ID identifies your Azure subscription, which is associated with the Azure Machine Learning resources.
            </li>
            <li>
                <code>resource_group:</code> Name of the resource group where your Azure Machine Learning Workspace is located. A resource group is a logical container for resources in Azure.
            </li>
            <li>
                <code>workspace_name:</code> Name of your Azure Machine Learning Workspace.
            </li>
            <li>
                <code>auth:</code> The <code>auth</code> parameter is set to the previously created <code>InteractiveLoginAuthentication</code> instance. This provides the authentication context for the workspace.
            </li>
        </ul>
    </li>
</ul>

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><i><b>Note</b>: If you are running the following cell for the first time, you need to authenticate. The output might look as follows:
    <img src = './images/authenticate.png'></i></p>
</div>

In [None]:
interactive_auth = InteractiveLoginAuthentication(tenant_id = tenant_id)
ws = Workspace(subscription_id = subscription_id,
               resource_group = resource_group,
               workspace_name = workspace_name,
               auth = interactive_auth)

In [None]:
ws

<ul style = 'font-size:16px;font-family:Arial'>
    <li>
        <strong>Creating an Experiment:</strong>
        <p>
        The code is creating an instance of the <code>Experiment</code> class, which represents an experiment in Azure Machine Learning. An experiment is a container that holds runs, and each run corresponds to a specific iteration or execution of a machine learning model or a data processing task.
        </p>
        <ul>
            <li>
                <code>workspace:</code> The <code>workspace</code> parameter is required and should be replaced with the actual Azure Machine Learning Workspace object (<code>ws</code>). This object represents the connection to your Azure Machine Learning workspace.
            </li>
            <li>
                <code>name:</code> The <code>name</code> parameter is used to specify the name of the experiment. In the provided code, the experiment name is set to 'python_snippet'. You can change this to a more descriptive name that represents the purpose of your experiment.
            </li>
        </ul>
    </li>
</ul>

In [None]:
# Create an experiment
experiment_name = 'python_snippet'
experiment = Experiment(workspace = ws, name = experiment_name)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>5.3 File Operation - Replacing connection string in Python File</b></p>
<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><i><b>Note</b>: The following cell updates the connection details in the <a href = './train.py'>train.py</a> file</i></p>
</div>

<ul style="font-size: 16px; font-family: Arial;">
            <li>
                The function <code>replace_line_in_py_file(file_path, old_line_content, new_line_content)</code> reads the content of a specified Python file (<code>train.py</code>) and finds the line containing the <code>old_line_content</code> i.e. the default connection string.
            </li>
            <li>
                If found, it replaces the line with the <code>new_line_content</code> i.e. new connection string.
            </li>
</ul>
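<p style="font-size: 16px; font-family: Arial;">The effect of this kind of replacement can be tried safely on a throwaway file first. The sketch below is a compact variant, not the function used in this notebook: the hypothetical <code>replace_matching_line</code> replaces every line containing the fragment and returns the number of lines replaced. The file name and contents are made up for the demonstration.</p>

```python
import os
import tempfile

def replace_matching_line(path, old_fragment, new_line):
    """Replace every line containing `old_fragment`; return how many were replaced."""
    with open(path, 'r') as f:
        lines = f.readlines()

    replaced = 0
    for i, line in enumerate(lines):
        if old_fragment in line:
            lines[i] = new_line + '\n'
            replaced += 1

    # Only rewrite the file if something actually changed.
    if replaced:
        with open(path, 'w') as f:
            f.writelines(lines)
    return replaced

# Demonstrate on a temporary file so no real script is modified.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_train.py')
    with open(path, 'w') as f:
        f.write("import os\nhost = 'xxx'\nprint(host)\n")

    count = replace_matching_line(path, "host = 'xxx'", "host = 'example-host'")
    with open(path) as f:
        result = f.read()

print(count)
print(result)
```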

In [None]:
def replace_line_in_py_file(file_path, old_line_content, new_line_content):
    # Read the content of the file
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Find the line containing the old content
    line_number = None
    for i, line in enumerate(lines):
        if old_line_content in line:
            line_number = i + 1  # Line numbers are 1-based in the file

    if line_number is not None:
        # Replace the line with the new content
        lines[line_number - 1] = new_line_content + '\n'  # Add '\n' to maintain line breaks

        # Write the changes back to the file
        with open(file_path, 'w') as file:
            file.writelines(lines)
        print(f"Line containing '{old_line_content}' in {file_path} has been replaced.")
    else:
        print(f"Line containing '{old_line_content}' not found in the file.")

# Example usage
file_path = 'train.py'  # Replace with the path of your .py file
old_line_content = "eng = create_context(host = 'xxx', username = 'demo_user', password = 'xxx')"
new_line_content = "eng = create_context(host='{}', username='demo_user', password='{}')".format(host, password)

replace_line_in_py_file(file_path, old_line_content, new_line_content)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>5.4 Running a Python Script as an Experiment</b></p>

<ul style = 'font-size:16px;font-family:Arial'>
    <li>
        <strong>Creating a ScriptRunConfig:</strong>
        <p>
        The code is creating an instance of the <code>ScriptRunConfig</code> class, which represents the configuration for running a script as an experiment in AzureML.
        </p>
        <ul>
            <li>
                <code>source_directory:</code> The <code>source_directory</code> parameter specifies the directory where the Python script <code>train.py</code> is located. The value <code>./</code> is the current working directory, which assumes that <code>train.py</code> is in the same directory as this notebook. Adjust this path if the script is in a different directory.
            </li>
            <li>
                <code>script:</code> The <code>script</code> parameter is the filename of the Python script to run as part of the experiment, in this case <code>'train.py'</code>. Replace it with the name of your own script if it differs.
            </li>
            <li>
                <code>compute_target:</code> The <code>compute_target</code> parameter specifies the name of the Azure Machine Learning compute target where the script will be executed. In this code, <code>'demo-compute'</code> is used as the compute target name. You should replace this with the actual name of your desired compute target.
            </li>
        </ul>
    </li>
</ul>

In [None]:
# Create a ScriptRunConfig with the Python snippet
script_run_config = ScriptRunConfig(source_directory='./',
                                   script='train.py',  # Replace with your Python snippet file name
                                   compute_target='demo-compute'  # Replace with your compute target name
                                   )

<ul style = 'font-size:16px;font-family:Arial'>
    <li>
        <strong>Submitting the Script Run:</strong>
        <p>
        The first line of code is submitting the script run as an experiment using the <code>experiment.submit()</code> method. The <code>run</code> variable holds the run object returned by the <code>submit()</code> method. This run object represents the execution of the script on the specified compute target.
        </p>
        <ul>
            <li>
                <code>experiment:</code> The <code>experiment</code> object is the instance of the <code>Experiment</code> class that you created earlier. It represents the Azure Machine Learning experiment where you want to log and track the run results.
            </li>
            <li>
                <code>config:</code> The <code>config</code> parameter is set to the previously created <code>script_run_config</code> object. This object contains the configuration for running the script, including the source code, dependencies, and execution settings.
            </li>
        </ul>
    </li>
    <li>
        <strong>Waiting for Completion:</strong>
        <p>
        The second line of code is waiting for the script run to complete using the <code>wait_for_completion()</code> method of the <code>run</code> object. The method call includes <code>show_output=True</code>, which means that the output of the run will be shown in the console while waiting for completion.
        </p>
        <p>
        The <code>wait_for_completion()</code> method allows the code to wait until the script execution finishes. This is useful for long-running scripts or experiments that require some time to complete. By setting <code>show_output=True</code>, you can view the progress and output of the script run in real-time.
        </p>
    </li>
</ul>

In [None]:
# Submit the script run
run = experiment.submit(config=script_run_config)

# Wait for the script run to complete
run.wait_for_completion(show_output=True)

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><i><b>Note</b>: If you do not have AzureML or did not perform the above steps, the following cell will do the required setup to run the remaining notebook.</i></p>
</div>

In [None]:
# Load the PMML file into Vantage
model_ids = ['lr']
model_files = ['energy_consumption_LR.pmml']
table_name = 'azureml_models'

if table_name not in eng.table_names(schema = 'demo_user'):
    for model_id, model_file in zip(model_ids, model_files):
        try:
            save_byom(model_id = model_id, model_file = model_file, table_name = table_name)
        except Exception as e:
            # If the model already exists (error TDML_2200), delete it and save it again
            if 'TDML_2200' in str(e.args):
                delete_byom(model_id = model_id, table_name = table_name)
                save_byom(model_id = model_id, model_file = model_file, table_name = table_name)
            else:
                raise ValueError(f"Unable to save the model '{model_id}' in '{table_name}' due to the following error: {e}") from e

# Show the azureml_models table
list_byom(table_name)

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>6. Model Scoring and Evaluation</b>

<p style = 'font-size:16px;font-family:Arial'>The next step is to test the trained model. The PMMLPredict function takes the stored pipeline object (including any data preparation and mapping tasks) and executes it against the data on the Vantage nodes. Note that the model table can hold many models, along with versioning, last-scored timestamps, or any other management data needed to operate the process.</p>
        <ol style = 'font-size:16px;font-family:Arial'>
            <li>Create a pointer to the model in Vantage</li>
            <li>Execute the Scoring function using the model against the testing data</li>
            <li>Visualize the results</li>
        </ol>

In [None]:
# Obtain a pointer to the model
table_name = 'azureml_models'
model_id = 'lr'
model_lr = retrieve_byom(model_id, table_name=table_name, schema_name="demo_user")
df_test = DataFrame('test_df').sort('TD_TIMECODE')

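# Score the test data in-database with the stored PMML model
result_lr = PMMLPredict(
            modeldata = model_lr,
            newdata = df_test,
            accumulate = ['TD_TIMECODE','consumption'],
            ).result.to_pandas(all_rows = True)

# Convert predictions to numeric; values that cannot be parsed become NaN
result_lr['prediction'] = pd.to_numeric(result_lr['prediction'], errors='coerce')

# Save the scored results to Vantage for evaluation
copy_to_sql(result_lr,
            table_name = 'result_lr',
            if_exists = 'replace')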

In [None]:
result_lr

<p style = 'font-size:16px;font-family:Arial'>In the above step, we use the PMMLPredict function from the teradataml library to score the test data with the model inside the database. PMMLPredict applies the PMML model directly to the data in the Vantage system, so neither the data nor the model has to leave the system. This can improve the efficiency and security of the scoring process.</p>
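<p style = 'font-size:16px;font-family:Arial'>To make the scoring step more concrete, the sketch below shows what evaluating a linear regression model amounts to for a single row: the intercept plus the sum of each coefficient times its feature value. It is only an illustration in plain Python. The intercept, coefficients, and feature names are hypothetical and are not taken from <code>energy_consumption_LR.pmml</code>, and the code does not reproduce how PMMLPredict is implemented.</p>

```python
def score_linear(row, intercept, coefficients):
    """Evaluate intercept + sum(coefficient * feature value) for one row."""
    return intercept + sum(coef * row[name] for name, coef in coefficients.items())

# Hypothetical model parameters and input row, for illustration only
intercept = 2.0
coefficients = {"temperature": 0.5, "hour": 0.1}
row = {"temperature": 20.0, "hour": 10.0}

prediction = score_linear(row, intercept, coefficients)
print(prediction)
```

<p style = 'font-size:16px;font-family:Arial'>In the notebook, the equivalent computation runs on the Vantage nodes for every row of the test data, so no rows need to be pulled to the client for scoring.</p>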

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>7. Visualize the results</b>

In [None]:
res = pd.read_sql('''SELECT * FROM TD_RegressionEvaluator(
    ON demo_user.result_lr as InputTable
    USING
    ObservationColumn('consumption')
    PredictionColumn('prediction')
    Metrics('RMSE','R2','FSTAT')
    DegreesOfFreedom(5,48)
    ) as dt;
''', eng)
res

<p style="font-size: 16px; font-family: Arial;color:#E37C4D">
    <b>Explanation:</b>
</p>

<p style="font-size: 16px; font-family: Arial;">
    The table above provides evaluation metrics for the regression model:
</p>

<ol style="font-size: 16px; font-family: Arial;">
    <li>
        <b>RMSE (Root Mean Squared Error):</b>
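        RMSE is the square root of the mean squared difference between predicted and actual values, expressed in the same units as the target. Lower values indicate better predictions.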
    </li>
    <li>
        <b>R2 (R-Squared):</b>
        R2 represents the proportion of the variance in the target variable that is predictable from the features.
    </li>
    <li>
        <b>F_SCORE (F-Statistic):</b>
        The F-statistic measures the overall fit of the model by comparing the variance it explains with the variance it leaves unexplained. The value is high, suggesting a reasonably good fit.
    </li>
    <li>
        <b>F_CRITICALVALUE (Critical Value for F-Statistic):</b>
        The critical value is the threshold that the F-statistic must exceed, for the given degrees of freedom, for the fit to be considered statistically significant.
    </li>
    <li>
        <b>P_VALUE (P-Value):</b>
        The p-value is very small, indicating strong evidence against the null hypothesis that the model explains no more variance than a model without predictors.
    </li>
    <li>
        <b>F_CONCLUSION (Conclusion based on F-Test):</b>
        The F-test result leads to the conclusion of "Reject null hypothesis", suggesting the model's fit is statistically significant.
    </li>
</ol>
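<p style = 'font-size:16px;font-family:Arial'>As a rough illustration of how the first three metrics relate to the raw values, the sketch below computes RMSE, R2, and an F-statistic in plain Python on a small made-up sample. The sample values and the helper name <code>regression_metrics</code> are assumptions for illustration. The F-statistic uses the textbook form (explained sum of squares divided by <code>df1</code>, over residual sum of squares divided by <code>df2</code>), which may differ in detail from what <code>TD_RegressionEvaluator</code> computes.</p>

```python
import math

def regression_metrics(observed, predicted, df1, df2):
    """Return (RMSE, R2, F-statistic) for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual sum of squares: squared prediction errors
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observed values around their mean
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    r2 = 1.0 - sse / sst
    # Textbook F-statistic; assumes explained variation = SST - SSE
    f_stat = ((sst - sse) / df1) / (sse / df2)
    return rmse, r2, f_stat

# Made-up sample values, for illustration only
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 17.0, 18.0]

rmse, r2, f_stat = regression_metrics(observed, predicted, df1=1, df2=3)
print(f"RMSE={rmse:.4f}  R2={r2:.4f}  F={f_stat:.2f}")
```

<p style = 'font-size:16px;font-family:Arial'>An RMSE close to zero and an R2 close to one both indicate predictions that track the observed values closely. The F-statistic grows as the explained variation becomes large relative to the residual variation.</p>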

In [None]:
# Use the scored results for plotting
df_lr = result_lr

# Create the figure and axes
fig, ax = plt.subplots(figsize=(14, 6))

# Read the RMSE reported by TD_RegressionEvaluator
rms_lr = res['RMSE'].iloc[0]

# Plot actual vs. predicted consumption
ax.plot(df_lr.index, df_lr['consumption'], label='Actual Consumption', color='#1f77b4', linewidth=2)
ax.plot(df_lr.index, df_lr['prediction'].astype(float), label=f'Linear Regression (RMSE={rms_lr:.2f})', color='#ff7f0e', linestyle='--')
ax.set_ylabel('Energy Consumption')
ax.set_title('Energy Consumption Prediction - Linear Regression')
ax.legend()
ax.grid(axis='y', linestyle='--')

# Add a background color
fig.patch.set_facecolor('#f2f2f2')

# Display the plot
plt.show()

<p style = 'font-size:16px;font-family:Arial'>The graph above shows the actual and predicted consumption values, together with the Root Mean Squared Error (RMSE) of the linear regression model. The lower the RMSE, the better the model's performance.</p>

<p style = 'font-size:16px;font-family:Arial'>This demonstration has given a simplified but complete overview of how a typical machine learning workflow can be improved by using Vantage in conjunction with third-party tools and techniques. This combination allows users to pair third-party innovation with Vantage's operational scale, power, and stability.</p>

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>8. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>Clean up work tables to prevent errors the next time the notebook is run.</p>

In [None]:
db_drop_view(view_name='train_df', schema_name = 'demo_user')

In [None]:
db_drop_view(view_name='test_df', schema_name = 'demo_user')

In [None]:
db_drop_table(table_name='azureml_models', schema_name = 'demo_user')

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_Energy');"        # Takes 5 seconds

In [None]:
remove_context()

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">Copyright © Teradata Corporation - 2023. All Rights Reserved.</footer>