# Running Generative Models Inside Teradata DB

This demo shows how to run generative models inside the Teradata database. To do so, the models must first be converted to the ONNX format.

## Introduction to Generative Language Models

Generative language models are a type of artificial intelligence designed to generate new content based on the data they have been trained on. These models can produce human-like text, translate languages, answer questions, and even create summaries of long documents. They are trained on vast amounts of text data and learn to predict the next word in a sequence, which allows them to generate coherent and contextually relevant sentences.
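
To make next-word prediction concrete, the following is a minimal sketch of a toy bigram model, not a neural network. It counts which word follows which in a tiny made-up corpus and then generates text by repeatedly picking the most frequent next word. The corpus and the function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words=5):
    """Greedily append the most frequent next word until none is known."""
    words = [start]
    while len(words) < max_words and follows[words[-1]]:
        next_word, _ = follows[words[-1]].most_common(1)[0]
        words.append(next_word)
    return " ".join(words)


# Made-up training sentences (illustrative only).
corpus = ["the model reads text", "the model writes text", "a model reads text"]
follows = train_bigram(corpus)
generated = generate(follows, "the")
print(generated)
```

Real generative models replace the count table with a neural network that scores every token in the vocabulary, but the generation loop has the same shape: predict, append, repeat.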

### T5 Model for Summarization

In this demo, we will use the T5 (Text-to-Text Transfer Transformer) model for summarization. The T5 model, developed by Google, is a versatile generative language model that converts all NLP tasks into a text-to-text format. This means that both the input and output are always text strings, making the model highly flexible and capable of handling a wide range of tasks, from translation to summarization to question answering.
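
As a small illustration of the text-to-text idea, the original T5 checkpoints select the task with a plain-text prefix such as `summarize:` or `translate English to German:`. The helper below is a hypothetical sketch that only builds such input strings. Fine-tuned checkpoints may expect a different prefix or none at all, so treat the prefixes as an assumption rather than a requirement of the model used in this demo.

```python
def to_text_to_text(task_prefix, text):
    """Build a T5-style input string: '<task prefix>: <input text>'."""
    return f"{task_prefix}: {text.strip()}"


# Two different tasks expressed in the same string-in, string-out form.
examples = [
    to_text_to_text("summarize", "  The meeting covered the renewable energy budget.  "),
    to_text_to_text("translate English to German", "The house is wonderful."),
]

for example in examples:
    print(example)
```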

### Encoder-Decoder Models

The T5 model is an example of an encoder-decoder model. Encoder-decoder models consist of two main components:

1. **Encoder**: The encoder reads the input text and transforms it into a sequence of contextual vectors, one per input token. These vectors capture the essential information and context of the input text.
2. **Decoder**: The decoder attends to the encoder's output and generates the output text one token at a time, conditioning each new token on the input representation and on the tokens generated so far.

This architecture allows encoder-decoder models to handle complex tasks that require understanding and generating text, such as summarization and translation. The T5 model's ability to treat every NLP task as a text-to-text problem simplifies the process of applying the model to different tasks, making it a powerful tool for various applications.

One advanced technique often used with encoder-decoder models is **Beam Search**. Beam Search is a decoding algorithm that improves the quality of generated sequences. Instead of greedily choosing the most probable token at each step, Beam Search keeps track of multiple candidate sequences (called beams) at each step. It expands these beams by considering the top tokens for each candidate sequence and retains only the most promising ones based on their cumulative probabilities. By exploring multiple potential sequences simultaneously, Beam Search increases the likelihood of finding a high-quality output.
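
The difference between greedy decoding and Beam Search can be sketched with a toy example. The "model" below is a hypothetical lookup table of next-token probabilities that depends only on the previous token. It is an assumption made for illustration and has nothing to do with T5 itself. With one beam the search is greedy, and with two beams it finds a sequence with a higher overall probability.

```python
import math

# Hypothetical toy "model": next-token probabilities given the last token only.
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"cat": 0.55, "dog": 0.45},
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(model, num_beams, max_steps=5):
    """Return the best token sequence found with the given number of beams."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens)
    for _ in range(max_steps):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((score, tokens))
                continue
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the most promising candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    return beams[0][1]


greedy = beam_search(TOY_MODEL, num_beams=1)
best = beam_search(TOY_MODEL, num_beams=2)
print(greedy)
print(best)
```

Greedy decoding commits to `a` because it is the most probable first token, while the two-beam search keeps `the` alive long enough to discover that `the cat` is more probable overall (0.36 versus 0.33).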

In this demo, we will walk through the steps required to convert the T5 model to the ONNX format and run it within the Teradata database to perform text summarization.


## Demo Flow Overview

This demo showcases the process of running generative models inside the Teradata database. The entire workflow is divided into three main steps:

1. **Model Import and Conversion to ONNX**:
   The first step involves importing a pre-trained T5 model from the Hugging Face library and converting it to the ONNX (Open Neural Network Exchange) format. This step ensures that our model is ready for efficient deployment and execution within the Teradata environment.

2. **Deployment of the Model and Tokenizer to Teradata**:
   In the second step, we establish a connection to the Teradata database using the `teradataml` Python library. This library handles all aspects of connectivity and provides a user-friendly API similar to PySpark or pandas DataFrames. We deploy two artifacts to the database: the ONNX model itself and the `tokenizer.json` file. These artifacts are deployed using the `save_byom` function, which abstracts the underlying complexity of the deployment process.

3. **In-Database Inference for Text Summarization**:
   The final step involves performing in-database inference to generate text summarizations. Using the deployed model and tokenizer, we process the input texts directly within the database. This approach leverages Teradata's massive parallel processing capabilities, providing performance and scalability advantages. Additionally, keeping the data within the database enhances security and reduces data transfer overhead.

By following these steps, we demonstrate how to effectively run generative models within the Teradata database, highlighting the benefits of keeping data and computation close together for enhanced performance and security.

![Summarization workflow](img/summarization_workflow.jpg "Teradata in-database LLMs")

## Step 1. Model Import and Conversion to ONNX


To run the T5 model within the Teradata database, we first need to convert it to the ONNX (Open Neural Network Exchange) format. ONNX is an open-source format for AI models that allows models to be transferred between frameworks and run on various hardware platforms. Inference on an ONNX model is often faster than running the same model in its native framework, thanks to graph optimizations applied by the runtime.





We start by importing the T5 model from the Hugging Face library using the conversion utility built into the `onnxruntime` package. Hugging Face provides a wide range of pre-trained models, making it easy to access and use state-of-the-art NLP models. Using this utility, we convert the T5 model from its native PyTorch format to the ONNX format.


In [None]:
import onnxruntime as rt
import onnx

from onnxruntime.tools.onnx_model_utils import *

import transformers

import warnings

import teradataml as tdml
import getpass

In [None]:
# The run may end with an error about the IR version during the tests, but this is OK.
# The error is caused by Linux specifics of this particular demo VM.
! python3 -m onnxruntime.transformers.convert_generation --total_runs 0 --disable_perf_test --disable_parity -m JulesBelveze/t5-small-headline-generator --model_type t5 --output t5-small-headline-generator/t5-small-headline-generator.onnx --no_repeat_ngram_size 2  --custom_attention_mask

After conversion, we adjust the opset version and the IR version in the ONNX file to match the compatibility requirements of the Teradata environment. The opset version defines which operators are available to the model.

In [None]:
op = onnx.OperatorSetIdProto()
op.version = 12

model = onnx.load('./t5-small-headline-generator/t5-small-headline-generator.onnx')

model_ir8 = onnx.helper.make_model(model.graph, ir_version = 8, opset_imports = [op])

We need to fix the dynamic dimensions on the input and output to ensure the model operates correctly within the database. This involves setting specific dimensions that the model will use during inference.
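
What "fixing" a dynamic dimension means can be sketched without any ONNX tooling. In the sketch below, tensor shapes are plain Python lists in which a string stands for a symbolic (dynamic) dimension and an integer stands for a fixed one. The shapes and the helper name are assumptions made for illustration. The real work in this notebook is done by `make_dim_param_fixed`.

```python
def fix_dim_params(shapes, fixed):
    """Replace symbolic dimension names with concrete sizes where known."""
    return {
        name: [fixed.get(dim, dim) if isinstance(dim, str) else dim for dim in shape]
        for name, shape in shapes.items()
    }


# Assumed tensor shapes, for illustration only.
symbolic_shapes = {
    "input_ids": ["batch_size", "sequence_length"],
    "attention_mask": ["batch_size", "sequence_length"],
    "sequences": ["batch_size", "num_return_sequences", "max_length"],
}

fixed_shapes = fix_dim_params(
    symbolic_shapes,
    {"sequence_length": 512, "num_return_sequences": 1, "max_length": 100},
)
print(fixed_shapes)
```

Dimensions that are not listed, such as `batch_size` here, stay symbolic.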

In [None]:
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "sequence_length", 512)

rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "num_return_sequences", 1)
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "max_length", 100)


onnx.save(model_ir8, './t5-small-headline-generator/t5-small-headline-generator_fixed.onnx')

We test the ONNX model to ensure that it produces reasonable results.

In [None]:
tokenizer = transformers.T5TokenizerFast.from_pretrained("JulesBelveze/t5-small-headline-generator")
predef_sess = rt.InferenceSession("./t5-small-headline-generator/t5-small-headline-generator_fixed.onnx")

In [None]:
enc = tokenizer("""
During my trip to Dubai on February 10th, I had a meeting with the CEO of Green Energy Solutions. Our collaboration involves a budget of $90,000 for renewable energy projects, targeting a 30% increase in clean energy adoption.
""", max_length = 512, padding='max_length')


encoder_result = predef_sess.run(
    None, 
    {"input_ids": [enc.input_ids], 
     "attention_mask": [enc.attention_mask],
     "max_length": [100], 
     "min_length": [10], 
     "repetition_penalty": [2],
     'num_beams' : [4], 
     'num_return_sequences': [1], 
     'length_penalty': [0]})


tokenizer.decode(encoder_result[0][0][0], skip_special_tokens = True)

Finally, we save the tokenizer definition to a JSON file. This file will be used during the in-database processing.

In [None]:
tokenizer.save_pretrained("t5-small-headline-generator")

## Step 2. Model Deployment to the Database with BYOM

In this section, we demonstrate how to deploy the model to the Teradata database using the BYOM (Bring Your Own Model) capability. We use the `teradataml` Python library to manage the connectivity and provide a convenient Python API that is similar to PySpark or pandas DataFrame.




### Opening Connection to Teradata

We start by setting up a connection to the Teradata database. The `teradataml` library handles all the intricacies of database connectivity, allowing us to interact with Teradata in a manner similar to working with data in pandas DataFrames.


In [None]:
# Prompt for the password instead of storing it in the notebook
tdml.create_context(host = '192.168.178.232', username = '<YOUR DATABASE USERNAME>', password = getpass.getpass('Database password: '))

### Deploying the Model and Tokenizer

After establishing the connection, we deploy two key artifacts to the database:
1. The model itself, converted to ONNX format.
2. The `tokenizer.json` file, which will be used for in-database tokenization.

Both artifacts are deployed using the `save_byom` function, which abstracts the underlying complexity and makes the deployment process straightforward. Internally, this function performs an insert operation into the database.

By using the `save_byom` function, we ensure that our model and tokenizer are readily available within the Teradata database for subsequent summarization operations. This integration minimizes data movement and optimizes performance by keeping all operations within the database environment.
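
Conceptually, the deployment stores the file contents as a binary value in a table row. The sketch below mimics this idea with Python's built-in `sqlite3` module as a stand-in for the database. The two-column layout (`model_id`, `model`) is an assumption based on the `model` column referenced in the SQL later in this notebook, and the helper is hypothetical. It is not how `save_byom` is implemented.

```python
import sqlite3


def save_artifact(conn, model_id, payload, table_name):
    """Insert a binary artifact into a two-column table, creating it if needed."""
    # The table name is interpolated directly, which is acceptable only in a sketch.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        "(model_id TEXT PRIMARY KEY, model BLOB)"
    )
    conn.execute(
        f"INSERT INTO {table_name} (model_id, model) VALUES (?, ?)",
        (model_id, payload),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
payload = b"fake-onnx-bytes"  # placeholder for the real file contents
save_artifact(conn, "t5-small-headline-generator", payload, "summarization_models")

stored = conn.execute(
    "SELECT model_id, length(model) FROM summarization_models"
).fetchone()
print(stored)
```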

In [None]:
#UNCOMMENT IF TABLE EXISTS
#tdml.db_drop_table('summarization_models')
#THIS OPERATION MAY TAKE A WHILE
tdml.save_byom('t5-small-headline-generator',
              './t5-small-headline-generator/t5-small-headline-generator_fixed.onnx',
              'summarization_models')

#UNCOMMENT IF TABLE EXISTS
#tdml.db_drop_table('summarization_tokenizers')
tdml.save_byom('t5-small-headline-generator',
              './t5-small-headline-generator/tokenizer.json',
              'summarization_tokenizers')

## Step 3. In-Database Inference for Text Summarization

Running a summarization model directly inside the Teradata database offers numerous benefits, including enhanced security, reduced data transfer overhead, and the utilization of Teradata's robust computational capabilities. This setup enables efficient, scalable text processing workflows, which is particularly advantageous for summarizing large volumes of text such as email communications.

In this demonstration, we'll use a small table of made-up emails as an example to illustrate how the summarization model works in-database.


In [None]:
tdml.DataFrame(tdml.in_schema('emails', 'emails')).head(3)

### Tokenization

The first sub-step involves creating a view that applies a tokenization function. Tokenization breaks a text string into smaller pieces, called tokens, and maps each token to an integer ID. The function returns these IDs as a binary vector, which is the form the model can read. In Teradata, you can use the tokenizer of any Hugging Face model, which gives flexibility depending on the specific model or language.
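
As a rough picture of what a "binary vector of tokens" looks like, the sketch below packs a few token IDs into bytes as 32-bit integers and unpacks them again. The token IDs are arbitrary, and the little-endian byte order is an assumption. The exact binary layout produced by the in-database tokenizer is not described here.

```python
import struct


def pack_int32(token_ids):
    """Pack token IDs into bytes as little-endian 32-bit integers (assumed layout)."""
    return struct.pack(f"<{len(token_ids)}i", *token_ids)


def unpack_int32(blob):
    """Recover the token IDs from the packed bytes."""
    count = len(blob) // 4  # 4 bytes per INT32 value
    return list(struct.unpack(f"<{count}i", blob))


token_ids = [363, 19, 8, 1]  # arbitrary example IDs
blob = pack_int32(token_ids)
restored = unpack_int32(blob)
print(len(blob), restored)
```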

During tokenization, not only are token IDs generated, but an attention mask is also produced. The attention mask is an array of 1s and 0s indicating which tokens should be attended to, and which should be ignored by the model. This is essential for models to handle variable length inputs effectively and is particularly useful for padding sequences to a uniform length.
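
The relationship between padding and the attention mask can be sketched in a few lines. The helper below is hypothetical. It truncates or pads a list of token IDs to a fixed length and builds the matching mask. The padding ID of 0 matches the T5 vocabulary, and the token IDs themselves are arbitrary.

```python
def pad_to_max_length(token_ids, max_length, pad_id=0):
    """Truncate or pad token IDs to max_length and build the attention mask."""
    token_ids = token_ids[:max_length]
    padding = max_length - len(token_ids)
    input_ids = token_ids + [pad_id] * padding
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [1] * len(token_ids) + [0] * padding
    return input_ids, attention_mask


input_ids, attention_mask = pad_to_max_length([363, 19, 8, 1], max_length=8)
print(input_ids)
print(attention_mask)
```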


In [None]:
tdml.execute_sql("""

replace view v_emails_tokenized as (
    select
        id,
        txt,
        IDS as input_ids,
        attention_mask,
        cast(50 as BIGINT) max_length_0,
        cast(10 as BIGINT) min_length_0,
        cast(4 as BIGINT) num_beams_0,
        cast(1 as BIGINT) num_return_sequences_0,
        cast(1 as FLOAT) repetition_penalty_0,
        cast(2 as FLOAT) length_penalty_0
    from ivsm.tokenizer_encode(
        on emails.emails
        on (select model as tokenizer from summarization_tokenizers) DIMENSION
        USING
            ColumnsToPreserve('id', 'txt')
            OutputFields('IDS', 'ATTENTION_MASK')
            MaxLength(512)
            PadToMaxLength('True')
            TokenDataType('INT32')
            Debug('True')
    ) a
)
""")

### Model Application

The second sub-step is the actual application of the ONNX model to the token vectors. When applying the model, we provide parameters such as the number of beams for Beam Search, along with repetition and length penalties. These parameters control the generation process:
   - **Number of Beams**: This Beam Search parameter controls how many different paths, or "beams", are considered during decoding, allowing a more thorough exploration of possible summaries.
   - **Repetition Penalty**: This discourages the model from repeating the same tokens or phrases, which makes the generated text more diverse and natural.
   - **Length Penalty**: This adjusts the model's preference for longer or shorter outputs, helping the output match the desired verbosity.

Running the model in-database takes advantage of Teradata's parallel processing, so these computations run at scale. The output of this step is again a binary vector representing the tokens of the summarized text.
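
To give a feel for how the two penalties act, the sketch below uses the formulation commonly found in Hugging Face-style generation code: the repetition penalty scales down the scores of tokens that were already generated, and the length penalty divides a beam's cumulative log-probability by its length raised to the penalty. Whether the ONNX beam search operator used here applies exactly these formulas is an assumption, and all numbers are made up.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already generated tokens less likely (for penalty > 1)."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        value = adjusted[token_id]
        # Positive scores shrink, negative scores become more negative.
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted


def length_normalized_score(sum_log_probs, length, length_penalty):
    """Score a finished beam, normalizing by length ** length_penalty."""
    return sum_log_probs / (length ** length_penalty)


penalized = apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1], penalty=2.0)

# A short beam (4 tokens) versus a longer beam (8 tokens).
no_penalty = (length_normalized_score(-4.0, 4, 0.0), length_normalized_score(-6.0, 8, 0.0))
with_penalty = (length_normalized_score(-4.0, 4, 1.0), length_normalized_score(-6.0, 8, 1.0))
print(penalized, no_penalty, with_penalty)
```

With a length penalty of 0 the shorter beam wins because its raw score is higher. With a length penalty of 1 the longer beam wins because its score per token is better.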

In [None]:
tdml.execute_sql("""
replace view v_emails_encoded
as (
    select 
            *
    from ivsm.IVSM_score(
            on v_emails_tokenized  -- table with data to be scored
            on summarization_models dimension
            using
                ColumnsToPreserve('id', 'txt') -- columns to be copied from input table
                ModelType('ONNX') -- model format
                BinaryInputFields('input_ids', 'attention_mask') -- enables binary input vectors
                BinaryOutputFields('sequences') -- defines which output tensors are returned in binary format
                Caching('interquery') -- turn on model caching (interquery mode)
        ) a )
""")

### Detokenization

The final sub-step is detokenization, where the binary vector outputs from the model are converted back into human-readable text. This process is the inverse of tokenization: each token ID is mapped back to its text piece using the same tokenizer definition that was deployed to the database. Because the query below sets `SkipSpecialTokens('False')`, special tokens such as padding and end-of-sequence markers are kept in the decoded text.

By executing these steps within the Teradata database, we harness the full power of in-database analytics to perform complex text summarization tasks directly where the data resides, reducing latency and enhancing overall data management efficiency.
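
The effect of keeping or skipping special tokens during detokenization can be sketched with a toy vocabulary. The vocabulary below is hypothetical and maps each ID to a whole word, whereas a real tokenizer works with subword pieces. Only the roles of `<pad>` (ID 0) and `</s>` (ID 1) follow the T5 convention.

```python
# Hypothetical toy vocabulary, for illustration only.
VOCAB = {0: "<pad>", 1: "</s>", 2: "budget", 3: "approved", 4: "for", 5: "solar"}
SPECIAL_TOKENS = {"<pad>", "</s>"}


def decode(token_ids, skip_special_tokens):
    """Map token IDs back to text, optionally dropping special tokens."""
    tokens = [VOCAB[token_id] for token_id in token_ids]
    if skip_special_tokens:
        tokens = [token for token in tokens if token not in SPECIAL_TOKENS]
    return " ".join(tokens)


sequence = [0, 2, 3, 4, 5, 1, 0]
with_special = decode(sequence, skip_special_tokens=False)
without_special = decode(sequence, skip_special_tokens=True)
print(with_special)
print(without_special)
```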


In [None]:
#UNCOMMENT IF TABLE EXISTS
#tdml.db_drop_table("emails_processed")
tdml.execute_sql("""

create table emails_processed as 
(
    select
        *
    from ivsm.tokenizer_decode(
        on (select id, txt, sequences as vector from v_emails_encoded)
        on (select model as tokenizer from summarization_tokenizers) DIMENSION
        USING
            ColumnsToPreserve('id', 'txt')
            TokenDataType('INT32')
            SkipSpecialTokens('False')
    ) a
) with data

""")

%time

In [None]:
tdml.DataFrame("emails_processed")