# Arthur Sandbox Example: Headline Summarization
In this guide, we'll use the XSum headline summarization model and dataset from Hugging Face to onboard a new streaming model to the Arthur platform. Then we will use Arthur to analyze our model.


# Outline

Read on for an overview of everything this notebook will cover. **[Click here to dive straight into the code.](#Imports)**

## Onboarding

Onboarding is the process of setting up your model to be monitored by Arthur. You specify the type of data your model ingests, send a reference dataset that provides a baseline for the distribution of your data, and configure any additional settings for the services Arthur offers.

**Arthur does not need your model object itself to monitor performance - only predictions are required**

All you need to monitor your model with Arthur is to upload the predictions your model makes: Arthur computes analytics about your model based on that prediction data. You can compute this data directly with your model in a script or notebook like this one and upload it to the platform, or fetch it from an external database and send it to Arthur.

### Getting Model Predictions
We'll prepare a sample from a headline summarization dataset and generate summaries with a pretrained model from Hugging Face.

### Registering Model with Arthur
We'll configure our model's attributes and save the model to the Arthur platform.

### Sending Inferences
We'll send model inferences (inputs and predictions) to the Arthur platform.


## Model Monitoring and Analysis

Once onboarding is complete and you have inferences uploaded to the platform, you can use Arthur to get model monitoring insights.

We will analyze the token-likelihood sequences produced by the model, measuring their length and their spikiness, that is, how sharply the token likelihood changes from one token to the next.
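As a rough illustration of these two measurements, the sketch below computes the length of a likelihood sequence and one possible spikiness score: the mean absolute change in likelihood between consecutive tokens. This definition, the function names, and the sample values are assumptions made for illustration, not the metrics Arthur computes.

```python
from typing import List


def sequence_length(likelihoods: List[float]) -> int:
    """Number of generated tokens in the sequence."""
    return len(likelihoods)


def spikiness(likelihoods: List[float]) -> float:
    """Mean absolute change in likelihood between consecutive tokens.

    This is one possible definition, chosen for illustration only.
    """
    if len(likelihoods) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(likelihoods, likelihoods[1:])]
    return sum(diffs) / len(diffs)


# hypothetical likelihoods of the chosen token at each generation step
steady = [0.90, 0.88, 0.91, 0.89]
spiky = [0.90, 0.20, 0.95, 0.10]

print(sequence_length(steady), spikiness(steady))
print(sequence_length(spiky), spikiness(spiky))
```

A sequence whose likelihoods swing between confident and uncertain tokens scores higher than one that stays steady, even when both have the same length.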

---

# Setup & Imports

In [1]:
# ensure required packages are installed
#  don't worry, our requirements are flexible!

! pip install -r requirements.txt > /dev/null

In [2]:
from datetime import datetime, timedelta
from IPython.display import display, HTML
import numpy as np
import pandas as pd

---

# Onboarding

## Prepare data and predictions
We will load a sample from the XSum dataset (the code below loads its `test` split). This dataset is used as a reference distribution defining the expected space of tasks for this generative text model. This way, Arthur can monitor the drift and stability of the texts input to your model.

In this example we will use the Pegasus encoder-decoder model finetuned on the XSum dataset to produce model predictions.

In [3]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

xsum_dataset = load_dataset("xsum", split='test')

tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")

lm_model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-xsum")

Found cached dataset xsum (/Users/maxcembalest/.cache/huggingface/datasets/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)


The generation function below takes a string as input and returns a tuple containing the token ids of the generated sequence and the log probabilities as an array of shape (sequence length, vocab size).

In [4]:
from typing import Tuple
import torch

def generate_summary(article: str) -> Tuple[torch.Tensor, np.ndarray]:
    # tokenize the article; inputs longer than 512 tokens are truncated
    tokens = tokenizer(article, max_length=512, return_tensors="pt")
    # num_beams=1 disables beam search; renormalize_logits=True makes each score a log probability
    generation = lm_model.generate(**tokens, return_dict_in_generate=True, output_scores=True, num_beams=1, renormalize_logits=True)
    return generation.sequences, np.stack(generation.scores).squeeze()

Next, get a mapping from ids to tokens in the vocabulary.

In [5]:
vocab = tokenizer.get_vocab()
id_to_vocab = {id_: token for token, id_ in vocab.items()}

Next, we use the Arthur SDK function `tensors_to_arthur_inference` to convert our model outputs to an Arthur-ready format. We can choose how many (up to 5) of the top probability tokens to log for each inference.

In [6]:
from arthurai.core.model_utils import tensors_to_arthur_inference

inferences = []
for i in range(5):
    # get model output
    generation = generate_summary(xsum_dataset[i]['document'])
    # decode tokens to final generated text
    seq = tokenizer.batch_decode(generation[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
    likelihoods = tensors_to_arthur_inference(generation[1], id_to_vocab, 3)
    inference = {
        'document': xsum_dataset[i]['document'],
        'generated_summary': seq[0],
        'summary_token_probs': likelihoods
    }
    inferences.append(inference)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
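To make the idea of logging only the top few tokens per generation step concrete, here is a minimal standalone sketch that picks the `k` most likely tokens at each step from an array of log probabilities. It is not the implementation of `tensors_to_arthur_inference`, and the output shape (one token-to-probability dictionary per step) is an assumption; the helper name, toy vocabulary, and values are hypothetical.

```python
import math
from typing import Dict, List


def top_k_token_probs(
    log_probs: List[List[float]],
    id_to_token: Dict[int, str],
    k: int = 3,
) -> List[Dict[str, float]]:
    """For each generation step, map the k most likely tokens to probabilities."""
    steps = []
    for step_log_probs in log_probs:
        # rank token ids by log probability, highest first
        ranked = sorted(
            range(len(step_log_probs)),
            key=lambda i: step_log_probs[i],
            reverse=True,
        )
        # exponentiate log probabilities to recover probabilities
        steps.append({id_to_token[i]: math.exp(step_log_probs[i]) for i in ranked[:k]})
    return steps


# hypothetical 4-token vocabulary and two generation steps of log probabilities
toy_vocab = {0: "<pad>", 1: "housing", 2: "prison", 3: "charity"}
toy_log_probs = [
    [math.log(0.05), math.log(0.60), math.log(0.25), math.log(0.10)],
    [math.log(0.10), math.log(0.10), math.log(0.70), math.log(0.10)],
]

top2 = top_k_token_probs(toy_log_probs, toy_vocab, k=2)
print(top2)
```

Keeping only a handful of tokens per step keeps each inference small while still recording how confident the model was in the token it chose.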


In [7]:
inferences

[{'document': 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said th

## Registering Model With Arthur

### Setting Up Connection
Supply your login to authenticate with the platform.

In [8]:
from arthurai import ArthurAI
# connect to Arthur
# replace the url and login below with your own details
arthur = ArthurAI(
    url="https://dev.arthur.ai",  # you can also pass this through the ARTHUR_ENDPOINT_URL environment variable
    login="admin",  # you can also pass this through the ARTHUR_LOGIN environment variable
)

Please enter password for admin: ········


### Registering Model Type

Next, we need to register the schema of the model with Arthur. Since this is a text-to-text model, we need to supply:

* a name for the model's input text
* a name for the model's output text
* optionally, additional column names for tokenized text and likelihoods
* optionally, ground truth or human generated texts

We'll instantiate an [`ArthurModel`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html) with the `ArthurAI.model()` method, which constructs a new local `ArthurModel` object. Later we'll use `ArthurModel.save()` to register this model with the Arthur platform.

We give the model a user-friendly `display_name` and build a unique `partner_model_id` from a timestamp. You can supply any unique identifier here if it helps you map your models in Arthur to your other MLOps systems.
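A `partner_model_id` only needs to be unique. The sketch below shows two simple ways to build one, from a timestamp or from a random UUID. The helper name `make_partner_model_id` is hypothetical and not part of the Arthur SDK.

```python
import uuid
from datetime import datetime, timezone


def make_partner_model_id(prefix: str, use_uuid: bool = False) -> str:
    """Build an identifier from a prefix plus a timestamp or a random UUID."""
    if use_uuid:
        suffix = uuid.uuid4().hex
    else:
        suffix = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{prefix}-{suffix}"


timestamp_id = make_partner_model_id("NewsHeadlineSummarization_QS")
uuid_id = make_partner_model_id("NewsHeadlineSummarization_QS", use_uuid=True)
print(timestamp_id)
print(uuid_id)
```

A timestamp suffix is easy to read but only has one-second resolution, so a UUID is the safer choice if several models may be created in quick succession.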

The `InputType` of a model specifies the general type of data your model ingests. The `OutputType` of a model specifies the modeling task at hand.

In [9]:
from datetime import datetime
from arthurai.common.constants import InputType, OutputType

# instantiate an ArthurModel object to be registered 
# with the Arthur platform with a name and a type
arthur_model = arthur.model(
    partner_model_id=f"NewsHeadlineSummarization_QS-{datetime.now().strftime('%Y%m%d%H%M%S')}",
    display_name="News Headline Generation",
    input_type=InputType.NLP,
    output_type=OutputType.TokenSequence
)

We have just defined a streaming model, which is the default for an `ArthurModel`. A streaming model receives instances of data as they come into the deployed model. A batch model, in contrast, receives data in groups, and is often preferred if your model runs as a job rather than operating in real time or over a data stream. To indicate a batch model, simply add an `is_batch=True` parameter to the call above.

You can read more about [streaming and batch models in our documentation](https://docs.arthur.ai/user-guide/basic_concepts.html#basic-concepts-streaming-vs-batch).

### Building the model by specifying attributes

We use a helper function to register the model attributes for the input text (`input_column`) and output text (`output_text_column`) the model will process.

In addition, we register an attribute for the `output_likelihood_column`, which will be the attribute from our data that represents the likelihood of generated tokens, as well as an attribute called `ground_truth_text_column`, which will be the ground truth we evaluate our output text against.

In [10]:
arthur_model.build_token_sequence_model(
    input_column='document',
    output_text_column='generated_summary',
    output_likelihood_column='summary_token_probs',
    ground_truth_text_column='summary'
)

| | name | stage | value_type | categorical | is_unique | categories | bins | range | monitor_for_bias |
|---|---|---|---|---|---|---|---|---|---|
| 0 | document | PIPELINE_INPUT | UNSTRUCTURED_TEXT | True | True | [] | | [None, None] | False |
| 1 | summary_token_probs | PREDICTED_VALUE | TOKEN_LIKELIHOODS | False | False | [] | | [None, None] | False |
| 2 | generated_summary | PREDICTED_VALUE | UNSTRUCTURED_TEXT | True | False | [] | | [None, None] | False |
| 3 | summary | GROUND_TRUTH | UNSTRUCTURED_TEXT | True | True | [] | | [None, None] | False |


### Saving the Model

Before saving, be sure to review your model to make sure everything is correct. We already saw the model schema returned by `ArthurModel.build_token_sequence_model()`. Calling `ArthurModel.review()` displays the schema again, so we can confirm that the attribute configurations look correct before saving to the platform. See the [onboarding walkthrough on the Arthur docs](https://docs.arthur.ai/user-guide/walkthroughs/model-onboarding/index.html#review-model) for tips on reviewing your model.

Note that while we capture the ranges of the attributes in this schema, they don’t need to be exact and won’t affect any performance calculations. They’re used as metadata to configure plots in the online Arthur dashboard, but never affect data drift or any other computations.

In [12]:
# review the model attribute properties in the model schema
arthur_model.review()

| | name | stage | value_type | categorical | is_unique | categories | bins | range | monitor_for_bias |
|---|---|---|---|---|---|---|---|---|---|
| 0 | document | PIPELINE_INPUT | UNSTRUCTURED_TEXT | True | True | [] | | [None, None] | False |
| 1 | summary_token_probs | PREDICTED_VALUE | TOKEN_LIKELIHOODS | False | False | [] | | [None, None] | False |
| 2 | generated_summary | PREDICTED_VALUE | UNSTRUCTURED_TEXT | True | False | [] | | [None, None] | False |
| 3 | summary | GROUND_TRUTH | UNSTRUCTURED_TEXT | True | True | [] | | [None, None] | False |


Now, we save the model. 

Note that this will be the first call to send data to the Arthur platform so far in this example - no information has been sent yet to the platform.

The method `ArthurModel.save()` sends an API request to Arthur to validate your model - if there are any problems with your model schema, this method will result in an error informing you how to correct your model's configuration. If no errors are found, the model will be saved to the platform.

In [14]:
# validate the model and save it onto the Arthur platform
model_id = arthur_model.save()
with open("quickstart_model_id.txt", "w") as f:
    f.write(model_id)

16:30:19 - arthurai - We have registered the  model with Arthur and are getting it ready to accept inferences...
16:31:05 - arthurai - Model Creation Completed successfully, you can now send Data to Arthur.


### Set reference data

In [15]:
arthur_model.set_reference_data(directory_path='./')

16:31:05 - arthurai - Starting upload (0.149 MB in 1 files), depending on data size this may take a few minutes
16:31:05 - arthurai - Upload completed: reference.parquet


({'counts': {'success': 50, 'failure': 0, 'total': 50}, 'failures': [[]]},
 {'dataset_close_result': {'message': 'success'}})

### Send Inferences

One way to send inferences to Arthur is to use the `send_inferences` function, passing in the predictions generated above.

In [16]:
arthur_model.send_inferences(inferences)

16:31:05 - arthurai - 5 rows were missing inference_timestamp fields, so the current time was populated
16:31:05 - arthurai - 5 rows were missing partner_inference_id fields, so UUIDs were generated, see return values


{'counts': {'failure': 0, 'success': 5, 'total': 5},
 'results': [{'message': 'success',
   'status': 200,
   'partner_inference_id': 'TtQvNoxk6E4Fi8bzTu7tqJ'},
  {'message': 'success',
   'status': 200,
   'partner_inference_id': 'UXpYpRoiT5nfRe24rCRckj'},
  {'message': 'success',
   'status': 200,
   'partner_inference_id': 'hgsTdNLVxyKHCokFrRVbnj'},
  {'message': 'success',
   'status': 200,
   'partner_inference_id': '587bdV9jrGx9Vt7hDoPPzn'},
  {'message': 'success',
   'status': 200,
   'partner_inference_id': 'oQmtoESKKyJfy74icQGbQQ'}]}
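The log messages above show that missing `inference_timestamp` and `partner_inference_id` fields are filled in automatically. If you would rather control these values yourself, a minimal sketch is to add them to each row before sending. The helper name and the sample rows are hypothetical, and passing the timestamp as an ISO 8601 string is an assumption about what the platform accepts.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List


def add_inference_metadata(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the rows with an explicit timestamp and ID on each."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's rows are untouched
        # only fill in values that the row does not already carry
        new_row.setdefault("inference_timestamp", datetime.now(timezone.utc).isoformat())
        new_row.setdefault("partner_inference_id", str(uuid.uuid4()))
        enriched.append(new_row)
    return enriched


# hypothetical rows standing in for the `inferences` list built earlier
sample_rows = [
    {"document": "first article text", "generated_summary": "first summary"},
    {"document": "second article text", "generated_summary": "second summary"},
]
rows_with_metadata = add_inference_metadata(sample_rows)
ids = [row["partner_inference_id"] for row in rows_with_metadata]
print(ids)
```

Generating the IDs yourself means you already hold them when ground truth arrives later, instead of having to read them back from the return value.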

We can also upload inferences in bulk from a JSON file by providing the path to the directory that contains the inferences file.

In [17]:
arthur_model.send_bulk_inferences(directory_path='./inferences/')

16:31:05 - arthurai - Starting upload (0.321 MB in 1 files), depending on data size this may take a few minutes
16:31:06 - arthurai - Upload completed: inferences/inferences.json


({'counts': {'success': 50, 'failure': 0, 'total': 50}, 'failures': [[]]},
 None)

When ground truth values become available for your model's inferences, you can attach each one to its inference by partner inference ID. You will need the partner inference IDs for your data so that they can be included in the ground truth upload and each prediction can be matched with its corresponding ground truth. Here, we inspect one ground truth data point and then upload the full set in bulk.

In [18]:
import json
with open('ground_truth/gt.json', 'r') as f:
    gt = json.load(f)
gt[0]

{'ground_truth_timestamp': '2023-04-12T01:46:45.432216+00:00',
 'ground_truth_data': {'summary': 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Steve

In [19]:
arthur_model.send_bulk_ground_truths(directory_path='./ground_truth/')

16:31:06 - arthurai - Starting upload (0.126 MB in 1 files), depending on data size this may take a few minutes
16:31:06 - arthurai - Upload completed: ground_truth/gt.json


{'counts': {'success': 50, 'failure': 0, 'total': 50}, 'failures': [[]]}
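As a sketch of how such ground truth records might be assembled, the example below pairs partner inference IDs with reference summaries and serializes them to JSON. The `ground_truth_timestamp` and `ground_truth_data` keys follow the record shown above; the `partner_inference_id` key name is an assumption (the printed record is truncated), and the helper name and summary texts are hypothetical placeholders.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_ground_truth_records(ids_to_summaries: Dict[str, str]) -> List[Dict[str, Any]]:
    """Create one ground truth record per partner inference ID."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return [
        {
            "partner_inference_id": inference_id,  # assumed key name
            "ground_truth_timestamp": timestamp,
            "ground_truth_data": {"summary": summary},
        }
        for inference_id, summary in ids_to_summaries.items()
    ]


# IDs taken from the send_inferences output above; summaries are placeholders
labels = {
    "TtQvNoxk6E4Fi8bzTu7tqJ": "placeholder reference summary one",
    "UXpYpRoiT5nfRe24rCRckj": "placeholder reference summary two",
}
records = build_ground_truth_records(labels)
gt_json = json.dumps(records, indent=2)
print(gt_json)
```

The resulting JSON could then be written to a file such as `ground_truth/gt.json`, matching the directory layout used above.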

## See Model in Dashboard

In [20]:
# the code below will render a link for you to view your model in the Arthur Dashboard

def render_arthur_model_dashboard_link(arthur, arthur_model):
    url = 'https://' + ''.join(arthur.client.api_base_url.split('/')[1:-2])
    link_text = f"See your model ({arthur_model.display_name}) in the Arthur Dashboard"
    href_string = f"{url}/model/{arthur_model.id}/overview"
    html_string = f'<br> <a style="font-size:200%" href="{href_string}">{link_text}</a> <br>'
    display(HTML(html_string))

render_arthur_model_dashboard_link(arthur, arthur_model) 

Once your inference data has been uploaded to the platform, follow the link above to the model dashboard page, where you can see an overview of the model and browse its inference data.
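The helper above derives the dashboard host by splitting the API base URL on `/`. A slightly more explicit alternative, sketched here with the standard library's `urllib.parse`, keeps only the scheme and host and then appends the same `/model/<id>/overview` path. The function name, the example API base URL, and the model ID are hypothetical.

```python
from urllib.parse import urlsplit


def dashboard_url(api_base_url: str, model_id: str) -> str:
    """Build the model overview URL from the API base URL and a model ID."""
    parts = urlsplit(api_base_url)
    # keep only the scheme and host; drop the API path
    return f"{parts.scheme}://{parts.netloc}/model/{model_id}/overview"


# hypothetical API base URL and model ID, for illustration only
example = dashboard_url("https://dev.arthur.ai/api/v3", "1234-abcd")
print(example)
```

Unlike the original helper, this version preserves whatever scheme the API URL uses rather than always emitting `https://`.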

---