# Generate Personalized Sports Commentary Using Generative AI

Organizations in Sports industry are harnessing the power of AI to help bring better experience to fans who engage the sports throughout the events. With latest developments in technology, the expectations of sports fans are also evolving. Fans are more immersed in rich and personalized experiences by engaging with the games on-the-go with their smartphones. The need was felt for truly unique, live sports experiences that went beyond the usual routine of watching the games on live TV or streaming devices.

When it comes to engaging sports fans, Sportscasters provides analysis by condensing live information into interesting and relevant summaries. However, given the increasing amount of data collected and analyzed in the games, it becomes a challenging task to distill the information into summaries that are most relevant to the individuals. In sports reporting, the delivery of commentaries which turns data into engaging stories could drastically enhance the fan experiences. For instance, using a particular commentary style. or a particular language that the fans speak.

That’s where AI Natural Language Generation can be helpful in this area. With Foundation Models, which are trained on trillion corpus of words, we can now take the live data collected at the stadiums, and use the models to generate commentaries in variety of styles and languages to meet individual fan preferences.

To demonstrate the capability, we'll use a foundation model to generate sports commentaries. Additionally, we'll use a sample dataset to provide the model with contextual information. 

Following are the specific artifacts to be used throughout the notebook:

* Dataset: NFL dataset provided by [kaggle](https://www.kaggle.com/datasets/maxhorowitz/nflplaybyplay2009to2016)
* Foundation model: [A121 Jurassic-2 Jumbo Instruct](https://aws.amazon.com/marketplace/pp/prodview-f4y5ksmu5kccy)

The notebook is broken down into the following steps:
1. Create a new SageMaker endpoint using a J2 Instruct model 
2. Iterate each row in the CSV data; create a prompt based on the type of play
3. Create a prompt by augmenting specifial instructions, such as style, language and so forth. 
4. Delete the endpoint.

First, we need to install A121 python library that integrates with SageMaker

In [1]:
!pip install ai21[SM] sagemaker boto3 --quiet

Import required libraries

In [2]:
import csv 
import ai21
import json
import datetime
import math
import boto3
import json
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
from sagemaker import ModelPackage

# Deploy AI21 Model to SageMaker
In order to deploy AI21 models, we need to reference the model artifacts via model package ARN. 
Specifically, the model package ARNs for J2-Jumbo instruct are defined as in the following dictionary.

## Prerequisites
This notebook has been tested in SageMaker Notebook instance and SageMaker Studio environments. 
You would need the required IAM permissions in order to access resources and to deploy models to SageMaker in this notebook. Follow this [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/security_iam_service-with-iam.html) for setting up appropriate IAM Role and Policies in SageMaker for more information. 

The following steps outlines the procss of deploying an AI21 model in SageMaker: 

1. Reference the model package ARN based on the region
2. Define the name of the model
3. Select an appropriate instance type to host the model: (recommended: ml.p4d.24xlarge")
4. Use SageMaker SDK to deploy the model

In [3]:
model_package_map = {
    "us-east-1": "arn:aws:sagemaker:us-east-1:865070037744:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "us-east-2": "arn:aws:sagemaker:us-east-2:057799348421:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "us-west-1": "arn:aws:sagemaker:us-west-1:382657785993:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "us-west-2": "arn:aws:sagemaker:us-west-2:594846645681:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "ca-central-1": "arn:aws:sagemaker:ca-central-1:470592106596:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "eu-central-1": "arn:aws:sagemaker:eu-central-1:446921602837:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "eu-west-1": "arn:aws:sagemaker:eu-west-1:985815980388:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "eu-west-2": "arn:aws:sagemaker:eu-west-2:856760150666:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "eu-west-3": "arn:aws:sagemaker:eu-west-3:843114510376:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "eu-north-1": "arn:aws:sagemaker:eu-north-1:136758871317:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "ap-southeast-1": "arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "ap-southeast-2": "arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "ap-northeast-2": "arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "ap-northeast-1": "arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "ap-south-1": "arn:aws:sagemaker:ap-south-1:077584701553:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f",
    "sa-east-1": "arn:aws:sagemaker:sa-east-1:270155090741:model-package/j2-jumbo-instruct-v1-1-033-87b797db88313edf9c3851adf6fc371f"
}

First, we need to find the right model package ARN based on the region we are using.

In [4]:
region = boto3.Session().region_name
if region not in model_package_map.keys():
    raise ("UNSUPPORTED REGION")

model_package_arn = model_package_map[region]

Create a sagemaker session object and a boto3 client

In [5]:
role = get_execution_role()
sagemaker_session = sage.Session()

runtime_sm_client = boto3.client("runtime.sagemaker")

Define the model name, instance type, and the content type to be used to send requests to the model endpoint

In [6]:
model_name = "j2-jumbo-instruct"

content_type = "application/json"

real_time_inference_instance_type = (
    "ml.p4d.24xlarge"
)

create a deployable model from the model package. The model object is used by SageMaker SDK to deploy the model into SageMaker Inference.

In [7]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

Deploy the model

In [None]:
# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name, 
                         model_data_download_timeout=3600,
                         container_startup_health_check_timeout=600,
                        )

# Working with the A121 Model
After the endpoint is deployed successfully, we can start using the model to generate text. In particular, we'll be using the model to generate a commentary based on a prompt. A prompt is essentailly a short description of the particular task for the model to carry out. You can think of the prompt as the input to the model. For instance, to ask the model to answer a specific question, the prompt would be the question. Follow this linkt to learn more about [prompt engineering](https://en.wikipedia.org/wiki/Prompt_engineering). 

Specific to our use case, we would provide a prompt that would generate a sport commentary to the model. A simple example would be:

```
"As a professional sportscaster, write a commentary in 2 sentences"
```
We'll evaluate the model capabilities of generating the commentaries by exploring different prompts. Let's go!

Define common variables to be referenced throughout the notebook

In [8]:
sample_input_csv = "data/reg_pbp_2009_500.csv"
down_dict = {1: "first down", 2: "second down", 3: "third down", 4: "forth down"}
quarter_dict = {1: "first quarter", 2: "second quarter", 3: "third quarter", 4: "forth quarter"}
quarter_end = True

Define a function that takes in a prompt and return a generated commentary

In [9]:
def generate_commentary(prompt):
    response = ai21.Completion.execute(sm_endpoint="j2-jumbo-instruct",
                                   prompt=prompt,
                                   maxTokens=50,
                                   temperature=0,
                                   numResults=1)

    return response['completions'][0]['data']['text'][1:]

As described earlier, we will be using a sample dataset that contains an NFL game play by play data. 
In the following function, we are taking the CSV row and transform into a prompt based on the particular play type. 
Based on the dataset, valid play types are defined as followed:

* kickoff
* pass
* run
* punt
* field_goal
* na -> no play (such as end of the quarter)

The goal here is to provide relevant context about a play, then construct a prompt to the model for commentary generation.
The following function takes in a single row in CSV, extracts the context from the given data point based on the particular play type. 
The returned prompt will then get fed into the model to generate commentary.

In [10]:
def generate_prompt(row, language, style):
    '''
    Prompt generator based on the given row.
    :param row: dictionary that describes the play.

    :return: curated prompt based on the given row.
    '''
    global quarter_end
    prompt = {}
    prompt['starting_yard_line'] = row['yrdln']
    prompt['quarter_time_remaining'] = row['time']
    prompt['home_team_score'] = row['total_home_score']
    prompt['away_team_score'] = row['total_away_score']
    play_type = row['play_type']
    offensive_team = row['posteam']
    defensive_team = row['defteam']

    if str(row['down']).isnumeric():
        down = down_dict[int(row['down'])]
    yards_gained = row['yards_gained']
    if quarter_end:
        prompt['new quarter'] = True
        quarter_end = False
    if row['touchdown'] == 1:
        prompt['touchdown'] = "touchdown"
        prompt['number of drive'] = row['drive']
    if row['sack'] == 1:
        prompt['sacked'] = 'sacked'

    if row['penalty'] == 1:
        prompt['is penalty'] = True
        prompt['penalty team'] = row['penalty_team']
        prompt['penalty player'] = row['penalty_player_name']
        prompt['penalty yards'] = row['penalty_yards']
        prompt['penalty type'] = row['penalty_type']
    if play_type == "kickoff":
        prompt['quarter'] = quarter_dict[int(row['qtr'])]
        prompt['play_type'] = play_type
        prompt['team'] = defensive_team
        prompt['receiving team'] = offensive_team
        prompt['distance'] = row['kick_distance']
        prompt['return yards'] = row['return_yards']
        prompt['player name'] = row['kicker_player_name']
        prompt['returner player name'] = row['kickoff_returner_player_name']

    elif play_type == "pass":
        prompt['play_type'] = play_type
        prompt['offensive_team'] = offensive_team
        prompt['defensive_team'] = defensive_team
        prompt['down'] = down
        prompt['pass_length'] = row['pass_length']
        if row['touchdown'] != 1:
            prompt[f"{down} with yards to go"] = row['ydstogo']
        prompt['passer_player_name'] = row['passer_player_name']
        prompt['receiver_player_name'] = row['receiver_player_name']
        prompt['passing_yards_gained'] = yards_gained
        if row['complete_pass'] == 1:
            prompt['completion'] = "complete pass"
        else:
            prompt['completion'] = "incomplete pass"
    elif play_type == "run":
        prompt['play type'] = play_type
        prompt['offensive_team'] = offensive_team
        prompt['defensive_team'] = defensive_team
        prompt['rusher_player_name'] = row['rusher_player_name']
        prompt['rushing_yards_gained'] = yards_gained
        prompt['tackle_1_player_name'] = row['solo_tackle_1_player_name']
        prompt['tackle_2_player_name'] = row['assist_tackle_1_player_name']
    elif play_type == "punt":
        prompt['play_type'] = play_type
        prompt[f"{down} with yards to go"] = row['ydstogo']
        prompt['offensive_team'] = offensive_team
        prompt['defensive_team'] = defensive_team
        prompt['punt_distance'] = row['kick_distance']
        prompt['punter_player_name'] = row['punter_player_name']
    elif play_type == "field_goal":
        prompt['play_type'] = play_type
        prompt['offensive_team'] = offensive_team
        prompt['defensive_team'] = defensive_team
        prompt[f"{down} with yards to go"] = row['ydstogo']
        prompt['field_goal_result'] = row['field_goal_result']
        prompt['kick_distance'] = row['kick_distance']
        prompt['kicker_player_name'] = row['kicker_player_name']
    elif play_type == "extra_point":
        prompt['offensive_team'] = offensive_team
        prompt['defensive_team'] = defensive_team
        prompt['extra_point_result'] = row['extra_point_result']
        prompt['kicker_player_name'] = row['kicker_player_name']

    elif play_type != play_type:  # nan
        prompt['offensive_team'] = offensive_team
        prompt['defensive_team'] = defensive_team
        prompt['end_of_the_quarter'] = True
        quarter_end = True
        
    prompt = f"{json.dumps(prompt)} \n As a professional sportscaster, write a {style} style commentary in {language} using 2 sentences"
    return prompt

Here we will look through the CSV file line by line and look at the output from the model. For simplicity, we'll only test the first few lines of the plays to give us some ideas about how the model performs based on the give prompt. 
In simplest term, we'll explore different styles and languages. J2 models support the following languages:

* Spanish
* French
* German
* Portuguese
* Italian
* Dutch

In [11]:
def test_model(language="english", style="nfl", max_lines=5):
    current_line = 0
    with open(sample_input_csv) as csv_file:
            csv_reader = csv.DictReader(csv_file)
            for row in csv_reader:
                if max_lines == current_line:
                    break
                prompt = generate_prompt(row, language, style)
                print("\033[94m \x1B[1m PROMPT")
                print(f'\033[94m {prompt}')
                print("\033[92m \x1B[1m COMMENTARY")
                generated_commentary = f"({row['time']}) {generate_commentary(prompt)}"
                print(f'{generated_commentary}')
                current_line += 1

In [12]:
language = "English" # setting language to be in English
style = "nfl" # standard style being the NFL style
test_model(language, style)

[94m [1m PROMPT
[94m {"starting_yard_line": "TEN 30", "quarter_time_remaining": "15:00", "home_team_score": "0", "away_team_score": "0", "new quarter": true, "quarter": "first quarter", "play_type": "kickoff", "team": "TEN", "receiving team": "PIT", "distance": "67", "return yards": "39", "player name": "R.Bironas", "returner player name": "S.Logan"} 
 As a professional sportscaster, write a nfl style commentary in English using 2 sentences
[92m [1m COMMENTARY
(15:00) The Tennessee Titans kick off to the Steelers to start the first quarter. The returner, S. Logan, brings the ball back 39 yards.
[94m [1m PROMPT
[94m {"starting_yard_line": "PIT 42", "quarter_time_remaining": "14:53", "home_team_score": "0", "away_team_score": "0", "play_type": "pass", "offensive_team": "PIT", "defensive_team": "TEN", "down": "first down", "pass_length": "short", "first down with yards to go": "10", "passer_player_name": "B.Roethlisberger", "receiver_player_name": "H.Ward", "passing_yards_gained":

Let's try a different style

In [13]:
language = "english" # setting language to be in English
style = "twitter" # twitter style
test_model(language, style)

[94m [1m PROMPT
[94m {"starting_yard_line": "TEN 30", "quarter_time_remaining": "15:00", "home_team_score": "0", "away_team_score": "0", "quarter": "first quarter", "play_type": "kickoff", "team": "TEN", "receiving team": "PIT", "distance": "67", "return yards": "39", "player name": "R.Bironas", "returner player name": "S.Logan"} 
 As a professional sportscaster, write a twitter style commentary in english using 2 sentences
[92m [1m COMMENTARY
(15:00) "R.Bironas kicks off to S.Logan, who returns it 39 yards to the TEN 30. 15:00 remaining in the first quarter. #TENvsPIT"
[94m [1m PROMPT
[94m {"starting_yard_line": "PIT 42", "quarter_time_remaining": "14:53", "home_team_score": "0", "away_team_score": "0", "play_type": "pass", "offensive_team": "PIT", "defensive_team": "TEN", "down": "first down", "pass_length": "short", "first down with yards to go": "10", "passer_player_name": "B.Roethlisberger", "receiver_player_name": "H.Ward", "passing_yards_gained": "5", "completion": "inco

In [14]:
language = "english" # setting language to be in English
style = "meme" # meme style
test_model(language, style)

[94m [1m PROMPT
[94m {"starting_yard_line": "TEN 30", "quarter_time_remaining": "15:00", "home_team_score": "0", "away_team_score": "0", "quarter": "first quarter", "play_type": "kickoff", "team": "TEN", "receiving team": "PIT", "distance": "67", "return yards": "39", "player name": "R.Bironas", "returner player name": "S.Logan"} 
 As a professional sportscaster, write a meme style commentary in english using 2 sentences
[92m [1m COMMENTARY
(15:00) "And there it is! R. Bironas kicks off to S. Logan, who returns it 39 yards. Looks like it's going to be an exciting game, folks!"
[94m [1m PROMPT
[94m {"starting_yard_line": "PIT 42", "quarter_time_remaining": "14:53", "home_team_score": "0", "away_team_score": "0", "play_type": "pass", "offensive_team": "PIT", "defensive_team": "TEN", "down": "first down", "pass_length": "short", "first down with yards to go": "10", "passer_player_name": "B.Roethlisberger", "receiver_player_name": "H.Ward", "passing_yards_gained": "5", "completion"

In [15]:
language = "english" # setting language to be in English
style = "poetry" # poetry style
test_model(language, style)

[94m [1m PROMPT
[94m {"starting_yard_line": "TEN 30", "quarter_time_remaining": "15:00", "home_team_score": "0", "away_team_score": "0", "quarter": "first quarter", "play_type": "kickoff", "team": "TEN", "receiving team": "PIT", "distance": "67", "return yards": "39", "player name": "R.Bironas", "returner player name": "S.Logan"} 
 As a professional sportscaster, write a poetry style commentary in english using 2 sentences
[92m [1m COMMENTARY
(15:00) The Titans kick off to start the game,
The ball is caught by the Steelers' S. Logan,
He returns it 39 yards,
And the Steelers start their first drive.
[94m [1m PROMPT
[94m {"starting_yard_line": "PIT 42", "quarter_time_remaining": "14:53", "home_team_score": "0", "away_team_score": "0", "play_type": "pass", "offensive_team": "PIT", "defensive_team": "TEN", "down": "first down", "pass_length": "short", "first down with yards to go": "10", "passer_player_name": "B.Roethlisberger", "receiver_player_name": "H.Ward", "passing_yards_gain

In [16]:
language = "english" # setting language to be Spanish
style = "superhero" # standard style being the NFL style
test_model(language, style)

[94m [1m PROMPT
[94m {"starting_yard_line": "TEN 30", "quarter_time_remaining": "15:00", "home_team_score": "0", "away_team_score": "0", "quarter": "first quarter", "play_type": "kickoff", "team": "TEN", "receiving team": "PIT", "distance": "67", "return yards": "39", "player name": "R.Bironas", "returner player name": "S.Logan"} 
 As a professional sportscaster, write a superhero style commentary in english using 2 sentences
[92m [1m COMMENTARY
(15:00) "And now, it's time for kickoff! R. Bironas takes the field and lines up for the kick. He sends the ball soaring through the air, and it's caught by S.Logan for a 39-yard return! The first quarter is underway, and both teams are scoreless so far. It's going to be an exciting
[94m [1m PROMPT
[94m {"starting_yard_line": "PIT 42", "quarter_time_remaining": "14:53", "home_team_score": "0", "away_team_score": "0", "play_type": "pass", "offensive_team": "PIT", "defensive_team": "TEN", "down": "first down", "pass_length": "short", "fir

In [17]:
language = "spanish" # setting language to be Spanish
style = "nfl" # standard style being the NFL style
test_model(language, style)

[94m [1m PROMPT
[94m {"starting_yard_line": "TEN 30", "quarter_time_remaining": "15:00", "home_team_score": "0", "away_team_score": "0", "quarter": "first quarter", "play_type": "kickoff", "team": "TEN", "receiving team": "PIT", "distance": "67", "return yards": "39", "player name": "R.Bironas", "returner player name": "S.Logan"} 
 As a professional sportscaster, write a nfl style commentary in spanish using 2 sentences
[92m [1m COMMENTARY
(15:00) El jugador R. Bironas lanza un kickoff de 67 yardas, y es recuperado por el jugador S. Logan con 39 yardas de retorno.
[94m [1m PROMPT
[94m {"starting_yard_line": "PIT 42", "quarter_time_remaining": "14:53", "home_team_score": "0", "away_team_score": "0", "play_type": "pass", "offensive_team": "PIT", "defensive_team": "TEN", "down": "first down", "pass_length": "short", "first down with yards to go": "10", "passer_player_name": "B.Roethlisberger", "receiver_player_name": "H.Ward", "passing_yards_gained": "5", "completion": "incomplete

# Clean up
Delete the endpoint

In [None]:
pedictor.delete_endpoint()