# Radio Message Understanding: Dual Model Approach

**Advanced Text Understanding for F1 Strategy**

For our F1 strategic decision system, we need to extract structured information from radio messages to feed our logic agent. Building on our successful sentiment analysis model, we'll implement a comprehensive approach to understand both the intent and specific entities in team radio communications.

---

## 1. Intent Classification with RoBERTa

We'll fine-tune a RoBERTa model (similar to our sentiment analysis model) to classify radio messages into specific intent categories:

- **Order:** Direct instructions to the driver ("Box this lap", "Push now")
- **Information:** Factual updates about race conditions ("Hamilton is 2 seconds behind")
- **Question:** Queries requiring driver input ("How are the tyres feeling?")
- **Warning:** Alerts about potential issues ("Watch your fuel consumption")
- **Strategy:** Long-term planning elements ("We're looking at Plan B")
- **Problem:** Driver-reported issues with the car ("My left wing is broken")

This classification will help our logic agent understand the purpose of each communication and respond appropriately.
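
As a minimal sketch of what the logic agent would receive, the snippet below turns one row of raw classifier scores (logits) into an intent label and a confidence value. The label order follows the category list above, and the function name `decode_intent` and the example logits are assumptions for illustration, not output from the fine-tuned model.

```python
import math

# Intent labels in a fixed order; the index is the class id the model predicts.
# The ordering is an assumption and must match whatever is used at training time.
INTENT_LABELS = ["ORDER", "INFORMATION", "QUESTION", "WARNING", "STRATEGY", "PROBLEM"]


def decode_intent(logits):
    """Turn one row of raw classifier scores into (label, probability)."""
    if len(logits) != len(INTENT_LABELS):
        raise ValueError("expected one logit per intent category")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return INTENT_LABELS[best], probs[best]


# Made-up logits for illustration only, not real model output.
label, confidence = decode_intent([2.1, 0.3, -1.0, 0.2, -0.5, 0.1])
print(label, round(confidence, 3))
```

Returning the probability alongside the label would let the logic agent ignore low-confidence predictions, if that turns out to be useful.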

---

## 2. Custom NER with spaCy for F1-Specific Entities

We'll train a specialized spaCy model to identify key racing entities in the text:

- **DRIVER:** References to specific drivers
- **TEAM:** Team names and references
- **TYRES:** Tyre compounds and conditions (soft, medium, hard, intermediate, wet)
- **LAPNUM:** References to specific laps
- **TIME_GAP:** Time differences mentioned in seconds
- **STRATEGY:** Strategy terms (undercut, overcut, Plan A/B)
- **TRACK_STATUS:** Track conditions (DRS, safety car, VSC)
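
Training a custom NER model needs annotated examples. spaCy commonly represents entity annotations as character offsets, `(start, end, label)` with an exclusive end. The helper below is a hypothetical convenience (the name `annotate` is not part of spaCy) that builds such annotations from substrings, so the offsets do not have to be counted by hand. It is only a sketch of the data format, assuming each phrase appears in the message in the order given.

```python
def annotate(text, spans):
    """Build a (text, {"entities": [...]}) pair from (substring, label) spans.

    Offsets are character positions, end-exclusive, found with str.find;
    each substring is matched at its first occurrence after the previous one.
    """
    entities = []
    cursor = 0
    for phrase, label in spans:
        start = text.find(phrase, cursor)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in {text!r}")
        end = start + len(phrase)
        entities.append((start, end, label))
        cursor = end
    return text, {"entities": entities}


# Example using two of the entity labels defined above.
example = annotate(
    "Box this lap for softs, Hamilton is catching up",
    [("softs", "TYRES"), ("Hamilton", "DRIVER")],
)
print(example)
```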

---

## Complete Radio Understanding Pipeline

Combining these new models with our existing sentiment analysis gives the following pipeline:

`Radio Message → [Sentiment Analysis] → [Intent Classification] → [Entity Extraction] → Structured Data`


The final output should be comprehensive structured data like:

```json
{
  "message": "Box this lap for softs, Hamilton is catching up",
  "analysis": {
    "sentiment": "neutral",
    "intent": "order",
    "entities": {
      "action": "box",
      "lap": "current",
      "tyres": "soft",
      "driver_ref": "Hamilton",
      "situation": "catching up"
    }
  }
}
```

This rich, structured information will enable our logic agent to make sophisticated race strategy decisions based on radio communications.
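
The pipeline can be sketched as one function that calls the three stages and assembles the record. The stage functions here are placeholder lambdas standing in for the real models, and the name `understand_radio` is an assumption made for this example.

```python
import json


def understand_radio(message, sentiment_fn, intent_fn, entities_fn):
    """Run the three stages on one message and assemble the structured record."""
    return {
        "message": message,
        "analysis": {
            "sentiment": sentiment_fn(message),
            "intent": intent_fn(message),
            "entities": entities_fn(message),
        },
    }


# Stand-in stages returning fixed values; the real ones would wrap the
# sentiment model, the intent classifier and the NER model.
result = understand_radio(
    "Box this lap for softs, Hamilton is catching up",
    sentiment_fn=lambda m: "neutral",
    intent_fn=lambda m: "order",
    entities_fn=lambda m: {"action": "box", "tyres": "soft"},
)
print(json.dumps(result, indent=2))
```

Passing the stages in as callables keeps each model independently replaceable, which may help while the intent and NER models are still being trained.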

---

# 1. First, I need to relabel the data

My data is not labeled for intent recognition. The first thing I need to do is label it again, storing the intent categories in a separate CSV file.

My approach is as follows:

## Step 1: Define Intent Categories
First, we need to establish clear definitions for each intent category:

1. **ORDER**: Direct instructions requiring action from the driver

    Examples: "Box this lap", "Push now", "Stay out"


2. **INFORMATION**: Factual updates about race conditions

    Examples: "Hamilton is 2 seconds behind", "Lap time 1:34.5", "You're P4"


3. **QUESTION**: Queries requiring driver input

    Examples: "How are the tyres feeling?", "Do you want to pit this lap?", "Are you happy with the balance?"


4. **WARNING**: Alerts about potential issues or cautions

    Examples: "Watch your fuel consumption", "Yellow flag in sector 2", "VSC deployed"


5. **STRATEGY**: Long-term planning elements or discussions

    Examples: "We're looking at Plan B", "Target plus 5 on tyre management", "Consider an undercut"

6. **PROBLEM**: Driver-reported issues

    Examples: "Losing grip on the rear", "My tyres are dead"

---

## Step 1: Import Necessary Libraries


In [25]:
import time
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
from IPython.display import display
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
# transformers' AdamW is deprecated upstream in favor of torch.optim.AdamW
from transformers import RobertaTokenizer, RobertaForSequenceClassification, AdamW, get_linear_schedule_with_warmup
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

---

## Step 2: Manual Labeling Interface

I will build a simple interface with Jupyter widgets to help me label the data. For this task, I'll use `radio_filtered.csv`.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
import ipywidgets as widgets
from ipywidgets import Layout

# Load the full radio messages dataset
df = pd.read_csv('../../outputs/week4/radio_clean/radio_filtered.csv')

# Check the columns to see which one contains the radio messages
print("Columns in the dataset:", df.columns.tolist())

# Assuming there's a column containing the radio messages, we'll use that
message_column = 'radio_message'  # Adjust this based on the actual column name

# Use the entire dataset instead of a sample
intent_df = pd.DataFrame({
    'message': df[message_column].values,
    'intent': [""] * len(df)
})

# Reset the index to make sure we have 0-based sequential indices
intent_df = intent_df.reset_index(drop=True)

print(f"Total messages to label: {len(intent_df)}")

# Define the intent categories including the PROBLEM category
intent_categories = ["ORDER", "INFORMATION", "QUESTION", "WARNING", "STRATEGY", "PROBLEM"]

# Create a stateful counter for tracking which message we're on
current_index = widgets.IntText(value=0, description='Current:', layout=Layout(display='none'))

# Output widget for displaying the labeling interface
output = widgets.Output()

################################ WARNING: ONLY UNCOMMENT FOR LABELING THE DATA ######################################
# # Function to save the dataframe
# def save_dataframe():
#     # Only save rows that have been labeled
#     labeled_df = intent_df[intent_df['intent'] != ""]
#     labeled_df.to_csv('../../outputs/week4/radio_clean/intent_labeled_data.csv', index=False)
#     with output:
#         print(f"Dataset saved! {len(labeled_df)} labeled messages.")
        
#         if len(labeled_df) > 0:
#             # Show distribution of intents
#             plt.figure(figsize=(10, 6))
#             sns.countplot(y='intent', data=labeled_df)
#             plt.title('Distribution of Intent Categories')
#             plt.tight_layout()
#             plt.show()
######################################################################################################################
# Function to display the current message
def display_current_message():
    with output:
        output.clear_output()
        
        if current_index.value >= len(intent_df):
            current_index.value = len(intent_df) - 1
            
        if current_index.value < 0:
            current_index.value = 0
            
        idx = current_index.value
        print(f"Message {idx+1}/{len(intent_df)}:")
        print(f"\"{intent_df.iloc[idx]['message']}\"")
        
        # Show current label if any
        current_intent = intent_df.iloc[idx]['intent']
        if current_intent:
            print(f"\nCurrent label: {current_intent}")
        
        # Display intent category descriptions for reference
        print("\nIntent Categories:")
        print("ORDER: Direct instructions requiring action (Box this lap, Push now)")
        print("INFORMATION: Factual updates (Hamilton is 2 seconds behind)")
        print("QUESTION: Queries requiring driver input (How are the tyres feeling?)")
        print("WARNING: Alerts about external issues (Yellow flag in sector 2)")
        print("STRATEGY: Long-term planning elements (We're looking at Plan B)")
        print("PROBLEM: Driver-reported issues (Losing grip on the rear)")
        
        # Count labeled messages
        labeled_count = (intent_df['intent'] != "").sum()
        print(f"\nProgress: {labeled_count}/{len(intent_df)} messages labeled ({labeled_count/len(intent_df)*100:.1f}%)")

# Function to handle intent button clicks
def on_intent_button_clicked(b, intent=None):
    idx = current_index.value
    intent_df.at[idx, 'intent'] = intent
    # Automatically move to next message after labeling
    current_index.value += 1
    display_current_message()

# Navigation button handlers
def on_prev_clicked(b):
    current_index.value -= 1
    display_current_message()
    
def on_next_clicked(b):
    current_index.value += 1
    display_current_message()

# Create buttons for each intent category
intent_buttons = []
for intent in intent_categories:
    button = widgets.Button(
        description=intent,
        button_style='', 
        layout=Layout(width='150px', height='40px')
    )
    
    button.on_click(lambda b, intent=intent: on_intent_button_clicked(b, intent))
    intent_buttons.append(button)

# Create navigation buttons
prev_button = widgets.Button(
    description='« Previous',
    button_style='info',
    layout=Layout(width='120px', height='40px')
)
prev_button.on_click(on_prev_clicked)

next_button = widgets.Button(
    description='Next »',
    button_style='info',
    layout=Layout(width='120px', height='40px')
)
next_button.on_click(on_next_clicked)

# Create save button
save_button = widgets.Button(
    description='💾 Save Progress',
    button_style='success',
    layout=Layout(width='150px', height='40px')
)

######################### WARNING: ONLY UNCOMMENT FOR SAVING NEW LABELED DATA ###############################
# save_button.on_click(lambda b: save_dataframe())
#############################################################################################################



# Create button rows
intent_row1 = widgets.HBox(intent_buttons[:3], layout=Layout(justify_content='center'))
intent_row2 = widgets.HBox(intent_buttons[3:], layout=Layout(justify_content='center'))
nav_row = widgets.HBox([prev_button, save_button, next_button], layout=Layout(justify_content='center'))

# Assemble the UI
vbox = widgets.VBox([
    current_index,
    output,
    intent_row1,
    intent_row2,
    nav_row
])

# Initialize the display
display(vbox)
display_current_message()

Columns in the dataset: ['driver', 'radio_message', 'sentiment']
Total messages to label: 529
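
Because the interface starts every message with an empty label, restarting the notebook loses the in-memory progress. A minimal sketch of restoring previously saved labels is shown below. It assumes the saved file has the same `message` and `intent` columns as `intent_labeled_data.csv`, and that identical messages should share one label; `restore_labels` is a hypothetical helper, not part of the notebook.

```python
import csv
import io


def restore_labels(messages, saved_csv_text):
    """Return one intent per message, "" where no saved label exists."""
    reader = csv.DictReader(io.StringIO(saved_csv_text))
    saved = {row["message"]: row["intent"] for row in reader}
    return [saved.get(msg, "") for msg in messages]


# Tiny in-memory stand-in for a previously saved CSV file.
saved = 'message,intent\n"Box this lap",ORDER\n'
messages = ["Box this lap", "How are the tyres feeling?"]
labels = restore_labels(messages, saved)

# Index of the first message that still needs a label.
first_unlabeled = next((i for i, lab in enumerate(labels) if not lab), len(labels))
print(labels, first_unlabeled)
```

The restored list could be assigned to the `intent` column, and `first_unlabeled` used as the starting value of `current_index`.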



---

## Step 3: Training an intent classifier

The next steps are:

1. Split `intent_labeled_data.csv` into train/validation/test sets.

2. Tokenize the dataset, adjusting the maximum number of tokens.

3. Download a pre-trained RoBERTa model and fine-tune it.

4. Try several training runs, compare their performance, and save the best model.

The workflow here closely follows `N03_bert_sentiment.ipynb`, but the model now predicts 6 classes instead of 3.


##### Loading the dataset and applying mapping

In [28]:
df = pd.read_csv('../../outputs/week4/radio_clean/intent_labeled_data.csv')

# Display basic information about the dataset
print(f"Dataset shape: {df.shape}")
print("\nFirst few rows:")


print("\nIntent distribution:")
print(df['intent'].value_counts())

Dataset shape: (529, 2)

First few rows:

Intent distribution:
intent
INFORMATION    211
PROBLEM        109
ORDER          107
STRATEGY        35
QUESTION        33
Name: count, dtype: int64


In [29]:
print(df.head())

                                             message       intent
0  So don't forget Max, use your head please. Are...        ORDER
1  Okay Max, we're expecting rain in about 9 or 1...     QUESTION
2  You might find this lap that you meet a little...  INFORMATION
3  Just another two or three minutes to get throu...  INFORMATION
4   So settle into standard race management now Max.        ORDER


In [None]:
# # Create numeric labels based on sentiment values
# sentiment_mapping = {
#     'positive': 0, 
#     'neutral': 1, 
#     'negative': 2
# }


# df['label'] = df['sentiment'].map(sentiment_mapping)

# # Check if we need to handle any missing mappings
# if df['label'].isna().sum() > 0:
#     print(f"\nWarning: {df['label'].isna().sum()} rows couldn't be mapped. Unique values in 'sentiment':")
#     print(df['sentiment'].unique())
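
The commented-out cell above maps sentiment values, which is left over from the sentiment notebook. A comparable mapping for intents might look like the sketch below. The numeric ids follow the order of `intent_categories` from the labeling cell, which is an assumption; any fixed order works as long as it is used consistently. The example uses plain lists; on the DataFrame the equivalent would be `df['intent'].map(INTENT_MAPPING)`.

```python
# Numeric ids follow the order of `intent_categories` from the labeling cell;
# the specific ids are an assumption, any fixed order would work.
INTENT_MAPPING = {
    "ORDER": 0,
    "INFORMATION": 1,
    "QUESTION": 2,
    "WARNING": 3,
    "STRATEGY": 4,
    "PROBLEM": 5,
}


def encode_intents(intents):
    """Map intent strings to ids and report any values that have no mapping."""
    labels = [INTENT_MAPPING.get(intent) for intent in intents]
    unmapped = sorted({intent for intent in intents if intent not in INTENT_MAPPING})
    return labels, unmapped


# The lowercase "order" is included on purpose to show the unmapped check.
labels, unmapped = encode_intents(["ORDER", "QUESTION", "INFORMATION", "order"])
if unmapped:
    print(f"Warning: values without a mapping: {unmapped}")
print(labels)
```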


### 3.1 Splitting the data