# Radio Message Understanding: Dual Model Approach

**Advanced Text Understanding for F1 Strategy**

For our F1 strategic decision system, we need to extract structured information from radio messages to feed our logic agent. Building on our successful sentiment analysis model, we'll implement a comprehensive approach to understand both the intent and specific entities in team radio communications.

---

## 1. Intent Classification with RoBERTa

We'll fine-tune a RoBERTa model (similar to our sentiment analysis model) to classify radio messages into specific intent categories:

- **Order:** Direct instructions to the driver ("Box this lap", "Push now")
- **Information:** Factual updates about race conditions ("Hamilton is 2 seconds behind")
- **Question:** Queries requiring driver input ("How are the tyres feeling?")
- **Warning:** Alerts about potential issues ("Watch your fuel consumption")
- **Strategy:** Long-term planning elements ("We're looking at Plan B")

This classification will help our logic agent understand the purpose of each communication and respond appropriately.
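A classification head predicts integer class ids, so the categories above need a fixed mapping to and from ids before fine-tuning. The following is a minimal sketch of that mapping, not the final training code: the names `INTENT_LABELS`, `encode_intents`, and `decode_intents` are illustrative assumptions, and the label order is arbitrary. Hugging Face model configs accept `id2label` and `label2id` dictionaries of this shape.

```python
# The five intent categories listed above; the ordering is an arbitrary assumption.
INTENT_LABELS = ["order", "information", "question", "warning", "strategy"]

# Mappings between string labels and integer class ids.
label2id = {label: idx for idx, label in enumerate(INTENT_LABELS)}
id2label = {idx: label for label, idx in label2id.items()}


def encode_intents(labels):
    """Convert string intent labels to integer class ids (case-insensitive)."""
    return [label2id[label.strip().lower()] for label in labels]


def decode_intents(ids):
    """Convert integer class ids back to string intent labels."""
    return [id2label[i] for i in ids]


print(encode_intents(["Order", "question"]))
print(decode_intents([4]))
```

Keeping the mapping in one place means the labeled CSV, the model config, and the logic agent all agree on what each class id means.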

---

## 2. Custom NER with spaCy for F1-Specific Entities

We'll train a specialized spaCy model to identify key racing entities in the text:

- **DRIVER:** References to specific drivers
- **TEAM:** Team names and references
- **TYRES:** Tyre compounds and conditions (soft, medium, hard, intermediate, wet)
- **LAPNUM:** References to specific laps
- **TIME_GAP:** Time differences mentioned in seconds
- **STRATEGY:** Strategy terms (undercut, overcut, Plan A/B)
- **TRACK_STATUS:** Track conditions (DRS, safety car, VSC)
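Training a custom NER model requires annotated examples in which each entity is marked by its character offsets in the message. The sketch below shows one common way to express such an annotation: a text plus a list of `(start, end, label)` spans, which is the form spaCy's `Example.from_dict` accepts under the `"entities"` key. The helper `find_span` and the sample message are illustrative assumptions, not part of the dataset.

```python
def find_span(text, phrase, label):
    """Return (start, end, label) character offsets for the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return (start, start + len(phrase), label)


# Hypothetical message, used only to illustrate the annotation format.
text = "Box this lap for softs, Hamilton is 2 seconds behind"

annotation = (
    text,
    {
        "entities": [
            find_span(text, "softs", "TYRES"),
            find_span(text, "Hamilton", "DRIVER"),
            find_span(text, "2 seconds", "TIME_GAP"),
        ]
    },
)

# Check that every span slices back to the phrase it was built from.
for start, end, label in annotation[1]["entities"]:
    print(label, repr(text[start:end]))
```

Computing offsets from the phrase, rather than typing them by hand, avoids off-by-one errors that would otherwise misalign the entity spans.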

---

## Complete Radio Understanding Pipeline

Combining these new models with our existing sentiment analysis gives the following pipeline:

`Radio Message → [Sentiment Analysis] → [Intent Classification] → [Entity Extraction] → Structured Data`


The final output should be comprehensive structured data like:

```json
{
  "message": "Box this lap for softs, Hamilton is catching up",
  "analysis": {
    "sentiment": "neutral",
    "intent": "order",
    "entities": {
      "action": "box",
      "lap": "current",
      "tyres": "soft",
      "driver_ref": "Hamilton",
      "situation": "catching up"
    }
  }
}
```

This rich, structured information will enable our logic agent to make sophisticated race strategy decisions based on radio communications.
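As a rough sketch of how the three stages could be assembled into that structure, the function below takes each stage as a callable and builds the record. Everything here is an assumption for illustration: `analyze_radio_message` and the three `stub_*` functions are hypothetical names, and the stubs are hard-coded placeholders standing in for the trained models, not real predictions.

```python
import json
from typing import Callable, Dict


def analyze_radio_message(
    message: str,
    sentiment_fn: Callable[[str], str],
    intent_fn: Callable[[str], str],
    entity_fn: Callable[[str], Dict[str, str]],
) -> dict:
    """Run the three stages on one message and assemble the structured record."""
    return {
        "message": message,
        "analysis": {
            "sentiment": sentiment_fn(message),
            "intent": intent_fn(message),
            "entities": entity_fn(message),
        },
    }


# Placeholder stages: stand-ins for the trained models, NOT real predictions.
def stub_sentiment(message: str) -> str:
    return "neutral"


def stub_intent(message: str) -> str:
    return "order" if message.lower().startswith("box") else "information"


def stub_entities(message: str) -> Dict[str, str]:
    entities = {}
    if "softs" in message.lower():
        entities["tyres"] = "soft"
    if "Hamilton" in message:
        entities["driver_ref"] = "Hamilton"
    return entities


result = analyze_radio_message(
    "Box this lap for softs, Hamilton is catching up",
    stub_sentiment,
    stub_intent,
    stub_entities,
)
print(json.dumps(result, indent=2))
```

Each stage reads the raw message independently, so any one model can be retrained or swapped without touching the other two.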

---

# 1. But first, I need to relabel the data

My data is labeled for sentiment, not for intent recognition. The first thing I need to do, therefore, is relabel the messages with the intent categories and save the result in a separate CSV.

My first approach is as follows:

## Step 1: Define Intent Categories
First, we need to establish clear definitions for each intent category:

1. **ORDER**: Direct instructions requiring action from the driver

    Examples: "Box this lap", "Push now", "Stay out"


2. **INFORMATION**: Factual updates about race conditions

    Examples: "Hamilton is 2 seconds behind", "Lap time 1:34.5", "You're P4"


3. **QUESTION**: Queries requiring driver input

    Examples: "How are the tyres feeling?", "Do you want to pit this lap?", "Are you happy with the balance?"


4. **WARNING**: Alerts about potential issues or cautions

    Examples: "Watch your fuel consumption", "Yellow flag in sector 2", "VSC deployed"


5. **STRATEGY**: Long-term planning elements or discussions

    Examples: "We're looking at Plan B", "Target plus 5 on tyre management", "Consider an undercut"

6. **PROBLEM**: Driver-reported issues

    Examples: "Losing grip on the rear", "My tyres are dead"
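To keep these definitions consistent between the labeling interface and any later checks, they can live in a single data structure. This is only a sketch under assumed names (`INTENT_GUIDE`, `render_guide`, `validate_intent`); the descriptions and examples are copied from the list above.

```python
# Category name -> (description, example messages), taken from the definitions above.
INTENT_GUIDE = {
    "ORDER": ("Direct instructions requiring action from the driver",
              ["Box this lap", "Push now", "Stay out"]),
    "INFORMATION": ("Factual updates about race conditions",
                    ["Hamilton is 2 seconds behind", "You're P4"]),
    "QUESTION": ("Queries requiring driver input",
                 ["How are the tyres feeling?"]),
    "WARNING": ("Alerts about potential issues or cautions",
                ["Watch your fuel consumption", "VSC deployed"]),
    "STRATEGY": ("Long-term planning elements or discussions",
                 ["We're looking at Plan B", "Consider an undercut"]),
    "PROBLEM": ("Driver-reported issues",
                ["Losing grip on the rear"]),
}


def render_guide(guide):
    """Build one reference line per category for display while labeling."""
    lines = []
    for name, (description, examples) in guide.items():
        lines.append(f"{name}: {description} (e.g. {', '.join(examples[:2])})")
    return "\n".join(lines)


def validate_intent(label):
    """Normalize a label and raise ValueError if it is not a known category."""
    normalized = label.strip().upper()
    if normalized not in INTENT_GUIDE:
        raise ValueError(f"Unknown intent: {label!r}")
    return normalized


print(render_guide(INTENT_GUIDE))
print(validate_intent(" problem "))
```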

---

## Step 2: Create Seed Dataset for Intent Classification

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

# Load the unlabeled radio messages dataset
df = pd.read_csv('../../outputs/week4/radio_clean/radio_filtered.csv')

# Check the columns to see which one contains the radio messages
print("Columns in the dataset:", df.columns.tolist())

# Column containing the radio messages (adjust if the actual name differs)
message_column = 'radio_message'

# Select a random sample of 50 messages for intent labeling
np.random.seed(42)  # For reproducibility
seed_sample = df.sample(n=50)

# Create a new DataFrame with just the messages and a new, empty intent column
intent_seed_df = pd.DataFrame({
    'message': seed_sample[message_column],
    'intent': ""
})

# Display the sample for manual labeling
display(intent_seed_df)
```

Columns in the dataset: ['driver', 'radio_message', 'sentiment']


|     | message | intent |
| --- | --- | --- |
| 140 | I think Carlos is clogged up. Copy understood. | |
| 397 | I have a slow dark shift, but the left panel i... | |
| 6 | This time I had reasonable deg in the first st... | |
| 334 | Yeah, but I'm just so sh**ing low in the middl... | |
| 322 | Up three. | |
| 82 | So stay with Max. | |
| 225 | Alex, we're in a really good place, so we're s... | |
| 495 | Okay, so a bit of split, soft and medium on th... | |
| 522 | There's another shower approaching, expected a... | |
| 101 | So in about five laps we'll start nibbling at ... | |


---

## Step 3: Manual Labeling Interface


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
import ipywidgets as widgets
from ipywidgets import Layout

# Load the unlabeled radio messages dataset
df = pd.read_csv('../../outputs/week4/radio_clean/radio_filtered.csv')

# Check the columns to see which one contains the radio messages
print("Columns in the dataset:", df.columns.tolist())

# Column containing the radio messages (adjust if the actual name differs)
message_column = 'radio_message'

# Select a random sample of 50 messages for intent labeling
np.random.seed(42)  # For reproducibility
seed_sample = df.sample(n=50)

# Create a new DataFrame with just the messages and an empty intent column.
# Using .values discards the sampled row labels.
intent_seed_df = pd.DataFrame({
    'message': seed_sample[message_column].values,
    'intent': [""] * len(seed_sample)
})

# IMPORTANT: Reset the index to make sure we have 0-based sequential indices
intent_seed_df = intent_seed_df.reset_index(drop=True)

# Define the intent categories, including the new PROBLEM category
intent_categories = ["ORDER", "INFORMATION", "QUESTION", "WARNING", "STRATEGY", "PROBLEM"]

# Hidden stateful counter tracking which message we're on
current_index = widgets.IntText(value=0, description='Current:', layout=Layout(display='none'))

# Output widget for displaying the labeling interface
output = widgets.Output()


def save_dataframe():
    """Save the labels to CSV and plot the current intent distribution."""
    intent_seed_df.to_csv('../../outputs/week4/radio_clean/intent_seed_dataset.csv', index=False)
    with output:
        print("Dataset saved!")
        # Show distribution of intents
        plt.figure(figsize=(10, 6))
        sns.countplot(y='intent', data=intent_seed_df)
        plt.title('Distribution of Intent Categories')
        plt.tight_layout()
        plt.show()


def display_next_message():
    """Show the next unlabeled message, or save once all are labeled."""
    with output:
        output.clear_output()

        if current_index.value >= len(intent_seed_df):
            print("All messages have been labeled!")
            save_dataframe()
            return

        idx = current_index.value
        print(f"Message {idx+1}/{len(intent_seed_df)}:")
        print(f"\"{intent_seed_df.iloc[idx]['message']}\"")  # Positional lookup

        # Display intent category descriptions for reference
        print("\nIntent Categories:")
        print("ORDER: Direct instructions requiring action (Box this lap, Push now)")
        print("INFORMATION: Factual updates (Hamilton is 2 seconds behind)")
        print("QUESTION: Queries requiring driver input (How are the tyres feeling?)")
        print("WARNING: Alerts about external issues (Yellow flag in sector 2)")
        print("STRATEGY: Long-term planning elements (We're looking at Plan B)")
        print("PROBLEM: Driver-reported issues (Losing grip on the rear)")


# Create buttons for each intent category
intent_buttons = []
for intent in intent_categories:
    button = widgets.Button(
        description=intent,
        button_style='',
        layout=Layout(width='150px')
    )

    # The default argument binds this button's intent at definition time
    def on_button_clicked(b, intent=intent):
        idx = current_index.value
        if idx >= len(intent_seed_df):
            return  # Ignore clicks once every message has been labeled
        intent_seed_df.at[idx, 'intent'] = intent
        current_index.value += 1
        display_next_message()

    button.on_click(on_button_clicked)
    intent_buttons.append(button)

# Create buttons row
buttons_box = widgets.HBox(intent_buttons)

# Create save button
save_button = widgets.Button(
    description='Save Progress',
    button_style='success',
    layout=Layout(width='150px')
)
save_button.on_click(lambda b: save_dataframe())

# Assemble the UI
vbox = widgets.VBox([current_index, output, buttons_box, save_button])

# Initialize the display
display(vbox)
display_next_message()
```

Columns in the dataset: ['driver', 'radio_message', 'sentiment']


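Once labeling is done, it is worth checking the saved file for rows that were skipped or carry an unexpected label before using it for training. The sketch below does this with the standard library only. The function name `summarize_labels` is hypothetical, and the in-memory sample stands in for the real `intent_seed_dataset.csv`, which has the same `message` and `intent` columns.

```python
import csv
import io
from collections import Counter

VALID_INTENTS = {"ORDER", "INFORMATION", "QUESTION", "WARNING", "STRATEGY", "PROBLEM"}


def summarize_labels(csv_file):
    """Count valid intents and collect rows whose label is missing or unknown."""
    counts = Counter()
    problems = []
    for row_number, row in enumerate(csv.DictReader(csv_file), start=1):
        intent = (row.get("intent") or "").strip()
        if intent in VALID_INTENTS:
            counts[intent] += 1
        else:
            problems.append((row_number, row["message"]))
    return counts, problems


# Illustrative stand-in for the saved seed dataset (not real labeled data).
sample = io.StringIO(
    "message,intent\n"
    "Box this lap,ORDER\n"
    "How are the tyres feeling?,QUESTION\n"
    "Stay out,\n"
)

counts, problems = summarize_labels(sample)
print(dict(counts))
print(problems)
```

For the real file, pass `open(path, newline="")` instead of the `StringIO` object.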