# Multi-Modal Data Analysis Workflow
**ASIST Study 3 Dataset**  


## Objective
Analyze team performance data across four modalities:
1. JSON behavior logs
2. Video recordings
3. Chat transcripts

Identify correlations between AI interventions and team outcomes.

## Note

All datasets were taken from official CHART ASIST Study 3 Dataset available at ASU official repository.

#### Subset used : 

| Team ID   | ASI ID            | trial | intervention_recipent |
|--------|----------------|-------|---------------|
| 000315 | ASI-CMURI-TA1         | T000829 | E001211, E001215, E001155 |

### Target Dataset

**AI Agent Action signals** -  
1. RemindTransporterBeep  
2. InformAboutTriagedVictim 
3. RemindMedicToInformAboutTriagedVicti     
4. TriageCriticalVictim                     
5. EvacuateCriticalVictim                   
6. EncouragePlayerProximityToMedicIHMCDyad  
7. RemindChangeMarke                        
8. RemindRubblePerturbatio                  
9. EvacuationZoneDistanc                    
10. TeamSawVictimMarke                       
11. TimeElapse                               
12. StartEvacuatio

**Participants Action Signals**- 


| Time Stamp | AI Message | AI Action Class                                                                                                                                                                                                 | Transporter Message | Engineer Message | Medic Message | Transporter Status | Engineer Status | Medic Status | Nearest Victim Distance | Team Score | AI Advice Score |
|------------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|------------------|---------------|--------------------|-----------------|--------------|-------------------------|------------|-----------------|
| 11:23:01   | N/A        | RemindTransporterBeep,<br>InformAboutTriagedVictim,<br>RemindMedicToInformAboutTriagedVictim,<br>TriageCriticalVictim,<br>EvacuateCriticalVictim,<br>EncouragePlayerProximityToMedicIHMCDyad,<br>RemindChangeMarker,<br>RemindRubblePerturbation,<br>EvacuationZoneDistance,<br>TeamSawVictimMarker,<br>TimeElapsed,<br>StartEvacuation | N/A                 | N/A              | N/A           | N/A                | N/A             | N/A          | N/A                     | N/A        | N/A             |


## Installing dependencies

In [1]:
# %pip install opencv-python pandas scikit-learn matplotlib torch

In [2]:
import cv2
from pathlib import Path
import json
import pandas as pd
# import openai
import matplotlib.pyplot as plt
import os
import warnings
# import torch
warnings.filterwarnings("ignore")
import json

# Preparing Dataset

### 1. JSON Logs Processing
#### Objective
Extract structured data from nested JSON logs containing:
- Team actions
- AI intervention timestamps
- Mission outcomes


In [3]:


# def parse_json_logs(input_path: Path, output_path: Path) -> pd.DataFrame:
#     """Flatten nested JSON logs into structured format"""
#     with open(input_path, 'r') as f:
#         data = [json.loads(line) for line in f]
    
#     df = pd.json_normalize(data, sep='_')
#     df.to_csv(output_path, index=False)
#     return df

# # Process all trial messages
# input_files = [
#     Path("data/json_logs/HSRData_TrialMessages_Trial-T000603_..."),
#     Path("data/json_logs/HSRData_TrialMessages_Trial-T000639_..."),
#     Path("data/json_logs/HSRData_TrialMessages_Trial-T000671_...")
# ]

# output_dir = Path("data/processed/json_parsed/")
# output_dir.mkdir(parents=True, exist_ok=True)

# for file in input_files:
#     output_file = output_dir / f"{file.stem}_parsed.csv"
#     df = parse_json_logs(file, output_file)
#     print(f"Processed {len(df)} records from {file.name}")



### 2. Video Analysis
#### Objective
Extract key frames every 10 seconds for:
- Activity pattern analysis
- Non-verbal communication study


In [4]:
# import cv2
# import os
# from pathlib import Path

# def extract_frames(video_path: Path, output_dir: Path, interval: int = 10):
#     """Extract frames at fixed intervals (default: 10 seconds)"""
#     vidcap = cv2.VideoCapture(str(video_path))
#     if not vidcap.isOpened():
#         print(f"Error opening video: {video_path}")
#         return
    
#     fps = vidcap.get(cv2.CAP_PROP_FPS)
#     frame_interval = int(fps * interval)
    
#     count = 0
#     while vidcap.isOpened():
#         success, frame = vidcap.read()
#         if not success: 
#             break
#         if count % frame_interval == 0:
#             cv2.imwrite(str(output_dir / f"frame_{count:04d}.jpg"), frame)
#         count += 1
#     vidcap.release()

# # Configuration
# video_dir = Path("../data/videos/")
# output_base_dir = Path("../processed_data/")
# output_base_dir.mkdir(parents=True, exist_ok=True)

# # Process all MP4 files
# for idx, video_file in enumerate(video_dir.glob("*.mp4"), start=1):
#     folder_path = output_base_dir / str(idx)
#     folder_path.mkdir(parents=True, exist_ok=True)
    
#     print(f"Processing {video_file.name} -> Folder {idx}")
#     extract_frames(video_file, folder_path)


### 3. Game Metrics Implementation

#### Objective

Extract game metrics to the dataframe along with data given to AI

In [5]:
## TODO ADD GAME METRICS AND AI TRAIN DATA TO DATAFRAME

# Data Analysis


In [6]:
df = pd.read_csv("/mnt/c/Users/Som/Desktop/CHART ASIST/Study3_Analysis/data/transcripts/transcript.csv")
df = df[1:]

In [7]:
df.head(5)

Unnamed: 0,trial,team,scenario,date,timestamp,asi,intervention_message,intervention_recipent,speech_message,medic,transporter,engineer,explanation
1,T000829,TM000315,Saturn_C,7/8/2022,22:56:10,ASI-CMURI-TA1,{},{},okay this is engineer room tool with known dam...,0.0,0.0,1.0,{}
2,T000829,TM000315,Saturn_C,7/8/2022,22:56:16,ASI-CMURI-TA1,{},{},so most likely to be critical victims in those...,0.0,0.0,1.0,{}
3,T000829,TM000315,Saturn_C,7/8/2022,22:56:18,ASI-CMURI-TA1,{},{},can you repeat that again engineer,1.0,0.0,0.0,{}
4,T000829,TM000315,Saturn_C,7/8/2022,22:56:29,ASI-CMURI-TA1,{},{},okay I3 A2 E2 and A2 have rooms with known dam...,0.0,0.0,1.0,{}
5,T000829,TM000315,Saturn_C,7/8/2022,22:56:31,ASI-CMURI-TA1,{},{},okay thank you this is medic,1.0,0.0,0.0,{}


#### Cleaning Data

Originally it had many columns and we reduced it down to only the ones which had data 



In [8]:
df["asi_message"] = df["intervention_message"].replace("{}", "")
df["team_message"] = df["speech_message"].replace("{}", "")

Adding a intervention_class column from explanation string

In [9]:
# Extract intervention_class from explanation strings
df['intervention_class'] = df['explanation'].str.extract(
    r"'intervention_class'\s*:\s*'([^']*)'"
)

# Create binary columns for each unique intervention class
intervention_classes = df['intervention_class'].dropna().unique()
for cls in intervention_classes:
    df[cls] = df['intervention_class'].eq(cls).astype(int)

# Cleanup intermediate column
df = df.drop(columns=['intervention_class'])

# Clean column names by removing 'Intervention' suffix
df = df.rename(columns=lambda col: col[:-13] if col.endswith('Intervention') else col)



We don't need unique team ids, asi ids, date, trial id, intervention_recipent id or explanation since we have extracted the unique class

In [10]:

df = df.drop(columns=["team","asi","date","timestamp","explanation","intervention_recipent","intervention_message","speech_message","trial"], axis=1)



In [11]:
df.sample(15)

Unnamed: 0,scenario,medic,transporter,engineer,asi_message,team_message,RemindTransporterBeep,InformAboutTriagedVictim,RemindMedicToInformAboutTriagedVicti,TriageCriticalVictim,EvacuateCriticalVictim,EncouragePlayerProximityToMedicIHMCDyad,RemindChangeMarke,RemindRubblePerturbatio,EvacuationZoneDistanc,TeamSawVictimMarke,TimeElapse,StartEvacuatio
96,Saturn_C,1.0,0.0,0.0,,medic to transporter this imn B1 there's a cri...,0,0,0,0,0,0,0,0,0,0,0,0
255,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0
7,Saturn_C,0.0,1.0,0.0,,that tells me that that's near the center you ...,0,0,0,0,0,0,0,0,0,0,0,0
75,Saturn_C,1.0,0.0,0.0,,yes that's correct thank you and,0,0,0,0,0,0,0,0,0,0,0,0
284,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0
253,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0
17,Saturn_C,0.0,1.0,0.0,,oh this is transporter again there is Rubble i...,0,0,0,0,0,0,0,0,0,0,0,0
2,Saturn_C,0.0,0.0,1.0,,so most likely to be critical victims in those...,0,0,0,0,0,0,0,0,0,0,0,0
63,Saturn_C,1.0,1.0,1.0,"Team, you seem to be neglecting high-value cri...",,0,0,0,1,0,0,0,0,0,0,0,0
139,Saturn_C,0.0,1.0,0.0,,all right,0,0,0,0,0,0,0,0,0,0,0,0


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 310 entries, 1 to 310
Data columns (total 18 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   scenario                                 147 non-null    object 
 1   medic                                    147 non-null    float64
 2   transporter                              147 non-null    float64
 3   engineer                                 147 non-null    float64
 4   asi_message                              147 non-null    object 
 5   team_message                             147 non-null    object 
 6   RemindTransporterBeep                    310 non-null    int64  
 7   InformAboutTriagedVictim                 310 non-null    int64  
 8   RemindMedicToInformAboutTriagedVicti     310 non-null    int64  
 9   TriageCriticalVictim                     310 non-null    int64  
 10  EvacuateCriticalVictim                   310 non-n

In [13]:
df.isnull().sum()

scenario                                   163
medic                                      163
transporter                                163
engineer                                   163
asi_message                                163
team_message                               163
RemindTransporterBeep                        0
InformAboutTriagedVictim                     0
RemindMedicToInformAboutTriagedVicti         0
TriageCriticalVictim                         0
EvacuateCriticalVictim                       0
EncouragePlayerProximityToMedicIHMCDyad      0
RemindChangeMarke                            0
RemindRubblePerturbatio                      0
EvacuationZoneDistanc                        0
TeamSawVictimMarke                           0
TimeElapse                                   0
StartEvacuatio                               0
dtype: int64

In [14]:
df.dropna(inplace=True)

 Drop duplicated values

In [15]:
# df = df.drop_duplicates()

In [16]:
df.duplicated().sum()

np.int64(26)

In [17]:
df.isnull().sum()

scenario                                   0
medic                                      0
transporter                                0
engineer                                   0
asi_message                                0
team_message                               0
RemindTransporterBeep                      0
InformAboutTriagedVictim                   0
RemindMedicToInformAboutTriagedVicti       0
TriageCriticalVictim                       0
EvacuateCriticalVictim                     0
EncouragePlayerProximityToMedicIHMCDyad    0
RemindChangeMarke                          0
RemindRubblePerturbatio                    0
EvacuationZoneDistanc                      0
TeamSawVictimMarke                         0
TimeElapse                                 0
StartEvacuatio                             0
dtype: int64

In [18]:
df

Unnamed: 0,scenario,medic,transporter,engineer,asi_message,team_message,RemindTransporterBeep,InformAboutTriagedVictim,RemindMedicToInformAboutTriagedVicti,TriageCriticalVictim,EvacuateCriticalVictim,EncouragePlayerProximityToMedicIHMCDyad,RemindChangeMarke,RemindRubblePerturbatio,EvacuationZoneDistanc,TeamSawVictimMarke,TimeElapse,StartEvacuatio
1,Saturn_C,0.0,0.0,1.0,,okay this is engineer room tool with known dam...,0,0,0,0,0,0,0,0,0,0,0,0
2,Saturn_C,0.0,0.0,1.0,,so most likely to be critical victims in those...,0,0,0,0,0,0,0,0,0,0,0,0
3,Saturn_C,1.0,0.0,0.0,,can you repeat that again engineer,0,0,0,0,0,0,0,0,0,0,0,0
4,Saturn_C,0.0,0.0,1.0,,okay I3 A2 E2 and A2 have rooms with known dam...,0,0,0,0,0,0,0,0,0,0,0,0
5,Saturn_C,1.0,0.0,0.0,,okay thank you this is medic,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143,Saturn_C,1.0,0.0,0.0,,K2 transporter if you can come to these victim...,0,0,0,0,0,0,0,0,0,0,0,0
144,Saturn_C,1.0,0.0,0.0,,this is a critical victim transport so it bein...,0,0,0,0,0,0,0,0,0,0,0,0
145,Saturn_C,0.0,1.0,0.0,Transporters focusing on marking rooms and eva...,,0,0,0,0,0,1,0,0,0,0,0,0
146,Saturn_C,0.0,1.0,0.0,,yeah,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
df.team_message.isnull().sum()

np.int64(0)

In [20]:
for i, col in enumerate(df["team_message"]):
    col_str = col
    if "yeah" in col_str or "yes" in col_str or "ok" in col_str or "okay" in col_str or "alright" in col_str or "alrighty" in col_str or "sure" in col_str or "correct" in col_str or "right" in col_str or "yep" in col_str or "thanks" in col_str or "all right" in col_str or "sounds good" in col_str or "good" in col_str or "great" in col_str or "perfect" in col_str or "awesome" in col_str or "nice" in col_str  or "cool" in col_str or "thankyou" in col_str or "thank you" in col_str:
        df.loc[i, "team_message"] = 1
    else:
        df.loc[i, "team_message"] = 0


In [21]:
df[df["team_message"]==0]

Unnamed: 0,scenario,medic,transporter,engineer,asi_message,team_message,RemindTransporterBeep,InformAboutTriagedVictim,RemindMedicToInformAboutTriagedVicti,TriageCriticalVictim,EvacuateCriticalVictim,EncouragePlayerProximityToMedicIHMCDyad,RemindChangeMarke,RemindRubblePerturbatio,EvacuationZoneDistanc,TeamSawVictimMarke,TimeElapse,StartEvacuatio
1,Saturn_C,0.0,0.0,1.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Saturn_C,0.0,0.0,1.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Saturn_C,0.0,1.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Saturn_C,1.0,0.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Saturn_C,1.0,0.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
141,Saturn_C,0.0,0.0,1.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
142,Saturn_C,1.0,0.0,0.0,"Medic, if your team was informed that a victim...",0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
143,Saturn_C,1.0,0.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
144,Saturn_C,1.0,0.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [22]:
df.rename(columns={'team_message': 'team_sentiment'}, inplace=True)

In [23]:
df.dropna(inplace=True)

In [24]:
for i in df.columns:
    if not isinstance(df[i], object):
        df[i] = df[i].astype(int)


In [25]:
df

Unnamed: 0,scenario,medic,transporter,engineer,asi_message,team_sentiment,RemindTransporterBeep,InformAboutTriagedVictim,RemindMedicToInformAboutTriagedVicti,TriageCriticalVictim,EvacuateCriticalVictim,EncouragePlayerProximityToMedicIHMCDyad,RemindChangeMarke,RemindRubblePerturbatio,EvacuationZoneDistanc,TeamSawVictimMarke,TimeElapse,StartEvacuatio
1,Saturn_C,0.0,0.0,1.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Saturn_C,0.0,0.0,1.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Saturn_C,1.0,0.0,0.0,,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Saturn_C,0.0,0.0,1.0,,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Saturn_C,1.0,0.0,0.0,,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143,Saturn_C,1.0,0.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
144,Saturn_C,1.0,0.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
145,Saturn_C,0.0,1.0,0.0,Transporters focusing on marking rooms and eva...,1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
146,Saturn_C,0.0,1.0,0.0,,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Label Communications with AI

Use AI to label human communication (team_message) and asi_message significance while considering the intervention classes 

In [None]:
df

#### Exploratory Data Analysis

In [None]:
df.describe()

In [None]:
df.describe(include="all")

In [None]:
s = df.select_dtypes(include="number").corr()
s

In [None]:
plt.imshow(s)
plt.xticks(range(len(s.columns)), s.columns, rotation=90)
plt.yticks(range(len(s.index)), s.index)
plt.colorbar()
plt.show()
