# Multi-Modal Data Analysis Workflow
**ASIST Study 3 Dataset**  


## Objective
Analyze team performance data across four modalities:
1. JSON behavior logs
2. SPSS survey responses
3. Video recordings
4. Chat transcripts

Identify correlations between AI interventions and team outcomes.

## Note

All datasets were taken from official CHART ASIST Study 3 Dataset available at ASU official repository.

#### Subset used : 

| Team ID   | ASI ID            | trial | intervention_recipent |
|--------|----------------|-------|---------------|
| 000315 | ASI-CMURI-TA1         | T000829 | E001211, E001215, E001155 |


## Installing dependencies

In [None]:
# %pip install opencv-python pandas scikit-learn matplotlib torch

In [None]:
import cv2
from pathlib import Path
import json
import pandas as pd
# import openai
import matplotlib.pyplot as plt
import os
# import torch
import json

# Preparing Dataset

### 1. JSON Logs Processing
#### Objective
Extract structured data from nested JSON logs containing:
- Team actions
- AI intervention timestamps
- Mission outcomes


In [None]:


# def parse_json_logs(input_path: Path, output_path: Path) -> pd.DataFrame:
#     """Flatten nested JSON logs into structured format"""
#     with open(input_path, 'r') as f:
#         data = [json.loads(line) for line in f]
    
#     df = pd.json_normalize(data, sep='_')
#     df.to_csv(output_path, index=False)
#     return df

# # Process all trial messages
# input_files = [
#     Path("data/json_logs/HSRData_TrialMessages_Trial-T000603_..."),
#     Path("data/json_logs/HSRData_TrialMessages_Trial-T000639_..."),
#     Path("data/json_logs/HSRData_TrialMessages_Trial-T000671_...")
# ]

# output_dir = Path("data/processed/json_parsed/")
# output_dir.mkdir(parents=True, exist_ok=True)

# for file in input_files:
#     output_file = output_dir / f"{file.stem}_parsed.csv"
#     df = parse_json_logs(file, output_file)
#     print(f"Processed {len(df)} records from {file.name}")



### 2. Video Analysis
#### Objective
Extract key frames every 10 seconds for:
- Activity pattern analysis
- Non-verbal communication study


In [None]:
# import cv2
# import os
# from pathlib import Path

# def extract_frames(video_path: Path, output_dir: Path, interval: int = 10):
#     """Extract frames at fixed intervals (default: 10 seconds)"""
#     vidcap = cv2.VideoCapture(str(video_path))
#     if not vidcap.isOpened():
#         print(f"Error opening video: {video_path}")
#         return
    
#     fps = vidcap.get(cv2.CAP_PROP_FPS)
#     frame_interval = int(fps * interval)
    
#     count = 0
#     while vidcap.isOpened():
#         success, frame = vidcap.read()
#         if not success: 
#             break
#         if count % frame_interval == 0:
#             cv2.imwrite(str(output_dir / f"frame_{count:04d}.jpg"), frame)
#         count += 1
#     vidcap.release()

# # Configuration
# video_dir = Path("../data/videos/")
# output_base_dir = Path("../processed_data/")
# output_base_dir.mkdir(parents=True, exist_ok=True)

# # Process all MP4 files
# for idx, video_file in enumerate(video_dir.glob("*.mp4"), start=1):
#     folder_path = output_base_dir / str(idx)
#     folder_path.mkdir(parents=True, exist_ok=True)
    
#     print(f"Processing {video_file.name} -> Folder {idx}")
#     extract_frames(video_file, folder_path)


### 3. Game Metrics Implementation

#### Objective

Extract game metrics to the dataframe along with data given to AI

In [None]:
## TODO ADD GAME METRICS AND AI TRAIN DATA TO DATAFRAME

# Data Analysis


In [476]:
df = pd.read_csv("/mnt/c/Users/Som/Desktop/CHART ASIST/Study3_Analysis/data/transcripts/transcript.csv")

In [477]:
df.head(5)

Unnamed: 0,trial,team,scenario,date,timestamp,asi,intervention_message,intervention_recipent,speech_message,medic,transporter,engineer,explanation
0,T000829,TM000315,Saturn_C,7/8/2022,22:55:55,ASI-CMURI-TA1,"Hello, I am ATLAS, and will be providing advic...","['E001211', 'E001215', 'E001155']",{},1,1,1,"{'info': {'default_message': 'Hello, I am ATLA..."
1,T000829,TM000315,Saturn_C,7/8/2022,22:56:10,ASI-CMURI-TA1,{},{},okay this is engineer room tool with known dam...,0,0,1,{}
2,T000829,TM000315,Saturn_C,7/8/2022,22:56:16,ASI-CMURI-TA1,{},{},so most likely to be critical victims in those...,0,0,1,{}
3,T000829,TM000315,Saturn_C,7/8/2022,22:56:18,ASI-CMURI-TA1,{},{},can you repeat that again engineer,1,0,0,{}
4,T000829,TM000315,Saturn_C,7/8/2022,22:56:29,ASI-CMURI-TA1,{},{},okay I3 A2 E2 and A2 have rooms with known dam...,0,0,1,{}


In [478]:
df = df[1:]

#### Removing useless columns

Originally it had many columns and we reduced it down to only the ones which had data 



In [479]:
df["asi_message"] = df["intervention_message"].replace("{}", "")
df["team_message"] = df["speech_message"].replace("{}", "")

Adding a intervention_class column from explanation string

In [480]:
# Extract intervention_class from explanation strings
df['intervention_class'] = df['explanation'].str.extract(
    r"'intervention_class'\s*:\s*'([^']*)'"
)

# Create binary columns for each unique intervention class
intervention_classes = df['intervention_class'].dropna().unique()
for cls in intervention_classes:
    df[cls] = df['intervention_class'].eq(cls).astype(int)

# Cleanup intermediate column
df = df.drop(columns=['intervention_class'])

# Clean column names by removing 'Intervention' suffix
df = df.rename(columns=lambda col: col[:-13] if col.endswith('Intervention') else col)



We don't need unique team ids, asi ids, date, trial id, intervention_recipent id or explanation since we have extracted the unique class

In [481]:

df = df.drop(columns=["team","asi","date","timestamp","explanation","intervention_recipent","intervention_message","speech_message","trial"], axis=1)



In [482]:
df.sample(15)

Unnamed: 0,scenario,medic,transporter,engineer,asi_message,team_message,RemindTransporterBeep,InformAboutTriagedVictim,RemindMedicToInformAboutTriagedVicti,TriageCriticalVictim,...,EncouragePlayerProximityToMedicIHMCDyad,RemindChangeMarke,RemindRubblePerturbatio,EvacuationZoneDistanc,TeamSawVictimMarke,TimeElapse,StartEvacuatio,TeamWelcomeMessag,TransporterEarlyStrateg,TriagedVictimMarke
279,Saturn_D,1,1,1,"Team, please start evacuating all the triaged ...",,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
284,Saturn_D,0,1,0,"Green, remember to place a marker block after ...",,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
155,Saturn_C,1,0,0,,can we beat ourselves and this time our now,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
82,Saturn_C,0,1,0,,transporter I am blocked in by Rubble at South...,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
34,Saturn_C,0,0,1,,I'm trying to find D4 D2 keep going pass D4,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
180,Saturn_D,1,0,0,,unfortunately we have to leave,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
214,Saturn_D,1,0,0,,where we are located,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
35,Saturn_C,0,0,1,,am I in the wrong hallway,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Saturn_C,0,1,0,,all right this is transporter it says that the...,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
272,Saturn_D,1,0,0,,medic to transporter there are none,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [483]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 310 entries, 1 to 310
Data columns (total 21 columns):
 #   Column                                   Non-Null Count  Dtype 
---  ------                                   --------------  ----- 
 0   scenario                                 310 non-null    object
 1   medic                                    310 non-null    int64 
 2   transporter                              310 non-null    int64 
 3   engineer                                 310 non-null    int64 
 4   asi_message                              310 non-null    object
 5   team_message                             310 non-null    object
 6   RemindTransporterBeep                    310 non-null    int64 
 7   InformAboutTriagedVictim                 310 non-null    int64 
 8   RemindMedicToInformAboutTriagedVicti     310 non-null    int64 
 9   TriageCriticalVictim                     310 non-null    int64 
 10  EvacuateCriticalVictim                   310 non-null    int64