Clayton Cohn<br>
4 Nov 2023<br>
OELE Lab<br>
Vanderbilt University

#<center> S18 Consensus and Cohen's *k*

## Introduction and Attribution

This notebook was create by Clayton Cohn for the purpose of creating the MMLTE survey's final consensus document.

In this notebook, we will:
*   Map all extracted features to acronyms
*   Export IRR spreadsheet for Cohen's *k*

In the next notebook, we will:
*   Calculate Cohen's k for the additional extracted features
*   Export final consensus and spreadsheet

The MMLTE survey project is a collaborative effor between Dr. Gautam Biswas, Clayton Cohn, Eduardo Davalos, Joyce Fonteles, Dr. Meiyi Ma, Caleb Vatral, and Hanchen (David) Wang.

## Data Import

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
S18_PATH = "drive/My Drive/Clayton/20230420_MMLTE/S18_Consensus.csv"

In [3]:
import pandas as pd

# Display max rows, columns, column length
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

df = pd.read_csv(S18_PATH,header=0)

for col in df:
  if col not in {"Analysis Results (w/ multimodal advantages)","Reviewer Notes"}:
    assert not df[col].isnull().values.any(), print(col)

df.head()

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Sub,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,physical,STEM,"individual, multi-person",instructional,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,physical,STEM,"individual, multi-person",instructional,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,physical,STEM,"individual, multi-person",instructional,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,physical,STEM,multi-person,instructional,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,physical,STEM,multi-person,instructional,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


## Map to Acronyms

### Environment Setting

In [4]:
env_setting_vals = set()
env_setting_list = [l.split(", ") for l in list(df["Environment Setting"])]
for item in env_setting_list:
  for val in item:
    env_setting_vals.add(val.lower())

env_setting_vals

{'blended', 'physical', 'unspecified', 'virtual', 'virutal'}

In [5]:
env_setting_map = {
    "blended": "BLND",
    "physical": "PHYS",
    "virtual": "VIRT",
    "virutal": "VIRT",
    "unspecified": "UNSP"
}

In [6]:
new_env_setting_list = []

for item in env_setting_list:
  new_item = set()
  for val in item:
    new_item.add(env_setting_map[val.lower()])
  new_env_setting_list.append(list(new_item))

new_env_setting_list[:5]

[['PHYS'], ['PHYS'], ['PHYS'], ['PHYS'], ['PHYS']]

In [7]:
new_env_setting_list_combined = []

for item in new_env_setting_list:
  new_item = ", ".join(item)
  new_env_setting_list_combined.append(new_item)

new_env_setting_list_combined[:5]

['PHYS', 'PHYS', 'PHYS', 'PHYS', 'PHYS']

In [8]:
df.insert(13, "Environment Setting (mapped)", new_env_setting_list_combined)
df.head(10)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Setting (mapped),Environment Sub,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,physical,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,physical,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,physical,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,physical,PHYS,STEM,multi-person,instructional,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,physical,PHYS,STEM,multi-person,instructional,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,
5,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,physical,PHYS,STEM,multi-person,instructional,K-12,model-based,,Joyce/Eduardo,1&2,
6,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,virtual,VIRT,STEM,individual,informal,undergraduate,model-based,"Results indicate that when predicting student posttest performance and interest, models utilizing multimodal data either perform equally well or outperform models utilizing unimodal data. The findings suggest that MMLA can accurately predict students’ posttest performance and interest during game-based learning and hold significant potential for guiding real-time adaptive scaffolding",Joyce,1,
7,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,virtual,VIRT,STEM,individual,informal,undergraduate,model-based,"Common case of multimodal outperform unimodal models, through the addition of gaze to classify student's posttest performance and interest.",Eduardo,2,
8,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,virtual,VIRT,STEM,individual,informal,undergraduate,model-based,,Joyce/Eduardo,1&2,
9,1877483551,motion-based educational games: using multi-modal data to predict player’s performance,Serena Lee-Cultura,2020,Learning,"VIDEO,EYE,SENSOR","PULSE,TEMP,EDA,GAZE,POSE",CLS,MID,COG,IEEE Conference on Games,6,blended,BLND,STEM,individual,instructional,K-12,model-based,Authors conclude that the feature combination of gaze and physiological MMD provide the most accurate predictions of correct answers. They also show the feasibility of early prediction of children's performance by using half (as oppose to full) data lengths to extract features and predict correctness.,Joyce,1,


In [9]:
df.drop(columns=["Environment Setting"], inplace=True)
df.rename(columns={'Environment Setting (mapped)': 'Environment Setting'},inplace=True)
df.head(5)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Sub,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,multi-person,instructional,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,multi-person,instructional,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


### Environment Subject

In [10]:
env_subject_vals = set()
env_subject_list = [l.split(", ") for l in list(df["Environment Sub"])]
for item in env_subject_list:
  for val in item:
    env_subject_vals.add(val.lower())

env_subject_vals

{'humanities',
 'language arts',
 'other',
 'psychomotor',
 'psychomotor skills',
 'stem',
 'unspecified'}

In [11]:
env_subject_map = {
    'humanities':"HUM",
    'language arts':"HUM",
    'other':"OTH",
    'psychomotor':"PSY",
    'psychomotor skills':"PSY",
    'stem':"STEM",
    'unspecified':"UNSP"
}

In [12]:
new_env_subject_list = []

for item in env_subject_list:
  new_item = set()
  for val in item:
    new_item.add(env_subject_map[val.lower()])
  new_env_subject_list.append(list(new_item))

new_env_subject_list[:5]

[['STEM'], ['STEM'], ['STEM'], ['STEM'], ['STEM']]

In [13]:
new_env_subject_list_combined = []

for item in new_env_subject_list:
  new_item = ", ".join(item)
  new_env_subject_list_combined.append(new_item)

new_env_subject_list_combined[:5]

['STEM', 'STEM', 'STEM', 'STEM', 'STEM']

In [14]:
df.insert(14, "Environment Subject (mapped)", new_env_subject_list_combined)
df[15:25]

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Sub,Environment Subject (mapped),Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
15,2070224207,detecting medical simulation errors with machine learning and multimodal data,Daniele Di Mitri,2019,Training,"VIDEO,MOTION,LOGS","POSE,LOGS",CLS,MID,CAIM,Conference on Artificial Intelligence in Medicine,11,BLND,psychomotor skills,PSY,individual,training,undergraduate,model-based,"Used each Chest Compression as training sample by masking/windowing of the original time series, then trained an LSTM network with all these samples and were able to classify accurately the target classes, however discarding the rest of the time-series they were not able to detect if a CC happened. Author asks Doctorial Consortium how, given the available data, could they train a classifier able to detect whether a CC happened or not.",Joyce,1,
16,2070224207,detecting medical simulation errors with machine learning and multimodal data,Daniele Di Mitri,2019,Training,"VIDEO,MOTION,LOGS","POSE,LOGS",CLS,MID,CAIM,Conference on Artificial Intelligence in Medicine,11,BLND,psychomotor skills,PSY,individual,training,undergraduate,model-based,"Trained an LSTM to predict ['too slow', 'on-point', 'too fast'] for Chest compression training. Achieved 70-75% accuracy.",Eduardo,2,
17,2070224207,detecting medical simulation errors with machine learning and multimodal data,Daniele Di Mitri,2019,Training,"VIDEO,MOTION,LOGS","POSE,LOGS",CLS,MID,CAIM,Conference on Artificial Intelligence in Medicine,11,BLND,psychomotor skills,PSY,individual,training,undergraduate,model-based,,Joyce/Eduardo,1&2,
18,2634033325,controlled evaluation of a multimodal system to improve oral presentation skills in a real learning setting,Xavier Ochoa,2020,Training,"VIDEO,AUDIO,PPA","POSE,PROS,PPA",STATS,OTH,BJET,British Journal of Educational Technology,12,BLND,humanities,HUM,individual,informal,unspecified,model-free,"Evidence found in this paper suggests that automated feedback has a positive effect on oral presentation quality, but that the strength of this effect is small. Furthermore, different oral presentation dimensions are affected differently by the use of the system (i.e., there are large gains in looking at the audience during the presentation, while there is a negligible improvement in the avoidance of filled pauses)",Joyce,1,
19,2634033325,controlled evaluation of a multimodal system to improve oral presentation skills in a real learning setting,Xavier Ochoa,2020,Training,"VIDEO,AUDIO,PPA","POSE,PROS,PPA",STATS,OTH,BJET,British Journal of Educational Technology,12,BLND,humanities,HUM,individual,training,unspecified,model-free,Authors showcase that the training tool improved manually defined scores between an initial and second use of the tool.,Eduardo,2,
20,2634033325,controlled evaluation of a multimodal system to improve oral presentation skills in a real learning setting,Xavier Ochoa,2020,Training,"VIDEO,AUDIO,PPA","POSE,PROS,PPA",STATS,OTH,BJET,British Journal of Educational Technology,12,BLND,humanities,HUM,individual,training,unspecified,model-free,,Joyce/Eduardo,1&2,
21,3051560548,temporal analysis of multimodal data to predict collaborative learning outcomes,Jennifer K. Olsen,2020,Learning,"LOGS,AUDIO,EYE","GAZE,LOGS,PROS,TRANS,QUAL",REG,MID,BJET,British Journal of Educational Technology,13,VIRT,STEM,STEM,multi-person,instructional,K-12,model-based,"Assessing the relation of dual gaze, tutor log, audio and dialog data to students’ learning gains, we find that a combination of modalities, especially those at a smaller time scale, such as gaze and audio, provides a more accurate prediction of learning gains than models with a single modality.",Joyce,1,
22,3051560548,temporal analysis of multimodal data to predict collaborative learning outcomes,Jennifer K. Olsen,2020,Learning,"LOGS,AUDIO,EYE","GAZE,LOGS,PROS,TRANS,QUAL",REG,MID,BJET,British Journal of Educational Technology,13,VIRT,STEM,STEM,multi-person,instructional,K-12,model-based,Evaluating how multimodal features contribute to a model's performance to predict learning gains. Audio features introduce noise that negatively impacted the error of the model.,Eduardo,2,
23,3051560548,temporal analysis of multimodal data to predict collaborative learning outcomes,Jennifer K. Olsen,2020,Learning,"LOGS,AUDIO,EYE","GAZE,LOGS,PROS,TRANS,QUAL",REG,MID,BJET,British Journal of Educational Technology,13,VIRT,STEM,STEM,multi-person,instructional,K-12,model-based,,Joyce/Eduardo,1&2,
24,3339002981,estimation of success in collaborative learning based on multimodal learning analytics features,Daniel Spikol,2017,Learning,"EYE,LOGS,VIDEO,AUDIO","GAZE,LOGS,PROS,POSE",CLS,MID,ICALT,International Conference on Advanced Learning Technologies,14,VIRT,STEM,STEM,multi-person,instructional,undergraduate,model-based,"Assessing the relation of dual gaze, tutor log, audio and dialog data to students’ learning gains, authors found that a combination of modalities, especially those at a smaller time scale, such as gaze and audio, provides a more accurate prediction of learning gains than models with a single modality.",Joyce,1,


In [15]:
df.drop(columns=["Environment Sub"], inplace=True)
df.rename(columns={'Environment Subject (mapped)': 'Environment Subject'},inplace=True)
df.head(5)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person",instructional,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,multi-person,instructional,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,multi-person,instructional,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


### Participant Structure

In [16]:
participant_vals = set()
participant_list = [l.split(", ") for l in list(df["Participant Structure"])]
for item in participant_list:
  for val in item:
    participant_vals.add(val.lower())

participant_vals

{'individual', 'multi-person', 'multi-student', 'mutli-person'}

In [17]:
participant_map = {
    'individual':"IND",
    'multi-person':"MULTI",
    'multi-student':"MULTI",
    'mutli-person':"MULTI"
}

In [18]:
new_participant_list = []

for item in participant_list:
  new_item = set()
  for val in item:
    new_item.add(participant_map[val.lower()])
  new_participant_list.append(list(new_item))

new_participant_list[:5]

[['IND', 'MULTI'], ['IND', 'MULTI'], ['IND', 'MULTI'], ['MULTI'], ['MULTI']]

In [19]:
new_participant_list_combined = []

for item in new_participant_list:
  new_item = ", ".join(item)
  new_participant_list_combined.append(new_item)

new_participant_list_combined[:5]

['IND, MULTI', 'IND, MULTI', 'IND, MULTI', 'MULTI', 'MULTI']

In [20]:
df.insert(15, "Participant Structure (mapped)", new_participant_list_combined)
df.head(10)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Participant Structure (mapped),Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person","IND, MULTI",instructional,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person","IND, MULTI",instructional,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"individual, multi-person","IND, MULTI",instructional,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,multi-person,MULTI,instructional,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,multi-person,MULTI,instructional,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,
5,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,multi-person,MULTI,instructional,K-12,model-based,,Joyce/Eduardo,1&2,
6,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,individual,IND,informal,undergraduate,model-based,"Results indicate that when predicting student posttest performance and interest, models utilizing multimodal data either perform equally well or outperform models utilizing unimodal data. The findings suggest that MMLA can accurately predict students’ posttest performance and interest during game-based learning and hold significant potential for guiding real-time adaptive scaffolding",Joyce,1,
7,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,individual,IND,informal,undergraduate,model-based,"Common case of multimodal outperform unimodal models, through the addition of gaze to classify student's posttest performance and interest.",Eduardo,2,
8,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,individual,IND,informal,undergraduate,model-based,,Joyce/Eduardo,1&2,
9,1877483551,motion-based educational games: using multi-modal data to predict player’s performance,Serena Lee-Cultura,2020,Learning,"VIDEO,EYE,SENSOR","PULSE,TEMP,EDA,GAZE,POSE",CLS,MID,COG,IEEE Conference on Games,6,BLND,STEM,individual,IND,instructional,K-12,model-based,Authors conclude that the feature combination of gaze and physiological MMD provide the most accurate predictions of correct answers. They also show the feasibility of early prediction of children's performance by using half (as oppose to full) data lengths to extract features and predict correctness.,Joyce,1,


In [21]:
df.drop(columns=["Participant Structure"], inplace=True)
df.rename(columns={'Participant Structure (mapped)': 'Participant Structure'},inplace=True)
df.head(5)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",instructional,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",instructional,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",instructional,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,instructional,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,instructional,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


### Didactic Nature

In [22]:
didactic_nature_vals = set()
didactic_nature_list = [l.split(", ") for l in list(df["Didactic Nature"])]
for item in didactic_nature_list:
  for val in item:
    didactic_nature_vals.add(val.lower())

didactic_nature_vals

{'informal',
 'instructional',
 'instrutional',
 'insturctional',
 'training',
 'unspecified'}

In [23]:
didactic_nature_map = {
    'informal':"INF",
    'instructional':"INSTR",
    'instrutional':"INSTR",
    'insturctional':"INSTR",
    'training':"TRAIN",
    'unspecified':"UNSP"
}

In [24]:
new_didactic_nature_list = []

for item in didactic_nature_list:
  new_item = set()
  for val in item:
    new_item.add(didactic_nature_map[val.lower()])
  new_didactic_nature_list.append(list(new_item))

new_didactic_nature_list[:5]

[['INSTR'], ['INSTR'], ['INSTR'], ['INSTR'], ['INSTR']]

In [25]:
new_didactic_nature_list_combined = []

for item in new_didactic_nature_list:
  new_item = ", ".join(item)
  new_didactic_nature_list_combined.append(new_item)

new_didactic_nature_list_combined[:5]

['INSTR', 'INSTR', 'INSTR', 'INSTR', 'INSTR']

In [26]:
df.insert(16, "Didactic Nature (mapped)", new_didactic_nature_list_combined)
df.head(10)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Didactic Nature (mapped),Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",instructional,INSTR,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",instructional,INSTR,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",instructional,INSTR,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,instructional,INSTR,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,instructional,INSTR,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,
5,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,instructional,INSTR,K-12,model-based,,Joyce/Eduardo,1&2,
6,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,IND,informal,INF,undergraduate,model-based,"Results indicate that when predicting student posttest performance and interest, models utilizing multimodal data either perform equally well or outperform models utilizing unimodal data. The findings suggest that MMLA can accurately predict students’ posttest performance and interest during game-based learning and hold significant potential for guiding real-time adaptive scaffolding",Joyce,1,
7,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,IND,informal,INF,undergraduate,model-based,"Common case of multimodal outperform unimodal models, through the addition of gaze to classify student's posttest performance and interest.",Eduardo,2,
8,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,IND,informal,INF,undergraduate,model-based,,Joyce/Eduardo,1&2,
9,1877483551,motion-based educational games: using multi-modal data to predict player’s performance,Serena Lee-Cultura,2020,Learning,"VIDEO,EYE,SENSOR","PULSE,TEMP,EDA,GAZE,POSE",CLS,MID,COG,IEEE Conference on Games,6,BLND,STEM,IND,instructional,INSTR,K-12,model-based,Authors conclude that the feature combination of gaze and physiological MMD provide the most accurate predictions of correct answers. They also show the feasibility of early prediction of children's performance by using half (as oppose to full) data lengths to extract features and predict correctness.,Joyce,1,


In [27]:
df.drop(columns=["Didactic Nature"], inplace=True)
df.rename(columns={'Didactic Nature (mapped)': 'Didactic Nature'},inplace=True)
df.head(5)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,unspecified,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,unspecified,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,unspecified,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K-12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K-12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


### Level of Instruction or Training

In [28]:
level_vals = set()
level_list = [l.split(", ") for l in list(df["Level of Instruction or Training"])]
for item in level_list:
  for val in item:
    level_vals.add(val.lower())

level_vals

{'graduate',
 'k-12',
 'k12',
 'professional',
 'professional development',
 'undergraduate',
 'undisclosed',
 'university',
 'unspecified'}

In [29]:
level_map = {
    'graduate':"UNI",
    'k-12':"K12",
    'k12':"K12",
    'professional':"PROF",
    'professional development':"PROF",
    'undergraduate':"UNI",
    'undisclosed':"UNSP",
    'university':"UNI",
    'unspecified':"UNSP"
}

In [30]:
new_level_list = []

for item in level_list:
  new_item = set()
  for val in item:
    new_item.add(level_map[val.lower()])
  new_level_list.append(list(new_item))

new_level_list[:5]

[['UNSP'], ['UNSP'], ['UNSP'], ['K12'], ['K12']]

In [31]:
new_level_list_combined = []

for item in new_level_list:
  new_item = ", ".join(item)
  new_level_list_combined.append(new_item)

new_level_list_combined[:5]

['UNSP', 'UNSP', 'UNSP', 'K12', 'K12']

In [32]:
df.insert(17, "Level of Instruction or Training (mapped)", new_level_list_combined)
df.head(10)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Level of Instruction or Training (mapped),Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,unspecified,UNSP,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,unspecified,UNSP,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,unspecified,UNSP,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K-12,K12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K-12,K12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,
5,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K-12,K12,model-based,,Joyce/Eduardo,1&2,
6,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,IND,INF,undergraduate,UNI,model-based,"Results indicate that when predicting student posttest performance and interest, models utilizing multimodal data either perform equally well or outperform models utilizing unimodal data. The findings suggest that MMLA can accurately predict students’ posttest performance and interest during game-based learning and hold significant potential for guiding real-time adaptive scaffolding",Joyce,1,
7,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,IND,INF,undergraduate,UNI,model-based,"Common case of multimodal outperform unimodal models, through the addition of gaze to classify student's posttest performance and interest.",Eduardo,2,
8,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,IND,INF,undergraduate,UNI,model-based,,Joyce/Eduardo,1&2,
9,1877483551,motion-based educational games: using multi-modal data to predict player’s performance,Serena Lee-Cultura,2020,Learning,"VIDEO,EYE,SENSOR","PULSE,TEMP,EDA,GAZE,POSE",CLS,MID,COG,IEEE Conference on Games,6,BLND,STEM,IND,INSTR,K-12,K12,model-based,Authors conclude that the feature combination of gaze and physiological MMD provide the most accurate predictions of correct answers. They also show the feasibility of early prediction of children's performance by using half (as oppose to full) data lengths to extract features and predict correctness.,Joyce,1,


In [33]:
df.drop(columns=["Level of Instruction or Training"], inplace=True)
df.rename(columns={'Level of Instruction or Training (mapped)': 'Level of Instruction or Training'},inplace=True)
df.head(5)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,model-based,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,model-based,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,model-based,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K12,model-based,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K12,model-based,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


### Analysis Approach

In [34]:
analysis_vals = set()
analysis_list = [l.split(", ") for l in list(df["Analysis Approach"])]
for item in analysis_list:
  for val in item:
    analysis_vals.add(val.lower())

analysis_vals

{'mixed',
 'model based',
 'model free',
 'model-based',
 'model-free',
 'uses 3rd party model but for augmenting qualitative'}

In [35]:
analysis_map = {
    'mixed':"MB, MF",
    'model based':"MB",
    'model free':"MF",
    'model-based':"MB",
    'model-free':"MF",
    'uses 3rd party model but for augmenting qualitative':"MB"
}

In [36]:
new_analysis_list = []

for item in analysis_list:
  new_item = set()
  for val in item:
    new_item.add(analysis_map[val.lower()])
  new_analysis_list.append(list(new_item))

new_analysis_list[:5]

[['MB'], ['MB'], ['MB'], ['MB'], ['MB']]

In [37]:
new_analysis_list_combined = []

for item in new_analysis_list:
  new_item = ", ".join(item)
  new_analysis_list_combined.append(new_item)

new_analysis_list_combined[:5]

['MB', 'MB', 'MB', 'MB', 'MB']

In [38]:
df.insert(18, "Analysis Approach (mapped)", new_analysis_list_combined)
df[15:25]

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Approach (mapped),Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
15,2070224207,detecting medical simulation errors with machine learning and multimodal data,Daniele Di Mitri,2019,Training,"VIDEO,MOTION,LOGS","POSE,LOGS",CLS,MID,CAIM,Conference on Artificial Intelligence in Medicine,11,BLND,PSY,IND,TRAIN,UNI,model-based,MB,"Used each Chest Compression as training sample by masking/windowing of the original time series, then trained an LSTM network with all these samples and were able to classify accurately the target classes, however discarding the rest of the time-series they were not able to detect if a CC happened. Author asks Doctorial Consortium how, given the available data, could they train a classifier able to detect whether a CC happened or not.",Joyce,1,
16,2070224207,detecting medical simulation errors with machine learning and multimodal data,Daniele Di Mitri,2019,Training,"VIDEO,MOTION,LOGS","POSE,LOGS",CLS,MID,CAIM,Conference on Artificial Intelligence in Medicine,11,BLND,PSY,IND,TRAIN,UNI,model-based,MB,"Trained an LSTM to predict ['too slow', 'on-point', 'too fast'] for Chest compression training. Achieved 70-75% accuracy.",Eduardo,2,
17,2070224207,detecting medical simulation errors with machine learning and multimodal data,Daniele Di Mitri,2019,Training,"VIDEO,MOTION,LOGS","POSE,LOGS",CLS,MID,CAIM,Conference on Artificial Intelligence in Medicine,11,BLND,PSY,IND,TRAIN,UNI,model-based,MB,,Joyce/Eduardo,1&2,
18,2634033325,controlled evaluation of a multimodal system to improve oral presentation skills in a real learning setting,Xavier Ochoa,2020,Training,"VIDEO,AUDIO,PPA","POSE,PROS,PPA",STATS,OTH,BJET,British Journal of Educational Technology,12,BLND,HUM,IND,INF,UNSP,model-free,MF,"Evidence found in this paper suggests that automated feedback has a positive effect on oral presentation quality, but that the strength of this effect is small. Furthermore, different oral presentation dimensions are affected differently by the use of the system (i.e., there are large gains in looking at the audience during the presentation, while there is a negligible improvement in the avoidance of filled pauses)",Joyce,1,
19,2634033325,controlled evaluation of a multimodal system to improve oral presentation skills in a real learning setting,Xavier Ochoa,2020,Training,"VIDEO,AUDIO,PPA","POSE,PROS,PPA",STATS,OTH,BJET,British Journal of Educational Technology,12,BLND,HUM,IND,TRAIN,UNSP,model-free,MF,Authors showcase that the training tool improved manually defined scores between an initial and second use of the tool.,Eduardo,2,
20,2634033325,controlled evaluation of a multimodal system to improve oral presentation skills in a real learning setting,Xavier Ochoa,2020,Training,"VIDEO,AUDIO,PPA","POSE,PROS,PPA",STATS,OTH,BJET,British Journal of Educational Technology,12,BLND,HUM,IND,TRAIN,UNSP,model-free,MF,,Joyce/Eduardo,1&2,
21,3051560548,temporal analysis of multimodal data to predict collaborative learning outcomes,Jennifer K. Olsen,2020,Learning,"LOGS,AUDIO,EYE","GAZE,LOGS,PROS,TRANS,QUAL",REG,MID,BJET,British Journal of Educational Technology,13,VIRT,STEM,MULTI,INSTR,K12,model-based,MB,"Assessing the relation of dual gaze, tutor log, audio and dialog data to students’ learning gains, we find that a combination of modalities, especially those at a smaller time scale, such as gaze and audio, provides a more accurate prediction of learning gains than models with a single modality.",Joyce,1,
22,3051560548,temporal analysis of multimodal data to predict collaborative learning outcomes,Jennifer K. Olsen,2020,Learning,"LOGS,AUDIO,EYE","GAZE,LOGS,PROS,TRANS,QUAL",REG,MID,BJET,British Journal of Educational Technology,13,VIRT,STEM,MULTI,INSTR,K12,model-based,MB,Evaluating how multimodal features contribute to a model's performance to predict learning gains. Audio features introduce noise that negatively impacted the error of the model.,Eduardo,2,
23,3051560548,temporal analysis of multimodal data to predict collaborative learning outcomes,Jennifer K. Olsen,2020,Learning,"LOGS,AUDIO,EYE","GAZE,LOGS,PROS,TRANS,QUAL",REG,MID,BJET,British Journal of Educational Technology,13,VIRT,STEM,MULTI,INSTR,K12,model-based,MB,,Joyce/Eduardo,1&2,
24,3339002981,estimation of success in collaborative learning based on multimodal learning analytics features,Daniel Spikol,2017,Learning,"EYE,LOGS,VIDEO,AUDIO","GAZE,LOGS,PROS,POSE",CLS,MID,ICALT,International Conference on Advanced Learning Technologies,14,VIRT,STEM,MULTI,INSTR,UNI,model-based,MB,"Assessing the relation of dual gaze, tutor log, audio and dialog data to students’ learning gains, authors found that a combination of modalities, especially those at a smaller time scale, such as gaze and audio, provides a more accurate prediction of learning gains than models with a single modality.",Joyce,1,


In [39]:
df.drop(columns=["Analysis Approach"], inplace=True)
df.rename(columns={'Analysis Approach (mapped)': 'Analysis Approach'},inplace=True)
df.head(5)

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,MB,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,MB,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,MB,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K12,MB,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K12,MB,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


## Verify New Columns

In [40]:
print("Environment Setting:", set(df["Environment Setting"]))
print("Environment Subject", set(df["Environment Subject"]))
print("Participant Structure", set(df["Participant Structure"]))
print("Didactic Nature", set(df["Didactic Nature"]))
print("Level of Instruction or Training", set(df["Level of Instruction or Training"]))
print("Analysis Approach", set(df["Analysis Approach"]))

Environment Setting: {'VIRT', 'BLND', 'VIRT, PHYS', 'UNSP', 'PHYS'}
Environment Subject {'HUM, OTH, STEM', 'HUM, STEM', 'STEM', 'OTH', 'HUM', 'UNSP', 'PSY'}
Participant Structure {'IND', 'IND, MULTI', 'MULTI'}
Didactic Nature {'TRAIN, INSTR', 'INSTR', 'TRAIN', 'UNSP', 'INF'}
Level of Instruction or Training {'UNI', 'UNI, K12', 'PROF', 'UNI, UNSP', 'K12', 'UNI, PROF', 'UNSP'}
Analysis Approach {'MB, MF', 'MF, MB', 'MF', 'MB'}


## Save IRR/Consensus File