Clayton Cohn<br>
6 Nov 2023<br>
OELE Lab<br>
Vanderbilt University

# <center> S18 Cohen's *k* and Spreadsheet Creation

## Introduction and Attribution

This notebook was created by Clayton Cohn to calculate Cohen's *k* and to create the MMLTE survey's final consensus document.

In this notebook, we will:
*   Calculate Cohen's *k* for the additional extracted features
*   Export the final consensus spreadsheet

The MMLTE survey project is a collaborative effort between Dr. Gautam Biswas, Clayton Cohn, Eduardo Davalos, Joyce Fonteles, Dr. Meiyi Ma, Caleb Vatral, and Hanchen (David) Wang.

## Data Import

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
S18_PATH = "drive/My Drive/Clayton/20230420_MMLTE/S18_IRR_and_Consensus.csv"

In [3]:
import pandas as pd

# Display max rows, columns, column length
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

df = pd.read_csv(S18_PATH,header=0)

# Every column except the free-text ones must be fully populated
for col in df:
  if col not in {"Analysis Results (w/ multimodal advantages)","Reviewer Notes"}:
    assert not df[col].isnull().values.any(), col

df.head()

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,MB,The MMLA techniques applied were able to successfully extract relevant features to quantify and visualize teacher and student behaviors and activities related to student engagement based on the classroom video and audio recordings,Joyce,1,
1,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,MB,"Within heavily rigged smart classroom, visual (gaze, posture) and auditory (speech levels) were extracted via AI and were then used (rule's based approach -> model) to compute student's engagement in individual, pair, and group structures.",Eduardo,2,
2,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,MB,,Joyce/Eduardo,1&2,
3,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K12,MB,"Results indicate that physiological arousal could indicate engagement in metacognitive interactions, provided evidence for physiological activities as triggers for adaptive loops of learning regulation in collaborative learning. They utilised the CNN model to classify the sequences of regulatory activities and Shared Physiological Arousal Events moments and results have provided a proof of concept and indicated the potential of using ML for predicting collaborative learning success.",Joyce,1,
4,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K12,MB,"The paper explores the correlation of physiological signals to metacognition interactions in a collaborative setting. Two researchers video coded sessions into sequences of interactions. With these sequences, Markov chains were made and a 1D CNN (input physiological signal, output interaction label) was trained, with preliminary results that suggest a possible future direction.",Eduardo,2,


In [4]:
# Verify S17 UUIDs match
S17_PATH = "drive/My Drive/Clayton/20230420_MMLTE/S17.csv"
df_17 = pd.read_csv(S17_PATH,header=0)
assert set(df_17.UUID) == set(df.UUID)
assert len(set(df_17.UUID)) == len(set(df.UUID)) == 73

## Partition DataFrame

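Create one DataFrame per reviewer: Clayton, Caleb, Eduardo, and Joyce. Reviewers worked in two pairs (Clayton/Caleb and Eduardo/Joyce), and each pair coded its own set of papers.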

In [5]:
df_clayton = df.loc[df['Full-Read 3 by Researcher'] == "Clayton"]
df_clayton.reset_index(drop=True,inplace=True)

df_caleb = df.loc[df['Full-Read 3 by Researcher'] == "Caleb"]
df_caleb.reset_index(drop=True,inplace=True)

df_eduardo = df.loc[df['Full-Read 3 by Researcher'] == "Eduardo"]
df_eduardo.reset_index(drop=True,inplace=True)

df_joyce = df.loc[df['Full-Read 3 by Researcher'] == "Joyce"]
df_joyce.reset_index(drop=True,inplace=True)

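# Each of the 73 papers also has one consensus row, which belongs to no single reviewer
assert len(df_clayton) + len(df_caleb) + len(df_eduardo) + len(df_joyce) == len(df) - 73
# Both reviewers in a pair coded the same number of papers
assert len(df_clayton) == len(df_caleb) and len(df_eduardo) == len(df_joyce)
# Together, the two pairs cover every paper...
assert set(df.UUID) == set(df_clayton.UUID).union(set(df_caleb.UUID)).union(set(df_eduardo.UUID).union(set(df_joyce.UUID)))
# ...and no paper was assigned to both pairs
assert len(set(df.UUID)) == len(set(df_clayton.UUID).union(set(df_caleb.UUID))) + \
      len(set(df_eduardo.UUID).union(set(df_joyce.UUID)))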

Confirm that each pair's rows are in the same UUID order, since Cohen's *k* compares the two reviewers' labels position by position.

In [6]:
for i,row in df_clayton.iterrows():
  assert row['UUID'] == df_caleb.iloc[i]['UUID']

for i,row in df_eduardo.iterrows():
  assert row['UUID'] == df_joyce.iloc[i]['UUID']
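The check above relies on both reviewers' rows appearing in the same order. A minimal sketch of an alternative that pairs labels by UUID instead of by position is below. `align_by_uuid` is a hypothetical helper, not part of this notebook; it assumes rows are plain dictionaries (for example, from `df_clayton.to_dict("records")`) and raises if the two reviewers did not code exactly the same papers.

```python
from typing import List, Mapping, Sequence, Tuple


def align_by_uuid(
    rows_a: Sequence[Mapping[str, object]],
    rows_b: Sequence[Mapping[str, object]],
    column: str,
    key: str = "UUID",
) -> Tuple[List[object], List[object]]:
    """Return both reviewers' `column` labels, paired by paper ID (`key`) rather than row order."""
    ids_a = [row[key] for row in rows_a]
    labels_b = {row[key]: row[column] for row in rows_b}
    # Each reviewer must code the same papers, each exactly once
    if len(set(ids_a)) != len(ids_a) or len(labels_b) != len(rows_b) or set(ids_a) != set(labels_b):
        raise ValueError("reviewers did not code the same papers exactly once each")
    labels_a = [row[column] for row in rows_a]
    # Reorder reviewer B's labels to follow reviewer A's paper order
    return labels_a, [labels_b[i] for i in ids_a]
```

For example, `align_by_uuid(df_clayton.to_dict("records"), df_caleb.to_dict("records"), "Environment Setting")` would return two label lists that can be compared element by element even if the DataFrames were sorted differently.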

## Cohen's *k*

In [7]:
all_categories_cohens_k = {}

### Environment Setting

In [8]:
env_setting_features = ["BLND","PHYS","VIRT","UNSP"]

In [9]:
import math
from sklearn.metrics import cohen_kappa_score as cks

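# Aggregation scheme:
#   1. Per feature, compute Cohen's k for the Clayton/Caleb pair and the Joyce/Eduardo pair
#   2. Average the two pair scores (using only one if the other is undefined/NaN)
#   3. Average all features per category
#   4. Average all categories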

env_setting_dict = {}

for f in env_setting_features:

  clayton_f_scores = []
  for i,row in df_clayton.iterrows():
    vals = set(row["Environment Setting"].split(", "))
    clayton_f_scores.append(1 if f in vals else 0)

  caleb_f_scores = []
  for i,row in df_caleb.iterrows():
    vals = set(row["Environment Setting"].split(", "))
    caleb_f_scores.append(1 if f in vals else 0)

  eduardo_f_scores = []
  for i,row in df_eduardo.iterrows():
    vals = set(row["Environment Setting"].split(", "))
    eduardo_f_scores.append(1 if f in vals else 0)

  joyce_f_scores = []
  for i,row in df_joyce.iterrows():
    vals = set(row["Environment Setting"].split(", "))
    joyce_f_scores.append(1 if f in vals else 0)

  clayton_caleb = cks(clayton_f_scores,caleb_f_scores)
  eduardo_joyce = cks(joyce_f_scores,eduardo_f_scores)

  avg = 0
  if math.isnan(clayton_caleb):
    avg = eduardo_joyce
  elif math.isnan(eduardo_joyce):
    avg = clayton_caleb
  else:
    avg = (clayton_caleb + eduardo_joyce) / 2

  env_setting_dict[f] = avg

env_setting_dict["TOTAL"] = sum(env_setting_dict.values()) / len(env_setting_dict)
env_setting_dict


{'BLND': 0.8643774703557312,
 'PHYS': 0.8263377062289246,
 'VIRT': 0.9433673321968274,
 'UNSP': 0.6538461538461539,
 'TOTAL': 0.8219821656569092}
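For two raters assigning a 0/1 label to each paper, Cohen's *k* is $(p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance from each rater's rate of 1s. When both reviewers of a pair give every paper the same label for a feature (for example, neither ever marks it), $p_e = 1$ and the statistic is 0/0. scikit-learn's `cohen_kappa_score` then returns NaN (NumPy typically emits a RuntimeWarning), which is why the cell above falls back to the other pair's score. The from-scratch sketch below, `binary_cohens_kappa` (a hypothetical helper, not part of the notebook), makes that case explicit; for 0/1 labels it should agree with scikit-learn's value.

```python
import math
from typing import Sequence


def binary_cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters' 0/1 labels on the same items.

    kappa = (p_o - p_e) / (1 - p_e). Returns NaN when p_e == 1, i.e. both
    raters give every item the same single label, so kappa is 0/0.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label lists must be non-empty and of equal length")
    p_o = sum(x == y for x, y in zip(a, b)) / n    # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n              # each rater's rate of 1s
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)        # chance agreement on 1 or on 0
    if p_e == 1:
        return math.nan
    return (p_o - p_e) / (1 - p_e)
```

As a worked case, raters `[1, 1, 0, 0, 1, 0]` and `[1, 0, 0, 0, 1, 1]` agree on 4 of 6 papers ($p_o = 2/3$) and each marks half the papers ($p_e = 0.5$), giving $k = (2/3 - 1/2)/(1/2) = 1/3$.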

In [10]:
all_categories_cohens_k["Environment Setting"] = env_setting_dict["TOTAL"]
all_categories_cohens_k

{'Environment Setting': 0.8219821656569092}
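The same per-feature loop is repeated for every category below, changing only the column name and the feature list. A sketch of factoring it into one function follows. `one_hot` and `category_kappa` are hypothetical names, and the kappa function is passed in as a parameter (in this notebook it would be `cks`, scikit-learn's `cohen_kappa_score`). Like the original cells, it skips a pair whose score is NaN and averages the features without weighting.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def one_hot(cells: Sequence[str], feature: str, sep: str = ", ") -> List[int]:
    """1 for each cell whose `sep`-separated labels include `feature`, else 0."""
    return [1 if feature in set(cell.split(sep)) else 0 for cell in cells]


def category_kappa(
    pairs: Sequence[Tuple[Sequence[str], Sequence[str]]],
    features: Sequence[str],
    kappa_fn: Callable[[List[int], List[int]], float],
) -> Dict[str, float]:
    """Per-feature kappa averaged over reviewer pairs, plus an unweighted TOTAL.

    `pairs` holds one (rater_1_cells, rater_2_cells) tuple per reviewer pair,
    with rows already aligned by paper.
    """
    result: Dict[str, float] = {}
    for f in features:
        scores = [kappa_fn(one_hot(r1, f), one_hot(r2, f)) for r1, r2 in pairs]
        # Drop undefined (NaN) pair scores; the feature is NaN only if all are NaN
        valid = [s for s in scores if not math.isnan(s)]
        result[f] = sum(valid) / len(valid) if valid else math.nan
    # Mean over features only (computed before "TOTAL" is inserted)
    result["TOTAL"] = sum(result.values()) / len(result)
    return result
```

For Environment Setting, the call would be `category_kappa([(df_clayton["Environment Setting"], df_caleb["Environment Setting"]), (df_eduardo["Environment Setting"], df_joyce["Environment Setting"])], env_setting_features, cks)`, which should reproduce `env_setting_dict`.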

### Environment Subject

In [11]:
env_subject_features = ["HUM","OTH","PSY","STEM","UNSP"]

In [12]:
env_subject_dict = {}

for f in env_subject_features:

  clayton_f_scores = []
  for i,row in df_clayton.iterrows():
    vals = set(row["Environment Subject"].split(", "))
    clayton_f_scores.append(1 if f in vals else 0)

  caleb_f_scores = []
  for i,row in df_caleb.iterrows():
    vals = set(row["Environment Subject"].split(", "))
    caleb_f_scores.append(1 if f in vals else 0)

  eduardo_f_scores = []
  for i,row in df_eduardo.iterrows():
    vals = set(row["Environment Subject"].split(", "))
    eduardo_f_scores.append(1 if f in vals else 0)

  joyce_f_scores = []
  for i,row in df_joyce.iterrows():
    vals = set(row["Environment Subject"].split(", "))
    joyce_f_scores.append(1 if f in vals else 0)

  clayton_caleb = cks(clayton_f_scores,caleb_f_scores)
  eduardo_joyce = cks(joyce_f_scores,eduardo_f_scores)

  avg = 0
  if math.isnan(clayton_caleb):
    avg = eduardo_joyce
  elif math.isnan(eduardo_joyce):
    avg = clayton_caleb
  else:
    avg = (clayton_caleb + eduardo_joyce) / 2

  env_subject_dict[f] = avg

env_subject_dict["TOTAL"] = sum(env_subject_dict.values()) / len(env_subject_dict)
env_subject_dict

{'HUM': 0.8833055972723951,
 'OTH': 0.17215496368038735,
 'PSY': 0.9368600682593857,
 'STEM': 0.8300486223662884,
 'UNSP': 0.8928571428571428,
 'TOTAL': 0.7430452788871199}

In [13]:
all_categories_cohens_k["Environment Subject"] = env_subject_dict["TOTAL"]
all_categories_cohens_k

{'Environment Setting': 0.8219821656569092,
 'Environment Subject': 0.7430452788871199}

### Participant Structure

In [14]:
env_participant_features = ["IND","MULTI"]

In [15]:
env_participant_dict = {}

for f in env_participant_features:

  clayton_f_scores = []
  for i,row in df_clayton.iterrows():
    vals = set(row["Participant Structure"].split(", "))
    clayton_f_scores.append(1 if f in vals else 0)

  caleb_f_scores = []
  for i,row in df_caleb.iterrows():
    vals = set(row["Participant Structure"].split(", "))
    caleb_f_scores.append(1 if f in vals else 0)

  eduardo_f_scores = []
  for i,row in df_eduardo.iterrows():
    vals = set(row["Participant Structure"].split(", "))
    eduardo_f_scores.append(1 if f in vals else 0)

  joyce_f_scores = []
  for i,row in df_joyce.iterrows():
    vals = set(row["Participant Structure"].split(", "))
    joyce_f_scores.append(1 if f in vals else 0)

  clayton_caleb = cks(clayton_f_scores,caleb_f_scores)
  eduardo_joyce = cks(joyce_f_scores,eduardo_f_scores)

  avg = 0
  if math.isnan(clayton_caleb):
    avg = eduardo_joyce
  elif math.isnan(eduardo_joyce):
    avg = clayton_caleb
  else:
    avg = (clayton_caleb + eduardo_joyce) / 2

  env_participant_dict[f] = avg

env_participant_dict["TOTAL"] = sum(env_participant_dict.values()) / len(env_participant_dict)
env_participant_dict

{'IND': 0.9187408491947291,
 'MULTI': 0.9152941878505789,
 'TOTAL': 0.917017518522654}

In [16]:
all_categories_cohens_k["Participant Structure"] = env_participant_dict["TOTAL"]
all_categories_cohens_k

{'Environment Setting': 0.8219821656569092,
 'Environment Subject': 0.7430452788871199,
 'Participant Structure': 0.917017518522654}

### Didactic Nature

In [17]:
env_didactic_features = ["INF","INSTR","TRAIN","UNSP"]

In [18]:
env_didactic_dict = {}

for f in env_didactic_features:

  clayton_f_scores = []
  for i,row in df_clayton.iterrows():
    vals = set(row["Didactic Nature"].split(", "))
    clayton_f_scores.append(1 if f in vals else 0)

  caleb_f_scores = []
  for i,row in df_caleb.iterrows():
    vals = set(row["Didactic Nature"].split(", "))
    caleb_f_scores.append(1 if f in vals else 0)

  eduardo_f_scores = []
  for i,row in df_eduardo.iterrows():
    vals = set(row["Didactic Nature"].split(", "))
    eduardo_f_scores.append(1 if f in vals else 0)

  joyce_f_scores = []
  for i,row in df_joyce.iterrows():
    vals = set(row["Didactic Nature"].split(", "))
    joyce_f_scores.append(1 if f in vals else 0)

  clayton_caleb = cks(clayton_f_scores,caleb_f_scores)
  eduardo_joyce = cks(joyce_f_scores,eduardo_f_scores)

  avg = 0
  if math.isnan(clayton_caleb):
    avg = eduardo_joyce
  elif math.isnan(eduardo_joyce):
    avg = clayton_caleb
  else:
    avg = (clayton_caleb + eduardo_joyce) / 2

  env_didactic_dict[f] = avg

env_didactic_dict["TOTAL"] = sum(env_didactic_dict.values()) / len(env_didactic_dict)
env_didactic_dict


{'INF': 0.4398744113029827,
 'INSTR': 0.6278823778294902,
 'TRAIN': 0.7304600132986763,
 'UNSP': 0.4782608695652174,
 'TOTAL': 0.5691194179990916}

In [19]:
all_categories_cohens_k["Didactic Nature"] = env_didactic_dict["TOTAL"]
all_categories_cohens_k

{'Environment Setting': 0.8219821656569092,
 'Environment Subject': 0.7430452788871199,
 'Participant Structure': 0.917017518522654,
 'Didactic Nature': 0.5691194179990916}

### Level of Instruction or Training

In [20]:
env_level_features = ["UNI","K12","PROF","UNSP"]

In [21]:
env_level_dict = {}

for f in env_level_features:

  clayton_f_scores = []
  for i,row in df_clayton.iterrows():
    vals = set(row["Level of Instruction or Training"].split(", "))
    clayton_f_scores.append(1 if f in vals else 0)

  caleb_f_scores = []
  for i,row in df_caleb.iterrows():
    vals = set(row["Level of Instruction or Training"].split(", "))
    caleb_f_scores.append(1 if f in vals else 0)

  eduardo_f_scores = []
  for i,row in df_eduardo.iterrows():
    vals = set(row["Level of Instruction or Training"].split(", "))
    eduardo_f_scores.append(1 if f in vals else 0)

  joyce_f_scores = []
  for i,row in df_joyce.iterrows():
    vals = set(row["Level of Instruction or Training"].split(", "))
    joyce_f_scores.append(1 if f in vals else 0)

  clayton_caleb = cks(clayton_f_scores,caleb_f_scores)
  eduardo_joyce = cks(joyce_f_scores,eduardo_f_scores)

  avg = 0
  if math.isnan(clayton_caleb):
    avg = eduardo_joyce
  elif math.isnan(eduardo_joyce):
    avg = clayton_caleb
  else:
    avg = (clayton_caleb + eduardo_joyce) / 2

  env_level_dict[f] = avg

env_level_dict["TOTAL"] = sum(env_level_dict.values()) / len(env_level_dict)
env_level_dict


{'UNI': 0.8300442081539484,
 'K12': 0.972913616398243,
 'PROF': 0.6372549019607843,
 'UNSP': 0.6372549019607843,
 'TOTAL': 0.76936690711844}

In [22]:
all_categories_cohens_k["Level of Instruction or Training"] = env_level_dict["TOTAL"]
all_categories_cohens_k

{'Environment Setting': 0.8219821656569092,
 'Environment Subject': 0.7430452788871199,
 'Participant Structure': 0.917017518522654,
 'Didactic Nature': 0.5691194179990916,
 'Level of Instruction or Training': 0.76936690711844}

### Analysis Approach

In [23]:
env_analysis_features = ["MB","MF"]

In [24]:
env_analysis_dict = {}

for f in env_analysis_features:

  clayton_f_scores = []
  for i,row in df_clayton.iterrows():
    vals = set(row["Analysis Approach"].split(", "))
    clayton_f_scores.append(1 if f in vals else 0)

  caleb_f_scores = []
  for i,row in df_caleb.iterrows():
    vals = set(row["Analysis Approach"].split(", "))
    caleb_f_scores.append(1 if f in vals else 0)

  eduardo_f_scores = []
  for i,row in df_eduardo.iterrows():
    vals = set(row["Analysis Approach"].split(", "))
    eduardo_f_scores.append(1 if f in vals else 0)

  joyce_f_scores = []
  for i,row in df_joyce.iterrows():
    vals = set(row["Analysis Approach"].split(", "))
    joyce_f_scores.append(1 if f in vals else 0)

  clayton_caleb = cks(clayton_f_scores,caleb_f_scores)
  eduardo_joyce = cks(joyce_f_scores,eduardo_f_scores)

  avg = 0
  if math.isnan(clayton_caleb):
    avg = eduardo_joyce
  elif math.isnan(eduardo_joyce):
    avg = clayton_caleb
  else:
    avg = (clayton_caleb + eduardo_joyce) / 2

  env_analysis_dict[f] = avg

env_analysis_dict["TOTAL"] = sum(env_analysis_dict.values()) / len(env_analysis_dict)
env_analysis_dict

{'MB': 0.4303054879697215,
 'MF': 0.45092296769613066,
 'TOTAL': 0.4406142278329261}

In [25]:
all_categories_cohens_k["Analysis Approach"] = env_analysis_dict["TOTAL"]
all_categories_cohens_k

{'Environment Setting': 0.8219821656569092,
 'Environment Subject': 0.7430452788871199,
 'Participant Structure': 0.917017518522654,
 'Didactic Nature': 0.5691194179990916,
 'Level of Instruction or Training': 0.76936690711844,
 'Analysis Approach': 0.4406142278329261}

### Aggregated

In [26]:
k = sum(all_categories_cohens_k.values()) / len(all_categories_cohens_k)
k

0.7101909193361902
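Written out, the aggregation is three nested unweighted means. Here $P_f$ is the set of reviewer pairs whose *k* for feature $f$ is defined (not NaN), $F_c$ is the feature list of category $c$, and $C$ is the set of six categories:

$$
\bar{k}_f = \frac{1}{|P_f|} \sum_{p \in P_f} k_f^{(p)}, \qquad
k_c = \frac{1}{|F_c|} \sum_{f \in F_c} \bar{k}_f, \qquad
k = \frac{1}{|C|} \sum_{c \in C} k_c
$$

$$
k \approx \frac{0.8220 + 0.7430 + 0.9170 + 0.5691 + 0.7694 + 0.4406}{6} \approx 0.7102
$$

Because every category counts equally, the two-feature Analysis Approach category carries the same weight as the five-feature Environment Subject category.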

## Final S18 Spreadsheet

In [27]:
# Create consensus sheet

df_consensus = df[df['Reviewer'] == "1&2"].copy()  # explicit copy avoids SettingWithCopyWarning on the in-place edits below
df_consensus.reset_index(inplace=True,drop=True)
df_consensus.head()

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Sort Number,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach,Analysis Results (w/ multimodal advantages),Full-Read 3 by Researcher,Reviewer,Reviewer Notes
0,1326191931,multimodal learning analytics in a laboratory classroom,Man Ching Esther Chan,2019,Learning,"VIDEO,AUDIO","POSE,GAZE,PROS","CLS,CLUST",LATE,MLPALA,Machine Learning Paradigms: Advances in Learning Analytics,3,PHYS,STEM,"IND, MULTI",INSTR,UNSP,MB,,Joyce/Eduardo,1&2,
1,1469065963,examining socially shared regulation and shared physiological arousal events with multimodal learning analytics,Andy Nguyen,2022,Learning,"VIDEO,AUDIO,SENSOR","QUAL,EDA","PATT,CLS,CLUST",HYBRID,BJET,British Journal of Educational Technology,4,PHYS,STEM,MULTI,INSTR,K12,MB,,Joyce/Eduardo,1&2,
2,1598166515,multimodal learning analytics for game-based learning,Andrew Emerson,2020,Learning,"VIDEO,LOGS,EYE","AFFECT,GAZE,LOGS,PPA","CLS,STATS",MID,BJET,British Journal of Educational Technology,5,VIRT,STEM,IND,INF,UNI,MB,,Joyce/Eduardo,1&2,
3,1877483551,motion-based educational games: using multi-modal data to predict player’s performance,Serena Lee-Cultura,2020,Learning,"VIDEO,EYE,SENSOR","PULSE,TEMP,EDA,GAZE,POSE",CLS,MID,COG,IEEE Conference on Games,6,BLND,STEM,IND,INSTR,K12,MB,,Joyce/Eduardo,1&2,
4,2000036002,predicting learners’ effortful behaviour in adaptive assessment using multimodal data,Kshitij Sharma,2020,Learning,"VIDEO,EYE,SENSOR","EDA,TEMP,PULSE,EEG,GAZE,AFFECT","CLUST,CLS,PATT",MID,LAK,International Conference on Learning Analytics & Knowledge,7,VIRT,STEM,IND,INSTR,UNI,MB,,Joyce/Eduardo,1&2,


In [28]:
df_consensus.sort_values(["Year", "Mapped First Author", "UUID"], inplace=True, ascending=True)
df_consensus.drop(columns=["Mapped Full Publication",
                           "Sort Number",
                           "Analysis Results (w/ multimodal advantages)",
                           "Full-Read 3 by Researcher",
                           "Reviewer",
                           "Reviewer Notes"],
                  inplace=True)
df_consensus.reset_index(inplace=True,drop=True)
df_consensus.head()


Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach
0,818492192,understanding student learning trajectories using multimodal learning analytics within an embodied-interaction learning environment,Alejandro Andrade,2017,Learning,"VIDEO,LOGS,INTER,PPA","GAZE,LOGS,INTER,PPA,GEST","CLUST,QUAL",HYBRID,LAK,BLND,STEM,IND,INSTR,K12,MB
1,3408664396,multimodal student engagement recognition in prosocial games,Athanasios Psaltis,2017,Learning,"VIDEO,LOGS","POSE,AFFECT,LOGS",CLS,LATE,T-CIAIG,BLND,HUM,IND,INF,K12,MB
2,1118315889,using multimodal learning analytics to identify aspects of collaboration in project-based learning,Daniel Spikol,2017,Learning,"VIDEO,AUDIO,LOGS","POSE,PROS",REG,MID,CSCL,PHYS,STEM,MULTI,INSTR,UNI,MB
3,3339002981,estimation of success in collaborative learning based on multimodal learning analytics features,Daniel Spikol,2017,Learning,"EYE,LOGS,VIDEO,AUDIO","GAZE,LOGS,PROS,POSE",CLS,MID,ICALT,VIRT,STEM,MULTI,INSTR,UNI,MB
4,1609706685,learning pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data,Daniele Di Mitri,2017,Training,"SENSOR,LOGS,MOTION,PPA","PULSE,ACT,AFFECT",REG,MID,LAK,BLND,UNSP,IND,UNSP,UNI,MB


In [29]:
# Map columns to their final names
df_consensus.rename(columns={
    'Mapped First Author': 'First Author',
    'Environment Type (learning or training)': 'Environment Type',
    'Mapped Data Collection Mediums': 'Data Collection Mediums',
    'Mapped Modalities': 'Modalities',
    'Mapped Analysis Methods': 'Analysis Methods',
    'Mapped Fusion Types': 'Fusion Types',
    'Mapped Publication Acronym': 'Publication'}, inplace=True)


In [30]:
assert not df_consensus.isnull().values.any()
assert len(df_consensus) == 73
assert len(set(df_consensus.UUID)) == 73
assert set(df_consensus.UUID) == set(df.UUID)

Compare against S17 to verify that the consensus sheet lists the same papers in the same order.

In [31]:
df_17.sort_values(["Year","Mapped First Author", "UUID"], inplace=True, ascending=True)
df_17.reset_index(inplace=True,drop=True)
df_17.head()

Unnamed: 0,UUID,Title,Mapped First Author,Year,Environment Type (learning or training),Mapped Data Collection Mediums,Mapped Modalities,Mapped Analysis Methods,Mapped Fusion Types,Mapped Publication Acronym,Mapped Full Publication,Full-Read 2 by Researcher,Reviewer Number,Sort Number
0,818492192,understanding student learning trajectories using multimodal learning analytics within an embodied-interaction learning environment,Alejandro Andrade,2017,Learning,"VIDEO,LOGS,INTER,PPA","GAZE,LOGS,INTER,PPA,GEST","CLUST,QUAL",HYBRID,LAK,International Conference on Learning Analytics & Knowledge,Caleb/Clayton,3,23
1,3408664396,multimodal student engagement recognition in prosocial games,Athanasios Psaltis,2017,Learning,"VIDEO,LOGS","POSE,AFFECT,LOGS",CLS,LATE,T-CIAIG,Transactions on Computational Intelligence and AI in Games,Joyce/Eduardo,3,15
2,1118315889,using multimodal learning analytics to identify aspects of collaboration in project-based learning,Daniel Spikol,2017,Learning,"VIDEO,AUDIO,LOGS","POSE,PROS",REG,MID,CSCL,Conference on Computer Supported Collaborative Learning,Caleb/Clayton,3,26
3,3339002981,estimation of success in collaborative learning based on multimodal learning analytics features,Daniel Spikol,2017,Learning,"EYE,LOGS,VIDEO,AUDIO","GAZE,LOGS,PROS,POSE",CLS,MID,ICALT,International Conference on Advanced Learning Technologies,Eduardo/Joyce,3,14
4,1609706685,learning pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data,Daniele Di Mitri,2017,Training,"SENSOR,LOGS,MOTION,PPA","PULSE,ACT,AFFECT",REG,MID,LAK,International Conference on Learning Analytics & Knowledge,Eduardo/Joyce,3,69


In [32]:
assert df_17.UUID.equals(df_consensus.UUID)

In [33]:
# Save the final consensus spreadsheet (own variable, so the input path S18_PATH is not overwritten)
S18_OUT_PATH = "drive/My Drive/Clayton/20230420_MMLTE/S18.csv"
df_consensus.to_csv(S18_OUT_PATH,index=False)

In [34]:
# Verify the saved consensus round-trips unchanged
df_import = pd.read_csv(S18_OUT_PATH)
assert df_import.compare(df_consensus).empty
assert df_import.equals(df_consensus)

In [35]:
df_consensus

Unnamed: 0,UUID,Title,First Author,Year,Environment Type,Data Collection Mediums,Modalities,Analysis Methods,Fusion Types,Publication,Environment Setting,Environment Subject,Participant Structure,Didactic Nature,Level of Instruction or Training,Analysis Approach
0,818492192,understanding student learning trajectories using multimodal learning analytics within an embodied-interaction learning environment,Alejandro Andrade,2017,Learning,"VIDEO,LOGS,INTER,PPA","GAZE,LOGS,INTER,PPA,GEST","CLUST,QUAL",HYBRID,LAK,BLND,STEM,IND,INSTR,K12,MB
1,3408664396,multimodal student engagement recognition in prosocial games,Athanasios Psaltis,2017,Learning,"VIDEO,LOGS","POSE,AFFECT,LOGS",CLS,LATE,T-CIAIG,BLND,HUM,IND,INF,K12,MB
2,1118315889,using multimodal learning analytics to identify aspects of collaboration in project-based learning,Daniel Spikol,2017,Learning,"VIDEO,AUDIO,LOGS","POSE,PROS",REG,MID,CSCL,PHYS,STEM,MULTI,INSTR,UNI,MB
3,3339002981,estimation of success in collaborative learning based on multimodal learning analytics features,Daniel Spikol,2017,Learning,"EYE,LOGS,VIDEO,AUDIO","GAZE,LOGS,PROS,POSE",CLS,MID,ICALT,VIRT,STEM,MULTI,INSTR,UNI,MB
4,1609706685,learning pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data,Daniele Di Mitri,2017,Training,"SENSOR,LOGS,MOTION,PPA","PULSE,ACT,AFFECT",REG,MID,LAK,BLND,UNSP,IND,UNSP,UNI,MB
5,3093310941,embodied conversational agents for multimodal automated social skills training in people with autism spectrum disorders,Hiroki Tanaka,2017,Training,"AUDIO,VIDEO,PPA","POSE,PROS,AFFECT","REG,STATS",MID,PLOS,VIRT,HUM,IND,TRAIN,"K12, UNI",MF
6,3095923626,a multimodal analysis of making,Marcelo Worsley,2017,Learning,"VIDEO,AUDIO,SENSOR,PPA,INTER","GEST,PPA,EDA,ACT,PROS,QUAL,INTER","STATS,CLUST,QUAL,PATT",EARLY,IJAIED,PHYS,STEM,MULTI,INF,"K12, UNI",MB
7,2456887548,an unobtrusive and multimodal approach for behavioral engagement detection of students,Nese Alyuz,2017,Learning,"LOGS,VIDEO,SCREEN,PPA","AFFECT,POSE,LOGS,PPA",CLS,HYBRID,MIE,VIRT,STEM,IND,INSTR,K12,MB
8,1374035721,attentivelearner2: a multimodal approach for improving mooc learning on mobile devices,Phuong Pham,2017,Learning,"VIDEO,SURVEY","PULSE,AFFECT,SURVEY",CLS,MID,AIED,VIRT,STEM,IND,INSTR,UNSP,MB
9,85990093,multimodal markers of persuasive speech : designing a virtual debate coach,Volha Petukhova,2017,Training,"VIDEO,AUDIO","PROS,GEST","CLS,QUAL,STATS",MID,INTERSPEECH,PHYS,HUM,MULTI,TRAIN,K12,MB
