Second pass on human error analysis (aka "adversarial trial" analysis) #35

Closed
4 tasks done
judithfan opened this issue Jul 15, 2021 · 1 comment
judithfan commented Jul 15, 2021

  • Naming consistency: only 80-90 matching trials between the human and modeling experiments? Debug the stimulus-matching issue; there are known discrepancies between the stimulus naming conventions used for the model and human evaluations.
  • Replicate the RMSE analysis for all trials (not just the adversarial subset) to verify that we recover the same pattern of model-human consistency across models (particle models doing best, convnets worse, etc.); see the first sketch after this list.
  • Visualize the raw correlation between human and model predictions on the adversarial trials, contextualized against all trials.
  • Compute the noise ceiling on the adversarial trials only (same procedure as in the paper), then normalize these consistency metrics by the noise-ceiling estimates; see the second sketch below.
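
A minimal sketch of the RMSE and raw-correlation checks (second and third tasks), assuming a long-format dataframe with hypothetical columns `model`, `Canon Stimulus Name`, `human_prob` (mean human response per stimulus), `model_prob` (model prediction), and a boolean `is_adversarial` flag; the actual column names in our dataframes will likely differ.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def _iter_subsets(df: pd.DataFrame):
    """Yield (model, subset_name) pairs with the corresponding rows:
    all trials for that model, and its adversarial subset."""
    for model, model_df in df.groupby("model"):
        yield (model, "all_trials"), model_df
        yield (model, "adversarial"), model_df[model_df["is_adversarial"]]

def consistency_by_model(df: pd.DataFrame) -> pd.DataFrame:
    """RMSE and Pearson r between human and model predictions,
    computed per model on all trials and on the adversarial subset."""
    rows = []
    for (model, subset_name), sub in _iter_subsets(df):
        rmse = np.sqrt(np.mean((sub["human_prob"] - sub["model_prob"]) ** 2))
        r, _ = pearsonr(sub["human_prob"], sub["model_prob"])
        rows.append({"model": model, "subset": subset_name, "rmse": rmse, "r": r})
    return pd.DataFrame(rows)

# Expected pattern on the full trial set (from the paper): particle-based models
# show the lowest RMSE, convnets higher; the question here is whether that
# ordering also holds on the adversarial subset.
# summary = consistency_by_model(trials_df)
# print(summary.pivot(index="model", columns="subset", values="rmse"))
```

For the third task, the adversarial-trial correlations can then be shown in context, e.g. a scatter of human vs. model predictions over all trials with the adversarial points highlighted.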
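For the noise-ceiling task, the sketch below uses one common estimator, split-half reliability with Spearman-Brown correction, restricted to the adversarial trials; the paper's exact procedure should be swapped in here. It assumes hypothetical columns `participant`, `Canon Stimulus Name`, and a per-trial `response`.

```python
import numpy as np
import pandas as pd

def split_half_noise_ceiling(adv_df: pd.DataFrame, n_splits: int = 100,
                             seed: int = 0) -> float:
    """Split participants in half, correlate the per-stimulus mean responses of
    the two halves, Spearman-Brown correct, and average over random splits."""
    rng = np.random.default_rng(seed)
    participants = adv_df["participant"].unique()
    rs = []
    for _ in range(n_splits):
        perm = rng.permutation(participants)
        half_a, half_b = perm[: len(perm) // 2], perm[len(perm) // 2:]
        mean_a = (adv_df[adv_df["participant"].isin(half_a)]
                  .groupby("Canon Stimulus Name")["response"].mean())
        mean_b = (adv_df[adv_df["participant"].isin(half_b)]
                  .groupby("Canon Stimulus Name")["response"].mean())
        aligned = pd.concat([mean_a, mean_b], axis=1, keys=["a", "b"]).dropna()
        r = aligned["a"].corr(aligned["b"])
        rs.append(2 * r / (1 + r))  # Spearman-Brown correction for half-length data
    return float(np.mean(rs))

# ceiling = split_half_noise_ceiling(adversarial_trials_df)
# normalized_r = summary.loc[summary["subset"] == "adversarial", "r"] / ceiling
```

Model-human consistency on the adversarial subset can then be divided by this ceiling to get the normalized metrics in the last task.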
@felixbinder
Collaborator

Re the naming consistency: as I understand it, the 80-90 matching trials are fine. The dataframes contain information for all 150 (x8) stem IDs (for the models these are already properly renamed under Canon Stimulus Name). The 80-90 refers to the number of human observations per stimulus ID. We started out with about 100 participants per scenario and excluded some for various reasons (see the OSF file), so those numbers make sense. A quick way to verify those counts is sketched below.
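
A minimal check of the per-stimulus counts, assuming a long-format human-responses dataframe with hypothetical columns `Canon Stimulus Name` and `participant` (one row per observation):

```python
import pandas as pd

def observations_per_stimulus(human_df: pd.DataFrame) -> pd.Series:
    """Number of distinct human observers per canonical stimulus ID."""
    return human_df.groupby("Canon Stimulus Name")["participant"].nunique()

# counts = observations_per_stimulus(human_df)
# print(counts.describe())  # sanity check: min/median/max observation counts
# print((counts < 80).sum(), "stimuli with fewer than 80 observations")
```

With ~100 participants per scenario minus the exclusions documented in the OSF file, values in the 80-90 range per stimulus ID are what we would expect.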
