Second pass on human error analysis (aka "adversarial trial" analysis) #35

Closed
4 tasks done
judithfan opened this issue Jul 15, 2021 · 1 comment
judithfan commented Jul 15, 2021

  • Naming consistency: only 80-90 matching trials between the human and modeling experiments? Debug the stimulus-matching issue; there are known discrepancies between the stimulus naming conventions used for the model and human evaluations.
  • Replicate the RMSE analysis for all trials (not just the adversarial subset) to verify that we recover the same pattern of model-human consistency across models (particle models doing best, convnets worse, etc.); see the first sketch after this list.
  • Visualize the raw correlation between human and model predictions on the adversarial trials, contextualized against all trials.
  • Compute the noise ceiling on the adversarial trials only (same procedure as in the paper), then normalize these consistency metrics by the noise-ceiling estimates; see the second sketch below.
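
A minimal sketch of the RMSE and raw-correlation checks (second and third tasks), assuming a long-format dataframe with hypothetical columns `model`, `Canon Stimulus Name`, `human_prob` (mean human response per stimulus), `model_prob` (model prediction), and a boolean `is_adversarial` flag; the actual column names in our dataframes will likely differ.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def _iter_subsets(df: pd.DataFrame):
    """Yield (model, subset_name) pairs with the corresponding rows:
    all trials for that model, and its adversarial subset."""
    for model, model_df in df.groupby("model"):
        yield (model, "all_trials"), model_df
        yield (model, "adversarial"), model_df[model_df["is_adversarial"]]

def consistency_by_model(df: pd.DataFrame) -> pd.DataFrame:
    """RMSE and Pearson r between human and model predictions,
    computed per model on all trials and on the adversarial subset."""
    rows = []
    for (model, subset_name), sub in _iter_subsets(df):
        rmse = np.sqrt(np.mean((sub["human_prob"] - sub["model_prob"]) ** 2))
        r, _ = pearsonr(sub["human_prob"], sub["model_prob"])
        rows.append({"model": model, "subset": subset_name, "rmse": rmse, "r": r})
    return pd.DataFrame(rows)

# Expected pattern on the full trial set (from the paper): particle-based models
# show the lowest RMSE, convnets higher; the question here is whether that
# ordering also holds on the adversarial subset.
# summary = consistency_by_model(trials_df)
# print(summary.pivot(index="model", columns="subset", values="rmse"))
```

For the third task, the adversarial-trial correlations can then be shown in context, e.g. a scatter of human vs. model predictions over all trials with the adversarial points highlighted.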
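For the noise-ceiling task, the sketch below uses one common estimator, split-half reliability with Spearman-Brown correction, restricted to the adversarial trials; the paper's exact procedure should be swapped in here. It assumes hypothetical columns `participant`, `Canon Stimulus Name`, and a per-trial `response`.

```python
import numpy as np
import pandas as pd

def split_half_noise_ceiling(adv_df: pd.DataFrame, n_splits: int = 100,
                             seed: int = 0) -> float:
    """Split participants in half, correlate the per-stimulus mean responses of
    the two halves, Spearman-Brown correct, and average over random splits."""
    rng = np.random.default_rng(seed)
    participants = adv_df["participant"].unique()
    rs = []
    for _ in range(n_splits):
        perm = rng.permutation(participants)
        half_a, half_b = perm[: len(perm) // 2], perm[len(perm) // 2:]
        mean_a = (adv_df[adv_df["participant"].isin(half_a)]
                  .groupby("Canon Stimulus Name")["response"].mean())
        mean_b = (adv_df[adv_df["participant"].isin(half_b)]
                  .groupby("Canon Stimulus Name")["response"].mean())
        aligned = pd.concat([mean_a, mean_b], axis=1, keys=["a", "b"]).dropna()
        r = aligned["a"].corr(aligned["b"])
        rs.append(2 * r / (1 + r))  # Spearman-Brown correction for half-length data
    return float(np.mean(rs))

# ceiling = split_half_noise_ceiling(adversarial_trials_df)
# normalized_r = summary.loc[summary["subset"] == "adversarial", "r"] / ceiling
```

Model-human consistency on the adversarial subset can then be divided by this ceiling to get the normalized metrics in the last task.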
@felixbinder
Collaborator

Re the naming consistency: as I understand it, the 80-90 matching trials are fine. The dataframes contain information for all 150 (x8) stem IDs (for the models these are already properly renamed under Canon Stimulus Name). The 80-90 refers to the number of human observations per stimulus ID. We started out with about 100 participants per scenario and excluded some for various reasons (see the OSF file), so those numbers make sense. A quick way to verify those counts is sketched below.
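
A minimal check of the per-stimulus counts, assuming a long-format human-responses dataframe with hypothetical columns `Canon Stimulus Name` and `participant` (one row per observation):

```python
import pandas as pd

def observations_per_stimulus(human_df: pd.DataFrame) -> pd.Series:
    """Number of distinct human observers per canonical stimulus ID."""
    return human_df.groupby("Canon Stimulus Name")["participant"].nunique()

# counts = observations_per_stimulus(human_df)
# print(counts.describe())  # sanity check: min/median/max observation counts
# print((counts < 80).sum(), "stimuli with fewer than 80 observations")
```

With ~100 participants per scenario minus the exclusions documented in the OSF file, values in the 80-90 range per stimulus ID are what we would expect.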
