Experiment: Inference k-prompts #8

derpyplops · 2023-04-14T10:12:26Z

See https://www.lesswrong.com/posts/bFwigCDMC5ishLz7X/rfc-possible-ways-to-expand-on-discovering-latent-knowledge#Additional_ideas_that_came_up_while_writing_this_post_

I think you don't really need to understand the details of VINC for this, just that it's a process which provides a model that maps activations on inputs to credence scores.

Kaarel

In this experiment, we want to average the CCS outputs over the various ways of prompting the same data point at inference time, and do the same for VINC outputs.

In other words, I think you can essentially treat the VINC training process as a black box here. The VINC training process outputs a probe that maps activations to credence scores (just like the CCS probe does), and you'd only be using this trained probe.
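A minimal sketch of the inference-time averaging described above, treating the trained probe as a black box. Everything named here is an assumption for illustration: `make_linear_probe` is a hypothetical stand-in for a trained CCS or VINC probe (a logistic-linear map), `average_credence` is a made-up helper name, and the activations are toy values rather than real model hidden states. It also averages credences directly; averaging logits instead would be an equally plausible choice that the issue does not specify.

```python
import math
from statistics import fmean
from typing import Callable, Sequence

Activation = Sequence[float]
Probe = Callable[[Activation], float]


def make_linear_probe(weights: Sequence[float], bias: float = 0.0) -> Probe:
    """Hypothetical stand-in for a trained probe (CCS or VINC).

    Maps one activation vector to a credence score in (0, 1).
    """
    def probe(activation: Activation) -> float:
        logit = sum(w * a for w, a in zip(weights, activation)) + bias
        return 1.0 / (1.0 + math.exp(-logit))
    return probe


def average_credence(probe: Probe, prompt_activations: Sequence[Activation]) -> float:
    """Average the probe's credence over the k prompt variants of one data point."""
    return fmean([probe(a) for a in prompt_activations])


# Toy example: one data point phrased with k = 3 prompt templates,
# each yielding a (made-up) 2-dimensional activation vector.
probe = make_linear_probe(weights=[1.0, -1.0])
activations_for_one_example = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
credence = average_credence(probe, activations_for_one_example)
```

Because the probe is only called, never trained, the same `average_credence` helper would apply unchanged to a CCS probe and a VINC probe, which is the comparison the experiment is after.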

Note: Imported from old project

derpyplops self-assigned this Apr 14, 2023

lauritowal closed this as completed Apr 27, 2023
