I think you don't really need to understand the details of VINC for this, just that it's a process which provides a model that maps activations on inputs -> credence scores.
Kaarel
In this experiment, at inference time we want to average the CCS outputs over the various ways of prompting the same data point, and do the same for the VINC outputs.
In other words, I think you can essentially treat the VINC training process as a black box here. The VINC training process outputs a probe that maps activations to credence scores (just like the CCS probe does), and you'd only be using this trained probe.
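A minimal sketch of the averaging step, assuming activations are stored as an array of shape `(n_examples, n_prompts, hidden_dim)` and that a trained probe is any callable mapping a batch of activations to credences in [0, 1]. The names `average_credence` and `make_linear_probe` are hypothetical, and the sigmoid-of-linear probes here are stand-ins for trained CCS and VINC probes, not the project's actual API:

```python
import numpy as np


def average_credence(probe, activations):
    """Average a probe's credence scores over prompt variants.

    activations: array of shape (n_examples, n_prompts, hidden_dim)
    probe: callable mapping (n, hidden_dim) -> (n,) credence scores
    Returns an array of shape (n_examples,).
    """
    n_examples, n_prompts, hidden_dim = activations.shape
    # Flatten so the probe sees one row per (example, prompt) pair.
    flat = activations.reshape(n_examples * n_prompts, hidden_dim)
    scores = np.asarray(probe(flat)).reshape(n_examples, n_prompts)
    # Mean over the prompt axis gives one credence per data point.
    return scores.mean(axis=1)


def make_linear_probe(weights, bias=0.0):
    """Hypothetical stand-in for a trained probe (sigmoid of a linear map)."""
    def probe(x):
        return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))
    return probe


# Toy data: 4 data points, 3 prompt variants each, hidden size 8.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 3, 8))

# Both probes are treated as black boxes with the same interface.
ccs_probe = make_linear_probe(rng.normal(size=8))
vinc_probe = make_linear_probe(rng.normal(size=8))

ccs_avg = average_credence(ccs_probe, acts)
vinc_avg = average_credence(vinc_probe, acts)
```

Because both probes share the same interface, the same function covers the CCS and VINC cases; only the probe passed in changes.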
note: Imported from old project
See https://www.lesswrong.com/posts/bFwigCDMC5ishLz7X/rfc-possible-ways-to-expand-on-discovering-latent-knowledge#Additional_ideas_that_came_up_while_writing_this_post_