
Releases: AlignmentResearch/tuned-lens

v0.2.0

18 Jul 17:03

Breaking changes

  • The from_model_and_pretrained interface has been updated: the slice option has been removed and moved to its own slice_sequence method.

New features

  • Integration with TransformerLens #103

    • This is probably the biggest new feature. We now support directly producing a PredictionTrajectory from a lens and an ActivationCache.
    • This means that you can visualize the effects of interventions made using the fantastic TransformerLens library using the full set of tools that come with the tuned-lens project.
    • There is a tutorial discussing this integration in the project documentation.
  • Rank visualization #105

    • As in the original logit lens blog post, we now support easily visualizing the rank of the target token in the prediction distribution.
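Here the rank of the target token is its position when the prediction distribution's logits are sorted in descending order (rank 1 = top prediction). A minimal pure-Python sketch of that computation, using a hypothetical helper rather than the library's actual implementation:

```python
def token_rank(logits, target_idx):
    """Rank of the target token in a prediction distribution (1 = top).

    Hypothetical helper for illustration; not part of tuned-lens.
    The rank is one plus the number of tokens scoring strictly
    higher than the target token's logit.
    """
    target_logit = logits[target_idx]
    return 1 + sum(1 for score in logits if score > target_logit)

# The target token (index 2) has the second-highest logit, so rank 2.
print(token_rank([0.1, 2.5, 1.7, -0.3], 2))  # -> 2
```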

Full Changelog: v0.1.1...v0.2.0

v0.1.1

13 Jun 16:09

Most of the changes in this release focused on improving the training and evaluation code. If you are mainly using pretrained lenses, this should not affect you too much.

Changes

  • The evaluation sub-command now produces JSON files, evaluates for a fixed number of tokens rather than steps, and has an improved command line interface. (#92)
  • Training now supports checkpointing, allowing lenses to be saved during training and training to be resumed if it is interrupted (#95).
  • Training can now be done in 8-bit precision, though this does not currently combine with FSDP (#88, #94).
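The checkpointing pattern described above can be sketched generically; the file layout and helper names below are assumptions for illustration, not the project's actual checkpoint format:

```python
import json
import os

def save_checkpoint(path, step, state):
    """Persist training progress so an interrupted run can resume.

    Illustrative sketch only: a real implementation would also save
    optimizer and lens parameters, typically with torch.save.
    """
    with open(path, "w") as f:
        json.dump({"step": step, "state": state}, f)

def load_checkpoint(path):
    """Return (step, state) from a prior run, or (0, {}) to start fresh."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]
```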

v0.1.0

02 May 01:28

This release primarily focused on removing technical debt, refactoring the repository, and raising the engineering standards in the codebase. While there are some new features, particularly in the plotting code, most of the work focused on making the codebase maintainable and easy to continue building on.

Changes

  • A large amount of code was removed in this update #80. Some of this code is relevant to replicating a few of the experiments in the archived version of the arXiv paper. For those planning to replicate the prompt injection experiments, the abnormality detection code can still be found in version 0.0.5 of the codebase.
  • The TunedLens class itself has also been substantially simplified by extracting the unembed operation into its own class, namely the Unembed class #55.
    • The largest breaking change for downstream users is the new interface for loading pretrained lenses. See the documentation here
  • The plotting code was completely refactored to make it more versatile and easier to build on #63. There is a tutorial for these new features in the docs.
  • The training code was completely rewritten to make it modular, making use of shared ingredients; the downstream loop was removed. For reference on how to use the new training interface, see the tutorial here
  • The model_surgery module no longer uses heuristics to locate where certain model components are #69
  • The data processing code was also streamlined #78
  • In addition, the Decoder class has been simplified and renamed to the Unembed class #71, #81, #55.
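Conceptually, the unembed operation maps a hidden state to vocabulary logits via the model's unembedding matrix (the real Unembed class also handles details such as the final layer norm and reuses the model's own weights). A toy pure-Python sketch of just the projection:

```python
def unembed(hidden, W_U):
    """Toy unembed: project a hidden-state vector to vocabulary logits.

    hidden: list of floats of length d_model.
    W_U: d_model x vocab unembedding matrix as nested lists.
    Conceptual illustration only, not the library's implementation.
    """
    vocab = len(W_U[0])
    return [sum(hidden[i] * W_U[i][j] for i in range(len(hidden)))
            for j in range(vocab)]

# 2-dim hidden state projected onto a 3-token vocabulary.
logits = unembed([1.0, 2.0], [[1.0, 0.0, 1.0],
                              [0.0, 1.0, 1.0]])
print(logits)  # -> [1.0, 2.0, 3.0]
```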

Contributors

While the majority of this update was written by @levmckinney, a huge thank you to @norabelrose and @alexmlong for their contributions and @AdamGleave, @rhaps0dy, @taufeeque9 for providing code reviews.

Full Changelog: v0.0.5...v0.1.0

v0.0.5

19 Apr 16:09

This release will likely be the final release before 0.1.x. There are some major refactors that are about to be merged. This release mostly consists of removing a lot of dead code and allowing you to specify a revision for tokenizers in the training scripts and lenses in TunedLens.load.

v0.0.3

15 Mar 00:55

First release!