
Releases: AlignmentResearch/tuned-lens

v0.2.0

18 Jul 17:03

Breaking changes

  • The from_model_and_pretrained interface has been updated: the slice option has been removed and moved to its own slice_sequence method.

New features

  • Integration with TransformerLens #103

    • This is probably the biggest new feature. We now support directly producing a PredictionTrajectory from a lens and an ActivationCache.
    • This means that you can visualize the effects of interventions made using the fantastic TransformerLens library using the full set of tools that come with the tuned-lens project.
    • There is a tutorial discussing this integration in the project documentation.
  • Rank visualization #105

    • As in the original logit lens blog post, we now support easily visualizing the rank of the target token in the prediction distribution.
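Here the rank of the target token is its position when the prediction distribution's logits are sorted in descending order (rank 1 = top prediction). A minimal pure-Python sketch of that computation, using a hypothetical helper rather than the library's actual implementation:

```python
def token_rank(logits, target_idx):
    """Rank of the target token in a prediction distribution (1 = top).

    Hypothetical helper for illustration; not part of tuned-lens.
    The rank is one plus the number of tokens scoring strictly
    higher than the target token's logit.
    """
    target_logit = logits[target_idx]
    return 1 + sum(1 for score in logits if score > target_logit)

# The target token (index 2) has the second-highest logit, so rank 2.
print(token_rank([0.1, 2.5, 1.7, -0.3], 2))  # -> 2
```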

Full Changelog: v0.1.1...v0.2.0

v0.1.1

13 Jun 16:09

Most of the changes in this release focused on improving the training and evaluation code. If you are mainly using pretrained lenses, this should not affect you too much.

Changes

  • The evaluation sub-command now produces JSON files, evaluates for a fixed number of tokens rather than steps, and has an improved command line interface. (#92)
  • Training now supports checkpointing, allowing lenses to be saved during training and training to be resumed if it is interrupted (#95).
  • Training can now be done in 8-bit precision, though this does not currently combine with FSDP (#88, #94).
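The checkpointing pattern described above can be sketched generically; the file layout and helper names below are assumptions for illustration, not the project's actual checkpoint format:

```python
import json
import os

def save_checkpoint(path, step, state):
    """Persist training progress so an interrupted run can resume.

    Illustrative sketch only: a real implementation would also save
    optimizer and lens parameters, typically with torch.save.
    """
    with open(path, "w") as f:
        json.dump({"step": step, "state": state}, f)

def load_checkpoint(path):
    """Return (step, state) from a prior run, or (0, {}) to start fresh."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]
```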

v0.1.0

02 May 01:28

This release primarily focused on removing technical debt, refactoring the repository, and raising the engineering standards in the codebase. While there are some new features, particularly in the plotting code, most of the work focused on making the codebase maintainable and easy to continue building on.

Changes

  • A large amount of code was removed in this update #80. Some of this code is relevant to replicating a few of the experiments in the archived version of the arXiv paper. For those planning to replicate the prompt injection experiments, the abnormality detection code can still be found in version 0.0.5 of the codebase.
  • The TunedLens class itself has also been substantially simplified by extracting the unembed operation into its own class, namely the Unembed class #55.
    • The largest breaking change for downstream users is the new interface for loading pretrained lenses. See the documentation here
  • The plotting code was completely refactored to make it more versatile and easier to build on #63. There is a tutorial for these new features in the docs.
  • The training code was completely rewritten to make it modular, making use of shared ingredients; the downstream loop was removed. For reference on how to use the new training interface, see the tutorial here
  • The model_surgery module no longer uses heuristics to locate where certain model components are #69
  • The data processing code was also streamlined #78
  • In addition, the Decoder class has been simplified and renamed to the Unembed class #71, #81, #55.
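Conceptually, the unembed operation maps a hidden state to vocabulary logits via the model's unembedding matrix (the real Unembed class also handles details such as the final layer norm and reuses the model's own weights). A toy pure-Python sketch of just the projection:

```python
def unembed(hidden, W_U):
    """Toy unembed: project a hidden-state vector to vocabulary logits.

    hidden: list of floats of length d_model.
    W_U: d_model x vocab unembedding matrix as nested lists.
    Conceptual illustration only, not the library's implementation.
    """
    vocab = len(W_U[0])
    return [sum(hidden[i] * W_U[i][j] for i in range(len(hidden)))
            for j in range(vocab)]

# 2-dim hidden state projected onto a 3-token vocabulary.
logits = unembed([1.0, 2.0], [[1.0, 0.0, 1.0],
                              [0.0, 1.0, 1.0]])
print(logits)  # -> [1.0, 2.0, 3.0]
```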

Contributors

While the majority of this update was written by @levmckinney, a huge thank you to @norabelrose and @alexmlong for their contributions and @AdamGleave, @rhaps0dy, @taufeeque9 for providing code reviews.

Full Changelog: v0.0.5...v0.1.0

v0.0.5

19 Apr 16:09

This release will likely be the final release before 0.1.x. There are some major refactors that are about to be merged. This release mostly consists of removing a lot of dead code and allowing you to specify a revision for tokenizers in the training scripts and lenses in TunedLens.load.

v0.0.3

15 Mar 00:55

First release!