Experiment on Iterative Summarization, potentially linked to Natural Abstraction. Made for Alignment Jam #4; see the write-up for more details.
-
This notebook is a modest attempt to explore the notion of abstraction through iterative summarization. This interpretability work could be a start at scratching the surface of Natural Abstraction experiments. It relates to a series of posts by johnswentworth (notably "Project Intro", "The Telephone Theorem", "Project Update" and "Abstractions as Redundant Information").
-
The code is inspired by Neel Nanda's demo notebook and his library TransformerLens, and relies on the Amazon (small) dataset of movie and TV reviews available here. The Reddit TL;DR dataset was used previously, but its content was often unpleasant.
The repository also includes copies of the notebooks without cell outputs; see below.
- Training notebook (resulting weights publicly available, see the notebook)
- Interpretability notebook
With more time and resources, fine-tuning could be done on a bigger model with a bigger dataset. The aim would be to put more weight on the experiment on abstraction and iterative models; see the write-up for references.
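
As a rough illustration of what "iterative summarization" means here, the sketch below feeds each summary back into a summarizer and records every intermediate text. It is not the code from the notebooks: `iterate_summaries` and `toy_summarize` are hypothetical names, and the toy summarizer (which simply keeps the first half of the words) only stands in for the fine-tuned model so that the example runs without any weights.

```python
from typing import Callable, List


def iterate_summaries(
    text: str,
    summarize: Callable[[str], str],
    max_steps: int = 5,
) -> List[str]:
    """Apply `summarize` repeatedly and return every intermediate text.

    Stops early if the summary no longer changes (a fixed point).
    """
    chain = [text]
    for _ in range(max_steps):
        next_text = summarize(chain[-1])
        if next_text == chain[-1]:
            break
        chain.append(next_text)
    return chain


def toy_summarize(text: str) -> str:
    """Hypothetical stand-in for a model: keep the first half of the words."""
    words = text.split()
    keep = max(1, (len(words) + 1) // 2)  # round up, keep at least one word
    return " ".join(words[:keep])


chain = iterate_summaries(
    "a slow but moving film about family and loss", toy_summarize
)
for step, summary in enumerate(chain):
    print(step, summary)
```

In practice, `toy_summarize` would be replaced by a call to the trained summarization model; one can then inspect what information survives across the steps of the chain.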