Iterative summarization interpretability

An experiment on iterative summarization, potentially linked to Natural Abstraction. Made for Alignment Jam #4; see the write-up for more details.

Related work and sources

This notebook humbly tries to explore the notion of abstraction through iterative summarization. This interpretability work could be a first step toward scratching the surface of Natural Abstraction experiments. It relates to a series of posts by johnswentworth (notably "Project Intro", "The Telephone Theorem", "Project Update" and "Abstractions as Redundant Information").
The code is inspired by Neel Nanda's demo notebook and his library TransformerLens, and it uses the Amazon (small) dataset of movie and TV reviews available here. The Reddit TL;DR dataset was used formerly, but its content was often unpleasant.
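
To make the idea of iterative summarization concrete, here is a minimal sketch of the loop: a summarizer is applied to its own output several times, and every intermediate text is kept so the chain can be inspected. This is not the code from the notebooks. The names `iterate_summaries` and `toy_summarize` are hypothetical, and the toy summarizer (keep the first half of the words) is only a stand-in assumption for a fine-tuned model.

```python
from typing import Callable, List


def iterate_summaries(
    text: str,
    summarize: Callable[[str], str],
    max_rounds: int = 5,
) -> List[str]:
    """Apply `summarize` repeatedly and return every intermediate text.

    Stops early when a round no longer changes the text (a fixed point).
    """
    chain = [text]
    for _ in range(max_rounds):
        shorter = summarize(chain[-1])
        if shorter == chain[-1]:
            break  # nothing left to compress
        chain.append(shorter)
    return chain


def toy_summarize(text: str) -> str:
    """Hypothetical stand-in for a model: keep the first half of the words."""
    words = text.split()
    keep = max(1, len(words) // 2)
    return " ".join(words[:keep])


if __name__ == "__main__":
    review = "this movie was a slow but ultimately rewarding character study"
    for depth, summary in enumerate(iterate_summaries(review, toy_summarize)):
        print(depth, summary)
```

In a real run, `summarize` would wrap a call to the fine-tuned model, and the returned chain is what an interpretability analysis would compare across depths.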

Demo notebooks

The repository comes with notebooks stripped of their cell outputs; see below.

Training notebook (resulting weights publicly available, see the notebook)

Interpretability notebook

Future work

With more time and resources, fine-tuning could be done on a bigger model with a bigger dataset. The idea would be to put more emphasis on the abstraction experiment and on iterative models; see the write-up for references.


Xmaster6y/Iterative_summarization
