# SAEfarer - AG News

This notebook demonstrates SAEfarer with a [model](https://huggingface.co/Kyle1668/ag-news-19200-bert-base-uncased) trained to classify articles from the [AG News](http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html) dataset into one of four categories: World, Sports, Business, or Sci/Tech.
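A sequence classifier like this one outputs one logit per category, and the predicted category is the one with the highest score. The pure-Python sketch below shows that final step. It assumes the standard AG News label order used on the Hugging Face Hub (0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech); the model's own `config.id2label` is the authoritative mapping, and the helper names are illustrative only.

```python
import math

# Assumed label order (standard AG News ordering on the Hugging Face Hub).
# Check model.config.id2label before relying on it for this specific model.
AG_NEWS_LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits):
    """Return the highest-probability label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return AG_NEWS_LABELS[best], probs[best]


# Example with made-up logits for a single article.
label, p = predict_category([0.1, 3.2, -1.0, 0.5])
print(label, round(p, 3))
```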


First, we will install SAEfarer.


In [None]:
!pip install saefarer

I have already trained a sparse autoencoder (SAE) on this model and computed the analysis data that the widget needs. Here, we download the SAE weights and the analysis database.
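A sparse autoencoder maps each activation vector from the model into a much wider, mostly-zero vector of latent features, then reconstructs the activation from those features. The `x8-k8` in the download paths suggests, though the notebook does not say so, an 8× expansion factor and a TopK SAE that keeps only the k = 8 largest latents per input. The pure-Python sketch below shows a TopK encode/decode pass on toy dimensions under that assumption. It is not SAEfarer's implementation, and the helper names are hypothetical.

```python
import random


def topk_sae_encode(x, w_enc, b_enc, b_dec, k):
    """Encode one activation vector into a sparse latent vector.

    w_enc has one row (length d_model) per latent; b_enc has one entry per latent.
    """
    # Subtract the decoder bias, then project onto every latent direction.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b for row, b in zip(w_enc, b_enc)]
    # Keep the k largest pre-activations, zero the rest, then apply ReLU.
    keep = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    return [max(0.0, p) if i in keep else 0.0 for i, p in enumerate(pre)]


def topk_sae_decode(z, w_dec, b_dec):
    """Reconstruct x as b_dec plus a weighted sum of decoder directions."""
    x_hat = list(b_dec)
    for zi, direction in zip(z, w_dec):
        if zi:  # only active latents contribute
            x_hat = [xh + zi * d for xh, d in zip(x_hat, direction)]
    return x_hat


# Assumed reading of "x8-k8": expansion factor 8, k = 8.
# Toy d_model for speed; BERT-base's hidden size is 768.
d_model, expansion, k = 4, 8, 8
n_latents = d_model * expansion
rng = random.Random(0)
w_enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_latents)]
w_dec = [row[:] for row in w_enc]  # decoder initialised as the encoder's transpose
b_enc = [0.0] * n_latents
b_dec = [0.0] * d_model

x = [rng.gauss(0, 1) for _ in range(d_model)]
z = topk_sae_encode(x, w_enc, b_enc, b_dec, k)
x_hat = topk_sae_decode(z, w_dec, b_dec)
print(sum(1 for v in z if v > 0), "active latents out of", n_latents)
```

At most k latents can be nonzero for any input, which is what makes each feature's activations sparse enough to inspect individually.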


In [None]:
!curl -o sae.pt https://pub-d9a3d46ad9e747de82d25cd1f4610ee9.r2.dev/ag_news/x8-k8/sae.pt

In [None]:
!curl -o analysis.db https://pub-d9a3d46ad9e747de82d25cd1f4610ee9.r2.dev/ag_news/x8-k8/analysis.db
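The widget reads its precomputed data from `analysis.db`. Assuming it is an SQLite file (the `.db` extension suggests so, but the notebook does not state it), a quick standard-library check can confirm the download succeeded and list the tables it contains. This matters because `curl -o` will happily save an HTML error page under the requested name. The helper names are hypothetical, and the table names depend on SAEfarer's schema, which is not documented here.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file


def is_sqlite_file(db_path):
    """True if the file exists and starts with the SQLite 3 header."""
    path = Path(db_path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def list_tables(db_path):
    """Return the sorted names of all tables in an SQLite database."""
    if not is_sqlite_file(db_path):
        raise ValueError(f"{db_path} is missing or is not an SQLite database")
    # closing() ensures the connection is closed; sqlite3's own context manager only commits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Only runs if the download above produced a valid SQLite file.
if is_sqlite_file("analysis.db"):
    print(list_tables("analysis.db"))
```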

Next, we import SAEfarer and the classes from [transformers](https://huggingface.co/docs/transformers/en/index) needed to load the model and tokenizer.


In [None]:
from saefarer.sae import SAE
from saefarer.adapters.tokenizers import HuggingFaceBertTokenizerAdapter
from saefarer.utils import get_default_device
from saefarer.widget import Widget, WidgetConfig
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BertTokenizerFast,
)

Then we download the model and tokenizer from Hugging Face, wrap the tokenizer in SAEfarer's adapter, and load the SAE.


In [None]:
model_name = "Kyle1668/ag-news-19200-bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [None]:
tokenizer: BertTokenizerFast = AutoTokenizer.from_pretrained(model_name, use_fast=True)  # type: ignore

sf_tokenizer = HuggingFaceBertTokenizerAdapter(tokenizer=tokenizer)

In [None]:
device = get_default_device()
sae = SAE.load("sae.pt", device=device)

Now we are ready to configure and run the widget.


In [None]:
cfg = WidgetConfig(
    height=755,
    base_font_size=16,
    n_table_rows=10,
    device=device,
)

In [None]:
w = Widget(path="analysis.db", cfg=cfg, model=model, tokenizer=sf_tokenizer, sae=sae)

w