# News Classifier Quickstart

The code should run out of the box if the following dependencies are installed (the example cells below additionally import `pandas` and `numpy`):
```
# dependencies:
# pytorch
# transformers
# safetensors
```
The models are packaged in the `news_clf` subdirectory. The model classes handle tokenization to the training length (1,500 tokens) and compute class probabilities from the base model outputs; the ordinal model additionally computes a class score.
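
The quickstart does not show how `news_clf` turns base-model outputs into probabilities. As a minimal sketch only, assume the unordered model applies a softmax over three logits and the ordinal model uses a cumulative-link formulation with two cutpoints. The class ordering, the cutpoint values, and the function names below are all hypothetical and are not taken from the package.

```python
import math

# Assumed ordering of the framing classes (hypothetical, for illustration).
CLASSES = ('NEUTRAL', 'LOADED', 'ALARMIST')


def softmax(logits):
    """Turn unordered class logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(score, cutpoints):
    """Cumulative-link model: P(class <= k) = sigmoid(cutpoint_k - score)."""
    cumulative = [sigmoid(c - score) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(class == k) as a difference of cumulatives
        previous = c
    return probs


# Illustrative inputs, not real model outputs.
unordered = dict(zip(CLASSES, softmax([2.0, 0.5, -1.0])))
ordinal = dict(zip(CLASSES, ordinal_probs(score=0.3, cutpoints=(-1.0, 1.0))))
print(unordered)
print(ordinal)
```

In this formulation a single scalar score determines all three probabilities, which is why only an ordinal model can offer a one-dimensional score alongside them.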

The base model is `microsoft/deberta-v3-xsmall`. Despite the name, `xsmall` is not a uniformly shrunken `deberta-v3-small`: it trades breadth for depth, with 12 layers at hidden size 384 versus 6 layers at hidden size 768 (smaller embeddings, deeper network). That should work better given the abstract nature of the target categories, because more depth allows the article information to be integrated across a larger range.
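
To make the breadth-versus-depth trade concrete, here is a rough back-of-the-envelope weight count. It assumes the layer counts and hidden sizes from the published model configurations (12 x 384 for `xsmall`, 6 x 768 for `small`), a vocabulary of 128,100 tokens, and the usual approximation of 12·h² weights per transformer layer. Biases, layer norms, and position-related parameters are ignored, so treat the totals as approximate.

```python
def encoder_params(num_layers, hidden_size):
    """Approximate encoder weights: 4*h^2 for attention (Q, K, V, output)
    plus 8*h^2 for a feed-forward block with a 4*h intermediate size."""
    return num_layers * 12 * hidden_size ** 2


VOCAB_SIZE = 128_100  # assumed vocabulary size

# (layers, hidden size) per model; values assumed from the published configs.
configs = {
    'deberta-v3-xsmall': (12, 384),
    'deberta-v3-small': (6, 768),
}

for name, (layers, hidden) in configs.items():
    backbone = encoder_params(layers, hidden)
    embeddings = VOCAB_SIZE * hidden
    print(f'{name}: {layers} layers x {hidden} hidden, '
          f'~{backbone / 1e6:.1f}M encoder weights, '
          f'~{embeddings / 1e6:.1f}M embedding weights')
```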

In [1]:
from news_clf import ( PretrainedModelForOrdinalSequenceClassification, 
                       PretrainedModelForUnorderedSequenceClassification )

device_map = 'cpu' # set to 'auto' to use gpu if available


### Load in the two model versions

The two models perform comparably: the ordinal model edges out the unordered 3-class model by about 0.16 percentage points of accuracy (0.8973 vs. 0.8957). Both can be used for class probabilities, but only the ordinal model also returns a one-dimensional score for each article.
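
As a quick check of how small the gap is, the arithmetic below uses the accuracies quoted in the checkpoint comments in this notebook (0.8957 and 0.8973). The size of the evaluation set is not given here, so this sketch says nothing about whether the difference is statistically meaningful.

```python
acc_3class = 0.8957   # from the 3-class checkpoint comment
acc_ordinal = 0.8973  # from the ordinal checkpoint comment

# Absolute gap in percentage points.
diff_pp = (acc_ordinal - acc_3class) * 100

# Relative reduction in error rate when moving to the ordinal model.
err_3class = 1 - acc_3class
err_ordinal = 1 - acc_ordinal
relative_error_reduction = (err_3class - err_ordinal) / err_3class

print(f'gap: {diff_pp:.2f} percentage points')
print(f'relative error reduction: {relative_error_reduction:.1%}')
```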

In [2]:
checkpoint_3class = '3class_model_best_checkpoint.safetensors' # accuracy for the 3class model is .8957
clf_3class = PretrainedModelForUnorderedSequenceClassification(device_map=device_map, checkpoint=checkpoint_3class)

Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-xsmall and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [3]:
checkpoint_ordinal = 'ordinal_model_best_checkpoint.safetensors' # accuracy for the ordinal model is .8973
clf_ordinal = PretrainedModelForOrdinalSequenceClassification(device_map=device_map, checkpoint=checkpoint_ordinal)



### Processing an article

Use the `classify_article` method with a title and a body. The model requires both; if an article has no title, pass the empty string `''`. Articles are truncated to a maximum of 1,500 tokens.
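
When titles come from a DataFrame, a missing title usually shows up as `None` or `NaN` rather than `''`. A minimal guard might look like the sketch below; `normalize_title` is a hypothetical helper and not part of `news_clf`.

```python
import math


def normalize_title(title):
    """Return a string title, mapping None/NaN to the empty string."""
    if title is None:
        return ''
    if isinstance(title, float) and math.isnan(title):
        return ''
    return str(title).strip()


print(repr(normalize_title(None)))
print(repr(normalize_title(float('nan'))))
print(repr(normalize_title('  Some headline ')))

# Intended use with the classifiers loaded in this notebook (not executed here):
# clf_ordinal.classify_article(normalize_title(example.title), example.body)
```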

In [4]:
import pandas as pd

# load in the training data
df_framing_annotations = pd.read_csv('Dataset-framing_annotations-Llama-3.3-70B-Instruct-Turbo.csv')

#### A neutral article:

In [5]:
# select an article to run
from news_clf.text_utils import shorten_to_n_words
import numpy as np

target = 'NEUTRAL'
i = np.random.choice(np.where(df_framing_annotations.FRAMING_CLASS == target)[0])
example = df_framing_annotations.iloc[i]

print(example.source)
print(example.title)
print(shorten_to_n_words(example.body, 250))

print('\nAnnotation:', example.FRAMING_CLASS)
print('\n3-class softmax model:')
print(clf_3class.classify_article(example.title, example.body))

print('Ordinal model:')
print(clf_ordinal.classify_article(example.title, example.body))

{'uri': 'economictimes.indiatimes.com', 'dataType': 'news', 'title': 'Economic Times'}
Brigitte Macron net worth: From Chocolate Heiress to France's First Lady
Brigitte Macron, France's First Lady, has captured public attention due to a viral video and her unique relationship with President Emmanuel Macron. Hailing from the Trogneux family, renowned for their chocolate business, she accumulated wealth through inheritance and real estate, including the valuable Villa Monéjan.A viral video of France's First Lady, Brigitte Macron, appearing to gently shove President Emmanuel Macron before stepping off a presidential plane has rekindled public fascination with their relationship. While Macron dismissed the moment as playful -- "I was bickering, or rather joking, with my wife" -- it reignited curiosity about the couple's dynamic, their 25-year age gap, and Brigitte herself. So, who is the woman beside France's president -- and how much is she worth?

Born Brigitte Marie-Claude Trogneux on A
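
The `source` value printed at the top of the output above is read from the CSV as a string that looks like a Python dict literal. Assuming that format holds for every row, `ast.literal_eval` from the standard library can parse it without executing arbitrary code. The helper name `parse_source` is hypothetical.

```python
import ast


def parse_source(source):
    """Parse a dict-literal string such as the `source` column into a dict."""
    parsed = ast.literal_eval(source)
    if not isinstance(parsed, dict):
        raise ValueError(f'expected a dict literal, got {type(parsed).__name__}')
    return parsed


# The string printed for the example article above.
raw = ("{'uri': 'economictimes.indiatimes.com', "
       "'dataType': 'news', 'title': 'Economic Times'}")
source = parse_source(raw)
print(source['title'], '-', source['uri'])
```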

#### A loaded article:

In [6]:
target = 'LOADED'
i = np.random.choice(np.where(df_framing_annotations.FRAMING_CLASS == target)[0])
example = df_framing_annotations.iloc[i]

print(example.source)
print(example.title)
print(shorten_to_n_words(example.body, 250))

print('\nAnnotation:', example.FRAMING_CLASS)
print('\n3-class softmax model:')
print(clf_3class.classify_article(example.title, example.body))

print('Ordinal model:')
print(clf_ordinal.classify_article(example.title, example.body))

{'uri': 'gadgets360.com', 'dataType': 'news', 'title': 'NDTV Gadgets 360'}
AI Researchers Secretly Used Reddit to Test Chatbot Persuasion
Subreddit moderators denounce study as unethical, deceptive, and unauthor

In a covert experiment now sparking legal threats, researchers from the University of Zurich deployed artificial intelligence (AI) bots to test how effectively they could sway opinions on Reddit -- all without user consent. The bots infiltrated the subreddit r/ChangeMyView, which has nearly four million members and exists to facilitate civil debates on controversial topics. Over time, these AI agents posted more than 1,700 comments while posing as real users, ranging from a male rape survivor minimising trauma to a black man criticising Black Lives Matter. None of the subreddit users were told the posts were created by artificial intelligence.

As per a 404 Media report, the study's findings were not announced until after the experiment had concluded; researchers and moderator

#### An alarmist article:

In [7]:
target = 'ALARMIST'
i = np.random.choice(np.where(df_framing_annotations.FRAMING_CLASS == target)[0])
example = df_framing_annotations.iloc[i]

print(example.source)
print(example.title)
print(shorten_to_n_words(example.body, 250))

print('\nAnnotation:', example.FRAMING_CLASS)
print('\n3-class softmax model:')
print(clf_3class.classify_article(example.title, example.body))

print('Ordinal model:')
print(clf_ordinal.classify_article(example.title, example.body))

{'uri': 'dnyuz.com', 'dataType': 'news', 'title': 'DNyuz'}
Fight Like Our Democracy Depends on It
The first 100 days of President Trump's second term have done more damage to American democracy than anything else since the demise of Reconstruction. Mr. Trump is attempting to create a presidency unconstrained by Congress or the courts, in which he and his appointees can override written law when they want to. It is precisely the autocratic approach that this nation's founders sought to prevent when writing the Constitution.

Mr. Trump has the potential to do far more harm in the remainder of his term. If he continues down this path and Congress and the courts fail to stop him, it could fundamentally alter the character of American government. Future presidents, seeking to either continue or undo his policies, will be tempted to pursue a similarly unbound approach, in which they use the powers of the federal government to silence critics and reward allies.

It pains us to write these wor
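
The three cells above repeat the same printing logic. One possible refactoring is sketched below, assuming only that each classifier exposes `classify_article(title, body)` and returns something printable. `describe_example` and the stand-in classifier are hypothetical and exist only so the sketch runs without the model files.

```python
from types import SimpleNamespace


def describe_example(example, classifiers, shorten=None):
    """Build the report printed by the cells above for one article row.

    `example` needs .source, .title, .body and .FRAMING_CLASS attributes
    (a pandas row works); `classifiers` maps a label to a classifier object.
    """
    body = shorten(example.body) if shorten is not None else example.body
    lines = [str(example.source), str(example.title), str(body), '',
             f'Annotation: {example.FRAMING_CLASS}']
    for label, clf in classifiers.items():
        lines.append(f'{label}:')
        # The classifier always receives the full body, as in the cells above.
        lines.append(str(clf.classify_article(example.title, example.body)))
    return '\n'.join(lines)


class StandInClassifier:
    """Placeholder that mimics the classify_article call signature."""

    def classify_article(self, title, body):
        return {'title_len': len(title), 'body_len': len(body)}


demo = SimpleNamespace(source='demo source', title='Demo title',
                       body='one two three', FRAMING_CLASS='NEUTRAL')
print(describe_example(demo, {'Stand-in model': StandInClassifier()}))

# With the real objects from this notebook (not executed here):
# print(describe_example(
#     example,
#     {'3-class softmax model': clf_3class, 'Ordinal model': clf_ordinal},
#     shorten=lambda text: shorten_to_n_words(text, 250)))
```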

#### Articles on a single topic (tariffs):

In [12]:
tariffs_articles = df_framing_annotations[df_framing_annotations.concept == 'tariffs']
tariffs_articles.head()

| Unnamed: 0 | concept | source | dateTimePub | title | body | sentiment | url | FRAMING_CLASS | ANSWER_REFUSAL | PROMPT_NUM_TOKENS |
|---|---|---|---|---|---|---|---|---|---|---|
| 102956 | tariffs | {'uri': 'economictimes.indiatimes.com', 'dataT... | 2025-04-30T16:42:17Z | Trump's tariffs have launched global trade war... | Donald Trump's sweeping new tariffs have reign... | -0.262745 | https://economictimes.indiatimes.com/tech/tech... | NEUTRAL |  | 10580 |
| 102957 | tariffs | {'uri': 'idahostatejournal.com', 'dataType': '... | 2025-04-30T13:48:01Z | Trump's tariffs have launched global trade war... | NEW YORK (AP) --\n\nLong-threatened tariffs fr... | -0.223529 | https://www.idahostatejournal.com/news/nationa... | LOADED |  | 10614 |
| 102958 | tariffs | {'uri': 'wral.com', 'dataType': 'news', 'title... | 2025-04-30T13:48:01Z | Trump's tariffs have launched global trade war... | Long-threatened tariffs from U.S. President Do... | -0.223529 | https://www.wral.com/story/trumps-tariffs-have... | LOADED |  | 10581 |
| 102959 | tariffs | {'uri': 'cnet.com', 'dataType': 'news', 'title... | 2025-05-20T19:17:02Z | Tariffs Explained: I Have Everything You Need ... | Thomas is a native of upstate New York and a g... | -0.003922 | https://www.cnet.com/personal-finance/tariffs-... | LOADED |  | 10439 |
| 102960 | tariffs | {'uri': 'cnet.com', 'dataType': 'news', 'title... | 2025-05-20T10:00:00Z | Tariffs Explained: Price Hikes Loom as Trump C... | Thomas is a native of upstate New York and a g... | -0.05098 | https://www.cnet.com/personal-finance/tariffs-... | LOADED |  | 10439 |


In [14]:
i = np.random.choice(np.where(df_framing_annotations.concept == 'tariffs')[0])
example = df_framing_annotations.iloc[i]

print(example.source)
print(example.title)
print(shorten_to_n_words(example.body, 250))

print('\nAnnotation:', example.FRAMING_CLASS)
print('\n3-class softmax model:')
print(clf_3class.classify_article(example.title, example.body))

print('Ordinal model:')
print(clf_ordinal.classify_article(example.title, example.body))

{'uri': 'edition.cnn.com', 'dataType': 'news', 'title': 'CNN International'}
Here's what will get more expensive at Walmart because of tariffs
Time is running out for Walmart shoppers to avoid higher prices.

The retail giant on Thursday warned that its products will become more expensive due to President Donald Trump's tariffs being "too high."

"We will do our best to keep our prices as low as possible. But given the magnitude of the tariffs, even at the reduced levels announced this week, we aren't able to absorb all the pressure given the reality of narrow retail margins," Walmart CEO Douglas McMillon said in an earnings call.

The changes will likely take effect by the end of May, and prices will increase "much more" in June, Walmart's finance chief, John David Rainey, told CNBC.

Walmart, which has over 4,600 stores in the United States, gets merchandise from Canada, China, India, Mexico and Vietnam, among other nations. Those countries face at least 10% in tariffs, and imports o
