<a href="https://colab.research.google.com/github/ExCaLBBR/ExCaLBBR_Projects/blob/main/SocioenvironmentalGeometry/articlePoliticalClassifier/Measuring_Political_Bias.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [6]:
# @title Install Dependencies
!pip install beautifulsoup4 requests --quiet
!pip install --upgrade transformers --quiet
from bs4 import BeautifulSoup
import requests
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [7]:
# @title Define Utility functions

# Scrape the paragraph text from an article
def webscrapingText(url):
  # Fetch the page; fail early on a timeout or an HTTP error status
  response = requests.get(url, timeout=30)
  response.raise_for_status()
  soup = BeautifulSoup(response.content, "html.parser")

  # Concatenate the text of every <p> element into one string
  article_text = "".join(paragraph.get_text() for paragraph in soup.find_all("p"))

  return article_text


# Round each probability to three decimal places (modifies the list in place)
def roundPercentage(percentages):
  for i in range(len(percentages)):
    percentages[i] = round(percentages[i], 3)
  return percentages


# BERT model from https://huggingface.co/bucketresearch/politicalBiasBERT
def bertModel(article_text):
  tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
  model = AutoModelForSequenceClassification.from_pretrained("bucketresearch/politicalBiasBERT")

  # The model accepts at most 512 tokens (not characters), so truncation=True
  # means only the first 512 tokens of the article are analyzed
  inputs = tokenizer(article_text, return_tensors="pt", max_length=512, truncation=True)

  # Inference only: no labels are needed, and no_grad() skips gradient tracking
  with torch.no_grad():
    logits = model(**inputs).logits

  # [0] -> left
  # [1] -> center
  # [2] -> right

  # Softmax turns the three logits into probabilities that sum to 1
  percentages = logits.softmax(dim=-1)[0].tolist()

  return percentages

# Format the probabilities and report the most likely class
def printResults(percentages):
  left = percentages[0]
  center = percentages[1]
  right = percentages[2]

  greatest = max(percentages)

  if greatest == left:
    bias = "Left"
  elif greatest == center:
    bias = "Center"
  else:
    bias = "Right"

  results = f"Left = {left}, Center = {center}, Right = {right}. "

  results += f"This text is {bias}-leaning!"

  return results
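
Because `bertModel` truncates its input, only the opening of a long article is ever classified. One possible workaround, shown here as a minimal sketch and not as part of the original notebook, is to split the token ids into overlapping windows, classify each window, and average the per-window probabilities. The helper names `chunk_tokens` and `average_probabilities` and the stride of 256 are assumptions made for illustration. A real BERT input also needs room for the special `[CLS]` and `[SEP]` tokens, so the usable window would be slightly smaller than 512.

```python
from typing import List, Sequence


def chunk_tokens(token_ids: Sequence[int], max_len: int = 512,
                 stride: int = 256) -> List[List[int]]:
    """Split token ids into overlapping windows of at most max_len tokens.

    Hypothetical helper: consecutive windows start `stride` tokens apart.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return chunks


def average_probabilities(per_chunk: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities column-wise across all chunks."""
    n = len(per_chunk)
    return [sum(column) / n for column in zip(*per_chunk)]


# Illustration with made-up token ids and made-up per-chunk probabilities.
windows = chunk_tokens(list(range(1000)))
print([len(w) for w in windows])
print(average_probabilities([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

Averaging treats every window equally; weighting windows by length or taking a majority vote are equally plausible choices.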

In [8]:
#Print Article
url = "https://www.cnn.com/2023/09/27/politics/trump-skipping-debate-republicans-2024/index.html"
print(webscrapingText(url))


Donald Trump will skip another Republican presidential debate on Wednesday night because no one will punish him for not being there. 
  
      No other Republican front-runner could so contemptuously snub his party’s second on-stage forum and do his own thing – in this case, a speech about the autoworkers dispute in Detroit as he cranks up a general election campaign months before the first primary votes are cast. 
  
      While getting away with it is the ex-president’s quintessential political skill, his talent for evading consequences is facing a grave challenge in another sphere – the courts. A New York judge on Tuesday underscored the growing threat to Trump from his mountain of legal challenges, ruling in a civil case that the ex-president and his adult sons were liable for fraud. The judgment, which poses a severe threat to the future of the Trump Organization, comes ahead of the ex-president’s four criminal trials in other matters. 
  
      Trump cannot control his legal fat

In [9]:

#Run Classification
print(bertModel(webscrapingText(url)))

[0.9910492897033691, 0.005068385042250156, 0.0038823040667921305]
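
The three numbers printed above are, in order, the Left, Center, and Right probabilities, obtained by applying softmax to the model's raw logits. As a rough illustration of that step, the sketch below implements softmax in plain Python. The logits in it are made up for demonstration and are not the model's actual outputs.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum first so math.exp never overflows.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for illustration only (not taken from the model).
example_logits = [4.0, -1.0, -1.5]
probabilities = softmax(example_logits)
print([round(p, 3) for p in probabilities])
```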


In [10]:
#Run Classification + round probability
print(roundPercentage(bertModel(webscrapingText(url))))

[0.991, 0.005, 0.004]


In [None]:
# Classify several articles, including one from a different outlet
diffUrl = "https://www.cnn.com/2023/10/24/politics/house-republicans-speaker-nominee/index.html"
foxnewsUrl = "https://www.foxnews.com/politics/biden-busts-century-old-tradition-wont-place-name-new-hampshires-presidential-primary-ballot"
print(printResults(roundPercentage(bertModel(webscrapingText(url)))))
print(printResults(roundPercentage(bertModel(webscrapingText(diffUrl)))))
print(printResults(roundPercentage(bertModel(webscrapingText(foxnewsUrl)))))

Left = 0.991, Center = 0.005, Right = 0.004. This text is Left-leaning!
Left = 0.988, Center = 0.006, Right = 0.006. This text is Left-leaning!
Left = 0.421, Center = 0.052, Right = 0.527. This text is Right-leaning!
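
The first two results are near-unanimous, while the third splits 0.421 to 0.527 between Left and Right, so the top label alone hides how close the call was. A minimal sketch of one way to surface that is to report the gap between the two highest probabilities alongside the label. The function `top_label_with_margin` is a hypothetical helper, not part of the notebook.

```python
from typing import Sequence, Tuple

LABELS = ("Left", "Center", "Right")  # same index order as printResults


def top_label_with_margin(probs: Sequence[float],
                          labels: Sequence[str] = LABELS) -> Tuple[str, float]:
    """Return the most likely label and its lead over the runner-up."""
    # Sort (probability, label) pairs from most to least likely.
    ranked = sorted(zip(probs, labels), reverse=True)
    (best_p, best_label), (second_p, _) = ranked[0], ranked[1]
    return best_label, round(best_p - second_p, 3)


# The three probability triples printed above.
for probs in ([0.991, 0.005, 0.004],
              [0.988, 0.006, 0.006],
              [0.421, 0.052, 0.527]):
    print(top_label_with_margin(probs))
```

A small margin suggests the prediction should be read with more caution than the label alone implies.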
