In [None]:
!pip install transformers
!pip install accelerate
!pip install --upgrade gdown

Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.me

In [None]:
import gdown

In [None]:
#News Dataset
!gdown https://drive.google.com/uc?id=111ttZguhf8KptNLylJfuy0pya_SEl9iU

Downloading...
From: https://drive.google.com/uc?id=111ttZguhf8KptNLylJfuy0pya_SEl9iU
To: /content/combined_300_set.csv
  0% 0.00/5.08M [00:00<?, ?B/s]100% 5.08M/5.08M [00:00<00:00, 74.6MB/s]


In [None]:
#Reddit Dataset
!gdown https://drive.google.com/uc?id=1qj0Mfjwdzjj3JgOCg3b1LEWz8vLSfxAa

Downloading...
From: https://drive.google.com/uc?id=1qj0Mfjwdzjj3JgOCg3b1LEWz8vLSfxAa
To: /content/reddit_scrape_Jul22.csv
  0% 0.00/16.1M [00:00<?, ?B/s] 52% 8.39M/16.1M [00:00<00:00, 70.4MB/s]100% 16.1M/16.1M [00:00<00:00, 80.1MB/s]


In [None]:
#Reddit comments Gen Dataset
!gdown https://drive.google.com/uc?id=1fNTB8OaElewz_Gejc5bChbC-SwQX1aCZ

Downloading...
From: https://drive.google.com/uc?id=1fNTB8OaElewz_Gejc5bChbC-SwQX1aCZ
To: /content/reddit_gen_comments.csv
  0% 0.00/123k [00:00<?, ?B/s]100% 123k/123k [00:00<00:00, 10.5MB/s]


In [None]:
import pandas as pd
import numpy as np
import os
import re
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from transformers import TFDistilBertForSequenceClassification, DistilBertTokenizer
import tensorflow as tf

## Cleaning Function

In [None]:
from gensim.utils import tokenize
import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))

def clean_text(text):
    lemmatizer = WordNetLemmatizer()
    stemmer = PorterStemmer()
    tokens = list(tokenize(text))
    #res = ' '.join([stemmer.stem(t.lower()) for t in tokens if t.lower() not in stop_words])
    res = ' '.join([lemmatizer.lemmatize(t.lower()) for t in tokens if t.lower() not in stop_words])
    if len(res) == 0:
        return ' '
    else:
        return res

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


Stopwords line sets a list of common English stopwords, which are words like “the,” “is,” and “in” that don’t add significant meaning to the text for analysis.The clean_text function processes the text:

1. Tokenization: Splits the text into words.
2. Lemmatization: Converts words to their base form (e.g., "running" becomes "run").
3. Stopwords Removal: Excludes words from the stop words list.
Lowercasing: Converts all words to lowercase to ensure uniformity.


#I. Introduction

## 1.1 Project Goals



1.   **Dataset Creation and Comparison:** Develop two datasets to examine differences in political bias between published news articles and political message board posts.
2.   **Topic Modeling:** Use topic modeling techniques to analyze and compare the major topics discussed by news publishers and message board users.
3.   **Political Bias Classification:** Attempt to classify political bias (left, center, right) for published news articles.
4.   **Fine-tuning GPT Models:** Fine-tune GPT models to generate responses to prompts based on the training data from both published news articles and political message board posts.





## 1.2 Dataset



1.   **News Articles Dataset**: This dataset comprises articles from a range of established publishers, including 'The Gateway Pundit,' 'The Washington Free Beacon,' 'CNBC,' 'Reuters,' 'Wired,' 'The Intercept,' and 'The New Yorker.' Each article was assigned a bias classification (left, center, right) based on ratings from the AllSides platform, which evaluates and rates media bias to provide balanced perspectives.

* Title: The headline or title of the news article.
* Plaintext: The main content of the article, extracted in plain text format.
* Publishing Date: The date when the article was published, providing a temporal context to the content.
* Source: The URL from which the article was scraped, indicating the direct link to the original content.
* Publisher: The name of the news organization or publisher of the article (e.g., 'The Gateway Pundit', 'Reuters').
* Bias: A classification of the article's bias, labeled as 'Right,' 'Center,' or 'Left' based on the publisher's known political leaning.



2.   **Political Message Boards Dataset**: This dataset includes posts and comments from various political subreddits, such as 'socialism,' 'democrats,' 'DemocraticSocialism,' 'SocialDemocracy,' 'progressive,' 'alltheleft,' 'Liberal,' 'feminisms,' 'Communist,' 'RadicalFeminism,' 'Libertarian,' 'conservatives,' 'Capitalism,' 'republicans,' and 'anarchocapitalism.' Due to the complexity of efficiently labeling these subreddit comments for political bias, manual labeling was required, which proved to be labor-intensive. Despite significant efforts, labeling the comments did not yield meaningful classification performance. Consequently, the classification focus was shifted to the published news articles.

* Title: The title of the post.
* Score: The Reddit score (upvotes minus downvotes) of the post.
* Id: The unique identifier of the post.
* Subreddit: The name of the subreddit where the post was made (e.g., 'politics', 'Libertarian', 'progressive').
* URL: The URL linking directly to the post on Reddit.
* Num of Comments: The number of comments associated with the post.
* Text: The body text of the post (if available).
* Date Created: The UTC timestamp of when the post was created, converted into a datetime format.
* Comment_Text: The text of comments associated with each post.







## 1.3 Scrapers

###Published News Article Crawler
The Published News web crawler is designed to collect and classify news articles from various publishers with different political biases. It uses the fundus library to scrape articles based on pre-defined sources and filters. The key components and functionalities of the crawler are as follows:

1. Publisher Filtering: The script begins by defining lists of publishers categorized into political biases: right, center, and left. It filters these publishers using the filter_publishers function, which matches publisher names against predefined lists (e.g., right-leaning publishers like 'The Gateway Pundit' and 'The Washington Free Beacon').

2. Crawler Initialization: Three separate crawlers are initialized, each focusing on a different bias category (right, center, left). The crawlers use source types such as RSSFeed, NewsMap, and Sitemap to discover and retrieve articles from the specified publishers.

3. Article Collection: The crawler iterates over each bias category, scraping a maximum of 100 articles per category. Articles are filtered using the inverse(regex_filter("politic")) function to exclude non-political content. The script extracts the article's title, plaintext content, publishing date, source URL, and publisher information.

4. Error Handling: The crawler includes error handling to manage potential issues such as missing keys or unexpected errors during article processing. These errors are logged but do not halt the execution of the script.

5. Data Enrichment and Classification: The crawler enriches each article's data with additional metadata, including the source URL, publisher name, and the classified political bias based on the crawler (right, center, left).

6. Output: The collected articles are stored in a pandas DataFrame, which is then saved to a CSV file. This dataset serves as a structured collection of news articles categorized by political bias, ready for further analysis.

###Reddit Scraper
Reddit Scraper
The  Reddit scraper is designed to extract posts and comments from a variety of political subreddits. It utilizes the Reddit API to authenticate and fetch data, focusing on posts and their associated comments. The key components of the scraper are as follows:

1. Authentication: The script uses the Reddit API credentials (client ID, client secret, user agent, username, and password) to obtain an access token. This token is required for making authenticated requests to the Reddit API.

2. Data Collection: The scraper fetches data from specified subreddits by making HTTP GET requests to Reddit's API endpoints. It collects details like post titles, scores, subreddit names, URLs, the number of comments, post text, creation dates, and associated comments.

3. Error Handling and Rate Limiting: The scraper includes basic error handling to manage request failures. It also introduces a delay (time.sleep(0.6)) between requests to avoid hitting Reddit's rate limits.

4. Data Extraction and Storage: The data is extracted into a structured format, with each post and its comments represented as rows in a pandas DataFrame. Columns include post title, score, ID, subreddit, URL, number of comments, post text, creation date, and comment text.

5. Output: The final dataset is saved as a CSV file, containing information on posts and comments from the subreddits specified.

# II. Topic Extraction

###Article/Subreddit Topic Extraction
The goal of this process is to discover common themes or topics within the articles and ensure that our model performs fairly across different groups, like political biases, subreddits or sources.

1. Data Cleaning: We clean the article text by removing unnecessary words and converting words to their basic form (like changing "running" to "run"). This helps the model focus on the important parts of the text.

2. Text Transformation: We convert the cleaned text into a structured format (a matrix of word counts) so the machine learning model can analyze it. This transformation also captures patterns in word usage across different articles.

3. Topic Extraction: We use a technique called LDA (Latent Dirichlet Allocation) to find common themes or topics within the articles based on the words they contain.

4. Bias Prediction: We train a logistic regression model to predict the political bias of articles based on their text. After training, the model predicts the bias for new articles and we test its accuracy.

5. Fairness Evaluation: Finally, we assess how well the model performs for different groups (like left-leaning or right-leaning articles). This ensures the model is fair and performs equally well across different categories.

## 2.1 Publications

In [None]:
# Load the data
df_article_topic_extract = pd.read_csv('combined_300_set.csv')

# Display data
df_article_topic_extract

Unnamed: 0.1,Unnamed: 0,title,plaintext,publishing_date,source,publisher,bias
0,0,Working Families Party Nominates Kamala Harris...,The nomination gives the presumptive Democrati...,2024-07-26 18:15:50+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
1,1,Kamala Harris Is Ready for This Fight,"In a matter of days, Vice President Kamala Har...",2024-07-26 14:29:46+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
2,2,J.D. Vance’s Hatred of Cat Ladies Is Weirder a...,"Patriarchy, plutocracy, and ethnonationalism f...",2024-07-26 14:13:48+00:00,https://www.thenation.com/article/politics/jd-...,thenation,Left
3,3,What I Learned Covering Attorney General Kamal...,"Since her time as California attorney general,...",2024-07-26 09:00:00+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
4,4,The “Strange Charisma” of Kamala Harris,How the Vice-President quickly consolidated su...,2024-07-25 06:00:00-04:00,https://www.newyorker.com/podcast/political-sc...,newyorker,Left
...,...,...,...,...,...,...,...
895,895,Maryland Gov. Wes Moore raised nearly $4.6M fo...,Maryland Gov. Wes Moore raised nearly $4.6 mil...,2023-03-10 16:29:47-05:00,https://www.foxnews.com/politics/maryland-gov-...,foxnews,Right
896,896,West Virginia lawmakers approve hospital expan...,West Virginia hospitals seeking to improve or ...,2023-03-10 16:25:37-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right
897,897,America's Political Realignment Is Real,Column: The education divide could restore Tru...,2024-03-15 09:00:27+00:00,https://freebeacon.com/columns/americas-politi...,Unknown,Right
898,898,West Virginia senator who interrupted session ...,The West Virginia Senate on Friday removed a l...,2023-03-10 16:24:19-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right


### Create Count Vectorizer

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_article_topic_extract.plaintext, df_article_topic_extract.bias, random_state = 0, test_size = 0.3)

In [None]:
X_train.shape

(630,)

In [None]:
X_test.shape

(270,)

In [None]:
y_train.shape

(630,)

In [None]:
y_test.shape

(270,)

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

countVect = CountVectorizer(preprocessor=clean_text, ngram_range=(1,2))

In [None]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [None]:
countVect.fit(X_train)

X_train_mat = countVect.transform(X_train)
X_test_mat = countVect.transform(X_test)

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation as LDA
from pprint import pprint

### Extract topics per class, LDA MODEL



In [None]:
df_article_topic_extract['cleaned_text'] = df_article_topic_extract['plaintext'].apply(clean_text)

count_vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')

def extract_topics(df, class_column, n_topics=5):
    classes = df[class_column].unique()
    lda_model = LDA(n_components=n_topics, random_state=42)
    results = {}

    for class_name in classes:
        subset = df[df[class_column] == class_name]
        count_matrix = count_vectorizer.fit_transform(subset['cleaned_text'])

        lda_model.fit(count_matrix)

        topics = []
        for topic_idx, topic in enumerate(lda_model.components_):
            top_words = [count_vectorizer.get_feature_names_out()[i] for i in topic.argsort()[-10:]]
            topics.append((topic_idx, top_words))

        results[class_name] = topics

    return results


This function extracts topics for each class in a given column (class_column, like bias). It does the following:

1. LDA Model: Initializes an LDA model to identify n_topics topics from the text.
2. Class-based Subsetting: Iterates over each class (like different bias labels), subsets the data for that class, and fits the count vectorizer.
3. Topic Extraction: After fitting the LDA model, it extracts the top words for each topic based on the word importance.


### Extract and print topics


In [None]:
topics_per_class = extract_topics(df_article_topic_extract, 'bias')

for class_name, topics in topics_per_class.items():
    print(f"\n==== {class_name} Topics ====")
    for topic_idx, top_words in topics:
        print(f"Topic {topic_idx + 1}: {', '.join(top_words)}")


==== Left Topics ====
Topic 1: education, right, year, case, people, trump, school, student, said, vance
Topic 2: republican, harris, voter, democrat, president, democratic, state, party, trump, biden
Topic 3: campaign, rural, republican, party, kennedy, new, political, biden, president, trump
Topic 4: political, woman, campaign, american, percent, said, president, white, people, trump
Topic 5: policy, war, new, vote, gaza, election, people, state, biden, israel

==== Center Topics ====
Topic 1: public, government, state, country, political, year, like, say, people, said
Topic 2: rate, country, market, global, gold, russia, ukraine, year, said, china
Topic 3: party, state, democrat, house, senate, president, biden, said, republican, trump
Topic 4: year, country, right, government, woman, people, election, said, political, party
Topic 5: trump, million, campaign, said, company, political, kaplan, say, election, facebook

==== Right Topics ====
Topic 1: republican, house, senate, presid

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation as LDA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import classification_report
from pprint import pprint

#Prepare data for logistic regression
X = count_vectorizer.fit_transform(df_article_topic_extract['cleaned_text'])
y = df_article_topic_extract['bias']  # Change 'bias' to the relevant target column

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

This section applies logistic regression to predict the bias of articles:

1. Vectorization: The cleaned text is transformed into a vector format.
2. Splitting: The data is split into training and test sets.
3. Modeling: A logistic regression model is trained (fit) and used to predict (predict) the test data.

### Evaluate fairness


In [None]:
# Evaluate fairness
def fairness_study(actual_y, pred_y, group_data, group_name):
    unique_groups = group_data.unique()

    for group in unique_groups:
        print(f"\n==== {group_name}: {group} ====")

        # Subset data for this group
        mask = (group_data == group)
        actual_y_sub = actual_y[mask]
        pred_y_sub = pred_y[mask]

        # Confusion matrix
        conf_matrix = confusion_matrix(actual_y_sub, pred_y_sub)
        print(f"Confusion Matrix:\n{conf_matrix}")

        # Accuracy
        accuracy = accuracy_score(actual_y_sub, pred_y_sub)
        print(f"Accuracy: {accuracy:.2f}")

        # Classification report
        report = classification_report(actual_y_sub, pred_y_sub)
        print(f"Classification Report:\n{report}")


This function evaluates how well the model performs for different groups (e.g., by bias or by source). It calculates:

1. Confusion Matrix: Summarizes prediction results.
2. Accuracy: Measures overall prediction accuracy.
3. Classification Report: Provides precision, recall, and F1-score.

### Fairness Study

In [None]:
test_indices = df_article_topic_extract.index[df_article_topic_extract.index.isin(y_test.index)]

test_df = df_article_topic_extract.loc[test_indices]
# Example fairness study by bias
fairness_study(y_test, y_pred, test_df['bias'], 'bias')



==== bias: Left ====
Confusion Matrix:
[[ 0  0  0]
 [25 31 27]
 [ 0  0  0]]
Accuracy: 0.37
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00         0
        Left       1.00      0.37      0.54        83
       Right       0.00      0.00      0.00         0

    accuracy                           0.37        83
   macro avg       0.33      0.12      0.18        83
weighted avg       1.00      0.37      0.54        83


==== bias: Center ====
Confusion Matrix:
[[38 38 35]
 [ 0  0  0]
 [ 0  0  0]]
Accuracy: 0.34
Classification Report:
              precision    recall  f1-score   support

      Center       1.00      0.34      0.51       111
        Left       0.00      0.00      0.00         0
       Right       0.00      0.00      0.00         0

    accuracy                           0.34       111
   macro avg       0.33      0.11      0.17       111
weighted avg       1.00      0.34      0.51       111


==== 

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [None]:
# Example fairness study by source
fairness_study(y_test, y_pred, test_df['source'], 'source')


==== source: https://www.thenation.com/article/politics/jd-vance-cat-lady/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

        Left       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.thenation.com/article/politics/joe-biden-speech-farewell/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/supreme-court-pros/ ====
Confusion Matri

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Classification Report:
              precision    recall  f1-score   support

        Left       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.thenation.com/article/politics/trump-acceptance-speech-2024-analysis/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

        Left       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.thenation.com/article/politics/bernie-sanderss-interview-life-lessons/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

        Left       1.00      1.00      1.00      

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/starmer-uk-election-us-left-envy/ ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/biden-nato-press-conference/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
   

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/republicans-for-biden/ ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.newyorker.com/podcast/political-scene/the-great-democratic-party-freakout-of-2024 ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification 

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

        Left       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.thenation.com/article/politics/trump-biden-debate-disaster/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/jamaal-bowman-defeat-lessons/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.newyorker.com/news/daily-comment/the-politics-that-derailed-congestion-pricing-in-new-york ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

        Left       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.thenation.com/article/politics/this-machine-fights-fascism/ ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    reca

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/trump-rally-bronx/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/nikki-haley-voting-for-trump/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              pre

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/kristi-noem-killing-puppy-cruelty-gop/ ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/trump-biden-threats/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
      

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/taylor-swift-gop-attacks-biden/ ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/sleepy-trump-trial-drugs/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
        

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

        Left       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.thenation.com/article/politics/nj-ballot-line/ ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.newyorker.com/podcast/political-scene/randall-kennedy-on-harvard-protests-antisemitism-and-the-meaning-of-free-speech ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification R

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thenation.com/article/politics/nevada-democrats-2024/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.newyorker.com/podcast/political-scene/trumps-bonkers-immunity-claim-with-neal-katyal ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Cla

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
        Left       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/maduro-falsely-labels-political-opposition-in-venezuela-as-fascist-threat-/7711522.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.wired.com/story/fortnite-has-a-political-violence-problem/ ====
Confusion Matrix:
[[0 1]
 [0 

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.wired.com/story/why-and-how-wired-is-covering-politics/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/un-libya-remains-mired-in-crisis-as-political-leaders-violate-human-rights-to-cling-to-power-/7692436.html ===

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.cnbc.com/2024/06/19/golden-goose-postpones-milan-ipo-citing-political-turmoil-in-europe.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.wired.com/story/signal-politics-software-criticism/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/police-and-military-seen-gaining-power-amid-vietnamese-political-upheaval/7654082.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.wired.com/story/twitter-virality-politics-change/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accura

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.wired.com/story/the-nightmare-politics-and-sticky-science-of-hacking-the-climate/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.wired.com/story/fandom-internet-culture-one-direction-politics-kaitlyn-tiffany/ ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Clas

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/turkish-court-hands-pro-kurdish-politicians-lengthy-sentences-over-deadly-protests/7616935.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.wired.com/story/dune-geopolitics-cybersecurity/ ====
Confusion Matrix:
[[1]]
Accuracy

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.cnbc.com/2024/03/15/3-tips-for-navigating-political-conversations-at-work-and-more.html ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.wired.com/story/british-flag-politics/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.cnbc.com/2024/02/21/hunter-biden-asks-judge-to-dismiss-tax-charges-arguing-that-prosecutors-bowed-to-political-pressure.html ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.wired.com/story/watch-dogs-legion-dystopia-politics-ubisoft/ ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   supp

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.voanews.com/a/historic-win-shatters-stereotypes-empowers-women-in-pakistani-politics/7539884.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/political-spat-brews-over-south-african-opposition-s-appeal-to-us-/7527521.html ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classificatio

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.cnbc.com/2023/11/13/former-uk-prime-minister-david-cameron-made-foreign-minister-in-surprise-political-comeback.html ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.cnbc.com/2023/11/12/growing-geopolitical-conflicts-have-some-investors-feeling-guilty-about-buying-defense-stocks.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/usa_us-politics_lawmakers-confirm-former-ambassador-us-spy-chief/6203481.html ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.voanews.com/a/usa_us-politics_us-senate-confirms-becerra-top-federal-health-official/6203480.html ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
 

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
       Right       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/usa_us-politics_biden-signs-coronavirus-relief-package/6203192.html ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      Center       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.voanews.com/a/africa_womens-participation-politics-growing-slowly-worldwide/6203140.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
           

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/usa_us-politics_trump-allies-show-fealty-former-president-golden-statue/6202617.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/usa_us-politics_fbi-monitoring-domestic-extremists-who-might-threaten-bidens-speec

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/usa_us-politics_democratic-senators-opposition-imperils-confirmation-biden-budget-pick/6202307.html ====
Confusion Matrix:
[[0 1]
 [0 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       1.0
        Left       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.voanews.com/a/usa_us-politics_bidens-immigration-reform-proposal-explained/620225

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

       Right       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.foxnews.com/politics/lindsey-graham-demands-fbis-christopher-wray-recant-testimony-says-its-clear-trump-hit-bullet ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

       Right       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.foxnews.com/politics/horrific-murder-american-child-ignites-travel-ban-effort-south-american-country ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precis

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/republicans-dominate-airwaves-harris-seeks-comeback-polls-dems-arent-worried ====
Confusion Matrix:
[[1]]
Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

       Right       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://www.foxnews.com/us/fbi-trump-questionnaire-exposes-divisive-partisan-politics-bureau-former-agent-says ====
Confusio

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/media/detroit-pastor-slams-identity-politics-kamala-harris-becomes-presumptive-democratic-nominee ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/biden-likely-keep-same-routine-accomplish-nothing-waning-months-

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

        Left       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/desantis-releases-graphic-video-showing-trans-surgeries-biden-calls-governors-policies-cruel ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/ted-cruz-asks-stanford-punish-students-who-heckled-trump-jud

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/twitter-explodes-after-former-biden-spox-praises-president-working-9-am ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/gop-controlled-wisconsin-assembly-vote-bill-prevent-ban-conversion-therapy ====
C

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

       Right       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1


==== source: https://freebeacon.com/author/stiles/politics/former-dem-spox-with-ties-to-clinton-crime-family-will-moderate-first-presidential-debate/ ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thegatewaypundit.com/2024/07/blame-game-politicos-point-charged-rhetoric-behind-trump/ ====
Confusion Matrix:
[[0 0]
 [1 0]]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/world/former-haiti-mayor-accused-torturing-killing-political-opponents-heads-court-boston ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/hunter-biden-subpoenaed-bank-america-records-opened-new-avenues-investig

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/us/georgia-police-arrest-brothers-allegedly-stabbed-man-arguing-mexican-politics-religion ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/biden-indefinitely-blocks-millions-acres-land-water-future-oil-drilling 

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.foxnews.com/politics/pence-warns-renewed-iran-deal-would-pave-path-nuclear-weapon-gold-regime ====
Confusion Matrix:
[[0 0]
 [1 0]]
Accuracy: 0.00
Classification Report:
              precision    recall  f1-score   support

      Center       0.00      0.00      0.00       0.0
       Right       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0


==== source: https://www.thegatewaypundit.com/2024/07/political-persecution-still-very-popular-brazils-bolsonaro-indicted/ ===

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

This conducts a fairness study for bias and source:

1. Fairness by Bias: Evaluates performance for different political biases.
2. Fairness by Source: Evaluates performance for different article sources.

## 2.2 Subreddits

This process involves analyzing Reddit posts, cleaning the text, and using machine learning to predict which subreddit a post belongs to. The steps include topic extraction using Latent Dirichlet Allocation (LDA) and training a logistic regression model to classify posts by subreddit. Finally, a fairness study is conducted to evaluate the model's performance across different subreddits.

The key difference from the previous project is that this one focuses on subreddit classification rather than political bias in news articles, but both processes share similar steps in text cleaning, model training, and fairness evaluation.

In [None]:
# Load the data
df_subreddit_topic_extract = pd.read_csv('reddit_scrape_Jul22.csv')

# Display data
df_subreddit_topic_extract= df_subreddit_topic_extract.dropna()

### Create Count Vectorizer

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

countVect = CountVectorizer(preprocessor=clean_text, ngram_range=(1,2))

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation as LDA
from pprint import pprint

### Extract topics per class, LDA MODEL



In [None]:
df_subreddit_topic_extract['cleaned_text'] = df_subreddit_topic_extract['Comment_Text'].apply(clean_text)


In [None]:
def extract_topics_by_subreddit(df, class_column, n_topics=5):
    subreddits = df[class_column].unique()
    count_vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
    lda_model = LDA(n_components=n_topics, random_state=42)
    results = {}

    for subreddit in subreddits:
        subset = df[df[class_column] == subreddit]
        count_matrix = count_vectorizer.fit_transform(subset['cleaned_text'])

        lda_model.fit(count_matrix)

        topics = []
        for topic_idx, topic in enumerate(lda_model.components_):
            top_words = [count_vectorizer.get_feature_names_out()[i] for i in topic.argsort()[-10:]]
            topics.append((topic_idx, top_words))

        results[subreddit] = topics

    return results

### Extract and print topics


In [None]:
topics_per_subreddit = extract_topics_by_subreddit(df_subreddit_topic_extract, 'Subreddit')

# Display the topics
for subreddit, topics in topics_per_subreddit.items():
    print(f"\n==== {subreddit} Topics ====")
    for topic_idx, top_words in topics:
        print(f"Topic {topic_idx + 1}: {', '.join(top_words)}")


==== politics Topics ====
Topic 1: comment, question, grothman, like, biden, attempt, assassination, election, day, dropping
Topic 2: click, politics, thread, reddit, megathread_president_biden_announces_that_he_will, www, http, com, sort, comment
Topic 3: attempt, assassination, day, question, election, like, comment, grothman, hosing, roof
Topic 4: http, www, rule, edit, like, ago, drop, biden, election, question
Topic 5: biden, ago, attempt, assassination, day, comment, question, election, grothman, like

==== democrats Topics ====
Topic 1: democrat, right, time, republican, like, people, president, vote, biden, trump
Topic 2: thing, say, year, like, kamala, medium, believe, trump, people, biden
Topic 3: contact, automatically, performed, concern, question, action, rule, message, bot, democrat
Topic 4: law, medium, like, abortion, white, amp, know, gun, ban, trump
Topic 5: need, harris, fact, let, joe, want, gay, fuck, trump, know

==== socialism Topics ====
Topic 1: communist, let

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation as LDA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import classification_report
from pprint import pprint

# Prepare data for logistic regression
X = countVect.fit_transform(df_subreddit_topic_extract['cleaned_text'])
y = df_subreddit_topic_extract['Subreddit']  # Using 'Subreddit' as the target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

### Evaluate fairness


In [None]:
def fairness_study(actual_y, pred_y, group_data, group_name):
    unique_groups = group_data.unique()

    for group in unique_groups:
        print(f"\n==== {group_name}: {group} ====")

        # Subset data for this group
        mask = (group_data == group)
        actual_y_sub = actual_y[mask]
        pred_y_sub = pred_y[mask]

        # Confusion matrix
        conf_matrix = confusion_matrix(actual_y_sub, pred_y_sub)
        print(f"Confusion Matrix:\n{conf_matrix}")

        # Accuracy
        accuracy = accuracy_score(actual_y_sub, pred_y_sub)
        print(f"Accuracy: {accuracy:.2f}")

        # Classification report
        report = classification_report(actual_y_sub, pred_y_sub)
        print(f"Classification Report:\n{report}")


### Fairness Study

In [None]:
# Perform fairness analysis on the subreddit data
test_indices = df_subreddit_topic_extract.index[df_subreddit_topic_extract.index.isin(y_test.index)]
test_df = df_subreddit_topic_extract.loc[test_indices]

# Fairness study by subreddit
fairness_study(y_test, y_pred, test_df['Subreddit'], 'Subreddit')



==== Subreddit: politics ====
Confusion Matrix:
[[0 0 0 0]
 [0 0 0 0]
 [1 3 0 1]
 [0 0 0 0]]
Accuracy: 0.00
Classification Report:
                     precision    recall  f1-score   support

PoliticalDiscussion       0.00      0.00      0.00       0.0
       changemyview       0.00      0.00      0.00       0.0
           politics       0.00      0.00      0.00       5.0
          socialism       0.00      0.00      0.00       0.0

           accuracy                           0.00       5.0
          macro avg       0.00      0.00      0.00       5.0
       weighted avg       0.00      0.00      0.00       5.0


==== Subreddit: democrats ====
Confusion Matrix:
[[ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 1  8  1  8  8 16 32  2  5]
 [ 0  0  0  0  0  0  0  0  0]]
Accuracy: 0.02
Classification Report:
                  

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_pr

# III. Classification Models

For this section, we aimed to create classification models to classify news articles according to political bias (Center, Left, and Right). In these attempts we created three types of models to capture a diverse approach to classification:


1.   Linear Model: Assume linear relationship betwee the dependent and independent variables
2.   Ensemble: Combine multiple classifiers to try and improve accuracy incrementally; assign higher weight to instances where there is misclassification
3. BERT: Deep learning model to capture the context of words in the articles, providing a well-adapted classification model

For the linear and ensemble models, we created instances for each that used both Count Vectorizer and TFIDF Vectorizer. For example, for well-polished articles where the inherent biases may be unclear, being able to emphasize unique words and phrases through TFIDF vectorizer may allow the model to better capture the differences in how news publications that lean towards a political bias word their reporting on hot-button issues.



## Linear: Support Vector Machines

We chose support vector machines as our supervised learning algorithm over other models such as logistic regression as we wanted to reduce the risk of both overfitting to and error on the data through the boundary of seperation. As SVM also requires a small number of samples in the dataset to create the support vectors, we thought it appropriate to use for our classification problem as we only scraped 900 news articles in which to build our models. We also saw the benefit of using grid search to find the best parameters of Gamma and, again, to avoide overfitting.

###  Import Data

In [None]:
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from gensim.utils import tokenize

# Load the data
df_svm = pd.read_csv('combined_300_set.csv')

# Display data
df_svm

Unnamed: 0.1,Unnamed: 0,title,plaintext,publishing_date,source,publisher,bias
0,0,Working Families Party Nominates Kamala Harris...,The nomination gives the presumptive Democrati...,2024-07-26 18:15:50+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
1,1,Kamala Harris Is Ready for This Fight,"In a matter of days, Vice President Kamala Har...",2024-07-26 14:29:46+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
2,2,J.D. Vance’s Hatred of Cat Ladies Is Weirder a...,"Patriarchy, plutocracy, and ethnonationalism f...",2024-07-26 14:13:48+00:00,https://www.thenation.com/article/politics/jd-...,thenation,Left
3,3,What I Learned Covering Attorney General Kamal...,"Since her time as California attorney general,...",2024-07-26 09:00:00+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
4,4,The “Strange Charisma” of Kamala Harris,How the Vice-President quickly consolidated su...,2024-07-25 06:00:00-04:00,https://www.newyorker.com/podcast/political-sc...,newyorker,Left
...,...,...,...,...,...,...,...
895,895,Maryland Gov. Wes Moore raised nearly $4.6M fo...,Maryland Gov. Wes Moore raised nearly $4.6 mil...,2023-03-10 16:29:47-05:00,https://www.foxnews.com/politics/maryland-gov-...,foxnews,Right
896,896,West Virginia lawmakers approve hospital expan...,West Virginia hospitals seeking to improve or ...,2023-03-10 16:25:37-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right
897,897,America's Political Realignment Is Real,Column: The education divide could restore Tru...,2024-03-15 09:00:27+00:00,https://freebeacon.com/columns/americas-politi...,Unknown,Right
898,898,West Virginia senator who interrupted session ...,The West Virginia Senate on Friday removed a l...,2023-03-10 16:24:19-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right


### Train Test Split

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_svm.plaintext, df_svm.bias, random_state = 42, test_size = 0.3, stratify=df_svm.bias)

In [None]:
X_train.shape

(630,)

In [None]:
X_test.shape

(270,)

In [None]:
y_train.shape

(630,)

In [None]:
y_test.shape

(270,)

In [None]:
y_train.value_counts()

Unnamed: 0_level_0,count
bias,Unnamed: 1_level_1
Right,210
Center,210
Left,210


In [None]:
y_test.value_counts()

Unnamed: 0_level_0,count
bias,Unnamed: 1_level_1
Center,90
Left,90
Right,90


### Create Tfidf Vectorizer

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

#Create tfidf vectorizer, take unigrams and bigrams
tfidf = TfidfVectorizer(preprocessor=clean_text, ngram_range=(1,3))

In [None]:
#Fit tfidf Vectorizer on the training set
tfidf.fit(X_train)

#Transform the training and test documents
X_train_mat_tfidf = tfidf.transform(X_train)
X_test_mat_tfidf = tfidf.transform(X_test)

In [None]:
X_train_mat_tfidf.shape

(630, 549049)

###  Create Count Vectorizer

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

#Create count vectorizer, take unigrams and bigrams
countVect = CountVectorizer(preprocessor=clean_text, ngram_range=(1,2))

In [None]:
#Fit Count Vectorizer on the training set
countVect.fit(X_train)

#Transform the training and test documents
X_train_mat_count_vect = countVect.transform(X_train)
X_test_mat_count_vect = countVect.transform(X_test)

In [None]:
X_train_mat_count_vect.shape

(630, 257600)

###  Create SVM Classifier

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

## Check to see what is happening with labels

svc = SVC(kernel='linear')

#Create parameters
param = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [0.1, 1, 10, 100]
}



#Perform Grid Search
#Tfidf
grid_search_tfidf = GridSearchCV(estimator=svc, param_grid=param, scoring='accuracy')

#Count Vectorizer
grid_search_count_vect = GridSearchCV(estimator=svc, param_grid=param, scoring='accuracy')


###  Fit to both sets of vectorized training data

In [None]:
# TFIDF
grid_search_tfidf = grid_search_tfidf.fit(X_train_mat_tfidf, y_train)

In [None]:
# Count Vectorizer
grid_search_count_vect = grid_search_count_vect.fit(X_train_mat_count_vect, y_train)

#### Print Best Parameters

Interestingly, we note that whilst the best gamma for both classifiers was the same at 0.1, the best C parameter for TFIDF was higher than that of the count vectorizer classifier, at 10 and 0.1 respectively. As such, the TFIDF SVM model is fitting to the training data more closely, easing up on regularization restraints. We can assume that the features are more distinct. With the smaller C and therefore higher regularization for Count Vect SVM model, we may expect that the classifcation scores will yield more missclassifications.

In [None]:
#Print best params TFIDF
print(grid_search_tfidf.best_params_)
print(grid_search_tfidf.best_score_)

{'C': 10, 'gamma': 0.1}
0.8746031746031745


In [None]:
#Print best params Count Vectorizer
print(grid_search_count_vect.best_params_)
print(grid_search_count_vect.best_score_)

{'C': 0.1, 'gamma': 0.1}
0.7968253968253968


#### Make Class Predictions

In [None]:
#Make class prediction TFIDF
y_pred_tfidf = grid_search_tfidf.predict(X_test_mat_tfidf)

In [None]:
#Make class prediction Count Vectorizer
y_pred_count_vect = grid_search_count_vect.predict(X_test_mat_count_vect)

####  Print Scores

We noticed that the SVM classifier that used TFIDF Vectorizer had both better f1-scores and accuracy scores than the classiffier that utulized count vectorizer. We believe that this is due to TFIDF vectorizer reducing the weight of common/more frequent words in the corpus. Given that our news articles are very polished and readers therefore have to look a bit closer to see which political bias the publication leads, focusing on the more discriminative words is important to the classification problem. The score for the SVM classifier that used count vectorizer was still respectable however. The larger value C for the model using count vectorizer seems to support the higher number of misclassifications when compared to the TFIDF SVM model.

In [None]:
#Get scores TFIDF
from sklearn.metrics import classification_report
print('TFIDF')
print(classification_report(y_test, y_pred_tfidf))

TFIDF
              precision    recall  f1-score   support

      Center       0.85      0.87      0.86        90
        Left       0.81      0.91      0.86        90
       Right       0.87      0.74      0.80        90

    accuracy                           0.84       270
   macro avg       0.84      0.84      0.84       270
weighted avg       0.84      0.84      0.84       270



In [None]:
#Get scores Count Vect
from sklearn.metrics import classification_report
print('Count Vectorizer')
print(classification_report(y_test, y_pred_count_vect))

Count Vectorizer
              precision    recall  f1-score   support

      Center       0.88      0.83      0.86        90
        Left       0.75      0.77      0.76        90
       Right       0.77      0.80      0.79        90

    accuracy                           0.80       270
   macro avg       0.80      0.80      0.80       270
weighted avg       0.80      0.80      0.80       270



## Esemble: ADA Boost

In [None]:
# Load the data
df_ada = pd.read_csv('combined_300_set.csv')

# Display data
df_ada

Unnamed: 0.1,Unnamed: 0,title,plaintext,publishing_date,source,publisher,bias
0,0,Working Families Party Nominates Kamala Harris...,The nomination gives the presumptive Democrati...,2024-07-26 18:15:50+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
1,1,Kamala Harris Is Ready for This Fight,"In a matter of days, Vice President Kamala Har...",2024-07-26 14:29:46+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
2,2,J.D. Vance’s Hatred of Cat Ladies Is Weirder a...,"Patriarchy, plutocracy, and ethnonationalism f...",2024-07-26 14:13:48+00:00,https://www.thenation.com/article/politics/jd-...,thenation,Left
3,3,What I Learned Covering Attorney General Kamal...,"Since her time as California attorney general,...",2024-07-26 09:00:00+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
4,4,The “Strange Charisma” of Kamala Harris,How the Vice-President quickly consolidated su...,2024-07-25 06:00:00-04:00,https://www.newyorker.com/podcast/political-sc...,newyorker,Left
...,...,...,...,...,...,...,...
895,895,Maryland Gov. Wes Moore raised nearly $4.6M fo...,Maryland Gov. Wes Moore raised nearly $4.6 mil...,2023-03-10 16:29:47-05:00,https://www.foxnews.com/politics/maryland-gov-...,foxnews,Right
896,896,West Virginia lawmakers approve hospital expan...,West Virginia hospitals seeking to improve or ...,2023-03-10 16:25:37-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right
897,897,America's Political Realignment Is Real,Column: The education divide could restore Tru...,2024-03-15 09:00:27+00:00,https://freebeacon.com/columns/americas-politi...,Unknown,Right
898,898,West Virginia senator who interrupted session ...,The West Virginia Senate on Friday removed a l...,2023-03-10 16:24:19-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right


### Create Count Vectorizer

In [None]:
#Convert labels to numerical values for the XGBoost
bias_mapping = {'Center': 0, 'Left': 1, 'Right': 2}
y= df_ada['bias']
y = y.map(bias_mapping)

In [None]:
y

Unnamed: 0,bias
0,1
1,1
2,1
3,1
4,1
...,...
895,2
896,2
897,2
898,2


In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_ada.plaintext, y, random_state = 0, test_size = 0.3)

In [None]:
X_train.shape

(630,)

In [None]:
X_test.shape

(270,)

In [None]:
y_train.shape

(630,)

In [None]:
y_test.shape

(270,)

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

#Create count vectorizer, take unigrams and bigrams
countVect = CountVectorizer(preprocessor=clean_text, ngram_range=(1,2))

tfidf_ada = TfidfVectorizer(preprocessor=clean_text, ngram_range=(1,2))

In [None]:
#Fit Count Vectorizer and TFIDF Vectorizer on the training set
countVect.fit(X_train)
tfidf_ada.fit(X_train)


#Transform the training and test documents
X_train_mat_cv = countVect.transform(X_train)
X_test_mat_cv = countVect.transform(X_test)

X_train_mat_tfidf = tfidf_ada.transform(X_train)
X_test_mat_tfidf = tfidf_ada.transform(X_test)

###  Create ADA Boost Classifier

In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report


In [None]:
# Create an instance of the AdaBoostClassifier
ada = AdaBoostClassifier(random_state=42)

# Create parameters
param = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [1.0, 0.1, 0.01]
}

# Grid Search
grid_search_ada = GridSearchCV(estimator=ada, param_grid=param, scoring='accuracy', cv=5, verbose=1, n_jobs=-1)

# Fit to training data
grid_search_ada_cv = grid_search_ada.fit(X_train_mat_cv, y_train)
grid_search_ada_tfidf = grid_search_ada.fit(X_train_mat_tfidf, y_train)


Fitting 5 folds for each of 9 candidates, totalling 45 fits


  pid = os.fork()
  pid = os.fork()


Fitting 5 folds for each of 9 candidates, totalling 45 fits


In [None]:
#Print best params
print('Count Vectorizer')
print(grid_search_ada_cv.best_params_)
print(grid_search_ada_cv.best_score_)

print('TFIDF Vectorizer')
print(grid_search_ada_tfidf.best_params_)
print(grid_search_ada_tfidf.best_score_)

Count Vectorizer
{'learning_rate': 0.1, 'n_estimators': 200}
0.7365079365079366
TFIDF Vectorizer
{'learning_rate': 0.1, 'n_estimators': 200}
0.7365079365079366


In [None]:
#Make class prediction
y_pred_cv = grid_search_ada_cv.predict(X_test_mat_cv)
y_pred_tfidf = grid_search_ada_tfidf.predict(X_test_mat_tfidf)


In [None]:
#Get scores
print('Scores Count Vect')
print(classification_report(y_test, y_pred_cv))

Scores Count Vect
              precision    recall  f1-score   support

           0       0.66      0.88      0.75        90
           1       0.87      0.71      0.78        86
           2       0.84      0.71      0.77        94

    accuracy                           0.77       270
   macro avg       0.79      0.77      0.77       270
weighted avg       0.79      0.77      0.77       270



In [None]:
print('Scores TFIDF')
print(classification_report(y_test, y_pred_tfidf))

Scores TFIDF
              precision    recall  f1-score   support

           0       0.69      0.87      0.77        90
           1       0.87      0.78      0.82        86
           2       0.89      0.76      0.82        94

    accuracy                           0.80       270
   macro avg       0.82      0.80      0.80       270
weighted avg       0.82      0.80      0.80       270



## 3.3 BERT

**Model Description:** The BERT-based model utilizes DistilBERT to classify news articles into left, center, and right political biases. Minimal text cleaning is applied to preserve semantic integrity, ensuring optimal model performance by only removing URLs and non-alphanumeric characters. The dataset is split into training, validation, and test sets with stratified sampling to maintain balanced class distributions, enhancing model robustness against class imbalance. The architecture includes an additional dropout layer to prevent overfitting, followed by a softmax layer for classification. The training process uses an adaptive learning rate schedule, a batch size of 5, and spans 6 epochs, ensuring efficient convergence. This approach leverages DistilBERT's ability to capture nuanced text semantics for effective political bias classification.

### Load Dataset, Clean, & Stratify Test/Train

In [None]:
df_bert = pd.read_csv('combined_300_set.csv',index_col=0)

def clean_text_bert(text):
    text = re.sub(r"http\S+", "", text)  # Remove URLs starting with http/https
    text = re.sub(r"www\S+", "", text)   # Remove URLs starting with www
    text = re.sub(r"\S+\.com\S*", "", text)  # Remove strings containing ".com"
    text = re.sub(r"\S+\.org\S*", "", text)  # Remove strings containing ".org"
    text = re.sub(r"[^a-zA-Z\s]", "", text)  # Remove non-alphanumeric characters except spaces
    text = re.sub(r'\n', " ", text)
    text = ''.join([(t.lower()) for t in text])
    if len(text) == 0:
        return ' '
    else:
        return text

def print_class_distribution(labels, dataset_name):
    class_distribution = np.sum(labels == np.arange(len(label_encoder.classes_)).reshape(-1, 1), axis=1)
    class_names = label_encoder.classes_
    print(f"\nClass distribution in {dataset_name}:")
    for class_name, count in zip(class_names, class_distribution):
        print(f"{class_name}: {count}")


Chose a less-intensive cleaning approach than "clean_text" to take advantage of bert's semantic/context sensitivity.

In [None]:
corpus = [clean_text_bert(doc) for doc in df_bert['plaintext'].values]

label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(df_bert['bias'].values)

train_corpus, temp_corpus, train_labels, temp_labels = train_test_split(
    corpus, labels, test_size=0.4, stratify=labels
)

val_corpus, test_corpus, val_labels, test_labels = train_test_split(
    temp_corpus, temp_labels, test_size=0.5, stratify=temp_labels
)

# Verify distribution
print_class_distribution(train_labels, "Training set")
print_class_distribution(val_labels, "Validation set")
print_class_distribution(test_labels, "Test set")


Class distribution in Training set:
Center: 180
Left: 180
Right: 180

Class distribution in Validation set:
Center: 60
Left: 60
Right: 60

Class distribution in Test set:
Center: 60
Left: 60
Right: 60


### Build Bert Model w/ Dropout

In [None]:
from transformers import TFDistilBertForSequenceClassification, DistilBertTokenizer
import tensorflow as tf

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

class CustomDistilBertForSequenceClassification(tf.keras.Model):
    def __init__(self, num_labels):
        super(CustomDistilBertForSequenceClassification, self).__init__()
        self.distilbert = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=num_labels)
        self.dropout = tf.keras.layers.Dropout(0.25)
        self.classifier = tf.keras.layers.Dense(num_labels, activation='softmax')

    def call(self, inputs, training=False):
        # Get the output from DistilBERT
        distilbert_output = self.distilbert(inputs, training=training)
        hidden_state = distilbert_output[0]

        # Apply the additional dropout layer
        dropout_output = self.dropout(hidden_state, training=training)

        # Final classification layer
        logits = self.classifier(dropout_output)

        return logits


num_labels = 3
model = CustomDistilBertForSequenceClassification(num_labels=num_labels)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFDistilBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']
You should 

In [None]:
# training data
train_inputs = tokenizer(
    train_corpus,
    max_length=128,
    padding=True,
    truncation=True,
    return_tensors="tf"
)
train_input_ids = tf.convert_to_tensor(train_inputs['input_ids'])
train_attention_mask = tf.convert_to_tensor(train_inputs['attention_mask'])

# validation data
val_inputs = tokenizer(
    val_corpus,
    max_length=128,
    padding=True,
    truncation=True,
    return_tensors="tf"
)
val_input_ids = tf.convert_to_tensor(val_inputs['input_ids'])
val_attention_mask = tf.convert_to_tensor(val_inputs['attention_mask'])

# test data
test_inputs = tokenizer(
    test_corpus,
    max_length=128,
    padding=True,
    truncation=True,
    return_tensors="tf"
)
test_input_ids = tf.convert_to_tensor(test_inputs['input_ids'])
test_attention_mask = tf.convert_to_tensor(test_inputs['attention_mask'])

### Instantiate Model & Train

Used learning rate decay to reduce catestrophic forgetting. When testing the model we found signs of overfitting as indicated by superior training scores vs validation & oscillating training loss. We also saw large variance in validation loss without a learning rate scheduler, which also indicates model instability as seen with lackluster validation accuracy. Implementing these features helped to improve validation accuracy significantly.

In [None]:
def combined_lr_schedule(epoch, lr):
    warmup_epochs = 2  # Number of warmup epochs
    total_epochs = 8  # Total number of training epochs
    initial_lr = 2e-5  # Target learning rate after warmup
    decay_rate = 0.5  # Decay rate after warmup
    decay_epochs = 3  # When to start decay

    if epoch < warmup_epochs:
        # Warmup phase: increase learning rate linearly
        return lr + (initial_lr - lr) / warmup_epochs
    elif epoch < decay_epochs:
        # Maintain learning rate after warmup until decay starts
        return initial_lr
    else:
        # Apply decay after decay_epochs
        decay_factor = (epoch - decay_epochs + 1)
        return initial_lr * (decay_rate ** decay_factor)

# Create the LearningRateScheduler callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(combined_lr_schedule, verbose=1)

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
model.compile(optimizer=optimizer, loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])

In [None]:
# Train the model
history = model.fit(
    x={'input_ids': train_input_ids, 'attention_mask': train_attention_mask},
    y=train_labels,
    validation_data=(
        {'input_ids': val_input_ids, 'attention_mask': val_attention_mask},
        val_labels
    ),
    epochs=6,
    batch_size=5,
    callbacks=[lr_scheduler]
)


Epoch 1: LearningRateScheduler setting learning rate to 1.999999974737875e-05.
Epoch 1/6


  output, from_logits = _get_logits(



Epoch 2: LearningRateScheduler setting learning rate to 1.999999974737875e-05.
Epoch 2/6

Epoch 3: LearningRateScheduler setting learning rate to 2e-05.
Epoch 3/6

Epoch 4: LearningRateScheduler setting learning rate to 1e-05.
Epoch 4/6

Epoch 5: LearningRateScheduler setting learning rate to 5e-06.
Epoch 5/6

Epoch 6: LearningRateScheduler setting learning rate to 2.5e-06.
Epoch 6/6


In [None]:
# Evaluate the model on the test set
results = model.evaluate(
    x={'input_ids': test_input_ids, 'attention_mask': test_attention_mask},
    y=test_labels
)
print(f"Test Loss: {results[0]}, Test Accuracy: {results[1]}")

Test Loss: 0.43393996357917786, Test Accuracy: 0.8500000238418579


Conclusions: BERT performed as well

# IV. GPT Text Generation

## 4.1 Articles

### Make dataframes

In order to build the three GPT text generators, we first had to create three subdataframes from the dataset on the column 'bias,' for each of the three poltiical biases. We appended the article text body to the title as the title of articles can oftentimes be phrased in a way to bring out a strong emotional reaction.

In [None]:
df_article_gen= pd.read_csv('combined_300_set.csv',index_col=0)
df_article_gen.head()

Unnamed: 0,title,plaintext,publishing_date,source,publisher,bias
0,Working Families Party Nominates Kamala Harris...,The nomination gives the presumptive Democrati...,2024-07-26 18:15:50+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
1,Kamala Harris Is Ready for This Fight,"In a matter of days, Vice President Kamala Har...",2024-07-26 14:29:46+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
2,J.D. Vance’s Hatred of Cat Ladies Is Weirder a...,"Patriarchy, plutocracy, and ethnonationalism f...",2024-07-26 14:13:48+00:00,https://www.thenation.com/article/politics/jd-...,thenation,Left
3,What I Learned Covering Attorney General Kamal...,"Since her time as California attorney general,...",2024-07-26 09:00:00+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
4,The “Strange Charisma” of Kamala Harris,How the Vice-President quickly consolidated su...,2024-07-25 06:00:00-04:00,https://www.newyorker.com/podcast/political-sc...,newyorker,Left


#### Center Dataframe

In [None]:
df_center = df_article_gen[df_article_gen['bias']=='Center']
df_center

Unnamed: 0,title,plaintext,publishing_date,source,publisher,bias
300,Maduro falsely labels political opposition in ...,Venezuelans are set to vote for their next pre...,2024-07-24 20:15:02+00:00,https://www.voanews.com/a/maduro-falsely-label...,voanews,Center
301,US links Pakistan's economic growth to politic...,The United States urged Pakistan Wednesday to ...,2024-07-24 15:33:21+00:00,https://www.voanews.com/a/us-links-pakistan-s-...,voanews,Center
302,Fortnite Has a Political Violence Problem,"In a report shared exclusively with WIRED, the...",2024-07-18 09:00:00-04:00,https://www.wired.com/story/fortnite-has-a-pol...,wired,Center
303,"In South Asia, Trump shooting is used to push ...","South Asia, long a breeding ground for conspir...",2024-07-20 15:52:58+00:00,https://www.voanews.com/a/in-south-asia-trump-...,voanews,Center
304,CNBC Daily Open: Wall Street looks past politi...,What you need to know today\n\nTech reboundThe...,2024-07-23 01:07:01+00:00,https://www.cnbc.com/2024/07/23/cnbc-daily-ope...,cnbc,Center
...,...,...,...,...,...,...
595,Biden Goes to US Heartland for Support on Mass...,President Joe Biden is sidestepping a divided ...,2021-02-16 22:52:26+00:00,https://www.voanews.com/a/usa_us-politics_bide...,voanews,Center
596,Republican Groups Censure Party Lawmakers Who ...,State and local Republican groups in the Unite...,2021-02-16 18:57:20+00:00,https://www.voanews.com/a/usa_us-politics_repu...,voanews,Center
597,Independent Commission to Examine Capitol Riot...,House Speaker Nancy Pelosi said Monday that Co...,2021-02-15 23:06:48+00:00,https://www.voanews.com/a/usa_us-politics_inde...,voanews,Center
598,China says foreign trade faces 'extremely seve...,BEIJING — China's Commerce Ministry on Wednesd...,2023-07-19 10:37:08+00:00,https://www.cnbc.com/2023/07/19/china-says-tra...,cnbc,Center


In [None]:
#Append bias to article plaintext

df_center['combined'] = df_center['title'] + ' ' + df_center['plaintext'] + ' <|endoftext|>'


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_center['combined'] = df_center['title'] + ' ' + df_center['plaintext'] + ' <|endoftext|>'


In [None]:
df_center['combined']

Unnamed: 0,combined
300,Maduro falsely labels political opposition in ...
301,US links Pakistan's economic growth to politic...
302,Fortnite Has a Political Violence Problem In a...
303,"In South Asia, Trump shooting is used to push ..."
304,CNBC Daily Open: Wall Street looks past politi...
...,...
595,Biden Goes to US Heartland for Support on Mass...
596,Republican Groups Censure Party Lawmakers Who ...
597,Independent Commission to Examine Capitol Riot...
598,China says foreign trade faces 'extremely seve...


#### Left Dataframe

In [None]:
df_left = df_article_gen[df_article_gen['bias']=='Left']
df_left

Unnamed: 0,title,plaintext,publishing_date,source,publisher,bias
0,Working Families Party Nominates Kamala Harris...,The nomination gives the presumptive Democrati...,2024-07-26 18:15:50+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
1,Kamala Harris Is Ready for This Fight,"In a matter of days, Vice President Kamala Har...",2024-07-26 14:29:46+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
2,J.D. Vance’s Hatred of Cat Ladies Is Weirder a...,"Patriarchy, plutocracy, and ethnonationalism f...",2024-07-26 14:13:48+00:00,https://www.thenation.com/article/politics/jd-...,thenation,Left
3,What I Learned Covering Attorney General Kamal...,"Since her time as California attorney general,...",2024-07-26 09:00:00+00:00,https://www.thenation.com/article/politics/kam...,thenation,Left
4,The “Strange Charisma” of Kamala Harris,How the Vice-President quickly consolidated su...,2024-07-25 06:00:00-04:00,https://www.newyorker.com/podcast/political-sc...,newyorker,Left
...,...,...,...,...,...,...
295,Jonathan Haidt on “The Anxious Generation”,"The evidence implicating social-media apps, th...",2024-04-22 06:00:00-04:00,https://www.newyorker.com/podcast/political-sc...,newyorker,Left
296,The Morality Play Inside Trump’s Courtroom,“This idea of the old ‘Teflon Don’ is just fin...,2024-04-20 06:00:00-04:00,https://www.newyorker.com/podcast/political-sc...,newyorker,Left
297,"Six Months After Payments Resumed, Student Loa...",A new report from the Student Debt Crisis Cent...,2024-03-05 10:00:00+00:00,https://www.thenation.com/article/politics/stu...,thenation,Left
298,Meet the YouTube Bros Who Might Help Trump Win...,The NELK boys are worshipped by millions of yo...,2023-03-03 10:00:09+00:00,https://www.thenation.com/article/politics/nel...,thenation,Left


In [None]:
#Append bias to article plaintext

df_left['combined'] = df_left['title'] + ' ' + df_left['plaintext'] + ' <|endoftext|>'


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_left['combined'] = df_left['title'] + ' ' + df_left['plaintext'] + ' <|endoftext|>'


In [None]:
df_left['combined']

Unnamed: 0,combined
0,Working Families Party Nominates Kamala Harris...
1,Kamala Harris Is Ready for This Fight In a mat...
2,J.D. Vance’s Hatred of Cat Ladies Is Weirder a...
3,What I Learned Covering Attorney General Kamal...
4,The “Strange Charisma” of Kamala Harris How th...
...,...
295,Jonathan Haidt on “The Anxious Generation” The...
296,The Morality Play Inside Trump’s Courtroom “Th...
297,"Six Months After Payments Resumed, Student Loa..."
298,Meet the YouTube Bros Who Might Help Trump Win...


#### Right Dataframe

In [None]:
df_right = df_article_gen[df_article_gen['bias']=='Right']
df_right

Unnamed: 0,title,plaintext,publishing_date,source,publisher,bias
600,Top Democratic super PAC launches massive $50M...,A top Democratic super PAC has launched a mass...,2024-07-26 18:54:16-04:00,https://www.foxnews.com/politics/top-democrati...,foxnews,Right
601,Park Police union says officers ‘did everythin...,Following the protests at Union Station by ant...,2024-07-26 18:27:51-04:00,https://www.foxnews.com/politics/park-police-u...,foxnews,Right
602,Ramaswamy warns GOP on several 'hard realities...,Former presidential candidate Vivek Ramaswamy ...,2024-07-26 18:21:41-04:00,https://www.foxnews.com/politics/ramaswamy-war...,foxnews,Right
603,"Trump's former doctor gives health update, cal...",A former White House doctor released a letter ...,2024-07-26 12:48:25-04:00,https://www.foxnews.com/politics/trump-rapidly...,foxnews,Right
604,Who Engineered the Political Coup Against Biden?,This story originally was published by Real Cl...,2024-07-26 15:30:44+00:00,https://www.thegatewaypundit.com/2024/07/who-e...,thegatewaypundit,Right
...,...,...,...,...,...,...
895,Maryland Gov. Wes Moore raised nearly $4.6M fo...,Maryland Gov. Wes Moore raised nearly $4.6 mil...,2023-03-10 16:29:47-05:00,https://www.foxnews.com/politics/maryland-gov-...,foxnews,Right
896,West Virginia lawmakers approve hospital expan...,West Virginia hospitals seeking to improve or ...,2023-03-10 16:25:37-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right
897,America's Political Realignment Is Real,Column: The education divide could restore Tru...,2024-03-15 09:00:27+00:00,https://freebeacon.com/columns/americas-politi...,Unknown,Right
898,West Virginia senator who interrupted session ...,The West Virginia Senate on Friday removed a l...,2023-03-10 16:24:19-05:00,https://www.foxnews.com/politics/west-virginia...,foxnews,Right


In [None]:
#Append bias to article plaintext

df_right['combined'] = df_right['title'] + ' ' + df_right['plaintext'] + ' <|endoftext|>'


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_right['combined'] = df_right['title'] + ' ' + df_right['plaintext'] + ' <|endoftext|>'


In [None]:
df_right['combined']

Unnamed: 0,combined
600,Top Democratic super PAC launches massive $50M...
601,Park Police union says officers ‘did everythin...
602,Ramaswamy warns GOP on several 'hard realities...
603,"Trump's former doctor gives health update, cal..."
604,Who Engineered the Political Coup Against Bide...
...,...
895,Maryland Gov. Wes Moore raised nearly $4.6M fo...
896,West Virginia lawmakers approve hospital expan...
897,America's Political Realignment Is Real Column...
898,West Virginia senator who interrupted session ...


###Zero-Shot Learner

We created a zero-shot learner text generator using distilgpt2 as a baseline, to eventually compare to our tuned models.

In [None]:
#Done once
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained('distilgpt2')
model = AutoModelForCausalLM.from_pretrained('distilgpt2')


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/762 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/353M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

To reduce redundant code, we created one article_generator function which takes three paramenets:


*   Prompt: The input text for the generator, declared once
*   Model: The fine-tuned model for each of the three biases
*   Tokenizer: Create an instance of AutoTokenizer from the fine-tuned model








In [None]:
# Create article generator method, used for Center, Left, and Right models

def article_generator(prompt, model, tokenizer):
  input_text = prompt
  inputs = tokenizer.encode(input_text, return_tensors="pt")
  output_sequences = model.generate(
    input_ids = inputs,
    max_length= 500,  # the length of the final sentence
    temperature = 0.9, # the closer to one, the less deterministic. The closer to zero, the more deterministic
    top_k = 20, # how many next words to consider when doing a tree-like structure
    top_p = 0.9,
    repetition_penalty = 1, # penalty for repeating a word in the input (min 1)
    do_sample = True, # True -> probabilistic model (output varies)
    num_return_sequences = 5
)

  for i in range(len(output_sequences)):
    print(f'{i}: {tokenizer.decode(output_sequences[i])}\n')

The results were mixed, with some samples being actually somewhat cohesive.

In [None]:
prompt = "Donald Trump has been treated"
article_generator(prompt, model, tokenizer)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


0: Donald Trump has been treated as a political opportunist.










































































































































































































































































































































































































































































































1: Donald Trump has been treated as an anti-Trump opponent and a potential opponent.














































































































































































































































































































































































### Center Bias Article Generator




#### Make train and test set

For each of the three models, we created a train and test set for fine-tuning.

In [None]:
df_center_train = df_center.combined.values[:200]
df_center_test = df_center.combined.values[200:]

In [None]:
len(df_center_train)

200

In [None]:
len(df_center_test)

100

In [None]:
#Write to text fille
with open('df_center_train.txt','w') as f:
  f.write('\n'.join(df_center_train))
with open('df_center_test','w') as f:
  f.write('\n'.join(df_center_test))

####Fine Tune Model


In [None]:
# Done once
!curl https://raw.githubusercontent.com/huggingface/transformers/27c1b656cca75efa0cc414d3bf4e6aacf24829de/examples/run_lm_finetuning.py > run_lm_finetuning.py


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31078  100 31078    0     0   109k      0 --:--:-- --:--:-- --:--:--  109k


In [None]:
!mkdir center_bias_experiments
epochs= 3
file_with_center_training_set = 'df_center_train.txt'

text = f"for epoch in {epochs} \n"+\
"do \n"+\
"python run_lm_finetuning.py "+\
f"--output_dir=center_bias_experiments/epoch_{epochs} "+\
"--model_type=gpt2 "+\
"--model_name_or_path=distilgpt2 "+\
f"--train_data_file={file_with_center_training_set} "+\
"--do_train "+\
"--overwrite_output_dir "+\
"--save_steps=500 " +\
f"--num_train_epochs={epochs} \n" +\
"done"


In [None]:
f_center = open('run_experiments.sh',mode='w')
f_center.write(text)
f_center.close()

In [None]:
!bash run_experiments.sh

2024-08-25 04:46:52.089850: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-25 04:46:52.109634: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-25 04:46:52.115717: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
08/25/2024 04:46:55 - INFO - __main__ -   Training/evaluation parameters Namespace(train_data_file='df_center_train.txt', output_dir='center_bias_experiments/epoch_3', eval_data_file=None, model_type='gpt2', model_name_or_path='distilgpt2', mlm=False, mlm_probability=0.15, config_name='', tokenizer_name='', cache_dir='', block_size=1024, do_train=True, do_eval=Fa

#### Save Model

In [None]:
center_bias_tokenizer = AutoTokenizer.from_pretrained('center_bias_experiments/epoch_3')
center_bias_model = AutoModelForCausalLM.from_pretrained('center_bias_experiments/epoch_3')

center_bias_model_path = "center_bias_generation_model"
center_bias_model.save_pretrained(center_bias_model_path)
center_bias_tokenizer.save_pretrained(center_bias_model_path)
# mount it
from google.colab import drive
drive.mount('/content/drive',force_remount=True)
# copy it as a new directory in the root of your google drive
import shutil
shutil.copytree(center_bias_model_path,'/content/drive/MyDrive/'+ center_bias_model_path)


Mounted at /content/drive


'/content/drive/MyDrive/center_bias_generation_model'

### Left Bias Article Generator

#### Make train and test set

In [None]:
df_left_train = df_left.combined.values[:200]
df_left_test = df_left.combined.values[200:]

In [None]:
len(df_left_train)

200

In [None]:
len(df_left_test)

100

In [None]:
#Write to text fille
with open('df_left_train.txt','w') as f:
  f.write('\n'.join(df_left_train))
with open('df_left_test','w') as f:
  f.write('\n'.join(df_left_test))

#### Fine Tune Model


In [None]:
!mkdir left_bias_experiments
epochs= 3
file_with_left_training_set = 'df_left_train.txt'

text = f"for epoch in {epochs} \n"+\
"do \n"+\
"python run_lm_finetuning.py "+\
f"--output_dir=left_bias_experiments/epoch_{epochs} "+\
"--model_type=gpt2 "+\
"--model_name_or_path=distilgpt2 "+\
f"--train_data_file={file_with_left_training_set} "+\
"--do_train "+\
"--overwrite_output_dir "+\
"--save_steps=500 " +\
f"--num_train_epochs={epochs} \n" +\
"done"


In [None]:
f_left = open('run_experiments.sh',mode='w')
f_left.write(text)
f_left.close()

In [None]:
!bash run_experiments.sh

2024-08-25 04:49:50.725453: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-25 04:49:50.771757: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-25 04:49:50.785774: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
08/25/2024 04:49:55 - INFO - __main__ -   Training/evaluation parameters Namespace(train_data_file='df_left_train.txt', output_dir='left_bias_experiments/epoch_3', eval_data_file=None, model_type='gpt2', model_name_or_path='distilgpt2', mlm=False, mlm_probability=0.15, config_name='', tokenizer_name='', cache_dir='', block_size=1024, do_train=True, do_eval=False,

#### Save Model

In [None]:
left_bias_tokenizer = AutoTokenizer.from_pretrained('left_bias_experiments/epoch_3')
left_bias_model = AutoModelForCausalLM.from_pretrained('left_bias_experiments/epoch_3')

left_bias_model_path = "left_bias_generation_model"
left_bias_model.save_pretrained(left_bias_model_path)
left_bias_tokenizer.save_pretrained(left_bias_model_path)
# mount it
from google.colab import drive
drive.mount('/content/drive',force_remount=True)
# copy it as a new directory in the root of your google drive
import shutil
shutil.copytree(left_bias_model_path,'/content/drive/MyDrive/'+ left_bias_model_path)


Mounted at /content/drive


'/content/drive/MyDrive/left_bias_generation_model'

### Right Bias Article Generator

#### Make train and test set

In [None]:
df_right_train = df_right.combined.values[:200]
df_right_test = df_right.combined.values[200:]

In [None]:
len(df_right_train)

200

In [None]:
len(df_right_test)

100

In [None]:
#Write to text fille
with open('df_right_train.txt','w') as f:
  f.write('\n'.join(df_right_train))
with open('df_right_test','w') as f:
  f.write('\n'.join(df_right_test))

#### Fine Tune Model


In [None]:
!mkdir right_bias_experiments
epochs= 3
file_with_right_training_set = 'df_right_train.txt'

text = f"for epoch in {epochs} \n"+\
"do \n"+\
"python run_lm_finetuning.py "+\
f"--output_dir=right_bias_experiments/epoch_{epochs} "+\
"--model_type=gpt2 "+\
"--model_name_or_path=distilgpt2 "+\
f"--train_data_file={file_with_right_training_set} "+\
"--do_train "+\
"--overwrite_output_dir "+\
"--save_steps=500 " +\
f"--num_train_epochs={epochs} \n" +\
"done"


In [None]:
f_right = open('run_experiments.sh',mode='w')
f_right.write(text)
f_right.close()

In [None]:
!bash run_experiments.sh

2024-08-25 04:53:02.310098: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-25 04:53:02.346925: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-25 04:53:02.359586: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
08/25/2024 04:53:08 - INFO - __main__ -   Training/evaluation parameters Namespace(train_data_file='df_right_train.txt', output_dir='right_bias_experiments/epoch_3', eval_data_file=None, model_type='gpt2', model_name_or_path='distilgpt2', mlm=False, mlm_probability=0.15, config_name='', tokenizer_name='', cache_dir='', block_size=1024, do_train=True, do_eval=Fals

#### Save Model

In [None]:
right_bias_tokenizer = AutoTokenizer.from_pretrained('right_bias_experiments/epoch_3')
right_bias_model = AutoModelForCausalLM.from_pretrained('right_bias_experiments/epoch_3')

right_bias_model_path = "right_bias_generation_model"
right_bias_model.save_pretrained(right_bias_model_path)
right_bias_tokenizer.save_pretrained(right_bias_model_path)
# mount it
from google.colab import drive
drive.mount('/content/drive',force_remount=True)
# copy it as a new directory in the root of your google drive
import shutil
shutil.copytree(right_bias_model_path,'/content/drive/MyDrive/'+ right_bias_model_path)


###Test Models

Something we notice about these articles is that the biases are implicit, so you have to read closely to see which way the text seems to lean. We attribute this to publications, even with their biases, being very polished and edited to look professional. Even though the language may not be as intense as what can be found in online disucssion forums, which are defintely more emotionally-driven,  you can still see that publications take positions on certain hot-button topics.

In [None]:
prompt = "The US and Mexico border has been"

In [None]:
#Center
article_generator(prompt, center_bias_model, center_bias_tokenizer)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0: The US and Mexico border has been plagued by violent street protests and a growing number of drug traffickers.

A US State Department spokesman said on Monday that the U.S. was “deeply concerned about the escalating violence at the border."

“The U.S. has long supported the efforts of Mexico and the U.S. to secure a safe border through which all lawful immigrants and refugees can be granted legal status," he said.

“The U.S. government is concerned that the border crossing will not be safe for all citizens, and that Mexico will not allow entry into the United States without a visa,” he added.

“The U.S. is working closely with Mexico to secure a safe border through which all lawful immigrants and refugees can be granted lawful status,” he added.

The United States has been accused of being complicit in the mass murder of two American citizens and three Mexican nationals.

The Mexican-US border was closed in June after the September 11 terrorist attacks and ongoing clashes between th

In [None]:
#Left bias
article_generator(prompt, left_bias_model, left_bias_tokenizer)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0: The US and Mexico border has been blocked by the United Nations, as well as by the US and Mexico’s governments. The US, which has been seeking to prevent the flow of migrants to Europe, has not responded.

On Monday, President Donald Trump announced he would cancel the travel ban on individuals from seven Muslim countries.

On Thursday, Trump’s press secretary, Sarah Huckabee Sanders, said that the US was “deeply concerned” about a series of high-profile US policy moves by the US.

“We have had the greatest possible opportunity to ensure that we don’t have to rely on a list of countries that are complicit in human trafficking and are trying to build a wall across the border.”

The US has been blocking access to the US-Mexico border since its decision to block the flow of migrants into the US. The move was also blocked by the US State Department, which is also blocking the flow of asylum seekers in the US and has blocked the flow of people fleeing conflict in the region.

This is why

In [None]:
#Right bias
article_generator(prompt, right_bias_model, right_bias_tokenizer)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0: The US and Mexico border has been locked in a standoff for more than a week as border security is in crisis as a result of the ongoing standoff, according to the U.S. Department of Homeland Security.

"As a result of the ongoing standoff, border security is locked in a situation that is likely to escalate and escalate further, including the illegal immigration process in the United States," the DHS Department said in a statement.

A joint statement from the Department of Homeland Security and Customs Enforcement, Border Patrol, Homeland Security, Border Protection and Border Protection said the border security situation in Central America is "a critical security priority for the federal government, with our resources, our resources, and our resources to protect American citizens from illegal immigration."

"Border security is critical for all American citizens, including children, seniors, and those who are fleeing violence," the statement said.

The White House did not immediately 

## 4.2 Subreddit Comments

In [None]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
from torch.utils.data import Dataset, DataLoader
import torch

In [None]:
df_reddit_gen = pd.read_csv('reddit_gen_comments.csv')

df_reddit_gen_right = df_reddit_gen[df_reddit_gen['bias'] == 'right']

df_reddit_gen_left = df_reddit_gen[df_reddit_gen['bias'] == 'left']

df_reddit_gen_center = df_reddit_gen[df_reddit_gen['bias'] == 'center']

In [None]:

model_name = 'distilgpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

In [None]:
# Add padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

Combined individual comments per bias classification to increase the size of text. We thought this might help the GPT moddel to understand a variety of topics at once as it relates to bias. We saw significantly improved propogation of biased prompt responses when increasing the size of training blocks through this process

In [None]:
def combine_comments(df, max_tokens_per_block=100):
    combined_comments = []
    current_block = []

    for comment in df['comment']:
        tokenized_comment = tokenizer.encode(comment)
        if len(current_block) + len(tokenized_comment) > max_tokens_per_block:
            combined_comments.append(current_block)
            current_block = tokenized_comment
        else:
            current_block.extend(tokenized_comment)

    if current_block:
        combined_comments.append(current_block)


    combined_comments_text = [tokenizer.decode(block, skip_special_tokens=True) for block in combined_comments]

    return combined_comments_text

In [None]:

combined_comments_text_right = combine_comments(df_reddit_gen_right, max_tokens_per_block=100)
combined_comments_text_left = combine_comments(df_reddit_gen_left, max_tokens_per_block=100)
combined_comments_text_center = combine_comments(df_reddit_gen_center, max_tokens_per_block=100)

Create a custom Dataset class for fine-tuning

In [None]:

class RedditDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length=100):
        self.texts = texts
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        encoding = self.tokenizer(
            text,
            return_tensors='pt',
            truncation=True,
            max_length=self.max_length,
            padding='max_length'
        )
        input_ids = encoding['input_ids'].flatten()
        attention_mask = encoding['attention_mask'].flatten()
        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': input_ids
        }

Prepare the dataset

In [None]:
right_dataset = RedditDataset(combined_comments_text_right, tokenizer)
left_dataset = RedditDataset(combined_comments_text_left, tokenizer)
center_dataset = RedditDataset(combined_comments_text_center, tokenizer)

Create separate models for each bias fine-tuning dataset

In [None]:
# Training arguments for right-biased comments
training_args_right = TrainingArguments(
    output_dir='./right_bias_experiments',
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    save_steps=10,
    save_total_limit=2,
    logging_dir='./logs_right',
    logging_steps=5,
    overwrite_output_dir=True,
    warmup_steps=10,
    weight_decay=0.01
)
model_right = AutoModelForCausalLM.from_pretrained(model_name)
# Trainer for right-biased comments
trainer_right = Trainer(
    model=model_right,
    args=training_args_right,
    train_dataset=right_dataset
)

# Training arguments for left-biased comments
training_args_left = TrainingArguments(
    output_dir='./left_bias_experiments',
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    save_steps=10,
    save_total_limit=2,
    logging_dir='./logs_left',
    logging_steps=5,
    overwrite_output_dir=True,
    warmup_steps=10,
    weight_decay=0.01
)
model_left = AutoModelForCausalLM.from_pretrained(model_name)
# Trainer for left-biased comments
trainer_left = Trainer(
    model=model,
    args=training_args_left,
    train_dataset=left_dataset
)

# Training arguments for center-biased comments
training_args_center = TrainingArguments(
    output_dir='./center_bias_experiments',
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    save_steps=10,
    save_total_limit=2,
    logging_dir='./logs_center',
    logging_steps=5,
    overwrite_output_dir=True,
    warmup_steps=10,
    weight_decay=0.01
)
model_center = AutoModelForCausalLM.from_pretrained(model_name)
# Trainer for center-biased comments
trainer_center = Trainer(
    model=model_center,
    args=training_args_center,
    train_dataset=center_dataset
)


In [None]:
trainer_right.train()
trainer_left.train()
trainer_center.train()

Step,Training Loss
5,4.0395
10,3.7655
15,3.3699
20,3.2502
25,3.135


Step,Training Loss
5,3.78
10,3.4714
15,3.0171


Step,Training Loss
5,3.8602
10,3.5416
15,3.0725


TrainOutput(global_step=15, training_loss=3.491432253519694, metrics={'train_runtime': 44.5894, 'train_samples_per_second': 6.392, 'train_steps_per_second': 0.336, 'total_flos': 6022073548800.0, 'train_loss': 3.491432253519694, 'epoch': 4.137931034482759})

In [None]:
model_right.save_pretrained('./reddit_comments_right_bias_model')
tokenizer.save_pretrained('./reddit_comments_right_bias_model')

('./reddit_comments_right_bias_model/tokenizer_config.json',
 './reddit_comments_right_bias_model/special_tokens_map.json',
 './reddit_comments_right_bias_model/vocab.json',
 './reddit_comments_right_bias_model/merges.txt',
 './reddit_comments_right_bias_model/added_tokens.json',
 './reddit_comments_right_bias_model/tokenizer.json')

In [None]:
model_left.save_pretrained('./reddit_comments_left_bias_model')
tokenizer.save_pretrained('./reddit_comments_left_bias_model')

('./reddit_comments_left_bias_model/tokenizer_config.json',
 './reddit_comments_left_bias_model/special_tokens_map.json',
 './reddit_comments_left_bias_model/vocab.json',
 './reddit_comments_left_bias_model/merges.txt',
 './reddit_comments_left_bias_model/added_tokens.json',
 './reddit_comments_left_bias_model/tokenizer.json')

In [None]:
model_center.save_pretrained('./reddit_comments_center_bias_model')
tokenizer.save_pretrained('./reddit_comments_center_bias_model')

('./reddit_comments_center_bias_model/tokenizer_config.json',
 './reddit_comments_center_bias_model/special_tokens_map.json',
 './reddit_comments_center_bias_model/vocab.json',
 './reddit_comments_center_bias_model/merges.txt',
 './reddit_comments_center_bias_model/added_tokens.json',
 './reddit_comments_center_bias_model/tokenizer.json')

The fine-tuned GPT models demonstrate an ability to capture political biases effectively, with right-leaning, left-leaning, and center-leaning models generating responses that generally align with expected viewpoints on key issues like immigration, guns, abortion, and taxes. However, some responses, particularly from the right-leaning model on immigration, show less differentiation, suggesting that the nuances of bias may not always be fully captured. Short prompts may limit the model's ability to express detailed biases, leading to more generalized or ambiguous responses. This highlights the need for more specific and context-rich prompts to elicit clear biases.

##### Right-Leaning Fine-Tuned Responses

In [None]:
right_bias_tokenizer = AutoTokenizer.from_pretrained('./reddit_comments_right_bias_model')
right_bias_model = AutoModelForCausalLM.from_pretrained('./reddit_comments_right_bias_model')

prompts = [
    "Immigration is",
    "Guns are",
    "Abortion is",
    "Taxes are",
    "The current political situation is"
]


tokenizer.pad_token = tokenizer.eos_token


for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    input_ids = right_bias_tokenizer.encode(prompt, return_tensors='pt')
    attention_mask = input_ids.ne(tokenizer.pad_token_id).long()


    outputs = right_bias_model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_length=50,
        num_return_sequences=5,
        temperature=0.9,
        top_k=50,
        top_p=0.95,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

    # Decode and print each generated text
    for i, generated_sequence in enumerate(outputs):
        generated_text = right_bias_tokenizer.decode(generated_sequence, skip_special_tokens=True)
        # Remove excessive newlines
        cleaned_text = generated_text.replace("\n", " ").strip()
        print(f"Generated Text {i + 1}: {cleaned_text}")


Prompt: Immigration is
Generated Text 1: Immigration is so high that we cannot be able to provide for this community," said Hickenlooper.
Generated Text 2: Immigration is an issue of choice. Immigration must be reformed.
Generated Text 3: Immigration is one of the key factors that drive immigration. It helps increase the value of American jobs, and it helps increase the safety net.
Generated Text 4: Immigration is not a matter of nationality.”
Generated Text 5: Immigration is not an option, it is a threat.

Prompt: Guns are
Generated Text 1: Guns are made up of a variety of different types of birds.    The first generation of birds to be raised in a field is the Mennonite. Most of these birds are raised by a family of chickens and are raised
Generated Text 2: Guns are going to be on the right side."
Generated Text 3: Guns are very common in the United States, with about 200 Americans. However, most are a minority.
Generated Text 4: Guns are a great way to protect us.”
Generated Text 5

##### Left-Leaning Fine-Tuned Responses

In [None]:
left_bias_tokenizer = AutoTokenizer.from_pretrained('./reddit_comments_left_bias_model')
left_bias_model = AutoModelForCausalLM.from_pretrained('./reddit_comments_left_bias_model')


prompts = [
    "Immigration is",
    "Guns are",
    "Abortion is",
    "Taxes are",
    "The current political situation is"
]


left_bias_tokenizer.pad_token = left_bias_tokenizer.eos_token


for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    input_ids = left_bias_tokenizer.encode(prompt, return_tensors='pt')
    attention_mask = input_ids.ne(left_bias_tokenizer.pad_token_id).long()


    outputs = left_bias_model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_length=50,
        num_return_sequences=5,
        temperature=0.9,
        top_k=50,
        top_p=0.95,
        do_sample=True,
        pad_token_id=left_bias_tokenizer.eos_token_id
    )


    for i, generated_sequence in enumerate(outputs):
        generated_text = left_bias_tokenizer.decode(generated_sequence, skip_special_tokens=True)

        cleaned_text = generated_text.replace("\n", " ").strip()
        print(f"Generated Text {i + 1}: {cleaned_text}")



Prompt: Immigration is
Generated Text 1: Immigration is a disease that is endemic in Central Africa. There has been a resurgence in the number of cases of diseases. The current epidemic of disease is due to the rapid emergence of new diseases. This is the first outbreak of disease in Central African
Generated Text 2: Immigration is a global phenomenon that has helped stem the global refugee crisis,‣‣‣‣‣‣‣‣‣‣‣‣‣‣‣‣‣�
Generated Text 3: Immigration is an illegal immigration policy that has been passed without due process and should not be tolerated and enforced," said Sen. Barbara Boxer (D-Calif.) when pressed about the potential repercussions of this policy. "I'd also like to clarify
Generated Text 4: Immigration is the primary reason why the world should not have to wait until the end of the last century for the death of the greatest human race on earth!   The great human tragedy of the Great War, which was about to take place
Generated Text 5: Immigration is the cause of many of the g

##### Center-Leaning Fine-Tuned Responses

In [None]:
center_bias_tokenizer = AutoTokenizer.from_pretrained('./reddit_comments_center_bias_model')
center_bias_model = AutoModelForCausalLM.from_pretrained('./reddit_comments_center_bias_model')


prompts = [
    "Immigration is",
    "Guns are",
    "Abortion is",
    "Taxes are",
    "The current political situation is"
]


center_bias_tokenizer.pad_token = center_bias_tokenizer.eos_token


for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    input_ids = center_bias_tokenizer.encode(prompt, return_tensors='pt')
    attention_mask = input_ids.ne(center_bias_tokenizer.pad_token_id).long()


    outputs = center_bias_model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_length=50,
        num_return_sequences=5,
        temperature=0.9,
        top_k=50,
        top_p=0.95,
        do_sample=True,
        pad_token_id=center_bias_tokenizer.eos_token_id
    )


    for i, generated_sequence in enumerate(outputs):
        generated_text = center_bias_tokenizer.decode(generated_sequence, skip_special_tokens=True)

        cleaned_text = generated_text.replace("\n", " ").strip()
        print(f"Generated Text {i + 1}: {cleaned_text}")



Prompt: Immigration is
Generated Text 1: Immigration is not one of them,” he added.
Generated Text 2: Immigration is the biggest obstacle to human rights.‹
Generated Text 3: Immigration is not going to solve this issue for the country," he said.
Generated Text 4: Immigration is an important issue, and the United States should consider immigration policies that protect the civil liberties of the many citizens. We need to support immigration reform.
Generated Text 5: Immigration is a natural process, and the ability of the community to secure it is vital that our economy, not the government, is strong.

Prompt: Guns are
Generated Text 1: Guns are rare and rare, but we hope to find out soon who can buy those and who can trade.
Generated Text 2: Guns are still on the way to the finals, but the game is very close.
Generated Text 3: Guns are now banned at all on the basis of their appearance.
Generated Text 4: Guns are dangerous, but they are a danger.
Generated Text 5: Guns are generally m

# V. Conclusion

## 5.1 Findings

This project aims to explore differences in political bias between published news articles and political message board posts through dataset creation, topic modeling, and political bias classification. Two primary datasets have been developed: one comprising news articles from various established publishers, classified for bias using AllSides ratings, and another consisting of posts and comments from political subreddits.

**Topic Extraction:** The topics extracted from politically biased news articles and subreddit comments reveal distinct ways bias is presented across these platforms. News articles tend to exhibit bias more implicitly, focusing on broad themes like policy, governance, and international issues that align with their respective political leanings—left-leaning articles emphasize social justice and reform, while right-leaning articles focus on national security and economic conservatism. In contrast, subreddit topics show explicit bias, driven by user-generated content that often reflects community-specific language and concerns, such as direct references to political figures, ideological terms, and specific social issues. This difference highlights how structured media channels may subtly influence through topic selection and framing, while online communities express bias more directly, reflecting the immediate and passionate nature of user-driven discussions.

**Classification:** While classifying news articles into political bias categories (left, center, right) has shown promise, classifying the political bias of individual Reddit comments has proven challenging. The complexity arises from the contextual nature of subreddit discussions, where biases are often embedded within the flow of conversation, making it difficult to label comments without losing nuance. Initial attempts to train models on these hand-labeled comments, with only 250 examples per bias class, yielded poor performance, suggesting that individual comment-level classification may not be the best approach.

**Text-Generation**: Fine-tuned GPT models trained on published news articles versus subreddit comments show clear differences in how political bias is expressed, largely due to the nature and length of the training data. Models trained on lengthy, structured news articles produce responses that are detailed, nuanced, and context-rich, often reflecting the formal language and comprehensive analysis typical of journalistic writing. These models incorporate broader geopolitical insights and policy implications, offering a more layered understanding of issues like the US-Mexico border. In contrast, models trained on shorter, informal subreddit comments generate more direct, succinct, and emotionally charged responses, echoing the immediate opinions and sentiments commonly found in online discussions. This difference indicates that article-trained models are better at providing in-depth, contextually informed outputs, while subreddit-trained models tend to be more reactive, capturing the tone of quick, opinionated exchanges.

## 5.2 Future Considerations/Opportunities

Future efforts could focus on calculating the overall bias of entire discussions for each Reddit post rather than individual comments. This could potentially capture the broader context and sentiment, providing a more accurate representation of political leanings. Additionally, improving dataset size and labeling consistency might enhance model training and classification accuracy. Fine-tuning GPT models using these enriched datasets could also provide more nuanced insights and responses that reflect the distinct characteristics of both news articles and political message board discussions.
