### Disclaimer: Non-Political Affiliation
The content and context of this project do **not reflect any political affiliations** or ideologies of the creator. Any interpretations or assumptions linking the project's material to specific political views are unfounded and not endorsed.

### Content Sensitivity Disclaimer
Please be advised that this project may contain material that some individuals could find offensive, including content derived from user contributions on IMDb. Viewer discretion is advised. The content is presented for educational, informational, or research purposes only and is not intended to harm, offend, or disparage **any group** or **individual**.

# Political Bias Model

Joy Albertini

Setup:

python -m spacy download en_core_web_lg

pip install spacy-lookups-data

Dataset: https://huggingface.co/datasets/JyotiNayak/political_ideologies

In [222]:
MODEL_DIR = './NLP/spacy_model'
MODEL_BEST = './NLP/spacy_model/model-best/'
TEST = './NLP/political_test.spacy'
TRAIN = './NLP/political_train.spacy'
VALIDATION = './NLP/political_validation.spacy'
CONFIG = './NLP/config.cfg'

### Imports 

In [223]:
import pandas as pd
from spacy.cli.train import train
import spacy
from spacy.tokens import DocBin
from sklearn.metrics import classification_report

spaCy version

In [224]:
print("spaCy version:", spacy.__version__)

spaCy version: 3.7.4


### Train / Test / Validation 

In [225]:
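def replace_label(df):
    # Map the numeric labels of the political dataset to strings (in place).
    label_map = {1: 'left', 0: 'right'}
    df['label'] = df['label'].map(label_map)

def clean(df):
    # Drop the ID column and normalize the film-review labels to lowercase (in place).
    label_map = {"Left": 'left', "Right": 'right', "Neutral": 'neutral'}
    df.drop(['ID'], axis=1, inplace=True)
    df['label'] = df['label'].map(label_map)

def shuffle_rows(df):
    # Return a shuffled copy with a fresh index; the input frame is not modified.
    return df.sample(frac=1).reset_index(drop=True)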
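
Both `replace_label` and `clean` rely on `Series.map` with a dictionary, which turns any value missing from the dictionary into `NaN` instead of raising an error. A later `row['label'].lower()` call would then fail on that row. The following is a minimal sketch of a pre-check, written in plain Python so it runs without pandas. `find_unmapped` is a hypothetical helper and the raw values are made up for illustration; neither comes from the original notebook.

```python
def find_unmapped(raw_labels, label_map):
    """Return the distinct raw labels that label_map does not cover, in first-seen order."""
    seen = []
    for value in raw_labels:
        if value not in label_map and value not in seen:
            seen.append(value)
    return seen

# Same mapping as clean(); the raw values below are invented examples.
film_label_map = {"Left": "left", "Right": "right", "Neutral": "neutral"}
raw = ["Left", "Neutral", "left", "Right", "Centre", "left"]

unmapped = find_unmapped(raw, film_label_map)
print(unmapped)  # ['left', 'Centre']
```

With a DataFrame, the same check could be run as `find_unmapped(df['label'].tolist(), label_map)` before calling `clean(df)`.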

### Train 1

In [226]:
df_train = pd.read_parquet("Train_dataset/political_bias_train.parquet")
replace_label(df_train)
df_train.head()

Unnamed: 0,statement,label,issue_type,__index_level_0__
0,"Climate change, and the escalating environment...",left,1,465
1,I believe in the foundational importance of th...,right,2,1191
2,I firmly believe that the principle of separat...,left,6,2440
3,I firmly believe in the separation of church a...,left,6,2406
4,I firmly believe in the power of free markets ...,right,0,1903


In [227]:
df_train_2 = pd.read_csv("Train_dataset/film_reviews_train.csv")
df_train_2_neutral = pd.read_csv("Train_dataset/film_neutral.csv")
clean(df_train_2)
clean(df_train_2_neutral)
df_train_2 = pd.concat([df_train_2_neutral, df_train_2], ignore_index=True)
# Rows are shuffled later, after the political and film datasets are merged.
df_train_2.head()

Unnamed: 0,statement,label
0,Provides an insightful look into the character...,neutral
1,The editing is sharp and adds to the narrative...,neutral
2,"The film's pacing is inconsistent, with some p...",neutral
3,Features ambitious themes that aren't fully re...,neutral
4,Features ambitious themes that aren't fully re...,neutral


In [228]:
df_test = pd.read_parquet("Train_dataset/political_bias_test.parquet")
replace_label(df_test)
df_test.head()

Unnamed: 0,statement,label,issue_type,__index_level_0__
0,While respecting individual rights is paramoun...,right,7,1777
1,The continuous economic dependence on China ha...,right,3,1342
2,I firmly believe in the sanctity and tradition...,right,2,2700
3,While I recognize and empathize with the chall...,right,5,3100
4,I firmly believe in preserving the integrity o...,right,6,984


In [229]:
df_test_2 = pd.read_csv("Train_dataset/film_reviews_test.csv")
clean(df_test_2)
df_test_2.head()


In [230]:
df_validation = pd.read_parquet("Train_dataset/political_bias_validation.parquet")
replace_label(df_validation)
df_validation.head()

Unnamed: 0,statement,label,issue_type,__index_level_0__
0,"I firmly believe that all individuals, regardl...",right,5,1563
1,I believe that we should work towards more dip...,left,3,1286
2,I firmly believe in the importance of acknowle...,left,5,1425
3,"The traditional family structure, with a mothe...",right,2,1108
4,I believe in the importance of providing equal...,left,2,1031


In [231]:
df_validation_2 = pd.read_csv("Train_dataset/film_reviews_validation.csv")
clean(df_validation_2)
df_validation_2.head()

Unnamed: 0,statement,label
0,Stresses the importance of military strength i...,right
1,Highlights the role of renewable energy in mit...,left
2,Commemorates the achievements of industrial pi...,right
3,Delivers an impartial analysis of various film...,neutral
4,Demands global action against the wealth inequ...,left


### Convert train, test, and validation sets to DocBin

In [232]:
df_train = pd.concat([df_train, df_train_2], ignore_index=True)
df_test = pd.concat([df_test, df_test_2], ignore_index=True)
df_validation = pd.concat([df_validation, df_validation_2], ignore_index=True)

In [233]:
#df_train = df_train_2
#df_test = df_test_2
#df_validation = df_validation_2

In [234]:
df_train = shuffle_rows(df_train)
df_test = shuffle_rows(df_test)
df_validation = shuffle_rows(df_validation)

# Only the tokenizer is needed here (nlp.make_doc). Loading MODEL_BEST
# requires a model saved by a previous training run.
nlp = spacy.load(MODEL_BEST)

def df2docbin(df, nlp):
    # Build a DocBin of tokenized statements with one-hot textcat labels.
    docbin = DocBin()
    for _, row in df.iterrows():
        doc = nlp.make_doc(row['statement'])
        label = row['label'].lower()
        doc.cats = {
            'right': 1.0 if label == 'right' else 0.0,
            'left': 1.0 if label == 'left' else 0.0,
            'neutral': 1.0 if label == 'neutral' else 0.0
        }
        docbin.add(doc)
    return docbin


docbin_train = df2docbin(df_train, nlp)
docbin_test = df2docbin(df_test, nlp)
docbin_validation = df2docbin(df_validation, nlp)

docbin_train.to_disk(TRAIN)
docbin_test.to_disk(TEST)
docbin_validation.to_disk(VALIDATION)
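
The `cats` dictionary built in `df2docbin` is a one-hot encoding: the row's label gets 1.0 and every other label gets 0.0. As a rough sketch, the same idea can be written once for an arbitrary label set. `make_cats` and `LABELS` are hypothetical names introduced here for illustration; the notebook itself spells out the three entries by hand.

```python
LABELS = ("right", "left", "neutral")  # same label set as df2docbin

def make_cats(label, labels=LABELS):
    """Return a one-hot dict of float scores for a single label (case-insensitive)."""
    normalized = label.lower()
    if normalized not in labels:
        # Fail early instead of silently producing an all-zero dict.
        raise ValueError(f"Unknown label: {label!r}")
    return {name: 1.0 if name == normalized else 0.0 for name in labels}

cats = make_cats("Left")
print(cats)  # {'right': 0.0, 'left': 1.0, 'neutral': 0.0}
```

Raising on an unknown label is a design choice of this sketch: an all-zero `cats` dict would otherwise pass through to training unnoticed.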

### Train the model 

In [235]:
train(output_path=MODEL_DIR, use_gpu=-1, config_path=CONFIG)

ℹ Saving to output directory: NLP/spacy_model
ℹ Using CPU
✔ Initialized pipeline
ℹ Pipeline: ['textcat']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTCAT  CATS_SCORE  CATS_MICRO_P  CATS_MICRO_R  SCORE 
---  ------  ------------  ----------  ------------  ------------  ------
  0       0          0.22       18.34         37.94         37.94    0.31
  0     200         29.31       82.79         80.91         80.91    0.81
  0     400         13.36       93.64         92.53         92.53    0.92
  0     600         12.44       84.78         82.13         82.13    0.82
  0     800          8.17       79.20         80.54         80.54    0.79
  0    1000          5.94       91.69         90.33         90.33    0.90
  0    1200          3.73       86.78         84.58         84.58    0.84
  1    1400          3.53       81.26         83.11         83.11    0.82
  1    1600          1.67       93.57         92.53 

### Evaluate the model

In [236]:
nlp = spacy.load(MODEL_BEST)
validation_data = DocBin().from_disk(VALIDATION)
validate_docs = list(validation_data.get_docs(nlp.vocab))


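def evaluate(nlp, docs):
    # Compare each gold label (highest-scoring category in doc.cats)
    # with the model's top predicted category for the same text.
    true_labels = []
    predicted_labels = []

    for doc in docs:
        true_labels.append(max(doc.cats, key=doc.cats.get))
        pred_doc = nlp(doc.text)
        predicted_labels.append(max(pred_doc.cats, key=pred_doc.cats.get))

    # output_dict=True returns a nested dict that converts cleanly to a DataFrame.
    report = classification_report(true_labels, predicted_labels, output_dict=True)
    return pd.DataFrame(report).transpose()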

eval_results = evaluate(nlp, validate_docs)

print("Evaluation Results:")
print(eval_results.round(2)) 

Evaluation Results:
              precision  recall  f1-score  support
left               0.88    0.77      0.82   196.00
neutral            1.00    1.00      1.00    33.00
right              0.79    0.90      0.84   191.00
accuracy           0.84    0.84      0.84     0.84
macro avg          0.89    0.89      0.89   420.00
weighted avg       0.85    0.84      0.84   420.00
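
The two average rows in the table above differ because they weight the classes differently: `macro avg` is a plain mean over the three classes, while `weighted avg` weights each class by its support. The small but perfectly classified `neutral` class therefore lifts the macro average more than the weighted one. The `accuracy` row repeats 0.84 in every column, including `support`, because the report stores accuracy as a single number that gets copied across columns when the dict is turned into a DataFrame. The sketch below recomputes the precision averages from the rounded per-class values shown in the table, so the results are approximate.

```python
# Per-class rows copied from the evaluation table: (precision, support).
rows = {
    "left": (0.88, 196),
    "neutral": (1.00, 33),
    "right": (0.79, 191),
}

total_support = sum(support for _, support in rows.values())

# Macro average: unweighted mean over classes.
macro_precision = sum(p for p, _ in rows.values()) / len(rows)

# Weighted average: mean over classes, weighted by each class's support.
weighted_precision = sum(p * s for p, s in rows.values()) / total_support

print(total_support)                 # 420
print(round(macro_precision, 2))     # 0.89
print(round(weighted_precision, 2))  # 0.85
```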


### Validation 

In [237]:
def print_prediction_results(df, column_name, tc, nlp):
    for index, row in df.iterrows():
        doc = nlp.make_doc(row[column_name])
        
        result = tc.predict([doc])
        
        max_score = -1
        max_label = ""
        
        for label, score in zip(tc.labels, result[0]):
            if score > max_score:
                max_score = score
                max_label = label
        
        print(f"Document: {row[column_name]}")
        print(f"Predicted Sentiment: {max_label} ({max_score*100:.2f}%)")
        print(f"Correct Sentiment: {row['label']}")
        print("===========================")

loaded_model = spacy.load(MODEL_BEST)
textcat = loaded_model.get_pipe('textcat')
print_prediction_results(df_validation, 'statement', textcat, loaded_model)

Document: While it's integral for us to safeguard our environment and responsibly use our natural resources, we must not stifle our economy or overburden our businesses with excessive regulations. I believe in leveraging American innovation and technology to optimize our energy production, which includes judiciously exploiting our oil, coal, and natural gas reserves. It's essential to strike a balance between environmental stewardship and economic growth.
Predicted Sentiment: right (99.94%)
Correct Sentiment: right
Document: In regards to family and gender issues, I strongly believe in equal rights and opportunities for all, irrespective of gender identity or family structure. I advocate for policies that support a diverse range of families and the right of individuals to express and define their own identities. It's also crucial to provide resources to ensure the wellbeing and empowerment of all families and individuals.
Predicted Sentiment: left (100.00%)
Correct Sentiment: left
Docu