# **News Bias Analysis For AI Lab Application**

1. **Summary of the [paper](https://arxiv.org/abs/1607.06520)**

> The paper covers gender stereotypes in word embeddings and explores methods for reducing or removing that bias while preserving the embeddings' usefulness.
>
> The authors first explain that they focus on stereotypes rather than bias in general, because stereotypes can be studied more consistently. They then distinguish direct bias from indirect bias. Direct bias is when a gender-neutral word sits closer to one gender than the other. For example, “nurse” is much closer to “she” than to “he”. Indirect bias is when relationships between gender-neutral words reflect gender associations. For example, “bookkeeper” and “receptionist” are much closer to “softball” than to “football”, which may be due to female associations shared by “bookkeeper”, “receptionist” and “softball”.
>
> To address these biases, the paper introduces two approaches to debiasing word embeddings. Both depend on first identifying the gender subspace. The authors take the difference vectors of explicit gender pairs (such as she–he and woman–man) and use the top principal component of those differences as the gender direction. This identifies the part of the vector space that encodes gender.
>
> The first approach is called “hard debiasing”, where gender bias is removed completely. For gender-neutral words, we project each word vector off the gender direction so its gender component is zero. For explicit gender pairs, we make the two words exactly equidistant from every gender-neutral word, but we keep the parts of their vectors outside the gender direction so that their other uses are preserved (for example, we say “grandfather a regulation” but not “grandmother a regulation”).
>
> The second approach is called “soft debiasing”, where we reduce bias while keeping more of the embedding’s original structure. We apply a linear transformation to the embedding that tries to keep the dot products between word vectors the same, while pushing gender-neutral words away from the gender direction. A parameter λ (lambda) controls how aggressive the debiasing is. If λ is large, the debiasing is more aggressive and behaves like hard debiasing. If λ is small, the transformation preserves more of the original structure.
>
> The paper evaluates both approaches and finds that debiasing does not noticeably degrade performance on standard word-similarity and analogy benchmarks. Hard debiasing did reduce bias: the share of generated analogies judged stereotypical fell from 19% to 6%.
>
> These methods apply directly to our media bias project. We can identify a “bias subspace” (for example, left–right, Israel–Palestine, Democrat–Republican), which would let us isolate and compare portrayals across ideological lines in the embedding space. Just as gender-neutral words like “nurse” were projected onto the gender axis in the paper, we could project political terms (for example, “leader”, “attack”, “peace”) onto a left–right axis to see which side each term leans toward. After that, we might adapt the paper's debiasing approach to measure or neutralize political slant in certain applications.
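
To make the two hard-debiasing steps concrete, here is a minimal sketch on made-up 3-d vectors. The helper names (`unit`, `principal_direction`, `neutralize`, `equalize`) and the toy numbers are assumptions for illustration only, not the paper's code, and the sketch assumes unit-length vectors and a one-dimensional bias subspace.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def principal_direction(pairs):
    """Top principal component of the pair vectors after centering each pair."""
    centered = []
    for u, v in pairs:
        center = (u + v) / 2
        centered.extend([u - center, v - center])
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(np.array(centered), full_matrices=False)
    return vt[0]

def neutralize(w, g):
    """Remove the component of w along the unit direction g, then renormalize."""
    return unit(w - np.dot(w, g) * g)

def equalize(u, v, g):
    """Make a definitional pair differ only along g (assumes unit-length inputs)."""
    mu = (u + v) / 2
    mu_perp = mu - np.dot(mu, g) * g  # shared part outside the bias direction
    scale = np.sqrt(max(0.0, 1.0 - np.dot(mu_perp, mu_perp)))
    return [mu_perp + scale * unit(np.dot(w - mu, g) * g) for w in (u, v)]

# Toy 3-d "embeddings" (made-up numbers, not real word vectors).
he, she = unit(np.array([1.0, 0.2, 0.1])), unit(np.array([-1.0, 0.2, 0.1]))
man, woman = unit(np.array([0.9, -0.1, 0.3])), unit(np.array([-0.9, -0.1, 0.3]))
nurse = unit(np.array([-0.4, 0.5, 0.7]))

g = principal_direction([(he, she), (man, woman)])
nurse_neutral = neutralize(nurse, g)

# An asymmetric definitional pair to equalize.
father, mother = unit(np.array([0.8, 0.3, 0.1])), unit(np.array([-0.5, 0.1, 0.4]))
father_eq, mother_eq = equalize(father, mother, g)

print(np.dot(nurse_neutral, g))  # component along the bias direction
print(np.dot(father_eq, nurse_neutral), np.dot(mother_eq, nurse_neutral))
```

After neutralizing, the neutral word has no component along the bias direction, and after equalizing, both words of the pair have the same dot product with it. Swapping the gender pairs for political seed pairs would give the analogous steps for a political axis, under the same assumptions.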

2. **Analyze analogy generation**


> The paper generates analogies using the formula:  
>  
> $$ S(a, b, x, y) = \cos(a - b,\; x - y) $$
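> where **a**, **b**, **x**, and **y** are word vectors. This measures how parallel the vector difference $a - b$ is to $x - y$ (how similar their directional relationships are). The paper also sets the score to 0 unless $\lVert x - y \rVert \le \delta$, so that only semantically close pairs $(x, y)$ are considered.  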
>  
> In the paper’s example:  $a = \text{"she"}$  and $b = \text{"he"}$  
>  
> They then generate $(x, y)$ pairs such that $a - b \approx x - y$, which lets them find analogies of the form "she is to he as x is to y".
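
As a small worked example of the scoring formula, the sketch below evaluates it on made-up 2-d unit vectors. The function names `cosine` and `analogy_score`, the default `delta=1.0`, and all vector values are assumptions chosen for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy_score(a, b, x, y, delta=1.0):
    """cos(a - b, x - y) when x and y are within delta of each other, else 0."""
    if np.linalg.norm(x - y) > delta:
        return 0.0
    return cosine(a - b, x - y)

# Toy unit vectors (made-up numbers, not real embeddings).
she_toy = np.array([0.6, 0.8])
he_toy = np.array([0.8, 0.6])
x_near, y_near = np.array([0.0, 1.0]), np.array([0.28, 0.96])
x_far, y_far = np.array([0.0, 1.0]), np.array([1.0, 0.0])

score_same = analogy_score(she_toy, he_toy, she_toy, he_toy)  # identical direction
score_near = analogy_score(she_toy, he_toy, x_near, y_near)   # close pair, partly aligned
score_far = analogy_score(she_toy, he_toy, x_far, y_far)      # pair too far apart

print(score_same, score_near, score_far)
```

For unit-length vectors, $\lVert x - y \rVert^2 = 2 - 2\cos(x, y)$, so requiring $\lVert x - y \rVert \le 1$ is the same as requiring a cosine similarity of at least 0.5 between $x$ and $y$.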




In [None]:
!pip install gensim
import gensim.downloader as api
from itertools import combinations
import numpy as np



In [None]:
model = api.load("word2vec-google-news-300")



In [None]:
conflict_words = [
    'soldier', 'militant', 'terrorist', 'rebel', 'army', 'government', 'resistance', 'occupation',
    'leader', 'official', 'violence', 'peace', 'attack', 'defense', 'strike', 'conflict',
    'freedom', 'hostage', 'war', 'ceasefire', 'minister', 'state', 'nation', 'military', 'protest',
    'uprising', 'regime', 'diplomat', 'oppression', 'autonomy', 'settlement'
]
valid_words = [w for w in conflict_words if w in model]

In [None]:
a, b = model['israel'], model['palestine']
bias_direction = b - a


threshold = 0.5
analogies = []

for x, y in combinations(valid_words, 2):
    vec_x, vec_y = model[x], model[y]
    similarity = np.dot(vec_x, vec_y) / (np.linalg.norm(vec_x) * np.linalg.norm(vec_y))  # cosine similarity

    if similarity >= threshold: # reasonable pairs
        diff = vec_y - vec_x
        sim = np.dot(bias_direction, diff) / (np.linalg.norm(bias_direction) * np.linalg.norm(diff)) # similarity between diff and bias direction
        analogies.append((x, y, sim))

analogies.sort(key=lambda tup: -tup[2]) # sort descending based on similarity score
top_pairs = analogies[:15] # top 15 pairs most similar to israel -> palestine

print("top analogies aligned with israel → palestine:\n")
for x, y, score in top_pairs:
    print(f"{x:12} → {y:12}  (similarity: {score:.4f})")

top analogies aligned with israel → palestine:

occupation   → oppression    (similarity: 0.0444)
militant     → rebel         (similarity: 0.0310)
occupation   → war           (similarity: 0.0135)
militant     → terrorist     (similarity: -0.0035)
army         → military      (similarity: -0.0142)
conflict     → war           (similarity: -0.0715)
peace        → ceasefire     (similarity: -0.1128)


3. **Compare portrayals**


In [None]:
# good-bad portrayal axis

good_words = ['good', 'peaceful', 'honest', 'hero', 'liberator']
bad_words = ['bad', 'violent', 'corrupt', 'terrorist', 'oppressor']

# creating a vector (portrayal axis) by averaging the direction from each good word to its bad counterpart
portrayal_axis = np.mean([model[bad] - model[good] for good, bad in zip(good_words, bad_words)], axis=0)


In [None]:
# comparing directional bias across each pair

print("portrayal bias (positive = more aligned with 'bad'):\n")
for x, y, _ in top_pairs:
    # projections onto portrayal axis to see which side the words are more aligned with
    proj_x = np.dot(model[x], portrayal_axis) / np.linalg.norm(portrayal_axis)
    proj_y = np.dot(model[y], portrayal_axis) / np.linalg.norm(portrayal_axis)
    print(f"{x:12} score: {proj_x:+.4f}   |   {y:12} score: {proj_y:+.4f}")

portrayal bias (positive = more aligned with 'bad'):

occupation   score: +0.3438   |   oppression   score: +0.4714
militant     score: +1.1886   |   rebel        score: +0.5857
occupation   score: +0.3438   |   war          score: +0.1753
militant     score: +1.1886   |   terrorist    score: +1.8119
army         score: +0.0363   |   military     score: +0.2891
conflict     score: +0.2326   |   war          score: +0.1753
peace        score: -0.3424   |   ceasefire    score: +0.0579


**Analysis of the Portrayals**

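I took the candidate pairs whose difference vectors were most similar to the israel → palestine direction, and then projected both words in each pair onto a “good–bad” axis built from opposing pairs like peaceful → violent and hero → terrorist. Only seven pairs passed the 0.5 cosine-similarity filter, and their alignment scores are all close to zero (the highest is 0.0444), so none of them is strongly parallel to the israel → palestine direction.

On the portrayal axis, "ceasefire" (+0.0579) is framed less positively than "peace" (-0.3424), and "militant" (+1.1886) sits much further toward the "bad" end than "rebel" (+0.5857). This suggests that even synonyms or closely related terms can carry quite different connotations in the embedding, which may reflect how they are framed in the news text the model was trained on. Note that "terrorist" is one of the words used to build the axis, so its high score (+1.8119) is partly by construction.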