# Passage Ranking by Similarity

In [36]:
from sentence_transformers import SentenceTransformer, util
import pandas as pd

## Define the Concept of "Open/Closed Internet" and Embed Both Policy Sentences and Queries

In [2]:
open_internet_queries = pd.read_csv('open_internet.csv')['Sentences'].to_list()

close_internet_queries = pd.read_csv('close_internet.csv')['Sentences'].to_list()

print(len(open_internet_queries), open_internet_queries)

print(len(close_internet_queries), close_internet_queries)

10 ['Toward this end, the Dutch government wants to work together more effectively with other parties on the security and the reliability of an open and free digital society.', 'The Netherlands stands for safe and reliable ICT1 and the protection of the openness and freedom of the Internet.', 'The goal of this is to play an active role, in the global context of an open and transparent dialogue, of touching upon topics which can contribute to this strategy, such as improving the game rules on the Internet and combating abuse.', 'Privacy, respect for others and fundamental rights such as the freedom of expression and information gathering must be maintained. ', 'An appropriate balance must remain between, on the one hand, our desire for public and national security and, on the other, the safeguarding of our fundamental rights. ', 'Cyberspace is a global common and no country can control a piece of this global common.', 'To enhance global cooperation by promoting shared understanding and 

In [3]:
# Load SBERT model
model = SentenceTransformer('all-MiniLM-L6-v2')

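# Load the held-out test sentences, dropping rows with missing values
open_internet_test = pd.read_csv('open_internet_test.csv').dropna()['Sentences'].to_list()
close_internet_test = pd.read_csv('close_internet_test.csv').dropna()['Sentences'].to_list()

# Policy-document sentences to score (here: the open-internet test set)
sentences = open_internet_test

# Deduplicate; note that set() does not preserve the original order
sentences = list(set(sentences))

print(len(sentences))
for sentence in sentences:
    print(sentence)

# Encode the sentences and both query sets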
sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
open_internet_query_embeddings = model.encode(open_internet_queries, convert_to_tensor=True)
close_internet_query_embeddings = model.encode(close_internet_queries, convert_to_tensor=True)

6
internet providers shall respect device neutrality by not limiting the right to use the internet on any legal device that does not impair use of the net or the quality of service
The FCC was chartered to promote competition, innovation, and investment in our networks. In service of that mission, there is no higher calling than protecting an open, accessible, and free Internet.
guaranteeing the freedom of mass information and the prohibition of censorship;
internet providers shall not block or interfere in any way with the rights of the user to use any content, application or service on the internet
It also reaffirms and recommits its partners to a single global Internet – one that is truly open and fosters competition, privacy, and respect for human rights. The Declaration’s principles include commitments to:
An open Internet is essential to the American economy, and increasingly to our very way of life. By lowering the cost of launching a new idea, igniting new political movements, 

## Compute Semantic Similarity and Label Sentences Based on a Threshold

### Open & Closed

In [None]:
print('Ranking sentences by "openness":\n')
# Cosine similarity between every sentence and every open-internet query
cos_scores = util.cos_sim(sentence_embeddings, open_internet_query_embeddings)  # shape: (num_sentences, num_queries)

# For each sentence, keep the highest similarity across all queries
max_scores = cos_scores.max(dim=1).values  # best match per sentence

# Print the sentences from most to least similar
sorted_indices = max_scores.argsort(descending=True)
for idx in sorted_indices:
    print(f"{max_scores[idx]:.2f} - {sentences[idx]}")
# threshold = 0.5
# labels = ["Relevant" if score > threshold else "Irrelevant" for score in max_scores]

# for sentence, label, score in zip(sentences, labels, max_scores):
#     print(f"[{label} | score={score:.2f}] {sentence}")

Ranking sentences by "openness":

0.61 - guaranteeing the freedom of mass information and the prohibition of censorship;
0.57 - It also reaffirms and recommits its partners to a single global Internet – one that is truly open and fosters competition, privacy, and respect for human rights. The Declaration’s principles include commitments to:
0.54 - An open Internet is essential to the American economy, and increasingly to our very way of life. By lowering the cost of launching a new idea, igniting new political movements, and bringing communities closer together, it has been one of the most significant democratizing influences the world has ever known.
0.50 - internet providers shall not block or interfere in any way with the rights of the user to use any content, application or service on the internet
0.47 - internet providers shall respect device neutrality by not limiting the right to use the internet on any legal device that does not impair use of the net or the quality of service
0

In [33]:
print('Ranking sentences by "closedness":\n')
# Cosine similarity between every sentence and every closed-internet query
cos_scores = util.cos_sim(sentence_embeddings, close_internet_query_embeddings)  # shape: (num_sentences, num_queries)

# For each sentence, keep the highest similarity across all queries
max_scores = cos_scores.max(dim=1).values  # best match per sentence

# A sentence counts as relevant when its best match exceeds the threshold
threshold = 0.5
labels = ["Relevant" if score > threshold else "Irrelevant" for score in max_scores]

# Print each sentence with its label and score
for sentence, label, score in zip(sentences, labels, max_scores):
    print(f"[{label} | score={score:.2f}] {sentence}")

# sorted_indices = max_scores.argsort(descending=True)
# for idx in sorted_indices:
#     print(f"{max_scores[idx]:.2f} - {sentences[idx]}")

Ranking sentences by "closedness":

[Relevant | score=0.60] An open Internet is essential to the American economy, and increasingly to our very way of life. By lowering the cost of launching a new idea, igniting new political movements, and bringing communities closer together, it has been one of the most significant democratizing influences the world has ever known.
[Relevant | score=0.52] internet providers shall not block or interfere in any way with the rights of the user to use any content, application or service on the internet
[Relevant | score=0.60] It also reaffirms and recommits its partners to a single global Internet – one that is truly open and fosters competition, privacy, and respect for human rights. The Declaration’s principles include commitments to:
[Irrelevant | score=0.38] If a consumer requests access to a website or service, and the content is legal, your ISP should not be permitted to block it. That way, every player — not just those commercially affiliated with
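
The scoring in the two cells above reduces to three steps: compute the cosine similarity between each sentence and each query, keep the best match per sentence, then rank or threshold those maxima. The following is a minimal, dependency-free sketch of that logic. The toy 2-D vectors stand in for SBERT embeddings, and the helper names `cosine`, `max_query_scores`, and `label` are illustrative assumptions, not part of the notebook.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def max_query_scores(sentence_vecs, query_vecs):
    """For each sentence vector, return its best cosine match over all query vectors."""
    return [max(cosine(s, q) for q in query_vecs) for s in sentence_vecs]

def label(scores, threshold=0.5):
    """Label a score 'Relevant' only when it is strictly above the threshold."""
    return ["Relevant" if s > threshold else "Irrelevant" for s in scores]

# Toy vectors, made up for illustration (not real embeddings)
toy_sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_queries = [[1.0, 0.0], [2.0, 0.0]]

scores = max_query_scores(toy_sentences, toy_queries)
labels = label(scores)
# Indices of the sentences, ordered from most to least similar
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

for i in ranking:
    print(f"[{labels[i]} | score={scores[i]:.2f}] sentence {i}")
```

Because cosine similarity ignores vector length, the two toy queries point in the same direction and give identical scores. Taking the maximum also means a single close query is enough to mark a sentence as relevant, however dissimilar the remaining queries are.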

### Comparison

In [52]:
from sentence_transformers import util

# Compute cosine similarities
open_cos_scores = util.cos_sim(sentence_embeddings, open_internet_query_embeddings)
close_cos_scores = util.cos_sim(sentence_embeddings, close_internet_query_embeddings)

# Max similarity per sentence across multiple queries
open_scores = open_cos_scores.max(dim=1).values
close_scores = close_cos_scores.max(dim=1).values

# Set relevance threshold
threshold = 0.5
open_labels = ["Relevant" if score > threshold else "Irrelevant" for score in open_scores]
close_labels = ["Relevant" if score > threshold else "Irrelevant" for score in close_scores]

# Output: Print side-by-side comparison
print('Comparing "openness" vs "closedness" scores for each sentence:\n')
for sentence, o_score, o_label, c_score, c_label in zip(sentences, open_scores, open_labels, close_scores, close_labels):
    # Ties (o_score == c_score) fall into 'MORE CLOSED'
    tendency = 'MORE OPEN' if o_score > c_score else 'MORE CLOSED'
    print(f"[Open: {o_label}|{o_score:.2f}] [Closed: {c_label}|{c_score:.2f}] [{tendency}] {sentence}")


Comparing "openness" vs "closedness" scores for each sentence:

[Open: Relevant|0.54] [Closed: Relevant|0.60] [MORE CLOSED] An open Internet is essential to the American economy, and increasingly to our very way of life. By lowering the cost of launching a new idea, igniting new political movements, and bringing communities closer together, it has been one of the most significant democratizing influences the world has ever known.
[Open: Irrelevant|0.50] [Closed: Relevant|0.52] [MORE CLOSED] internet providers shall not block or interfere in any way with the rights of the user to use any content, application or service on the internet
[Open: Relevant|0.57] [Closed: Relevant|0.60] [MORE CLOSED] It also reaffirms and recommits its partners to a single global Internet – one that is truly open and fosters competition, privacy, and respect for human rights. The Declaration’s principles include commitments to:
[Open: Relevant|0.61] [Closed: Relevant|0.50] [MORE OPEN] guaranteeing the freedom 
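
Several of the score pairs printed above differ by only a few hundredths, so the strict `o_score > c_score` comparison assigns a tendency even when the two similarities are nearly equal. One possible refinement, sketched here as an assumption rather than as the notebook's method, is to require a minimum margin before committing to a side. The function name `tendency_with_margin`, the `UNDECIDED` label, and the margin of 0.05 are all hypothetical choices for illustration.

```python
def tendency_with_margin(open_score, close_score, margin=0.05):
    """Return a tendency label, abstaining when the two scores are within `margin`."""
    diff = open_score - close_score
    if diff > margin:
        return "MORE OPEN"
    if diff < -margin:
        return "MORE CLOSED"
    return "UNDECIDED"

# Rounded (open, closed) score pairs copied from the printed comparison output
score_pairs = [(0.54, 0.60), (0.50, 0.52), (0.57, 0.60), (0.61, 0.50)]

results = [tendency_with_margin(o, c) for o, c in score_pairs]
for (o, c), result in zip(score_pairs, results):
    print(f"open={o:.2f} closed={c:.2f} -> {result}")
```

With this margin, the pairs that differ by 0.02 and 0.03 are left undecided instead of being counted as closed. A suitable margin would have to be tuned on labeled data.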

# Using Augmented Data

In [46]:
from sentence_transformers import SentenceTransformer, util
import pandas as pd

In [47]:
# Augmented query sentences (the file names suggest they were generated with ChatGPT)
open_internet_queries = pd.read_excel('Open_Sentences_ChatGPT.xlsx')['Sentences'].to_list()
close_internet_queries = pd.read_excel('Closed_Sentences_ChatGPT.xlsx')['Sentences'].to_list()
# `model` is the SentenceTransformer instance loaded earlier in the notebook
open_internet_query_embeddings = model.encode(open_internet_queries, convert_to_tensor=True)
close_internet_query_embeddings = model.encode(close_internet_queries, convert_to_tensor=True)

In [48]:
open_internet = pd.read_csv('open_internet.csv')['Sentences'].to_list()

close_internet = pd.read_csv('close_internet.csv')['Sentences'].to_list()

open_internet_test = pd.read_csv('open_internet_test.csv').dropna()['Sentences'].to_list()
close_internet_test = pd.read_csv('close_internet_test.csv').dropna()['Sentences'].to_list()

# Score the original open-internet sentences together with the test sentences
sentences = open_internet + open_internet_test
sentences = list(set(sentences))  # deduplicate (order is not preserved)
sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
print(len(sentences), sentences)

16 ['The Netherlands stands for safe and reliable ICT1 and the protection of the openness and freedom of the Internet.', 'Through vigorous and eective international cooperation, establish a multi-lateral, Close and accept democratic and transparent international Internet governance system, and jointly build a peaceful, secure, open, cooperative and orderly cyberspace.', 'The FCC was chartered to promote competition, innovation, and investment in our networks. In service of that mission, there is no higher calling than protecting an open, accessible, and free Internet.', 'internet providers shall respect device neutrality by not limiting the right to use the internet on any legal device that does not impair use of the net or the quality of service', 'The goal of this is to play an active role, in the global context of an open and transparent dialogue, of touching upon topics which can contribute to this strategy, such as improving the game rules on the Internet and combating abuse.', 'g

In [49]:
# Compute cosine similarities
open_cos_scores = util.cos_sim(sentence_embeddings, open_internet_query_embeddings)
close_cos_scores = util.cos_sim(sentence_embeddings, close_internet_query_embeddings)

# Max similarity per sentence across multiple queries
open_scores = open_cos_scores.max(dim=1).values
close_scores = close_cos_scores.max(dim=1).values

# Set relevance threshold
threshold = 0.5
open_labels = ["Relevant" if score > threshold else "Irrelevant" for score in open_scores]
close_labels = ["Relevant" if score > threshold else "Irrelevant" for score in close_scores]

# Print a side-by-side comparison and count how each sentence is classified
print('Classifying open-internet-opinion sentences:\n')
tendency_stats = {'open': 0, 'close': 0}
for sentence, o_score, o_label, c_score, c_label in zip(sentences, open_scores, open_labels, close_scores, close_labels):
    if o_score > c_score: 
        tendency = 'MORE OPEN'
        tendency_stats['open'] += 1
    else: 
        tendency = 'MORE CLOSED'
        tendency_stats['close'] += 1
    print(f"[Open: {o_label}|{o_score:.2f}] [Closed: {c_label}|{c_score:.2f}] [{tendency}] {sentence}")
print(tendency_stats)

Classifying open-internet-opinion sentences:

[Open: Relevant|0.57] [Closed: Relevant|0.51] [MORE OPEN] The Netherlands stands for safe and reliable ICT1 and the protection of the openness and freedom of the Internet.
[Open: Relevant|0.69] [Closed: Relevant|0.65] [MORE OPEN] Through vigorous and eective international cooperation, establish a multi-lateral, Close and accept democratic and transparent international Internet governance system, and jointly build a peaceful, secure, open, cooperative and orderly cyberspace.
[Open: Irrelevant|0.47] [Closed: Irrelevant|0.42] [MORE OPEN] The FCC was chartered to promote competition, innovation, and investment in our networks. In service of that mission, there is no higher calling than protecting an open, accessible, and free Internet.
[Open: Relevant|0.74] [Closed: Relevant|0.61] [MORE OPEN] internet providers shall respect device neutrality by not limiting the right to use the internet on any legal device that does not impair use of the net o

In [40]:
sentences = close_internet + close_internet_test 
sentences = list(set(sentences))
sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
print(len(sentences), sentences)

15 ['No infringement of sovereignty in cyberspace will be tolerated.', 'New territories for national sovereignty. Cyberspace has become a new area for important human activity of equal importance to land, sea, air and space, national sovereignty has extended and stretched into cyberspace, sovereignty in cyberspace has become an important component part of national sovereignty.\xa0', 'Respect for sovereignty in cyberspace, safeguarding cybersecurity.', 'facilitating the development of an international information security system aimed at countering threats of the use of information technologies to compromise the strategic stability, at strengthening equal strategic partnership in the sphere of information security, as well as protecting the information sovereignty of the Russian Federation', "Within a state's borders, a state will be controlling its own cyberspace.\xa0", 'Resolutely defending sovereignty in cyberspace.', 'Manage online activities within the scope of our country’s sovere

In [44]:
# Compute cosine similarities
open_cos_scores = util.cos_sim(sentence_embeddings, open_internet_query_embeddings)
close_cos_scores = util.cos_sim(sentence_embeddings, close_internet_query_embeddings)

# Max similarity per sentence across multiple queries
open_scores = open_cos_scores.max(dim=1).values
close_scores = close_cos_scores.max(dim=1).values

# Set relevance threshold
threshold = 0.5
open_labels = ["Relevant" if score > threshold else "Irrelevant" for score in open_scores]
close_labels = ["Relevant" if score > threshold else "Irrelevant" for score in close_scores]

tendency_stats = {'open': 0, 'close': 0}
# Print a side-by-side comparison and count how each sentence is classified
print('Classifying close-internet-opinion sentences:\n')
for sentence, o_score, o_label, c_score, c_label in zip(sentences, open_scores, open_labels, close_scores, close_labels):
    if o_score > c_score: 
        tendency = 'MORE OPEN'
        tendency_stats['open'] += 1
    else: 
        tendency = 'MORE CLOSED'
        tendency_stats['close'] += 1
    print(f"[Open: {o_label}|{o_score:.2f}] [Closed: {c_label}|{c_score:.2f}] [{tendency}] {sentence}")

print(tendency_stats)

Classifying close-internet-opinion sentences:

[Open: Relevant|0.62] [Closed: Relevant|0.73] [MORE CLOSED] No infringement of sovereignty in cyberspace will be tolerated.
[Open: Relevant|0.62] [Closed: Relevant|0.82] [MORE CLOSED] New territories for national sovereignty. Cyberspace has become a new area for important human activity of equal importance to land, sea, air and space, national sovereignty has extended and stretched into cyberspace, sovereignty in cyberspace has become an important component part of national sovereignty. 
[Open: Relevant|0.69] [Closed: Relevant|0.68] [MORE OPEN] Respect for sovereignty in cyberspace, safeguarding cybersecurity.
[Open: Relevant|0.50] [Closed: Relevant|0.58] [MORE CLOSED] facilitating the development of an international information security system aimed at countering threats of the use of information technologies to compromise the strategic stability, at strengthening equal strategic partnership in the sphere of information security, as well 
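
The two `tendency_stats` dictionaries, one from the open-internet sentences and one from the closed-internet sentences, can be combined into a single accuracy figure. The sketch below assumes that every sentence in the open set should come out as `open` and every sentence in the closed set as `close`. The function name `summarize` is hypothetical, and the counts in the usage example are placeholders, not results from this notebook.

```python
def summarize(open_set_stats, close_set_stats):
    """Combine per-set tendency counts into per-class recall and overall accuracy."""
    open_total = sum(open_set_stats.values())
    close_total = sum(close_set_stats.values())
    correct = open_set_stats["open"] + close_set_stats["close"]
    return {
        "open_recall": open_set_stats["open"] / open_total,
        "close_recall": close_set_stats["close"] / close_total,
        "accuracy": correct / (open_total + close_total),
    }

# Placeholder counts for illustration only (NOT the notebook's actual output)
example = summarize({"open": 3, "close": 1}, {"open": 1, "close": 3})
print(example)
```

Reporting the two recalls separately shows whether the errors concentrate on one side, which a single accuracy number would hide.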