# Interpretation of BertForSequenceClassification in Captum

In [1]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

from detector import Detector
import torch

In [2]:
# import sys
#
# print(sys.executable)
# model2 = BertForSequenceClassification.from_pretrained('../model/')


In [2]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [3]:

# load model
model = Detector("./4ChatGPTModel.pt")
model.model.to(device)
model.model.eval()
model.model.zero_grad()

# load tokenizer
tokenizer = model.tokenizer

Initializing Detector...


In [4]:
def predict(inputs):
    return model.model(inputs)[0]

In [5]:
ref_token_id = tokenizer.pad_token_id  # padding token, used to build the reference (baseline) sequence
sep_token_id = tokenizer.sep_token_id  # separator token, appended to the end of the text
cls_token_id = tokenizer.cls_token_id  # classification token, prepended to the text
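
As a rough sketch of how these three ids are typically combined, the snippet below builds an input sequence and an all-padding baseline of the same length. The id values and the helper name `build_input_and_baseline` are hypothetical and chosen only for illustration; the real ids come from `model.tokenizer`.

```python
# Hypothetical special-token ids for illustration only;
# the real values come from the tokenizer.
CLS_ID, PAD_ID, SEP_ID = 0, 1, 2


def build_input_and_baseline(text_ids, cls_id=CLS_ID, pad_id=PAD_ID, sep_id=SEP_ID):
    """Return (input_ids, baseline_ids) with the same length and special tokens."""
    input_ids = [cls_id] + list(text_ids) + [sep_id]
    # The baseline keeps the special tokens and replaces every text token with padding.
    baseline_ids = [cls_id] + [pad_id] * len(text_ids) + [sep_id]
    return input_ids, baseline_ids


# Made-up token ids standing in for an encoded sentence.
input_ids, baseline_ids = build_input_and_baseline([713, 16, 10, 1296])
print(input_ids)     # [0, 713, 16, 10, 1296, 2]
print(baseline_ids)  # [0, 1, 1, 1, 1, 2]
```

Keeping the special tokens identical in both sequences means any attribution comes from the text tokens that differ from the baseline.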

In [6]:
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, add_special_tokens=False)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_token_type_pair(input_ids, sep_ind=0):
    seq_len = input_ids.size(1)
    token_type_ids = torch.tensor([[0 if i <= sep_ind else 1 for i in range(seq_len)]], device=device)
    ref_token_type_ids = torch.zeros_like(token_type_ids, device=device)
    return token_type_ids, ref_token_type_ids

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)
    position_ids = torch.arange(seq_length, dtype=torch.long, device=device)
    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)

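def custom_forward(inputs):
    # Probability of class index 0 for the first sequence in the batch.
    # The [0][0] indexing assumes a batch of one, which matches internal_batch_size=1 in lig.attribute.
    preds = predict(inputs)
    return torch.softmax(preds, dim=1)[0][0].unsqueeze(-1)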

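def summarize_attributions(attributions):
    # Sum over the embedding dimension to get one score per token
    attributions = attributions.sum(dim=-1).squeeze(0)
    # Scale by the L2 norm so scores are comparable across texts
    attributions = attributions / torch.norm(attributions)
    return attributions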

lig = LayerIntegratedGradients(custom_forward, model.model.roberta.embeddings)
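
To make the attribution step less opaque, here is a minimal plain-Python sketch of integrated gradients on a toy function. It is not Captum's implementation: the toy model `f`, its analytic gradient, and the midpoint Riemann sum are all assumptions made for illustration. It shows the completeness property, where the attributions sum to roughly `f(x) - f(baseline)`, and a `delta` value measuring how far the approximation is from that.

```python
def f(x):
    # Toy differentiable "model": f(x) = x0 * x1 + x2
    return x[0] * x[1] + x[2]


def grad_f(x):
    # Analytic gradient of the toy model
    return [x[1], x[0], 1.0]


def integrated_gradients(x, baseline, n_steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / n_steps
    # Attribution = (input - baseline) * average gradient along the straight path
    return [di * gi for di, gi in zip(diff, avg_grad)]


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Gap between the summed attributions and the true output difference
delta = sum(attributions) - (f(x) - f(baseline))
print(attributions)
print(delta)
```

A `delta` close to zero suggests the path integral was approximated well; a large value usually means more integration steps are needed.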

In [13]:
# model.model(input_ids)

In [14]:
# predict(input_ids)

tensor([0.0032], device='cuda:0', grad_fn=<UnsqueezeBackward0>)

In [7]:
NUM_OF_TEXT = 10
MUTATION, REAL, SYNTHETIC, SYNTHETIC_MUTATION = 0, 1, 2, 3
FILE_TYPE = SYNTHETIC
# DATA_FILE = './data/Test_WikiHumanQuarterSet.json'
# DATA_FILE = './data/Test_WikiMutationFullReplaceAntonyms.json'
# DATA_FILE = './data/Test_WikiMutationFullReplaceRandomWords.json'
# DATA_FILE = './data/Test_WikiMutationFullReplaceSynonyms.json'
# DATA_FILE = './data/Test_WikiMutationFullSet.json'
# DATA_FILE = './data/Test_WikiMutationFullSetDeleteArticles.json'
# DATA_FILE = './data/Test_WikiMutationFullSetMisspellings.json'
# DATA_FILE = './data/Test_WikiMutationFullSetReplaceAE.json'
# DATA_FILE = './data/Test_WikiMutationQuarterSet.json'
# DATA_FILE = './data/Test_WikiSyntheticFullSet.json'
DATA_FILE = './data/Test_WikiChatGPTSyntheticQuarterSet.json'

In [8]:
from utils2 import load_standard_json
import random
text_list = load_standard_json(DATA_FILE, True)
print(text_list[0])
random.shuffle(text_list)

Kiadtiphon Udom is a Thai business magnate and entrepreneur who is best known for his success in the real estate industry. Born on June 30, 1971, in Bangkok, Thailand, Udom started his career as a real estate agent and quickly climbed the ladder of success to become the founder and CEO of his own real estate development company, the Udom Property Development Company Limited.

Udom was born into a family of real estate developers and grew up surrounded by the business. He studied Business Administration at  


In [9]:
text_list = ["Text generation has been an area of research that has gained significant attention in recent years due to the advancement of natural language processing and machine learning techniques. Text generation refers to the automated process of generating written content, such as articles, stories, or even chatbot responses, without the need for human intervention. The potential applications of text generation are vast, ranging from aiding in content creation for businesses to developing chatbots that can simulate human-like conversations.", #write me a thesis introduction on text generation
"Despite the significant progress made in text generation, challenges remain in creating text that is not only grammatically correct but also semantically meaningful and contextually appropriate. Furthermore, ethical considerations surrounding the potential misuse of text generation, such as the creation of fake news and impersonation, have brought attention to the need for responsible use and development of text generation models.",
"This thesis aims to explore the current state of text generation research, including the techniques used, evaluation metrics, and potential applications. The study will also examine the ethical implications of text generation and provide recommendations for responsible use and development of text generation models. By providing insights into the current state and future direction of text generation research, this thesis aims to contribute to the advancement of this field and promote the development of responsible and ethical text generation applications",
"Title: A Systematic Literature Review on Deepfake Detection Techniques\nIntroduction:\nDeepfakes are a type of synthetic media created using artificial intelligence (AI) algorithms that generate realistic images, videos, or audio that manipulate the perception of reality. Deepfake technology has raised serious concerns over its potential use in spreading disinformation, defamation, and propaganda. Consequently, researchers and experts have focused on developing effective deepfake detection techniques to combat these issues. This systematic literature review (SLR) aims to provide an overview of the current state-of-the-art deepfake detection techniques and their performance metrics.", #Write me an SLR on deep fake detection
"Methodology:\nThis SLR followed a systematic approach, including the identification of relevant literature, selection of primary studies, data extraction, and synthesis of results. We used a combination of search terms related to deepfake detection, including \"deepfake,\" \"deep learning,\" \"fake media,\" \"image manipulation,\" and \"video manipulation,\" to search the databases. We included peer-reviewed articles, conference papers, and preprints published between 2017 and 2022. We excluded articles that did not focus on deepfake detection or did not propose any detection techniques."]
# Keep only the first 512 characters of each text (characters, not tokens)
text_list = [text[:512] for text in text_list]
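
Since `text[:512]` slices characters rather than tokens, the following sketch contrasts the two kinds of truncation. The whitespace split is only a stand-in for a real tokenizer, and the helper names are hypothetical.

```python
def truncate_chars(text, max_chars):
    # Slices raw characters, so the cut can land in the middle of a word.
    return text[:max_chars]


def truncate_tokens(tokens, max_tokens):
    # Slices a token list, so every kept token is whole.
    return tokens[:max_tokens]


sample = "word " * 200            # 1000 characters, 200 whitespace-separated tokens
char_cut = truncate_chars(sample, 512)
toy_tokens = sample.split()       # stand-in for tokenizer output
token_cut = truncate_tokens(toy_tokens, 512)

print(len(char_cut))              # 512
print(len(char_cut.split()))      # 103 (the last piece is the fragment "wo")
print(len(token_cut))             # 200 (nothing is cut, the text has fewer than 512 tokens)
```

A character limit typically keeps far fewer tokens than a token limit of the same number, which would explain why texts in the output end mid-word.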

In [9]:
# Green marks tokens with positive attribution toward the target output, red marks negative attribution
print('\033[1m', 'Visualization For Score', '\033[0m')
for text in text_list[:NUM_OF_TEXT]:
    input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
    token_type_ids, ref_token_type_ids = construct_input_ref_token_type_pair(input_ids, sep_id)
    position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
    attention_mask = construct_attention_mask(input_ids)

    indices = input_ids[0].detach().tolist()
    all_tokens = tokenizer.convert_ids_to_tokens(indices)

    custom_forward(input_ids)

    attributions, delta = lig.attribute(inputs=input_ids,
                                        baselines=ref_input_ids,
                                        return_convergence_delta=True,
                                        internal_batch_size=1)

    score = predict(input_ids)

    attributions_sum = summarize_attributions(attributions)

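    # strip the "Ġ" word-boundary marker from tokens, then build one record per text
    all_tokens = [token.replace("Ġ", "") for token in all_tokens]
    score_vis = viz.VisualizationDataRecord(
                            attributions_sum,                                # word_attributions
                            torch.softmax(score, dim = 1)[0][0],             # pred_prob (class index 0)
                            torch.argmax(torch.softmax(score, dim = 1)[0]),  # pred_class
                            FILE_TYPE,                                       # true_class
                            text,                                            # attr_class ("Attribution Label" column)
                            attributions_sum.sum(),                          # attr_score
                            all_tokens,                                      # raw_input_ids
                            delta)                                           # convergence_score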

    viz.visualize_text([score_vis])

Visualization For Score


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Andy Love is a former British politician who served as a Member of Parliament (MP) for the Edmonton constituency from 1997 to 2015. He was a prominent figure in the UK's centre-left Labour Party and held various roles within the party, including being a member of the Foreign Affairs Select Committee. Early Life and Education Andy Love was born on 4 July 1954 in Hackney, London. He grew up in Tottenham and attended Tottenham Grammar School. After completing his A-levels, he went on to study at the Univers",1.33,"#s Andy Love is a former British politician who served as a Member of Parliament ( MP ) for the Edmonton constituency from 1997 to 2015 . He was a prominent figure in the UK 's centre - left Labour Party and held various roles within the party , including being a member of the Foreign Affairs Select Committee . Ċ Ċ Early Life and Education Ċ Ċ Andy Love was born on 4 July 1954 in Hack ney , London . He grew up in Tottenham and attended Tottenham Gram mar School . After completing his A - levels , he went on to study at the Univers #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Sticky Studios is a Dutch independent video game development company based in Amsterdam. The company was founded in 2012 by Casper van Est and Pepijn Rijnders. They specialize in creating games for mobile platforms, consoles, and PC. History Sticky Studios was started in 2012 by Casper van Est and Pepijn Rijnders, both of whom had previously worked for Guerrilla Games. The company started as a small team of developers in Amsterdam, with a focus on creating games for mobile platforms such as iOS and Andro",-2.14,"#s St icky Studios is a Dutch independent video game development company based in Amsterdam . The company was founded in 2012 by Cas per van Est and Pep ijn R ij nd ers . They specialize in creating games for mobile platforms , consoles , and PC . Ċ Ċ History Ċ Ċ St icky Studios was started in 2012 by Cas per van Est and Pep ijn R ij nd ers , both of whom had previously worked for Gu errilla Games . The company started as a small team of developers in Amsterdam , with a focus on creating games for mobile platforms such as iOS and And ro #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Reema Sen is an Indian actress who predominantly works in Tamil, Telugu, and Hindi films. She was born on 29 October 1981 in Kolkata, West Bengal, India. Early Life and Education: Reema Sen completed her schooling in Kolkata and then moved to Mumbai to pursue her modeling career. Later, she completed her graduation in English Literature from the University of Calcutta. She was awarded the Best Student award for her outstanding academic performance. Career: Reema Sen began her acting career in the Tamil",1.26,"#s Re ema Sen is an Indian actress who predominantly works in Tamil , Tel ugu , and Hindi films . She was born on 29 October 1981 in K olk ata , West Bengal , India . Ċ Ċ Early Life and Education : Ċ Ċ Re ema Sen completed her schooling in K olk ata and then moved to Mumbai to pursue her modeling career . Later , she completed her graduation in English Literature from the University of Cal cut ta . She was awarded the Best Student award for her outstanding academic performance . Ċ Ċ Care er : Ċ Ċ Re ema Sen began her acting career in the Tamil #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Ibrahim Kanaan is a Lebanese politician who has been serving as a member of parliament representing the Lebanese Forces party from the Metn District since 1992. He has also held several key positions in government and politics, including Minister of Economy and Trade and Minister of Finance. Early Life and Education Ibrahim Kanaan was born in Dhour El Choueir, Lebanon in 1950. He attended Collège des Frères in Dhour El Choueir before studying economics and political science at Saint Joseph University in B",0.75,"#s I b rahim K ana an is a Lebanese politician who has been serving as a member of parliament representing the Lebanese Forces party from the Met n District since 1992 . He has also held several key positions in government and politics , including Minister of Economy and Trade and Minister of Finance . Ċ Ċ Early Life and Education Ċ I b rahim K ana an was born in D hour El Chou e ir , Lebanon in 1950 . He attended Coll Ã¨ ge des Fr Ã¨ res in D hour El Chou e ir before studying economics and political science at Saint Joseph University in B #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Lasioglossum lionotum Lasioglossum lionotum is a species of bee belonging to the family Halictidae. The species was first discovered by Thomas H. Atkinson in 1906 in the western United States. L. lionotum is a small bee, approximately 5 to 8 millimeters in length. The bee has a yellowish-brown head, thorax, and scutellum. The legs are yellowish-brown, while the abdomen is black with yellow markings. Taxonomy and classification The genus Lasioglossum contains over 1,000",1.26,"#s L asi og loss um lion ot um Ċ Ċ L asi og loss um lion ot um is a species of bee belonging to the family Hal ict idae . The species was first discovered by Thomas H . Atkinson in 1906 in the western United States . L . lion ot um is a small bee , approximately 5 to 8 mill imeters in length . The bee has a yellow ish - brown head , thor ax , and sc ut ell um . The legs are yellow ish - brown , while the abdomen is black with yellow markings . Ċ Ċ Tax onomy and classification Ċ Ċ The genus Las i og loss um contains over 1 , 000 #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Santiago Ramirez (footballer, born 2001) Santiago Ramirez is an Argentine professional football player. He was born on March 16, 2001, in Buenos Aires, Argentina. He began his football career with Club Atlético Boca Juniors. Career Ramirez began playing football at a young age and joined the Boca Juniors youth academy where he developed his skills as an attacking midfielder. In 2020, he was promoted to the first team of Boca Juniors, one of the most successful football clubs in Argentina. His impressive",-0.83,"#s S anti ago Ramirez ( football er , born 2001 ) Ċ Ċ S anti ago Ramirez is an Argentine professional football player . He was born on March 16 , 2001 , in Buenos Aires , Argentina . He began his football career with Club Atl Ã©t ico B oca Jun iors . Ċ Ċ Care er Ċ Ċ Ram irez began playing football at a young age and joined the B oca Jun iors youth academy where he developed his skills as an attacking midfielder . In 2020 , he was promoted to the first team of B oca Jun iors , one of the most successful football clubs in Argentina . His impressive #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Kim Ko-am Kim Ko-am was a South Korean artist who rose to fame for his unique fusion of traditional Korean painting techniques and modern abstract expressionism. He was born on June 15, 1938, in Seoul, South Korea, and died on October 22, 2003, in the same city. Early life and education Kim Ko-am grew up in the artistic atmosphere of his father, Heo Baek-ryeon, a master of traditional Korean painting. Heo Baek-ryeon was a close friend of the famous painter Lee Jung-seop, who often visited their house, i",2.5,"#s Kim Ko - am Ċ Ċ Kim Ko - am was a South Korean artist who rose to fame for his unique fusion of traditional Korean painting techniques and modern abstract expression ism . He was born on June 15 , 1938 , in Seoul , South Korea , and died on October 22 , 2003 , in the same city . Ċ Ċ Early life and education Ċ Ċ Kim Ko - am grew up in the artistic atmosphere of his father , He o Ba ek - ry eon , a master of traditional Korean painting . He o Ba ek - ry eon was a close friend of the famous painter Lee Jung - se op , who often visited their house , i #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Russom is a scenic town located in the Gash-Barka region of Eritrea. The town is situated at an altitude of 1,107 meters (3,632 feet) above sea-level and has a total population of approximately 9,096 people. History Russom has a rich historical background, and it was mainly inhabited by the Tigrigna and Tigre people. These communities are known for their rich cultural heritage, including traditional dance, poetry, songs, and attire. During the colonial period, Russom served as an important center of tra",0.39,"#s Russ om is a scenic town located in the G ash - B ark a region of Erit rea . The town is situated at an altitude of 1 , 107 meters ( 3 , 6 32 feet ) above sea - level and has a total population of approximately 9 , 09 6 people . Ċ Ċ History Ċ Ċ Russ om has a rich historical background , and it was mainly inhabited by the T igr ign a and Tig re people . These communities are known for their rich cultural heritage , including traditional dance , poetry , songs , and attire . Ċ Ċ During the colonial period , Russ om served as an important center of tra #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Hermann Christof von Russwurm (March 24, 1799 – October 29, 1865) was a German born Bishop and missionary in Africa. Early years and education Russwurm was born in Wechselburg, Saxony, Germany, on March 24, 1799, and grew up in a family of pastors. He received his early education in a local school and continued his theological studies at the University of Leipzig, from where he graduated in 1821. Missionary work in Ghana After his ordination, Russwurm was sent to Keta, a town on the coast",1.46,"#s H erman n Christ of von Russ w ur m ( March 24 , 17 99 âĢĵ October 29 , 1865 ) was a German born Bishop and missionary in Africa . Ċ Ċ Early years and education Ċ Ċ Russ w ur m was born in We ch sel burg , Sax ony , Germany , on March 24 , 17 99 , and grew up in a family of pastors . He received his early education in a local school and continued his theological studies at the University of Le ip zig , from where he graduated in 18 21 . Ċ Ċ Mission ary work in Ghana Ċ Ċ After his ord ination , Russ w ur m was sent to K eta , a town on the coast #/s"


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
2.0,2 (0.00),"Thomas B. Steel (born July 17, 1965) is an American entrepreneur and philanthropist. He is primarily known as the founder of Steel Industries, a multi-national conglomerate with interests in various sectors such as steel, mining, energy, and real estate. He is one of the most successful business leaders in the United States and is also known for his generous donations to charitable causes. Early life and education Thomas B. Steel was born in Chicago, Illinois, to a middle-class family. His father worked",0.27,"#s Thomas B . Steel ( born July 17 , 1965 ) is an American entrepreneur and philanthrop ist . He is primarily known as the founder of Steel Industries , a multi - national conglomerate with interests in various sectors such as steel , mining , energy , and real estate . He is one of the most successful business leaders in the United States and is also known for his generous donations to charitable causes . Ċ Ċ Early life and education Ċ Ċ Thomas B . Steel was born in Chicago , Illinois , to a middle - class family . His father worked #/s"
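
To help read the "Attribution Score" and "Word Importance" columns, the sketch below mirrors in plain Python what `summarize_attributions` does: it sums each token's attributions over the embedding axis, divides by the L2 norm, and ranks tokens by absolute score. The tokens and numbers are made up for illustration and are not taken from the model.

```python
import math


def summarize(per_token_embedding_attr):
    """Sum each token's attributions over the embedding axis, then L2-normalize."""
    sums = [sum(row) for row in per_token_embedding_attr]
    norm = math.sqrt(sum(s * s for s in sums))
    return [s / norm for s in sums]


# Hypothetical tokens with a made-up 2-dimensional attribution per token.
tokens = ["<s>", "Andy", "Love", "</s>"]
raw = [[0.0, 0.0], [3.0, 1.0], [-2.0, -1.0], [0.0, 0.0]]

scores = summarize(raw)  # per-token sums are [0, 4, -3, 0] and the norm is 5
ranked = sorted(zip(tokens, scores), key=lambda p: abs(p[1]), reverse=True)

print(scores)
print(ranked[0])    # the token with the largest absolute score
print(sum(scores))  # analogous to the "Attribution Score" column
```

Because the scores are normalized by the L2 norm rather than by their sum, their total is not bounded by 1 in magnitude, which is consistent with values such as 2.5 or -2.14 in the table.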
