In [1]:
pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")

Using GPU: Tesla T4


In [3]:
!pip install transformers datasets evaluate

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [4]:
from transformers import pipeline, AutoTokenizer, AutoModelForQuestionAnswering
import pandas as pd

# Load the dataset
df = pd.read_csv("train-with-spoilers.csv")
df =  df[df['tags'] == 'phrase']

#distilbert-base-uncased-distilled-squad

# Create the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForQuestionAnswering.from_pretrained("roberta-base")

# Define the question answering pipeline
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer, batch_size = 1, device = 0)

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForQuestionAnswering: ['lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use 

In [5]:
df

Unnamed: 0.1,Unnamed: 0,uuid,postText,targetParagraphs,targetTitle,spoiler,spoilerPositions,tags,start_positions,end_positions
1,1,b1a1f63d-8853-4a11-89e8-6b2952a393ec,NASA sets date for full recovery of ozone hole,2070 is shaping up to be a great year for Moth...,Hole In Ozone Layer Expected To Make Full Reco...,2070,"[[[0, 0], [0, 4]]]",phrase,[0],[4]
2,2,008b7b19-0445-4e16-8f9e-075b73f80ca4,This is what makes employees happy -- and it's...,"Despite common belief, money isn't the key to ...",Intellectual Stimulation Trumps Money For Empl...,intellectual stimulation,"[[[1, 186], [1, 210]]]",phrase,[272],[296]
4,4,31b108a3-c828-421a-a4b9-cf651e9ac859,The perfect way to cook rice so that it's perf...,"Boiling rice may seem simple, but there is a v...",Revealed: The perfect way to cook rice so that...,in a rice cooker,"[[[5, 60], [5, 76]]]",phrase,[655],[671]
10,10,46aa8d72-fbd4-4796-a85b-0a0363c21812,Analysis: This may be the most brutal number i...,Plenty has been made of the big Congressional ...,This may be the most brutal number in the CBO ...,750 percent,"[[[2, 109], [2, 120]]]",phrase,[583],[594]
11,11,917e1106-413c-4be5-818b-ad500314feaa,#TeenMom2 star @PBandJenelley_1 reveals the se...,"""Teen Mom 2"" star Jenelle Evans took to Twitte...",'Teen Mom 2' Star Jenelle Evans Reveals Sex Of...,boy,"[[[0, 103], [0, 106]]]",phrase,[103],[106]
...,...,...,...,...,...,...,...,...,...,...
3178,3178,edec6cde-bbd3-4f50-ba51-ef98bdf19b63,"He Dug A Huge Hole In His Backyard, And What H...","When it comes to home projects, sometimes the ...","He Dug A Huge Hole In His Backyard, And What H...",own underground bunker!,"[[[20, 70], [20, 93]]]",phrase,[6104],[6127]
3179,3179,84277c0f-4ef1-444e-8e04-55c84af593e5,Best Buy Has An Insane Xbox One Deal For A Lim...,Best Buy Has An Insane Xbox One Deal For A Lim...,Best Buy Has An Insane Xbox One Deal For A Lim...,$50 off,"[[[8, 101], [8, 108]]]",phrase,[1814],[1821]
3183,3183,5b61d712-8b03-4ee6-ba04-b39ce2b206f7,Student forced to carry papers to prove she ca...,She's not a tourist visiting a foreign country...,"Melona Clark, Hampton University Student, Carr...",student at Hampton University in Virginia,"[[[1, 9], [1, 50]]]",phrase,[163],[204]
3198,3198,9d05984c-3920-47c0-aa97-8df58cca1fec,You need to see this Twitter account that pred...,What the HELL?! 1. Unless you’re somewhere wit...,"WTF, It Looks Like This Twitter Account ""Predi...",@beyoncefan666,"[[[3, 55], [3, 69]]]",phrase,[408],[422]


In [6]:
def preprocess_function(examples):
    questions = [q.strip() for q in examples["postText"]]
    inputs = tokenizer(
        questions,
        examples["targetParagraphs"],
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    answers = examples["spoiler"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        start_char = int(examples["start_positions"][i][1])
        end_char = start_char + len(examples["spoiler"][i])
        sequence_ids = inputs.sequence_ids(i)

        # Find the start and end of the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        # If the answer is not fully inside the context, label it (0, 0)
        if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise it's the start and end token positions
            idx = context_start
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs

In [7]:
import torch
torch.cuda.empty_cache()

In [8]:
from sklearn.model_selection import train_test_split

df_train, df_test = train_test_split(df, test_size=0.2)

In [9]:
from datasets import load_dataset
import datasets

val_dataset2 = datasets.Dataset.from_pandas(df_test)
train_dataset2 = datasets.Dataset.from_pandas(df_train)

In [10]:
train_dataset = train_dataset2.map(preprocess_function, batched=True,
    remove_columns=train_dataset2.column_names)


Map:   0%|          | 0/1093 [00:00<?, ? examples/s]

In [11]:
validation_dataset = val_dataset2.map(preprocess_function, batched=True,
    remove_columns=val_dataset2.column_names)


Map:   0%|          | 0/274 [00:00<?, ? examples/s]

In [12]:
print(validation_dataset)

Dataset({
    features: ['start_positions', 'end_positions', 'input_ids', 'attention_mask'],
    num_rows: 274
})


In [13]:
print(train_dataset)

Dataset({
    features: ['start_positions', 'end_positions', 'input_ids', 'attention_mask'],
    num_rows: 1093
})


In [14]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

In [15]:
torch.cuda.empty_cache()

In [16]:
pip install --upgrade accelerate

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [17]:
!pip install transformers --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [18]:
!pip install torch --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [19]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="qa_train_passage",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=5,
    weight_decay=0.01,
    save_strategy="epoch",
    load_best_model_at_end=True,
    #push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()



Epoch,Training Loss,Validation Loss
1,1.9594,1.619157
2,1.429,1.524606
3,1.1216,1.67866
4,0.8692,2.018714
5,0.6047,2.321848


TrainOutput(global_step=2735, training_loss=1.1401159366698326, metrics={'train_runtime': 591.8352, 'train_samples_per_second': 9.234, 'train_steps_per_second': 4.621, 'total_flos': 1070990081671680.0, 'train_loss': 1.1401159366698326, 'epoch': 5.0})

In [20]:
from transformers import pipeline

question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer, batch_size = 4, device = 0)
# question_answerer(question=question, context=context)

predictions = []

for index, row in df_test.iterrows():
    context = row["targetParagraphs"]
    question = row["postText"]
    response = question_answerer(context=context, question=question)
    predictions.append(response["answer"])



In [None]:
pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.0/7.0 MB[0m [31m32.6 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m224.5/224.5 kB[0m [31m5.6 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m66.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.28.1


In [None]:
from transformers import pipeline, AutoTokenizer, AutoModelForQuestionAnswering
import pandas as pd



# Create the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForQuestionAnswering.from_pretrained("roberta-base")

# Define the question answering pipeline
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

Downloading (…)lve/main/config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/501M [00:00<?, ?B/s]

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForQuestionAnswering: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use 

In [None]:
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")

Using GPU: Tesla T4


In [None]:
from transformers import pipeline, AutoTokenizer, AutoModelForQuestionAnswering
import pandas as pd

# Load the dataset
df = pd.read_csv("train.csv")

# Create the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForQuestionAnswering.from_pretrained("roberta-base")

# Define the question answering pipeline
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer, batch_size = 1, device = 0)

# Loop through each row in the dataset and train the model
for index, row in df.iterrows():
    context = row["targetParagraphs"] +" "+ str(row["targetTitle"]) +" "+ str(row["targetDescription"])
    question = row["postText"]
    answer = row["spoiler"]
    qa_pipeline(context=context, question=question, answer=answer)

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForQuestionAnswering: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use 

In [None]:
model.save_pretrained("/newmodel")

In [None]:

from transformers import AutoModelForQuestionAnswering
from google.colab import drive
import pandas as pd

# Mount Google Drive
drive.mount('/content/draive')

Mounted at /content/draive


In [None]:
# Read the file from Google Drive
model = AutoModelForQuestionAnswering.from_pretrained("/content/drive/MyDrive/NLP model")

OSError: ignored

In [None]:

# Use the trained model to generate responses

question = "Five Nights at Freddy’s Sequel Delayed for Weird Reason Five Nights at Freddy's creator Scott Cawthon takes to Steam to tease a possible delay for Five Nights at Freddy's: Sister Location, the fifth game in the series. "
context = "Five Nights at Freddy’s creator Scott Cawthon takes to Steam to tease a possible delay for Five Nights at Freddy’s: Sister Location, the fifth game in the series., For the past couple of years, horror gaming fans have been able to look forward to one new entry in the Five Nights at Freddy’s series after another, with four core games, one RPG spinoff, and a novel released so far. The next game in the franchise, Five Nights at Freddy’s: Sister Location, was scheduled to release this coming Friday, October 7th, but if developer Scott Cawthon is to be believed, the project has been delayed by a few months.,According to a post by Cawthon on the Five Nights at Freddy’s: Sister Location Steam page, the game is being delayed because it’s too dark. Cawthon said that some of the plot elements are so disturbing that they are making him feel sick, and so he is thinking about delaying the game so that he can rework it entirely \into something kid-friendly.\,Delays happen in the gaming industry all the time, but it’s rare for a game to be delayed mere days before its release. Five Nights at Freddy’s fans are confused and angry about this latest development, as many were looking forward to playing the game on Friday. Something else upsetting fans is Cawthon’s reasoning that the game is too dark to release, as being dark and disturbing are two characteristics that many consumers look for in a horror game.Cawthon’s reason for suddenly delaying Five Nights at Freddy’s Sister Location from its planned October 7th release date doesn’t make much sense. A more likely scenario is that this is just a weird publicity stunt meant to hype the game as being so disturbing that its developer almost didn’t even release it. Alternatively, perhaps Cawthon is delaying the game for technical reasons and decided to concoct this story instead of admitting that the fifth core game in the series has issues.,Fans should also consider the possibility that Cawthon is just trolling in an attempt to throw them off the scent of an early release. Cawthon has a habit of surprising fans by releasing Five Nights at Freddy’s games early, and it wouldn’t be all that shocking for Five Nights at Freddy’s: Sister Location to carry on that tradition, despite Cawthon’s post to the contrary., With October 7th just a few days away, fans will learn soon enough whether or not Cawthon is serious about Sister Location‘s delay. If the game is delayed, it will be interesting to see if Cawthon actually does rework it to be more kid-friendly,\ or if he goes with a slightly altered version of his original vision., Five Nights at Freddy’s: Sister Location is scheduled to launch on October 7th for PC as well as iOS and Android mobile devices., Source: Scott Cawthon"

response = qa_pipeline(context=context, question=question)

print(response["answer"])

Location is scheduled to launch on October 7th for PC as well as


In [None]:

question = "Say it ain't so! Jon Stewart has set his official departure date from #TheDailyShow"
context =  "Jon Stewart now has a firm departure date from Comedy Central’s \The Daily Show.\ The comic announced on Monday’s broadcast of the program that he will leave the show after its August 6th broadcast.,The disclosure paves the way for the show’s new host, Trevor Noah, and suggests that Stewart will not hang around as candidates start to make announcements this year about running in the 2016 election for U.S. President.Stewart had said previously that he would continue to do the program until some time between July and the end of 2015.,Stewart made the announcement at the very end of the evening’s broadcast, just before rolling the program’s signature \"Moment of Zen\" final segment. He offered few details about what he might do for his final broadcast, but did reiterate a contest that would give a viewer the chance to attend the program’s last taping.,Stewart’s announcement sets the stage for Trevor Noah, a South African comic who has hosted a late-night program in that country, to take the reins of the series. Noah is a relative unknown in the United States and has already come under scrutiny for a series of controversial tweets made in past years that were discovered on social media after Comedy Central announced him as Stewart’s heir.,The Viacom-owned network and Stewart have both come out in support of Noah, urging audiences to give him a chance before passing judgement on his humor."
   
response = qa_pipeline(context=context, question=question)

print(response["answer"])

the program until some time between July


In [None]:

question = "what is the date?"
context =  "Jon Stewart now has a firm departure date from Comedy Central’s \The Daily Show.\ The comic announced on Monday’s broadcast of the program that he will leave the show after its August 6th broadcast.,The disclosure paves the way for the show’s new host, Trevor Noah, and suggests that Stewart will not hang around as candidates start to make announcements this year about running in the 2016 election for U.S. President.Stewart had said previously that he would continue to do the program until some time between July and the end of 2015.,Stewart made the announcement at the very end of the evening’s broadcast, just before rolling the program’s signature \"Moment of Zen\" final segment. He offered few details about what he might do for his final broadcast, but did reiterate a contest that would give a viewer the chance to attend the program’s last taping.,Stewart’s announcement sets the stage for Trevor Noah, a South African comic who has hosted a late-night program in that country, to take the reins of the series. Noah is a relative unknown in the United States and has already come under scrutiny for a series of controversial tweets made in past years that were discovered on social media after Comedy Central announced him as Stewart’s heir.,The Viacom-owned network and Stewart have both come out in support of Noah, urging audiences to give him a chance before passing judgement on his humor."
   
response = qa_pipeline(context=context, question=question)

print(response["answer"])

the program until some time between July


In [None]:

question = "How big is Justin Bieber's dick really?"
context =  "A single question now plagues the minds of all Americans, weighing down our brains as we slump in our office chairs, then slump in our cars, then slump in our couches, and then slump into bed: how big is Justin Bieber's penis really? The swaggy lil pop star and his cavalry of minders would have us believe that Justin Bieber has a huge dickLast week, Calvin Klein released photos of Bieber modeling their underwear for a new ad campaign. One memorable shot showed off the singer's protruding package in arresting profile. Shortly after the photos hit the Internet, a web site called Breathe Heavy posted what it claimed was the same image prior to re-touching. If that claim were accurate, it would mean that Calvin Klein (well, not him personally, although maybe) had stuffed Bieber's stocking nearly to bursting. Here are the two images side-by-side:, Bieber's team immediately insisted that Breathe Heavy's photo was fake, and requested the web site take it down. Breathe Heavy complied, originally replacing the photos with an editor's note, but eventually removing the entire post altogether. In that since-deleted note, Breathe Heavy's editor seems to accept Bieber's explanation at gunpoint.,Bieber denies the photo is real, and I respect that and will believe him.,The question, therefore, is: Are the claims of the Bieber camp correct, and the photo fake?,Or did Breathe Heavy have the real photo, and capitulate in the face of legal intimidation?,(It's easy to make a case that Breathe Heavy's photos are the real deal: We know that at least one photo was significantly retouched prior to publication, as Bieber's camp did not dispute an earlier TMZ story alleging that Calvin Klein sculpted Bieber's pecs, filled out his abs and bestowed him pubes in this ad from the campaign; furthermore, virtually every celebrity photoshoot in America gets touched up at some point. Why would Bieber's dick be a grand outlier?),But in many ways this dispute is just a lead-in to an essential American question: What exactly is Bieber packing?, Let's be true detectives.,This is a screencap taken from a video of the Calvin Klein shoot that Bieber himself posted to Instagram. Here we have a direct, unaltered view of his package and can plainly see that it looks quite different than the massive knot he is sporting in the photo advertisement. Front-bulge will almost always look less impressive than side-bluge, granted, and this is a fine bulge, certainly, but one that seems far off of Calvin Klein's idealized Burmese python.,Last September, Bieber appeared onstage at the Fashion Rocks concert. For some reason he stripped down to his underwear, which produced a number of generally alarming photos such as this one.,There are a number of things we can glean from this photo. One is that Justin Bieber has muscles. Look at the strong boy! Another is that his happy trail does indeed appear to stop abruptly right about where it does in the pre-Photoshop version of the Calvin Klein shot in which a model gropes him. But because Bieber wore jet black briefs that reveal no hint of bulge, this photo doesn't help us understand how big his dick actually is.,For that, we must consult more candid shots.,In 2013, Bieber went to Hawaii and jumped off a cliff. After exiting the water, he was photographed walking on the beach, resulting in the image you see here:,This is perhaps the most revealing shot of Bieber's bulge in the wild. Does it look exceedingly large? I'd say not. In fact, it looks like any man's normal penis. Of course, it should be noted that it's unfair to judge a dick by what it looks like immediately after being submerged in the sea. However we can only work with the materials we have.,Next we will consult a Tumblr called Justin Bieber's Bulge, a blog \"dedicated to Justin Bieber's glorious, wonderful bulge,\" which is not run by me. For a Tumblr devoted to one man's dick, it's a pretty boring blog, but there is one compelling photo.,Here is a fan shot of Justin Bieber in concert, his leather drop-crotch pants dropped well below his crotch. We can see a hint of bulge, and from this angle it does not look like Justin Bieber is trying to smuggle a butternut squash through airport security, as Calvin Klein might want us to believe.,That is evidence supporting the theory that Justin Bieber is adequately endowed. Arguing in favor of Justin Bieber's alleged big dick are two people: Tati Neves, a Brazilian model, and Bieber's trainer Patrick Nilsson. These two claim to have seen Bieber's flesh in the flesh, and if we're to believe them, Calvin Klein has staked its reputation on the right massive dong.,Neves claims to have slept with Bieber during his infamous Brazilian sex romp. Here is what she told a British tabloid about Bieber's D:,Speaking to The Sun, Brazilian model Tati Nevas said: \Take it from me, he's well endowed - and very good in bed.\,Nilsson, meanwhile, was shuttled out to do damage control in the wake of the Calvin Klein Photoshop controversy. Here, according to Breathe Heavy, is his assessment,And to make up it, here's a new quote from Justin's trainer Patrick Nilsson, who says JB is packing. \I can definitely confirm that he is a well-endowed guy. I sound weird saying that, but yes.,Indeed you do.,Two people claim to have personal connections with Justin Bieber's dick and claim it is large, but one is on Justin Bieber's payroll. While we will consider their opinions, the overwhelming visual evidence suggests that Justin Bieber's penis is perfectly average—large enough to adequately fill out a pair of briefs, but not so large that it could arouse envy and terror when plastered across sprawling billboards, or choke a cow, without enhancement.,In this case, it appears, Justin Bieber is the same as any man.,Still, we don't know for sure, and here is where we turn to you, our readers: Have you ever had sex with Justin Bieber? Have you ever seen his dick? Do you know someone who has? Are you Scooter Braun? Let's settle this debate once and for all. Email me at jordan@gawker.com or leave a comment below."

response = qa_pipeline(context=context, question=question)

print(response["answer"])

enough to adequately fill out a pair


In [None]:

import pandas as pd
df_validation = pd.read_csv("validation.csv")

In [None]:
spoliers = []

for index, row in df_validation.iterrows():
    context = row["targetParagraphs"]+ str(row["targetTitle"]) +" "+ str(row["targetDescription"])
    question = row["postText"]
    response = qa_pipeline(context=context, question=question)
    spoliers.append(response["answer"])
    print(response["answer"])




version of his original vision. Five Nights at Freddy’s: Sister
one of
your
Costumes The mythology of punk music's evolution can be traced back, more
subject, and love diving into
wholeheartedly
Protein: 9 g Calcium: 39 mgThe
Transfiguration. After all, the Dark
disappointed by
It was fun, 'cause you're acting," she said. "
participating in a lingerie shoot -- or cutting out
who Obama just dined
true of every 1 in
someone unless I believe 100 percent that
her. "There are so many blessed memories here, but there comes a
human health, and they're OK
erm, exit point keeps it from making a sound
her boyfriend, according to the Sun Sentinel, which cited a police report.
had some
part of
same reason -- and that their careers may suffer as a result
out there on whether melatonin supplements are truly an
it. That was totally worthwhile
one of her sisters
clear of iOS
Primary: Mike Rounds Wins GOP Nomination
each
current roles. So the impossible relationship officially came to
Guess
version as 

In [None]:
df_validation['generated_spoiler'] = spoliers

In [None]:


pip install nltk

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:

import pandas as pd
from nltk.tokenize import TreebankWordTokenizer

# Define the tokenizer function
tokenizer = TreebankWordTokenizer().tokenize

# Tokenize the generated spoilers
df_validation['tokenized_generated_spoiler'] = df_validation['generated_spoiler'].apply(tokenizer)

In [None]:
df_validation.to_csv('calculateBleu.csv', index=False)

In [None]:

import pandas as pd

# Load the dataset
df_validation = pd.read_csv("calculateBleu.csv")

In [None]:

import nltk
# Calculate the BLEU score

true_spoiler = df_validation['tokenized_spoiler'].tolist()
myoutput = df_validation['tokenized_generated_spoiler'].tolist()
bleu_score = nltk.translate.bleu_score.corpus_bleu(true_spoiler,myoutput)

print("BLEU score: ", bleu_score)

BLEU score:  1.2811120036438317e-231


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


In [None]:

pip install seqeval

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting seqeval
  Downloading seqeval-1.2.2.tar.gz (43 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.6/43.6 kB[0m [31m4.2 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: seqeval
  Building wheel for seqeval (setup.py) ... [?25l[?25hdone
  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16180 sha256=820928a2b893f9c5a515421278fb41133251d698a8e52ed25a2ad2355bf819f0
  Stored in directory: /root/.cache/pip/wheels/1a/67/4a/ad4082dd7dfc30f2abfe4d80a2ed5926a506eb8a972b4767fa
Successfully built seqeval
Installing collected packages: seqeval
Successfully installed seqeval-1.2.2


In [None]:

from nltk.metrics import f_measure, precision, recall

# Define true labels and predicted labels as lists of sentences
y_true = df_validation['spoiler']
y_pred = df_validation['generated_spoiler']


# Tokenize the sentences and convert them to tuples of words
tokenize = lambda sent: tuple(sent.split())
y_true_tok = [tokenize(sent) for sent in y_true]
y_pred_tok = [tokenize(str(sent)) for sent in y_pred]

# Calculate precision, recall and F1 score
precision_score = precision(set(y_true_tok), set(y_pred_tok))
recall_score = recall(set(y_true_tok), set(y_pred_tok))
f1_score = f_measure(set(y_true_tok), set(y_pred_tok))

print("Precision: {:.2f}".format(precision_score))
print("Recall: {:.2f}".format(recall_score))
print("F1 score: {:.2f}".format(f1_score))


Precision: 0.00
Recall: 0.00
F1 score: 0.00


In [None]:
pip install datasets

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting datasets
  Downloading datasets-2.12.0-py3-none-any.whl (474 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m474.6/474.6 kB[0m [31m23.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting multiprocess
  Downloading multiprocess-0.70.14-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.3/134.3 kB[0m [31m16.5 MB/s[0m eta [36m0:00:00[0m
Collecting aiohttp
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m69.1 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.7,>=0.3.0
  Downloading dill-0.3.6-py3-none-any.whl (110 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m110.5/110.5 kB[0m [31m10.1 MB/s[0m eta [36m0:00:00[0m
Collecting responses<0.19
  Downloading responses-0.18

In [None]:
import nltk
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [None]:

pip install bert_score

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bert_score
  Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.1/61.1 kB[0m [31m7.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: bert_score
Successfully installed bert_score-0.3.13


In [None]:

import pandas as pd
from bert_score import score


generated_spoilers = df_validation['generated_spoiler'].tolist()
true_spoilers = df_validation['spoiler'].tolist()

# Convert any floats to strings
generated_spoilers = [str(s) for s in generated_spoilers]
true_spoilers = [str(s) for s in true_spoilers]

# Calculate the BERT score for each pair of sentences
scores = score(generated_spoilers, true_spoilers, lang='en', verbose=False)

# Extract precision, recall, and F1 scores
precision, recall, f1 = scores

# Print average scores
print(f"BERT precision score: {precision.mean():.4f}")
print(f"BERT recall score: {recall.mean():.4f}")
print(f"BERT F1 score: {f1.mean():.4f}")

Downloading (…)lve/main/config.json:   0%|          | 0.00/482 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.43G [00:00<?, ?B/s]

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


BERT precision score: 0.8289
BERT recall score: 0.8141
BERT F1 score: 0.8208


In [None]:

pip install meteor_score

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
[31mERROR: Could not find a version that satisfies the requirement meteor_score (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for meteor_score[0m[31m
[0m

In [None]:
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [None]:
import nltk
from nltk.translate import meteor_score

# Example reference and hypothesis sentences
ref = df_validation['spoiler'].tolist()
hyp = df_validation['generated_spoiler'].tolist()

# Tokenize the reference and hypothesis sentences
ref_tokens = [nltk.word_tokenize(sentence) for sentence in ref]
hyp_tokens = nltk.word_tokenize(hyp[0])

# Calculate the METEOR score
score = meteor_score.meteor_score(ref_tokens, hyp_tokens)

print(score)


0.2757352941176471


In [None]:
import nltk

bleu_scores = []

for index, row in df_validation.iterrows():
    if row['tags'] == "phrase":
        true_spoiler = row['tokenized_spoiler']
        myoutput = row['tokenized_generated_spoiler']
        bleu_score = nltk.translate.bleu_score.sentence_bleu([true_spoiler], myoutput)
        bleu_scores.append(bleu_score)

average_bleu_score = sum(bleu_scores) / len(bleu_scores)
print(f"Average BLEU score for phrase: {average_bleu_score}")

Average BLEU score for phrase: 0.09807566784529623


In [None]:
import nltk

bleu_scores = []

for index, row in df_validation.iterrows():
    if row['tags'] == "passage":
        true_spoiler = row['tokenized_spoiler']
        myoutput = row['tokenized_generated_spoiler']
        bleu_score = nltk.translate.bleu_score.sentence_bleu([true_spoiler], myoutput)
        bleu_scores.append(bleu_score)

average_bleu_score = sum(bleu_scores) / len(bleu_scores)
print(f"Average BLEU score for passage: {average_bleu_score}")

Average BLEU score for passage: 0.11739245447531876


In [None]:
import nltk

bleu_scores = []

for index, row in df_validation.iterrows():
    if row['tags'] == "multi":
        true_spoiler = row['tokenized_spoiler']
        myoutput = row['tokenized_generated_spoiler']
        bleu_score = nltk.translate.bleu_score.sentence_bleu([true_spoiler], myoutput)
        bleu_scores.append(bleu_score)

average_bleu_score = sum(bleu_scores) / len(bleu_scores)
print(f"Average BLEU score for multi: {average_bleu_score}")

Average BLEU score for multi: 0.09795534973731813


In [None]:
import nltk
from nltk.translate import meteor_score
import re

meteor_scores = []

for index, row in df_validation.iterrows():
    if row['tags'] == "phrase":
        ref = row['spoiler']
        hyp = row['generated_spoiler']

        # Check if ref and hyp are strings
        if type(ref) != str or type(hyp) != str:
            continue

        # Check if ref and hyp are empty or have length 0
        if len(ref) == 0 or len(hyp) == 0:
            continue

        # Remove non-ascii characters and special characters
        ref = re.sub(r'[^\x00-\x7F]+',' ', ref)
        hyp = re.sub(r'[^\x00-\x7F]+',' ', hyp)
        ref = re.sub(r'[^a-zA-Z0-9\s]','', ref)
        hyp = re.sub(r'[^a-zA-Z0-9\s]','', hyp)

        ref_tokens = nltk.word_tokenize(ref)
        hyp_tokens = nltk.word_tokenize(hyp)

        score = meteor_score.meteor_score([ref_tokens], hyp_tokens)
        meteor_scores.append(score)

average_meteor_score = sum(meteor_scores) / len(meteor_scores)
print(f"Average METEOR score phrase: {average_meteor_score}")

Average METEOR score phrase: 0.04918042965371538


In [None]:
import nltk
from nltk.translate import meteor_score
import re

meteor_scores = []

for index, row in df_validation.iterrows():
    if row['tags'] == "passage":
        ref = row['spoiler']
        hyp = row['generated_spoiler']

        # Check if ref and hyp are strings
        if type(ref) != str or type(hyp) != str:
            continue

        # Check if ref and hyp are empty or have length 0
        if len(ref) == 0 or len(hyp) == 0:
            continue

        # Remove non-ascii characters and special characters
        ref = re.sub(r'[^\x00-\x7F]+',' ', ref)
        hyp = re.sub(r'[^\x00-\x7F]+',' ', hyp)
        ref = re.sub(r'[^a-zA-Z0-9\s]','', ref)
        hyp = re.sub(r'[^a-zA-Z0-9\s]','', hyp)

        ref_tokens = nltk.word_tokenize(ref)
        hyp_tokens = nltk.word_tokenize(hyp)

        score = meteor_score.meteor_score([ref_tokens], hyp_tokens)
        meteor_scores.append(score)

average_meteor_score = sum(meteor_scores) / len(meteor_scores)
print(f"Average METEOR score phrase: {average_meteor_score}")

Average METEOR score phrase: 0.051624282095863464


In [None]:
import nltk
from nltk.translate import meteor_score
import re

meteor_scores = []

for index, row in df_validation.iterrows():
    if row['tags'] == "multi":
        ref = row['spoiler']
        hyp = row['generated_spoiler']

        # Check if ref and hyp are strings
        if type(ref) != str or type(hyp) != str:
            continue

        # Check if ref and hyp are empty or have length 0
        if len(ref) == 0 or len(hyp) == 0:
            continue

        # Remove non-ascii characters and special characters
        ref = re.sub(r'[^\x00-\x7F]+',' ', ref)
        hyp = re.sub(r'[^\x00-\x7F]+',' ', hyp)
        ref = re.sub(r'[^a-zA-Z0-9\s]','', ref)
        hyp = re.sub(r'[^a-zA-Z0-9\s]','', hyp)

        ref_tokens = nltk.word_tokenize(ref)
        hyp_tokens = nltk.word_tokenize(hyp)

        score = meteor_score.meteor_score([ref_tokens], hyp_tokens)
        meteor_scores.append(score)

average_meteor_score = sum(meteor_scores) / len(meteor_scores)
print(f"Average METEOR score phrase: {average_meteor_score}")

Average METEOR score phrase: 0.03553560362052593


In [None]:
pip install bert_score

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import pandas as pd
from bert_score import score

total_precision, total_recall, total_f1, count = 0, 0, 0, 0

for index, row in df_validation.iterrows():
    if row['tags'] == "phrase":
        # Convert floats to strings if necessary
        if isinstance(row['generated_spoiler'], float):
            gen_spoiler = str(row['generated_spoiler'])
        else:
            gen_spoiler = row['generated_spoiler']
        if isinstance(row['spoiler'], float):
            spoiler = str(row['spoiler'])
        else:
            spoiler = row['spoiler']

        # Calculate the BERT score for each pair of sentences
        scores = score([gen_spoiler], [spoiler], lang='en', verbose=False)

        # Extract precision, recall, and F1 scores
        precision, recall, f1 = scores

        # Add to running total
        total_precision += precision.mean()
        total_recall += recall.mean()
        total_f1 += f1.mean()
        count += 1

if count > 0:
    # Calculate and print average scores
    avg_precision = total_precision / count
    avg_recall = total_recall / count
    avg_f1 = total_f1 / count
    print(f"BERT precision score (average): {avg_precision:.4f}")
    print(f"BERT recall score (average): {avg_recall:.4f}")
    print(f"BERT F1 score (average): {avg_f1:.4f}")
else:
    print("No rows with 'tags' equal to 'phrase' found in DataFrame")


Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaM

BERT precision score (average): 0.8270
BERT recall score (average): 0.8184
BERT F1 score (average): 0.8216


In [None]:
import pandas as pd
from bert_score import score

total_precision, total_recall, total_f1, count = 0, 0, 0, 0

for index, row in df_validation.iterrows():
    if row['tags'] == "passage":
        # Convert floats to strings if necessary
        if isinstance(row['generated_spoiler'], float):
            gen_spoiler = str(row['generated_spoiler'])
        else:
            gen_spoiler = row['generated_spoiler']
        if isinstance(row['spoiler'], float):
            spoiler = str(row['spoiler'])
        else:
            spoiler = row['spoiler']

        # Calculate the BERT score for each pair of sentences
        scores = score([gen_spoiler], [spoiler], lang='en', verbose=False)

        # Extract precision, recall, and F1 scores
        precision, recall, f1 = scores

        # Add to running total
        total_precision += precision.mean()
        total_recall += recall.mean()
        total_f1 += f1.mean()
        count += 1

if count > 0:
    # Calculate and print average scores
    avg_precision = total_precision / count
    avg_recall = total_recall / count
    avg_f1 = total_f1 / count
    print(f"BERT precision score (average): {avg_precision:.4f}")
    print(f"BERT recall score (average): {avg_recall:.4f}")
    print(f"BERT F1 score (average): {avg_f1:.4f}")
else:
    print("No rows with 'tags' equal to 'phrase' found in DataFrame")

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaM

BERT precision score (average): 0.8314
BERT recall score (average): 0.8177
BERT F1 score (average): 0.8243


In [None]:
import pandas as pd
from bert_score import score

total_precision, total_recall, total_f1, count = 0, 0, 0, 0

for index, row in df_validation.iterrows():
    if row['tags'] == "multi":
        # Convert floats to strings if necessary
        if isinstance(row['generated_spoiler'], float):
            gen_spoiler = str(row['generated_spoiler'])
        else:
            gen_spoiler = row['generated_spoiler']
        if isinstance(row['spoiler'], float):
            spoiler = str(row['spoiler'])
        else:
            spoiler = row['spoiler']

        # Calculate the BERT score for each pair of sentences
        scores = score([gen_spoiler], [spoiler], lang='en', verbose=False)

        # Extract precision, recall, and F1 scores
        precision, recall, f1 = scores

        # Add to running total
        total_precision += precision.mean()
        total_recall += recall.mean()
        total_f1 += f1.mean()
        count += 1

if count > 0:
    # Calculate and print average scores
    avg_precision = total_precision / count
    avg_recall = total_recall / count
    avg_f1 = total_f1 / count
    print(f"BERT precision score (average): {avg_precision:.4f}")
    print(f"BERT recall score (average): {avg_recall:.4f}")
    print(f"BERT F1 score (average): {avg_f1:.4f}")
else:
    print("No rows with 'tags' equal to 'phrase' found in DataFrame")

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaM

BERT precision score (average): 0.8278
BERT recall score (average): 0.7962
BERT F1 score (average): 0.8113
