In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, confusion_matrix, roc_auc_score

from transformers import pipeline

import torch

In [None]:
poems = pd.read_csv('robert_frost_collection.csv')
poems.head()

Unnamed: 0,Name,Content,Collection,Year of Publication
0,,,,
1,Stopping by Woods on a Snowy Evening,Whose woods these are I think I know. \nHis ...,New Hampshire,1923.0
2,Fire and Ice,"Some say the world will end in fire,\nSome say...",New Hampshire,1923.0
3,The Aim was Song,Before man came to blow it right\nThe wind onc...,New Hampshire,1923.0
4,The Need of Being Versed in Country Things,The house had gone to bring again\nTo the midn...,New Hampshire,1923.0


In [None]:
content = poems["Content"].dropna().tolist()

In [None]:
lines = []
for poem in content:
  for line in poem.split('\n'):
    lines.append(line.rstrip())

In [None]:
lines = [line for line in lines if len(line)>0]
lines[:5]

['Whose woods these are I think I know.',
 'His house is in the village though;',
 'He will not see me stopping here',
 'To watch his woods fill up with snow.',
 'My little horse must think it queer']
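
The three cells above (drop missing poems, split on newlines, discard empty lines) can be folded into one helper. This is a minimal sketch; the name `poems_to_lines` is hypothetical and not part of the notebook, and it assumes missing entries show up as non-string values such as NaN.

```python
def poems_to_lines(poems):
    """Split each poem into non-empty lines with trailing whitespace removed."""
    lines = []
    for poem in poems:
        if not isinstance(poem, str):
            continue  # skip missing entries such as NaN
        for line in poem.split('\n'):
            line = line.rstrip()
            if line:  # drop empty and whitespace-only lines
                lines.append(line)
    return lines


# Small illustrative input: one poem fragment and one missing value.
sample = [
    'Whose woods these are I think I know. \nHis house is in the village though;',
    float('nan'),
]
print(poems_to_lines(sample))
```

With the notebook's data, the call would presumably be `lines = poems_to_lines(poems["Content"])`.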

In [None]:
gen = pipeline('text-generation', model='gpt2')

Device set to use cpu


In [None]:
lines[0]

'Whose woods these are I think I know.'

In [None]:
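# Note: `max_length=20` has no effect here. As the warning below reports,
# the pipeline's default `max_new_tokens` (=256) takes precedence.
gen(lines[0], max_length=20)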

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "Whose woods these are I think I know. I also don't think there's anything in the woods which is the same as I believe. I will tell you that I'm going to make the best judgement I can on this topic.I think it's good to have some time to think. I mean I'm not going to sit on the couch and pretend I'm not going to do something that you're not going to want to do. And there are things in the woods that are not what I want to do. I want to be able to go to the grocery store and buy stuff from the store. You know what I mean? I want to look at my phone and see where it's gone and what it is that I want to buy, do my research and see what it is.I want to be able to do something. There are things that I've been missing in my life that I want to do. I want to be able to make a difference for the people that I love and have made a difference for them in this world. I want to make a difference and do something for them that I can't do for myself.I don't want to be making a hu
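
The warning above says that `max_new_tokens` takes precedence over `max_length`, which is why the output runs far past 20 tokens. A minimal sketch of one way to cap the length is to pass `max_new_tokens` explicitly. The wrapper name `generate_texts` and the offline stand-in `stub_gen` are hypothetical and only illustrate the call shape; `do_sample=True` is an assumption made so that several returned sequences can differ.

```python
def generate_texts(gen, prompt, max_new_tokens=20, num_return_sequences=1):
    """Call a text-generation pipeline with an explicit cap on new tokens.

    `gen` is any callable that follows the pipeline's call convention.
    """
    outputs = gen(
        prompt,
        max_new_tokens=max_new_tokens,  # counts generated tokens only, not the prompt
        num_return_sequences=num_return_sequences,
        do_sample=True,  # sample so that multiple sequences can differ
    )
    return [o["generated_text"] for o in outputs]


def stub_gen(prompt, **kwargs):
    """Hypothetical stand-in for the pipeline so this sketch runs offline."""
    n = kwargs.get("num_return_sequences", 1)
    return [{"generated_text": f"{prompt} ..."} for _ in range(n)]


print(generate_texts(stub_gen, "Whose woods these are I think I know.",
                     num_return_sequences=2))
```

In the notebook, the real pipeline object would be passed instead of the stub, for example `generate_texts(gen, lines[0], max_new_tokens=20)`.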

In [None]:
gen(lines[1], max_length=30, num_return_sequences=2)  # generate two continuations of the second line


[{'generated_text': 'His house is in the village though; I am only in it for the sake of my children, who have been sent to school.\n\n"I am sorry to hear of your loss; it is my fault I should have been sent to school.\n\n"I could not have done more but to send my daughter to school.\n\n"I have been given a new name for the school, as I have written the name of my daughter for a girl who has been sent to school as well as for the boys.\n\n"I have had no difficulties in getting my daughter to school.\n\n"It is my fault I should have been sent to school for something; I did not know what it would be.\n\n"I am sorry I have not sent my daughter to school.\n\n"I am sorry to hear of my daughter\'s loss. It is my fault I should have been sent to school for something, but I have not been able to do so.\n\n"My daughter doesn\'t look too well and I have been told she needs a little extra support.\n\n"I am sorry I have not sent my daughter to school.\n\n"I am sorry to hear of the loss of my daugh

In [None]:
# Print each of the two generated sequences so that '\n' is rendered as a line break
for i in gen(lines[1], max_length=30, num_return_sequences=2):
    print(i['generated_text'])


His house is in the village though; a small building is next to it, and there's a large courtyard. It's a rather small place, and it's quite old even today.

"Ah, my dear, I'm glad to see you, but I'm sure I'll find a lot of trouble if I don't try to enter the house."

"This is not a big town, it's just a small village. What's your name?"

"Rukon-kun."

"Rukon-kun is a little girl called Maori-chan. She's a really nice person, and we've been talking all this time. When I met her, she was just a kid, but now she's an adult."

"You mean, I'm thinking that she's really good?"

"I'm thinking she's really good. I think it's that she's not as easy to talk to as her older sister. And my room's not very big, so you're not really used to sitting in front of me."

"No. I know that. It's just that I'm a little clumsy at the moment, but I don't like to get in trouble and think I'm a little clumsy. So you're
His house is in the village though; it was a great deal better back then. The town still ha
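
Each generated sequence above begins with the prompt itself. When only the continuation is of interest, the prompt can be stripped off afterwards. This is a minimal sketch with a hypothetical helper name, `continuation_only`; the text-generation pipeline also appears to offer a `return_full_text=False` option for the same purpose, which is worth checking against the installed version.

```python
def continuation_only(prompt, generated_text):
    """Return the part of generated_text that follows the prompt, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text  # prompt not found at the start: leave the text unchanged


prompt = 'His house is in the village though;'
sample = 'His house is in the village though; it was a great deal better back then.'
print(continuation_only(prompt, sample))
# prints: it was a great deal better back then.
```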

In [None]:
import textwrap

def wrap(x):
    # Wrap to the default width of 70 while keeping the newlines GPT-2 generated.
    return textwrap.fill(x, replace_whitespace=False, fix_sentence_endings=True)

In [None]:
out = gen(lines[0],max_length=30)
print(wrap(out[0]['generated_text']))


Whose woods these are I think I know.  The first time I saw anything
like the Redneck is when she was out for a stroll in the woods of the
hills in South Dakota.  I was watching her when she came back to the
house with her young son.  When she came back she looked like she was
dead.  So I thought, "Wow, this is the first time I've really seen
her.  I can't believe she's gone.  I want to help her out."  And he
did, and what a wonderful thing that was.

So I came home, went into
the woods and they showed me the pictures.  I was so excited to go
through them.  I was so excited to go down on my knees and watch her.
I think about that because she's such a beautiful little girl.  She's
so small, but she's so beautiful.

Her hair is so long, but her face
is so light.  She looks like a little girl in the forest.  She's so
bright.  And she's so big, like she never leaves the house.  I think
it was just as beautiful as the man who did that.

How did you start
to see the Redneck?

I'm very curiou
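
In the wrapped output above, some lines break early right after a blank line (for example "So I came home, went into"). With `replace_whitespace=False`, `textwrap.fill` keeps embedded newlines but does not restart its width count at them. A minimal sketch of an alternative is to wrap each newline-separated piece on its own; the helper name `wrap_paragraphs` is hypothetical.

```python
import textwrap


def wrap_paragraphs(text, width=70):
    """Wrap every newline-separated piece separately, keeping blank lines."""
    wrapped = []
    for para in text.split('\n'):
        if para.strip():
            wrapped.append(
                textwrap.fill(para, width=width, fix_sentence_endings=True)
            )
        else:
            wrapped.append('')  # preserve blank lines between paragraphs
    return '\n'.join(wrapped)


sample = (
    "First paragraph that is long enough to need wrapping at a narrow width."
    "\n\nSecond one."
)
print(wrap_paragraphs(sample, width=30))
```

It could replace `wrap` in the cells above, for example `print(wrap_paragraphs(out[0]['generated_text']))`.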

In [None]:
prompt ='Transformers have a wide variet of application in nlp'
out = gen(prompt, max_length=60, num_return_sequences=3)
for i in out:
    print(wrap(i['generated_text']))
    print()


Transformers have a wide variet of application in nlp4 or nlp5.

They
have a wide variet of application in nlp4 or nlp5. They have a wide
variet of application in nlp4, nlp5, or nlp6, depending on what the
application type is.

They have a wide variet of application in nlp4,
nlp5, or nlp6, depending on what the application type is.  They have a
wide variet of application in nlp4, nlp5, or nlp6, depending on what
the application type is.

They have a wide variet of application in
nlp4, nlp5, or nlp6, depending on what the application type is.  They
have a wide variet of application in nlp4, nlp5, or nlp6, depending on
what the application type is.

They have a wide variet of application
in nlp4, nlp5, or nlp6, depending on what the application type is.
They have a wide variet of application in nlp4, nlp5, or nlp6,
depending on what the application type is.

They have a wide variet of
application in nlp4,

Transformers have a wide variet of application in nlp and nlp2. For
example, the f
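
The first sequence above falls into a loop and repeats the same sentence many times. A rough way to quantify this is the fraction of word n-grams that occur more than once. This is a minimal sketch; the helper name `repeated_ngram_fraction` and the choice of n=3 are assumptions, not part of the notebook. Generation parameters such as `no_repeat_ngram_size` or `repetition_penalty` are the usual knobs for reducing such loops, though their effect on this model was not tested here.

```python
from collections import Counter


def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams in `text` that appear more than once."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


looping = "They have a wide variet of application. " * 4
varied = "Whose woods these are I think I know."
print(repeated_ngram_fraction(looping), repeated_ngram_fraction(varied))
```

A value near 1.0 indicates heavy repetition, and 0.0 means no n-gram repeats.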