# Text Generation Using Hugging Face

Importing Necessary Libraries

In [1]:
import warnings
warnings.filterwarnings('ignore')  # hide Python warnings in the notebook output

import pandas as pd                # load the poem collection from CSV

from transformers import pipeline  # Hugging Face text-generation pipeline

import torch                       # PyTorch backend used by the pipeline

Importing the dataset for Text Generation

In [2]:
poems = pd.read_csv('robert_frost_collection.csv')
poems.head(5)

Unnamed: 0,Name,Content,Collection,Year of Publication
0,,,,
1,Stopping by Woods on a Snowy Evening,Whose woods these are I think I know. \nHis ...,New Hampshire,1923.0
2,Fire and Ice,"Some say the world will end in fire,\nSome say...",New Hampshire,1923.0
3,The Aim was Song,Before man came to blow it right\nThe wind onc...,New Hampshire,1923.0
4,The Need of Being Versed in Country Things,The house had gone to bring again\nTo the midn...,New Hampshire,1923.0


In [4]:
content = poems['Content'].dropna().tolist()  # skip empty rows such as row 0, keep the poem texts

In [5]:
lines = []
# Split every poem into lines and trim trailing whitespace
for poem in content:
  for line in poem.split('\n'):
    lines.append(line.rstrip())

In [6]:
lines = [line for line in lines if len(line) > 0]
lines[:5]

['Whose woods these are I think I know.',
 'His house is in the village though;',
 'He will not see me stopping here',
 'To watch his woods fill up with snow.',
 'My little horse must think it queer']
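The two cells above split each poem into lines, trim trailing whitespace, and drop blank lines. A minimal sketch of the same steps as one reusable function is shown here; `poem_lines` is a hypothetical helper name, and the non-string check is an assumption that stands in for `dropna()` when the input is a plain Python list.

```python
def poem_lines(poems):
    """Split each poem into lines, strip trailing whitespace, and drop empty lines."""
    lines = []
    for poem in poems:
        if not isinstance(poem, str):
            # Skip missing values (e.g. None or NaN from an empty CSV row)
            continue
        for line in poem.split("\n"):
            line = line.rstrip()
            if line:  # keep only non-empty lines
                lines.append(line)
    return lines


sample = [
    "Whose woods these are I think I know. \nHis house is in the village though;\n\n",
    None,
]
print(poem_lines(sample))
```

Filtering inside the loop avoids building the intermediate list that contains empty strings.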

In [7]:
gen = pipeline('text-generation')

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.

Device set to use cpu


In [8]:
lines[0]

'Whose woods these are I think I know.'

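Generating text with a length limit (`max_length`). As the warning in the output shows, the pipeline's default `max_new_tokens` (256) takes precedence over `max_length`, so the continuation is much longer than 20 tokens.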

In [10]:
gen(lines[0], max_length=20)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'Whose woods these are I think I know. And how many of them are there? I don\'t know. I\'m sure they\'re all there. They\'re all there. And if they\'re not there, it\'s kind of like the people that say that they\'re out there trying to save the world."\n\nAnd he says it\'s a world that\'s "still very much alive in the form of my son."\n\n"We\'re still very much alive in the form of my son," he says, "and I always say that we\'re still alive, but I guess it\'s a little bit ironic that I\'m saying it in this context and I don\'t always get the benefit of the doubt. But I\'m saying that we\'re still alive for the purposes of the whole world."'}]
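The warning above explains the long output: `max_length` limits the prompt plus the generated tokens, `max_new_tokens` limits only the generated tokens, and when both are set `max_new_tokens` wins. To actually cap the continuation, pass `max_new_tokens` directly, for example `gen(lines[0], max_new_tokens=20)`. The function below is a simplified, hypothetical model of that precedence rule for illustration only, not the transformers implementation; the 10-token prompt length is likewise an assumed value.

```python
from typing import Optional


def new_token_budget(prompt_tokens: int,
                     max_length: Optional[int] = None,
                     max_new_tokens: Optional[int] = None) -> Optional[int]:
    """Simplified model of how many tokens generation may add.

    max_new_tokens counts only generated tokens; max_length counts prompt + generated.
    If both are given, max_new_tokens takes precedence. Returns None if neither is set.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(0, max_length - prompt_tokens)
    return None


# Hypothetical 10-token prompt
print(new_token_budget(10, max_length=20, max_new_tokens=256))  # like the cell above
print(new_token_budget(10, max_length=20))
print(new_token_budget(10, max_new_tokens=20))
```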

Generating several sequences for one prompt (`num_return_sequences`)

In [11]:
gen(lines[1], max_length=20, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "His house is in the village though; I think he is from another country though; I don't think he is from there yet. I think he might have a good career there. He is a really good guy. But I am just trying to be helpful and kind.\n\nDid he know you were coming back to visit him?\n\nNo. I had no idea he was back. He has no idea. I am surprised by the lack of interest in this case. I don't think he really knows where he's headed.\n\nDid you know he was on his way to Germany when he was arrested?\n\nNo. He was supposed to go to Berlin. I think he was just going to Europe to meet all the people he had met there. I don't know that he had been here much. He was at the train station. He was in the room where he did the interview and then he went to the train station. He didn't have a ticket.\n\nDid he know that you were coming back?\n\nNo. He was going to Germany because he was trying to meet some people. We don't know how he got here. I am amazed at his ignorance.\n\nDo yo

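The output is a list of two dictionaries, one per generated sequence, each holding its text under the `generated_text` key (the displayed output above is truncated).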
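Each returned sequence repeats the prompt before the continuation. A minimal sketch for pulling out just the new text from output shaped like the pipeline's result is shown below; `continuations` is a hypothetical helper, and the second sample output is invented for illustration (the first is shortened from the cell above). The text-generation pipeline also accepts `return_full_text=False`, which returns only the added text.

```python
def continuations(outputs, prompt):
    """Return each sequence's generated continuation with the prompt removed.

    `outputs` mirrors what the text-generation pipeline returns for one prompt:
    a list of {'generated_text': ...} dicts, one per returned sequence.
    """
    texts = [o["generated_text"] for o in outputs]
    # Remove the echoed prompt only when the text actually starts with it
    return [t[len(prompt):] if t.startswith(prompt) else t for t in texts]


prompt = "His house is in the village though;"
# Hypothetical, shortened stand-ins for gen(prompt, num_return_sequences=2)
outputs = [
    {"generated_text": prompt + " I think he is from another country though;"},
    {"generated_text": prompt + " the road is long."},
]
for text in continuations(outputs, prompt):
    print(repr(text))
```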

In [12]:
import textwrap
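def wrap(x):
  # Wrap long generated text for display; keep existing newlines and use two spaces after sentence endings
  return textwrap.fill(x, replace_whitespace=False, fix_sentence_endings=True)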
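With `replace_whitespace=False`, `textwrap` keeps the newlines but still measures line length across them, so lines can break in odd places (see "stars Benedict" followed by "Cumberbatch" in the next output). The Python documentation recommends splitting the text into paragraphs and wrapping each one separately. The sketch below does that; `wrap_paragraphs` is a hypothetical alternative to `wrap`, not part of the original notebook.

```python
import textwrap


def wrap_paragraphs(text, width=70):
    """Wrap each newline-separated line on its own and keep blank lines."""
    return "\n".join(
        # Wrap non-empty lines; blank (or whitespace-only) lines stay empty
        textwrap.fill(line, width=width, fix_sentence_endings=True) if line.strip() else ""
        for line in text.split("\n")
    )


sample = "First line.\n\nSecond paragraph " + "word " * 30
print(wrap_paragraphs(sample, width=40))
```

In the notebook this would be used as `print(wrap_paragraphs(out[0]['generated_text']))`.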

In [15]:
out = gen(lines[0], max_length=30)
print(wrap(out[0]['generated_text']))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Whose woods these are I think I know.  What makes me sad is I don't
know.  I don't know if I am a human, a creature of life, or a machine.
I'm just a person."

The movie, directed by Tim Burton, stars Benedict
Cumberbatch, Michael Caine, Ben Stiller, Jennifer Connelly, Bradley
Cooper, Tessa Thompson, Josh Brolin, Michael Keaton, James Cromwell,
and James Marsden.  It was released on June 22.
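The output above stays close to the prompt for one paragraph and then drifts into an unrelated film description after the first blank line. One simple post-processing heuristic is to keep only the text before the first blank line. The sketch assumes a blank line marks a topic shift, which will not always hold; `first_paragraph` is a hypothetical helper and the sample string is shortened from the output above.

```python
def first_paragraph(text):
    """Return the text before the first blank line, without trailing whitespace."""
    return text.split("\n\n", 1)[0].rstrip()


generated = (
    "Whose woods these are I think I know. What makes me sad is I don't know.\n\n"
    "The movie, directed by Tim Burton, stars Benedict Cumberbatch."
)
print(first_paragraph(generated))
```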


Using a custom prompt as input

In [18]:
prompt = "transformers have a wide variety of applications in nlp"
out = gen(prompt, max_length=50)
print(wrap(out[0]['generated_text']))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


transformers have a wide variety of applications in nlp.h and npx.h.
NTP and NFS are very similar to the way npx is built up in ntp.  In
npx, we have two types of npfs.  The first is npfs.pfs.  It's an array
of files and directories that contains the contents of all the NTP
files that will be transmitted.  The second type of npfs is
npfs.pfs.bin.  It has a bitmap that contains the PFS-like files.  The
files may contain a number of different headers, and they all have a
bitmap value in the end.  The bitmap is used to identify the NTP
headers with the most significant bits.

With each header you can see
which files are being read from the file.  The bitmap is used to
identify the NTP headers that are being read from the file.  With each
file, you can see the length and size of the file.

NTP is a bitmap
that contains the PFS-like files.  Each file contains a bitmap value.
The bitmap is used to identify the NTP headers that are being read
from the file.

The bitmap is used to identify the