# TEXT GENERATION

**1: Install Transformers Library**

Installs the Hugging Face transformers library, which provides pre-trained models for a variety of natural language processing (NLP) tasks including text generation.

In [1]:
!pip install transformers



**2: Import Required Libraries**


Imports essential Python libraries such as pandas, numpy, torch, matplotlib, and seaborn, along with Hugging Face's pipeline for text generation.

In [2]:
import pandas as pd
import numpy as np
from transformers import pipeline
import torch
import matplotlib.pyplot as plt
import seaborn as sns

**3: Load Dataset**

Loads a CSV file of Robert Frost poems into a pandas DataFrame. The Content column of this dataset supplies the input for the text generation experiments.

In [3]:
poems = pd.read_csv('/content/robert_frost_collection.csv')
poems.head(5)

Unnamed: 0,Name,Content,Collection,Year of Publication
0,,,,
1,Stopping by Woods on a Snowy Evening,Whose woods these are I think I know. \nHis ...,New Hampshire,1923.0
2,Fire and Ice,"Some say the world will end in fire,\nSome say...",New Hampshire,1923.0
3,The Aim was Song,Before man came to blow it right\nThe wind onc...,New Hampshire,1923.0
4,The Need of Being Versed in Country Things,The house had gone to bring again\nTo the midn...,New Hampshire,1923.0


**4: Clean and Preprocess Poem Lines**

Cleans the text data by splitting poems into individual lines and filtering out empty lines. Prepares the content for line-by-line text generation.

In [4]:
content = poems['Content'].dropna().tolist()

In [7]:
lines = []
for poem in content:
    for line in poem.split("\n"):
        lines.append(line.rstrip())

In [8]:
lines = [line for line in lines if len(line) > 0]
lines[:5]

['Whose woods these are I think I know.',
 'His house is in the village though;',
 'He will not see me stopping here',
 'To watch his woods fill up with snow.',
 'My little horse must think it queer']
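
The splitting and filtering steps above can also be folded into a single function. This is a minimal sketch, not the notebook's own code: `split_into_lines` and `sample_poems` are hypothetical names, and the sample string stands in for the `Content` column.

```python
def split_into_lines(poems):
    """Flatten a list of poem strings into non-empty, right-stripped lines."""
    return [
        line.rstrip()
        for poem in poems
        for line in poem.split("\n")
        if line.strip()  # skip empty and whitespace-only lines
    ]


# Hypothetical stand-in for poems['Content'].dropna().tolist()
sample_poems = ["Some say the world will end in fire,\nSome say in ice. \n\n"]
print(split_into_lines(sample_poems))
```

Doing the filter inside the comprehension avoids building the intermediate list that still contains empty strings.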

**5: Initialize Text Generation Pipeline**

Initializes the Hugging Face text generation pipeline. Because no model is specified, the pipeline falls back to its default, openai-community/gpt2 (GPT-2), and runs on the CPU. This model will be used to generate new text based on the input lines.

In [37]:
gen = pipeline('text-generation')

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
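
The log above advises against relying on the default model. One way to follow that advice is to name the model explicitly and fix the random seed so runs are easier to compare. The sketch below assumes the same checkpoint the log reports; `make_generator`, `DEFAULT_MODEL`, and the seed value are illustrative choices, not part of the notebook.

```python
DEFAULT_MODEL = "openai-community/gpt2"  # the checkpoint named in the log above


def make_generator(model_name=DEFAULT_MODEL, seed=42):
    """Build a text-generation pipeline with an explicit model and a fixed seed."""
    # Imported here so that defining the helper does not itself require transformers.
    from transformers import pipeline, set_seed

    set_seed(seed)  # the seed value is an arbitrary choice for illustration
    return pipeline("text-generation", model=model_name)


# Usage (downloads the model on first call):
# gen = make_generator()
```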


In [38]:
lines[0]

'Whose woods these are I think I know.'

**6: Generate Text for Sample Lines**

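Passes the first two lines of the dataset to the pipeline with max_length=20. As the warnings in the output show, the pipeline's default max_new_tokens=256 takes precedence over max_length, so each continuation runs far beyond 20 tokens. Demonstrates basic functionality.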

In [39]:
gen(lines[0], max_length = 20)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'Whose woods these are I think I know. I don\'t know if I\'ve ever heard of one of them. I guess maybe it\'s just the shape of the trees. I guess I\'ve just got to get my hands on one or two of them. I mean, there\'s this one. It\'s a little long. I think I\'ve got it on, I guess. I guess the name\'s just, like, "Budman." I think it\'s a little long. I mean, I\'ve never seen it that long. I think it\'s a little too long. It\'s like a big, white thing. Like, the one in the upper left. I know the name\'s a little bit long on the back, I know the name\'s a little bit long on the front. I know it\'s a little bit long. "Budman" is just like, "Budman." It\'s like, "Budman!" But I don\'t know how long it\'s been, I just don\'t think it\'s been long enough to go to the store. I have a good idea what that is. I\'ve never seen it that long, but I\'m just like, "Budman." So when I see it, I just like it. I don\'t know'}]

In [40]:
gen(lines[1], max_length = 20)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'His house is in the village though; and there is a place for the people to go to, where the people can have a meal, and drink and eat.\n\n22 And they came to the house of the sons of Gai, and they were going to the village, and there was Gai there, and the sons of Gai went unto them and smote them, and they said to Gai, "Why do you say that he does not bring food for us?" And Gai answered, "I will bring food from him, and we will not be hungry." And Gai said, "I will not give you food of any kind to eat, for I am not a man of any gods, and I do not know who or what you are, but I have heard that the son of Gai was able to accomplish some great thing." And Gai went to his wife and said to her, "I have heard that the son of Gai lived for a long time and suffered for his people, and he was able to accomplish some great thing; and when he was able to accomplish some great thing he did not eat, and went to another man, and there he came to the village, and he said unto 
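
Since the warnings report that `max_new_tokens` overrides `max_length`, a short continuation is more likely to result from setting `max_new_tokens` directly. The helper below is a sketch under that assumption; `continue_text` is a hypothetical name, and `gen` is assumed to behave like the pipeline above (called with a prompt, it returns a list of dicts with a `generated_text` key).

```python
def continue_text(gen, prompt, max_new_tokens=20):
    """Return the generated text for `prompt`, capped at `max_new_tokens` new tokens."""
    # `gen` is assumed to be a text-generation pipeline or a compatible callable.
    out = gen(prompt, max_new_tokens=max_new_tokens)
    return out[0]["generated_text"]


# Usage with the notebook's pipeline:
# print(continue_text(gen, lines[0], max_new_tokens=20))
```

Note that `max_new_tokens` counts only the generated tokens, whereas `max_length` also counts the prompt.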

**7: Generate and Display Wrapped Output**

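Defines a helper function using Python’s textwrap module, then generates a continuation for the first poem line with max_length=30 and prints it through the helper for more readable formatting. The warning in the output shows that max_new_tokens=256 again takes precedence, so the result is much longer than 30 tokens.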

In [41]:
import textwrap
def wrap(x):
    return textwrap.fill(x, replace_whitespace=False, fix_sentence_endings=True)

In [42]:
out = gen(lines[0], max_length = 30)
print(wrap(out[0]['generated_text']))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Whose woods these are I think I know.  And he would like to see them,
and see where he goes, and see what he comes back with.  And he has a
lot of problems with the people he's supposed to help, and he's not
the guy that's supposed to lead you to the end.  But it's also a real
problem for me.  I know how to do all of this, and I've been doing it
all my life, but I have so many problems with people, and I've been
doing it all my life, and I've been doing it all my life with a lot of
different people.  And so I just want to get those problems to a point
where I can actually get out, and to put a stop to them.  It's like
I'm the one that's going to do it, and I'm going to do it all at once;
I'm going to do it all at once.


I don't have to ask about
everything, it's just, and it's going to be a very, very hard job for
me to do it all at once.  I just like to do it all at once, and I know
that I don't have to ask about everything.  So I do all of this
through work, through all of this, and
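
The short lines and blank gaps in the output above are a side effect of `replace_whitespace=False`: newlines produced by the model are kept, but `textwrap.fill` does not restart its line-width count at them. One possible alternative, sketched here with the hypothetical helper `wrap_paragraphs`, is to wrap each newline-separated block on its own.

```python
import textwrap


def wrap_paragraphs(text, width=70):
    """Wrap each newline-separated block separately, keeping blank lines."""
    wrapped = []
    for block in text.split("\n"):
        if block.strip():
            wrapped.append(
                textwrap.fill(block, width=width, fix_sentence_endings=True)
            )
        else:
            wrapped.append("")  # preserve the blank line between paragraphs
    return "\n".join(wrapped)


# Hypothetical sample text with an embedded paragraph break
sample = (
    "First paragraph of generated text that is long enough to need wrapping."
    "\n\nSecond paragraph."
)
print(wrap_paragraphs(sample, width=40))
```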

**8: Custom Prompt Text Generation**

Demonstrates text generation using a custom prompt with max_length=100 to show how the model handles novel inputs. Here too the warning in the output shows max_new_tokens=256 taking precedence, so the continuation is not limited to 100 tokens.

In [45]:
prompt = "transformers have a wide variet of applications in nlp"
out = gen(prompt, max_length = 100)
print(wrap(out[0]['generated_text']))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


transformers have a wide variet of applications in nlp.  In this post,
we'll look at two popular implementations of the NLP library:

The
first implementation of the NLP library uses a standard library called
the NLP_NPPARSE. This is used by a number of popular programs such as
the NLP_NPPARSE and NLP_NPPARSE. The second implementation uses a
standard library called the NLP_NPPARSE_ENV. The NLP_NPPARSE_ENV is a
wrapper around the NLP_NPPARSE, which implements functions that return
different values for different input values.  In this post, we will
assume that the NLP_NPPARSE_ENV is the same as the
NLP_NPPARSE_CURRENT_HANDLER.

The following code snippet shows how to
use the NLP library to build a new string that is converted to a
string of integer characters.

// create string from string var string
= 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz' // convert
string to string var string = [ 'abcdefghijklmnopqrstuvw
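
In every output above, the generated text begins by repeating the prompt. To inspect only what the model added, the prompt can be stripped off afterwards. The helper `continuation_only` below is a hypothetical sketch. The text-generation pipeline also has a `return_full_text` option that should serve the same purpose, though it is not used in this notebook.

```python
def continuation_only(prompt, generated_text):
    """Strip the echoed prompt from the front of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Example using the first line of the wrapped output from step 7
print(
    continuation_only(
        "Whose woods these are I think I know.",
        "Whose woods these are I think I know.  And he would like to see them,",
    )
)
```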
