# Importing Libraries

In [None]:
import nltk

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

In [2]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Stopwords

In [3]:
sample_text = 'Generative AI uses algorithms to produce new and unique content, like text, images, and more.'

# Split sample_text into word and punctuation tokens
tokens = word_tokenize(sample_text)
tokens

['Generative',
 'AI',
 'uses',
 'algorithms',
 'to',
 'produce',
 'new',
 'and',
 'unique',
 'content',
 ',',
 'like',
 'text',
 ',',
 'images',
 ',',
 'and',
 'more',
 '.']

In [5]:
# Load NLTK's English stop-word list as a set for fast membership tests
stop_words = set(stopwords.words('english'))

In [6]:
# Keep tokens whose lowercase form is not a stop word (punctuation tokens are kept)
filtered_text = [word for word in tokens if word.lower() not in stop_words]
filtered_text

['Generative',
 'AI',
 'uses',
 'algorithms',
 'produce',
 'new',
 'unique',
 'content',
 ',',
 'like',
 'text',
 ',',
 'images',
 ',',
 '.']

In [7]:
final_text = ' '.join(filtered_text)
final_text

'Generative AI uses algorithms produce new unique content , like text , images , .'
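The joined string above still contains stray punctuation tokens, because the stop-word list only covers words. Below is a minimal sketch of one way to drop them as well, assuming punctuation can be identified with Python's `string.punctuation`. The helper name `remove_stopwords_and_punct` and the three-word `demo_stop_words` set are hypothetical stand-ins for illustration; the notebook itself uses NLTK's full English list.

```python
import string

def remove_stopwords_and_punct(tokens, stop_words):
    """Drop stop words (case-insensitive) and tokens made only of punctuation."""
    kept = []
    for tok in tokens:
        if tok.lower() in stop_words:
            continue  # stop word
        if all(ch in string.punctuation for ch in tok):
            continue  # pure punctuation such as ',' or '.'
        kept.append(tok)
    return kept

# Hypothetical stand-in for set(stopwords.words('english'))
demo_stop_words = {'to', 'and', 'more'}
demo_tokens = ['Generative', 'AI', 'uses', 'algorithms', 'to', 'produce', 'new',
               'and', 'unique', 'content', ',', 'like', 'text', ',', 'images',
               ',', 'and', 'more', '.']

print(' '.join(remove_stopwords_and_punct(demo_tokens, demo_stop_words)))
```

Passing the real `stop_words` set in place of `demo_stop_words` should behave the same way for this sentence.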

# Stemming

In [8]:
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

# Create one stemmer instance and reuse it for every token
ps = PorterStemmer()

In [10]:
text = 'Generative AI uses algorithms producing new and unique content, like text, images, and more.'

tokens = word_tokenize(text)

# Reduce each token to its Porter stem; stems are lowercased and need not be real words
stems = [ps.stem(word) for word in tokens]
final = ' '.join(stems)
final

'gener ai use algorithm produc new and uniqu content , like text , imag , and more .'

# Regex

In [11]:
import re 

In [12]:
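def regex_magic(line):
    # Remove every character that is neither a word character nor whitespace
    line = re.sub(r'[^\w\s]', '', line)
    # Remove runs of digits
    line = re.sub(r'\d+', '', line)
    return line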

line = 'hello!, these are numbers 12345 and these are not @#$%^&*'
print(regex_magic(line))

hello these are numbers  and these are not 
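The output above keeps a double space where the digits were and a trailing space where the symbols were. Here is a small sketch of one way to tidy that up, assuming that collapsing all whitespace runs to a single space is acceptable. The function name `clean_line` is a hypothetical name chosen for this example.

```python
import re

def clean_line(line):
    """Strip punctuation and digits, then collapse the leftover whitespace."""
    line = re.sub(r'[^\w\s]', '', line)  # punctuation and symbols
    line = re.sub(r'\d+', '', line)      # runs of digits
    line = re.sub(r'\s+', ' ', line)     # whitespace runs become one space
    return line.strip()                  # drop leading/trailing space

print(clean_line('hello!, these are numbers 12345 and these are not @#$%^&*'))
```

Note that `\w` also matches the underscore, so underscores survive both this version and `regex_magic`.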


# Text Generation

# Greedy Search

In [None]:
# torch and transformers are the only packages the cells below import
!pip install --upgrade torch transformers

In [15]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Input text
input_text = 'THIS IS GENERATIVE AI'
input_ids = tokenizer.encode(input_text, return_tensors='pt')

In [16]:
# Generate text
output = model.generate(input_ids, max_length=50, do_sample=False)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print('\n\n\n', generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.





 THIS IS GENERATIVE AI.

The AI is not a "human" or "computer" but a "computer" that is capable of doing things that humans cannot.

The AI is not a "human" or "computer"
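The repeated sentence in the output above is typical of greedy decoding: with `do_sample=False` the most probable next token is always chosen, so once the text returns to an earlier state it tends to follow the same path again. The toy sketch below illustrates the idea. The `NEXT` table is a made-up probability table keyed only by the last token, which is a deliberate simplification (GPT-2 conditions on the whole prefix), and `greedy_decode` is a hypothetical helper, not part of `transformers`.

```python
# Hypothetical next-token probabilities, keyed by the current token only
NEXT = {
    'the': {'AI': 0.6, 'model': 0.4},
    'AI': {'is': 0.7, 'can': 0.3},
    'is': {'not': 0.5, 'the': 0.3, 'smart': 0.2},
    'not': {'the': 0.8, 'smart': 0.2},
}

def greedy_decode(start, max_new_tokens):
    """Always append the single most probable next token."""
    tokens = [start]
    for _ in range(max_new_tokens):
        options = NEXT.get(tokens[-1])
        if not options:
            break  # no known continuation for this token
        tokens.append(max(options, key=options.get))
    return tokens

print(' '.join(greedy_decode('the', 8)))
```

Because every choice is deterministic, the toy sequence cycles through the same four tokens, much as the GPT-2 output repeats its sentence.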


# Beam Search

In [17]:
# Beam search: track the 5 highest-scoring candidate sequences at each step
output = model.generate(input_ids, max_length=50, num_beams=5, early_stopping=True)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print('\n\n\n', generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.





 THIS IS GENERATIVE AI.

I'm not saying this is a bad thing, but I'm not saying it's a bad thing.

I'm not saying this is a bad thing.

I'm not saying this is
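Beam search keeps several candidate sequences alive instead of committing to the best token at each step, which lets it find a sequence whose overall probability is higher than the greedy one. The sketch below shows this on a made-up table. `NEXT` and `beam_search` are hypothetical names for this illustration, scores are summed log-probabilities, and the length handling and stopping rules used by `model.generate` are not modeled.

```python
import math

# Hypothetical next-token probabilities, keyed by the last token only
NEXT = {
    '<s>': {'a': 0.6, 'b': 0.4},
    'a': {'x': 0.4, 'y': 0.35, 'z': 0.25},
    'b': {'x': 0.9, 'y': 0.1},
}

def beam_search(start, steps, num_beams):
    """Return the num_beams best (tokens, log_prob) pairs after `steps` expansions."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            options = NEXT.get(tokens[-1], {})
            if not options:
                candidates.append((tokens, score))  # finished sequence, carry over
                continue
            for tok, p in options.items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the highest-scoring candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams

greedy_tokens, greedy_score = beam_search('<s>', 2, num_beams=1)[0]
beam_tokens, beam_score = beam_search('<s>', 2, num_beams=2)[0]
print(greedy_tokens, round(math.exp(greedy_score), 2))
print(beam_tokens, round(math.exp(beam_score), 2))
```

With one beam the search reduces to greedy decoding and picks `a` first; with two beams it also keeps `b`, whose continuation `x` gives a higher total probability.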


# Top-K Sampling

In [18]:
# Top-k sampling: draw each next token from the 50 most probable candidates
output = model.generate(input_ids, max_length=100, do_sample=True, top_k=50)
print('\n\n\n', tokenizer.decode(output[0], skip_special_tokens=True))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.





 THIS IS GENERATIVE AI!

I'm using this language to test whether or not this code works well. If yes, use it before you start writing or reviewing other code, though. If no, use it later. I just tried this:

This is a good idea because you shouldn't rely on Python 2.7 on your machine. It should work well on anything that relies on Python 2.0. For example this code is like:

Here, the script
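Top-k sampling restricts each draw to the `k` most probable tokens, which is why the output above varies between runs while avoiding very unlikely words. The sketch below illustrates the filtering step on a made-up distribution. `top_k_filter`, `sample`, and `toy_probs` are hypothetical names for this example, and the real implementation in `transformers` works on logits rather than on a dictionary of probabilities.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize so they sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

def sample(probs, rng):
    """Draw one token according to the given probabilities."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Made-up next-token distribution
toy_probs = {'code': 0.5, 'text': 0.25, 'images': 0.15, 'audio': 0.07, 'video': 0.03}

filtered = top_k_filter(toy_probs, k=2)
rng = random.Random(0)  # seeded only to make the demo repeatable
print(filtered)
print(sample(filtered, rng))
```

With `k=2` only `code` and `text` can ever be drawn, whatever the shape of the rest of the distribution.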


# Top-P Sampling 

In [19]:
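# Top-p (nucleus) sampling: draw from the smallest set of tokens whose cumulative probability reaches 0.92
output = model.generate(input_ids, max_length=100, do_sample=True, top_p=0.92)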
print('\n\n\n', tokenizer.decode(output[0], skip_special_tokens=True))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.





 THIS IS GENERATIVE AI


If you have heard of the game "The Lord of the Rings", then you probably already know what the game is. It is set in the world of Middle-earth and contains elements of Tolkien's epic fantasy. It was first published in 1995 and has been played by over 200,000 people. Since then, numerous sequels have been developed. A third game, the next in the series "A World Without End" was written in 2007.


The
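Top-p sampling differs from top-k in that the number of candidate tokens adapts to the distribution: a peaked distribution keeps few tokens, a flat one keeps many. The sketch below shows the filtering step on made-up probabilities. `top_p_filter` and both toy distributions are hypothetical names for this illustration, and the library itself applies the cutoff to logits rather than to a dictionary.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative probability is >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break  # nucleus is complete
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Made-up distributions: one fairly flat, one sharply peaked
flat_probs = {'code': 0.5, 'text': 0.25, 'images': 0.15, 'audio': 0.07, 'video': 0.03}
peaked_probs = {'code': 0.95, 'text': 0.05}

print(list(top_p_filter(flat_probs, 0.92)))
print(list(top_p_filter(peaked_probs, 0.92)))
```

The flat distribution needs four tokens to reach 0.92, while the peaked one needs only one.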
