# GENERATIVE-TEXT-MODEL

This notebook demonstrates how to generate paragraphs on a given topic with either a pretrained GPT-2 (a transformer-based model) or a small LSTM trained from scratch on a three-sentence sample corpus.
GPT-2 is loaded through the Hugging Face Transformers library, and the LSTM is built with TensorFlow/Keras.

### 1. Install Required Libraries

In [14]:
!pip install transformers tensorflow --quiet

### 2. GPT-2 Text Generation Demo

In [15]:
from transformers import pipeline

def gpt2_generation(prompt, max_length=200, num_return_sequences=1):
    """
    Generates text using a pretrained GPT-2 model.
    Args:
        prompt (str): Input text prompt.
        max_length (int): Maximum total length in tokens, prompt included.
        num_return_sequences (int): Number of variations to generate.
    Returns:
        list[str]: Generated text sequences.
    """
    # Note: this loads the pipeline (and the model weights) on every call
    generator = pipeline('text-generation', model='gpt2')
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return [out['generated_text'] for out in outputs]

In [16]:
prompt = "The future of artificial intelligence in healthcare is"
generated = gpt2_generation(prompt, max_length=150)
print("GPT-2 Generated Text:\n", generated[0])

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


GPT-2 Generated Text:
 The future of artificial intelligence in healthcare is still far from clear. There are some hints there are a few promising models, but at the moment it's far too early to talk about them with certainty, or make predictions about them with confidence that anything new will ever happen. But at the moment it still appears that the next big question will be that which future of artificial intelligence has the most potential, and which should be expected fairly soon, and which is what we're looking at next.

It would probably be a useful analogy to a person in a game, who knows what happens next in the future, but who is most capable of answering the question of (should) the future of artificial intelligence. If everyone in the game is capable of answering
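
The sample above stops mid-sentence because `max_length` caps the total token count (prompt included) and knows nothing about sentence boundaries. One simple way to tidy such output is to trim it back to the last sentence-ending punctuation mark. The following is a minimal sketch of that idea: the helper name `trim_to_last_sentence` is hypothetical (it is not part of Transformers), and it assumes English-style punctuation.

```python
def trim_to_last_sentence(text):
    """Cut text after its last '.', '!' or '?'; return it unchanged if none is found."""
    end = max(text.rfind(ch) for ch in ".!?")
    return text[: end + 1] if end != -1 else text


# Made-up sample that ends mid-sentence, like the GPT-2 output above
sample = "The future is unclear. It is too early to say. If everyone in the game is"
print(trim_to_last_sentence(sample))
```

Applied to the result of `gpt2_generation`, this would drop the dangling final clause while leaving complete sentences untouched.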


### 3. LSTM-based Text Generation Demo
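
The preprocessing in the cell below turns each sentence into a series of growing prefixes, left-pads them to a common length, and uses the last token of each row as the label. As a rough illustration of that idea, here is a plain-Python sketch on a toy token list. The helper names `prefix_sequences` and `pad_pre` are illustrative stand-ins, not Keras functions, and the token ids are made up.

```python
def prefix_sequences(token_ids):
    """Return every prefix of length >= 2: [t0, t1], [t0, t1, t2], ..."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]


def pad_pre(seqs, pad_id=0):
    """Left-pad each sequence with pad_id up to the longest length."""
    width = max(len(s) for s in seqs)
    return [[pad_id] * (width - len(s)) + s for s in seqs]


tokens = [4, 7, 2, 9]  # made-up ids for a four-word sentence
rows = pad_pre(prefix_sequences(tokens))
inputs = [row[:-1] for row in rows]  # everything but the last token
labels = [row[-1] for row in rows]   # the token the model should predict

for x, y in zip(inputs, labels):
    print(x, "->", y)
# [0, 0, 4] -> 7
# [0, 4, 7] -> 2
# [4, 7, 2] -> 9
```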

In [19]:
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
import numpy as np

# Sample corpus
corpus = [
    "Natural language processing enables computers to understand human language.",
    "Deep learning methods have revolutionized AI research.",
    "Transformers outperform RNNs on many NLP tasks."
]

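# Tokenization: map each word to an integer id (ids start at 1; 0 is left for padding)
tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(corpus)

# Turn every sentence into growing prefixes, e.g. [w1, w2], [w1, w2, w3], ...
sequences = []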
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_seq = token_list[:i+1]
        sequences.append(n_gram_seq)

# Left-pad to a common length, then split each row into inputs (all but the last token) and label (the last token)
data = pad_sequences(sequences, padding='pre')
X, labels = data[:, :-1], data[:, -1]
y = tf.keras.utils.to_categorical(labels, num_classes=len(tokenizer.word_index) + 1)

# Build LSTM Model
def build_lstm(vocab_size, seq_len):
    model = Sequential()
    model.add(Embedding(vocab_size, 50, input_length=seq_len))
    model.add(LSTM(100))
    model.add(Dense(vocab_size, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model

vocab_size = len(tokenizer.word_index) + 1
seq_len = X.shape[1]
model = build_lstm(vocab_size, seq_len)

# Train model
epochs = 100
model.fit(X, y, epochs=epochs, verbose=0)

# Text generation with LSTM
def generate_text_lstm(seed_text, next_words, max_seq_len):
    """
    Generates text using the trained LSTM model.
    Args:
        seed_text (str): Initial text to start generation.
        next_words (int): Number of words to generate.
        max_seq_len (int): Maximum sequence length for padding.
    Returns:
        str: Generated text.
    """
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_seq_len, padding='pre')
        predicted = model.predict(token_list, verbose=0)
        predicted_word_index = int(np.argmax(predicted, axis=1)[0])
        # Map the predicted id back to its word; id 0 (padding) has no entry
        output_word = tokenizer.index_word.get(predicted_word_index, '')
        seed_text += ' ' + output_word
    return seed_text

# Example LSTM generation
generated_lstm = generate_text_lstm("Transformers", next_words=10, max_seq_len=seq_len)
print("LSTM Generated Text:\n", generated_lstm)


LSTM Generated Text:
 Transformers outperform rnns on many nlp tasks tasks tasks tasks tasks
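
The repeated `tasks` at the end of the sample is a typical symptom of greedy decoding on a tiny corpus: `np.argmax` always picks the single most likely word, so once the model reaches the end of a memorized sentence it can get stuck. One common alternative is to sample from the predicted distribution with a temperature. The sketch below uses only the standard library; `sample_with_temperature` is a hypothetical helper, and the probabilities are made up for illustration rather than taken from the trained model.

```python
import math
import random


def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Sample an index from probs after rescaling by temperature (must be > 0).

    Low temperatures approach argmax; higher ones flatten the distribution.
    """
    # Rescale in log space; zero-probability entries stay impossible.
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]  # subtract peak for stability
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


probs = [0.05, 0.60, 0.25, 0.10]  # made-up next-word probabilities
rng = random.Random(0)            # seeded so runs are repeatable
picks = [sample_with_temperature(probs, temperature=1.0, rng=rng) for _ in range(1000)]
print({i: picks.count(i) for i in range(len(probs))})
```

In `generate_text_lstm`, a call like this on `predicted[0]` could stand in for the `np.argmax` step, at the cost of non-deterministic output.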


### 4. User Prompt-Driven Generation Function

In [22]:
def generate_on_topic(topic, model_type='gpt2', **kwargs):
    """
    Generate a paragraph on a specific topic using selected model.
    Args:
        topic (str): The topic prompt.
        model_type (str): 'gpt2' or 'lstm'.
        **kwargs: Forwarded to gpt2_generation (e.g. max_length); ignored for 'lstm'.
    Returns:
        str: Generated paragraph.
    """
    if model_type == 'gpt2':
        return gpt2_generation(topic, **kwargs)[0]
    elif model_type == 'lstm':
        # Generate a fixed 50 words; seed words outside the training vocabulary are dropped
        return generate_text_lstm(topic, next_words=50, max_seq_len=seq_len)
    else:
        raise ValueError("Unsupported model type: choose 'gpt2' or 'lstm'.")

# Demo on user prompt
user_topic = "Climate change and sustainable energy"
print("Generated Paragraph on User Topic (GPT-2):\n", generate_on_topic(user_topic, model_type='gpt2', max_length=200))



Generated Paragraph on User Topic (GPT-2):
 Climate change and sustainable energy management, both in the form of carbon taxes and other mitigation policies, have all benefited local energy companies and helped create jobs. They are also supporting growth in renewable technologies, which enable local businesses to create, store and charge their own energy resources.

The impact of these three measures is vast. In 2013, the Australian Energy Market Operator (AEMO) registered a gain of 5.6%. However, in 2014-2015, the AEMO registered a loss of 6.4%.

Australian Energy Market Operator's impact on regional business, particularly coal mining, is a complex task. While Australian businesses are increasingly involved in international trade, they are constrained by Australian regulations. This means international competition and industry must be held accountable.
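
One caveat if the same topic is sent through the `'lstm'` branch: a Keras `Tokenizer` created without an `oov_token` silently drops words it has not seen, so a prompt such as the one above would reach the model as an empty sequence. A quick way to check coverage beforehand is sketched below. `split_by_vocab` is a hypothetical helper, its whitespace split and punctuation stripping only approximate the tokenizer's own filtering, and the small `word_index` dictionary stands in for `tokenizer.word_index`.

```python
import string


def split_by_vocab(text, word_index):
    """Split text into (known, unknown) lowercase words relative to word_index."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    known = [w for w in words if w in word_index]
    unknown = [w for w in words if w not in word_index]
    return known, unknown


# Stand-in for tokenizer.word_index (a few entries only)
word_index = {"transformers": 1, "outperform": 2, "rnns": 3, "language": 4}

known, unknown = split_by_vocab("Climate change and sustainable energy", word_index)
print("known:", known)
print("unknown:", unknown)
# known: []
# unknown: ['climate', 'change', 'and', 'sustainable', 'energy']
```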
