## Generative text model
As training text we chose all three books of The Lord of the Rings, in English.
A single book, or even a small part of one, would certainly have been enough, but we wanted to find out whether the size of the dataset affects how coherent the generated text is.

In [1]:
# Cell 1: Setup and imports
import os
os.environ['KERAS_BACKEND'] = 'tensorflow'  # You can change to 'jax' or 'torch' if preferred
import tensorflow as tf
import numpy as np
import keras
import sentencepiece as spm

print(f"Keras version: {keras.__version__}")
print(f"Keras backend: {keras.config.backend()}")
print(f"Tensorflow version: {tf.__version__}")
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Keras version: 3.9.2
Keras backend: tensorflow
Tensorflow version: 2.19.0
Num GPUs Available:  0


### Loading the model and generating text

The model was trained and saved in a separate notebook.

Loading the model

In [3]:
# Load saved model
model = keras.models.load_model('LotR_trilogy_best_model.keras')
# Load SentencePiece tokenizer
sp = spm.SentencePieceProcessor()
sp.load('Lotr_trilogy_sp.model')

True

Text generation function

In [4]:
seq_length = 64  # context length the model was trained with

def generate_text(model, sp, prompt, num_tokens=100, temperature=1.0):
    """Generate text from a prompt by sampling one token at a time."""
    # Encode the prompt as-is: lowercasing is disabled because the model generates cased text
    prompt_ids = sp.encode_as_ids(prompt)

    # Keep the whole prompt in the output, even if it is longer than the context window
    generated_ids = list(prompt_ids)

    # Build the fixed-length input window: left-pad short prompts with id 0
    # (assumed to match the padding used in training), truncate long prompts
    # to their last seq_length tokens
    if len(prompt_ids) < seq_length:
        input_ids = [0] * (seq_length - len(prompt_ids)) + prompt_ids
    else:
        input_ids = prompt_ids[-seq_length:]

    # Generate text token by token
    for _ in range(num_tokens):
        x = np.array([input_ids])
        predictions = model.predict(x, verbose=0)[0]
        # The output at the last position is the distribution over the next token
        # (assumes the final layer outputs raw logits, not softmax probabilities)
        logits = predictions[-1] / temperature
        # Numerically stable softmax
        exp_logits = np.exp(logits - np.max(logits))
        probs = exp_logits / np.sum(exp_logits)
        next_token = int(np.random.choice(len(probs), p=probs))
        generated_ids.append(next_token)
        # Slide the context window forward by one token
        input_ids = input_ids[1:] + [next_token]

    # Decode the generated sequence
    return sp.decode(generated_ids)
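
The sampling step in `generate_text` divides the logits by `temperature` before the softmax. The sketch below isolates that step to show its effect; `softmax_with_temperature` is a hypothetical helper that mirrors the computation in the function, and the four logits are made-up values, not outputs of the trained model.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature, computed as in generate_text."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract the max for numerical stability
    return exp / np.sum(exp)

# Hypothetical logits for a four-token vocabulary
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 1.2, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Temperatures below 1 concentrate probability on the most likely token, while temperatures above 1 flatten the distribution. The `temperature=1.2` used for the samples in this notebook therefore makes less likely tokens somewhat more frequent than plain sampling would.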

Generating some text

In [None]:
# Generate sample text
prompts = [
    "Gandalf was feeling depressed and alone. He had lost his staff. He had lost his way. He had lost his mind.",
    "Frodo was drinking coffee with Sam. They were discussing the weather. It was a sunny day.",
    "Aragorn was in the woods. He was hunting for orcs. He was hungry and tired.",
]

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    generated = generate_text(model, sp, prompt, num_tokens=100, temperature=1.2)
    print(generated)


Prompt: Gandalf was feeling depressed and alone. He had lost his staff. He had lost his way. He had lost his mind.
Gandalf was feeling depressed and alone. He had lost his staff. He had lost his way. He had lost his mind. The ring now went off again, and he was now feeling very wretched. He was hard and rather dismal. He missed Pippin, and felt sleep they made their way. All hours passed without being followed at night! We've got to begin with this laid hold of the Ring, and you know the way well. I wonder what sort of dreams they are having.' They went round to the other side. They had not long to breathe, and yet passing away beyond his sight into the hollow,

Prompt: Frodo was drinking coffee with Sam. They were discussing the weather. It was a sunny day.
Frodo was drinking coffee with Sam. They were discussing the weather. It was a sunny day. The was a flood. 'It may be helped,' said Gandalf. 'We shall need your help, and the help of anything that will not be set in this place. An

The generated text does not form a coherent story, but adjacent sentences often seem to be somewhat related to each other.

Training on the entire book series apparently did not improve coherence enough.