**GENERATION Project: Trump Tweet Dataset**

This notebook covers the following key steps:

1. **Downloading and preprocessing the tweets**  
   - Load the data from a CSV file.
   - Clean the tweets (Unicode normalization, URL removal, etc.) and add special tokens.

2. **Training the SentencePiece tokenizer (BPE)**  
   - Build a BPE model on the cleaned tweet corpus.

3. **Encoding the corpus and preparing the data**  
   - Convert the text into sequences of token IDs.
   - Split the corpus into chunks for training.

4. **Building and training the Transformer model**  
   - Define a Transformer architecture with a causal mask.
   - Train the model on the prepared data.

5. **Text generation and BLEU evaluation**  
   - Generate tweets with several methods (sampling, beam search, etc.).
   - Assess generation quality with the BLEU score.


## 1. Downloading and Preprocessing the Tweets

We download the CSV from Google Drive, read its contents, clean the tweets, and add special tokens.

In [None]:

import os
import re
import csv
import numpy as np
import unicodedata  # For Unicode normalization

import tensorflow as tf
from tensorflow import keras
import sentencepiece as spm

# Download the CSV from Google Drive
csv_url = "https://drive.google.com/uc?export=download&id=1s1isv9TQjGiEr2gG__8bOdBFvQlmepRt"
path_to_file = keras.utils.get_file("realdonaltrump.csv", csv_url)
print("CSV file downloaded to:", path_to_file)

# %%
# Read the CSV
raw_tweets = []
with open(path_to_file, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        raw_tweets.append(row['content'])

print("Number of tweets loaded:", len(raw_tweets))
print("First 5 tweets:\n", raw_tweets[:5], "\n")

# %%
# Preprocess and clean the tweets
BOS_TOKEN = "[BOS]"
EOS_TOKEN = "[EOS]"
PAD_TOKEN = "[PAD]"

def clean_tweet(tweet):
    """
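    Clean a tweet by:
      - Normalizing Unicode characters (NFKC).
      - Skipping tweets that contain media URLs (returns None).
      - Removing URLs.
      - Rejoining mentions split by whitespace ('@ user' -> '@user').
      - Removing '#' from hashtags (the text is kept).
      - Padding special tokens ([BOS], [EOS], [PAD]) with spaces so they survive.
      - Dropping characters outside a small whitelist (letters, digits, basic punctuation).
      - Collapsing repeated whitespace.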
    """
    tweet = unicodedata.normalize("NFKC", tweet)
    if "pic.twitter.com" in tweet:
        return None
    tweet = re.sub(r"http\S+", "", tweet)
    tweet = re.sub(r"@\s+(\w+)", r"@\1", tweet)
    tweet = tweet.replace("#", "")
    for token in [BOS_TOKEN, EOS_TOKEN, PAD_TOKEN]:
        tweet = tweet.replace(token, f" {token} ")
    tweet = re.sub(r"[^a-zA-Z0-9@#\[\]\s\.\,\!\?\(\)]", "", tweet)
    tweet = re.sub(r"\s+", " ", tweet).strip()
    return tweet

tweets_cleaned = [clean_tweet(tw) for tw in raw_tweets]
tweets_cleaned = [tw for tw in tweets_cleaned if tw is not None]
print("Number of cleaned tweets:", len(tweets_cleaned))
print("First 5 cleaned tweets:\n", tweets_cleaned[:5], "\n")

# %%
# Write the cleaned tweets to a text file
text_file = "tweets.txt"
with open(text_file, "w", encoding="utf-8") as f:
    for tweet in tweets_cleaned:
        tw_clean = tweet.replace("\n", " ").replace("\r", " ").strip()
        f.write(BOS_TOKEN + tw_clean + EOS_TOKEN + "\n")

# %%
# Concatenate the tweets into a single string (encoded as the corpus in section 3)
all_text = ""
for tw in tweets_cleaned:
    tw_clean = tw.replace("\n", " ").replace("\r", " ").strip()
    all_text += f"{BOS_TOKEN} {tw_clean} {EOS_TOKEN} "


Downloading data from https://drive.google.com/uc?export=download&id=1s1isv9TQjGiEr2gG__8bOdBFvQlmepRt
11331653/11331653 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
CSV file downloaded to: /root/.keras/datasets/realdonaltrump.csv
Number of tweets loaded: 43352
First 5 tweets:
 ['Be sure to tune in and watch Donald Trump on Late Night with David Letterman as he presents the Top Ten List tonight!', 'Donald Trump will be appearing on The View tomorrow morning to discuss Celebrity Apprentice and his new book Think Like A Champion!', 'Donald Trump reads Top Ten Financial Tips on Late Show with David Letterman: http://tinyurl.com/ooafwn - Very funny!', 'New Blog Post: Celebrity Apprentice Finale and Lessons Learned Along the Way: http://tinyurl.com/qlux5e', '"My persona will never be that of a wallflower - I’d rather build walls than cling to them" --Donald J. Trump'] 

Number of cleaned tweets: 39667
First 5 cleaned tweets:
 ['Be sure to tune in and watch Donald Trump
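
To make the effect of each cleaning rule concrete, the sketch below replays the same regular expressions as `clean_tweet` on a few inputs. The helper name `clean_tweet_sketch` is hypothetical, and only the first sample comes from the loaded data shown above; the other strings are made up for illustration.

```python
import re
import unicodedata

def clean_tweet_sketch(tweet):
    """Replay the cleaning rules of clean_tweet (special-token handling omitted)."""
    tweet = unicodedata.normalize("NFKC", tweet)
    if "pic.twitter.com" in tweet:
        return None  # tweets with media links are dropped entirely
    tweet = re.sub(r"http\S+", "", tweet)        # strip URLs
    tweet = re.sub(r"@\s+(\w+)", r"@\1", tweet)  # '@ user' -> '@user'
    tweet = tweet.replace("#", "")               # keep the hashtag text only
    tweet = re.sub(r"[^a-zA-Z0-9@#\[\]\s\.\,\!\?\(\)]", "", tweet)  # character whitelist
    return re.sub(r"\s+", " ", tweet).strip()    # collapse whitespace

samples = [
    "Donald Trump reads Top Ten Financial Tips on Late Show with David Letterman: http://tinyurl.com/ooafwn - Very funny!",
    "#MAGA @ realDonaldTrump",           # illustrative input
    "I\u2019d rather build walls",       # curly apostrophe, illustrative input
    "Look at this pic.twitter.com/abc",  # illustrative input
]
cleaned = [clean_tweet_sketch(s) for s in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Note that the whitelist also removes apostrophes, colons, and hyphens, so a contraction such as "I’d" becomes "Id". This is worth keeping in mind when reading the generated text later on.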


## 2. Training the SentencePiece Tokenizer (BPE)

We train the tokenizer on `tweets.txt` only if no saved model exists yet.

In [None]:
# %%
model_prefix = "subword_model"
vocab_size = 450  # Adjust as needed

if not os.path.exists(model_prefix + ".model"):
    spm.SentencePieceTrainer.Train(
        f"--input={text_file} "
        f"--model_prefix={model_prefix} "
        f"--vocab_size={vocab_size} "
        f"--character_coverage=1.0 "
        f"--model_type=bpe "
        f"--bos_id=-1 --eos_id=-1 --pad_id=-1 "
        f"--user_defined_symbols={BOS_TOKEN},{EOS_TOKEN},{PAD_TOKEN}"
    )
    print("SentencePiece model trained.")
else:
    print("SentencePiece model found. Skipping training.")

# %%
sp = spm.SentencePieceProcessor(model_file=model_prefix + ".model")
test_str = "[BOS] Hello [EOS]"
test_ids = sp.encode_as_ids(test_str)
print("Test enc ->", test_str, ":", test_ids)
print("Test dec ->", sp.decode_ids(test_ids), "\n")

SentencePiece model trained.
Test enc -> [BOS] Hello [EOS] : [378, 1, 308, 27, 382, 378, 2]
Test dec -> [BOS] Hello [EOS] 
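
For intuition about what a BPE model learns, here is a toy sketch of the merge loop in pure Python. It is not SentencePiece's implementation: the word frequencies are made up, the helper names `count_pairs` and `merge_pair` are hypothetical, and ties are broken arbitrarily (here, by comparing the pairs themselves).

```python
from collections import Counter

def count_pairs(words):
    """Count adjacent symbol pairs over a {symbol_tuple: frequency} corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy corpus: each word starts as a sequence of characters.
words = {tuple("great"): 3, tuple("greater"): 1, tuple("treat"): 2}
merges = []
for _ in range(3):
    pairs = count_pairs(words)
    best = max(pairs, key=lambda p: (pairs[p], p))  # most frequent pair
    merges.append(best)
    words = merge_pair(words, best)

print("Learned merges:", merges)
print("Segmented corpus:", words)
```

With a vocabulary of only 450 pieces, as configured above, the real tokenizer can afford few such merges, so most words are split into several short pieces.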




## 3. Encoding the Corpus and Preparing the Data

We encode the full text and split the corpus into fixed-size chunks.

In [None]:
# %%
encoded_corpus = sp.encode_as_ids(all_text)
print("Total length of encoded corpus:", len(encoded_corpus))

# %%
max_len = 70  # Length of each chunk
pad_id = sp.piece_to_id(PAD_TOKEN)
chunks = []
start = 0
while start < len(encoded_corpus):
    end = start + max_len
    chunk = encoded_corpus[start:end]
    if len(chunk) < max_len:
        chunk += [pad_id] * (max_len - len(chunk))
    chunks.append(chunk)
    start = end
chunks = np.array(chunks, dtype=np.int32)
print("Number of chunks:", len(chunks), "-> shape:", chunks.shape)

# %%
# Build (X, Y) pairs for language modeling
X = chunks[:, :-1]
Y = chunks[:, 1:]
print("X.shape =", X.shape, "Y.shape =", Y.shape)

# %%
# Split into training and validation sets
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(X, Y, test_size=0.2, random_state=42)
print("Train shapes:", x_train.shape, y_train.shape)
print("Validation shapes:", x_val.shape, y_val.shape)

# %%
# Quick decoding sanity check
print("Sample input:", sp.decode_ids(X[1].tolist()))
print("Sample target:", sp.decode_ids(Y[1].tolist()))

Total length of encoded corpus: 2300936
Number of chunks: 32871 -> shape: (32871, 70)
X.shape = (32871, 69) Y.shape = (32871, 69)
Train shapes: (26296, 69) (26296, 69)
Validation shapes: (6575, 69) (6575, 69)
Sample input: morning to discuss Celebrity Apprentice and his new book Think Like A Champion! [EOS] [BOS] Donald Trump reads Top Ten Financial Tips on Late Show with David Letterm
Sample target: orning to discuss Celebrity Apprentice and his new book Think Like A Champion! [EOS] [BOS] Donald Trump reads Top Ten Financial Tips on Late Show with David Letterman
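
The chunking and the one-token shift between inputs and targets can be checked on a toy sequence. The sketch below is pure Python with made-up IDs; `chunk_and_pad` is a hypothetical helper that mirrors the `while` loop above, and the last line re-derives the chunk count reported in the output.

```python
import math

def chunk_and_pad(ids, max_len, pad_id):
    """Split ids into consecutive chunks of max_len, padding the last one."""
    chunks = []
    for start in range(0, len(ids), max_len):
        chunk = ids[start:start + max_len]
        chunk = chunk + [pad_id] * (max_len - len(chunk))
        chunks.append(chunk)
    return chunks

toy_ids = list(range(1, 11))  # 10 made-up token IDs
chunks = chunk_and_pad(toy_ids, max_len=4, pad_id=0)

# Inputs drop the last token, targets drop the first: target[t] == input[t + 1].
pairs = [(chunk[:-1], chunk[1:]) for chunk in chunks]
for x, y in pairs:
    print("X =", x, "Y =", y)

# Same arithmetic on the real corpus: 2300936 tokens in chunks of 70.
expected_chunks = math.ceil(2300936 / 70)
print("Expected number of chunks:", expected_chunks)
```

Because chunks are cut at fixed offsets, a chunk can start in the middle of a tweet, as the sample input above shows.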


## 4. Building and Training the Transformer Model

We define and train a Transformer model with a causal mask.
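
A causal mask lets position `i` attend only to positions `j <= i`, so a prediction never sees future tokens. The pure-Python sketch below builds the same lower-triangular pattern that `tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)` produces and applies it to made-up attention scores; `causal_mask` and `masked_softmax` are hypothetical helper names.

```python
import math

def causal_mask(seq_len):
    """Lower-triangular matrix: 1 where attention is allowed, 0 elsewhere."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask_row):
    """Softmax over one row of scores, with masked positions forced to weight 0."""
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask_row)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(4)
for row in mask:
    print(row)

# Position 1 may only attend to positions 0 and 1, whatever the raw scores are.
weights = masked_softmax([0.5, 1.0, 2.0, 3.0], mask[1])
print(weights)
```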

In [None]:
# %%
from tensorflow.keras import layers
from tensorflow.keras.saving import register_keras_serializable

@register_keras_serializable()
class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, dropout_rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.ffn = keras.Sequential([
            layers.Dense(ff_dim, activation="relu"),
            layers.Dense(embed_dim),
        ])
        self.dropout1 = layers.Dropout(dropout_rate)
        self.dropout2 = layers.Dropout(dropout_rate)

    def call(self, inputs, training=False):
        seq_len = tf.shape(inputs)[1]
        mask = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
        mask = tf.expand_dims(mask, axis=0)
        attn_output = self.att(inputs, inputs, inputs, attention_mask=mask)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

# %%
@register_keras_serializable()
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim, dropout_rate=0.0):
        super().__init__()
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)
        self.dropout = layers.Dropout(dropout_rate)

    def call(self, x, training=False):
        seq_length = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=seq_length, delta=1)
        pos_emb = self.pos_emb(positions)
        token_emb = self.token_emb(x)
        out = token_emb + pos_emb
        return self.dropout(out, training=training)

# %%
# Hyperparameters
embed_dim = 128
num_heads = 4
ff_dim = 512
num_layers = 4
dropout_rate = 0.1
batch_size = 128
epochs = 5
max_input_len = max_len - 1

# %%
# Build the model
inputs = keras.Input(shape=(max_input_len,), dtype=tf.int32)
x = TokenAndPositionEmbedding(max_input_len, sp.get_piece_size(), embed_dim, dropout_rate)(inputs)
for _ in range(num_layers):
    x = TransformerBlock(embed_dim, num_heads, ff_dim, dropout_rate)(x)
outputs = layers.Dense(sp.get_piece_size(), activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()

# %%
# Training
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    batch_size=batch_size,
    epochs=epochs,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6)
    ]
)

# %%
# Save the trained model
model.save("transformer_model.keras")  # the .keras extension selects the native Keras format

Epoch 1/5
206/206 ━━━━━━━━━━━━━━━━━━━━ 53s 138ms/step - accuracy: 0.1400 - loss: 4.6327 - val_accuracy: 0.2175 - val_loss: 3.6536 - learning_rate: 0.0010
Epoch 2/5
206/206 ━━━━━━━━━━━━━━━━━━━━ 13s 63ms/step - accuracy: 0.2324 - loss: 3.5931 - val_accuracy: 0.3050 - val_loss: 3.2086 - learning_rate: 0.0010
Epoch 3/5
206/206 ━━━━━━━━━━━━━━━━━━━━ 13s 62ms/step - accuracy: 0.2994 - loss: 3.2407 - val_accuracy: 0.3415 - val_loss: 3.0133 - learning_rate: 0.0010
Epoch 4/5
206/206 ━━━━━━━━━━━━━━━━━━━━ 13s 65ms/step - accuracy: 0.3301 - loss: 3.0711 - val_accuracy: 0.3612 - val_loss: 2.9081 - learning_rate: 0.0010
Epoch 5/5
206/206 ━━━━━━━━━━━━━━━━━━━━ 14s 66ms/step - accuracy: 0.3493 - loss: 2.9612 - val_accuracy: 0.3741 - val_loss: 2.8388 - learning_rate: 0.0010

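One way to read the training log is to convert each validation loss into a perplexity, `exp(loss)`, and compare it with the loss of a model that guesses uniformly over the 450-piece vocabulary. The sketch below uses only the numbers printed above. Two caveats: the values are per subword piece rather than per word, and the loss also averages over the padding positions of the last chunk.

```python
import math

# val_loss per epoch, copied from the training log above.
val_losses = [3.6536, 3.2086, 3.0133, 2.9081, 2.8388]
vocab_size = 450

perplexities = [math.exp(loss) for loss in val_losses]
uniform_loss = math.log(vocab_size)  # cross-entropy of a uniform guess

for epoch, (loss, ppl) in enumerate(zip(val_losses, perplexities), start=1):
    print(f"epoch {epoch}: val_loss={loss:.4f}  perplexity={ppl:.1f}")
print(f"uniform baseline: loss={uniform_loss:.4f}  perplexity={vocab_size}")

# The 206 steps per epoch follow from 26296 training rows and batch_size=128.
steps_per_epoch = math.ceil(26296 / 128)
print("steps per epoch:", steps_per_epoch)
```

The validation loss is still decreasing after the fifth epoch, so the model has likely not converged at this point.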

## 5. Text Generation and BLEU Evaluation

We define several generation methods (top-k/top-p sampling, beam search, and plain temperature sampling) and evaluate them with the BLEU score.


### Generation Functions

In [None]:


# %%
def generate_text_with_sampling(model, sp, start_text, max_tokens=30, temperature=0.8,
                                top_k=40, top_p=0.9, repetition_penalty=1.1, max_input_len=69):
    eos_id = sp.piece_to_id("[EOS]")
    pad_id = sp.piece_to_id("[PAD]")
    bos_id = sp.piece_to_id("[BOS]")
    start_tokens = sp.encode_as_ids(start_text)
    tokens = [bos_id] + start_tokens
    for _ in range(max_tokens):
        input_tokens = tokens[-max_input_len:]
        if len(input_tokens) < max_input_len:
            input_tokens = [pad_id] * (max_input_len - len(input_tokens)) + input_tokens
        input_array = np.array([input_tokens], dtype=np.int32)
        preds = model.predict(input_array, verbose=0)
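        # The model ends in a softmax, so preds holds probabilities; take logs before rescaling.
        logits = np.log(preds[0, -1, :] + 1e-9)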
        for token_id in set(tokens):
            if logits[token_id] < 0:
                logits[token_id] *= repetition_penalty
            else:
                logits[token_id] /= repetition_penalty
        logits = logits / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        sorted_indices = np.argsort(probs)[::-1]
        sorted_probs = probs[sorted_indices]
        if top_k > 0:
            sorted_indices = sorted_indices[:top_k]
            sorted_probs = sorted_probs[:top_k]
        if top_p < 1.0:
            cumulative_probs = np.cumsum(sorted_probs)
            cutoff_index = np.searchsorted(cumulative_probs, top_p) + 1
            sorted_indices = sorted_indices[:cutoff_index]
            sorted_probs = sorted_probs[:cutoff_index]
        sorted_probs = sorted_probs / np.sum(sorted_probs)
        next_id = np.random.choice(sorted_indices, p=sorted_probs)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    if eos_id in tokens:
        eos_index = tokens.index(eos_id)
        final_tokens = tokens[1:eos_index]
    else:
        final_tokens = tokens[1:]
    final_tokens = [int(token) for token in final_tokens]
    return sp.decode_ids(final_tokens)

def beam_search_generate(model, sp, start_text, beam_width=3, max_tokens=30, temperature=0.8,
                         length_penalty=0.8, repetition_penalty=1.1, max_input_len=69):
    eos_id = sp.piece_to_id("[EOS]")
    pad_id = sp.piece_to_id("[PAD]")
    bos_id = sp.piece_to_id("[BOS]")
    prompt_tokens = sp.encode_as_ids(start_text)
    initial_tokens = [bos_id] + prompt_tokens
    beams = [(initial_tokens, 0.0)]
    for _ in range(max_tokens):
        new_beams = []
        for seq, score in beams:
            if seq[-1] == eos_id:
                new_beams.append((seq, score))
                continue
            input_tokens = seq[-max_input_len:]
            if len(input_tokens) < max_input_len:
                input_tokens = [pad_id] * (max_input_len - len(input_tokens)) + input_tokens
            input_array = np.array([input_tokens], dtype=np.int32)
            preds = model.predict(input_array, verbose=0)
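            # The model ends in a softmax, so preds holds probabilities; take logs before rescaling.
            logits = np.log(preds[0, -1, :] + 1e-9) / temperature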
            for token_id in set(seq):
                if logits[token_id] < 0:
                    logits[token_id] *= repetition_penalty
                else:
                    logits[token_id] /= repetition_penalty
            probs = np.exp(logits) / np.sum(np.exp(logits))
            top_indices = np.argsort(probs)[-beam_width:]
            for token_id in top_indices:
                token_prob = probs[token_id]
                new_seq = seq + [token_id]
                new_score = score + np.log(token_prob) / (len(new_seq) ** length_penalty)
                new_beams.append((new_seq, new_score))
        new_beams = sorted(new_beams, key=lambda x: x[1], reverse=True)
        beams = new_beams[:beam_width]
        if all(seq[-1] == eos_id for seq, _ in beams):
            break
    best_seq, best_score = beams[0]
    if eos_id in best_seq:
        eos_index = best_seq.index(eos_id)
        best_seq = best_seq[1:eos_index]
    else:
        best_seq = best_seq[1:]
    best_seq = [int(token) for token in best_seq]
    return sp.decode_ids(best_seq)

def generate_text_classic(model, sp, start_text, max_tokens=30, temperature=0.8, max_input_len=69):
    eos_id = sp.piece_to_id("[EOS]")
    pad_id = sp.piece_to_id("[PAD]")
    bos_id = sp.piece_to_id("[BOS]")
    start_tokens = sp.encode_as_ids(start_text)
    tokens = [bos_id] + start_tokens
    for _ in range(max_tokens):
        input_tokens = tokens[-max_input_len:]
        if len(input_tokens) < max_input_len:
            input_tokens = [pad_id] * (max_input_len - len(input_tokens)) + input_tokens
        input_array = np.array([input_tokens], dtype=np.int32)
        preds = model.predict(input_array, verbose=0)
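        # The model ends in a softmax, so preds holds probabilities; take logs before rescaling.
        logits = np.log(preds[0, -1, :] + 1e-9)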
        logits = logits / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        next_id = np.random.choice(len(probs), p=probs)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    if eos_id in tokens:
        eos_index = tokens.index(eos_id)
        final_tokens = tokens[1:eos_index]
    else:
        final_tokens = tokens[1:]
    final_tokens = [int(token) for token in final_tokens]
    return sp.decode_ids(final_tokens)
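
The filtering steps used by `generate_text_with_sampling` can be tried in isolation on a toy distribution. The sketch below is pure Python with made-up probabilities; `apply_temperature` and `top_k_top_p` are hypothetical helper names. Temperature is applied in log space, that is softmax(log p / T), which is the usual definition when starting from probabilities rather than raw logits.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution as softmax(log(p) / T)."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize what is left."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}

toy_probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token distribution
sharper = apply_temperature(toy_probs, 0.5)  # T < 1 sharpens
flatter = apply_temperature(toy_probs, 2.0)  # T > 1 flattens
kept = top_k_top_p(toy_probs, top_k=3, top_p=0.7)

print("T=0.5:", [round(p, 3) for p in sharper])
print("T=2.0:", [round(p, 3) for p in flatter])
print("after top-k/top-p:", kept)
```

Sampling then draws the next token from the renormalized distribution, so unlikely tokens in the tail can never be picked.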

### BLEU Evaluation

We use NLTK to compute the BLEU score between a reference sentence and the generated sentence.

In [None]:
# %%
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def compute_bleu(reference, candidate):
    reference_tokens = reference.split()
    candidate_tokens = candidate.split()
    smoothie = SmoothingFunction().method1
    return sentence_bleu([reference_tokens], candidate_tokens, smoothing_function=smoothie)
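
BLEU is built on clipped n-gram precision: the fraction of candidate n-grams that also occur in the reference. The pure-Python sketch below computes only that ingredient (NLTK's `sentence_bleu` additionally combines several n-gram orders, applies a brevity penalty, and here uses smoothing). The helper names are hypothetical; the reference is one of the references used in this notebook, and the candidate is the short string "Thank you!".

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(reference_tokens, candidate_tokens, n):
    """Clipped n-gram precision of the candidate against one reference."""
    candidate_counts = Counter(ngrams(candidate_tokens, n))
    reference_counts = Counter(ngrams(reference_tokens, n))
    total = sum(candidate_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, reference_counts[gram])
                  for gram, count in candidate_counts.items())
    return clipped / total

reference = "Thank you for your support and trust in our vision.".split()
candidate = "Thank you!".split()

p1 = modified_precision(reference, candidate, 1)
p2 = modified_precision(reference, candidate, 2)
print("unigram precision:", p1)
print("bigram precision:", p2)
```

Because `compute_bleu` tokenizes with `split()`, punctuation stays attached to words: "you!" does not match "you", which lowers the score even when the wording is otherwise identical.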

### Loading the Model and Tokenizer

We load the saved model along with the SentencePiece tokenizer.

### Text Generation and BLEU Evaluation

We define a few test prompts with their references, then generate text with each method.

In [None]:

# %%
test_prompts = [
    "I will make America",
    "Fake news is",
    "China is",
    "Thank you"
]

references = {
    "I will make America": "I will make America great again with strong leadership and policies.",
    "Fake news is": "Fake news is misleading information used to manipulate public opinion.",
    "China is": "China is emerging as a dominant force in the global economy.",
    "Thank you": "Thank you for your support and trust in our vision."
}

# %%
print("=== Sampling Generation ===")
for prompt in test_prompts:
    gen_text = generate_text_with_sampling(
        model, sp, prompt, max_tokens=30,
        temperature=0.8, top_k=40, top_p=0.9,
        repetition_penalty=1.1, max_input_len=69
    )
    print(f"\nPrompt: {prompt}\nGenerated: {gen_text}")

print("\n=== Beam Search Generation ===")
for prompt in test_prompts:
    gen_text = beam_search_generate(
        model, sp, prompt, beam_width=3, max_tokens=30,
        temperature=0.8, length_penalty=0.8,
        repetition_penalty=1.1, max_input_len=69
    )
    print(f"\nPrompt: {prompt}\nGenerated: {gen_text}")

print("\n=== Classic Temperature Generation ===")
for prompt in test_prompts:
    gen_text = generate_text_classic(
        model, sp, prompt, max_tokens=30,
        temperature=0.8, max_input_len=69
    )
    print(f"\nPrompt: {prompt}\nGenerated: {gen_text}")

=== Sampling Generation ===

Prompt: I will make America
Generated: I will make AmericaTrump! Congace I late she few the Elub Top and oW endrujection sk

Prompt: Fake news is
Generated: Fake news is all oftol, with @Moredaison .......done to legar it and lass of he will ne

Prompt: China is
Generated: China is goaling about America and rigle feinutm as. Thanks Fast Alore Is the hunm

Prompt: Thank you
Generated: Thank you Boline Smit! (TrumpDatton by all calise lanargin and we far sidat

=== Beam Search Generation ===

Prompt: I will make America
Generated: I will make America Great! 

Prompt: Fake news is
Generated: Fake news is a lot official! 

Prompt: China is
Generated: China is a little energy. 

Prompt: Thank you
Generated: Thank you! 

=== Classic Temperature Generation ===

Prompt: I will make America
Generated: I will make America ...oc myepprent st ...ivethare theyountthergmericck...ratit exe Wh re need Mrat thisle runce

Prompt: Fake news is
Generated: Fake news is should o

### BLEU Score Evaluation

We evaluate the output of the three methods by computing their BLEU score against the references.




In [None]:

# %%
print("\n=== BLEU Score Evaluation (Classic Generation) ===")
for prompt in test_prompts:
    generated_text = generate_text_classic(
        model, sp, prompt, max_tokens=30,
        temperature=0.8, max_input_len=69
    )
    if prompt in references:
        bleu = compute_bleu(references[prompt], generated_text)
        print(f"\nPrompt: {prompt}\nGenerated: {generated_text}\nBLEU Score: {bleu:.4f}")

# %%
print("\n=== BLEU Score Evaluation (Sampling Generation) ===")
for prompt in test_prompts:
    generated_text = generate_text_with_sampling(
        model, sp, prompt, max_tokens=30,
        temperature=0.8, top_k=40, top_p=0.9,
        repetition_penalty=1.1, max_input_len=69
    )
    if prompt in references:
        bleu = compute_bleu(references[prompt], generated_text)
        print(f"\nPrompt: {prompt}\nGenerated: {generated_text}\nBLEU Score: {bleu:.4f}")

# %%
print("\n=== BLEU Score Evaluation (Beam Search Generation) ===")
for prompt in test_prompts:
    generated_text = beam_search_generate(
        model, sp, prompt, beam_width=3, max_tokens=30,
        temperature=0.8, length_penalty=0.8,
        repetition_penalty=1.1, max_input_len=69
    )
    if prompt in references:
        bleu = compute_bleu(references[prompt], generated_text)
        print(f"\nPrompt: {prompt}\nGenerated: {generated_text}\nBLEU Score: {bleu:.4f}")



=== BLEU Score Evaluation (Classic Generation) ===

Prompt: I will make America
Generated: I will make America su6 ( aselehenmp Ent Thanks V00 f whatonald Th moreobpent people[BOS]usting am year le5ousld
BLEU Score: 0.1345

Prompt: Fake news is
Generated: Fake news isontoll it yj Trump[BOS] time
BLEU Score: 0.0455

Prompt: China is
Generated: China is The V0 w B by de[uelayV job mO thate F want(20te Trump al H2016 big coniveia
BLEU Score: 0.0228

Prompt: Thank you
Generated: Thank you re prl d St amagusW theT greatbamaIavctteraderealove Jass thatadeulade weAurut
BLEU Score: 0.0360

=== BLEU Score Evaluation (Sampling Generation) ===

Prompt: I will make America
Generated: I will make America Trump......Yelain tunt ratis, this seatem out of fieilen is lybolller
BLEU Score: 0.1778

Prompt: Fake news is
Generated: Fake news is now moneL! Promite? Whate 3 would thId. Great suffeners, dional Petm
BLEU Score: 0.0707

Prompt: China is
Generated: China is now it so treridce. OAMAMERO! 4I. Bus