In [1]:
# Disable SSL verification to avoid certificate problems
import os
os.environ['PYTHONHTTPSVERIFY'] = '0'

# Import the modules needed for natural language processing
from transformers import pipeline, set_seed, AutoTokenizer, AutoModelForCausalLM
import torch
import random

# Disable SSL certificate checking for requests to Hugging Face.
# This setting is mainly useful in environments where certificates
# can cause problems (corporate networks, VPNs, etc.)
os.environ['CURL_CA_BUNDLE'] = ''

# Note: this configuration must be set before any attempt to connect to Hugging Face

# Fix a random seed so that results are reproducible
set_seed(70)  # The value 70 is arbitrary; any integer works
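
To illustrate what fixing a seed buys, here is a minimal sketch that uses only Python's standard `random` module as a stand-in for the model's sampler. The helper `draw` is hypothetical and not part of the notebook; it only shows that the same seed reproduces the same sequence of draws.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random integers produced after seeding with `seed`."""
    rng = random.Random(seed)  # local generator, leaves the global state untouched
    return [rng.randint(1, 10000) for _ in range(n)]

same_a = draw(70)
same_b = draw(70)
other = draw(71)
print(same_a == same_b)  # identical seeds reproduce the same sequence
print(same_a == other)   # a different seed normally gives a different sequence
```

A local `random.Random` instance is used here so the sketch does not disturb the global seed set above.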

In [2]:
def generate_story(
    initial_phrase,
    model_name="gpt2-large",
    max_length=200,
    num_return_sequences=1,
    seed=None,
    temperature=1.0,
    top_p=0.9,
    repetition_penalty=1.2
):
    """
    Generate one or more text sequences from an initial phrase using a GPT model.

    Args:
        initial_phrase (str): Starting phrase for generation
        model_name (str): Name of the GPT model to use (default: "gpt2-large")
        max_length (int): Maximum total length in tokens, prompt included (default: 200)
        num_return_sequences (int): Number of different sequences to generate (default: 1)
        seed (int): Seed for reproducible results (default: None)
        temperature (float): Sampling temperature; higher values give more varied output (default: 1.0)
        top_p (float): Cumulative probability threshold for nucleus sampling (default: 0.9)
        repetition_penalty (float): Penalty for repetitions; values above 1.0 discourage them (default: 1.2)

    Returns:
        list: List of generated text sequences
    """
    # If a seed is provided, set it for reproducibility
    if seed is not None:
        set_seed(seed)

    # Create the text-generation pipeline
    generator = pipeline('text-generation', model=model_name)

    # Generate text with the specified parameters
    outputs = generator(
        initial_phrase,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
        # GPT-2 has no pad token, so the end-of-sequence token is reused
        pad_token_id=generator.tokenizer.eos_token_id,
        do_sample=True  # Enable random sampling
    )

    # Extract the generated texts
    generated_texts = [output['generated_text'] for output in outputs]

    return generated_texts
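
The `temperature` and `top_p` arguments are easier to reason about on a toy distribution. The following is a minimal pure-Python sketch, not the library's implementation: the function names and the four made-up scores are assumptions chosen for illustration. It rescales scores by the temperature, then keeps the smallest set of tokens whose cumulative probability reaches `top_p`.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalise the surviving tokens; every other token gets probability 0
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens
for t in (0.8, 1.0, 1.2):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in top_p_filter(probs, 0.9)])
```

Running it shows how the probability mass shifts toward less likely tokens as the temperature rises, while the lowest-scoring token is cut by the `top_p` threshold in each case.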

In [3]:
stories = generate_story("Il était une fois", temperature=0.8)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


In [4]:
stories[0]

'Il était une fois un mot de côté (j\'ai rien lui faire en moins) qui nous sommes à porter.\n\n"I was obliged to send you a note in which I assured you that, if it did not suit me, and my mistress had been kind enough to ask of you for it, the duke would have given his word of honour that I should be treated with more respect than he did." – J\'ai bientôt que je vais obligé quelque chose mieux et ma femme avaissait prête que le duc ne m\'avait pas donner mon chemin ; j\'aurai entendu parvenir un mot de toutes ceux qui n\'ont pas me laisse au moindrement les plus salut.\n\n"You mean the duke?" asked Athos. – Vous ê'
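
Note that `generate_story` builds a new pipeline on every call, so the model is loaded again each time the function runs. One possible remedy, sketched here under assumptions, is to memoise the loader with `functools.lru_cache`. The names `fake_loader` and `get_generator` are hypothetical; `fake_loader` stands in for `pipeline('text-generation', model=model_name)` so that the snippet runs without downloading anything.

```python
from functools import lru_cache

load_count = 0  # counts how many times a model is actually "loaded"

def fake_loader(model_name):
    """Stand-in for pipeline('text-generation', model=model_name)."""
    global load_count
    load_count += 1
    return f"<generator for {model_name}>"

@lru_cache(maxsize=2)  # keep at most two loaded models around
def get_generator(model_name):
    return fake_loader(model_name)

get_generator("gpt2-large")
get_generator("gpt2-large")   # served from the cache, no reload
get_generator("gpt2-medium")
print(load_count)  # → 2
```

With real pipelines, each cached entry keeps a model in memory, so `maxsize` should stay small.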

In [5]:
def main():
    """
    Main script for testing different text-generation configurations.
    Explores the impact of the various parameters on the generated text.
    """
    initial_phrase = "The professor try to explain at the class..."

    print("\n=== Test 1: Default configuration ===")
    print("This configuration uses the default parameters to establish a baseline for comparison")
    result1 = generate_story(initial_phrase)
    print(f"Result 1:\n{result1[0]}\n")

    print("\n=== Test 2: Custom configuration with increased creativity ===")
    print("This configuration uses a higher temperature for more creativity")
    result2 = generate_story(
        initial_phrase,
        seed=42,
        temperature=1.2,
        top_p=0.95,
        max_length=300
    )
    print(f"Result 2:\n{result2[0]}\n")

    print("\n=== Test 3: Three generations with random seeds ===")
    print("This series of tests shows how different seeds produce varied results")
    for i in range(3):
        random_seed = random.randint(1, 10000)
        print(f"\nGeneration {i+1} (seed={random_seed}):")
        result3 = generate_story(
            initial_phrase,
            seed=random_seed,
            max_length=150
        )
        print(result3[0])

    print("\n=== Test 4: Using the GPT-2 Medium model ===")
    print("This test uses a different model to compare the results")
    result4 = generate_story(
        initial_phrase,
        model_name="gpt2-medium",
        max_length=250
    )
    print(f"Result 4:\n{result4[0]}\n")

In [6]:
print("=== Start of text generation tests ===")
print("Note: Results may vary even with fixed seeds, for example across")
print("hardware or library versions, since generation relies on random sampling.")
main()
print("\n=== End of text generation tests ===")

=== Start of text generation tests ===
Note: Results may vary even with fixed seeds, for example across
hardware or library versions, since generation relies on random sampling.

=== Test 1: Default configuration ===
This configuration uses the default parameters to establish a baseline for comparison

Result 1:
The professor try to explain at the class...


"It is a theory about the past, but it has many consequences and we should not believe in it. I don't know when it came up, I do know that it was very long ago."


A moment later, the professor stands up from his chair and says: "I would like you to go back a year to your final year." He looks at the student's name on the computer screen, then he hands over the phone so they can be called again, the student says yes. He goes into a brief office conference with the other students who are not present as the teacher just walks away for 20 minutes or more. The professor is gone for only 15 minutes and in the classroom the rest of the time the student does nothing more than sit there until the last few minutes when the student turns around and asks what the topic of discussion is for the next week, when the teachers finally return. That day will eventually have another


=== Test 2: Custom configuration with increased creativity ===
This configuration uses a higher temperature for more creativity

Result 2:
The professor try to explain at the class... He was really angry because something awful happened and had never stopped doing it before. There were all sorts of things he didn't want himself, but this one was probably not a pretty picture when you see him in front of your face... You can be so good for about 50 hours out of 100. Then something happens. Your grades go down significantly and you find yourself with little work left over in the end. As bad as that sounds? It might have just been my job."


"Hey Harry," Draco continued, "what are you saying?"


Draco looked up from his notes once more. Voldemort has done something worse than I could've suspected though! A thought raced through every nook and cranny within Hogwarts' mind: What is she talking about?


He reached into the pile and drew on his sleeve a tiny, white slip of parchment. One last glance at Ginny brought some assurance back to his brain's eye but in another half hour, another dark cloud had appeared and h


=== Test 3: Three generations with random seeds ===
This series of tests shows how different seeds produce varied results

Generation 1 (seed=…):
The professor try to explain at the class... "You are actually saying, I am your father?"

I say, "Yes."

My dad smile. "Your father? Well, I am proud of you for making this decision and making a change in your life." He gives me some advice. "It's time to take things one step at a time," he tells me. "Don't worry about money too much, that won't matter when you're older and need something done. Do what you love first. Go for it."

That evening I have coffee with my dad and we talk for another hour about how great his family is. Then there was some small talk, but we never really did anything more than

Generation 2 (seed=471):
The professor try to explain at the class...
A group of college boys show up for a date. The professor is not happy about this and tries to convince them that he's making things difficult. Eventually he gets the boys to listen to his story, then tells his story again at dinner. "Why don't you just follow me around?" the professor says while shaking his head in disbelief. When the professors father comes home from work his son asks him why it took so long for his brother to come over. His dad doesn't really want to say anything because he's worried about what might happen later on...

Generation 3 (seed=1099):
The professor try to explain at the class...

"There are three major elements that make up our universe: The Big Bang, which began when a dense cloud of primordial hydrogen and helium exploded as it cooled, creating an expanding fireball; dark matter, the invisible stuff found in most galaxies but unseen within the galaxy cluster SDSS J170925.13-272236 (see picture); and dark energy — the mysterious force of gravity that exerts its pull on all matter except for ordinary light particles. Dark matter is thought to weigh between 2 and 10 percent of the universe's mass. Although scientists know of no direct evidence of their existence, these objects were one of the primary motivations behind theoretical calculations of the age of the universe

=== Test 4: Using the GPT-2 Medium model ===
This test uses a different model to compare the results

Result 4:
The professor try to explain at the class...

My point is this: A person can have a good idea about what works, and then become completely blind to all of its flaws. Or they may not know how things work; but that's just another side effect—one which has no direct bearing on whether or when someone makes mistakes in life (or any other aspect for it). In contrast with people who were fully focused upon their own inner motivation as much I was regarding personal growth before working out hard habits like cardio exercise or doing proper nutrition,...

-I could argue until I'm blue in hell if fitness should be regulated by anything--but ultimately we're humans trying our best today so there must always remain room left over… -What he said after telling me "if you don't want bad energy products put your head down" made perfect sense....the guy knows his stuff.....no need whatsoever being mad...just let yourself relax......it'll help more than make him feel better himself ~~ —Kurt
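
Most of the samples above stop mid-sentence because generation halts once `max_length` tokens are reached. A minimal sketch of a post-processing step, assuming a hypothetical helper `trim_to_last_sentence` that is not part of the notebook, cuts the text back to its last sentence-ending punctuation mark.

```python
def trim_to_last_sentence(text):
    """Cut text after its last '.', '!' or '?'; return it unchanged if none is found."""
    last = max(text.rfind(mark) for mark in ".!?")
    return text if last == -1 else text[: last + 1]

sample = "He gives me some advice. Then there was some small talk, but we never"
print(trim_to_last_sentence(sample))  # → He gives me some advice.
```

This simple rule ignores closing quotation marks and abbreviations, so it is only a starting point.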

In [7]:
def test_multiple_models(initial_phrase, max_length=200):
    """
    Test text generation with different language models.

    Args:
        initial_phrase (str): Initial phrase for generation
        max_length (int): Maximum total length in tokens, prompt included
    """
    # Configuration for each model
    models_config = {
        "openai-gpt": {
            "name": "openai-gpt",
            "description": "OpenAI's original GPT model"
        },
        "facebook/opt-125m": {
            "name": "facebook/opt-125m",
            "description": "Facebook's OPT model, lightweight version"
        }
        # Note: Llama and Gemma require special configuration
    }

    for model_name, config in models_config.items():
        print(f"\n=== Test with {config['description']} ===")
        print(f"Model: {model_name}")

        try:
            # Create the pipeline (automatic device placement is left commented out)
            generator = pipeline(
                'text-generation',
                model=model_name,
                #device_map='auto'  # Use the GPU if available
            )

            # Generate the text
            result = generator(
                initial_phrase,
                max_length=max_length,
                temperature=0.9,
                do_sample=True
            )

            print("\nGeneration result:")
            print(result[0]['generated_text'])

        except Exception as e:
            print(f"Error while using model {model_name}:")
            print(f"Error message: {e}\n")

In [8]:
# Main test
initial_phrase = "The professor try to explain at the class..."

print("=== Start of multi-model tests ===")
print("Note: Some models may require significant resources")
print("or special configuration to run.")

test_multiple_models(initial_phrase)

print("\n=== Note on special models ===")
print("Llama-3.1-8B-Instruct and google/gemma-7b require")
print("special configuration and access permissions.")

=== Start of multi-model tests ===
Note: Some models may require significant resources
or special configuration to run.

=== Test with OpenAI's original GPT model ===
Model: openai-gpt

Generation result:
The professor try to explain at the class... " 
 " the professor? do i know him? " said rachel. 
 " no, he's just a friend of my dad's. i don't know him, though, and i'm glad dad's not there, " rachel responded. 
 " that must be so sad! " 
 " it's been really hard, i guess. and what was it about this guy that made me give up on him? " rachel asked. 
 " your dad? he's like the coolest guy in school. i mean, he's a good friend of mine... and he was a wonderful kisser, " said sarah. 
 " i bet when he and my dad got serious, he must have kissed you in the hallway, " said rachel. 
 they both laughed and sarah said, " no way! he said he didn't! " 
 " no way! " rachel yelled. " my dad kissed girls on the school dance floor and

=== Test with Facebook's OPT model, lightweight version ===
Model: facebook/opt-125m

Generation result:
The professor try to explain at the class... She thought if the students got mad or depressed the teacher of the class would get fired for being too polite.
So, yes, he was a bit rude. But he should have been able to explain if the students got mad. What do you think of a class that doesn't teach what it's teaching?

I think this is one of the reasons why this course has become so popular at my college. Students do NOT want to be taught anything other than the basic stuff. So they are asking for a lot more from a course that's not too much more complex.

I think they realized that it has no effect on the students' grades so they are just gonna sit around and wait. I will say that I think there are people who will argue about class length. If you're trying to teach 12 math questions, that doesn't really help with most of the students.

So, if you're trying to teach

=== Note on special models ===
Llama-3.1-8B-Instruct and google/gemma-7b require