Importing the Required Libraries

In [1]:
import random
import json
import pickle
import numpy as np
import tensorflow as tf





Configuring the Linguistic Analyzer with NLTK

In [2]:
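import nltk
from nltk.stem import WordNetLemmatizer
# One-time setup if the NLTK data is missing (recent NLTK releases may also need 'punkt_tab'):
# nltk.download('punkt'); nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()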

Loading the Intents from a JSON File

In [3]:
# Use a context manager so the file handle is closed after reading
with open('intents.json') as f:
    intents = json.load(f)
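
Every later cell assumes that each intent carries a `tag`, a list of `patterns`, and a list of `responses`, as the dictionary printed in the next cell shows. The sketch below checks that shape on a tiny inline sample. The helper `validate_intents` and the sample data are illustrative assumptions, not part of the original notebook.

```python
import json

# Tiny stand-in for intents.json (illustrative data, not the real file)
sample = '''
{"intents": [
  {"tag": "greeting", "patterns": ["Hi", "Hello"], "responses": ["Hi there."]},
  {"tag": "morning", "patterns": ["Good morning"], "responses": ["Good morning."]}
]}
'''

def validate_intents(data):
    """Return the list of tags, raising ValueError on a malformed entry."""
    tags = []
    for entry in data["intents"]:
        for key in ("tag", "patterns", "responses"):
            if key not in entry:
                raise ValueError(f"intent is missing {key!r}: {entry}")
        if not entry["patterns"] or not entry["responses"]:
            raise ValueError(f"intent {entry['tag']!r} has an empty list")
        tags.append(entry["tag"])
    return tags

sample_intents = json.loads(sample)
tags = validate_intents(sample_intents)
print(tags)
```

Running a check like this before training can surface an intent with no patterns or no responses early, rather than at prediction time.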

In [4]:
intents

{'intents': [{'tag': 'greeting',
   'patterns': ['Hi',
    'Hey',
    'Is anyone there?',
    'Hi there',
    'Hello',
    'Hey there',
    'Howdy',
    'Hola',
    'Bonjour',
    'Konnichiwa',
    'Guten tag',
    'Ola'],
   'responses': ['Hello there. Tell me how are you feeling today?',
    'Hi there. What brings you here today?',
    'Hi there. How are you feeling today?',
    'Great to see you. How do you feel currently?',
    "Hello there. Glad to see you're back. What's going on in your world right now?"]},
  {'tag': 'morning',
   'patterns': ['Good morning'],
   'responses': ["Good morning. I hope you had a good night's sleep. How are you feeling today? "]},
  {'tag': 'afternoon',
   'patterns': ['Good afternoon'],
   'responses': ['Good afternoon. How is your day going?']},
  {'tag': 'evening',
   'patterns': ['Good evening'],
   'responses': ['Good evening. How has your day been?']},
  {'tag': 'night',
   'patterns': ['Good night'],
   'responses': ['Good night. Get some prop

Initializing the Lists for Intent Processing

In [5]:
words = []
classes = []
documents = []
ignoreLetters = ['?', '!', '.', ',']


Processing the Intents to Build the Dataset

In [6]:
for intent in intents['intents']:
    for pattern in intent['patterns']:
        wordList = nltk.word_tokenize(pattern)
        words.extend(wordList)
        documents.append((wordList, intent['tag']))
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

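# Lemmatize the words and drop the ignored punctuation.
# Note: tokens are not lowercased here, although the patterns are lowercased when the
# bags of words are built later, so capitalized entries in `words` never match a pattern.
words = [lemmatizer.lemmatize(word) for word in words if word not in ignoreLetters]

# Remove duplicates and sort the words
words = sorted(set(words))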
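
The order of the vocabulary printed in the next cell follows from how Python compares strings: the apostrophe sorts before uppercase letters, which sort before lowercase letters. A minimal sketch with a made-up token list (the tokens are assumptions chosen for illustration) shows the effect, along with a case-folded variant that the notebook does not use:

```python
# Toy token list mixing contractions, capitalized and lowercase words
tokens = ["Hi", "hello", "'m", "AI", "bye", "Hi", "'ll"]

# Same deduplicate-and-sort step as in the notebook
vocab = sorted(set(tokens))
print(vocab)

# A case-folded variant (an alternative, not what the notebook does)
vocab_lower = sorted({t.lower() for t in tokens})
print(vocab_lower)
```

Because the sort is case-sensitive, `Hi` and `hi` would be two separate vocabulary entries if both occurred in the patterns.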

In [7]:
words

["'ll",
 "'m",
 "'re",
 "'s",
 "'ve",
 'AI',
 'Adorable',
 'Advancements',
 'All',
 'Am',
 'Amusing',
 'Ancient',
 'Animal',
 'Application',
 'Applications',
 'Architectural',
 'Are',
 'Art',
 'Artistic',
 'Astronomy',
 'Astrophysics',
 'Au',
 'Autonomous',
 'Balancing',
 'Biohacking',
 'Biotech',
 'Blockchain',
 'Bonjour',
 'Book',
 'Brain',
 'Brain-teasers',
 'Brainteasers',
 'Budgeting',
 'Building',
 'Business',
 'Bye',
 'Can',
 'Career',
 'Celebrity',
 'Celestial',
 'Challenge',
 'Challenging',
 'Cheer',
 'Choosing',
 'Cinematic',
 'City',
 'Classic',
 'Climate',
 'Code',
 'Coding',
 'Cognitive',
 'College',
 'Community',
 'Consciousness',
 'Controversial',
 'Conundrums',
 'Cosmic',
 'Cosmological',
 'Could',
 'Creative',
 'Creativity',
 'Crypto',
 'Cryptocurrency',
 'Culinary',
 'Cultural',
 'Curious',
 'Current',
 'Cutting-edge',
 'Cybersecurity',
 'Debates',
 'Deep',
 'Define',
 'Depression',
 'Digital',
 'Discuss',
 'Do',
 'Dream',
 'E-learning',
 'Eastern',
 'Eco-system',
 'E

Saving the Preprocessed Data

In [8]:
# Sort the classes, then save the vocabulary and the class list
classes = sorted(set(classes))
with open('words.pkl', 'wb') as f:
    pickle.dump(words, f)
with open('classes.pkl', 'wb') as f:
    pickle.dump(classes, f)
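
The saved lists matter because position `i` of every bag of words refers to `words[i]`, so inference has to reload exactly the same list in the same order. The following round trip is a small sketch of that guarantee. It writes to a temporary directory, and the name `words_demo` is a placeholder rather than the notebook's data.

```python
import os
import pickle
import tempfile

words_demo = ["'m", "Hi", "bye", "hello"]  # placeholder vocabulary

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words_demo.pkl")
    with open(path, "wb") as f:
        pickle.dump(words_demo, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The list comes back with the same items in the same order
print(restored == words_demo)
```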

Initializing the Training Data

In [8]:
training = []
outputEmpty = [0] * len(classes)

Preparing the Training Data

In [9]:
for document in documents:
    wordPatterns = document[0]
    wordPatterns = [lemmatizer.lemmatize(word.lower()) for word in wordPatterns]

    # Build the bag of words: 1 if the vocabulary word occurs in the pattern, else 0
    bag = [1 if word in wordPatterns else 0 for word in words]

    # Build the expected output as a one-hot vector over the classes
    outputRow = list(outputEmpty)
    outputRow[classes.index(document[1])] = 1

    # Append the training row (features followed by the label)
    training.append(bag + outputRow)
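
To make the row layout concrete, here is a toy version of the same encoding with a four-word vocabulary and three classes. The names `vocab`, `labels`, and `encode` are assumptions introduced only for this sketch.

```python
vocab = ["bye", "good", "hello", "morning"]   # stands in for `words`
labels = ["goodbye", "greeting", "morning"]   # stands in for `classes`

def encode(tokens, tag):
    """Return one training row: bag-of-words features followed by a one-hot label."""
    bag = [1 if w in tokens else 0 for w in vocab]
    one_hot = [0] * len(labels)
    one_hot[labels.index(tag)] = 1
    return bag + one_hot

row = encode(["good", "morning"], "morning")
print(row)  # [0, 1, 0, 1, 0, 0, 1]
```

The first `len(vocab)` entries are the features and the remaining `len(labels)` entries are the target, which is the split applied later when the training inputs and targets are separated.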

Shuffling the Training Data

In [10]:
random.shuffle(training)
training = np.array(training)
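
Shuffling whole rows is safe because each row holds its features and its label side by side, so the pairing survives any reordering. A pure-Python sketch of that property follows. The rows and the rule linking features to labels are invented for the demonstration.

```python
import random

n_features = 2
# Each row: two feature values followed by a one-hot label of length 2.
# By construction, the label is [1, 0] exactly when the first feature is 1.
rows = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
original = [list(r) for r in rows]

random.seed(0)  # fixed seed only to make the demo repeatable
random.shuffle(rows)

# Split after shuffling, as the notebook does with array slicing
features = [r[:n_features] for r in rows]
targets = [r[n_features:] for r in rows]

aligned = all((x[0] == 1) == (y == [1, 0]) for x, y in zip(features, targets))
print(aligned)
```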

Defining the Chatbot Model with TensorFlow

In [11]:
trainX = training[:, :len(words)]
trainY = training[:, len(words):]

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128, input_shape=(len(trainX[0]),), activation = 'relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation = 'relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(len(trainY[0]), activation='softmax'))
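
The size of this network depends only on the vocabulary size and the number of classes, since the two hidden layers are fixed at 128 and 64 units and dropout layers add no weights. The sketch below counts the trainable parameters for a hypothetical vocabulary of 10 words and 3 classes. These sizes are assumptions, because the real values depend on `intents.json`.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def count_params(vocab_size, n_classes):
    """Parameter count for Dense(128) -> Dense(64) -> Dense(n_classes)."""
    return (
        dense_params(vocab_size, 128)
        + dense_params(128, 64)
        + dense_params(64, n_classes)
    )

total = count_params(vocab_size=10, n_classes=3)
print(total)  # 9859
```

With real data, `model.summary()` reports the same kind of total and is the simpler way to check it.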




Compiling the Model with the SGD Optimizer

In [12]:
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
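
For intuition about what `momentum=0.9` and `nesterov=True` do, the sketch below applies the update rule given in the Keras SGD documentation to a single scalar weight, minimizing the toy function f(w) = w². The function, the starting point, and the helper name `sgd_nesterov_step` are assumptions for illustration. The real optimizer applies the same rule to every weight tensor.

```python
def sgd_nesterov_step(w, velocity, grad, lr=0.01, momentum=0.9):
    """One SGD step with Nesterov momentum on a scalar weight."""
    velocity = momentum * velocity - lr * grad
    w = w + momentum * velocity - lr * grad
    return w, velocity

w, v = 1.0, 0.0
for _ in range(3):
    grad = 2.0 * w  # derivative of f(w) = w**2
    w, v = sgd_nesterov_step(w, v, grad)
    print(round(w, 6))
```

The velocity accumulates past gradients, so consecutive steps in the same direction grow larger than plain SGD steps at the same learning rate.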


Training and Saving the Model

In [13]:
model.fit(trainX, trainY, epochs=200, batch_size=5, verbose=1)
model.save('chatbot_model.h5')
print('Done')

Epoch 1/200
Epoch 2/200
...
Epoch 77/200
[output truncated]


Importing Modules and Loading the Model

In [10]:
import random
import pickle
import numpy as np
import nltk
from nltk.stem import WordNetLemmatizer
from tensorflow.keras.models import load_model

Loading the Data and the Model

In [11]:

with open('words.pkl', 'rb') as f:
    words = pickle.load(f)
with open('classes.pkl', 'rb') as f:
    classes = pickle.load(f)
model = load_model('chatbot_model.h5')




Preprocessing Functions

In [12]:

lemmatizer = WordNetLemmatizer()

def clean_up_sentence(sentence):
    sentence_words = nltk.word_tokenize(sentence)
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words

Bag-of-Words (BoW) Function

In [13]:

def bow(sentence, words, show_details=True):
    sentence_words = clean_up_sentence(sentence)
    bag = [0]*len(words)
    for s in sentence_words:
        for i, w in enumerate(words):
            if w == s:
                bag[i] = 1
                if show_details:
                    print(f"found in bag: {w}")
    return np.array(bag)
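
The nested loop in `bow` compares every token with every vocabulary word. An equivalent formulation, sketched here as an alternative and not as the notebook's method, looks each token up in a dictionary index instead. It assumes that the tokens are already cleaned and that the vocabulary has no duplicates. The name `bow_indexed` is hypothetical.

```python
def bow_indexed(tokens, vocab):
    """Bag of words using a word -> position index instead of a nested loop."""
    index = {w: i for i, w in enumerate(vocab)}
    bag = [0] * len(vocab)
    for t in tokens:
        i = index.get(t)
        if i is not None:  # out-of-vocabulary tokens are ignored
            bag[i] = 1
    return bag

vocab = ["bye", "hello", "there"]
bag_demo = bow_indexed(["hello", "there", "stranger"], vocab)
print(bag_demo)  # [0, 1, 1]
```

For a vocabulary of a few hundred words the difference is negligible. The sketch mainly makes explicit that unknown words contribute nothing to the input vector.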

Class Prediction Function

In [14]:
def predict_class(sentence, model, classes, words):
    # Build the bag of words (BoW) for the sentence
    p = bow(sentence, words, show_details=False)

    # Predict the class probabilities with the model
    res = model.predict(np.array([p]))[0]

    # Minimum probability a class needs in order to be kept
    ERROR_THRESHOLD = 0.25

    # Keep only the predictions above the threshold
    results = [[i, r] for i, r in enumerate(res) if r > ERROR_THRESHOLD]

    # Sort the results by decreasing probability
    results.sort(key=lambda x: x[1], reverse=True)

    # Build a list of dictionaries with the predicted classes and their probabilities
    return_list = [{"intent": classes[r[0]], "probability": str(r[1])} for r in results]

    return return_list
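
Since the softmax outputs sum to 1, at most three classes can be strictly above 0.25, and the list can also come back empty when no class clears the threshold. The sketch below reproduces the filter-and-sort step on hand-written probability vectors. The vectors, the labels, and the name `filter_predictions` are made up for the example.

```python
def filter_predictions(probs, labels, threshold=0.25):
    """Keep classes above the threshold, most probable first."""
    kept = [(i, p) for i, p in enumerate(probs) if p > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [{"intent": labels[i], "probability": str(p)} for i, p in kept]

labels = ["goodbye", "greeting", "morning", "thanks"]

confident = filter_predictions([0.05, 0.6, 0.3, 0.05], labels)
uncertain = filter_predictions([0.25, 0.25, 0.25, 0.25], labels)

print(confident)
print(uncertain)  # [] because no value is strictly greater than 0.25
```

Any caller that indexes the first element of the result should therefore handle the empty case.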

Chatbot Response Function

In [15]:
def chatbot_response(text):
    # Reuse the model, words, and classes loaded above rather than
    # reloading them from disk on every call
    ints = predict_class(text, model, classes, words)

    # No class cleared the threshold: return a placeholder fallback
    # instead of raising IndexError on ints[0]
    if not ints:
        return "Sorry, I did not understand that."

    # Take the most probable class
    tag = ints[0]['intent']

    # Look the class up in the intents and pick one of its responses at random
    for intent in intents['intents']:
        if intent['tag'] == tag:
            return random.choice(intent['responses'])

    return "Sorry, I did not understand that."
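
The lookup from a predicted tag to a reply can also be written with a dictionary built once from the intents. This is a sketch under assumptions: `sample_intents` is invented data, and `pick_response` and its fallback text are hypothetical names, not part of the notebook.

```python
import random

# Invented data in the same shape as intents.json
sample_intents = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi"], "responses": ["Hello there.", "Hi there."]},
        {"tag": "goodbye", "patterns": ["Bye"], "responses": ["See you soon."]},
    ]
}

# Built once, so each reply is a dictionary lookup rather than a scan of the list
responses_by_tag = {entry["tag"]: entry["responses"] for entry in sample_intents["intents"]}

def pick_response(tag, fallback="(no matching intent)"):
    """Return a random response for the tag, or the fallback if the tag is unknown."""
    options = responses_by_tag.get(tag)
    return random.choice(options) if options else fallback

print(pick_response("greeting"))
print(pick_response("unknown"))
```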



Conversation Loop with the Chatbot

In [16]:

while True:
    # Get the user's input
    user_input = input("You: ")

    # Stop if the user wants to leave the conversation
    if user_input.lower() == 'exit':
        break

    # Get the chatbot's response to the user's input
    response = chatbot_response(user_input)

    # Display the chatbot's response
    print("ChatBot:", response)



ChatBot: Hi there. How are you feeling today?
ChatBot: Somewhere in the universe
ChatBot: That's no problem. I can see why you'd be stressed out about that. I can suggest you some tips to alleviate this issue. Would you like to learn more about that?
ChatBot: I'll see you soon.


In [17]:
# Assume that you've already defined the functions and loaded the model and intents

def chat_with_itself():
    max_turns = 5  # Set the maximum number of turns for the conversation

    for _ in range(max_turns):
        # Simulate ChatBot 1 asking a question
        question = "What is your favorite color?"
        print(f"ChatBot 1: {question}")

        # Simulate ChatBot 2 responding
        response_bot2 = chatbot_response(question)
        print(f"ChatBot 2: {response_bot2}")

        # Simulate ChatBot 2 asking a question
        question_bot2 = "Do you prefer cats or dogs?"
        print(f"ChatBot 2: {question_bot2}")

        # Simulate ChatBot 1 responding
        response = chatbot_response(question_bot2)
        print(f"ChatBot 1: {response}")

        # You can continue the conversation or break the loop based on your needs

chat_with_itself()


ChatBot 1: What is your favorite color?
ChatBot 2: Call me BMZ
ChatBot 2: Do you prefer cats or dogs?
ChatBot 1: Somewhere in the universe
ChatBot 1: What is your favorite color?
ChatBot 2: I'm BMZ. I am a conversational agent designed to mimic a therapist. So how are you feeling today?
ChatBot 2: Do you prefer cats or dogs?
ChatBot 1: Somewhere in the universe
ChatBot 1: What is your favorite color?
ChatBot 2: Call me BMZ
ChatBot 2: Do you prefer cats or dogs?
ChatBot 1: Everywhere
ChatBot 1: What is your favorite color?
ChatBot 2: I'm BMZ. I am a conversational agent designed to mimic a therapist. So how are you feeling today?
ChatBot 2: Do you prefer cats or dogs?
ChatBot 1: Somewhere in the universe
ChatBot 1: What is your favorite color?
ChatBot 2: You can call me BMZ.
ChatBot 2: Do you prefer cats or dogs?
ChatBot 1: Duh I live in your computer


In [16]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
import random
import json
import re

# Load your intents data
with open('intents.json', 'r') as file:
    intents_data = json.load(file)

all_patterns = [pattern for intent in intents_data['intents'] for pattern in intent['patterns']]

# Load the pre-trained GPT-2 model
gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2")
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Explicitly set pad_token_id to eos_token_id
gpt2_model.config.pad_token_id = gpt2_tokenizer.eos_token_id

# Load the pre-trained chatbot model
model_path_chatbot = "chatbot_model.h5"
chatbot_model = load_model(model_path_chatbot)
tokenizer_chatbot = None  # Replace with the appropriate tokenizer if you have one

# Function to generate a question with GPT-2
def generate_simple_gpt2_question():
    random_pattern = random.choice(all_patterns)
    input_text = random_pattern  # The sampled pattern is used directly as the GPT-2 prompt

    # Encode the text
    input_ids = gpt2_tokenizer.encode(input_text, return_tensors="pt", max_length=512)

    # Set attention mask
    attention_mask = torch.ones(input_ids.shape, dtype=torch.long)

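    # Generate the sequence with beam search. Without do_sample=True, the sampling
    # parameters (top_k, top_p, temperature) are ignored by generate().
    output = gpt2_model.generate(input_ids, attention_mask=attention_mask, max_length=150, num_beams=5,
                                 no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)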

    # Decode the sequence
    generated_sequence = gpt2_tokenizer.decode(output[0], skip_special_tokens=True)

    # Extract the completed question after a question mark, period, or within quotes
    match = re.search(r"(.*[.!?])|(\".*\")", generated_sequence)
    if match:
        generated_question = match.group()
    else:
        generated_question = generated_sequence

    return generated_question

# Function for simulating a conversation between the two chatbots
def chat_between_bots():
    print("MAHMOUD: Hi! I'm MAHMOUD the chatbot.")
    print("BMZ: Hello, I am BMZ, a chatbot model.")
    print("Let's start our conversation.")

    for _ in range(3):  # Number of questions in the discussion (you can adjust this for demo purposes)
        # Generate a question with GPT-2
        gpt2_question = generate_simple_gpt2_question()
        print(f"MAHMOUD: {gpt2_question}\n")

        # Get the response from the chatbot model
        chatbot_response_text = chatbot_response(gpt2_question)
        print(f"BMZ: {chatbot_response_text}\n")

# Launch the conversation between the two chatbots
chat_between_bots()


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


MAHMOUD: Hi! I'm MAHMOUD the chatbot.
BMZ: Hello, I am BMZ, a chatbot model.
Let's start our conversation.
MAHMOUD: My brother died a few years ago, and I'm still trying to figure out what happened to him. I don't know what he did to himself, but I do know that he was a good person.



BMZ: The most important thing is to talk to someone you trust. This might be a friend, colleague, family member, or GP. In addition to talking to someone, it may be useful to find out more information about what you are experiencing. These things may help to get some perspective on what you are experiencing, and be the start of getting help.

MAHMOUD: Can you help?



BMZ: Yes, sure. How can I help you?

MAHMOUD: Tell me about yourself.

BMZ: Call me BMZ
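
The extraction step in `generate_simple_gpt2_question` relies on a greedy regular expression, so it keeps everything up to the last `.`, `!`, or `?` on the first matching line, not just the first sentence. That is consistent with the multi-sentence first prompt in the transcript above. A small sketch with hand-written strings (the inputs are assumptions, and `extract` is a hypothetical wrapper around the same pattern) shows the three cases:

```python
import re

# Same pattern as in generate_simple_gpt2_question
PATTERN = r"(.*[.!?])|(\".*\")"

def extract(text):
    """Return the matched span, or the whole text when nothing matches."""
    match = re.search(PATTERN, text)
    return match.group() if match else text

a = extract("Can you help? I need it. Thanks")
b = extract("no punctuation here")
c = extract('say "hi there" now')

print(a)  # Can you help? I need it.
print(b)  # no punctuation here
print(c)  # "hi there"
```

A non-greedy pattern such as `.*?[.!?]` would stop at the first sentence instead, if shorter prompts are preferred.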

