<a href="https://colab.research.google.com/github/Sparadrap1101/Blockchain_Chatbot/blob/main/Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Blockchain Chatbot

*Machine Learning Project - Alexis Cerio*

#### Welcome to my Deep Learning Chatbot project!
This chatbot aims to answer basic questions about Blockchain using Deep Learning.

A quick setup is required to use this chatbot in Colab. Follow these instructions:

- First run the import code section below.

In [5]:
# We import and install all required libraries here.
import numpy
!pip install tflearn
import tflearn
import nltk
nltk.download('punkt')
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()
import tensorflow
import random
import json
import pickle

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


- Then go to [the initial repo](https://github.com/Sparadrap1101/Blockchain_Chatbot) and download `files.zip`.

- After unzipping `files.zip` on your computer, upload all 6 files to the Files panel on the left side of Colab so that the notebook can use my trained model.

- Then, run the next section:

In [7]:
# We start by opening our 'dataset.json' file which contains the chatbot dataset.
with open("dataset.json") as file:
    data = json.load(file)

# Then we check whether the training arrays have already been saved, and load them if so.
try:
    with open("data.pickle", "rb") as f:
        words, labels, training, output = pickle.load(f)
    # "Oui" (French for "yes") is printed only once the cached arrays have actually been loaded.
    print("Oui")

# If the cached arrays are missing or unreadable, we create new arrays and fill them from the dataset.
except Exception:
    print("Non")  # "Non" is French for "no": nothing cached yet.
    words = []
    labels = []
    docs_x = []
    docs_y = []

    # We go through the dataset, take each pattern (i.e. question) and tokenize it into words. We then store the tokens together with the tag of their intent.
    for intent in data["intents"]:
        for pattern in intent["patterns"]:
            tokenizedWords = nltk.word_tokenize(pattern)
            words.extend(tokenizedWords)
            docs_x.append(tokenizedWords)
            docs_y.append(intent["tag"])

        # Then we store our tags in the labels array.
        if intent["tag"] not in labels:
            labels.append(intent["tag"])

    # We lowercase and stem the words, drop question marks, remove duplicates and sort the result. We also sort our labels.
    words = [stemmer.stem(word.lower()) for word in words if word != "?"]
    words = sorted(list(set(words)))
    labels = sorted(labels)


    # In this section we check which words of the global word list appear in each pattern, encode that as 0/1 values and store the results in arrays.
    training = []
    output = []

    # We create a template output row of zeros with one entry per label.
    emptyOutput = [0 for _ in range(len(labels))]

    for index, doc in enumerate(docs_x):
        bag = []

        stemmedWords = [stemmer.stem(word.lower()) for word in doc]

        # For each word in the global list, we append 1 to bag[] if the word appears in the pattern, and 0 otherwise.
        for word in words:
            if word in stemmedWords:
                bag.append(1)
            else:
                bag.append(0)

        outputRow = emptyOutput[:]
        outputRow[labels.index(docs_y[index])] = 1

        # We then append the bag in the training array and the outputRow in the output array.
        training.append(bag)
        output.append(outputRow)

    # We convert these lists into numpy arrays so that the model can use them later.
    training = numpy.array(training)
    output = numpy.array(output)

    # When finished, we store our arrays in a pickle file in order to reuse them later without repeating all of this.
    with open("data.pickle", "wb") as f:
        pickle.dump((words, labels, training, output), f)

# Finally, we set up our neural network here: two fully connected hidden layers of 8 neurons each, followed by a softmax output layer with one neuron per label.
tensorflow.compat.v1.reset_default_graph()
net = tflearn.input_data(shape=[None, len(training[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(output[0]), activation="softmax")
net = tflearn.regression(net)

model = tflearn.DNN(net)

# If our model has already been trained, we load it.
try:
  model.load("model.tflearn")
  print("Oui")  # Printed only after the saved model has loaded ("Oui" = "yes").

# If it hasn't, we start training it (this can take a while).
# Then we save the model so that it can be reused later without retraining.
except Exception:
  print("Non")  # "Non" = "no": no saved model could be loaded.
  model.fit(training, output, n_epoch=1000, batch_size=8, show_metric=True)
  model.save("model.tflearn")

Oui
Oui
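
To make the encoding performed in the section above more concrete, here is a minimal, self-contained sketch of how a pattern becomes a bag-of-words vector and how its tag becomes a one-hot row. It is not the notebook's exact pipeline: it assumes a plain lowercase `split()` in place of `nltk.word_tokenize` and `LancasterStemmer`, and the toy patterns, tags and helper names (`toy_patterns`, `simple_tokens`, `encode_bag`, `encode_label`) are hypothetical.

```python
# Toy stand-in for dataset.json: (pattern, tag) pairs made up for illustration.
toy_patterns = [
    ("what is bitcoin", "bitcoin"),
    ("what is ethereum", "ethereum"),
    ("hello there", "greeting"),
]

def simple_tokens(sentence):
    # Assumption: lowercase + whitespace split replaces tokenizing and stemming.
    return sentence.lower().split()

# Global vocabulary and label list, both deduplicated and sorted.
vocabulary = sorted({tok for pattern, _ in toy_patterns for tok in simple_tokens(pattern)})
labels = sorted({tag for _, tag in toy_patterns})

def encode_bag(sentence, vocabulary):
    # One entry per vocabulary word: 1 if the word occurs in the sentence, else 0.
    tokens = set(simple_tokens(sentence))
    return [1 if word in tokens else 0 for word in vocabulary]

def encode_label(tag, labels):
    # One entry per label: 1 at the position of the tag, 0 elsewhere.
    row = [0] * len(labels)
    row[labels.index(tag)] = 1
    return row

print(vocabulary)                                 # ['bitcoin', 'ethereum', 'hello', 'is', 'there', 'what']
print(encode_bag("what is bitcoin", vocabulary))  # [1, 0, 0, 1, 0, 1]
print(encode_label("bitcoin", labels))            # [1, 0, 0]
```

Words that are not in the vocabulary are simply ignored, so a question made only of unknown words is encoded as an all-zero vector.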


- Finally, run the last section and enjoy chatting with the bot about Blockchain!

In [8]:
# Here is our wordsBag() function. It checks which words of the user's question match the words in our array of words.
def wordsBag(userSentence, words):
    bag = [0 for _ in range(len(words))]

    # We tokenize and stem them.
    userWords = nltk.word_tokenize(userSentence)
    userWords = [stemmer.stem(word.lower()) for word in userWords]

    # If a word in the sentence matches a word of our dataset, we set the bag[] array to 1 at the corresponding index.
    for userWord in userWords:
        for index, word in enumerate(words):
            if word == userWord:
                bag[index] = 1
    
    # We then return this array so that the model can make its prediction.
    return numpy.array(bag)

# Finally, the chat function from where the user will interact with the bot.
def chat():
    print("Start talking with the bot (type exit to stop)!\n")

    # We loop so that the discussion continues until the user types 'exit'.
    while True:
        inp = input("You: ")
        if inp.lower() == "exit":
            break

        # From our wordsBag() function, our list of words and the input of the user, we make a prediction 
        # with our model in order to find the best answer.
        results = model.predict([wordsBag(inp, words)])[0]

        # We keep only the most probable label and use its index to get the tag of the answer we want.
        indexResults = numpy.argmax(results)
        tag = labels[indexResults]

        for intent in data["intents"]:
            if intent['tag'] == tag:
                responses = intent['responses']

        # We print one of the 2 or 3 possible responses at random, so that the user does not always get the same answer to the same question.
        print("Bot:", random.choice(responses), "\n")

chat()

Start talking with the bot (type exit to stop)!

You: Hello
Bot: Bonjour, comment allez-vous ? 

You: what's your name ?
Bot: I'm Bob, a language model for Blockchain questions. 

You: what are you doing ?
Bot: I.m Bob, your bot assistant 

You: explain me bitcoin
Bot: Bitcoin is a cryptocurrency that operates on a decentralized network, allowing for secure and transparent transfer of funds without the need for intermediaries. It is the first and largest cryptocurrency by market capitalization. 

You: explain me ethereum
Bot: Ethereum is a blockchain platform that allows developers to build decentralized applications and smart contracts. It has its own cryptocurrency called Ether (ETH) and operates on a decentralized network. 

You: I don't understand impermanent loss
Bot: AMMs provide a more streamlined and user-friendly experience but may have lower liquidity compared to order book DEXs. Order book DEXs have higher liquidity and better price discovery, but may have longer wait times 

KeyboardInterrupt: ignored
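
In the transcript above, the question about impermanent loss receives an answer about AMMs and order book DEXs, because `chat()` always returns a response for the highest-scoring tag, however low that score is. The sketch below isolates that selection step and adds an optional confidence threshold. The threshold, the fallback message, the function name `pick_response` and the toy labels and responses are all assumptions made for illustration; none of them is part of the notebook. The pure-Python `max` call plays the role of `numpy.argmax`.

```python
import random

# Hypothetical labels and responses standing in for the content of dataset.json.
toy_labels = ["bitcoin", "ethereum", "greeting"]
toy_responses = {
    "bitcoin": ["Bitcoin is a cryptocurrency.", "Bitcoin runs on a decentralized network."],
    "ethereum": ["Ethereum is a blockchain platform.", "Ethereum supports smart contracts."],
    "greeting": ["Hello!", "Hi there!"],
}

def pick_response(scores, labels, responses, threshold=0.7):
    # Index of the highest score (same result as numpy.argmax on a 1-D array).
    best = max(range(len(scores)), key=lambda i: scores[i])

    # Assumed extension: refuse to answer when the model is not confident enough.
    if scores[best] < threshold:
        return None, "Sorry, I did not understand. Could you rephrase?"

    # Otherwise map the index to its tag and choose one of its responses at random.
    tag = labels[best]
    return tag, random.choice(responses[tag])

# A confident prediction returns the matching tag and one of its responses.
print(pick_response([0.05, 0.90, 0.05], toy_labels, toy_responses))

# An uncertain prediction falls back to the "did not understand" message.
print(pick_response([0.40, 0.35, 0.25], toy_labels, toy_responses))
```

The value 0.7 is arbitrary here; a suitable threshold would have to be tuned against real questions.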