## Conversational AI

The goal of this project is to build a simple conversational AI bot using the simpletransformers library and the GPT language model.

The model is trained on a custom set of conversation prompts stored in the minimal_training.json file. The data in this JSON file follows the Facebook Persona-Chat format.

The dataset is a list of entries. Each Persona-Chat entry is a dict with two keys, personality and utterances:

- personality: a list of strings describing the personality of the agent.
- utterances: a list of dictionaries, each with two keys whose values are lists of strings:
  - candidates: [next_utterance_candidate_1, ..., next_utterance_candidate_19]. The last candidate is the ground-truth response observed in the conversational data.
  - history: [dialog_turn_0, ..., dialog_turn_N], which holds an odd number of turns, since the other user starts every conversation and the agent replies next.
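
As a minimal sketch of the structure described above, the snippet below builds a one-entry dataset and checks its shape. The dialogue strings and the `is_str_list` and `check_entry` helpers are made up for illustration; they do not come from minimal_training.json or from simpletransformers.

```python
import json


def is_str_list(value):
    """True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def check_entry(entry):
    """Return a list of structural problems found in one Persona-Chat style entry."""
    problems = []
    if not is_str_list(entry.get("personality")):
        problems.append("personality must be a list of strings")
    utterances = entry.get("utterances")
    if not isinstance(utterances, list):
        return problems + ["utterances must be a list of dicts"]
    for i, utterance in enumerate(utterances):
        for key in ("candidates", "history"):
            if not isinstance(utterance, dict) or not is_str_list(utterance.get(key)):
                problems.append(f"utterances[{i}].{key} must be a list of strings")
    return problems


# Illustrative entry: one persona, one utterance step.
entry = {
    "personality": ["i like to read .", "i have a dog ."],
    "utterances": [
        {
            # The last candidate is the ground-truth reply.
            "candidates": ["i am a doctor .", "i am fine , just walking my dog ."],
            # Turns seen so far; the other user speaks first.
            "history": ["hello , how are you ?"],
        }
    ],
}

dataset = [entry]  # the dataset is a list of such entries
print(check_entry(entry))

# The same structure serializes directly to the JSON layout used for training.
serialized = json.dumps(dataset, indent=2)
```

Running a check like this over every entry before training can catch a missing key or a plain string where a list is expected.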

#### Import the required libraries

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


/kaggle/input/gpt-personachat-cache/vocab.json
/kaggle/input/gpt-personachat-cache/model_training_args.bin
/kaggle/input/gpt-personachat-cache/._added_tokens.json
/kaggle/input/gpt-personachat-cache/config.json
/kaggle/input/gpt-personachat-cache/merges.txt
/kaggle/input/gpt-personachat-cache/added_tokens.json
/kaggle/input/gpt-personachat-cache/pytorch_model.bin
/kaggle/input/configjson/config.json
/kaggle/input/originaljson/original.json
/kaggle/input/minimal-training/minimal_training.json


#### Install the simpletransformers library

In [2]:
!pip install simpletransformers

Collecting simpletransformers
  Downloading simpletransformers-0.47.0-py3-none-any.whl (208 kB)
Collecting seqeval
  Downloading seqeval-0.0.12.tar.gz (21 kB)
Collecting streamlit
  Downloading streamlit-0.64.0-py2.py3-none-any.whl (7.1 MB)
Collecting transformers>=3.0.2
  Downloading transformers-3.0.2-py3-none-any.whl (769 kB)
Collecting tqdm>=4.47.0
  Downloading tqdm-4.48.2-py2.py3-none-any.whl (68 kB)
Collecting cachetools>=4.0
  Downloading cachetools-4.1.1-py3-none-any.whl (10 kB)
Collecting pillow>=6.2.0
  Downloading Pillow-7.2.0-cp37-cp37m-manylinux1_x86_64.whl (2.2 MB)
Collecting pydeck>=0.1.dev5
  Downloading py

#### Import the conversational AI model from the simpletransformers library

In [3]:
from simpletransformers.conv_ai import ConvAIModel



#### Specify the required training arguments

In [4]:
train_args = {
    "num_train_epochs": 4,
    "save_model_every_epoch": False,
}

#### (Optional) If a GPU is present, run the following to install the required libraries and set up the environment to use CUDA

In [None]:
%%writefile setup.sh

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./

In [None]:
!sh setup.sh

#### Initialize the model with the necessary arguments

In [5]:
model = ConvAIModel("gpt", "../input/gpt-personachat-cache", use_cuda=False, args=train_args)
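
The call above hard-codes use_cuda=False. One way to choose the value automatically is to ask PyTorch whether a CUDA device is visible. This is only a sketch: it assumes PyTorch is importable (simpletransformers depends on it) and falls back to CPU otherwise. The `detect_cuda` helper name is hypothetical.

```python
def detect_cuda():
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # No PyTorch available: stay on CPU.
        return False
    return torch.cuda.is_available()


use_cuda = detect_cuda()
print(f"use_cuda={use_cuda}")

# The flag could then replace the hard-coded value in the constructor, e.g.
# model = ConvAIModel("gpt", "../input/gpt-personachat-cache", use_cuda=use_cuda, args=train_args)
```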

#### Train the model with the minimal_training.json file containing custom input data

In [6]:
model.train_model("../input/minimal-training/minimal_training.json")

Running Epoch 0 of 4
Running loss: 3.854018

Running Epoch 1 of 4
Running loss: 4.161917

Running Epoch 2 of 4
Running loss: 2.624475

Running Epoch 3 of 4
Running loss: 2.019119



#### Interact with the model

In [None]:
model.interact()
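
model.interact() opens an interactive prompt, which is awkward when the notebook runs non-interactively. ConvAIModel also provides an interact_single method, assumed here to take (message, history, personality=None) and to return the reply together with the updated history; treat that signature as an assumption and check it against the installed simpletransformers version. The `run_scripted_chat` helper and the example strings are hypothetical.

```python
def run_scripted_chat(model, messages, personality=None):
    """Send each message in turn and collect the model's replies.

    Assumes model.interact_single(message, history, personality=...) returns
    a (reply, updated_history) pair; verify this against your installed version.
    """
    history = []
    replies = []
    for message in messages:
        # Feed the returned history back in so the model sees the whole dialogue.
        reply, history = model.interact_single(message, history, personality=personality)
        replies.append(reply)
    return replies


# Example usage with the model trained above (strings are illustrative):
# replies = run_scripted_chat(model, ["hello , how are you ?"], ["i like to read ."])
# print(replies)
```

Passing the returned history into the next call lets each reply be conditioned on the earlier turns of the scripted conversation.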