
# Star Wars AI Chatbot Using a RandomForest Model


H071201030 Dhea Gita

2022

### Data Description

Chatbots are useful to businesses and their customers alike. Many people would rather type a question into a *chatbox* than phone a service center. For this final exam (UAS) assignment, I build a chatbot from scratch that can understand what the user is saying and give an appropriate response. A chatbot is simply software that interacts and communicates with people in a human-like way.

In this project, the [dataset](https://www.kaggle.com/datasets/aslanahmedov/star-wars-chat-bot) used to build the AI chatbot focuses on the Star Wars cinematic universe, and the model is trained so that it can answer a few basic questions about Star Wars.

### Data Fields

The data file is in `json` format, with the following contents:
* **intents** - list of chat data
* **tag** - chat category
* **patterns** - chat patterns the user might send
* **responses** - set of answers, one of which is picked at random to reply to the user
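
As a rough illustration of the layout described above, the sketch below builds a tiny intents document and flattens it into (pattern, tag) pairs. The sample tags, patterns, and responses are made up for illustration and are not copied from `starwarsintents.json`.

```python
import json

# Hypothetical miniature of the intents file; the values are illustrative only.
sample_json = """
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi", "Hello"],
      "responses": ["Hello there!"]
    },
    {
      "tag": "goodbye",
      "patterns": ["Bye"],
      "responses": ["See you later.", "Goodbye!"]
    }
  ]
}
"""

data = json.loads(sample_json)

# Each pattern becomes one training example labelled with its intent's tag.
pairs = [(pattern, intent["tag"])
         for intent in data["intents"]
         for pattern in intent["patterns"]]

print(pairs)
```

The responses are not part of the training data; they are only looked up after a tag has been predicted.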

### Objectives

* Part 1: Dataset Preparation
* Part 2: Data Preprocessing
* Part 3: Splitting the Dataset into `train` and `test` Data
* Part 4: Building and Testing the Model
* Part 5: Applying the Model in a Chatbot

### Requirements

* Python libraries: `pandas`, `numpy`, `scikit-learn`, `json`
* Data File: `starwarsintents.json`

# Part 1: Dataset Preparation

Download the dataset using the Kaggle API:

In [None]:
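import os

# The Kaggle CLI reads its credentials from these environment variables.
# Replace the placeholder with your own key; never publish a real key.
os.environ['KAGGLE_USERNAME'] = 'dheagita'
os.environ['KAGGLE_KEY'] = '<your-kaggle-api-key>'

# Download the dataset archive with the Kaggle CLI (notebook shell command).
!kaggle datasets download -d aslanahmedov/star-wars-chat-bot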

star-wars-chat-bot.zip: Skipping, found more recently modified local copy (use --force to force download)
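
Hard-coding an API key in a notebook exposes it to anyone who can read the file. One alternative, sketched below, is to set `KAGGLE_USERNAME` and `KAGGLE_KEY` outside the notebook and only check that they are present before downloading. The helper name `missing_kaggle_vars` is hypothetical and not part of the Kaggle package.

```python
import os

REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")

def missing_kaggle_vars(env=os.environ):
    """Return the names of required Kaggle variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_kaggle_vars()
if missing:
    print("Set these environment variables before downloading:", ", ".join(missing))
else:
    print("Kaggle credentials found in the environment.")
```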


Extract the `.zip` file:

In [None]:
from zipfile import ZipFile

# Open the downloaded archive and extract all of its members
# into the current working directory.
with ZipFile("star-wars-chat-bot.zip", 'r') as zObject:
    zObject.extractall(path="./")
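
Before extracting, it can help to check which files an archive contains. The sketch below uses an in-memory archive with a made-up member name so that it runs without the Kaggle download; with the real file, the same `namelist()` call would be made on `star-wars-chat-bot.zip`.

```python
import io
from zipfile import ZipFile

# Build a small in-memory archive so the example runs without the Kaggle download.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("example.json", '{"intents": []}')  # hypothetical member

# Re-open the archive for reading and list its members before extracting.
buffer.seek(0)
with ZipFile(buffer, "r") as archive:
    members = archive.namelist()

print(members)
```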

A class that converts the `json` data into a `DataFrame`:

In [1]:
import json
import pandas as pd
from random import choice

class JSONParser:
    def __init__(self):
        self.text = []        # every pattern, in file order
        self.intents = []     # tag of the intent each pattern belongs to
        self.responses = {}   # tag -> list of candidate responses

    def parse(self, json_path):
        with open(json_path) as data_file:
            self.data = json.load(data_file)

        for intent in self.data['intents']:
            # One training row per pattern, labelled with the intent's tag.
            for pattern in intent['patterns']:
                self.text.append(pattern)
                self.intents.append(intent['tag'])
            # Collect the responses of each tag for later random selection.
            for resp in intent['responses']:
                self.responses.setdefault(intent['tag'], []).append(resp)

        self.df = pd.DataFrame({'text_input': self.text,
                                'intents': self.intents})

        print(
            f"[INFO] Data JSON converted to DataFrame with shape : {self.df.shape}")

    def get_dataframe(self):
        return self.df

    def get_response(self, intent):
        # Pick one of the stored responses for this tag at random.
        return choice(self.responses[intent])

Load the JSON data into a DataFrame:

In [2]:
path = "./starwarsintents.json"
jp = JSONParser()
jp.parse(path)
df = jp.get_dataframe()

[INFO] Data JSON converted to DataFrame with shape : (97, 2)


Display `df`:

In [3]:
df

| | text_input | intents |
|---|---|---|
| 0 | Hi | greeting |
| 1 | Hey | greeting |
| 2 | How are you | greeting |
| 3 | Is anyone there? | greeting |
| 4 | Hello | greeting |
| ... | ... | ... |
| 92 | Who is Mr. ASLAN | myself |
| 93 | Mr. ASLAN profile | myself |
| 94 | Mr. ASLAN details. | myself |
| 95 | Tell me a story? | stories |


Display the data info:

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   text_input  97 non-null     object
 1   intents     97 non-null     object
dtypes: object(2)
memory usage: 1.6+ KB


# Part 2: Data Preprocessing

* convert the input to *lower case*
* remove punctuation from the input

In [5]:
import string

def preprocess(chat):
    # Lowercase the message.
    chat = chat.lower()
    # Drop every ASCII punctuation character.
    tandabaca = tuple(string.punctuation)
    chat = ''.join(ch for ch in chat if ch not in tandabaca)
    return chat
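
The same two steps can also be written with `str.translate`, which removes the punctuation in a single pass. This is only an alternative sketch; the helper name `normalize` is hypothetical and is not used elsewhere in this notebook.

```python
import string

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text):
    """Lowercase the text and strip ASCII punctuation (hypothetical helper)."""
    return text.lower().translate(PUNCT_TABLE)

for raw in ["Is anyone there?", "Mr. ASLAN details."]:
    print(repr(raw), "->", repr(normalize(raw)))
```

Note that only ASCII punctuation is removed; characters such as curly quotes would pass through unchanged in both versions.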

Add a new column, `text_input_prep`:

In [6]:
import string

df['text_input_prep'] = df.text_input.apply(preprocess)

Display the DataFrame `df`:

In [7]:
df

| | text_input | intents | text_input_prep |
|---|---|---|---|
| 0 | Hi | greeting | hi |
| 1 | Hey | greeting | hey |
| 2 | How are you | greeting | how are you |
| 3 | Is anyone there? | greeting | is anyone there |
| 4 | Hello | greeting | hello |
| ... | ... | ... | ... |
| 92 | Who is Mr. ASLAN | myself | who is mr aslan |
| 93 | Mr. ASLAN profile | myself | mr aslan profile |
| 94 | Mr. ASLAN details. | myself | mr aslan details |
| 95 | Tell me a story? | stories | tell me a story |


Display the data info again:

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   text_input       97 non-null     object
 1   intents          97 non-null     object
 2   text_input_prep  97 non-null     object
dtypes: object(3)
memory usage: 2.4+ KB


# Part 3: Splitting the Dataset into `train` and `test` Data

Select the feature (X) and the target (y):

In [9]:
X = df.text_input_prep
y = df.intents

Split the dataset 80%:20% (train:test):

In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Result of the split:

In [11]:
print("Number of X_train samples:", len(X_train))
print("Number of X_test samples:", len(X_test))

Number of X_train samples: 77
Number of X_test samples: 20
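
The counts are not exactly 80% and 20% of 97 because rows cannot be split. To my understanding, scikit-learn rounds a fractional `test_size` up to a whole number of rows and gives the remainder to the training set; the arithmetic below reproduces the numbers under that assumption.

```python
from math import ceil

n_samples = 97    # rows in the DataFrame
test_size = 0.2   # fraction passed to train_test_split

# A fractional test size is rounded up to a whole number of rows;
# the training set receives whatever is left.
n_test = ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```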


# Part 4: Building and Testing the Model

* use the `RandomForestClassifier` model
* `CountVectorizer` converts each text string into a numeric vector of token counts
* a `pipeline` chains `CountVectorizer` and the model so that raw text can be passed straight in
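
To make the vectorization step concrete, here is a minimal pure-Python sketch of the bag-of-words idea. It imitates what `CountVectorizer` does with its default settings (lowercasing, tokens of at least two word characters) but it is not the library's implementation, and the function names are my own.

```python
import re
from collections import Counter

# Default CountVectorizer tokens are runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")

def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())

def bag_of_words(texts):
    """Return (vocabulary, count_rows) for a list of texts."""
    vocabulary = sorted({token for text in texts for token in tokenize(text)})
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        rows.append([counts[token] for token in vocabulary])
    return vocabulary, rows

vocab, rows = bag_of_words(["tell me a story", "how are you", "are you alive"])
print(vocab)
for row in rows:
    print(row)
```

Each text becomes one row of counts over the shared vocabulary. The single-letter word "a" is dropped by the token pattern, so very short inputs may end up with few or no features.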

In [12]:
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

pipeline = make_pipeline(CountVectorizer(), RandomForestClassifier())
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

Compare the actual `y_test` labels with the predictions:

In [13]:
print("Actual\t\t:", list(y_test[:10]))
print("Predicted\t:", list(predictions[:10]))

Actual		: ['mission', 'Menu', 'myself', 'goodbye', 'funny', 'about me', 'jedi', 'Menu', 'goodbye', 'greeting']
Predicted	: ['mission', 'tasks', 'myself', 'thanks', 'alive', 'about me', 'jedi', 'thanks', 'thanks', 'thanks']


Model accuracy:

In [14]:
accuracy = pipeline.score(X_test, y_test)
print("Accuracy:", accuracy)

Accuracy: 0.55


# Part 5: Applying the Model in a Chatbot

In [15]:
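import numpy as np

def bot_response(chat, pipeline, jp):
    # Apply the same preprocessing that was used on the training data.
    chat = preprocess(chat)
    # Class probabilities for the single input message.
    res = pipeline.predict_proba([chat])
    max_prob = max(res[0])
    # Fall back to a default reply when the model is not confident enough.
    if max_prob < 0.2:
        return "Sorry, I don't understand what you're saying :(", None
    else:
        max_id = np.argmax(res[0])
        pred_tag = pipeline.classes_[max_id]
        return jp.get_response(pred_tag), pred_tag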
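
The decision rule inside `bot_response` can be looked at on its own: take the most probable class and answer only if its probability reaches the threshold. The sketch below isolates that rule without the model; the function name `pick_intent` and the three-label class list are hypothetical.

```python
def pick_intent(probabilities, classes, threshold=0.2):
    """Return the most probable class, or None if it falls below the threshold."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_index] < threshold:
        return None
    return classes[best_index]

classes = ["goodbye", "greeting", "thanks"]  # hypothetical label set

print(pick_intent([0.10, 0.75, 0.15], classes))
print(pick_intent([0.5, 0.3, 0.2], classes, threshold=0.6))
```

Because the probabilities sum to 1, the highest one is never below 1 divided by the number of classes, so a threshold of 0.2 can only trigger the fallback when there are more than five classes.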

In [16]:
import numpy as np

print("[INFO] You are already connected to the Bot")
while True:
    chat = input("You >> ")
    res, tag = bot_response(chat,pipeline,jp)
    print(f"Bot >> {res}")
    if tag == "goodbye":
        break

[INFO] You are already connected to the Bot
You >> Hey
Bot >> Yes, I am here.
You >> Hi
Bot >> Thans does not 
You >> Yo
Bot >> Ooooo Hello, looking for someone or something?
You >> What you can do?
Bot >> I can do whatever you asks me to do
You >> wht u cn do?
Bot >> Glad to help!
You >> no
Bot >> Any time!
You >> wht u can do?
Bot >> Right now i'm in developing stage as soon i'm developed, I can do everything
You >> Are you alive?
Bot >> No, i don't think so I need to do all this
You >> do u alive?
Bot >> No, i don't think so I need to do all this
You >> can u run?
Bot >> I'm in doubt about that
You >> can u turu?
Bot >> Right now i'm in developing stage as soon i'm developed, I can do everything
You >> Do you serve drinks?
Bot >> Ok our best optins: Fuzzy Tauntaun, Bloody Rancor, Jedi Mind Trick, T-16 Skyhopper, Yub Nub, Jet Juice, Hyperdrive, Rancor Beer.
You >> Fuzzy Tauntaun
Bot >> Happy to help!
You >> help
Bot >> You are at the address.
You >> jedi
Bot >> Luke Skywalker, Yoda, 