<a href="https://colab.research.google.com/github/Reynxzz/zyo-virtual-tour-guide/blob/main/Zyo_AI_Virtual_Tour_Guide.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Zyo: Your Personalized Tour Guide to Explore Indonesia**

This project was created as a submission to the [AI Innovation Challenge Compfest 2022](https://www.compfest.id/).

**Zyo** is an AI-based application that addresses tourism problems in Indonesia. As COVID-19 case numbers in Indonesia fall, Indonesian tourism has begun to recover. According to a report from Statistics Indonesia (Badan Pusat Statistik, BPS), **foreign tourist** arrivals in Indonesia reached **111.06 thousand** in April 2022. That is a **sharp rise of almost 500% year on year (yoy)** compared with April 2021. Compared with the previous month, foreign tourist arrivals in April 2022 **rose by 172.27% month to month (m-t-m)**. The figure is predicted to keep rising in the following months (quoted from [katadata.co.id](https://databoks.katadata.co.id/datapublish/2022/06/02/kunjungan-wisman-ke-ri-naik-500-yoy-terbanyak-dari-asia-tenggara#:~:text=Berdasarkan%20laporan%20Badan%20Pusat%20Statistik,jumlah%20kunjungan%20pada%20April%202021.)).

Tourism contributes substantially to Indonesia's economy. The rising number of foreign tourists visiting Indonesia opens a wide opportunity to further promote the country's tourism and cultural diversity. However, Indonesia's tourism sector still has a number of problems to solve.

According to [Nugroho (2020)](https://ejournal.bsi.ac.id/ejurnal/index.php/jp), some of the problems currently facing Indonesia's tourism sector are:

1.   Human resources whose quality is still insufficient
2.   Communication and publicity that are still lacking
3.   Inadequate tourism infrastructure in some regions

**Zyo** is designed to address these problems. Zyo is an AI-based application that acts as a tour guide, accompanying foreign tourists virtually on their trips. With Zyo, foreign tourists can get to know Indonesia's tourist attractions and cultural diversity more closely. Zyo recommends interesting places, guides users to their destination, and shares knowledge about each place and the culture around it.

# Project Scope



*   **Target Users:** Foreign tourists (18-30 years old)
*   **Application Coverage:** due to limited time and resources, the application only provides guidance on a selection of interesting tourist attractions on the island of Bali.
*   **Language:** English



# Data Collection

Because the application's target users are foreign tourists, the predominant language used is **English**. To observe how foreign tourists behave when traveling in Indonesia, from preparation through arrival at their destination (including phrases they often say), we explored several *travel vlogs* made by tourists from various countries and extracted **transcript/subtitle data** (.txt) that can be used to train Zyo.

In addition, FAQ data about tourism in Bali (.txt) was collected from the following websites.

*   https://www.balispirit.com/community/blog/bali-travel-faqs
*   https://traveltriangle.com/blog/know-all-about-bali/
*   https://www.travelonline.com/bali/frequently-asked-questions
*   https://thetravelauthor.com/faq-for-bali/

# Data Preprocessing

We create an intents file containing question patterns and responses for training Zyo. Because the available sources yield only a limited number of question patterns, data augmentation is applied to enlarge the dataset. The methods used are:


*   Translating into a foreign language and then translating back (back translation)
*   Paraphrasing with the help of [QuillBot](https://quillbot.com/)
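
Once new phrasings have been produced, they have to be merged back into the matching intent. The following is a minimal sketch of that step, assuming a hypothetical helper `add_patterns` (not part of the original notebook) that skips phrasings already present, ignoring case and surrounding whitespace:

```python
def add_patterns(intent, new_patterns):
    """Append augmented patterns to an intent dict, skipping duplicates.

    Hypothetical helper for illustration: the comparison ignores case
    and surrounding whitespace.
    """
    seen = {p.strip().lower() for p in intent["patterns"]}
    for candidate in new_patterns:
        key = candidate.strip().lower()
        if key and key not in seen:
            intent["patterns"].append(candidate.strip())
            seen.add(key)
    return intent


greeting = {"tag": "greeting", "patterns": ["Hi", "How are you"], "responses": ["Hello!"]}
add_patterns(greeting, ["how are you", "How are you doing?", " Hi "])
print(greeting["patterns"])  # ['Hi', 'How are you', 'How are you doing?']
```

Deduplicating at this point keeps back translation and paraphrasing from inflating the dataset with copies of the same question.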



In [1]:
%%writefile intentsZyo.json
{"intents": [
        {"tag": "greeting",
         "patterns": ["Hi", "How are you", "Is anyone there?", "Hello", "Good day", "Whats up"],
         "responses": ["Hello! I'm Zyo :)", "Good to see you again!", "Hi there, how can I help you?"],
         "context_set": ""
        },
        {"tag": "goodbye",
         "patterns": ["cya", "See you later", "Goodbye", "I am Leaving", "Have a Good day"],
         "responses": ["Sad to see you go :(", "Talk to you later", "Goodbye!"],
         "context_set": ""
        },
        {"tag": "age",
         "patterns": ["how old", "how old is zyo", "what is your age", "how old are you", "age?"],
         "responses": ["I am 20 years old!", "20 years young!"],
         "context_set": ""
        },
        {"tag": "name",
         "patterns": ["what is your name", "what should I call you", "whats your name?"],
         "responses": ["You can call me Zyo", "I'm Zyo!", "I'm Zyo! Your Virtual Tour Guide!"],
         "context_set": ""
        },
        {"tag": "weather_faq",
         "patterns": ["What's the weather like in Bali?", "How is the weather in Bali?", "How does the weather in Bali?", "weather in bali", "bali's weather", "bali is hot", "bali is cold"],
         "responses": ["Bali is generally hot, sunny and humid and typically the rainy season is between November - Feb/March. However rain usually pours heavily for a short duration in the afternoon or during the night, leaving the rest of the day warm and sunny (around 90 degrees) and cools off at night"],
         "context_set": ""
        },
        {"tag": "visa_faq",
         "patterns": ["Do I need a VISA to enter Bali?", "visa required", "Is a VISA required to enter Bali?", "Is a VISA necessary to enter Bali?", "visa necessary?"],
         "responses": ["Indonesia has a free 30-Day visa-on-arrival system (tourist visa). If you would like to stay more than 30 days, then you will need to pay $35 at the airport when you arrive at the airport for the option to extend with an Indonesian agent for up to 60-days"],
         "context_set": ""
        },
        {"tag": "safe_faq",
         "patterns": ["Is Bali Safe?", "Is there much crime?", "safety in bali", "is bali secure?", "crime in bali"],
         "responses": ["Bali has always been peaceful, As for safety in the streets, there is very little violent crime in Bali, especially in Ubud - just be cautious against pickpockets in the beach areas and with handbags on motorbikes late at night"],
         "context_set": ""
        },
        {"tag": "drinkwater_faq",
         "patterns": ["Can I drink the water?", "Bali's water", "water drinkable?", "drink water in Bali"],
         "responses": ["Always drink bottled or filtered water. 99% of restaurants in Bali use bottled water for all cooking purposes and it is advisable to drink and brush teeth with bottled water to avoid 'Bali Belly' - a mild dysentery which can occur but passes in a few days."],
         "context_set": ""
        },
        {"tag": "clothing_faq",
         "patterns": ["What clothing can I wear in Bali?", "what should i wear?", "clothes to wear in bali", "bikini allowed?", "baggy clothes"],
         "responses": ["Where you will stay will 99% of the time be air conditioned or with a fan, but outside it is hot and humid. Flip flops or comfortable walking sandals are a must with light clothing like t-shirts and shorts to keep you cool are recommended and a light pullover in the evenings if you are riding a motorbike. When entering temples you must be respectful and have your shoulders covered and wear a sarong (for men and women)"],
         "context_set": ""
        },
        {"tag": "foodprice_faq",
         "patterns": ["What's the normal price for food in Bali?", "food price in bali", "food cheap or expensive?", "price of the food?"],
         "responses": ["Local Indonesian food like nasi campur and nasi/mie goreng will cost you a lot less, around IDR 30,000 (USD $2.50), a meal at a mid range restaurant in will cost between IDR 100,000-250,000 ($12-20) and fine dining restaurants could cost anywhere between IDR 500,000 and 1,000,000+ ($40-$80)."],
         "context_set": ""
        },
        {"tag": "rentcar_faq",
         "patterns": ["Should I rent a car?", "what car to rent?", "car rental available?", "rent a car"],
         "responses": ["It is not necessary, nor advisable to rent a car in Bali. Taxis will cost about $25 - $30 to take you from the Airport at Denpasar to Ubud (approximately 1.5-hour drive, depending on traffic). Most hotels offer a free shuttle or taxi service to take you to the town center in Ubud. You can also hire a private driver and car for $40 - $60 per day for day excursions."],
         "context_set": ""
        },
        {"tag": "buddhist_faq",
         "patterns": ["Is Bali a Buddhist country?", "religion in bali", "Buddhist", "bali islam?"],
         "responses": ["Though Bali is a multi-religious island, most of the people on the island follow Balinese Hinduism which is a fusion of Indian and local Bali customs and culture. Muslim, Christianity, and Buddhism are a few other minority religions on the island."],
         "context_set": ""
        },
        {"tag": "wifi_faq",
         "patterns": ["Does Bali airport have WiFi?", "wifi in airport", "airport wifi"],
         "responses": ["Denpasar Ngurah International Airport in Bali has free wifi and it is easy to connect to the network too. The free session lasts for 240 minutes."],
         "context_set": ""
        },
        {"tag": "airportcount_faq",
         "patterns": ["How many airports does Bali have?", "airport in bali", "bali airport", "airport"],
         "responses": ["There is only one airport in Bali i.e. Denpasar Ngurah International Airport. The airport has two terminals designated to domestic and international flights. The airport has all the modern facilities to provide convenience and comfort to the passengers."],
         "context_set": ""
        },
        {"tag": "language_faq",
         "patterns": ["What language do they speak in Bali?", "bali language", "speak in bali", "bahasa bali", "bahasa indonesia", "language speak"],
         "responses": ["Bahasa Bali and Bahasa Indonesian are the most prominent languages on the island. Most people in Balinese are bilingual and speak both these languages. A fewer people are fluent in the English language."],
         "context_set": ""
        },
        {"tag": "exchange_faq",
         "patterns": ["What’s the exchange rate in Indonesia?", "exchange rate", "What is the current currency exchange rate in Indonesia?"],
         "responses": ["The most popular exchange rate in Indonesia is the USD to IDR rate. The national currency of Bali is the Indonesian Rupiah (Rp)."],
         "context_set": ""
        },
        {"tag": "bestthingkid_faq",
         "patterns": ["What are the best things to do in Bali with kids?", "Which activities in Bali are the best for families?", "What are the finest family-friendly activities in Bali?", "bali for kids", "bali for family", "kids in bali", "traveling with kids"],
         "responses": ["You can take your kids to Bali Safari & Marine Park which is one of the most visited theme parks on the island. Dream Museums is another popular tourist attraction in Bali."],
         "context_set": ""
        },
        {"tag": "casino_faq",
         "patterns": ["Does Bali have a casino?", "bali casino", "casino in bali", "casino available", "casino"],
         "responses": ["No, Bali does not have a casino. There are no sanctioned casinos or poker rooms in Bali. In fact, the activities of gambling and betting are illegal in Bali as it is against their religious principles."],
         "context_set": ""
        },
        {"tag": "whattoeat_faq",
         "patterns": ["What to eat in Bali?", "eat in bali", "best food in bali", "bali food", "delicious food in bali"],
         "responses": ["When you are in Bali, you must try the authentic Balinese food as it is one of its kind and a lot of travelers visit Bali just to try the local delicacies. You must try dishes like Sate, Nasi Ayam, and Nasi Campur, Bebek and Ayam Betutu, Babi Guling and Traditional cakes and desserts."],
         "context_set": ""
        },
        {"tag": "bestressort_faq",
         "patterns": ["Which are the best beach resorts in Bali?","bali resort", "best resort in bali", "bali resorts", "recommended resorts"],
         "responses": ["If you are a beach lover and are planning to stay at a beach resort in Bali, then the best options for you are The Mulia, St.Regis Bali, The Royal Purnama and W Bali."],
         "context_set": ""
        },
        {"tag": "besthotel_faq",
         "patterns": ["Which are the best hotels in Seminyak", "best hotel in seminyak", "best hotel", "recommended hotel"],
         "responses": ["The best hotels in Seminyak are Horison Seminyak, The Haven Bali Seminyak, Courtyard by Marriott Bali Seminyak Resort and Harris Hotel Seminyak."],
         "context_set": ""
        },
        {"tag": "timevisit_faq",
         "patterns": ["When is the best time to go?", "best time to go", "what time to go", "time to come", "what year to come to bali"],
         "responses": ["Bali is a spectacular destination to visit all year round with stunning tropical scenery and endless opportunities for adventure. While temperatures tend to average 27 °C throughout the year, we recommend visiting between May to August for optimal tropical conditions without the holiday crowds."],
         "context_set": ""
        },
        {"tag": "corona_faq",
         "patterns": ["Is there Corona Virus in Bali?", "corona in bali", "corona", "covid 19", "bali corona covid", "bali virus"],
         "responses": ["Bali is in the midst of vaccinations, and most of the main tourist areas have already been fully vaccinated. During the Pandemic, numbers of infections stayed fairly low on the island. All resorts and public areas maintain good health protocols, using disinfectants and wiping down tables etc."],
         "context_set": ""
        },
        {"tag": "taxis_faq",
         "patterns": ["What taxis do you catch in Bali?", "taxi to go", "what taxi to ride", "bali taxi"],
         "responses": ["The best taxis are generally Bluebird, as they are metered and monitored. If you are willing to barter, you can always have someone else drive you around. There are usually plenty of willing drivers outside most resorts."],
         "context_set": ""
        },
        {"tag": "illegal_faq",
         "patterns": ["What is illegal in Bali?", "drugs is illegal", "illegal in bali", "drugs"],
         "responses": ["Drugs. Bali is very, very strict on them, and if you are caught, there is basically no defence. You will end up in jail."],
         "context_set": ""
        },
        {"tag": "smoke_faq",
         "patterns": ["Can i smoke in Bali?", "smoking in bali", "cigarettes in bali", "can i smoke?"],
         "responses": ["Many Balinese people smoke, and cigarettes are quite cheap, although not the same quality as home."],
         "context_set": ""
        }

   ]
}

Overwriting intentsZyo.json
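
Before training, it can help to check that the intents file parses and that every intent has the expected fields. The sketch below is an illustration only: `validate_intents` and `REQUIRED_KEYS` are hypothetical names, and the set of required keys is an assumption based on the structure shown above. Note that JSON does not allow a trailing comma after the last element of an array, which is a common reason for `json.loads` to fail on hand-written files.

```python
import json

# Assumed minimal schema, taken from the intents shown above.
REQUIRED_KEYS = {"tag", "patterns", "responses"}


def validate_intents(raw):
    """Parse an intents JSON string and return a list of problems found."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]

    problems = []
    tags = set()
    for i, intent in enumerate(data.get("intents", [])):
        missing = REQUIRED_KEYS - intent.keys()
        if missing:
            problems.append(f"intent {i}: missing {sorted(missing)}")
        tag = intent.get("tag")
        if tag in tags:
            problems.append(f"intent {i}: duplicate tag {tag!r}")
        tags.add(tag)
        if not intent.get("patterns"):
            problems.append(f"intent {i}: no patterns")
    return problems


good = '{"intents": [{"tag": "greeting", "patterns": ["Hi"], "responses": ["Hello!"]}]}'
bad = '{"intents": [{"tag": "greeting", "patterns": ["Hi"], "responses": ["Hello!"]},]}'
print(validate_intents(good))  # []
print(validate_intents(bad))   # one "invalid JSON: ..." message (trailing comma)
```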


In [2]:
from google.colab import files
uploaded = files.upload()

Saving intentsZyo.json to intentsZyo (4).json


In [3]:
data = next(iter(uploaded.values()))
type(data)
print(data)

b'{"intents": [\r\n        {"tag": "greeting",\r\n         "patterns": ["Hi", "How are you", "Is anyone there?", "Hello", "Good day", "Whats up"],\r\n         "responses": ["Hello! I\'m Zyo :)", "Good to see you again!", "Hi there, how can I help you?"],\r\n         "context_set": ""\r\n        },\r\n        {"tag": "goodbye",\r\n         "patterns": ["cya", "See you later", "Goodbye", "I am Leaving", "Have a Good day"],\r\n         "responses": ["Sad to see you go :(", "Talk to you later", "Goodbye!"],\r\n         "context_set": ""\r\n        },\r\n        {"tag": "age",\r\n         "patterns": ["how old", "how old is zyo", "what is your age", "how old are you", "age?"],\r\n         "responses": ["I am 20 years old!", "20 years young!"],\r\n         "context_set": ""\r\n        },\r\n        {"tag": "name",\r\n         "patterns": ["what is your name", "what should I call you", "whats your name?"],\r\n         "responses": ["You can call me Zyo", "I\'m Zyo!", "I\'m Zyo! Your Virtual T

In [4]:
import json
json.loads(data.decode())

{'intents': [{'tag': 'greeting',
   'patterns': ['Hi',
    'How are you',
    'Is anyone there?',
    'Hello',
    'Good day',
    'Whats up'],
   'responses': ["Hello! I'm Zyo :)",
    'Good to see you again!',
    'Hi there, how can I help you?'],
   'context_set': ''},
  {'tag': 'goodbye',
   'patterns': ['cya',
    'See you later',
    'Goodbye',
    'I am Leaving',
    'Have a Good day'],
   'responses': ['Sad to see you go :(', 'Talk to you later', 'Goodbye!'],
   'context_set': ''},
  {'tag': 'age',
   'patterns': ['how old',
    'how old is zyo',
    'what is your age',
    'how old are you',
    'age?'],
   'responses': ['I am 20 years old!', '20 years young!'],
   'context_set': ''},
  {'tag': 'name',
   'patterns': ['what is your name',
    'what should I call you',
    'whats your name?'],
   'responses': ['You can call me Zyo',
    "I'm Zyo!",
    "I'm Zyo! Your Virtual Tour Guide!"],
   'context_set': ''},
  {'tag': 'weather_faq',
   'patterns': ["What's the weather like 

Import all the required libraries, such as nltk, numpy, and keras.

In [5]:
import nltk
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
import json
import pickle
import tensorflow
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD  # public import path instead of the private keras.optimizer_v2 module
import random

In [6]:
!pip install nlpaug

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/

In [7]:
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
import nlpaug.flow as nafc

from nlpaug.util import Action

In [8]:
!pip install -U transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/

Synonym Augmented

In [9]:
words = []
labels = []
documents = []
ignore_words = ['?', '!', ',','.','-','UNK']
intents = json.loads(data)
#nltk.download('punkt')

# for intent in intents['intents']:
#     for pattern in intent['patterns']:
#       aug = naw.SynonymAug(aug_src='wordnet', lang='eng')
#       augmented_text = aug.augment(pattern)
#       print("Original:", pattern)
#       print("Augmented:", augmented_text)

Multiple Version Word Augmented

In [10]:
# for intent in intents['intents']:
#     for pattern in intent['patterns']:
#       for i in range(3):
#         #random word augmentation
#         aug = naw.RandomWordAug(action=Action.SWAP)
#         augmented_text = aug.augment(pattern)
#         #word synonym augmentation
#         aug2 = naw.SynonymAug(aug_src='wordnet', lang='eng')
#         augmented_text2 = aug2.augment(augmented_text)
#         #contextual word embedding augmentation
#         aug3 = naw.ContextualWordEmbsAug(model_path='distilbert-base-uncased-distilled-squad', aug_p=0.1)
#         augmented_text3 = aug3.augment(augmented_text2)
#         print("Original:", pattern)
#         print("Augmented:", augmented_text)
#         print("Augmented:", augmented_text2)
#         print("Augmented:", augmented_text3)

In [None]:
!pip install git+https://github.com/PrithivirajDamodaran/Parrot_Paraphraser.git

In [None]:
!pip install --upgrade pip

In [None]:
!pip install sentencepiece

In [None]:
!pip install transformers[sentencepiece]

In [None]:
!pip install tokenizers

In [33]:
from parrot import Parrot
import torch
import warnings
warnings.filterwarnings("ignore")

def random_state(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

random_state(1234)

#Init models (make sure you init ONLY once if you integrate this to your code)
parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=True)

In [58]:
for intent in intents['intents']:
    for pattern in intent['patterns']:
        print("Input phrase: ", pattern)
        para_phrases = parrot.augment(input_phrase=pattern, use_gpu=True, do_diverse=True)
        # parrot.augment returns None when no paraphrase passes its filters
        for paraphrase in para_phrases or []:
            # each result is a tuple whose first element is the paraphrased text
            print("Paraphrase:", paraphrase[0])
        print("-"*100)

Input phrase:  Hi
Paraphrase: Hi
----------------------------------------------------------------------------------------------------
Input phrase:  How are you
Paraphrase: how are you doing?
Paraphrase: how are you?
----------------------------------------------------------------------------------------------------
Input phrase:  Is anyone there?
Paraphrase: is there anyone?
Paraphrase: who is there?
Paraphrase: is anyone here?
Paraphrase: is there anyone here?
Paraphrase: is anyone there?
Paraphrase: is there anyone there?
----------------------------------------------------------------------------------------------------
Input phrase:  Hello
Paraphrase: Hello
----------------------------------------------------------------------------------------------------
Input phrase:  Good day
----------------------------------------------------------------------------------------------------
Input phrase:  Whats up
Paraphrase: what's the deal?
--------------------------------------------------

In [49]:
phrases = ["Can you recommed some upscale restaurants in Rome?"]
for phrase in phrases:
  print("-"*100)
  print(phrase)
  print("-"*100)
  para_phrases = parrot.augment(input_phrase=phrase,
                                use_gpu=True,
                                diversity_ranker="levenshtein",
                                do_diverse=True, 
                                max_return_phrases = 10, 
                                max_length=32, 
                                adequacy_threshold = 0.99, 
                                fluency_threshold = 0.90)
  for paraphrase in para_phrases or []:                                  
      print(paraphrase)    

----------------------------------------------------------------------------------------------------
Can you recommed some upscale restaurants in Rome?
----------------------------------------------------------------------------------------------------


In [44]:
!pip install googletrans==3.1.0a0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting googletrans==3.1.0a0
  Downloading googletrans-3.1.0a0.tar.gz (19 kB)
  Preparing metadata (setup.py) ... done
Collecting httpx==0.13.3
  Downloading httpx-0.13.3-py3-none-any.whl (55 kB)
Collecting httpcore==0.9.*
  Downloading httpcore-0.9.1-py3-none-any.whl (42 kB)
Collecting rfc3986<2,>=1.3
  Downloading rfc3986-1.5.0-py2.py3-none-any.whl (31 kB)
Collecting sniffio
  Downloading sniffio-1.2.0-py3-none-any.whl (10 kB)
Collecting hstspreload
  Downloading hstspreload-2022.8.1-py3-none-any.whl (1.4 MB)

In [None]:
from googletrans import Translator

In [None]:
translator = Translator()

Backtranslation Augmented

In [None]:
for intent in intents['intents']:
    for pattern in intent['patterns']:
      result = translator.translate(pattern, dest='zh-tw')
      result2 = translator.translate(result.text, dest='en')
      print("Original:", pattern)
      print("Augmented:", result2.text)

In [None]:
words = []
labels = []
documents = []
ignore_words = ['?', '!', ',','.','-','[',']']
intents = json.loads(data)
#nltk.download('punkt')

for intent in intents['intents']:
    for pattern in intent['patterns']:
        #tokenize each word
        w = nltk.word_tokenize(pattern)
        words.extend(w)
        #add documents in the corpus
        documents.append((w, intent['tag']))
        # add to our classes list
        if intent['tag'] not in labels:
            labels.append(intent['tag'])
print(documents)

In [None]:
#nltk.download('omw-1.4')
#nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))
print(words)

In [None]:
labels = sorted(list(set(labels)))
# documents = combination between patterns and intents
print (len(documents), "documents")
# labels = intents
print (len(labels), "labels", labels)
# words = all words, vocabulary
print (len(words), "unique lemmatized words", words)
# use context managers so the pickle files are flushed and closed
with open('texts.pkl', 'wb') as f:
    pickle.dump(words, f)
with open('labels.pkl', 'wb') as f:
    pickle.dump(labels, f)

# Training Data

In [None]:
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(labels)
# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # lemmatize each word - create base word, in attempt to represent related words
    pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
    # create our bag of words array with 1, if word match found in current pattern
    for w in words:
        bag.append(1 if w in pattern_words else 0)
    
    # output is a '0' for each tag and '1' for current tag (for each pattern)
    output_row = list(output_empty)
    output_row[labels.index(doc[1])] = 1
    
    training.append([bag, output_row])
# shuffle our features and turn into np.array
random.shuffle(training)
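# each row pairs a bag (len(words)) with an output row (len(labels)); the lengths differ,
# so an object array is needed (recent NumPy versions reject ragged input otherwise)
training = np.array(training, dtype=object)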

#print(training)
# create train and test lists. X - patterns, Y - intents
train_x = list(training[:,0])
train_y = list(training[:,1])

# print(train_x)
# print(train_y) 
print("Training data created")
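
The same bag-of-words encoding has to be applied to every new user message before it can be passed to the model. The sketch below illustrates the idea in plain Python. It is a simplification, not the notebook's pipeline: `bag_of_words` is a hypothetical helper, and whitespace splitting with punctuation stripping stands in for `nltk.word_tokenize` and lemmatization.

```python
def bag_of_words(sentence, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in the sentence.

    Simplified tokenization (assumption): split on whitespace, strip common
    punctuation, and lowercase. No lemmatization is performed.
    """
    tokens = {tok.strip("?!,.").lower() for tok in sentence.split()}
    return [1 if word in tokens else 0 for word in vocabulary]


# Toy vocabulary for illustration; the real one is the sorted `words` list.
vocabulary = sorted(["bali", "in", "is", "safe", "weather", "the"])
print(vocabulary)                                  # ['bali', 'in', 'is', 'safe', 'the', 'weather']
print(bag_of_words("Is Bali safe?", vocabulary))   # [1, 0, 1, 1, 0, 0]
```

The vector has one position per vocabulary word, so its length always matches the input size the network was built with, regardless of how long the message is.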

In [None]:
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
# Note: `decay` is only accepted by the legacy Keras optimizers; on newer Keras versions, drop it or use a learning-rate schedule
sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Fit the model, then save it; `hist` holds the training history and is not an argument to save()
hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
model.save('model_Zyo.h5')
print("model created")
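
At inference time, the softmax layer yields one probability per label, for example from `model.predict(np.array([bag]))[0]`. The sketch below shows one way to turn such a vector into a reply. It is an assumption-laden illustration rather than the notebook's own code: `choose_response`, the fallback message, and the `0.6` confidence threshold are all hypothetical.

```python
import random

import numpy as np

FALLBACK = "Sorry, I did not understand that."  # hypothetical fallback reply


def choose_response(probabilities, labels, intents, threshold=0.6, rng=random):
    """Pick a response for the most probable intent, or a fallback if unsure."""
    best = int(np.argmax(probabilities))
    if probabilities[best] < threshold:
        return FALLBACK
    tag = labels[best]
    for intent in intents["intents"]:
        if intent["tag"] == tag:
            return rng.choice(intent["responses"])
    return FALLBACK


# Toy data for illustration; the label order must match the sorted `labels` list.
toy_intents = {"intents": [
    {"tag": "goodbye", "patterns": ["cya"], "responses": ["Goodbye!"]},
    {"tag": "greeting", "patterns": ["Hi"], "responses": ["Hello! I'm Zyo :)"]},
]}
toy_labels = ["goodbye", "greeting"]

print(choose_response(np.array([0.1, 0.9]), toy_labels, toy_intents))  # Hello! I'm Zyo :)
print(choose_response(np.array([0.5, 0.5]), toy_labels, toy_intents))  # fallback reply
```

A threshold keeps the bot from answering confidently when no intent clearly matches; the right value would need to be tuned on held-out questions.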