# Outline
- Seq2seq Models (Case Study 06)
- Transformers: BERT (Case Study 07)
- Question Answering Vs. Chatbots
- Conclusion
- References

# Seq2seq Model

A sequence-to-sequence (seq2seq) model consists of two RNNs: an encoder and a decoder. The encoder reads the input sequence word by word and emits a context vector (a function of its final hidden state) that ideally captures a semantic summary of the input. The decoder then generates the output sequence one word at a time, conditioning on the context and on the previously generated word at each timestep.

![seq2seq Architecture](1_CkeGXClZ5Xs0MhBc7xFqSA.png)

The main objective of a seq2seq model is to handle problems where the input and output sequences can have different lengths, for example a short question mapped to a longer reply. This is a limitation of standard feed-forward deep neural networks, which expect inputs and outputs of a fixed size.
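The encoder and decoder loops described above can be sketched in plain Python. This is a minimal illustration with random, untrained weights, so the generated words are meaningless; the vocabulary, the hidden size, and all function names are assumptions made for the example and are not taken from Case Study 06.

```python
import math
import random

random.seed(0)

HIDDEN = 4  # size of the hidden state (illustrative choice)
VOCAB = ["<sos>", "<eos>", "hi", "how", "are", "you", "fine", "thanks"]
V = len(VOCAB)


def rand_matrix(rows, cols):
    """Random weights; a real model would learn these during training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(index):
    return [1.0 if i == index else 0.0 for i in range(V)]


def rnn_step(w_in, w_hid, x, h):
    """One vanilla RNN step: h' = tanh(W_in x + W_hid h)."""
    a = matvec(w_in, x)
    b = matvec(w_hid, h)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


# Separate (untrained) parameters for the encoder and the decoder.
enc_in, enc_hid = rand_matrix(HIDDEN, V), rand_matrix(HIDDEN, HIDDEN)
dec_in, dec_hid = rand_matrix(HIDDEN, V), rand_matrix(HIDDEN, HIDDEN)
dec_out = rand_matrix(V, HIDDEN)


def encode(tokens):
    """Read the input token by token; the final hidden state is the context."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = rnn_step(enc_in, enc_hid, one_hot(VOCAB.index(tok)), h)
    return h


def decode(context, max_len=6):
    """Greedy decoding: the previous word is fed back in at every timestep."""
    h, prev, output = context, "<sos>", []
    for _ in range(max_len):
        h = rnn_step(dec_in, dec_hid, one_hot(VOCAB.index(prev)), h)
        scores = matvec(dec_out, h)
        prev = VOCAB[scores.index(max(scores))]
        if prev == "<eos>":
            break
        output.append(prev)
    return output


context = encode(["hi", "how", "are", "you"])
reply = decode(context)
print(len(context), reply)
```

Note that the context has the same size whatever the input length, and the output length is decided by the decoder (it stops at `<eos>` or at `max_len`), which is how the model decouples input length from output length.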

Check Case Study 06 (Chatbot): https://colab.research.google.com/drive/1oAoVK3hOQ5n-A6CoKG1Bt2eZ6mG78nYE?usp=sharing



* Advantages of Seq2Seq Model:

     - Variable Input and Output Length
     - Contextual Understanding

* Disadvantages of Seq2Seq Model:

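     - Difficulty in Handling Long Sequences (a single fixed-size context vector must summarize the whole input)
     - Over-Reliance on Encoder (the decoder sees the input only through that context)
     - Lack of Explainability (the learned context vector is hard to interpret)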

# BERT

BERT is built on the Transformer, an architecture based on an attention mechanism that learns contextual relations between words (or sub-words) in a text. In its vanilla form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. Since BERT’s goal is to generate a language model, only the encoder mechanism is necessary.
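To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer, in plain Python. The toy vectors are made up, and the sketch uses the raw token vectors as queries, keys, and values; a real Transformer layer first applies learned linear projections and uses several attention heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    dim_v = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors (made-up numbers) attending to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how much one token attends to every token in the sequence, including itself. Because every row can be computed independently, the whole sequence can be processed in parallel instead of step by step as in an RNN.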


![BERT Architecture](BERT-base-architecture.png)


- BERT can be used on a wide variety of language tasks:

  - Sentiment Analysis
  - Question answering
  - Text Classification: Arabic Dialect (AraBERT and MarBERT)
  - Text generation
  - Summarization
  
- How Does BERT Work:

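  - Pre-training on a large amount of unlabeled text
  - Masked Language Model (MLM): about 15% of the input tokens are selected for prediction during pre-training, and most of them are replaced with a `[MASK]` token
  - Next Sentence Prediction (NSP): the model predicts whether the second sentence follows the first
  - Transformers (Attention)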
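The MLM step can be sketched as follows. The sketch assumes the 80/10/10 replacement rule from the original BERT paper (selected tokens become `[MASK]` 80% of the time, a random token 10% of the time, and stay unchanged 10% of the time) and splits on whitespace instead of using BERT's WordPiece tokenizer; the function name and the example sentence are illustrative.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for masked language modelling.

    labels maps each selected position to the original token the model
    would be trained to predict.
    """
    rng = random.Random(seed)
    n_select = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_select))

    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        roll = rng.random()
        if roll < 0.8:
            masked[pos] = "[MASK]"           # 80%: replace with the mask token
        elif roll < 0.9:
            masked[pos] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


sentence = ("the cat sat on the mat and the dog lay on the rug "
            "while the bird sang in the tree")
tokens = sentence.split()
vocab = sorted(set(tokens))
masked, labels = mask_tokens(tokens, vocab)
print(" ".join(masked))
print(labels)
```

Only the positions in `labels` contribute to the training loss; all other tokens serve as context on both the left and the right of each prediction target.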



Check Case Study 07 (QA System): https://colab.research.google.com/drive/1FYBfGBX6TQQO2wBvXAev7n0RTVbZ6evP?usp=sharing

- Advantages of Transformers over seq2seq:

  - Attention Mechanism
  - Parallel Computation
  - Scalability
  - Context-Aware Encoding
  - Bidirectional Encoding
  - Handling Variable-Length Input/Output
  - Transfer Learning and Fine-Tuning

# Question Answering Vs. Chatbots

- Question Answering (QA):

QA systems are designed to provide specific answers to user questions based on a given context or knowledge base.

The input to a QA system is typically a specific question, and the system's goal is to produce a concise and accurate answer.

QA systems often rely on information retrieval and extraction techniques to find relevant information from a structured or unstructured data source.

QA systems can be task-oriented, focusing on specific domains or knowledge areas, or they can be open-domain, attempting to answer questions on a wide range of topics.

Examples of QA systems include factoid question answering systems (e.g., answering "Who is the president of the United States?") and reading comprehension systems (e.g., answering questions about a given passage).
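For reading comprehension, a BERT-style model fine-tuned for extractive QA typically outputs a start score and an end score for every token of the passage, and the answer is the span with the highest combined score. The sketch below shows only that span-selection step; the scores are made-up numbers standing in for model outputs, and `best_span` is a hypothetical helper, not a library function.

```python
def best_span(start_scores, end_scores, max_answer_len=5):
    """Pick (start, end) maximizing start_scores[start] + end_scores[end],
    with start <= end and at most max_answer_len tokens in the span."""
    best_score, best_start, best_end = float("-inf"), 0, 0
    for s, s_score in enumerate(start_scores):
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best_start, best_end = score, s, e
    return best_start, best_end


passage = "BERT was introduced by researchers at Google in 2018".split()

# Made-up per-token scores for the question "Who introduced BERT?"
start_scores = [0.1, 0.0, 0.2, 0.1, 2.5, 0.3, 1.0, 0.0, 0.2]
end_scores = [0.0, 0.1, 0.0, 0.2, 0.4, 0.1, 2.8, 0.1, 0.3]

start, end = best_span(start_scores, end_scores)
answer = " ".join(passage[start:end + 1])
print(answer)  # researchers at Google
```

The answer is always copied from the passage, which is why such systems produce concise responses but cannot answer questions whose answer is not in the given context.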

- Chatbot:

Chatbot systems are designed to simulate human-like conversations and engage in interactive dialogues with users.

Chatbots can handle a wide range of user inputs and respond with contextually relevant and meaningful replies.

The goal of a chatbot is to provide an interactive conversational experience and assist users with various tasks, provide information, or engage in casual conversations.

Chatbots use natural language understanding and generation techniques, and they can be rule-based, retrieval-based, or generative, depending on the underlying architecture and approach.

Chatbots can be designed for specific domains or operate as open-domain conversational agents.

Examples of chatbots include customer support chatbots, virtual assistants (e.g., Siri, Alexa), and social chatbots.
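As a contrast with the generative seq2seq approach, a retrieval-based chatbot can be sketched in a few lines: it compares the user message with stored questions and returns the reply attached to the closest one. The dialogue pairs, the Jaccard word-overlap similarity, and the threshold value are all assumptions chosen for illustration; practical systems usually rely on learned sentence embeddings instead.

```python
import re

# Tiny hand-written knowledge base of (question, reply) pairs (illustrative).
PAIRS = [
    ("hello", "Hi there!"),
    ("how are you", "I am fine, thanks."),
    ("what is your name", "I am a demo bot."),
]

FALLBACK = "Sorry, I did not understand that."


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sets, in the range [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reply(message, threshold=0.2):
    """Return the stored reply whose question best matches the message."""
    words = tokenize(message)
    score, answer = max((jaccard(words, tokenize(q)), a) for q, a in PAIRS)
    return answer if score >= threshold else FALLBACK


print(reply("How are you?"))
print(reply("quantum physics"))
```

A retrieval-based bot can only return replies it already stores, whereas a generative model composes new text word by word; the fallback message covers inputs that match nothing well enough.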

# Conclusion

In summary, BERT and seq2seq models are both widely used in chatbots and QA systems. BERT's strength lies in encoding context in both directions, which suits understanding tasks such as extractive question answering, while seq2seq models are suited to generating responses one word at a time. **Prompt engineering** can further help developers guide the output of generative models. Overall, these models and techniques have paved the way for more capable and customizable chatbots and QA systems.

# References

- https://cnvrg.io/seq2seq-model/
- https://blog.suriya.app/2016-06-28-easy-seq2seq/
- https://vvsmanideep.medium.com/interview-chat-bot-using-seq2seq-model-fe9059fffe64
- https://towardsdatascience.com/generative-chatbots-using-the-seq2seq-model-d411c8738ab5
- https://arxiv.org/pdf/1706.03762.pdf
- https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
- https://arxiv.org/abs/1810.04805
- https://machinelearningmastery.com/a-brief-introduction-to-bert/
- https://huggingface.co/blog/bert-101#1-what-is-bert-used-for
- https://www.pytorchlightning.ai/blog/how-to-fine-tune-bert-with-pytorch-lightning
- https://sites.aub.edu.lb/mindlab/2020/02/28/arabert-pre-training-bert-for-arabic-language-understanding/
- https://github.com/h9-tect/arabic-lib-transformers
- https://huggingface.co/UBC-NLP/MARBERT
- https://blog.invgate.com/gpt-3-vs-bert
- https://towardsdatascience.com/question-answering-with-a-fine-tuned-bert-bc4dafd45626
- https://www.kaggle.com/datasets/grafstor/simple-dialogs-for-chatbot