# Natural Language Processing

# Retrieval-Augmented Generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the knowledge of the model with the specific information it needs.

<img src="../figures/RAG-process.png" >

Introducing `ChakyBot`, a chatbot designed to help Chaky (the instructor) and Gun (the TA) explain the lessons of the NLP course to students. Built with LangChain, ChakyBot retrieves information from documents to answer students' questions about the NLP curriculum. It is built from four LangChain components:

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [1]:
# langchain library
!pip install langchain==0.0.350
!pip install langchain-community==0.0.4
# LLM
!pip install accelerate==0.25.0
!pip install transformers==4.36.2
!pip install bitsandbytes==0.45.3
# text Embedding
!pip install sentence-transformers==2.2.2
!pip install InstructorEmbedding==1.0.1
# vectorstore
!pip install pymupdf==1.23.8
!pip install faiss-cpu==1.7.4
# huggingface_hub
!pip install huggingface-hub==0.20.0
# protobuf
!pip install protobuf

In [1]:
import os
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.

In [2]:
from langchain import PromptTemplate

prompt_template = """
Your name is Francis. You are a friendly chatbot designed to answer questions about Phone Myint Naing because you have all information about him.
Provide gentle and informative answers based on the context:

Context: {context}

Question: {question}

Answer:
""".strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
# The template is filled in with str.format.
# Placeholders are written in curly brackets, here {context} and {question}.

PromptTemplate(input_variables=['context', 'question'], template='Your name is Francis. You are a friendly chatbot designed to answer questions about Phone Myint Naing because you have all information about him.\nProvide gentle and informative answers based on the context:\n\nContext: {context}\n\nQuestion: {question}\n\nAnswer:')

In [3]:
PROMPT.format(
    context = "Burmese language is quite complex comparing to other latin languages so domain knowledge will be important when training spell correction model.",
    question = "What are the challenges for spell correction model in Burmese? "
)

'Your name is Francis. You are a friendly chatbot designed to answer questions about Phone Myint Naing because you have all information about him.\nProvide gentle and informative answers based on the context:\n\nContext: Burmese language is quite complex comparing to other latin languages so domain knowledge will be important when training spell correction model.\n\nQuestion: What are the challenges for spell correction model in Burmese? \n\nAnswer:'
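The template above relies on Python's `str.format` placeholders. The following is a minimal sketch, independent of LangChain, of how placeholder names can be discovered and filled. The helper name `find_placeholders` and the sample strings are hypothetical and only serve as illustration.

```python
from string import Formatter

template = "Context: {context}\n\nQuestion: {question}\n\nAnswer:"

def find_placeholders(template: str) -> list:
    """Return the placeholder names in the order they appear in the template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]

variables = find_placeholders(template)

# Fill every placeholder with a concrete value
filled = template.format(
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)

print(variables)
print(filled)
```

A missing placeholder value raises a `KeyError` in `str.format`, which is why the variable names expected by a prompt must match the keys supplied by the rest of the pipeline.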

Note : [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.)](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code).
2. `Document transformers` : Break large documents into smaller, relevant chunks, an essential step for effective retrieval.
3. `Text embedding models` : Embeddings capture the semantic meaning of text, so you can quickly and efficiently find other pieces of text that are similar.
4. `Vector stores` : Databases that support efficient storage and searching of these embeddings.
5. `Retrievers` : Once the data is in the database, you still need to retrieve it.

### 2.1 Document Loaders 
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text with associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [4]:
from langchain.document_loaders import PyMuPDFLoader

nlp_docs = 'resume.pdf'

loader = PyMuPDFLoader(nlp_docs)
documents = loader.load()

In [5]:
len(documents)

3

In [6]:
documents[0]

Document(page_content='Machine Learning/ Deep Learning\nNatural Language Processing\nComputer Vision\nMathematics\nPython, Java\nLinux Servers\nCloud Computing\nWeb Development\nWeb scraping \n+650-832-597-730\nfrancisphone1998@gmail.com\nhttps://github.com/FrancisPhone\nAIT, Khlong Luang District, Pathum\nThani 12120, Thailand\nContact\nSkills \nAccomplished AI/NLP professional with a\nstrong foundation in mathematics and a\nbackground in tutoring. Led groundbreaking\nprojects in Burmese machine translation and\nneural spell checking, driving advancements\nin language technology. Known for precision\nin algorithm design, a relentless pursuit of\nexcellence, and the ability to tackle complex\nAI challenges. Passionate about innovation\nand thriving in fast-paced, collaborative\nenvironments. Eager to contribute expertise\nto cutting-edge AI teams shaping the future\nof technology.\nProfile\nELYSIAN EDU\nLecturer\nLed transformative initiatives in machine\ntranslation and neural-based s

### 2.2 Document Transformers

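`RecursiveCharacterTextSplitter` is the recommended text splitter for generic text. It is parameterized by a list of separator characters and tries to split on them in order until the chunks are small enough. With the default length function, `chunk_size` and `chunk_overlap` are measured in characters.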

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [8]:
doc[0]

Document(page_content='Machine Learning/ Deep Learning\nNatural Language Processing\nComputer Vision\nMathematics\nPython, Java\nLinux Servers\nCloud Computing\nWeb Development\nWeb scraping \n+650-832-597-730\nfrancisphone1998@gmail.com\nhttps://github.com/FrancisPhone\nAIT, Khlong Luang District, Pathum\nThani 12120, Thailand\nContact\nSkills \nAccomplished AI/NLP professional with a\nstrong foundation in mathematics and a\nbackground in tutoring. Led groundbreaking\nprojects in Burmese machine translation and\nneural spell checking, driving advancements\nin language technology. Known for precision\nin algorithm design, a relentless pursuit of\nexcellence, and the ability to tackle complex\nAI challenges. Passionate about innovation', metadata={'source': 'resume.pdf', 'file_path': 'resume.pdf', 'page': 0, 'total_pages': 3, 'format': 'PDF 1.4', 'title': 'Web developer or engineer who works with both the front and back ends of a website or application. Provide an end-to-end service, an

In [9]:
len(doc)

7
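To make the "split on separators in order" idea concrete, here is a simplified sketch of recursive splitting in plain Python. It is not LangChain's implementation: it ignores `chunk_overlap`, and the function name `recursive_split` and the default separator list are assumptions made for illustration.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Coarse separators are tried first; pieces that are still too long
    are split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks

chunks = recursive_split("aaaa bbbb cccc\n\ndddd eeee", chunk_size=10)
print(chunks)
```

Splitting on paragraph breaks first keeps related sentences together whenever they fit, and finer separators are used only for pieces that are still too long.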

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

*Note* Instructor Model : [Huggingface](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)
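As a rough illustration of "most similar in the vector space", the sketch below ranks two texts against a query by cosine similarity. The 3-dimensional vectors are made up for the example; a real embedding model produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, not produced by any real model
embeddings = {
    "I train OCR models": [0.9, 0.1, 0.0],
    "I enjoy cooking": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "What models do you build?"

# Semantic search: pick the text whose vector points in the most similar direction
best_match = max(
    embeddings,
    key=lambda text: cosine_similarity(embeddings[text], query_vector),
)
print(best_match)
```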

In [10]:
import torch
import InstructorEmbedding
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)

load INSTRUCTOR_Transformer
max_seq_length  512

### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
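The store-then-search pattern described above can be sketched with a small in-memory class. This is only an illustration under stated assumptions: `ToyVectorStore` and the letter-count "embedding" are hypothetical, and the search is a brute-force Euclidean distance scan, which a library such as FAISS performs far more efficiently.

```python
import math

class ToyVectorStore:
    """In-memory stand-in for a vector store: keeps (vector, text) pairs."""

    def __init__(self, embed):
        self.embed = embed    # function mapping text -> list of floats
        self.entries = []     # list of (vector, text) pairs

    def add_texts(self, texts):
        for text in texts:
            self.entries.append((self.embed(text), text))

    def similarity_search(self, query, k=1):
        """Embed the query and return the k texts with the closest vectors."""
        query_vec = self.embed(query)
        ranked = sorted(self.entries, key=lambda entry: math.dist(entry[0], query_vec))
        return [text for _, text in ranked[:k]]

def letter_counts(text):
    """Hypothetical 'embedding': counts of the letters a-z (not a real model)."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]

store = ToyVectorStore(letter_counts)
store.add_texts(["spell checking", "image recognition"])
results = store.similarity_search("spelling", k=1)
print(results)
```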

In [11]:
# Create the folder for the vector store if it does not exist yet
vector_path = 'vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [12]:
# Embed the chunks, build the FAISS index, and save it locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_ait'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' # custom index name; the default is 'index'
)

### 2.5 Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
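To illustrate that a retriever only needs to return documents for a query, the sketch below ranks texts by word overlap and uses no vector store at all. `KeywordRetriever` is a hypothetical class; it borrows the method name `get_relevant_documents` from the LangChain interface used in this notebook but is not part of LangChain.

```python
class KeywordRetriever:
    """Hypothetical retriever that ranks stored texts by words shared with the query."""

    def __init__(self, texts, k=2):
        self.texts = texts
        self.k = k

    def get_relevant_documents(self, query):
        query_words = set(query.lower().split())

        def overlap(text):
            return len(query_words & set(text.lower().split()))

        ranked = sorted(self.texts, key=overlap, reverse=True)
        # Keep the top k texts that share at least one word with the query
        return [text for text in ranked[: self.k] if overlap(text) > 0]

keyword_retriever = KeywordRetriever(
    [
        "I work as an AI researcher",
        "My research goal is a Burmese spelling correction model",
        "I studied Electronics and Communication Engineering",
    ],
    k=1,
)
hits = keyword_retriever.get_relevant_documents("what is your research goal")
print(hits)
```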

In [13]:
# Load the saved vector store from disk
vector_path = 'vector-store'
db_file_name = 'nlp_ait'

from langchain.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp' # must match the name used in save_local
)

In [14]:
#ready to use
retriever = vectordb.as_retriever()

In [15]:
retriever.get_relevant_documents("What is your name?")

[Document(page_content="Hi, I'm Phone Myint Naing, a 27-year-old Master's student in Data Science and AI at\nthe Asian Institute of Technology (Thailand). My background is in Electronics and\nCommunication Engineering (Myanmar), and I’ve been working as a machine\nlearning engineer for three years—two years in NLP (full-time in Myanmar) and one\nyear in Computer Vision (part-time, remote from Thailand).\nCurrently, I work as an AI researcher, focusing on building machine learning and\ndeep learning models. My main project involves extracting information from Thai\ndocuments like National ID cards, House Registrations, Bank Passbooks, and\nInvoices. I train models for Optical Character Recognition (OCR) and Text-to-Text", metadata={'source': 'resume.pdf', 'file_path': 'resume.pdf', 'page': 2, 'total_pages': 3, 'format': 'PDF 1.4', 'title': 'Web developer or engineer who works with both the front and back ends of a website or application. Provide an end-to-end service, and can be involve

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [16]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

ChatMessageHistory(messages=[])

In [17]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [18]:
history

ChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and is useful in different scenarios. This notebook covers two of them:
- Conversation Buffer
- Conversation Buffer Window

#### What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names which need to align with the variables the chain expects. You can see what these variables are by calling `memory.load_memory_variables({})`. Note that the empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon the input variables, you may need to pass some in.

For the buffer memories used in this section, `load_memory_variables` returns a single key, `history`. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this variable name through parameters on the memory class. For example, to have the memory variables returned under the key `chat_history`, pass `memory_key = "chat_history"` when constructing the memory.
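The alignment between the memory key and the prompt variable can be sketched without LangChain. `ToyBufferMemory` below is a hypothetical stand-in that mimics the `memory_key`, `save_context`, and `load_memory_variables` names used in this section; it is not the library class.

```python
class ToyBufferMemory:
    """Hypothetical stand-in for a buffer memory with a configurable output key."""

    def __init__(self, memory_key="history"):
        self.memory_key = memory_key
        self.turns = []

    def save_context(self, inputs, outputs):
        self.turns.append(f"Human: {inputs['input']}")
        self.turns.append(f"AI: {outputs['output']}")

    def load_memory_variables(self, inputs):
        # The whole conversation is returned under the configured key
        return {self.memory_key: "\n".join(self.turns)}

toy_memory = ToyBufferMemory(memory_key="chat_history")
toy_memory.save_context({"input": "hi"}, {"output": "What's up?"})
memory_variables = toy_memory.load_memory_variables({})

# The prompt must use the same variable name as the memory key
prompt = "Chat History:\n{chat_history}\nFollow Up Input: {question}"
filled_prompt = prompt.format(question="How are you?", **memory_variables)
print(filled_prompt)
```

If the memory returned `history` while the prompt expected `chat_history`, the `format` call would fail with a `KeyError`, which is the mismatch the paragraph above warns about.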

#### Conversation Buffer
This memory stores the messages and then returns them in a single variable.

In [19]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [20]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- it keeps a list of the interactions of the conversation over time. 
- it only uses the last `k` interactions.
- it can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.
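The sliding-window behaviour can be sketched with a bounded `deque`, which discards the oldest exchange once more than `k` are stored. `ToyWindowMemory` is a hypothetical class written for illustration, not the LangChain implementation.

```python
from collections import deque

class ToyWindowMemory:
    """Hypothetical window memory that keeps only the last k exchanges."""

    def __init__(self, k):
        # Each item is one (human, ai) exchange; older items fall off automatically
        self.turns = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, inputs):
        lines = [f"Human: {human}\nAI: {ai}" for human, ai in self.turns]
        return {"history": "\n".join(lines)}

window = ToyWindowMemory(k=1)
window.save_context({"input": "hi"}, {"output": "What's up?"})
window.save_context({"input": "How are you?"}, {"output": "I'm quite good. How about you?"})
window_variables = window.load_memory_variables({})
print(window_variables)
```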

In [21]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- it consists of a `PromptTemplate` and a language model (either an LLM or a chat model).
- it formats the prompt template using the input key values provided (and also memory key values, if available).
- it passes the formatted string to the LLM and returns the LLM output.
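The three steps above can be sketched as a small class. `ToyLLMChain` and `shouting_llm` are hypothetical names; the "model" is just a function that upper-cases its prompt so the example runs without downloading anything.

```python
class ToyLLMChain:
    """Hypothetical minimal chain: format the prompt, call the model, return its text."""

    def __init__(self, llm, template):
        self.llm = llm              # any callable mapping a prompt string to a string
        self.template = template

    def __call__(self, inputs):
        prompt = self.template.format(**inputs)   # fill in the placeholders
        return {**inputs, "text": self.llm(prompt)}

def shouting_llm(prompt):
    """Stand-in for a real language model: returns the prompt in upper case."""
    return prompt.upper()

toy_chain = ToyLLMChain(shouting_llm, "Question: {question}\nAnswer:")
chain_result = toy_chain({"question": "What is RAG?"})
print(chain_result)
```

Returning the inputs together with a `text` key mirrors the shape of the `question_generator` output shown later in this section.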

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [22]:
# %cd ./models
#!git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0

In [23]:
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain import HuggingFacePipeline
import torch

model_id = 'lmsys/fastchat-t5-3b-v1.0'

tokenizer = AutoTokenizer.from_pretrained(
    model_id)


if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

bitsandbyte_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
    bnb_4bit_use_double_quant = True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config = bitsandbyte_config, # 4-bit quantization via bitsandbytes (requires an NVIDIA GPU)
    device_map = 'auto'
)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

llm = HuggingFacePipeline(pipeline = pipe)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
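How these pieces fit together can be outlined in plain Python. This is a sketch of the general flow (condense the question, retrieve, combine), not the library's code; the function `conversational_retrieval` and the stub callables are hypothetical.

```python
def conversational_retrieval(question, chat_history, condense, retrieve, combine):
    """Hypothetical outline of the chain; each step is passed in as a plain callable."""
    # 1. Rewrite a follow-up into a standalone question (skipped without history)
    standalone = condense(chat_history, question) if chat_history else question
    # 2. Fetch documents for the standalone question
    docs = retrieve(standalone)
    # 3. Answer using the retrieved documents
    answer = combine(docs, standalone)
    return {
        "question": question,
        "generated_question": standalone,
        "source_documents": docs,
        "answer": answer,
    }

# Stub components standing in for question_generator, retriever, combine_docs_chain
def condense(history, question):
    return f"{question} (about: {history[-1]})"

def retrieve(query):
    corpus = {"deep": "Deep learning uses neural networks.", "faiss": "FAISS stores vectors."}
    return [text for keyword, text in corpus.items() if keyword in query.lower()]

def combine(docs, query):
    return f"Based on {len(docs)} document(s): {' '.join(docs)}"

follow_up = conversational_retrieval(
    "Explain it", ["What is deep learning"], condense, retrieve, combine
)
first_turn = conversational_retrieval("What is FAISS?", [], condense, retrieve, combine)
print(follow_up["generated_question"])
print(follow_up["answer"])
```

The condensing step matters for follow-ups such as "Explain it", which carry no retrievable keywords on their own.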


`question_generator`

In [24]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [25]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [26]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = False
)

In [27]:
query = 'Compare them'
chat_history = "Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})

{'chat_history': 'Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:',
 'question': 'Compare them',
 'text': '<pad> What  are  the  main  differences  between  Machine  Learning  and  Deep  Learning  AI?\n'}

`combine_docs_chain`

In [28]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = False
)
doc_chain

StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template='Your name is Francis. You are a friendly chatbot designed to answer questions about Phone Myint Naing because you have all information about him.\nProvide gentle and informative answers based on the context:\n\nContext: {context}\n\nQuestion: {question}\n\nAnswer:'), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x0000018AC70A5590>)), document_variable_name='context')

In [29]:
query = "What is Transformers?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})

{'input_documents': [Document(page_content="Invoices. I train models for Optical Character Recognition (OCR) and Text-to-Text\nGeneration in Thai.\nI believe AI’s impact depends entirely on how humans use it. It can be beneficial or\nharmful. While mistakes may happen, I trust that humanity will learn from them and\nimprove AI for the better. I also think AI should represent all cultures to be truly\ninclusive.\nOne of my biggest challenges as a Master's student is time management (I admit, I\ncan be a bit lazy!). My research goal is to develop a Burmese spelling correction\nmodel that works without needing a separate model to classify spelling errors.\nExisting approaches use rule-based methods, statistical models, machine learning,", metadata={'source': 'resume.pdf', 'file_path': 'resume.pdf', 'page': 2, 'total_pages': 3, 'format': 'PDF 1.4', 'title': 'Web developer or engineer who works with both the front and back ends of a website or application. Provide an end-to-end service, and

In [30]:
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=False,
    get_chat_history=lambda h : h
)
chain

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), combine_docs_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template='Your name is Francis. You are a friendly chatbot designed to answer questions about Phone Myint Naing because you have all information about him.\nProvide gentle and informative answers based on the context:\n\nContext: {context}\n\nQuestion: {question}\n\nAnswer:'), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x0000018AC70A5590>)), document_variable_name='context'), question_generator=LLMChain(prompt=PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}

## 5. Chatbot

In [31]:
prompt_question = "Who are you by the way?"
answer = chain({"question":prompt_question})
answer['answer']

'<pad>  I  am  a  chatbot  designed  to  answer  questions  about  Phone  Myint  Naing.  My  name  is  Francis.\n'

In [32]:
answer['chat_history']

[]

In [33]:
prompt_question = "What is your most intreseting research?"
answer = chain({"question":prompt_question})
answer['answer']

'<pad> Phone  Myint  Naing:  My  most  interesting  research  topic  is  developing  a  Burmese  spelling  correction  model  that  works  without  needing  a  separate  model  to  classify  spelling  errors.  This  is  a  challenging  and  exciting  project  that  aims  to  improve  the  accuracy  and  efficiency  of  Burmese  language  technology.  The  goal  is  to  create  a  model  that  can  accurately  identify  and  correct  spelling  errors  in  Burmese  text,  allowing  for  more  efficient  and  effective  language  processing.  The  research  is  also  important  because  it  aims  to  address  a  significant  challenge  in  the  field  of  language  technology  and  contribute  to  the  advancement  of  language  technology.\n'

In [34]:
answer['chat_history']

[HumanMessage(content='Who are you by the way?'),
 AIMessage(content='<pad>  I  am  a  chatbot  designed  to  answer  questions  about  Phone  Myint  Naing.  My  name  is  Francis.\n')]

In [35]:
prompt_question = "How do you know about Phone Myint Naing?"
answer = chain({"question":prompt_question})
answer['answer']

"<pad> Phone  Myint  Naing  is  a  27-year-old  Master's  student  in  Data  Science  and  AI  at  the  Asian  Institute  of  Technology  (Thailand).  He  has  a  background  in  Electronics  and  Communication  Engineering  (Myanmar)  and  has  been  working  as  a  machine  learning  engineer  for  three  years.  Currently,  he  works  as  an  AI  researcher,  focusing  on  building  machine  learning  and  deep  learning  models.  He  has  led  groundbreaking  projects  in  Burmese  machine  translation  and  neural  spell  checking.  He  believes  AI's  impact  depends  entirely  on  how  humans  use  it.  He  also  thinks  AI  should  represent  all  cultures  to  be  truly  inclusive.\n"

In [36]:
answer['chat_history']

[HumanMessage(content='Who are you by the way?'),
 AIMessage(content='<pad>  I  am  a  chatbot  designed  to  answer  questions  about  Phone  Myint  Naing.  My  name  is  Francis.\n'),
 HumanMessage(content='What is your most intreseting research?'),
 AIMessage(content='<pad> Phone  Myint  Naing:  My  most  interesting  research  topic  is  developing  a  Burmese  spelling  correction  model  that  works  without  needing  a  separate  model  to  classify  spelling  errors.  This  is  a  challenging  and  exciting  project  that  aims  to  improve  the  accuracy  and  efficiency  of  Burmese  language  technology.  The  goal  is  to  create  a  model  that  can  accurately  identify  and  correct  spelling  errors  in  Burmese  text,  allowing  for  more  efficient  and  effective  language  processing.  The  research  is  also  important  because  it  aims  to  address  a  significant  challenge  in  the  field  of  language  technology  and  contribute  to  the  advancement  of  lan

In [37]:
prompt_question = "It was a good talk with you"
answer = chain({"question":prompt_question})
answer['answer']

"<pad> I'm  sorry,  but  I  don't  have  enough  information  to  answer  this  question.  Could  you  please  provide  more  context  or  clarify  your  question?\n"

In [38]:
answer['chat_history']

[HumanMessage(content='Who are you by the way?'),
 AIMessage(content='<pad>  I  am  a  chatbot  designed  to  answer  questions  about  Phone  Myint  Naing.  My  name  is  Francis.\n'),
 HumanMessage(content='What is your most intreseting research?'),
 AIMessage(content='<pad> Phone  Myint  Naing:  My  most  interesting  research  topic  is  developing  a  Burmese  spelling  correction  model  that  works  without  needing  a  separate  model  to  classify  spelling  errors.  This  is  a  challenging  and  exciting  project  that  aims  to  improve  the  accuracy  and  efficiency  of  Burmese  language  technology.  The  goal  is  to  create  a  model  that  can  accurately  identify  and  correct  spelling  errors  in  Burmese  text,  allowing  for  more  efficient  and  effective  language  processing.  The  research  is  also  important  because  it  aims  to  address  a  significant  challenge  in  the  field  of  language  technology  and  contribute  to  the  advancement  of  lan

## 6. Generate 10 Answers

In [39]:
questions = ['How old are you?', 
             'What is your highest level of education?', 
             'What major or field of study did you pursue during your education?',
             'How many years of work experience do you have?',
             'What type of work or industry have you been involved in?',
             'Can you describe your current role or job responsibilities?',
             'What are your core beliefs regarding the role of technology in shaping society?',
             'How do you think cultural values should influence technological advancements?',
             'As a master’s student, what is the most challenging aspect of your studies so far?',
             'What specific research interests or academic goals do you hope to achieve during your time as a master’s student?']

In [40]:
outputs = []
for question in questions:
    # Each call also updates the chain's conversation memory
    answer = chain({"question": question})
    outputs.append({
        'question': answer['question'],
        'answer': answer['answer']
    })

In [41]:
outputs

[{'question': 'How old are you?',
  'answer': "<pad>  I  am  a  chatbot,  so  I  don't  have  a  physical  age.  However,  I  am  27  years  old.\n"},
 {'question': 'What is your highest level of education?',
  'answer': "<pad>  I  am  a  Master's  student  in  Data  Science  and  AI  at  the  Asian  Institute  of  Technology  (Thailand).\n"},
 {'question': 'What major or field of study did you pursue during your education?',
  'answer': '<pad>  Electronics  and  Communication  Engineering\n'},
 {'question': 'How many years of work experience do you have?',
  'answer': '<pad>  Phone  Myint  Naing  has  3  years  of  work  experience.\n'},
 {'question': 'What type of work or industry have you been involved in?',
  'answer': '<pad>  I  have  been  involved  in  the  field  of  machine  learning  and  AI.  I  have  been  working  as  a  machine  learning  engineer  for  three  years,  focusing  on  building  machine  learning  and  deep  learning  models.  My  main  project  involves  ext