## Multiple Choice Questions

**1. What is the main difference between Chat models and Text models?** C  
A. Chat models are newer  
B. Chat models return data in a different format than Text models  
C. Chat models can include role prompts in the message  
D. Text models can transmit multiple messages at once  

**2. Which of the following statements about OpenAI Assistant Threads is correct?** B  
A. A Thread can only contain one Assistant message  
B. A Thread automatically manages the context window to ensure it does not exceed the model’s context length limit  
C. A Thread will automatically delete after 24 hours  
D. You cannot add new messages to an already created Thread  

**3. When adding a Function to Assistants, what needs to be provided?** C  
A. The function’s Python code  
B. The function’s JavaScript code  
C. The function’s JSON description  
D. The function’s C++ code  

**4. Which of the following operations is possible when using the Code Interpreter for data analysis?** ABC  
A. Data slicing  
B. Data filtering  
C. Data sorting  
D. Data encryption  

**5. What abilities do large language models have in data analysis?** ABCD  
A. Generate text summaries  
B. Read data  
C. Create visualizations  
D. Generate meaningful insights  

**6. What steps can be included in constructing a local document Q&A system?** ABCDE  
A. Loading: The document loader loads documents into a format that LangChain can read  
B. Splitting: The text splitter splits documents into specified-size segments, i.e., "document blocks" or "document pieces"  
C. Storage: The split "document blocks" are stored in vector databases as "embedded pieces"  
D. Retrieval: The system retrieves the split documents from storage  
E. Synthesis: The question and similar embedded pieces are passed to the language model to generate an answer  
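
The five steps above can be sketched in plain Python. This is a toy illustration, not LangChain's implementation: `split_text`, `embed`, `retrieve`, and `build_prompt` are hypothetical helper names, the "embedding" is a simple bag-of-words count standing in for a real embedding model, and the final prompt would be sent to a language model in a real system.

```python
import math
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Splitting: cut text into fixed-size character chunks that overlap slightly."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Retrieval: return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Synthesis: combine the question and retrieved chunks into one prompt."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Loading: two already-loaded "documents" (made-up sample text)
documents = [
    "Roses need watering twice a week.",
    "Refunds are processed within seven days.",
]
# Storage: keep each chunk next to its embedding
store = [(chunk, embed(chunk)) for doc in documents for chunk in split_text(doc)]

question = "How often do roses need watering?"
top_chunks = retrieve(question, store, k=1)
print(build_prompt(question, top_chunks))
```

A real system would swap `embed` for an embedding model and `store` for a vector database; the flow of data between the steps stays the same.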

**7. Which of the following are the basic principles of prompt engineering mentioned in the document?** ABD  
A. Write clear instructions  
B. Provide reference materials  
C. Increase the model’s training data volume  
D. Divide and simplify  

**8. What does the “Few-Shot Learning” concept in prompt engineering refer to?** B  
A. Learning from a large number of samples  
B. Learning and generalizing new tasks from a very small number of data samples  
C. Learning from only one sample  
D. Learning new tasks without any samples  

**9. If you want to implement few-shot prompting, which prompt template should you choose?** C  
A. PromptTemplate  
B. ChatPromptTemplate  
C. FewShotPromptTemplate  
D. PipelinePromptTemplate  
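
To make the idea concrete, here is a minimal plain-Python sketch of how a few-shot prompt is assembled from a prefix, a list of rendered examples, and a suffix carrying the new input. It only mirrors the concept behind `FewShotPromptTemplate`; the function name `build_few_shot_prompt` and the flower examples are hypothetical and do not come from LangChain.

```python
def build_few_shot_prompt(
    examples: list[dict],
    example_template: str,
    prefix: str,
    suffix: str,
    **inputs: str,
) -> str:
    """Render each example with the template, then join prefix, examples, and suffix."""
    rendered = [example_template.format(**example) for example in examples]
    return "\n\n".join([prefix, *rendered, suffix.format(**inputs)])


# Made-up examples for illustration only
examples = [
    {"flower": "rose", "meaning": "love"},
    {"flower": "lily", "meaning": "purity"},
]

prompt = build_few_shot_prompt(
    examples,
    example_template="Flower: {flower}\nMeaning: {meaning}",
    prefix="Give the meaning of each flower.",
    suffix="Flower: {flower}\nMeaning:",
    flower="tulip",
)
print(prompt)
```

The model sees two worked examples and is left to complete the third, which is the essence of few-shot prompting.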

**10. What specific content should be included in a CoT prompt template?** ABC  
A. Description of the AI’s role and goal  
B. Explanation of the chain of thought  
C. Examples that follow the chain of thought  
D. Search for multiple thinking paths  

**11. Scenario: You are building a smart writing assistant that can generate an article based on user-provided prompts. You are using GPT-3 as the language model. The output of GPT-3 is raw text, and you need to convert it into structured JSON format so that your application can further process it.**  
**Question: In this scenario, which LangChain output parser would you choose?** AC  
A. PydanticOutputParser  
B. XMLOutputParser  
C. StructuredOutputParser  
D. CommaSeparatedListOutputParser  
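
What such a parser does can be sketched with the standard library alone: locate the JSON object in the raw model output, load it, and map it onto a typed structure. This is an illustrative sketch rather than LangChain's parser code; the `Article` fields and the sample output string are assumptions made for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical target structure for the writing assistant."""
    title: str
    body: str


def parse_article(raw: str) -> Article:
    """Extract the outermost JSON object from raw model text and validate its fields."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    # A KeyError here signals that the model omitted a required field
    return Article(title=str(data["title"]), body=str(data["body"]))


# Made-up model output with chatter around the JSON payload
raw_output = (
    "Sure! Here is the article:\n"
    '{"title": "Spring Roses", "body": "Roses bloom in spring."}\n'
    "Hope this helps."
)
article = parse_article(raw_output)
print(article.title)
```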

**12. Which type of memory can both remember summaries of conversations from many turns ago and retain the raw content from the most recent turns?** D  
A. ConversationBufferMemory  
B. ConversationBufferWindowMemory  
C. ConversationSummaryMemory  
D. ConversationSummaryBufferMemory  
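
The behaviour described in the question can be sketched as follows. This is a simplified model of the idea, not LangChain's class: the real memory prunes by a token limit and asks an LLM to write the summary, whereas this sketch prunes by turn count and uses `naive_summarize`, a hypothetical stand-in that keeps the first five words of each old turn.

```python
def naive_summarize(summary: str, turn: str) -> str:
    """Hypothetical stand-in for an LLM summarizer: keep the first five words."""
    short = " ".join(turn.split()[:5])
    return f"{summary}; {short}" if summary else short


class SummaryBufferSketch:
    """Keeps recent turns verbatim and folds older turns into a running summary."""

    def __init__(self, max_recent_turns: int = 2, summarize=naive_summarize):
        self.max_recent_turns = max_recent_turns
        self.summary = ""
        self.recent: list[str] = []
        self._summarize = summarize

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        # Move the oldest turns into the summary once the buffer overflows
        while len(self.recent) > self.max_recent_turns:
            oldest = self.recent.pop(0)
            self.summary = self._summarize(self.summary, oldest)

    def context(self) -> str:
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier turns: {self.summary}")
        parts.extend(self.recent)
        return "\n".join(parts)


# Made-up conversation for illustration
turns = [
    "Human: My sister loves tulips and her birthday is in May",
    "AI: Tulips are a lovely choice for a May birthday",
    "Human: What do they cost?",
    "AI: A bouquet starts at 20 dollars",
]
memory = SummaryBufferSketch(max_recent_turns=2)
for turn in turns:
    memory.add_turn(turn)
print(memory.context())
```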

**13. In what aspect does GPT-4 have significant capability improvements compared to previous GPT models?** D  
A. English text processing  
B. Coding tasks  
C. Audio processing  
D. Image recognition  

**14. The working principle of RAG can be summarized as which of the following steps?** ABC  
A. Retrieval: The model first uses a retrieval system to find relevant documents or paragraphs from a large document collection for the given input (question)  
B. Context Encoding: After finding relevant documents or paragraphs, the model encodes them along with the original input (question)  
C. Generation: The model generates an output (answer) using the encoded context information  
D. Check: Use the search function to verify the accuracy of the final information  

**15. Which type of agent is considered a simulation agent (role-playing in a simulated environment, attempting to simulate specific scenarios or behaviors)?** C  
A. AutoGPT  
B. BabyAGI  
C. CAMEL  
D. Hugging

**16. Which of the following statements is correct?** ACD  
A. AutoGPT can automatically link multiple tasks to achieve a large goal  
B. BabyAGI is characterized by integrating multimodal perception capabilities to handle multiple AI tasks  
C. HuggingGPT selects appropriate expert models from Hugging Face to execute tasks  
D. AutoGPT, BabyAGI, and HuggingGPT are all self-driven (autonomous) agents  

**17. When fine-tuning using the OpenAI API, how is the fine-tuning job cost estimated?** B  
A. Basic cost per 1k Tokens × Token count in the input file  
B. Basic cost per 1k Tokens × Token count in the input file × Number of training Epochs  
C. Token count in the input file × Number of training Epochs  
D. Token count in the input file * Number of training Epochs  
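
The estimate in option B can be written as a small function. The price used below is a made-up placeholder, not an actual OpenAI rate, and `estimate_fine_tune_cost` is a hypothetical helper name.

```python
def estimate_fine_tune_cost(tokens_in_file: int, epochs: int, price_per_1k_tokens: float) -> float:
    """Cost = (tokens / 1000) x price per 1k tokens x number of epochs."""
    return tokens_in_file / 1000 * price_per_1k_tokens * epochs


# Hypothetical numbers: 100,000 training tokens, 3 epochs, 0.008 per 1k tokens
cost = estimate_fine_tune_cost(tokens_in_file=100_000, epochs=3, price_per_1k_tokens=0.008)
print(round(cost, 2))
```

Because every epoch passes over the whole file again, doubling the epochs doubles the estimate.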

**18. What is the main effect achieved by quantization technology?** C  
A. Improve model accuracy  
B. Increase model parameter size  
C. Reduce model size  
D. Increase training time  
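
A minimal sketch of why quantization shrinks a model: storing each weight as an 8-bit integer plus one shared scale takes a quarter of the space of 32-bit floats, at the cost of a small rounding error. This shows simple symmetric int8 quantization on made-up weights; production schemes (per-channel scales, 4-bit formats, and so on) are more elaborate.

```python
import struct


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # made-up weights
quantized, scale = quantize_int8(weights)

# Compare storage: 4 bytes per float32 weight vs 1 byte per int8 weight
fp32_bytes = struct.pack(f"{len(weights)}f", *weights)
int8_bytes = struct.pack(f"{len(quantized)}b", *quantized)
print(len(fp32_bytes), len(int8_bytes))

# The round trip is close to, but not exactly, the original values
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(max_error <= scale)
```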

**19. Which architecture does the Sora model use?** C  
A. U-Net  
B. GAN  
C. Diffusion Transformer (DiT)  
D. RNN  

**20. Which of the following are examples of multimodal interaction tasks?** ABCD  
A. Text to speech  
B. Speech to text  
C. Text to image  
D. Image to video  

## Practical Questions

**Project**: Internal Employee Knowledge Base Q&A System.

**Project Description**: "Floral" is a large-scale online flower sales platform with its own business processes and standards, as well as Standard Operating Procedure (SOP) manuals for employees. Relevant information is shared during new employee onboarding training. However, this information is scattered across various internal websites and directories of the HR department, making it inconvenient to access at times. Additionally, employees may struggle to find the desired content promptly due to lengthy documents, and sometimes, company policies are updated while employees still have outdated document versions.

To address these needs, we will develop a "Doc-QA" system based on various internal knowledge manuals.

This question-and-answer system will understand employees' inquiries and provide precise answers based on the latest employee manuals.

**Prepared Data:**

Internal data includes various files in PDF, Word, and TXT formats. These have already been provided.

LangChain  
1. Data Sources  
2. LLM app  
3. Use-Cases


In [None]:
# Install required packages (uncomment to run)
# !pip install -U langchain langchain-community langchain-openai openai
# !pip install faiss-cpu unstructured pdfminer.six pypdf python-docx python-dotenv


In [12]:
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

from langchain_community.document_loaders import PyPDFLoader, UnstructuredWordDocumentLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

base_path = 'docs'
all_docs = []  # Collects the documents loaded from every supported file

# Loop through files and load based on file extension
for file in os.listdir(base_path):
    file_path = os.path.join(base_path, file)

    if file.endswith(".pdf"):
        loader = PyPDFLoader(file_path)
    elif file.endswith(".docx"):
        loader = UnstructuredWordDocumentLoader(file_path)
    elif file.endswith(".txt"):
        loader = TextLoader(file_path, encoding="utf-8")
    else:
        continue  # Skip unsupported file types

    docs = loader.load()
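    all_docs.extend(docs)

# Split the documents into 500-character chunks with a 50-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = splitter.split_documents(all_docs)

# Embed the chunks and index them in a FAISS vector store
embedding_model = OpenAIEmbeddings(openai_api_key=api_key)
vectorstore = FAISS.from_documents(split_docs, embedding_model)

# Build a retrieval QA chain that also returns the source chunks it used
llm = ChatOpenAI(model_name="gpt-4", openai_api_key=api_key)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever(), return_source_documents=True)

# Persist the index so it can be reloaded later without re-embedding
vectorstore.save_local("flower_doc_qa_index")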
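
The cell above builds the chain but never asks it a question. As a hedged sketch of the missing last step: with `return_source_documents=True`, the chain's response is a dictionary holding the answer under `"result"` and the retrieved chunks under `"source_documents"`. The helper `format_answer` below is a hypothetical name, and the response is faked with made-up text and file names so the block runs without an API key; the commented line shows how the real chain would be called.

```python
from types import SimpleNamespace


def format_answer(response: dict) -> str:
    """Render the answer followed by the distinct source files it drew on."""
    sources = sorted(
        {doc.metadata.get("source", "unknown") for doc in response.get("source_documents", [])}
    )
    lines = [response["result"]]
    if sources:
        lines.append("Sources: " + ", ".join(sources))
    return "\n".join(lines)


# With the chain from the cell above (requires an OpenAI API key):
# response = qa_chain.invoke({"query": "What is the refund policy?"})

# Stand-in response with made-up content, shaped like the chain's output
fake_response = {
    "result": "Refunds are processed within seven days.",
    "source_documents": [
        SimpleNamespace(metadata={"source": "docs/handbook.pdf"}),
        SimpleNamespace(metadata={"source": "docs/faq.txt"}),
        SimpleNamespace(metadata={"source": "docs/handbook.pdf"}),
    ],
}
print(format_answer(fake_response))
```

Listing the sources alongside the answer lets employees check which manual the answer came from.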
