**Project**: Flower Secret Realm: RAG-Powered Internal Q&A System with LangChain & Flask

**Project Description**: "Floral" is a large online flower sales platform with its own business processes, standards, and Standard Operating Procedure (SOP) manuals for employees. This material is covered during new-employee onboarding, but it is scattered across the HR department's internal websites and directories, which makes it hard to look up later. Long documents make it slow to find a specific answer, and when company policies change, employees may still be working from outdated versions.

To address these problems, we will build a "Doc-QA" system on top of the company's internal knowledge manuals.

The system interprets an employee's question, retrieves the relevant passages from the latest manuals, and generates an answer grounded in those passages.

**Prepared Data:**

The internal data consists of files in PDF, Word, and TXT formats: an employee handbook (PDF), an operations guide (Word), and a guide to flower language (TXT).
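
Since the data arrives in three formats, each file has to be routed to a matching loader. The sketch below shows one way to plan that routing by file extension. The `LOADER_BY_SUFFIX` mapping and the `plan_loaders` helper are hypothetical names introduced for illustration; the loader class names are the ones imported later in this notebook, stored here as plain strings so the sketch runs without LangChain installed.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the name of the LangChain
# loader class used later in this notebook (kept as strings on purpose).
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".docx": "UnstructuredWordDocumentLoader",
    ".txt": "TextLoader",
}


def plan_loaders(paths):
    """Return (path, loader_name) pairs for supported files, skipping the rest."""
    plan = []
    for p in paths:
        suffix = Path(p).suffix.lower()  # case-insensitive extension match
        if suffix in LOADER_BY_SUFFIX:
            plan.append((str(p), LOADER_BY_SUFFIX[suffix]))
    return plan


files = [
    "document/Flower_Employee_Handbook.pdf",
    "document/Flower_Operations_Guide.docx",
    "document/The_Complete_Guide_to_Flower_Language.txt",
]
for path, loader_name in plan_loaders(files):
    print(f"{path} -> {loader_name}")
```

Unsupported extensions are silently skipped in this sketch; a production version would probably log them instead.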

**LangChain:**

1. Data Sources
2. LLM application
3. Use Cases

In [2]:
import getpass
import os

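# Prompt for the API key only if it is not already set in the environment
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

# Chat model used to generate the final answers
llm = init_chat_model("gpt-4o-mini", model_provider="openai")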



In [3]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

In [4]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

In [5]:
from langchain import hub
from langchain_core.documents import Document
# Document loaders live in langchain_community in current LangChain releases
from langchain_community.document_loaders import PyPDFLoader, UnstructuredWordDocumentLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load the PDF, Word, and TXT sources, then split them into chunks

pdf_loader = PyPDFLoader("document/Flower_Employee_Handbook.pdf")
pdf_docs = pdf_loader.load()

word_loader = UnstructuredWordDocumentLoader("document/Flower_Operations_Guide.docx")
word_docs = word_loader.load()

txt_loader = TextLoader("document/The_Complete_Guide_to_Flower_Language.txt", encoding="utf-8")
txt_docs = txt_loader.load()

docs = pdf_docs + word_docs + txt_docs

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")
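
To make the `chunk_size=1000, chunk_overlap=200` settings above more concrete, here is a simplified illustration of overlapping chunks. This is a plain fixed-width sliding window, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper written only for this sketch.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-width chunks where neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Tiny example: windows of 4 characters, each sharing 2 with the previous one
print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.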



In [6]:
# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    # Fetch the chunks most similar to the question from the vector store
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    # Join the retrieved chunks into one context string and fill the RAG prompt
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
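
The `retrieve` step relies on `similarity_search`, which conceptually embeds the question and returns the closest chunks. The toy sketch below shows that ranking idea with hand-written 2-D vectors. It assumes cosine similarity and a cutoff of `k=4`; both are assumptions for illustration, so check the vector store documentation for the actual metric and default. `cosine` and `top_k` are hypothetical helpers, not LangChain functions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=4):
    """Return the texts of the k indexed (text, vector) pairs closest to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up vectors standing in for real embeddings
indexed = [
    ("leave policy chunk", [1.0, 0.0]),
    ("reward points chunk", [0.0, 1.0]),
    ("mixed chunk", [1.0, 1.0]),
]
print(top_k([1.0, 0.0], indexed, k=2))
```

A real embedding has hundreds or thousands of dimensions, but the ranking logic is the same.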

In [7]:
result = graph.invoke({"question": "Tell me the Marriage Leave policy?"})

print(f'Context: {result["context"]}\n\n')
print(f'Answer: {result["answer"]}')

Context: [Document(id='a7c74ee8-8d24-4c4f-9e24-ba1b2c147003', metadata={'producer': '', 'creator': 'WPS 文字', 'creationdate': '2024-02-20T10:33:29+02:33', 'author': 'doc2pdf', 'comments': '', 'company': '', 'keywords': '', 'moddate': '2024-02-20T10:33:29+02:33', 'sourcemodified': "D:20240220103329+02'33'", 'subject': '', 'title': '', 'trapped': '/False', 'source': 'document/Flower_Employee_Handbook.pdf', 'total_pages': 32, 'page': 22, 'page_label': '23'}, page_content="Fresh Flower Group\n3. Sick Leave: Employees requiring sick leave must\nprovide medical certificates and receipts from county-\nlevel or higher hospitals or designated hospitals of the\ncompany. They must inform their department leader and HR\ndepartment orally or in writing before the start of their\nshift. During sick leave, employees will receive 80% of\ntheir basic salary.\n4. Marriage Leave: Employees are entitled to 3 days of\nmarriage leave. Conditions for late marriage (women aged\n23, men aged 25) and first marri

In [8]:
result = graph.invoke({"question": "How many points I can get if I actively cooperate with supervisors?"})

print(f'Context: {result["context"]}\n\n')
print(f'Answer: {result["answer"]}')

Context: [Document(id='1aae51b6-fda1-42c1-925a-a4c88c6f39ff', metadata={'producer': '', 'creator': 'WPS 文字', 'creationdate': '2024-02-20T10:33:29+02:33', 'author': 'doc2pdf', 'comments': '', 'company': '', 'keywords': '', 'moddate': '2024-02-20T10:33:29+02:33', 'sourcemodified': "D:20240220103329+02'33'", 'subject': '', 'title': '', 'trapped': '/False', 'source': 'document/Flower_Employee_Handbook.pdf', 'total_pages': 32, 'page': 24, 'page_label': '25'}, page_content='education, public criticism, probation, dismissal, etc.\n2. Reward Criteria:\n2.1. Those who make significant contributions to the\ncompany\'s operations, management, and service quality\nimprovement are awarded 5-10 points.\n2.2. Employees who perform significantly in management and\nservice work, receive praise from customers and relevant\ndepartments, are verbally praised for 5 points, and receive a\nwritten commendation for an additional 10 points.\n2.3. Those who provide constructive suggestions for improving\nthe co

In [10]:
from flask import Flask, render_template, request

app = Flask(__name__)

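@app.route("/", methods=["GET", "POST"])
def index():
    # On GET, render the empty form; on POST, answer the submitted question
    result = None
    if request.method == "POST":
        question = request.form.get("question")
        if question:
            # Run the retrieve -> generate graph defined above
            output = graph.invoke({"question": question})
            result = {"context": output["context"], "result": output["answer"]}
    # The template receives `result`, which stays None until a question is submitted
    return render_template("index.html", result=result)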

if __name__ == "__main__":
    app.run(debug=False, port=5001)

 * Serving Flask app '__main__'
 * Debug mode: off
 * Running on http://127.0.0.1:5001
Press CTRL+C to quit
127.0.0.1 - - [16/Feb/2025 21:21:46] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [16/Feb/2025 21:21:46] "GET /static/flower.jpg HTTP/1.1" 404 -
127.0.0.1 - - [16/Feb/2025 21:21:46] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [16/Feb/2025 21:22:01] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [16/Feb/2025 21:22:01] "GET /static/flower.jpg HTTP/1.1" 404 -
127.0.0.1 - - [16/Feb/2025 21:22:57] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [16/Feb/2025 21:22:57] "GET /static/flower.jpg HTTP/1.1" 404 -
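
The `context` handed to the template is a list of `Document` objects, and the cell outputs earlier show how long their printed form is. One option is to condense the context to source and page labels before rendering. This is a minimal sketch, assuming each document exposes a `metadata` dict with `source` and (for PDFs) `page_label` keys, as in the handbook output above. `summarize_sources` is a hypothetical helper, not part of LangChain or Flask, and `SimpleNamespace` stands in for `Document` so the sketch runs on its own.

```python
from types import SimpleNamespace


def summarize_sources(docs):
    """Return unique 'source, p. N' strings in retrieval order."""
    seen = []
    for doc in docs:
        meta = getattr(doc, "metadata", None) or {}
        source = meta.get("source", "unknown source")
        label = meta.get("page_label")  # absent for non-paginated sources
        entry = f"{source}, p. {label}" if label else source
        if entry not in seen:
            seen.append(entry)
    return seen


# Stand-in documents; the metadata keys mirror the PDF output shown earlier
demo_docs = [
    SimpleNamespace(metadata={"source": "document/Flower_Employee_Handbook.pdf", "page_label": "23"}),
    SimpleNamespace(metadata={"source": "document/Flower_Employee_Handbook.pdf", "page_label": "23"}),
    SimpleNamespace(metadata={"source": "document/Flower_Operations_Guide.docx"}),
]
print(summarize_sources(demo_docs))
```

In the Flask view, the returned list could be passed to the template alongside the answer, for example under an additional key in `result`.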
