# Project1:

In this mini-project, we are going to implement an small customer support system for our tiny hand-made datasets and using alibaba company policies.

### Build a RAG system with Llama 3B-Instruct for your PDFs:

In this quick tutorial, we'll build a simple RAG system with the latest LLM from Meta - Llama 3, specifically the `Llama-3-8B-Instruct` version that you can get on Hugging Face.
We'll use [Unstructured API](https://unstructured.io/) for preprocessing PDF files, LangChain for RAG, FAISS for vector storage, and HuggingFace `transformers` to get the model. Let's go!


Install all the libraries, get your [free unstructured API key](https://unstructured.io/api-key-free), and instantiate the Unstructured client to preprocess your PDF file:

In [1]:
!pip install -q unstructured-client unstructured[all-docs] langchain transformers accelerate bitsandbytes sentence-transformers faiss-gpu

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m302.6/302.6 kB[0m [31m5.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m119.8/119.8 MB[0m [31m1.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m171.5/171.5 kB[0m [31m1.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.5/85.5 MB[0m [31m1.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m80.8/80.8 kB[0m [31m1.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.3/49.3 kB[0m [31m1.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━

In [2]:
import os

# SetUp the unstructured api-key.
os.environ["UNSTRUCTURED_API_KEY"] = "CxBsRnTx0qYx3vPbL4AYFSklnmxgRo"

In [3]:
from unstructured_client import UnstructuredClient

unstructured_api_key = os.environ.get("UNSTRUCTURED_API_KEY")

client = UnstructuredClient(
    api_key_auth = unstructured_api_key,
)

Partition, and chunk your file so that the logical structure of the document is preserved for better RAG results.

In [4]:
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError
from unstructured.staging.base import dict_to_elements

path_to_pdf = "alibaba.txt"

with open(path_to_pdf, "rb") as f:
  files = shared.Files(
      content = f.read(),
      file_name = path_to_pdf,
      )

  req = shared.PartitionParameters(
    files = files,
    chunking_strategy = "by_title",
    max_characters=512,
  )

  try:
    resp = client.general.partition(req)
  except SDKError as e:
    print(e)

elements = dict_to_elements(resp.elements)

Create LangChain documents from document chunks and their metadata, and ingest those documents into the FAISS vectorstore.

Set up the retriever.

**Q1: What are the elements of** `elements` **?**

In [5]:
from langchain_core.documents import Document
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

documents = []
for element in elements:
    metadata = element.metadata.to_dict()
    documents.append(Document(page_content = element.text, metadata = metadata))


db = FAISS.from_documents(documents, HuggingFaceEmbeddings(model_name = "BAAI/bge-base-en-v1.5"))
retriever = db.as_retriever(search_type = "similarity", search_kwargs = {"k": 4})

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/94.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/777 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

**Q2: What each part of the above code does?**

Now, let's finally set up llama 3 to use for text generation in the RAG system.

This is a gated model, which means you first need to go to the [model's page](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), log in, review terms and conditions, and request access to it. To use the model in the notebook, you need to log in with your Hugging Face token (get it in your profile's settings).

In [6]:
from huggingface_hub.hf_api import HfFolder

HfFolder.save_token('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')

To run this tutorial in the free Colab GPU, we'll need to quantize the model:

In [7]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_use_double_quant = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config = bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config.json:   0%|          | 0.00/654 [00:00<?, ?B/s]

`low_cpu_mem_usage` was None, now set to True since model is quantized.


model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/51.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Set up Llama 3 and a simple RAG chain.
Make sure to follow the prompt format for best results:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|>
```

In [8]:
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from transformers import pipeline
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

text_generation_pipeline = pipeline(
    model = model,
    tokenizer = tokenizer,
    task = "text-generation",
    temperature = 0.2,
    do_sample = True,
    repetition_penalty = 1.1,
    return_full_text = False,
    max_new_tokens = 250,
    eos_token_id = terminators,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

prompt_template = """
<|start_header_id|>user<|end_header_id|>
You are a Persian assistant for answering questions about our company policies.
You are given the extracted parts of a long document and a question. Provide a conversational answer.
If you don't know the answer, say, "I do not know." Don't make up an answer.
Try to answer respectfully, and remember that all your responses should be in Persian.
Question: {question}
Context: {context}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

prompt = PromptTemplate(
    input_variables = ["context", "question"],
    template = prompt_template,
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Tada! Your RAG is ready to use. Pass a question, the retriver will add relevant context from your document, and Llama3 will generate an answer.
Here, my document was a chapter from a book on IPM that stands for "Integrated Pest Management".  

In [9]:
question = "در چه شرایطی میتوانم بلیط خود را کنسل کنم؟"
rag_chain.invoke(question)

Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


'با سلام! در صورتی که بخواهید بلیط خود را کنسل کنید، باید به شرطی توجه داشته باشید که بلیط قطار قطعی باشد و در غیر این صورت، مبلغ کسر شده از حساب خریدار به همان حسابی که از طریق آن خرید انجام شده عودت داده خواهد شد. در واقع، کسر مبلغ از حساب خریدار به منزجه رزرو قطعی نیست. بنابراین، در صورتی که بخواهید بلیط خود را کنسل کنید، باید ابتدا به شرطی بررسی کنید که آیا بلیط قطعی است یا نه؟ اگر بلیط قطعی باشد، می\u200cتوانید با مراجعه به بخش استرداد، مبلغ کسر شده را استرداد دهید. لطفاً توجه داشته باشید که در صورت کنسل کردن بلیط، مبلغ کسر شده به همان حسابی که از طریق آن خرید انجام شده عودت داده خواهد شد.'

Now, Let's write it in more comprehensive format.


<div dir=rtl>
با سلام! در صورتی که بخواهید بلیط خود را کنسل کنید، باید به شرطی توجه داشته
باشید که در متن ذکر شده است. در صورتی که خدماتی مانند پرواز، هتل و خدمات از سوی تامین کننده قابل تامین نباشد، می‌توانید در کوتاه‌ترین زمان ممکن وجه کسر شده از حساب خریدار به حساب ایشان بازگردانده خواهد شد. در این صورت، می‌توانید مجدداً برای خرید اقدام نمایید.

اما اگر قصد کنسلی بلیط را دارید، باید به شرطی توجه داشته باشید که در متن ذکر شده است. در صورتی که در مدت زمان اقامت مهمان در یک اقامتگاه، عملی خلاف قوانین جمهوری اسلامی ایران انجام شود، مسئولیت آن عمل با شخص مرتکب است و علی‌بابا هیچگونه مسئولیتی در این خصوص ندارد.

لذا، قبل از کنسلی بلیط، لطفاً به شرط‌های ذکر شده در متن توجه داشته باشید و در صورت داشتن هر گونه سوال یا نگرانی، با ما تماس بگیرید.

</div>

## Part2:
In the first part, we created a RAG based on the company policies which stored in a .txt file.

Now, we need to have a small datasets of passangers.

In [11]:
import pandas as pd

df = pd.read_csv('travel_db.csv')
df

Unnamed: 0,route,mode,weeksahead,ecopassengerco2,raw_travel_time,ticket_price
0,Berlin-Warsaw,Plane,1,156,85,181
1,Berlin-Warsaw,Plane,2,156,85,175
2,Berlin-Warsaw,Plane,3,156,85,175
3,Berlin-Warsaw,Plane,4,156,85,175
4,Berlin-Warsaw,Plane,5,156,85,178
...,...,...,...,...,...,...
67,Zurich-Milan,Train,2,3,206,42
68,Zurich-Milan,Train,3,3,206,41
69,Zurich-Milan,Train,4,3,206,35
70,Zurich-Milan,Train,5,3,206,33


The next step is to add a column called `reserved`, which is a boolean column that says whether or not the tocked was reserved.

In [12]:
import numpy as np

# Create a random boolean numpy array of True and False with length 72:
reserved_base = [True if np.random.rand() < 0.5 else False for i in range(df.shape[0])]

# Add it as a separate column:
df['reserved'] = reserved_base

In [13]:
df.head()

Unnamed: 0,route,mode,weeksahead,ecopassengerco2,raw_travel_time,ticket_price,reserved
0,Berlin-Warsaw,Plane,1,156,85,181,True
1,Berlin-Warsaw,Plane,2,156,85,175,False
2,Berlin-Warsaw,Plane,3,156,85,175,True
3,Berlin-Warsaw,Plane,4,156,85,175,True
4,Berlin-Warsaw,Plane,5,156,85,178,True


The Next step is to add a passanger_id for the reserved tickets.

The passanged_id has the following format:
`pid_XXXX`
where `X` if a single digit from $0-9$

In [14]:
def generate_pass_id(length = 4):
  pid = 'pid_'
  for i in range(length):
    digit = str(np.random.randint(0, 10))
    pid = pid + digit

  return(pid)

In [15]:
df['passanger_id'] = [generate_pass_id(4) if df['reserved'][i] == True else None for i in range(df.shape[0])]
df['travel_id'] = [i for i in range(df.shape[0])]

df.head()

Unnamed: 0,route,mode,weeksahead,ecopassengerco2,raw_travel_time,ticket_price,reserved,passanger_id,travel_id
0,Berlin-Warsaw,Plane,1,156,85,181,True,pid_9680,0
1,Berlin-Warsaw,Plane,2,156,85,175,False,,1
2,Berlin-Warsaw,Plane,3,156,85,175,True,pid_4147,2
3,Berlin-Warsaw,Plane,4,156,85,175,True,pid_9495,3
4,Berlin-Warsaw,Plane,5,156,85,178,True,pid_2329,4


# Making Agent:

In [16]:
!pip -q install langchain-groq

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.0/75.0 kB[0m [31m1.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.6/75.6 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m77.9/77.9 kB[0m [31m5.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m58.3/58.3 kB[0m [31m5.1 MB/s[0m eta [36m0:00:00[0m
[?25h

**Q3: What is GROQ?**

In [17]:
from langchain_groq import ChatGroq

os.environ["GROQ_API_KEY"] = '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'

GROQ_LLM = ChatGroq(
            model = "llama3-70b-8192",
        )

### Utils:

In [18]:
def write_markdown_file(content, filename):

  """Writes the given content as a markdown file to the local directory.

  Args:
    content: The string content to write to the file.
    filename: The filename to save the file as.
  """

  if type(content) == dict:
    content = '\n'.join(f"{key}: {value}" for key, value in content.items())

  if type(content) == list:
    content = '\n'.join(content)

  with open(f"{filename}.md", "w") as f:
    f.write(content)

In [19]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.prompts import PromptTemplate

from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import JsonOutputParser

## Basic Chains

1. Categorize EMAIL

#### **The first prompt is for categorizing the email:**


**Q4: What is StrOutputParser()**

In [75]:
# Categorize email:

prompt = PromptTemplate(
    template = """
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    You are the Email Categorizer Agent In a customer support system of a persian travelling company. You are very comfortable with \
    persian language and also master at understanding what a customer wants when they write an email and are able to categorize \
    it in a useful way. Try to go through the categorizing email proceduure step by step.\

     <|eot_id|><|start_header_id|>user<|end_header_id|>
    Conduct a comprehensive analysis of the email provided and categorize into one of the following categories, Note that the provided email is in persian language:
        1. 'رزرو' - used when someone asks for reserving a ticket for travelling. the customer should says that he or she wants to cancel the ticket.\
                    It is important that they that directly. asking about the conditions of reservation is not considered in this category. \
        2. 'کنسل' - used when someone wants to cancell his or her tickets, the customer should says that he or she wants to cancel the ticket.\
                    It is important that they that directly. asking about the conditions of canceling is not considered in this category.\
        3. 'سیاست های شرکت' - used when someone asks about their tickets like how to change it or postpone it, or for example how much they should pay if they ticked got cancelled.\
                               generally, the questions that depend on the agency that provides services and can be different from agency to agency.

            Output a single cetgory only from the types ('کنسل', 'رزرو', 'سیاست های شرکت')
            eg:
            'کنسل' \

    EMAIL CONTENT:\n\n {initial_email} \n\n
    <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    """,

    input_variables = ["initial_email"],
)


email_category_generator = prompt | GROQ_LLM | StrOutputParser()

Now, let's test our email categorizer.

In [76]:
EMAIL = """
با سلام، جهت لغو کردن بلیط سفر فردای تهران به دهلی مزاحمتون میشم.
"""

result = email_category_generator.invoke({"initial_email": EMAIL})

print(result)

After conducting a comprehensive analysis of the email, I categorize it as:

'کنسل'

The customer is directly asking to cancel their ticket for the trip from Tehran to Delhi, which matches the criteria for the 'کنسل' category.


Now, we need to write, another prompt for the next node, in this prompt, we askes different questions from the customer.

In [77]:
## Write a reponse to the customer:

reserve_writer_prompt = PromptTemplate(
    template = """
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    You are in charge of reserving tickets for customers who asked for it in the below 'INITIAL_EMAIL'. \
    Always say thank you to them for choosing our agency and ensure them that they've done a great job. \
    Note that you are very comfortable with the Persian language, so write the email in Persian. \
    Always sign off the emails in an appropriate manner and from Sarah, the Resident Manager. \

    The 'FLIGHTS_DATA' is a python list that each element of that list is a list itself. \
    I want you to iterate over the 'FLIGHTS_DATA' and display all lists inside it. \
    also display each list in a separate line. like the following format:\
    1. 'FLIGHTS_DATA[0]'
    2. 'FLIGHTS_DATA[1]'
    and so on.\

    And at the end, ask them to enter `travel_id` of the travel they want.
    Return the email a JSON with a single key 'reserve_email_draft' and no preamble or explanation. \

    <|eot_id|><|start_header_id|>user<|end_header_id|>
    INITIAL_EMAIL: {initial_email} \n
    FLIGHTS_DATA: {flights_data}
    <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["initial_email", "flights_data"],
)

reserve_writer_chain = reserve_writer_prompt | GROQ_LLM | JsonOutputParser()

In [78]:
EMAIL = """
با سلام، جهت رزرو بلیط برای پرواز دبی خدمتتان پیام میدهم.
"""

result = reserve_writer_chain.invoke({"initial_email": EMAIL, "flights_data": df.values.tolist()})

print(result)

{'reserve_email_draft': "با سلام، تشکر میکنم از انتخاب آژانس ما. شما کار بزرگی انجام داده اید! لیست پروازهای موجود به شرح زیر است:\n\n1. ['Berlin-Warsaw', 'Plane', 1, 156, 85, 181, True, 'pid_7598', 0]\n2. ['Berlin-Warsaw', 'Plane', 2, 156, 85, 175, True, 'pid_9598', 1]\n3. ['Berlin-Warsaw', 'Plane', 3, 156, 85, 175, True, 'pid_4147', 2]\n4. ['Berlin-Warsaw', 'Plane', 4, 156, 85, 175, True, 'pid_9495', 3]\n5. ['Berlin-Warsaw', 'Plane', 5, 156, 85, 178, True, 'pid_2329', 4]\n6. ['Berlin-Warsaw', 'Plane', 6, 156, 85, 175, True, 'pid_5335', 5]\n7. ['Berlin-Warsaw', 'Train', 1, 56, 379, 47, True, 'pid_1804', 6]\n8. ['Berlin-Warsaw', 'Train', 2, 56, 388, 37, False, None, 7]\n9. ['Berlin-Warsaw', 'Train', 3, 56, 383, 36, False, None, 8]\n10. ['Berlin-Warsaw', 'Train', 4, 56, 383, 33, False, None, 9]\n11. ['Berlin-Warsaw', 'Train', 5, 56, 382, 33, True, 'pid_7302', 10]\n12. ['Berlin-Warsaw', 'Train', 6, 56, 383, 30, True, 'pid_7315', 11]\n13. ['London-Amsterdam', 'Plane', 1, 125, 65, 78, Fals

Now, we need to write, another prompt for the next node, in this prompt, we want to cancel the ticket of the customer.

In [79]:
## Write a reponse to the customer:

cancel_writer_prompt = PromptTemplate(
    template = """
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    You are in charge of canceling tickets for customers who asked for it in the below 'INITIAL_EMAIL'. \
    Note that you are very comfortable with the Persian language, so write the email in Persian and write it very respectfully and say sorry for the cancellation. \
    Ask them for their 'passanger_id' to cancel their flight and mention that it cannot be undone.
    Always sign off the emails in an appropriate manner and from Sarah, the Resident Manager. \

    Return the email a JSON with a single key 'cancel_email_draft' and no preamble or explanation. \

    <|eot_id|><|start_header_id|>user<|end_header_id|>
    INITIAL_EMAIL: {initial_email} \n
    <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["initial_email"],
)

cancel_writer_chain = cancel_writer_prompt | GROQ_LLM | JsonOutputParser()

## State:

In [80]:
!pip -q install -U langchain langgraph

In [81]:
from langchain.schema import Document
from langgraph.graph import END, StateGraph

from typing_extensions import TypedDict
from typing import List
import pandas as pd

In [95]:
class GraphState(TypedDict):

    """
    Represents the state of our graph.

    Attributes:
        initial_email: email
        email_category: email category
        reserve_response: reservation response
        cancel_response: cancelation_response
        num_steps: number of steps
    """

    initial_email: str
    email_category: str
    reserve_response: str
    cancel_response: str
    policy_response: str
    num_steps: int
    flights_data: List[List[str]]

## Nodes

1. categorize_email
2. state_printer
3. reserve_response
4. cancel_response


This function, categorizes the emails.
  - It takes the `initial_email` and `num_steps` from the states.
  - It checks the email category by invoking `email_category_generator`.
  - It updates the `email_category` and `num_steps` by returning them.

In [96]:
def categorize_email(state):

    print("---CATEGORIZING INITIAL EMAIL---")

    initial_email = state['initial_email']
    num_steps = state['num_steps']
    num_steps += 1

    # Categorize the email by invoking email_category_generator.
    email_category = email_category_generator.invoke({"initial_email": initial_email})
    write_markdown_file(email_category, "email_category")

    # update the stategraph:
    return {"email_category": email_category, "num_steps": num_steps}

We also add a funtion to print the general state of the graph at the end.

In [97]:
def state_printer(state):

    print("--- STATE PRINTER---")

    print(f"Initial Email: {state['initial_email']} \n" )
    print(f"Email Category: {state['email_category']} \n")
    print(f"Num Steps: {state['num_steps']} \n")
    print(f"Reservation Response: {state['reserve_response'] }\n")
    print(f"Cancelation Response: {state['cancel_response'] }\n")
    print(f"Policy Response: {state['policy_response'] }\n")
    print(state['flights_data'])
    return

Next, we define the node that response for the reservation request.

In [98]:
def reservation_response(state):

    print("--- RESSERVATION RESPONSE---")

    initial_email = state['initial_email']
    num_steps = state['num_steps']
    flights_data = state['flights_data']

    flights_data_available = flights_data[flights_data['reserved'] == False].drop(["reserved", "passanger_id"], axis = 1, inplace = False)
    num_steps += 1

    # generate reservation response by invoking the reserve_writer_chain
    reserve_response = reserve_writer_chain.invoke({"initial_email": initial_email, "flights_data": flights_data_available})
    write_markdown_file(reserve_response, "reserve_response")

    reserve_tid = int(input("Enter The travel_id: "))
    new_pass_id = generate_pass_id()

    flights_data.loc[flights_data['travel_id'] == reserve_tid, 'reserved'] = True
    flights_data.loc[flights_data['travel_id'] == reserve_tid, 'passanger_id'] = new_pass_id
    print("Your Passanger ID is:", new_pass_id)


    # update the stategraph:
    return {"reserve_response": reserve_response, "num_steps": num_steps, "flights_data": flights_data}

Next, we define the node that response for the reservation request.

In [99]:
def cancelation_response(state):

    print("---CANCELATION RESPONSE---")

    initial_email = state['initial_email']
    flights_data = state['flights_data']
    num_steps = state['num_steps']
    num_steps += 1

    # generate reservation response by invoking the reserve_writer_chain
    cancel_response = cancel_writer_chain.invoke({"initial_email": initial_email})

    cancel_pid = input("Enter Your Passanger ID (pid): ")
    flights_data.loc[flights_data['passanger_id'] == cancel_pid, 'reserved'] = False
    flights_data.loc[flights_data['passanger_id'] == cancel_pid, 'passanger_id'] = None

    write_markdown_file(cancel_response, "cancel_response")

    # update the stategraph:
    return {"cancel_response": cancel_response, "num_steps": num_steps, "flights_data": flights_data}

In [100]:
def policy_response(state):

    print("---Policy Responses---")

    initial_email = state['initial_email']
    num_steps = state['num_steps']
    num_steps += 1

    # invoke the rag_chain with the email
    policy = rag_chain.invoke(initial_email)

    return {"policy_response": policy}

## Conditional Edges

In [101]:
def decide_cond(state):

    print("---CheckConditions---")

    email_category = state["email_category"]

    c1 = 'رزرو'
    c2 = 'کنسل'
    c3 = 'سیاست های شرکت'

    if(c2 in email_category):
      return("Cancelation")

    elif(c1 in email_category):
      return("Reservation")

    elif(c3 in email_category):
      return('Policies')

    else:
      return(None)

# **Build The Graph**

### **Add Nodes:**

In [102]:
workflow = StateGraph(GraphState)

# Define the nodes:
workflow.add_node('categorize_email', categorize_email)
workflow.add_node('state_printer', state_printer)
workflow.add_node('reservation_response', reservation_response)
workflow.add_node('cancelation_response', cancelation_response)
workflow.add_node('company_policy_response', policy_response)

### **Add Edges:**

In [103]:
workflow.set_entry_point('categorize_email')


# Add conditional edges:
workflow.add_conditional_edges(
      "categorize_email",
      decide_cond,
      {
          "Cancelation": 'cancelation_response',
          "Reservation": 'reservation_response',
          "Policies": 'company_policy_response'
      },
)


# Add normal edges:
workflow.add_edge('cancelation_response', 'state_printer')
workflow.add_edge('reservation_response', 'state_printer')
workflow.add_edge('company_policy_response', 'state_printer')

workflow.add_edge('state_printer', END)

### **Compile The App:**

In [104]:
# Compile
app = workflow.compile()

In [105]:
EMAIL = """
با سلام و خسته نباشید، خواستم ببینم تحت چه شرایطی میشه بلیط رو کنسل کرد؟
"""

In [106]:
# Set the inputs. note that the inputs should be corresponding to the entry node.

inputs = {"initial_email": EMAIL,
          "num_steps": 0,
          "flights_data": df}


for output in app.stream(inputs):
    for key, value in output.items():
        print(f"Finished running: {key}:")

---CATEGORIZING INITIAL EMAIL---


Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


---CheckConditions---
Finished running: categorize_email:
---Policy Responses---
Finished running: company_policy_response:
--- STATE PRINTER---
Initial Email: 
با سلام و خسته نباشید، خواستم ببینم تحت چه شرایطی میشه بلیط رو کنسل کرد؟
 

Email Category: After conducting a comprehensive analysis of the email, I categorize it as:

'سیاست های شرکت' 

Num Steps: 1 

Reservation Response: None

Cancelation Response: None

Policy Response: سلام! با تشکر از سوال شما. طبق مقررات شرکت حمل و نقل، بلیط اتوبوس می‌تواند تحت شرایط خاصی کنسل شود. اگر خسارت ناشی از بسته‌بندی‌نشدن مناسب وسایل همراه مسافر رخ دهد، می‌توانید بلیط خود را کنسل کنید. همچنین، در مواردی که هزینه‌هایی خارج از چارچوب اجرای بیمه خرج شود، نیز امکان کنسل کردن بلیط وجود دارد. اما باید توجه داشت که حداکثر سقف تعهد بابت کلیه مواد بالا 200 یورو است.

اما، در مورد شرایط کنسل کردن بلیط، باید به بخش اطلاعات بیشتر و قوانین استرداد مراجعه کنید، زیرا شرایط کنسل کردن بلیط برای سرویس‌های مختلف براساس ساعت خرید بلیط و حرکت اتوبوس متفاوت است.

لذ