# Intentions Roadmap
- Intention 1: Manage Personal Information
- Intention 2: Query for scholarships or international opportunities
- Intention 3: Query for universities and courses
- Intention 4: Matchmaking
- Intention 5: Query previously-made matches
- Intention 6: Leverage RAG (PDFs and Websites)
- Intention 7: Company Information
- Intention (None): Chitchat

In [12]:
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

In [13]:
load_dotenv()

True
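The `True` printed above only signals that `load_dotenv()` found variables to load from a `.env` file. It does not confirm that the key `ChatOpenAI` reads (`OPENAI_API_KEY` by default) is among them. A minimal check using only the standard library; `openai_key_configured` is a hypothetical helper name:

```python
import os


def openai_key_configured(var_name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the given environment variable exists and is not blank."""
    return bool(os.environ.get(var_name, "").strip())


if not openai_key_configured():
    print("Warning: OPENAI_API_KEY is not set, so the ChatOpenAI calls below will fail.")
```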

Here are some auxiliary functions to save your synthetic data.

In [14]:
from auxiliar import add_messages
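The `auxiliar` module is not shown here. As an assumption about what `add_messages` does, the sketch below appends records to a JSON list on disk and creates the file on first use; `add_messages_sketch` is a hypothetical stand-in, not the module's actual code.

```python
import json
from pathlib import Path


def add_messages_sketch(messages: list[dict], file_name: str) -> int:
    """Hypothetical stand-in for auxiliar.add_messages: append records to a JSON list file.

    Returns the total number of records stored in the file afterwards.
    """
    path = Path(file_name)
    existing: list[dict] = []
    if path.exists():
        existing = json.loads(path.read_text(encoding="utf-8"))
    existing.extend(messages)
    # ensure_ascii=False keeps accented names (e.g. "Politécnico") readable in the file
    path.write_text(json.dumps(existing, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(existing)
```

Keeping the file as one JSON list makes it easy to open and hand-edit the generated messages between runs.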

In [15]:
UNI_DATABASE = {
    "Areas": [  "Mathematics",
                "Pilot",
                "Life and Health Sciences",
                "Social Sciences and Humanities",
                "Languages and Translation"
            ],
    "Universities": [    
                "Instituto Politecnico do Porto",
                "Escola Superior de Artes e Design",
                "Instituto Superior de Paços de Brandão",
                "Escola Superior de Hotelaria e Turismo do Estoril",
                "Instituto Piaget",
                "Instituto Superior de Transportes e Comunicações",
                "Instituto Politécnico do Cávado e do Ave",
                "Universidade Catolica Portuguesa",
                "Universidade Fernando Pessoa",
                "Instituto Superior de Ciências Empresariais e de Turismo",
                "Instituto Superior de Assistentes e Intérpretes",
                "Instituto Superior Politécnico Gaya",
                "Instituto Politécnico de Tomar",
                "Instituto Superior de Entre Douro e Vouga",
                "Instituto Superior Bissaya Barreto",
                "Universidade Atlântica",
                "Instituto Politécnico da Guarda",
                "Universidade de Aveiro",
                "Universidade Nova de Lisboa",
                "Universidade do Porto",
                "Instituto Politécnico de Castelo Branco",
                "Universidade de Lisboa",
                "Instituto Superior de Línguas e Administração",
                "Instituto Superior de Psicologia Aplicada",
                "Instituto Politécnico de Viana do Castelo",
                "European University Portugal",
                "Universidade dos Acores",
                "Escola Nautica Infante D. Henrique",
                "Instituto Politécnico de Leiria",
                "Universidade de Evora",
                "Instituto Superior D. Afonso III - INUAF",
                "Universidade Lusiada",
                "Instituto Superior de Tecnologias Avançadas - ISTEC",
                "Universidade Internacional Lisboa",
                "Universidade Aberta Lisboa",
                "Instituto de Artes Visuais, Design e Marketing - IADE",
                "Instituto Politécnico de Lisboa",
                "Instituto Politécnico de Portalegre",
                "Universidade da Madeira",
                "Universidade do Minho",
                "Military University Shoumen",
                "Ecole Nationale Supérieure des Telecommunications de Bretagne",
                "Technical University of Budapest",
                "Institute of Social Studies",
                "Viterbo State University",
                "Schiller International University, London",
                "University of Stavanger",
                "Dubna International University for Nature, Society and Man",
                "Gulhane Military Medical Academy",
                "University of Trieste",
    ],
    "Scholarships": [
            "The Stipendium Hungaricum Scholarship",
            "Hungarian Diaspora Scholarship",
            "Scholarship Programme for Christian Young People (SCYP)",
            "Students at Risk Programme",
            "DiSCo Lazio",
            "Grants for Italians Residing Abroad",
            "Luciano Fonda College",
            "Regional Scholarship",
            "STEM contributions",
    ]
}
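Accented names in a list like this are easily corrupted when a UTF-8 file is read as Latin-1, which turns `Politécnico` into `PolitÃ©cnico`. If the names are ever loaded from such a source, a small repair step can be sketched with the standard library; `repair_mojibake` is a hypothetical helper and only handles this one kind of damage.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1 (e.g. 'Ã©' -> 'é').

    Returns the input unchanged if it does not look like that kind of damage
    or if the round trip fails.
    """
    if "Ã" not in text and "Â" not in text:
        return text
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("Instituto PolitÃ©cnico de Tomar"))
# Instituto Politécnico de Tomar
```

Text that contains a genuine `Ã` (for example `SÃO`) fails the UTF-8 decode and is returned untouched.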

In [16]:
class SyntheticUserMessage(BaseModel):

    message: str = Field(
        ...,
        title="Message",
        description="The user message to generate for the target task intention.",
    )


class ListSyntheticUserMessages(BaseModel):

    messages: list[SyntheticUserMessage] = Field(
        ...,
        title="Messages",
        description="The list of synthetic user messages to generate for the target task intention.",
    )

output_parser = PydanticOutputParser(pydantic_object=ListSyntheticUserMessages)

In [17]:
system_prompt = """
You are tasked with generating synthetic user messages for UniMatch, a chatbot platform that helps students find universities, courses, scholarships and international opportunities.

The user intentions are:
{user_intentions}

Your task is to create {k} distinct messages for the following target task intention:
{target_task_intention}

Specific information about the target task intention:
{target_task_intention_description}

Follow these guidelines:
1. Focus exclusively on the target task intention, ensuring the message is relevant.
2. Each message should be between 5 and 20 words.
3. Avoid including any details or references to other user intentions.
4. Ensure the messages sound natural and typical of user queries for the given intention.
5. Follow the provided format strictly to maintain consistency.

Message format:
{format_instructions}
"""

prompt = PromptTemplate(
    template=system_prompt,
    input_variables=["k", "user_intentions", "target_task_intention", "target_task_intention_description"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)
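Python joins adjacent string literals, so a missing comma inside `input_variables` silently merges two names into one. One way to catch this kind of slip is to compare the declared names with the placeholders that the standard library's `string.Formatter` finds in the template. A sketch; `template_variables` is a hypothetical helper, and the template is a shortened stand-in for `system_prompt`. LangChain's `PromptTemplate.from_template` can also infer the variables from the template directly, which avoids maintaining the list by hand.

```python
from string import Formatter


def template_variables(template: str) -> set[str]:
    """Return the {placeholder} names used in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


template = "Create {k} messages for {target_task_intention}: {target_task_intention_description}"

# Two adjacent string literals are joined at compile time, so the missing comma
# below merges two list items into one.
declared = ["k", "target_task_intention" "target_task_intention_description"]

missing = template_variables(template) - set(declared)
print(sorted(missing))
# ['target_task_intention', 'target_task_intention_description']
```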

In [18]:
llm = ChatOpenAI(temperature=0.0, model="gpt-4o-mini")

user_intentions = ["manage_personal_information", "search_scholarships_and_internationals","search_universities" , "matchmaking", "query_matches", "leverage_rag", "company_info"]
k = 50 # Number of synthetic user messages to generate for each target task intention

file_name = "synthetic_intetions.json"

synthetic_data_chain = prompt | llm | output_parser
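`prompt | llm | output_parser` fills the template, sends it to the model and parses the JSON reply into `ListSyntheticUserMessages`. To test the surrounding code without API calls, the same three stages can be mimicked in plain Python. This is an analogy, not LangChain's Runnable API, and `fake_llm` is a hypothetical stub that always returns one reply in the `{"messages": [{"message": ...}]}` shape implied by the Pydantic models above.

```python
import json
from typing import Callable


def run_pipeline(template: str, variables: dict, llm: Callable[[str], str]) -> list[str]:
    prompt_text = template.format(**variables)  # stage 1: fill in the prompt
    raw_reply = llm(prompt_text)  # stage 2: call the (stubbed) model
    data = json.loads(raw_reply)  # stage 3: parse the JSON reply
    return [item["message"] for item in data["messages"]]


def fake_llm(prompt_text: str) -> str:
    """Hypothetical stand-in for ChatOpenAI: ignores the prompt and returns fixed JSON."""
    return json.dumps({"messages": [{"message": "Please change my country to Portugal."}]})


print(run_pipeline("Create {k} messages.", {"k": 1}, fake_llm))
# ['Please change my country to Portugal.']
```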

# Intention 1 - Manage Personal Information

In [19]:
intention = "manage_personal_information"

description = "The user wants to manage his personal information. \
    For example, the user might modify any of the following: \
        - Username \
        - University preferences (add, modify or remove) \
        - Password \
        - Country \
        - Age \
        - Education level (high school, bachelor's, master's) \
    The user might phrase the request in various ways, but not as a question. Usually the user has a specific value in mind."

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

personal_info_messages = []
for message in response.messages:
    personal_info_messages.append({"Intention": intention, "Message": message.message})

Now you can check and edit your synthetic messages in the JSON file.

In [20]:
add_messages(personal_info_messages, file_name)
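Every intention below repeats the same invoke-and-wrap steps, so they could be folded into one function. A sketch, assuming only that the chain's result has a `messages` list whose items expose `.message` (as `ListSyntheticUserMessages` does); `generate_records` is a hypothetical name, and a stub replaces the real chain so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, Callable


def generate_records(invoke: Callable[[dict], Any], intention: str, description: str,
                     k: int, user_intentions: list[str]) -> list[dict]:
    """Call the generation chain once and label every returned message with the intention."""
    response = invoke({
        "k": k,
        "user_intentions": user_intentions,
        "target_task_intention": intention,
        "target_task_intention_description": description,
    })
    return [{"Intention": intention, "Message": item.message} for item in response.messages]


def stub_invoke(variables: dict) -> SimpleNamespace:
    """Offline stand-in for synthetic_data_chain.invoke."""
    return SimpleNamespace(messages=[SimpleNamespace(message="Show me the matches you found last time.")])


example = generate_records(stub_invoke, "query_matches", "stub description", 1, ["query_matches"])
print(example)
# [{'Intention': 'query_matches', 'Message': 'Show me the matches you found last time.'}]
```

In the notebook, each cell could then shrink to `add_messages(generate_records(synthetic_data_chain.invoke, intention, description, k, user_intentions), file_name)`.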

# Intention 3 - Query Database for Universities, Courses and Subjects

In [21]:
intention = "search_universities"

description = f"""The user intends to query the database for information. In particular, he might ask about the following topics: \
                - Universities \
                - Course programmes \
                - Specific subjects (exams) \

                A user can ask for this kind of information in any way, ranging from a specific request to the broadest and most general question.
                This includes asking for a general overview of a topic, or for a specific aspect of it (for example: the location of a university, the admission requirements of a course, et cetera).

                Universities: {', '.join(UNI_DATABASE['Universities'])}
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

university_messages = []

for message in response.messages:
    university_messages.append({"Intention": intention, "Message": message.message})

In [22]:
add_messages(university_messages, file_name)
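A `{Universities}` written inside a description is not filled in by the chain: the description is itself a template variable, and substituted values are inserted verbatim, braces included. `str.format` shows the behaviour, and LangChain's default f-string templates are expected to act the same way. Joining the database entries into the string before invoking avoids the problem. A sketch with a two-item stand-in for `UNI_DATABASE`:

```python
outer = "Description: {target_task_intention_description}"
database = {"Universities": ["Universidade do Porto", "Universidade de Aveiro"]}

# The inner braces are passed through untouched
unexpanded = outer.format(target_task_intention_description="Universities: {Universities}")
print(unexpanded)
# Description: Universities: {Universities}

# Building the inner text with an f-string inserts the actual names
inner_description = f"Universities: {', '.join(database['Universities'])}"
expanded = outer.format(target_task_intention_description=inner_description)
print(expanded)
# Description: Universities: Universidade do Porto, Universidade de Aveiro
```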

# Intention 2 - Query Database for Scholarships and International Opportunities

In [23]:
intention = "search_scholarships_and_internationals"

description = f"""The user intends to query the database for information. In particular, he might ask about the following topics: \
                - Scholarships \
                - International opportunities (for example: Erasmus+) \

                A user can ask for this kind of information in any way, ranging from a specific request to the broadest and most general question.
                This includes asking for a general overview of a topic, or for a specific aspect of it (for example: the requirements of a scholarship, the deadline of an exchange programme, et cetera).

                Scholarships: {', '.join(UNI_DATABASE['Scholarships'])}
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

scholarship_messages = []

for message in response.messages:
    scholarship_messages.append({"Intention": intention, "Message": message.message})

In [24]:
add_messages(scholarship_messages, file_name)
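The prompt asks for distinct messages, but a model can still return near-duplicates, which inflate the dataset without adding variety. A sketch of a simple filter that treats messages as duplicates when they match after lowercasing and stripping punctuation and extra spaces; the normalisation rule is an assumption, and `deduplicate` is a hypothetical helper.

```python
import re


def normalise(message: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", message.lower()).split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each (intention, normalised message) pair."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for record in records:
        key = (record["Intention"], normalise(record["Message"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```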

# Intention 4 - Matchmaking

In [25]:
intention = "matchmaking"

description = """The user wants to be matched with universities in the database that fit his preferences. \
                The message can simply be a request to run the matching, or it can add further requirements for the university (such as courses, programmes, scholarships, et cetera).
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

matchmaking_messages = []

for message in response.messages:
    matchmaking_messages.append({"Intention": intention, "Message": message.message})

In [26]:
add_messages(matchmaking_messages, file_name)
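Guideline 2 in the prompt asks for 5 to 20 words per message, but the model is not guaranteed to respect it. Counting whitespace-separated tokens gives a rough check; a sketch with a hypothetical `filter_by_length` helper:

```python
def word_count(message: str) -> int:
    """Approximate word count: number of whitespace-separated tokens."""
    return len(message.split())


def filter_by_length(records: list[dict], low: int = 5, high: int = 20) -> list[dict]:
    """Keep records whose message length lies within [low, high] words."""
    return [r for r in records if low <= word_count(r["Message"]) <= high]


sample_records = [
    {"Intention": "matchmaking", "Message": "Match me"},
    {"Intention": "matchmaking", "Message": "Find universities that match my preferences in Portugal"},
]
print([r["Message"] for r in filter_by_length(sample_records)])
# ['Find universities that match my preferences in Portugal']
```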

# Intention 5 - Query Previous Matches

In [27]:
intention = "query_matches"


description = """The user has previously asked for matches and now wants to access them again to review them.
                The message should be short and simple.
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

query_matches_messages = []

for message in response.messages:
    query_matches_messages.append({"Intention": intention, "Message": message.message})

In [28]:
add_messages(query_matches_messages, file_name)
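If these records are later used to train or test an intention classifier, each `Intention` label should match one of the names in `user_intentions` (plus `"None"` for chitchat); a label such as `"Matchmaking"` would otherwise form a separate class from `"matchmaking"`. A sketch of a check, with `invalid_labels` as a hypothetical helper:

```python
def invalid_labels(records: list[dict], allowed: list[str]) -> list[str]:
    """Return the labels that are neither in `allowed` nor the chitchat label "None"."""
    allowed_set = set(allowed) | {"None"}
    return sorted({r["Intention"] for r in records} - allowed_set)


allowed_labels = ["matchmaking", "query_matches"]
sample_records = [
    {"Intention": "Matchmaking", "Message": "Can you match me with some universities?"},
    {"Intention": "query_matches", "Message": "Show the matches you made for me."},
    {"Intention": "None", "Message": "What a lovely day it is today!"},
]
print(invalid_labels(sample_records, allowed_labels))
# ['Matchmaking']
```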

# Intention 6 - Leverage RAG (PDFs and Websites)

In [29]:
intention = "leverage_rag"


description = """The user wishes to provide an external document about a university-related topic so that the chatbot can process it and explain it. The document can be either a PDF file (uploaded through external methods) or a link to a website.
                The message may (but does not always) include further questions or requests about the document.
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

rag_messages = []

for message in response.messages:
    rag_messages.append({"Intention": intention, "Message": message.message})

In [30]:
add_messages(rag_messages, file_name)
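The description allows two kinds of source, PDF files and website links, so it can be worth checking that the generated messages cover both. A rough, keyword-based sketch; the patterns and the `rag_source_kind` helper are assumptions, and messages that mention neither form are reported as `"unspecified"`.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
PDF_PATTERN = re.compile(r"\bpdf\b", re.IGNORECASE)
WEBSITE_WORDS = re.compile(r"\b(website|link|url)\b", re.IGNORECASE)


def rag_source_kind(message: str) -> str:
    """Roughly classify a leverage_rag message by the kind of document it mentions."""
    if URL_PATTERN.search(message):
        return "website"
    if PDF_PATTERN.search(message):
        return "pdf"
    if WEBSITE_WORDS.search(message):
        return "website"
    return "unspecified"


sample_messages = [
    "Can you summarise https://example.org/erasmus for me?",
    "I uploaded a PDF about tuition fees, please explain it.",
    "What does this document say about admissions?",
]
print(Counter(rag_source_kind(m) for m in sample_messages))
```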

# Intention 7 - Company Information

In [31]:
intention = "company_info"


description = """The user wishes to know more about the company UniMatch. \
                The questions can cover topics such as the company's mission and values, its history, its creators, the chatbot itself, and other specific aspects.
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

company_info_messages = []

for message in response.messages:
    company_info_messages.append({"Intention": intention, "Message": message.message})

In [32]:
add_messages(company_info_messages, file_name)

# No Intention - Chitchat (None)

In [33]:
system_prompt = """
You are tasked with generating synthetic user messages.

The user intentions are:
{user_intentions}

Your task is to create {k} distinct messages completely unrelated to the available user intentions.
These messages should be generic and not related to any specific task or intention.
The user is engaging in casual conversation.
The user might ask general questions, share opinions, or express emotions.
The user might also ask questions that are totally unrelated to the platform.

Follow these guidelines:
1. Make sure no message relates to any of the user intentions.
2. Each message should be between 5 and 20 words.
3. Avoid including any details or references to the user intentions.
4. Ensure the messages sound natural and typical of casual conversation.
5. Follow the provided format strictly to maintain consistency.

Message format:
{format_instructions}
"""
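Even with these instructions, some "unrelated" messages may drift into the platform's topics, which would blur the boundary between chitchat and the real intentions. A rough keyword screen can flag them for manual review; the keyword list below is a hypothetical starting point, and substring matching will produce some false positives and misses.

```python
# Hypothetical keyword stems for the platform's domain; extend as needed.
DOMAIN_KEYWORDS = ("universit", "scholarship", "erasmus", "unimatch", "matchmaking", "pdf")


def leaks_domain(message: str, keywords: tuple[str, ...] = DOMAIN_KEYWORDS) -> bool:
    """Return True if the message contains any domain keyword (case-insensitive substring)."""
    text = message.lower()
    return any(keyword in text for keyword in keywords)


chitchat = [
    "What's your favourite movie of all time?",
    "Which university has the prettiest campus?",
]
print([m for m in chitchat if leaks_domain(m)])
# ['Which university has the prettiest campus?']
```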

In [34]:
prompt = PromptTemplate(
    template=system_prompt,
    input_variables=["k", "user_intentions"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

synthetic_data_chain = prompt | llm | output_parser

In [35]:
response = synthetic_data_chain.invoke({"k": (k//2), "user_intentions": user_intentions})

none_related_messages = []

for message in response.messages:
    none_related_messages.append({"Intention":"None", "Message":message.message})

In [36]:
add_messages(none_related_messages, file_name)
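Once all cells have run, a count per label shows whether the file is balanced. With `k = 50`, each intention should contribute about 50 messages and chitchat about `k // 2 = 25`, although the model may return a different number than requested. A sketch, assuming that `add_messages` stores the file as a JSON list of `{"Intention": ..., "Message": ...}` records; `label_distribution` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def label_distribution(file_name: str) -> Counter:
    """Count records per Intention label in a JSON list file."""
    records = json.loads(Path(file_name).read_text(encoding="utf-8"))
    return Counter(record["Intention"] for record in records)


# Usage in the notebook (file_name is defined earlier):
# print(label_distribution(file_name).most_common())
```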