# Intentions Roadmap
- Intention 1: Manage Personal Information
- Intention 2: Query for scholarships or international opportunities
- Intention 3: Query for universities and courses
- Intention 4: Matchmaking
- Intention 5: Query previously-made matches
- Intention 6: Leverage RAG (PDFs and Websites)
- Intention 7: Company Information
- Intention (None): Chitchat

In [1]:
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

Here are some auxiliary functions to save your synthetic data.

In [3]:
from auxiliar import add_messages
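The `auxiliar` module itself is not shown in this notebook. As a rough idea of what a helper like `add_messages` might do, here is a minimal sketch, assuming the target file holds one flat JSON list of records. The name `add_messages_sketch` and the file layout are assumptions for illustration, not the actual helper.

```python
import json
from pathlib import Path


def add_messages_sketch(messages, file_name):
    """Append records to a JSON file that stores one flat list (assumed layout)."""
    path = Path(file_name)
    # Start from the existing list if the file is already there.
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    existing.extend(messages)
    path.write_text(json.dumps(existing, indent=2, ensure_ascii=False), encoding="utf-8")
    # Return the new total so the caller can see how large the file has grown.
    return len(existing)
```

Appending rather than overwriting lets each intention's cell add to the same file, which matches how the notebook calls the helper once per intention.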

In [4]:
class SyntheticUserMessage(BaseModel):

    message: str = Field(
        ...,
        title="Message",
        description="The user message to generate for the target task intention.",
    )


class ListSyntheticUserMessages(BaseModel):

    messages: list[SyntheticUserMessage] = Field(
        ...,
        title="Messages",
        description="The list of synthetic user messages to generate for the target task intention.",
    )

output_parser = PydanticOutputParser(pydantic_object=ListSyntheticUserMessages)

In [5]:
system_prompt = """
You are tasked with generating synthetic user messages for a chatbot platform called UniMatch, which specializes in universities and courses.

The user intentions are:
{user_intentions}

Your task is to create {k} distinct messages for the following target task intention:
{target_task_intention}

Specific information about the target task intention:
{target_task_intention_description}

Follow these guidelines:
1. Focus exclusively on the target task intention, ensuring the message is relevant.
2. Each message should be between 5 and 20 words.
3. Avoid including any details or references to other user intentions.
4. Ensure the messages sound natural and typical of user queries for the given intention.
5. Follow the provided format strictly to maintain consistency.

Message format:
{format_instructions}
"""

prompt = PromptTemplate(
    template=system_prompt,
    input_variables=["k", "user_intentions", "target_task_intention", "target_task_intention_description"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)
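A common slip when listing template variables is a missing comma between two string literals, which Python silently concatenates into a single name. The sketch below is a stdlib-only check, independent of whatever validation LangChain performs, that compares the `{placeholders}` in a template with a declared list. The helper names and the demo template are hypothetical.

```python
from string import Formatter


def template_placeholders(template):
    """Return the set of {placeholder} names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_variables(template, declared):
    """Placeholders that appear in the template but not in the declared names."""
    return template_placeholders(template) - set(declared)


demo_template = "Create {k} messages for {target_task_intention}: {target_task_intention_description}"
# The two adjacent literals below have no comma, so they merge into one string.
declared = ["k", "target_task_intention" "target_task_intention_description"]
print(sorted(missing_variables(demo_template, declared)))
# → ['target_task_intention', 'target_task_intention_description']
```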

In [6]:
llm = ChatOpenAI(temperature=0.0, model="gpt-4o-mini")

user_intentions = ["manage_personal_information", "search_scholarships_and_internationals", "search_universities", "matchmaking", "query_matches", "leverage_rag", "company_info"]
k = 50 # Number of synthetic user messages to generate for each target task intention

file_name = "synthetic_intetions.json"

synthetic_data_chain = prompt | llm | output_parser

# Intention 1 - Manage Personal Information

In [7]:
intention = "manage_personal_information"

description = """The user wants to manage their personal information or wants the bot to describe it.
    For example, the user might modify or access the following information:
        - Username
        - User preferences about universities (add, modify, or remove)
        - Password
        - Country
        - Age
        - Education level (high school, bachelor's, master's)
    The user might express the request in various ways, from a general request to a specific one.

    Make half of the messages requests to modify information, and the other half requests to access it.

    Examples:
    I want to change my username to <...>
    Can you change my password to <...>?
    Would it be possible to modify my age to <...>?
    Change Country to Italy
    Modify my user preferences; now I prefer to study in big cities and to study in a friendly environment.
    Can you tell me about myself?
    What are my preferences?
    Who am I?
    What is my age?"""

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

order_status_messages = []
for message in response.messages:
    order_status_messages.append({"Intention":intention, "Message":message.message})
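The prompt asks for distinct messages of 5 to 20 words, but a model will not necessarily follow that exactly. A minimal sketch of a post-filter, assuming plain whitespace word counting and case-insensitive duplicate detection are good enough; `filter_messages` is a hypothetical helper, not part of the notebook.

```python
def filter_messages(messages, min_words=5, max_words=20):
    """Keep unique messages whose whitespace-separated word count is within bounds."""
    seen = set()
    kept = []
    for text in messages:
        # Collapse repeated whitespace and ignore case when looking for duplicates.
        normalized = " ".join(text.split()).lower()
        n_words = len(normalized.split())
        if normalized in seen or not (min_words <= n_words <= max_words):
            continue
        seen.add(normalized)
        kept.append(text)
    return kept
```

It could be applied to `[m.message for m in response.messages]` before building the records. Note that it would also drop very short messages such as "Who am I?", which may or may not be what you want.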

Now you can check and edit your synthetic messages in a JSON file.

In [8]:
add_messages(order_status_messages, file_name)

# Intention 3 - Query Database for Universities, Courses and Subjects

In [11]:
intention = "search_universities"

description = """The user intends to query the database for information. In particular, they might ask about the following topics:
                - Universities
                - Course programmes
                - Specific subjects (exams)

                The user can phrase the request in any way, ranging from a very specific question to the broadest and most general one.
                The request may ask for a general overview of a topic or for a specific aspect of it (for example: the requirements of a course programme, the location of a university, et cetera).

                Examples:
                Can you recommend me some good universities in <...>?
                Do you have any universities with courses in <...>?
                Find me some universities specializing in <...>
                Search for courses with topics in <...>
                Are there courses with subjects in <...>?
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

create_order_messages = []

for message in response.messages:
    create_order_messages.append({"Intention":intention, "Message":message.message})

In [12]:
add_messages(create_order_messages, file_name)
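Each intention repeats the same invoke-then-collect pattern, so it can be wrapped in one function. The sketch below assumes only that the chain exposes `.invoke(dict)` and returns an object with a `.messages` list whose items have a `.message` attribute, as in the cells above. `generate_records` and `FakeChain` are hypothetical names, and `FakeChain` exists only so the example runs without an API call.

```python
from types import SimpleNamespace


def generate_records(chain, intention, description, user_intentions, k):
    """Invoke a chain once and return records shaped like the notebook's dicts."""
    response = chain.invoke({
        "k": k,
        "user_intentions": user_intentions,
        "target_task_intention": intention,
        "target_task_intention_description": description,
    })
    return [{"Intention": intention, "Message": m.message} for m in response.messages]


class FakeChain:
    """Stand-in for the real chain so the sketch runs offline."""

    def invoke(self, inputs):
        texts = [
            f"message {i} for {inputs['target_task_intention']}"
            for i in range(inputs["k"])
        ]
        return SimpleNamespace(messages=[SimpleNamespace(message=t) for t in texts])


records = generate_records(
    FakeChain(), "search_universities", "demo description", ["search_universities"], k=2
)
```

With the real chain, the call would take `synthetic_data_chain` in place of `FakeChain()`, and the label stored in each record is guaranteed to match the one sent to the prompt.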

# Intention 2 - Query Database for Scholarships and International Opportunities

In [13]:
intention = "search_scholarships_and_internationals"

description = """The user intends to query the database for information. In particular, they might ask about the following topics:
                - Scholarships
                - International opportunities (example: Erasmus+)

                The user can phrase the request in any way, ranging from a very specific question to the broadest and most general one.
                The request may ask for a general overview of a topic or for a specific aspect of it (for example: the requirements of a scholarship, the location of a university, et cetera).

                Examples:
                Do you have any scholarships for <...>?
                Do you know of any universities with scholarships in <...>?
                Can you find me universities with international opportunities in <...>?
                Are there any international opportunities in <...>?
                Can you find scholarships for <...>?
                Find some universities with scholarships at <...>
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

create_order_messages = []

for message in response.messages:
    create_order_messages.append({"Intention":intention, "Message":message.message})

In [14]:
add_messages(create_order_messages, file_name)

# Intention 4 - Matchmaking

In [15]:
intention = "matchmaking"


description = """The user is interested in getting matches between their university preferences and the universities available in the database.
                The message can simply be a request to make the matches, or it can contain further specifications for the university (such as courses, programmes, scholarships, et cetera).

                Examples:
                I want to make matches
                I want you to find some matches for universities with courses in Mathematics
                Could you make some university matches for me? If possible, include universities in Italy
                Make matches
                Can you do some matches of universities?
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

product_information_messages = []

for message in response.messages:
    product_information_messages.append({"Intention":intention, "Message":message.message})

In [16]:
add_messages(product_information_messages, file_name)

# Intention 5 - Query Matches


In [17]:
intention = "query_matches"


description = """The user has previously asked to make matches. Now the user wants to access those matches again to consult them.
                The message should be simple.

                Examples:
                I want to search for my matches.
                Can you tell me about my university matches?
                Access matches.
                Find the previously-made matches
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

product_information_messages = []

for message in response.messages:
    product_information_messages.append({"Intention":intention, "Message":message.message})

In [18]:
add_messages(product_information_messages, file_name)

# Intention 6 - Leverage RAG


In [19]:
intention = "leverage_rag"


description = """The user has uploaded an external file (PDF or website link) and wants to do Q&A on it. The questions can be either general or specific, and the message usually starts by stating that the file has been uploaded.

                Examples:
                I have uploaded a PDF file. Can you tell me about the <...>?
                I gave the link. What is the <...>?
                You have the PDF file now. How is <...>?
                You have access to the website link's contents. <...>?
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

product_information_messages = []

for message in response.messages:
    product_information_messages.append({"Intention":intention, "Message":message.message})

In [20]:
add_messages(product_information_messages, file_name)

# Intention 7 - Company Information

In [21]:
intention = "company_info"


description = """The user wishes to know more about the company UniMatch.
                The questions can cover topics such as the company's mission and values, its history, its creators, the chatbot, and other specific aspects.

                Examples:
                What is UniMatch?
                Who are the founders of UniMatch?
                What is this company about?
                Tell me about this product.
                Describe UniMatch for me.
                Explain the origin of UniMatch to me.
                """

response = synthetic_data_chain.invoke({"k": k, "user_intentions": user_intentions, "target_task_intention": intention, "target_task_intention_description": description})

product_information_messages = []

for message in response.messages:
    product_information_messages.append({"Intention":intention, "Message":message.message})

In [22]:
add_messages(product_information_messages, file_name)

# Intention (None) - Chitchat

In [23]:
system_prompt = """
You are tasked with generating synthetic user messages.

The user intentions are:
{user_intentions}

Your task is to create {k} distinct messages completely unrelated to the available user intentions.
These messages should be generic and not related to any specific task or intention.
The user is engaging in casual conversation.
The user might ask general questions, share opinions, or express emotions.
The user might also ask questions totally unrelated to the platform.

Follow these guidelines:
1. Focus exclusively on not being related to any of the user intentions.
2. Each message should be between 5 and 20 words.
3. Avoid including any details or references to the listed user intentions.
4. Ensure the messages sound natural and typical of casual conversation.
5. Follow the provided format strictly to maintain consistency.

Message format:
{format_instructions}
"""

In [24]:
prompt = PromptTemplate(
    template=system_prompt,
    input_variables=["k", "user_intentions"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

synthetic_data_chain = prompt | llm | output_parser

In [25]:
response = synthetic_data_chain.invoke({"k": (k//2), "user_intentions": user_intentions})

none_related_messages = []

for message in response.messages:
    none_related_messages.append({"Intention":"None", "Message":message.message})

In [26]:
add_messages(none_related_messages, file_name)
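Since the chitchat class is generated with `k // 2` messages while every other intention uses `k`, it can be worth checking how balanced the labels are before training on them. A minimal sketch, assuming the records are dicts with an `"Intention"` key as built in the cells above; `count_by_intention` and the demo records are hypothetical.

```python
from collections import Counter


def count_by_intention(records):
    """Count how many messages each intention label has."""
    return Counter(record["Intention"] for record in records)


demo_records = [
    {"Intention": "matchmaking", "Message": "Make matches"},
    {"Intention": "matchmaking", "Message": "I want to make matches"},
    {"Intention": "None", "Message": "What a lovely day it is today!"},
]
counts = count_by_intention(demo_records)
print(counts.most_common())
```

In practice the records would come from the saved file rather than a literal list; how that file is laid out depends on the `auxiliar` helper, which is not shown here.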

## Add Examples as Manually-defined Intentions

In [27]:
user_intentions

['manage_personal_information',
 'search_scholarships_and_internationals',
 'search_universities',
 'matchmaking',
 'query_matches',
 'leverage_rag',
 'company_info']

In [28]:
examples = {
    'manage_personal_information': [
        "I want to change my username to UrgedSpice456",
        "Can you change my password to !df@3d?",
        "Would it be possible to modify my age to 23?",
        "Change Country to Italy",
        "Modify my user preferences; now I prefer to study in big cities and to study in a friendly environment.",
        "Can you tell me about myself?",
        "What are my preferences?",
        "Who am I?",
        "What is my age?",
        "Can you describe my user preferences?",
        "Can you tell me my name?",
    ],
    'search_universities': [
        "Can you recommend me some good universities in Italy?",
        "Do you have any universities with courses in Artificial Intelligence?",
        "Find me some universities specializing in Economics",
        "Search for courses with topics in Philosophy",
        "Are there courses with subjects in Ethics?",
    ],
    'search_scholarships_and_internationals': [
        "Do you have any scholarships for students in liberal arts?",
        "Do you know of any universities with scholarships in University of Trieste?",
        "Can you find me universities with international opportunities in Vienna?",
        "Are there any international opportunities in Lisboa?",
        "Can you find scholarships for STEM students?",
        "Find some universities with scholarships at Budapest",
    ],
    'matchmaking': [
        "I want to make matches",
        "I want you to find some matches for universities with courses in Mathematics",
        "Could you make some university matches for me? If possible, include universities in Italy",
        "Make matches",
        "Can you do some matches of universities?",
    ],
    'query_matches': [
        "I want to search for my matches.",
        "Can you tell me about my university matches?",
        "Access matches.",
        "Find the previously-made matches",
    ],
    'leverage_rag': [
        "I have uploaded a PDF file. Can you tell me about the courses?",
        "I gave the link. What is the tuition fee of the course in Mathematics?",
        "You have the PDF file now. How is the course programme structured?",
        "You have access to the website link's contents. What is the selling point of the campus?",
    ],
    'company_info': [
        "What is UniMatch?",
        "Who are the founders of UniMatch?",
        "What is this company about?",
        "Tell me about this product.",
        "Describe UniMatch for me.",
        "Explain the origin of UniMatch to me.",
    ],
}

In [44]:
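total = []

In [45]:
message_id = 0  # Running identifier; avoids shadowing the built-in id()
real_file = 'new_intentions.json'

In [47]:
for intention_name, messages in examples.items():
    for text in messages:
        total.append({"Intention": intention_name, "Message": text, "Id": message_id})
        message_id += 1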

In [49]:
add_messages(total, real_file)
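Before relying on the hand-written examples, a quick sanity check can confirm that every label is one of the known intentions and that no `Id` is repeated. This is a minimal sketch, assuming records shaped like the dicts built above; `check_records` is a hypothetical helper.

```python
def check_records(records, allowed_intentions):
    """Return (unknown_labels, duplicate_ids) for a list of example records."""
    unknown = sorted({r["Intention"] for r in records} - set(allowed_intentions))
    seen, duplicates = set(), set()
    for r in records:
        if r["Id"] in seen:
            duplicates.add(r["Id"])
        seen.add(r["Id"])
    return unknown, sorted(duplicates)


demo = [
    {"Intention": "matchmaking", "Message": "Make matches", "Id": 0},
    {"Intention": "Matchmaking", "Message": "I want to make matches", "Id": 1},
    {"Intention": "query_matches", "Message": "Access matches.", "Id": 1},
]
print(check_records(demo, ["matchmaking", "query_matches"]))
# → (['Matchmaking'], [1])
```

Label comparison is case-sensitive on purpose, so a label such as `"Matchmaking"` is reported when the allowed list only contains `"matchmaking"`.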