TERMINOLOGY MINING: This is the basic conversation used to obtain a list of the terminology a user needs to know in order to understand the goal.

The reason for doing it this way: when you ask GPT to "give terminology related to 'machine learning'" that a person should know, it returns only fairly advanced terms, even when the person lacks some of the basic terminology. Learning something advanced without knowing the basics is often incredibly difficult, whereas once you have mastered the basics, the advanced level is quite easy. This approach makes it possible to "mine" these basic terms as well.

However, one question remains: machine learning is quite a broad term, so how deeply do you want to explain it? A 'broad' definition is 'Machine learning is a subset of artificial intelligence (AI) that focuses on building systems that can learn from and make decisions based on data.' This definition does not mention the underlying mathematics at all, even though that mathematics is extensive and gives a completely different level of understanding.

SideNote (TODO): The step of searching for terminology that is not yet known doesn't have to be done by an LLM. It can be done much more quickly and cheaply with normal code. The part where the LLM is useful is automatically generating definitions of the new terms, which can then be searched for further new terms.
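
As a rough illustration of the "normal code" idea in the side note, the sketch below scans a definition for words that are not yet in a known-terms list. It is only a minimal sketch under assumptions: the function name `find_new_terms`, the tiny `STOPWORDS` set, and the single-word tokenisation are all hypothetical choices made for the example, not part of this notebook's pipeline.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would need a fuller one.
STOPWORDS = {"a", "an", "and", "are", "based", "can", "from", "is", "of", "on", "or", "that", "the", "to"}


def find_new_terms(definition, known_terms):
    """Return words in `definition` that are neither known nor stopwords, in order of first appearance."""
    known = {term.lower() for term in known_terms}
    new_terms = []
    # Tokenise into lowercase words, keeping hyphenated words together.
    for word in re.findall(r"[a-z]+(?:-[a-z]+)*", definition.lower()):
        if word not in known and word not in STOPWORDS and word not in new_terms:
            new_terms.append(word)
    return new_terms


# The 'broad' definition quoted earlier in these notes.
definition = (
    "Machine learning is a subset of artificial intelligence (AI) that focuses on "
    "building systems that can learn from and make decisions based on data."
)
known_terms = ["machine", "learning", "data"]  # assumed to be already mined
print(find_new_terms(definition, known_terms))
```

Each word returned this way could then be handed to the LLM for a definition, and that definition scanned again in the same manner. Multi-word terms such as "artificial intelligence" are split into separate words here; handling them would need a phrase list or similar.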

In [6]:
import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={
        "model": ["gpt-4", "gpt4", "gpt-4-32k"],
    },
)

print("LLM models: ", [config["model"] for config in config_list])

llm_config = {
    "timeout": 60,
    "cache_seed": 42,
    "config_list": config_list,
    "temperature": 0,
}

# autogen.ChatCompletion.start_logging()
def termination_msg(x):
    # True when the message content ends with "TERMINATE" (case-insensitive).
    return isinstance(x, dict) and str(x.get("content", ""))[-9:].upper() == "TERMINATE"

LLM models:  ['gpt-4-32k', 'gpt-4']


In [9]:
system_message = """You are an AI Assistant.
You are given a LANGUAGE and a CEFR level.
You provide a very broad and structured list of terminology that the user should know to achieve this CEFR level
of the given LANGUAGE.
You answer only in JSON.
"""

# Create an AssistantAgent instance named "assistant".
assistant = autogen.AssistantAgent(
    name="assistant", 
    system_message=system_message,
    llm_config={
        "timeout": 600,
        "cache_seed": 42,  # same key as in llm_config above
        "config_list": config_list,
    },
    is_termination_msg=termination_msg
)

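# Create the agent that represents the user in the conversation.
# With max_consecutive_auto_reply=0 the user proxy never replies, so the chat ends after the assistant's first answer.
user_proxy = autogen.UserProxyAgent(
    "user",
    code_execution_config=False,
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
)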

user_message = """English A1"""
# The assistant receives a message from the user_proxy containing the task description.
user_proxy.initiate_chat(
    assistant,
    message=user_message,
)

user (to assistant):

English A1

--------------------------------------------------------------------------------
assistant (to user):

{
    "Vocabulary": {
        "colors": ["red", "blue", "green", "yellow", "black", "white", "purple", "orange", "pink", "brown"],
        "numbers": ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"],
        "common verbs": ["be", "have", "do", "say", "go", "get", "make", "know", "come", "see"],
        "personal pronouns": ["I", "you", "he", "she", "it", "we", "they"],
        "this/that/these/those": ["this", "that", "these", "those"],
        "common nouns": ["man", "woman", "child", "day", "year", "government", "company", "group", "problem", "fact"],
        "Adjectives": ["big", "small", "large", "short", "long", "little", "happy", "sad", "good", "bad"],
        "Prepositions": ["at", "by", "for", "from", "of", "on", "to", "with", "in", "after"],
        "conjunctions": ["and", "that", "but", "or",