https://medium.com/@lokaregns/using-large-language-models-apis-with-python-a-comprehensive-guide-0020a51bf5b6

https://github.com/cheahjs/free-llm-api-resources

##### Initialise

In [2]:
from initialise import *

##### Defining the user and the system message (i.e. the personality of the chatbot)

In [4]:
user = {
    "domain": "data science",
    "machine_learning": 6,
    "statistics": 6,
    "healthcare": 1,
}

system_message = f'''
You are part of an interface helping to guide human users through explanations for an artificial intelligence system. 

This system generates binary predictions on whether someone gets the covid vaccine or not.

The user works in the field of {user["domain"]}. Rating their skillset out of 10:

Machine learning: {user["machine_learning"]}. Statistics: {user["statistics"]}. Healthcare: {user["healthcare"]}.

Provide comprehensive, but concise text-based explanations. Do not provide or recommend any code, unless explicitly asked.

Your explanations should cater to their domain and skillset level, and be relevant to the artificial intelligence system. 

Only answer questions about the artificial intelligence system.

Always recommend follow-up, clarifying questions the user could ask to help aid their understanding. 
'''

prompt = '''
What is ROC-AUC score?
'''
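
The profile ratings above are interpolated into the system message without any checks. As a minimal sketch of one way to guard that step, the hypothetical helper `build_system_message` below (not part of the original notebook) validates that each rating is an integer from 0 to 10 before rendering the profile sentence. Only the profile portion of the message is reproduced here.

```python
def build_system_message(user: dict) -> str:
    """Render the user-profile part of the system message, validating ratings first."""
    skills = ("machine_learning", "statistics", "healthcare")
    for skill in skills:
        rating = user[skill]
        # Ratings are described as "out of 10", so reject anything outside 0..10
        if not isinstance(rating, int) or not 0 <= rating <= 10:
            raise ValueError(f"{skill} rating must be an integer from 0 to 10, got {rating!r}")
    return (
        f'The user works in the field of {user["domain"]}. '
        "Rating their skillset out of 10: "
        f'Machine learning: {user["machine_learning"]}. '
        f'Statistics: {user["statistics"]}. '
        f'Healthcare: {user["healthcare"]}.'
    )


profile = {"domain": "data science", "machine_learning": 6, "statistics": 6, "healthcare": 1}
print(build_system_message(profile))
```

Wrapping the template in a function also makes it easy to regenerate the message for a different user without re-running the whole cell.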

##### Simple LLM function for single-turn ("one-shot") prompts
- [Google Gemini](https://aistudio.google.com/app)
- [Mistral AI](https://docs.mistral.ai/getting-started/quickstart/)
- [GitHub Marketplace Models](https://github.com/marketplace/models)

In [None]:
def oneshot_llm(model_name: str, prompt: str) -> None:
    """Send a single prompt (no conversation history) to a model and print the reply."""

    if model_name in params:
        # Models configured in `params` are sent through the GitHub client
        response = clients["github"].complete(
            messages = [
                SystemMessage(system_message),
                UserMessage(prompt)
            ],
            model       = params[model_name]["model"],
            temperature = params[model_name]["temperature"],
            top_p       = params[model_name]["top_p"],
            max_tokens  = params[model_name]["max_tokens"]
        )
        print(response.choices[0].message.content)

    elif model_name == "google":
        # Gemini takes the system message through the generation config
        response = clients[model_name].models.generate_content(
            model = "gemini-2.0-flash",
            config = genai.types.GenerateContentConfig(system_instruction = system_message),
            contents = prompt
        )
        print(response.text)

    elif model_name == "mistral":
        # Mistral takes the system and user messages as role/content dictionaries
        response = clients[model_name].chat.complete(
            model = "mistral-small-latest",
            messages = [{"role": "system", "content": system_message},
                        {"role": "user", "content": prompt}]
        )
        print(response.choices[0].message.content)

    else:
        # Fail loudly instead of silently returning None for an unknown name
        raise ValueError(f"Unknown model name: {model_name!r}")
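
An alternative to a growing if/elif chain is a lookup table that maps each model name to a callable. The sketch below is only an illustration of that pattern: `fake_google`, `fake_mistral`, `BACKENDS` and `oneshot` are hypothetical names, and the stub backends return canned strings rather than calling any real API.

```python
from typing import Callable, Dict


def fake_google(system: str, prompt: str) -> str:
    # Stand-in for a real client call; returns a canned string
    return f"[google] {prompt.strip()}"


def fake_mistral(system: str, prompt: str) -> str:
    # Stand-in for a real client call; returns a canned string
    return f"[mistral] {prompt.strip()}"


# Registry of model name -> function taking (system message, prompt)
BACKENDS: Dict[str, Callable[[str, str], str]] = {
    "google": fake_google,
    "mistral": fake_mistral,
}


def oneshot(model_name: str, system: str, prompt: str) -> str:
    """Look up the backend by name and return its reply as a string."""
    try:
        backend = BACKENDS[model_name]
    except KeyError:
        known = ", ".join(sorted(BACKENDS))
        raise ValueError(f"Unknown model {model_name!r}; expected one of: {known}") from None
    return backend(system, prompt)


print(oneshot("google", "Answer simply", "What is ROC-AUC score?"))
```

Returning the text rather than printing it leaves the caller free to display, log, or post-process the reply.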

##### LangChain for conversational memory
- https://python.langchain.com/docs/tutorials/chatbot/

In [70]:
def select_model(model_name: str):
    if model_name == "google":
        # https://python.langchain.com/docs/integrations/chat/google_generative_ai/
        return ChatGoogleGenerativeAI(
            model = "gemini-2.0-flash",
            temperature = 0,
            max_tokens = None,
            timeout = None,
            max_retries = 2)
        
    elif model_name == "mistral":
        # https://python.langchain.com/docs/integrations/chat/mistralai/
        return ChatMistralAI(
            model = "mistral-small-latest",
            temperature = 0,
            max_retries = 2)
        
    else:
        # https://python.langchain.com/docs/integrations/chat/azure_ai/
        return AzureAIChatCompletionsModel(
            model_name  = params[model_name]["model"],
            temperature = params[model_name]["temperature"],
            max_tokens  = params[model_name]["max_tokens"],
            max_retries = 2)

In [94]:
model = select_model("google")

prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_message), 
     MessagesPlaceholder(variable_name = "messages")
     ]
     )

class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]

def call_model(state: State):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow = StateGraph(state_schema = State)

workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer = memory)

config = {"configurable": {"thread_id": "conversation_1"}}
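
The `thread_id` in `config` is what ties successive calls to the same conversation. As a conceptual illustration only (this is not how LangGraph's checkpointer is implemented), the plain-Python sketch below keeps one message list per thread. `ThreadMemory`, `counting_model` and `invoke` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


class ThreadMemory:
    """Keeps a separate message history for every thread id."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def append(self, thread_id: str, role: str, content: str) -> None:
        self._threads[thread_id].append((role, content))

    def history(self, thread_id: str) -> List[Message]:
        # Return a copy so callers cannot mutate the stored history
        return list(self._threads[thread_id])


def counting_model(messages: List[Message]) -> str:
    # Stand-in for a chat model: reports how many human messages it was shown
    seen = sum(1 for role, _ in messages if role == "human")
    return f"I have seen {seen} human message(s)."


def invoke(memory: ThreadMemory, thread_id: str, text: str,
           model: Callable[[List[Message]], str] = counting_model) -> str:
    """Append the user message, call the model on the full history, store the reply."""
    memory.append(thread_id, "human", text)
    reply = model(memory.history(thread_id))
    memory.append(thread_id, "ai", reply)
    return reply


memory_sketch = ThreadMemory()
invoke(memory_sketch, "conversation_1", "hello")
print(invoke(memory_sketch, "conversation_1", "XGB or RF?"))
print(invoke(memory_sketch, "conversation_2", "hello"))
```

The second call on `conversation_1` sees both human messages, while `conversation_2` starts from an empty history.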

In [99]:
app.invoke({"messages": [HumanMessage("hello")]}, config)["messages"]

[HumanMessage(content='hello', additional_kwargs={}, response_metadata={}, id='1766e397-c921-4445-9a68-21d5a9166a73'),
 AIMessage(content="Hello! I understand you're here to learn more about the AI system that predicts whether someone gets the COVID vaccine. Given your background in data science (ML: 6, Stats: 6, Healthcare: 1), I'll focus on explanations that highlight the machine learning and statistical aspects of the system, while keeping the healthcare context in mind.\n\nTo start, what aspects of the system are you most curious about? For example, are you interested in:\n\n*   The type of data used to train the model?\n*   The specific machine learning algorithm used?\n*   How the model's performance is evaluated?\n*   The features that are most influential in the model's predictions?\n\nAsking clarifying questions like these will help me tailor the explanations to your specific interests and knowledge level.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_r

In [73]:
prompt = "XGB or RF. Answer in 100 words"
input_messages = [HumanMessage(prompt)]
output = app.invoke(
    {"messages": input_messages},
    config,
)
output["messages"][-1].pretty_print()


Random Forests (RF) and XGBoost (XGB) are both powerful tree-based ensemble methods, but they differ in their approach. RF builds multiple decision trees independently and averages their predictions. XGBoost, on the other hand, uses gradient boosting, where trees are built sequentially, with each tree correcting the errors of its predecessors.

Given your AI system predicts COVID vaccine uptake, XGBoost might be preferred due to its ability to handle complex relationships and potential non-linear effects influencing vaccine decisions. However, RF can be more robust to overfitting and easier to tune. The best choice depends on your specific dataset and performance goals.

Follow-up questions: How large is the dataset? What is the dimensionality of the data? What evaluation metric are you using?


In [74]:
output["messages"]

[HumanMessage(content='XGB or RF. Answer in 100 words', additional_kwargs={}, response_metadata={}, id='d2c72688-690c-444e-a1cb-20566f1464f9'),
 AIMessage(content='Random Forests (RF) and XGBoost (XGB) are both powerful tree-based ensemble methods, but they differ in their approach. RF builds multiple decision trees independently and averages their predictions. XGBoost, on the other hand, uses gradient boosting, where trees are built sequentially, with each tree correcting the errors of its predecessors.\n\nGiven your AI system predicts COVID vaccine uptake, XGBoost might be preferred due to its ability to handle complex relationships and potential non-linear effects influencing vaccine decisions. However, RF can be more robust to overfitting and easier to tune. The best choice depends on your specific dataset and performance goals.\n\nFollow-up questions: How large is the dataset? What is the dimensionality of the data? What evaluation metric are you using?', additional_kwargs={}, res

In [None]:
model = select_model("chatgpt")

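# Note: this template has no MessagesPlaceholder, so the messages held in the
# graph state are never forwarded to the model; it only sees the system message.
# Add MessagesPlaceholder(variable_name = "messages") to pass the history through.
prompt_template = ChatPromptTemplate(
    messages = [
        ("system", "Answer simply"),
    ]
)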

class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]

def call_model(state: State):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow = StateGraph(state_schema = State)

workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer = memory)

config = {"configurable": {"thread_id": "conversation_1"}}

In [None]:
app.invoke(
    {"messages": [HumanMessage("What did I previously ask?")]},
    config,
)["messages"]#[-1].pretty_print()

[HumanMessage(content='What did I previously ask?', additional_kwargs={}, response_metadata={}, id='5ad6daaa-1874-4f4f-8d59-351dd464f08b'),
 AIMessage(content='Sure! Please ask your question.', additional_kwargs={}, response_metadata={'model': 'gpt-4.1-2025-04-14', 'token_usage': {'input_tokens': 9, 'output_tokens': 8, 'total_tokens': 17}, 'finish_reason': 'stop'}, id='run-2286e615-e445-4840-9684-bea0d7e5857c-0', usage_metadata={'input_tokens': 9, 'output_tokens': 8, 'total_tokens': 17}),
 HumanMessage(content='What did I previously ask?', additional_kwargs={}, response_metadata={}, id='9eaf39bf-e96e-4d69-81c4-7d507b149819'),
 AIMessage(content='Sure! Ask your question and I will answer simply.', additional_kwargs={}, response_metadata={'model': 'gpt-4.1-2025-04-14', 'token_usage': {'input_tokens': 9, 'output_tokens': 12, 'total_tokens': 21}, 'finish_reason': 'stop'}, id='run-fe0f078e-d712-47e7-acc8-968e35abca0a-0', usage_metadata={'input_tokens': 9, 'output_tokens': 12, 'total_tokens'
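
The replies above suggest the model never received the question: the history is saved in the graph state, but the prompt template has no slot for it. The plain-Python sketch below models that situation under simplified assumptions. `render` and `PLACEHOLDER` are hypothetical stand-ins, not LangChain's implementation.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)
PLACEHOLDER = "placeholder"


def render(template: List[Tuple[str, str]], variables: Dict[str, List[Message]]) -> List[Message]:
    """Expand a template into the list of messages the model would receive."""
    rendered: List[Message] = []
    for kind, value in template:
        if kind == PLACEHOLDER:
            # A placeholder entry is replaced by the messages stored under that name
            rendered.extend(variables[value])
        else:
            rendered.append((kind, value))
    return rendered


history = [("human", "What did I previously ask?")]

without_slot = [("system", "Answer simply")]
with_slot = [("system", "Answer simply"), (PLACEHOLDER, "messages")]

print(render(without_slot, {"messages": history}))
print(render(with_slot, {"messages": history}))
```

Without the placeholder entry, the rendered prompt contains only the system message, regardless of how much history is stored.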

In [None]:
config = {"configurable": {"thread_id": "conversation_1"}}
query = "Hi I'm Henry, please tell me a joke."

input_messages = [HumanMessage(query)]
for chunk, metadata in app.stream(
    {"messages": input_messages},
    config,
    stream_mode = "messages",
):
    if isinstance(chunk, AIMessage):  # Filter to just model responses
        print(chunk.content, end = "")

Under Italian law, if a property owner dies without any heirs or a valid will, the ownership of their immovable property (real estate) passes to the Italian State. This is called "successione ereditaria vacante" (vacant inheritance). The State then has full rights over the property.

In [157]:
stream = app.stream(
    {"messages": input_messages},
    config,
    stream_mode = "messages",
)

In [159]:
for chunk, metadata in stream:
    print(chunk, metadata)

content='Sure' additional_kwargs={} response_metadata={'finish_reason': None} id='run-60065499-cdb1-411e-81fa-a82dbe042ac2' {'thread_id': 'conversation_1', 'langgraph_step': 13, 'langgraph_node': 'model', 'langgraph_triggers': ('branch:to:model',), 'langgraph_path': ('__pregel_pull', 'model'), 'langgraph_checkpoint_ns': 'model:b0921529-3b4f-9826-d384-19908d47f4f7', 'checkpoint_ns': 'model:b0921529-3b4f-9826-d384-19908d47f4f7', 'ls_provider': 'azureaichatcompletionsmodel', 'ls_model_type': 'chat', 'ls_model_name': 'openai/gpt-4.1', 'ls_temperature': 1.0, 'ls_max_tokens': 800}
content='!' additional_kwargs={} response_metadata={'finish_reason': None} id='run-60065499-cdb1-411e-81fa-a82dbe042ac2' {'thread_id': 'conversation_1', 'langgraph_step': 13, 'langgraph_node': 'model', 'langgraph_triggers': ('branch:to:model',), 'langgraph_path': ('__pregel_pull', 'model'), 'langgraph_checkpoint_ns': 'model:b0921529-3b4f-9826-d384-19908d47f4f7', 'checkpoint_ns': 'model:b0921529-3b4f-9826-d384-19908
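
Each streamed item above is a `(chunk, metadata)` pair carrying a small piece of text. A minimal sketch of reassembling the full reply from such pairs is shown below. `fake_stream` is a hypothetical generator that stands in for `app.stream(...)`, and its text pieces are invented for illustration.

```python
from typing import Iterable, Iterator, Tuple


def fake_stream() -> Iterator[Tuple[str, dict]]:
    # Stand-in for a real stream: yields (text piece, metadata) pairs
    for piece in ["Sure", "!", " Here", " is", " a", " joke", "."]:
        yield piece, {"langgraph_node": "model"}


def collect(stream: Iterable[Tuple[str, dict]], node: str = "model") -> str:
    """Join the text pieces emitted by one node into a single string."""
    parts = []
    for content, metadata in stream:
        # Keep only pieces produced by the node of interest
        if metadata.get("langgraph_node") == node:
            parts.append(content)
    return "".join(parts)


print(collect(fake_stream()))
```

Filtering on the metadata becomes useful once a graph has more than one node emitting messages.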

##### Trimmed messages (to be implemented...)

In [None]:
from langchain_core.messages import SystemMessage, trim_messages

trimmer = trim_messages(
    max_tokens = 65,
    strategy = "last",
    token_counter = model,
    include_system = True,
    allow_partial = False,
    start_on = "human",
)

messages = [
    SystemMessage(content = "You're a good assistant"),
    HumanMessage(content = "Hi! I'm Henry"),
    AIMessage(content = "Hi!"),
    HumanMessage(content = "I like vanilla ice cream"),
    AIMessage(content = "Nice"),
    HumanMessage(content = "What's 2 + 2"),
    AIMessage(content = "4"),
    HumanMessage(content = "Thanks"),
    AIMessage(content = "No problem!"),
    HumanMessage(content = "Having fun?"),
    AIMessage(content = "Yes!"),
]

trimmer.invoke(messages)

[SystemMessage(content="You're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="Hi! I'm Henry", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hi!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="What's 2 + 2", additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='No problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Yes!', additional_kwargs={}, response_metadata={})]
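
To make the `strategy = "last"` idea concrete, here is a simplified plain-Python sketch of trimming from the end of a conversation. It is an illustration under loud assumptions, not LangChain's implementation: `trim_last` and `count_tokens` are hypothetical helpers, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def count_tokens(message: Message) -> int:
    # Crude approximation: one token per whitespace-separated word
    return len(message[1].split())


def trim_last(messages: List[Message], max_tokens: int,
              include_system: bool = True, start_on: str = "human") -> List[Message]:
    """Keep the most recent messages that fit in the budget, optionally with the system message."""
    has_system = bool(messages) and messages[0][0] == "system"
    system = [messages[0]] if (has_system and include_system) else []
    body = messages[1:] if has_system else list(messages)

    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept: List[Message] = []
    # Walk backwards from the newest message until the budget runs out
    for message in reversed(body):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()

    # Drop leading messages until the kept history starts on the requested role
    while kept and kept[0][0] != start_on:
        kept.pop(0)
    return system + kept


conversation = [
    ("system", "You're a good assistant"),
    ("human", "Hi! I'm Henry"),
    ("ai", "Hi!"),
    ("human", "I like vanilla ice cream"),
    ("ai", "Nice"),
    ("human", "What's 2 + 2"),
    ("ai", "4"),
    ("human", "Thanks"),
    ("ai", "No problem!"),
    ("human", "Having fun?"),
    ("ai", "Yes!"),
]

for role, content in trim_last(conversation, max_tokens=12):
    print(role, content)
```

With a budget of 12 words, the system message is kept and the history is cut back to the last two exchanges, starting on a human message.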

In [None]:
workflow = StateGraph(state_schema = State)

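def call_model(state: State):
    # Trim the history before building the prompt so the context stays within budget
    trimmed_messages = trimmer.invoke(state["messages"])
    # State only defines "messages", so no other keys are passed to the template
    prompt = prompt_template.invoke({"messages": trimmed_messages})
    response = model.invoke(prompt)
    return {"messages": [response]}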


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer = memory)