## 2. `ConversationBufferMemory`
- **Purpose**: Keeps the **entire conversation history** verbatim in memory.
- **Use‑case**: Short chit‑chat bots where full context is valuable.
- **Limitations**: Every stored message is resent with each request, so prompt size and token cost grow with the conversation and can eventually exceed the LLM's context window.
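To see why the limitation bites, here is a library-free sketch of a verbatim buffer. `VerbatimBuffer` and `rough_token_count` are hypothetical helpers, not LangChain APIs, and counting whitespace-separated words only approximates what a real tokenizer would report. The point is the trend: because nothing is dropped or summarized, the history sent with each turn keeps growing.

```python
# Hypothetical, library-free illustration of a verbatim conversation buffer.
# Word counts are a crude stand-in for real token counts.


class VerbatimBuffer:
    """Stores every (role, text) turn exactly as received."""

    def __init__(self):
        self.messages = []

    def add_turn(self, user_text, ai_text):
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))

    def prompt_history(self):
        # The whole list is returned every time: nothing is trimmed or summarized.
        return list(self.messages)


def rough_token_count(messages):
    # Crude proxy: count whitespace-separated words across all messages.
    return sum(len(text.split()) for _, text in messages)


buffer = VerbatimBuffer()
sizes = []
for turn in range(1, 4):
    buffer.add_turn(f"question {turn} " * 10, f"answer {turn} " * 50)
    sizes.append(rough_token_count(buffer.prompt_history()))

print(sizes)  # [120, 240, 360]: the history grows linearly with the number of turns
```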

In [1]:
# ================================
# Step 1: Install dependencies
# ================================
# langchain.memory (ConversationBufferMemory) ships with the 0.x releases of langchain
!pip install -q "langchain<1.0" langchain-groq


In [2]:
# ================================
# Step 2: Import libraries
# ================================
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableWithMessageHistory
from langchain.memory import ConversationBufferMemory  # legacy (deprecated) memory class from langchain 0.x
from IPython.display import Markdown, display
from google.colab import userdata  # Colab-only: reads notebook secrets

In [3]:
# ================================
# Step 3: Setup Groq API + LLM
# ================================
api_key = userdata.get("GROQ_API_KEY")  # stored as a Colab secret; avoid hardcoding keys in notebooks
llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    api_key=api_key,
    temperature=0.3,
)
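Because `google.colab` only exists inside Colab, the cell above fails elsewhere. A minimal sketch of a fallback, assuming the key is exported as a `GROQ_API_KEY` environment variable when running outside Colab; `get_groq_api_key` is a hypothetical helper, not part of `langchain_groq`.

```python
import os


def get_groq_api_key(name="GROQ_API_KEY"):
    """Hypothetical helper: prefer a Colab secret, fall back to an environment variable."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
        except Exception:  # e.g. secret not defined or notebook access not granted
            key = None
        if key:
            return key

    # Outside Colab (or if the secret is missing), read the environment variable.
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} not found as a Colab secret or environment variable")
    return key
```

The LLM setup would then read `llm = ChatGroq(model="llama-3.3-70b-versatile", api_key=get_groq_api_key(), temperature=0.3)` in either environment.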

In [4]:
# ================================
# Step 4: Define Prompt Template
# ================================
# The system message sets the assistant's tone and instructions
system_msg = SystemMessagePromptTemplate.from_template(
    "You are a friendly, knowledgeable assistant. Answer user queries clearly and remember previous interactions."
)

# The human message carries the user's latest input
human_msg = HumanMessagePromptTemplate.from_template("{input}")

# MessagesPlaceholder injects the stored chat history between the
# system message and the new user input
chat_prompt = ChatPromptTemplate.from_messages([
    system_msg,
    MessagesPlaceholder(variable_name="history"),  # filled from memory on each call
    human_msg
])
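The template fixes the order in which messages reach the model: system message first, then the whole stored history, then the new question. The library-free sketch below reproduces only that ordering; `render_chat_prompt` is a hypothetical stand-in for `chat_prompt`, and plain `(role, text)` tuples replace LangChain's message objects.

```python
# Hypothetical stand-in for chat_prompt: it reproduces only the message ORDER
# that ChatPromptTemplate.from_messages builds above.
SYSTEM_TEXT = (
    "You are a friendly, knowledgeable assistant. "
    "Answer user queries clearly and remember previous interactions."
)
HUMAN_TEMPLATE = "{input}"


def render_chat_prompt(history, user_input):
    """Return (role, text) pairs: system, then history, then the new human turn."""
    messages = [("system", SYSTEM_TEXT)]
    messages.extend(history)  # what MessagesPlaceholder("history") splices in
    messages.append(("human", HUMAN_TEMPLATE.format(input=user_input)))
    return messages


history = [
    ("human", "Hi! What is a confusion matrix?"),
    ("ai", "A confusion matrix is a table used to evaluate the performance of a classification model."),
]
for role, text in render_chat_prompt(history, "Can you explain how it's used in model evaluation?"):
    print(f"{role:>6}: {text[:50]}")
```

With the real template, `chat_prompt.format_messages(history=[], input="Hi")` returns the actual system and human message objects, which is a quick way to check the wiring before calling the model.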

In [8]:
# ================================
# Step 5: Setup ConversationBufferMemory
# ================================
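# ConversationBufferMemory stores all user/assistant messages as-is.
# Best for short conversations where token cost isn't a concern.
# Note: RunnableWithMessageHistory (Step 6) reads and writes only
# memory.chat_memory, so return_messages does not affect this chain; it matters
# when the output of memory.load_memory_variables() is fed into a prompt.
memory = ConversationBufferMemory(return_messages=True)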
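The `return_messages` flag decides the shape that `load_memory_variables` hands back under the default `"history"` key: a list of messages, or a single transcript string using the `Human`/`AI` prefixes. A `MessagesPlaceholder` expects a list of messages, so the list form is the one that fits a chat prompt. The function below is a hypothetical, library-free imitation of those two shapes; real LangChain returns message objects rather than tuples.

```python
# Hypothetical imitation of ConversationBufferMemory.load_memory_variables({}):
#   return_messages=True  -> {"history": [message, message, ...]}
#   return_messages=False -> {"history": "Human: ...\nAI: ..."}
PREFIXES = {"human": "Human", "ai": "AI"}


def load_memory_variables(messages, return_messages):
    if return_messages:
        # List form: suitable for a MessagesPlaceholder in a chat prompt.
        return {"history": list(messages)}
    # String form: one "Prefix: text" line per message, for plain-text prompts.
    transcript = "\n".join(f"{PREFIXES[role]}: {text}" for role, text in messages)
    return {"history": transcript}


turns = [("human", "Hi!"), ("ai", "Hello! How can I help?")]
print(load_memory_variables(turns, return_messages=True))
print(load_memory_variables(turns, return_messages=False))
```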

In [9]:
# ================================
# Step 6: Combine LLM, Prompt, and Memory
# ================================
chat_chain = chat_prompt | llm

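# Wrap the chain with memory support.
# The lambda is the session-history factory: it ignores session_id and always
# returns the same memory.chat_memory, so every session shares one buffer.
chatbot = RunnableWithMessageHistory(
    chat_chain,
    lambda session_id: memory.chat_memory,
    input_messages_key="input",     # key in the input dict holding the new user message
    history_messages_key="history"  # must match MessagesPlaceholder(variable_name="history")
)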
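What `RunnableWithMessageHistory` does on each `invoke` can be sketched without LangChain: look up the session's history, send it together with the new input, then append both the human message and the reply. The store, `fake_llm`, and `invoke_with_history` below are hypothetical simplifications (no system message, no real model) that also show how a per-session factory keeps sessions apart, which the lambda above does not.

```python
from collections import defaultdict

# Hypothetical in-memory store: one message list per session_id.
store = defaultdict(list)


def get_session_history(session_id):
    """Return the message list for this session, creating it on first use."""
    return store[session_id]


def fake_llm(messages):
    """Hypothetical stand-in for ChatGroq: reports how many messages it received."""
    return f"input had {len(messages)} message(s)"


def invoke_with_history(user_input, session_id):
    """Simplified imitation of one RunnableWithMessageHistory.invoke call."""
    history = get_session_history(session_id)
    # 1. Model input = prior history followed by the new human message.
    messages = list(history) + [("human", user_input)]
    reply = fake_llm(messages)
    # 2. Persist both the human message and the AI reply for the next turn.
    history.append(("human", user_input))
    history.append(("ai", reply))
    return reply


user_inputs = [
    "Hi! What is a confusion matrix?",
    "Can you explain how it's used in model evaluation?",
    "Thanks! What are its limitations?",
]
replies = [invoke_with_history(text, "chat-session-001") for text in user_inputs]
other = invoke_with_history("Hello from another session", "chat-session-002")

print(replies)  # ['input had 1 message(s)', 'input had 3 message(s)', 'input had 5 message(s)']
print(other)    # input had 1 message(s): the second session starts empty
```

With LangChain itself, the dictionary would usually hold `InMemoryChatMessageHistory` objects (from `langchain_core.chat_history`), and a function like `get_session_history` would replace the lambda passed to `RunnableWithMessageHistory`. In the notebook as written, after the three-turn loop `memory.chat_memory.messages` should hold six messages (three human, three AI).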

In [10]:
# ================================
# 🤖 Step 7: Start the Chat Loop
# ================================
session_id = "chat-session-001"

user_inputs = [
    "Hi! What is a confusion matrix?",
    "Can you explain how it's used in model evaluation?",
    "Thanks! What are its limitations?"
]

for input_text in user_inputs:
    response = chatbot.invoke(
        {"input": input_text},
        config={"configurable": {"session_id": session_id}}
    )

    display(Markdown(f"### ❓ User: {input_text}"))
    display(Markdown(f"**🤖 Assistant:** {response.content}"))

### ❓ User: Hi! What is a confusion matrix?

**🤖 Assistant:** A confusion matrix is a table used to evaluate the performance of a classification model, such as a machine learning algorithm. It's a simple yet powerful tool that helps you understand how well your model is doing in terms of accuracy, precision, recall, and other important metrics.

The matrix itself is a square table that summarizes the predictions against the actual outcomes. Here's a breakdown of what you'll typically find in a confusion matrix:

* **True Positives (TP)**: The number of instances where the model correctly predicted a positive outcome (e.g., "yes", "1", etc.).
* **True Negatives (TN)**: The number of instances where the model correctly predicted a negative outcome (e.g., "no", "0", etc.).
* **False Positives (FP)**: The number of instances where the model incorrectly predicted a positive outcome (e.g., predicted "yes" when the actual outcome was "no").
* **False Negatives (FN)**: The number of instances where the model incorrectly predicted a negative outcome (e.g., predicted "no" when the actual outcome was "yes").

Using these values, you can calculate various metrics, such as:

* **Accuracy**: (TP + TN) / (TP + TN + FP + FN)
* **Precision**: TP / (TP + FP)
* **Recall**: TP / (TP + FN)
* **F1-score**: 2 \* (Precision \* Recall) / (Precision + Recall)

The confusion matrix is a valuable tool for evaluating and improving the performance of your classification models. It helps you identify areas where your model might be struggling, such as high false positive or false negative rates, and make adjustments to improve its accuracy.

Do you have any specific questions about confusion matrices or classification models in general?

### ❓ User: Can you explain how it's used in model evaluation?

**🤖 Assistant:** The confusion matrix plays a crucial role in model evaluation, as it provides a detailed breakdown of a model's performance. Here's how it's used:

1. **Model Performance Metrics**: The confusion matrix is used to calculate various metrics that evaluate a model's performance, such as:
	* **Accuracy**: Overall, how often is the model correct?
	* **Precision**: When the model predicts a positive outcome, how often is it correct?
	* **Recall**: When the actual outcome is positive, how often does the model predict it correctly?
	* **F1-score**: The harmonic mean of precision and recall, providing a balanced measure of both.
	* **False Positive Rate**: The proportion of false positives among all negative instances.
	* **False Negative Rate**: The proportion of false negatives among all positive instances.
2. **Error Analysis**: The confusion matrix helps identify the types of errors a model is making. For example:
	* **High False Positive Rate**: The model is over-predicting positive outcomes, which may indicate that the model is too sensitive or has a high bias towards positive predictions.
	* **High False Negative Rate**: The model is under-predicting positive outcomes, which may indicate that the model is too conservative or has a high bias towards negative predictions.
3. **Class Imbalance**: The confusion matrix can help identify class imbalance issues, where one class has a significantly larger number of instances than the other. This can lead to biased models that perform well on the majority class but poorly on the minority class.
4. **Model Comparison**: Confusion matrices can be used to compare the performance of different models, allowing you to choose the best model for your specific problem.
5. **Hyperparameter Tuning**: The confusion matrix can be used to evaluate the impact of hyperparameter tuning on a model's performance, helping you identify the optimal hyperparameters for your model.
6. **Model Interpretation**: The confusion matrix can provide insights into how a model is making predictions, helping you understand the relationships between features and target variables.

By analyzing the confusion matrix, you can gain a deeper understanding of your model's strengths and weaknesses, identify areas for improvement, and make informed decisions about model selection, hyperparameter tuning, and feature engineering.

Do you have any specific questions about model evaluation or how to use the confusion matrix in practice?

### ❓ User: Thanks! What are its limitations?

**🤖 Assistant:** While the confusion matrix is a powerful tool for evaluating classification models, it has some limitations:

1. **Assumes Binary Classification**: The traditional confusion matrix is designed for binary classification problems, where there are only two classes (e.g., 0 and 1, yes and no). For multi-class classification problems, you'll need to use a more complex matrix or other evaluation metrics.
2. **Doesn't Account for Class Probabilities**: The confusion matrix only considers the predicted class labels, not the predicted probabilities. This can lead to misleading results if the model is producing uncertain or probabilistic predictions.
3. **Sensitive to Class Imbalance**: As I mentioned earlier, the confusion matrix can be affected by class imbalance issues. If one class has a significantly larger number of instances than the other, the matrix may not accurately reflect the model's performance.
4. **Doesn't Provide Insight into Model Uncertainty**: The confusion matrix doesn't provide information about the model's uncertainty or confidence in its predictions. This can make it difficult to identify situations where the model is making predictions with low confidence.
5. **Can be Misleading for Imbalanced Datasets**: When dealing with imbalanced datasets, the accuracy metric from the confusion matrix can be misleading. For example, a model that always predicts the majority class may have high accuracy but poor performance on the minority class.
6. **Ignores Cost-Sensitive Classification**: In some cases, the cost of false positives and false negatives may be different. The confusion matrix doesn't account for these differences, which can lead to suboptimal decision-making.
7. **Limited Interpretability for Complex Models**: For complex models, such as deep neural networks, the confusion matrix may not provide enough information to understand why the model is making certain predictions.
8. **Not Suitable for Regression Problems**: The confusion matrix is designed for classification problems and is not directly applicable to regression problems, where the goal is to predict a continuous value.

To address these limitations, you can use additional evaluation metrics and techniques, such as:

* **Receiver Operating Characteristic (ROC) Curve**: Plots the true positive rate against the false positive rate at different thresholds.
* **Precision-Recall Curve**: Plots the precision against the recall at different thresholds.
* **Area Under the Curve (AUC)**: Measures the model's ability to distinguish between classes.
* **Cohen's Kappa**: Measures the agreement between the model's predictions and the actual labels, accounting for chance agreement.
* **Brier Score**: Measures the mean squared error between the predicted probabilities and the actual outcomes.

By using a combination of these metrics and techniques, you can gain a more comprehensive understanding of your model's performance and limitations.

Do you have any specific questions about these limitations or how to address them?
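The formulas in the first reply can be checked in plain Python. The sketch below uses hypothetical helper names and no external libraries (scikit-learn's `sklearn.metrics` module offers maintained equivalents such as `confusion_matrix` and `precision_score`); returning 0.0 when a denominator is zero is an assumption. It also illustrates two points raised in the replies: accuracy can look good on imbalanced data while recall is zero, and the matrix itself is not limited to two classes. For k classes it is a k×k table, here with actual classes as rows and predicted classes as columns, the same orientation scikit-learn uses.

```python
# Library-free sketch of the confusion-matrix arithmetic described above.
# Helper names are hypothetical; 0.0 on a zero denominator is an assumption.


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def safe_div(num, den):
    return num / den if den else 0.0


def binary_metrics(tp, tn, fp, fn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def confusion_matrix(y_true, y_pred, labels):
    """k x k counts: rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)                   # (3, 3, 1, 1)
print(binary_metrics(*counts))  # every metric is 0.75 here

# Accuracy paradox: always predicting the majority class (0) on 9:1 data.
imbalanced_true = [0] * 9 + [1]
always_zero = [0] * 10
print(binary_metrics(*confusion_counts(imbalanced_true, always_zero)))
# accuracy 0.9, but precision, recall and F1 are all 0.0

# Multi-class: the same idea extends to a k x k table.
print(confusion_matrix(["cat", "dog", "bird", "cat"],
                       ["cat", "bird", "bird", "dog"],
                       ["cat", "dog", "bird"]))
# [[1, 1, 0], [0, 0, 1], [0, 0, 1]]
```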