# Notebook for Debugging and Running the PMAS Module

## Running the PMAS Module

In [1]:
from thesis2024.PMAS import PMAS
from thesis2024.utils import init_llm_langsmith

In [2]:
test_conversation = "Student: Hello, how are you today?\nTeaching Assistant: Hello, August! I'm here to assist you. How can I help you with your Intro to Machine Learning course today?\nStudent: I am trying to learn about methods like linear regression\nTeaching Assistant: Linear regression is a statistical method used in machine learning to predict a continuous outcome variable (y) based on one or more predictor variables (x). The goal is to find a linear relationship between these variables, which can be described by the equation:\n$$ y = \\beta_0 + \\beta_1x_1 + \\beta_2x_2 + ... + \\beta_nx_n + \\epsilon $$\nwhere:\n- \\( \\beta_0, \\beta_1, ..., \\beta_n \\) are coefficients that the model learns to optimize.\n- \\( \\epsilon \\) is the error term, accounting for the randomness in the data.\n\n### Simple Linear Regression Example\nConsider a scenario where you have data points that represent the relationship between the amount of hours studied (x) and the scores obtained in an exam (y). In simple linear regression, you would try to fit a line that best predicts the scores based on the hours studied. This line is represented by:\n$$ y = \\beta_0 + \\beta_1x + \\epsilon $$\n\n### Feature Transformation in Linear Regression\nIn more complex scenarios, such as predicting a value based on multiple inputs, you might need to transform features to capture non-linear relationships. For instance, if you're considering both the hours studied and the type of subject being studied, you might include polynomial terms (like \\(x^2\\)) or interactions between features (like \\(x_1 \\times x_2\\)) to improve your model. 
This is represented as:\n$$ y = \\beta_0 + \\beta_1x_1 + \\beta_2x_2 + \\beta_3x_1^2 + \\beta_4x_2^2 + \\beta_5x_1x_2 + \\epsilon $$\n\n### Applications\nLinear regression can be applied in various fields:\n- In finance, to predict stock prices based on historical performance and other economic indicators.\n- In health care, to predict patient outcomes based on treatment protocols and patient characteristics.\n- In marketing, to predict sales based on advertising spend across different media channels.\n\n### Practical Example Code\nHere is a simple Python example using the `scikit-learn` library to perform linear regression:\n```python\nfrom sklearn.linear_model import LinearRegression\nimport numpy as np\n\n# Example data\nX = np.array([[1], [2], [3], [4], [5]])  # Hours studied\ny = np.array([2, 4, 5, 4, 5])            # Exam scores\n\n# Create a linear regression model\nmodel = LinearRegression()\nmodel.fit(X, y)\n\n# Predict using the model\npredicted_scores = model.predict(X)\n\nprint(\"Coefficients:\", model.coef_)\nprint(\"Intercept:\", model.intercept_)\nprint(\"Predicted Scores:\", predicted_scores)\n```\n\nThis code snippet creates a model to predict exam scores based on the hours studied, fits the model to the data, and then uses it to make predictions.\nStudent: Wow that was a long explanation, way to much text for me.. can you boil it down a little?\nTeaching Assistant: Linear regression is a method used to predict a continuous variable based on one or more other variables. It finds the best line that fits the data, using an equation of the form:\n$$ y = \\beta_0 + \\beta_1x_1 + \\beta_2x_2 + ... + \\beta_nx_n + \\epsilon $$\nHere, $\\beta_0, \\beta_1, ..., \\beta_n$ are coefficients the model optimizes, and $\\epsilon$ represents error or noise in the data. This technique is useful in various fields like finance, healthcare, and marketing to make predictions based on historical data.\nStudent: Ahh thats much better. 
Can you elaborate further on the relevant formulas, in case I have to be able to describe them at the exam\nTeaching Assistant: In linear regression, the relationship between the dependent variable \\( y \\) and one or more independent variables \\( x \\) is modeled. The formulas involved are crucial for understanding how predictions are made.\n\n### 1. Simple Linear Regression Formula:\nIn simple linear regression, where there is only one predictor variable, the formula is:\n$$ y = \\beta_0 + \\beta_1x + \\epsilon $$\nHere:\n- \\( \\beta_0 \\) is the y-intercept of the regression line.\n- \\( \\beta_1 \\) is the slope of the regression line, indicating the effect of the predictor variable \\( x \\) on the dependent variable \\( y \\).\n- \\( \\epsilon \\) represents the error term, which accounts for the variability in \\( y \\) that cannot be explained by \\( x \\).\n\n### 2. Multiple Linear Regression Formula:\nWhen there are multiple predictor variables, the formula extends to:\n$$ y = \\beta_0 + \\beta_1x_1 + \\beta_2x_2 + ... + \\beta_nx_n + \\epsilon $$\nHere:\n- \\( \\beta_0, \\beta_1, ..., \\beta_n \\) are the coefficients for each predictor variable \\( x_1, x_2, ..., x_n \\).\n- \\( \\epsilon \\) is again the error term.\n\n### 3. Estimation of Coefficients:\nThe coefficients \\( \\beta_0, \\beta_1, ..., \\beta_n \\) are estimated using the method of least squares, which minimizes the sum of the squared differences between the observed values and the values predicted by the model:\n$$ \\text{Minimize} \\sum_{i=1}^n (y_i - (\\beta_0 + \\beta_1x_{1i} + ... + \\beta_nx_{ni}))^2 $$\n\n### 4. 
Assumptions:\nFor the linear regression model to provide valid predictions, several assumptions must be met:\n- Linearity: The relationship between the predictors and the outcome is linear.\n- Homoscedasticity: The variance of residual is the same for any value of \\( x \\).\n- Independence: Observations are independent of each other.\n- Normality: For any fixed value of \\( x \\), \\( y \\) is normally distributed.\n\nUnderstanding these formulas and assumptions is crucial for effectively applying linear regression analysis and interpreting its results. This knowledge will be particularly useful for your exam in demonstrating a thorough understanding of how linear regression works.\nStudent: Ok! So the least squares is the main method for estimating coefficients, but are there others?\nTeaching Assistant: Yes, August, besides the least squares method, there are several other methods for estimating coefficients in regression models:\n\n1. **Regularization Techniques**: These are modifications of the least squares method that add a penalty term to the loss function you are trying to minimize. Examples include:\n   - **Ridge Regression (L2 Regularization)**: Adds a penalty equal to the square of the magnitude of coefficients.\n   - **Lasso Regression (L1 Regularization)**: Adds a penalty equal to the absolute value of the magnitude of coefficients.\n   - **Elastic Net**: Combines penalties of Lasso and Ridge.\n   Regularization techniques help prevent overfitting by reducing model complexity.\n\n2. **Nearest-Neighbor Methods**: Unlike the global approach of least squares, nearest-neighbor methods estimate the response at a point based on the responses of its nearest neighbors in the training set. This can be more flexible than least squares and can adapt to more complex patterns in the data.\n\n3. 
**Cross-Validation**: While not a method for estimating coefficients directly, cross-validation is a technique used to evaluate the effectiveness of your regression model, including the choice of coefficient estimation method. It involves dividing the data into subsets, using some subsets to train the model and the remaining subset to test the model. This helps in selecting the model that generalizes best to unseen data.\n\nEach of these methods has its advantages and is chosen based on the specific requirements and characteristics of the data you are working with.\nStudent: What is the formula for L2?\nTeaching Assistant: L2 regularization, also known as Ridge Regression, is a technique used to prevent overfitting in machine learning models, particularly in the context of regression. The formula for L2 regularization involves adding a penalty term to the usual least squares loss function, which is proportional to the square of the magnitude of the coefficients. This penalty term is controlled by a parameter \\( \\lambda \\), which is a non-negative hyperparameter.\n\nThe modified loss function in L2 regularization is given by:\n$$\n\\text{Loss} = \\sum_{i=1}^n (y_i - \\hat{y}_i)^2 + \\lambda \\sum_{j=1}^p \\beta_j^2\n$$\nwhere:\n- \\( y_i \\) are the actual observed responses,\n- \\( \\hat{y}_i \\) are the predicted responses from the model,\n- \\( \\beta_j \\) are the model coefficients,\n- \\( n \\) is the number of data points,\n- \\( p \\) is the number of features (or predictors),\n- \\( \\lambda \\) is the regularization parameter that controls the trade-off between fitting the model well on the training data and keeping the model coefficients small to avoid overfitting.\n\nThe inclusion of the \\( \\lambda \\sum_{j=1}^p \\beta_j^2 \\) term encourages the coefficients to shrink towards zero, but not exactly zero, which helps in reducing model complexity and variance without substantially increasing the bias.\nStudent: Can you give me a trial exam question 
that I have to answer?\nTeaching Assistant: Here's a trial exam question for you, August:\n\n**Question:**\nYou are provided with a dataset containing housing prices in a city, where the main features include the size of the house (in square feet), the number of bedrooms, and the age of the house (in years). You are tasked with developing a linear regression model to predict the price of the houses.\n\n1. **Model Formulation:**\n   - Write down the equation for a multiple linear regression model that could be used for predicting house prices based on the given features.\n\n2. **Coefficient Estimation:**\n   - Explain how you would use the least squares method to estimate the coefficients of your model.\n\n3. **Regularization:**\n   - Suppose you notice that your model is overfitting the training data. Describe how you could modify your model to include Ridge Regression (L2 regularization) and explain how this helps reduce overfitting.\n\n4. **Model Evaluation:**\n   - Discuss how you would evaluate the performance of your model. Mention at least two metrics that you would use and explain why they are appropriate for this task.\n\nThis question covers several key aspects of linear regression and tests your ability to apply these concepts to a practical problem.\nStudent: I thing the model looks something like this:\nPrice=coeff_A * housesize + coeff_B * numberofbedrooms + coeff_C * age + error\n\nI dont know, can you explain the rest?\nTeaching Assistant: August, the equation you mentioned:\n$$ \\text{Price} = \\text{coeff}_A \\times \\text{housesize} + \\text{coeff}_B \\times \\text{numberofbedrooms} + \\text{coeff}_C \\times \\text{age} + \\text{error} $$\nrepresents a multiple linear regression model. Here's what each component represents:\n\n1. **Dependent Variable (Price):** This is the variable you are trying to predict. In this case, it's the price of the house.\n\n2. 
**Independent Variables (housesize, numberofbedrooms, age):** These are the features or predictors that you believe influence the dependent variable. Each feature has an associated coefficient which quantifies the impact of that feature on the price:\n   - **coeff_A:** This coefficient multiplies the size of the house (housesize). It represents the change in the price of the house for each unit increase in the size of the house, holding all other factors constant.\n   - **coeff_B:** This coefficient is associated with the number of bedrooms (numberofbedrooms). It quantifies the additional price expected due to an increase in the number of bedrooms, assuming other factors remain the same.\n   - **coeff_C:** This coefficient is linked to the age of the house (age). It shows how the price is expected to change as the house ages.\n\n3. **Error Term (error):** This term accounts for the variability in house price that cannot be explained by the house size, number of bedrooms, and age alone. It includes factors not present in the model, measurement errors, and other forms of randomness.\n\nWhen you use this model for prediction, you'll estimate the coefficients (coeff_A, coeff_B, coeff_C) from historical data using a method like least squares, which minimizes the sum of squared differences between observed and predicted prices. If you find that the model overfits the training data, you might consider regularization techniques like Ridge Regression to reduce overfitting by penalizing large coefficients.\n\nThis model can help you make informed predictions about house prices based on the specified features, and understanding each component will allow you to interpret the results effectively and make adjustments as needed.\nStudent: Ahh that makes sense\nTeaching Assistant: ### Coefficient Estimation:\nTo estimate the coefficients (\\(\\text{coeff}_A\\), \\(\\text{coeff}_B\\), \\(\\text{coeff}_C\\)) using the least squares method, you would follow these steps:\n1. 
**Set Up the Equation Matrix:** Organize your data into a matrix format where each row represents a data point and each column represents a predictor variable (housesize, numberofbedrooms, age). Add an additional column of ones to represent the intercept (\\(\\text{coeff}_0\\)).\n2. **Apply the Least Squares Formula:** Use the formula \\(\\beta = (X^TX)^{-1}X^Ty\\) to calculate the coefficients. Here, \\(X\\) is your matrix of predictors, \\(y\\) is the vector of observed prices, and \\(\\beta\\) is the vector of coefficients including the intercept.\n3. **Interpret the Results:** The output will give you the estimated coefficients that minimize the sum of squared residuals between the observed and predicted prices.\n\n### Regularization (Ridge Regression):\nIf you notice that your model is overfitting:\n1. **Modify the Loss Function:** Incorporate a penalty term into the least squares loss function. The Ridge Regression formula is:\n   \\[\n   \\text{Loss} = \\sum_{i=1}^n (y_i - \\hat{y}_i)^2 + \\lambda \\sum_{j=0}^p \\beta_j^2\n   \\]\n   Here, \\(\\lambda\\) is the regularization parameter that you choose to control the impact of the penalty term.\n2. **Fit the Model:** Use this modified loss function to fit your model. The penalty term will shrink the coefficients towards zero, reducing their magnitude and thus helping to prevent overfitting by not allowing the model to fit excessively to the noise in the training data.\n\n### Model Evaluation:\nTo evaluate the performance of your model, consider using these metrics:\n1. **Root Mean Squared Error (RMSE):** This metric measures the average magnitude of the errors between the predicted and actual values, giving you an idea of how close the predictions are to the actual prices.\n   \\[\n   \\text{RMSE} = \\sqrt{\\frac{\\sum_{i=1}^n (\\hat{y}_i - y_i)^2}{n}}\n   \\]\n2. 
**R-squared (\\(R^2\\)):** This metric indicates the proportion of variance in the dependent variable that is predictable from the independent variables. It provides a sense of how well unseen samples are likely to be predicted by the model.\n   \\[\n   R^2 = 1 - \\frac{\\sum_{i=1}^n (y_i - \\hat{y}_i)^2}{\\sum_{i=1}^n (y_i - \\bar{y})^2}\n   \\]\n   where \\(\\bar{y}\\) is the mean of the observed data.\n\nThese explanations should help you understand and complete your trial exam question effectively, demonstrating your grasp of linear regression concepts and their practical application.\nStudent: I did not ask you to explain it again, but than you Iguess\nTeaching Assistant: I'm glad that the explanation made sense to you, August! If you have any more questions or need further clarification on any aspect of linear regression or any other topic in your course, feel free to ask. Good luck with your studies!"

student_name = "August"
student_course = "IntroToMachineLearning"
student_subject = "Linear Regression"

llm_model = init_llm_langsmith(llm_key=4, temp=0.5, langsmith_name="PMAS Agent Testing 1")
pmas = PMAS(llm_model=llm_model,
        course=student_course,
        subject=student_subject,
        student_name=student_name,
        )

res = pmas.predict(conversation=test_conversation)
print("This is the response from the model:\n", res)



> Entering new AgentExecutor chain...
From the conversation between August and the Teaching Assistant, it is possible to deduce certain learning preferences that August may have, which are indicated by specific parts of the dialogue.

1. **Preference for Concise Information**: August expressed a preference for more concise explanations when he mentioned that the initial detailed explanation was too much text for him. This suggests that he might benefit from bullet points, summaries, or simplified explanations rather than lengthy, detailed descriptions.
   - **Indicating snippet**: "Wow that was a long explanation, way to much text for me.. can you boil it down a little?"

2. **Visual Learning**: While not explicitly stated, August's request for further elaboration on formulas after receiving a simplified explanation suggests that he might find visual representations like formulas, graphs, or charts helpful in understanding concepts better.
   - **Indicating snipp
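
The test conversation above is one very long string literal in the `Speaker: text` format, with turns separated by `\n`. As a minimal sketch, the same shape can be assembled from a list of turns, which is easier to edit while debugging. The helper name `format_conversation` and the shortened example turns are hypothetical and are not part of `thesis2024`.

```python
from typing import List, Tuple


def format_conversation(turns: List[Tuple[str, str]]) -> str:
    """Join (speaker, text) turns into newline-separated 'Speaker: text' lines."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


# Shortened, illustrative turns (not the full conversation used above).
turns = [
    ("Student", "Hello, how are you today?"),
    ("Teaching Assistant", "Hello, August! How can I help you?"),
    ("Student", "I am trying to learn about methods like linear regression"),
]

conversation = format_conversation(turns)
print(conversation)
```

The resulting string could then be passed as `conversation=` to `pmas.predict`, assuming that method only needs the flat text format shown in this notebook.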

In [3]:
test_conversation = "Student: Hello! I am August. I would like to learn about ensemble methods\nTeaching Assistant: Hello August! I'm glad you're interested in ensemble methods in machine learning. Let's dive into what they are and why they are useful.\n\nEnsemble methods involve combining the outputs of multiple models to create a single model with superior predictive performance. This approach is analogous to seeking opinions from multiple experts to form a more reliable judgment.\n\nHere’s a simple breakdown:\n1. **Multiple Models**: You train multiple models on your data. These could be any type of models - decision trees, neural networks, etc.\n2. **Combine Outputs**: After training, instead of using just one model’s output, you aggregate the outputs from all the models. This aggregation might be an average, a majority vote, or another method depending on whether you are dealing with regression or classification tasks.\n\nThe main advantages of using ensemble methods include:\n- **Reduced Variance**: By averaging multiple models, the ensemble model is less likely to fit too closely to the training data (overfit), making it more robust and generalizable.\n- **Improved Accuracy**: Often, ensemble methods perform better than any single model in the ensemble, especially if the individual models make different types of errors.\n\nTwo popular techniques in ensemble methods are:\n- **Bagging**: Short for Bootstrap Aggregating, it involves training multiple models on different subsets of the dataset (sampled with replacement), and then averaging their predictions. An example is the Random Forest algorithm, which applies bagging to decision trees.\n- **Boosting**: This method trains models sequentially, each trying to correct the errors of the previous ones. The outputs are then combined through a weighted sum approach. 
AdaBoost is a well-known boosting algorithm.\n\nEnsemble methods are a powerful tool in machine learning, especially when you need robustness and accuracy from your predictive models. If you need more specific examples or deeper insights into any particular method, feel free to ask!\nStudent: Can you explain it bagging further?\nTeaching Assistant: Bagging, short for Bootstrap Aggregating, is an ensemble method used in machine learning to improve the stability and accuracy of machine learning algorithms. It involves the following key steps:\n\n1. **Multiple Subsets**: Bagging begins by creating multiple subsets of the original dataset. Each subset is created through a process called bootstrapping, which means sampling with replacement. This means that some observations may appear more than once in a single subset while others may not appear at all.\n\n2. **Training Multiple Models**: Each of these subsets is used to train a separate model. Typically, the same type of model is used for each subset but trained on different data. For instance, if you are using decision trees, each tree will be trained on a different subset.\n\n3. **Aggregating Outputs**: Once all models are trained, their outputs (predictions) are combined into a single output. The combination method often depends on the type of task:\n   - For regression problems, the average of the predictions is typically used.\n   - For classification, the majority vote (or sometimes a weighted vote) is used.\n\nThe main advantages of bagging include:\n- **Reduction in Variance**: By averaging multiple estimates, bagging reduces the chances of overfitting to the noise in the training data, which can be a significant problem in complex models.\n- **Improvement in Accuracy**: Bagging can lead to improvements in accuracy, especially for models that have high variance.\n\nA classic example of bagging in practice is the Random Forest algorithm, which applies the bagging technique to decision trees. 
In Random Forests, each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. Furthermore, when splitting a node during the construction of the tree, the best split is found from a random subset of the features, adding an extra layer of randomness that helps decrease the correlation between trees in the forest, thus increasing the ensemble's overall performance.\n\nBagging is particularly effective when the base models are high variance and low bias. Since each model is trained independently, bagging can also be easily parallelized, making it efficient for large-scale applications.\nStudent: Can you give an example of a case in which bagging can be used?\nTeaching Assistant: A practical example of where bagging can be used is in credit risk assessment. In this scenario, financial institutions need to predict whether a loan applicant is likely to default on their loan repayments. The challenge is that the dataset might contain complex patterns and noise, leading to a high variance if a single predictive model is used.\n\nHere's how bagging can be applied:\n1. **Data Sampling**: Multiple subsets of the original loan applicant dataset are created using bootstrapping, where each subset may include some applicants more than once and others not at all.\n2. **Model Training**: A decision tree model is trained on each subset. Decision trees are chosen because they are sensitive to variations in the dataset and can capture complex patterns in the data.\n3. **Aggregation of Predictions**: After training, the predictions from each decision tree are aggregated to make a final prediction for each applicant. For a binary outcome (default or no default), the aggregation could be a majority vote among the predictions from all trees.\n\nBy using bagging, the bank can reduce the variance of the predictions, leading to a more reliable assessment of credit risk. 
This method helps in making more accurate decisions, thereby reducing the risk of financial loss due to loan defaults.\n\nThis example illustrates how bagging helps in managing high-variance models and improves prediction reliability, making it a valuable technique in critical financial decision-making processes.\nStudent: Ahh I think that makes sense! Can you give another example?\nTeaching Assistant: Another example of where ensemble methods can be effectively used is in weather forecasting. In this scenario, the goal is to predict weather conditions such as temperature, precipitation, and wind speed. Weather patterns are incredibly complex and influenced by a myriad of factors, making single model predictions often unreliable.\n\nHere's how ensemble methods are applied in weather forecasting:\n1. **Multiple Models**: Different meteorological models are used to predict weather conditions. These models may vary in terms of the factors they consider (like atmospheric conditions, ocean temperatures, etc.) and their computational approaches.\n2. **Model Training**: Each model is trained using historical weather data. This training allows each model to capture different aspects or patterns of weather behavior.\n3. **Aggregation of Predictions**: The predictions from each model are then combined to form a single forecast. This aggregation might be done by taking the average of the predictions or using more sophisticated methods like weighted averages where more accurate models are given higher weights.\n\nThe use of ensemble methods in weather forecasting helps in improving the accuracy and reliability of the predictions. By combining multiple models, forecasters can mitigate the risk of relying on a single model that might have specific biases or limitations. 
This method is crucial for making better-informed decisions in disaster management, agriculture planning, and daily life activities dependent on weather conditions.\n\nThis example demonstrates the versatility of ensemble methods and their applicability in critical real-world scenarios where decision-making is heavily reliant on predictive accuracy."

student_name = "August"
student_course = "IntroToMachineLearning"
student_subject = "Linear Regression"

llm_model = init_llm_langsmith(llm_key=4, temp=0.5, langsmith_name="PMAS Agent Testing 1")
pmas = PMAS(llm_model=llm_model,
        course=student_course,
        subject=student_subject,
        student_name=student_name,
        )

res = pmas.predict(conversation=test_conversation)
print("This is the response from the model:\n", res)



> Entering new AgentExecutor chain...
The conversation indicates that the student, August, has specific learning preferences, particularly in understanding complex topics through concrete examples and step-by-step explanations.

1. **Learning through Examples**: August repeatedly asks for practical examples after receiving explanations about the concepts. This suggests a preference for learning through concrete examples that illustrate theoretical concepts.
2. **Step-by-Step Learning**: The student seems to benefit from detailed, step-by-step breakdowns of processes, as shown by the Teaching Assistant's detailed explanation of bagging and its steps, which appeared to clarify the concept for August.

**Indicative Parts of the Conversation**:
- **Learning through Examples**: "Can you give an example of a case in which bagging can be used?" This question shows August's need to understand concepts through practical applications.
- **Step-by-Step Learning**: The detai
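
Both runs above return the preferences as a numbered markdown list with bold titles. A rough sketch for pulling out just those titles follows. It assumes the response is a plain string in that shape, which an LLM does not guarantee; `extract_preference_titles` and `sample_response` are hypothetical names, and the sample is abbreviated from the output above.

```python
import re
from typing import List


def extract_preference_titles(response: str) -> List[str]:
    """Return the bold titles of numbered items such as '1. **Title**: ...'."""
    return re.findall(r"^\s*\d+\.\s+\*\*(.+?)\*\*", response, flags=re.MULTILINE)


# Abbreviated from the agent output shown above.
sample_response = (
    "1. **Learning through Examples**: August repeatedly asks for practical examples.\n"
    "2. **Step-by-Step Learning**: The student seems to benefit from detailed breakdowns.\n"
    "\n"
    "**Indicative Parts of the Conversation**:\n"
    "- **Learning through Examples**: \"Can you give an example?\"\n"
)

titles = extract_preference_titles(sample_response)
print(titles)
```

Only numbered items match, so the bulleted "indicative parts" entries are ignored.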

## Debugging the Long Term Memory Module

In [1]:
from thesis2024.utils import init_llm_langsmith
from thesis2024.PMAS import LongTermMemory
langsmith_name = "LangMem TEST 1 "
llm_model = init_llm_langsmith(llm_key=40, temp=0.5, langsmith_name=langsmith_name)

ltm_class = LongTermMemory(user_name="AugustSemrau_PMAS_Test1")
print("User:\n", ltm_class.get_user_data())
print("\nCurrent thread: ", ltm_class.thread_id)
past_threads = ltm_class.past_thread_ids
print("Past threads: ", past_threads)

functions = ltm_class.client.list_memory_functions()
print("\nMemory Functions: ")
for function in functions:
    print(function)

summaries = ltm_class.get_thread_summaries()
print("\nThese are the thread summaries:\n\n", summaries)

# print("\nPast Thread Memories:\n")
# for thread in past_threads:
#     print(thread)
#     print(ltm_class.client.list_thread_memory(thread_id=str(thread))[0])

User:
 {'id': '8091e3a1-6209-42f1-ac2b-81a94ea27bd8', 'name': 'AugustSemrau_PMAS_Test1-8091'}

Current thread:  fd7c676c-a221-428d-a2ab-4d028a6ee305
Past threads:  ['274c778d-421a-4572-abe6-cb39cf3ec52c', '7d326584-c235-4be2-ac66-784699cc25d1', '3c068f9f-e601-4bb0-8927-5ba7aecef5a6', 'd43cd194-0fb0-4075-a257-0be0505b2b83', 'f8df743a-893d-4bf1-bec5-5962cfbb3292', 'ae58788c-cd47-4a79-88e5-de10bbb89602', 'ed48ce37-b338-4247-bab8-500128767511', '6a966685-e760-41cb-b9dd-356a15ba6ba6', 'fe73add6-f1f9-4796-befe-4a7681427a4f', '28421c1b-4ab9-4b7e-aa81-24b49cafb598']

Memory Functions: 
{'id': 'f9a97c15-b2c8-43f7-9c61-1db658173f1d', 'type': 'user_semantic_memory', 'custom_instructions': None, 'schema': {'name': 'SemanticUserMemory', 'description': 'Remember key information about the user based on the conversation.', 'parameters': {'title': 'extractTriplet', 'description': 'Extract RDF triples from the conversation.', 'type': 'object', 'properties': {'triples': {'title': 'Triples', 'description'
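
Printing each memory function in full gives very long lines. The sketch below condenses each entry to its type, id, and schema name. It assumes each entry behaves like the dict printed above, with `id`, `type`, and `schema` keys. `summarize_memory_functions` is a hypothetical helper, and the second sample entry is invented for illustration.

```python
from typing import Any, Dict, List


def summarize_memory_functions(functions: List[Dict[str, Any]]) -> List[str]:
    """Build one short line per memory function: type, id and schema name."""
    lines = []
    for fn in functions:
        schema = fn.get("schema") or {}  # schema may be missing or None
        lines.append(
            f"{fn.get('type', '?')}  id={fn.get('id', '?')}  schema={schema.get('name', '?')}"
        )
    return lines


sample_functions = [
    # First entry mirrors the fields visible in the output above.
    {"id": "f9a97c15-b2c8-43f7-9c61-1db658173f1d", "type": "user_semantic_memory",
     "custom_instructions": None, "schema": {"name": "SemanticUserMemory"}},
    # Hypothetical entry without a schema, for illustration only.
    {"id": "hypothetical-id", "type": "user_state",
     "custom_instructions": None, "schema": None},
]

summary_lines = summarize_memory_functions(sample_functions)
for line in summary_lines:
    print(line)
```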

## Get the Student Profile

In [2]:
user_state = ltm_class.get_user_state()
print("User state:\n\n", user_state)



User Profile:  {'education': 'Intro to Machine Learning course', 'student_name': 'August'}
User state:

 {'education': 'Intro to Machine Learning course', 'student_name': 'August'}


## Get Semantic Memories for the Student in all threads

In [3]:
memories = ltm_class.get_user_semantic_memories(query="What university does the user attend?")
print("These are the memories:\n\n", memories)

These are the memories:

 August needs assistance with Intro to Machine Learning course.
August is learning about methods like linear regression.
August is learning about linear regression methods.
August is learning methods like linear regression.
August is learning about linear regression.
August interested in methods like linear regression.
AugustSemrau_PMAS_Test1-8091 is called August.
August needs to understand relevant formulas for an exam.
August asked for trial exam questions for practice.
August needs to be able to describe relevant formulas for an exam
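
The returned memories contain several almost identical statements. The sketch below removes entries that differ only in case, whitespace, or trailing punctuation. It assumes `memories` is a newline-separated string, as the printed output suggests. `dedupe_memories` is a hypothetical helper, and true paraphrases, such as the different "linear regression" lines above, are left untouched.

```python
from typing import List


def dedupe_memories(memories: str) -> List[str]:
    """Split on newlines and drop entries that match an earlier one after normalization."""
    seen = set()
    unique = []
    for line in memories.split("\n"):
        line = line.strip()
        if not line:
            continue
        # Normalize: lowercase, collapse whitespace, ignore trailing punctuation.
        key = " ".join(line.lower().split()).rstrip(".!?")
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


sample_memories = (
    "August needs to understand relevant formulas for an exam.\n"
    "August asked for trial exam questions for practice.\n"
    "august needs to understand  relevant formulas for an exam\n"
)

unique_memories = dedupe_memories(sample_memories)
print(unique_memories)
```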


## Get student learning preferences

In [2]:
learning_preferences = ltm_class.get_learning_preference_memories()
print("These are the learning preferences:\n\n", learning_preferences)



Learning preference memories:  {'user_id': '8091e3a1-6209-42f1-ac2b-81a94ea27bd8', 'memories': [], 'state': None, 'thread_summaries': None}
These are the learning preferences:

 


## Student subject comprehension

In [5]:
subject_comprehension = ltm_class.get_subject_comprehension_memories()
print("These are the subject comprehension memories:\n\n", subject_comprehension)



Subject comprehension memories:  {'user_id': '8091e3a1-6209-42f1-ac2b-81a94ea27bd8', 'memories': [], 'state': None, 'thread_summaries': None}
These are the subject comprehension memories:

 


## Reset memory functions

In [None]:
# Cleanup: delete every memory function except the "user_semantic_memory" one
functions = ltm_class.client.list_memory_functions()

for func in functions:
    if func["type"] == "user_semantic_memory":
        continue
    ltm_class.client.delete_memory_function(func["id"])
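
Because the cleanup cell deletes memory functions permanently, it can be worth rehearsing the loop against a stand-in first. The sketch below reuses the two method names from the cell, `list_memory_functions` and `delete_memory_function`. The `FakeMemoryClient` class, the `reset_memory_functions` helper, and the sample ids and types are hypothetical, and the fake does not reproduce the real client's behaviour.

```python
from typing import Any, Dict, List, Sequence


class FakeMemoryClient:
    """Hypothetical in-memory stand-in exposing the two methods used by the cleanup cell."""

    def __init__(self, functions: List[Dict[str, Any]]) -> None:
        self._functions = {f["id"]: f for f in functions}

    def list_memory_functions(self) -> List[Dict[str, Any]]:
        return list(self._functions.values())

    def delete_memory_function(self, function_id: str) -> None:
        del self._functions[function_id]


def reset_memory_functions(client: Any,
                           keep_types: Sequence[str] = ("user_semantic_memory",)) -> List[str]:
    """Delete every memory function whose type is not in keep_types; return the deleted ids."""
    deleted = []
    for func in list(client.list_memory_functions()):
        if func["type"] in keep_types:
            continue
        client.delete_memory_function(func["id"])
        deleted.append(func["id"])
    return deleted


client = FakeMemoryClient([
    {"id": "a", "type": "user_semantic_memory"},
    {"id": "b", "type": "user_state"},       # illustrative type
    {"id": "c", "type": "thread_summary"},   # illustrative type
])
deleted = reset_memory_functions(client)
remaining = [f["type"] for f in client.list_memory_functions()]
print("deleted:", deleted, "remaining:", remaining)
```

Returning the deleted ids makes it easy to log what a reset actually removed.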

# Redundant

In [None]:
ltm_class.save_conversation_step(user_query="Hello I am August and I study computer science at the Technical University of Denmark. I come from Jutland in Denmark. I do not understand math, and NEED to be explained technical concepts in words, not formulas.", llm_response="Hello August, how can I assist you today?")
ltm_class.save_conversation_step(user_query="I am studying the course IntroToMachineLearning and am trying to learn about the subject Cross-validation.\nI am having a hard time understanding the course material revolving around this subject. Can you help me understand it better by rewriting it to my learning preferences, which are the following:\nI prefer text-based explanations and metaphors in order to understand technical concepts?", 
                                 llm_response="Certainly, August! Let's dive into the concept of cross-validation using a metaphor that might help clarify things.\n\nImagine you're a chef trying to perfect a new recipe. Before you serve it at a big dinner party, you want to make sure it tastes just right and appeals to a variety of palates. To do this, you could cook the dish several times, each time using a different group of friends as taste-testers. Each group provides feedback, and you tweak the recipe accordingly. This way, you ensure the final version will be well-received by a diverse crowd.\n\nIn machine learning, cross-validation works in a similar way. When you build a model, you want it to perform well not just on the data you used to create it, but on new, unseen data as well. Cross-validation helps you ensure this by dividing your data into several smaller sets or \"folds\". Here's how it typically works:\n\n1. **Splitting the Data**: Just like dividing your friends into different tasting groups, you split your dataset into, say, 5 parts (or any other number, commonly referred to as \"k\"). Each part is called a fold.\n\n2. **Training and Validating**: You then train your model on 4 of these folds, using them as your training data (like cooking for four groups of friends). The remaining fold serves as your validation data, which is like the fifth group of friends you haven't cooked for yet. This group is used to evaluate how well your model performs.\n\n3. **Repeating the Process**: Just as you would rotate the role of taste-tester among your groups of friends, you rotate the validation fold in cross-validation. Each of the 5 folds gets a turn to be the validation set, with the model being trained on the other 4 folds each time.\n\n4. **Averaging the Results**: In the end, you average the performance metrics (like accuracy) from each of the 5 trials. 
This gives you a good idea of how well your model is likely to perform on unseen data.\n\nThis process helps prevent a common problem in machine learning called \"overfitting,\" where a model performs well on its training data but poorly on new, unseen data. It's like making sure your dish doesn’t just appeal to one group of friends but to all kinds of guests.\n\nI hope this metaphor helps you understand the concept of cross-validation better! If you have any more questions or need further details, feel free to ask.")

ltm_class.save_conversation_step(user_query="I think that makes it pretty clear for me! Thank you for the explanation.",
                                 llm_response="That's great to hear, August! I'm glad the metaphor helped clarify things for you. If you have any more questions or need further assistance with cross-validation or any other topic, don't hesitate to reach out. I'm here to help!")

ltm_class.save_conversation_step(user_query="What is the height of Mount Everest?", llm_response="I will figure it out.")



In [None]:
short_conversation = ("Student: Hello \n"
    " Teacher: Hello, how are you? \n"
    " Student: Good, but can you help me understand what gradient descent is? \n"
    " Teacher: Sure, I can help you with that. Gradient descent is an optimization algorithm used to minimize the cost function in machine learning. It works by iteratively updating the parameters of a model in the direction of the negative gradient of the cost function. This process is repeated until the algorithm converges to a local minimum of the cost function. Does that help? \n"
    " Student: Yes, thank you for explaining that. Can you also explain how gradient descent is used in training neural networks? \n"
    " Teacher: Of course! In training neural networks, gradient descent is used to update the weights and biases of the network in order to minimize the loss function. The gradient of the loss function with respect to the weights and biases is calculated using backpropagation, and then the weights and biases are updated in the direction of the negative gradient. This process is repeated for each batch of training data until the network converges to a set of weights and biases that minimize the loss function. Does that make sense? \n"
    " Student: Yes, that makes sense. Thank you for explaining it so clearly. \n"
    " Teacher: You're welcome! If you have any more questions, feel free to ask. I'm here to help. \n"
    " Student: Thank you, I will keep that in mind.")
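
A transcript in this `Speaker: text` format can be split back into (query, response) pairs, which is the shape `save_conversation_step` takes in the cells above. This is a sketch under assumptions: the helper names are hypothetical, the speaker labels default to the `Student`/`Teacher` labels used here, and a final student turn without a reply is dropped.

```python
from typing import List, Sequence, Tuple


def parse_turns(conversation: str,
                speakers: Sequence[str] = ("Student", "Teacher")) -> List[Tuple[str, str]]:
    """Split a 'Speaker: text' transcript into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for line in conversation.split("\n"):
        line = line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        if sep and speaker in speakers:
            turns.append((speaker, text.strip()))
        elif turns:
            # A line without a known speaker label continues the previous turn.
            prev_speaker, prev_text = turns[-1]
            turns[-1] = (prev_speaker, prev_text + "\n" + line)
    return turns


def to_query_response_pairs(turns: List[Tuple[str, str]],
                            student: str = "Student") -> List[Tuple[str, str]]:
    """Pair each student turn with the reply that follows it."""
    pairs = []
    pending = None
    for speaker, text in turns:
        if speaker == student:
            pending = text
        elif pending is not None:
            pairs.append((pending, text))
            pending = None
    return pairs


# Abbreviated example in the same format as short_conversation.
sample = ("Student: Hello \n"
          " Teacher: Hello, how are you? \n"
          " Student: Good, but can you help me? \n"
          " Teacher: Sure. \n"
          " Student: Thank you.")

pairs = to_query_response_pairs(parse_turns(sample))
print(pairs)
```

Each pair could then be passed as `user_query` and `llm_response`, assuming the conversation strictly alternates between the two speakers.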