## AI Tutor
This is a basic AI tutor that takes any technical question as input and answers it as well as it can, within the limits of the frontier model used behind the scenes.

### Define imports and model

In [1]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI

load_dotenv(override=True)

MODEL = 'gemini-2.5-flash-lite'
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
google_api_key = os.getenv("GOOGLE_API_KEY")
gemini = OpenAI(base_url=GEMINI_BASE_URL, api_key=google_api_key)
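
If `GOOGLE_API_KEY` is missing, `os.getenv` returns `None` and the failure only surfaces later, on the first request. A minimal sketch of an earlier check follows; `require_env` is a hypothetical helper, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment.")
    return value


# Possible use in the setup cell (hypothetical replacement for the bare os.getenv call):
# google_api_key = require_env("GOOGLE_API_KEY")
```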

In [2]:
from prompts.ai_tutor import SYSTEM_PROMPT, USER_PROMPT
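
The `prompts/ai_tutor.py` module is not shown in this notebook. The sketch below is only a guess at its shape: the prompt wording is invented for illustration, and the one thing the notebook does require is that `USER_PROMPT` contains a `{question}` placeholder, because `get_response` calls `USER_PROMPT.format(question=...)`.

```python
# Hypothetical contents of prompts/ai_tutor.py; the real module is not shown here.
SYSTEM_PROMPT = (
    "You are a patient technical tutor. Explain concepts clearly, "
    "use short examples, and respond in Markdown."
)

# get_response fills this template with USER_PROMPT.format(question=...),
# so it needs a {question} placeholder.
USER_PROMPT = "Please answer the following technical question:\n\n{question}"

filled = USER_PROMPT.format(question="What is a closure?")
print(filled)
```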

In [3]:
ERROR_TEMPLATE = "I'm sorry, I couldn't find an answer to your question.\nError: {error_message}\n"
ERROR_MSG_NO_RESPONSE = "No response from AI Tutor"
ERROR_MSG_NO_QUERY = "Query not found. Please ask a technical question!"

### Take user query and pass it to the AI Tutor

In [4]:
def get_response(question: str) -> str:
    """
    Send a user question to the AI Tutor and return its answer.
    Args:
        question (str): the user's technical question
    Returns:
        str: the tutor's answer, or a formatted error message
    """
    if not question:
        return ERROR_TEMPLATE.format(error_message=ERROR_MSG_NO_QUERY)
    response = gemini.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT.format(question=question)}
        ]
    )
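    # Check the list itself: indexing an empty `choices` would raise IndexError.
    if not response.choices or not response.choices[0].message.content:
        return ERROR_TEMPLATE.format(error_message=ERROR_MSG_NO_RESPONSE)
    return response.choices[0].message.content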
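
The error paths can be exercised without calling the API. The sketch below mirrors `get_response` but takes the client as a parameter and guards explicitly against an empty `choices` list. `StubClient` and `get_response_with` are hypothetical names introduced for illustration; the stub mimics only the attributes the function reads.

```python
from types import SimpleNamespace

ERROR_TEMPLATE = "I'm sorry, I couldn't find an answer to your question.\nError: {error_message}\n"
ERROR_MSG_NO_RESPONSE = "No response from AI Tutor"
ERROR_MSG_NO_QUERY = "Query not found. Please ask a technical question!"


class StubClient:
    """Hypothetical stand-in exposing client.chat.completions.create(...)."""

    def __init__(self, choices):
        self._choices = choices
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        # Return an object with the same `.choices` attribute as a real response.
        return SimpleNamespace(choices=self._choices)


def get_response_with(client, question: str) -> str:
    """Variant of get_response with the client passed in for testing."""
    if not question:
        return ERROR_TEMPLATE.format(error_message=ERROR_MSG_NO_QUERY)
    response = client.chat.completions.create(
        model="stub-model",
        messages=[{"role": "user", "content": question}],
    )
    if not response.choices or not response.choices[0].message.content:
        return ERROR_TEMPLATE.format(error_message=ERROR_MSG_NO_RESPONSE)
    return response.choices[0].message.content


ok_choice = SimpleNamespace(message=SimpleNamespace(content="A tuple is immutable."))
answer = get_response_with(StubClient([ok_choice]), "What is a tuple?")
empty_query = get_response_with(StubClient([ok_choice]), "")
no_choices = get_response_with(StubClient([]), "What is a tuple?")
print(answer)  # A tuple is immutable.
```

Passing the client in, rather than reading a module-level `gemini`, is a design choice made here only so the stub can be swapped in.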

In [5]:
# Example 1: Machine learning question
display(Markdown(get_response("Explain what overfitting means in machine learning.")))

Imagine you're studying for a very specific history test. You memorize every single date, name, and event in your textbook. You ace that test! However, when you take a different history test on the same general topic but with slightly different questions or focusing on a different aspect, you struggle. You only learned the specific details for the first test, not the underlying historical trends or broader concepts.

This is a great analogy for **overfitting** in machine learning.

In machine learning, overfitting occurs when a model learns the training data *too well*, including its noise and random fluctuations, to the point where it **fails to generalize to new, unseen data.**

Here's a breakdown of what that means:

*   **The Model is Too Complex:** An overfitted model often has too many parameters or is too complex for the amount of data it has. It essentially has the capacity to "memorize" the training examples instead of learning the underlying patterns.

*   **High Accuracy on Training Data, Low Accuracy on Test Data:**
    *   **Training Accuracy:** The model performs exceptionally well on the data it was trained on. It can predict the outcomes for the training examples with very high precision.
    *   **Test Accuracy (or Generalization Accuracy):** When you introduce new, unseen data (the "test set"), the model's performance drops significantly. It makes many incorrect predictions because it's not equipped to handle variations it hasn't explicitly memorized.

*   **Learning Noise:** Think of noise as random errors or irrelevant details in your data. An overfitted model mistakenly treats these noisy elements as important patterns.
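
The gap between training and test accuracy can be seen with a deliberately extreme model. The sketch below uses a 1-nearest-neighbour "memorizer" on synthetic data (both the data and the model choice are assumptions made for illustration): it reproduces every training label exactly, yet still makes errors on fresh points drawn from the same process.

```python
import random

rng = random.Random(0)


def make_data(n):
    """Sample n points from y = 2x plus Gaussian noise (synthetic data)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]


def memorizer_predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def mse(train, data):
    """Mean squared error of the memorizer on `data`."""
    return sum((memorizer_predict(train, x) - y) ** 2 for x, y in data) / len(data)


train, test = make_data(50), make_data(50)
train_mse = mse(train, train)  # each training point is its own nearest neighbour
test_mse = mse(train, test)    # unseen points expose the memorized noise
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The training error is zero by construction, while the test error is not, which is the signature of overfitting described above.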

**Why is Overfitting a Problem?**

The goal of machine learning is to build models that can make accurate predictions on *future, unknown data*. An overfitted model is essentially useless in a real-world scenario because it can't reliably perform its intended task on new inputs. It's like having a highly specialized tool that only works on one specific bolt and nothing else.

**Visualizing Overfitting (Example: Regression)**

Imagine you're trying to fit a line to a set of data points.

*   **Underfitting:** A simple straight line might not capture the underlying trend in the data if the trend is curved. The model is too simple.
*   **Good Fit:** A gentle curve that follows the general trend of the data points. This model generalizes well.
*   **Overfitting:** A highly wiggly line that passes through *every single* data point perfectly, including any outliers or random deviations. This line will likely be far off when you try to predict new points that don't fall exactly on this intricate path.

**Common Causes of Overfitting:**

*   **High Model Complexity:** Using a model with too many parameters (e.g., a very deep neural network for a simple problem).
*   **Insufficient Training Data:** When you don't have enough data, the model can more easily memorize the limited examples.
*   **Training for Too Long:** In iterative learning algorithms (like neural networks), training for too many epochs can lead to overfitting.
*   **Noisy Data:** The presence of errors or irrelevant information in the training set.

**How to Combat Overfitting:**

*   **More Data:** Increasing the size and diversity of the training dataset.
*   **Simpler Models:** Choosing models with fewer parameters or less complexity.
*   **Regularization Techniques:** These techniques add a penalty to the model's loss function for large parameter values, encouraging simpler solutions (e.g., L1 and L2 regularization).
*   **Cross-Validation:** A technique to evaluate a model's performance on different subsets of the data, giving a more robust estimate of its generalization ability.
*   **Early Stopping:** Monitoring the model's performance on a validation set during training and stopping when the performance starts to degrade (indicating overfitting).
*   **Feature Selection/Engineering:** Choosing only the most relevant features and discarding irrelevant ones.
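
As one concrete case, early stopping can be sketched as a small loop over per-epoch validation losses. The `patience` parameter and the loss values below are made up for illustration; real training frameworks offer their own versions of this logic.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch, stopping once the validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; later epochs are never evaluated
    return best_epoch


# Made-up validation losses: they improve, then rise as the model starts to overfit.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.49]
best = early_stopping_epoch(val_losses, patience=2)
print(best)  # 3
```

With `patience=2` the loop stops after epochs 4 and 5 fail to improve on epoch 3, so it never sees the later improvement at epoch 6. A larger patience trades extra training time for a chance to catch such late improvements.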

In essence, overfitting is like a student who memorizes answers without understanding the concepts. They might do well on the exact questions they've seen before, but they'll falter when faced with new challenges. The goal in machine learning is to create models that truly *learn* and can adapt to new situations.

In [6]:
# Example 2: Python question
display(Markdown(get_response("Explain the difference between a tuple and a list in Python.")))

In Python, both **lists** and **tuples** are used to store collections of items. However, they have one fundamental difference that dictates their use cases:

**Mutability:**

*   **Lists are mutable:** This means you can change their contents after they are created. You can add, remove, or modify elements within a list.
*   **Tuples are immutable:** Once a tuple is created, you cannot change its contents. You cannot add, remove, or modify elements within a tuple.

Let's break down the key differences with examples:

| Feature        | List                                     | Tuple                                     |
| :------------- | :--------------------------------------- | :---------------------------------------- |
| **Mutability** | Mutable (can be changed)                 | Immutable (cannot be changed)             |
| **Syntax**     | Defined using square brackets `[]`       | Defined using parentheses `()`            |
| **Performance**| Uses slightly more memory and is slightly slower to create, since it reserves room to grow | Uses slightly less memory and is slightly faster to create, since its size is fixed |
| **Use Cases**  | Collections that need to be modified, e.g., storing user input, dynamic data. | Collections that should not be changed, e.g., coordinates, database records, dictionary keys. |
| **Methods**    | Has methods like `append()`, `extend()`, `insert()`, `remove()`, `pop()`, `sort()`. | Has only two public methods: `count()` and `index()`. |
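
The methods and memory rows of the table can be checked directly in the interpreter. This is a small sketch; exact byte counts depend on the Python version and platform, so only the comparison is printed.

```python
import sys

# Public (non-dunder) methods available on each type.
tuple_methods = [name for name in dir(tuple) if not name.startswith("_")]
list_methods = [name for name in dir(list) if not name.startswith("_")]
print(tuple_methods)  # ['count', 'index']
print(list_methods)

# Shallow size of the container itself, not of the elements it refers to.
items = (1, 2, 3)
print(sys.getsizeof(items) < sys.getsizeof(list(items)))
```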

---

### Examples:

**1. Creating Lists and Tuples:**

```python
# A list
my_list = [1, 2, 3, "hello", 3.14]
print(f"My list: {my_list}") # Output: My list: [1, 2, 3, 'hello', 3.14]

# A tuple
my_tuple = (1, 2, 3, "world", 2.71)
print(f"My tuple: {my_tuple}") # Output: My tuple: (1, 2, 3, 'world', 2.71)
```

**2. Modifying Lists vs. Tuples:**

```python
# Modifying a list
my_list = [10, 20, 30]
my_list.append(40)
my_list[0] = 5
print(f"Modified list: {my_list}") # Output: Modified list: [5, 20, 30, 40]

# Trying to modify a tuple (will result in an error)
my_tuple = (10, 20, 30)
# my_tuple.append(40) # This would raise an AttributeError
# my_tuple[0] = 5     # This would raise a TypeError
print(f"Original tuple: {my_tuple}")
```

**3. Using Tuples as Dictionary Keys:**

Since tuples are immutable, they are hashable as long as every element they contain is hashable, so they can be used as keys in dictionaries, whereas lists cannot.

```python
# Valid dictionary with tuple keys
coordinates = {
    (10, 20): "Office",
    (30, 40): "Home"
}
print(f"Coordinates dictionary: {coordinates}")

# Invalid dictionary with list keys (will raise a TypeError)
# invalid_dict = {
#     [10, 20]: "Office",
#     [30, 40]: "Home"
# }
```
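
One caveat is worth a short sketch: a tuple is only usable as a key when everything inside it is hashable too. The variable names below are illustrative.

```python
# A tuple of hashable items works as a key.
point = (10, 20)
lookup = {point: "Office"}
print(lookup[(10, 20)])  # Office

# A tuple that contains a list is still a tuple, but it is not hashable.
mixed = (10, [20, 30])
try:
    hash(mixed)
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False

# The tuple's slots cannot be reassigned, yet the list inside it can still change.
mixed[1].append(40)
print(mixed)  # (10, [20, 30, 40])
```

So "immutable" describes the tuple's own slots, not the objects those slots point to.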

**4. When to Use Which:**

*   **Use Lists when:**
    *   You need a collection of items that you expect to change over time (add, remove, update).
    *   You are storing data that is inherently dynamic.

*   **Use Tuples when:**
    *   You want to ensure that the data in your collection remains unchanged. This is good for data integrity and can help prevent accidental modifications.
    *   You are representing a fixed set of related items, like coordinates (x, y), RGB color values (red, green, blue), or database records.
    *   You need to use the collection as a key in a dictionary.
    *   You want a slight performance advantage for collections that won't change.

In summary, the core distinction is **mutability**. Lists are flexible and changeable, while tuples are rigid and fixed. This fundamental difference influences their behavior, performance, and the scenarios in which they are best applied.