diff --git a/docs/machine-learning/programming-fundamentals/basic-syntax/conditionals.mdx b/docs/machine-learning/programming-fundamentals/basic-syntax/conditionals.mdx index e69de29..28206a5 100644 --- a/docs/machine-learning/programming-fundamentals/basic-syntax/conditionals.mdx +++ b/docs/machine-learning/programming-fundamentals/basic-syntax/conditionals.mdx @@ -0,0 +1,129 @@ +--- +title: Conditionals and Branching +sidebar_label: Conditionals +description: "Mastering If, Else, and Elif statements to control program flow and handle logic in Machine Learning pipelines." +tags: [python, programming, logic, conditionals, branching, mathematics-for-ml] +--- + +Machine Learning is often about making decisions. **Conditionals** allow our code to react differently depending on the input. Whether it's checking if a dataset is empty or deciding which model to load, `if-else` logic is the foundation of programmatic decision-making. + +## 1. The `if`, `elif`, and `else` Structure + +Python uses indentation to define the scope of conditional blocks. + +```python +accuracy = 0.85 + +if accuracy > 0.90: + print("Excellent model performance!") +elif accuracy > 0.70: + print("Good performance, but could be improved.") +else: + print("Model needs retraining.") + +``` + +```mermaid +flowchart TD + Start([Check Accuracy]) --> C1{Acc > 0.90?} + C1 -- Yes --> R1[Print: Excellent] + C1 -- No --> C2{Acc > 0.70?} + C2 -- Yes --> R2[Print: Good] + C2 -- No --> R3[Print: Retrain] + +``` + +## 2. Comparison and Logical Operators + +Conditionals rely on boolean expressions that evaluate to either `True` or `False`. + +### A. Comparison Operators + +* `==` (Equal to) +* `!=` (Not equal to) +* `>` / `<` (Greater/Less than) +* `>=` / `<=` (Greater/Less than or equal to) + +### B. Logical Operators (Chaining) + +* `and`: Both conditions must be True. +* `or`: At least one condition must be True. +* `not`: Reverses the boolean value. 
+ +```python +# Check if learning rate is within a safe range +lr = 0.001 +if lr > 0 and lr < 0.1: + print("Learning rate is valid.") + +``` + +## 3. The "ReLU" Example: Math meets Logic + +One of the most famous conditional operations in Deep Learning is the **Rectified Linear Unit (ReLU)** activation function. + +$$ +\text{ReLU}(x) = \max(0, x) +$$ + +In Python code, this is a simple conditional: + +```python +def relu(x): + if x > 0: + return x + else: + return 0 + +``` + +## 4. Truthiness and Identity + +In ML data cleaning, we often check if a variable actually contains data. + +* **Falsy values:** `None`, `0`, `0.0`, `""` (empty string), `[]` (empty list), `{}` (empty dict). +* **Truthy values:** Everything else. + +```python +features = get_features() + +if not features: + print("Warning: No features found in dataset!") + +``` + +### `is` vs `==` + +* `==` checks for **Value equality** (Are the numbers the same?). +* `is` checks for **Identity** (Are they the exact same object in memory?). + +## 5. Inline Conditionals (Ternary Operator) + +For simple logic, Python allows a one-liner known as a ternary operator. + +```python +# status = "Spam" if probability > 0.5 else "Not Spam" +prediction = "Positive" if y_hat > 0.5 else "Negative" + +``` + +## 6. Match-Case (Python 3.10+) + +For complex branching based on specific patterns (like different file extensions or model types), the `match` statement provides a cleaner syntax than multiple `elif` blocks. + +```python +optimizer_type = "adam" + +match optimizer_type: + case "sgd": + print("Using Stochastic Gradient Descent") + case "adam": + print("Using Adam Optimizer") + case _: + print("Using Default Optimizer") + +``` + +--- + +Decision-making is key, but to keep our code clean, we shouldn't repeat our logic. We need to wrap our conditionals and loops into reusable components. 
\ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/basic-syntax/data-structures.mdx b/docs/machine-learning/programming-fundamentals/basic-syntax/data-structures.mdx index e69de29..8355139 100644 --- a/docs/machine-learning/programming-fundamentals/basic-syntax/data-structures.mdx +++ b/docs/machine-learning/programming-fundamentals/basic-syntax/data-structures.mdx @@ -0,0 +1,100 @@ +--- +title: Data Structures +sidebar_label: Data Structures +description: "Mastering Python's built-in collections: Lists, Tuples, Dictionaries, and Sets, and their specific roles in data science pipelines." +tags: [python, data-structures, lists, dictionaries, tuples, sets, mathematics-for-ml] +--- + +While basic types hold single values, **Data Structures** allow us to group, organize, and manipulate large amounts of information. In ML, choosing the wrong structure can lead to code that runs $100\times$ slower than it should. + +## 1. Lists: The Versatile Workhorse + +A **List** is an ordered, mutable collection of items. Think of it as a flexible array that can grow or shrink. + +* **Syntax:** `my_list = [0.1, 0.2, 0.3]` +* **ML Use Case:** Storing the history of "Loss" values during training so you can plot them later. + +```python +losses = [] +for epoch in range(10): + current_loss = train_step() + losses.append(current_loss) # Dynamic growth + +``` + +## 2. Tuples: The Immutable Safeguard + +A **Tuple** is like a list, but it **cannot be changed** after creation (immutable). + +* **Syntax:** `shape = (224, 224, 3)` +* **ML Use Case:** Defining image dimensions or model architectures. Since these shouldn't change accidentally during execution, a tuple is safer than a list. + +```mermaid +graph LR + L[List: Mutable] --> L_Edit["Can change: my_list[0] = 5"] + T[Tuple: Immutable] --> T_Edit["Error: 'tuple' object does not support assignment"] + style T fill:#fffde7,stroke:#fbc02d,color:#333 + +``` + +## 3. 
Dictionaries: Key-Value Mapping + +A **Dictionary** stores data in pairs: a unique **Key** and its associated **Value**. It uses a "Hash Table" internally, making lookups incredibly fast ($O(1)$ complexity on average). + +* **Syntax:** `params = {"learning_rate": 0.001, "batch_size": 32}` +* **ML Use Case:** Managing hyperparameters or mapping integer IDs back to human-readable text labels. + +```mermaid +graph TD + Key["Key: 'Cat'"] --> Hash["Hash Function"] + Hash --> Index["Memory Index: 0x42"] + Index --> Val["Value: [0.98, 0.02, ...]"] + style Hash fill:#e1f5fe,stroke:#01579b,color:#333 + +``` + +## 4. Sets: Uniqueness and Logic + +A **Set** is an unordered collection of **unique** items. + +* **Syntax:** `classes = {"dog", "cat", "bird"}` +* **ML Use Case:** Finding the unique labels in a messy dataset or performing mathematical operations like Union and Intersection on feature sets. + +## 5. Performance Comparison + +Choosing the right structure is about balancing **Speed** and **Memory**. + +| Feature | List | Tuple | Dictionary | Set | +| --- | --- | --- | --- | --- | +| **Ordering** | Ordered | Ordered | Ordered (Python 3.7+) | Unordered | +| **Mutable** | **Yes** | No | **Yes** | **Yes** | +| **Duplicates** | Allowed | Allowed | Keys must be unique | Must be unique | +| **Search Speed** | $O(n)$ (Slow) | $O(n)$ (Slow) | $O(1)$ (Very Fast) | $O(1)$ (Very Fast) | + +```mermaid +xychart-beta + title "Search Speed (Lower is Better)" + x-axis ["List", "Tuple", "Set", "Dict"] + y-axis "Time Complexity" 0 --> 120 + bar [100, 100, 5, 5] + +``` + +## 6. Slicing and Indexing + +In ML, we often need to "slice" our data (e.g., taking the first 80% for training and the last 20% for testing). 
+ +$$ +\text{Syntax: } \text{data}[\text{start} : \text{stop} : \text{step}] +$$ + +```python +data = [10, 20, 30, 40, 50, 60] +train = data[:4] # [10, 20, 30, 40] +test = data[4:] # [50, 60] + +``` + +--- + +Now that we can organize data, we need to control the flow of our program—making decisions based on that data and repeating tasks efficiently. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/basic-syntax/exceptions.mdx b/docs/machine-learning/programming-fundamentals/basic-syntax/exceptions.mdx index e69de29..335fddc 100644 --- a/docs/machine-learning/programming-fundamentals/basic-syntax/exceptions.mdx +++ b/docs/machine-learning/programming-fundamentals/basic-syntax/exceptions.mdx @@ -0,0 +1,115 @@ +--- +title: Exception Handling +sidebar_label: Exceptions +description: "Learning to handle errors gracefully in Python to build robust and fault-tolerant Machine Learning pipelines." +tags: [python, programming, exceptions, error-handling, debugging, mathematics-for-ml] +--- + +In Machine Learning, things often go wrong: a dataset file is missing, a GPU runs out of memory, or a feature contains a `NaN` (Not a Number) that crashes a calculation. **Exception Handling** allows your program to "fail gracefully" rather than crashing completely. + +## 1. The Try-Except Block + +The basic tool for handling errors is the `try...except` block. You "try" a piece of code, and if it raises an error, the "except" block catches it. + +```python +try: + # Attempting to load a large dataset + data = load_dataset("huge_data.csv") +except FileNotFoundError: + print("Error: The dataset file was not found. 
Please check the path.") + +``` + +```mermaid +flowchart TD + Start([Start Try Block]) --> Op[Execute Code] + Op --> Success{Error Occurred?} + Success -- No --> End([Continue Program]) + Success -- Yes --> Catch[Match Exception Type] + Catch --> Handle[Execute Except Block] + Handle --> End + style Catch fill:#ffebee,stroke:#c62828,color:#333 + +``` + +## 2. Handling Multiple Exceptions + +Different operations can fail in different ways. You can catch specific errors to provide tailored solutions. + +* **`ValueError`**: Raised when a function receives an argument of the right type but inappropriate value (e.g., trying to take the square root of a negative number). +* **`TypeError`**: Raised when an operation is applied to an object of inappropriate type. +* **`ZeroDivisionError`**: Common in manual normalization logic. + +```python +try: + result = total_loss / num_samples +except ZeroDivisionError: + result = 0 + print("Warning: num_samples was zero. Setting loss to 0.") +except TypeError: + print("Error: Check if total_loss and num_samples are numbers.") + +``` + +## 3. The Full Lifecycle: `else` and `finally` + +To build truly robust pipelines (like those that open and close database connections), we use the extended syntax: + +1. **`try`**: The code that might fail. +2. **`except`**: Code that runs only if an error occurs. +3. **`else`**: Code that runs only if **no** error occurs. +4. **`finally`**: Code that runs **no matter what** (perfect for closing files or releasing GPU memory). + +```python +file = None # Stays None if open() itself fails +try: + file = open("model_weights.bin", "rb") + weights = file.read() +except IOError: + print("Could not read file.") +else: + print("Weights loaded successfully.") +finally: + # Only close the file if it was actually opened + if file is not None: + file.close() + print("File resource released.") + +``` + +## 4. Raising Exceptions + +Sometimes, you *want* to stop the program if a specific condition isn't met. For example, if a user provides a zero or negative learning rate. 
+ +```python +def set_learning_rate(lr): + if lr <= 0: + raise ValueError(f"Learning rate must be positive. Received: {lr}") + return lr + +``` + +## 5. Exceptions in ML Data Pipelines + +In production ML, we use exceptions to ensure data quality. + +```mermaid +graph LR + Input[Data Source] --> Check{Check Data} + Check -->|Valid| Train[Start Training] + Check -->|Corrupted| Exc[Raise DataValidationError] + Exc --> Log[Log Error to Dashboard] + Exc --> Fallback[Use Last Known Good Data] + style Exc fill:#ffc107,stroke:#ff8f00,color:#333 + +``` + +## 6. Summary of Common ML Exceptions + +| Exception | When it happens in ML | +| --- | --- | +| **`IndexError`** | Trying to access a non-existent column or row index in an array. | +| **`KeyError`** | Looking for a hyperparameter in a config dictionary that doesn't exist. | +| **`AttributeError`** | Calling a method (like `.predict()`) on a model that hasn't been trained yet. | +| **`MemoryError`** | Loading a dataset that is larger than the available RAM. | + +--- + +Handling errors ensures your code doesn't crash, but how do we organize our code so it's easy to read and maintain? Let's explore the world of Classes and Objects. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/basic-syntax/functions.mdx b/docs/machine-learning/programming-fundamentals/basic-syntax/functions.mdx index e69de29..5174ffe 100644 --- a/docs/machine-learning/programming-fundamentals/basic-syntax/functions.mdx +++ b/docs/machine-learning/programming-fundamentals/basic-syntax/functions.mdx @@ -0,0 +1,121 @@ +--- +title: Functions and Scope +sidebar_label: Functions +description: "Mastering reusable code blocks in Python: defining functions, handling arguments, and understanding global vs. local scope in ML workflows." 
+tags: [python, programming, functions, scope, modularity, mathematics-for-ml] +--- + +In Machine Learning, we often repeat complex logic—calculating the distance between points, normalizing features, or computing gradients. **Functions** allow us to package this logic into reusable blocks, reducing errors and making our code "DRY" (Don't Repeat Yourself). + +## 1. Anatomy of a Function + +A function takes an **input** (parameters), performs an **action**, and returns an **output**. + +```python +def calculate_mse(y_true, y_pred): + """Calculates Mean Squared Error.""" + error = (y_true - y_pred) ** 2 + return error.mean() + +``` + +```mermaid +graph LR + In["Input: $$y_{true}, y_{pred}$$"] --> Logic["Function Logic: $$(y - \hat{y})^2$$"] + Logic --> Out["Output: Mean Error"] + style Logic fill:#e1f5fe,stroke:#01579b,color:#333 +``` + +## 2. Arguments and Parameters + +Python offers flexible ways to pass data into functions, which is essential for managing dozens of hyperparameters. + +### A. Positional vs. Keyword Arguments + +* **Positional:** Order matters. +* **Keyword:** Explicitly naming parameters (safer and more readable). + +```python +def train_model(learning_rate, epochs): + print(f"LR: {learning_rate}, Epochs: {epochs}") + +# Keyword arguments are preferred in ML for clarity +train_model(epochs=100, learning_rate=0.001) + +``` + +### B. Default Values + +Useful for hyperparameters that have a standard "sane" default. + +```python +def initialize_weights(size, distribution="normal"): + # If distribution isn't provided, it defaults to "normal" + pass + +``` + +## 3. Lambda Functions (Anonymous Functions) + +For simple, one-line operations, Python uses **Lambda** functions. These are frequently used in data cleaning with `pandas`. + +$$ +\text{Syntax: } \text{lambda } \text{arguments} : \text{expression} +$$ + +```python +# Convert Celsius to Fahrenheit for a feature +c_to_f = lambda c: (c * 9/5) + 32 +print(c_to_f(0)) # 32.0 + +``` + +## 4. 
Understanding Scope + +**Scope** determines where a variable can be seen or accessed. + +```mermaid +graph TD + Global["Global Scope: Variables defined outside any function"] --> Local["Local Scope: Variables defined inside a function"] + Local -->|Access| Global + Global -.->|Cannot Access| Local + +``` + +* **Global Scope:** Variables available throughout the entire script (e.g., a dataset loaded at the top). +* **Local Scope:** Variables created inside a function (e.g., a temporary calculation). They "die" once the function finishes. + +## 5. Args and Kwargs (`*args`, `**kwargs`) + +In advanced ML libraries (like Scikit-Learn), you'll see these used to pass a variable number of arguments. + +* `*args`: Collects any extra positional arguments into a tuple. +* `**kwargs`: Collects any extra keyword arguments into a dictionary. + +```python +def build_layer(**hyperparams): + for key, value in hyperparams.items(): + print(f"{key}: {value}") + +build_layer(units=64, activation="relu", dropout=0.2) + +``` + +## 6. Functions as First-Class Citizens + +In Python, you can pass a function as an argument to *another* function. This is how we pass different **activation functions** or **optimizers** into a training function. + +```python +def apply_activation(value, func): + return func(value) + +# Passing the 'relu' function (defined in the Conditionals chapter) as data +output = apply_activation(10, relu) + +``` + +--- + +Functions help us stay organized, but sometimes code fails due to unexpected data or math errors. We need a way to catch those errors without crashing our entire training pipeline. 
\ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/basic-syntax/loops.mdx b/docs/machine-learning/programming-fundamentals/basic-syntax/loops.mdx index e69de29..4a06e81 100644 --- a/docs/machine-learning/programming-fundamentals/basic-syntax/loops.mdx +++ b/docs/machine-learning/programming-fundamentals/basic-syntax/loops.mdx @@ -0,0 +1,124 @@ +--- +title: Loops and Iteration +sidebar_label: Loops +description: "Mastering For loops, While loops, and the logic of iteration in Machine Learning pipelines." +tags: [python, programming, loops, iteration, training-loop, mathematics-for-ml] +--- + +In Machine Learning, we rarely do something once. We repeat operations thousands of times: passing data through a network, updating weights, or pre-processing thousands of images. This repetition is handled by **Loops**. + +## 1. The `for` Loop: Iterating over Sequences + +The `for` loop is the most common loop in Python. It iterates over any "iterable" (like a list, tuple, or dictionary). + +* **ML Use Case:** Iterating through a list of filenames to load images. + +```python +models = ["Linear", "SVM", "RandomForest"] + +for model in models: + print(f"Training {model}...") + +``` + +### Using `range()` + +When you need to repeat an action a specific number of times (like **Epochs**), use the `range()` function. + +```python +# Training for 5 epochs +for epoch in range(5): + print(f"Epoch {epoch + 1}/5") + +``` + +## 2. The `while` Loop: Conditional Iteration + +A `while` loop continues as long as a certain condition is `True`. + +* **ML Use Case:** **Early Stopping**. You might want to keep training a model until the error (loss) drops below a specific threshold. 
+ +```python +loss = 1.0 +threshold = 0.01 + +while loss > threshold: + loss -= 0.005 # Simulate model learning + print(f"Current Loss: {loss:.4f}") + +``` + +```mermaid +flowchart TD + Start([Start Loop]) --> Cond{Loss > Threshold?} + Cond -- Yes --> Train[Train Model & Update Loss] + Train --> Cond + Cond -- No --> Stop([Stop: Convergence reached]) + style Cond fill:#fff3e0,stroke:#ef6c00,color:#333 + +``` + +## 3. Loop Control: `break` and `continue` + +Sometimes you need to alter the flow inside a loop: + +* **`break`**: Exits the loop entirely. (e.g., if the model starts over-fitting). +* **`continue`**: Skips the rest of the current block and moves to the next iteration. (e.g., skipping a corrupted image file). + +```python +for image in dataset: + if image.is_corrupted: + continue # Move to next image + process(image) + +``` + +## 4. The "ML Training Loop" Pattern + +In deep learning (PyTorch/TensorFlow), you will almost always see this nested structure: + +```mermaid +block-beta + columns 1 + block:outer["Outer Loop: Epochs (Going through the whole dataset)"] + block:inner["Inner Loop: Batches (Processing chunks of data)"] + Step["Forward Pass -> Calculate Loss -> Update Weights"] + end + end + style outer fill:#e1f5fe,stroke:#01579b,color:#333 + style inner fill:#f3e5f5,stroke:#7b1fa2,color:#333 + +``` + +## 5. Efficiency Tip: List Comprehensions + +Python offers a concise way to create lists using a single line of code. It is often faster and more readable than a standard `for` loop for simple transformations. + +**Standard Way:** + +```python +squared_errors = [] +for e in errors: + squared_errors.append(e**2) + +``` + +**Pythonic Way (List Comprehension):** + +```python +squared_errors = [e**2 for e in errors] + +``` + +## 6. 
The "Vectorization" Warning + +Although loops are fundamental, **standard Python loops are slow for mathematical operations.** If you are multiplying two large $n \times n$ matrices, a nested Python `for` loop can take seconds or longer, while a **vectorized** operation in NumPy typically takes milliseconds. + +| Operation | Python `for` loop | NumPy Vectorized | +| --- | --- | --- | +| **Summing 1M numbers** | ~50ms | ~1ms | +| **Matrix Multiplication** | $O(n^3)$ | Optimized BLAS/LAPACK | + +--- + +Loops help us repeat tasks, but how do we bundle those tasks into reusable blocks? This is where functions come in. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/basic-syntax/variables-and-data-types.mdx b/docs/machine-learning/programming-fundamentals/basic-syntax/variables-and-data-types.mdx index e69de29..9899bbd 100644 --- a/docs/machine-learning/programming-fundamentals/basic-syntax/variables-and-data-types.mdx +++ b/docs/machine-learning/programming-fundamentals/basic-syntax/variables-and-data-types.mdx @@ -0,0 +1,100 @@ +--- +title: Variables and Data Types +sidebar_label: Variables & Types +description: "Understanding Python's dynamic typing system, memory management, and the core data types essential for data science." +tags: [python, programming, variables, data-types, mutability, mathematics-for-ml] +--- + +In Python, variables are not "buckets" that hold values; they are **labels** (references) that point to objects in memory. In Machine Learning, understanding how these labels work is the difference between writing clean code and hunting for memory-leak bugs. + +## 1. Dynamic Typing + +Python is **dynamically typed**. You don't need to declare that a variable is an integer or a string; Python figures it out at runtime. 
+ +```python +x = 0.01 # Initially a float (learning rate) +x = "Adam" # Now a string (optimizer name) + +``` + +```mermaid +graph LR + Label[Variable Label: 'x'] -->|Points to| Obj1["Float Object: $$0.01$$"] + Label -.->|Reassigned to| Obj2["String Object: 'Adam'"] + style Obj1 fill:#f5f5f5,stroke:#999,color:#333 + style Obj2 fill:#e1f5fe,stroke:#01579b,color:#333 +``` + +## 2. Fundamental Data Types in ML + +Every feature in your dataset will eventually be mapped to one of these fundamental Python types. + +### A. Numerical Types + +* **`int`**: Whole numbers (e.g., number of layers, epochs). +* **`float`**: Decimal numbers. Most weights and probabilities in ML are 64-bit or 32-bit floats. +* **`complex`**: Used in signal processing and Fourier transforms. + +### B. Boolean Type (`bool`) + +* Represents `True` or `False`. Used for binary masks (e.g., selecting rows in a dataset where `age > 30`). + +### C. Sequence Types + +* **`str`**: Strings are used for category labels or raw text in Natural Language Processing (NLP). +* **`list`**: A mutable, ordered sequence. +* **`tuple`**: An immutable (unchangeable) sequence. Often used for tensor shapes like `(3, 224, 224)`. + +## 3. Mutability: The "Gotcha" in ML + +Understanding **Mutability** is crucial when passing data through pre-processing functions. + +* **Mutable (Can be changed):** `list`, `dict`, `set`, `numpy.ndarray`. +* **Immutable (Cannot be changed):** `int`, `float`, `str`, `tuple`. 
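The practical effect of this split is easiest to see by running it. The following is a minimal sketch rather than part of the lesson's own code: the helper `add_default_dropout` and all sample values are hypothetical, chosen only to show how a mutable list is shared between labels, how a copy breaks that link, and how an immutable tuple rejects in-place edits.

```python
import copy


def add_default_dropout(hyperparams):
    # Hypothetical helper: appends to the list it receives, in place.
    hyperparams.append(0.5)
    return hyperparams


# Mutable: the function and the caller share one list object.
original = [0.001, 32]
updated = add_default_dropout(original)
print(updated is original)  # both labels point at the same object
print(original)             # the caller's list has grown

# Passing a copy keeps the caller's list untouched.
preserved = [0.001, 32]
independent = add_default_dropout(preserved.copy())
print(preserved, independent)

# A shallow copy still shares nested mutable objects; deepcopy does not.
nested = {"layers": [64, 32]}
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested["layers"].append(16)
print(shallow["layers"], deep["layers"])

# Immutable: a tuple refuses item assignment.
shape = (28, 28)
try:
    shape[0] = 32
except TypeError as exc:
    message = str(exc)
    print(message)
```

The shallow-versus-deep distinction matters most for nested configurations, such as a dictionary whose values are lists: `.copy()` duplicates only the outer container.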
+ +```mermaid +graph TD + Data[Object in Memory] --> Mut[Mutable] + Data --> Immut[Immutable] + + Mut -->|Impact| M_Effect["Changing it through any reference changes the original unless you copy it first"] + Immut -->|Impact| I_Effect["Any 'change' creates a brand new object"] + + style Mut fill:#fff3e0,stroke:#ef6c00,color:#333 + style Immut fill:#e8f5e9,stroke:#2e7d32,color:#333 + +``` + +:::warning ML Common Bug +If you pass a list of hyperparameters to a function and the function modifies that list, the original list *outside* the function will also be changed. Always use `.copy()` (or `copy.deepcopy()` for nested structures) if you want to preserve the original data! +::: + +## 4. Type Casting + +In ML, we frequently need to convert data types (e.g., converting integer pixel values `0-255` to floats `0.0-1.0`). + +```python +# Converting types +pixel_val = 255 +normalized = float(pixel_val) / 255.0 + +# Checking types +print(type(normalized)) # Output: <class 'float'> + +``` + +## 5. Summary Reference Table + +| Type | Example | Mutable? | Typical ML Usage | +| --- | --- | --- | --- | +| **int** | `32` | No | Batch size, epoch count. | +| **float** | `0.001` | No | Learning rate, weights. | +| **str** | `"cat"` | No | Target labels, text data. | +| **list** | `[1, 2, 3]` | **Yes** | Collecting loss values over time. | +| **tuple** | `(28, 28)` | No | Input dimensions of an image. | +| **dict** | `{"id": 1}` | **Yes** | Storing model configuration. | + + +--- + +Now that we know how to store single values and lists, we need to know how to organize them logically for complex tasks. Let's look at more advanced data structures. 
\ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/essential-libraries/matplotlib.mdx b/docs/machine-learning/programming-fundamentals/essential-libraries/matplotlib.mdx index e69de29..a9ac921 100644 --- a/docs/machine-learning/programming-fundamentals/essential-libraries/matplotlib.mdx +++ b/docs/machine-learning/programming-fundamentals/essential-libraries/matplotlib.mdx @@ -0,0 +1,113 @@ +--- +title: "Data Visualization: Matplotlib & Seaborn" +sidebar_label: Matplotlib +description: "Mastering the art of data visualization in Python: from basic line plots to complex statistical heatmaps." +tags: [python, matplotlib, seaborn, data-viz, eda, mathematics-for-ml] +--- + +In Machine Learning, a picture is worth a thousand rows of data. **Matplotlib** is the foundational "grandfather" library for plotting in Python, while **Seaborn** sits on top of it to provide beautiful, statistically-informed visualizations with much less code. + +## 1. The Anatomy of a Plot + +To master Matplotlib, you must understand its hierarchy. Every plot is contained within a **Figure**, which can hold one or more **Axes** (the actual plots). + +```mermaid +graph TD + Fig[Figure: The Window/Canvas] --> Ax1[Axes: Plot 1] + Fig --> Ax2[Axes: Plot 2] + + Ax1 --> Elements[Labels, Ticks, Legend, Data] + +``` + +## 2. Matplotlib: The Basics + +The most common interface is `pyplot`. It follows a state-machine logic similar to MATLAB. + +```python +import matplotlib.pyplot as plt + +# Data +epochs = [1, 2, 3, 4, 5] +loss = [0.9, 0.7, 0.5, 0.3, 0.2] + +# Plotting +plt.plot(epochs, loss, label='Training Loss', marker='o') +plt.title("Model Training Progress") +plt.xlabel("Epochs") +plt.ylabel("Loss") +plt.legend() +plt.show() + +``` + +## 3. 
Essential ML Plots + +In your ML workflow, you will constantly use these four types of visualizations: + +| Plot Type | Best Use Case | Library Choice | +| --- | --- | --- | +| **Line Plot** | Monitoring Loss/Accuracy over time (epochs). | Matplotlib | +| **Scatter Plot** | Finding correlations between two features ( vs ). | Seaborn | +| **Histogram** | Checking if a feature follows a **Normal Distribution**. | Seaborn | +| **Heatmap** | Visualizing a **Correlation Matrix** or **Confusion Matrix**. | Seaborn | + +## 4. Seaborn: Statistical Beauty + +Seaborn makes complex plots easy. It integrates directly with Pandas DataFrames and handles the labeling and coloring automatically. + +```python +import seaborn as sns + +# Load a built-in dataset +iris = sns.load_dataset("iris") + +# A single line to see relationships across all features +sns.pairplot(iris, hue="species") +plt.show() + +``` + +## 5. Visualizing Model Performance + +### The Heatmap (Confusion Matrix) + +A heatmap is the standard way to visualize where a classification model is getting confused. + +```python +# Assuming 'cm' is your confusion matrix array +sns.heatmap(cm, annot=True, cmap='Blues') +plt.xlabel("Predicted Label") +plt.ylabel("True Label") + +``` + +### Subplots + +Sometimes you need to compare multiple plots side-by-side (e.g., Training Loss vs. Validation Loss). + +```python +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4)) + +ax1.plot(loss) +ax1.set_title("Loss") + +ax2.plot(accuracy) +ax2.set_title("Accuracy") + +``` + +## 6. The "Object-Oriented" vs "Pyplot" Debate + +* **`plt.plot()` (Pyplot):** Great for quick interactive exploration. +* **`fig, ax = plt.subplots()` (OO style):** Better for complex layouts and production scripts where you need fine-grained control over every element. + +## References for More Details + +* **[Matplotlib Plot Gallery](https://matplotlib.org/stable/gallery/index.html):** Finding code templates for literally any type of plot. 
+ +* **[Seaborn Tutorial](https://seaborn.pydata.org/tutorial.html):** Learning how to visualize statistical relationships. + +--- + +Visualization is the final piece of our programming foundations. Now that you can process data with NumPy, clean it with Pandas, and visualize it with Matplotlib, you are ready to start building actual models. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/essential-libraries/numpy.mdx b/docs/machine-learning/programming-fundamentals/essential-libraries/numpy.mdx index e69de29..5d13877 100644 --- a/docs/machine-learning/programming-fundamentals/essential-libraries/numpy.mdx +++ b/docs/machine-learning/programming-fundamentals/essential-libraries/numpy.mdx @@ -0,0 +1,109 @@ +--- +title: "NumPy: Numerical Python" +sidebar_label: NumPy +description: "Mastering N-dimensional arrays, vectorization, and broadcasting: the foundational tools for numerical computing in ML." +tags: [python, numpy, arrays, vectorization, broadcasting, mathematics-for-ml] +--- + +If Python is the skeleton of Machine Learning, **NumPy** is the muscle. It is a library for scientific computing that introduces the **ndarray** (N-dimensional array), which is significantly faster and more memory-efficient than standard Python lists. + +## 1. Why NumPy? (Speed & Efficiency) + +Python lists are flexible but slow because they store pointers to objects scattered in memory. NumPy arrays store data in **contiguous memory blocks**, allowing the CPU to process them using SIMD (Single Instruction, Multiple Data). + +```mermaid +graph LR + List[Python List] --> L_Ptr["Scattered Memory (Slow)"] + Array[NumPy Array] --> A_Cont["Contiguous Block (Fast)"] + style Array fill:#e1f5fe,stroke:#01579b,color:#333 + +``` + +## 2. Array Anatomy and Shapes + +In ML, we describe data by its **Rank** (number of dimensions) and **Shape**. + +* **Scalar (Rank 0):** A single number. 
+* **Vector (Rank 1):** A line of numbers (e.g., a single sample's features). +* **Matrix (Rank 2):** A table of numbers (e.g., a whole dataset). +* **Tensor (Rank 3+):** Higher dimensional arrays (e.g., a batch of color images). + +```python +import numpy as np + +# Creating a 2D Matrix +data = np.array([[1, 2, 3], [4, 5, 6]]) +print(data.shape) # Output: (2, 3) -> 2 rows, 3 columns + +``` + +## 3. Vectorization + +**Vectorization** is the practice of replacing explicit `for` loops with array expressions. This is how we achieve high performance in Python. + +**Instead of this:** + +```python +# Slow: Element-wise addition with a loop +result = [] +for i in range(len(a)): + result.append(a[i] + b[i]) + +``` + +**Do this:** + +```python +# Fast: NumPy handles the loop in C +result = a + b + +``` + +## 4. Broadcasting: The Magic of NumPy + +Broadcasting allows NumPy to perform arithmetic operations on arrays with **different shapes**, provided they meet certain compatibility rules. + +```mermaid +graph TD + A["Matrix: (3, 3)"] + B["Scalar: (1,)"] + A -->|Add| B + B -->|Broadcast| B_Stretch["Stretched to (3, 3)"] + B_Stretch --> Result["Element-wise Sum"] + +``` + +**Example:** Adding a constant bias to every row in a dataset. + +```python +features = np.array([[10, 20], [30, 40]]) # Shape (2, 2) +bias = np.array([5, 5]) # Shape (2,) +result = features + bias # [[15, 25], [35, 45]] + +``` + +## 5. Critical ML Operations in NumPy + +| Operation | NumPy Function | ML Use Case | +| --- | --- | --- | +| **Dot Product** | `np.dot(a, b)` | Calculating weighted sums in a neuron. | +| **Reshaping** | `arr.reshape(1, -1)` | Changing an image from 2D to a 1D feature vector. | +| **Transposing** | `arr.T` | Aligning dimensions for matrix multiplication. | +| **Aggregations** | `np.mean()`, `np.std()` | Normalizing data (Standard Scaling). | +| **Slicing** | `arr[:, 0]` | Extracting a single column (feature) from a dataset. | + +## 6. 
Slicing and Masking + +NumPy supports "Boolean Indexing" (masking), which filters an array by a condition without writing a loop. + +```python +# Select all values in the array greater than 0 +weights = np.array([0.1, 0.8, -0.2, 0.9]) +positive_weights = weights[weights > 0] +# Result: [0.1, 0.8, 0.9] + +``` + +--- + +While NumPy handles the raw numbers, we need a way to manage data with column names, different data types, and missing values. For that, we turn to the most popular data manipulation library. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/essential-libraries/pandas.mdx b/docs/machine-learning/programming-fundamentals/essential-libraries/pandas.mdx index e69de29..08721e7 100644 --- a/docs/machine-learning/programming-fundamentals/essential-libraries/pandas.mdx +++ b/docs/machine-learning/programming-fundamentals/essential-libraries/pandas.mdx @@ -0,0 +1,121 @@ +--- +title: "Pandas: Data Manipulation" +sidebar_label: Pandas +description: "Mastering DataFrames, Series, and data cleaning techniques: the essential toolkit for exploratory data analysis (EDA)." +tags: [python, pandas, dataframe, data-cleaning, eda, mathematics-for-ml] +--- + +In Machine Learning, data rarely arrives ready for training. It comes in messy CSVs, Excel files, or SQL databases with missing values and inconsistent formatting. **Pandas** is the library designed to handle this "Data Wrangling." + +## 1. Core Data Structures + +Pandas is built on top of NumPy, but it adds labels (indices and column names) to the data. + +```mermaid +graph TD + Data[Pandas Data Structures] --> Series["Series (1D)"] + Data --> DF["DataFrame (2D)"] + + Series --> S_Desc["A single column of data with an index"] + DF --> DF_Desc["A table with rows and columns (The 'Excel' of Python)"] + +``` + +### The DataFrame + +A DataFrame behaves much like a dictionary of Series objects that share one index. It is the primary object you will use to store your features ($X$) and targets ($y$). 
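The "dictionary of Series" idea can be sketched directly. This is a minimal illustration, not the only way to build a DataFrame; the names `age`, `salary`, and `people` are made up for this example.

```python
import pandas as pd

# Two Series that share the same default index (row labels 0, 1, 2).
age = pd.Series([25, 30, 35], name="Age")
salary = pd.Series([50000, 60000, 70000], name="Salary")

# A DataFrame built from a dict that maps column names to Series.
people = pd.DataFrame({"Age": age, "Salary": salary})

# Looking up a column by name returns a Series, much like a dict lookup.
age_column = people["Age"]
print(type(age_column).__name__)  # Series
print(list(people.keys()))        # ['Age', 'Salary']
```

Because every column is a Series aligned on the same index, selecting a column or a subset of rows keeps the row labels intact.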
+ +```python +import pandas as pd + +# Creating a DataFrame from a dictionary +df = pd.DataFrame({ + 'Age': [25, 30, 35], + 'Salary': [50000, 60000, 70000] +}) + +``` + +## 2. Loading and Inspecting Data + +Pandas can read many common formats (CSV, Excel, SQL, JSON, and more). Once loaded, we use specific methods to "peek" into the data. + +* **`pd.read_csv('data.csv')`**: The most common way to load data. +* **`df.head()`**: View the first 5 rows (the default). +* **`df.info()`**: Check data types and memory usage. +* **`df.describe()`**: Get statistical summaries (mean, std, min, max). + +## 3. Selecting and Filtering Data + +In ML, we often need to separate our target variable from our features. Columns can be selected with bracket indexing and rows filtered with boolean masks; `.loc` (label-based) and `.iloc` (integer-position-based) provide explicit row and column selection. + +```python +# Select all rows, but only the 'Salary' column +target = df['Salary'] + +# Select rows where Age is greater than 30 +seniors = df[df['Age'] > 30] + +``` + +## 4. Data Cleaning: The "ML Pre-processing" Step + +Before a model can learn, the data must be "clean." Pandas provides high-level functions for the most common cleaning tasks: + +### A. Handling Missing Values + +Most ML algorithms cannot handle `NaN` (Not a Number) values. + +* **`df.isnull().sum()`**: Count missing values per column. +* **`df.dropna()`**: Remove rows with missing values. +* **`df.fillna(df.mean(numeric_only=True))`**: Fill missing values in numeric columns with the column average (Imputation). + +### B. Handling Categorical Data + +ML models require numbers. We use Pandas to convert text categories into numeric columns. + +* **`pd.get_dummies(df['City'])`**: One-Hot Encoding (turns "City" into multiple 0/1 columns). + +## 5. Grouping and Aggregation + +Commonly used in **Exploratory Data Analysis (EDA)** to find patterns. + +```python +# Calculate the average salary per city +avg_sal = df.groupby('City')['Salary'].mean() + +``` + +```mermaid +flowchart LR + A[Raw DataFrame] --> B["Split (by Category)"] + B --> C["Apply (Mean/Sum)"] + C --> D["Combine (New Table)"] + style B fill:#e1f5fe,stroke:#01579b,color:#333 + +``` + +## 6. 
Vectorized String Operations + +Pandas allows you to perform operations on entire text columns without writing loops—essential for **Natural Language Processing (NLP)**. + +```python +# Lowercase all text in a 'Reviews' column +df['Reviews'] = df['Reviews'].str.lower() + +``` + +## References for More Details + +* **Pandas Official "10 Minutes to Pandas":** +* [Link](https://pandas.pydata.org/docs/user_guide/10min.html) +* *Best for:* A quick syntax cheat sheet. + + +* **Kaggle - Data Cleaning Course:** +* [Link](https://www.kaggle.com/learn/data-cleaning) +* *Best for:* Practical, hands-on experience with messy real-world data. + +--- + +Pandas helps us clean the data, but "seeing is believing." To truly understand our dataset, we need to visualize the relationships between variables. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/essential-libraries/seaborn.mdx b/docs/machine-learning/programming-fundamentals/essential-libraries/seaborn.mdx index e69de29..fba429f 100644 --- a/docs/machine-learning/programming-fundamentals/essential-libraries/seaborn.mdx +++ b/docs/machine-learning/programming-fundamentals/essential-libraries/seaborn.mdx @@ -0,0 +1,104 @@ +--- +title: "Seaborn: Statistical Visualization" +sidebar_label: Seaborn +description: "Mastering high-level statistical plotting: visualizing distributions, regressions, and categorical relationships." +tags: [python, seaborn, data-viz, eda, statistics, mathematics-for-ml] +--- + +**Seaborn** is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. In Machine Learning, we use Seaborn to quickly understand the distribution of our features and the correlations between them. + +## 1. Why Seaborn over Matplotlib? + +Seaborn is designed to work directly with **Pandas DataFrames**. 
It automates much of the boilerplate code (like labeling axes and handling colors) that Matplotlib requires. + +```mermaid +graph LR + P[Pandas DataFrame] --> S[Seaborn] + S -->|Automated Logic| M[Matplotlib Engine] + M --> Out[Professional Statistical Plot] + + style S fill:#e1f5fe,stroke:#01579b,color:#333 + +``` + +## 2. Visualizing Distributions + +Before training a model, you need to know if your data is "Normal" (Gaussian) or skewed. + +### Histograms and KDEs + +The `displot()` (distribution plot) combines a histogram with a **Kernel Density Estimate (KDE)** to show the "shape" of your data. + +```python +import seaborn as sns +import matplotlib.pyplot as plt + +# Visualizing the distribution of a feature +sns.displot(df['feature_name'], kde=True, color="skyblue") +plt.show() + +``` + +## 3. Visualizing Relationships + +### Scatter Plots and Regressions + +In ML, we often want to see if one feature can predict another. `regplot()` draws a scatter plot and fits a **Linear Regression** line automatically. + +```python +# Check for linear relationship between 'SquareFootage' and 'Price' +sns.regplot(data=df, x="SquareFootage", y="Price") + +``` + +### The Pair Plot + +The `pairplot()` is perhaps the most useful tool in EDA. It creates a matrix of plots, showing every feature's relationship with every other feature. + +```python +# Instant overview of the entire dataset +sns.pairplot(df, hue="Target_Class") + +``` + +## 4. Visualizing Categorical Data + +When dealing with discrete categories (like "City" or "Product Type"), we use plots that show the central tendency and variance. + +* **Box Plot:** Shows the median, quartiles, and **outliers**. +* **Violin Plot:** Combines a box plot with a KDE to show the density of the data at different values. + +```python +# Comparing distribution across categories +sns.boxplot(data=df, x="Category", y="Value") + +``` + +## 5. 
Matrix Plots: Correlation Analysis + +Before selecting features for your model, you should check for **Multicollinearity** (features that are strongly correlated with each other). We do this using a Correlation Matrix visualized as a Heatmap. + +```python +# Compute correlation matrix (numeric columns only) +corr = df.corr(numeric_only=True) + +# Visualize with Heatmap +sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f") + +``` + +## 6. Seaborn "Themes" and Aesthetics + +Seaborn makes it easy to change the look of your plots globally to match a professional report or dark-mode dashboard. + +* `sns.set_theme(style="whitegrid")` +* `sns.set_context("talk")` (Scales labels for presentations) + +## References for More Details + +* **[Seaborn Example Gallery](https://seaborn.pydata.org/examples/index.html):** Finding the specific "look" you want for your data. +* **[Python Graph Gallery](https://python-graph-gallery.com/seaborn/):** Learning how to customize Seaborn plots beyond the defaults. + +--- + +You have now mastered the "Big Three" of Python data science: NumPy for math, Pandas for data, and Matplotlib/Seaborn for sight. You are ready to stop preparing and start predicting. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/object-oriented-programming.mdx b/docs/machine-learning/programming-fundamentals/object-oriented-programming.mdx index e69de29..2949b21 100644 --- a/docs/machine-learning/programming-fundamentals/object-oriented-programming.mdx +++ b/docs/machine-learning/programming-fundamentals/object-oriented-programming.mdx @@ -0,0 +1,89 @@ +--- +title: OOP in Machine Learning +sidebar_label: OOP Basics +description: "Understanding Classes, Objects, and the four pillars of OOP in the context of Machine Learning model development." +tags: [python, oop, classes, objects, inheritance, mathematics-for-ml] +--- + +Most beginner code is **Procedural** (a long list of instructions). However, professional Machine Learning code is almost always **Object-Oriented**. 
OOP allows us to bundle data (like model weights) and functions (like the training logic) into a single unit called an **Object**. + +## 1. Classes vs. Objects + +Think of a **Class** as a blueprint and an **Object** as the actual house built from that blueprint. + +* **Class:** The template for a "Model" (e.g., defines that all models need a `fit` and `predict` method). +* **Object:** A specific instance (e.g., a `RandomForest` trained on housing data). + +```mermaid +graph TD + Blueprint[Class: LinearModel] --> Instance1[Object: Model_for_Price] + Blueprint --> Instance2[Object: Model_for_Sales] + + style Blueprint fill:#e1f5fe,stroke:#01579b,color:#333 + style Instance1 fill:#f3e5f5,stroke:#7b1fa2,color:#333 + +``` + +## 2. The Core Components: Attributes and Methods + +In ML, an object typically consists of: + +1. **Attributes (Data):** The "State" of the model. (e.g., `self.weights`, `self.learning_rate`). +2. **Methods (Behavior):** The "Actions" the model can take. (e.g., `self.fit()`, `self.predict()`). + +```python +class SimpleModel: + def __init__(self, lr): + # Attribute: Initializing the state + self.learning_rate = lr + self.weights = None + + def fit(self, X, y): + # Method: Defining behavior + print(f"Training with LR: {self.learning_rate}") + +``` + +## 3. The Four Pillars of OOP in ML + +### A. Encapsulation + +Hiding the internal complexity. You don't need to know the calculus inside `.fit()` to use it; you just call the method. It "encapsulates" the math away from the user. + +### B. Inheritance + +Creating a new class based on an existing one. In libraries like PyTorch, your custom neural network **inherits** from a base `Module` class. + +### C. Polymorphism + +The ability for different objects to be treated as instances of the same general class. For example, you can loop through a list of different models and call `.predict()` on all of them, regardless of their internal math. + +### D. 
Abstraction + +Using simple interfaces to represent complex tasks. An "Optimizer" object abstracts away the specific update rules (SGD, Adam, RMSProp). + +## 4. Why use OOP for ML? + +1. **Organization:** Keeps weights and training logic together. Without OOP, you'd have to pass `weights` as an argument to every single function. +2. **Reproducibility:** You can save a model's learned state (in PyTorch, the `state_dict` of its parameters) and reload it later to restore the same parameters. +3. **Extensibility:** Want to try a new loss function? You can create a subclass and just override one method without rewriting the whole training loop. + +## 5. Standard ML Pattern: The Class Structure + +```python +import numpy as np + +class MyNeuralNet: + def __init__(self, input_size): + self.weights = np.zeros(input_size) # State (zeros used here as a placeholder initialization) + + def forward(self, x): + return x @ self.weights # Behavior 1 + + def backward(self, grad): + # Update weights logic # Behavior 2 + pass + +``` + +--- + +Now that you understand how objects work, you can begin to navigate the source code of major ML libraries. But before we build complex classes, we need to master the math engine that powers them. \ No newline at end of file diff --git a/docs/machine-learning/programming-fundamentals/python.mdx b/docs/machine-learning/programming-fundamentals/python.mdx index e69de29..774691c 100644 --- a/docs/machine-learning/programming-fundamentals/python.mdx +++ b/docs/machine-learning/programming-fundamentals/python.mdx @@ -0,0 +1,121 @@ +--- +title: Python for Machine Learning +sidebar_label: Python +description: "Mastering the Python essentials required for ML: from data structures to vectorization and the scientific ecosystem." +tags: [python, programming, numpy, pandas, mathematics-for-ml] +--- + +Python is the "lingua franca" of Machine Learning. Its simplicity allows researchers to focus on algorithms rather than syntax, while its robust ecosystem of libraries provides the heavy lifting for mathematical computations. + +## 1. Why Python for ML? 
+ +The power of Python in ML doesn't come from its speed (it is actually quite slow compared to C++), but from its **ecosystem**. + +```mermaid +mindmap + root((Python ML Ecosystem)) + Data Processing + Pandas + NumPy + Visualization + Matplotlib + Seaborn + Plotly + Modeling + Scikit-Learn + PyTorch + TensorFlow + Deployment + FastAPI + Flask + +``` + +## 2. Core Data Structures for ML + +In ML, we don't just store values; we store **features** and **labels**. Understanding how Python holds this data is vital. + +| Structure | Syntax | Best Use Case in ML | +| --- | --- | --- | +| **List** | `[1, 2, 3]` | Storing a sequence of layer sizes or hyperparameter values. | +| **Dictionary** | `{"lr": 0.01}` | Passing hyperparameters to a model. | +| **Tuple** | `(640, 480)` | Storing immutable shapes of images or tensors. | +| **Set** | `{1, 2}` | Finding unique classes/labels in a dataset. | + +## 3. The Power of Vectorization (NumPy) + +Standard Python `for` loops are slow. In ML, we use **Vectorization** via NumPy to perform operations on entire arrays at once. This pushes the computation down to optimized C and Fortran code. + +```python +import numpy as np + +# Standard Python (Slow) +result = [x + 5 for x in range(1000000)] + +# NumPy Vectorization (Fast) +arr = np.arange(1000000) +result = arr + 5 + +``` + +### Multi-dimensional Data + +Most ML data is represented as **Tensors** (ND-Arrays): + +* **1D Array:** A single feature vector. +* **2D Array:** A dataset (rows = samples, columns = features). +* **3D Array:** A batch of grayscale images. +* **4D Array:** A batch of color images (Batch, Height, Width, Channels). + +## 4. Functional Programming Tools + +ML code often involves transforming data. These three tools are used constantly for feature engineering: + +1. **List Comprehensions:** Creating new lists from old ones in one line. +* `normalized_data = [x / 255 for x in pixels]` + + +2. 
**Lambda Functions:** Small, anonymous functions for quick transformations. +* `clean_text = lambda x: x.lower().strip()` + + +3. **Map/Filter:** Applying functions across datasets efficiently. + +--- + +## 5. Object-Oriented Programming (OOP) in ML + +Most ML frameworks (like Scikit-Learn and PyTorch) use Classes to define models. Understanding `self`, `__init__`, and inheritance is necessary for building custom model pipelines. + +```mermaid +classDiagram + class Model { + +weights: array + +bias: float + +train(data) + +predict(input) + } + Model <|-- LinearRegression + Model <|-- LogisticRegression + +``` + +## 6. Common ML Patterns in Python + +### The Fit-Transform Pattern + +Many Python ML libraries, most notably Scikit-Learn, follow this logical flow: + +```mermaid +flowchart LR + A[Raw Data] --> B["fit() : Learn parameters from data"] + B --> C["transform() : Apply changes to data"] + C --> D["predict() : Generate output"] + style B fill:#e1f5fe,stroke:#01579b,color:#333 + style D fill:#f9f,stroke:#333,color:#333 + +``` + +--- + +Python provides the syntax, but for heavy mathematical operations, we need a specialized engine. Let's dive into the core library that makes numerical computing in Python possible. \ No newline at end of file diff --git a/static/img/tutorials/ml/bubble-visualization.jpg b/static/img/tutorials/ml/bubble-visualization.jpg new file mode 100644 index 0000000..e25a2c6 Binary files /dev/null and b/static/img/tutorials/ml/bubble-visualization.jpg differ