---

<div style="
    text-align: center;
    padding: 80px 40px;
    background: linear-gradient(to right, #1d407cff, #2b71abff);
    color: #E8E8E8;
    border-radius: 8px;
    margin: 40px 0;
">
    <div style="font-size: 48px; font-weight: 700; margin-bottom: 20px; letter-spacing: 1px;">
        MATHEMATICS FOR DATA SCIENCE
    </div>
    <div style="font-size: 28px; font-weight: 300; margin-bottom: 40px; color: #B8C5D0;">
        Theory, Intuition, and Practice
    </div>
    <div style="font-size: 20px; font-weight: 400; margin-top: 60px;">
        Claudenir Vasconcelos Laurindo Nojosa
    </div>
    <div style="font-size: 16px; font-weight: 300; margin-top: 10px; color: #B8C5D0;">
        Independent Study
    </div>
</div>

---

# Preface

This notebook represents a comprehensive independent study of the mathematical foundations underlying modern data science and machine learning. Rather than simply presenting formulas and theorems, this work aims to build genuine mathematical understanding through three interconnected pillars: rigorous theory, geometric intuition, and practical implementation.

## Motivation

Data science sits at the intersection of mathematics, statistics, and computation. While many practitioners can apply algorithms and use libraries effectively, a deep understanding of the underlying mathematics provides several critical advantages.

Understanding enables innovation. When you comprehend the mathematical principles behind an algorithm, you can adapt it to novel situations, diagnose failures, and create new approaches tailored to specific problems.

Rigorous thinking builds intuition. Working through mathematical derivations develops a form of analytical thinking that applies far beyond the specific problems at hand.

## Structure and Philosophy

This notebook progresses systematically from foundational concepts to advanced topics. The structure follows a pedagogical arc designed to build knowledge incrementally.

Chapters 1-6 establish fundamental concepts in arithmetic, algebra, linear algebra, statistics, geometry, and probability. These chapters form the bedrock upon which all subsequent material rests.

Chapters 7-15 develop the core mathematical machinery of data science: calculus, optimization, eigenvalue theory, and matrix decompositions. These tools appear repeatedly throughout machine learning.

Chapters 16-21 apply these foundations to classical machine learning topics: dimensionality reduction, regularization, support vector machines, probabilistic models, information theory, and neural networks.

Chapters 22-30 explore specialized topics in time series, natural language processing, computer vision, causal inference, reinforcement learning, Bayesian methods, graph theory, and advanced techniques.

## Learning Approach

Each chapter follows a consistent structure designed to promote deep learning.

Theory first. Mathematical concepts are presented with precision, including formal definitions and key theorems. This rigor ensures correctness and enables further study.

Visual intuition. Abstract concepts become concrete through carefully designed visualizations. Seeing mathematical objects—whether vectors, transformations, or probability distributions—builds geometric and spatial understanding.

Practical implementation. Each chapter concludes with hands-on exercises that implement concepts in code. This bridges the gap between theory and application, reinforcing understanding through practice.

---

# How to Use This Notebook

This Jupyter notebook is designed as a comprehensive, self-contained mathematical reference for data science. It combines rigorous exposition, visual intuition, and practical computation in a single unified document.

## Study Recommendations

Work sequentially. Each chapter assumes knowledge from previous chapters. While you may skip familiar material, understanding the dependencies is important.

Run all code. Execute cells in order to ensure proper state. Modify parameters and experiment to develop intuition.

Work through derivations. Keep paper and pen handy. Follow mathematical derivations step by step.

Complete end-of-chapter exercises. These consolidate learning and ensure you can apply concepts independently.

---

<div style="
    background: linear-gradient(to right, #1d407cff, #2b71abff);
    padding: 20px 30px;
    border-radius: 8px;
    color: #E8E8E8;
    margin: 30px 0;
">
<h2 style="margin: 0; color: #E8E8E8; font-weight: 600;">Table of Contents</h2>
<p style="margin: 10px 0 0 0; color: #B8C5D0; font-size: 14px;">30 Chapters | Foundations to Advanced Topics</p>
</div>

## Part I: Foundations (Chapters 1-6)

**Chapter 1: Arithmetic, Algebra, and Basic Functions**
- Basic arithmetic and algebra
- Functions, graphs, and transformations
- Logarithms and exponentials
- Summation notation (Σ)
- Basic logic and analytical thinking

**Chapter 2: Introduction to Linear Algebra**
- Scalars, vectors, and matrices
- Vector and matrix operations (addition, multiplication)
- Transpose and properties
- Systems of linear equations
- Matrix inverse and determinant

**Chapter 3: Descriptive Statistics and Data Visualization**
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (variance, standard deviation, range)
- Data visualization basics (histograms, box plots, scatter plots)
- Correlation and covariance
- Populations and samples

**Chapter 4: Analytic Geometry**
- Vector norms (L1, L2, L∞)
- Inner products and dot products
- Distances and similarities
- Angles and orthogonality
- Projections

**Chapter 5: Basic Probability**
- Probability axioms and rules
- Conditional probability
- Bayes' theorem
- Random variables (discrete and continuous)
- Expectation and variance

**Chapter 6: Probability Distributions**
- Common discrete distributions (Bernoulli, Binomial, Poisson)
- Common continuous distributions (Uniform, Normal, Exponential)
- Central Limit Theorem
- Joint distributions
- Covariance and correlation matrices

## Part II: Core Mathematical Machinery (Chapters 7-15)

**Chapter 7: Differential Calculus** 
- Limits and continuity
- Derivatives and partial derivatives
- Gradient vectors
- Chain rule
- Taylor series approximation

**Chapter 8: Vector Calculus** 
- Gradient of vector-valued functions
- Jacobian matrix
- Hessian matrix
- Gradient descent intuition
- Multivariate Taylor expansion

**Chapter 9: Inferential Statistics** 
- Point estimation
- Confidence intervals
- Hypothesis testing (p-values, significance levels)
- Type I and Type II errors
- Statistical power

**Chapter 10: Linear Algebra II** 
- Vector spaces and subspaces
- Linear independence
- Basis and dimension
- Rank of a matrix
- Linear transformations

**Chapter 11: Linear Models** 
- Simple linear regression
- Multiple linear regression
- Least squares estimation
- Model assumptions and diagnostics
- Logistic regression

**Chapter 12: Advanced Probability** 
- Exponential family of distributions
- Maximum Likelihood Estimation (MLE)
- Method of moments
- Change of variables theorem
- Information inequality

**Chapter 13: Optimization Theory** 
- Unconstrained optimization
- Gradient descent algorithms
- Lagrange multipliers for constrained optimization
- Karush-Kuhn-Tucker (KKT) conditions
- Convex sets and functions

**Chapter 14: Eigenvalues and Eigenvectors** 
- Eigenvalue decomposition
- Characteristic polynomial
- Spectral theorem
- Positive definite matrices
- Applications to quadratic forms

**Chapter 15: Matrix Decompositions**
- LU decomposition
- QR decomposition
- Cholesky decomposition
- Singular Value Decomposition (SVD)
- Low-rank matrix approximations

## Part III: Classical Machine Learning (Chapters 16-21)

**Chapter 16: Dimensionality Reduction** 
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Multidimensional scaling
- t-SNE and UMAP concepts
- Autoencoders introduction

**Chapter 17: Regularization and Model Selection** 
- Bias-variance tradeoff
- L1 regularization (Lasso)
- L2 regularization (Ridge)
- Elastic Net
- Cross-validation techniques

**Chapter 18: Support Vector Machines** 
- Maximum margin classifiers
- Primal and dual formulation
- Kernel trick
- Kernel functions (linear, polynomial, RBF)
- Soft margin classification

**Chapter 19: Probabilistic Graphical Models** 
- Bayesian networks
- Markov random fields
- Gaussian Mixture Models (GMM)
- Expectation-Maximization (EM) algorithm
- Hidden Markov Models introduction

**Chapter 20: Information Theory** 
- Entropy and cross-entropy
- Kullback-Leibler divergence
- Mutual information
- Jensen-Shannon divergence
- Applications to machine learning

**Chapter 21: Mathematics of Deep Learning**
- Backpropagation algorithm
- Activation functions and their derivatives
- Weight initialization strategies
- Batch normalization
- Loss functions and their properties


## Part IV: Specialized Topics (Chapters 22-30)

**Chapter 22: Time Series Mathematics** 
- Stationarity and autocorrelation
- ARIMA models
- State space models
- Kalman filter basics
- Fourier analysis introduction

**Chapter 23: Natural Language Processing Mathematics** 
- Vector space models
- Word embeddings (Word2Vec, GloVe)
- Attention mechanism mathematics
- Transformer architecture components
- Sequence modeling basics

**Chapter 24: Computer Vision Mathematics** 
- Convolution operations
- Pooling operations
- Image transformations
- Homogeneous coordinates
- Camera geometry basics

**Chapter 25: Causal Inference Mathematics** 
- Directed Acyclic Graphs (DAGs)
- d-separation
- Potential outcomes framework
- Instrumental variables
- Causal calculus (do-operator)

**Chapter 26: Reinforcement Learning Mathematics** 
- Markov Decision Processes
- Bellman equations
- Value iteration and policy iteration
- Q-learning mathematics
- Policy gradient theorem

**Chapter 27: Advanced Optimization** 
- Stochastic gradient descent variants
- Second-order optimization methods
- Natural gradient descent
- Meta-learning mathematics
- Federated learning optimization

**Chapter 28: Bayesian Methods** 
- Conjugate priors
- Markov Chain Monte Carlo (MCMC)
- Variational inference
- Gaussian processes
- Bayesian neural networks

**Chapter 29: Graph Theory and Network Science** 
- Graph representations (adjacency matrices)
- Centrality measures
- Community detection algorithms
- Random graph models
- Graph neural networks introduction

**Chapter 30: Advanced Linear Algebra**
- Tensor operations
- Kronecker product
- Matrix calculus identities
- Generalized inverses
- Numerical linear algebra basics


---


# Cell 5: Setup (Python)


In [2]:
# Setup: Import necessary libraries and configure plotting style
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy import linalg
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
import warnings
warnings.filterwarnings('ignore')

# Corporate color palette
corporate_palette = ['#1A2332', '#3B5366', '#7A8B99', '#B8C5D0', '#E8E8E8']

# Configure matplotlib for professional styling
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['legend.fontsize'] = 10
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=corporate_palette)
plt.rcParams['axes.facecolor'] = '#FFFFFF'
plt.rcParams['figure.facecolor'] = '#FFFFFF'
plt.rcParams['axes.edgecolor'] = '#7A8B99'
plt.rcParams['grid.color'] = '#E8E8E8'
plt.rcParams['grid.linestyle'] = '-'
plt.rcParams['grid.linewidth'] = 0.5

print("Environment configured successfully.")
print(f"NumPy version: {np.__version__}")
print(f"Corporate palette: {corporate_palette}")

Environment configured successfully.
NumPy version: 2.3.5
Corporate palette: ['#1A2332', '#3B5366', '#7A8B99', '#B8C5D0', '#E8E8E8']


# Chapter 1: Arithmetic, Algebra, and Basic Functions

Mathematics begins with numbers and the operations we perform on them. While these concepts may seem elementary, they form the foundation upon which all of data science rests. This chapter establishes the rigorous framework needed for more advanced topics, ensuring that our mathematical reasoning stands on solid ground.

The journey from basic arithmetic to the sophisticated mathematics of machine learning requires careful attention to fundamentals. We begin by examining number systems, proceed through algebraic manipulation, and conclude with functions—the mathematical objects that model relationships in data.

## Basic Arithmetic and Algebra

### Number Systems

Mathematics organizes numbers into hierarchical systems, each extending the previous with additional properties and capabilities.

> **Definition: Natural Numbers**
>
> The natural numbers $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ are the counting numbers. Some authors exclude zero, writing $\mathbb{N} = \{1, 2, 3, \ldots\}$. We adopt the convention that includes zero.

Natural numbers support addition and multiplication. Given any two natural numbers $a$ and $b$, their sum $a + b$ and product $ab$ are also natural numbers. However, subtraction poses a problem: $3 - 5$ is not a natural number. This limitation motivates the integers.

> **Definition: Integers**
>
> The integers $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ extend the natural numbers to include negative numbers. The notation $\mathbb{Z}$ comes from the German word "Zahlen" (numbers).

Integers are closed under addition, subtraction, and multiplication. For any integers $a$ and $b$, the results $a + b$, $a - b$, and $ab$ are all integers. Division, however, presents difficulties: $7 \div 3$ is not an integer. This motivates rational numbers.

> **Definition: Rational Numbers**
>
> The rational numbers $\mathbb{Q}$ consist of all numbers that can be expressed as fractions $\frac{p}{q}$ where $p \in \mathbb{Z}$ and $q \in \mathbb{Z}$ with $q \neq 0$. The notation comes from "quotient."

> **Example**
>
> The numbers $\frac{3}{4}$, $-\frac{7}{2}$, and $5$ (which equals $\frac{5}{1}$) are all rational. Every integer is rational since $n = \frac{n}{1}$.

Rational numbers suffice for many practical purposes. However, geometric considerations reveal gaps. The length of the diagonal of a unit square equals $\sqrt{2}$, which cannot be expressed as a fraction. This necessitates real numbers.

> **Definition: Real Numbers**
>
> The real numbers $\mathbb{R}$ include all rational numbers and all irrational numbers (numbers that cannot be expressed as fractions). Informally, $\mathbb{R}$ consists of all points on the number line; formally, $\mathbb{R}$ is a complete ordered field.

> **Example**
>
> The numbers $1.5$, $-2$, and $\sqrt{2}$ are real numbers. The irrational number $\pi \approx 3.14159\ldots$ is also real. Its decimal expansion neither terminates nor repeats.

The hierarchy of number systems follows the pattern:
$$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$$

Each system contains the previous as a proper subset. This hierarchical structure allows us to work at the appropriate level of generality for any given problem.
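
The closure gaps described above can be sketched in Python. This is only an illustration under stated assumptions: the built-in `int` and the standard-library `fractions.Fraction` stand in for $\mathbb{Z}$ and $\mathbb{Q}$, while `float` merely approximates $\mathbb{R}$.

```python
from fractions import Fraction
import math

# Subtraction leaves the natural numbers: 3 - 5 is a negative integer.
difference = 3 - 5
print(difference)                      # -2

# Division leaves the integers: 7 / 3 is rational but not an integer.
quotient = Fraction(7, 3)
print(quotient, quotient.denominator)  # 7/3 3

# Every integer n is the rational n/1.
print(Fraction(5) == Fraction(5, 1))   # True

# sqrt(2) is irrational, so a float can only approximate it:
# squaring the approximation does not give exactly 2.
root_two = math.sqrt(2)
print(root_two ** 2 == 2)              # False
```

The last line is a reminder that numerical code works with finite approximations of real numbers, a point that matters whenever results are compared for exact equality.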

![cover](../../../../Assets/math/1.png)

### Fundamental Arithmetic Operations

The four basic arithmetic operations—addition, subtraction, multiplication, and division—form the building blocks of mathematical computation.

**Addition** combines quantities. For real numbers $a$ and $b$, the sum $a + b$ represents the total when $a$ and $b$ are combined. Addition satisfies several fundamental properties:

- **Commutativity**: $a + b = b + a$ for all $a, b \in \mathbb{R}$
- **Associativity**: $(a + b) + c = a + (b + c)$ for all $a, b, c \in \mathbb{R}$
- **Identity element**: There exists $0 \in \mathbb{R}$ such that $a + 0 = a$ for all $a \in \mathbb{R}$
- **Inverse elements**: For each $a \in \mathbb{R}$, there exists $-a \in \mathbb{R}$ such that $a + (-a) = 0$

**Subtraction** is defined as addition of the additive inverse: $a - b = a + (-b)$. Subtraction is neither commutative nor associative.

**Multiplication** scales quantities. For real numbers $a$ and $b$, the product $ab$ (or $a \times b$ or $a \cdot b$) represents repeated addition when $b$ is a positive integer. Multiplication also satisfies key properties:

- **Commutativity**: $ab = ba$ for all $a, b \in \mathbb{R}$
- **Associativity**: $(ab)c = a(bc)$ for all $a, b, c \in \mathbb{R}$
- **Identity element**: There exists $1 \in \mathbb{R}$ such that $a \cdot 1 = a$ for all $a \in \mathbb{R}$
- **Inverse elements**: For each $a \in \mathbb{R}$ with $a \neq 0$, there exists $a^{-1} \in \mathbb{R}$ such that $a \cdot a^{-1} = 1$

**Division** is defined as multiplication by the multiplicative inverse: $\frac{a}{b} = a \cdot b^{-1}$ for $b \neq 0$. Division by zero is undefined.

The operations interact through the distributive property:

> **Property: Distributive Law**
>
> For all $a, b, c \in \mathbb{R}$: $a(b + c) = ab + ac$ and $(a + b)c = ac + bc$.

These properties define a **field**, the algebraic structure underlying real number arithmetic. Understanding these properties is crucial for algebraic manipulation and simplification.
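
The field axioms hold exactly for real numbers, but floating-point arithmetic only approximates them. The sketch below is an added illustration, not part of the formal development: it shows associativity of addition failing for `float` values while holding for exact `Fraction` arithmetic.

```python
from fractions import Fraction

# Floating-point addition is not associative in general.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)                      # False
print(left, right)

# Exact rational arithmetic obeys the field axioms.
a, b, c = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)
print((a + b) + c == a + (b + c))         # True: associativity
print(a * (b + c) == a * b + a * c)       # True: distributive law
```

In practice this is why floating-point results are usually compared with a tolerance (for example `math.isclose`) rather than with `==`.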

### Order of Operations

When expressions involve multiple operations, we must establish conventions for evaluation order. The standard order of operations (often remembered by the acronym PEMDAS) is:

1. **Parentheses** (or other grouping symbols)
2. **Exponents** (powers and roots)
3. **Multiplication and Division** (left to right)
4. **Addition and Subtraction** (left to right)

> **Example**
>
> Evaluate $3 + 4 \times 2^2 - (8 - 3)$.
>
> Following the order of operations:
> - $= 3 + 4 \times 2^2 - 5$ (parentheses first)
> - $= 3 + 4 \times 4 - 5$ (exponents)
> - $= 3 + 16 - 5$ (multiplication)
> - $= 19 - 5$ (addition left to right)
> - $= 14$ (subtraction)

The order of operations eliminates ambiguity. Without these conventions, the expression $2 + 3 \times 4$ could yield either $20$ (if we add first) or $14$ (if we multiply first). The convention dictates that multiplication precedes addition, giving $14$.
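
Python applies the same precedence conventions, so the examples above can be checked directly. The variable names are chosen only for this illustration.

```python
# The worked example: parentheses, exponent, multiplication, then +/-.
worked_example = 3 + 4 * 2**2 - (8 - 3)
print(worked_example)        # 14

# Multiplication binds tighter than addition.
multiply_first = 2 + 3 * 4
add_first = (2 + 3) * 4
print(multiply_first)        # 14
print(add_first)             # 20

# Exponentiation binds tighter than unary minus.
print(-2**2)                 # -4
print((-2)**2)               # 4
```

The last two lines show a common source of bugs: `-2**2` is read as $-(2^2)$, so explicit parentheses are needed to square a negative number.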

### Algebraic Expressions and Manipulation

Algebra extends arithmetic by introducing variables—symbols that represent numbers. An **algebraic expression** combines variables and constants using arithmetic operations.

> **Definition: Algebraic Expression**
>
> An algebraic expression is a mathematical phrase that can contain numbers, variables, and operation symbols. Examples include $3x + 5$, $x^2 - 4x + 4$, and $\frac{2a + b}{c - 1}$.

**Terms** are the parts of an expression separated by addition or subtraction. In the expression $3x^2 - 5x + 7$, the terms are $3x^2$, $-5x$, and $7$.

**Coefficients** are the numerical factors in terms. In $3x^2$, the coefficient is $3$. In $-5x$, the coefficient is $-5$.

**Like terms** have identical variable parts. The terms $3x^2$ and $-7x^2$ are like terms, while $3x^2$ and $3x$ are not.

Algebraic manipulation relies on the field properties. Common techniques include:

**Combining like terms**: Use the distributive property in reverse.
$$3x + 5x = (3 + 5)x = 8x$$

**Expanding products**: Apply the distributive property.
$$3(x + 4) = 3x + 12$$
$$(x + 2)(x + 3) = x^2 + 3x + 2x + 6 = x^2 + 5x + 6$$

**Factoring**: Reverse of expansion.
$$x^2 + 5x + 6 = (x + 2)(x + 3)$$
$$6x + 9 = 3(2x + 3)$$
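
Expansion can be sketched in code by representing a polynomial as a list of coefficients, lowest degree first. The helper `poly_mul` is a hypothetical name introduced for this illustration; it applies the distributive property term by term.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            # The term a*x^i times b*x^j contributes a*b to the x^(i+j) term.
            result[i + j] += a * b
    return result

# (x + 2)(x + 3), with coefficients listed from the constant term upward.
expanded = poly_mul([2, 1], [3, 1])
print(expanded)                       # [6, 5, 1], i.e. x^2 + 5x + 6

# 3(x + 4) = 3x + 12
scaled = poly_mul([3], [4, 1])
print(scaled)                         # [12, 3]

# Spot-check the factoring identity x^2 + 5x + 6 = (x + 2)(x + 3).
identity_holds = all(x * x + 5 * x + 6 == (x + 2) * (x + 3) for x in range(-5, 6))
print(identity_holds)                 # True
```

A spot check at finitely many points is evidence rather than proof, although two polynomials of degree at most $n$ that agree at more than $n$ points are in fact identical.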

> **Example**
>
> Simplify $(2x + 3)(x - 1) - (x^2 - 4)$.
>
> - $= 2x^2 - 2x + 3x - 3 - x^2 + 4$ (expand products)
> - $= 2x^2 - x^2 - 2x + 3x - 3 + 4$ (rearrange)
> - $= x^2 + x + 1$ (combine like terms)

### Equations and Solutions

An **equation** is a statement that two expressions are equal. Solving an equation means finding all values of the variables that make the equation true.

> **Definition: Solution**
>
> A solution (or root) of an equation is a value that, when substituted for the variable, makes the equation true. The solution set is the set of all solutions.

**Linear equations** in one variable have the form $ax + b = 0$ where $a \neq 0$. These equations have exactly one solution:
$$x = -\frac{b}{a}$$

> **Example**
>
> Solve $3x - 7 = 11$.
>
> - $3x - 7 = 11$
> - $3x = 18$ (add 7 to both sides)
> - $x = 6$ (divide both sides by 3)

**Quadratic equations** have the form $ax^2 + bx + c = 0$ where $a \neq 0$. The quadratic formula provides the solutions:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The expression $\Delta = b^2 - 4ac$ is called the **discriminant**. It determines the nature of solutions:
- If $\Delta > 0$: two distinct real solutions
- If $\Delta = 0$: one repeated real solution
- If $\Delta < 0$: two complex conjugate solutions (no real solutions)

> **Example**
>
> Solve $x^2 - 5x + 6 = 0$.
>
> Using the quadratic formula with $a = 1$, $b = -5$, $c = 6$:
> $$x = \frac{5 \pm \sqrt{25 - 24}}{2} = \frac{5 \pm 1}{2}$$
>
> Solutions: $x = 3$ or $x = 2$.
>
> Alternatively, by factoring: $(x - 2)(x - 3) = 0$, giving the same solutions.
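
The quadratic formula and the three discriminant cases can be sketched as follows. The function name `solve_quadratic` is an assumption made for this example, and the code applies the textbook formula directly; it is not a numerically robust solver for badly scaled coefficients.

```python
import cmath

def solve_quadratic(a, b, c):
    """Return both roots of a*x^2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic equation")
    discriminant = b * b - 4 * a * c
    # cmath.sqrt returns an imaginary value when the discriminant is negative.
    root = cmath.sqrt(discriminant)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

print(solve_quadratic(1, -5, 6))   # discriminant > 0: roots 3 and 2
print(solve_quadratic(1, -2, 1))   # discriminant = 0: repeated root 1
print(solve_quadratic(1, 0, 1))    # discriminant < 0: roots i and -i
```

Returning complex numbers in every case keeps the interface uniform; a caller who only wants real solutions can check the sign of the discriminant first.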

![cover](../../../../Assets/math/2.png)

### Inequalities

Inequalities compare the relative sizes of expressions. The basic inequality symbols are:
- $a < b$: $a$ is less than $b$
- $a \leq b$: $a$ is less than or equal to $b$
- $a > b$: $a$ is greater than $b$
- $a \geq b$: $a$ is greater than or equal to $b$

> **Properties of Inequalities**
>
> For all $a, b, c \in \mathbb{R}$:
> - **Transitivity**: If $a < b$ and $b < c$, then $a < c$
> - **Addition**: If $a < b$, then $a + c < b + c$
> - **Multiplication by positive**: If $a < b$ and $c > 0$, then $ac < bc$
> - **Multiplication by negative**: If $a < b$ and $c < 0$, then $ac > bc$ (inequality reverses)

The last property is crucial: multiplying or dividing both sides of an inequality by a negative number reverses the inequality direction.

> **Example**
>
> Solve $-3x + 5 > 11$.
>
> - $-3x + 5 > 11$
> - $-3x > 6$ (subtract 5 from both sides)
> - $x < -2$ (divide by $-3$, reverse inequality)
>
> Solution set: $\{x \in \mathbb{R} : x < -2\}$ or in interval notation: $(-\infty, -2)$.
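
The sign reversal in the example above can be sanity-checked numerically. The sample points below are arbitrary choices on both sides of the boundary $x = -2$; agreement on samples supports the algebra but does not replace it.

```python
# Sample points on both sides of the boundary x = -2, including the boundary.
samples = [-10, -3, -2.5, -2, -1, 0, 4]

for x in samples:
    original = -3 * x + 5 > 11   # the inequality as given
    solved = x < -2              # the solved form, with the direction reversed
    print(f"x = {x:>5}: original is {original}, solved form is {solved}")

# The two forms should agree at every sample point.
forms_agree = all((-3 * x + 5 > 11) == (x < -2) for x in samples)
print(forms_agree)               # True
```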

> **Observation**
>
> Interval notation provides a compact way to express solution sets. The interval $(a, b)$ represents all $x$ with $a < x < b$ (open interval), while $[a, b]$ represents all $x$ with $a \leq x \leq b$ (closed interval). Mixed notations like $[a, b)$ are also used.

![cover](../../../../Assets/math/3.png)

### Exponents and Powers

Exponentiation represents repeated multiplication. For a real number $a$ and positive integer $n$, we define:
$$a^n = \underbrace{a \times a \times \cdots \times a}_{n \text{ times}}$$

This definition extends to all integer and rational exponents through the following rules:

> **Exponent Rules**
>
> For all real $a, b$ (with appropriate restrictions) and integers $m, n$:
> - $a^m \cdot a^n = a^{m+n}$
> - $\frac{a^m}{a^n} = a^{m-n}$
> - $(a^m)^n = a^{mn}$
> - $(ab)^n = a^n b^n$
> - $\left(\frac{a}{b}\right)^n = \frac{a^n}{b^n}$ (for $b \neq 0$)
> - $a^0 = 1$ (for $a \neq 0$)
> - $a^{-n} = \frac{1}{a^n}$ (for $a \neq 0$)
> - $a^{1/n} = \sqrt[n]{a}$ (the $n$-th root, for integer $n \geq 1$; requires $a \geq 0$ when $n$ is even)

> **Example**
>
> Simplify $(2x^3)^2 \cdot x^{-1}$.
>
> - $(2x^3)^2 \cdot x^{-1} = 2^2 (x^3)^2 \cdot x^{-1}$ (distribute exponent)
> - $= 4x^6 \cdot x^{-1}$ (evaluate)
> - $= 4x^{6-1}$ (multiply same base)
> - $= 4x^5$

Fractional exponents represent roots combined with powers:
$$a^{m/n} = \sqrt[n]{a^m} = \left(\sqrt[n]{a}\right)^m$$

This notation provides a unified framework for all power operations, essential for calculus and beyond.
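
The exponent rules can be spot-checked numerically. This sketch uses arbitrary sample values and `math.isclose` because floating-point powers are approximate; it also shows one Python-specific caveat for negative bases.

```python
import math

a, m, n = 2.0, 3, 4

product_rule = math.isclose(a**m * a**n, a**(m + n))
quotient_rule = math.isclose(a**m / a**n, a**(m - n))
power_rule = math.isclose((a**m)**n, a**(m * n))
print(product_rule, quotient_rule, power_rule)   # True True True
print(a**0, a**-1)                               # 1.0 0.5

# Fractional exponent: 8^(2/3) = (cube root of 8)^2 = 4, up to rounding.
fractional = 8 ** (2 / 3)
print(math.isclose(fractional, 4.0))             # True

# Caveat: for a negative base, Python's ** with a fractional exponent
# returns a complex number (the principal root), not the real cube root -2.
negative_base = (-8) ** (1 / 3)
print(type(negative_base))
```

The final lines illustrate why the restrictions on $a$ matter: the real cube root of $-8$ is $-2$, but the general power operation selects a different (complex) root.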

> **Observation**
>
> In data science, we frequently encounter power and logarithmic transformations of data. For instance, taking square roots ($x^{1/2}$) or logarithms can stabilize variance or linearize relationships. Understanding exponent rules is fundamental to manipulating these transformations algebraically.

## Functions, Graphs, and Transformations

### Introduction to Functions

Functions are fundamental mathematical objects that describe relationships between quantities. In data science, functions model how features relate to outcomes, how parameters affect predictions, and how transformations map data from one space to another.

> **Definition: Function**
>
> A function $f$ from set $A$ to set $B$, written $f: A \to B$, is a rule that assigns to each element $x \in A$ exactly one element $y \in B$. We write $y = f(x)$ and call $y$ the **image** of $x$ under $f$.

The set $A$ is called the **domain** of $f$, denoted $\text{dom}(f)$. The set $B$ is called the **codomain**. The **range** (or image) of $f$ is the set of all actual outputs:
$$\text{range}(f) = \{f(x) : x \in A\}$$

Note that $\text{range}(f) \subseteq B$, but they need not be equal.

> **Example**
>
> The function $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ assigns to each real number its square.
> - Domain: $\mathbb{R}$ (all real numbers)
> - Codomain: $\mathbb{R}$
> - Range: $[0, \infty)$ (non-negative real numbers)
>
> Note that $-1$ is in the codomain but not in the range, since no real number squares to give $-1$.

### Function Notation and Evaluation

Standard notation for functions uses letters like $f$, $g$, $h$, often with the independent variable in parentheses: $f(x)$, $g(t)$, $h(n)$. The choice of variable names is arbitrary—the functions $f(x) = x^2$ and $f(t) = t^2$ represent the same relationship.

To evaluate a function at a specific input, substitute that value for the variable:

> **Example**
>
> Given $f(x) = 3x^2 - 2x + 1$, evaluate $f(2)$ and $f(-1)$.
>
> - $f(2) = 3(2)^2 - 2(2) + 1 = 12 - 4 + 1 = 9$
> - $f(-1) = 3(-1)^2 - 2(-1) + 1 = 3 + 2 + 1 = 6$

Functions can also be evaluated at expressions:
$$f(a + b) = 3(a + b)^2 - 2(a + b) + 1$$

### Types of Functions

Functions can be classified based on their mapping properties.

> **Definition: Injective (One-to-One)**
>
> A function $f: A \to B$ is **injective** (or one-to-one) if different inputs always produce different outputs. Formally, if $f(x_1) = f(x_2)$, then $x_1 = x_2$.

> **Definition: Surjective (Onto)**
>
> A function $f: A \to B$ is **surjective** (or onto) if every element in the codomain $B$ is the image of at least one element in $A$. That is, $\text{range}(f) = B$.

> **Definition: Bijective**
>
> A function is **bijective** if it is both injective and surjective. Bijective functions establish a perfect one-to-one correspondence between domain and codomain.

> **Example**
>
> Consider $f: \mathbb{R} \to \mathbb{R}$ with $f(x) = 2x + 1$.
> - **Injective**: If $2x_1 + 1 = 2x_2 + 1$, then $x_1 = x_2$. ✓
> - **Surjective**: For any $y \in \mathbb{R}$, we can find $x = \frac{y-1}{2}$ such that $f(x) = y$. ✓
> - Therefore $f$ is bijective.
>
> Consider $g: \mathbb{R} \to \mathbb{R}$ with $g(x) = x^2$.
> - **Not injective**: $g(2) = g(-2) = 4$, but $2 \neq -2$. ✗
> - **Not surjective**: No real $x$ satisfies $g(x) = -1$. ✗
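
For functions on finite sets, these properties can be checked by enumeration. The sketch below uses finite integer domains as stand-ins for the real-valued examples above (an assumption, since $\mathbb{R}$ cannot be enumerated), and the helper names are hypothetical.

```python
def is_injective(f, domain):
    """True if no two distinct inputs in `domain` share an output."""
    outputs = [f(x) for x in domain]
    return len(set(outputs)) == len(outputs)

def is_surjective(f, domain, codomain):
    """True if every element of `codomain` is the image of some input."""
    image = {f(x) for x in domain}
    return set(codomain) <= image

def affine(x):
    return 2 * x + 1

def square(x):
    return x ** 2

domain = list(range(-3, 4))              # {-3, ..., 3}
odd_codomain = list(range(-5, 8, 2))     # {-5, -3, ..., 7}
wide_codomain = list(range(-5, 8))       # {-5, -4, ..., 7}

print(is_injective(affine, domain))                    # True
print(is_surjective(affine, domain, odd_codomain))     # True
print(is_surjective(affine, domain, wide_codomain))    # False
print(is_injective(square, domain))                    # False
print(is_surjective(square, domain, range(0, 10)))     # False
```

The second and third results show that surjectivity depends on the chosen codomain: the same rule is onto the odd integers from $-5$ to $7$ but not onto every integer in that range.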

### Composition of Functions

Functions can be combined by composition: applying one function to the output of another.

> **Definition: Function Composition**
>
> Given functions $f: A \to B$ and $g: B \to C$, the **composition** $g \circ f: A \to C$ is defined by:
> $$(g \circ f)(x) = g(f(x))$$
>
> Read "$g$ composed with $f$" or "$g$ of $f$". The function $f$ is applied first, then $g$.

> **Example**
>
> Let $f(x) = x^2$ and $g(x) = 2x + 1$. Find $(g \circ f)(x)$ and $(f \circ g)(x)$.
>
> $(g \circ f)(x) = g(f(x)) = g(x^2) = 2x^2 + 1$
>
> $(f \circ g)(x) = f(g(x)) = f(2x + 1) = (2x + 1)^2 = 4x^2 + 4x + 1$
>
> Note that composition is not commutative: $(g \circ f)(x) \neq (f \circ g)(x)$ in general.

> **Observation**
>
> In machine learning pipelines, composition represents sequential data transformations. For example, standardizing features then applying PCA can be viewed as $(\text{PCA} \circ \text{standardize})(\text{data})$.
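
Composition translates directly into code. In this minimal sketch, `compose` is a hypothetical helper that returns the function $x \mapsto g(f(x))$, applied to the functions $f(x) = x^2$ and $g(x) = 2x + 1$ from the example above.

```python
def compose(g, f):
    """Return the function x -> g(f(x)); f is applied first, then g."""
    def composed(x):
        return g(f(x))
    return composed

def f(x):
    return x ** 2

def g(x):
    return 2 * x + 1

g_after_f = compose(g, f)   # x -> 2x^2 + 1
f_after_g = compose(f, g)   # x -> (2x + 1)^2

print(g_after_f(3))         # 19
print(f_after_g(3))         # 49
```

The two results differ at $x = 3$, which is a concrete instance of composition failing to commute.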

### Inverse Functions

Some functions can be "reversed"—given an output, we can uniquely determine the input.

> **Definition: Inverse Function**
>
> A function $f: A \to B$ has an **inverse function** $f^{-1}: B \to A$ if:
> $$f^{-1}(f(x)) = x \text{ for all } x \in A$$
> $$f(f^{-1}(y)) = y \text{ for all } y \in B$$

> **Theorem**
>
> A function has an inverse if and only if it is bijective.

To find an inverse function:
1. Write $y = f(x)$
2. Solve for $x$ in terms of $y$
3. Swap variables: write $y = f^{-1}(x)$

> **Example**
>
> Find the inverse of $f(x) = 2x + 1$.
>
> 1. $y = 2x + 1$
> 2. $x = \frac{y - 1}{2}$
> 3. $f^{-1}(x) = \frac{x - 1}{2}$
>
> Verify: $f(f^{-1}(x)) = f\left(\frac{x-1}{2}\right) = 2\left(\frac{x-1}{2}\right) + 1 = x$ ✓
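
Both defining identities of an inverse can be checked on sample inputs. The sketch below uses exact `Fraction` values, chosen arbitrarily, so that the round trips are not affected by floating-point rounding; passing on samples is a sanity check, not a proof.

```python
from fractions import Fraction

def f(x):
    return 2 * x + 1

def f_inverse(y):
    return (y - 1) / 2

samples = [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(7)]

# f_inverse undoes f, and f undoes f_inverse, at every sample point.
round_trips_hold = all(
    f_inverse(f(value)) == value and f(f_inverse(value)) == value
    for value in samples
)
print(round_trips_hold)          # True
print(f_inverse(Fraction(7)))    # 3
```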

**Warning**: The inverse function $f^{-1}$ should not be confused with the reciprocal $\frac{1}{f(x)} = [f(x)]^{-1}$. For example, if $f(x) = 2x + 1$, then $f^{-1}(x) = \frac{x - 1}{2}$, whereas $\frac{1}{f(x)} = \frac{1}{2x + 1}$.

![cover](../../../../Assets/math/4.png)

### Graphs of Functions

The **graph** of a function $f: A \to B$ is the set of all ordered pairs $(x, f(x))$ where $x \in A$. When $A$ and $B$ are subsets of $\mathbb{R}$, we can visualize the graph in the Cartesian plane.

The **vertical line test** determines whether a curve represents a function: a curve is the graph of a function if and only if every vertical line intersects the curve at most once. This reflects the requirement that each input has exactly one output.

**Key features of graphs include:**

- **Intercepts**: Points where the graph crosses the axes
  - $y$-intercept: $f(0)$ (if $0$ is in the domain)
  - $x$-intercepts: solutions to $f(x) = 0$ (also called roots or zeros)

- **Symmetry**:
  - **Even function**: $f(-x) = f(x)$ for all $x$ (symmetric about $y$-axis)
  - **Odd function**: $f(-x) = -f(x)$ for all $x$ (symmetric about origin)

- **Monotonicity**:
  - **Increasing**: $f(x_1) < f(x_2)$ whenever $x_1 < x_2$
  - **Decreasing**: $f(x_1) > f(x_2)$ whenever $x_1 < x_2$

- **Boundedness**: A function is bounded above if $f(x) \leq M$ for some constant $M$ and all $x$ in the domain, and bounded below if $f(x) \geq m$ for some constant $m$ and all $x$ in the domain.

> **Example**
>
> Analyze $f(x) = x^2$.
> - **Intercepts**: $y$-intercept at $(0, 0)$; $x$-intercept at $(0, 0)$
> - **Symmetry**: Even function since $f(-x) = (-x)^2 = x^2 = f(x)$
> - **Monotonicity**: Decreasing on $(-\infty, 0]$, increasing on $[0, \infty)$
> - **Boundedness**: Bounded below by $0$, unbounded above
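
Symmetry and monotonicity can be probed numerically on a grid of sample points. The helpers `parity` and `is_increasing` are hypothetical names for this sketch, and a finite sample can only suggest these properties, never prove them.

```python
import math

def parity(f, points):
    """Classify f as 'even', 'odd', or 'neither' from samples at x and -x."""
    is_even = all(math.isclose(f(-x), f(x), abs_tol=1e-12) for x in points)
    is_odd = all(math.isclose(f(-x), -f(x), abs_tol=1e-12) for x in points)
    if is_even:
        return "even"
    if is_odd:
        return "odd"
    return "neither"

def is_increasing(f, points):
    """Check f(x1) < f(x2) for consecutive sorted sample points."""
    xs = sorted(points)
    return all(f(a) < f(b) for a, b in zip(xs, xs[1:]))

def square(x):
    return x ** 2

positive = [i / 10 for i in range(0, 31)]     # samples in [0, 3]
negative = [-x for x in positive]             # samples in [-3, 0]

print(parity(square, positive))               # even
print(parity(lambda x: x ** 3, positive))     # odd
print(parity(lambda x: x + 1, positive))      # neither
print(is_increasing(square, positive))        # True
print(is_increasing(square, negative))        # False
```

The last two lines match the analysis of $f(x) = x^2$: increasing on $[0, \infty)$ and not increasing on $(-\infty, 0]$, at least on the sampled points.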

### Common Functions

Several families of functions appear frequently in mathematics and data science.

**Linear Functions**: $f(x) = ax + b$ where $a$ and $b$ are constants.
- Graph: straight line with slope $a$ and $y$-intercept $b$
- Domain: $\mathbb{R}$
- Range: $\mathbb{R}$ when $a \neq 0$; the single value $\{b\}$ when $a = 0$
- Bijective when $a \neq 0$

**Quadratic Functions**: $f(x) = ax^2 + bx + c$ where $a \neq 0$.
- Graph: parabola opening upward (if $a > 0$) or downward (if $a < 0$)
- Vertex at $x = -\frac{b}{2a}$
- Domain: $\mathbb{R}$
- Range: $[f(-\frac{b}{2a}), \infty)$ if $a > 0$, or $(-\infty, f(-\frac{b}{2a})]$ if $a < 0$

**Polynomial Functions**: $f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ where $n \in \mathbb{N}$ and $a_n \neq 0$.
- Degree $n$ determines general shape
- Domain: $\mathbb{R}$
- Continuous and smooth everywhere

**Rational Functions**: $f(x) = \frac{p(x)}{q(x)}$ where $p$ and $q$ are polynomials.
- Domain: $\{x \in \mathbb{R} : q(x) \neq 0\}$
- May have vertical asymptotes where $q(x) = 0$
- May have horizontal or oblique asymptotes

**Absolute Value Function**: $f(x) = |x|$ defined by:
$$|x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0 \end{cases}$$
- Graph: V-shaped, vertex at origin
- Domain: $\mathbb{R}$
- Range: $[0, \infty)$
- Even function

> **Observation**
>
> In data science, linear functions model proportional relationships, quadratic functions capture acceleration effects, and absolute value functions appear in loss functions (e.g., mean absolute error). Polynomial and rational functions provide flexible fitting capabilities for regression tasks.

In [9]:
# Visualization: Common Function Families
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

x = np.linspace(-3, 3, 300)

# Linear functions
ax = axes[0, 0]
for a, label in zip([0.5, 1, 2], ['$f(x) = 0.5x + 1$', '$f(x) = x + 1$', '$f(x) = 2x + 1$']):
    ax.plot(x, a*x + 1, linewidth=2.5, label=label)
ax.grid(True, alpha=0.3)
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title('Linear Functions', fontsize=12, fontweight='bold')
ax.legend()
ax.axhline(y=0, color='black', linewidth=0.5)
ax.axvline(x=0, color='black', linewidth=0.5)

# Quadratic functions
ax = axes[0, 1]
for a, label in zip([1, -1, 0.5], ['$f(x) = x^2$', '$f(x) = -x^2$', '$f(x) = 0.5x^2$']):
    ax.plot(x, a*x**2, linewidth=2.5, label=label)
ax.grid(True, alpha=0.3)
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title('Quadratic Functions', fontsize=12, fontweight='bold')
ax.legend()
ax.axhline(y=0, color='black', linewidth=0.5)
ax.axvline(x=0, color='black', linewidth=0.5)
ax.set_ylim(-5, 5)

# Polynomial functions
ax = axes[0, 2]
ax.plot(x, x**3 - 2*x,
        color=corporate_palette[0], linewidth=2.5, label='$f(x) = x^3 - 2x$')
ax.plot(x, x**4 - 2*x**2,
        color=corporate_palette[1], linewidth=2.5, label='$f(x) = x^4 - 2x^2$')
ax.grid(True, alpha=0.3)
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title('Polynomial Functions', fontsize=12, fontweight='bold')
ax.legend()
ax.axhline(y=0, color='black', linewidth=0.5)
ax.axvline(x=0, color='black', linewidth=0.5)
ax.set_ylim(-5, 5)

# Rational function
ax = axes[1, 0]
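x_rational = np.linspace(-3, 3, 500)
# Mask points near x = 0 with NaN: this avoids huge values and stops
# matplotlib from joining the two branches across the asymptote
x_rational[np.abs(x_rational) <= 0.1] = np.nan
y_rational = 1 / x_rational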
ax.plot(x_rational, y_rational,
        color=corporate_palette[0], linewidth=2.5, label='$f(x) = \\frac{1}{x}$')
ax.axvline(x=0, color=corporate_palette[2], linestyle='--',
           linewidth=1.5, alpha=0.5, label='Asymptote')
ax.axhline(y=0, color=corporate_palette[2],
           linestyle='--', linewidth=1.5, alpha=0.5)
ax.grid(True, alpha=0.3)
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title('Rational Function', fontsize=12, fontweight='bold')
ax.legend()
ax.set_xlim(-3, 3)
ax.set_ylim(-5, 5)

# Absolute value
ax = axes[1, 1]
ax.plot(x, np.abs(x),
        color=corporate_palette[0], linewidth=2.5, label='$f(x) = |x|$')
ax.plot(x, np.abs(x - 1),
        color=corporate_palette[1], linewidth=2.5, label='$f(x) = |x - 1|$')
ax.plot(x, 2*np.abs(x),
        color=corporate_palette[3], linewidth=2.5, label='$f(x) = 2|x|$')
ax.grid(True, alpha=0.3)
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title('Absolute Value Functions', fontsize=12, fontweight='bold')
ax.legend()
ax.axhline(y=0, color='black', linewidth=0.5)
ax.axvline(x=0, color='black', linewidth=0.5)

# Piecewise function
ax = axes[1, 2]
x_piece1 = x[x < 0]
x_piece2 = x[x >= 0]
ax.plot(x_piece1, x_piece1**2, color=corporate_palette[0], linewidth=2.5)
ax.plot(x_piece2, 2*x_piece2, color=corporate_palette[0], linewidth=2.5)
ax.plot(0, 0, 'o', color=corporate_palette[0], markersize=8, zorder=5)
ax.grid(True, alpha=0.3)
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title('Piecewise Function', fontsize=12, fontweight='bold')
# Matplotlib's mathtext does not support \begin{cases}, so the two
# branches are written on separate lines instead
ax.text(0.05, 0.95,
        r'$f(x) = x^2$ if $x < 0$' + '\n' + r'$f(x) = 2x$ if $x \geq 0$',
        transform=ax.transAxes, fontsize=11, verticalalignment='top',
        bbox=dict(boxstyle='round', facecolor=corporate_palette[4], alpha=0.8))
ax.axhline(y=0, color='black', linewidth=0.5)
ax.axvline(x=0, color='black', linewidth=0.5)

plt.tight_layout()
plt.show()

<Figure size 1500x1000 with 6 Axes>

### Function Transformations

Understanding how to transform functions systematically is crucial for data preprocessing and feature engineering. Given a base function $f(x)$, we can create new functions through transformations.

**Vertical Translation**: $g(x) = f(x) + k$
- Shifts the graph up by $k$ units (if $k > 0$) or down by $|k|$ units (if $k < 0$)
- Does not change shape, only position

**Horizontal Translation**: $g(x) = f(x - h)$
- Shifts the graph right by $h$ units (if $h > 0$) or left by $|h|$ units (if $h < 0$)
- Counter-intuitive: $f(x - 2)$ shifts right, not left

**Vertical Scaling**: $g(x) = a \cdot f(x)$ where $a \neq 0$
- Stretches vertically by factor $|a|$ if $|a| > 1$
- Compresses vertically by factor $|a|$ if $0 < |a| < 1$
- If $a < 0$, also reflects about $x$-axis

**Horizontal Scaling**: $g(x) = f(bx)$ where $b > 0$
- Compresses horizontally by factor $b$ if $b > 1$ (each point moves to $1/b$ of its distance from the $y$-axis)
- Stretches horizontally by factor $1/b$ if $0 < b < 1$
- Counter-intuitive: larger $b$ means compression, not stretching

**Reflection about $x$-axis**: $g(x) = -f(x)$
- Flips graph upside down

**Reflection about $y$-axis**: $g(x) = f(-x)$
- Flips graph left to right

> **Example**
>
> Starting with $f(x) = x^2$, describe the transformations to obtain $g(x) = -2(x - 3)^2 + 1$.
>
> 1. $f(x - 3) = (x - 3)^2$: shift right 3 units
> 2. $2f(x - 3) = 2(x - 3)^2$: stretch vertically by factor 2
> 3. $-2f(x - 3) = -2(x - 3)^2$: reflect about $x$-axis
> 4. $-2f(x - 3) + 1 = -2(x - 3)^2 + 1$: shift up 1 unit
>
> The vertex moves from $(0, 0)$ to $(3, 1)$, and the parabola opens downward.
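
The four steps of this example can be verified numerically. The sketch below composes them one at a time and compares the result with the closed form of $g$; the function names and the sample grid are illustrative.

```python
import numpy as np


def f(x):
    return x**2


def g_direct(x):
    return -2 * (x - 3)**2 + 1


def g_composed(x):
    step1 = f(x - 3)      # shift right 3 units
    step2 = 2 * step1     # stretch vertically by factor 2
    step3 = -step2        # reflect about the x-axis
    return step3 + 1      # shift up 1 unit


x = np.linspace(-2, 8, 101)
same = np.allclose(g_direct(x), g_composed(x))
x_max = x[np.argmax(g_direct(x))]
y_max = g_direct(x).max()
print(same, x_max, y_max)
```

Because the parabola opens downward, the vertex is the maximum of $g$, which the grid search locates near $(3, 1)$.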

**Order of transformations matters**: When multiple transformations are applied, the order affects the result. For a function written as $g(x) = a \cdot f(b(x - h)) + k$, apply transformations in this order:
1. Horizontal scaling and reflection
2. Horizontal translation
3. Vertical scaling and reflection
4. Vertical translation
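
A small check makes the effect of ordering concrete. The sketch below applies a horizontal compression by 2 and a shift right by 3 to $f(x) = x^2$ in both orders; the particular factors are illustrative choices.

```python
def f(x):
    return x**2


def scale_then_shift(x):
    # f(x) -> f(2x) -> f(2(x - 3)): compress by 2, then shift right 3
    return f(2 * (x - 3))


def shift_then_scale(x):
    # f(x) -> f(x - 3) -> f(2x - 3): shift right 3, then compress by 2
    return f(2 * x - 3)


print(scale_then_shift(3.0), shift_then_scale(3.0))  # 0.0 9.0
print(scale_then_shift(1.5), shift_then_scale(1.5))  # 9.0 0.0
```

Scaling first leaves the vertex at $x = 3$, while shifting first puts it at $x = 1.5$, because $f(2x - 3) = f(2(x - 1.5))$: the later compression also halves the earlier shift.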

![capa](../../../../Assets/math/6.png)

### Applications to Data Science

Function transformations play a central role in data science:

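**Feature Scaling**: Standardization $(x - \mu) / \sigma$ and normalization $(x - \min) / (\max - \min)$ are affine transformations: each subtracts a constant (a translation) and divides by a positive constant (a scaling), changing the location and scale of a data distribution without changing its shape.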
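
Both scalings can be sketched directly in NumPy. The sample values below are made up for illustration, and `x.std()` uses the population standard deviation (`ddof=0`), which is an assumption about which $\sigma$ is intended.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # illustrative feature values

# Standardization: translate by the mean, scale by the standard deviation
standardized = (x - x.mean()) / x.std()

# Min-max normalization: translate by the minimum, scale by the range
normalized = (x - x.min()) / (x.max() - x.min())

print(standardized.mean(), standardized.std())
print(normalized)
```

After standardization the data has mean 0 and standard deviation 1 (up to rounding); after normalization it lies in $[0, 1]$.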

**Activation Functions**: Neural networks apply transformations like $\text{ReLU}(x) = \max(0, x)$ and $\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$ to introduce non-linearity.
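
The two activation functions named above translate almost literally into NumPy. This is a minimal sketch for small inputs; it does not guard against overflow in `np.exp` for large negative arguments.

```python
import numpy as np


def relu(x):
    # Elementwise max(0, x)
    return np.maximum(0, x)


def sigmoid(x):
    # Elementwise 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))


x = np.array([-2.0, 0.0, 2.0])
print(relu(x))
print(sigmoid(x))
```

ReLU zeroes out negative inputs and leaves positive inputs unchanged, while the sigmoid maps every real input into $(0, 1)$ and satisfies $\text{sigmoid}(-x) = 1 - \text{sigmoid}(x)$.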

**Loss Functions**: Optimization minimizes functions like mean squared error $L(\theta) = \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i; \theta))^2$, where $f$ is a model function parameterized by $\theta$.
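
To make the loss formula concrete, the sketch below evaluates the mean squared error for a linear model. The model $f(x; \theta) = \theta_0 + \theta_1 x$ and the noise-free data are assumptions chosen for illustration.

```python
import numpy as np


def model(x, theta):
    # Hypothetical linear model f(x; theta) = theta[0] + theta[1] * x
    return theta[0] + theta[1] * x


def mse_loss(theta, x, y):
    # Mean of squared residuals, matching L(theta) in the text
    return np.mean((y - model(x, theta)) ** 2)


x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # noise-free targets generated with theta = (1, 2)

print(mse_loss((1.0, 2.0), x, y))  # 0.0
print(mse_loss((0.0, 0.0), x, y))  # 21.0
```

The loss is zero at the parameters that generated the data and positive elsewhere, which is what an optimizer exploits when it searches for $\theta$.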

**Kernel Methods**: Support vector machines use function transformations $\phi: \mathbb{R}^n \to \mathbb{R}^m$ to map data to higher-dimensional spaces where linear separation becomes possible.
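
The idea of mapping to a higher-dimensional space can be illustrated with an explicit feature map. The sketch below uses $\phi(x) = (x, x^2)$ on a toy one-dimensional dataset; the data, labels, and threshold are assumptions, and this is an explicit map rather than the implicit kernel computation that support vector machines typically use.

```python
import numpy as np

# Toy 1-D data: class 1 sits on both sides of class 0
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = (np.abs(x) >= 2).astype(int)


def phi(x):
    # Explicit feature map from R to R^2: x -> (x, x^2)
    return np.column_stack([x, x**2])


# In 1-D, try every threshold between neighbouring points, in both directions
thresholds = (x[:-1] + x[1:]) / 2
separable_1d = any(
    np.array_equal((x > t).astype(int), y)
    or np.array_equal((x < t).astype(int), y)
    for t in thresholds
)

# In the mapped space, a single threshold on the second coordinate works
features = phi(x)
predicted = (features[:, 1] > 2.5).astype(int)
separable_2d = np.array_equal(predicted, y)

print(separable_1d, separable_2d)  # False True
```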

**Data Augmentation**: In computer vision, images are transformed through rotations, translations, and scalings to increase training data diversity.
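
On a small array standing in for an image, these transformations reduce to index manipulations. The sketch below is a toy illustration with NumPy only; real pipelines usually rely on a dedicated image library, and `np.roll` wraps pixels around the edge rather than padding.

```python
import numpy as np

# A 3x4 array standing in for a tiny grayscale image
image = np.arange(12).reshape(3, 4)

flipped = np.fliplr(image)                   # reflect left to right
rotated = np.rot90(image)                    # rotate 90 degrees counterclockwise
shifted = np.roll(image, shift=1, axis=1)    # translate 1 pixel right (wraps around)

print(flipped[0], rotated.shape, shifted[0])
```

Each output contains the same pixel values as the original, only rearranged, which is why such transformations can add variety without changing the label.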

> **Observation**
>
> Understanding functions abstractly, as mappings between sets, provides a unifying framework for seemingly disparate data science techniques. Whether preprocessing data, designing model architectures, or engineering features, we are fundamentally composing and transforming functions.