
Mathematical notation

Carlos Ramos Carreño edited this page Sep 7, 2022 · 7 revisions

Notation to follow in scikit-fda documentation and related publications, if possible. The red circles (🔴) indicate points that we should discuss.

Naming variables

The name used to denote a particular variable usually depends on its type:

  • Scalars: in lowercase, like $x$.
    • An exception is made for quantities that are used as limits for counters, which are in uppercase (e.g. $N$).
  • Vectors: in bold and lowercase, like $\mathbf{x}$.
  • Matrices: in bold and uppercase, like $\mathbf{X}$.
  • Functions: in lowercase, like $x$.
  • Sets and sequences: 🔴 ???
  • Random variables, random vectors, stochastic processes: in uppercase, like $X$.

There are cases in which special rules apply, as we will now explain.

Iteration counters

For variables representing iteration counters, we usually use the lowercase version of the name of the counter limit (the limit itself is written in uppercase). For example: $$\sum_{n=1}^N x_n.$$

List of useful predefined notations

For consistency, we now list the symbols that we always use for certain quantities:

  • The main object of study is usually a random process/field denoted as $X$, belonging to the space of functions $\mathcal{X}$ (usually $L^2([0, 1])$). We will take a set of $N$ observations denoted as $\{x_n\}_{n=1}^N$.
    • 🔴 Notation for the sequence of observations?
  • If there is a target for the problem (for example in regression or classification problems), the target associated with $x_n$ will be denoted as $y_n$. Its corresponding random variable/vector/process will be $Y$.
    • 🔴 Notation for the sequence of targets?
  • If the functions are vector-valued or not univariate, $P$ will denote the domain dimension and $Q$ the codomain dimension. If these cases do not need to be mentioned, omit $P$ and $Q$ for simplicity.
  • The domain of the functions will be denoted as $\mathcal{T}$. We usually use the variable $t$ to denote a particular point of the domain. If the domain dimension is not one, remember to use vector notation $\mathbf{t} = (t_1, \ldots, t_P)$.
    • 🔴 Thus $x_n$ is a function, while $x_n(t)$ is its value at point $t$.
  • Given the above, we can write $x_n: \mathcal{T} \subset \mathbb{R}^P \to \mathbb{R}^Q$.
  • If the observations are discretized on the same grid points, the number of grid points for each dimension will be denoted as $\{M_p\}_{p=1}^P$. If there is only one domain dimension, we should use just $M$ for simplicity.
  • For classification problems, the number of classes will be denoted as $K$. The classes themselves will be referred to by number for simplicity. Thus $y_n \in \{1, \ldots, K\}$.
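
As a rough illustration of how these symbols could map onto array shapes in code, here is a minimal NumPy sketch. It assumes scalar-valued, univariate functions ($P = Q = 1$) observed on a common grid. The sizes, the synthetic curves $x_n(t) = \sin(2 \pi n t)$ and the variable names (`values`, `pointwise_sum`) are assumptions made for this example only; they are not part of the notation or of the scikit-fda API.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
N = 5   # number of observations x_1, ..., x_N
M = 11  # number of grid points (one domain dimension, so P = 1)
K = 3   # number of classes

# Common grid t_1, ..., t_M on the domain T = [0, 1].
t = np.linspace(0.0, 1.0, M)

# Counter n runs from 1 to its uppercase limit N.
n = np.arange(1, N + 1)

# Row n - 1 holds x_n(t_1), ..., x_n(t_M) for the synthetic curves
# x_n(t) = sin(2 * pi * n * t); the result has shape (N, M).
values = np.sin(2.0 * np.pi * n[:, np.newaxis] * t[np.newaxis, :])

# Synthetic class labels y_n in {1, ..., K}.
y = (n - 1) % K + 1

# The sum over n = 1, ..., N, evaluated at each grid point; shape (M,).
pointwise_sum = values.sum(axis=0)
```

With more than one domain dimension, the same idea would use one axis of length $M_p$ per dimension, and a further axis of length $Q$ for vector-valued functions.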

Names of concepts

  • The name "sample" has different meanings in Statistics (a sequence of observations) and Machine Learning (the observations themselves). We prefer not to use this term, and instead use "dataset" for the first case and "observation" for the second. If the functions are scalar and univariate, the words "curve" or "trajectory" can also be used for an observation. If the word "observation" could be confused with the function values at the measured points, try to use "functional observation" (or the other synonyms, if applicable) for the function and "measured/observed value of the function" for its values.