In [1]:
import numpy as np


# Logarithms

![various logarithmic functions, graphed](img\wiki-log_curves.png)

Mathematically, logarithms are the inverse function to exponentiation:

$log{_b}{x} = y$

$b^y = x$

$b = x^\frac{1}{y}$


In [7]:
print(np.log10(1000))
print(np.power(10, 3))
# actually 10, ignore floating point error in result
print(np.power(1000, 1/3))


3.0
1000
9.999999999999998


## Utility / Purpose

One of the main historical motivations for introducing logarithms is the fact that they greatly facilitated complex calculations, especially at a time when computers didn't yet exist and slide rules or log tables were used instead.

These simplifications dramatically advanced the field of astronomy.

For example, ⚠️ provided that...

-   b, x, and y are positive
-   b != 1

... We have some very useful identities:

-   The logarithm of a product is the sum of the separate logarithms of the factors:
    -   $log{_b}{(xy)} = log{_b}{x} + log{_b}{y}$
    -   e.g.
        -   $log{_3}{243} = log{_3}{(9*27)} = log{_3}{9} + log{_3}{27} = 2 + 3 = 5$
-   The logarithm of a division/ratio is the difference of the separate logarithms of the numerator and denominator:
    -   $log{_b}{(\frac{x}{y})} = log{_b}{x} - log{_b}{y}$
-   The logarithm of an exponentiated number is the log of the number multiplied by the exponent:
    -   $log{_b}{(x^y)} = ylog{_b}{x}$
    -   e.g.
        -   $log{_2}{64} = log{_2}{(2^6)} = 6*log{_2}{2} = 6*1 = 6$
-   Similarly, the logarithm of the $y$-th root of a number is the log of the number divided by the degree of the root:
    -   $log{_b}{(\sqrt[y]{x})} = \frac{log{_b}{x}}{y}$
    -   e.g.
        -   $log{_{10}}{\sqrt{1000}} = \frac{1}{2}*log{_{10}}{1000} = \frac{1}{2}*3 = 1.5$

We can also change bases:

-   $log{_b}{x} = \frac{log{_{10}}{x}}{log{_{10}}{b}} = \frac{log{_e}{x}}{log{_e}{b}}$

Finally, it's important to remember that:

-   $log{_b}{b} = 1$
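
As a quick sanity check, the identities above can be verified numerically. This is only a sketch: `log_base` is a helper name I made up for illustration (it relies on the change-of-base rule), and the values of `b`, `x` and `y` are arbitrary positive numbers.

```python
import numpy as np

def log_base(b, x):
    """Logarithm of x in base b, computed with the change-of-base rule."""
    return np.log(x) / np.log(b)

# Arbitrary positive values, with b != 1
b, x, y = 3.0, 9.0, 27.0

product_rule = np.isclose(log_base(b, x * y), log_base(b, x) + log_base(b, y))
quotient_rule = np.isclose(log_base(b, x / y), log_base(b, x) - log_base(b, y))
power_rule = np.isclose(log_base(b, x ** y), y * log_base(b, x))
root_rule = np.isclose(log_base(b, x ** (1 / y)), log_base(b, x) / y)
change_of_base = np.isclose(log_base(b, x), np.log10(x) / np.log10(b))
log_of_base = np.isclose(log_base(b, b), 1.0)

print(product_rule, quotient_rule, power_rule, root_rule, change_of_base, log_of_base)
```

`np.isclose` is used instead of `==` because the results are floating point numbers and may differ in the last few digits.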

### Logarithmic Scale

Logarithmic scales are very useful for quantifying the relative change in a value as opposed to an absolute change in value. They're also very good for compressing data that spans many orders of magnitude.

Examples of data that uses log scales:

-   decibels
-   light absorbance
-   signal-to-noise ratio
-   earthquakes (Richter magnitude scale)
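
To illustrate the compression effect with one item from the list, here is a small sketch using decibels. It assumes the usual definition for power ratios, $dB = 10*log{_{10}}{(P/P_0)}$, which these notes don't spell out, and the ratios themselves are made-up values.

```python
import numpy as np

# Hypothetical power ratios P / P0, spanning six orders of magnitude
power_ratios = np.array([1.0, 10.0, 100.0, 1_000_000.0])

# Assumed definition of the decibel for power quantities
decibels = 10 * np.log10(power_ratios)

# Each tenfold increase in power adds the same 10 dB step:
# approximately 0, 10, 20 and 60 dB
print(decibels)
```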

### Probability and Statistics

Logarithms are also used for maximum-likelihood estimation in parametric statistical models. In such models, the likelihood function depends on at least one parameter which must be estimated. Because the logarithm is an increasing function, the max of the likelihood function occurs at the same parameter value as the max of the "log-likelihood" (the logarithm of the likelihood).

(Note to self: Might be worth looking more into this)
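
A first small experiment on this: the sketch below estimates the probability of heads for a coin by trying a grid of candidate values. The coin flips are made-up data and the grid search is just the simplest approach I could think of, not how real libraries do it. The point is only that the likelihood and the log of the likelihood peak at the same parameter value.

```python
import numpy as np

# Hypothetical data: 10 coin flips, 1 = heads, 0 = tails (7 heads in total)
flips = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
heads = flips.sum()
tails = flips.size - heads

# Candidate values for the unknown parameter p = P(heads)
p = np.linspace(0.01, 0.99, 99)

# Likelihood of the observed flips for each candidate p, and its logarithm
likelihood = p ** heads * (1 - p) ** tails
log_likelihood = heads * np.log(p) + tails * np.log(1 - p)

best_p = p[np.argmax(likelihood)]
best_p_log = p[np.argmax(log_likelihood)]

print(best_p, best_p_log)  # both land on the same candidate, close to 7/10
```

The log version turns the product into a sum, which is easier to differentiate and less prone to underflow when there are many observations.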

[Benford's law](https://en.wikipedia.org/wiki/Benford%27s_law) describes how leading digits are distributed in many data sets. A logarithmic function is used to predict what share of the numbers in a dataset (regardless of unit) will start with the digit 1, 2, 3, etc. This is used for forensic accounting, among other uses.
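
The logarithmic function in question is $P(d) = log{_{10}}{(1 + \frac{1}{d})}$ for a leading digit $d$ from 1 to 9. The sketch below just tabulates it; the variable names are my own.

```python
import numpy as np

digits = np.arange(1, 10)

# Benford's law: expected share of numbers whose first digit is d
benford = np.log10(1 + 1 / digits)

for d, prob in zip(digits, benford):
    print(f"{d}: {prob:.3f}")

# The nine probabilities add up to log10(10) = 1
print(benford.sum())
```

A leading 1 is expected roughly 30% of the time, and each larger digit is less likely than the one before it.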

Logarithms also pop up in Big O notation, for example in binary search ($O(\log n)$) and merge sort ($O(n \log n)$) algorithms.
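
To see where the logarithm comes from in binary search, the sketch below counts how many times the search range is halved. `binary_search_steps` is a name I made up, and the list of one million integers is just an example input.

```python
import math

def binary_search_steps(sorted_values, target):
    """Return (index, steps); index is -1 if the target is not found."""
    lo, hi = 0, len(sorted_values) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid, steps
        if sorted_values[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1, steps

n = 1_000_000
values = list(range(n))
index, steps = binary_search_steps(values, n - 1)

# Each step halves the range, so at most floor(log2(n)) + 1 steps are needed
max_steps = math.floor(math.log2(n)) + 1

print(index, steps, max_steps)
```

So even with a million items, the search never needs more than about 20 comparisons.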

### Other fields

-   Music tones and intervals
-   fractals
-   entropy and chaos
-   counting prime numbers


## Computations

Calculating the value of a multiplication or division can be greatly accelerated by using logarithms and log tables as well. This was routinely done in the pre-computing days.

For example:

$cd = 10^{log{_{10}}{c}}*10^{log{_{10}}{d}} = 10^{log{_{10}}{c}+log{_{10}}{d}}$

$\frac{c}{d} = cd^{-1} = 10^{log{_{10}}{c}-log{_{10}}{d}}$

$c^d = (10^{log{_{10}}{c}})^d = 10^{d*log{_{10}}{c}}$

$\sqrt[d]{c} = c^{\frac{1}{d}} = 10^{\frac{1}{d}*log{_{10}}{c}}$

Though it appears at first glance to be a needlessly complicated approach to the calculations, for large hand calculations that require high precision, looking up logarithms in a log table, adding or subtracting them, and then looking up the antilogarithm of the result was significantly faster than performing "simple" multiplication.
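
The four formulas above can be replayed with NumPy, where `np.log10` plays the role of the log table and `10 ** ...` plays the role of the antilogarithm lookup. The values of `c` and `d` are arbitrary.

```python
import numpy as np

c, d = 8.0, 3.0  # arbitrary positive example values

log_c = np.log10(c)  # "look up" the logs once
log_d = np.log10(d)

product = 10 ** (log_c + log_d)   # c * d   via an addition
quotient = 10 ** (log_c - log_d)  # c / d   via a subtraction
power = 10 ** (d * log_c)         # c ** d  via a multiplication
root = 10 ** (log_c / d)          # d-th root of c via a division

# Compare with the direct calculations (small floating point error is expected)
print(product, c * d)
print(quotient, c / d)
print(power, c ** d)
print(root, c ** (1 / d))
```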


## Decimal/Common Logarithm

In general use, $log{_{10}}{X}$ is the most common logarithm, and is called the "decimal logarithm" or "common logarithm". It's often simply abbreviated to $logX$.

Because it is decimal, it has some particularly useful characteristics for our decimal numbering system:

-   We can estimate the integer part of a decimal log just by counting the digits of the number being "logged" -> a positive number with $n$ digits before the decimal point has a log between $n-1$ and $n$
    -   e.g. $log{_{10}}{123456789} = 8.09151...$
        -   123456789 has 9 digits, therefore even without computing the log, we know the value will lie between 8.0 and 9.0
        -   the resulting value's integer "8" is called the **"characteristic"** and the fractional decimals are called the **"mantissa"**
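
The digit-counting trick can be checked with the same number. This is a sketch for positive integers only; the variable names are my own.

```python
import numpy as np

x = 123456789

log_x = np.log10(x)
characteristic = int(np.floor(log_x))  # integer part of the log
mantissa = log_x - characteristic      # fractional part of the log
num_digits = len(str(x))               # count the digits without any log

print(log_x, characteristic, mantissa, num_digits)
```

For a positive integer, the characteristic is always the number of digits minus one.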


## Natural Logarithm

The "natural logarithm" $log{_e}{X}$ is very widely used in math and physics because it has a simple derivative. It is also represented as $ln{x}$.

The natural logarithm $ln{t}$ can be represented as the integral of $\frac{1}{x}$ from 1 to $t$, or in other words, the area under the curve of $\frac{1}{x}$ between 1 and $t$:

![graphical representation of ln(t)](img\wiki-natural_logarithm_graph.png)
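
The area interpretation can be checked numerically by adding up thin rectangles under $\frac{1}{x}$. This is only a rough sketch using the midpoint rule; `ln_by_area` and the number of rectangles `n` are my own choices.

```python
import numpy as np

def ln_by_area(t, n=100_000):
    """Approximate ln(t) as the area under 1/x between 1 and t (midpoint rule)."""
    edges = np.linspace(1.0, t, n + 1)           # n thin slices between 1 and t
    midpoints = (edges[:-1] + edges[1:]) / 2     # middle of each slice
    width = (t - 1.0) / n                        # width of each slice
    return np.sum(1.0 / midpoints) * width       # sum of rectangle areas

approx = ln_by_area(5.0)
exact = np.log(5.0)

print(approx, exact)
```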


## Binary Logarithm

The "binary logarithm" $log{_2}X$ is frequently used in computer science.


# Sigmoid Functions

A sigmoid function has a characteristic "S" shaped curve, known as a sigmoid curve:

![sigmoid curve example](img\google-sigmoid_curve.jpg)

The example above is known as a logistic function/curve, defined by:

$f(x) = \dfrac{1}{1+e^{-x}}$

They are often scaled from 0 to 1, but can equally be represented with a range of -1 to 1.

A more precise definition is that of a **bounded, differentiable, real function that is defined for all real input values and has a non-negative derivative at each point and exactly one inflection point**.

A sigmoid function is constrained by a pair of horizontal asymptotes as $x \rightarrow \pm\infty$.
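
A small sketch of the logistic function defined above, evaluated at a few arbitrary points to show the 0 to 1 range and the flattening near the asymptotes. The rescaling to the range -1 to 1 with `2 * y - 1` is one possible choice, not the only one.

```python
import numpy as np

def logistic(x):
    """Logistic function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])  # arbitrary sample points

y = logistic(x)           # values between 0 and 1, with f(0) = 0.5
y_rescaled = 2.0 * y - 1.0  # same curve stretched to the range -1 to 1

print(y)
print(y_rescaled)
```

The rescaled version happens to be equal to $tanh(\frac{x}{2})$.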

## In artificial intelligence

In the field of artificial neural networks, the term "sigmoid function" is often used as an alias for a logistic function.

Commonly used as an **activation function**, sigmoid functions help the model learn non-linear relationships between features and a label by taking the weighted sum of all inputs to a neuron (i.e. value*weight for each input, which is a linear result) and adjusting it by passing the weighted sum as the input to a sigmoid function. This results in a non-linear remapping of the weighted sums to values between 0 and 1, where (TBC?) outlier values are given less importance.
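
Here is how I picture a single neuron doing this. The input values and weights are made up, and real networks usually add a bias term and many neurons per layer, which this sketch leaves out.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used here as the activation function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs to one neuron and their weights
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8, 0.1, -0.4])

# Linear part: value * weight for each input, then summed
weighted_sum = np.dot(inputs, weights)

# Non-linear part: squash the weighted sum into the range 0 to 1
activation = sigmoid(weighted_sum)

print(weighted_sum, activation)
```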

They appear in the **learning curves** of machine learning models, where the improvements are slow at first, accelerate, and then approach a climax over a longer time horizon.


# Matrix
In mathematics, a matrix (plural matrices) is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns, which is used to represent a mathematical object or a property of such an object.

Without further specifications, matrices represent linear maps, and allow explicit computations in linear algebra. Note that a linear map is a mapping $V \rightarrow W$ between two vector spaces that preserves the operations of vector addition and scalar multiplication.

In numerical analysis, many computational problems are solved by reducing them to a matrix computation, and this often involves computing with matrices of huge dimension. Matrices are used in most areas of mathematics and most scientific fields, either directly, or through their use in geometry and numerical analysis.

## Terms
The number of rows and columns in a matrix are its **_dimensions_**. Here, an "m x n" matrix, or "m-by-n" matrix:

![matrix](img\wiki-matrix.png)

Each value in a matrix is called an "entry".

Matrices are usually symbolized using upper-case letters (such as **A**), while the corresponding lower-case letters, with two subscript indices (e.g., $a_{11}$, or $a_{1,1}$), represent the entries.

Matrices with a single row are called **row vectors**. Matrices with one column are called **column vectors**. If the matrix has the same number of rows and columns, it is called a **square matrix**. Finally, a matrix with an infinite number of rows or columns is termed an **infinite matrix**.

Sometimes, the entries of a matrix can be defined by a formula such as $a_{i,j} = f(i, j)$. For example, each of the entries of the following matrix A is determined by the formula $a_{ij} = i - j$:

![sample function matrix](img\wiki-function_based_matrix.svg)
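
NumPy can build a matrix from a formula with `np.fromfunction`. The 3x4 shape below is an assumption on my part. NumPy indices start at 0 rather than 1, but since the formula is a difference, $i - j$ gives the same entries either way.

```python
import numpy as np

# Build a matrix whose entries follow the formula a_ij = i - j
A = np.fromfunction(lambda i, j: i - j, (3, 4), dtype=int)

print(A)
print(A.shape)  # (rows, columns)
```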

## Basic Matrix Operations

### Matrix Addition
Both matrices must have the same dimensions (same number of rows and same number of columns).

$(A + B)_{ij} = A_{ij} + B_{ij}$

$
\begin{bmatrix}
	1 & 0 & 5 \\
	6 & 3 & 2
\end{bmatrix} +
\begin{bmatrix}
	4 & 1 & 0 \\
	3 & 1 & 2
\end{bmatrix} = 
\begin{bmatrix}
	1+4 & 0+1 & 5+0 \\
	6+3 & 3+1 & 2+2
\end{bmatrix} =
\begin{bmatrix}
	5 & 1 & 5 \\
	9 & 4 & 4
\end{bmatrix}
$
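
The same addition in NumPy, as a quick sketch. The explicit shape check is my own addition: NumPy's broadcasting can silently accept some mismatched shapes, which is not matrix addition in the mathematical sense.

```python
import numpy as np

A = np.array([[1, 0, 5],
              [6, 3, 2]])
B = np.array([[4, 1, 0],
              [3, 1, 2]])

# Make sure both matrices have the same dimensions before adding
assert A.shape == B.shape

total = A + B  # entry-by-entry sum

print(total)
```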

### Scalar Multiplication
The product of a scalar and a matrix (i.e. c**A**).

$
2\cdot
\begin{bmatrix}
	1 & 0 & -3 \\
	4 & -2 & 2
\end{bmatrix} =
\begin{bmatrix}
	2\cdot1 & 2\cdot0 & 2\cdot-3 \\
	2\cdot4 & 2\cdot-2 & 2\cdot2
\end{bmatrix} = 
\begin{bmatrix}
	2 & 0 & -6 \\
	8 & -4 & 4
\end{bmatrix}
$

### Transposition
$
\begin{bmatrix}
	1 & 0 & -3 \\
	4 & -2 & 2
\end{bmatrix}^T =
\begin{bmatrix}
	1 & 4 \\
	0 & -2 \\
	-3 & 2
\end{bmatrix}
$
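
The scalar multiplication and transposition examples above look like this in NumPy (a minimal sketch, with my own variable names):

```python
import numpy as np

A = np.array([[1, 0, -3],
              [4, -2, 2]])

doubled = 2 * A     # scalar multiplication: every entry is multiplied by 2
transposed = A.T    # transposition: rows become columns

print(doubled)
print(transposed)
print(A.shape, transposed.shape)  # the dimensions are swapped
```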


### Matrix Multiplication

For matrix multiplication (aka **matmul** in tensorflow), the number of **columns in the first matrix** must be equal to the number of **rows in the second matrix**. The resulting matrix, known as the **matrix product**, has the number of rows of the first matrix and the number of columns of the second matrix.

It is important to realize therefore that, in general, $AB \ne BA$ (and depending on the dimensions, **BA** may not even be defined).
 
The product of matrices **A** and **B** is denoted as **AB**.

![matrix multiplication image](img\wiki-matrix_multiplication_image.png)

For a given entry in the resulting matrix **AB**, which we'll call **C** , the value is obtained by multiplying term-by-term the entries of the ith row of **A** and the jth column of **B**, and summing these n products.
- $c_{ij} = a_{i1}*b_{1j} + a_{i2}*b_{2j} + ... + a_{in}*b_{nj}$
    - e.g. for **A** with three columns, and **B** with three rows:
	    - $c_{12} = a_{11}*b_{12} + a_{12}*b_{22} + a_{13}*b_{32}$

- (Mnemonic: spin and multiply)

Example:

$
\begin{bmatrix}
	2 & 3 & 4 \\
	1 & 0 & 0
\end{bmatrix} 
\begin{bmatrix}
	0 & 1000 \\
	1 & 100 \\
	0 & 10
\end{bmatrix} = 
\begin{bmatrix}
	3 & 2340 \\
	0 & 1000
\end{bmatrix}
$
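
The example above can be reproduced with NumPy's `@` operator (equivalent to `np.matmul` for 2-D arrays). The sketch also computes **BA** to show that the order matters, and recomputes one entry by hand following the row-times-column rule.

```python
import numpy as np

A = np.array([[2, 3, 4],
              [1, 0, 0]])    # 2 rows, 3 columns
B = np.array([[0, 1000],
              [1, 100],
              [0, 10]])      # 3 rows, 2 columns

AB = A @ B   # 2 x 2: rows of A by columns of B
BA = B @ A   # 3 x 3: a different matrix with a different shape

# One entry by hand: first row of A times second column of B
# (NumPy indices are 0-based, so this is c_12 in the notation above)
c_12 = sum(A[0, k] * B[k, 1] for k in range(A.shape[1]))

print(AB)
print(BA.shape)
print(c_12)
```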

### Row Operations
There are three types of row operations:
1. row addition, that is adding a row to another;
2. row multiplication, that is multiplying all entries of a row by a non-zero constant;
3. row switching, that is interchanging two rows of a matrix.
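
A rough sketch of the three row operations in NumPy. The matrix `M` is an arbitrary example, and each operation is applied to a fresh copy so the results are easy to compare with the original.

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# 1. Row addition: add the first row to the second row
added = M.copy()
added[1] += added[0]

# 2. Row multiplication: multiply the first row by a non-zero constant
scaled = M.copy()
scaled[0] *= 2.0

# 3. Row switching: interchange the first and third rows
swapped = M.copy()
swapped[[0, 2]] = swapped[[2, 0]]

print(added)
print(scaled)
print(swapped)
```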

🧠🧠🧠 **TODO - flesh this section out**

### Submatrix
A submatrix is obtained by deleting any collection of rows and/or columns. Just remove the entire row/column from the original matrix, and shift the remaining entries up and to the left to close the gaps. (Like deleting rows in a spreadsheet).
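
In NumPy, one way to do this is `np.delete`, which returns a new array without the chosen rows or columns. The matrix and the row/column picked for deletion are arbitrary, and the indices are 0-based.

```python
import numpy as np

A = np.arange(1, 13).reshape(3, 4)  # 3 x 4 matrix holding 1..12

# Delete the third row (index 2), then the second column (index 1)
without_row = np.delete(A, 2, axis=0)
submatrix = np.delete(without_row, 1, axis=1)

print(A)
print(submatrix)
```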

## Linear Transformations
Matrices and matrix multiplication reveal their essential features when related to linear transformations, also known as linear maps.

For example, the 2x2 matrix

$
A = 
\begin{bmatrix}
	a & c \\
	b & d
\end{bmatrix}
$

can be viewed as the transform of the unit square (1x1) into a parallelogram with vertices (0,0), (a,b), (c,d), and (a+c, b+d):

![linear map parallelogram](img\wiki-linear_map_parallelogram.jpg)
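
To convince myself, the sketch below applies such a matrix to the four corners of the unit square. The values chosen for `a`, `b`, `c` and `d` are arbitrary. The determinant check at the end relies on the fact that the absolute value of the determinant gives the area of the parallelogram.

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 1.0, 3.0  # arbitrary example values

A = np.array([[a, c],
              [b, d]])

# Corners of the unit square, one per column: (0,0), (1,0), (1,1), (0,1)
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

# Transform every corner at once
parallelogram = A @ square

# Columns are now (0,0), (a,b), (a+c, b+d), (c,d)
print(parallelogram)

# Area of the parallelogram (the unit square has area 1)
area = abs(np.linalg.det(A))
print(area)
```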

Some examples of the power of linear maps of $\textbf{R}^2$. Blue is the original shape, green is the result of the linear transformation using the map, and the black dot is the origin (0,0):

![linear transformation map examples](img\wiki-linear_maps_and_their_transformations.jpg)


🧠🧠🧠 **TODO - Much more to study about linear algebra and matrices. Come back and update with deeper knowledge when the time comes.**

# Hyperbolic tangent

⚠️🧠🧠🧠 I have never been exposed to Tanh before. I need to study it way more in depth when the time comes.

$Tanh(x)$ is the hyperbolic tangent function and can be defined as:

$tanh(\alpha) = \dfrac{e^{2\alpha} -1}{e^{2\alpha}+1}$

Tanh automatically evaluates to exact values when its argument is the (natural) logarithm of a rational number. When given exact numeric expressions as arguments, Tanh may be evaluated to arbitrary numeric precision.

❓❓❓ Tanh threads element-wise over lists and matrices. In contrast, MatrixFunction can be used to give the hyperbolic tangent of a square matrix (i.e. the power series for the hyperbolic tangent function with ordinary powers replaced by matrix powers) as opposed to the hyperbolic tangents of the individual matrix elements.

Tanh[x] approaches -1 for large negative x and +1 for large positive x.

Tanh satisfies an identity similar to the Pythagorean identity satisfied by Tan, namely $tanh^2(\alpha) = 1 - sech^2(\alpha)$.
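
While I study this further, here is a small NumPy sketch checking the definition and the identity above. `np.tanh` works element-wise on arrays, and the sample points are arbitrary.

```python
import numpy as np

alpha = np.linspace(-3.0, 3.0, 13)  # arbitrary sample points

# Definition from above versus NumPy's built-in, element-wise tanh
from_definition = (np.exp(2 * alpha) - 1) / (np.exp(2 * alpha) + 1)
builtin = np.tanh(alpha)

# Identity: tanh^2 = 1 - sech^2, where sech = 1 / cosh
sech = 1.0 / np.cosh(alpha)
identity_holds = np.allclose(builtin ** 2, 1 - sech ** 2)

print(np.allclose(from_definition, builtin), identity_holds)

# Far from zero, the function flattens out at -1 and +1
print(np.tanh(-20.0), np.tanh(20.0))
```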