$$\sum_{i=1}^m x_i = m \Rightarrow \max P_m = \prod_{i=1}^m {x_i}^i = ?$$

We apply the [Weighted AM-GM Inequality](https://en.wikipedia.org/wiki/AM%E2%80%93GM_inequality#Weighted_AM%E2%80%93GM_inequality) with weights $w_i = i$. Then $$w = \sum_{i = 1}^m w_i = \sum_{i = 1}^m i = \frac{m(m + 1)}{2}$$

and 
$$
P_m = \prod_{i=1}^m {x_i}^i \Rightarrow {P_m}^{\frac{1}{w}} = \sqrt[w]{\prod_{i=1}^m {x_i}^i} \leq \frac{\sum_{i=1}^m w_i x_i}{w} = \frac{\sum_{i=1}^m i x_i}{w} = \frac{1}{w} \sum_{i=1}^m i x_i
$$



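So in other words $P_m \leq \displaystyle{\left(\frac{1}{w} \sum_{i=1}^m ix_i\right)^w}$. Note that this is only an upper bound, not yet the max value of $P_m$: the right-hand side still depends on the $x_i$.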
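As a quick sanity check of the inequality, here is a minimal sketch that compares both sides for a few random positive vectors rescaled to sum to $m$. The helper name `weighted_am_gm_sides` and the sampling scheme are made up for illustration; they are not part of the notebook.

```python
import math
import random

def weighted_am_gm_sides(x: list[float]) -> tuple[float, float]:
    """Return (P_m ** (1 / w), (1 / w) * sum(i * x_i)) for weights w_i = i."""
    m = len(x)
    w = m * (m + 1) // 2
    # geometric side computed in log space to avoid overflow for larger m
    log_p = sum(i * math.log(v) for i, v in enumerate(x, start=1))
    geometric = math.exp(log_p / w)
    arithmetic = sum(i * v for i, v in enumerate(x, start=1)) / w
    return geometric, arithmetic

random.seed(0)
m = 10
for _ in range(5):
    raw = [random.random() + 0.1 for _ in range(m)]
    scale = m / sum(raw)
    x = [v * scale for v in raw]  # positive entries that sum to m
    g, a = weighted_am_gm_sides(x)
    print(f"{g:.4f} <= {a:.4f}")
```

The geometric side should never exceed the arithmetic side, and the two agree when all $x_i$ are equal.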

The problem is now 

This is hard to visualize, so let's write it down:

In [1]:
import numpy as np

m = 10



In [2]:
np.pow(4112, 1/10)

np.float64(2.2982925569784243)

In [3]:
import scipy

$$
\nabla f(p) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(p) \\~\\ \vdots \\~\\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix}, \quad p = \left[ x_1, \ \dots, \ x_n \right]
$$

In [4]:
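from typing import Callable

def derivative(f: Callable[[float], float], x: float) -> float:
    # forward difference approximation of f'(x)
    eps = 1e-7
    return (f(x + eps) - f(x)) / eps


def gradient(f: Callable[[np.ndarray], float], x: np.ndarray) -> np.ndarray:
    # forward difference approximation of each partial derivative of f at x
    eps = 1e-8
    x = np.asarray(x, dtype=float)
    v = []  # created once, outside the loop, otherwise only the last entry survives
    for i in range(len(x)):
        y = np.array(x)
        y[i] += eps
        v.append((f(y) - f(x)) / eps)
    return np.array(v)

# np.cos acts elementwise, so row i here is the change of every output when x[i] is nudged
gradient(np.cos, [0, np.pi / 2, np.pi])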

$$
\sum_{i=1}^m x_i = m \Rightarrow P_m(x) = \prod_{i=1}^m {x_i}^i = ? \\~\\
\ln P_m(x) = \sum_{i=1}^m i \ln(x_i)
$$
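To see that the log form describes the same function, here is a small sketch that evaluates $P_m$ both directly and through $\exp(\ln P_m)$. The names `P_direct` and `P_via_log` are hypothetical and only used for this comparison; the log form needs every $x_i > 0$.

```python
import math

def P_direct(x: list[float]) -> float:
    """Product of x_i ** i with i starting at 1."""
    result = 1.0
    for i, v in enumerate(x, start=1):
        result *= v ** i
    return result

def P_via_log(x: list[float]) -> float:
    """exp of sum(i * ln(x_i)); requires every x_i > 0."""
    return math.exp(sum(i * math.log(v) for i, v in enumerate(x, start=1)))

x = [1.0, 0.5, 1.5]
print(P_direct(x), P_via_log(x))  # the two values agree up to rounding
```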

In [5]:
def ln_P(x: np.ndarray) -> float:
    # ln P_m(x) = sum of i * ln(x_i), with i starting at 1
    return np.dot(np.arange(1, len(x) + 1), np.log(x))

def P(x: np.ndarray) -> float:
    return np.exp(ln_P(x))

$$
lagrange
$$

In [6]:
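from typing import Callable

def P_minima(x0: np.ndarray, lr: float = 0.01) -> np.ndarray:
    # one possible completion (an assumption): climb ln_P while keeping sum(x) fixed
    x = np.array(x0, dtype=float)
    for _ in range(10000):
        z = gradient(ln_P, x)
        # removing the mean makes the step sum to zero, so x stays on the plane
        x += lr * (z - z.mean())
    return x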

so @griff and @hacatu made me read about the Lagrange Multiplier

We have $m$ variables and one constraint, so we define a single $\lambda$

We have $$P(x_1, x_2, \dots, x_m) = \prod_{i=1}^m x_i^i$$ subject to $g(x_1, x_2, \dots, x_m) = \sum_{i=1}^m x_i - m = 0$

Wikipedia writes the Lagrangian with a script $\mathcal{L}$ (the same glyph that is used for the Laplace transform); we use the plain letter $L$ instead

$$
L(x_1, x_2, \dots, x_m, \lambda) = P(x_1, x_2, \dots, x_m) + \lambda g(x_1, x_2, \dots, x_m) \\~\\
= \prod_{i=1}^m x_i^i + \lambda\left(\sum_{i=1}^m x_i - m\right)
$$


Now we can calculate the gradient

$$
\nabla_{x_1, x_2, \dots, x_m, \lambda} L(x_1, x_2, \dots, x_m, \lambda) = \left(\frac{\partial L}{\partial x_1}, \frac{\partial L}{\partial x_2}, \dots, \frac{\partial L}{\partial \lambda}\right)
$$

but first, we rewrite the derivative of a power a bit:

$$
\frac{\text{ d}}{\text{ d}x} x^n = nx^{n-1} = \frac{n}{x} x^n
$$

and we calculate the entrywise derivatives, for $1 \leq a \leq m$:

$$
\frac{\partial P}{\partial x_a} = \frac{\partial }{\partial x_a}\left(\prod_{i=1}^m x_i^i\right) = \frac{a}{x_a} \prod_{i=1}^m x_i^i \\~\\
\frac{\partial g}{\partial x_a} = \frac{\partial }{\partial x_a}\left(\sum_{i=1}^m x_i - m\right) = 1
$$
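Spelling out the first line as a worked equation: only the factor $x_a^a$ depends on $x_a$, so the rest of the product acts as a constant and the power rule above applies to that single factor (assuming $x_a \neq 0$).

$$
\frac{\partial }{\partial x_a}\left(\prod_{i=1}^m x_i^i\right) = \left(\prod_{i \neq a} x_i^i\right) \frac{\partial }{\partial x_a} x_a^a = \left(\prod_{i \neq a} x_i^i\right) \frac{a}{x_a} x_a^a = \frac{a}{x_a} \prod_{i=1}^m x_i^i
$$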

$$
\nabla_{x_1, x_2, \dots, x_m, \lambda} L(x_1, x_2, \dots, x_m, \lambda) = \left(\frac{1}{x_1} \prod_{i=1}^m x_i^i + \lambda, \frac{2}{x_2} \prod_{i=1}^m x_i^i + \lambda, \dots, \frac{m}{x_m} \prod_{i=1}^m x_i^i + \lambda, \sum_{i=1}^m x_i - m \right)
$$

We set $\nabla L = 0$:

$$
\nabla_{x_1, x_2, \dots, x_m, \lambda} L(x_1, x_2, \dots, x_m, \lambda) = 0 \Rightarrow 
\begin{cases}
\frac{1}{x_1} \prod_{i=1}^m x_i^i + \lambda = 0 \\
\frac{2}{x_2} \prod_{i=1}^m x_i^i + \lambda = 0 \\
\dots \\
\frac{m}{x_m} \prod_{i=1}^m x_i^i + \lambda = 0\\
\sum_{i=1}^m x_i - m = 0
\end{cases} \\~\\
\Rightarrow \frac{1}{x_1} \prod_{i=1}^m x_i^i = \frac{2}{x_2} \prod_{i=1}^m x_i^i = \dots = \frac{m}{x_m} \prod_{i=1}^m x_i^i = -\lambda \\~\\
\Rightarrow \frac{1}{x_1} = \frac{2}{x_2} = \dots = \frac{m}{x_m} \Rightarrow \frac{x_1}{1} = \frac{x_2}{2} = \dots = \frac{x_m}{m}
$$

Combining this with the other condition $\sum_{i=1}^m x_i - m = 0$, we have:

$$
\begin{cases}
\frac{x_1}{1} = \frac{x_2}{2} = \dots = \frac{x_m}{m} \\
\sum_{i=1}^m x_i = m
\end{cases}
$$
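Both conditions can be checked numerically before solving them by hand. The sketch below builds a vector proportional to the index, rescales it so that it sums to $m$, and evaluates the terms $\frac{a}{x_a} \prod_{i=1}^m x_i^i$, which should all coincide (each equals $-\lambda$). The helper names `P_value` and `stationarity_terms` are made up for this check.

```python
import math

def P_value(x: list[float]) -> float:
    """Product of x_i ** i with i starting at 1."""
    return math.prod(v ** i for i, v in enumerate(x, start=1))

def stationarity_terms(x: list[float]) -> list[float]:
    """Return a / x_a * P(x) for a = 1..m; at a stationary point all are equal."""
    p = P_value(x)
    return [a / v * p for a, v in enumerate(x, start=1)]

m = 5
c = m / sum(range(1, m + 1))                    # scale so that the entries sum to m
proportional = [c * a for a in range(1, m + 1)]  # x_a proportional to a
uniform = [1.0] * m                              # also sums to m, but not proportional

print(stationarity_terms(proportional))  # entries agree up to rounding
print(stationarity_terms(uniform))       # entries differ
```

The uniform vector satisfies the constraint but not the gradient equations, which is why it is not the maximizer.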

We can solve this by noting that $x_1 = \frac{x_a}{a}$, i.e. $x_a = a x_1$, for all $1 \leq a \leq m$

and rewrite the sum:
$$
\sum_{i=1}^m x_i = m \Rightarrow \sum_{i=1}^m ix_1 = m \\~\\
\Rightarrow \frac{m(m+1)}{2} x_1 = m \Rightarrow x_1 = \frac{2}{m + 1}
$$

and evaluate the other $x$ values ($x_2, x_3, \dots, x_m$) via $x_a = a x_1 = \frac{2a}{m + 1}$ to get the result
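Since the solution is rational, the constraint can be verified without any floating-point error. This is a small sketch using exact fractions; `optimal_point` and `P_exact` are illustrative names, not functions from the notebook.

```python
from fractions import Fraction

def optimal_point(m: int) -> list[Fraction]:
    """x_a = 2a / (m + 1) for a = 1..m, as exact fractions."""
    return [Fraction(2 * a, m + 1) for a in range(1, m + 1)]

def P_exact(x: list[Fraction]) -> Fraction:
    """Product of x_i ** i with i starting at 1, in exact arithmetic."""
    result = Fraction(1)
    for i, v in enumerate(x, start=1):
        result *= v ** i
    return result

m = 10
x = optimal_point(m)
print(sum(x) == m)        # the constraint holds exactly
print(float(P_exact(x)))  # value of P at the candidate point
```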

so let's test this via code...

In [7]:
import math
def P(x: list[float]) -> float:
    t = 1.0
    for i, v in enumerate(x):
        t *= math.pow(v, i + 1)
    return t

def assumed_optimal_list(m: int) -> list[float]:
    x1 = 2 / (m + 1)
    res: list[float] = []
    res.append(x1)
    for i in range(2, m + 1):
        res.append(x1 * i)
    assert len(res) == m
    return res

P(assumed_optimal_list(10))
    

4112.08500285362

In [14]:
n = 0
for i in range(2, 15 + 1):
    n += int(P(assumed_optimal_list(i)))
n

371048281

We showed above that 

$$
\frac{\partial P}{\partial x_a} = \frac{a}{x_a} \prod_{i=1}^m x_i^i
$$

In [27]:
from typing import Callable
import math
import sys
def partial_derivative(f: Callable[[list[float]], float], x: list[float], n: int) -> float:
    n -= 1
    h = math.sqrt(sys.float_info.epsilon)
    y = x.copy()
    y[n] += h
    return (f(y) - f(x)) / h

partial_derivative(P, [1, 2, 2], 2)

32.0

In [28]:
def P_partial_derivative(x: list[float], a: int) -> float:
    # a is 1-based (as in the math), so shift by one to index the list
    return a / x[a - 1] * P(x)

# P([1, 2])
P_partial_derivative([1, 3], 2)


6.0

In [37]:
def P_gradient(x: list[float]) -> list[float]:
    z = P(x)
    return [(i + 1) / x[i] * z for i in range(len(x))]

P_gradient([1, 1, 1, 1])

[1.0, 2.0, 3.0, 4.0]
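The closed-form gradient can be compared against finite differences for every component at once. The following is a self-contained sketch with its own helper names (`P_value`, `analytic_gradient`, `numeric_gradient`), so it does not rely on the cells above; the step size is the same square root of machine epsilon used earlier.

```python
import math
import sys

def P_value(x: list[float]) -> float:
    """Product of x_i ** i with i starting at 1."""
    return math.prod(v ** i for i, v in enumerate(x, start=1))

def analytic_gradient(x: list[float]) -> list[float]:
    """Closed form: dP/dx_a = a / x_a * P(x)."""
    p = P_value(x)
    return [a / v * p for a, v in enumerate(x, start=1)]

def numeric_gradient(x: list[float]) -> list[float]:
    """Forward differences with step sqrt(machine epsilon)."""
    h = math.sqrt(sys.float_info.epsilon)
    base = P_value(x)
    grad = []
    for k in range(len(x)):
        y = list(x)
        y[k] += h
        grad.append((P_value(y) - base) / h)
    return grad

x = [1.0, 0.5, 1.5]
for a, n in zip(analytic_gradient(x), numeric_gradient(x)):
    print(a, n)  # each pair should agree to several digits
```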

Let's consider one idea. Suppose we have a vector $X$ on the constraint plane (so $\sum_{i=1}^m X_i = m$) and its gradient $\nabla$. We want to find a learning rate $\alpha$ such that the result $Y = X - \alpha\nabla$ still remains on the plane, so we need

$$
\sum_{i=1}^m (X_i - \alpha\nabla_i) = m \\~\\ 
\Rightarrow \sum_{i=1}^m X_i = m + \sum_{i=1}^m \alpha\nabla_i = m  + \alpha\sum_{i=1}^m \nabla_i \\~\\
\Rightarrow \alpha\sum_{i=1}^m \nabla_i = 0
$$

so unless the gradient entries happen to sum to zero, the only choice is $\alpha = 0$.

wait nvm
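One possible way around this, offered as an assumption rather than something the notebook settled on: keep $\alpha$ fixed and change the direction instead. Subtracting the mean of the gradient from every entry gives a step whose entries sum to zero, so the result stays on the plane. The function name `projected_step` is hypothetical, and the sign is a plus because we want to increase $P$.

```python
def projected_step(x: list[float], grad: list[float], lr: float) -> list[float]:
    """Move along grad with its mean removed, so that sum(x) does not change."""
    mean = sum(grad) / len(grad)
    return [v + lr * (g - mean) for v, g in zip(x, grad)]

x = [1.0, 1.0, 1.0, 1.0]
grad = [1.0, 2.0, 3.0, 4.0]  # gradient of P at x, from a / x_a * P(x)
y = projected_step(x, grad, lr=0.01)
print(y, sum(y))  # the sum stays at m = 4 up to rounding
```

Removing the mean is the orthogonal projection of the gradient onto the plane $\sum_{i=1}^m x_i = m$; it does not by itself keep the entries positive, so the learning rate still has to be small enough.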

In [None]:
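def P_gradient_ascent(x0: list[float], iter: int = 10 ** 3, lr: float = 0.01) -> tuple[list[float], float]:
    # optimize the P-function on the plane \sum_{i=1}^m x_i = m
    # one possible completion (an assumption); x0 must lie on the plane with positive entries
    x = list(x0)
    for _ in range(iter):
        z = P(x)
        grad = [g / z for g in P_gradient(x)]  # gradient of ln P, better scaled than that of P
        # instead of customizing the learning rate alpha, subtract the mean of the gradient:
        # the step then sums to zero, so x stays in the plane
        mean = sum(grad) / len(grad)
        x = [v + lr * (g - mean) for v, g in zip(x, grad)]
    return x, P(x)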