# Optimal control course project
# Solving the Cart-pole-v1 problem

by Artem Petrov and Ivan Kudriakov




# Mathematical model

The Cart-pole-v1 environment is simulated according to the model described here: https://coneural.org/florian/papers/05_cart_pole.pdf

([Proof from the code of Cart-Pole-v1](https://github.com/openai/gym/blob/2dddaf722acccfd0412d745890c40dcd972586d5/gym/envs/classic_control/cartpole.py#L126))

Interested readers who want to check the full derivation of the following equations may want to read [the original source](https://coneural.org/florian/papers/05_cart_pole.pdf).

Here, we provide a brief explanation of the model.

<img src="presentation_media/forces_model.png" alt="drawing" width="500"/>

<!-- Here $G_c$ and $G_p$ are the forces of gravity acting on cart and pole respectively.

Here $N_c$ - reaction force and $F_f$ - friction force between the cart and the rail -->

Here are the parameters of the model:

$m_c$ - the mass of the cart 

$m_p$ - the mass of the pole 

$l$ - half-length of the pole

The original model also considers friction forces; however, in our model there is no friction.

<!-- 
$\mu_c$ - friction coefficient between the cart and the rail

$\mu_p$ - friction coefficient between the cart and the pole

Side note: the initial model assumes Lubricated friction between the pole and the cart which is somewhat counterintuitive. -->

From the frictionless Newton equations, the following equations of motion can be derived:

$$
\ddot{\theta} = \frac{g \sin \theta + \cos \theta \left( \frac{- F - m_p l \dot{\theta}^2\sin \theta}{m_c + m_p} \right)}
{l\left(\frac{4}{3} - \frac{m_p \cos ^2 \theta}{m_c + m_p}\right)}
\label{eq:theta} \tag{1}
$$

$$
\ddot{x} = \frac{F + m_p l(\dot{\theta}^2 \sin \theta - \ddot{\theta}\cos\theta)}
{m_c + m_p}
\label{eq:x} \tag{2}
$$

where $F = \pm F_0$ is the horizontal force applied to the cart by the agent.
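
As a rough illustration, equations (1) and (2) can be evaluated numerically as follows. This is only a sketch: the parameter values are assumptions taken from the Gym CartPole source, not values stated in this text, and the function name `accelerations` is hypothetical.

```python
import math

# Assumed parameter values (from the Gym CartPole source, not from this text).
G = 9.8         # gravitational acceleration g, m/s^2
M_CART = 1.0    # m_c, kg
M_POLE = 0.1    # m_p, kg
HALF_LEN = 0.5  # l, half-length of the pole, m


def accelerations(theta, theta_dot, force):
    """Return (theta_ddot, x_ddot) from equations (1) and (2), without friction."""
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    total_mass = M_CART + M_POLE

    # Equation (1): angular acceleration of the pole.
    numerator = G * sin_t + cos_t * (
        (-force - M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total_mass
    )
    denominator = HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total_mass)
    theta_ddot = numerator / denominator

    # Equation (2): linear acceleration of the cart.
    x_ddot = (
        force + M_POLE * HALF_LEN * (theta_dot ** 2 * sin_t - theta_ddot * cos_t)
    ) / total_mass
    return theta_ddot, x_ddot
```

With the pole upright and at rest, a positive force gives a positive cart acceleration and a negative angular acceleration, as the signs in (1) and (2) suggest.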


Now, let's pose the problem in terms of optimal control theory.

Let $\mathbf{q} = (x, \dot{x}, \theta, \dot{\theta})^T$

$F = u F_0$, where $u \in \{-1, +1\}$

Then,

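$$\dot{\mathbf{q}} = (\dot{x}, \ddot{x}, \dot{\theta}, \ddot{\theta})^T$$

where $\ddot{\theta}$ and $\ddot{x}$ are given by equations (1) and (2) with $F = u F_0$.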
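
The state-space form can be sketched in code as a function of the state $\mathbf{q}$ and the control $u$, together with one explicit Euler step. The numeric values of $F_0$, the masses, the pole half-length and the step length are assumptions taken from the Gym CartPole source; the names `state_derivative` and `euler_step` are hypothetical.

```python
import math

# Assumed values (from the Gym CartPole source, not given in this text).
G, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5
F0 = 10.0   # force magnitude F_0, N
TAU = 0.02  # integration step, s


def state_derivative(q, u):
    """Right-hand side of q_dot for q = (x, x_dot, theta, theta_dot), u in {-1, +1}."""
    _, x_dot, theta, theta_dot = q
    force = u * F0
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    total_mass = M_CART + M_POLE
    # Angular acceleration, equation (1).
    theta_ddot = (
        G * sin_t
        + cos_t * (-force - M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total_mass
    ) / (HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total_mass))
    # Cart acceleration, equation (2).
    x_ddot = (
        force + M_POLE * HALF_LEN * (theta_dot ** 2 * sin_t - theta_ddot * cos_t)
    ) / total_mass
    return (x_dot, x_ddot, theta_dot, theta_ddot)


def euler_step(q, u, tau=TAU):
    """Advance the state by one explicit Euler step of length tau."""
    dq = state_derivative(q, u)
    return tuple(qi + tau * dqi for qi, dqi in zip(q, dq))
```

Explicit Euler is chosen here because it is the default integrator in the Gym implementation; a higher-order scheme could be substituted without changing `state_derivative`.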

The goal is to keep the pole upright for the longest time. 

This suggests using the following simple cost functional:

$$ J = \int\limits_{0}^{\infty} \theta^2 dt \rightarrow \min $$ 
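
On a simulated trajectory the integral can only be approximated over a finite horizon. A minimal sketch, assuming the angle is sampled at a fixed step `tau` (0.02 s is an assumed value from the Gym source) and using a left Riemann sum; the function name `running_cost` is hypothetical:

```python
def running_cost(thetas, tau=0.02):
    """Approximate J = integral of theta^2 dt by a left Riemann sum over sampled angles."""
    return sum(theta ** 2 for theta in thetas) * tau


# Usage: a trajectory that stays closer to upright gets a lower cost.
steady = [0.01, -0.01, 0.01, -0.01]
wobbly = [0.10, -0.10, 0.10, -0.10]
assert running_cost(steady) < running_cost(wobbly)
```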

However, we must take into account that the simulation stops as soon as one of the following conditions is met:

- $|\theta| > 12$&deg; (which contradicts the [front page](https://gym.openai.com/envs/CartPole-v1/), but is consistent with the [actual code](https://github.com/openai/gym/blob/2dddaf722acccfd0412d745890c40dcd972586d5/gym/envs/classic_control/cartpole.py#L90))

- $|x| > 2.4$

- $t > 500\tau$, where $\tau = 0.02$ s is the length of one simulation step
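
The three stopping conditions above can be checked with a small helper. This is a sketch only: the function name `is_terminated` is hypothetical, and time is counted in whole steps of length $\tau$, so $t > 500\tau$ becomes `step > 500`.

```python
import math

THETA_MAX = math.radians(12.0)  # pole angle limit, rad
X_MAX = 2.4                     # cart position limit
MAX_STEPS = 500                 # time limit, counted in steps of length tau


def is_terminated(q, step):
    """Return True if state q = (x, x_dot, theta, theta_dot) at this step ends the episode."""
    x, _, theta, _ = q
    return abs(theta) > THETA_MAX or abs(x) > X_MAX or step > MAX_STEPS
```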



These conditions can be formulated as restrictions on the trajectory:

$h_1(