### Apply Iterative Policy Evaluation <br><br>


__Apply iterative policy evaluation to compute value functions__ <br><br><br>



__Sweep over every state__ <br><br>


state | $s_1$ | $s_2$ | $s_3$ | $s_4$
--- | --- | --- | --- | ---
update | <img src="https://drive.google.com/uc?id=18hr7U-e1zfr2x6pkHThMy7Mg65HPrk3A" alt="1-16" width="300"> | <img src="https://drive.google.com/uc?id=14my_pX5MUb11WIjQi5VY5eQsOju5Zb8m" alt="1-17" width="300"> | <img src="https://drive.google.com/uc?id=123RzCrUUlniON8IZhbP1LC7YERokCj-q" alt="1-18" width="300"> | <img src="https://drive.google.com/uc?id=1nKqej8IQS6NYGFVvu5pTS3YutpIo-UDB" alt="1-19" width="300"> 

<br>

To implement iterative policy evaluation, <br>
we store two arrays, each with one entry for every state. <br><br>

$V \quad \; : \;$ One array stores the __current approximate value function__ <br>
$V' \quad : \;$ Another array stores the __updated values__ <br><br><br>



<img src="https://drive.google.com/uc?id=1fACrBK0Yq94kvB3udarkAitpluFhPpyQ" alt="1-20" width="400"> <br>

  - By using two arrays, <br>
we can compute the new values from the old ones, one state at a time, without the old values being changed in the process. <br><br>

  At the end of a full sweep, we write all the new values from $V'$ into $V$. <br>
Then we do the next iteration. <br><br><br>



<img src="https://drive.google.com/uc?id=1Iep3q8FOPZAXI9kSxINlu9_oraTYaQ64" alt="1-20" width="400"> <br>

  - By using one array, <br>
  It is also possible to implement a version with only one array, <br>
  in which case some updates will themselves use new values instead of old ones. <br><br>

  This single-array version will usually converge faster, <br>
because it gets to use the updated values sooner. 


<br><br><br>
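The difference between the two versions can be sketched on a tiny problem. The corridor MDP here is hypothetical (it is not from the lecture): five states in a row, both ends terminal, a 50/50 left/right policy, reward $-1$ per move, and $\gamma = 1$. The names `backup` and `evaluate` are illustrative choices.

```python
import numpy as np

# Hypothetical corridor: states 0 and 4 are terminal, states 1..3 are not.
N_STATES = 5
TERMINALS = {0, 4}
GAMMA = 1.0  # undiscounted

def backup(V, s):
    """Expected update for state s: 50/50 left/right policy, reward -1 per move."""
    return sum(0.5 * (-1.0 + GAMMA * V[s_next]) for s_next in (s - 1, s + 1))

def evaluate(in_place, theta=1e-6):
    """Run sweeps until the largest change within a sweep drops below theta."""
    V = np.zeros(N_STATES)
    sweeps = 0
    while True:
        # Two-array version writes into a copy; one-array version writes into V itself.
        target = V if in_place else V.copy()
        delta = 0.0
        for s in range(N_STATES):
            if s in TERMINALS:
                continue
            new_value = backup(V, s)   # if in_place, V already holds some new values
            delta = max(delta, abs(new_value - V[s]))
            target[s] = new_value
        V = target
        sweeps += 1
        if delta < theta:
            return V, sweeps

V_two, sweeps_two = evaluate(in_place=False)
V_one, sweeps_one = evaluate(in_place=True)
print("two arrays:", np.round(V_two, 3), "sweeps:", sweeps_two)
print("one array :", np.round(V_one, 3), "sweeps:", sweeps_one)
```

Both runs should approach the same values; on this small corridor the one-array run needs fewer sweeps, which matches the claim that it usually converges faster. <br><br><br>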



For simplicity, <br>
we focus on the two-array version in the example.


<br><br><br>





### Example : iterative policy evaluation (on a grid world) <br><br>


" How does iterative policy evaluation work ? " <br>
( on a particular example ) <br><br><br>




#### 4 x 4 grid world <br>
$\rightarrow \quad$ Episodic MDP (with terminal states located in the top-left and the bottom-right corners) <br>


<img src="https://drive.google.com/uc?id=1ruUa6nXLB8bNDMV435chWpa6LAbXnr1I" alt="1-22"> <br><br>


$R = -1 \qquad : \;$ the reward is -1 for every transition <br><br>

$\gamma = 1 \qquad \quad : \;$ the undiscounted case (since the problem is episodic) <br><br>

$a_u, a_d, a_l, a_r \; : \;$ four possible actions in each state: up, down, left, right <br>
$\qquad \qquad \qquad$ (each action is deterministic) <br>

$s_{1}, \; \text{...} \; , s_{14} \quad : \;$ Each location on the grid except the terminal state locations <br><br>

$\pi \qquad \qquad \; \; \; : \;$ let's evaluate the uniform random policy <br>
$\qquad \qquad \qquad$ (which selects each of the four actions 1/4 of the time) <br>

$v \qquad \qquad \quad : \;$ the value function represents the negative of the expected number of steps until termination <br>
$\qquad \qquad \qquad$ (from a given state) <br><br><br>




<img src="https://drive.google.com/uc?id=1y290ZePPGOH8P8EJHE755fCUSbnCEXKb" alt="1-23">

<br><br>


$\text{sweep} \qquad \quad : \;$ The order we sweep through the states is not important, <br>
$\qquad \qquad \qquad \; \;$ since we are using the two array version of the algorithm. <br>

$\qquad \qquad \qquad \; \;$ Let's assume we sweep the states first from left to right, and then from top to bottom. <br>

$\text{initializer} \quad \; \; : \;$ Initialize all the values in $V$ to $0$ <br>
$\qquad \qquad \qquad \; \;$ (We never update the value of the terminal states, as it is defined to be $0$) <br>
$\qquad \qquad \qquad \; \;$ (The initial values stored in $V'$ are irrelevant, since they'll always be updated before they are used) <br>


<br><br><br><br><br>
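The setup above can be written down as a small model. This is a minimal sketch with two assumptions that the notes do not state explicitly: states are indexed $0, \dots, 15$ row by row (so indices $1, \dots, 14$ play the role of $s_1, \dots, s_{14}$), and a move that would leave the grid keeps the agent in place. The names `step`, `MOVES`, and `policy` are illustrative.

```python
import numpy as np

N = 4                                          # the grid is N x N
N_STATES = N * N                               # indices 0..15, row by row
TERMINALS = {0, N_STATES - 1}                  # top-left and bottom-right corners
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # action 0..3 = up, down, left, right
ACTION_NAMES = ["up", "down", "left", "right"]
GAMMA = 1.0                                    # undiscounted, episodic task

def step(s, a):
    """Deterministic dynamics: return (next_state, reward) for a non-terminal state."""
    row, col = divmod(s, N)
    new_row, new_col = row + MOVES[a][0], col + MOVES[a][1]
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col            # assumption: off-grid moves leave the state unchanged
    return new_row * N + new_col, -1.0         # reward is -1 on every transition

# Uniform random policy: probability 1/4 for each action in every state.
policy = np.full((N_STATES, len(MOVES)), 0.25)

for a, name in enumerate(ACTION_NAMES):
    print(f"from state 1, {name:>5} ->", step(1, a))
```

Later sketches in these notes repeat this model so that each one runs on its own. <br><br><br>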



> Note. <br><br>
>
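>The probability in a policy is the probability distribution that decides which action to take out of all available actions. <br><br>
>
>In "deterministic vs. stochastic", the stochastic probability is the probability that an action actually takes effect once the policy has chosen it. (Even if the policy chooses an action, that action may stochastically fail to happen !) <br>
>In the deterministic case, the action chosen by the policy always takes effect, with probability 1 !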


<br><br><br><br><br>



### One sweep iteration process <br><br><br>



#### __steps of iteration__ with the update to states $s_1, s_2, \; \text{...} \;, s_{14}$ <br><br>

To compute the update, we have to sum over all actions <br><br><br>
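For reference, the sum over actions mentioned above can be written out as follows. This is the standard form of the expected update, stated here as an assumption about what the figures in the table show: <br><br>

$$V'(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \big[ r + \gamma V(s') \big]$$

In this grid world the policy is uniform, the dynamics are deterministic, $R = -1$, and $\gamma = 1$, so the update reduces to <br><br>

$$V'(s) \leftarrow \sum_{a \in \{a_u, a_d, a_l, a_r\}} \frac{1}{4} \big[ -1 + V(s'_a) \big]$$

where $s'_a$ (a label introduced here for illustration) is the state reached from $s$ by action $a$. <br><br><br>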



sequence of states | $s_1$ | $s_2$ | $...$ | $s_{14}$
--- | --- | --- | --- | ---
updated state value | <img src="https://drive.google.com/uc?id=1b77b3Ke3iHPWz48-DNTVqgp44y3tqSXD" alt="1-24" width="400"> | <img src="https://drive.google.com/uc?id=1aRY0MgtE4k_Taq1rQT3mG11LkJKoAfw2" alt="1-25" width="400"> | $\quad \text{...} \quad$ | <img src="https://drive.google.com/uc?id=1w7yfgDtSpWc4T1CEvzvsovJL3oQw-qnH" alt="1-26" width="400">

<br><br>


$s_1$ &nbsp; : &nbsp; the result is that $V'(s_1)$ is set to $-1$ <br>

  - $a_l$ <br>
  Consider the left action as one example. It has probability 1/4 under the uniform random policy. The dynamics function, $p( \; )$, is deterministic here, so only one reward $r$ and the value of one next state $s_1'$ contribute to the sum. The sum includes $-1$ for the reward, and $0$ for the value of the terminal state. <br>

  - $a_r, a_u, a_d$ <br>
  Since we initialized all state values to $0$ and the reward for each transition is $-1$, the computation for all the other actions will look much the same. <br><br>



$s_2$ &nbsp; : &nbsp; the result is that $V'(s_2)$ is set to $-1$ too <br>

  - $a_l$ <br>
  Evaluate the left action as one example. The action probability is 1/4 again. In this case, the next state is $s_1$. Although we have already updated the value of $s_1$, the version of the algorithm we are running will use the old value stored in $V$. So the value for $s_1$ in the update is still $0$. <br>

  - $a_r, a_u, a_d$ <br>
  Again, all the other actions will look much the same. <br><br><br>



... <br><br><br>



$s_{14}$ &nbsp; : &nbsp; the result is that $V'(s_{14})$ is set to $-1$ too <br><br>

  - Since every state value is initialized to $0$, <br>
  every state's value will be set to $-1$.



<br><br><br>
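The first sweep can be reproduced in a few lines. This is a sketch under the same assumptions as before (row-by-row state indices $0, \dots, 15$, off-grid moves leave the state unchanged); `sweep` is an illustrative name.

```python
import numpy as np

N = 4
TERMINALS = {0, 15}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(s, a):
    """Return (next_state, reward); off-grid moves keep the agent in place (assumption)."""
    row, col = divmod(s, N)
    new_row, new_col = row + MOVES[a][0], col + MOVES[a][1]
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col
    return new_row * N + new_col, -1.0

def sweep(V, gamma=1.0):
    """One full sweep: read old values from V, write new values into V_new."""
    V_new = np.zeros_like(V)                   # terminal entries stay at 0
    for s in range(N * N):
        if s in TERMINALS:
            continue
        for a in range(4):                     # sum over all four actions
            s_next, reward = step(s, a)
            V_new[s] += 0.25 * (reward + gamma * V[s_next])
    return V_new

V = np.zeros(N * N)                            # initialize every value to 0
V_new = sweep(V)
print(V_new.reshape(N, N))
```

Because every old value is $0$, each of the four terms contributes $\frac{1}{4} \cdot (-1)$, so every non-terminal entry of `V_new` comes out as $-1$. <br><br><br>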



#### __Copy__ the updated __state value__ <br><br>


<img src="https://drive.google.com/uc?id=1F8CtAmxlcUW4TQ6PV-CnF1sJ3H6Bv4Hs" alt="1-27" width="400">


<br><br>


After one sweep is done, copy all the updated state values from $V'$ into $V$ 


<br><br><br>


This has been only __one sweep__.


<br><br><br>







### Whole sweep iteration process <br><br><br>



#### The full algorithm for iterative policy evaluation <br>

Pseudo code for Iterative policy evaluation <br><br>


<img src="https://drive.google.com/uc?id=1dHc2-Y3o7tYjIG0aepOJVqF36nIUxcAr" alt="1-28">

<br><br>

$\text{input} \; \pi \qquad \qquad : \;$ Take any policy we want to evaluate <br><br>

$V \leftarrow \vec{0} \; , \; V' \leftarrow \vec{0} \; : \;$ Initialize two arrays $V$ and $V'$, and let's set them to $0$ <br><br>

$\text{Loop } : \;\; \text{~} \qquad \quad : \; $ repeat the single sweep of iterative policy evaluation for multiple sweeps <br><br>

$max(\Delta) \qquad \quad \;\;\; : \;$ the largest change between $V'$ and $V$ over all states in one sweep <br><br>

$\theta \qquad \qquad \qquad \quad : \;$ a small user-specified constant (the stopping threshold)


<br><br>


The outer loop continues until the change in the approximate value function becomes small. We track the largest update $max(\Delta)$ to the state values in a given iteration. <br>
The outer loop terminates when this maximum change is less than some user-specified constant called theta, $\theta$. $\quad \text{...} \quad \Delta < \theta$ <br><br>


Once the change in the approximate value function is very small, we are close to $v_{\pi}$. <br>
As discussed before, once the approximate value function stops changing, we have converged to $v_{\pi}$ <br><br>


$\Rightarrow \quad$ estimating (determining) the value function $v_{\pi}$ !




<br><br><br>
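The loop described above can be sketched in Python for the 4 x 4 grid world. This is not a transcription of the pseudo code in the figure; it is a minimal two-array version under the assumptions used earlier (row-by-row state indices, off-grid moves leave the state unchanged), and `policy_evaluation` is an illustrative name.

```python
import numpy as np

N = 4
TERMINALS = {0, 15}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(s, a):
    """Return (next_state, reward); off-grid moves keep the agent in place (assumption)."""
    row, col = divmod(s, N)
    new_row, new_col = row + MOVES[a][0], col + MOVES[a][1]
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col
    return new_row * N + new_col, -1.0

def policy_evaluation(policy, theta=0.001, gamma=1.0):
    """Two-array iterative policy evaluation; policy[s, a] is pi(a | s)."""
    V = np.zeros(N * N)
    while True:
        V_new = np.zeros_like(V)
        delta = 0.0                            # largest change seen in this sweep
        for s in range(N * N):
            if s in TERMINALS:
                continue
            for a in range(4):
                s_next, reward = step(s, a)
                V_new[s] += policy[s, a] * (reward + gamma * V[s_next])
            delta = max(delta, abs(V_new[s] - V[s]))
        V = V_new                              # copy the updated values into V
        if delta < theta:                      # stop once the sweep barely changed anything
            return V

uniform_policy = np.full((N * N, 4), 0.25)
V = policy_evaluation(uniform_policy, theta=0.001)
print(np.round(V, 1).reshape(N, N))
```

A smaller `theta` makes the loop run for more sweeps and leaves the result closer to $v_{\pi}$. <br><br><br>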




#### steps of iteration with the update to $V$ <br>

$\quad$ $V$ $\quad \leftarrow \quad$ $V'$, the values updated over one complete sweep <br><br><br>



step of sweeps | updated values of $V$
--- | ---
first sweep <br>$V_{1}$ | <img src="https://drive.google.com/uc?id=10UE0QFT8aHgrkP_pywjeiUIajdfQz8z2" alt="1-29" width="400"> 
second sweep <br>$V_{2}$ | <img src="https://drive.google.com/uc?id=10wWP2ndzXkUTqVo1tlfJA4eNR1gns6O9" alt="1-30" width="400"> 
third sweep <br>$V_{3}$ | <img src="https://drive.google.com/uc?id=1bWMI-aWg_rA1j0Hliq6wnJgA1QUwv1sE" alt="1-31" width="400"> 
fourth sweep <br>$V_{4}$ | <img src="https://drive.google.com/uc?id=1TXNC9Ia3tJURydDGR-48ZiA3XVW6_jrx" alt="1-32" width="400"> 
fifth sweep <br>$V_{5}$ | <img src="https://drive.google.com/uc?id=1SScQ1gTdt2WOCqT8n4H4qvf89ZY9A2lt" alt="1-33" width="400"> 
sixth sweep <br>$V_{6}$ | <img src="https://drive.google.com/uc?id=1_Jxn8wZXdMzDHZ8FimelgbcSHepoWrtH" alt="1-34" width="400"> 
$$\quad \text{...} \quad$$ | $$\begin{align} \quad \\ \text{...} \\ \quad \end{align}$$ 
final sweep <br>$V_{\pi}$ | <img src="https://drive.google.com/uc?id=1ZcDtCz1lX_Ae4UudwZ0sDF3ZKHLQHDsa" alt="1-35" width="400"> 

<br><br>


$\theta = 0.001 \qquad : \;$ stopping parameter. <br><br>


The smaller the value we choose, the more accurate our final value estimate will be. <br><br><br>




__first sweep__ <br>
After the first sweep is complete, the maximum change is $max(\Delta) = 1.0$. <br>
Since this is greater than 0.001 ($\theta$), we carry on to the next iteration. <br><br>



__second sweep__ <br>
After the second sweep, notice how the terminal state starts to influence the value of the nearest states first. <br><br>

__third sweep__ <br>
After one more sweep, we see that now the influence of the terminal state has propagated further. <br><br>


__&emsp; ...__ <br>
We can see how the value of each state is related to its proximity to the terminal state ! <br>
Let's keep running until our maximum $\Delta$ is less than $\theta$ <br><br>


__final sweep__ <br>
Here is the result we eventually arrive at. <br>
Our approximate value function has converged to the value function for the random policy ! 



<br><br><br><br><br>
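To watch the influence of the terminal states spread sweep by sweep, the values after a fixed number of sweeps can be printed. This is a sketch under the same assumptions as before (row-by-row indices, off-grid moves leave the state unchanged); `values_after` is an illustrative name.

```python
import numpy as np

N = 4
TERMINALS = {0, 15}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(s, a):
    """Return the next state; off-grid moves keep the agent in place (assumption)."""
    row, col = divmod(s, N)
    new_row, new_col = row + MOVES[a][0], col + MOVES[a][1]
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col
    return new_row * N + new_col

def values_after(k):
    """Values of the uniform random policy after k two-array sweeps, starting from 0."""
    V = np.zeros(N * N)
    for _ in range(k):
        V_new = np.zeros_like(V)
        for s in range(N * N):
            if s in TERMINALS:
                continue
            for a in range(4):
                V_new[s] += 0.25 * (-1.0 + V[step(s, a)])
        V = V_new
    return V

for k in (1, 2, 3):
    print(f"after sweep {k}:")
    print(np.round(values_after(k), 2).reshape(N, N))
```

After two sweeps only the states next to a terminal corner differ from their neighbours; after three, the difference has reached the states one step further away. <br><br><br>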





### Summary <br><br>


  - We can turn the __Bellman equation__ into an __update rule__, to __iteratively compute value functions__

<br><br>

>Soon, you'll see how these ideas can also be used for policy improvement


<br><br><br><br><br><br><br>








## __2. Policy Iteration (Control)__ <br><br>


  - Policy improvement <br><br>

  - Policy iteration


<br><br><br>


We just looked at how __dynamic programming__ can be used to __iteratively evaluate__ a __policy__. <br><br>

This is the first step towards the control task. <br>
The goal is to __improve__ a __policy__ ! 



<br><br><br><br><br>




## __$\cdot$ Policy improvement__ <br><br>


  - Understand the __policy improvement theorem__ <br>
  (and how it can be used to construct improved policies) <br><br>

  - Use a __value function__ for a policy to produce a __better policy__ for a given MDP


<br><br><br><br><br>






### __Greedification__ <br><br>


Choose the greedy action in each state <br>
with respect to the value function $v_{\pi}$, not the optimal value function $v_{*}$ <br>
( So the agent has been following an arbitrary policy $\pi$, not the optimal policy $\pi_{*}$ ) <br><br>


<img src="https://drive.google.com/uc?id=1krTcv3sKkuXubF4jjMHIavXHIcttsQFO" alt="1-36">



Previously, <br><br>

__given $v_{*}$ (optimal value function)__, we can find the optimal policy ($\pi_{*}$) __by choosing the Greedy action__. <br>
The greedy action __maximizes the Bellman optimality equation__ in each state. <br><br><br>



Imagine, <br><br>

instead of the optimal value function ($v_{*}$), <br>
we __select an action__ which is __greedy__ with respect to the __value function $v_{\pi}$__ of an __arbitrary policy $\pi$__. <br><br>

$\Rightarrow \quad$ __" Greedification "__ <br><br>


$\qquad$ What can we say about this new policy ? <br>
$\Rightarrow \quad$ It is greedy with respect to $v_{\pi}$. 

<br><br><br>



The first thing to note <br><br>

  - this new policy must be different from $\pi$, unless $\pi$ was already greedy <br><br>

  - If this greedification doesn't change $\pi$, <br>
  then $\pi$ was already greedy with respect to its own value function ($v_{\pi}$). <br>

  >This is just another way of saying that $v_{\pi}$ obeys the Bellman optimality equation. <br>
  >In which case, $\pi$ is already optimal !


<br><br><br>
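Greedification itself is a one-step lookahead followed by an argmax. The sketch below uses a hypothetical 3-state chain (not from the lecture): state 2 is terminal, action 0 moves left, action 1 moves right, and every move costs $-1$. The helper `model(s, a)`, which returns `(probability, next_state, reward)` triples, is an assumed interface introduced for illustration.

```python
import numpy as np

def greedify(V, model, n_actions, gamma=1.0):
    """For every state, pick the action maximizing sum_{s', r} p(s', r | s, a) [r + gamma V(s')]."""
    greedy = np.zeros(len(V), dtype=int)
    for s in range(len(V)):
        q = [sum(p * (r + gamma * V[s_next]) for p, s_next, r in model(s, a))
             for a in range(n_actions)]
        greedy[s] = int(np.argmax(q))          # ties go to the lowest action index
    return greedy

def chain_model(s, a):
    """Hypothetical chain 0 - 1 - 2 with terminal state 2; action 0 = left, 1 = right."""
    if s == 2:
        return [(1.0, 2, 0.0)]                 # terminal state: stay put, no reward
    s_next = max(s - 1, 0) if a == 0 else s + 1
    return [(1.0, s_next, -1.0)]

# Value function of the 50/50 random policy on this chain
# (it solves v0 = -1 + 0.5 v0 + 0.5 v1 and v1 = -1 + 0.5 v0).
v_pi = np.array([-6.0, -4.0, 0.0])

pi_prime = greedify(v_pi, chain_model, n_actions=2)
print(pi_prime)                                # greedy action index per state
```

In both non-terminal states the greedy action is "right", so the greedified policy differs from the random policy it was derived from. <br><br><br>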



### __Policy improvement theorem__ <br><br>


In fact, the __new policy obtained__ in this way must be a strict __improvement on $\pi$__, unless $\pi$ was already optimal. <br>
This is a consequence of a general result called the __policy improvement theorem__. <br><br><br>



<img src="https://drive.google.com/uc?id=1NzU8OPIAy5Ze8-5VtZ7W7I88pxsWauB-" alt="1-37">


Recall <br><br>

definition of $q_{\pi} \quad : \;$ Action value function <br>
$\qquad \qquad \qquad \quad \;$ the value of a state if you take action $a$ and then follow policy $\pi$. <br><br><br>


Imagine, <br><br>

we take action $a$ according to $\pi'$, and then follow policy $\pi$ <br><br><br>


  - Policy $\pi'$ is at least as good as $\pi$ <br>
  If, in each state, the value of the action selected by $\pi'$ is greater than or equal to the value of the action selected by $\pi$, then $\pi'$ is at least as good as $\pi$.<br><br>

  $\begin{align} &\rightarrow & q_{\pi}\big(s, \pi'(s)\big) &\geq q_{\pi}\big(s, \pi(s)\big) \quad \text{for all} \quad s \in \mathbb{S} \\ \\ &\Rightarrow & \text{then, } \quad \;\; \pi' &\geq \pi \end{align}$


<br><br><br>



  - Policy $\pi'$ is strictly better than $\pi$ if the value is strictly greater in __at least one state__. <br>
  That is, in each state the value of the action selected by $\pi'$ is greater than or equal to the value of the action selected by $\pi$, and in at least one state it is strictly greater.  <br><br>

  $\begin{align} &\rightarrow & q_{\pi}\big(s, \pi'(s)\big) &> q_{\pi}\big(s, \pi(s)\big) \quad \text{for at least one} \quad s \in \mathbb{S} \\ \\ &\Rightarrow & \text{then, } \quad \;\; \pi' &> \pi \end{align}$

<br><br><br>


The __policy improvement theorem__ formalizes this idea.


<br><br><br>
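The theorem can be checked numerically on the 4 x 4 grid world. The sketch below evaluates the uniform random policy $\pi$, builds the greedy policy $\pi'$, and compares the two value functions. Since $\pi$ is stochastic here, the check uses $q_{\pi}(s, \pi'(s)) \geq v_{\pi}(s)$ as the premise. Assumptions: row-by-row state indices and off-grid moves that leave the state unchanged; `evaluate` and `q_values` are illustrative names.

```python
import numpy as np

N = 4
TERMINALS = {0, 15}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(s, a):
    """Return (next_state, reward); off-grid moves keep the agent in place (assumption)."""
    row, col = divmod(s, N)
    new_row, new_col = row + MOVES[a][0], col + MOVES[a][1]
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col
    return new_row * N + new_col, -1.0

def evaluate(policy, theta=1e-8):
    """Two-array iterative policy evaluation with gamma = 1."""
    V = np.zeros(N * N)
    while True:
        V_new = np.zeros_like(V)
        for s in range(N * N):
            if s in TERMINALS:
                continue
            for a in range(4):
                s_next, reward = step(s, a)
                V_new[s] += policy[s, a] * (reward + V[s_next])
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:
            return V

def q_values(V):
    """q[s, a] = r + V(s') for the deterministic dynamics; terminal rows stay 0."""
    q = np.zeros((N * N, 4))
    for s in range(N * N):
        if s in TERMINALS:
            continue
        for a in range(4):
            s_next, reward = step(s, a)
            q[s, a] = reward + V[s_next]
    return q

pi = np.full((N * N, 4), 0.25)                 # uniform random policy
v_pi = evaluate(pi)
q_pi = q_values(v_pi)
greedy = q_pi.argmax(axis=1)                   # pi'(s) for every state
pi_prime = np.eye(4)[greedy]                   # one-hot rows: deterministic policy
v_pi_prime = evaluate(pi_prime)

premise = all(q_pi[s, greedy[s]] >= v_pi[s] - 1e-6 for s in range(N * N))
print("premise holds in every state:", premise)
print("v_pi' >= v_pi in every state:", bool(np.all(v_pi_prime >= v_pi - 1e-6)))
```

The small tolerance accounts for `v_pi` being an approximation that stops once the change falls below `theta`. <br><br><br>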






### Example <br><br>


" How does the policy improvement theorem work ? " <br>
(on the 4 by 4 grid ) <br><br><br>




#### Greedy $\pi$ policy <br>


<img src="https://drive.google.com/uc?id=1-t9LTQh8hV6gWdv3QFAhe9aXkIb88OGG" alt="1-38" width="500"> <br><br>


Here's the final value function we found before. <br>
This is the value function for the uniform random policy <br><br>




<img src="https://drive.google.com/uc?id=1Ry0lBXLOwzZBT17pPV-_Ne5j46JzwLhQ" alt="1-39" width="500">  <br><br>


Now, " what might the greedy $\pi$ policy look like ? " <br><br>

In each state, we need to select the action that leads to the next state with the highest value. <br>
(in this case, the highest value is the least negative one) <br><br>

The picture shows $\pi'$ <br>
( The greedy actions ) <br><br>

This is quite different from the uniform random policy we started with. <br>

>Note. <br>
>the values shown here do not correspond to the values for $\pi'$ ! (they are the values for $\pi$)

<br><br>

The new policy ($\pi'$) is guaranteed to be an improvement on the uniform random policy we started with according to the policy improvement theorem. <br><br>

If you look more closely at the new policy, we can see that it is in fact optimal ! <br>
In every state, the chosen actions lie on the shortest path to the terminal state ! <br><br>

>Note. <br>
>the value function we started with was not the optimal value function, <br>
>and yet the greedy policy with respect to $v_{\pi}$ is optimal ! 

<br><br>

More generally, <br>
the policy improvement theorem only guarantees that the new policy is an improvement on the original. <br>
We cannot always expect to find the optimal policy so easily. 



<br><br><br><br><br>
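The greedy actions for this example can be recomputed from the value function of the uniform random policy. The table in the sketch is the exact solution of the Bellman equation for that policy under the assumption that off-grid moves leave the state unchanged; whether the figure rounds the numbers the same way is not checked here. `next_cell` and `greedy_actions` are illustrative names.

```python
import numpy as np

N = 4
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right
ARROWS = "↑↓←→"

# Value function of the uniform random policy on the 4 x 4 grid world.
v_pi = np.array([[  0, -14, -20, -22],
                 [-14, -18, -20, -20],
                 [-20, -20, -18, -14],
                 [-22, -20, -14,   0]], dtype=float)

def is_terminal(row, col):
    return (row, col) in {(0, 0), (N - 1, N - 1)}

def next_cell(row, col, a):
    """Cell reached by action a; off-grid moves keep the agent in place (assumption)."""
    new_row, new_col = row + MOVES[a][0], col + MOVES[a][1]
    if 0 <= new_row < N and 0 <= new_col < N:
        return new_row, new_col
    return row, col

def greedy_actions(row, col):
    """All actions whose one-step lookahead value -1 + v_pi(next cell) is maximal."""
    q = [-1.0 + v_pi[next_cell(row, col, a)] for a in range(4)]
    return [a for a in range(4) if q[a] == max(q)]

for row in range(N):
    cells = []
    for col in range(N):
        if is_terminal(row, col):
            cells.append("T")
        else:
            cells.append("".join(ARROWS[a] for a in greedy_actions(row, col)))
    print("  ".join(f"{c:>2}" for c in cells))
```

Cells with two arrows are ties: both actions lead to equally valued next states, and either one moves a step closer to a terminal corner. <br><br><br>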





### Summary <br><br>

  - The Policy Improvement Theorem tells us that a greedified policy is a strict improvement, unless the original policy was already optimal <br><br>

  - Use the value function under a given policy to produce a strictly better policy 


<br><br>

>Next time : <br>
>How to use this result to create an iterative dynamic programming algorithm to find the optimal policy. 


<br><br><br><br><br>





## __$\cdot$ Policy iteration__ <br><br>


  - Outline the __policy iteration algorithm__ for finding the optimal policy <br><br>

  - Understand the __dance of policy and value__, <br>
  how policy iteration reaches the optimal policy by alternating between __evaluating a policy__ and __improving a policy__ <br><br>

  - Apply policy iteration to compute optimal policies and optimal value functions  <br><br><br>



>We just learned how the value function computed for a given policy can be used to find a better policy. <br>
>In this video, we will show how we can use this to find the optimal policy by iteratively evaluating and improving a sequence of policies. 


<br><br><br><br><br>



### Policy iteration algorithm <br><br>


<img src="https://drive.google.com/uc?id=1mNXJ2tUI5rI1UPTqJabUjfcG1BFdeFS6" alt="1-40" width="500"> 

Recall <br>
the Policy Improvement Theorem. <br><br>

It tells us that we can construct a strictly better policy by acting greedily with respect to the value function of a given policy (unless the given policy was already optimal). <br><br>
$\rightarrow \quad$ 
>Whether the given policy is the uniform distribution or any other policy, <br>
>if we can pick a new policy that selects only the highest of the values estimated through iterative evaluation, picking that policy pays off more !

<br>




iteration | picture
--- | ---
1st <br>Evaluation | <img src="https://drive.google.com/uc?id=18_ox9A7fKyJdkVfoHEO-iEJUolhwYk1E" alt="1-41" width="400">
2nd <br>Improvement | <img src="https://drive.google.com/uc?id=1oorqwBo_R4QOpQsBY1In8uxN69Lr0bOP" alt="1-42" width="400">
3rd <br>Evaluation | <img src="https://drive.google.com/uc?id=1hGoJ6kfyCa7Vftn4bQ1qFXiXXf5PFzSr" alt="1-43" width="400">
4th <br>Improvement | <img src="https://drive.google.com/uc?id=1ejLAaY312mxNMHbd0RYJnoUk6S-2jS1G" alt="1-44" width="400">
<br>$$...$$<br>&emsp; | <br>$$...$$<br>&emsp;
final <br>Iteration | <img src="https://drive.google.com/uc?id=1YoeNJQfPKQa2hSqU_He4BRY5puxpEtzh" alt="1-45" width="400">


Iteratively alternate between two steps on the policy <br><br>

policy evaluation step $\qquad \qquad$ (estimation by iterative policy evaluation) <br>
policy improvement step $\quad \qquad$ (greedification, justified by the policy improvement theorem) <br>
policy evaluation step <br>
policy improvement step <br>
$\qquad \qquad ...$ <br>

<br><br>

This gives us a sequence of better policies. <br>
Each policy is guaranteed to be an improvement on the last unless the last policy was already optimal. <br><br>

When we complete an iteration and the policy remains unchanged, we have found the optimal policy. <br>
At that point, we terminate the algorithm. 

<br><br>

iteration | convergence
--- | ---
<img src="https://drive.google.com/uc?id=1IC2qVpcIhQIva8m-r7TckuJGoZnhwnbm" alt="1-46" width="500"> | <img src="https://drive.google.com/uc?id=1O09fgUSLIP5obyxStbuuuDQPTe7x5p9Y" alt="1-47" width="500">

<br><br>


Each policy generated in this way is deterministic. <br>
There is a finite number of deterministic policies, <br>
so this iterative improvement must eventually reach an optimal policy. <br><br>


This method of finding an optimal policy is called policy iteration. <br><br><br>



>First, evaluate our current policy $\pi_1$, <br>
>which gives us a new value function that accurately reflects the value of $\pi_1$ <br><br>
>
>Then, improve the evaluated policy $\pi_1$ into $\pi_2$ using $v_{\pi_1}$, <br>
>at this point $\pi_2$ is greedy with respect to the value function of $\pi_1$ (but $v_{\pi_1}$ no longer reflects the value of $\pi_2$) <br><br>
>
>Again, evaluation makes our value function accurate with respect to the policy $\pi_2$. <br>
>Once we do this, our policy is once again not greedy ! <br><br>
>
>... <br><br>
>
>This dance of policy and value proceeds back and forth, <br>
>until we reach the only policy that is greedy with respect to its own value function (the optimal policy) <br><br><br>
>
>
>Only at this point, <br>
>the policy is greedy, and the value function is accurate. 

<br><br>

Policy iteration always makes progress towards the intersection <br>
by projecting first onto the line $v = v_{\pi}$ and then onto the line where $\pi$ is greedy with respect to $v$, $\pi = \text{greedy(v)}$. <br><br>

The real geometry of the space of policies and value functions is more complicated ! <br>
but the same intuition holds. <br>



<br><br><br>





### Pseudo code <br>


<img src="https://drive.google.com/uc?id=1T4lUersO3ElyXZsODeb_pQZhvg0mUrba" alt="1-48">

<br>

  1. We initialize $v$ and $\pi$ in any way we like for each state $s$. <br><br>

  2. Next, we do the iterative policy evaluation <br>
to make $v$ reflect the value of $\pi$. <br><br>

  3. Then, we do greedification. <br>
In each state, we set $\pi$ to select the maximizing action under the value function. <br><br>

  4. If this procedure changes the selected action in any state, we note that the policy is still changing, and set policy stable to false. <br><br>

  5. After completing step 3, we check whether the policy is stable. <br>
If not, we carry on and evaluate the new policy. <br>


<br><br><br>
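The five steps can be sketched for the earlier 4 x 4 grid world. This is not a transcription of the pseudo code in the figure. Two choices here are assumptions: the sketch starts from the uniform random policy (with $\gamma = 1$, an arbitrary deterministic starting policy could loop forever and its evaluation would never finish), and it rounds the lookahead values before the argmax so that ties are broken the same way every time. Off-grid moves leave the state unchanged, as before.

```python
import numpy as np

N = 4
TERMINALS = {0, 15}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(s, a):
    """Return (next_state, reward); off-grid moves keep the agent in place (assumption)."""
    row, col = divmod(s, N)
    new_row, new_col = row + MOVES[a][0], col + MOVES[a][1]
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col
    return new_row * N + new_col, -1.0

def evaluate(policy, theta=1e-8, gamma=1.0):
    """Step 2: iterative policy evaluation (two-array version)."""
    V = np.zeros(N * N)
    while True:
        V_new = np.zeros_like(V)
        for s in range(N * N):
            if s in TERMINALS:
                continue
            for a in range(4):
                s_next, reward = step(s, a)
                V_new[s] += policy[s, a] * (reward + gamma * V[s_next])
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:
            return V

def improve(V, gamma=1.0):
    """Step 3: greedification, returned as a one-hot (deterministic) policy."""
    new_policy = np.zeros((N * N, 4))
    for s in range(N * N):
        if s in TERMINALS:
            new_policy[s, 0] = 1.0             # action in a terminal state is irrelevant
            continue
        q = [reward + gamma * V[s_next] for s_next, reward in (step(s, a) for a in range(4))]
        new_policy[s, int(np.argmax(np.round(q, 6)))] = 1.0
    return new_policy

def policy_iteration():
    policy = np.full((N * N, 4), 0.25)         # step 1: initial policy (uniform random)
    iterations = 0
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        iterations += 1
        if np.array_equal(new_policy, policy): # steps 4 and 5: policy stable, so stop
            return policy, V, iterations
        policy = new_policy

policy, V, iterations = policy_iteration()
print("evaluation/improvement rounds:", iterations)
print(V.reshape(N, N))
```

When the loop stops, `V` is the value function of the returned policy, and greedifying it no longer changes any action. <br><br><br>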

### Example : Grid world <br><br>


Now look at how this works on a simple problem to build some intuition. <br><br>

>Previously, we showed that by evaluating the random policy and greedifying just once, we could find the optimal policy. That is not an interesting case for policy iteration. Let's modify this problem a little bit to make the control task a bit harder.

<br>

problem | first state
--- | ---
<img src="https://drive.google.com/uc?id=1fYcSbtm16piEdbNmq5qoF8Yyd4B3ffPN" alt="1-49" width="500"> | <img src="https://drive.google.com/uc?id=1SaOjwFkjWKo-BzmXjqWIJraWh0QMBrvB" alt="1-50" width="500">


<br>

Goal state $\quad$ : $(1, 1)$ <br><br>

Reward $\qquad$ : $R = \begin{cases} -10 &\text{Bad states} \\ -1 &\text{Other states} \end{cases}$ <br><br>

Discount $\quad \;\;$ : $\gamma = 1$ <br><br>

Policy $\qquad \;\;$ : follow uniform distribution


<br><br>

The optimal policy should follow the winding low-cost path in white to the terminal state. <br>
>This additional complexity means that policy iteration takes several iterations to discover the path.


<br><br><br>

Let's see how this plays out. <br><br>

&emsp; | evaluation | improvement
--- | --- | ---
step 1 | <img src="https://drive.google.com/uc?id=1c0zzx5xzuEsAlmkdAhU0MMdL9J0BvsDQ" alt="1-51" width="500"> | <img src="https://drive.google.com/uc?id=1cJlQRAl7v_uKgUexMmxb8rPHk6iaG_7a" alt="1-52" width="500"> 
step 2 | <img src="https://drive.google.com/uc?id=1GY3Odroqmzu4NMEQ05kodB7AWGDjj4Z8" alt="1-53" width="500"> | <img src="https://drive.google.com/uc?id=1ollJoe45jGXQ1EqKVIKPftwblzSvvqWO" alt="1-54" width="500">
step 3 | <img src="https://drive.google.com/uc?id=1M_SICgCOzp2BpNNQ7Ms0rjyWk1KPDvYJ" alt="1-55" width="500"> | <img src="https://drive.google.com/uc?id=122eP44ILDJ_0D7H_wShqqpHb2HqgtSQa" alt="1-56" width="500">
step 4 | <img src="https://drive.google.com/uc?id=1oOnBsVJBETV1emjkalZdelqcVTVMqFpW" alt="1-57" width="500"> | <img src="https://drive.google.com/uc?id=1LQPuM_C_1ps--k1S-nLXOBbK1Y5wsrmf" alt="1-58" width="500">
step 5 | <img src="https://drive.google.com/uc?id=16f4cjX3qt4kjyjIZphsEDZwEwxteEeTb" alt="1-59" width="500"> | <img src="https://drive.google.com/uc?id=13pIiPnrdlBczbnf0gL0LnHgVTVPuVcU2" alt="1-60" width="500">

<br>

~ 

<br><br><br>



### ~ <br><br>

<img src="https://drive.google.com/uc?id=18gBUeVlj8Hrk8MeCcqTuV0nTnBAvlOaL" alt="1-61" width="500">

<br><br>

~ 


<br><br><br><br><br>



