Update 02.PPO.ipynb #22

Merged
merged 1 commit on Apr 13, 2020
2 changes: 1 addition & 1 deletion 02.PPO.ipynb
@@ -40,7 +40,7 @@
"\n",
"There are two kinds of algorithms of PPO: PPO-Penalty and PPO-Clip. Here, we'll implement PPO-clip version.\n",
"\n",
"TRPO computes the gradients with a complex second-order method. On the other hand, PPO tries to solve the problem with a first-order methods that keep new polices close to old. To simplify the surrogate objective, let $r(\\theta)$ denote the probability ratio\n",
"TRPO computes the gradients with a complex second-order method. On the other hand, PPO tries to solve the problem with a first-order methods that keep new policies close to old. To simplify the surrogate objective, let $r(\\theta)$ denote the probability ratio\n",
"\n",
"$$ L^{CPI}(\\theta) = \\hat {\\mathbb{E}}_t \\left [ {\\pi_\\theta(a_t|s_t) \\over \\pi_{\\theta_{old}}(a_t|s_t)} \\hat A_t\\right] = \\hat {\\mathbb{E}}_t \\left [ r_t(\\theta) \\hat A_t \\right ].$$\n",
"\n",
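For reference, below is a minimal PyTorch sketch of the probability ratio and the clipped surrogate objective that the edited cell describes. It is not code from this PR or notebook: the names (`ppo_clip_loss`, `log_probs`, `old_log_probs`, `advantages`, `clip_eps`) are illustrative assumptions, and the $[1-\epsilon, 1+\epsilon]$ clipping follows the standard PPO-Clip formulation rather than anything shown in this hunk.

```python
import torch


def ppo_clip_loss(log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped PPO surrogate loss (negated so an optimizer can minimize it)."""
    # r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t),
    # computed in log space for numerical stability.
    ratio = torch.exp(log_probs - old_log_probs)

    # Unclipped surrogate: L^CPI = r_t(theta) * A_t.
    surrogate = ratio * advantages

    # PPO-Clip: clamp the ratio to [1 - eps, 1 + eps], take the elementwise
    # minimum with the unclipped term, and average over the batch.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(surrogate, clipped).mean()
```

Computing the ratio as `exp(log_probs - old_log_probs)` avoids dividing two potentially tiny probabilities, which is why most first-order implementations work with log-probabilities rather than raw probabilities.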