diff --git a/lectures/ak2.md b/lectures/ak2.md index e1a516725..cdcf78c0b 100644 --- a/lectures/ak2.md +++ b/lectures/ak2.md @@ -209,7 +209,7 @@ Units of the rental rates are: * for $r_t$, output at time $t$ per unit of capital at time $t$ -We take output at time $t$ as *numeraire*, so the price of output at time $t$ is one. +We take output at time $t$ as **numeraire**, so the price of output at time $t$ is one. The firm's profits at time $t$ are diff --git a/lectures/cake_eating_stochastic.md b/lectures/cake_eating_stochastic.md index ff1187f0b..246b69b54 100644 --- a/lectures/cake_eating_stochastic.md +++ b/lectures/cake_eating_stochastic.md @@ -164,13 +164,13 @@ In summary, the agent's aim is to select a path $c_0, c_1, c_2, \ldots$ for cons 1. nonnegative, 1. feasible in the sense of {eq}`outcsdp0`, 1. optimal, in the sense that it maximizes {eq}`texs0_og2` relative to all other feasible consumption sequences, and -1. *adapted*, in the sense that the action $c_t$ depends only on +1. **adapted**, in the sense that the action $c_t$ depends only on observable outcomes, not on future outcomes such as $\xi_{t+1}$. In the present context -* $x_t$ is called the *state* variable --- it summarizes the "state of the world" at the start of each period. -* $c_t$ is called the *control* variable --- a value chosen by the agent each period after observing the state. +* $x_t$ is called the **state** variable --- it summarizes the "state of the world" at the start of each period. +* $c_t$ is called the **control** variable --- a value chosen by the agent each period after observing the state. ### The Policy Function Approach diff --git a/lectures/cake_eating_time_iter.md b/lectures/cake_eating_time_iter.md index 21f30141f..9fe5d4ad9 100644 --- a/lectures/cake_eating_time_iter.md +++ b/lectures/cake_eating_time_iter.md @@ -237,7 +237,7 @@ whenever $\sigma \in \mathscr P$. It is possible to prove that there is a tight relationship between iterates of $K$ and iterates of the Bellman operator. -Mathematically, the two operators are *topologically conjugate*. +Mathematically, the two operators are **topologically conjugate**. Loosely speaking, this means that if iterates of one operator converge then so do iterates of the other, and vice versa. 
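A toy computation makes the conjugacy idea concrete. The maps below are made-up stand-ins (a scalar contraction $T$ and the homeomorphism $\varphi = \exp$), not the lecture's operators; the point is only that when $K = \varphi \circ T \circ \varphi^{-1}$, every iterate of $K$ is the image under $\varphi$ of the corresponding iterate of $T$, so the two sequences converge or diverge together.

```python
import numpy as np

# Made-up illustration of topological conjugacy: K = phi ∘ T ∘ phi⁻¹
T = lambda v: 0.5 * v + 1.0        # a contraction on R with fixed point 2.0
phi, phi_inv = np.exp, np.log      # a homeomorphism from R onto (0, ∞)
K = lambda u: phi(T(phi_inv(u)))   # the conjugate operator

v, u = 10.0, phi(10.0)             # start K at the image of T's start point
for _ in range(50):
    v, u = T(v), K(u)

print(v, phi_inv(u))               # both sequences settle at the fixed point 2.0
```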
diff --git a/lectures/career.md b/lectures/career.md
index 63cb10626..c8fed1268 100644
--- a/lectures/career.md
+++ b/lectures/career.md
@@ -66,8 +66,8 @@ from matplotlib import cm
 
 In what follows we distinguish between a career and a job, where
 
-* a *career* is understood to be a general field encompassing many possible jobs, and
-* a *job* is understood to be a position with a particular firm
+* a **career** is understood to be a general field encompassing many possible jobs, and
+* a **job** is understood to be a position with a particular firm
 
 For workers, wages can be decomposed into the contribution of job and career
 
diff --git a/lectures/cass_fiscal.md b/lectures/cass_fiscal.md
index fdf5c274d..b7e3646df 100644
--- a/lectures/cass_fiscal.md
+++ b/lectures/cass_fiscal.md
@@ -147,8 +147,8 @@ $$ (eq:gov_budget)
 
 Given a budget-feasible government policy $\{g_t\}_{t=0}^\infty$ and $\{\tau_{ct}, \tau_{kt}, \tau_{nt}, \tau_{ht}\}_{t=0}^\infty$ subject to {eq}`eq:gov_budget`,
 
-- *Household* chooses $\{c_t\}_{t=0}^\infty$, $\{n_t\}_{t=0}^\infty$, and $\{k_{t+1}\}_{t=0}^\infty$ to maximize utility{eq}`eq:utility` subject to budget constraint{eq}`eq:house_budget`, and
-- *Frim* chooses sequences of capital $\{k_t\}_{t=0}^\infty$ and $\{n_t\}_{t=0}^\infty$ to maximize profits
+- **Household** chooses $\{c_t\}_{t=0}^\infty$, $\{n_t\}_{t=0}^\infty$, and $\{k_{t+1}\}_{t=0}^\infty$ to maximize utility {eq}`eq:utility` subject to budget constraint {eq}`eq:house_budget`, and
+- **Firm** chooses sequences of capital $\{k_t\}_{t=0}^\infty$ and $\{n_t\}_{t=0}^\infty$ to maximize profits
 
 $$
 \sum_{t=0}^\infty q_t [F(k_t, n_t) - \eta_t k_t - w_t n_t]
 
diff --git a/lectures/kalman.md b/lectures/kalman.md
index a516a8eb2..fa089320f 100644
--- a/lectures/kalman.md
+++ b/lectures/kalman.md
@@ -85,7 +85,7 @@ One way to summarize our knowledge is a point prediction $\hat x$
 * Then it is better to summarize our initial beliefs with a bivariate probability density $p$
 * $\int_E p(x)dx$ indicates the probability that we attach to the missile being in region $E$.
 
-The density $p$ is called our *prior* for the random variable $x$.
+The density $p$ is called our **prior** for the random variable $x$.
 
 To keep things tractable in our example, we assume that our prior is Gaussian.
 
@@ -317,7 +317,7 @@ We have obtained probabilities for the current location of the state (missile) g
 
 This is called "filtering" rather than forecasting because we are filtering out noise rather than looking into the future.
 
-* $p(x \,|\, y) = N(\hat x^F, \Sigma^F)$ is called the *filtering distribution*
+* $p(x \,|\, y) = N(\hat x^F, \Sigma^F)$ is called the **filtering distribution**
 
 But now let's suppose that we are given another task: to predict the location of the missile after one unit of time (whatever that may be) has elapsed.
 
@@ -331,7 +331,7 @@ Let's suppose that we have one, and that it's linear and Gaussian. In particular
 x_{t+1} = A x_t + w_{t+1}, \quad \text{where} \quad w_t \sim N(0, Q)
 ```
 
-Our aim is to combine this law of motion and our current distribution $p(x \,|\, y) = N(\hat x^F, \Sigma^F)$ to come up with a new *predictive* distribution for the location in one unit of time.
+Our aim is to combine this law of motion and our current distribution $p(x \,|\, y) = N(\hat x^F, \Sigma^F)$ to come up with a new **predictive** distribution for the location in one unit of time.
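As a preview of the derivation that follows, the answer will be another Gaussian, with mean $A \hat x^F$ and covariance $A \Sigma^F A' + Q$. A minimal sketch of this predict step, with every number made up for illustration:

```python
import numpy as np

# Predict step sketch: push N(x_hat_F, Sigma_F) through
# x_{t+1} = A x_t + w_{t+1} with w ~ N(0, Q).
# All numerical values below are made-up placeholders.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
Q = 0.3 * np.eye(2)

x_hat_F = np.array([0.5, -0.2])            # filtering mean
Sigma_F = np.array([[0.4, 0.1],
                    [0.1, 0.3]])           # filtering covariance

x_hat_new = A @ x_hat_F                    # mean of A x^F + w
Sigma_new = A @ Sigma_F @ A.T + Q          # covariance of A x^F + w

print(x_hat_new)
print(Sigma_new)
```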
In view of {eq}`kl_xdynam`, all we have to do is introduce a random vector $x^F \sim N(\hat x^F, \Sigma^F)$ and work out the distribution of $A x^F + w$ where $w$ is independent of $x^F$ and has distribution $N(0, Q)$. @@ -356,7 +356,7 @@ $$ $$ The matrix $A \Sigma G' (G \Sigma G' + R)^{-1}$ is often written as -$K_{\Sigma}$ and called the *Kalman gain*. +$K_{\Sigma}$ and called the **Kalman gain**. * The subscript $\Sigma$ has been added to remind us that $K_{\Sigma}$ depends on $\Sigma$, but not $y$ or $\hat x$. @@ -373,7 +373,7 @@ Our updated prediction is the density $N(\hat x_{new}, \Sigma_{new})$ where \end{aligned} ``` -* The density $p_{new}(x) = N(\hat x_{new}, \Sigma_{new})$ is called the *predictive distribution* +* The density $p_{new}(x) = N(\hat x_{new}, \Sigma_{new})$ is called the **predictive distribution** The predictive distribution is the new density shown in the following figure, where the update has used parameters. diff --git a/lectures/likelihood_bayes.md b/lectures/likelihood_bayes.md index c9b203f11..da81a560f 100644 --- a/lectures/likelihood_bayes.md +++ b/lectures/likelihood_bayes.md @@ -129,8 +129,8 @@ $$ where we use the conventions that $f(w^t) = f(w_1) f(w_2) \ldots f(w_t)$ and $g(w^t) = g(w_1) g(w_2) \ldots g(w_t)$. -Notice that the likelihood process satisfies the *recursion* or -*multiplicative decomposition* +Notice that the likelihood process satisfies the **recursion** or +**multiplicative decomposition** $$ L(w^t) = \ell (w_t) L (w^{t-1}) . diff --git a/lectures/linear_algebra.md b/lectures/linear_algebra.md index 9e89b2a33..33f3dc53e 100644 --- a/lectures/linear_algebra.md +++ b/lectures/linear_algebra.md @@ -85,7 +85,7 @@ from scipy.linalg import inv, solve, det, eig ```{index} single: Linear Algebra; Vectors ``` -A *vector* of length $n$ is just a sequence (or array, or tuple) of $n$ numbers, which we write as $x = (x_1, \ldots, x_n)$ or $x = [x_1, \ldots, x_n]$. +A **vector** of length $n$ is just a sequence (or array, or tuple) of $n$ numbers, which we write as $x = (x_1, \ldots, x_n)$ or $x = [x_1, \ldots, x_n]$. We will write these sequences either horizontally or vertically as we please. @@ -225,15 +225,15 @@ x + y ```{index} single: Vectors; Norm ``` -The *inner product* of vectors $x,y \in \mathbb R ^n$ is defined as +The **inner product** of vectors $x,y \in \mathbb R ^n$ is defined as $$ x' y := \sum_{i=1}^n x_i y_i $$ -Two vectors are called *orthogonal* if their inner product is zero. +Two vectors are called **orthogonal** if their inner product is zero. -The *norm* of a vector $x$ represents its "length" (i.e., its distance from the zero vector) and is defined as +The **norm** of a vector $x$ represents its "length" (i.e., its distance from the zero vector) and is defined as $$ \| x \| := \sqrt{x' x} := \left( \sum_{i=1}^n x_i^2 \right)^{1/2} @@ -273,7 +273,7 @@ np.linalg.norm(x) # Norm of x, take three Given a set of vectors $A := \{a_1, \ldots, a_k\}$ in $\mathbb R ^n$, it's natural to think about the new vectors we can create by performing linear operations. -New vectors created in this manner are called *linear combinations* of $A$. +New vectors created in this manner are called **linear combinations** of $A$. In particular, $y \in \mathbb R ^n$ is a linear combination of $A := \{a_1, \ldots, a_k\}$ if @@ -282,9 +282,9 @@ y = \beta_1 a_1 + \cdots + \beta_k a_k \text{ for some scalars } \beta_1, \ldots, \beta_k $$ -In this context, the values $\beta_1, \ldots, \beta_k$ are called the *coefficients* of the linear combination. 
+In this context, the values $\beta_1, \ldots, \beta_k$ are called the **coefficients** of the linear combination.
 
-The set of linear combinations of $A$ is called the *span* of $A$.
+The set of linear combinations of $A$ is called the **span** of $A$.
 
 The next figure shows the span of $A = \{a_1, a_2\}$ in $\mathbb R ^3$.
 
@@ -349,7 +349,7 @@ plt.show()
 
 If $A$ contains only one vector $a_1 \in \mathbb R ^2$, then its span is just the scalar multiples of $a_1$, which is the unique line passing through both $a_1$ and the origin.
 
-If $A = \{e_1, e_2, e_3\}$ consists of the *canonical basis vectors* of $\mathbb R ^3$, that is
+If $A = \{e_1, e_2, e_3\}$ consists of the **canonical basis vectors** of $\mathbb R ^3$, that is
 
 $$
 e_1 :=
 
@@ -399,8 +399,8 @@ The condition we need for a set of vectors to have a large span is what's called
 
 In particular, a collection of vectors $A := \{a_1, \ldots, a_k\}$ in $\mathbb R ^n$ is said to be
 
-* *linearly dependent* if some strict subset of $A$ has the same span as $A$.
-* *linearly independent* if it is not linearly dependent.
+* **linearly dependent** if some strict subset of $A$ has the same span as $A$.
+* **linearly independent** if it is not linearly dependent.
 
 Put differently, a set of vectors is linearly independent if no vector is redundant to the span and linearly dependent otherwise.
 
@@ -469,19 +469,19 @@ Often, the numbers in the matrix represent coefficients in a system of linear eq
 
 For obvious reasons, the matrix $A$ is also called a vector if either $n = 1$ or $k = 1$.
 
-In the former case, $A$ is called a *row vector*, while in the latter it is called a *column vector*.
+In the former case, $A$ is called a **row vector**, while in the latter it is called a **column vector**.
 
-If $n = k$, then $A$ is called *square*.
+If $n = k$, then $A$ is called **square**.
 
-The matrix formed by replacing $a_{ij}$ by $a_{ji}$ for every $i$ and $j$ is called the *transpose* of $A$ and denoted $A'$ or $A^{\top}$.
+The matrix formed by replacing $a_{ij}$ by $a_{ji}$ for every $i$ and $j$ is called the **transpose** of $A$ and denoted $A'$ or $A^{\top}$.
 
-If $A = A'$, then $A$ is called *symmetric*.
+If $A = A'$, then $A$ is called **symmetric**.
 
-For a square matrix $A$, the $i$ elements of the form $a_{ii}$ for $i=1,\ldots,n$ are called the *principal diagonal*.
+For a square matrix $A$, the $n$ elements of the form $a_{ii}$ for $i=1,\ldots,n$ are called the **principal diagonal**.
 
-$A$ is called *diagonal* if the only nonzero entries are on the principal diagonal.
+$A$ is called **diagonal** if the only nonzero entries are on the principal diagonal.
 
-If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then $A$ is called the *identity matrix* and denoted by $I$.
+If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then $A$ is called the **identity matrix** and denoted by $I$.
 
 ### Matrix Operations
 
@@ -641,9 +641,9 @@ See [here](https://python-programming.quantecon.org/numpy.html#matrix-multiplica
 
 Each $n \times k$ matrix $A$ can be identified with a function $f(x) = Ax$ that maps $x \in \mathbb R ^k$ into $y = Ax \in \mathbb R ^n$.
 
-These kinds of functions have a special property: they are *linear*.
+These kinds of functions have a special property: they are **linear**.
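The defining property, restated formally just below, is easy to spot-check numerically; a quick sketch with made-up values of $A$, $x$, $y$, $\alpha$ and $\beta$:

```python
import numpy as np

# Spot-check of linearity, f(αx + βy) = αf(x) + βf(y), for f(x) = Ax.
# All values are arbitrary and made up.
A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0]])    # a 2 x 3 matrix, so f maps R^3 to R^2
f = lambda x: A @ x

x = np.array([1.0, -2.0, 0.5])
y = np.array([3.0, 0.0, 1.0])
alpha, beta = 2.5, -0.7

print(np.allclose(f(alpha * x + beta * y),
                  alpha * f(x) + beta * f(y)))   # True
```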
-A function $f \colon \mathbb R ^k \to \mathbb R ^n$ is called *linear* if, for all $x, y \in \mathbb R ^k$ and all scalars $\alpha, \beta$, we have +A function $f \colon \mathbb R ^k \to \mathbb R ^n$ is called **linear** if, for all $x, y \in \mathbb R ^k$ and all scalars $\alpha, \beta$, we have $$ f(\alpha x + \beta y) = \alpha f(x) + \beta f(y) @@ -773,7 +773,7 @@ In particular, the following are equivalent 1. The columns of $A$ are linearly independent. 1. For any $y \in \mathbb R ^n$, the equation $y = Ax$ has a unique solution. -The property of having linearly independent columns is sometimes expressed as having *full column rank*. +The property of having linearly independent columns is sometimes expressed as having **full column rank**. #### Inverse Matrices @@ -788,7 +788,7 @@ solution is $x = A^{-1} y$. A similar expression is available in the matrix case. In particular, if square matrix $A$ has full column rank, then it possesses a multiplicative -*inverse matrix* $A^{-1}$, with the property that $A A^{-1} = A^{-1} A = I$. +**inverse matrix** $A^{-1}$, with the property that $A A^{-1} = A^{-1} A = I$. As a consequence, if we pre-multiply both sides of $y = Ax$ by $A^{-1}$, we get $x = A^{-1} y$. @@ -800,11 +800,11 @@ This is the solution that we're looking for. ``` Another quick comment about square matrices is that to every such matrix we -assign a unique number called the *determinant* of the matrix --- you can find +assign a unique number called the **determinant** of the matrix --- you can find the expression for it [here](https://en.wikipedia.org/wiki/Determinant). If the determinant of $A$ is not zero, then we say that $A$ is -*nonsingular*. +**nonsingular**. Perhaps the most important fact about determinants is that $A$ is nonsingular if and only if $A$ is of full column rank. @@ -929,8 +929,8 @@ $$ A v = \lambda v $$ -then we say that $\lambda$ is an *eigenvalue* of $A$, and -$v$ is an *eigenvector*. +then we say that $\lambda$ is an **eigenvalue** of $A$, and +$v$ is an **eigenvector**. Thus, an eigenvector of $A$ is a vector such that when the map $f(x) = Ax$ is applied, $v$ is merely scaled. @@ -1034,7 +1034,7 @@ to one. ### Generalized Eigenvalues -It is sometimes useful to consider the *generalized eigenvalue problem*, which, for given +It is sometimes useful to consider the **generalized eigenvalue problem**, which, for given matrices $A$ and $B$, seeks generalized eigenvalues $\lambda$ and eigenvectors $v$ such that @@ -1076,10 +1076,10 @@ $$ $$ The norms on the right-hand side are ordinary vector norms, while the norm on -the left-hand side is a *matrix norm* --- in this case, the so-called -*spectral norm*. +the left-hand side is a **matrix norm** --- in this case, the so-called +**spectral norm**. -For example, for a square matrix $S$, the condition $\| S \| < 1$ means that $S$ is *contractive*, in the sense that it pulls all vectors towards the origin [^cfn]. +For example, for a square matrix $S$, the condition $\| S \| < 1$ means that $S$ is **contractive**, in the sense that it pulls all vectors towards the origin [^cfn]. (la_neumann)= #### {index}`Neumann's Theorem ` @@ -1112,7 +1112,7 @@ $$ \rho(A) = \lim_{k \to \infty} \| A^k \|^{1/k} $$ -Here $\rho(A)$ is the *spectral radius*, defined as $\max_i |\lambda_i|$, where $\{\lambda_i\}_i$ is the set of eigenvalues of $A$. +Here $\rho(A)$ is the **spectral radius**, defined as $\max_i |\lambda_i|$, where $\{\lambda_i\}_i$ is the set of eigenvalues of $A$. 
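A quick numerical check of Gelfand's formula, using a made-up matrix and the spectral norm; the $k$-th root estimates approach the spectral radius as $k$ grows:

```python
import numpy as np

# Gelfand's formula: rho(A) = lim_k ||A^k||^(1/k).  A is made up.
A = np.array([[0.6, 0.4],
              [0.2, 0.5]])

rho = max(abs(np.linalg.eigvals(A)))   # spectral radius, max_i |lambda_i|

for k in (1, 10, 100):
    # spectral norm of A^k, taken to the power 1/k
    approx = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1 / k)
    print(k, approx)

print("rho(A) =", rho)   # the estimates above approach this value
```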
As a consequence of Gelfand's formula, if all eigenvalues are strictly less than one in modulus, there exists a $k$ with $\| A^k \| < 1$. @@ -1128,8 +1128,8 @@ Let $A$ be a symmetric $n \times n$ matrix. We say that $A$ is -1. *positive definite* if $x' A x > 0$ for every $x \in \mathbb R ^n \setminus \{0\}$ -1. *positive semi-definite* or *nonnegative definite* if $x' A x \geq 0$ for every $x \in \mathbb R ^n$ +1. **positive definite** if $x' A x > 0$ for every $x \in \mathbb R ^n \setminus \{0\}$ +1. **positive semi-definite** or **nonnegative definite** if $x' A x \geq 0$ for every $x \in \mathbb R ^n$ Analogous definitions exist for negative definite and negative semi-definite matrices. diff --git a/lectures/linear_models.md b/lectures/linear_models.md index 43ae68038..feb76976a 100644 --- a/lectures/linear_models.md +++ b/lectures/linear_models.md @@ -112,7 +112,7 @@ The primitives of the model are Given $A, C, G$ and draws of $x_0$ and $w_1, w_2, \ldots$, the model {eq}`st_space_rep` pins down the values of the sequences $\{x_t\}$ and $\{y_t\}$. -Even without these draws, the primitives 1--3 pin down the *probability distributions* of $\{x_t\}$ and $\{y_t\}$. +Even without these draws, the primitives 1--3 pin down the **probability distributions** of $\{x_t\}$ and $\{y_t\}$. Later we'll see how to compute these distributions and their moments. @@ -259,7 +259,7 @@ C = \begin{bmatrix} \end{bmatrix} $$ -The matrix $A$ has the form of the *companion matrix* to the vector +The matrix $A$ has the form of the **companion matrix** to the vector $\begin{bmatrix}\phi_1 & \phi_2 & \phi_3 & \phi_4 \end{bmatrix}$. The next figure shows the dynamics of this process when @@ -301,7 +301,7 @@ Now suppose that * $\phi_j$ is a $k \times k$ matrix and * $w_t$ is $k \times 1$ -Then {eq}`eq_ar_rep` is termed a *vector autoregression*. +Then {eq}`eq_ar_rep` is termed a **vector autoregression**. To map this into {eq}`st_space_rep`, we set @@ -345,8 +345,8 @@ where $I$ is the $k \times k$ identity matrix and $\sigma$ is a $k \times k$ mat We can use {eq}`st_space_rep` to represent -1. the *deterministic seasonal* $y_t = y_{t-4}$ -1. the *indeterministic seasonal* $y_t = \phi_4 y_{t-4} + w_t$ +1. the **deterministic seasonal** $y_t = y_{t-4}$ +1. the **indeterministic seasonal** $y_t = \phi_4 y_{t-4} + w_t$ In fact, both are special cases of {eq}`eq_ar_rep`. @@ -376,7 +376,7 @@ The *indeterministic* seasonal produces recurrent, but aperiodic, seasonal fluct ```{index} single: Linear State Space Models; Time Trends ``` -The model $y_t = a t + b$ is known as a *linear time trend*. +The model $y_t = a t + b$ is known as a **linear time trend**. We can represent this model in the linear state space form by taking @@ -462,7 +462,7 @@ $x_0, w_1, w_2, \ldots, w_t$ can be found by using {eq}`st_space_rep` repeatedl \end{aligned} ``` -Representation {eq}`eqob5` is a *moving average* representation. +Representation {eq}`eqob5` is a **moving average** representation. It expresses $\{x_t\}$ as a linear function of @@ -503,7 +503,7 @@ The first term on the right is a cumulated sum of martingale differences and is The second term is a translated linear function of time. -For this reason, $x_{1t}$ is called a *martingale with drift*. +For this reason, $x_{1t}$ is called a **martingale with drift**. 
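A short simulation sketch: with a made-up pair $(A, C)$ in which $A$ has only unit eigenvalues and the second state component is the constant 1, the first component follows a martingale with drift:

```python
import numpy as np

# Simulate x_{t+1} = A x_t + C w_{t+1} for a made-up (A, C) pair.
# A has a (double) unit eigenvalue, so x_{1,t+1} = x_{1,t} + 0.5 + 0.2 w_{t+1}:
# a martingale with drift 0.5 per period.
rng = np.random.default_rng(42)

A = np.array([[1.0, 0.5],
              [0.0, 1.0]])    # second state is the constant 1
C = np.array([[0.2],
              [0.0]])

T = 250
x = np.zeros((2, T))
x[:, 0] = [0.0, 1.0]
for t in range(T - 1):
    w = rng.standard_normal((1,))
    x[:, t + 1] = A @ x[:, t] + C @ w

print(x[0, -1])   # wanders around the deterministic trend 0.5 * t
```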
## Distributions and Moments @@ -548,8 +548,8 @@ As with $\mu_0$, the matrix $\Sigma_0$ is a primitive given in {eq}`st_space_rep As a matter of terminology, we will sometimes call -* $\mu_t$ the *unconditional mean* of $x_t$ -* $\Sigma_t$ the *unconditional variance-covariance matrix* of $x_t$ +* $\mu_t$ the **unconditional mean** of $x_t$ +* $\Sigma_t$ the **unconditional variance-covariance matrix** of $x_t$ This is to distinguish $\mu_t$ and $\Sigma_t$ from related objects that use conditioning information, to be defined below. @@ -763,8 +763,8 @@ In the preceding figure, we approximated the population distribution of $y_T$ by 1. recording each observation $y^i_T$ 1. histogramming this sample -Just as the histogram approximates the population distribution, the *ensemble* or -*cross-sectional average* +Just as the histogram approximates the population distribution, the **ensemble** or +**cross-sectional average** $$ \bar y_T := \frac{1}{I} \sum_{i=1}^I y_T^i @@ -870,7 +870,7 @@ $$ #### Autocovariance Functions -An important object related to the joint distribution is the *autocovariance function* +An important object related to the joint distribution is the **autocovariance function** ```{math} :label: eqnautodeff @@ -958,11 +958,11 @@ the distribution at $T$. Apparently, the distributions of $y_t$ converge to a fixed long-run distribution as $t \to \infty$. -When such a distribution exists it is called a *stationary distribution*. +When such a distribution exists it is called a **stationary distribution**. ### Stationary Distributions -In our setting, a distribution $\psi_{\infty}$ is said to be *stationary* for $x_t$ if +In our setting, a distribution $\psi_{\infty}$ is said to be **stationary** for $x_t$ if $$ x_t \sim \psi_{\infty} @@ -1016,7 +1016,7 @@ Moreover, in view of {eq}`eqnautocov`, the autocovariance function takes the for This motivates the following definition. -A process $\{x_t\}$ is said to be *covariance stationary* if +A process $\{x_t\}$ is said to be **covariance stationary** if * both $\mu_t$ and $\Sigma_t$ are constant in $t$ * $\Sigma_{t+j,t}$ depends on the time gap $j$ but not on time $t$ @@ -1246,7 +1246,7 @@ $$ The right-hand side follows from $x_{t+1} = A x_t + C w_{t+1}$ and the fact that $w_{t+1}$ is zero mean and independent of $x_t, x_{t-1}, \ldots, x_0$. -That $\mathbb{E}_t [x_{t+1}] = \mathbb{E}[x_{t+1} \mid x_t]$ is an implication of $\{x_t\}$ having the *Markov property*. +That $\mathbb{E}_t [x_{t+1}] = \mathbb{E}[x_{t+1} \mid x_t]$ is an implication of $\{x_t\}$ having the **Markov property**. The one-step-ahead forecast error is @@ -1313,7 +1313,7 @@ $V_j$ defined in {eq}`eqob9a` can be calculated recursively via $V_1 = CC'$ and V_j = CC^\prime + A V_{j-1} A^\prime, \quad j \geq 2 ``` -$V_j$ is the *conditional covariance matrix* of the errors in forecasting +$V_j$ is the **conditional covariance matrix** of the errors in forecasting $x_{t+j}$, conditioned on time $t$ information $x_t$. Under particular conditions, $V_j$ converges to @@ -1324,7 +1324,7 @@ Under particular conditions, $V_j$ converges to V_\infty = CC' + A V_\infty A' ``` -Equation {eq}`eqob10` is an example of a *discrete Lyapunov* equation in the covariance matrix $V_\infty$. +Equation {eq}`eqob10` is an example of a **discrete Lyapunov** equation in the covariance matrix $V_\infty$. A sufficient condition for $V_j$ to converge is that the eigenvalues of $A$ be strictly less than one in modulus. 
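The convergence claim is easy to verify numerically: iterate $V_j = CC' + A V_{j-1} A'$ from $V_1 = CC'$ and compare the limit with a direct solution of {eq}`eqob10`. The $(A, C)$ pair below is made up, with the eigenvalues of $A$ strictly inside the unit circle:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Made-up stable pair (A, C); eigenvalues of A are 0.8 and 0.5.
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
C = np.array([[0.5],
              [0.3]])
Q = C @ C.T

V = np.zeros((2, 2))
for _ in range(500):          # V_j = C C' + A V_{j-1} A', starting from V_1 = CC'
    V = Q + A @ V @ A.T

# solve_discrete_lyapunov solves X = A X A' + Q directly
print(np.allclose(V, solve_discrete_lyapunov(A, Q)))   # True
```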
diff --git a/lectures/lln_clt.md b/lectures/lln_clt.md index 7aa6954ae..d2839bede 100644 --- a/lectures/lln_clt.md +++ b/lectures/lln_clt.md @@ -84,7 +84,7 @@ will converge to their population means. The classical law of large numbers concerns independent and identically distributed (IID) random variables. -Here is the strongest version of the classical LLN, known as *Kolmogorov's strong law*. +Here is the strongest version of the classical LLN, known as **Kolmogorov's strong law**. Let $X_1, \ldots, X_n$ be independent and identically distributed scalar random variables, with common distribution $F$. @@ -563,7 +563,7 @@ $$ \right) =: \boldsymbol \mu $$ -The *variance-covariance matrix* of random vector $\mathbf X$ is defined as +The **variance-covariance matrix** of random vector $\mathbf X$ is defined as $$ \mathop{\mathrm{Var}}[\mathbf X] diff --git a/lectures/markov_asset.md b/lectures/markov_asset.md index 1739dde0f..4421b37b4 100644 --- a/lectures/markov_asset.md +++ b/lectures/markov_asset.md @@ -264,7 +264,7 @@ $$ p_t = \frac{1 + \kappa}{ \rho - \kappa} d_t $$ -This is called the *Gordon formula*. +This is called the **Gordon formula**. (mass_mg)= ### Example 3: Markov Growth, Risk-Neutral Pricing @@ -473,7 +473,7 @@ where $u$ is a concave utility function and $c_t$ is time $t$ consumption of a r Assume the existence of an endowment that follows growth process {eq}`mass_fmce`. -The asset being priced is a claim on the endowment process, i.e., the *Lucas tree* described above. +The asset being priced is a claim on the endowment process, i.e., the **Lucas tree** described above. Following {cite}`Lucas1978`, we suppose that in equilibrium the representative consumer's consumption equals the aggregate endowment, so that $d_t = c_t$ for all $t$. @@ -748,7 +748,7 @@ We'll study an option that gives the owner the right to purchase a consol at a #### An Infinite Horizon Call Option -We want to price an *infinite horizon* option to purchase a consol at a price $p_S$. +We want to price an **infinite horizon** option to purchase a consol at a price $p_S$. The option entitles the owner at the beginning of a period either @@ -757,7 +757,7 @@ The option entitles the owner at the beginning of a period either Thus, the owner either *exercises* the option now or chooses *not to exercise* and wait until next period. -This is termed an infinite-horizon *call option* with *strike price* $p_S$. +This is termed an infinite-horizon **call option** with **strike price** $p_S$. The owner of the option is entitled to purchase the consol at price $p_S$ at the beginning of any period, after the coupon has been paid to the previous owner of the bond. diff --git a/lectures/markov_perf.md b/lectures/markov_perf.md index c7b99c7ad..923cd07af 100644 --- a/lectures/markov_perf.md +++ b/lectures/markov_perf.md @@ -140,7 +140,7 @@ v_i(q_i, q_{-i}) = \max_{\hat q_i} \left\{\pi_i (q_i, q_{-i}, \hat q_i) + \beta v_i(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} ``` -**Definition** A *Markov perfect equilibrium* of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state, +**Definition** A **Markov perfect equilibrium** of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state, * The value function $v_i$ satisfies Bellman equation {eq}`game4`. 
* The maximizer on the right side of {eq}`game4` equals $f_i(q_i, q_{-i})$. diff --git a/lectures/mle.md b/lectures/mle.md index 929eed27c..7a1942d42 100644 --- a/lectures/mle.md +++ b/lectures/mle.md @@ -183,7 +183,7 @@ In Treisman's paper, the dependent variable --- the number of billionaires $y_i$ Hence, the distribution of $y_i$ needs to be conditioned on the vector of explanatory variables $\mathbf{x}_i$. -The standard formulation --- the so-called *Poisson regression* model --- is as follows: +The standard formulation --- the so-called **Poisson regression** model --- is as follows: ```{math} :label: poissonreg @@ -861,7 +861,7 @@ f(y_i; \boldsymbol{\beta}) = \mu_i^{y_i} (1-\mu_i)^{1-y_i}, \quad y_i = 0,1 \\ \end{aligned} $$ -$\Phi$ represents the *cumulative normal distribution* and +$\Phi$ represents the **cumulative normal distribution** and constrains the predicted $y_i$ to be between 0 and 1 (as required for a probability). diff --git a/lectures/odu.md b/lectures/odu.md index 656b141db..c62519468 100644 --- a/lectures/odu.md +++ b/lectures/odu.md @@ -111,7 +111,7 @@ v(w) ``` The optimal policy has the form $\mathbf{1}\{w \geq \bar w\}$, where -$\bar w$ is a constant called the *reservation wage*. +$\bar w$ is a constant called the **reservation wage**. ### Offer Distribution Unknown @@ -545,7 +545,7 @@ and using $\circ$ for composition of functions yields Equation {eq}`odu_mvf4` can be understood as a functional equation, where $\bar w$ is the unknown function. -* Let's call it the *reservation wage functional equation* (RWFE). +* Let's call it the **reservation wage functional equation** (RWFE). * The solution $\bar w$ to the RWFE is the object that we wish to compute. ## Solving the RWFE diff --git a/lectures/ols.md b/lectures/ols.md index 22b5a0844..bb4cc68af 100644 --- a/lectures/ols.md +++ b/lectures/ols.md @@ -169,7 +169,7 @@ The most common technique to estimate the parameters ($\beta$'s) of the linear model is Ordinary Least Squares (OLS). As the name implies, an OLS model is solved by finding the parameters -that minimize *the sum of squared residuals*, i.e. +that minimize **the sum of squared residuals**, i.e. $$ \underset{\hat{\beta}}{\min} \sum^N_{i=1}{\hat{u}^2_i} diff --git a/lectures/rational_expectations.md b/lectures/rational_expectations.md index dbc6e628a..ec8be4bc8 100644 --- a/lectures/rational_expectations.md +++ b/lectures/rational_expectations.md @@ -309,7 +309,7 @@ Y_{t+1} = H(Y_t) where $Y_0$ is a known initial condition. -The *belief function* $H$ is an equilibrium object, and hence remains to be determined. +The **belief function** $H$ is an equilibrium object, and hence remains to be determined. #### Optimal Behavior Given Beliefs @@ -364,7 +364,7 @@ $$ v_y(y,Y) = a_0 - a_1 Y + \gamma (y' - y) $$ -Substituting this equation into {eq}`comp5` gives the *Euler equation* +Substituting this equation into {eq}`comp5` gives the **Euler equation** ```{math} :label: ree_comp7 @@ -377,7 +377,7 @@ The firm optimally sets an output path that satisfies {eq}`ree_comp7`, taking { * the initial conditions for $(y_0, Y_0)$. * the terminal condition $\lim_{t \rightarrow \infty } \beta^t y_t v_y(y_{t}, Y_t) = 0$. -This last condition is called the *transversality condition*, and acts as a first-order necessary condition "at infinity". +This last condition is called the **transversality condition**, and acts as a first-order necessary condition "at infinity". 
 A representative firm's decision rule solves the difference equation {eq}`ree_comp7` subject to the given initial condition $y_0$ and the transversality condition.
 
 Note that solving the Bellman equation {eq}`comp4` for $v$ and then $h$ in {eq}`ree_comp9` yields
 a decision rule that automatically imposes both the Euler equation {eq}`ree_comp7` and the transversality condition.
 
 ### The Actual Law of Motion for Output
 
 As we've seen, a given belief translates into a particular decision rule $h$.
 
-Recalling that in equilbrium $Y_t = y_t$, the *actual law of motion* for market-wide output is then
+Recalling that in equilibrium $Y_t = y_t$, the **actual law of motion** for market-wide output is then
 
 ```{math}
 :label: ree_comp9a
 
 Y_{t+1} = H(Y_t) = h(Y_t, Y_t)
 ```
 
 Thus, when firms believe that the law of motion for market-wide output is {eq}`r
 
 (ree_def)=
 ### Definition of Rational Expectations Equilibrium
 
-A *rational expectations equilibrium* or *recursive competitive equilibrium* of the model with adjustment costs is a decision rule $h$ and an aggregate law of motion $H$ such that
+A **rational expectations equilibrium** or **recursive competitive equilibrium** of the model with adjustment costs is a decision rule $h$ and an aggregate law of motion $H$ such that
 
 1. Given belief $H$, the map $h$ is the firm's optimal policy function.
 1. The law of motion $H$ satisfies $H(Y)= h(Y,Y)$ for all
 
@@ -469,7 +469,7 @@ s(Y_t, Y_{t+1})
 
 The first term is the area under the demand curve, while the second measures the social costs of changing output.
 
-The *planning problem* is to choose a production plan $\{Y_t\}$ to maximize
+The **planning problem** is to choose a production plan $\{Y_t\}$ to maximize
 
 $$
 \sum_{t=0}^\infty \beta^t s(Y_t, Y_{t+1})
 $$
 
diff --git a/lectures/re_with_feedback.md b/lectures/re_with_feedback.md
index 48a0aae94..da5a647dc 100644
--- a/lectures/re_with_feedback.md
+++ b/lectures/re_with_feedback.md
@@ -78,14 +78,14 @@ first-order and second-order linear difference equations.
 
 ## Linear Difference Equations
 
-We'll use the *backward shift* or *lag* operator $L$.
+We'll use the **backward shift** or **lag** operator $L$.
 
 The lag operator $L$ maps a sequence $\{x_t\}_{t=0}^\infty$ into the sequence $\{x_{t-1}\}_{t=0}^\infty$
 
 We'll deploy $L$ by using the equality $L x_t \equiv x_{t-1}$ in algebraic expressions.
 
-Further, the inverse $L^{-1}$ of the lag operator is the *forward shift*
+Further, the inverse $L^{-1}$ of the lag operator is the **forward shift**
 operator.
 
 We'll often use the equality $L^{-1} x_t \equiv x_{t+1}$ below.
 
@@ -345,7 +345,7 @@ F = (1-\lambda) G (I - \lambda A)^{-1}
 ```
 
 ```{note}
-As mentioned above, an *explosive solution* of difference
+As mentioned above, an **explosive solution** of difference
 equation {eq}`equation_1` can be constructed by adding to the right hand of {eq}`equation_4` a sequence $c \lambda^{-t}$ where $c$ is an arbitrary positive constant.
 
diff --git a/lectures/samuelson.md b/lectures/samuelson.md
index 44aaf3b91..8616328e9 100644
--- a/lectures/samuelson.md
+++ b/lectures/samuelson.md
@@ -86,7 +86,7 @@ equal amount of *aggregate supply*.
 
 Samuelson used the model to analyze how particular values of the marginal propensity to consume and the accelerator coefficient might
-give rise to transient *business cycles* in national output.
+give rise to transient **business cycles** in national output.
 
 Possible dynamic properties include
 
@@ -100,7 +100,7 @@ adds a random shock to the right side of the national income identity
 representing random fluctuations in aggregate demand.
 This modification makes national output become governed by a second-order
-*stochastic linear difference equation* that, with appropriate parameter values,
+**stochastic linear difference equation** that, with appropriate parameter values,
 gives rise to recurrent irregular business cycles.
 
 (To read about stochastic linear difference equations see chapter XI of
 
@@ -152,7 +152,7 @@ and the national income identity
 
 Y_t = C_t + I_t + G_t
 ```
 
-- The parameter $\alpha$ is peoples' *marginal propensity to consume*
+- The parameter $\alpha$ is people's **marginal propensity to consume**
   out of income - equation {eq}`consumption` asserts that people
   consume a fraction of $\alpha \in (0,1)$ of each additional dollar of income.
 - The parameter $\beta > 0$ is the investment accelerator coefficient - equation
 
@@ -193,7 +193,7 @@ a constant value as $t$ becomes large.
 
 We are interested in studying
 
 - the transient fluctuations in $Y_t$ as it converges to its
-  *steady state* level
+  **steady state** level
 - the *rate* at which it converges to a steady state level
 
 The deterministic version of the model described so far --- meaning that
 
@@ -235,7 +235,7 @@ Y_{t+2} - \rho_1 Y_{t+1} - \rho_2 Y_t = 0
 ```
 
 To discover the properties of the solution of {eq}`second_stochastic2`,
-it is useful first to form the *characteristic polynomial*
+it is useful first to form the **characteristic polynomial**
 for {eq}`second_stochastic2`:
 
 ```{math}
 :label: polynomial
 
 z^2 - \rho_1 z - \rho_2
 ```
 
 where $z$ is possibly a complex number.
 
@@ -246,7 +246,7 @@ z^2 - \rho_1 z - \rho_2
 
 where $z$ is possibly a complex number.
 
-We want to find the two *zeros* (a.k.a. *roots*) -- namely
+We want to find the two **zeros** (a.k.a. **roots**) -- namely
 $\lambda_1, \lambda_2$ -- of the characteristic polynomial.
 
 These are two special values of $z$, say $z= \lambda_1$ and
 
diff --git a/lectures/sir_model.md b/lectures/sir_model.md
index 24682939e..5b0c5305c 100644
--- a/lectures/sir_model.md
+++ b/lectures/sir_model.md
@@ -108,9 +108,9 @@ dynamics are
 
 In these equations,
 
-* $\beta(t)$ is called the *transmission rate* (the rate at which individuals bump into others and expose them to the virus).
-* $\sigma$ is called the *infection rate* (the rate at which those who are exposed become infected)
-* $\gamma$ is called the *recovery rate* (the rate at which infected people recover or die).
+* $\beta(t)$ is called the **transmission rate** (the rate at which individuals bump into others and expose them to the virus).
+* $\sigma$ is called the **infection rate** (the rate at which those who are exposed become infected).
+* $\gamma$ is called the **recovery rate** (the rate at which infected people recover or die).
 * the dot symbol $\dot y$ represents the time derivative $dy/dt$.
 
 We do not need to model the fraction $r$ of the population in state $R$ separately because the states form a partition.
 
@@ -141,7 +141,7 @@ As in Atkeson's note, we set
 
 The transmission rate is modeled as
 
-* $\beta(t) := R(t) \gamma$ where $R(t)$ is the *effective reproduction number* at time $t$.
+* $\beta(t) := R(t) \gamma$ where $R(t)$ is the **effective reproduction number** at time $t$.
 
 (The notation is slightly confusing, since $R(t)$ is different to $R$, the symbol that represents the removed state.)
 
diff --git a/lectures/uncertainty_traps.md b/lectures/uncertainty_traps.md
index 2cef53ace..52aa5b4dd 100644
--- a/lectures/uncertainty_traps.md
+++ b/lectures/uncertainty_traps.md
@@ -323,7 +323,7 @@ at once, for a given set of shocks
 
 Notice how the traps only take hold after a sequence of bad draws for the fundamental.
-Thus, the model gives us a *propagation mechanism* that maps bad random draws into long downturns in economic activity. +Thus, the model gives us a **propagation mechanism** that maps bad random draws into long downturns in economic activity. ## Exercises diff --git a/lectures/von_neumann_model.md b/lectures/von_neumann_model.md index b26016286..70c7afd9a 100644 --- a/lectures/von_neumann_model.md +++ b/lectures/von_neumann_model.md @@ -364,11 +364,11 @@ respectively. A pair $(A,B)$ of $m\times n$ non-negative matrices defines an economy. -- $m$ is the number of *activities* (or sectors) -- $n$ is the number of *goods* (produced and/or consumed). -- $A$ is called the *input matrix*; $a_{i,j}$ denotes the +- $m$ is the number of **activities** (or sectors) +- $n$ is the number of **goods** (produced and/or consumed) +- $A$ is called the **input matrix**; $a_{i,j}$ denotes the amount of good $j$ consumed by activity $i$ -- $B$ is called the *output matrix*; $b_{i,j}$ represents +- $B$ is called the **output matrix**; $b_{i,j}$ represents the amount of good $j$ produced by activity $i$ Two key assumptions restrict economy $(A,B)$: @@ -388,28 +388,28 @@ Two key assumptions restrict economy $(A,B)$: ``` ```` -A semi-positive *intensity* $m$-vector $x$ denotes levels at which +A semi-positive **intensity** $m$-vector $x$ denotes levels at which activities are operated. Therefore, -- vector $x^\top A$ gives the total amount of *goods used in - production* -- vector $x^\top B$ gives *total outputs* +- vector $x^\top A$ gives the total amount of **goods used in + production** +- vector $x^\top B$ gives **total outputs** -An economy $(A,B)$ is said to be *productive*, if there exists a +An economy $(A,B)$ is said to be **productive**, if there exists a non-negative intensity vector $x \geq 0$ such that $x^\top B > x^\top A$. The semi-positive $n$-vector $p$ contains prices assigned to the $n$ goods. -The $p$ vector implies *cost* and *revenue* vectors +The $p$ vector implies **cost** and **revenue** vectors -- the vector $Ap$ tells *costs* of the vector of activities -- the vector $Bp$ tells *revenues* from the vector of activities +- the vector $Ap$ tells **costs** of the vector of activities +- the vector $Bp$ tells **revenues** from the vector of activities -Satisfaction of a property of an input-output pair $(A,B)$ called *irreducibility* +Satisfaction of a property of an input-output pair $(A,B)$ called **irreducibility** (or indecomposability) determines whether an economy can be decomposed into multiple "sub-economies".