
### Solution

#### 1. Derive the Gradient Vector of the Log-Likelihood Function with Respect to $\beta$

##### The logistic regression probability model is given by:
$$
P(y = 1 | X) = \frac{1}{1 + \exp(-X \beta)}
$$
##### where $X$ is the $n \times p$ design matrix and $\beta$ is the $p \times 1$ vector of coefficients.

##### The log-likelihood function for logistic regression is:
$$
\log \mathcal{L}(\beta) = \sum_{i=1}^n \left[ y_i \log \left( \frac{1}{1 + \exp(-X_i \beta)} \right) + (1 - y_i) \log \left( 1 - \frac{1}{1 + \exp(-X_i \beta)} \right) \right]
$$
##### where $X_i$ is the $i$-th row of $X$, and $y_i$ is the $i$-th observed response.

##### To simplify this, rewrite the probability:
$$
P(y = 1 | X_i) = \frac{1}{1 + \exp(-X_i \beta)} = \sigma(X_i \beta)
$$
##### where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.

##### Thus, the log-likelihood can be rewritten as:
$$
\log \mathcal{L}(\beta) = \sum_{i=1}^n \left[ y_i \log(\sigma(X_i \beta)) + (1 - y_i) \log(1 - \sigma(X_i \beta)) \right]
$$
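##### As a quick illustration, the log-likelihood above can be evaluated directly. This is a minimal sketch in plain Python; the function names `sigmoid` and `log_likelihood` and the toy data are assumptions made for illustration, not part of the original problem.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z), branched to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(X, y, beta):
    """Sum over i of y_i*log(sigma(X_i beta)) + (1 - y_i)*log(1 - sigma(X_i beta))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))  # X_i beta
        p = sigmoid(z)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total

# Hypothetical toy data: the first column of X is an intercept term.
X = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]]
y = [1, 0, 1]

# With beta = 0 every predicted probability is 0.5, so the value is 3 * log(0.5).
print(log_likelihood(X, y, [0.0, 0.0]))
```

##### The sketch calls `math.log` on the probabilities directly, so it assumes they stay strictly between 0 and 1; for very large $|X_i \beta|$ a more careful formulation would be needed.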

##### Let’s take the derivative of $\log \mathcal{L}(\beta)$ with respect to $\beta$:

#### Step 1: Compute $\frac{\partial}{\partial \beta} \log \sigma(X_i \beta)$ and $\frac{\partial}{\partial \beta} \log(1 - \sigma(X_i \beta))$

##### i. The derivative of $\sigma(X_i \beta)$ with respect to $\beta$ is:
   $$
   \frac{\partial \sigma(X_i \beta)}{\partial \beta} = \sigma(X_i \beta)(1 - \sigma(X_i \beta)) X_i
   $$

##### ii. Using this and the chain rule, the derivative of $\log \sigma(X_i \beta)$ with respect to $\beta$ is:
   $$
   \frac{\partial}{\partial \beta} \log(\sigma(X_i \beta)) = \frac{1}{\sigma(X_i \beta)} \cdot \sigma(X_i \beta)(1 - \sigma(X_i \beta)) X_i = (1 - \sigma(X_i \beta)) X_i
   $$

##### iii. Similarly, the derivative of $\log(1 - \sigma(X_i \beta))$ with respect to $\beta$ is:
   $$
   \frac{\partial}{\partial \beta} \log(1 - \sigma(X_i \beta)) = \frac{-1}{1 - \sigma(X_i \beta)} \cdot \sigma(X_i \beta)(1 - \sigma(X_i \beta)) X_i = -\sigma(X_i \beta) X_i
   $$
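##### The three derivatives in Step 1 can be sanity-checked numerically in the scalar case, where $z$ stands in for $X_i \beta$ and the factor $X_i$ from the chain rule is omitted. The sketch below compares central finite differences with the closed forms; the helper `central_diff`, the step size, and the test point are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def central_diff(f, z, h=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

z = 0.7          # arbitrary test point standing in for X_i beta
s = sigmoid(z)

# Numerical derivatives of sigma, log(sigma) and log(1 - sigma) at z.
d_sigma = central_diff(sigmoid, z)
d_log_sigma = central_diff(lambda t: math.log(sigmoid(t)), z)
d_log_one_minus = central_diff(lambda t: math.log(1.0 - sigmoid(t)), z)

# Each difference should be close to zero if the closed forms are right.
print(abs(d_sigma - s * (1.0 - s)))       # sigma' = sigma * (1 - sigma)
print(abs(d_log_sigma - (1.0 - s)))       # (log sigma)' = 1 - sigma
print(abs(d_log_one_minus - (-s)))        # (log(1 - sigma))' = -sigma
```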

#### Step 2: Combine Terms

##### The derivative of the log-likelihood function with respect to $\beta$ is:
$$
\frac{\partial \log \mathcal{L}(\beta)}{\partial \beta} = \sum_{i=1}^n \left[ y_i (1 - \sigma(X_i \beta)) X_i - (1 - y_i) \sigma(X_i \beta) X_i \right]
$$

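##### Simplify by expanding the bracket; the terms $-y_i \sigma(X_i \beta)$ and $+y_i \sigma(X_i \beta)$ cancel, and $X_i$ factors out:
$$
= \sum_{i=1}^n \left[ y_i - y_i \sigma(X_i \beta) - \sigma(X_i \beta) + y_i \sigma(X_i \beta) \right] X_i = \sum_{i=1}^n \left[ y_i - \sigma(X_i \beta) \right] X_i
$$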

##### Therefore, the gradient vector of the log-likelihood function with respect to $\beta$ is:
$$
\frac{\partial \log \mathcal{L}(\beta)}{\partial \beta} = \sum_{i=1}^n \left( y_i - \sigma(X_i \beta) \right) X_i
$$

##### Or, in matrix form:
$$
\frac{\partial \log \mathcal{L}(\beta)}{\partial \beta} = X^T (y - \sigma(X \beta))
$$
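##### where $\sigma(X \beta)$ is the sigmoid applied element-wise, giving the $n \times 1$ vector of predicted probabilities for all observations, and $y$ is the $n \times 1$ vector of observed outcomes. Each row $X_i$ enters the sum as the column vector $X_i^T$, so the gradient is $p \times 1$, matching $\beta$.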
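##### The matrix form translates almost directly into code. The following is a minimal sketch in plain Python using lists of rows for $X$; the names `predict` and `gradient` and the toy data are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(X, beta):
    """Vector of predicted probabilities sigma(X beta), one entry per row of X."""
    return [sigmoid(sum(x * b for x, b in zip(row, beta))) for row in X]

def gradient(X, y, beta):
    """Gradient of the log-likelihood: entry j is sum_i (y_i - sigma(X_i beta)) * X_ij."""
    residuals = [y_i - p_i for y_i, p_i in zip(y, predict(X, beta))]
    return [sum(row[j] * r for row, r in zip(X, residuals))
            for j in range(len(beta))]

# Hypothetical toy data: the first column of X is an intercept term.
X = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]]
y = [1, 0, 1]

# At beta = 0 every probability is 0.5, so the residuals are (0.5, -0.5, 0.5).
grad_at_zero = gradient(X, y, [0.0, 0.0])
print(grad_at_zero)  # [0.5, 2.0]
```

##### The second entry can be checked by hand: $0.5 \cdot 0.5 + (-1.5)(-0.5) + 2.0 \cdot 0.5 = 2.0$.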

#### 2. Update Rule Using Gradient Descent

##### To maximize the log-likelihood function, we use gradient ascent (or equivalently, to minimize the negative log-likelihood, we use gradient descent).

##### The gradient ascent update rule for $\beta$ is:
$$
\beta^{(t+1)} = \beta^{(t)} + \alpha \nabla \log \mathcal{L}(\beta^{(t)})
$$
##### where $\alpha > 0$ is the learning rate, and $\nabla \log \mathcal{L}(\beta^{(t)})$ is the gradient of the log-likelihood function with respect to $\beta$, evaluated at the current iterate $\beta^{(t)}$.

##### Using the gradient derived in the previous section, the update rule becomes:
$$
\beta^{(t+1)} = \beta^{(t)} + \alpha X^T (y - \sigma(X \beta^{(t)}))
$$

#### Step-by-Step Update Process

##### i. Initialize $\beta$ (often with zeros or small random values).
##### ii. Compute the Predicted Probabilities: For each observation, calculate $\sigma(X_i \beta^{(t)})$.
##### iii. Compute the Gradient: Use $\nabla \log \mathcal{L}(\beta^{(t)}) = X^T (y - \sigma(X \beta^{(t)}))$.
##### iv. Update $\beta$: Update $\beta$ using the rule $\beta^{(t+1)} = \beta^{(t)} + \alpha X^T (y - \sigma(X \beta^{(t)}))$.
##### v. Repeat steps ii–iv until convergence, which is declared when the change in $\beta$ between iterations falls below a chosen tolerance, or stop after a set number of iterations.

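##### Because the log-likelihood is concave in $\beta$, each step with a sufficiently small learning rate increases it, so this iterative process moves $\beta$ toward the maximum-likelihood estimate (when one exists) and improves the model's estimate of the probability $P(y = 1 | X)$.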
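##### The steps above can be sketched as a short gradient ascent loop. This is an illustrative implementation in plain Python, not a definitive one: the function name `fit`, the default values of `alpha`, `tol` and `max_iter`, and the toy data are all assumptions. The data are chosen so that the classes overlap, since the maximum-likelihood estimate does not exist for perfectly separable data.

```python
import math

def sigmoid(z):
    """Logistic function, branched to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gradient(X, y, beta):
    """X^T (y - sigma(X beta)) computed entry by entry."""
    residuals = [y_i - sigmoid(sum(x * b for x, b in zip(row, beta)))
                 for row, y_i in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, residuals))
            for j in range(len(beta))]

def fit(X, y, alpha=0.1, tol=1e-6, max_iter=10000):
    beta = [0.0] * len(X[0])                       # step i: initialize with zeros
    for _ in range(max_iter):
        grad = gradient(X, y, beta)                # steps ii-iii: probabilities, gradient
        new_beta = [b + alpha * g for b, g in zip(beta, grad)]  # step iv: ascent update
        change = max(abs(nb - b) for nb, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:                           # step v: stop when beta barely moves
            break
    return beta

# Hypothetical, non-separable toy data: intercept column plus one feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]

beta_hat = fit(X, y)
print(beta_hat)                    # fitted intercept and slope
print(gradient(X, y, beta_hat))    # close to zero at the maximum
```

##### A fixed learning rate is the simplest choice but has to be small enough for the iteration to be stable; if the updates oscillate or diverge, reducing `alpha` is the usual first remedy.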



