# Exercise 01: Signal Processing in a PET System

Consider a positron-emission tomography (PET) system that detects two **511 keV** gamma rays produced by positron–electron annihilation.

### 1.1. Positron Source
**Propose a reasonable positron-emission flux (number of positrons emitted per second) from a typical PET radionuclide. Assume the resulting gamma flux enters a liquid scintillator detector.**
### 1.2. Scintillation Process
**A) Discuss at least two different scintillation materials that could be used in such a detector. For each material, describe its approximate light yield and decay time.**

**B) Estimate the total number of scintillation photons produced for a 511 keV gamma-ray interaction and the approximate time width of their emission.**
### 1.3. Detection Chain
Assume the following detector parameters:
- Light collection efficiency: 60%
- Photocathode quantum efficiency: 20%
- Photomultiplier tube (PMT) total gain: $10^{6}$

**A) Calculate the number of electrons reaching the first dynode and the number of electrons at the anode for one event.**

**B) If the signal is read out through a $50 \Omega$ load resistor, compute the approximate voltage pulse amplitude produced by the event.**
### 1.4. Statistical Energy Resolution
**Assuming the number of photoelectrons follows Poisson statistics, determine the fractional energy resolution ($\frac{\sigma_E}{E} $) of the detector output.**
### 1.5. Noise Discussion
**A) Identify and discuss the possible sources of noise in this detection system, such as statistical fluctuations, gain variations, and electronic or thermal noise.**

**B) Compare the statistical fluctuation computed above with the expected thermal noise level in the electronics, and comment on which contribution dominates.**


#### Your answer:

### 1.1. Positron Source
A reasonable positron-emission flux from a typical PET radionuclide is approximately **3.7 × 10⁸ positrons per second**.

This value is derived from the **activity** of the standard radiopharmaceutical dose administered to a patient.

- **Radionuclide and Dose:** The most common PET radiotracer is **Fluorine-18 (¹⁸F)**, often as part of Fluorodeoxyglucose (FDG). A typical clinical dose for an adult has an activity in the range of **370 to 740 Megabecquerels (MBq)**.

- **Activity and Flux:** Activity is measured in becquerels (Bq), where 1 Bq equals one nuclear decay per second. For ¹⁸F, about 97% of decays emit a positron (the remainder proceed by electron capture), so to a good approximation the activity in Bq equals the positron emission rate.

- **Calculation:** Using a common dose of 370 MBq:  
  - Activity = 370 MBq = 370 × 10⁶ Bq  
  - Positron Flux = 3.7 × 10⁸ decays/second = **3.7 × 10⁸ positrons/second**

- **Gamma Ray Production:** Since each positron-electron annihilation produces **two** 511 keV gamma photons, the total rate of gamma production is twice the positron flux, or **7.4 × 10⁸ photons per second**. These photons are emitted isotropically (in all directions). The actual number of photons detected by the scanner is much lower due to factors like geometric efficiency and attenuation in the body.
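
The conversion from administered activity to emission rates can be sketched in a few lines. This is a minimal illustration that assumes, as the estimate above does, one positron per decay and two photons per annihilation; the function name `emission_rates` is hypothetical.

```python
def emission_rates(activity_mbq):
    """Return (positrons/s, annihilation photons/s) for an activity in MBq.

    Assumes one positron per decay and two 511 keV photons per annihilation.
    """
    decays_per_s = activity_mbq * 1e6  # 1 Bq = 1 decay per second
    return decays_per_s, 2.0 * decays_per_s


positron_rate, photon_rate = emission_rates(370.0)
print(f"{positron_rate:.2e} positrons/s, {photon_rate:.2e} photons/s")
```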

### 1.2.A
For a PET detector, two common scintillation materials are Bismuth Germanate (BGO) and Lutetium-Yttrium Oxyorthosilicate (LYSO). Their key properties are compared below.

| Material | Light Yield (photons/MeV) | Decay Time (nanoseconds) | Key Feature |
|----------|----------------------------|---------------------------|-------------|
| BGO      | ~8,500 (Low)               | ~300 (Slow)               | High stopping power and low cost. |
| LYSO     | ~30,000 (High)             | ~40 (Fast)                | High light output and fast timing for TOF PET. |

Bismuth Germanate (BGO) is an older, cost-effective material with excellent density for stopping gamma rays. However, its low light yield and slow decay time limit scanner speed and timing precision.

Lutetium-Yttrium Oxyorthosilicate (LYSO) is the modern standard for high-performance PET scanners. Its high light yield and fast decay time allow for superior energy and timing resolution, which is essential for Time-of-Flight (TOF) PET imaging. Its main drawback is higher cost.

### 1.2.B

To estimate the total number of scintillation photons and the time width of their emission for a 511 keV gamma-ray interaction, we use the light yield and decay time properties of the two previously discussed materials: BGO and LYSO.

The energy of the gamma-ray is **0.511 MeV**.

## Calculations for Each Material

### 1. Bismuth Germanate (BGO)

**Total Number of Scintillation Photons:**  
The calculation is the product of the gamma-ray energy and the material's light yield.

$ \text{Photons} = \text{Energy (MeV)} \times \text{Light Yield (photons/MeV)} $

$ \text{Photons} = 0.511 \, \text{MeV} \times 8{,}500 \, \text{photons/MeV} \approx 4{,}340 \, \text{photons} $

So, a 511 keV event in BGO produces approximately **4,340 scintillation photons**.

**Approximate Time Width of Emission:**  
The emission of these photons follows an exponential decay governed by the material's decay time, which for BGO is **300 ns**. Emission starts almost instantly; for a single-exponential decay, about 63% of the light is emitted within one decay time and about 95% within three. The time width during which most photons are released is therefore **several hundred nanoseconds** (roughly 300 to 900 ns).

---

### 2. Lutetium-Yttrium Oxyorthosilicate (LYSO)

**Total Number of Scintillation Photons:**  
Using the same formula with LYSO's higher light yield:

$ \text{Photons} = 0.511 \, \text{MeV} \times 30{,}000 \, \text{photons/MeV} \approx 15{,}330 \, \text{photons} $

A 511 keV event in LYSO produces approximately **15,330 scintillation photons**, about 3.5 times as many as in BGO, which gives a larger signal with smaller relative statistical fluctuations.

**Approximate Time Width of Emission:**  
The decay time of LYSO is much shorter, about **40 ns**, so the flash of light is briefer and more intense. The time width of the emission pulse is therefore on the order of **tens of nanoseconds** (about 95% of the light within roughly 120 ns), allowing the detector to be ready for the next event much sooner than a BGO-based detector.
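
The photon yields and emission widths for both materials can be checked with a short sketch. It assumes a single-exponential decay, for which the fraction of light emitted by time $t$ is $1 - e^{-t/\tau}$; the dictionary `MATERIALS` and the helper names are illustrative, with values taken from the table in 1.2.A.

```python
import math

E_GAMMA_MEV = 0.511
# (light yield in photons/MeV, decay time in ns), values from the table in 1.2.A
MATERIALS = {"BGO": (8500.0, 300.0), "LYSO": (30000.0, 40.0)}


def photon_yield(light_yield_per_mev, energy_mev=E_GAMMA_MEV):
    """Mean number of scintillation photons for a fully absorbed gamma ray."""
    return light_yield_per_mev * energy_mev


def fraction_emitted(t_ns, tau_ns):
    """Fraction of the light emitted by time t for a single-exponential decay."""
    return 1.0 - math.exp(-t_ns / tau_ns)


for name, (light_yield, tau) in MATERIALS.items():
    print(f"{name}: {photon_yield(light_yield):.0f} photons, "
          f"{fraction_emitted(tau, tau):.0%} within tau, "
          f"{fraction_emitted(3 * tau, tau):.0%} within 3 tau ({3 * tau:.0f} ns)")
```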

### 1.3
**Initial Photon Counts (from previous section):**

- BGO: ~4,340 scintillation photons  
- LYSO: ~15,330 scintillation photons  

**Given Detector Parameters:**

- Light Collection Efficiency (LCE) = 60% or 0.60  
- Photocathode Quantum Efficiency (QE) = 20% or 0.20  
- Photomultiplier Tube (PMT) Gain ($G$) = $10^6$  
- Load Resistor ($R$) = 50 Ω  
- Elementary Charge ($e$) ≈ $1.602 \times 10^{-19}$ C  

---

### A) Number of Electrons at First Dynode and Anode

#### 1. For a BGO Scintillator

**Number of Electrons at the First Dynode (Photoelectrons):**  
These are the electrons produced by the photocathode.

$ \text{Photoelectrons} = (\text{Scintillation Photons}) \times (\text{LCE}) \times (\text{QE}) $

$ \text{Photoelectrons} = 4{,}340 \times 0.60 \times 0.20 \approx 521 \, \text{electrons} $

**Number of Electrons at the Anode:**  
This is the number of photoelectrons multiplied by the PMT gain.

$ \text{Anode Electrons} = (\text{Photoelectrons}) \times (\text{Gain}) $

$ \text{Anode Electrons} = 521 \times 10^6 = 5.21 \times 10^8 \, \text{electrons} $

---

#### 2. For an LYSO Scintillator

**Number of Electrons at the First Dynode (Photoelectrons):**

$ \text{Photoelectrons} = 15{,}330 \times 0.60 \times 0.20 \approx 1{,}840 \, \text{electrons} $

**Number of Electrons at the Anode:**

$ \text{Anode Electrons} = 1{,}840 \times 10^6 = 1.84 \times 10^9 \, \text{electrons} $
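
The two-step chain (photons to photoelectrons, photoelectrons to anode electrons) can be written as a small function. This is a sketch using the given detector parameters; the name `detection_chain` is hypothetical, and the inputs are the photon counts from 1.2.B.

```python
LCE = 0.60       # light collection efficiency (given)
QE = 0.20        # photocathode quantum efficiency (given)
PMT_GAIN = 1e6   # total PMT gain (given)


def detection_chain(n_scint_photons):
    """Return (photoelectrons at the first dynode, electrons at the anode)."""
    photoelectrons = n_scint_photons * LCE * QE
    anode_electrons = photoelectrons * PMT_GAIN
    return photoelectrons, anode_electrons


for name, n_photons in (("BGO", 4340.0), ("LYSO", 15330.0)):
    n_pe, n_anode = detection_chain(n_photons)
    print(f"{name}: {n_pe:.0f} photoelectrons, {n_anode:.2e} anode electrons")
```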

---

### B) Approximate Voltage Pulse Amplitude

The voltage amplitude is calculated using Ohm’s Law ($V = IR$), where the current ($I$) is the total charge at the anode ($Q$) divided by the time width of the signal ($t$), approximated by the scintillator’s decay time.

#### 1. For a BGO Scintillator (Decay Time ≈ 300 ns)

**Total Charge at Anode ($Q$):**

$ Q = (\text{Anode Electrons}) \times e = (5.21 \times 10^8) \times (1.602 \times 10^{-19} \, \text{C}) \approx 8.35 \times 10^{-11} \, \text{C} $

**Approximate Peak Current ($I$):**

$ I = \frac{Q}{t} = \frac{8.35 \times 10^{-11} \, \text{C}}{300 \times 10^{-9} \, \text{s}} \approx 0.278 \, \text{mA} $

**Approximate Voltage Amplitude ($V$):**

$ V = I \times R = (0.278 \times 10^{-3} \, \text{A}) \times 50 \, \Omega \approx 13.9 \, \text{mV} $

---

#### 2. For an LYSO Scintillator (Decay Time ≈ 40 ns)

**Total Charge at Anode ($Q$):**

$ Q = (1.84 \times 10^9) \times (1.602 \times 10^{-19} \, \text{C}) \approx 2.95 \times 10^{-10} \, \text{C} $

**Approximate Peak Current ($I$):**

$ I = \frac{Q}{t} = \frac{2.95 \times 10^{-10} \, \text{C}}{40 \times 10^{-9} \, \text{s}} \approx 7.37 \, \text{mA} $

**Approximate Voltage Amplitude ($V$):**

$ V = I \times R = (7.37 \times 10^{-3} \, \text{A}) \times 50 \, \Omega \approx 368.5 \, \text{mV} $
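
The amplitude estimate can be reproduced with the sketch below. It follows the same approximation as the text, $I \approx Q/\tau$; for an ideal exponential current pulse $i(t) = (Q/\tau)\,e^{-t/\tau}$ this is the current at $t = 0$, assuming the readout is fast compared with the decay time. The function name `pulse_amplitude` is illustrative.

```python
E_CHARGE = 1.602e-19  # elementary charge in coulombs
R_LOAD = 50.0         # load resistor in ohms (given)


def pulse_amplitude(anode_electrons, decay_time_s, r_load=R_LOAD):
    """Return (charge in C, peak current in A, voltage in V) using I = Q / tau."""
    charge = anode_electrons * E_CHARGE
    peak_current = charge / decay_time_s
    return charge, peak_current, peak_current * r_load


for name, n_anode, tau in (("BGO", 5.21e8, 300e-9), ("LYSO", 1.84e9, 40e-9)):
    q, i, v = pulse_amplitude(n_anode, tau)
    print(f"{name}: Q = {q:.2e} C, I = {i * 1e3:.3f} mA, V = {v * 1e3:.1f} mV")
```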

### 1.4

The statistical energy resolution of a detector quantifies its ability to distinguish between gamma rays with very similar energies. A lower value indicates better performance. Assuming the number of generated photoelectrons ($N$) follows Poisson statistics, the fractional energy resolution is determined by the statistical fluctuation in $N$.

For a Poisson process, the standard deviation is the square root of the mean number of events ($\sigma_N = \sqrt{N}$). Since the measured energy ($E$) is directly proportional to the number of photoelectrons, the fractional energy resolution is given by:

$ \frac{\sigma_E}{E} = \frac{1}{\sqrt{N}} $

Using the number of photoelectrons calculated in the previous section (1.3.A), we can determine the energy resolution for both BGO and LYSO detectors.

**Photoelectron Counts (from previous section):**

- $N_{\text{BGO}} \approx 521$  
- $N_{\text{LYSO}} \approx 1{,}840$

---

### 1. Energy Resolution for BGO

$ \left( \frac{\sigma_E}{E} \right)_{\text{BGO}} = \frac{1}{\sqrt{521}} \approx 0.0438 $

Expressed as a percentage, the energy resolution is **4.38%**.

---

### 2. Energy Resolution for LYSO

$ \left( \frac{\sigma_E}{E} \right)_{\text{LYSO}} = \frac{1}{\sqrt{1840}} \approx 0.0233 $

Expressed as a percentage, the energy resolution is **2.33%**.
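
The Poisson-limited resolution is easy to evaluate numerically. The sketch below also converts $\sigma_E/E$ to a full width at half maximum using the Gaussian factor $2\sqrt{2\ln 2} \approx 2.355$; the FWHM conversion is an addition for convenience and is not part of the answer above. The function name is illustrative.

```python
import math

# Gaussian conversion from standard deviation to full width at half maximum
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def poisson_resolution(n_photoelectrons):
    """Fractional resolution sigma_E / E for Poisson-distributed photoelectrons."""
    return 1.0 / math.sqrt(n_photoelectrons)


for name, n_pe in (("BGO", 521), ("LYSO", 1840)):
    sigma = poisson_resolution(n_pe)
    print(f"{name}: sigma/E = {sigma:.2%}, FWHM/E = {FWHM_FACTOR * sigma:.2%}")
```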

### 1.5 
## A) Noise Sources

The main sources of noise in a PET detection system are:

- **Statistical Fluctuations (Shot Noise):** The dominant noise source, arising from the random, discrete nature of photon and electron generation in the scintillator and PMT. It fundamentally limits the detector's precision.

- **Gain Variations:** Electron multiplication at each dynode is itself a statistical process, so the PMT gain fluctuates from event to event and broadens the pulse-height distribution beyond the pure Poisson limit.

- **Electronic Noise:** Includes **Thermal Noise** from the random motion of electrons in the readout electronics (e.g., the 50 Ω resistor) and **Dark Current** from spontaneous electron emission in the PMT.

## B) Noise Comparison

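A quantitative comparison for a modern LYSO detector, using the 368.5 mV pulse amplitude from 1.3.B and the 2.33% resolution from 1.4, gives:

| Noise Source            | Calculated Voltage (RMS) |
|-------------------------|--------------------------|
| Statistical Fluctuation | ~8.59 mV (8,590 µV)      |
| Thermal Noise           | ~4.55 µV                 |

The statistical value is $368.5 \, \text{mV} \times 0.0233$. The thermal value is consistent with Johnson noise, $\sqrt{4 k_B T R B}$, for $R = 50 \, \Omega$ at an assumed temperature $T \approx 300$ K and bandwidth $B \approx 25$ MHz (about $1/40 \, \text{ns}$). The statistical fluctuation is about 1,900 times larger than the thermal noise, so it dominates.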
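
The two table entries can be reproduced with the sketch below. The temperature (300 K) and bandwidth (25 MHz, roughly the inverse of the 40 ns LYSO decay time) are not stated in the table; they are assumptions chosen because they reproduce the quoted thermal value through the Johnson noise formula $\sqrt{4 k_B T R B}$.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def johnson_noise_vrms(r_ohm, temp_k, bandwidth_hz):
    """RMS thermal (Johnson) noise voltage across a resistor."""
    return math.sqrt(4.0 * K_B * temp_k * r_ohm * bandwidth_hz)


# Assumed operating point: T = 300 K, B = 25 MHz (not given in the exercise)
v_thermal = johnson_noise_vrms(50.0, 300.0, 25e6)

# Statistical fluctuation: LYSO pulse amplitude times the Poisson resolution
v_stat = 368.5e-3 / math.sqrt(1840)

print(f"thermal: {v_thermal * 1e6:.2f} uV, statistical: {v_stat * 1e3:.2f} mV, "
      f"ratio: {v_stat / v_thermal:.0f}")
```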

# Exercise 02: CCD Noise Estimation with Finite Background Sampling

**A)** Show that for an ideal CCD, with the background measured using a large number of pixels in the vicinity of the signal, the SNR is given by

$$\mathrm{SNR} = \dfrac{ \sqrt{\eta_d N_{\nu S}}}{\sqrt{1 + n_{\text{pix}} \left(\dfrac{N_{\nu B}}{N_{\nu S}}\right)}}$$

where $\eta_d$ is the responsive quantum efficiency, $N_{\nu S}$ is the total number of detected signal photons (summed over all $n_{\text{pix}}$ pixels), and $N_{\nu B}$ is the mean number of background photons per pixel during the exposure.

**B)** Show how the SNR expression is modified when the background level is estimated from a finite number of background pixels, $n_{\text{bck}}$, comparable to $n_{\text{pix}}$.

**C)** In a real CCD, there are additional noise contributions from dark current and read noise. Suppose $I$ is the dark current per pixel, in electrons per unit time, and that read noise introduces a fixed charge uncertainty, $\Delta q_R$, per pixel, regardless of the integration time. Allowing for these noise contributions, show that the read noise contribution to the overall noise decreases with integration time, but the dark current contribution does not.


#### Your answer:

## A) Ideal CCD with Large Background Sampling

In this ideal case, we consider only the shot noise from the signal and the background. The signal is the total number of photoelectrons from the source, $S = \eta_d N_{\nu S}$.

The measurement process involves recording the total counts in the signal aperture (signal + background) and subtracting the background counts.

- **Total photoelectrons in the aperture:**  
  $ S_{\text{aperture}} = \eta_d N_{\nu S} + n_{\text{pix}} (\eta_d N_{\nu B}) $

- **Noise in the aperture:** Since this follows Poisson statistics, the variance is equal to the mean.  
  $ \sigma^2_{\text{aperture}} = \eta_d N_{\nu S} + n_{\text{pix}} \eta_d N_{\nu B} $

- **Background Subtraction:** We measure the background from a "large number of pixels," which means we can determine the mean background level $\eta_d N_{\nu B}$ with negligible uncertainty. Subtracting this known value does not add noise.

- **Final Noise:** The total noise in the final signal is therefore just the noise from the aperture measurement:  
  $ \sigma_{\text{total}} = \sqrt{\sigma^2_{\text{aperture}}} = \sqrt{\eta_d N_{\nu S} + n_{\text{pix}} \eta_d N_{\nu B}} $

- **Signal-to-Noise Ratio (SNR):**  
  $$
  \text{SNR} = \frac{\text{Signal}}{\text{Noise}} = \frac{\eta_d N_{\nu S}}{\sqrt{\eta_d N_{\nu S} + n_{\text{pix}} \eta_d N_{\nu B}}}
  $$

To get the desired form, we factor out $\sqrt{\eta_d N_{\nu S}}$ from the denominator:

$$
\text{SNR} = \frac{\eta_d N_{\nu S}}{\sqrt{\eta_d N_{\nu S}} \sqrt{1 + n_{\text{pix}} \left( \frac{N_{\nu B}}{N_{\nu S}} \right)}} = \frac{\sqrt{\eta_d N_{\nu S}}}{\sqrt{1 + n_{\text{pix}} \left( \frac{N_{\nu B}}{N_{\nu S}} \right)}}
$$

This completes the proof.
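
As a numerical sanity check, the unfactored and factored SNR expressions can be compared directly. The input values below are hypothetical and chosen only for illustration; the function names are not from the exercise.

```python
import math


def snr_ideal(eta_d, n_sig, n_bkg_per_pix, n_pix):
    """SNR as signal over the square root of the aperture variance."""
    signal = eta_d * n_sig
    variance = eta_d * n_sig + n_pix * eta_d * n_bkg_per_pix
    return signal / math.sqrt(variance)


def snr_ideal_factored(eta_d, n_sig, n_bkg_per_pix, n_pix):
    """Same SNR in the factored form requested in part A."""
    return math.sqrt(eta_d * n_sig) / math.sqrt(1.0 + n_pix * n_bkg_per_pix / n_sig)


# Hypothetical example values
args = dict(eta_d=0.8, n_sig=1.0e4, n_bkg_per_pix=50.0, n_pix=25)
print(snr_ideal(**args), snr_ideal_factored(**args))
```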

---

## B) Finite Background Sampling

Now, the background level is estimated from a finite number of pixels, $n_{\text{bck}}$. This introduces uncertainty into our background estimate, which must be added to the total noise.

- **Uncertainty in Background Estimate:** The mean background photoelectrons per pixel, $S_B = \eta_d N_{\nu B}$, is estimated from $n_{\text{bck}}$ pixels. The variance of this mean estimate is:  
  $ \sigma^2_{\text{mean bck}} = \frac{\text{Variance of a single pixel}}{\text{Number of pixels}} = \frac{S_B}{n_{\text{bck}}} = \frac{\eta_d N_{\nu B}}{n_{\text{bck}}} $

- **Noise from Background Subtraction:** We subtract this estimated background from $n_{\text{pix}}$ signal pixels. The variance added by this subtraction process (using error propagation for scaling a variable) is:  
  $ \sigma^2_{\text{sub}} = n_{\text{pix}}^2 \times \sigma^2_{\text{mean bck}} = n_{\text{pix}}^2 \left( \frac{\eta_d N_{\nu B}}{n_{\text{bck}}} \right) $

- **New Total Noise:** This new variance term adds in quadrature to the original noise variance.  
  $$
  \sigma^2_{\text{new total}} = \sigma^2_{\text{original}} + \sigma^2_{\text{sub}} = (\eta_d N_{\nu S} + n_{\text{pix}} \eta_d N_{\nu B}) + \frac{n_{\text{pix}}^2}{n_{\text{bck}}} (\eta_d N_{\nu B})
  $$

- **Modified SNR:**  
  $$
  \text{SNR}_{\text{modified}} = \frac{\eta_d N_{\nu S}}{\sqrt{\eta_d N_{\nu S} + n_{\text{pix}} \eta_d N_{\nu B} + \frac{n_{\text{pix}}^2}{n_{\text{bck}}} \eta_d N_{\nu B}}}
  $$

Factoring out $\sqrt{\eta_d N_{\nu S}}$ from the denominator gives the modified expression:

$$
\text{SNR}_{\text{modified}} = \frac{\sqrt{\eta_d N_{\nu S}}}{\sqrt{1 + n_{\text{pix}} \left( \frac{N_{\nu B}}{N_{\nu S}} \right) + \frac{n_{\text{pix}}^2}{n_{\text{bck}}} \left( \frac{N_{\nu B}}{N_{\nu S}} \right)}}
$$

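As $n_{\text{bck}} \to \infty$, the new term vanishes, and we recover the ideal formula from Part A. Equivalently, the background term in the denominator is multiplied by $\left( 1 + n_{\text{pix}} / n_{\text{bck}} \right)$; for $n_{\text{bck}} = n_{\text{pix}}$, the background contribution to the variance doubles.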
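
The effect of the number of background pixels on the SNR can be explored with a short sketch of the modified expression. The numerical inputs are hypothetical; passing `math.inf` for `n_bck` recovers the ideal case of Part A.

```python
import math


def snr_finite_bck(eta_d, n_sig, n_bkg_per_pix, n_pix, n_bck):
    """SNR when the background is estimated from n_bck pixels."""
    signal = eta_d * n_sig
    aperture_var = eta_d * n_sig + n_pix * eta_d * n_bkg_per_pix
    subtraction_var = (n_pix ** 2 / n_bck) * eta_d * n_bkg_per_pix
    return signal / math.sqrt(aperture_var + subtraction_var)


# Hypothetical example values
base = dict(eta_d=0.8, n_sig=1.0e4, n_bkg_per_pix=50.0, n_pix=25)
for n_bck in (25, 100, 1000, math.inf):
    print(f"n_bck = {n_bck}: SNR = {snr_finite_bck(n_bck=n_bck, **base):.2f}")
```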

---

## C) Dark Current and Read Noise

In a real CCD, we add dark current and read noise. Let $t$ be the integration time.

- **Dark Current Noise:** The dark current, $I$ (electrons/pixel/s), generates $n_{\text{pix}} \times I \times t$ electrons in the signal aperture over time $t$. The associated shot noise variance is:  
  $ \sigma^2_{\text{dark}} = n_{\text{pix}} I t $

- **Read Noise:** The read noise, $\Delta q_R$ (electrons), is a fixed charge uncertainty added per pixel readout, independent of integration time. For $n_{\text{pix}}$ pixels, the total read noise variance is:  
  $ \sigma^2_{\text{read}} = n_{\text{pix}} (\Delta q_R)^2 $

- **Total Noise Variance:** The total noise variance for a real CCD is the sum of all independent noise variances:

$$
\sigma_{\text{total}}^2 = 
\underbrace{(\eta_d N_{\nu S} + n_{\text{pix}} \eta_d N_{\nu B})}_{\text{Photon Shot Noise}} + 
\underbrace{n_{\text{pix}} I t}_{\text{Dark Shot Noise}} + 
\underbrace{n_{\text{pix}} (\Delta q_R)^2}_{\text{Read Noise}}
$$

### Contribution Analysis with Integration Time

- **Read Noise Contribution:**  
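  The read noise variance, $\sigma^2_{\text{read}} = n_{\text{pix}} (\Delta q_R)^2$, is constant and does not depend on $t$. All other noise terms increase with $t$, because $N_{\nu S}$ and $N_{\nu B}$ are counts accumulated over the exposure and the dark charge is $I t$ per pixel. Therefore: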

  $$
  \text{Contribution}_{\text{read}} = \frac{\sigma_{\text{read}}^2}{\sigma_{\text{total}}^2} = \frac{n_{\text{pix}} (\Delta q_R)^2}{(\text{terms that grow with } t) + n_{\text{pix}} (\Delta q_R)^2}
  $$

  As integration time $t$ increases, the denominator grows while the numerator remains constant. Consequently, **the relative contribution of read noise decreases with integration time**.

- **Dark Current Contribution:**  
  The dark current noise variance, $\sigma^2_{\text{dark}} = n_{\text{pix}} I t$, grows linearly with $t$. Its contribution is:

  $$
  \text{Contribution}_{\text{dark}} = \frac{\sigma_{\text{dark}}^2}{\sigma_{\text{total}}^2} = \frac{n_{\text{pix}} I t}{(\text{other terms growing with } t) + n_{\text{pix}} (\Delta q_R)^2}
  $$

  As $t \to \infty$, the constant read noise term becomes negligible, and the ratio approaches a constant value determined by the relative rates of signal, background, and dark current generation. Thus, **unlike read noise, the dark current's relative contribution does not vanish; it approaches a fixed fraction of the total shot noise**. In this sense the dark current contribution does not decrease with integration time, whereas the read noise contribution does.
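
The behaviour of the two contributions with integration time can be illustrated numerically. In this sketch the photoelectron rates (`sig_rate` for $\eta_d N_{\nu S}/t$, `bkg_rate` for $\eta_d N_{\nu B}/t$), the dark current, and the read noise are all hypothetical values chosen for illustration.

```python
def noise_fractions(t, sig_rate, bkg_rate, dark_rate, read_noise, n_pix):
    """Return (read, dark) shares of the total noise variance after time t."""
    photon_var = (sig_rate + n_pix * bkg_rate) * t  # photoelectron shot noise
    dark_var = n_pix * dark_rate * t                # dark-current shot noise
    read_var = n_pix * read_noise ** 2              # independent of t
    total = photon_var + dark_var + read_var
    return read_var / total, dark_var / total


# Hypothetical rates in electrons/s (per pixel for background and dark current)
PARAMS = dict(sig_rate=100.0, bkg_rate=5.0, dark_rate=1.0, read_noise=10.0, n_pix=9)

for t in (1.0, 10.0, 100.0, 1000.0):
    read_frac, dark_frac = noise_fractions(t, **PARAMS)
    print(f"t = {t:6.0f} s: read = {read_frac:.4f}, dark = {dark_frac:.4f}")

# Long-exposure limit of the dark-current share
dark_limit = (PARAMS["n_pix"] * PARAMS["dark_rate"]) / (
    PARAMS["sig_rate"] + PARAMS["n_pix"] * (PARAMS["bkg_rate"] + PARAMS["dark_rate"])
)
print(f"dark-current share as t -> infinity: {dark_limit:.4f}")
```

With these inputs the read-noise share falls steadily as $t$ grows, while the dark-current share rises toward the limiting value.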

# Exercise 03: Optical Properties and Magnification of Telescopes Observing the Moon

The Moon was photographed with a telescope, the objective of which had a diameter of 20 cm and focal length of 150 cm. The exposure time was 0.1 s. 

**A)** What should the exposure time be, if the diameter of the objective were 15 cm and focal length 200 cm?

**B)** What is the size of the image of the Moon in both cases? 

**C)** Both telescopes are used to look at the Moon with an eyepiece the focal length of which is 25 mm. What are the magnifications?

#### Your answer:

**Given Parameters:**

- **Telescope 1:** Diameter $D_1 = 20$ cm, Focal Length $f_1 = 150$ cm, Exposure Time $t_1 = 0.1$ s  
- **Telescope 2:** Diameter $D_2 = 15$ cm, Focal Length $f_2 = 200$ cm  
- **Eyepiece:** Focal Length $f_e = 25$ mm = $2.5$ cm  

---

## A) New Exposure Time

To maintain the same image surface brightness for an extended object like the Moon, the exposure time must be adjusted based on the square of the telescope's focal ratio ($f/\# = f/D$). A larger $f/\#$ results in a dimmer image, requiring a longer exposure.

The relationship is:

$$
t_2 = t_1 \times \left( \frac{f_2 / D_2}{f_1 / D_1} \right)^2
$$

First, calculate the focal ratios for both telescopes:

- Telescope 1: $ f/\#_1 = \dfrac{150 \, \text{cm}}{20 \, \text{cm}} = 7.5 $  
- Telescope 2: $ f/\#_2 = \dfrac{200 \, \text{cm}}{15 \, \text{cm}} \approx 13.33 $

Now, calculate the new exposure time:

$$
t_2 = 0.1 \, \text{s} \times \left( \frac{13.33}{7.5} \right)^2 \approx 0.1 \, \text{s} \times (1.778)^2 \approx 0.316 \, \text{s}
$$

The second telescope is "slower" (has a larger $f/\#$), so it requires a significantly longer exposure time to achieve the same image brightness.
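
The focal-ratio scaling can be checked with a one-function sketch. The name `scaled_exposure` is illustrative; lengths only need to share the same unit because they enter as ratios.

```python
def scaled_exposure(t1, d1, f1, d2, f2):
    """Exposure time giving the same surface brightness with a second telescope.

    For an extended object the required exposure scales as the focal ratio squared.
    """
    ratio_1 = f1 / d1
    ratio_2 = f2 / d2
    return t1 * (ratio_2 / ratio_1) ** 2


t2 = scaled_exposure(t1=0.1, d1=20.0, f1=150.0, d2=15.0, f2=200.0)
print(f"t2 = {t2:.3f} s")
```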

---

## B) Size of the Moon's Image

The linear size of the image formed at the focal plane is given by the formula:

$$
\text{Image Size} = \text{Focal Length} \times \text{Angular Size (in radians)}
$$

The average angular diameter of the Moon is approximately $0.53^\circ$, which is about $0.00925$ radians.

- **For Telescope 1:**  
  $$
  \text{Image Size}_1 = 150 \, \text{cm} \times 0.00925 \, \text{rad} \approx 1.39 \, \text{cm}
  $$

- **For Telescope 2:**  
  $$
  \text{Image Size}_2 = 200 \, \text{cm} \times 0.00925 \, \text{rad} \approx 1.85 \, \text{cm}
  $$

The telescope with the longer focal length produces a larger image of the Moon.
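
The image sizes follow from the small-angle relation and can be computed as below. The sketch assumes an angular diameter of 0.53° and uses the unrounded degree-to-radian conversion, so the last digit can differ from a hand calculation with a rounded angle. The function name is illustrative.

```python
import math

MOON_ANGULAR_DIAMETER_DEG = 0.53  # assumed average value


def image_size_cm(focal_length_cm, angle_deg=MOON_ANGULAR_DIAMETER_DEG):
    """Linear image size at the focal plane in the small-angle approximation."""
    return focal_length_cm * math.radians(angle_deg)


for label, focal in (("Telescope 1", 150.0), ("Telescope 2", 200.0)):
    print(f"{label}: {image_size_cm(focal):.2f} cm")
```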

---

## C) Magnification

The angular magnification for visual observation is the ratio of the objective's focal length to the eyepiece's focal length:

$$
M = \frac{f_{\text{objective}}}{f_{\text{eyepiece}}}
$$

- **For Telescope 1:**  
  $$
  M_1 = \frac{150 \, \text{cm}}{2.5 \, \text{cm}} = 60\times
  $$

- **For Telescope 2:**  
  $$
  M_2 = \frac{200 \, \text{cm}}{2.5 \, \text{cm}} = 80\times
  $$

# Exercise 04: Maximum Likelihood Estimator

Consider $n$ data points drawn independently from a Gaussian distribution. Starting from the maximum likelihood estimator, show that it is equivalent to minimizing the sum-of-squares error function.


#### Your answer:

Let's consider a set of $n$ data points, $\{x_1, x_2, \ldots, x_n\}$, drawn independently from a Gaussian (normal) distribution with a mean $\mu$ and a standard deviation $\sigma$. The probability density function (PDF) for a single data point $x_i$ is:

$$
P(x_i \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
$$

---

### Step 1: Formulate the Likelihood Function

The likelihood function, $L(\mu, \sigma \mid \{x_i\})$, is the joint probability density of the observed dataset, regarded as a function of the parameters $\mu$ and $\sigma$. Since the data points are independent, the total likelihood is the product of the individual densities:

$$
L = \prod_{i=1}^{n} P(x_i \mid \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
$$

---

### Step 2: Switch to the Log-Likelihood Function

Maximizing the likelihood function $L$ is equivalent to maximizing its natural logarithm, $\ln L$. Using a logarithm is mathematically convenient because it converts the product into a sum, which is much easier to differentiate.

$$
\ln L = \ln \left( \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \right)
$$

Using the properties of logarithms ($\ln(ab) = \ln a + \ln b$ and $\ln(e^x) = x$):

$$
\ln L = \sum_{i=1}^{n} \left[ \ln\left( \frac{1}{\sqrt{2\pi}\, \sigma} \right) - \frac{(x_i - \mu)^2}{2\sigma^2} \right]
$$

This can be split into two parts:

$$
\ln L = \sum_{i=1}^{n} \ln\left( \frac{1}{\sqrt{2\pi}\, \sigma} \right) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}
$$

Since the first term does not depend on the summation index $i$, we can simplify it:

$$
\ln L = n \ln\left( \frac{1}{\sqrt{2\pi}\, \sigma} \right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2
$$

---

### Step 3: Maximize the Log-Likelihood

The goal of Maximum Likelihood Estimation (MLE) is to find the parameters ($\mu$ and $\sigma$) that maximize the log-likelihood function, $\ln L$. Let's focus on finding the optimal $\mu$.

Observe the structure of the $\ln L$ expression:

$$
\ln L = \underbrace{n \ln\left( \frac{1}{\sqrt{2\pi}\, \sigma} \right)}_{\text{Term 1}} - \underbrace{\frac{1}{2\sigma^2}}_{\text{Term 2}} \times \underbrace{\sum_{i=1}^{n} (x_i - \mu)^2}_{\text{Term 3}}
$$

- **Term 1** and **Term 2** do not depend on the parameter $\mu$. They are constant with respect to $\mu$.
- **Term 2** ($\frac{1}{2\sigma^2}$) is a positive constant.

Therefore, the entire expression is maximized with respect to $\mu$ when the subtracted term (a positive constant multiplied by Term 3) is **minimized**.

Thus, maximizing the log-likelihood function is mathematically equivalent to minimizing Term 3:

$$
\min_{\mu} \left( \sum_{i=1}^{n} (x_i - \mu)^2 \right)
$$

---

### Step 4: The Equivalence

The expression $\sum_{i=1}^{n} (x_i - \mu)^2$ is precisely the definition of the **sum-of-squares error function**. It measures the sum of the squared differences (errors) between each data point $x_i$ and the model's prediction, which in this case is the mean $\mu$.

**Conclusion:** We have shown that starting from the principle of maximizing the likelihood for data drawn from a Gaussian distribution, we arrive at the objective of minimizing the sum-of-squares error. This demonstrates that the **method of least squares is a special case of the more general Maximum Likelihood Estimator** under the assumption of Gaussian-distributed errors.
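
The equivalence can also be checked numerically. The sketch below draws synthetic Gaussian data (the true mean 5.0, standard deviation 2.0, sample size, and grid are arbitrary choices for illustration), then searches a grid of candidate $\mu$ values for the maximum of $\ln L$ and the minimum of the sum of squares. Both searches should select the same grid point, the one closest to the sample mean.

```python
import math
import random


def sum_of_squares(data, mu):
    """Sum-of-squares error between the data and a candidate mean."""
    return sum((x - mu) ** 2 for x in data)


def log_likelihood(data, mu, sigma):
    """Gaussian log-likelihood in the form derived in Step 2."""
    n = len(data)
    norm_term = n * math.log(1.0 / (math.sqrt(2.0 * math.pi) * sigma))
    return norm_term - sum_of_squares(data, mu) / (2.0 * sigma ** 2)


random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(5.0, 2.0) for _ in range(200)]

grid = [4.0 + 0.001 * k for k in range(2001)]  # candidate mu from 4.0 to 6.0
mu_ml = max(grid, key=lambda m: log_likelihood(data, m, 2.0))
mu_ls = min(grid, key=lambda m: sum_of_squares(data, m))
sample_mean = sum(data) / len(data)

print(f"max-likelihood mu: {mu_ml:.3f}")
print(f"least-squares mu:  {mu_ls:.3f}")
print(f"sample mean:       {sample_mean:.3f}")
```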