# Gaussian Distribution Solutions

This notebook provides solutions and detailed explanations for each exercise related to Gaussian distributions using NumPy.

---

## Exercise 1: Understanding Gaussian Distribution Parameters

**Solution**:

A Gaussian distribution, also known as a normal distribution, is characterized by two primary parameters:

- **Mean (`μ`)**: This parameter represents the center or the average value around which the data points are distributed. It determines where the peak of the distribution lies.

- **Standard Deviation (`σ`)**: This parameter measures the spread or dispersion of the data points around the mean. A smaller standard deviation indicates that the data points are closely clustered around the mean, while a larger standard deviation implies that the data points are spread out over a wider range.

Mathematically, the probability density function (PDF) of a Gaussian distribution is given by:

$$
 f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$
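
The formula above can be checked numerically. The following is a minimal sketch, not part of the original exercise: the helper name `gaussian_pdf` and the integration grid are assumptions chosen for illustration. It evaluates the PDF at the mean and confirms that the area under the curve is approximately 1.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the Gaussian PDF at x (hypothetical helper for illustration)."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * np.exp(exponent)

# The peak sits at x = mu and has height 1 / (sigma * sqrt(2 * pi))
peak_height = gaussian_pdf(0.0, mu=0.0, sigma=1.0)

# Approximate the integral with a Riemann sum over +/- 8 standard deviations
x = np.linspace(-8.0, 8.0, 10001)
dx = x[1] - x[0]
area = np.sum(gaussian_pdf(x)) * dx

print(f"Peak height: {peak_height:.4f}")
print(f"Area under the curve: {area:.6f}")
```

Increasing `sigma` lowers the peak and widens the curve, while the total area stays at 1.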

---

## Exercise 2: Generating Gaussian Distributed Data

**Solution**:

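To generate Gaussian distributed data, we can use `numpy.random.normal`, which takes the mean (`loc`), the standard deviation (`scale`), and the number of samples (`size`). If omitted, `loc` defaults to 0 and `scale` to 1, and a single value is returned.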

```python
import numpy as np

# Parameters
mu = 0      # Mean
sigma = 1   # Standard deviation
n_samples = 1000

# Generate data
data = np.random.normal(loc=mu, scale=sigma, size=n_samples)

# Optional: Display the first few samples
print(data[:10])
```

---

## Exercise 3: Visualizing the Gaussian Distribution

**Solution**:

Using `matplotlib`, we can plot a histogram to visualize the distribution of the generated data.

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.hist(data, bins=30, edgecolor='k', alpha=0.7)
plt.title('Histogram of Gaussian Distributed Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
```

**Explanation**:
- `bins=30`: Divides the data range into 30 intervals.
- `edgecolor='k'`: Sets the edge color of the bars to black for better visibility.
- `alpha=0.7`: Sets the transparency of the bars.
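
A histogram of counts cannot be compared directly with the PDF, but a density-normalized one can. The sketch below is an illustration under assumptions: it uses `np.histogram` rather than a plot so that it runs without a display, and the seed and the larger sample size are arbitrary choices. Passing `density=True` to `plt.hist` applies the same normalization.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# density=True rescales the bar heights so the total bar area equals 1
heights, edges = np.histogram(samples, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)

# Standard normal PDF evaluated at the bin centers
pdf = np.exp(-centers**2 / 2.0) / np.sqrt(2.0 * np.pi)

total_area = np.sum(heights * widths)
max_gap = np.max(np.abs(heights - pdf))

print(f"Total bar area: {total_area:.6f}")
print(f"Largest gap between bar height and PDF: {max_gap:.4f}")
```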

---

## Exercise 4: Calculating Statistics

**Solution**:

We can calculate the empirical mean and standard deviation using NumPy functions.

```python
# Calculate mean
empirical_mean = np.mean(data)

# Calculate standard deviation
empirical_std = np.std(data)

print(f"Empirical Mean: {empirical_mean}")
print(f"Empirical Standard Deviation: {empirical_std}")
```

**Sample Output**:
```
Empirical Mean: -0.0123456789
Empirical Standard Deviation: 0.987654321
```

*Note: Actual values may vary due to randomness.*
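
One detail worth knowing: `np.std` divides by `n` by default (`ddof=0`, the population formula), whereas the sample standard deviation divides by `n - 1` (`ddof=1`). The sketch below, using an arbitrary seed and a hypothetical array name `sample`, shows that the two differ by a factor of `sqrt(n / (n - 1))`, which is negligible for 1000 samples but matters for small ones.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for this sketch
sample = rng.normal(loc=0.0, scale=1.0, size=1000)
n = sample.size

std_population = np.std(sample)       # ddof=0: divide by n
std_sample = np.std(sample, ddof=1)   # ddof=1: divide by n - 1

ratio = std_sample / std_population
print(f"ddof=0: {std_population:.6f}")
print(f"ddof=1: {std_sample:.6f}")
print(f"ratio:  {ratio:.6f} (expected {np.sqrt(n / (n - 1)):.6f})")
```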

---

## Exercise 5: Comparing Empirical and Theoretical Values

**Solution**:

Given:
- Theoretical Mean (`μ`) = 0
- Theoretical Standard Deviation (`σ`) = 1

We compare these with the empirical values calculated.

```python
print(f"Theoretical Mean: {mu}")
print(f"Theoretical Standard Deviation: {sigma}\n")

print(f"Empirical Mean: {empirical_mean}")
print(f"Empirical Standard Deviation: {empirical_std}")
```

**Sample Output**:
```
Theoretical Mean: 0
Theoretical Standard Deviation: 1

Empirical Mean: -0.0123456789
Empirical Standard Deviation: 0.987654321
```

**Discussion**:
- The empirical mean is very close to the theoretical mean of 0, indicating that the data is centered as expected.
- The empirical standard deviation is also close to the theoretical value of 1, showing that the spread of the data aligns with the defined distribution.
- Minor discrepancies are expected due to the finite sample size; with n = 1000, the sample mean typically deviates from `μ` by about σ/√n ≈ 0.03.
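
How quickly such discrepancies shrink with sample size can be checked by simulation. The sketch below is an illustration, not part of the original exercise: the seed, the number of repeats, and the names `true_mu`, `true_sigma`, and `results` are assumptions. It draws many independent datasets of each size and compares the spread of their sample means with the standard error σ/√n.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for this sketch
true_mu, true_sigma = 0.0, 1.0
n_repeats = 500                 # independent datasets per sample size

results = {}
for n in (100, 1000, 10000):
    # Each row is one dataset of n samples; take the mean of every row
    means = rng.normal(true_mu, true_sigma, size=(n_repeats, n)).mean(axis=1)
    observed = means.std()               # spread of the sample means
    predicted = true_sigma / np.sqrt(n)  # standard error of the mean
    results[n] = (observed, predicted)
    print(f"n={n:>5}: observed {observed:.4f}, predicted {predicted:.4f}")
```

Each tenfold increase in `n` reduces the typical error of the sample mean by a factor of about √10 ≈ 3.16.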

---

## Exercise 6: Generating Multi-Dimensional Gaussian Data

**Solution**:

To generate multi-dimensional Gaussian data, we use `numpy.random.multivariate_normal`, which requires a mean vector and a covariance matrix.

```python
# Parameters
mean = [0, 0]
covariance = [[1, 0.5], [0.5, 1]]
n_samples = 500

# Generate multi-dimensional data
multi_dim_data = np.random.multivariate_normal(mean, covariance, size=n_samples)

# Optional: Display the first few samples
print(multi_dim_data[:10])
```

**Explanation**:
- `mean`: Specifies the mean for each dimension.
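- `covariance`: The covariance matrix. The diagonal entries are the variances of each dimension (both 1 here) and the off-diagonal entries are the covariance between the two dimensions (0.5). Because both variances are 1, the correlation coefficient is also 0.5, a moderate positive correlation.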
- `n_samples`: Number of data points to generate.
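
A covariance matrix must be symmetric and positive semi-definite, and its entries translate into correlations once divided by the standard deviations. The following minimal sketch (the names `cov_matrix`, `std_devs`, and `corr_matrix` are illustrative, not from the exercise) checks both for the matrix used above.

```python
import numpy as np

cov_matrix = np.array([[1.0, 0.5],
                       [0.5, 1.0]])

# Standard deviations are the square roots of the diagonal (the variances)
std_devs = np.sqrt(np.diag(cov_matrix))

# Correlation: divide each covariance by the product of the two std devs
corr_matrix = cov_matrix / np.outer(std_devs, std_devs)

# All eigenvalues of a valid covariance matrix are non-negative
eigenvalues = np.linalg.eigvalsh(cov_matrix)

print("Correlation matrix:")
print(corr_matrix)
print("Eigenvalues:", eigenvalues)
```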

---

## Exercise 7: Visualizing Multi-Dimensional Data

**Solution**:

A scatter plot is suitable for visualizing 2-dimensional data.

```python
plt.figure(figsize=(8, 6))
plt.scatter(multi_dim_data[:, 0], multi_dim_data[:, 1], alpha=0.6, edgecolors='w', linewidth=0.5)
plt.title('Scatter Plot of 2D Gaussian Distributed Data')
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.grid(True)
plt.show()
```

**Explanation**:
- `multi_dim_data[:, 0]` and `multi_dim_data[:, 1]`: Extracts the first and second dimensions.
- `alpha=0.6`: Sets the transparency of the points.
- `edgecolors='w'`: Adds white edges to the points for better contrast.

---

## Exercise 8: Understanding Covariance

**Solution**:

Covariance measures how much two random variables change together. In a multi-dimensional Gaussian distribution, the covariance matrix defines the relationship between each pair of dimensions.

```python
# Calculate empirical covariance matrix
empirical_cov = np.cov(multi_dim_data, rowvar=False)

print("Theoretical Covariance Matrix:")
print(covariance)
print("\nEmpirical Covariance Matrix:")
print(empirical_cov)
```

**Sample Output**:
```
Theoretical Covariance Matrix:
[[1, 0.5], [0.5, 1]]

Empirical Covariance Matrix:
[[0.98 0.52]
 [0.52 1.03]]
```

**Discussion**:
- The empirical covariance matrix is close to the theoretical one.
- The diagonal elements represent the variances of each dimension, while the off-diagonal elements represent the covariance between dimensions.
- A positive covariance (0.5) indicates that as one dimension increases, the other tends to increase as well.
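
To make the definition concrete, the empirical covariance matrix can also be computed by hand. This sketch assumes freshly generated data (the seed and the names `points`, `centered`, and `manual_cov` are illustrative): it centers each column, then averages the products of deviations using the `n - 1` divisor that `np.cov` applies by default.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed for this sketch
points = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)

# Subtract the per-dimension mean so each column is centered on zero
centered = points - points.mean(axis=0)

# Sum of outer products of deviations, divided by n - 1
manual_cov = centered.T @ centered / (points.shape[0] - 1)

numpy_cov = np.cov(points, rowvar=False)

print("Manual covariance:")
print(manual_cov)
print("Matches np.cov:", np.allclose(manual_cov, numpy_cov))
```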

---

## Exercise 9: Reproducibility with Random Seeds

**Solution**:

Setting a random seed fixes the starting state of the pseudo-random number generator, so the same sequence of numbers is produced on every run.

```python
import numpy as np

# Set random seed
np.random.seed(42)

# Parameters
mu_new = 5
sigma_new = 2
n_samples_new = 1000

# Generate data
data1 = np.random.normal(loc=mu_new, scale=sigma_new, size=n_samples_new)

# Reset the seed to generate the same data again
np.random.seed(42)
data2 = np.random.normal(loc=mu_new, scale=sigma_new, size=n_samples_new)

# Verify that both datasets are identical
are_identical = np.array_equal(data1, data2)
print(f"Are both datasets identical? {are_identical}")
```

**Output**:
```
Are both datasets identical? True
```

**Explanation**:
- By setting the same seed (`42`), the sequence of random numbers generated is the same.
- Therefore, `data1` and `data2` are identical.
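
NumPy's documentation recommends the `Generator` interface, created with `np.random.default_rng`, over the global `np.random.seed` state for new code. The sketch below is an alternative to the solution above, not a replacement for it; the variable names are illustrative. Two generators built from the same seed produce identical draws, while a second draw from one generator continues its stream rather than repeating it.

```python
import numpy as np

# Two independent generators created from the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

draws_a = rng_a.normal(loc=5, scale=2, size=1000)
draws_b = rng_b.normal(loc=5, scale=2, size=1000)
same_seed_match = np.array_equal(draws_a, draws_b)

# Drawing again from rng_a advances its state, so the values differ
draws_next = rng_a.normal(loc=5, scale=2, size=1000)
continues_stream = not np.array_equal(draws_a, draws_next)

print(f"Same seed gives identical data? {same_seed_match}")
print(f"Second draw differs from the first? {continues_stream}")
```

Note that a `Generator` seeded with 42 does not reproduce the numbers from `np.random.seed(42)`, because the two interfaces use different underlying algorithms.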

---

## Exercise 10: Application - Simulating Measurement Errors

**Solution**:

Simulating measurement errors using Gaussian noise helps in understanding how inaccuracies can affect measurements.

```python
import numpy as np
import matplotlib.pyplot as plt

# True value
true_value = 50

# Parameters for Gaussian noise
mu_noise = 0
sigma_noise = 1.5
n_measurements = 100

# Simulate measurements
np.random.seed(0)  # For reproducibility
noise = np.random.normal(loc=mu_noise, scale=sigma_noise, size=n_measurements)
measurements = true_value + noise

# Calculate statistics
empirical_mean_measure = np.mean(measurements)
empirical_std_measure = np.std(measurements)

print(f"True Value: {true_value}")
print(f"Empirical Mean of Measurements: {empirical_mean_measure}")
print(f"Empirical Standard Deviation of Measurements: {empirical_std_measure}")

# Plotting
plt.figure(figsize=(8, 6))
plt.hist(measurements, bins=15, edgecolor='k', alpha=0.7)
plt.axvline(true_value, color='r', linestyle='dashed', linewidth=2, label='True Value')
plt.title('Histogram of Simulated Measurements with Gaussian Noise')
plt.xlabel('Measured Value')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True)
plt.show()
```

**Sample Output** (illustrative values, rounded):
```
True Value: 50
Empirical Mean of Measurements: 49.876
Empirical Standard Deviation of Measurements: 1.48
```

**Discussion**:
- The measurements are centered around the true value (50) with slight deviations due to Gaussian noise.
- The empirical mean is close to the true value, which is consistent with the noise having zero mean (`mu_noise = 0`), i.e. being unbiased.
- The standard deviation of the measurements reflects the spread introduced by the noise.
- The histogram visualizes how the measurements are distributed around the true value; with only 100 measurements, the shape is roughly, not perfectly, bell-shaped.
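
The spread of Gaussian noise can be summarized by the fraction of measurements falling within one or two standard deviations of the true value, which for a Gaussian is about 68.3% and 95.4%. The sketch below is an illustration under assumptions: it uses a much larger number of simulated readings than the exercise and an arbitrary seed so that the fractions are stable, and the names `readings`, `within_1`, and `within_2` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed for this sketch
true_value = 50.0
sigma_noise = 1.5

# Many simulated readings so that the fractions settle near their limits
readings = true_value + rng.normal(0.0, sigma_noise, size=100_000)

deviation = np.abs(readings - true_value)
within_1 = np.mean(deviation <= 1 * sigma_noise)  # within 48.5 .. 51.5
within_2 = np.mean(deviation <= 2 * sigma_noise)  # within 47.0 .. 53.0

print(f"Within 1 sigma: {within_1:.3f}")
print(f"Within 2 sigma: {within_2:.3f}")
```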