# **Optional Topics**

# 1. Eigenvectors and Eigenvalues

**Exercise**: Implement a Python function that takes a square matrix as input and returns its eigenvalues and eigenvectors. Use this function to analyze the eigenvalues and eigenvectors of a symmetric matrix and a non-symmetric matrix. Discuss the differences in your findings.

In [None]:
import numpy as np

def find_eigens(matrix):
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    return eigenvalues, eigenvectors

# Example matrices
symmetric_matrix = np.array([[4, 1], [1, 4]])
nonsymmetric_matrix = np.array([[0, 1], [-2, -3]])

# Calculate eigenvalues and eigenvectors for the symmetric matrix
sym_eigenvalues, sym_eigenvectors = find_eigens(symmetric_matrix)
print("Symmetric Matrix Eigenvalues:", sym_eigenvalues)
print("Symmetric Matrix Eigenvectors:\n", sym_eigenvectors)

# Calculate eigenvalues and eigenvectors for the non-symmetric matrix
nonsym_eigenvalues, nonsym_eigenvectors = find_eigens(nonsymmetric_matrix)
print("Non-Symmetric Matrix Eigenvalues:", nonsym_eigenvalues)
print("Non-Symmetric Matrix Eigenvectors:\n", nonsym_eigenvectors)

## Discussion

- **Symmetric Matrices**: For real symmetric matrices, eigenvalues are always real, and eigenvectors belonging to distinct eigenvalues are orthogonal (an orthonormal set of eigenvectors can always be chosen). This means that the eigenvectors can be thought of as defining a new coordinate system in which the matrix acts simply by stretching or compressing along each axis.
- **Non-Symmetric Matrices**: For non-symmetric matrices, eigenvalues can be complex, and eigenvectors are not necessarily orthogonal. The example matrix used here happens to have real eigenvalues (-1 and -2), but its eigenvectors are not perpendicular. The interpretation of eigenvalues and eigenvectors in this case can be more nuanced, often relating to the system's stability (in dynamical systems) or other properties depending on the context.
- **General Observations**:
    - **Eigenvalues** tell us about the scaling factor applied along the directions defined by their corresponding eigenvectors.
    - **Eigenvectors** indicate directions in the space that are invariant under the application of the matrix. For symmetric matrices, these directions can be chosen to be mutually perpendicular, while for non-symmetric matrices, they generally are not.
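
The claims in the discussion can be checked numerically. The following is a minimal sketch, not part of the original exercise: the helper name `check_eigenpairs` is an assumption introduced here, and `np.linalg.eigh` is swapped in for the symmetric matrix because it is NumPy's routine specialized for symmetric (Hermitian) input.

```python
import numpy as np

# Same example matrices as in the exercise cell
symmetric_matrix = np.array([[4.0, 1.0], [1.0, 4.0]])
nonsymmetric_matrix = np.array([[0.0, 1.0], [-2.0, -3.0]])

def check_eigenpairs(matrix, eigenvalues, eigenvectors):
    """Return True if A @ v equals lambda * v for every column v (hypothetical helper)."""
    # Broadcasting scales column j of `eigenvectors` by eigenvalues[j]
    return bool(np.allclose(matrix @ eigenvectors, eigenvectors * eigenvalues))

# eigh assumes symmetric input and returns real eigenvalues in ascending order
sym_vals, sym_vecs = np.linalg.eigh(symmetric_matrix)
nonsym_vals, nonsym_vecs = np.linalg.eig(nonsymmetric_matrix)

sym_ok = check_eigenpairs(symmetric_matrix, sym_vals, sym_vecs)
nonsym_ok = check_eigenpairs(nonsymmetric_matrix, nonsym_vals, nonsym_vecs)

# V^T V equals the identity only when the eigenvector columns are orthonormal
sym_orthonormal = bool(np.allclose(sym_vecs.T @ sym_vecs, np.eye(2)))
nonsym_orthonormal = bool(np.allclose(nonsym_vecs.T @ nonsym_vecs, np.eye(2)))

print("Eigenpairs valid:", sym_ok, nonsym_ok)
print("Orthonormal eigenvectors:", sym_orthonormal, nonsym_orthonormal)
```

Both matrices satisfy the defining equation, but only the symmetric one yields orthonormal eigenvectors.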

---------------

# 2. Spaces

**Exercise**: Demonstrate the concept of vector spaces by showing examples of addition and scalar multiplication. Define two vectors in $\mathbb{R}^2$ and show that their sum and any scalar multiple of them also belong to $\mathbb{R}^2$.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Define two vectors in R^2
vector1 = np.array([1, 2])
vector2 = np.array([3, 4])
scalar = 2

# Perform vector addition
vector_addition = vector1 + vector2

# Perform scalar multiplication
scalar_multiplication1 = scalar * vector1
scalar_multiplication2 = scalar * vector2

# Plotting the vectors to visualize the operations
plt.figure(figsize=(8, 6))

# Original vectors
plt.quiver(0, 0, vector1[0], vector1[1], angles='xy', scale_units='xy', scale=1, color='r', label='Vector 1')
plt.quiver(0, 0, vector2[0], vector2[1], angles='xy', scale_units='xy', scale=1, color='g', label='Vector 2')

# Vector addition
plt.quiver(0, 0, vector_addition[0], vector_addition[1], angles='xy', scale_units='xy', scale=1, color='b', label='Vector 1 + Vector 2')

# Scalar multiplication (using alpha for visual distinction)
plt.quiver(0, 0, scalar_multiplication1[0], scalar_multiplication1[1], angles='xy', scale_units='xy', scale=1, color='r', alpha=0.5, label=f'{scalar} * Vector 1')
plt.quiver(0, 0, scalar_multiplication2[0], scalar_multiplication2[1], angles='xy', scale_units='xy', scale=1, color='g', alpha=0.5, label=f'{scalar} * Vector 2')

plt.xlim(-1, max(vector_addition[0], scalar_multiplication1[0], scalar_multiplication2[0]) + 1)
plt.ylim(-1, max(vector_addition[1], scalar_multiplication1[1], scalar_multiplication2[1]) + 1)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid(color='gray', linestyle='--', linewidth=0.5)
plt.legend()
plt.title('Vector Addition and Scalar Multiplication')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

## Discussion

- **Vector Addition**: The sum of **`vector1`** and **`vector2`** is plotted in blue. This demonstrates that adding two vectors in $\mathbb{R}^2$ results in another vector in $\mathbb{R}^2$.
- **Scalar Multiplication**: The scalar multiples of **`vector1`** and **`vector2`** are shown as semi-transparent arrows in the same colors as the original vectors. Multiplying a vector by a scalar stretches or shrinks it along the same line (reversing its direction if the scalar is negative), so the result is still a vector in $\mathbb{R}^2$.
- **Visual Representation**: The plot illustrates that both vector addition and scalar multiplication produce vectors that belong to the same space ($\mathbb{R}^2$ in this case). This is the closure property of vector spaces under these two operations.
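
Closure is easier to appreciate next to a set that lacks it. The sketch below is an illustrative assumption, not part of the original exercise: it uses the first quadrant of the plane (a hypothetical subset chosen here) and the helper name `in_first_quadrant` to show a subset that contains this sum but not every scalar multiple.

```python
import numpy as np

def in_first_quadrant(v):
    """Membership test for the subset {(x, y) : x >= 0, y >= 0} (hypothetical example set)."""
    return bool(np.all(np.asarray(v) >= 0))

vector1 = np.array([1, 2])
vector2 = np.array([3, 4])

# The sum of these two first-quadrant vectors is still in the first quadrant
sum_stays_inside = in_first_quadrant(vector1 + vector2)

# A negative scalar multiple leaves the subset, so the subset is not a vector space
negative_multiple_stays_inside = in_first_quadrant(-1 * vector1)

print("Sum stays inside:", sum_stays_inside)
print("Negative multiple stays inside:", negative_multiple_stays_inside)
```

The full plane has no such failure, because any real scalar times a vector with two real components is again a vector with two real components.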

----------

# 3. Convexity

**Exercise**: Write a function to determine if a set of points forms a convex polygon. Plot the points and the polygon formed by them. Use this to show examples of convex and non-convex sets.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

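def is_convex(points):
    # True when every input point is a vertex of the convex hull,
    # i.e. no point lies in the interior of the hull
    hull = ConvexHull(points)
    return len(points) == len(hull.vertices)

def plot_points_and_hull(points, convex):
    # `convex` is the precomputed result of is_convex(points);
    # the parameter is named differently so it does not shadow the function
    hull = ConvexHull(points)
    plt.plot(points[:, 0], points[:, 1], 'o')
    # Each simplex is a pair of point indices describing one hull edge
    for simplex in hull.simplices:
        plt.plot(points[simplex, 0], points[simplex, 1], 'k-')

    # Label the plot based on convexity
    plt.title(f"Convex: {convex}")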


# Example points for a convex set
convex_points = np.array([[0, 0], [2, 0], [2, 2], [0, 2]])
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plot_points_and_hull(convex_points, is_convex(convex_points))

# Example points for a non-convex set (concave)
non_convex_points = np.array([[0, 0], [2, 0], [2, 2], [1, 1], [0, 2]])
plt.subplot(1, 2, 2)
plot_points_and_hull(non_convex_points, is_convex(non_convex_points))

plt.show()

## Explanation

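- **Function `is_convex`**: This function uses **`ConvexHull`** to check whether every input point is a vertex of the convex hull, by comparing the number of input points with the number of hull vertices. If the numbers are equal, the points are treated as forming a convex polygon; otherwise, at least one point lies inside the hull and the set is reported as non-convex. Note that this treats the points as an unordered set: it does not check the order in which they are connected.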
- **Function `plot_points_and_hull`**: This function plots the given points and the convex hull formed by them. It uses **`hull.simplices`** to get the edges of the convex hull and plots lines accordingly. The plot is labeled based on whether the set is convex.
- **Example Points**: We define two sets of points:
    - **`convex_points`** forms a square, a convex shape, since all points are vertices of the hull.
    - **`non_convex_points`** adds the point (1, 1), which lies inside the square. Connecting the points in the listed order gives a concave polygon, and not all points are vertices of the hull.
- **Visualization**: The plots visually demonstrate the concept of convexity. For the convex set, you see a square. For the non-convex set, the hull is still the square and the inner point (1, 1) is not one of its vertices, so the function reports the set as not convex.
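
The hull-based check above ignores the order of the points. When the points are meant to be the vertices of a polygon listed in order, a common alternative is to check that the polygon turns in the same direction at every vertex. The following is a minimal sketch of that cross-product technique, not the method used in the exercise; the function name `is_convex_polygon` is an assumption, and the test is only valid for simple (non-self-intersecting) polygons.

```python
import numpy as np

def is_convex_polygon(vertices):
    """Return True if the polygon with vertices in the given order is convex.

    Assumes a simple (non-self-intersecting) polygon; hypothetical helper.
    """
    pts = np.asarray(vertices, dtype=float)
    if len(pts) < 3:
        return False
    # Edge i goes from vertex i to vertex i + 1 (wrapping around at the end)
    edges = np.roll(pts, -1, axis=0) - pts
    next_edges = np.roll(edges, -1, axis=0)
    # z-component of the 2D cross product gives the turn direction at each vertex
    cross = edges[:, 0] * next_edges[:, 1] - edges[:, 1] * next_edges[:, 0]
    # Ignore vertices where consecutive edges are collinear
    nonzero = cross[~np.isclose(cross, 0.0)]
    if nonzero.size == 0:
        return False  # all points collinear: degenerate polygon
    return bool(np.all(nonzero > 0) or np.all(nonzero < 0))

square = [[0, 0], [2, 0], [2, 2], [0, 2]]
arrow = [[0, 0], [2, 0], [2, 2], [1, 1], [0, 2]]

print("Square convex:", is_convex_polygon(square))
print("Arrow convex:", is_convex_polygon(arrow))
```

On the two example point sets this agrees with `is_convex`, but it would also flag a concave polygon whose vertices all lie on the hull when visited in a different order.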

---------

# 4. Bayes' Theorem

**Exercise**: Implement Bayes' theorem to calculate the probability of an event given prior knowledge. For instance, calculate the probability of having a disease given a positive test result, using hypothetical probabilities for the test accuracy and disease prevalence.

As an example, we calculate the probability of having a disease given a positive test result, using the following hypothetical values:

- The prior probability of having the disease (***P(Disease)***) is ***0.01*** (***1%*** of the population has the disease).
    
- The likelihood of testing positive given the disease (***P(Positive∣Disease)***) is ***0.95*** (the test correctly identifies ***95%*** of those with the disease).
       
- The probability of testing positive given no disease (***P(Positive∣No Disease)***) is ***0.05*** (the test incorrectly identifies ***5%*** of healthy individuals as having the disease). 

First, we need to calculate the evidence ***P(Positive)***, which is the total probability of testing positive, whether or not the person has the disease. This can be calculated as:

***P(Positive)*** = ***P(Positive∣Disease)*** × ***P(Disease)*** + ***P(Positive∣No Disease)*** × ***P(No Disease)***

Then, we can apply Bayes' theorem to find ***P(Disease∣Positive)***.
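
With the hypothetical values above, the two steps work out as follows. This is plain arithmetic on the stated numbers, with the final result rounded to three decimals:

$$
P(\text{Positive}) = 0.95 \times 0.01 + 0.05 \times 0.99 = 0.0095 + 0.0495 = 0.059
$$

$$
P(\text{Disease} \mid \text{Positive}) = \frac{P(\text{Positive} \mid \text{Disease}) \times P(\text{Disease})}{P(\text{Positive})} = \frac{0.0095}{0.059} \approx 0.161
$$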

In [None]:
def bayes_theorem(prior, likelihood, false_positive_rate):
    # Calculate the evidence P(Positive) with the law of total probability
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Apply Bayes' theorem to get P(Disease | Positive)
    posterior = (likelihood * prior) / evidence
    return posterior

# Hypothetical probabilities
prior_probability = 0.01  # P(Disease), the disease prevalence
likelihood_positive_given_disease = 0.95  # P(Positive | Disease)
false_positive_rate = 0.05  # P(Positive | No Disease)

# Calculate the probability of having the disease given a positive test result
probability_disease_given_positive = bayes_theorem(prior_probability, likelihood_positive_given_disease, false_positive_rate)

print(f"The probability of having the disease given a positive test result is: {probability_disease_given_positive:.2f}")
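
The printed posterior of about 0.16 can look surprisingly low for a test that is right 95% of the time. The main driver is the low prior. The sketch below illustrates this under stated assumptions: the function name `posterior_given_positive` and the extra prevalence values 0.10 and 0.50 are hypothetical and chosen only for illustration, while the test characteristics are the ones given above.

```python
def posterior_given_positive(prior, sensitivity=0.95, false_positive_rate=0.05):
    """P(Disease | Positive) via Bayes' theorem for a given prevalence (hypothetical helper)."""
    # Total probability of a positive result across diseased and healthy people
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical prevalences; only 0.01 comes from the exercise
for prior in (0.01, 0.10, 0.50):
    posterior = posterior_given_positive(prior)
    print(f"prior = {prior:.2f} -> posterior = {posterior:.3f}")
```

As the prevalence grows, true positives make up a larger share of all positive results, so the posterior rises with the prior.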

----------

# 5. Covariance

**Exercise**: Calculate the covariance between two datasets. Generate two sets of random data that are positively correlated, and another set that is negatively correlated. Calculate and interpret the covariance of these datasets.

To calculate the covariance between two datasets, we will:
1. Generate two sets of random data that are positively correlated.
2. Generate another set of data that is negatively correlated with one of the previous sets.
3. Use the **`calculate_covariance`** function defined below to calculate and interpret the covariance of these datasets.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def calculate_covariance(x, y):
    # np.cov returns the 2x2 sample covariance matrix; the off-diagonal entry is cov(x, y)
    return np.cov(x, y)[0, 1]

# Generate a set of random data for X
np.random.seed(0)  # Ensure reproducibility
x = np.random.rand(100) * 100

# Generate a positively correlated set of data for Y
y_positively_correlated = x * 0.5 + (np.random.rand(100) * 10)

# Generate a negatively correlated set of data for Y
y_negatively_correlated = -x * 0.5 + (100 + np.random.rand(100) * 10)

# Calculate covariance for the positively correlated datasets
cov_pos = calculate_covariance(x, y_positively_correlated)

# Calculate covariance for the negatively correlated datasets
cov_neg = calculate_covariance(x, y_negatively_correlated)

# Output the covariance values
print(f"Covariance of the positively correlated datasets: {cov_pos}")
print(f"Covariance of the negatively correlated datasets: {cov_neg}")

# Plotting to visualize
plt.figure(figsize=(14, 6))

# Positively correlated datasets
plt.subplot(1, 2, 1)
plt.scatter(x, y_positively_correlated, color='blue')
plt.title('Positively Correlated Datasets')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)

# Negatively correlated datasets
plt.subplot(1, 2, 2)
plt.scatter(x, y_negatively_correlated, color='red')
plt.title('Negatively Correlated Datasets')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)

plt.tight_layout()
plt.show()

## Explanation

- **Covariance Calculation**: The **`calculate_covariance`** function computes the covariance between two arrays, **`x`** and **`y`**. A positive value of covariance indicates a positive relationship between the variables, whereas a negative value indicates a negative relationship.
- **Data Generation**:
    - For positively correlated data, **`y_positively_correlated`** is generated as a function of **`x`** with a positive slope and some random noise added to introduce variability.
    - For negatively correlated data, **`y_negatively_correlated`** is generated as a function of **`x`** with a negative slope, plus a constant offset and some random noise. The offset shifts the values upward but does not affect the covariance.
- **Visualization**: The scatter plots provide a visual representation of the correlation between the datasets. The positively correlated dataset shows an upward trend: as **`x`** increases, **`y`** also increases. Conversely, the negatively correlated dataset shows a downward trend.
- **Covariance Interpretation**:
    - A positive covariance value for the positively correlated dataset confirms the positive linear relationship between **`x`** and **`y`**.
    - A negative covariance value for the negatively correlated dataset confirms the inverse relationship between **`x`** and **`y`**.
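
To make the calculation behind `np.cov` concrete, the sketch below computes the sample covariance directly from its definition and compares it with NumPy's result. It is an illustration under assumptions: the helper name `sample_covariance` is hypothetical, and the data come from a fresh `np.random.default_rng(0)` generator, so the values differ from those in the exercise cell.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator, matching np.cov's default (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Average product of deviations from the means, with Bessel's correction
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

rng = np.random.default_rng(0)  # illustrative data, not the arrays from the exercise
x = rng.random(100) * 100
y = 0.5 * x + rng.random(100) * 10

manual = sample_covariance(x, y)
library = np.cov(x, y)[0, 1]

print(f"Manual covariance: {manual}")
print(f"np.cov covariance: {library}")
```

The two values agree up to floating-point error, which confirms that the off-diagonal entry of the matrix returned by `np.cov` is the sample covariance of the two arrays.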

---------------

# 6. Correlation

**Exercise**: Similar to the covariance exercise, but calculate the Pearson correlation coefficient. Discuss the difference in interpretation between covariance and correlation.

To calculate the Pearson correlation coefficient, we use the same datasets as in the covariance exercise. Comparing the two results demonstrates the difference in interpretation between covariance and correlation.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def calculate_correlation(x, y):
    return np.corrcoef(x, y)[0, 1]

# Regenerate the same data as in the covariance exercise (same seed, same call order)
# so that this cell runs on its own
np.random.seed(0)  # Ensure reproducibility
x = np.random.rand(100) * 100
y_positively_correlated = x * 0.5 + (np.random.rand(100) * 10)
y_negatively_correlated = -x * 0.5 + (100 + np.random.rand(100) * 10)

# Calculate correlation for the positively correlated datasets
corr_pos = calculate_correlation(x, y_positively_correlated)

# Calculate correlation for the negatively correlated datasets
corr_neg = calculate_correlation(x, y_negatively_correlated)

# Output the correlation values
print(f"Correlation of the positively correlated datasets: {corr_pos}")
print(f"Correlation of the negatively correlated datasets: {corr_neg}")

# Plotting to visualize
plt.figure(figsize=(14, 6))

# Positively correlated datasets
plt.subplot(1, 2, 1)
plt.scatter(x, y_positively_correlated, color='blue')
plt.title(f'Positively Correlated Datasets\nCorrelation: {corr_pos:.2f}')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)

# Negatively correlated datasets
plt.subplot(1, 2, 2)
plt.scatter(x, y_negatively_correlated, color='red')
plt.title(f'Negatively Correlated Datasets\nCorrelation: {corr_neg:.2f}')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)

plt.tight_layout()
plt.show()

## Explanation

- **Correlation Calculation**: The **`calculate_correlation`** function computes the Pearson correlation coefficient between two arrays, **`x`** and **`y`**, which measures the linear relationship between the variables. The coefficient ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship), with 0 indicating no linear relationship.
- **Visualization**: The scatter plots visually demonstrate the linear relationship between the datasets, with titles indicating the calculated correlation coefficient for each.
- **Difference in Interpretation**:
    - **Covariance** indicates the direction of the linear relationship between two variables, but it is not normalized: its magnitude depends on the units and scales of the variables, so the strength of the relationship cannot be judged from its value alone.
    - **Correlation**, on the other hand, is dimensionless and normalized to the range [-1, 1], providing both the direction and the strength of the linear relationship between two variables.
- **Interpretation**:
    - The correlation for the first dataset is positive and close to 1, indicating a strong positive linear relationship between **`x`** and **`y`**.
    - The correlation for the second dataset is negative and close to -1, indicating a strong negative linear relationship.
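
The difference in interpretation can be made concrete by rescaling one variable. The sketch below is illustrative rather than part of the exercise: the helper name `pearson_from_covariance`, the rescaling factor of 100, and the `np.random.default_rng(0)` data are assumptions chosen here. It computes Pearson's coefficient as the covariance divided by the product of the standard deviations, then shows that rescaling changes the covariance but not the correlation.

```python
import numpy as np

def pearson_from_covariance(x, y):
    """Pearson r as covariance over the product of sample standard deviations (hypothetical helper)."""
    cov_xy = np.cov(x, y)[0, 1]
    return cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

rng = np.random.default_rng(0)  # illustrative data, not the arrays from the exercise
x = rng.random(100) * 100
y = 0.5 * x + rng.random(100) * 10

r_manual = pearson_from_covariance(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

# Rescale x by a factor of 100, as if changing its unit of measurement
cov_original = np.cov(x, y)[0, 1]
cov_rescaled = np.cov(100 * x, y)[0, 1]
r_rescaled = np.corrcoef(100 * x, y)[0, 1]

print(f"r (manual) = {r_manual}, r (np.corrcoef) = {r_numpy}")
print(f"Covariance: original = {cov_original}, rescaled = {cov_rescaled}")
print(f"Correlation after rescaling = {r_rescaled}")
```

The covariance grows by the same factor of 100 as the rescaled variable, while the correlation coefficient is unchanged.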