# Introduction to Dimensionality Reduction

Dimensionality reduction is a crucial strategy in fields like machine learning and data analysis. It aims to reduce the number of variables in a dataset while preserving essential information. This technique simplifies complex data, reduces noise, and improves algorithm performance. Challenges in handling high-dimensional data include increased computational demands, risk of overfitting, and difficulties in visualization.

There are two main types of dimensionality reduction methods:

1. **Feature Selection:** This approach involves selecting a subset of relevant features and discarding the rest. The goal is to identify features that have a significant impact on the outcome or problem being addressed. This process requires expertise in the domain and can be done manually or automatically using statistical or machine learning techniques. Recursive Feature Elimination (RFE) is an example where features are recursively removed based on model weights until the desired number of features is reached.

2. **Feature Extraction:** This method transforms original features into a new set of features with reduced dimensions using mathematical methods. These new features often combine aspects of the original ones, chosen to capture the most variance or information. For example, Principal Component Analysis (PCA) is a technique that transforms the original variables into a new set of uncorrelated variables, called principal components, which are linear combinations of the original variables. The first principal component captures the most variance in the data.
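The difference between the two approaches can be seen in a minimal sketch. The synthetic dataset, the logistic regression estimator, and all variable names below (`X_demo`, `selector`, `extractor`) are assumptions made for illustration only. Feature selection returns a subset of the original columns unchanged, whereas feature extraction builds new columns from all of them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): 200 samples, 10 features
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=2, random_state=0
)

# Feature selection: RFE keeps 3 of the original columns, values unchanged
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
X_selected = selector.fit_transform(X_demo, y_demo)
kept_columns = np.flatnonzero(selector.support_)

# Feature extraction: PCA builds 3 new columns, each a linear combination
# of all 10 original features
extractor = PCA(n_components=3)
X_extracted = extractor.fit_transform(X_demo)

print("Columns kept by RFE:", kept_columns)
print("Selected shape: ", X_selected.shape)
print("Extracted shape:", X_extracted.shape)
print("PCA weights per new feature:", extractor.components_.shape)
```

Both results have three columns, but only the RFE output can be traced back to specific original features.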

## Benefits of Reducing Dimensions in Data

1. **Easier Data Understanding:** High-dimensional data can be overwhelming and hard to grasp. Reducing dimensions makes the data easier to interpret, aiding in more effective analysis.
2. **Improved Algorithm Speed:** High numbers of features can slow down algorithms and increase the need for computational resources. Reducing dimensions helps algorithms run faster and more efficiently.
3. **Reduction of Irrelevant Information:** Often, high-dimensional data includes unnecessary information that can negatively affect a model's performance. Dimensionality reduction helps to focus on the most important features.
4. **Better Data Visualization:** It's difficult to visualize data beyond three dimensions. Dimensionality reduction allows for visual representations that can be more easily understood, providing clearer insights.

## Key Points and Hurdles in Reducing Data Dimensions

1. **Balancing Complexity and Information Retention:** The process of dimensionality reduction necessitates a balance between simplifying the data and preserving valuable information. It's essential to assess if the information lost is acceptable for the task's goals.
2. **Technique Selection:** Choosing the right dimensionality reduction method depends on the data's nature, the goals of the analysis, and how the reduced data will be used. Careful testing and evaluation are crucial to find the best method.
3. **Hyperparameter Tuning:** Some methods, like Principal Component Analysis (PCA), require setting hyperparameters, such as the number of components to keep.
4. **Overcoming the Curse of Dimensionality:** The "curse of dimensionality" refers to various phenomena that arise when analyzing high-dimensional data, such as increased data sparsity. Dimensionality reduction can help overcome these issues and reveal the underlying data structure.
5. **Preventing Overfitting:** Dimensionality reduction can help prevent models from overfitting, but there's also a risk of overfitting during the reduction process. Using regularization methods can help avoid this and ensure the reduction is effective.
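As a hedged illustration of the trade-off between simplification and information retention, the number of PCA components can be chosen from a target fraction of retained variance instead of being fixed in advance. The sketch below assumes synthetic data with three underlying directions plus a little noise; the 95% threshold and all names (`X_demo`, `pca_demo`) are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 10 observed features driven by 3 latent directions plus noise
rng_demo = np.random.default_rng(0)
latent = rng_demo.normal(size=(500, 3))
mixing = rng_demo.normal(size=(3, 10))
X_demo = latent @ mixing + 0.05 * rng_demo.normal(size=(500, 10))

# A float in (0, 1) asks scikit-learn for the smallest number of components
# whose cumulative explained variance ratio exceeds that fraction
pca_demo = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca_demo.fit_transform(X_demo)

cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
print("Components kept:", pca_demo.n_components_)
print("Cumulative explained variance ratio:", cumulative.round(4))
print("Reduced shape:", X_reduced.shape)
```

Whether 95% is an acceptable amount of retained variance depends on the downstream task, so the threshold should be evaluated like any other hyperparameter.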

## Introduction to Principal Component Analysis (PCA)

**Principal Component Analysis (PCA)** is a statistical technique used for dimensionality reduction. It simplifies the complexity in high-dimensional data while retaining trends and patterns. PCA achieves this by transforming the original variables into a new set of variables, the principal components, which are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables.

### Core Concepts of PCA:

- **Principal Axes**: These are the directions in the feature space that maximize the variance of the data. The data is projected onto these axes to obtain the principal components.
- **Principal Components (Scores)**: These are the new features formed from linear combinations of the original features, aligned with the principal axes.
- **Loadings**: The weights assigned to the original variables that define the principal axes.
- **Explained Variance**: The amount of variance captured by each principal component.
- **Explained Variance Ratio**: The proportion of the dataset's total variance that is explained by each principal component.

### Terminology Mapping:

The table below maps the PCA terminology across different contexts:

| Concept | sklearn Attribute | Statistical Term | Other Descriptions |
|---------|--------------------|------------------|--------------------|
| Principal Axes | `pca.components_` | Loadings | Eigenvectors of the covariance matrix |
| Explained Variance | `pca.explained_variance_` | Variance Explained | Eigenvalues of the covariance matrix |
| Explained Variance Ratio | `pca.explained_variance_ratio_` | Proportion of Variance Explained | - |
| Principal Components | `pca.transform(X)` | Scores | Transformed/Projected Features |
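To make the mapping concrete, the short sketch below fits PCA on a small random matrix and prints the shape of each object listed in the table. The data and the names `X_demo`, `pca_demo`, and `scores` are placeholders introduced only for this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples, 4 features
rng_demo = np.random.RandomState(0)
X_demo = rng_demo.randn(100, 4)

pca_demo = PCA(n_components=2).fit(X_demo)
scores = pca_demo.transform(X_demo)

# Principal axes: one row per component, one column per original feature
print("components_:              ", pca_demo.components_.shape)
# Explained variance: one value per component
print("explained_variance_:      ", pca_demo.explained_variance_.shape)
# Explained variance ratio: one value per component
print("explained_variance_ratio_:", pca_demo.explained_variance_ratio_.shape)
# Principal components (scores): one row per sample, one column per component
print("transform(X):             ", scores.shape)
```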


---

<font color='Red'><b>Note:</b></font> The terminology in PCA can be extensive and may vary across different fields and applications.

---


<font color='Blue'><b>Example - Principal Component Analysis (PCA):</b></font>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random state for reproducibility
rng = np.random.RandomState(0)

# Generating the data
X = np.dot(rng.rand(2, 2), rng.randn(2, 300)).T

# Creating a DataFrame
df = pd.DataFrame(X, columns=['X1', 'X2'])
display(df)

# Set the custom style from the provided URL
plt.style.use('https://raw.githubusercontent.com/HatefDastour/ENSF444/main/Files/mystyle.mplstyle')

# Create the plot with fig and ax
fig, ax = plt.subplots()

# Scatter plot with specified attributes
ax.scatter(df['X1'], df['X2'], fc='blue', ec='navy', alpha=0.3)

# Set axis aspect ratio to equal
ax.axis('equal')

# Set X and Y axis labels
ax.set(xlabel=r'$X_1$', ylabel= r'$X_2$')

# Set the plot title with specified attributes
ax.set_title('Sample Data', fontsize=14, weight='bold', color='navy')

# Adjust layout for better presentation
plt.tight_layout()

1. **PCA Initialization**: `PCA(n_components=2)` initializes the PCA process to find the two principal components that capture the most variance in the data.

In [None]:
from sklearn.decomposition import PCA

# Initialize PCA with 2 components (fitting to X happens in the next step)
pca = PCA(n_components=2)

2. **Fitting PCA**: `pca.fit(X)` computes the principal components for the dataset `X`. This involves finding the eigenvalues and eigenvectors of the covariance matrix of `X`, which represent the explained variance and the directions of maximum variance in the data, respectively.


In [None]:
pca.fit(X)
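The link between fitting and the eigendecomposition of the covariance matrix can be checked numerically. The sketch below regenerates data with the same recipe as the example (under the separate names `X_demo` and `pca_demo`, which are introduced here for illustration) and compares NumPy's eigendecomposition with the fitted attributes. scikit-learn computes the result through a singular value decomposition of the centered data, which is mathematically equivalent; eigenvectors are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as the example, under a separate name
rng_demo = np.random.RandomState(0)
X_demo = np.dot(rng_demo.rand(2, 2), rng_demo.randn(2, 300)).T

pca_demo = PCA(n_components=2).fit(X_demo)

# Sample covariance matrix (n - 1 denominator), features in columns
cov = np.cov(X_demo, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder to descending
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order].T  # rows are eigenvectors, like components_

print("Eigenvalues match explained_variance_:",
      np.allclose(eigenvalues, pca_demo.explained_variance_))
print("Eigenvectors match components_ (up to sign):",
      np.allclose(np.abs(eigenvectors), np.abs(pca_demo.components_)))
```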

3. **Principal Components**: `pca.components_` are the principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained variance, with the first component having the highest variance.

In [None]:
# Create a DataFrame for principal components
p_components = pd.DataFrame(pca.components_, columns=['X0', 'X1'])
display(p_components.style.hide(axis="index"))

In [None]:
# Create subplots for each principal component
f, axs = plt.subplots(p_components.shape[0], 1, figsize=(4, 5), sharex=True)

# Adjust spacing between subplots
f.subplots_adjust(hspace=0.5)

# Plot bar charts for each principal component
colors = ['blue', 'green']  # Choose colors for better visibility
for i, ax in enumerate(axs):
    p_components.loc[i, :].plot(kind='bar', fc = 'none', ax=ax, ec=colors[i], lw = 3, hatch = '---')
    ax.set_ylabel(f'PCA{i + 1}')
    ax.set_ylim([-1, 1])
    ax.grid(True)
    ax.set_xticklabels([])  # Remove x-axis labels for better clarity

# Set title for the subplot
f.suptitle('Principal Component Axes (Loadings)', fontsize=14, weight='bold')

# Set x-axis label for the last subplot
axs[-1].set_xlabel('Features')

# Adjust layout for better presentation
plt.tight_layout()

# Show the plot
plt.show()

The numbers shown are the **loadings** of each principal component. Loadings are the coefficients that describe how much each original variable contributes to a principal component. Here's what they mean:

- **First Principal Component (Row 0):**
  - **X0 Loading**: -0.743255
  - **X1 Loading**: -0.669008

  This component is a weighted combination of the original variables X0 and X1. The negative signs indicate that as X0 increases, the value on this principal component decreases, and similarly for X1. The values are relatively close in magnitude, which suggests that both original variables contribute similarly to this component.

- **Second Principal Component (Row 1):**
  - **X0 Loading**: -0.669008
  - **X1 Loading**: 0.743255

  For the second component, the signs are opposite, indicating that the variables are inversely related in this component. As X0 increases, the value of this principal component decreases, but as X1 increases, the value of this component increases. The change in sign between the loadings of X0 and X1 indicates that they have a reverse effect on the second principal component.

In PCA, each principal component is orthogonal to the others, meaning they capture different aspects or patterns in the data. The first principal component captures the direction of maximum variance, and the second captures the direction of the next highest variance, subject to being orthogonal to the first.

These components can be used to visualize high-dimensional data in a lower-dimensional space, often revealing clustering or patterns that were not apparent in the original space. They also serve to reduce the dimensionality of the data for further analysis, such as regression or classification, by focusing on the components that capture the most variance and thus, most of the information in the dataset.
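The orthogonality described above can be verified directly: if the rows of `components_` are orthogonal unit vectors, multiplying the matrix by its transpose gives the identity matrix. This is a small check on data regenerated with the example's recipe; `X_demo`, `pca_demo`, and `gram` are names assumed for this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as the example, under a separate name
rng_demo = np.random.RandomState(0)
X_demo = np.dot(rng_demo.rand(2, 2), rng_demo.randn(2, 300)).T

pca_demo = PCA(n_components=2).fit(X_demo)

# Dot products between every pair of principal axes
gram = pca_demo.components_ @ pca_demo.components_.T

print(gram.round(6))
print("Axes are orthonormal:", np.allclose(gram, np.eye(2)))
```

The diagonal entries equal 1 because each axis has unit length, and the off-diagonal entries equal 0 because the axes are perpendicular.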

4. **Explained Variance**: `pca.explained_variance_` indicates the amount of variance each principal component holds. This tells us how much information (variability) is captured by each principal component.

In [None]:
# Print the explained variance of each principal component
explained_variance = pd.DataFrame(pca.explained_variance_, columns = ['Explained Variance'], index=['PCA1', 'PCA2'])
display(explained_variance)

In [None]:
# Plot bar chart for explained variance
fig, ax1 = plt.subplots(figsize=(4, 4))
explained_variance.plot.bar(legend=False, ax=ax1, fc='none', ec=['blue', 'green'], lw = 3, hatch = '---')
ax1.set_title('Explained Variance')
plt.tight_layout()

The "Explained Variance" values represent how much variance each principal component (PCA1 and PCA2) captures from the dataset after performing PCA.

- **PCA1: 1.521875** - This value indicates that the first principal component accounts for approximately 1.522 units of the total variance in the data. Since this number is relatively large, it suggests that PCA1 captures a significant portion of the information in the dataset.
- **PCA2: 0.011287** - This value shows that the second principal component accounts for approximately 0.011 units of the total variance. This is much smaller compared to PCA1, which implies that PCA2 captures a very small portion of the information in the data.

In PCA, the explained variance is used to measure the importance of each principal component. Higher values mean that the component captures more of the variability in the data. In our case, PCA1 is the dominant component, capturing most of the variance, while PCA2 contributes very little. This is typical in PCA, where the first few components capture the majority of the variance, and the rest capture progressively less, allowing for dimensionality reduction by focusing on the components that contain the most information.

5. **Explained Variance Ratio**: `pca.explained_variance_ratio_` represents the proportion of the dataset's total variance that each principal component accounts for. This helps in understanding the contribution of each principal component to the overall data structure.

In [None]:
explained_variance_ratio = pd.DataFrame(pca.explained_variance_ratio_,  columns = ['Explained Variance Ratio'], index=['PCA1', 'PCA2'])
display(explained_variance_ratio)

In [None]:
# Plot bar chart for explained variance ratio
fig, ax2 = plt.subplots(figsize=(4, 4))

explained_variance_ratio.plot.bar(legend=False, ax=ax2, fc='none', ec=['blue', 'green'], lw = 3, hatch = '---')
ax2.set_title('Explained Variance Ratio')

# Adjust layout for better presentation
plt.tight_layout()

The "Explained Variance Ratio" represents the proportion of the dataset's total variance that is explained by each principal component. In the context of our data:

- **PCA1: 0.992638** - This means that the first principal component (PCA1) explains approximately 99.26% of the variance in the dataset. It captures the majority of the information contained in the original variables.
- **PCA2: 0.007362** - This indicates that the second principal component (PCA2) explains about 0.74% of the variance. This is a much smaller proportion compared to PCA1, suggesting that PCA2 captures a very small amount of the information in the data.

Together, PCA1 and PCA2 account for 100% of the variance in the dataset, with PCA1 being the dominant component. The explained variance ratio is useful for understanding the significance of each principal component in representing the dataset's structure. In this case, PCA1 is significantly more important than PCA2.
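The relationship between the two quantities can be sketched as a one-line calculation: each ratio is the component's explained variance divided by the total variance of the data, which is the sum of the per-feature sample variances. The data is regenerated with the example's recipe, and `X_demo`, `pca_demo`, `total_variance`, and `ratio_manual` are names assumed for this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as the example, under a separate name
rng_demo = np.random.RandomState(0)
X_demo = np.dot(rng_demo.rand(2, 2), rng_demo.randn(2, 300)).T

pca_demo = PCA(n_components=2).fit(X_demo)

# Total variance: sum of the sample variances (n - 1 denominator) of all features
total_variance = X_demo.var(axis=0, ddof=1).sum()

# Ratio computed by hand from the explained variances
ratio_manual = pca_demo.explained_variance_ / total_variance

print("Manual ratio: ", ratio_manual)
print("sklearn ratio:", pca_demo.explained_variance_ratio_)
print("Ratios sum to 1 when all components are kept:",
      np.isclose(ratio_manual.sum(), 1.0))
```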

6. **Data Transformation**: The original data `X` is projected onto the principal component axes to transform it (into a lower-dimensional space). This results in new features (principal components) that are linear combinations of the original features, with the most significant features (in terms of variance) coming first.

In [None]:
# Transform the original data using PCA
X_pca = pca.transform(X)

# Create a DataFrame for the transformed data with PCA components
X_pca = pd.DataFrame(X_pca, columns=['PCA1', 'PCA2'])

# Display the transformed data with PCA components
display(X_pca)

The table we are looking at represents the transformed dataset after applying Principal Component Analysis (PCA). Each row corresponds to an observation in the original dataset, and the columns 'PCA1' and 'PCA2' are the coordinates of these observations in the new feature space defined by the first and second principal components, respectively.

Here's what the values mean:

- **PCA1**: This column contains the coordinates along the first principal component. This component captures the majority of the variance in the data, and the values indicate the position of each observation along this axis.
- **PCA2**: This column has the coordinates along the second principal component, which is orthogonal to the first and captures the remaining variance not accounted for by the first component.

The numbers in the table are the coordinates of the original observations in the new coordinate system. Because both components are kept here, the transformation only centers and rotates the data, so no information is lost and the dimensionality is unchanged. Dimensionality is reduced only when trailing components are dropped, which preserves as much of the data's variation as possible for the number of components kept. This makes high-dimensional data easier to analyze and visualize.

For example:
- A value of -2.232783 in PCA1 for the first observation means that this observation lies far along the negative direction of the first principal component axis.
- A value of 0.090926 in PCA2 for the same observation indicates a small positive coordinate along the second principal component axis.
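The projection itself is a short piece of linear algebra: subtract the per-feature mean, then take the dot product with each principal axis. The sketch below reproduces `transform` by hand under the default settings (no whitening); the data is regenerated with the example's recipe, and `X_demo`, `pca_demo`, and `scores_manual` are names assumed for this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as the example, under a separate name
rng_demo = np.random.RandomState(0)
X_demo = np.dot(rng_demo.rand(2, 2), rng_demo.randn(2, 300)).T

pca_demo = PCA(n_components=2).fit(X_demo)

# Center the data, then project onto the principal axes (rows of components_)
scores_manual = (X_demo - pca_demo.mean_) @ pca_demo.components_.T

print("First observation, manual:   ", scores_manual[0])
print("First observation, transform:", pca_demo.transform(X_demo)[0])
print("All scores match:", np.allclose(scores_manual, pca_demo.transform(X_demo)))
```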

In [None]:
import pandas as pd

# Correlation between the principal component scores (left unrounded so the
# near-zero off-diagonal value remains visible)
corr = pd.DataFrame(X_pca).corr()
display(corr)

The correlation table for the transformed dataset using PCA shows the correlation coefficients between the principal components. Here's what the values indicate:

- **On the Main Diagonal (PCA1 with PCA1, PCA2 with PCA2):**
  - The correlation coefficient of 1 (or \(1.000000e+00\) in scientific notation) indicates a perfect positive correlation. This is expected because it's the correlation of each component with itself.

- **On the Off-Diagonal (PCA1 with PCA2, and vice versa):**
  - The correlation coefficient close to 0 (or \(2.096479e-15\) in scientific notation, which is essentially zero) indicates no linear correlation between the different principal components.

**This result is a fundamental property of PCA:** the principal axes are orthogonal to each other, and the resulting principal components (scores) are uncorrelated. Because no variance is shared between components, each component's explained variance can be counted separately, and the variance concentrates in the leading components. This is what makes it possible to drop trailing components without losing much information. The correlation table confirms that the transformation produced uncorrelated features from the original dataset; note that uncorrelated does not necessarily mean statistically independent.
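A closely related check looks at the covariance matrix of the scores instead of their correlation. Under the assumptions of this sketch (data regenerated with the example's recipe; `X_demo`, `pca_demo`, `scores`, and `cov_scores` are illustrative names), the off-diagonal entries vanish and the diagonal holds the explained variances.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as the example, under a separate name
rng_demo = np.random.RandomState(0)
X_demo = np.dot(rng_demo.rand(2, 2), rng_demo.randn(2, 300)).T

pca_demo = PCA(n_components=2)
scores = pca_demo.fit_transform(X_demo)

# Sample covariance of the scores (n - 1 denominator), components in columns
cov_scores = np.cov(scores, rowvar=False)

print(cov_scores.round(6))
print("Covariance is diagonal with the explained variances:",
      np.allclose(cov_scores, np.diag(pca_demo.explained_variance_)))
```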

7. **Visualization**: The bar charts for loadings and explained variance visually represent the contribution of each original feature to the principal components and the amount of variance explained by each principal component, respectively. This helps in interpreting the PCA results and understanding the data's underlying structure.

In [None]:
def plot_arrow(ax, start, end, **kwargs):
    """
    Function to plot an arrow on the given axis.

    Parameters:
    - ax: matplotlib.axes.Axes
        The axis on which the arrow will be plotted.
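    - start: list
        Starting point (x, y) of the arrow.
    - end: list
        Displacement (dx, dy) of the arrow from its starting point;
        `ax.arrow` interprets these values as offsets, not as an absolute end point.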
    - **kwargs: dict
        Additional keyword arguments for customizing arrow properties.
    """
    ax.arrow(start[0], start[1], end[0], end[1], **kwargs)

# Define arrow plot settings
arrow_settings = dict(head_width=0.1, head_length=0.2, linewidth=1.5, facecolor='black', edgecolor='black')

# Create subplots for original data and principal components
fig, ax = plt.subplots(1, 2, figsize=(10, 6))

# Plot original data and principal components
_ = ax[0].scatter(X[:, 0], X[:, 1], c='Aqua', edgecolors='DodgerBlue', s=30)
_ = ax[0].set(aspect='equal', xlim=[-4, 4], ylim=[-4, 4], xlabel=r'$X_1$', ylabel=r'$X_2$', title='Original Data')
for length, vector in zip(pca.explained_variance_, pca.components_):
    v = vector * 3 * np.sqrt(length)
    plot_arrow(ax[0], start=pca.mean_, end=v, **arrow_settings)

# Plot principal components
X_pca = pca.transform(X)
_ = ax[1].scatter(X_pca[:, 0], X_pca[:, 1], c='MistyRose', edgecolors='OrangeRed', s=30)
plot_arrow(ax[1], start=[0, 0], end=[0, 3], **arrow_settings)
plot_arrow(ax[1], start=[0, 0], end=[3, 0], **arrow_settings)
_ = ax[1].set(aspect='equal', xlabel='Principal Component 1',
              ylabel='Principal Component 2',
              title='Transformed', xlim=(-4, 4), ylim=(-4, 4))

# Adjust layout for better presentation
plt.tight_layout()

Here's a breakdown of what each part of the plot represents:

1. **Left Panel - Original Data:**
   - The scatter plot displays the original data points in their original feature space, with `X1` and `X2` as axes.
   - Arrows represent the principal axes (directions of maximum variance) found by PCA. The length of each arrow is proportional to the square root of the component's explained variance (its standard deviation), multiplied by 3 for visibility.
   - The starting point of the arrows is the mean of the dataset, indicating the center from where the principal components are drawn.

2. **Right Panel - Transformed Data:**
   - This scatter plot shows the data after it has been transformed by PCA. The axes here are the first and second principal components.
   - The arrows in this plot are drawn from the origin (0,0) along the axes of the principal components. They have a fixed length of 3 so that they are visible on the plot, and they do not represent variance as in the left panel.
   - The transformed data points are now plotted according to their values in the new coordinate system defined by the principal components.

This visualization helps to illustrate how PCA transforms the data from the original feature space to a new coordinate system whose axes are the principal components. The left panel shows where the data originally lies and the directions of maximum variance, while the right panel shows the same data after centering and rotation, where each point's position is determined by its scores on the principal components. Since both components are kept, no dimension has been dropped yet. This is useful for understanding the effect of PCA on the data and for further analysis like clustering or classification in the transformed space.

## Transforming Data Between Representations

Here, we use Principal Component Analysis (PCA) to transform a dataset from a higher-dimensional space to a lower-dimensional one while preserving as much variance as possible.

In [None]:
pca = PCA(n_components=1)
pca.fit(X)
X_pca = pca.transform(X)
print("original shape:   ", X.shape)
print("transformed shape:", X_pca.shape)

In [None]:
# Create subplots for original data and principal components
fig, ax = plt.subplots(1, 2, figsize=(10, 6))

# Plot original data and principal components
_ = ax[0].scatter(X[:, 0], X[:, 1], c='Aqua', edgecolors='DodgerBlue', s=30)
_ = ax[0].set(aspect='equal', xlim=[-4, 4], ylim=[-4, 4], xlabel=r'$X_1$', ylabel=r'$X_2$', title='Original Data')
for length, vector in zip(pca.explained_variance_, pca.components_):
    v = vector * 3 * np.sqrt(length)
    plot_arrow(ax[0], start=pca.mean_, end=v, **arrow_settings)

# Plot principal components
X_pca = pca.transform(X)
_ = ax[1].scatter(X_pca, np.zeros(X_pca.shape), c='MistyRose', edgecolors='OrangeRed', s=30)
plot_arrow(ax[1], start=[0, 0], end=[3, 0], **arrow_settings)
_ = ax[1].set(aspect='equal', xlabel='Principal Component 1',
              ylabel='',  # only one component is kept, so there is no second axis
              title='Transformed', xlim=(-4, 4), ylim=(-4, 4))

# Adjust layout for better presentation
plt.tight_layout()

The figure is a side-by-side comparison of the original dataset and its transformation through Principal Component Analysis (PCA).

- **Left Panel - Original Data**: This panel shows the scatter plot of the original dataset in its two-dimensional feature space. Because only one component is kept, a single arrow is drawn. It represents the first principal axis, with its length proportional to the square root of the explained variance, and it indicates the direction in which the data varies the most.

- **Right Panel - PCA Transformed Data**: This panel displays the data after being transformed by PCA into a one-dimensional space along the first principal component. Since it's a one-dimensional representation, all data points are aligned horizontally (at zero on the y-axis). The arrow indicates the first principal component's direction.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(8, 8))

# Map the 1-D scores back into the original 2-D feature space
X_new = pca.inverse_transform(X_pca)
ax.scatter(X[:, 0], X[:, 1], c='Aqua', edgecolors='DodgerBlue', s=30)
ax.scatter(X_new[:, 0], X_new[:, 1], c='MistyRose', edgecolors='OrangeRed', s=30, alpha = 0.8)
# Both point sets live in the original feature space, so the axes are X1 and X2
_ = ax.set(aspect='equal', xlabel=r'$X_1$', ylabel=r'$X_2$',
           title='Data projected onto PCA1 axis in original space', xlim=(-4, 3), ylim=(-4, 3))

The figure is a scatter plot of the original dataset alongside its reconstruction, obtained by transforming the data with PCA and then mapping it back into the original feature space.

- The **blue points** represent the original data in its two-dimensional feature space.
- The **orange points** represent the data after it has been transformed to one dimension using PCA and then inversely transformed back into the two-dimensional space.

The inverse transformation maps the one-dimensional PCA data back onto the original feature space, but since PCA reduces dimensionality by projecting the data onto the direction of maximum variance (the first principal component), the orange points lie along a line. This line represents the first principal component axis in the original feature space.

The plot essentially shows what information is preserved and what is lost when you reduce the data to one principal component. The spread of the blue points shows the original variance in the data, while the alignment of the orange points along a line shows the variance captured by the first principal component. Any variance orthogonal to this line is lost in the PCA transformation. The plot is useful for understanding the effect of dimensionality reduction on the dataset.
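The amount of lost information can be quantified. In the sketch below, the sum of squared reconstruction residuals, divided by n - 1, equals the explained variance of the discarded second component. The data is regenerated with the example's recipe, and the names `X_demo`, `pca_full`, `pca_1d`, and `lost_variance` are assumptions for this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as the example, under a separate name
rng_demo = np.random.RandomState(0)
X_demo = np.dot(rng_demo.rand(2, 2), rng_demo.randn(2, 300)).T

# Full PCA (both components) for reference, and a 1-component PCA
pca_full = PCA(n_components=2).fit(X_demo)
pca_1d = PCA(n_components=1).fit(X_demo)

# Reduce to one dimension, then map back to the original 2-D space
X_back = pca_1d.inverse_transform(pca_1d.transform(X_demo))

# Residuals are the parts of each point orthogonal to the first principal axis
residual = X_demo - X_back
lost_variance = (residual ** 2).sum() / (len(X_demo) - 1)

print("Variance lost in reconstruction:", lost_variance)
print("Variance of discarded component:", pca_full.explained_variance_[1])
```

The two printed values agree up to floating-point error, which is one way to read the explained variance of a dropped component: it is the reconstruction error incurred by dropping it.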

### Visualizing High-Dimensional Data with PCA: The Case of Handwritten Digits

Visualizing data that has many dimensions can be challenging. Dimensionality reduction techniques like PCA (Principal Component Analysis) allow us to simplify this data into a form that's easier to understand and work with. This simplification is particularly useful for spotting patterns, structures, or anomalies within complex datasets.

Consider the [Optical Recognition of Handwritten Digits](https://archive.ics.uci.edu/dataset/80/optical+recognition+of+handwritten+digits) dataset. The copy bundled with scikit-learn, loaded below with `load_digits`, is a collection of 1797 digit images, each an 8×8 grid described by 64 pixel values. These pixels, indicating varying levels of grayscale from 0 to 16, serve as individual features, making each image a 64-dimensional point in space. Given its complexity, this dataset is ideal for demonstrating how PCA can facilitate visualization.

By applying PCA, we can compress the 64-dimensional data down to just 2 dimensions, distilling the essence of the data while retaining its most significant variations. Consequently, we can represent the images on a two-dimensional plot, allowing us to observe the relationships between different digits visually.

In [None]:
from sklearn.datasets import load_digits
digits = load_digits()
print(digits.DESCR)

import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(6, 6))
for idx, ax in enumerate(axs.ravel()):
    ax.imshow(digits.data[idx].reshape((8, 8)), cmap=plt.cm.binary)
    ax.axis("off")
_ = fig.suptitle("A selection from the 64-dimensional digits dataset", fontsize=16)
plt.tight_layout()

To gain clearer insights into how these data points relate to each other, we can utilize PCA to condense them into a simpler form with fewer dimensions, like two.

In [None]:
# Import necessary libraries
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

# Load the digits dataset
digits = load_digits()

# Apply PCA to project data to 2 dimensions
pca = PCA(n_components=2)  # project from 64 to 2 dimensions
projected = pca.fit_transform(digits.data)

# Print shapes before and after PCA
print("Original data shape:", digits.data.shape)
print("Projected data shape:", projected.shape)

# Create a scatter plot using Matplotlib
fig, ax = plt.subplots(1, 1, figsize=(9.5, 9.5))
scatter = ax.scatter(projected[:, 0], projected[:, 1],
                     c=digits.target, cmap='Spectral',
                     edgecolor='k', linewidth=0.5,
                     alpha=0.7, s=40)

# Set labels and title
ax.set(xlabel='Component 1', ylabel='Component 2',
       title='PCA Projection of Handwritten Digits')

# Add legend inside on the top right
legend = ax.legend(*scatter.legend_elements(), title="Digits",
                   loc='upper right', fontsize=12)

# Display the plot with tight layout
plt.tight_layout()

In [None]:
# Import necessary libraries
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

# Load the digits dataset
digits = load_digits()

# Apply PCA to project data to 3 dimensions
pca = PCA(n_components=3)  # project from 64 to 3 dimensions
projected = pca.fit_transform(digits.data)

# Print shapes before and after PCA
print("Original data shape:", digits.data.shape)
print("Projected data shape:", projected.shape)

# Create a 3D scatter plot
fig = plt.figure(figsize=(9, 9))
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(projected[:, 0], projected[:, 1], projected[:, 2],
                     c=digits.target, cmap='Spectral',
                     edgecolor='k', linewidth=0.5,
                     alpha=0.7, s=40)

# Set axis labels
ax.set_xlabel('First Principal Component')
ax.set_ylabel('Second Principal Component')
ax.set_zlabel('Third Principal Component')

# Set view angle
ax.view_init(elev=20., azim=30)

# Add legend inside on the right
legend = ax.legend(*scatter.legend_elements(), title="Digits",
                   loc='right', fontsize=12)

# Display the plot with tight layout
plt.tight_layout()
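A two- or three-dimensional projection is convenient for plotting, but it keeps only part of the variance in the 64 pixel features. A hedged way to see how much is retained is to fit PCA with all components and inspect the cumulative explained variance ratio. The 90% target and the names `digits_demo`, `pca_all`, `cumulative`, and `n_for_90` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits_demo = load_digits()

# Fit PCA with all 64 components to see the full variance spectrum
pca_all = PCA().fit(digits_demo.data)
cumulative = np.cumsum(pca_all.explained_variance_ratio_)

print(f"Variance retained by 2 components: {cumulative[1]:.1%}")
print(f"Variance retained by 3 components: {cumulative[2]:.1%}")

# Smallest number of components whose cumulative ratio reaches 90%
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print("Components needed for 90% of the variance:", n_for_90)
```

For this dataset the first two components retain well under half of the total variance, so the scatter plots above are a useful overview but not a complete picture of the data.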