In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from keras.preprocessing.image import ImageDataGenerator

In [2]:
# Load the data from the CSV file into a pandas DataFrame
data = pd.read_csv('MNIST_train.csv')

In [3]:
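# Note: data.info is referenced without parentheses, so this prints the bound
# method's repr (which embeds the DataFrame); data.info() prints a column summary.
print(data.info)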

<bound method DataFrame.info of        Unnamed: 0  index  labels  0  1  2  3  4  5  6  ...  774  775  776  \
0               0      0       5  0  0  0  0  0  0  0  ...    0    0    0   
1               1      1       0  0  0  0  0  0  0  0  ...    0    0    0   
2               2      2       4  0  0  0  0  0  0  0  ...    0    0    0   
3               3      3       1  0  0  0  0  0  0  0  ...    0    0    0   
4               4      4       9  0  0  0  0  0  0  0  ...    0    0    0   
...           ...    ...     ... .. .. .. .. .. .. ..  ...  ...  ...  ...   
59995       59995  59995       8  0  0  0  0  0  0  0  ...    0    0    0   
59996       59996  59996       3  0  0  0  0  0  0  0  ...    0    0    0   
59997       59997  59997       5  0  0  0  0  0  0  0  ...    0    0    0   
59998       59998  59998       6  0  0  0  0  0  0  0  ...    0    0    0   
59999       59999  59999       8  0  0  0  0  0  0  0  ...    0    0    0   

       777  778  779  780  781  782  783  
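
The header printed above lists three non-pixel columns (Unnamed: 0, index, labels) ahead of the pixel columns 0 through 783. As a minimal sketch of how pixel columns might be selected by name rather than by position, the example below uses a tiny hypothetical DataFrame with the same layout; the names toy, non_pixel_cols, pixels, labels, and scaled are illustrative assumptions and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame mimicking the printed column layout:
# two index-like columns, a label column, then pixel columns named '0', '1', ...
toy = pd.DataFrame({
    'Unnamed: 0': [0, 1, 2],
    'index': [0, 1, 2],
    'labels': [5, 0, 4],
    '0': [0, 128, 255],
    '1': [255, 0, 64],
})

# Drop everything that is not a pixel so index values cannot leak into the features
non_pixel_cols = ['Unnamed: 0', 'index', 'labels']
pixels = toy.drop(columns=non_pixel_cols).to_numpy(dtype='float32')
labels = toy['labels'].to_numpy()

# Scale 8-bit pixel intensities to [0, 1]
scaled = pixels / 255.0
print('Feature shape:', scaled.shape)
print('Min/max after scaling:', scaled.min(), scaled.max())
```

Selecting by name keeps the feature matrix correct even if the order of the leading columns changes; a maximum above 1.0 after scaling would be a sign that a non-pixel column slipped in.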


With the MNIST data loaded into a pandas DataFrame, we next extract the pixel values and the labels into separate arrays. We then normalize the pixel values by dividing them by 255.0, which scales them to the range [0, 1]. Finally, we verify the normalization by printing the minimum and maximum pixel values of the normalized array.

In [4]:
# Extract pixel values and labels.
# The first three columns ('Unnamed: 0', 'index', 'labels') are not pixels,
# so the pixel data starts at column 3.
X_train = data.iloc[:, 3:].values
y_train = data['labels'].values

In [5]:
# Normalize pixel values
X_train = X_train.astype('float32') / 255.0

In [6]:
# Verify the normalization results
print('Min pixel value:', np.min(X_train))
print('Max pixel value:', np.max(X_train))

Min pixel value: 0.0
Max pixel value: 1.0


Principal Component Analysis (PCA) is a linear dimensionality-reduction technique that projects the data onto the directions of greatest variance.

We use the PCA class from the sklearn.decomposition module with the n_components parameter set to 50, which reduces each image from 784 pixel features to 50 features. The fit_transform method fits the PCA model to the data and returns the data projected into the reduced feature space.

In [7]:
# Perform PCA to reduce dimensionality
pca = PCA(n_components=50)  # Set the number of components to keep
X_train_pca = pca.fit_transform(X_train)

In [8]:
# Verify the results
print('Original shape:', X_train.shape)
print('Reduced shape:', X_train_pca.shape)

Original shape: (60000, 784)
Reduced shape: (60000, 50)
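
The shape check confirms the reduction but says nothing about how much information the kept components preserve. One possible follow-up, sketched here on synthetic data rather than MNIST, is to inspect the fitted model's explained_variance_ratio_ attribute and the reconstruction error from inverse_transform. The array X, the name pca_demo, and the choice of 10 components are illustrative assumptions only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for image data: 200 samples, 64 features in [0, 1)
X = rng.random((200, 64)).astype('float32')

pca_demo = PCA(n_components=10)
X_reduced = pca_demo.fit_transform(X)

# Fraction of the total variance captured by each kept component
ratios = pca_demo.explained_variance_ratio_
cumulative = np.cumsum(ratios)
print('Reduced shape:', X_reduced.shape)
print('Variance retained by 10 components:', cumulative[-1])

# Project back to the original space to measure what the reduction lost
X_restored = pca_demo.inverse_transform(X_reduced)
print('Mean squared reconstruction error:', np.mean((X - X_restored) ** 2))
```

On real image data the cumulative ratio is a common guide for choosing n_components: if it is low, more components may be needed; if it is already close to 1, fewer may suffice.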


To generate new images using data augmentation, you can use the ImageDataGenerator class from the keras.preprocessing.image module. Here's an example code snippet that applies random rotation, width shift, and height shift to the images: