## FASHION CLASSIFICATION USING PRINCIPAL COMPONENT ANALYSIS

In [15]:
# Fashion classification using PCA
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv("E:/Ducat/Machine Learning/fashion.csv")

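# Prepare the features and target
# NOTE: the first column, "Unnamed: 0", holds the saved row index rather than pixel
# data, so it ends up in x here; pd.read_csv(..., index_col=0) would exclude it
# NOTE: the last column is named "783" and its test-set values (0, 1, 21, 28) look
# like pixel intensities rather than class labels; verify which column is the label
x = df.iloc[:, :-1]
y = df.iloc[:, -1]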

# Check for missing values or preprocessing requirements (if necessary)
# Example: x.fillna(0, inplace=True)

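# Split the dataset into training and testing sets first
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Standardize the data: fit the scaler on the training set only and apply the
# same transform to the test set, so no test-set statistics leak into training
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)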

# Initialize the KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

# Make predictions
pred = knn.predict(x_test)

# Evaluate the model
print("Accuracy Score (before PCA):", accuracy_score(y_test, pred))
print("Confusion Matrix (before PCA):\n", confusion_matrix(y_test, pred))
print("Classification Report (before PCA):\n", classification_report(y_test, pred))

# Apply PCA for dimensionality reduction
pca = PCA(n_components=10)
x_train_new = pca.fit_transform(x_train)
x_test_new = pca.transform(x_test)

# Check explained variance ratio
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

# Train the model again with the reduced features
knn.fit(x_train_new, y_train)

# Make predictions with the reduced features
pred_new = knn.predict(x_test_new)

# Evaluate the model with the reduced features
print("New Accuracy Score (after PCA):", accuracy_score(y_test, pred_new))
print("Confusion Matrix (after PCA):\n", confusion_matrix(y_test, pred_new))
print("Classification Report (after PCA):\n", classification_report(y_test, pred_new))


Accuracy Score (before PCA): 0.997
Confusion Matrix (before PCA):
 [[1994    0    0    0]
 [   4    0    0    0]
 [   1    0    0    0]
 [   1    0    0    0]]
Classification Report (before PCA):
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      1994
           1       0.00      0.00      0.00         4
          21       0.00      0.00      0.00         1
          28       0.00      0.00      0.00         1

    accuracy                           1.00      2000
   macro avg       0.25      0.25      0.25      2000
weighted avg       0.99      1.00      1.00      2000

Explained Variance Ratio: [0.22018317 0.14249483 0.05515267 0.05117367 0.04111788 0.03046004
 0.0282791  0.02343509 0.01809162 0.01431002]




New Accuracy Score (after PCA): 0.997
Confusion Matrix (after PCA):
 [[1994    0    0    0]
 [   4    0    0    0]
 [   1    0    0    0]
 [   1    0    0    0]]
Classification Report (after PCA):
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      1994
           1       0.00      0.00      0.00         4
          21       0.00      0.00      0.00         1
          28       0.00      0.00      0.00         1

    accuracy                           1.00      2000
   macro avg       0.25      0.25      0.25      2000
weighted avg       0.99      1.00      1.00      2000





In [17]:
# Import the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

In [19]:
# Load and display the dataset
df = pd.read_csv("E:/Ducat/Machine Learning/fashion.csv")
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,774,775,776,777,778,779,780,781,782,783
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,2,3,0,3,174,189,67,0,0,0
2,0,0,0,0,0,0,0,0,1,0,...,164,58,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,21,...,1,0,0,0,0,0,0,0,0,0
4,0,0,0,2,0,1,1,0,0,0,...,71,12,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9996,0,0,0,0,0,0,0,0,0,31,...,0,0,0,0,0,0,0,0,0,0
9997,0,0,0,0,0,0,0,0,0,0,...,27,0,0,0,0,0,0,0,0,0
9998,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
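# Prepare the features (every column except the last)
# NOTE: this includes "Unnamed: 0", the saved row index, which is not pixel data;
# reading the CSV with pd.read_csv(..., index_col=0) would keep it out of x
x = df.iloc[:, :-1]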
x

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,773,774,775,776,777,778,779,780,781,782
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,2,2,3,0,3,174,189,67,0,0
2,0,0,0,0,0,0,0,0,1,0,...,176,164,58,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,21,...,146,1,0,0,0,0,0,0,0,0
4,0,0,0,2,0,1,1,0,0,0,...,101,71,12,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9996,0,0,0,0,0,0,0,0,0,31,...,91,0,0,0,0,0,0,0,0,0
9997,0,0,0,0,0,0,0,0,0,0,...,101,27,0,0,0,0,0,0,0,0
9998,0,0,0,0,0,0,0,0,0,0,...,14,0,0,0,0,0,0,0,0,0


In [23]:
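# Prepare the target (the last column)
# NOTE: this column is named "783" and its test-set values (0, 1, 21, 28) look
# like pixel intensities rather than class labels; verify which column is the label
y = df.iloc[:, -1]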
y

0       0
1       0
2       0
3       0
4       0
       ..
9995    0
9996    0
9997    0
9998    0
9999    0
Name: 783, Length: 10000, dtype: int64

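In [29]:
# Split the dataset into training and testing sets (before scaling)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [31]:
# Standardize the data: fit the scaler on the training set only and apply the
# same transform to the test set, so no test-set statistics leak into training
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)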

In [33]:
# Initialize the KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

In [35]:
# Make predictions
pred = knn.predict(x_test)

In [37]:
# Evaluate the model
print("Accuracy Score (before PCA):", accuracy_score(y_test, pred))
print("Confusion Matrix (before PCA):\n", confusion_matrix(y_test, pred))
print("Classification Report (before PCA):\n", classification_report(y_test, pred))

Accuracy Score (before PCA): 0.997
Confusion Matrix (before PCA):
 [[1994    0    0    0]
 [   4    0    0    0]
 [   1    0    0    0]
 [   1    0    0    0]]
Classification Report (before PCA):
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      1994
           1       0.00      0.00      0.00         4
          21       0.00      0.00      0.00         1
          28       0.00      0.00      0.00         1

    accuracy                           1.00      2000
   macro avg       0.25      0.25      0.25      2000
weighted avg       0.99      1.00      1.00      2000





In [39]:
# Apply PCA for dimensionality reduction
pca = PCA(n_components=10)
x_train_new = pca.fit_transform(x_train)
x_test_new = pca.transform(x_test)

In [41]:
# Check explained variance ratio
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

Explained Variance Ratio: [0.22018317 0.14249483 0.05515267 0.05117367 0.04111788 0.03046004
 0.0282791  0.02343509 0.01809162 0.01431002]


In [43]:
# Train the model again with the reduced features
knn.fit(x_train_new, y_train)

In [45]:
# Make predictions with the reduced features
pred_new = knn.predict(x_test_new)

In [47]:
# Evaluate the model with the reduced features
print("New Accuracy Score (after PCA):", accuracy_score(y_test, pred_new))
print("Confusion Matrix (after PCA):\n", confusion_matrix(y_test, pred_new))
print("Classification Report (after PCA):\n", classification_report(y_test, pred_new))

New Accuracy Score (after PCA): 0.997
Confusion Matrix (after PCA):
 [[1994    0    0    0]
 [   4    0    0    0]
 [   1    0    0    0]
 [   1    0    0    0]]
Classification Report (after PCA):
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      1994
           1       0.00      0.00      0.00         4
          21       0.00      0.00      0.00         1
          28       0.00      0.00      0.00         1

    accuracy                           1.00      2000
   macro avg       0.25      0.25      0.25      2000
weighted avg       0.99      1.00      1.00      2000





# Fashion Classification using PCA

In this project, I built a machine learning model to classify fashion items using the `KNeighborsClassifier` and 
applied Principal Component Analysis (PCA) for dimensionality reduction. The goal was to explore the impact of PCA
on the performance of a KNN classifier.

## Steps Involved:

### 1. Data Loading and Preprocessing:
- Loaded the dataset (`fashion.csv`) using `pandas`.
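- Prepared the feature set `x` by selecting all columns except the last one. This includes the `Unnamed: 0` column, which holds the saved row index rather than pixel data.
- Extracted the target variable `y` as the last column in the dataset (named `783`).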
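A minimal sketch of the loading step, assuming the CSV's first column is a saved row index (as the `Unnamed: 0` header suggests). Passing `index_col=0` to `pd.read_csv` keeps that column out of the features, and `value_counts()` shows how the target values are distributed before any model is trained. The tiny in-memory CSV is made up purely for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for fashion.csv: an unnamed index column,
# a few feature columns, and a last column used as the target.
csv_text = """,0,1,2,783
0,0,12,0,0
1,3,0,7,0
2,0,0,0,21
3,5,9,0,0
"""

# index_col=0 turns the unnamed first column into the DataFrame index, so it is
# not treated as a feature (without it, pandas names the column "Unnamed: 0").
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

x = df.iloc[:, :-1]  # every column except the last
y = df.iloc[:, -1]   # the last column

print(list(x.columns))
print(y.value_counts())  # class balance of the target
```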

### 2. Standardization:
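- Applied `StandardScaler` to standardize the feature set `x` so that all features are on the same scale (zero mean, unit variance) before model training.
- Fitting the scaler on the full dataset before splitting lets test-set statistics leak into training; to avoid this, fit the scaler on the training split only and use it to transform the test split.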
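A minimal sketch of leakage-free scaling on synthetic data (the shapes, seed and labels are arbitrary): the scaler learns its means and standard deviations from the training split alone, and the test split is transformed with those same statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # synthetic features
y = rng.integers(0, 2, size=100)                      # synthetic labels

# Split first, so the test rows play no part in fitting the scaler
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

sc = StandardScaler()
x_train_scaled = sc.fit_transform(x_train)  # learns mean_ and scale_ from training rows only
x_test_scaled = sc.transform(x_test)        # reuses the training statistics

# The learned statistics describe the training split, not the full dataset
print(np.allclose(sc.mean_, x_train.mean(axis=0)))  # True
```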

### 3. Train-Test Split:
- Split the dataset into training (80%) and testing (20%) sets using `train_test_split` from `sklearn`.
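With a target as imbalanced as this one, passing `stratify=y` to `train_test_split` keeps each class's share the same in both splits. A sketch on made-up labels (990 zeros and 10 ones); note that stratification raises an error if any class has fewer than two samples, which may rule it out for very rare labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 990 samples of class 0, 10 of class 1
y = np.array([0] * 990 + [1] * 10)
x = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

# 80/20 split with the same class proportions on both sides
print(len(y_train), len(y_test))                                # 800 200
print(np.bincount(y_train).tolist(), np.bincount(y_test).tolist())  # [792, 8] [198, 2]
```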

### 4. Model Training without PCA:
- Initialized the K-Nearest Neighbors (KNN) classifier with `n_neighbors=5`.
- Trained the KNN model using the training data and made predictions on the test set.
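To make the KNN step concrete, here is a from-scratch sketch of the idea behind `KNeighborsClassifier` with its default uniform weights and Euclidean distance: each test point gets the majority label among its `k` nearest training points. The helper `knn_predict` is hypothetical, written only for illustration, and its predictions are checked against scikit-learn's on synthetic data. With a heavily imbalanced target, most of the `k` neighbours of any point tend to carry the majority label, which helps explain how such a classifier can end up predicting only that label.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(x_train, y_train, x_test, k=5):
    """Hypothetical helper: majority vote among the k nearest training points."""
    preds = []
    for point in x_test:
        # Euclidean distance from this test point to every training point
        dists = np.linalg.norm(x_train - point, axis=1)
        nearest = np.argsort(dists)[:k]        # indices of the k closest points
        votes = np.bincount(y_train[nearest])  # count labels among the neighbours
        preds.append(np.argmax(votes))         # ties would go to the smallest label
    return np.array(preds)


rng = np.random.default_rng(1)
x_train = rng.normal(size=(200, 4))
y_train = (x_train[:, 0] + x_train[:, 1] > 0).astype(int)  # synthetic 2-class target
x_test = rng.normal(size=(50, 4))

ours = knn_predict(x_train, y_train, x_test, k=5)
sk = KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train).predict(x_test)
print((ours == sk).all())
```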

### 5. Model Evaluation (Before PCA):
- Evaluated the model's performance by checking the accuracy score, confusion matrix, and classification report.
- **Results:**
  - **Accuracy:** 99.7%
  - **Confusion Matrix:**
    ```plaintext
    [[1994    0    0    0]
     [   4    0    0    0]
     [   1    0    0    0]
     [   1    0    0    0]]
    ```
  - **Classification Report:**
    - The classifier predicted label `0` for every test sample. Label `0` scores 1.00 on precision, recall and F1, while the minority labels (`1`, `21`, `28`) score 0.00, so the macro-averaged scores are only 0.25.
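The reported numbers can be reproduced from the confusion matrix alone. The sketch below rebuilds label arrays with the same counts (1994 zeros, four `1`s, one `21`, one `28`) and an all-`0` prediction, then recomputes the metrics; `zero_division=0` sets the undefined precision of never-predicted labels to 0, matching the 0.00 entries in the report.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Labels rebuilt from the confusion matrix: every test sample was predicted as 0
y_true = np.array([0] * 1994 + [1] * 4 + [21] + [28])
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)  # 1994 / 2000

# "macro" averages the four per-label scores equally;
# "weighted" weights each label by its support (number of true samples)
_, _, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
_, _, weighted_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"accuracy={acc:.3f} macro_f1={macro_f1:.2f} weighted_f1={weighted_f1:.2f}")
# accuracy=0.997 macro_f1=0.25 weighted_f1=1.00
```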

### 6. PCA for Dimensionality Reduction:
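- Applied PCA to reduce the 784 input features to 10 principal components (`n_components=10`), fitting it on the training set and applying the same transform to the test set.
- Printed the explained variance ratio of each principal component; together the 10 components capture about 62.5% of the variance in the standardized training features.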
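Summing the printed ratios shows how much variance the 10 components keep. The values below are copied from the notebook output; the cumulative sum reaches about 0.62, so roughly 38% of the variance in the standardized features is discarded. As an alternative, scikit-learn's `PCA` also accepts a float between 0 and 1 for `n_components` (for example `PCA(n_components=0.95, svd_solver="full")`) and then keeps just enough components to explain that fraction of the variance.

```python
import numpy as np

# Explained variance ratios printed by pca.explained_variance_ratio_ in the notebook
ratios = np.array([
    0.22018317, 0.14249483, 0.05515267, 0.05117367, 0.04111788,
    0.03046004, 0.0282791, 0.02343509, 0.01809162, 0.01431002,
])

cumulative = np.cumsum(ratios)
for n, total in enumerate(cumulative, start=1):
    print(f"{n:2d} components: {total:.3f}")

# Fewest components needed to reach 50% of the variance
print(int(np.argmax(cumulative >= 0.5)) + 1)  # 5
```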

### 7. Model Training with PCA:
- Retrained the KNN model using the reduced feature set (after applying PCA).
- Made predictions on the test set using the transformed features.
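Retraining on PCA features only works if the test set is projected with the components learned from the training set. A small sketch on synthetic data showing what `pca.transform` does for a default (non-whitened) `PCA`: it subtracts the training mean `mean_` and projects onto the rows of `components_`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
x_train = rng.normal(size=(100, 6))
x_test = rng.normal(size=(20, 6))

pca = PCA(n_components=3)
x_train_new = pca.fit_transform(x_train)  # learns mean_ and components_ from training data
x_test_new = pca.transform(x_test)        # reuses them for the test data

# Manual projection: centre with the training mean, then project onto the components
manual = (x_test - pca.mean_) @ pca.components_.T
print(np.allclose(manual, x_test_new))  # True
print(x_test_new.shape)                 # (20, 3)
```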

### 8. Model Evaluation (After PCA):
- Evaluated the model's performance again using accuracy, confusion matrix, and classification report.
- **Results:**
  - **Accuracy:** 99.7%
  - **Confusion Matrix:**
    ```plaintext
    [[1994    0    0    0]
     [   4    0    0    0]
     [   1    0    0    0]
     [   1    0    0    0]]
    ```
  - **Classification Report:**
    - As before PCA, the model predicted label `0` for every test sample, so the per-class scores are identical.
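The scaling, PCA and KNN steps can also be chained in a scikit-learn `Pipeline`, which fits the scaler and PCA only on the data passed to `fit` and reuses them in `predict`, so test statistics never enter training. A sketch on synthetic data (shapes, seed and labels are made up); `n_components=10` and `n_neighbors=5` mirror the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
x = rng.normal(size=(500, 30))              # synthetic features
y = (x[:, :3].sum(axis=1) > 0).astype(int)  # synthetic labels

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
model.fit(x_train, y_train)       # scaler and PCA see only the training rows
pred_new = model.predict(x_test)  # test rows are scaled and projected with training statistics
print(model.score(x_test, y_test))
```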

## Key Insights:
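- The model reached 99.7% accuracy both before and after PCA, but this is exactly the majority-class baseline: 1994 of the 2000 test samples have label `0`, and the model predicted `0` for all of them. The macro-averaged scores of 0.25 show that the minority labels were never predicted, so accuracy alone overstates how well KNN worked here.
- PCA reduced the input from 784 features to 10 components (about 62.5% of the variance) without changing any prediction. Because the predictions were already all `0`, this does not by itself show that the first 10 components capture the most important information.
- The target column is named `783`, and its test-set values (`0`, `1`, `21`, `28`) look more like pixel intensities than class labels. If the file follows the Fashion-MNIST layout, whose labels run from 0 to 9, the label column should be checked before drawing conclusions.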
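A quick way to judge whether an accuracy figure is meaningful is to compare it with a trivial baseline. The sketch below uses scikit-learn's `DummyClassifier(strategy="most_frequent")` on labels with the same counts as the test set's confusion matrix (the single placeholder feature is arbitrary, since this baseline ignores features). It reaches the same 0.997 accuracy as KNN, while balanced accuracy (the mean per-label recall) drops to 0.25.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Labels with the same counts as the notebook's test set
y = np.array([0] * 1994 + [1] * 4 + [21] + [28])
x = np.zeros((len(y), 1))  # placeholder feature; the dummy model ignores it

baseline = DummyClassifier(strategy="most_frequent").fit(x, y)
pred = baseline.predict(x)  # always the most frequent label, 0

print(accuracy_score(y, pred))           # 0.997
print(balanced_accuracy_score(y, pred))  # 0.25
```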
