### Load and Prepare Data

In [1]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
from sklearn.preprocessing import StandardScaler

# Load the cleaned recommendation dataset
recommendation_data = pd.read_csv("C:/project/fashion-recommender-system/data/processed/fashion-mnist_train_cleaned.csv")

# Separate features and labels
X_recommend = recommendation_data.iloc[:, 1:].values  # Pixel values
y_recommend = recommendation_data.iloc[:, 0].values  # Labels (category or class)

# Normalize pixel values (0-255 to 0-1)
X_recommend = X_recommend / 255.0


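The dataset is loaded, and the features (pixel values) and labels (classes) are separated. Pixel values are then rescaled from the range 0-255 to 0-1. Because the next step standardizes every feature, this rescaling does not change the final standardized values; it is harmless but not strictly required.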
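The claim that dividing by 255 does not affect the standardized result can be checked directly. This is a minimal sketch on random integers that stand in for raw pixel values (an assumption; the real data is not loaded here). It standardizes the raw and the rescaled arrays separately and compares the outputs.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for raw pixel values in the range 0..255
raw_demo = rng.integers(0, 256, size=(100, 16)).astype(float)

# Standardize the raw values and the values rescaled to 0..1
standardized_raw = StandardScaler().fit_transform(raw_demo)
standardized_scaled = StandardScaler().fit_transform(raw_demo / 255.0)

# A global positive rescaling cancels out in (x - mean) / std
print(np.allclose(standardized_raw, standardized_scaled))  # True
```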




### Standardization of Data

In [2]:
# Standardize the data (mean = 0, variance = 1)
scaler = StandardScaler()
X_recommend_standardized = scaler.fit_transform(X_recommend)



Standardization rescales each feature (pixel) to a mean of 0 and a variance of 1. This matters for PCA, whose components are driven by feature variances, and for the cosine similarities computed later: without it, pixels with a larger numeric spread would dominate both.
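One practical detail is what happens to a pixel column that is constant across all images, for example a border pixel that happens to be 0 everywhere (a hypothetical case here; whether the cleaned file contains such columns is not shown). The sketch below, on random stand-in data, shows that `StandardScaler` replaces a zero standard deviation with a unit scale, so such a column becomes all zeros instead of NaN, while the other columns get mean 0 and variance 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for normalized pixel data
X_demo = rng.random((50, 4))
X_demo[:, 0] = 0.0  # a pixel that is black in every image

scaler_demo = StandardScaler().fit(X_demo)
X_demo_std = scaler_demo.transform(X_demo)

print(scaler_demo.scale_[0])  # 1.0: zero variance is replaced by a unit scale
print(np.abs(X_demo_std[:, 0]).max())  # 0.0: the constant column maps to zeros
```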

### Dimensionality Reduction using PCA

In [3]:
from sklearn.decomposition import PCA

# Apply PCA to reduce dimensions to 10 components
pca = PCA(n_components=10)
X_recommend_pca = pca.fit_transform(X_recommend_standardized)


PCA reduces the feature set to 10 components, keeping the directions of greatest variance while cutting the computation time and memory needed for the similarity step. How much of the total variance these 10 components retain can be checked with `pca.explained_variance_ratio_.sum()`.
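The choice of 10 components can be informed by the cumulative explained variance. This is a minimal sketch on synthetic data with decreasing column spread (an assumption standing in for `X_recommend_standardized`). It shows two standard options in scikit-learn: inspecting `explained_variance_ratio_`, or passing a float to `n_components` so PCA picks the smallest number of components reaching that fraction of variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for the standardized features: 20 columns with decreasing spread
X_demo = rng.normal(size=(300, 20)) * np.linspace(5.0, 0.5, 20)

# Fit all components once to see how variance accumulates
pca_full = PCA().fit(X_demo)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print("Variance kept by 10 components:", round(float(cumulative[9]), 3))

# Let PCA choose the smallest number of components that keeps 90% of the variance
pca_90 = PCA(n_components=0.9, svd_solver="full").fit(X_demo)
print("Components for 90% variance:", pca_90.n_components_)
```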


### Dimensionality Reduction via TruncatedSVD

In [4]:
from sklearn.decomposition import TruncatedSVD

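# Apply Truncated SVD to reduce dimensions to 10 components
svd = TruncatedSVD(n_components=10)
X_recommend_svd = svd.fit_transform(X_recommend_standardized)


Truncated SVD computes the leading singular vectors without centering the data, which is why it is popular for sparse matrices. Here the input is dense and has already been centered by standardization, so the 10 SVD components span essentially the same subspace as the 10 PCA components (up to sign and the approximation of the default randomized solver). It still reduces dimensionality while keeping the dominant structure, lowering computation time and memory use.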
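The near-equivalence of PCA and TruncatedSVD on centered data can be verified on a small example. This sketch uses synthetic columns with clearly different spreads (an assumption, so the leading components are well separated), centers them as standardization would, and compares the two projections up to sign. The exact `arpack` solver is used for TruncatedSVD here instead of the default randomized one, so the comparison is tight.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(0)
# Hypothetical data: independent columns with standard deviations 1..8
X_demo = rng.normal(size=(300, 8)) * np.arange(1, 9)
# Center each column, as StandardScaler does before scaling
X_centered = X_demo - X_demo.mean(axis=0)

pca_proj = PCA(n_components=3, svd_solver="full").fit_transform(X_centered)
svd_proj = TruncatedSVD(n_components=3, algorithm="arpack", random_state=0).fit_transform(X_centered)

# Components may differ in sign, so compare absolute values
print(np.allclose(np.abs(pca_proj), np.abs(svd_proj), atol=1e-6))  # True
```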


### Define Updated Batch Cosine Similarity Calculation

In [5]:
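import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import csr_matrix

def batch_cosine_similarity(X, batch_size=100, top_k=50):
    # Cosine similarities between dense vectors are almost never exactly zero, so the
    # full n x n matrix is effectively dense. Keep only the top_k scores per row instead.
    n_samples = X.shape[0]
    top_k = min(top_k, n_samples)
    rows, cols, vals = [], [], []

    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        # Dense (end - start) x n_samples block of similarities for the current batch
        batch_similarity = cosine_similarity(X[start:end], X)
        # Column indices of the top_k largest scores in each row (in no particular order)
        top = np.argpartition(-batch_similarity, top_k - 1, axis=1)[:, :top_k]
        rows.append(np.repeat(np.arange(start, end), top_k))
        cols.append(top.ravel())
        vals.append(np.take_along_axis(batch_similarity, top, axis=1).ravel())

    # Build the sparse matrix in one step instead of slice-assigning into a CSR matrix
    return csr_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(n_samples, n_samples),
    )


1. The updated batch_cosine_similarity function computes cosine similarity one batch of rows at a time, so only a `batch_size` x `n_samples` block of scores is held in memory at once.
2. Because cosine similarities between dense feature vectors are almost all non-zero, storing every score in a sparse matrix would save no memory. The function therefore keeps only the `top_k` highest scores per row (50 by default here; it should be at least N + 1 for the later recommendation call) and returns a sparse matrix with at most `n_samples * top_k` stored entries.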
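A swapped-in alternative to hand-written batching is scikit-learn's `NearestNeighbors` with `metric="cosine"` and brute-force search, which computes distances chunk by chunk internally and returns only the k nearest items. This sketch uses random vectors as a stand-in for `X_recommend_pca` (an assumption) and checks the result against a direct `cosine_similarity` ranking. Note that cosine distance is 1 minus cosine similarity, and that each query item is its own nearest neighbour.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 10-dimensional PCA features
X_demo = rng.normal(size=(500, 10))

# Ask for 6 neighbours: the query item itself plus 5 recommendations
nn_demo = NearestNeighbors(n_neighbors=6, metric="cosine", algorithm="brute").fit(X_demo)
distances, indices = nn_demo.kneighbors(X_demo[:3])

# Convert cosine distance back to cosine similarity
similarities = 1.0 - distances
print(indices[:, 0])  # [0 1 2]: each query is its own closest match
```

Dropping the first column of `indices` gives the top-5 recommendations for each query.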


### Recalculate Similarity Matrices

In [None]:
# Calculate similarity matrix for PCA reduced data
similarity_matrix_batch_pca = batch_cosine_similarity(X_recommend_pca, batch_size=100)  # Larger batches run faster but use more memory

# Calculate similarity matrix for SVD reduced data
similarity_matrix_batch_svd = batch_cosine_similarity(X_recommend_svd, batch_size=100)


Similarity matrices for both the PCA- and SVD-reduced datasets are computed with the batched function, so each intermediate block of similarity scores never exceeds `batch_size` rows.


### Modify Recommendation Function

In [None]:
import numpy as np

def get_recommendations(item_index, similarity_matrix, N=5):
    similarity_scores = similarity_matrix[item_index].toarray().ravel()  # Densify one row only
    ranked = np.argsort(similarity_scores)[::-1]  # Most similar first
    # Drop the query item by index (duplicates may also score 1.0 and outrank it)
    return ranked[ranked != item_index][:N]


The modified get_recommendations function retrieves the top N similar items based on similarity scores, ensuring that the input item is excluded from its own recommendations.
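When the similarity matrix stores only a few scores per row, the ranking can also be read straight from the row's stored entries instead of densifying it. This is a sketch with a hypothetical helper `get_recommendations_sparse` and a toy 4 x 4 matrix; one trade-off is that items with no stored score (treated as zero similarity) are never recommended.

```python
import numpy as np
from scipy.sparse import csr_matrix

def get_recommendations_sparse(item_index, similarity_matrix, N=5):
    """Hypothetical helper: rank only the stored entries of one CSR row."""
    row = similarity_matrix[item_index]  # 1 x n_samples sparse row
    candidates, scores = row.indices, row.data
    keep = candidates != item_index  # exclude the query item itself
    candidates, scores = candidates[keep], scores[keep]
    order = np.argsort(-scores, kind="stable")[:N]  # highest similarity first
    return candidates[order]

# Toy symmetric similarity matrix; zeros are not stored by csr_matrix
S_demo = csr_matrix(np.array([
    [1.0, 0.2, 0.9, 0.0],
    [0.2, 1.0, 0.1, 0.7],
    [0.9, 0.1, 1.0, 0.3],
    [0.0, 0.7, 0.3, 1.0],
]))

print(get_recommendations_sparse(0, S_demo, N=2))  # [2 1]
```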


### Example Recommendation Using Batch PCA Similarity Matrix


In [None]:
recommended_items_batch_pca = get_recommendations(0, similarity_matrix_batch_pca, N=5)
print("Recommended items (Batch PCA):", recommended_items_batch_pca)


1. This step demonstrates how to use the recommendation function to find similar items for the first item in the dataset.
2. The output will display the indices of the top recommended items based on PCA similarity.
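Raw indices are hard to judge by eye. One way to interpret them is to look up the class labels of the recommended items and measure how many share the query item's label. The helper `label_agreement` below is hypothetical and uses toy labels; on the real data it would be called as `label_agreement(0, recommended_items_batch_pca, y_recommend)`.

```python
import numpy as np

def label_agreement(query_index, recommended_indices, labels):
    """Hypothetical helper: fraction of recommended items with the query item's label."""
    recommended_indices = np.asarray(recommended_indices)
    return float(np.mean(labels[recommended_indices] == labels[query_index]))

# Toy class labels standing in for y_recommend
labels_demo = np.array([3, 3, 7, 3, 1, 7])
print(label_agreement(0, [1, 3, 2, 5], labels_demo))  # 0.5
```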


### Save the Models

In [None]:
import joblib

# Save PCA and SVD models
joblib.dump(pca, 'pca_model.pkl')
joblib.dump(svd, 'svd_model.pkl')

# Save the scaler model
joblib.dump(scaler, 'scaler_model.pkl')


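The fitted PCA, TruncatedSVD, and StandardScaler models are saved with joblib, so a later session can load them and transform new images without refitting. New data must go through the same steps in the same order: divide by 255, standardize, then project.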
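Reloading the saved models and applying them to new images could look like the sketch below. To keep it self-contained, it fits stand-in models on random 784-feature "pixel" data (an assumption matching 28 x 28 images) and writes the files to a temporary directory rather than the notebook's working directory; the `demo_` names avoid overwriting the notebook's `scaler` and `pca`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for raw pixel rows (784 = 28 x 28)
pixels = rng.integers(0, 256, size=(200, 784)).astype(float)
demo_scaler = StandardScaler().fit(pixels / 255.0)
demo_pca = PCA(n_components=10).fit(demo_scaler.transform(pixels / 255.0))

with tempfile.TemporaryDirectory() as tmp:
    joblib.dump(demo_scaler, os.path.join(tmp, "scaler_model.pkl"))
    joblib.dump(demo_pca, os.path.join(tmp, "pca_model.pkl"))
    # Later session: reload the fitted models
    scaler_loaded = joblib.load(os.path.join(tmp, "scaler_model.pkl"))
    pca_loaded = joblib.load(os.path.join(tmp, "pca_model.pkl"))

# Apply the same steps in the same order: rescale, standardize, project
new_pixels = rng.integers(0, 256, size=(3, 784)).astype(float)
new_embedding = pca_loaded.transform(scaler_loaded.transform(new_pixels / 255.0))
print(new_embedding.shape)  # (3, 10)
```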
