# Recommendation Model Summary

This recommendation model takes a simple content-based approach. It measures the similarity between products from their textual features (product name, brand, category, and description) and, when a desired product is not available, recommends the product most similar to it.

The code performs the following steps:

- Import the necessary libraries.
- Load the product data from a CSV file.
- Preprocess the text data by combining the product name, brand, category, and description columns into a single text column.
- Create a term frequency-inverse document frequency (TF-IDF) matrix from the combined text, which weights each word by how distinctive it is for a product.
- Compute the cosine similarity matrix from the TF-IDF matrix to measure the similarity between products.
- Define a function `recommend_similar_product` that takes a product code and the similarity matrix as inputs and returns the details of the most similar product, for use when the desired product is not available.
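
The TF-IDF and similarity steps can be illustrated on a tiny example. This is a minimal sketch, not the notebook's data: the `docs` list is a made-up stand-in for the combined product text. It also shows why a plain dot product (`linear_kernel`) can serve as cosine similarity here: `TfidfVectorizer` L2-normalises each row by default.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy corpus standing in for the combined product text.
docs = [
    "fruit muesli organic oats",
    "chocolate muesli crunchy oats",
    "sparkling mineral water",
]

# One row per document, one column per term.
tfidf = TfidfVectorizer().fit_transform(docs)

# Rows are unit-length (norm="l2" is the default), so the dot product
# computed by linear_kernel matches the cosine similarity.
dot_sim = linear_kernel(tfidf, tfidf)
cos_sim = cosine_similarity(tfidf, tfidf)

print(np.allclose(dot_sim, cos_sim))
print(dot_sim.round(2))
```

The two muesli entries share terms and therefore score higher with each other than either does with the mineral water entry, which shares no terms with them.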

# Limitations and Potential Errors

- The model relies solely on textual features, which may not capture every relevant aspect of product similarity. Other features, such as price, nutritional information, or customer reviews, are not taken into account.
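
One possible way to bring in a non-textual feature is to blend the text similarity with a second similarity computed from that feature. The sketch below is only an illustration under assumptions: the function `blend_similarity`, the `text_weight` parameter, and the price values are hypothetical, and the dataset is not known to contain a price column.

```python
import numpy as np

def blend_similarity(text_sim, prices, text_weight=0.8):
    """Blend a text similarity matrix with a price-closeness matrix.

    Hypothetical helper: text_weight controls how much the text counts.
    """
    prices = np.asarray(prices, dtype=float)
    # Absolute pairwise price differences.
    diff = np.abs(prices[:, None] - prices[None, :])
    max_diff = diff.max()
    # Scale to [0, 1]: identical prices give 1, the largest gap gives 0.
    price_sim = 1.0 - diff / max_diff if max_diff > 0 else np.ones_like(diff)
    return text_weight * np.asarray(text_sim) + (1.0 - text_weight) * price_sim

# Made-up values: products 1 and 2 are equally similar to product 0 by text,
# but product 1 is much closer in price.
text_sim = np.array([
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.1],
    [0.5, 0.1, 1.0],
])
prices = [2.0, 2.5, 9.0]

blended = blend_similarity(text_sim, prices)
print(blended.round(3))
```

With equal text similarity, the price term breaks the tie in favour of the product with the closer price. The weight would need tuning against real substitution data.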

# Personal opinion

In conclusion, the current recommendation model provides a basic solution for recommending a similar product when the desired one is not available. However, it has limitations that need to be addressed. To improve the model's accuracy and effectiveness, I think additional measures and other machine learning techniques should be considered.


import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

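# Read product codes as strings so that lookups by string code match
# and any leading zeros are preserved.
products = pd.read_csv("product_data.csv", dtype={"product_code": str})

# Replace missing values with empty strings so the text columns can be concatenated.
products = products.fillna('')

# Combine the textual columns into a single field per product.
products['combined_features'] = products['product_name'] + " " + products['brand'] + " " + products['category'] + " " + products['description']

# Build the TF-IDF matrix (note: only English stop words are removed).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(products['combined_features'])

# TF-IDF rows are L2-normalised by default, so the dot product equals cosine similarity.
similarity_matrix = linear_kernel(tfidf_matrix, tfidf_matrix)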

def recommend_similar_product(product_code, similarity_matrix):
    product_indices = products[products['product_code'] == product_code].index

    if len(product_indices) == 0:
        print(f"Product code {product_code} not found in the dataset.")
        return None

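    index = product_indices[0]
    # Exclude the queried product itself: it is not guaranteed to sort first
    # when another product ties with it at a similarity of 1.0.
    similar_products = [
        (i, score) for i, score in enumerate(similarity_matrix[index]) if i != index
    ]
    sorted_similar_products = sorted(similar_products, key=lambda x: x[1], reverse=True)

    if not sorted_similar_products:
        return None

    most_similar_product_index = sorted_similar_products[0][0]
    most_similar_product = products.iloc[most_similar_product_index]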

    return {
        'product_code': most_similar_product['product_code'],
        'product_name': most_similar_product['product_name'],
        'brand': most_similar_product['brand'],
        'category': most_similar_product['category'],
        'description': most_similar_product['description']
    }

desired_product_code = '4000540021479'
recommended_product = recommend_similar_product(desired_product_code, similarity_matrix)

if recommended_product:
    print(f"Recommended product: {recommended_product}")


Recommended product: {'product_code': '4021851436929', 'product_name': 'Fruchtmüsli', 'brand': 'Dennree', 'category': 'Pflanzliche Lebensmittel und Getränke, Pflanzliche Lebensmittel, Frühstücke, Getreide und Kartoffeln, Getreideprodukte, Frühstückscerealien, Frühstückscerealien mit Früchten, Müslis, Müslis mit Früchten', 'description': ''}
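
The function above returns only the single closest match. If several alternatives are wanted, a top-n variant could look like the following sketch. The name `recommend_top_n`, its parameters, and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def recommend_top_n(products, similarity_matrix, product_code, n=3):
    """Return up to n most similar products, excluding the queried one."""
    matches = np.flatnonzero(products["product_code"].to_numpy() == product_code)
    if len(matches) == 0:
        return products.iloc[0:0]  # empty frame: unknown product code

    pos = matches[0]
    scores = np.asarray(similarity_matrix[pos], dtype=float).copy()
    scores[pos] = -np.inf  # never recommend the product itself

    # Sort by descending similarity and keep the first n valid positions.
    top = np.argsort(-scores, kind="stable")[:n]
    top = top[np.isfinite(scores[top])]
    return products.iloc[top].assign(similarity=scores[top])

# Made-up data, only to demonstrate the call.
toy_products = pd.DataFrame({
    "product_code": ["a", "b", "c", "d"],
    "product_name": ["Muesli A", "Water B", "Muesli C", "Granola D"],
})
toy_similarity = np.array([
    [1.0, 0.2, 0.9, 0.4],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.5],
    [0.4, 0.3, 0.5, 1.0],
])

top_two = recommend_top_n(toy_products, toy_similarity, "a", n=2)
print(top_two)
```

Returning a ranked list also makes it possible to skip candidates that are themselves unavailable, which the single-result version cannot do.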
