<a href="https://colab.research.google.com/github/Shakeelkhuhro/Cheese-Recommendation-Model/blob/main/Cheese.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We start by importing all the necessary libraries:
- **Pandas**: For data loading and manipulation.
- **Scikit-learn**: For data preprocessing, model building, evaluation, and metrics.
- **RandomForestClassifier**: The classifier we’ll use for predicting whether a cheese is organic.
- **PCA and euclidean_distances**: To help generate recommendations based on similarity.

In [53]:
# Step 1:  Import Libraries and Load Data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Load the dataset
df = pd.read_csv('cheese_data.csv')

# Display the first few rows to understand the structure
df.head()

Unnamed: 0,CheeseId,ManufacturerProvCode,ManufacturingTypeEn,MoisturePercent,FlavourEn,CharacteristicsEn,Organic,CategoryTypeEn,MilkTypeEn,MilkTreatmentTypeEn,RindTypeEn,CheeseName,FatLevel
0,228,NB,Farmstead,47.0,"Sharp, lactic",Uncooked,0,Firm Cheese,Ewe,Raw Milk,Washed Rind,Sieur de Duplessis (Le),lower fat
1,242,NB,Farmstead,47.9,"Sharp, lactic, lightly caramelized",Uncooked,0,Semi-soft Cheese,Cow,Raw Milk,Washed Rind,Tomme Le Champ Doré,lower fat
2,301,ON,Industrial,54.0,"Mild, tangy, and fruity","Pressed and cooked cheese, pasta filata, inter...",0,Firm Cheese,Cow,Pasteurized,,Provolone Sette Fette (Tre-Stelle),lower fat
3,303,NB,Farmstead,47.0,Sharp with fruity notes and a hint of wild honey,,0,Veined Cheeses,Cow,Raw Milk,,Geai Bleu (Le),lower fat
4,319,NB,Farmstead,49.4,Softer taste,,1,Semi-soft Cheese,Cow,Raw Milk,Washed Rind,Gamin (Le),lower fat


We load the cheese dataset into a DataFrame and display the first few rows. This helps us understand the data's structure and identify the columns we'll use for our features and target.

Next, we select the columns we want to use as features (`features`) and define our target variable (`target`).

We are predicting whether a cheese is organic. We then separate the features into numerical and categorical groups so that each group gets its own preprocessing.


We define two separate preprocessing pipelines:
- **Numerical Transformer**: Imputes missing values using the median and then scales the values.
- **Categorical Transformer**: Imputes missing values with "missing" and encodes categorical features as one-hot vectors.
  
These transformations are combined using `ColumnTransformer` to ensure the data is consistently preprocessed for both training and prediction.
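To see what the combined preprocessor produces, here is a minimal sketch on a made-up three-row frame. The toy values and the names `toy_train`, `toy_new`, and `toy_preprocessor` are assumptions for illustration, not rows or objects from the notebook. It shows that a missing moisture value is filled with the median before scaling, and that a category unseen during fitting is encoded as all zeros because of `handle_unknown='ignore'`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (illustrative only): one numeric and one categorical column
toy_train = pd.DataFrame({
    "MoisturePercent": [40.0, np.nan, 50.0],
    "FatLevel": ["lower fat", "higher fat", "lower fat"],
})

toy_preprocessor = ColumnTransformer(
    transformers=[
        ("num", Pipeline([
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]), ["MoisturePercent"]),
        ("cat", Pipeline([
            ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
            ("encoder", OneHotEncoder(handle_unknown="ignore")),
        ]), ["FatLevel"]),
    ],
    sparse_threshold=0,  # always return a dense array so it is easy to inspect
)

# Fit on the toy training rows: 1 scaled numeric column + 1 column per seen category
train_matrix = toy_preprocessor.fit_transform(toy_train)

# A new row whose FatLevel value was never seen during fitting
toy_new = pd.DataFrame({"MoisturePercent": [45.0], "FatLevel": ["medium fat"]})
new_matrix = toy_preprocessor.transform(toy_new)

print(train_matrix.shape)
print(new_matrix)
```

The unseen category does not raise an error. It simply contributes no active one-hot column, so the downstream model sees that row as belonging to none of the known categories.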


We define a complete pipeline with two main components:
1. **Preprocessor**: Transforms our features.
2. **Classifier**: A `RandomForestClassifier` that predicts whether the cheese is organic.

We then split the data into training and testing sets and train the model on the training set with `model_pipeline.fit(X_train, y_train)`.
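As an optional variation on the split, the sketch below shows how the `stratify` argument of `train_test_split` keeps the share of positive labels the same in both splits. The notebook's own split does not use it, and the toy labels here (20 positives out of 200) are an assumption chosen only to make the effect visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels (illustrative only): 20 positives out of 200 samples
y_toy = np.array([1] * 20 + [0] * 180)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# stratify=y_toy keeps the class ratio the same in the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(y_tr.mean(), y_te.mean())
```

Whether stratifying helps depends on how rare the positive class is in the real data. With a rare class it mainly makes the test-set scores less dependent on the luck of the split.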


To assess model performance, we:
- Make predictions on the test set.
- Calculate **accuracy**, **precision**, and **recall** scores.
  
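These metrics show how well the model performs, and higher values are better. Accuracy alone can be misleading when one class is much rarer than the other, so precision and recall for the organic class deserve the closer look.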

The `get_user_preferences` function collects the user's preferences for a cheese. It asks for specific details: the manufacturing type, category, fat level, and moisture percentage. These inputs are later used to provide predictions and recommendations.

In [54]:
# Step 2: Preprocess Data and Train the Model
# Define features and target
features = ['ManufacturingTypeEn', 'CategoryTypeEn', 'FatLevel', 'MoisturePercent']
target = 'Organic'  # Predicting if cheese is organic

# Split features into numerical and categorical
numerical_features = ['MoisturePercent']
categorical_features = ['ManufacturingTypeEn', 'CategoryTypeEn', 'FatLevel']

# Create preprocessing pipelines
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])

# Combine transformations
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Define the full pipeline with RandomForestClassifier
model_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Split data into training and test sets
X = df[features]
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model_pipeline.fit(X_train, y_train)

# Evaluate the model
y_pred = model_pipeline.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")

Accuracy: 0.8899521531100478
Precision: 0.23076923076923078
Recall: 0.1875
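The three scores printed above can be unpacked into confusion-matrix counts. The sketch below uses counts inferred from the printed values (3 true positives, 10 false positives, 13 false negatives, 183 true negatives). They reproduce the scores but were not printed by the notebook, so treat them as an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Assumed counts, chosen to be consistent with the scores printed above
tp, fp, fn, tn = 3, 10, 13, 183

# Rebuild label vectors that produce exactly those counts
y_true = np.array([1] * tp + [0] * fp + [1] * fn + [0] * tn)
y_pred = np.array([1] * tp + [1] * fp + [0] * fn + [0] * tn)

cm = confusion_matrix(y_true, y_pred)          # rows: actual, columns: predicted
accuracy = accuracy_score(y_true, y_pred)      # (tp + tn) / total
precision = precision_score(y_true, y_pred)    # tp / (tp + fp)
recall = recall_score(y_true, y_pred)          # tp / (tp + fn)

# Accuracy of always predicting "not organic" on the same labels
majority_baseline = accuracy_score(y_true, np.zeros_like(y_true))

print(cm)
print(accuracy, precision, recall, majority_baseline)
```

Under these assumed counts, always predicting "not organic" would score about 0.92 accuracy, higher than the model's 0.89. That is why precision and recall are the more informative numbers here.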


The `predict_user_cheese` function uses the trained model to predict whether a cheese matching the user's preferences is likely to be organic. It returns both the predicted label (organic or not) and the model's estimated probability that the cheese is organic.




In [55]:
# Step 3: Collect User Preferences
def get_user_preferences():
    print("Please enter your cheese preferences:")
    manufacturing_type = input("Manufacturing Type (e.g., Farmstead, Industrial): ")
    category_type = input("Category Type (e.g., Firm Cheese, Semi-soft Cheese): ")
    fat_level = input("Fat Level (e.g., lower fat): ")
    moisture_percent = float(input("Moisture Percent (e.g., 45.0): "))

    # Compile user input into a dictionary
    user_preferences = {
        'ManufacturingTypeEn': manufacturing_type,
        'CategoryTypeEn': category_type,
        'FatLevel': fat_level,
        'MoisturePercent': moisture_percent
    }

    return user_preferences
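`get_user_preferences` passes raw text straight through, so a typo such as `farmstead` or a non-numeric moisture value either fails at `float(...)` or ends up as an unknown category. A minimal sketch of two validation helpers follows. `parse_moisture_percent` and `match_known_value` are hypothetical names, not part of the notebook, and the 0 to 100 range check is an assumption about what counts as a valid percentage.

```python
def parse_moisture_percent(raw_text):
    """Convert user text to a moisture percentage, or raise ValueError."""
    try:
        value = float(raw_text.strip())
    except ValueError:
        raise ValueError(f"Not a number: {raw_text!r}") from None
    # Also rejects NaN, because comparisons with NaN are always False
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"Moisture percent must be between 0 and 100, got {value}")
    return value


def match_known_value(raw_text, known_values):
    """Match user text to a known category, ignoring case and surrounding spaces.

    Returns the matching known value, or None when nothing matches.
    """
    cleaned = raw_text.strip().lower()
    for known in known_values:
        if known.lower() == cleaned:
            return known
    return None


# Example usage with values that appear in the dataset preview
print(parse_moisture_percent(" 45.0 "))
print(match_known_value(" farmstead ", ["Farmstead", "Industrial"]))
```

The list of known values could be taken from the dataset itself, for example the unique entries of a column, so that the prompt only accepts categories the model has seen.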


The `generate_recommendations` function generates recommendations based on similarity:
- **Dimensionality Reduction**: PCA projects the preprocessed features onto a small number of components before distances are computed.
- **Similarity Calculation**: The Euclidean distance between the user input and each cheese in the dataset is computed. The closest matches are selected as recommendations.


In [56]:
# Step 4: Define Prediction and Recommendation Functions
# Prediction function
def predict_user_cheese(user_preferences):
    user_input_df = pd.DataFrame([user_preferences])
    prediction = model_pipeline.predict(user_input_df)
    # Column 1 of predict_proba is the estimated probability of the organic class (label 1)
    prediction_proba = model_pipeline.predict_proba(user_input_df)[0][1]

    return "Organic" if prediction[0] == 1 else "Not Organic", prediction_proba

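# Recommendation function
def generate_recommendations(df, user_preferences, n_recommendations=5):
    from sklearn.base import clone

    # Fit a fresh copy of the preprocessor on the full dataset. Calling
    # fit_transform on `preprocessor` itself would refit the object that
    # model_pipeline uses and alter the trained model's feature encoding.
    rec_preprocessor = clone(preprocessor)
    transformed_data = rec_preprocessor.fit_transform(df[features])
    pca = PCA(n_components=5)
    transformed_data = pca.fit_transform(transformed_data)

    # Project the user's input with the same fitted preprocessor and PCA
    user_input_df = pd.DataFrame([user_preferences])
    transformed_user_input = rec_preprocessor.transform(user_input_df)
    user_vector = pca.transform(transformed_user_input)

    # Rank every cheese by Euclidean distance to the user's vector
    distances = euclidean_distances(user_vector, transformed_data).flatten()
    closest_indices = distances.argsort()[:n_recommendations]

    recommendations = df.iloc[closest_indices]
    return recommendations[['CheeseName', 'CategoryTypeEn', 'FatLevel']]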
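The similarity step inside `generate_recommendations` can be seen in isolation on a toy matrix. In this minimal sketch the random data, the choice of 2 components, and the variable names are all assumptions for illustration. The query is a copy of one catalogue row, so that row should come back as the nearest match.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import euclidean_distances

# Toy feature matrix (illustrative only): 6 items with 4 numeric features
rng = np.random.default_rng(0)
items = rng.normal(size=(6, 4))

# The query is a copy of item 3, so item 3 should be the closest match
query = items[3:4].copy()

pca = PCA(n_components=2)
items_reduced = pca.fit_transform(items)  # fit the projection on the catalogue
query_reduced = pca.transform(query)      # project the query with the same fit

# Distance from the query to every item, then the indices of the 3 nearest
distances = euclidean_distances(query_reduced, items_reduced).flatten()
closest = distances.argsort()[:3]
print(closest)
```

Note that `n_components` cannot exceed the smaller of the number of rows and the number of feature columns, so the value of 5 used in the notebook relies on the preprocessed data having at least five columns.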

The main execution block:
1. **Collects user input** for cheese preferences.
2. **Predicts** if a cheese matching those preferences would likely be organic.
3. **Generates recommendations** for similar cheeses from the dataset based on the preferences provided.

The output includes both the organic prediction and a list of recommended cheeses, providing a complete user experience.


In [58]:
# Step 5: Run the Complete System
# Collect user preferences
user_preferences = get_user_preferences()

# Predict if the user's preferred cheese is likely to be "Organic"
organic_prediction, probability = predict_user_cheese(user_preferences)
# `probability` is the estimated probability of the organic class, whatever the predicted label
print(f"\nPrediction: The cheese is likely to be '{organic_prediction}' (estimated probability of being organic: {probability:.2f})")

# Generate recommendations based on similarity
recommended_cheeses = generate_recommendations(df, user_preferences)
print("\nRecommended Cheeses based on your preferences:")
print(recommended_cheeses)

Please enter your cheese preferences:
Manufacturing Type (e.g., Farmstead, Industrial): Farmstead
Category Type (e.g., Firm Cheese, Semi-soft Cheese): Firm Cheese
Fat Level (e.g., lower fat): lower fat
Moisture Percent (e.g., 45.0): 50.0

Prediction: The cheese is likely to be 'Not Organic' (estimated probability of being organic: 0.40)

Recommended Cheeses based on your preferences:
                  CheeseName CategoryTypeEn   FatLevel
947                  Cheddar    Firm Cheese  lower fat
954                     Feta    Firm Cheese  lower fat
969               Mozzarella    Firm Cheese  lower fat
965                 Mamamia!    Firm Cheese  lower fat
0    Sieur de Duplessis (Le)    Firm Cheese  lower fat
