# Food Recommendation System Backend Documentation

This documentation describes the backend components of the Food Recommendation System. For each module it covers the purpose, the key functions and their logic, and the reasoning behind the design choices.

## Table of Contents
- Project Structure
- Core Modules
  - data_processing.py
  - graph_triples.py
  - data_loading.py
  - model.py
  - recommender.py
  - utils.py
- Models
  - schemas.py
- Routers
  - recommend.py
  - recipe_info.py
  - unique_items.py
- Main Application
  - main.py

## Project Structure
```
backend/
├── main.py
├── core/
│   ├── data_processing.py
│   ├── data_loading.py
│   ├── graph_triples.py
│   ├── model.py
│   ├── recommender.py
│   └── utils.py
├── models/
│   └── schemas.py
└── routers/
    ├── recommend.py
    ├── recipe_info.py
    └── unique_items.py
```

## Core Modules

### data_processing.py
**Overview**: The `data_processing.py` module is responsible for preprocessing the raw recipe data. It performs data cleaning, label normalization, and prepares the data for graph construction and model training.

**Key Functions**
- **process_list_column(x)**
  - **Purpose**: Cleans and processes list-like columns in the DataFrame.
  - **Logic**:
    - Checks if the value is NaN or empty.
    - Splits the string on commas to separate items.
    - Applies `create_node_label` to each item for normalization.
  - **Returns**: A comma-separated string of normalized items.

- **preprocess_data(file_path)**
  - **Purpose**: Preprocesses the entire dataset.
  - **Logic**:
    - Reads the CSV file into a DataFrame.
    - Drops duplicate recipes based on the 'Name' column.
    - Resets the index for a clean DataFrame.
    - Normalizes the 'Name' column using `create_node_label`.
    - Processes list-like columns using `process_list_column`.
    - Normalizes the 'cook_time' column.
  - **Returns**: A cleaned and preprocessed DataFrame.

- **get_unique_regions(df)**
  - **Purpose**: Extracts unique regions from the 'RegionPart' column.
  - **Logic**:
    - Iterates over the 'RegionPart' column.
    - Splits the regions and adds them to a set to ensure uniqueness.
  - **Returns**: A sorted list of unique regions, including an empty string.

- **get_unique_countries(df)**
  - **Purpose**: Similar to `get_unique_regions`, but for countries.

- **get_unique_ingredients(df)**
  - **Purpose**: Extracts unique ingredients from the 'Best_foodentityname' column.

- **create_recipes_dict(df)**
  - **Purpose**: Creates a dictionary where each key is a recipe name, and the value is a dictionary of its attributes.
  - **Logic**:
    - Iterates over each row of the DataFrame.
    - Processes and normalizes various attributes like ingredients, healthy types, meal types, etc.
    - Handles missing values and placeholders.
  - **Returns**: A dictionary of recipes with their attributes.
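
The list handling described above can be sketched in plain Python. This is a minimal sketch, not the module's actual code: it works on single cell values rather than a DataFrame so that it runs without pandas, the `create_node_label` shown here is a hypothetical stand-in for the helper in `utils.py`, and `get_unique_values` is an illustrative name covering the pattern shared by `get_unique_regions` and its siblings.

```python
import math
import re


def create_node_label(label):
    """Hypothetical stand-in for utils.create_node_label (underscores are an assumption)."""
    return re.sub(r"[^a-z0-9]+", "_", str(label).strip().lower()).strip("_")


def process_list_column(x):
    """Normalize one comma-separated cell; missing or empty cells become ''."""
    # Treat None and float NaN (how pandas represents missing values) as empty.
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return ""
    items = [item.strip() for item in str(x).split(",")]
    labels = [create_node_label(item) for item in items if item]
    return ",".join(labels)


def get_unique_values(cells):
    """Collect distinct items across cells; the empty string is always included."""
    unique = {""}
    for cell in cells:
        if isinstance(cell, str) and cell:
            unique.update(part for part in cell.split(",") if part)
    return sorted(unique)


cleaned = [process_list_column(c) for c in ["Asia, Europe", "Europe", float("nan")]]
regions = get_unique_values(cleaned)
```

In a pandas workflow the same function would typically be applied per column, for example with `Series.apply`.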

**Main Execution Block**
- **Purpose**: Orchestrates the data preprocessing workflow.
- **Logic**:
  - Defines file paths for input data and outputs.
  - Calls `preprocess_data` to clean the data.
  - Extracts unique regions, countries, and ingredients.
  - Creates the recipes dictionary.
  - Generates the graph and triples using `create_graph_and_triples`.
  - Saves the processed data, triples, graph, and unique items to disk.
- **Outcome**: Prepared data ready for model training and recommendations.

**Reasoning**
- **Data Cleaning**: Ensures that the data is consistent, free of duplicates, and normalized, which is crucial for accurate model predictions.
- **Normalization**: Using `create_node_label` standardizes labels for nodes in the graph, aiding in matching and retrieval.
- **Graph Preparation**: Preparing the triples and the graph structure up front lets knowledge graph embedding models learn the relationships between entities.

### graph_triples.py
**Overview**: The `graph_triples.py` module handles the construction of the knowledge graph and the creation of triples that represent relationships between entities.

**Key Functions**
- **create_graph_and_triples(recipes)**
  - **Purpose**: Builds a graph (`networkx.Graph`) and an array of triples from the recipes dictionary.
  - **Logic**:
    - Iterates over each recipe and its details.
    - Adds recipe nodes to the graph.
    - For each attribute (e.g., ingredients, diet types), it:
      - Adds attribute nodes.
      - Determines the relation type.
      - Adds edges between recipes and attributes with the appropriate relation.
    - Applies special handling to 'healthy_types' so that related values share a generalized relation (e.g., 'HasProteinLevel').
  - **Returns**: A graph `G` and an array of triples.

- **save_triples(triples_array, file_path)**
  - **Purpose**: Saves the triples to a CSV file.
  - **Logic**: Converts the triples array to a pandas DataFrame and writes it to CSV.

- **save_graph(G, file_path)**
  - **Purpose**: Serializes and saves the graph object to disk using pickle.
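
As a rough illustration of the triple construction, the sketch below builds `(head, relation, tail)` tuples from a small recipes dictionary. Only `HasProteinLevel` comes from this documentation; the other relation names, the `high_protein` value format, and the helper names `relation_for` and `create_triples` are assumptions made for the example. It also tracks nodes in a plain set instead of a `networkx.Graph` so that it runs with the standard library alone.

```python
def relation_for(attribute, value):
    """Pick a relation name for an attribute (illustrative mapping only)."""
    base = {
        "ingredients": "HasIngredient",  # assumed name
        "diet_types": "HasDietType",     # assumed name
        "meal_type": "IsForMealType",    # assumed name
    }
    if attribute == "healthy_types":
        # Generalize values such as 'high_protein' and 'low_protein'
        # to a single relation per nutrient, e.g. 'HasProteinLevel'.
        nutrient = value.split("_")[-1]
        return f"Has{nutrient.capitalize()}Level"
    return base[attribute]


def create_triples(recipes):
    """Return (head, relation, tail) triples and the set of node labels."""
    triples, nodes = [], set()
    for name, details in recipes.items():
        nodes.add(name)
        for attribute, values in details.items():
            for value in values:
                nodes.add(value)
                triples.append((name, relation_for(attribute, value), value))
    return triples, nodes


recipes = {
    "lentil_soup": {
        "ingredients": ["lentil", "onion"],
        "healthy_types": ["high_protein"],
    }
}
triples, nodes = create_triples(recipes)
```

With networkx, each triple would map naturally onto an edge, for instance `G.add_edge(head, tail, relation=relation)`.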

**Reasoning**
- **Knowledge Graph Construction**: Representing data as a graph captures the complex relationships between entities, which is beneficial for recommendation systems.
- **Triples Creation**: Triples are the backbone of knowledge graphs and are essential for training embedding models.
- **Relation Generalization**: Simplifying 'healthy_types' relations allows the model to generalize and learn patterns more effectively.

### data_loading.py
**Overview**: The `data_loading.py` module loads the preprocessed data, the unique-item lists, the recipes dictionary, and optionally the graph for use in other parts of the application.

**Key Functions**
- **load_processed_recipes_df()**
  - **Purpose**: Loads the preprocessed recipes DataFrame from disk.
  - **Logic**:
    - Checks if the file exists.
    - Reads the CSV into a DataFrame.
    - Raises an error if the file is not found.

- **load_unique_regions(), load_unique_countries(), load_unique_ingredients()**
  - **Purpose**: Load pickled lists of unique regions, countries, and ingredients.
  - **Logic**: Checks for file existence and uses pickle to load the data.

- **load_recipes_dict()**
  - **Purpose**: Loads the recipes dictionary from disk.

- **load_graph()**
  - **Purpose**: Optionally loads the graph object from disk.
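
The check-then-load pattern these functions share might look like the following. It is a sketch under assumptions: `load_pickle` is a hypothetical helper name, and the file names are invented for the round trip in a temporary directory.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickled object, raising a descriptive error if the file is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Expected data file not found: {path}")
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Write a small list, then read it back through the loader.
    regions_path = os.path.join(tmp, "unique_regions.pkl")  # hypothetical file name
    with open(regions_path, "wb") as f:
        pickle.dump(["", "asia", "europe"], f)
    regions = load_pickle(regions_path)

    # A missing file produces a clear error instead of a cryptic failure later.
    try:
        load_pickle(os.path.join(tmp, "unique_countries.pkl"))
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = str(exc)
```

Because `pickle` can execute arbitrary code while loading, this pattern is only appropriate for files the application wrote itself.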

**Data Loading Block**
- **Purpose**: Loads all necessary data upon module import.
- **Outcome**: Data is ready for use in recommendation and API endpoints.

**Reasoning**
- **Modular Data Access**: Separating data loading allows for clean and reusable code.
- **Error Handling**: Checking for each file and raising an error when it is missing surfaces data problems at load time with a clear message.
- **Data Persistence**: Loading data from disk avoids re-running preprocessing and graph construction on every start, which improves startup time.

### model.py
**Overview**: The `model.py` module handles the training and loading of the knowledge graph embedding model using PyKEEN.

**Key Functions**
- **train_model(triples_factory)**
  - **Purpose**: Trains a knowledge graph embedding model or loads an existing one.
  - **Logic**:
    - Checks if a pre-trained model file exists.
    - If not, trains a new model using the specified algorithm (e.g., 'QuatE').
    - Uses early stopping to prevent overfitting.
    - Saves the trained model to disk.
  - **Returns**: The trained model result object.
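
The load-or-train logic can be sketched independently of PyKEEN. In this example `fake_train` is a stand-in for the real training call and the model is a plain dictionary, so the sketch shows only the caching behaviour, not how `train_model` actually invokes the library.

```python
import os
import pickle
import tempfile


def load_or_train(model_path, train_fn):
    """Return a saved model if one exists; otherwise train, save, and return it."""
    if os.path.exists(model_path):
        with open(model_path, "rb") as f:
            return pickle.load(f), "loaded"
    model = train_fn()
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model, "trained"


def fake_train():
    """Stand-in for the PyKEEN training call (assumption for this sketch)."""
    return {"algorithm": "QuatE", "embedding_dim": 4}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")  # hypothetical file name
    first_model, first_status = load_or_train(path, fake_train)    # trains and saves
    second_model, second_status = load_or_train(path, fake_train)  # reuses the file
```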

**Model Loading Block**
- **Purpose**: Loads or trains the model upon module import.
- **Outcome**: The model is ready for making predictions in the recommendation system.

**Reasoning**
- **Knowledge Graph Embeddings**: Embedding models capture semantic relationships between entities in a continuous vector space, enabling similarity computations.
- **Model Persistence**: Saving the model allows for reusability and faster startup times.
- **Early Stopping**: Prevents overfitting by stopping training when the model performance stops improving.
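
To make the early-stopping idea concrete, here is a generic patience-based loop. It is not PyKEEN's implementation: the function name, the `patience` and `min_delta` parameters, and the list of precomputed validation scores are all assumptions chosen to keep the example self-contained.

```python
def train_with_early_stopping(validation_scores, patience=2, min_delta=0.0):
    """Stop once the score has not improved for `patience` consecutive evaluations.

    `validation_scores` stands in for the metric measured after each evaluation
    round (higher is better). Returns (best_score, evaluations_run).
    """
    best = float("-inf")
    stale = 0
    evaluations = 0
    for score in validation_scores:
        evaluations += 1
        if score > best + min_delta:
            # Improvement: remember it and reset the patience counter.
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` rounds in a row
    return best, evaluations


best, rounds = train_with_early_stopping([0.50, 0.60, 0.61, 0.60, 0.59, 0.70])
```

Here training stops after the fifth evaluation, so the later score of 0.70 is never reached; that trade-off is what the patience setting controls.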

### recommender.py
**Overview**: The `recommender.py` module contains the logic for mapping user input to model criteria, generating recommendations, and fetching detailed recipe information.

**Key Functions**
- **map_user_input_to_criteria(...)**
  - **Purpose**: Translates user preferences into model criteria.
  - **Parameters**: User inputs like meal type, calories, diet type, ingredients, and custom weights.
  - **Logic**: Normalizes the input and creates tuples of (entity, relation, weight).

- **normalize_scores(predictions)**
  - **Purpose**: Normalizes prediction scores between 0 and 1.
  - **Logic**: Uses `MinMaxScaler` from scikit-learn and adds a 'normalized_score' column.

- **get_matching_recipes(criteria)**
  - **Purpose**: Generates a list of recipe recommendations based on the criteria.
  - **Logic**: Predicts recipes that match criteria, normalizes scores, and filters recipes.
  - **Returns**: A sorted list of recommended recipe names.

- **fetch_recipe_info(recipe_name)**
  - **Purpose**: Retrieves detailed information about a specific recipe.
  - **Logic**: Fetches and formats various recipe fields.
  - **Returns**: A dictionary containing recipe details or `None` if not found.
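
The flow from user input to a ranked list can be sketched as follows. This is an assumption-heavy illustration: the parameter names, the relation names, and the `raw_scores` dictionary (standing in for model predictions per criterion) are invented, and min-max scaling is written by hand rather than with `MinMaxScaler` so the example has no dependencies.

```python
def map_user_input_to_criteria(meal_type=None, diet_type=None, ingredients=(), weights=None):
    """Turn optional preferences into (entity, relation, weight) tuples."""
    weights = weights or {}
    criteria = []
    if meal_type:
        criteria.append((meal_type.lower(), "IsForMealType", weights.get("meal_type", 1.0)))
    if diet_type:
        criteria.append((diet_type.lower(), "HasDietType", weights.get("diet_type", 1.0)))
    for ingredient in ingredients:
        criteria.append((ingredient.lower(), "HasIngredient", weights.get("ingredients", 1.0)))
    return criteria


def min_max(scores):
    """Rescale a {recipe: score} mapping to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def rank_recipes(criteria, raw_scores):
    """Rank recipes by the weighted sum of their normalized per-criterion scores."""
    if not criteria:
        return []
    per_criterion = [min_max(raw_scores[(entity, rel)]) for entity, rel, _ in criteria]
    # Strict matching: keep only recipes that appear under every criterion.
    common = set.intersection(*(set(scores) for scores in per_criterion))
    totals = {
        recipe: sum(w * scores[recipe] for (_, _, w), scores in zip(criteria, per_criterion))
        for recipe in common
    }
    return sorted(totals, key=lambda recipe: (-totals[recipe], recipe))


criteria = map_user_input_to_criteria(
    meal_type="Dinner", ingredients=["Lentil"], weights={"ingredients": 2.0}
)
raw_scores = {  # hypothetical model outputs
    ("dinner", "IsForMealType"): {"lentil_soup": -1.0, "pancakes": -3.0, "stew": -2.0},
    ("lentil", "HasIngredient"): {"lentil_soup": -0.5, "stew": -1.5},
}
ranking = rank_recipes(criteria, raw_scores)
```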

**Reasoning**
- **User Personalization**: Mapping user inputs to model criteria allows for personalized recommendations.
- **Score Normalization and Weighting**: Normalizing scores to a common range before applying weights keeps criteria with larger raw score ranges from dominating the final ranking.
- **Strict Matching**: Enforces that recommended recipes meet all specified criteria.
- **Data Fetching**: Provides detailed recipe information to enhance user experience.

### utils.py
**Overview**: The `utils.py` module provides utility functions used across multiple modules.

**Key Functions**
- **create_node_label(label)**
  - **Purpose**: Normalizes strings to create consistent node labels for the graph.
  - **Logic**: Replaces spaces and special characters and converts the string to lowercase.
  - **Returns**: A normalized string.

- **UNKNOWN_PLACEHOLDER**
  - **Purpose**: A constant used to represent unknown or missing values.
  - **Value**: 'Unknown'
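
One plausible shape for the label normalizer is shown below. The source only states that spaces and special characters are replaced and the result is lowercased; using an underscore as the replacement character and trimming leading and trailing underscores are assumptions of this sketch.

```python
import re

UNKNOWN_PLACEHOLDER = "Unknown"


def create_node_label(label):
    """Lowercase a label and collapse spaces and special characters into underscores."""
    text = str(label).strip().lower()
    # Any run of characters outside a-z and 0-9 becomes a single underscore (assumed).
    text = re.sub(r"[^a-z0-9]+", "_", text)
    return text.strip("_")


label = create_node_label("  Chicken Tikka-Masala! ")
```

Normalizing this way means that variants such as "Olive Oil" and "olive  oil" resolve to the same graph node.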

**Reasoning**
- **Label Normalization**: Ensures consistency in node labeling.
- **Constants**: Using a placeholder for unknown values standardizes the handling of missing data.

## Models

### schemas.py
**Overview**: The `schemas.py` module defines data models (schemas) using Pydantic for request validation and response formatting.

**Key Classes**
- **RecommendationRequest**
  - **Purpose**: Defines the expected structure of a recommendation request.

- **RecipeInfo**
  - **Purpose**: Defines the structure of recipe information returned by the API.
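
A minimal sketch of what such schemas could look like follows. The class names come from this documentation, but every field name and type is a guess based on the inputs listed for `map_user_input_to_criteria`; the real models may differ.

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, ValidationError


class RecommendationRequest(BaseModel):
    # All fields are illustrative assumptions, not the actual schema.
    meal_type: Optional[str] = None
    diet_type: Optional[str] = None
    max_calories: Optional[int] = None
    ingredients: List[str] = []
    weights: Dict[str, float] = {}


class RecipeInfo(BaseModel):
    # Illustrative subset of recipe fields.
    name: str
    ingredients: List[str] = []
    cook_time: Optional[str] = None


# A well-formed request is parsed into typed attributes.
request = RecommendationRequest(
    meal_type="dinner", ingredients=["lentil"], weights={"ingredients": 2}
)

# A value of the wrong type is rejected before it reaches the recommender.
try:
    RecommendationRequest(max_calories="plenty")
    rejected = False
except ValidationError:
    rejected = True
```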

**Reasoning**
- **Data Validation**: Ensures that incoming requests have the correct structure and data types.
- **Response Formatting**: Provides a consistent and structured format for API responses.

## Routers

### recommend.py
**Overview**: The `recommend.py` module defines the API endpoint for generating recipe recommendations.

**Endpoint**
- **POST /recommend**
  - **Request Body**: An instance of `RecommendationRequest`.
  - **Response**: A list of recommended recipe names.

**Logic**
- Receives user input and weights via the request body.
- Calls `map_user_input_to_criteria` to translate input into criteria.
- Calls `get_matching_recipes` to generate recommendations.
- Returns the list of recommended recipes.

**Reasoning**
- **API Design**: Separates concerns by handling recommendations in its own router module.
- **User Personalization**: Allows users to specify weights for different criteria.

### recipe_info.py
**Overview**: The `recipe_info.py` module provides an API endpoint to retrieve detailed information about a specific recipe.

**Endpoint**
- **GET /recipe/{recipe_name}**
  - **Path Parameter**: `recipe_name` (string).
  - **Response**: An instance of `RecipeInfo`.

**Logic**
- Fetches recipe information using `fetch_recipe_info`.
- Raises a 404 Not Found error if the recipe does not exist.
- Returns the recipe information in the structured format.

**Reasoning**
- **Data Accessibility**: Allows clients to retrieve all relevant information about a recipe.
- **Error Handling**: Provides meaningful HTTP status codes and messages.

### unique_items.py
**Overview**: The `unique_items.py` module provides API endpoints to retrieve lists of unique ingredients, regions, and countries.

**Endpoints**
- **GET /unique_ingredients**: Returns a list of unique ingredient names.
- **GET /unique_regions**: Returns a list of unique region names.
- **GET /unique_countries**: Returns a list of unique country names.

**Logic**
- Returns the preloaded lists from `data_loading.py`.

**Reasoning**
- **Client-Side Assistance**: Provides necessary data for populating dropdowns or autocomplete fields in the frontend.
- **Performance**: Preloading data and serving it directly improves response times.

## Main Application

### main.py
**Overview**: The `main.py` module initializes the FastAPI application, includes middleware, and registers the API routers.

**Key Components**
- **FastAPI Application Initialization**: Creates an instance of FastAPI.
- **CORS Middleware**: Configured using `CORSMiddleware` to allow cross-origin requests.
  - Note: `allow_origins=["*"]` is set for development purposes and should be restricted in production.
- **Router Inclusion**: Registers routers from `recommend.py`, `recipe_info.py`, and `unique_items.py`.

**Reasoning**
- **Modular API Structure**: Separating routers into different modules improves maintainability.
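- **Middleware Configuration**: CORS must be configured whenever the frontend is served from a different origin than the API; otherwise browsers block its requests.
- **Security Considerations**: In production, the CORS configuration should list only trusted origins so that arbitrary websites cannot call the API from a user's browser. CORS does not replace authentication.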

## Conclusion
This documentation has described the purpose and logic of each backend module and function in the Food Recommendation System, so that developers can maintain and extend it. The central design choices, a knowledge graph of recipes and attributes combined with an embedding model, are aimed at producing personalized recommendations that respect the criteria and weights each user supplies.

