EncHawk/Amazon-ML-challenge-2025

ML Challenge 2025: Smart Product Pricing Solution

Team ET members:

Banshal Yadav
Dilip Kumar R
HR Prasith

Submission Date: 13/10/2025


1. Executive Summary

This document outlines a multimodal solution for the Smart Product Pricing Challenge. Our approach integrates comprehensive feature engineering from both textual and visual data, utilizing a pre-trained Vision Transformer for image embeddings and regex-based parsing for text. A Gradient Boosting Regressor was selected as the final model, achieving a validation SMAPE of 64.438%.


2. Methodology Overview

2.1 Problem Analysis

The pipeline is structured around comprehensive feature engineering from both modalities, followed by training and evaluation of several regression models. The primary source of signal is assumed to be in the structured and unstructured data within the provided inputs.

Key Observations:

  • The catalog_content field is a composite of structured tags (e.g., Value:, Unit:) and unstructured descriptions, requiring careful parsing.
  • Product images provide non-textual cues like brand recognition, product quality, and category, which can be captured via deep learning embeddings.
  • A significant portion of price variance is likely driven by quantitative features like item pack quantity and weight/volume.

2.2 Solution Strategy

Our high-level approach involves creating a rich, flat feature set by combining handcrafted text features with deep visual features, and then applying a robust tree-based regression model.

Approach Type: Single Model (Gradient Boosting) on a Multimodal Feature Set.
Core Innovation: The fusion of 768-dimensional visual embeddings from a Vision Transformer with meticulously parsed textual metadata, producing a comprehensive feature matrix for a classical machine learning model.


3. Model Architecture

3.1 Architecture Overview

The architecture is a sequential feature-engineering pipeline that feeds into a single regression model. Raw text and image URLs are processed in parallel to generate feature vectors, which are then concatenated. This final matrix is used for training the Gradient Boosting model.
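The parallel-then-concatenate flow above can be sketched as follows; the array shapes and feature counts are illustrative stand-ins, not the actual pipeline dimensions (only the 768-dim embedding size comes from the write-up):

```python
import numpy as np

# Hypothetical shapes: 1000 products, 12 handcrafted text features,
# and one 768-dim ViT embedding per product image.
n_products = 1000
text_features = np.random.rand(n_products, 12)      # parsed text metadata
image_embeddings = np.random.rand(n_products, 768)  # ViT [CLS] embeddings

# Concatenate the two modalities into one flat feature matrix
# that a tree-based regressor can consume directly.
X = np.hstack([text_features, image_embeddings])
```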

3.2 Model Components

Text Processing Pipeline:

  • Preprocessing steps: Regex and string operations are used to parse catalog_content and extract features like text_length, word_count, brand_name, value_amount, unit_type, brand_frequency, and keyword counts. Categorical features are then LabelEncoded.
  • Model type: None; this stage is a rule-based feature extraction script rather than a learned model.
  • Key parameters: N/A (rule-based extraction).
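A minimal sketch of the rule-based extraction, assuming the `Value:`/`Unit:` tag format noted in the key observations; the exact regexes, field names, and sample strings are illustrative, not the submitted script:

```python
import re
from sklearn.preprocessing import LabelEncoder

def parse_catalog_content(text: str) -> dict:
    """Extract structured fields from a composite catalog string."""
    value = re.search(r"Value:\s*([\d.]+)", text)
    unit = re.search(r"Unit:\s*(\w+)", text)
    return {
        "text_length": len(text),
        "word_count": len(text.split()),
        "value_amount": float(value.group(1)) if value else 0.0,
        "unit_type": unit.group(1).lower() if unit else "unknown",
    }

rows = [
    "Organic honey jar. Value: 500 Unit: gram",
    "Sparkling water 6-pack. Value: 1.5 Unit: litre",
]
features = [parse_catalog_content(r) for r in rows]

# Encode the categorical unit_type column as integer labels.
le = LabelEncoder()
unit_ids = le.fit_transform([f["unit_type"] for f in features])
```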

Image Processing Pipeline:

  • Preprocessing steps: Images are loaded from URLs, converted to RGB, and resized. The AutoImageProcessor handles resizing and normalization for the ViT model.
  • Model type: Vision Transformer (google/vit-base-patch16-224) for feature extraction. The output from the [CLS] token's last hidden state is used as the embedding.
  • Key parameters: Output embedding dimension is 768.
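The embedding step can be sketched with the Hugging Face transformers API; the blank in-memory image below stands in for a product photo fetched from a URL:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

# Load the pre-trained backbone named in the write-up.
checkpoint = "google/vit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = ViTModel.from_pretrained(checkpoint)
model.eval()

def embed_image(image: Image.Image) -> torch.Tensor:
    """Return the 768-dim [CLS] embedding for one image."""
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # The [CLS] token sits at position 0 of the last hidden state.
    return outputs.last_hidden_state[:, 0, :].squeeze(0)

embedding = embed_image(Image.new("RGB", (640, 480), color="white"))
```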

4. Model Performance

4.1 Validation Results

  • SMAPE Score: 51.94%
  • Model Selection and Feature Importance: The final selected model was a GradientBoostingRegressor. Feature importance analysis indicated that text-derived metadata (text_length, value_amount) and the visual embeddings were the most influential predictors.
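The validation setup can be sketched as below; the data here is synthetic, so the resulting score does not reproduce the reported number, and the SMAPE definition (percent, mean of symmetric relative errors) is the conventional one rather than a quote of the challenge scorer:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_pred - y_true) / np.where(denom == 0, 1, denom))

# Synthetic stand-in for the fused feature matrix and target prices;
# feature 0 carries most of the signal by construction.
rng = np.random.default_rng(0)
X = rng.random((500, 20))
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 2, 500)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
score = smape(y_val, model.predict(X_val))
```

The same `model.feature_importances_` attribute is what the importance analysis above would inspect.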

5. Conclusion

Our solution demonstrates the efficacy of combining deep visual features with robust text-based feature engineering. The Gradient Boosting model proved effective at capturing the complex, non-linear relationships within the fused feature set. Key lessons include the high signal value present in structured text tags and the significant predictive power of pre-trained vision models even when used only for feature extraction.


Appendix

A. Code artefacts

Include drive link for your complete code directory

B. Additional Results

Include any additional charts, graphs, or detailed results

About

A multimodal solution for Amazon's Smart Product Pricing challenge that automates price prediction using vision models and product text.
