Home-Depot-Item-Relevance-DS-Project

Overview

This repository contains the Home Depot Item Relevance project, which aims to predict how relevant an item is to a given search query. The project develops models using several NLP techniques: classical ML, character-based, word-based, pretrained, and combined models.

Files

DoubleLSTM_utils

  • DoubleLSTMDataset.py - Dataset for DoubleLSTMSiamese Model
  • DoubleLSTMSiameseLSTM.py - DoubleLSTMSiamese Model

bart_utils

  • BartDataset.py - Dataset for BartSiamese Model
  • BartSiamese.py - BartSiamese Model
  • bart_utils.py - utility functions for the BART model

char_utils

  • CharDataset.py - Dataset for CharSiameseLSTM Model
  • CharSiameseLSTM.py - CharSiameseLSTM Model
  • char_utils.py - utility functions for the character-based model

csv - contains the CSV files from the Kaggle competition

utils

  • ClassicalML.py - contains all functions used for training classical ML algorithms
  • GLOBALS.py - contains all global variables
  • new_preproc.py - contains the new preprocessing functions
  • old_preproc.py - contains the old preprocessing functions

word_utils

  • WordDataset.py - Dataset for WordSiameseLSTM Model
  • WordSiameseLSTM.py - WordSiameseLSTM Model
  • word_utils.py - utility functions for the word-based model

main

  • 2LSTM.ipynb - main notebook for training the double LSTM model
  • Bart.ipynb - main notebook for training the BART-based model
  • Character.ipynb - main notebook for training the character-based model
  • Naive.ipynb - main notebook for training the naive baseline model for comparison
  • Word.ipynb - main notebook for training the word-based model

Dataset

Link to Kaggle Dataset

Structure

The dataset is composed of several features describing items and search queries. In this project we used the following (a minimal loading sketch follows the list):

  • Product Descriptions
  • Search Terms
  • Relevance of the search term to the item description
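
For concreteness, here is a minimal loading sketch, assuming the standard competition files (train.csv and product_descriptions.csv in the csv folder; the column names and ISO-8859-1 encoding follow the Kaggle data layout):

```python
import pandas as pd

# Competition files (paths assumed; see the csv folder)
train = pd.read_csv("csv/train.csv", encoding="ISO-8859-1")
descriptions = pd.read_csv("csv/product_descriptions.csv")

# Attach each product's description to its (search term, relevance) pairs
data = train.merge(descriptions, on="product_uid", how="left")

# The three features used in the project; relevance ranges from 1 to 3
data = data[["search_term", "product_description", "relevance"]]
print(data.head())
```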

Training

Character-based model:

  • Model Structure (a code sketch follows this list):

    (figure: model structure diagram)

  • Train/Validation graphs of RMSE/MAE for the best experiment:

    (figures: train and validation RMSE and MAE curves)

  • Results:

    (figure: results)
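
The structure figure is not reproduced here. As a rough illustration of the idea, below is a minimal sketch of a character-level Siamese LSTM regressor; it is hypothetical and simplified, not the exact CharSiameseLSTM.py implementation, and all names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class SiameseCharLSTM(nn.Module):
    """Illustrative character-level Siamese LSTM (hyperparameters assumed)."""
    def __init__(self, vocab_size=128, emb_dim=32, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # One shared encoder for both inputs -- the "Siamese" part
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Small head regressing a relevance score from the two encodings
        self.head = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, char_ids):
        _, (h_n, _) = self.encoder(self.embedding(char_ids))
        return h_n[-1]                      # final hidden state per sequence

    def forward(self, query_ids, desc_ids):
        q = self.encode(query_ids)          # search term branch
        d = self.encode(desc_ids)           # product description branch
        return self.head(torch.cat([q, d], dim=1)).squeeze(-1)
```

The shared encoder is what makes the model Siamese: the search term and the product description pass through the same LSTM weights before the regression head, which is evaluated with the RMSE/MAE metrics shown in the graphs.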

Word-based model:

  • Model Structure:

    Same as in the character-based model.

  • Train/Validation graphs of RMSE/MAE for the best experiment:

    (figures: train and validation RMSE and MAE curves)

  • Results:

    (figure: results)

Double LSTM model:

  • Model Structure:

    Same as in the character-based model, but with two separate LSTMs, one per input (see the fragment below).

  • Results:

    (figure: results)
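
Under that reading, the only change from the Siamese sketch above would be unsharing the encoder. A hypothetical fragment of the difference:

```python
# Instead of one shared self.encoder, keep one LSTM per input
# (an assumed reading of "two LSTMs based on input"):
self.query_encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
self.desc_encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
# forward() then encodes each input with its own LSTM before the shared head
```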

Classical ML on the word-based model:

  • Results:

    (figure: results)
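
ClassicalML.py is not reproduced here. One plausible setup, sketched under the assumption that sentence encodings from the word-based model are fed to scikit-learn regressors (the file names, model choice, and hyperparameters are all hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# X: concatenated (query, description) encodings from the word-based model;
# y: Kaggle relevance scores. Both assumed precomputed (hypothetical files).
X, y = np.load("word_encodings.npy"), np.load("relevance.npy")
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=300, random_state=42)
model.fit(X_tr, y_tr)

pred = model.predict(X_va)
rmse = mean_squared_error(y_va, pred) ** 0.5  # same metrics as the graphs above
mae = mean_absolute_error(y_va, pred)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}")
```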

BART-based model:

  • Model Structure:

    (figure: model structure diagram)

  • Train/Validation graphs of RMSE/MAE for the best experiment:

    (figures: train and validation RMSE and MAE curves)

  • Results:

    (figure: results)
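
As a rough sketch of a Siamese regressor on top of a pretrained BART encoder (hypothetical, not the exact BartSiamese.py code; the facebook/bart-base checkpoint and mean pooling are assumptions):

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, BartModel

class BartSiameseSketch(nn.Module):
    def __init__(self, name="facebook/bart-base"):
        super().__init__()
        # Use only BART's encoder stack to embed text
        self.encoder = BartModel.from_pretrained(name).get_encoder()
        self.head = nn.Linear(self.encoder.config.d_model * 2, 1)

    def embed(self, enc):
        out = self.encoder(**enc).last_hidden_state   # (batch, seq, hidden)
        mask = enc["attention_mask"].unsqueeze(-1)
        return (out * mask).sum(1) / mask.sum(1)      # masked mean pooling

    def forward(self, query_enc, desc_enc):
        q, d = self.embed(query_enc), self.embed(desc_enc)
        return self.head(torch.cat([q, d], dim=1)).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
query = tokenizer(["angle bracket"], return_tensors="pt", padding=True)
desc = tokenizer(["Galvanized steel corner brace for wood framing"],
                 return_tensors="pt", padding=True)
score = BartSiameseSketch()(query, desc)  # untrained relevance prediction
```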

Classical ML on the BART-based model:

  • Results:

    (figure: results)

Final Results

(figure: final comparison of all models)

Final Remarks

  • The project was genuinely challenging, especially preprocessing the data. We believe our word-based model achieved the best results because its data preprocessing was tuned specifically for this task. Although BART is a very strong and complex model, it was pretrained on text very different from Home Depot items, which is a likely reason it did not outperform our model. It is worth noting that training the BART model was much faster than training a model from scratch, so there is always a trade-off between model quality and training time.