This repository is dedicated to the Home Depot Item Relevance project, which aims to determine the relevance of an item to a search query. The project focuses on developing models using different NLP techniques: classical ML, character-based, word-based, pretrained, and combined.
- DoubleLSTMDataset.py - Dataset for DoubleLSTMSiamese Model
- DoubleLSTMSiameseLSTM.py - DoubleLSTMSiamese Model
- BartDataset.py - Dataset for BartSiamese Model
- BartSiamese.py - BartSiamese Model
- bart_utils.py - util functions for Bart
- CharDataset.py - Dataset for CharSiameseLSTM Model
- CharSiameseLSTM.py - CharSiameseLSTM Model
- char_utils.py - util functions for character-based model
- ClassicalML.py - contains all functions used for training classical ML algorithms
- GLOBALS.py - contains all global variables
- new_preproc.py - contains the new preprocessing functions
- old_preproc.py - contains the old preprocessing functions
- WordDataset.py - Dataset for WordSiameseLSTM Model
- WordSiameseLSTM.py - WordSiameseLSTM Model
- word_utils.py - util functions for word-based model
- 2LSTM.ipynb - main for training double LSTM model
- Bart.ipynb - main for training Bart-based model
- Character.ipynb - main for training character-based model
- Naive.ipynb - main for training naive model for comparison
- Word.ipynb - main for training word-based model
The dataset is composed of different features describing items and search queries. In our project we used:
- Product Descriptions
- Search Terms
- Relevance of the search term to the item description
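The three features above can be joined into one training table. A minimal sketch, assuming column names along the lines of the Kaggle Home Depot data (`product_uid`, `search_term`, `relevance`, `product_description` are illustrative stand-ins, and the tiny inline frames replace the real CSV files):

```python
import pandas as pd

# Stand-ins for the real train / product-description files; values are made up.
train = pd.DataFrame({
    "product_uid": [100001, 100002],
    "search_term": ["angle bracket", "deck paint"],
    "relevance": [3.0, 2.5],  # relevance score, e.g. 1 (irrelevant) .. 3 (perfect match)
})
descriptions = pd.DataFrame({
    "product_uid": [100001, 100002],
    "product_description": ["Angle brackets reinforce ...", "Premium deck paint ..."],
})

# Attach each product's description to its (search term, relevance) rows.
data = train.merge(descriptions, on="product_uid", how="left")
print(list(data.columns))
```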
- Model Structure:
Same as in the character-based model
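The Siamese structure shared by these models can be sketched roughly as follows: one LSTM encoder is applied to both the search term and the product description, and a small head regresses a relevance score from the pair of sequence embeddings. This is a minimal illustrative sketch, not the repo's actual CharSiameseLSTM or WordSiameseLSTM code; all sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SiameseLSTM(nn.Module):
    """One LSTM encoder shared by both inputs (illustrative sizes)."""
    def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Regress a relevance score from the concatenated sentence vectors.
        self.head = nn.Linear(hidden_dim * 2, 1)

    def encode(self, token_ids):
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return h_n[-1]  # final hidden state as the sequence embedding

    def forward(self, query_ids, desc_ids):
        q = self.encode(query_ids)  # same weights encode both inputs
        d = self.encode(desc_ids)
        return self.head(torch.cat([q, d], dim=1)).squeeze(1)

model = SiameseLSTM()
scores = model(torch.randint(1, 128, (4, 10)),   # batch of 4 queries
               torch.randint(1, 128, (4, 50)))   # batch of 4 descriptions
print(scores.shape)  # torch.Size([4])
```

For the character-based model the token ids would index characters, for the word-based model they would index a word vocabulary; the network itself is unchanged.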
- Model Structure:
Same as in the character-based model, but with two LSTMs, one per input
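A minimal sketch of this two-encoder variant: the search term and the product description each get their own LSTM instead of sharing weights. The class and parameter names are illustrative assumptions, not the repo's actual DoubleLSTMSiamese code.

```python
import torch
import torch.nn as nn

class DoubleLSTMSiamese(nn.Module):
    """Separate LSTM per input branch (illustrative sizes)."""
    def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Two independent encoders, one per input.
        self.query_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.desc_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim * 2, 1)

    def forward(self, query_ids, desc_ids):
        _, (hq, _) = self.query_lstm(self.embed(query_ids))
        _, (hd, _) = self.desc_lstm(self.embed(desc_ids))
        return self.head(torch.cat([hq[-1], hd[-1]], dim=1)).squeeze(1)

model = DoubleLSTMSiamese()
scores = model(torch.randint(1, 128, (4, 10)), torch.randint(1, 128, (4, 50)))
print(scores.shape)  # torch.Size([4])
```

Giving each branch its own weights lets the query encoder and the description encoder specialize, at the cost of roughly doubling the recurrent parameters.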
- The project was really challenging, especially preprocessing the data. We believe our word-based model achieved the best results because of data preprocessing tuned specifically for this task. Although BART is a very strong and complex model, it was pretrained on text very different from Home Depot item descriptions, which is a possible reason it did not outperform our model. It is worth mentioning that fine-tuning the BART model was much faster than training a model from scratch, so there is always a trade-off between model quality and training time.