
Multimodal Product Classification by applying a Voting Classifier on machine learning, CNN and DNN models.


damienld/Rakuteam


Multimodal Product Data Classification


We developed this project as a team of 4 to complete our training as Data Scientists.
For this [contest organized by ENS](https://challengedata.ens.fr/participants/challenges/35/), we worked on the classification of e-commerce articles by developing and aggregating several models.
The data provided for each article included both text (a title and a description) and a picture.

Demo

Visit our Streamlit demo here
Features:

  • Predict the classification of a random article (or even an article loaded from Amazon/Rakuten, or manually inputted)
  • Calculate the probabilities using your own combination of all 3 models
  • Explore the dataset with a dynamic EDA
  • ...
  • ...
    Page Preview:



Dataset

99 000 articles (85 000 in train + 14 000 in test) and 27 categories
Each article includes:
text data (2 fields: description and title)
one picture

EDA

15 most frequent words from the description field for category/class #1560
15 random images for category/class #10

Model 1: Random Forest

Feature engineering

tf-idf for category/class 1281
frequency of regular expressions for each category
% of pixels in green for each category
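
The regular-expression and green-pixel features listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code: the two patterns in `PATTERNS`, the rule used to call a pixel "green", and the function names `regex_frequencies` and `green_pixel_fraction` are all hypothetical.

```python
import re

# Hypothetical patterns; the regular expressions actually used by the
# project are not listed in this README.
PATTERNS = {
    "has_dimension": re.compile(r"\b\d+\s?(?:cm|mm|m)\b", re.IGNORECASE),
    "has_isbn": re.compile(r"\bisbn\b", re.IGNORECASE),
}


def regex_frequencies(texts):
    """Return, for each pattern, the share of texts with at least one match."""
    total = len(texts)
    if total == 0:
        return {name: 0.0 for name in PATTERNS}
    return {
        name: sum(1 for t in texts if pattern.search(t)) / total
        for name, pattern in PATTERNS.items()
    }


def green_pixel_fraction(pixels):
    """Return the share of (R, G, B) pixels whose green channel dominates.

    A pixel counts as "green" when G is strictly greater than both R and B.
    This threshold rule is an assumption made for the sketch.
    """
    if not pixels:
        return 0.0
    green = sum(1 for r, g, b in pixels if g > r and g > b)
    return green / len(pixels)
```

Computing these values over all articles of one category, then comparing across the 27 categories, shows which features separate classes well enough to be worth feeding to the Random Forest.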

Best Model selected




Result: Accuracy 0.77

Model 2: Convolutional neural network (on the pictures)


Result: Accuracy 0.58

Model 3: Dense neural network (on the text)


Result: Accuracy 0.82

Final Model: Weighted Voting Classifier across all 3 models

Result: Accuracy 0.84
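
A weighted vote over three models can be sketched as a weighted average of their class probabilities. The sketch below assumes soft voting (averaging probabilities rather than counting hard votes), which this README does not confirm; the function name `weighted_soft_vote`, the example weights, and the probability values are hypothetical and shortened to 3 classes for readability.

```python
def weighted_soft_vote(probas, weights):
    """Combine per-model class probabilities with a weighted average.

    probas:  list of per-model probability lists, one value per class.
    weights: one non-negative weight per model (not all zero).
    Returns (predicted_class_index, combined_probabilities).
    """
    if len(probas) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    # Index of the class with the highest combined probability.
    best = max(range(n_classes), key=combined.__getitem__)
    return best, combined


# Made-up probabilities for one article from the three models.
rf_proba = [0.6, 0.3, 0.1]   # Random Forest
cnn_proba = [0.2, 0.5, 0.3]  # CNN on the picture
dnn_proba = [0.3, 0.6, 0.1]  # DNN on the text
label, combined = weighted_soft_vote(
    [rf_proba, cnn_proba, dnn_proba], weights=[1, 1, 2]
)
```

Giving a larger weight to the model with the best standalone accuracy is one plausible choice; the weights actually used for the final model are not stated here.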
