adeshpande3/WalmartLabs-ML-CodeSprint

WalmartLabs-ML-CodeSprint

https://www.hackerrank.com/contests/walmart-codesprint-ml/challenges/products-shelves-tagging

This is a HackerRank contest in which participants develop a machine learning solution for assigning products of a certain type to certain shelves, given characteristics of each product.

HackerRank provides two files (train.tsv and test.tsv) to the contestants. Looking at the files, I noticed that the train file had a lot of missing data and many formatting errors, so I decided to cut down the number of features that I would consider for each product. The features that I used are:

Seller, Actual Color, Artist ID, Genre ID, ISBN, Item Class ID, Literary Genre, MPAA Rating, Product Name, Publisher, Recommended Location, and Recommended Use. (14 features in total, of which 12 are named here.)
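As a rough illustration of trimming the data down to the features listed above, here is a minimal Python sketch. It assumes the column headers in train.tsv match the feature names exactly and that the label column is called `tags`; both are assumptions, and the real headers may differ. The `load_rows` helper and the two sample rows are made up for illustration and are not taken from the contest data.

```python
import csv
import io

# Feature names as listed above; the real train.tsv headers may be spelled differently.
FEATURES = [
    "Seller", "Actual Color", "Artist ID", "Genre ID", "ISBN",
    "Item Class ID", "Literary Genre", "MPAA Rating", "Product Name",
    "Publisher", "Recommended Location", "Recommended Use",
]

def load_rows(handle, features=FEATURES, label="tags", missing=""):
    """Read a TSV stream, keep only the chosen columns, and fill in gaps.

    Hypothetical helper: `label` and `missing` are illustrative defaults.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        # DictReader yields None for columns that a short (malformed) row lacks,
        # and .get() returns None for columns absent from the header.
        row = {name: (record.get(name) or missing).strip() for name in features}
        row[label] = (record.get(label) or missing).strip()
        rows.append(row)
    return rows

# Made-up sample: one complete row and one truncated row.
SAMPLE = (
    "Product Name\tSeller\tISBN\ttags\n"
    "Blue Ceramic Mug\tAcme\t\t[101]\n"
    "Paperback Novel\n"
)
rows = load_rows(io.StringIO(SAMPLE))
```

Every row comes back with the same keys, so missing values and truncated lines show up as empty strings rather than breaking later steps.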

The label for each product is the tags column, which tells you which shelf each training example (product) went on.

I used a k-nearest neighbors (KNN) approach as the machine learning algorithm. Looking back on the competition, this was not a good choice: the most important feature in the data was the product name, and converting those strings into numeric values was not an effective way to use them.
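To make the product-name issue concrete, here is a small pure-Python sketch of KNN that compares names by word overlap (Jaccard distance) instead of mapping each string to a number. This is not the approach used in this repository. It is only one possible way to keep the words in a name meaningful to a nearest-neighbor search. The function names, shelf labels, and sample products are all made up for illustration.

```python
from collections import Counter

def tokens(name):
    """Lowercase a product name and split it into a set of words."""
    return set(name.lower().split())

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0.0 for identical token sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def knn_predict(train, query_name, k=3):
    """Majority vote over the k training names closest to query_name."""
    query = tokens(query_name)
    ranked = sorted(train, key=lambda pair: jaccard_distance(tokens(pair[0]), query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up names and shelf labels, for illustration only.
TRAIN = [
    ("Blue Ceramic Coffee Mug", "kitchen"),
    ("White Ceramic Mug Set", "kitchen"),
    ("Stainless Steel Travel Mug", "kitchen"),
    ("Mystery Novel Paperback", "books"),
    ("Science Fiction Novel Hardcover", "books"),
]

prediction = knn_predict(TRAIN, "Red Ceramic Mug", k=3)
```

With this distance, two names are close only when they share words, whereas an arbitrary numeric ID per string would place unrelated names next to each other.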
