astray1988/Catshark

 
 


Catshark

Description

Inspired by Project Airbnb - Machine Learning of prediction, Catshark aims to help Home Depot refine its search algorithm using a relevance rater. This tool, which is based on machine learning techniques, predicts a relevance score for a given pair of query term and search result in a single click.
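To make the idea of scoring a (query, search result) pair more concrete, here is a minimal sketch of the kind of hand-crafted features a relevance rater could start from. The function names, the feature set, and the sample strings are illustrative assumptions, not the project's actual code.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_features(query, title):
    """Return simple word-overlap features for a (query, result title) pair.

    Hypothetical helper: the real project may use different features.
    """
    query_tokens = tokenize(query)
    title_tokens = set(tokenize(title))
    matched = sum(1 for token in query_tokens if token in title_tokens)
    return {
        "query_length": len(query_tokens),
        "matched_terms": matched,
        # Share of query terms found in the title; 0.0 for an empty query.
        "match_ratio": matched / len(query_tokens) if query_tokens else 0.0,
    }


# Made-up example pair: one of the two query terms appears in the title.
features = overlap_features("angle bracket", "galvanized steel angle")
print(features)
```

Features like these would then be fed to a regression model that outputs the relevance score.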

Plan

Based on our experience in web development and the description above, we take February 2016 as the first stage, with the primary goal of prototyping our relevance rater following the [Development Guidelines](https://github.com/BitTigerNY/AraChat/edit/master/README.md#Development Guildlines) below. Here is a tentative schedule:

  • [2016/02/01 - 2016/02/07] Project Selection, Plan Discussion, and Proposal Draft Writing
  • [2016/02/08 - 2016/02/24] System Design, Resource Discovery, Project Implementation, Document Writing
  • [2016/02/25 - 2016/02/29] User Manual Writing and Video Presentation Making

Details of each schedule and task will be added later.

Resource

  1. BitTiger Project: Airbnb - Machine Learning of prediction

Language & Framework

Python, JavaScript, Flask

Flask is a simple, lightweight Python web framework built for rapid development. In this project, we will use the Flask web framework to build our RESTful backend service. For tutorials and sample code
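As a rough illustration of what a RESTful relevance endpoint in Flask might look like, consider the sketch below. The route name `/relevance`, the JSON field names, and the `predict_relevance` placeholder are all assumptions made for this example. A trained model would replace the placeholder scorer.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_relevance(query, result):
    """Placeholder scorer (assumption): fraction of query words found in the result."""
    query_words = query.lower().split()
    result_words = set(result.lower().split())
    if not query_words:
        return 0.0
    return sum(word in result_words for word in query_words) / len(query_words)


@app.route("/relevance", methods=["POST"])
def relevance():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    query = payload.get("query")
    result = payload.get("result")
    if not isinstance(query, str) or not isinstance(result, str):
        return jsonify(error="'query' and 'result' must be strings"), 400
    return jsonify(query=query, result=result, score=predict_relevance(query, result))


# Exercise the endpoint with Flask's built-in test client (no server needed).
client = app.test_client()
response = client.post(
    "/relevance", json={"query": "angle bracket", "result": "galvanized steel angle"}
)
print(response.status_code, response.get_json())
```

Keeping the scoring logic in its own function means the HTTP layer does not need to change when the placeholder is swapped for a real model.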

Handlebars is a powerful JavaScript template engine that helps you build front-end web views without pain. In this project, we will use Handlebars.java, which is a Java port of Handlebars. The Spark template Handlebars sample code can be found here

Lucene is an open-source full-text search library which makes it easy to add search functionality to an application or website. In this project, we will use Lucene to build our search index based on Home Depot datasets. Some tutorials for beginners can be found here.

We chose Python as the primary language for the machine learning part. Python is a concise scripting language with a rich set of third-party libraries, including a scientific computing stack (NumPy, Pandas, etc.) and machine learning packages (scikit-learn, NLTK). We plan to train our models with Random Forest, XGBoost, and Bagging Regressor, and to produce the final results by ensembling them.
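The ensembling step could look roughly like the following sketch, which averages the predictions of a random forest and a bagging regressor from scikit-learn. The data is synthetic, the hyperparameters are arbitrary, and the 1 to 3 score range is an assumption for this example. XGBoost is left out here to keep the sketch dependent on scikit-learn only.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

# Synthetic stand-in data (assumption): 200 rows of 3 numeric features,
# with targets constructed to lie in the range [1, 3].
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 1.0 + 2.0 * X.mean(axis=1)

X_train, X_test = X[:150], X[150:]
y_train = y[:150]

models = [
    RandomForestRegressor(n_estimators=50, random_state=0),
    BaggingRegressor(n_estimators=20, random_state=0),
]
for model in models:
    model.fit(X_train, y_train)

# Ensemble by simple averaging, then clip to the assumed score range.
predictions = np.mean([model.predict(X_test) for model in models], axis=0)
predictions = np.clip(predictions, 1.0, 3.0)
print(predictions[:5])
```

Simple averaging is only one way to combine models. Weighted averaging or stacking are common alternatives once validation scores are available.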

Development Guidelines

  • Modularity. Following the principle "loose coupling and high cohesion", each module should be standalone.

  • Minimalism. Each module should be kept short, simple, and concise. Every piece of code should be transparent upon first reading.

  • Easy extensibility. New modules (as new classes and functions) should be simple to add, and existing modules should be easy to extend.

Owner

@team: Catshark

About

Platform to predict the relevance of search results on homedepot.com

