This repository was archived by the owner on Dec 1, 2017. It is now read-only.

shoeffner/datathonms2014


datathonms2014

This is a small one-day project that crawls Wikipedia for company logos, builds a database from them, and queries that database with photos of logos to find out which company each logo belongs to.

The project is built in Java 8 (jre1.8.0_25) and uses the jsoup library (jsoup-1.7.3) as well as the LIRE library (Lire-0.9.4 beta_2).

Crawling

To run the software, first run the crawler (Crawler.main). By default (Config.LOAD_ALL = false) it crawls the Wikipedia page "List of companies of the United States". Alternatively, a wiki page can be passed as a program parameter, e.g. java Crawler "List of companies of the United States". If you set Config.LOAD_ALL = true, it crawls all pages listed in Config.COMPANY_LISTS.
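The page selection described above could be sketched roughly as follows. This is not the project's actual Crawler code: the class CrawlTargets and the methods pagesToCrawl and wikiUrl are hypothetical names, the constants only stand in for Config.LOAD_ALL and Config.COMPANY_LISTS, and the assumption that a program parameter takes precedence over LOAD_ALL is a guess, not something the README states.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CrawlTargets {
    // Stand-ins for Config.LOAD_ALL and Config.COMPANY_LISTS (illustrative values only).
    static final boolean LOAD_ALL = false;
    static final List<String> COMPANY_LISTS =
            Arrays.asList("List of companies of the United States");

    // The default page named in the README.
    static final String DEFAULT_LIST = "List of companies of the United States";

    /** Decides which wiki pages to crawl for the given arguments and settings. */
    static List<String> pagesToCrawl(String[] args, boolean loadAll, List<String> allLists) {
        if (args.length > 0) {
            // Assumption: an explicit program parameter overrides LOAD_ALL.
            return Collections.singletonList(args[0]);
        }
        return loadAll ? allLists : Collections.singletonList(DEFAULT_LIST);
    }

    /** Builds the English Wikipedia URL for a page title (spaces become underscores). */
    static String wikiUrl(String title) {
        return "https://en.wikipedia.org/wiki/" + title.trim().replace(' ', '_');
    }

    public static void main(String[] args) {
        // Only prints the URLs; fetching and parsing them is left to the real crawler.
        for (String page : pagesToCrawl(args, LOAD_ALL, COMPANY_LISTS)) {
            System.out.println(wikiUrl(page));
        }
    }
}
```

Run without arguments, this sketch prints the URL of the default list page; run with a page title as the first argument, it prints the URL of that page instead.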

Note that for the first run you should set Config.RENEW_INDEX = true!

Please refrain from crawling all the pages again and again.

Searching

To query the database, simply run the identifier (Identifier.main). It opens a file chooser, tries to load the selected file as an image, and searches the database for similar images.
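The first half of that flow, picking a file and trying to decode it as an image, might look like the sketch below. It uses only the standard library (javax.swing.JFileChooser and javax.imageio.ImageIO) and is not the project's Identifier code; the class QueryImagePicker and the method loadImage are hypothetical names. The actual similarity search is done with LIRE in the project and is deliberately left out here.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.swing.JFileChooser;

public class QueryImagePicker {
    /** Returns the decoded image, or null if the file cannot be read as an image. */
    static BufferedImage loadImage(File file) {
        try {
            // ImageIO.read returns null when no registered reader understands the file.
            return ImageIO.read(file);
        } catch (IOException e) {
            // Thrown, for example, when the file does not exist or is unreadable.
            return null;
        }
    }

    public static void main(String[] args) {
        JFileChooser chooser = new JFileChooser();
        if (chooser.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
            System.out.println("No file selected.");
            return;
        }
        File selected = chooser.getSelectedFile();
        BufferedImage query = loadImage(selected);
        if (query == null) {
            System.out.println("Not a readable image: " + selected);
            return;
        }
        // At this point the real program would hand the image to the LIRE-based search.
        System.out.println("Loaded " + query.getWidth() + "x" + query.getHeight()
                + " image from " + selected.getName());
    }
}
```

Returning null for unreadable files keeps the sketch short; a fuller version would probably report why the file was rejected.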

Background

The program was built during the Hackathon of data://Münster.
